Series comparison

-[PULL 00/26] target-arm queue
+[PULL 00/29] target-arm queue
-Small pile of bug fixes for rc1. I've included my patches to get
+Hi; this mostly contains the first slice of A64 decodetree
-our docs building with Sphinx 3, just for convenience...
+patches, plus some other minor pieces. It also has the
 enablement of MTE for KVM guests.
+thanks
 -- PMM
-The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:
+The following changes since commit d27e7c359330ba7020bdbed7ed2316cb4cf6ffc1:
-  Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)
+  qapi/parser: Drop two bad type hints for now (2023-05-17 10:18:33 -0700)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230518
-for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:
+for you to fetch changes up to 91608e2a44f36e79cb83f863b8a7bb57d2c98061:
-  tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)
+  docs: Convert u2f.txt to rST (2023-05-18 11:40:32 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * target/arm: Fix Neon emulation bugs on big-endian hosts
+ * Fix vd == vm overlap in sve_ldff1_z
- * target/arm: fix handling of HCR.FB
+ * Add support for MTE with KVM guests
- * target/arm: fix LORID_EL1 access check
+ * Add RAZ/WI handling for DBGDTR[TX|RX]
- * disas/capstone: Fix monitor disassembly of >32 bytes
+ * Start of conversion of A64 decoder to decodetree
- * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+ * Saturate L2CTLR_EL1 core count field rather than overflowing
- * hw/arm/boot: fix SVE for EL3 direct kernel boot
+ * vexpress: Avoid trivial memory leak of 'flashalias'
- * hw/display/omap_lcdc: Fix potential NULL pointer dereference
+ * sbsa-ref: switch default cpu core to Neoverse-N1
- * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+ * sbsa-ref: use Bochs graphics card instead of VGA
- * target/arm: Get correct MMU index for other-security-state
+ * MAINTAINERS: Add Marcin Juszkiewicz to sbsa-ref reviewer list
- * configure: Test that gio libs from pkg-config work
+ * docs: Convert u2f.txt to rST
  * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
  * docs: Fix building with Sphinx 3
  * tests/qtest/npcm7xx_rng-test: Disable randomness tests
 ----------------------------------------------------------------
-AlexChen (2):
+Alex Bennée (1):
-      hw/display/omap_lcdc: Fix potential NULL pointer dereference
+      target/arm: add RAZ/WI handling for DBGDTR[TX|RX]
       hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
-Peter Maydell (9):
+Cornelia Huck (1):
-      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+      arm/kvm: add support for MTE
       target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
       disas/capstone: Fix monitor disassembly of >32 bytes
       target/arm: Get correct MMU index for other-security-state
       configure: Test that gio libs from pkg-config work
       hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
       scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
       qemu-option-trace.rst.inc: Don't use option:: markup
       tests/qtest/npcm7xx_rng-test: Disable randomness tests
-Philippe Mathieu-Daudé (1):
+Marcin Juszkiewicz (3):
-      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+      sbsa-ref: switch default cpu core to Neoverse-N1
       Maintainers: add myself as reviewer for sbsa-ref
       sbsa-ref: use Bochs graphics card instead of VGA
-Richard Henderson (11):
+Peter Maydell (14):
-      target/arm: Introduce neon_full_reg_offset
+      target/arm: Create decodetree skeleton for A64
-      target/arm: Move neon_element_offset to translate.c
+      target/arm: Pull calls to disas_sve() and disas_sme() out of legacy decoder
-      target/arm: Use neon_element_offset in neon_load/store_reg
+      target/arm: Convert Extract instructions to decodetree
-      target/arm: Use neon_element_offset in vfp_reg_offset
+      target/arm: Convert unconditional branch immediate to decodetree
-      target/arm: Add read/write_neon_element32
+      target/arm: Convert CBZ, CBNZ to decodetree
-      target/arm: Expand read/write_neon_element32 to all MemOp
+      target/arm: Convert TBZ, TBNZ to decodetree
-      target/arm: Rename neon_load_reg32 to vfp_load_reg32
+      target/arm: Convert conditional branch insns to decodetree
-      target/arm: Add read/write_neon_element64
+      target/arm: Convert BR, BLR, RET to decodetree
-      target/arm: Rename neon_load_reg64 to vfp_load_reg64
+      target/arm: Convert BRA[AB]Z, BLR[AB]Z, RETA[AB] to decodetree
-      target/arm: Simplify do_long_3d and do_2scalar_long
+      target/arm: Convert BRAA, BRAB, BLRAA, BLRAB to decodetree
-      target/arm: Improve do_prewiden_3d
+      target/arm: Convert ERET, ERETAA, ERETAB to decodetree
       target/arm: Saturate L2CTLR_EL1 core count field rather than overflowing
       hw/arm/vexpress: Avoid trivial memory leak of 'flashalias'
       docs: Convert u2f.txt to rST
-Rémi Denis-Courmont (3):
+Richard Henderson (10):
-      target/arm: fix handling of HCR.FB
+      target/arm: Fix vd == vm overlap in sve_ldff1_z
-      target/arm: fix LORID_EL1 access check
+      target/arm: Split out disas_a64_legacy
-      hw/arm/boot: fix SVE for EL3 direct kernel boot
+      target/arm: Convert PC-rel addressing to decodetree
       target/arm: Split gen_add_CC and gen_sub_CC
       target/arm: Convert Add/subtract (immediate) to decodetree
       target/arm: Convert Add/subtract (immediate with tags) to decodetree
       target/arm: Replace bitmask64 with MAKE_64BIT_MASK
       target/arm: Convert Logical (immediate) to decodetree
       target/arm: Convert Move wide (immediate) to decodetree
       target/arm: Convert Bitfield to decodetree
- docs/qemu-option-trace.rst.inc     |   6 +-
+ MAINTAINERS                      |    1 +
- configure                          |  10 +-
+ docs/system/device-emulation.rst |    1 +
- include/hw/intc/arm_gicv3_common.h |   1 -
+ docs/system/devices/usb-u2f.rst  |   93 +++
- disas/capstone.c                   |   2 +-
+ docs/system/devices/usb.rst      |    2 +-
- hw/arm/boot.c                      |   3 +
+ docs/u2f.txt                     |  110 ----
- hw/arm/smmuv3.c                    |   3 +-
+ target/arm/cpu.h                 |    4 +
- hw/display/exynos4210_fimd.c       |   4 +-
+ target/arm/kvm_arm.h             |   19 +
- hw/display/omap_lcdc.c             |  10 +-
+ target/arm/tcg/translate.h       |    5 +
- hw/intc/arm_gicv3_cpuif.c          |   5 +-
+ target/arm/tcg/a64.decode        |  152 +++++
- target/arm/helper.c                |  24 +-
+ hw/arm/sbsa-ref.c                |    4 +-
- target/arm/m_helper.c              |   3 +-
+ hw/arm/vexpress.c                |   40 +-
- target/arm/translate.c             | 153 +++++++++---
+ hw/arm/virt.c                    |   73 ++-
- target/arm/vec_helper.c            |  12 +-
+ target/arm/cortex-regs.c         |   11 +-
- tests/qtest/npcm7xx_rng-test.c     |  14 +-
+ target/arm/cpu.c                 |    9 +-
- scripts/kernel-doc                 |  18 +-
+ target/arm/debug_helper.c        |   11 +-
- target/arm/translate-neon.c.inc    | 472 ++++++++++++++++++++-----------------
+ target/arm/kvm.c                 |   35 +
- target/arm/translate-vfp.c.inc     | 341 +++++++++++----------------
+ target/arm/kvm64.c               |    5 +
-files changed, 588 insertions(+), 493 deletions(-)
+ target/arm/tcg/sve_helper.c      |    6 +
  target/arm/tcg/translate-a64.c   | 1321 ++++++++++++++++----------------------
  target/arm/tcg/meson.build       |    1 +
 files changed, 979 insertions(+), 924 deletions(-)
  create mode 100644 docs/system/devices/usb-u2f.rst
  delete mode 100644 docs/u2f.txt
  create mode 100644 target/arm/tcg/a64.decode

-[PULL 16/26] disas/capstone: Fix monitor disassembly of >32 bytes
+[PULL 01/29] sbsa-ref: switch default cpu core to Neoverse-N1
-If we're using the capstone disassembler, disassembly of a run of
+From: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
 instructions more than 32 bytes long disassembles the wrong data for
 instructions beyond the 32 byte mark:
-(qemu) xp /16x 0x100
+The world outside moves to newer and newer cpu cores. Let move SBSA
-0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
+Reference Platform to something newer as well.
 0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
 0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
 0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
 (qemu) xp /16i 0x100
 x00000100: 00000005 andeq r0, r0, r5
 x00000104: 54410001 strbpl r0, [r1], #-1
 x00000108: 00000001 andeq r0, r0, r1
 x0000010c: 00001000 andeq r1, r0, r0
 x00000110: 00000000 andeq r0, r0, r0
 x00000114: 00000004 andeq r0, r0, r4
 x00000118: 54410002 strbpl r0, [r1], #-2
 x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 x00000120: 54410001 strbpl r0, [r1], #-1
 x00000124: 00000001 andeq r0, r0, r1
 x00000128: 00001000 andeq r1, r0, r0
 x0000012c: 00000000 andeq r0, r0, r0
 x00000130: 00000004 andeq r0, r0, r4
 x00000134: 54410002 strbpl r0, [r1], #-2
 x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 x0000013c: 00000000 andeq r0, r0, r0
-Here the disassembly of 0x120..0x13f is using the data that is in
+Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
-x104..0x123.
+Reviewed-by: Leif Lindholm <quic_llindhol@quicinc.com>
+Message-id: 20230506183417.1360427-1-marcin.juszkiewicz@linaro.org
 This is caused by passing the wrong value to the read_memory_func().
 The intention is that at this point in the loop the 'cap_buf' buffer
 already contains 'csize' bytes of data for the instruction at guest
 addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
 extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
 time through the loop 'csize' happens to be zero, so the initial read
 of 32 bytes into cap_buf is correct and as long as the disassembly
 never needs to read more data we return the correct information.
 Use the correct guest address in the call to read_memory_func().
 Cc: qemu-stable@nongnu.org
 Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
 ---
- disas/capstone.c | 2 +-
+ hw/arm/sbsa-ref.c | 2 +-
 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/disas/capstone.c b/disas/capstone.c
+diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
 index XXXXXXX..XXXXXXX 100644
---- a/disas/capstone.c
+--- a/hw/arm/sbsa-ref.c
-+++ b/disas/capstone.c
++++ b/hw/arm/sbsa-ref.c
-@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
+@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
-         /* Make certain that we can make progress.  */
+     mc->init = sbsa_ref_init;
-         assert(tsize != 0);
+     mc->desc = "QEMU 'SBSA Reference' ARM Virtual Machine";
--        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+-    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a57");
-+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
++    mc->default_cpu_type = ARM_CPU_TYPE_NAME("neoverse-n1");
-         csize += tsize;
+     mc->max_cpus = 512;
+     mc->pci_allow_0_address = true;
-         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
+     mc->minimum_page_bits = 12;
 --
-.20.1
+.34.1

-[PULL 19/26] hw/display/omap_lcdc: Fix potential NULL pointer dereference
+[PULL 02/29] target/arm: Fix vd == vm overlap in sve_ldff1_z
-From: AlexChen <alex.chen@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-In omap_lcd_interrupts(), the pointer omap_lcd is dereferinced before
+If vd == vm, copy vm to scratch, so that we can pre-zero
-being check if it is valid, which may lead to NULL pointer dereference.
+the output and still access the gather indicies.
 So move the assignment to surface after checking that the omap_lcd is valid
 and move surface_bits_per_pixel(surface) to after the surface assignment.
-Reported-by: Euler Robot <euler.robot@huawei.com>
+Cc: qemu-stable@nongnu.org
-Signed-off-by: AlexChen <alex.chen@huawei.com>
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1612
-Message-id: 5F9CDB8A.9000001@huawei.com
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20230504104232.1877774-1-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/display/omap_lcdc.c | 10 +++++++---
+ target/arm/tcg/sve_helper.c | 6 ++++++
-file changed, 7 insertions(+), 3 deletions(-)
+file changed, 6 insertions(+)
-diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/omap_lcdc.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/hw/display/omap_lcdc.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
+@@ -XXX,XX +XXX,XX @@ void sve_ldff1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
- static void omap_update_display(void *opaque)
+     intptr_t reg_off;
- {
+     SVEHostPage info;
-     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
+     target_ulong addr, in_page;
--    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
++    ARMVectorReg scratch;
-+    DisplaySurface *surface;
-     draw_line_func draw_line;
+     /* Skip to the first true predicate.  */
-     int size, height, first, last;
+     reg_off = find_next_active(vg, 0, reg_max, esz);
-     int width, linesize, step, bpp, frame_offset;
+@@ -XXX,XX +XXX,XX @@ void sve_ldff1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
-     hwaddr frame_base;
+         return;
+     }
--    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
--        !surface_bits_per_pixel(surface)) {
++    /* Protect against overlap between vd and vm. */
-+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
++    if (unlikely(vd == vm)) {
-+        return;
++        vm = memcpy(&scratch, vm, reg_max);
 +    }
 +
-+    surface = qemu_console_surface(omap_lcd->con);
+     /*
-+    if (!surface_bits_per_pixel(surface)) {
+      * Probe the first element, allowing faults.
-         return;
+      */
      }
 --
-.20.1
+.34.1

-[PULL 20/26] hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+[PULL 03/29] Maintainers: add myself as reviewer for sbsa-ref
-From: AlexChen <alex.chen@huawei.com>
+From: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
-In exynos4210_fimd_update(), the pointer s is dereferinced before
+At Linaro I work on sbsa-ref, know direction it goes.
 being check if it is valid, which may lead to NULL pointer dereference.
 So move the assignment to global_width after checking that the s is valid.
-Reported-by: Euler Robot <euler.robot@huawei.com>
+May not get code details each time.
-Signed-off-by: Alex Chen <alex.chen@huawei.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
-Message-id: 5F9F8D88.9030102@huawei.com
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20230515143753.365591-1-marcin.juszkiewicz@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/display/exynos4210_fimd.c | 4 +++-
+ MAINTAINERS | 1 +
-file changed, 3 insertions(+), 1 deletion(-)
+file changed, 1 insertion(+)
-diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
+diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/exynos4210_fimd.c
+--- a/MAINTAINERS
-+++ b/hw/display/exynos4210_fimd.c
++++ b/MAINTAINERS
-@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
+@@ -XXX,XX +XXX,XX @@ SBSA-REF
-     bool blend = false;
+ M: Radoslaw Biernacki <rad@semihalf.com>
-     uint8_t *host_fb_addr;
+ M: Peter Maydell <peter.maydell@linaro.org>
-     bool is_dirty = false;
+ R: Leif Lindholm <quic_llindhol@quicinc.com>
--    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
++R: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
-+    int global_width;
+ L: qemu-arm@nongnu.org
+ S: Maintained
-     if (!s || !s->console || !s->enabled ||
+ F: hw/arm/sbsa-ref.c
          surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
          return;
      }
 +
 +    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
      exynos4210_update_resolution(s);
      surface = qemu_console_surface(s->console);
 --
-.20.1
+.34.1

-[PULL 14/26] target/arm: fix handling of HCR.FB
+[PULL 04/29] arm/kvm: add support for MTE
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Cornelia Huck <cohuck@redhat.com>
-HCR should be applied when NS is set, not when it is cleared.
+Extend the 'mte' property for the virt machine to cover KVM as
+well. For KVM, we don't allocate tag memory, but instead enable the
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+capability.
 If MTE has been enabled, we need to disable migration, as we do not
 yet have a way to migrate the tags as well. Therefore, MTE will stay
 off with KVM unless requested explicitly.
 Signed-off-by: Cornelia Huck <cohuck@redhat.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230428095533.21747-2-cohuck@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 5 ++---
+ target/arm/cpu.h     |  4 +++
-file changed, 2 insertions(+), 3 deletions(-)
+ target/arm/kvm_arm.h | 19 ++++++++++++
+ hw/arm/virt.c        | 73 +++++++++++++++++++++++++-------------------
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+ target/arm/cpu.c     |  9 +++---
-index XXXXXXX..XXXXXXX 100644
+ target/arm/kvm.c     | 35 +++++++++++++++++++++
---- a/target/arm/helper.c
+ target/arm/kvm64.c   |  5 +++
-+++ b/target/arm/helper.c
+files changed, 109 insertions(+), 36 deletions(-)
-@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
       */
      uint32_t psci_conduit;
 +    /* CPU has Memory Tag Extension */
 +    bool has_mte;
 +
      /* For v8M, initial value of the Secure VTOR */
      uint32_t init_svtor;
      /* For v8M, initial value of the Non-secure VTOR */
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
      bool prop_pauth;
      bool prop_pauth_impdef;
      bool prop_lpa2;
 +    OnOffAuto prop_mte;
      /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
      uint32_t dcz_blocksize;
 diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm_arm.h
 +++ b/target/arm/kvm_arm.h
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_pmu_supported(void);
   */
  bool kvm_arm_sve_supported(void);
 +/**
 + * kvm_arm_mte_supported:
 + *
 + * Returns: true if KVM can enable MTE, and false otherwise.
 + */
 +bool kvm_arm_mte_supported(void);
 +
  /**
   * kvm_arm_get_max_vm_ipa_size:
   * @ms: Machine state handle
@@ -XXX,XX +XXX,XX @@ void kvm_arm_pvtime_init(CPUState *cs, uint64_t ipa);
  int kvm_arm_set_irq(int cpu, int irqtype, int irq, int level);
 +void kvm_arm_enable_mte(Object *cpuobj, Error **errp);
 +
  #else
  /*
-  * Non-IS variants of TLB operations are upgraded to
+@@ -XXX,XX +XXX,XX @@ static inline bool kvm_arm_steal_time_supported(void)
-- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
+     return false;
-+ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
+ }
-  * force broadcast of these operations.
 +static inline bool kvm_arm_mte_supported(void)
 +{
 +    return false;
 +}
 +
  /*
   * These functions should never actually be called without KVM support.
   */
- static bool tlb_force_broadcast(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t kvm_arm_sve_get_vls(CPUState *cs)
      g_assert_not_reached();
  }
 +static inline void kvm_arm_enable_mte(Object *cpuobj, Error **errp)
 +{
 +    g_assert_not_reached();
 +}
 +
  #endif
  static inline const char *gic_class_name(void)
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
          exit(1);
      }
 -    if (vms->mte && (kvm_enabled() || hvf_enabled())) {
 +    if (vms->mte && hvf_enabled()) {
          error_report("mach-virt: %s does not support providing "
                       "MTE to the guest CPU",
                       current_accel_name());
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
          }
          if (vms->mte) {
 -            /* Create the memory region only once, but link to all cpus. */
 -            if (!tag_sysmem) {
 -                /*
 -                 * The property exists only if MemTag is supported.
 -                 * If it is, we must allocate the ram to back that up.
 -                 */
 -                if (!object_property_find(cpuobj, "tag-memory")) {
 -                    error_report("MTE requested, but not supported "
 -                                 "by the guest CPU");
 +            if (tcg_enabled()) {
 +                /* Create the memory region only once, but link to all cpus. */
 +                if (!tag_sysmem) {
 +                    /*
 +                     * The property exists only if MemTag is supported.
 +                     * If it is, we must allocate the ram to back that up.
 +                     */
 +                    if (!object_property_find(cpuobj, "tag-memory")) {
 +                        error_report("MTE requested, but not supported "
 +                                     "by the guest CPU");
 +                        exit(1);
 +                    }
 +
 +                    tag_sysmem = g_new(MemoryRegion, 1);
 +                    memory_region_init(tag_sysmem, OBJECT(machine),
 +                                       "tag-memory", UINT64_MAX / 32);
 +
 +                    if (vms->secure) {
 +                        secure_tag_sysmem = g_new(MemoryRegion, 1);
 +                        memory_region_init(secure_tag_sysmem, OBJECT(machine),
 +                                           "secure-tag-memory",
 +                                           UINT64_MAX / 32);
 +
 +                        /* As with ram, secure-tag takes precedence over tag. */
 +                        memory_region_add_subregion_overlap(secure_tag_sysmem,
 +                                                            0, tag_sysmem, -1);
 +                    }
 +                }
 +
 +                object_property_set_link(cpuobj, "tag-memory",
 +                                         OBJECT(tag_sysmem), &error_abort);
 +                if (vms->secure) {
 +                    object_property_set_link(cpuobj, "secure-tag-memory",
 +                                             OBJECT(secure_tag_sysmem),
 +                                             &error_abort);
 +                }
 +            } else if (kvm_enabled()) {
 +                if (!kvm_arm_mte_supported()) {
 +                    error_report("MTE requested, but not supported by KVM");
                      exit(1);
                  }
 -
 -                tag_sysmem = g_new(MemoryRegion, 1);
 -                memory_region_init(tag_sysmem, OBJECT(machine),
 -                                   "tag-memory", UINT64_MAX / 32);
 -
 -                if (vms->secure) {
 -                    secure_tag_sysmem = g_new(MemoryRegion, 1);
 -                    memory_region_init(secure_tag_sysmem, OBJECT(machine),
 -                                       "secure-tag-memory", UINT64_MAX / 32);
 -
 -                    /* As with ram, secure-tag takes precedence over tag.  */
 -                    memory_region_add_subregion_overlap(secure_tag_sysmem, 0,
 -                                                        tag_sysmem, -1);
 -                }
 -            }
 -
 -            object_property_set_link(cpuobj, "tag-memory", OBJECT(tag_sysmem),
 -                                     &error_abort);
 -            if (vms->secure) {
 -                object_property_set_link(cpuobj, "secure-tag-memory",
 -                                         OBJECT(secure_tag_sysmem),
 -                                         &error_abort);
 +                kvm_arm_enable_mte(cpuobj, &error_abort);
              }
          }
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
                                       qdev_prop_allow_set_link_before_realize,
                                       OBJ_PROP_LINK_STRONG);
          }
 +        cpu->has_mte = true;
      }
  #endif
  }
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
          }
          if (cpu->tag_memory) {
              error_setg(errp,
 -                       "Cannot enable %s when guest CPUs has MTE enabled",
 +                       "Cannot enable %s when guest CPUs has tag memory enabled",
                         current_accel_name());
              return;
          }
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
      }
  #ifndef CONFIG_USER_ONLY
 -    if (cpu->tag_memory == NULL && cpu_isar_feature(aa64_mte, cpu)) {
 +    if (!cpu->has_mte && cpu_isar_feature(aa64_mte, cpu)) {
          /*
 -         * Disable the MTE feature bits if we do not have tag-memory
 -         * provided by the machine.
 +         * Disable the MTE feature bits if we do not have the feature
 +         * setup by the machine.
           */
          cpu->isar.id_aa64pfr1 =
              FIELD_DP64(cpu->isar.id_aa64pfr1, ID_AA64PFR1, MTE, 0);
 diff --git a/target/arm/kvm.c b/target/arm/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm.c
 +++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/boards.h"
  #include "hw/irq.h"
  #include "qemu/log.h"
 +#include "migration/blocker.h"
  const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
      KVM_CAP_LAST_INFO
@@ -XXX,XX +XXX,XX @@ bool kvm_arch_cpu_check_are_resettable(void)
  void kvm_arch_accel_class_init(ObjectClass *oc)
  {
--    return (env->cp15.hcr_el2 & HCR_FB) &&
+ }
--        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
++
-+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
++void kvm_arm_enable_mte(Object *cpuobj, Error **errp)
- }
++{
++    static bool tried_to_enable;
- static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
++    static bool succeeded_to_enable;
 +    Error *mte_migration_blocker = NULL;
 +    int ret;
 +
 +    if (!tried_to_enable) {
 +        /*
 +         * MTE on KVM is enabled on a per-VM basis (and retrying doesn't make
 +         * sense), and we only want a single migration blocker as well.
 +         */
 +        tried_to_enable = true;
 +
 +        ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_ARM_MTE, 0);
 +        if (ret) {
 +            error_setg_errno(errp, -ret, "Failed to enable KVM_CAP_ARM_MTE");
 +            return;
 +        }
 +
 +        /* TODO: add proper migration support with MTE enabled */
 +        error_setg(&mte_migration_blocker,
 +                   "Live migration disabled due to MTE enabled");
 +        if (migrate_add_blocker(mte_migration_blocker, errp)) {
 +            error_free(mte_migration_blocker);
 +            return;
 +        }
 +        succeeded_to_enable = true;
 +    }
 +    if (succeeded_to_enable) {
 +        object_property_set_bool(cpuobj, "has_mte", true, NULL);
 +    }
 +}
 diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm64.c
 +++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_steal_time_supported(void)
      return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME);
  }
 +bool kvm_arm_mte_supported(void)
 +{
 +    return kvm_check_extension(kvm_state, KVM_CAP_ARM_MTE);
 +}
 +
  QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
  uint32_t kvm_arm_sve_get_vls(CPUState *cs)
 --
-.20.1
+.34.1

-[PULL 18/26] hw/arm/boot: fix SVE for EL3 direct kernel boot
+[PULL 05/29] target/arm: add RAZ/WI handling for DBGDTR[TX|RX]
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Alex Bennée <alex.bennee@linaro.org>
-When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
+The commit b3aa2f2128 (target/arm: provide stubs for more external
-that SVE will not trap to EL3.
+debug registers) was added to handle HyperV's unconditional usage of
 Debug Communications Channel. It turns out that Linux will similarly
 break if you enable CONFIG_HVC_DCC "ARM JTAG DCC console".
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+Extend the registers we RAZ/WI set to avoid this.
 Cc: Anders Roxell <anders.roxell@linaro.org>
 Cc: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
 Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030151541.11976-1-remi@remlab.net
+Message-id: 20230516104420.407912-1-alex.bennee@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/boot.c | 3 +++
+ target/arm/debug_helper.c | 11 +++++++++--
-file changed, 3 insertions(+)
+file changed, 9 insertions(+), 2 deletions(-)
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/target/arm/debug_helper.c
-+++ b/hw/arm/boot.c
++++ b/target/arm/debug_helper.c
-@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
-                     if (cpu_isar_feature(aa64_mte, cpu)) {
+       .access = PL0_R, .accessfn = access_tdcc,
-                         env->cp15.scr_el3 |= SCR_ATA;
+       .type = ARM_CP_CONST, .resetvalue = 0 },
-                     }
+     /*
-+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+-     * OSDTRRX_EL1/OSDTRTX_EL1 are used for save and restore of DBGDTRRX_EL0.
-+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+-     * It is a component of the Debug Communications Channel, which is not implemented.
-+                    }
++     * These registers belong to the Debug Communications Channel,
-                     /* AArch64 kernels never boot in secure mode */
++     * which is not implemented. However we implement RAZ/WI behaviour
-                     assert(!info->secure_boot);
++     * with trapping to prevent spurious SIGILLs if the guest OS does
-                     /* This hook is only supported for AArch32 currently:
++     * access them as the support cannot be probed for.
       */
      { .name = "OSDTRRX_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
        .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 2,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
        .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 2,
        .access = PL1_RW, .accessfn = access_tdcc,
        .type = ARM_CP_CONST, .resetvalue = 0 },
 +    /* DBGDTRTX_EL0/DBGDTRRX_EL0 depend on direction */
 +    { .name = "DBGDTR_EL0", .state = ARM_CP_STATE_BOTH, .cp = 14,
 +      .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 5, .opc2 = 0,
 +      .access = PL0_RW, .accessfn = access_tdcc,
 +      .type = ARM_CP_CONST, .resetvalue = 0 },
      /*
       * OSECCR_EL1 provides a mechanism for an operating system
       * to access the contents of EDECCR. EDECCR is not implemented though,
 --
-.20.1
+.34.1

-[PULL 17/26] hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+[PULL 06/29] sbsa-ref: use Bochs graphics card instead of VGA
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
-Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
+Bochs card is normal PCI Express card so it fits better in system with
-This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):
+PCI Express bus. VGA is simple legacy PCI card.
-  CID 1432363 (#1 of 1): Unintentional integer overflow:
+Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
+Reviewed-by: Leif Lindholm <quic_llindhol@quicinc.com>
-  overflow_before_widen:
+Message-id: 20230505120936.1097060-1-marcin.juszkiewicz@linaro.org
     Potentially overflowing expression 1 << scale with type int
     (32 bits, signed) is evaluated using 32-bit arithmetic, and
     then used in a context that expects an expression of type
     hwaddr (64 bits, unsigned).
 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Acked-by: Eric Auger <eric.auger@redhat.com>
 Message-id: 20201030144617.1535064-1-philmd@redhat.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/smmuv3.c | 3 ++-
+ hw/arm/sbsa-ref.c | 2 +-
-file changed, 2 insertions(+), 1 deletion(-)
+file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
+diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmuv3.c
+--- a/hw/arm/sbsa-ref.c
-+++ b/hw/arm/smmuv3.c
++++ b/hw/arm/sbsa-ref.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void create_pcie(SBSAMachineState *sms)
-  */
+         }
  #include "qemu/osdep.h"
 +#include "qemu/bitops.h"
  #include "hw/irq.h"
  #include "hw/sysbus.h"
  #include "migration/vmstate.h"
@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
          scale = CMD_SCALE(cmd);
          num = CMD_NUM(cmd);
          ttl = CMD_TTL(cmd);
 -        num_pages = (num + 1) * (1 << (scale));
 +        num_pages = (num + 1) * BIT_ULL(scale);
      }
-     if (type == SMMU_CMD_TLBI_NH_VA) {
+-    pci_create_simple(pci->bus, -1, "VGA");
 +    pci_create_simple(pci->bus, -1, "bochs-display");
      create_smmu(sms, pci->bus);
  }
 --
-.20.1
+.34.1

-[PULL 05/26] target/arm: Add read/write_neon_element32
+[PULL 07/29] target/arm: Split out disas_a64_legacy
 From: Richard Henderson <richard.henderson@linaro.org>
-Model these off the aa64 read/write_vec_element functions.
+Split out all of the decode stuff from aarch64_tr_translate_insn.
-Use it within translate-neon.c.inc.  The new functions do
+Call it disas_a64_legacy to indicate it will be replaced.
 not allocate or free temps, so this rearranges the calling
 code a bit.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20230512144106.3608981-2-peter.maydell@linaro.org
 [PMM: Rebased]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  26 ++++
+ target/arm/tcg/translate-a64.c | 82 ++++++++++++++++++----------------
- target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
+file changed, 44 insertions(+), 38 deletions(-)
 files changed, 183 insertions(+), 99 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static bool btype_destination_ok(uint32_t insn, bool bt, int btype)
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+     return false;
  }
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
++/* C3.1 A64 instruction index by encoding */
 +static void disas_a64_legacy(DisasContext *s, uint32_t insn)
 +{
-+    long off = neon_element_offset(reg, ele, size);
++    switch (extract32(insn, 25, 4)) {
-+
++    case 0x0:
-+    switch (size) {
++        if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
-+    case MO_32:
++            unallocated_encoding(s);
-+        tcg_gen_ld_i32(dest, cpu_env, off);
++        }
 +        break;
 +    case 0x1: case 0x3: /* UNALLOCATED */
 +        unallocated_encoding(s);
 +        break;
 +    case 0x2:
 +        if (!disas_sve(s, insn)) {
 +            unallocated_encoding(s);
 +        }
 +        break;
 +    case 0x8: case 0x9: /* Data processing - immediate */
 +        disas_data_proc_imm(s, insn);
 +        break;
 +    case 0xa: case 0xb: /* Branch, exception generation and system insns */
 +        disas_b_exc_sys(s, insn);
 +        break;
 +    case 0x4:
 +    case 0x6:
 +    case 0xc:
 +    case 0xe:      /* Loads and stores */
 +        disas_ldst(s, insn);
 +        break;
 +    case 0x5:
 +    case 0xd:      /* Data processing - register */
 +        disas_data_proc_reg(s, insn);
 +        break;
 +    case 0x7:
 +    case 0xf:      /* Data processing - SIMD and floating point */
 +        disas_data_proc_simd_fp(s, insn);
 +        break;
 +    default:
-+        g_assert_not_reached();
++        assert(FALSE); /* all 15 cases should be handled above */
 +        break;
 +    }
 +}
 +
-+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
-+{
+                                           CPUState *cpu)
 +    long off = neon_element_offset(reg, ele, size);
 +
 +    switch (size) {
 +    case MO_32:
 +        tcg_gen_st_i32(src, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
  static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
  {
-     TCGv_ptr ret = tcg_temp_new_ptr();
+@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+         disas_sme_fa64(s, insn);
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
       * early. Since Q is 0 there are always just two passes, so instead
       * of a complicated loop over each pass we just unroll.
       */
 -    tmp = neon_load_reg(a->vn, 0);
 -    tmp2 = neon_load_reg(a->vn, 1);
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    tmp3 = tcg_temp_new_i32();
 +
 +    read_neon_element32(tmp, a->vn, 0, MO_32);
 +    read_neon_element32(tmp2, a->vn, 1, MO_32);
      fn(tmp, tmp, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    tmp3 = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(tmp3, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      fn(tmp3, tmp3, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    neon_store_reg(a->vd, 0, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * 2-reg-and-shift operations, size < 3 case, where the
       * helper needs to be passed cpu_env.
       */
 -    TCGv_i32 constimm;
 +    TCGv_i32 constimm, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * by immediate using the variable shift operations.
       */
      constimm = tcg_const_i32(dup_const(a->size, a->shift));
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
-+    tcg_temp_free_i32(tmp);
-     tcg_temp_free_i32(constimm);
+-    switch (extract32(insn, 25, 4)) {
-     return true;
+-    case 0x0:
- }
+-        if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
+-            unallocated_encoding(s);
-     constimm = tcg_const_i64(-a->shift);
+-        }
-     rm1 = tcg_temp_new_i64();
+-        break;
-     rm2 = tcg_temp_new_i64();
+-    case 0x1: case 0x3: /* UNALLOCATED */
-+    rd = tcg_temp_new_i32();
+-        unallocated_encoding(s);
+-        break;
-     /* Load both inputs first to avoid potential overwrite if rm == rd */
+-    case 0x2:
-     neon_load_reg64(rm1, a->vm);
+-        if (!disas_sve(s, insn)) {
-     neon_load_reg64(rm2, a->vm + 1);
+-            unallocated_encoding(s);
+-        }
-     shiftfn(rm1, rm1, constimm);
+-        break;
--    rd = tcg_temp_new_i32();
+-    case 0x8: case 0x9: /* Data processing - immediate */
-     narrowfn(rd, cpu_env, rm1);
+-        disas_data_proc_imm(s, insn);
--    neon_store_reg(a->vd, 0, rd);
+-        break;
-+    write_neon_element32(rd, a->vd, 0, MO_32);
+-    case 0xa: case 0xb: /* Branch, exception generation and system insns */
+-        disas_b_exc_sys(s, insn);
-     shiftfn(rm2, rm2, constimm);
+-        break;
--    rd = tcg_temp_new_i32();
+-    case 0x4:
-     narrowfn(rd, cpu_env, rm2);
+-    case 0x6:
--    neon_store_reg(a->vd, 1, rd);
+-    case 0xc:
-+    write_neon_element32(rd, a->vd, 1, MO_32);
+-    case 0xe:      /* Loads and stores */
+-        disas_ldst(s, insn);
-+    tcg_temp_free_i32(rd);
+-        break;
-     tcg_temp_free_i64(rm1);
+-    case 0x5:
-     tcg_temp_free_i64(rm2);
+-    case 0xd:      /* Data processing - register */
-     tcg_temp_free_i64(constimm);
+-        disas_data_proc_reg(s, insn);
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
+-        break;
-     constimm = tcg_const_i32(imm);
+-    case 0x7:
+-    case 0xf:      /* Data processing - SIMD and floating point */
-     /* Load all inputs first to avoid potential overwrite */
+-        disas_data_proc_simd_fp(s, insn);
--    rm1 = neon_load_reg(a->vm, 0);
+-        break;
--    rm2 = neon_load_reg(a->vm, 1);
+-    default:
--    rm3 = neon_load_reg(a->vm + 1, 0);
+-        assert(FALSE); /* all 15 cases should be handled above */
--    rm4 = neon_load_reg(a->vm + 1, 1);
+-        break;
-+    rm1 = tcg_temp_new_i32();
+-    }
-+    rm2 = tcg_temp_new_i32();
++    disas_a64_legacy(s, insn);
-+    rm3 = tcg_temp_new_i32();
-+    rm4 = tcg_temp_new_i32();
+     /*
-+    read_neon_element32(rm1, a->vm, 0, MO_32);
+      * After execution of most insns, btype is reset to 0.
 +    read_neon_element32(rm2, a->vm, 1, MO_32);
 +    read_neon_element32(rm3, a->vm, 2, MO_32);
 +    read_neon_element32(rm4, a->vm, 3, MO_32);
      rtmp = tcg_temp_new_i64();
      shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      tcg_temp_free_i32(rm2);
      narrowfn(rm1, cpu_env, rtmp);
 -    neon_store_reg(a->vd, 0, rm1);
 +    write_neon_element32(rm1, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(rm1);
      shiftfn(rm3, rm3, constimm);
      shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      narrowfn(rm3, cpu_env, rtmp);
      tcg_temp_free_i64(rtmp);
 -    neon_store_reg(a->vd, 1, rm3);
 +    write_neon_element32(rm3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rm3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          widen_mask = dup_const(a->size + 1, widen_mask);
      }
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      tmp = tcg_temp_new_i64();
      widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn0_64, a->vn);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 0);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn1_64, a->vn + 1);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 1);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      neon_store_reg64(rn0_64, a->vd);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      narrowfn(rd1, rn_64);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rn_64);
      tcg_temp_free_i64(rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i64();
      rd1 = tcg_temp_new_i64();
 -    rn = neon_load_reg(a->vn, 0);
 -    rm = neon_load_reg(a->vm, 0);
 +    rn = tcg_temp_new_i32();
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      opfn(rd0, rn, rm);
 -    tcg_temp_free_i32(rn);
 -    tcg_temp_free_i32(rm);
 -    rn = neon_load_reg(a->vn, 1);
 -    rm = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      opfn(rd1, rn, rm);
      tcg_temp_free_i32(rn);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  static inline TCGv_i32 neon_get_scalar(int size, int reg)
  {
 -    TCGv_i32 tmp;
 -    if (size == 1) {
 -        tmp = neon_load_reg(reg & 7, reg >> 4);
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    if (size == MO_16) {
 +        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
          if (reg & 8) {
              gen_neon_dup_high16(tmp);
          } else {
              gen_neon_dup_low16(tmp);
          }
      } else {
 -        tmp = neon_load_reg(reg & 15, reg >> 4);
 +        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
      }
      return tmp;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
       * perform an accumulation operation of that result into the
       * destination.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
 +        read_neon_element32(tmp, a->vn, pass, MO_32);
          opfn(tmp, tmp, scalar);
          if (accfn) {
 -            TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +            TCGv_i32 rd = tcg_temp_new_i32();
 +            read_neon_element32(rd, a->vd, pass, MO_32);
              accfn(tmp, rd, tmp);
              tcg_temp_free_i32(rd);
          }
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(scalar);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
       * performs a kind of fused op-then-accumulate using a helper
       * function that takes all of rd, rn and the scalar at once.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, rn, rd;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    rn = tcg_temp_new_i32();
 +    rd = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 -        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        read_neon_element32(rn, a->vn, pass, MO_32);
 +        read_neon_element32(rd, a->vd, pass, MO_32);
          opfn(rd, cpu_env, rn, scalar, rd);
 -        tcg_temp_free_i32(rn);
 -        neon_store_reg(a->vd, pass, rd);
 +        write_neon_element32(rd, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i32(scalar);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      scalar = neon_get_scalar(a->size, a->vm);
      /* Load all inputs before writing any outputs, in case of overlap */
 -    rn = neon_load_reg(a->vn, 0);
 +    rn = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
      rn0_64 = tcg_temp_new_i64();
      opfn(rn0_64, rn, scalar);
 -    tcg_temp_free_i32(rn);
 -    rn = neon_load_reg(a->vn, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
      rn1_64 = tcg_temp_new_i64();
      opfn(rn1_64, rn, scalar);
      tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
          return false;
      }
      n <<= 3;
 +    tmp = tcg_temp_new_i32();
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 0);
 +        read_neon_element32(tmp, a->vd, 0, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp2 = neon_load_reg(a->vm, 0);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 0, MO_32);
      ptr1 = vfp_reg_ptr(true, a->vn);
      tmp4 = tcg_const_i32(n);
      gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
 -    tcg_temp_free_i32(tmp);
 +
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 1);
 +        read_neon_element32(tmp, a->vd, 1, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp3 = neon_load_reg(a->vm, 1);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 1, MO_32);
      gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(tmp4);
      tcg_temp_free_ptr(ptr1);
 -    neon_store_reg(a->vd, 0, tmp2);
 -    neon_store_reg(a->vd, 1, tmp3);
 -    tcg_temp_free_i32(tmp);
 +
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
  static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
  {
      int pass, half;
 +    TCGv_i32 tmp[2];
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
          return true;
      }
 -    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        TCGv_i32 tmp[2];
 +    tmp[0] = tcg_temp_new_i32();
 +    tmp[1] = tcg_temp_new_i32();
 +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
          for (half = 0; half < 2; half++) {
 -            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
 +            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
              switch (a->size) {
              case 0:
                  tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                  g_assert_not_reached();
              }
          }
 -        neon_store_reg(a->vd, pass * 2, tmp[1]);
 -        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
 +        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
 +        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
      }
 +
 +    tcg_temp_free_i32(tmp[0]);
 +    tcg_temp_free_i32(tmp[1]);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          rm0_64 = tcg_temp_new_i64();
          rm1_64 = tcg_temp_new_i64();
          rd_64 = tcg_temp_new_i64();
 -        tmp = neon_load_reg(a->vm, pass * 2);
 +
 +        tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
          widenfn(rm0_64, tmp);
 -        tcg_temp_free_i32(tmp);
 -        tmp = neon_load_reg(a->vm, pass * 2 + 1);
 +        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
          widenfn(rm1_64, tmp);
          tcg_temp_free_i32(tmp);
 +
          opfn(rd_64, rm0_64, rm1_64);
          tcg_temp_free_i64(rm0_64);
          tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      narrowfn(rd0, cpu_env, rm);
      neon_load_reg64(rm, a->vm + 1);
      narrowfn(rd1, cpu_env, rm);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      }
      rd = tcg_temp_new_i64();
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
 -    tmp = neon_load_reg(a->vm, 0);
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
      tcg_gen_shli_i32(tmp2, tmp2, 16);
      tcg_gen_or_i32(tmp2, tmp2, tmp);
 -    tcg_temp_free_i32(tmp);
 -    tmp = neon_load_reg(a->vm, 2);
 +    read_neon_element32(tmp, a->vm, 2, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp3 = neon_load_reg(a->vm, 3);
 -    neon_store_reg(a->vd, 0, tmp2);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 3, MO_32);
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(tmp2);
      gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
      tcg_gen_shli_i32(tmp3, tmp3, 16);
      tcg_gen_or_i32(tmp3, tmp3, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
      tmp3 = tcg_temp_new_i32();
 -    tmp = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      tcg_gen_ext16u_i32(tmp3, tmp);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 0, tmp3);
 +    write_neon_element32(tmp3, a->vd, 0, MO_32);
      tcg_gen_shri_i32(tmp, tmp, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
 -    neon_store_reg(a->vd, 1, tmp);
 -    tmp3 = tcg_temp_new_i32();
 +    write_neon_element32(tmp, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp);
      tcg_gen_ext16u_i32(tmp3, tmp2);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 2, tmp3);
 +    write_neon_element32(tmp3, a->vd, 2, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_gen_shri_i32(tmp2, tmp2, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
 -    neon_store_reg(a->vd, 3, tmp2);
 +    write_neon_element32(tmp2, a->vd, 3, MO_32);
 +    tcg_temp_free_i32(tmp2);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
  static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
  {
 +    TCGv_i32 tmp;
      int pass;
      /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
          return true;
      }
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, tmp);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
          return true;
      }
 -    if (a->size == 2) {
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    if (a->size == MO_32) {
          for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass + 1);
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass + 1, tmp);
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
          }
      } else {
          for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass);
 -            if (a->size == 0) {
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass, MO_32);
 +            if (a->size == MO_8) {
                  gen_neon_trn_u8(tmp, tmp2);
              } else {
                  gen_neon_trn_u16(tmp, tmp2);
              }
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass, tmp);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass, MO_32);
          }
      }
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
      return true;
  }
 --
-.20.1
+.34.1

-[PULL 26/26] tests/qtest/npcm7xx_rng-test: Disable randomness tests
+[PULL 08/29] target/arm: Create decodetree skeleton for A64
-The randomness tests in the NPCM7xx RNG test fail intermittently
+The A64 translator uses a hand-written decoder for everything except
-but fairly frequently. On my machine running the test in a loop:
+SVE or SME.  It's fairly well structured, but it's becoming obvious
- while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done
+that it's still more painful to add instructions to than the A32
 translator, because putting a new instruction into the right place in
 a hand-written decoder is much harder than adding new instruction
 patterns to a decodetree file.
-will fail in less than a minute with an error like:
+As the first step in conversion to decodetree, create the skeleton of
-ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
+the decodetree decoder; where it does not handle instructions we will
-assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)
+fall back to the legacy decoder (which will be for everything at the
+moment, since there are no patterns in a64.decode).
 (Failures have been observed on all 4 of the randomness tests,
 not just first_byte_runs.)
 It's not clear why these tests are failing like this, but intermittent
 failures make CI and merge testing awkward, so disable running them
 unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
 running the test suite, until we work out the cause.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
+Message-id: 20230512144106.3608981-3-peter.maydell@linaro.org
 Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
 ---
- tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
+ target/arm/tcg/a64.decode      | 20 ++++++++++++++++++++
-file changed, 10 insertions(+), 4 deletions(-)
+ target/arm/tcg/translate-a64.c | 18 +++++++++++-------
  target/arm/tcg/meson.build     |  1 +
 files changed, 32 insertions(+), 7 deletions(-)
  create mode 100644 target/arm/tcg/a64.decode
-diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@
 +# AArch64 A64 allowed instruction decoding
 +#
 +#  Copyright (c) 2023 Linaro, Ltd
 +#
 +# This library is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU Lesser General Public
 +# License as published by the Free Software Foundation; either
 +# version 2.1 of the License, or (at your option) any later version.
 +#
 +# This library is distributed in the hope that it will be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 +# Lesser General Public License for more details.
 +#
 +# You should have received a copy of the GNU Lesser General Public
 +# License along with this library; if not, see <http://www.gnu.org/licenses/>.
 +
 +#
 +# This file is processed by scripts/decodetree.py
 +#
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/npcm7xx_rng-test.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/tests/qtest/npcm7xx_rng-test.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
+@@ -XXX,XX +XXX,XX @@ enum a64_shift_type {
+     A64_SHIFT_TYPE_ROR = 3
-     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
+ };
-     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
--    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
++/*
--    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
++ * Include the generated decoders.
--    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
++ */
--    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
++
-+    /*
++#include "decode-sme-fa64.c.inc"
-+     * These tests fail intermittently; only run them on explicit
++#include "decode-a64.c.inc"
-+     * request until we figure out why.
++
-+     */
+ /* Table based decoder typedefs - used when the relevant bits for decode
-+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
+  * are too awkwardly scattered across the instruction (eg SIMD).
-+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+  */
-+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn)
-+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+     }
-+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+ }
 -/*
 - * Include the generated SME FA64 decoder.
 - */
 -
 -#include "decode-sme-fa64.c.inc"
 -
  static bool trans_OK(DisasContext *s, arg_OK *a)
  {
      return true;
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
          disas_sme_fa64(s, insn);
      }
 -    disas_a64_legacy(s, insn);
 +
 +    if (!disas_a64(s, insn)) {
 +        disas_a64_legacy(s, insn);
 +    }
-     qtest_start("-machine npcm750-evb");
+     /*
-     ret = g_test_run();
+      * After execution of most insns, btype is reset to 0.
 diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/meson.build
 +++ b/target/arm/tcg/meson.build
@@ -XXX,XX +XXX,XX @@ gen = [
    decodetree.process('a32-uncond.decode', extra_args: '--static-decode=disas_a32_uncond'),
    decodetree.process('t32.decode', extra_args: '--static-decode=disas_t32'),
    decodetree.process('t16.decode', extra_args: ['-w', '16', '--static-decode=disas_t16']),
 +  decodetree.process('a64.decode', extra_args: ['--static-decode=disas_a64']),
  ]
  arm_ss.add(gen)
 --
-.20.1
+.34.1

-New patch
+[PULL 09/29] target/arm: Pull calls to disas_sve() and disas_sme() out of legacy decoder
+The SVE and SME decode is already done by decodetree.  Pull the calls
+to these decoders out of the legacy decoder.  This doesn't change
+behaviour because all the patterns in sve.decode and sme.decode
+already require the bits that the legacy decoder is decoding to have
+the correct values.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230512144106.3608981-4-peter.maydell@linaro.org
+---
+ target/arm/tcg/translate-a64.c | 20 ++++----------------
+file changed, 4 insertions(+), 16 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool btype_destination_ok(uint32_t insn, bool bt, int btype)
+ static void disas_a64_legacy(DisasContext *s, uint32_t insn)
+ {
+     switch (extract32(insn, 25, 4)) {
+-    case 0x0:
+-        if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
+-            unallocated_encoding(s);
+-        }
+-        break;
+-    case 0x1: case 0x3: /* UNALLOCATED */
+-        unallocated_encoding(s);
+-        break;
+-    case 0x2:
+-        if (!disas_sve(s, insn)) {
+-            unallocated_encoding(s);
+-        }
+-        break;
+     case 0x8: case 0x9: /* Data processing - immediate */
+         disas_data_proc_imm(s, insn);
+         break;
+@@ -XXX,XX +XXX,XX @@ static void disas_a64_legacy(DisasContext *s, uint32_t insn)
+         disas_data_proc_simd_fp(s, insn);
+         break;
+     default:
+-        assert(FALSE); /* all 15 cases should be handled above */
++        unallocated_encoding(s);
+         break;
+     }
+ }
+@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
+         disas_sme_fa64(s, insn);
+     }
+-
+-    if (!disas_a64(s, insn)) {
++    if (!disas_a64(s, insn) &&
++        !disas_sme(s, insn) &&
++        !disas_sve(s, insn)) {
+         disas_a64_legacy(s, insn);
+     }
+--
+.34.1

-[PULL 09/26] target/arm: Rename neon_load_reg64 to vfp_load_reg64
+[PULL 10/29] target/arm: Convert PC-rel addressing to decodetree
 From: Richard Henderson <richard.henderson@linaro.org>
-The only uses of this function are for loading VFP
+Convert the ADR and ADRP instructions.
 double-precision values, and nothing to do with NEON.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20230512144106.3608981-5-peter.maydell@linaro.org
 [PMM: Rebased]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         |  8 ++--
+ target/arm/tcg/a64.decode      | 13 ++++++++++++
- target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
+ target/arm/tcg/translate-a64.c | 38 +++++++++++++---------------------
-files changed, 46 insertions(+), 46 deletions(-)
+files changed, 27 insertions(+), 24 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/a64.decode
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@
  #
  # This file is processed by scripts/decodetree.py
  #
 +
 +&ri              rd imm
 +
 +
 +### Data Processing - Immediate
 +
 +# PC-rel addressing
 +
 +%imm_pcrel      5:s19 29:2
 +@pcrel          . .. ..... ................... rd:5     &ri imm=%imm_pcrel
 +
 +ADR             0 .. 10000 ................... .....    @pcrel
 +ADRP            1 .. 10000 ................... .....    @pcrel
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_ldst(DisasContext *s, uint32_t insn)
      }
  }
--static inline void neon_load_reg64(TCGv_i64 var, int reg)
+-/* PC-rel. addressing
-+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
+- *   31  30   29 28       24 23                5 4    0
 - * +----+-------+-----------+-------------------+------+
 - * | op | immlo | 1 0 0 0 0 |       immhi       |  Rd  |
 - * +----+-------+-----------+-------------------+------+
 +/*
 + * PC-rel. addressing
   */
 -static void disas_pc_rel_adr(DisasContext *s, uint32_t insn)
 +
 +static bool trans_ADR(DisasContext *s, arg_ri *a)
  {
--    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+-    unsigned int page, rd;
-+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
+-    int64_t offset;
 +    gen_pc_plus_diff(s, cpu_reg(s, a->rd), a->imm);
 +    return true;
 +}
 -    page = extract32(insn, 31, 1);
 -    /* SignExtend(immhi:immlo) -> offset */
 -    offset = sextract64(insn, 5, 19);
 -    offset = offset << 2 | extract32(insn, 29, 2);
 -    rd = extract32(insn, 0, 5);
 +static bool trans_ADRP(DisasContext *s, arg_ri *a)
 +{
 +    int64_t offset = (int64_t)a->imm << 12;
 -    if (page) {
 -        /* ADRP (page based) */
 -        offset <<= 12;
 -        /* The page offset is ok for CF_PCREL. */
 -        offset -= s->pc_curr & 0xfff;
 -    }
 -
 -    gen_pc_plus_diff(s, cpu_reg(s, rd), offset);
 +    /* The page offset is ok for CF_PCREL. */
 +    offset -= s->pc_curr & 0xfff;
 +    gen_pc_plus_diff(s, cpu_reg(s, a->rd), offset);
 +    return true;
  }
--static inline void neon_store_reg64(TCGv_i64 var, int reg)
+ /*
-+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
  static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
  {
--    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+     switch (extract32(insn, 23, 6)) {
-+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
+-    case 0x20: case 0x21: /* PC-rel. addressing */
- }
+-        disas_pc_rel_adr(s, insn);
+-        break;
- static inline void vfp_load_reg32(TCGv_i32 var, int reg)
+     case 0x22: /* Add/subtract (immediate) */
-diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
+         disas_add_sub_imm(s, insn);
-index XXXXXXX..XXXXXXX 100644
+         break;
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          tcg_gen_ext_i32_i64(nf, cpu_NF);
          tcg_gen_ext_i32_i64(vf, cpu_VF);
 -        neon_load_reg64(frn, rn);
 -        neon_load_reg64(frm, rm);
 +        vfp_load_reg64(frn, rn);
 +        vfp_load_reg64(frm, rm);
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
              tcg_temp_free_i64(tmp);
              break;
          }
 -        neon_store_reg64(dest, rd);
 +        vfp_store_reg64(dest, rd);
          tcg_temp_free_i64(frn);
          tcg_temp_free_i64(frm);
          tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i64 tcg_res;
          tcg_op = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
 -        neon_load_reg64(tcg_op, rm);
 +        vfp_load_reg64(tcg_op, rm);
          gen_helper_rintd(tcg_res, tcg_op, fpst);
 -        neon_store_reg64(tcg_res, rd);
 +        vfp_store_reg64(tcg_res, rd);
          tcg_temp_free_i64(tcg_op);
          tcg_temp_free_i64(tcg_res);
      } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          tcg_double = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
          tcg_tmp = tcg_temp_new_i32();
 -        neon_load_reg64(tcg_double, rm);
 +        vfp_load_reg64(tcg_double, rm);
          if (is_signed) {
              gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
          } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
      tmp = tcg_temp_new_i64();
      if (a->l) {
          gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg64(tmp, a->vd);
 +        vfp_store_reg64(tmp, a->vd);
      } else {
 -        neon_load_reg64(tmp, a->vd);
 +        vfp_load_reg64(tmp, a->vd);
          gen_aa32_st64(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
          if (a->l) {
              /* load */
              gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 -            neon_store_reg64(tmp, a->vd + i);
 +            vfp_store_reg64(tmp, a->vd + i);
          } else {
              /* store */
 -            neon_load_reg64(tmp, a->vd + i);
 +            vfp_load_reg64(tmp, a->vd + i);
              gen_aa32_st64(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
      fd = tcg_temp_new_i64();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg64(f0, vn);
 -    neon_load_reg64(f1, vm);
 +    vfp_load_reg64(f0, vn);
 +    vfp_load_reg64(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg64(fd, vd);
 +            vfp_load_reg64(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
          vn = vfp_advance_dreg(vn, delta_d);
 -        neon_load_reg64(f0, vn);
 +        vfp_load_reg64(f0, vn);
          if (delta_m) {
              vm = vfp_advance_dreg(vm, delta_m);
 -            neon_load_reg64(f1, vm);
 +            vfp_load_reg64(f1, vm);
          }
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
      f0 = tcg_temp_new_i64();
      fd = tcg_temp_new_i64();
 -    neon_load_reg64(f0, vm);
 +    vfp_load_reg64(f0, vm);
      for (;;) {
          fn(fd, f0);
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
              /* single source one-many */
              while (veclen--) {
                  vd = vfp_advance_dreg(vd, delta_d);
 -                neon_store_reg64(fd, vd);
 +                vfp_store_reg64(fd, vd);
              }
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
          vd = vfp_advance_dreg(vm, delta_m);
 -        neon_load_reg64(f0, vm);
 +        vfp_load_reg64(f0, vm);
      }
      tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vn, a->vn);
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vn, a->vn);
 +    vfp_load_reg64(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negd(vn, vn);
      }
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negd(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
      for (;;) {
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
      vd = tcg_temp_new_i64();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i64(vm, 0);
      } else {
 -        neon_load_reg64(vm, a->vm);
 +        vfp_load_reg64(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      vd = tcg_temp_new_i64();
      gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
      tmp = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
      tcg_temp_free_i64(vm);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rintd(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd_exact(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vd = tcg_temp_new_i64();
      vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
          /* u32 -> f64 */
          gen_helper_vfp_uitod(vd, vm, fpst);
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i64(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      if (a->s) {
          if (a->rz) {
 --
-.20.1
+.34.1

-[PULL 06/26] target/arm: Expand read/write_neon_element32 to all MemOp
+[PULL 11/29] target/arm: Split gen_add_CC and gen_sub_CC
 From: Richard Henderson <richard.henderson@linaro.org>
-We can then use this to improve VMOV (scalar to gp) and
+Split out specific 32-bit and 64-bit functions.
-VMOV (gp to scalar) so that we simply perform the memory
+These carry the same signature as tcg_gen_add_i64,
-operation that we wanted, rather than inserting or
+and so will be easier to pass as callbacks.
 extracting from a 32-bit quantity.
-These were the last uses of neon_load/store_reg, so remove them.
+Retain gen_add_CC and gen_sub_CC during conversion.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20230512144106.3608981-6-peter.maydell@linaro.org
 [PMM: rebased]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         | 50 +++++++++++++-----------
+ target/arm/tcg/translate-a64.c | 149 +++++++++++++++++++--------------
- target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
+file changed, 84 insertions(+), 65 deletions(-)
 files changed, 37 insertions(+), 84 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@ static inline void gen_logic_CC(int sf, TCGv_i64 result)
-  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+ }
-  * where 0 is the least significant end of the register.
-  */
+ /* dest = T0 + T1; compute C, N, V and Z flags */
--static long neon_element_offset(int reg, int element, MemOp size)
++static void gen_add64_CC(TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
-+static long neon_element_offset(int reg, int element, MemOp memop)
++{
 +    TCGv_i64 result, flag, tmp;
 +    result = tcg_temp_new_i64();
 +    flag = tcg_temp_new_i64();
 +    tmp = tcg_temp_new_i64();
 +
 +    tcg_gen_movi_i64(tmp, 0);
 +    tcg_gen_add2_i64(result, flag, t0, tmp, t1, tmp);
 +
 +    tcg_gen_extrl_i64_i32(cpu_CF, flag);
 +
 +    gen_set_NZ64(result);
 +
 +    tcg_gen_xor_i64(flag, result, t0);
 +    tcg_gen_xor_i64(tmp, t0, t1);
 +    tcg_gen_andc_i64(flag, flag, tmp);
 +    tcg_gen_extrh_i64_i32(cpu_VF, flag);
 +
 +    tcg_gen_mov_i64(dest, result);
 +}
 +
 +static void gen_add32_CC(TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
 +{
 +    TCGv_i32 t0_32 = tcg_temp_new_i32();
 +    TCGv_i32 t1_32 = tcg_temp_new_i32();
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +
 +    tcg_gen_movi_i32(tmp, 0);
 +    tcg_gen_extrl_i64_i32(t0_32, t0);
 +    tcg_gen_extrl_i64_i32(t1_32, t1);
 +    tcg_gen_add2_i32(cpu_NF, cpu_CF, t0_32, tmp, t1_32, tmp);
 +    tcg_gen_mov_i32(cpu_ZF, cpu_NF);
 +    tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
 +    tcg_gen_xor_i32(tmp, t0_32, t1_32);
 +    tcg_gen_andc_i32(cpu_VF, cpu_VF, tmp);
 +    tcg_gen_extu_i32_i64(dest, cpu_NF);
 +}
 +
  static void gen_add_CC(int sf, TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
  {
--    int element_size = 1 << size;
+     if (sf) {
-+    int element_size = 1 << (memop & MO_SIZE);
+-        TCGv_i64 result, flag, tmp;
-     int ofs = element * element_size;
+-        result = tcg_temp_new_i64();
- #ifdef HOST_WORDS_BIGENDIAN
+-        flag = tcg_temp_new_i64();
-     /*
+-        tmp = tcg_temp_new_i64();
-@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
+-
 -        tcg_gen_movi_i64(tmp, 0);
 -        tcg_gen_add2_i64(result, flag, t0, tmp, t1, tmp);
 -
 -        tcg_gen_extrl_i64_i32(cpu_CF, flag);
 -
 -        gen_set_NZ64(result);
 -
 -        tcg_gen_xor_i64(flag, result, t0);
 -        tcg_gen_xor_i64(tmp, t0, t1);
 -        tcg_gen_andc_i64(flag, flag, tmp);
 -        tcg_gen_extrh_i64_i32(cpu_VF, flag);
 -
 -        tcg_gen_mov_i64(dest, result);
 +        gen_add64_CC(dest, t0, t1);
      } else {
 -        /* 32 bit arithmetic */
 -        TCGv_i32 t0_32 = tcg_temp_new_i32();
 -        TCGv_i32 t1_32 = tcg_temp_new_i32();
 -        TCGv_i32 tmp = tcg_temp_new_i32();
 -
 -        tcg_gen_movi_i32(tmp, 0);
 -        tcg_gen_extrl_i64_i32(t0_32, t0);
 -        tcg_gen_extrl_i64_i32(t1_32, t1);
 -        tcg_gen_add2_i32(cpu_NF, cpu_CF, t0_32, tmp, t1_32, tmp);
 -        tcg_gen_mov_i32(cpu_ZF, cpu_NF);
 -        tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
 -        tcg_gen_xor_i32(tmp, t0_32, t1_32);
 -        tcg_gen_andc_i32(cpu_VF, cpu_VF, tmp);
 -        tcg_gen_extu_i32_i64(dest, cpu_NF);
 +        gen_add32_CC(dest, t0, t1);
      }
  }
--static TCGv_i32 neon_load_reg(int reg, int pass)
+ /* dest = T0 - T1; compute C, N, V and Z flags */
--{
++static void gen_sub64_CC(TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
--    TCGv_i32 tmp = tcg_temp_new_i32();
++{
--    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
++    /* 64 bit arithmetic */
--    return tmp;
++    TCGv_i64 result, flag, tmp;
--}
++
 +    result = tcg_temp_new_i64();
 +    flag = tcg_temp_new_i64();
 +    tcg_gen_sub_i64(result, t0, t1);
 +
 +    gen_set_NZ64(result);
 +
 +    tcg_gen_setcond_i64(TCG_COND_GEU, flag, t0, t1);
 +    tcg_gen_extrl_i64_i32(cpu_CF, flag);
 +
 +    tcg_gen_xor_i64(flag, result, t0);
 +    tmp = tcg_temp_new_i64();
 +    tcg_gen_xor_i64(tmp, t0, t1);
 +    tcg_gen_and_i64(flag, flag, tmp);
 +    tcg_gen_extrh_i64_i32(cpu_VF, flag);
 +    tcg_gen_mov_i64(dest, result);
 +}
 +
 +static void gen_sub32_CC(TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
 +{
 +    /* 32 bit arithmetic */
 +    TCGv_i32 t0_32 = tcg_temp_new_i32();
 +    TCGv_i32 t1_32 = tcg_temp_new_i32();
 +    TCGv_i32 tmp;
 +
 +    tcg_gen_extrl_i64_i32(t0_32, t0);
 +    tcg_gen_extrl_i64_i32(t1_32, t1);
 +    tcg_gen_sub_i32(cpu_NF, t0_32, t1_32);
 +    tcg_gen_mov_i32(cpu_ZF, cpu_NF);
 +    tcg_gen_setcond_i32(TCG_COND_GEU, cpu_CF, t0_32, t1_32);
 +    tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
 +    tmp = tcg_temp_new_i32();
 +    tcg_gen_xor_i32(tmp, t0_32, t1_32);
 +    tcg_gen_and_i32(cpu_VF, cpu_VF, tmp);
 +    tcg_gen_extu_i32_i64(dest, cpu_NF);
 +}
 +
  static void gen_sub_CC(int sf, TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
  {
      if (sf) {
 -        /* 64 bit arithmetic */
 -        TCGv_i64 result, flag, tmp;
 -
--static void neon_store_reg(int reg, int pass, TCGv_i32 var)
+-        result = tcg_temp_new_i64();
--{
+-        flag = tcg_temp_new_i64();
--    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
+-        tcg_gen_sub_i64(result, t0, t1);
 -    tcg_temp_free_i32(var);
 -}
 -
- static inline void neon_load_reg64(TCGv_i64 var, int reg)
+-        gen_set_NZ64(result);
- {
+-
-     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+-        tcg_gen_setcond_i64(TCG_COND_GEU, flag, t0, t1);
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
+-        tcg_gen_extrl_i64_i32(cpu_CF, flag);
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+-
- }
+-        tcg_gen_xor_i64(flag, result, t0);
+-        tmp = tcg_temp_new_i64();
--static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+-        tcg_gen_xor_i64(tmp, t0, t1);
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
+-        tcg_gen_and_i64(flag, flag, tmp);
- {
+-        tcg_gen_extrh_i64_i32(cpu_VF, flag);
--    long off = neon_element_offset(reg, ele, size);
+-        tcg_gen_mov_i64(dest, result);
-+    long off = neon_element_offset(reg, ele, memop);
++        gen_sub64_CC(dest, t0, t1);
+     } else {
--    switch (size) {
+-        /* 32 bit arithmetic */
--    case MO_32:
+-        TCGv_i32 t0_32 = tcg_temp_new_i32();
-+    switch (memop) {
+-        TCGv_i32 t1_32 = tcg_temp_new_i32();
-+    case MO_SB:
+-        TCGv_i32 tmp;
-+        tcg_gen_ld8s_i32(dest, cpu_env, off);
+-
-+        break;
+-        tcg_gen_extrl_i64_i32(t0_32, t0);
-+    case MO_UB:
+-        tcg_gen_extrl_i64_i32(t1_32, t1);
-+        tcg_gen_ld8u_i32(dest, cpu_env, off);
+-        tcg_gen_sub_i32(cpu_NF, t0_32, t1_32);
-+        break;
+-        tcg_gen_mov_i32(cpu_ZF, cpu_NF);
-+    case MO_SW:
+-        tcg_gen_setcond_i32(TCG_COND_GEU, cpu_CF, t0_32, t1_32);
-+        tcg_gen_ld16s_i32(dest, cpu_env, off);
+-        tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
-+        break;
+-        tmp = tcg_temp_new_i32();
-+    case MO_UW:
+-        tcg_gen_xor_i32(tmp, t0_32, t1_32);
-+        tcg_gen_ld16u_i32(dest, cpu_env, off);
+-        tcg_gen_and_i32(cpu_VF, cpu_VF, tmp);
-+        break;
+-        tcg_gen_extu_i32_i64(dest, cpu_NF);
-+    case MO_UL:
++        gen_sub32_CC(dest, t0, t1);
 +    case MO_SL:
          tcg_gen_ld_i32(dest, cpu_env, off);
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
      }
  }
--static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
-+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
- {
--    long off = neon_element_offset(reg, ele, size);
-+    long off = neon_element_offset(reg, ele, memop);
--    switch (size) {
-+    switch (memop) {
-+    case MO_8:
-+        tcg_gen_st8_i32(src, cpu_env, off);
-+        break;
-+    case MO_16:
-+        tcg_gen_st16_i32(src, cpu_env, off);
-+        break;
-     case MO_32:
-         tcg_gen_st_i32(src, cpu_env, off);
-         break;
-diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c.inc
-+++ b/target/arm/translate-vfp.c.inc
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
- {
-     /* VMOV scalar to general purpose register */
-     TCGv_i32 tmp;
--    int pass;
--    uint32_t offset;
--    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
--    if (a->size == 2
-+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-+    if (a->size == MO_32
-         ? !dc_isar_feature(aa32_fpsp_v2, s)
-         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-         return false;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
-         return false;
-     }
--    offset = a->index << a->size;
--    pass = extract32(offset, 2, 1);
--    offset = extract32(offset, 0, 2) * 8;
--
-     if (!vfp_access_check(s)) {
-         return true;
-     }
--    tmp = neon_load_reg(a->vn, pass);
--    switch (a->size) {
--    case 0:
--        if (offset) {
--            tcg_gen_shri_i32(tmp, tmp, offset);
--        }
--        if (a->u) {
--            gen_uxtb(tmp);
--        } else {
--            gen_sxtb(tmp);
--        }
--        break;
--    case 1:
--        if (a->u) {
--            if (offset) {
--                tcg_gen_shri_i32(tmp, tmp, 16);
--            } else {
--                gen_uxth(tmp);
--            }
--        } else {
--            if (offset) {
--                tcg_gen_sari_i32(tmp, tmp, 16);
--            } else {
--                gen_sxth(tmp);
--            }
--        }
--        break;
--    case 2:
--        break;
--    }
-+    tmp = tcg_temp_new_i32();
-+    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
-     store_reg(s, a->rt, tmp);
-     return true;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
- static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
- {
-     /* VMOV general purpose register to scalar */
--    TCGv_i32 tmp, tmp2;
--    int pass;
--    uint32_t offset;
-+    TCGv_i32 tmp;
--    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
--    if (a->size == 2
-+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-+    if (a->size == MO_32
-         ? !dc_isar_feature(aa32_fpsp_v2, s)
-         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-         return false;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
-         return false;
-     }
--    offset = a->index << a->size;
--    pass = extract32(offset, 2, 1);
--    offset = extract32(offset, 0, 2) * 8;
--
-     if (!vfp_access_check(s)) {
-         return true;
-     }
-     tmp = load_reg(s, a->rt);
--    switch (a->size) {
--    case 0:
--        tmp2 = neon_load_reg(a->vn, pass);
--        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
--        tcg_temp_free_i32(tmp2);
--        break;
--    case 1:
--        tmp2 = neon_load_reg(a->vn, pass);
--        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
--        tcg_temp_free_i32(tmp2);
--        break;
--    case 2:
--        break;
--    }
--    neon_store_reg(a->vn, pass, tmp);
-+    write_neon_element32(tmp, a->vn, a->index, a->size);
-+    tcg_temp_free_i32(tmp);
-     return true;
- }
 --
-.20.1
+.34.1

-[PULL 04/26] target/arm: Use neon_element_offset in vfp_reg_offset
+[PULL 12/29] target/arm: Convert Add/subtract (immediate) to decodetree
 From: Richard Henderson <richard.henderson@linaro.org>
-This seems a bit more readable than using offsetof CPU_DoubleU.
+Convert the ADD and SUB (immediate) instructions.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20230512144106.3608981-7-peter.maydell@linaro.org
 [PMM: Rebased; adjusted to use translate.h's TRANS macro]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 13 ++++---------
+ target/arm/tcg/translate.h     |  5 +++
-file changed, 4 insertions(+), 9 deletions(-)
+ target/arm/tcg/a64.decode      | 17 ++++++++
  target/arm/tcg/translate-a64.c | 73 ++++++++++------------------------
 files changed, 42 insertions(+), 53 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/translate.h
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/translate.h
-@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
+@@ -XXX,XX +XXX,XX @@ static inline int rsub_8(DisasContext *s, int x)
-     return neon_full_reg_offset(reg) + ofs;
+     return 8 - x;
  }
--static inline long vfp_reg_offset(bool dp, unsigned reg)
++static inline int shl_12(DisasContext *s, int x)
-+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
++{
-+static long vfp_reg_offset(bool dp, unsigned reg)
++    return x << 12;
 +}
 +
  static inline int neon_3same_fp_size(DisasContext *s, int x)
  {
-     if (dp) {
+     /* Convert 0==fp32, 1==fp16 into a MO_* value */
--        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
-+        return neon_element_offset(reg, 0, MO_64);
+index XXXXXXX..XXXXXXX 100644
-     } else {
+--- a/target/arm/tcg/a64.decode
--        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
++++ b/target/arm/tcg/a64.decode
--        if (reg & 1) {
+@@ -XXX,XX +XXX,XX @@
--            ofs += offsetof(CPU_DoubleU, l.upper);
+ #
--        } else {
--            ofs += offsetof(CPU_DoubleU, l.lower);
+ &ri              rd imm
--        }
++&rri_sf          rd rn imm sf
--        return ofs;
-+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
  ### Data Processing - Immediate
@@ -XXX,XX +XXX,XX @@
  ADR             0 .. 10000 ................... .....    @pcrel
  ADRP            1 .. 10000 ................... .....    @pcrel
 +
 +# Add/subtract (immediate)
 +
 +%imm12_sh12     10:12 !function=shl_12
 +@addsub_imm     sf:1 .. ...... . imm:12 rn:5 rd:5
 +@addsub_imm12   sf:1 .. ...... . ............ rn:5 rd:5 imm=%imm12_sh12
 +
 +ADD_i           . 00 100010 0 ............ ..... .....  @addsub_imm
 +ADD_i           . 00 100010 1 ............ ..... .....  @addsub_imm12
 +ADDS_i          . 01 100010 0 ............ ..... .....  @addsub_imm
 +ADDS_i          . 01 100010 1 ............ ..... .....  @addsub_imm12
 +
 +SUB_i           . 10 100010 0 ............ ..... .....  @addsub_imm
 +SUB_i           . 10 100010 1 ............ ..... .....  @addsub_imm12
 +SUBS_i          . 11 100010 0 ............ ..... .....  @addsub_imm
 +SUBS_i          . 11 100010 1 ............ ..... .....  @addsub_imm12
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_ldst(DisasContext *s, uint32_t insn)
      }
  }
++typedef void ArithTwoOp(TCGv_i64, TCGv_i64, TCGv_i64);
++
++static bool gen_rri(DisasContext *s, arg_rri_sf *a,
++                    bool rd_sp, bool rn_sp, ArithTwoOp *fn)
++{
++    TCGv_i64 tcg_rn = rn_sp ? cpu_reg_sp(s, a->rn) : cpu_reg(s, a->rn);
++    TCGv_i64 tcg_rd = rd_sp ? cpu_reg_sp(s, a->rd) : cpu_reg(s, a->rd);
++    TCGv_i64 tcg_imm = tcg_constant_i64(a->imm);
++
++    fn(tcg_rd, tcg_rn, tcg_imm);
++    if (!a->sf) {
++        tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
++    }
++    return true;
++}
++
+ /*
+  * PC-rel. addressing
+  */
+@@ -XXX,XX +XXX,XX @@ static bool trans_ADRP(DisasContext *s, arg_ri *a)
+ /*
+  * Add/subtract (immediate)
+- *
+- *  31 30 29 28         23 22 21         10 9   5 4   0
+- * +--+--+--+-------------+--+-------------+-----+-----+
+- * |sf|op| S| 1 0 0 0 1 0 |sh|    imm12    |  Rn | Rd  |
+- * +--+--+--+-------------+--+-------------+-----+-----+
+- *
+- *    sf: 0 -> 32bit, 1 -> 64bit
+- *    op: 0 -> add  , 1 -> sub
+- *     S: 1 -> set flags
+- *    sh: 1 -> LSL imm by 12
+  */
+-static void disas_add_sub_imm(DisasContext *s, uint32_t insn)
+-{
+-    int rd = extract32(insn, 0, 5);
+-    int rn = extract32(insn, 5, 5);
+-    uint64_t imm = extract32(insn, 10, 12);
+-    bool shift = extract32(insn, 22, 1);
+-    bool setflags = extract32(insn, 29, 1);
+-    bool sub_op = extract32(insn, 30, 1);
+-    bool is_64bit = extract32(insn, 31, 1);
+-
+-    TCGv_i64 tcg_rn = cpu_reg_sp(s, rn);
+-    TCGv_i64 tcg_rd = setflags ? cpu_reg(s, rd) : cpu_reg_sp(s, rd);
+-    TCGv_i64 tcg_result;
+-
+-    if (shift) {
+-        imm <<= 12;
+-    }
+-
+-    tcg_result = tcg_temp_new_i64();
+-    if (!setflags) {
+-        if (sub_op) {
+-            tcg_gen_subi_i64(tcg_result, tcg_rn, imm);
+-        } else {
+-            tcg_gen_addi_i64(tcg_result, tcg_rn, imm);
+-        }
+-    } else {
+-        TCGv_i64 tcg_imm = tcg_constant_i64(imm);
+-        if (sub_op) {
+-            gen_sub_CC(is_64bit, tcg_result, tcg_rn, tcg_imm);
+-        } else {
+-            gen_add_CC(is_64bit, tcg_result, tcg_rn, tcg_imm);
+-        }
+-    }
+-
+-    if (is_64bit) {
+-        tcg_gen_mov_i64(tcg_rd, tcg_result);
+-    } else {
+-        tcg_gen_ext32u_i64(tcg_rd, tcg_result);
+-    }
+-}
++TRANS(ADD_i, gen_rri, a, 1, 1, tcg_gen_add_i64)
++TRANS(SUB_i, gen_rri, a, 1, 1, tcg_gen_sub_i64)
++TRANS(ADDS_i, gen_rri, a, 0, 1, a->sf ? gen_add64_CC : gen_add32_CC)
++TRANS(SUBS_i, gen_rri, a, 0, 1, a->sf ? gen_sub64_CC : gen_sub32_CC)
+ /*
+  * Add/subtract (immediate, with tags)
+@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
+ static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
+ {
+     switch (extract32(insn, 23, 6)) {
+-    case 0x22: /* Add/subtract (immediate) */
+-        disas_add_sub_imm(s, insn);
+-        break;
+     case 0x23: /* Add/subtract (immediate, with tags) */
+         disas_add_sub_imm_with_tags(s, insn);
+         break;
 --
-.20.1
+.34.1

-[PULL 10/26] target/arm: Simplify do_long_3d and do_2scalar_long
+[PULL 13/29] target/arm: Convert Add/subtract (immediate with tags) to decodetree
 From: Richard Henderson <richard.henderson@linaro.org>
-In both cases, we can sink the write-back and perform
+Convert the ADDG and SUBG (immediate) instructions.
 the accumulate into the normal destination temps.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20230512144106.3608981-8-peter.maydell@linaro.org
 [PMM: Rebased; use TRANS_FEAT()]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-neon.c.inc | 23 +++++++++--------------
+ target/arm/tcg/a64.decode      |  8 +++++++
-file changed, 9 insertions(+), 14 deletions(-)
+ target/arm/tcg/translate-a64.c | 38 ++++++++++------------------------
 files changed, 19 insertions(+), 27 deletions(-)
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/target/arm/tcg/a64.decode
-+++ b/target/arm/translate-neon.c.inc
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
+@@ -XXX,XX +XXX,XX @@ SUB_i           . 10 100010 0 ............ ..... .....  @addsub_imm
-     if (accfn) {
+ SUB_i           . 10 100010 1 ............ ..... .....  @addsub_imm12
-         tmp = tcg_temp_new_i64();
+ SUBS_i          . 11 100010 0 ............ ..... .....  @addsub_imm
-         read_neon_element64(tmp, a->vd, 0, MO_64);
+ SUBS_i          . 11 100010 1 ............ ..... .....  @addsub_imm12
--        accfn(tmp, tmp, rd0);
++
--        write_neon_element64(tmp, a->vd, 0, MO_64);
++# Add/subtract (immediate with tags)
-+        accfn(rd0, tmp, rd0);
++
-         read_neon_element64(tmp, a->vd, 1, MO_64);
++&rri_tag        rd rn uimm6 uimm4
--        accfn(tmp, tmp, rd1);
++@addsub_imm_tag . .. ...... . uimm6:6 .. uimm4:4 rn:5 rd:5 &rri_tag
--        write_neon_element64(tmp, a->vd, 1, MO_64);
++
-+        accfn(rd1, tmp, rd1);
++ADDG_i          1 00 100011 0 ...... 00 .... ..... ..... @addsub_imm_tag
-         tcg_temp_free_i64(tmp);
++SUBG_i          1 10 100011 0 ...... 00 .... ..... ..... @addsub_imm_tag
--    } else {
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
--        write_neon_element64(rd0, a->vd, 0, MO_64);
+index XXXXXXX..XXXXXXX 100644
--        write_neon_element64(rd1, a->vd, 1, MO_64);
+--- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ TRANS(SUBS_i, gen_rri, a, 0, 1, a->sf ? gen_sub64_CC : gen_sub32_CC)
  /*
   * Add/subtract (immediate, with tags)
 - *
 - *  31 30 29 28         23 22 21     16 14      10 9   5 4   0
 - * +--+--+--+-------------+--+---------+--+-------+-----+-----+
 - * |sf|op| S| 1 0 0 0 1 1 |o2|  uimm6  |o3| uimm4 |  Rn | Rd  |
 - * +--+--+--+-------------+--+---------+--+-------+-----+-----+
 - *
 - *    op: 0 -> add, 1 -> sub
   */
 -static void disas_add_sub_imm_with_tags(DisasContext *s, uint32_t insn)
 +
 +static bool gen_add_sub_imm_with_tags(DisasContext *s, arg_rri_tag *a,
 +                                      bool sub_op)
  {
 -    int rd = extract32(insn, 0, 5);
 -    int rn = extract32(insn, 5, 5);
 -    int uimm4 = extract32(insn, 10, 4);
 -    int uimm6 = extract32(insn, 16, 6);
 -    bool sub_op = extract32(insn, 30, 1);
      TCGv_i64 tcg_rn, tcg_rd;
      int imm;
 -    /* Test all of sf=1, S=0, o2=0, o3=0.  */
 -    if ((insn & 0xa040c000u) != 0x80000000u ||
 -        !dc_isar_feature(aa64_mte_insn_reg, s)) {
 -        unallocated_encoding(s);
 -        return;
 -    }
 -
 -    imm = uimm6 << LOG2_TAG_GRANULE;
 +    imm = a->uimm6 << LOG2_TAG_GRANULE;
      if (sub_op) {
          imm = -imm;
      }
-+    write_neon_element64(rd0, a->vd, 0, MO_64);
+-    tcg_rn = cpu_reg_sp(s, rn);
-+    write_neon_element64(rd1, a->vd, 1, MO_64);
+-    tcg_rd = cpu_reg_sp(s, rd);
-     tcg_temp_free_i64(rd0);
++    tcg_rn = cpu_reg_sp(s, a->rn);
-     tcg_temp_free_i64(rd1);
++    tcg_rd = cpu_reg_sp(s, a->rd);
-@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
+     if (s->ata) {
-     if (accfn) {
+         gen_helper_addsubg(tcg_rd, cpu_env, tcg_rn,
-         TCGv_i64 t64 = tcg_temp_new_i64();
+                            tcg_constant_i32(imm),
-         read_neon_element64(t64, a->vd, 0, MO_64);
+-                           tcg_constant_i32(uimm4));
--        accfn(t64, t64, rn0_64);
++                           tcg_constant_i32(a->uimm4));
--        write_neon_element64(t64, a->vd, 0, MO_64);
+     } else {
-+        accfn(rn0_64, t64, rn0_64);
+         tcg_gen_addi_i64(tcg_rd, tcg_rn, imm);
-         read_neon_element64(t64, a->vd, 1, MO_64);
+         gen_address_with_allocation_tag0(tcg_rd, tcg_rd);
 -        accfn(t64, t64, rn1_64);
 -        write_neon_element64(t64, a->vd, 1, MO_64);
 +        accfn(rn1_64, t64, rn1_64);
          tcg_temp_free_i64(t64);
 -    } else {
 -        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
++    return true;
+ }
++TRANS_FEAT(ADDG_i, aa64_mte_insn_reg, gen_add_sub_imm_with_tags, a, false)
++TRANS_FEAT(SUBG_i, aa64_mte_insn_reg, gen_add_sub_imm_with_tags, a, true)
 +
-+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
+ /* The input should be a value in the bottom e bits (with higher
-+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
+  * bits zero); returns that value replicated into every element
-     tcg_temp_free_i64(rn0_64);
+  * of size e in a 64 bit integer.
-     tcg_temp_free_i64(rn1_64);
+@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
-     return true;
+ static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
  {
      switch (extract32(insn, 23, 6)) {
 -    case 0x23: /* Add/subtract (immediate, with tags) */
 -        disas_add_sub_imm_with_tags(s, insn);
 -        break;
      case 0x24: /* Logical (immediate) */
          disas_logic_imm(s, insn);
          break;
 --
-.20.1
+.34.1

-[PULL 02/26] target/arm: Move neon_element_offset to translate.c
+[PULL 14/29] target/arm: Replace bitmask64 with MAKE_64BIT_MASK
 From: Richard Henderson <richard.henderson@linaro.org>
-This will shortly have users outside of translate-neon.c.inc.
+Use the bitops.h macro rather than rolling our own here.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230512144106.3608981-9-peter.maydell@linaro.org
 ---
- target/arm/translate.c          | 20 ++++++++++++++++++++
+ target/arm/tcg/translate-a64.c | 11 ++---------
- target/arm/translate-neon.c.inc | 19 -------------------
+file changed, 2 insertions(+), 9 deletions(-)
 files changed, 20 insertions(+), 19 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@ static uint64_t bitfield_replicate(uint64_t mask, unsigned int e)
-     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+     return mask;
  }
-+/*
+-/* Return a value with the bottom len bits set (where 0 < len <= 64) */
-+ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+-static inline uint64_t bitmask64(unsigned int length)
 + * where 0 is the least significant end of the register.
 + */
 +static long neon_element_offset(int reg, int element, MemOp size)
 +{
 +    int element_size = 1 << size;
 +    int ofs = element * element_size;
 +#ifdef HOST_WORDS_BIGENDIAN
 +    /*
 +     * Calculate the offset assuming fully little-endian,
 +     * then XOR to account for the order of the 8-byte units.
 +     */
 +    if (element_size < 8) {
 +        ofs ^= 8 - element_size;
 +    }
 +#endif
 +    return neon_full_reg_offset(reg) + ofs;
 +}
 +
  static inline long vfp_reg_offset(bool dp, unsigned reg)
  {
      if (dp) {
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
  #include "decode-neon-ls.c.inc"
  #include "decode-neon-shared.c.inc"
 -/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
 - * where 0 is the least significant end of the register.
 - */
 -static inline long
 -neon_element_offset(int reg, int element, MemOp size)
 -{
--    int element_size = 1 << size;
+-    assert(length > 0 && length <= 64);
--    int ofs = element * element_size;
+-    return ~0ULL >> (64 - length);
 -#ifdef HOST_WORDS_BIGENDIAN
 -    /* Calculate the offset assuming fully little-endian,
 -     * then XOR to account for the order of the 8-byte units.
 -     */
 -    if (element_size < 8) {
 -        ofs ^= 8 - element_size;
 -    }
 -#endif
 -    return neon_full_reg_offset(reg) + ofs;
 -}
 -
- static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
+ /* Simplified variant of pseudocode DecodeBitMasks() for the case where we
- {
+  * only require the wmask. Returns false if the imms/immr/immn are a reserved
-     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
+  * value (ie should cause a guest UNDEF exception), and true if they are
@@ -XXX,XX +XXX,XX @@ bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
      /* Create the value of one element: s+1 set bits rotated
       * by r within the element (which is e bits wide)...
       */
 -    mask = bitmask64(s + 1);
 +    mask = MAKE_64BIT_MASK(0, s + 1);
      if (r) {
          mask = (mask >> r) | (mask << (e - r));
 -        mask &= bitmask64(e);
 +        mask &= MAKE_64BIT_MASK(0, e);
      }
      /* ...then replicate the element over the whole 64 bit value */
      mask = bitfield_replicate(mask, e);
 --
-.20.1
+.34.1

-[PULL 07/26] target/arm: Rename neon_load_reg32 to vfp_load_reg32
+[PULL 15/29] target/arm: Convert Logical (immediate) to decodetree
 From: Richard Henderson <richard.henderson@linaro.org>
-The only uses of this function are for loading VFP
+Convert the ADD, ORR, EOR, ANDS (immediate) instructions.
 single-precision values, and nothing to do with NEON.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230512144106.3608981-10-peter.maydell@linaro.org
+[PMM: rebased]
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         |   4 +-
+ target/arm/tcg/a64.decode      | 15 ++++++
- target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
+ target/arm/tcg/translate-a64.c | 94 +++++++++++-----------------------
-files changed, 94 insertions(+), 94 deletions(-)
+files changed, 44 insertions(+), 65 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/a64.decode
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
+@@ -XXX,XX +XXX,XX @@ SUBS_i          . 11 100010 1 ............ ..... .....  @addsub_imm12
-     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
  ADDG_i          1 00 100011 0 ...... 00 .... ..... ..... @addsub_imm_tag
  SUBG_i          1 10 100011 0 ...... 00 .... ..... ..... @addsub_imm_tag
 +
 +# Logical (immediate)
 +
 +&rri_log        rd rn sf dbm
 +@logic_imm_64   1 .. ...... dbm:13 rn:5 rd:5            &rri_log sf=1
 +@logic_imm_32   0 .. ...... 0 dbm:12 rn:5 rd:5          &rri_log sf=0
 +
 +AND_i           . 00 100100 . ...... ...... ..... ..... @logic_imm_64
 +AND_i           . 00 100100 . ...... ...... ..... ..... @logic_imm_32
 +ORR_i           . 01 100100 . ...... ...... ..... ..... @logic_imm_64
 +ORR_i           . 01 100100 . ...... ...... ..... ..... @logic_imm_32
 +EOR_i           . 10 100100 . ...... ...... ..... ..... @logic_imm_64
 +EOR_i           . 10 100100 . ...... ...... ..... ..... @logic_imm_32
 +ANDS_i          . 11 100100 . ...... ...... ..... ..... @logic_imm_64
 +ANDS_i          . 11 100100 . ...... ...... ..... ..... @logic_imm_32
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static uint64_t bitfield_replicate(uint64_t mask, unsigned int e)
      return mask;
  }
--static inline void neon_load_reg32(TCGv_i32 var, int reg)
+-/* Simplified variant of pseudocode DecodeBitMasks() for the case where we
-+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
++/*
- {
++ * Logical (immediate)
-     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
++ */
- }
++
++/*
--static inline void neon_store_reg32(TCGv_i32 var, int reg)
++ * Simplified variant of pseudocode DecodeBitMasks() for the case where we
-+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
+  * only require the wmask. Returns false if the imms/immr/immn are a reserved
- {
+  * value (ie should cause a guest UNDEF exception), and true if they are
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+  * valid, in which case the decoded bit pattern is written to result.
- }
+@@ -XXX,XX +XXX,XX @@ bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          frn = tcg_temp_new_i32();
          frm = tcg_temp_new_i32();
          dest = tcg_temp_new_i32();
 -        neon_load_reg32(frn, rn);
 -        neon_load_reg32(frm, rm);
 +        vfp_load_reg32(frn, rn);
 +        vfp_load_reg32(frm, rm);
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          if (sz == 1) {
              tcg_gen_andi_i32(dest, dest, 0xffff);
          }
 -        neon_store_reg32(dest, rd);
 +        vfp_store_reg32(dest, rd);
          tcg_temp_free_i32(frn);
          tcg_temp_free_i32(frm);
          tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i32 tcg_res;
          tcg_op = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_op, rm);
 +        vfp_load_reg32(tcg_op, rm);
          if (sz == 1) {
              gen_helper_rinth(tcg_res, tcg_op, fpst);
          } else {
              gen_helper_rints(tcg_res, tcg_op, fpst);
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_op);
          tcg_temp_free_i32(tcg_res);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
              gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
          }
          tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
 -        neon_store_reg32(tcg_tmp, rd);
 +        vfp_store_reg32(tcg_tmp, rd);
          tcg_temp_free_i32(tcg_tmp);
          tcg_temp_free_i64(tcg_res);
          tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          TCGv_i32 tcg_single, tcg_res;
          tcg_single = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_single, rm);
 +        vfp_load_reg32(tcg_single, rm);
          if (sz == 1) {
              if (is_signed) {
                  gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                  gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
              }
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_res);
          tcg_temp_free_i32(tcg_single);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
          store_reg(s, a->rt, tmp);
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          if (a->rt == 15) {
              /* Set the 4 flag bits in the CPSR.  */
              gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm);
 +        vfp_load_reg32(tmp, a->vm);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm + 1);
 +        vfp_load_reg32(tmp, a->vm + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm);
 +        vfp_store_reg32(tmp, a->vm);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm + 1);
 +        vfp_store_reg32(tmp, a->vm + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2);
 +        vfp_load_reg32(tmp, a->vm * 2);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2 + 1);
 +        vfp_load_reg32(tmp, a->vm * 2 + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm * 2);
 +        vfp_store_reg32(tmp, a->vm * 2);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm * 2 + 1);
 +        vfp_store_reg32(tmp, a->vm * 2 + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st16(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st32(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
          if (a->l) {
              /* load */
              gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -            neon_store_reg32(tmp, a->vd + i);
 +            vfp_store_reg32(tmp, a->vd + i);
          } else {
              /* store */
 -            neon_load_reg32(tmp, a->vd + i);
 +            vfp_load_reg32(tmp, a->vd + i);
              gen_aa32_st32(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
      fd = tcg_temp_new_i32();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg32(f0, vn);
 -    neon_load_reg32(f1, vm);
 +    vfp_load_reg32(f0, vn);
 +    vfp_load_reg32(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg32(fd, vd);
 +            vfp_load_reg32(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
          veclen--;
          vd = vfp_advance_sreg(vd, delta_d);
          vn = vfp_advance_sreg(vn, delta_d);
 -        neon_load_reg32(f0, vn);
 +        vfp_load_reg32(f0, vn);
          if (delta_m) {
              vm = vfp_advance_sreg(vm, delta_m);
 -            neon_load_reg32(f1, vm);
 +            vfp_load_reg32(f1, vm);
          }
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
      fd = tcg_temp_new_i32();
      fpst = fpstatus_ptr(FPST_FPCR_F16);
 -    neon_load_reg32(f0, vn);
 -    neon_load_reg32(f1, vm);
 +    vfp_load_reg32(f0, vn);
 +    vfp_load_reg32(f1, vm);
      if (reads_vd) {
 -        neon_load_reg32(fd, vd);
 +        vfp_load_reg32(fd, vd);
      }
      fn(fd, f0, f1, fpst);
 -    neon_store_reg32(fd, vd);
 +    vfp_store_reg32(fd, vd);
      tcg_temp_free_i32(f0);
      tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      f0 = tcg_temp_new_i32();
      fd = tcg_temp_new_i32();
 -    neon_load_reg32(f0, vm);
 +    vfp_load_reg32(f0, vm);
      for (;;) {
          fn(fd, f0);
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
              /* single source one-many */
              while (veclen--) {
                  vd = vfp_advance_sreg(vd, delta_d);
 -                neon_store_reg32(fd, vd);
 +                vfp_store_reg32(fd, vd);
              }
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
          veclen--;
          vd = vfp_advance_sreg(vd, delta_d);
          vm = vfp_advance_sreg(vm, delta_m);
 -        neon_load_reg32(f0, vm);
 +        vfp_load_reg32(f0, vm);
      }
      tcg_temp_free_i32(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      }
      f0 = tcg_temp_new_i32();
 -    neon_load_reg32(f0, vm);
 +    vfp_load_reg32(f0, vm);
      fn(f0, f0);
 -    neon_store_reg32(f0, vd);
 +    vfp_store_reg32(f0, vd);
      tcg_temp_free_i32(f0);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negh(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negh(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negs(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negs(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
      }
      fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
 -    neon_store_reg32(fd, a->vd);
 +    vfp_store_reg32(fd, a->vd);
      tcg_temp_free_i32(fd);
      return true;
  }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
-     fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
+-/* Logical (immediate)
+- *   31  30 29 28         23 22  21  16 15  10 9    5 4    0
-     for (;;) {
+- * +----+-----+-------------+---+------+------+------+------+
--        neon_store_reg32(fd, vd);
+- * | sf | opc | 1 0 0 1 0 0 | N | immr | imms |  Rn  |  Rd  |
-+        vfp_store_reg32(fd, vd);
+- * +----+-----+-------------+---+------+------+------+------+
+- */
-         if (veclen == 0) {
+-static void disas_logic_imm(DisasContext *s, uint32_t insn)
-             break;
++static bool gen_rri_log(DisasContext *s, arg_rri_log *a, bool set_cc,
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
++                        void (*fn)(TCGv_i64, TCGv_i64, int64_t))
-     vd = tcg_temp_new_i32();
+ {
-     vm = tcg_temp_new_i32();
+-    unsigned int sf, opc, is_n, immr, imms, rn, rd;
+     TCGv_i64 tcg_rd, tcg_rn;
--    neon_load_reg32(vd, a->vd);
+-    uint64_t wmask;
-+    vfp_load_reg32(vd, a->vd);
+-    bool is_and = false;
-     if (a->z) {
++    uint64_t imm;
-         tcg_gen_movi_i32(vm, 0);
-     } else {
+-    sf = extract32(insn, 31, 1);
--        neon_load_reg32(vm, a->vm);
+-    opc = extract32(insn, 29, 2);
-+        vfp_load_reg32(vm, a->vm);
+-    is_n = extract32(insn, 22, 1);
 -    immr = extract32(insn, 16, 6);
 -    imms = extract32(insn, 10, 6);
 -    rn = extract32(insn, 5, 5);
 -    rd = extract32(insn, 0, 5);
 -
 -    if (!sf && is_n) {
 -        unallocated_encoding(s);
 -        return;
 +    /* Some immediate field values are reserved. */
 +    if (!logic_imm_decode_wmask(&imm, extract32(a->dbm, 12, 1),
 +                                extract32(a->dbm, 0, 6),
 +                                extract32(a->dbm, 6, 6))) {
 +        return false;
 +    }
 +    if (!a->sf) {
 +        imm &= 0xffffffffull;
      }
-     if (a->e) {
+-    if (opc == 0x3) { /* ANDS */
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
+-        tcg_rd = cpu_reg(s, rd);
-     vd = tcg_temp_new_i32();
+-    } else {
-     vm = tcg_temp_new_i32();
+-        tcg_rd = cpu_reg_sp(s, rd);
+-    }
--    neon_load_reg32(vd, a->vd);
+-    tcg_rn = cpu_reg(s, rn);
-+    vfp_load_reg32(vd, a->vd);
++    tcg_rd = set_cc ? cpu_reg(s, a->rd) : cpu_reg_sp(s, a->rd);
-     if (a->z) {
++    tcg_rn = cpu_reg(s, a->rn);
-         tcg_gen_movi_i32(vm, 0);
-     } else {
+-    if (!logic_imm_decode_wmask(&wmask, is_n, imms, immr)) {
--        neon_load_reg32(vm, a->vm);
+-        /* some immediate field values are reserved */
-+        vfp_load_reg32(vm, a->vm);
+-        unallocated_encoding(s);
 -        return;
 +    fn(tcg_rd, tcg_rn, imm);
 +    if (set_cc) {
 +        gen_logic_CC(a->sf, tcg_rd);
      }
+-
-     if (a->e) {
+-    if (!sf) {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
+-        wmask &= 0xffffffff;
-     /* The T bit tells us if we want the low or high 16 bits of Vm */
+-    }
-     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+-
-     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
+-    switch (opc) {
--    neon_store_reg32(tmp, a->vd);
+-    case 0x3: /* ANDS */
-+    vfp_store_reg32(tmp, a->vd);
+-    case 0x0: /* AND */
-     tcg_temp_free_i32(ahp_mode);
+-        tcg_gen_andi_i64(tcg_rd, tcg_rn, wmask);
-     tcg_temp_free_ptr(fpst);
+-        is_and = true;
-     tcg_temp_free_i32(tmp);
+-        break;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
+-    case 0x1: /* ORR */
-     ahp_mode = get_ahp_flag();
+-        tcg_gen_ori_i64(tcg_rd, tcg_rn, wmask);
-     tmp = tcg_temp_new_i32();
+-        break;
+-    case 0x2: /* EOR */
--    neon_load_reg32(tmp, a->vm);
+-        tcg_gen_xori_i64(tcg_rd, tcg_rn, wmask);
-+    vfp_load_reg32(tmp, a->vm);
+-        break;
-     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
+-    default:
-     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
+-        assert(FALSE); /* must handle all above */
-     tcg_temp_free_i32(ahp_mode);
+-        break;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
+-    }
 -
 -    if (!sf && !is_and) {
 -        /* zero extend final result; we know we can skip this for AND
 -         * since the immediate had the high 32 bits clear.
 -         */
 +    if (!a->sf) {
          tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
      }
+-
-     tmp = tcg_temp_new_i32();
+-    if (opc == 3) { /* ANDS */
--    neon_load_reg32(tmp, a->vm);
+-        gen_logic_CC(sf, tcg_rd);
-+    vfp_load_reg32(tmp, a->vm);
+-    }
-     fpst = fpstatus_ptr(FPST_FPCR_F16);
++    return true;
      gen_helper_rinth(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rinth(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rints(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
      neon_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vm = tcg_temp_new_i64();
      neon_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      if (a->s) {
          /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f16 */
          gen_helper_vfp_uitoh(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f32 */
          gen_helper_vfp_uitos(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vd = tcg_temp_new_i32();
      neon_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_i32(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touih(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touis(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
              gen_helper_vfp_touid(vd, vm, fpst);
          }
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
      /* Insert low half of Vm into high half of Vd */
      rm = tcg_temp_new_i32();
      rd = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 -    neon_load_reg32(rd, a->vd);
 +    vfp_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rd, a->vd);
      tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
 -    neon_store_reg32(rd, a->vd);
 +    vfp_store_reg32(rd, a->vd);
      tcg_temp_free_i32(rm);
      tcg_temp_free_i32(rd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
      /* Set Vd to high half of Vm */
      rm = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rm, a->vm);
      tcg_gen_shri_i32(rm, rm, 16);
 -    neon_store_reg32(rm, a->vd);
 +    vfp_store_reg32(rm, a->vd);
      tcg_temp_free_i32(rm);
      return true;
  }
++TRANS(AND_i, gen_rri_log, a, false, tcg_gen_andi_i64)
++TRANS(ORR_i, gen_rri_log, a, false, tcg_gen_ori_i64)
++TRANS(EOR_i, gen_rri_log, a, false, tcg_gen_xori_i64)
++TRANS(ANDS_i, gen_rri_log, a, true, tcg_gen_andi_i64)
++
+ /*
+  * Move wide (immediate)
+  *
+@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
+ static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
+ {
+     switch (extract32(insn, 23, 6)) {
+-    case 0x24: /* Logical (immediate) */
+-        disas_logic_imm(s, insn);
+-        break;
+     case 0x25: /* Move wide (immediate) */
+         disas_movw_imm(s, insn);
+         break;
 --
-.20.1
+.34.1

-[PULL 01/26] target/arm: Introduce neon_full_reg_offset
+[PULL 16/29] target/arm: Convert Move wide (immediate) to decodetree
 From: Richard Henderson <richard.henderson@linaro.org>
-This function makes it clear that we're talking about the whole
+Convert the MON, MOVZ, MOVK instructions.
 register, and not the 32-bit piece at index 0.  This fixes a bug
 when running on a big-endian host.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20230512144106.3608981-11-peter.maydell@linaro.org
 [PMM: Rebased]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  8 ++++++
+ target/arm/tcg/a64.decode      | 13 ++++++
- target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
+ target/arm/tcg/translate-a64.c | 73 ++++++++++++++--------------------
- target/arm/translate-vfp.c.inc  |  2 +-
+files changed, 42 insertions(+), 44 deletions(-)
 files changed, 31 insertions(+), 23 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/a64.decode
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
+@@ -XXX,XX +XXX,XX @@ EOR_i           . 10 100100 . ...... ...... ..... ..... @logic_imm_64
-     unallocated_encoding(s);
+ EOR_i           . 10 100100 . ...... ...... ..... ..... @logic_imm_32
- }
+ ANDS_i          . 11 100100 . ...... ...... ..... ..... @logic_imm_64
+ ANDS_i          . 11 100100 . ...... ...... ..... ..... @logic_imm_32
-+/*
++
-+ * Return the offset of a "full" NEON Dreg.
++# Move wide (immediate)
-+ */
++
-+static long neon_full_reg_offset(unsigned reg)
++&movw           rd sf imm hw
 +@movw_64        1 .. ...... hw:2   imm:16 rd:5          &movw sf=1
 +@movw_32        0 .. ...... 0 hw:1 imm:16 rd:5          &movw sf=0
 +
 +MOVN            . 00 100101 .. ................ .....   @movw_64
 +MOVN            . 00 100101 .. ................ .....   @movw_32
 +MOVZ            . 10 100101 .. ................ .....   @movw_64
 +MOVZ            . 10 100101 .. ................ .....   @movw_32
 +MOVK            . 11 100101 .. ................ .....   @movw_64
 +MOVK            . 11 100101 .. ................ .....   @movw_32
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ TRANS(ANDS_i, gen_rri_log, a, true, tcg_gen_andi_i64)
  /*
   * Move wide (immediate)
 - *
 - *  31 30 29 28         23 22 21 20             5 4    0
 - * +--+-----+-------------+-----+----------------+------+
 - * |sf| opc | 1 0 0 1 0 1 |  hw |  imm16         |  Rd  |
 - * +--+-----+-------------+-----+----------------+------+
 - *
 - * sf: 0 -> 32 bit, 1 -> 64 bit
 - * opc: 00 -> N, 10 -> Z, 11 -> K
 - * hw: shift/16 (0,16, and sf only 32, 48)
   */
 -static void disas_movw_imm(DisasContext *s, uint32_t insn)
 +
 +static bool trans_MOVZ(DisasContext *s, arg_movw *a)
  {
 -    int rd = extract32(insn, 0, 5);
 -    uint64_t imm = extract32(insn, 5, 16);
 -    int sf = extract32(insn, 31, 1);
 -    int opc = extract32(insn, 29, 2);
 -    int pos = extract32(insn, 21, 2) << 4;
 -    TCGv_i64 tcg_rd = cpu_reg(s, rd);
 +    int pos = a->hw << 4;
 +    tcg_gen_movi_i64(cpu_reg(s, a->rd), (uint64_t)a->imm << pos);
 +    return true;
 +}
 -    if (!sf && (pos >= 32)) {
 -        unallocated_encoding(s);
 -        return;
 -    }
 +static bool trans_MOVN(DisasContext *s, arg_movw *a)
 +{
-+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
++    int pos = a->hw << 4;
 +    uint64_t imm = a->imm;
 -    switch (opc) {
 -    case 0: /* MOVN */
 -    case 2: /* MOVZ */
 -        imm <<= pos;
 -        if (opc == 0) {
 -            imm = ~imm;
 -        }
 -        if (!sf) {
 -            imm &= 0xffffffffu;
 -        }
 -        tcg_gen_movi_i64(tcg_rd, imm);
 -        break;
 -    case 3: /* MOVK */
 -        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_constant_i64(imm), pos, 16);
 -        if (!sf) {
 -            tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
 -        }
 -        break;
 -    default:
 -        unallocated_encoding(s);
 -        break;
 +    imm = ~(imm << pos);
 +    if (!a->sf) {
 +        imm = (uint32_t)imm;
      }
 +    tcg_gen_movi_i64(cpu_reg(s, a->rd), imm);
 +    return true;
 +}
 +
- static inline long vfp_reg_offset(bool dp, unsigned reg)
++static bool trans_MOVK(DisasContext *s, arg_movw *a)
 +{
 +    int pos = a->hw << 4;
 +    TCGv_i64 tcg_rd, tcg_im;
 +
 +    tcg_rd = cpu_reg(s, a->rd);
 +    tcg_im = tcg_constant_i64(a->imm);
 +    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_im, pos, 16);
 +    if (!a->sf) {
 +        tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
 +    }
 +    return true;
  }
  /* Bitfield
@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
  static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
  {
-     if (dp) {
+     switch (extract32(insn, 23, 6)) {
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+-    case 0x25: /* Move wide (immediate) */
-index XXXXXXX..XXXXXXX 100644
+-        disas_movw_imm(s, insn);
---- a/target/arm/translate-neon.c.inc
+-        break;
-+++ b/target/arm/translate-neon.c.inc
+     case 0x26: /* Bitfield */
-@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
+         disas_bitfield(s, insn);
-         ofs ^= 8 - element_size;
+         break;
      }
  #endif
 -    return neon_reg_offset(reg, 0) + ofs;
 +    return neon_full_reg_offset(reg) + ofs;
  }
  static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
               * We cannot write 16 bytes at once because the
               * destination is unaligned.
               */
 -            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
 +            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
 , 8, tmp);
 -            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
 -                             neon_reg_offset(vd, 0), 8, 8);
 +            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
 +                             neon_full_reg_offset(vd), 8, 8);
          } else {
 -            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
 +            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                   vec_size, vec_size, tmp);
          }
          tcg_gen_addi_i32(addr, addr, 1 << size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
  static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
  {
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rn_ofs = neon_reg_offset(a->vn, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rn_ofs = neon_full_reg_offset(a->vn);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
  {
      /* Handle a 2-reg-shift insn which can be vectorized. */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
  {
      /* FP operations in 2-reg-and-shift group */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      TCGv_ptr fpst;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
          return true;
      }
 -    reg_ofs = neon_reg_offset(a->vd, 0);
 +    reg_ofs = neon_full_reg_offset(a->vd);
      vec_size = a->q ? 16 : 8;
      imm = asimd_imm_const(a->imm, a->cmode, a->op);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
          return true;
      }
 -    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
 -                       neon_reg_offset(a->vn, 0),
 -                       neon_reg_offset(a->vm, 0),
 +    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
 +                       neon_full_reg_offset(a->vn),
 +                       neon_full_reg_offset(a->vm),
 , 16, 0, fn_gvec);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
  {
      /* Two registers and a scalar, using gvec */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rn_ofs = neon_reg_offset(a->vn, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rn_ofs = neon_full_reg_offset(a->vn);
      int rm_ofs;
      int idx;
      TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
      /* a->vm is M:Vm, which encodes both register and index */
      idx = extract32(a->vm, a->size + 2, 2);
      a->vm = extract32(a->vm, 0, a->size + 2);
 -    rm_ofs = neon_reg_offset(a->vm, 0);
 +    rm_ofs = neon_full_reg_offset(a->vm);
      fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
      tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
          return true;
      }
 -    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
 +    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                           neon_element_offset(a->vm, a->index, a->size),
                           a->q ? 16 : 8, a->q ? 16 : 8);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
  static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
  {
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
      }
      tmp = load_reg(s, a->rt);
 -    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
 +    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                           vec_size, vec_size, tmp);
      tcg_temp_free_i32(tmp);
 --
-.20.1
+.34.1

-[PULL 11/26] target/arm: Improve do_prewiden_3d
+[PULL 17/29] target/arm: Convert Bitfield to decodetree
 From: Richard Henderson <richard.henderson@linaro.org>
-We can use proper widening loads to extend 32-bit inputs,
+Convert the BFM, SBFM, UBFM instructions.
 and skip the "widenfn" step.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230512144106.3608981-12-peter.maydell@linaro.org
+[PMM: Rebased]
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  6 +++
+ target/arm/tcg/a64.decode      |  13 +++
- target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
+ target/arm/tcg/translate-a64.c | 144 ++++++++++++++++++---------------
-files changed, 43 insertions(+), 29 deletions(-)
+files changed, 94 insertions(+), 63 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/a64.decode
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@ MOVZ            . 10 100101 .. ................ .....   @movw_64
-     long off = neon_element_offset(reg, ele, memop);
+ MOVZ            . 10 100101 .. ................ .....   @movw_32
+ MOVK            . 11 100101 .. ................ .....   @movw_64
-     switch (memop) {
+ MOVK            . 11 100101 .. ................ .....   @movw_32
-+    case MO_SL:
++
-+        tcg_gen_ld32s_i64(dest, cpu_env, off);
++# Bitfield
-+        break;
++
-+    case MO_UL:
++&bitfield       rd rn sf immr imms
-+        tcg_gen_ld32u_i64(dest, cpu_env, off);
++@bitfield_64    1 .. ...... 1 immr:6 imms:6 rn:5 rd:5      &bitfield sf=1
-+        break;
++@bitfield_32    0 .. ...... 0 0 immr:5 0 imms:5 rn:5 rd:5  &bitfield sf=0
-     case MO_Q:
++
-         tcg_gen_ld_i64(dest, cpu_env, off);
++SBFM            . 00 100110 . ...... ...... ..... ..... @bitfield_64
-         break;
++SBFM            . 00 100110 . ...... ...... ..... ..... @bitfield_32
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
++BFM             . 01 100110 . ...... ...... ..... ..... @bitfield_64
 +BFM             . 01 100110 . ...... ...... ..... ..... @bitfield_32
 +UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_64
 +UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_32
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/translate-neon.c.inc
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
+@@ -XXX,XX +XXX,XX @@ static bool trans_MOVK(DisasContext *s, arg_movw *a)
  static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                             NeonGenWidenFn *widenfn,
                             NeonGenTwo64OpFn *opfn,
 -                           bool src1_wide)
 +                           int src1_mop, int src2_mop)
  {
      /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
      TCGv_i64 rn0_64, rn1_64, rm_64;
 -    TCGv_i32 rm;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
          return false;
      }
 -    if (!widenfn || !opfn) {
 +    if (!opfn) {
          /* size == 3 case, which is an entirely different insn group */
          return false;
      }
 -    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
 +    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rn1_64 = tcg_temp_new_i64();
      rm_64 = tcg_temp_new_i64();
 -    if (src1_wide) {
 -        read_neon_element64(rn0_64, a->vn, 0, MO_64);
 +    if (src1_mop >= 0) {
 +        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = tcg_temp_new_i32();
 -    read_neon_element32(rm, a->vm, 0, MO_32);
 +    if (src2_mop >= 0) {
 +        read_neon_element64(rm_64, a->vm, 0, src2_mop);
 +    } else {
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, 0, MO_32);
 +        widenfn(rm_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn0_64, rn0_64, rm_64);
      /*
       * Load second pass inputs before storing the first pass result, to
       * avoid incorrect results if a narrow input overlaps with the result.
       */
 -    if (src1_wide) {
 -        read_neon_element64(rn1_64, a->vn, 1, MO_64);
 +    if (src1_mop >= 0) {
 +        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = tcg_temp_new_i32();
 -    read_neon_element32(rm, a->vm, 1, MO_32);
 +    if (src2_mop >= 0) {
 +        read_neon_element64(rm_64, a->vm, 1, src2_mop);
 +    } else {
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, 1, MO_32);
 +        widenfn(rm_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
      write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
      write_neon_element64(rn1_64, a->vd, 1, MO_64);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      return true;
  }
--#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+-/* Bitfield
-+#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
+- *   31  30 29 28         23 22  21  16 15  10 9    5 4    0
-     static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+- * +----+-----+-------------+---+------+------+------+------+
-     {                                                                   \
+- * | sf | opc | 1 0 0 1 1 0 | N | immr | imms |  Rn  |  Rd  |
-         static NeonGenWidenFn * const widenfn[] = {                     \
+- * +----+-----+-------------+---+------+------+------+------+
-             gen_helper_neon_widen_##S##8,                               \
++/*
-             gen_helper_neon_widen_##S##16,                              \
++ * Bitfield
--            tcg_gen_##EXT##_i32_i64,                                    \
+  */
--            NULL,                                                       \
+-static void disas_bitfield(DisasContext *s, uint32_t insn)
-+            NULL, NULL,                                                 \
++
-         };                                                              \
++static bool trans_SBFM(DisasContext *s, arg_SBFM *a)
-         static NeonGenTwo64OpFn * const addfn[] = {                     \
+ {
-             gen_helper_neon_##OP##l_u16,                                \
+-    unsigned int sf, n, opc, ri, si, rn, rd, bitsize, pos, len;
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+-    TCGv_i64 tcg_rd, tcg_tmp;
-             tcg_gen_##OP##_i64,                                         \
++    TCGv_i64 tcg_rd = cpu_reg(s, a->rd);
-             NULL,                                                       \
++    TCGv_i64 tcg_tmp = read_cpu_reg(s, a->rn, 1);
-         };                                                              \
++    unsigned int bitsize = a->sf ? 64 : 32;
--        return do_prewiden_3d(s, a, widenfn[a->size],                   \
++    unsigned int ri = a->immr;
--                              addfn[a->size], SRC1WIDE);                \
++    unsigned int si = a->imms;
-+        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
++    unsigned int pos, len;
-+        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
-+                              SRC1WIDE ? MO_Q : narrow_mop,             \
+-    sf = extract32(insn, 31, 1);
-+                              narrow_mop);                              \
+-    opc = extract32(insn, 29, 2);
 -    n = extract32(insn, 22, 1);
 -    ri = extract32(insn, 16, 6);
 -    si = extract32(insn, 10, 6);
 -    rn = extract32(insn, 5, 5);
 -    rd = extract32(insn, 0, 5);
 -    bitsize = sf ? 64 : 32;
 -
 -    if (sf != n || ri >= bitsize || si >= bitsize || opc > 2) {
 -        unallocated_encoding(s);
 -        return;
 -    }
 -
 -    tcg_rd = cpu_reg(s, rd);
 -
 -    /* Suppress the zero-extend for !sf.  Since RI and SI are constrained
 -       to be smaller than bitsize, we'll never reference data outside the
 -       low 32-bits anyway.  */
 -    tcg_tmp = read_cpu_reg(s, rn, 1);
 -
 -    /* Recognize simple(r) extractions.  */
      if (si >= ri) {
          /* Wd<s-r:0> = Wn<s:r> */
          len = (si - ri) + 1;
 -        if (opc == 0) { /* SBFM: ASR, SBFX, SXTB, SXTH, SXTW */
 -            tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
 -            goto done;
 -        } else if (opc == 2) { /* UBFM: UBFX, LSR, UXTB, UXTH */
 -            tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
 -            return;
 +        tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
 +        if (!a->sf) {
 +            tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
          }
 -        /* opc == 1, BFXIL fall through to deposit */
 +    } else {
 +        /* Wd<32+s-r,32-r> = Wn<s:0> */
 +        len = si + 1;
 +        pos = (bitsize - ri) & (bitsize - 1);
 +
 +        if (len < ri) {
 +            /*
 +             * Sign extend the destination field from len to fill the
 +             * balance of the word.  Let the deposit below insert all
 +             * of those sign bits.
 +             */
 +            tcg_gen_sextract_i64(tcg_tmp, tcg_tmp, 0, len);
 +            len = ri;
 +        }
 +
 +        /*
 +         * We start with zero, and we haven't modified any bits outside
 +         * bitsize, therefore no final zero-extension is unneeded for !sf.
 +         */
 +        tcg_gen_deposit_z_i64(tcg_rd, tcg_tmp, pos, len);
 +    }
 +    return true;
 +}
 +
 +static bool trans_UBFM(DisasContext *s, arg_UBFM *a)
 +{
 +    TCGv_i64 tcg_rd = cpu_reg(s, a->rd);
 +    TCGv_i64 tcg_tmp = read_cpu_reg(s, a->rn, 1);
 +    unsigned int bitsize = a->sf ? 64 : 32;
 +    unsigned int ri = a->immr;
 +    unsigned int si = a->imms;
 +    unsigned int pos, len;
 +
 +    tcg_rd = cpu_reg(s, a->rd);
 +    tcg_tmp = read_cpu_reg(s, a->rn, 1);
 +
 +    if (si >= ri) {
 +        /* Wd<s-r:0> = Wn<s:r> */
 +        len = (si - ri) + 1;
 +        tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
 +    } else {
 +        /* Wd<32+s-r,32-r> = Wn<s:0> */
 +        len = si + 1;
 +        pos = (bitsize - ri) & (bitsize - 1);
 +        tcg_gen_deposit_z_i64(tcg_rd, tcg_tmp, pos, len);
 +    }
 +    return true;
 +}
 +
 +static bool trans_BFM(DisasContext *s, arg_BFM *a)
 +{
 +    TCGv_i64 tcg_rd = cpu_reg(s, a->rd);
 +    TCGv_i64 tcg_tmp = read_cpu_reg(s, a->rn, 1);
 +    unsigned int bitsize = a->sf ? 64 : 32;
 +    unsigned int ri = a->immr;
 +    unsigned int si = a->imms;
 +    unsigned int pos, len;
 +
 +    tcg_rd = cpu_reg(s, a->rd);
 +    tcg_tmp = read_cpu_reg(s, a->rn, 1);
 +
 +    if (si >= ri) {
 +        /* Wd<s-r:0> = Wn<s:r> */
          tcg_gen_shri_i64(tcg_tmp, tcg_tmp, ri);
 +        len = (si - ri) + 1;
          pos = 0;
      } else {
 -        /* Handle the ri > si case with a deposit
 -         * Wd<32+s-r,32-r> = Wn<s:0>
 -         */
 +        /* Wd<32+s-r,32-r> = Wn<s:0> */
          len = si + 1;
          pos = (bitsize - ri) & (bitsize - 1);
      }
--DO_PREWIDEN(VADDL_S, s, ext, add, false)
+-    if (opc == 0 && len < ri) {
--DO_PREWIDEN(VADDL_U, u, extu, add, false)
+-        /* SBFM: sign extend the destination field from len to fill
--DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+-           the balance of the word.  Let the deposit below insert all
--DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
+-           of those sign bits.  */
--DO_PREWIDEN(VADDW_S, s, ext, add, true)
+-        tcg_gen_sextract_i64(tcg_tmp, tcg_tmp, 0, len);
--DO_PREWIDEN(VADDW_U, u, extu, add, true)
+-        len = ri;
--DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+-    }
--DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+-
-+DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
+-    if (opc == 1) { /* BFM, BFXIL */
-+DO_PREWIDEN(VADDL_U, u, add, false, 0)
+-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
-+DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
+-    } else {
-+DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
+-        /* SBFM or UBFM: We start with zero, and we haven't modified
-+DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
+-           any bits outside bitsize, therefore the zero-extension
-+DO_PREWIDEN(VADDW_U, u, add, true, 0)
+-           below is unneeded.  */
-+DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
+-        tcg_gen_deposit_z_i64(tcg_rd, tcg_tmp, pos, len);
-+DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
+-        return;
+-    }
- static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+-
-                          NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
+- done:
 -    if (!sf) { /* zero extend final result */
 +    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
 +    if (!a->sf) {
          tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
      }
 +    return true;
  }
  /* Extract
@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
  static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
  {
      switch (extract32(insn, 23, 6)) {
 -    case 0x26: /* Bitfield */
 -        disas_bitfield(s, insn);
 -        break;
      case 0x27: /* Extract */
          disas_extract(s, insn);
          break;
 --
-.20.1
+.34.1

-New patch
+[PULL 18/29] target/arm: Convert Extract instructions to decodetree
+Convert the EXTR instruction to decodetree (this is the
+only one in the 'Extract" class). This is the last of
+the dp-immediate insns in the legacy decoder, so we
+can now remove disas_data_proc_imm().
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230512144106.3608981-13-peter.maydell@linaro.org
+---
+ target/arm/tcg/a64.decode      |  7 +++
+ target/arm/tcg/translate-a64.c | 94 +++++++++++-----------------------
+files changed, 36 insertions(+), 65 deletions(-)
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/a64.decode
++++ b/target/arm/tcg/a64.decode
+@@ -XXX,XX +XXX,XX @@ BFM             . 01 100110 . ...... ...... ..... ..... @bitfield_64
+ BFM             . 01 100110 . ...... ...... ..... ..... @bitfield_32
+ UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_64
+ UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_32
++
++# Extract
++
++&extract        rd rn rm imm sf
++
++EXTR            1 00 100111 1 0 rm:5 imm:6 rn:5 rd:5     &extract sf=1
++EXTR            0 00 100111 0 0 rm:5 0 imm:5 rn:5 rd:5   &extract sf=0
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFM(DisasContext *s, arg_BFM *a)
+     return true;
+ }
+-/* Extract
+- *   31  30  29 28         23 22   21  20  16 15    10 9    5 4    0
+- * +----+------+-------------+---+----+------+--------+------+------+
+- * | sf | op21 | 1 0 0 1 1 1 | N | o0 |  Rm  |  imms  |  Rn  |  Rd  |
+- * +----+------+-------------+---+----+------+--------+------+------+
+- */
+-static void disas_extract(DisasContext *s, uint32_t insn)
++static bool trans_EXTR(DisasContext *s, arg_extract *a)
+ {
+-    unsigned int sf, n, rm, imm, rn, rd, bitsize, op21, op0;
++    TCGv_i64 tcg_rd, tcg_rm, tcg_rn;
+-    sf = extract32(insn, 31, 1);
+-    n = extract32(insn, 22, 1);
+-    rm = extract32(insn, 16, 5);
+-    imm = extract32(insn, 10, 6);
+-    rn = extract32(insn, 5, 5);
+-    rd = extract32(insn, 0, 5);
+-    op21 = extract32(insn, 29, 2);
+-    op0 = extract32(insn, 21, 1);
+-    bitsize = sf ? 64 : 32;
++    tcg_rd = cpu_reg(s, a->rd);
+-    if (sf != n || op21 || op0 || imm >= bitsize) {
+-        unallocated_encoding(s);
+-    } else {
+-        TCGv_i64 tcg_rd, tcg_rm, tcg_rn;
+-
+-        tcg_rd = cpu_reg(s, rd);
+-
+-        if (unlikely(imm == 0)) {
+-            /* tcg shl_i32/shl_i64 is undefined for 32/64 bit shifts,
+-             * so an extract from bit 0 is a special case.
+-             */
+-            if (sf) {
+-                tcg_gen_mov_i64(tcg_rd, cpu_reg(s, rm));
+-            } else {
+-                tcg_gen_ext32u_i64(tcg_rd, cpu_reg(s, rm));
+-            }
++    if (unlikely(a->imm == 0)) {
++        /*
++         * tcg shl_i32/shl_i64 is undefined for 32/64 bit shifts,
++         * so an extract from bit 0 is a special case.
++         */
++        if (a->sf) {
++            tcg_gen_mov_i64(tcg_rd, cpu_reg(s, a->rm));
+         } else {
+-            tcg_rm = cpu_reg(s, rm);
+-            tcg_rn = cpu_reg(s, rn);
++            tcg_gen_ext32u_i64(tcg_rd, cpu_reg(s, a->rm));
++        }
++    } else {
++        tcg_rm = cpu_reg(s, a->rm);
++        tcg_rn = cpu_reg(s, a->rn);
+-            if (sf) {
+-                /* Specialization to ROR happens in EXTRACT2.  */
+-                tcg_gen_extract2_i64(tcg_rd, tcg_rm, tcg_rn, imm);
++        if (a->sf) {
++            /* Specialization to ROR happens in EXTRACT2.  */
++            tcg_gen_extract2_i64(tcg_rd, tcg_rm, tcg_rn, a->imm);
++        } else {
++            TCGv_i32 t0 = tcg_temp_new_i32();
++
++            tcg_gen_extrl_i64_i32(t0, tcg_rm);
++            if (a->rm == a->rn) {
++                tcg_gen_rotri_i32(t0, t0, a->imm);
+             } else {
+-                TCGv_i32 t0 = tcg_temp_new_i32();
+-
+-                tcg_gen_extrl_i64_i32(t0, tcg_rm);
+-                if (rm == rn) {
+-                    tcg_gen_rotri_i32(t0, t0, imm);
+-                } else {
+-                    TCGv_i32 t1 = tcg_temp_new_i32();
+-                    tcg_gen_extrl_i64_i32(t1, tcg_rn);
+-                    tcg_gen_extract2_i32(t0, t0, t1, imm);
+-                }
+-                tcg_gen_extu_i32_i64(tcg_rd, t0);
++                TCGv_i32 t1 = tcg_temp_new_i32();
++                tcg_gen_extrl_i64_i32(t1, tcg_rn);
++                tcg_gen_extract2_i32(t0, t0, t1, a->imm);
+             }
++            tcg_gen_extu_i32_i64(tcg_rd, t0);
+         }
+     }
+-}
+-
+-/* Data processing - immediate */
+-static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
+-{
+-    switch (extract32(insn, 23, 6)) {
+-    case 0x27: /* Extract */
+-        disas_extract(s, insn);
+-        break;
+-    default:
+-        unallocated_encoding(s);
+-        break;
+-    }
++    return true;
+ }
+ /* Shift a TCGv src by TCGv shift_amount, put result in dst.
+@@ -XXX,XX +XXX,XX @@ static bool btype_destination_ok(uint32_t insn, bool bt, int btype)
+ static void disas_a64_legacy(DisasContext *s, uint32_t insn)
+ {
+     switch (extract32(insn, 25, 4)) {
+-    case 0x8: case 0x9: /* Data processing - immediate */
+-        disas_data_proc_imm(s, insn);
+-        break;
+     case 0xa: case 0xb: /* Branch, exception generation and system insns */
+         disas_b_exc_sys(s, insn);
+         break;
+--
+.34.1

-[PULL 25/26] qemu-option-trace.rst.inc: Don't use option:: markup
+[PULL 19/29] target/arm: Convert unconditional branch immediate to decodetree
-Sphinx 3.2 is pickier than earlier versions about the option:: markup,
+Convert the unconditional branch immediate insns B and BL to
-and complains about our usage in qemu-option-trace.rst:
+decodetree.
 ../../docs/qemu-option-trace.rst.inc:4:Malformed option description
   '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
   "/opt args" or "+opt args"
 In this file, we're really trying to document the different parts of
 the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
 have already introduced with an option:: markup.  So it's not right
 to use option:: here anyway.  Switch to a different markup
 (definition lists) which gives about the same formatted output.
 (Unlike option::, this markup doesn't produce index entries; but
 at the moment we don't do anything much with indexes anyway, and
 in any case I think it doesn't make much sense to have individual
 index entries for the sub-parts of the --trace option.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
+Message-id: 20230512144106.3608981-14-peter.maydell@linaro.org
 Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
 ---
- docs/qemu-option-trace.rst.inc | 6 +++---
+ target/arm/tcg/a64.decode      |  9 +++++++++
-file changed, 3 insertions(+), 3 deletions(-)
+ target/arm/tcg/translate-a64.c | 31 +++++++++++--------------------
 files changed, 20 insertions(+), 20 deletions(-)
-diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 index XXXXXXX..XXXXXXX 100644
---- a/docs/qemu-option-trace.rst.inc
+--- a/target/arm/tcg/a64.decode
-+++ b/docs/qemu-option-trace.rst.inc
++++ b/target/arm/tcg/a64.decode
 @@ -XXX,XX +XXX,XX @@
- Specify tracing options.
+ &ri              rd imm
+ &rri_sf          rd rn imm sf
--.. option:: [enable=]PATTERN
++&i               imm
-+``[enable=]PATTERN``
-   Immediately enable events matching *PATTERN*
+ ### Data Processing - Immediate
-   (either event name or a globbing pattern).  This option is only
+@@ -XXX,XX +XXX,XX @@ UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_32
-@@ -XXX,XX +XXX,XX @@ Specify tracing options.
+ EXTR            1 00 100111 1 0 rm:5 imm:6 rn:5 rd:5     &extract sf=1
-   Use :option:`-trace help` to print a list of names of trace points.
+ EXTR            0 00 100111 0 0 rm:5 0 imm:5 rn:5 rd:5   &extract sf=0
++
--.. option:: events=FILE
++# Branches
-+``events=FILE``
++
++%imm26   0:s26 !function=times_4
-   Immediately enable events listed in *FILE*.
++@branch         . ..... .......................... &i imm=%imm26
-   The file must contain one event name (as listed in the ``trace-events-all``
++
-@@ -XXX,XX +XXX,XX @@ Specify tracing options.
++B               0 00101 .......................... @branch
-   available if QEMU has been compiled with the ``simple``, ``log`` or
++BL              1 00101 .......................... @branch
-   ``ftrace`` tracing backend.
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
--.. option:: file=FILE
+--- a/target/arm/tcg/translate-a64.c
-+``file=FILE``
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static inline AArch64DecodeFn *lookup_disas_fn(const AArch64DecodeTable *table,
-   Log output traces to *FILE*.
+  * match up with those in the manual.
-   This option is only available if QEMU has been compiled with
+  */
 -/* Unconditional branch (immediate)
 - *   31  30       26 25                                  0
 - * +----+-----------+-------------------------------------+
 - * | op | 0 0 1 0 1 |                 imm26               |
 - * +----+-----------+-------------------------------------+
 - */
 -static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
 +static bool trans_B(DisasContext *s, arg_i *a)
  {
 -    int64_t diff = sextract32(insn, 0, 26) * 4;
 -
 -    if (insn & (1U << 31)) {
 -        /* BL Branch with link */
 -        gen_pc_plus_diff(s, cpu_reg(s, 30), curr_insn_len(s));
 -    }
 -
 -    /* B Branch / BL Branch with link */
      reset_btype(s);
 -    gen_goto_tb(s, 0, diff);
 +    gen_goto_tb(s, 0, a->imm);
 +    return true;
 +}
 +
 +static bool trans_BL(DisasContext *s, arg_i *a)
 +{
 +    gen_pc_plus_diff(s, cpu_reg(s, 30), curr_insn_len(s));
 +    reset_btype(s);
 +    gen_goto_tb(s, 0, a->imm);
 +    return true;
  }
  /* Compare and branch (immediate)
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
  static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
  {
      switch (extract32(insn, 25, 7)) {
 -    case 0x0a: case 0x0b:
 -    case 0x4a: case 0x4b: /* Unconditional branch (immediate) */
 -        disas_uncond_b_imm(s, insn);
 -        break;
      case 0x1a: case 0x5a: /* Compare & branch (immediate) */
          disas_comp_b_imm(s, insn);
          break;
 --
-.20.1
+.34.1

-New patch
+[PULL 20/29] target/arm: Convert CBZ, CBNZ to decodetree
+Convert the compare-and-branch-immediate insns CBZ and CBNZ
+to decodetree.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230512144106.3608981-15-peter.maydell@linaro.org
+---
+ target/arm/tcg/a64.decode      |  5 +++++
+ target/arm/tcg/translate-a64.c | 26 ++++++--------------------
+files changed, 11 insertions(+), 20 deletions(-)
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/a64.decode
++++ b/target/arm/tcg/a64.decode
+@@ -XXX,XX +XXX,XX @@ EXTR            0 00 100111 0 0 rm:5 0 imm:5 rn:5 rd:5   &extract sf=0
+ B               0 00101 .......................... @branch
+ BL              1 00101 .......................... @branch
++
++%imm19   5:s19 !function=times_4
++&cbz     rt imm sf nz
++
++CBZ             sf:1 011010 nz:1 ................... rt:5 &cbz imm=%imm19
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_BL(DisasContext *s, arg_i *a)
+     return true;
+ }
+-/* Compare and branch (immediate)
+- *   31  30         25  24  23                  5 4      0
+- * +----+-------------+----+---------------------+--------+
+- * | sf | 0 1 1 0 1 0 | op |         imm19       |   Rt   |
+- * +----+-------------+----+---------------------+--------+
+- */
+-static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
++
++static bool trans_CBZ(DisasContext *s, arg_cbz *a)
+ {
+-    unsigned int sf, op, rt;
+-    int64_t diff;
+     DisasLabel match;
+     TCGv_i64 tcg_cmp;
+-    sf = extract32(insn, 31, 1);
+-    op = extract32(insn, 24, 1); /* 0: CBZ; 1: CBNZ */
+-    rt = extract32(insn, 0, 5);
+-    diff = sextract32(insn, 5, 19) * 4;
+-
+-    tcg_cmp = read_cpu_reg(s, rt, sf);
++    tcg_cmp = read_cpu_reg(s, a->rt, a->sf);
+     reset_btype(s);
+     match = gen_disas_label(s);
+-    tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
++    tcg_gen_brcondi_i64(a->nz ? TCG_COND_NE : TCG_COND_EQ,
+                         tcg_cmp, 0, match.label);
+     gen_goto_tb(s, 0, 4);
+     set_disas_label(s, match);
+-    gen_goto_tb(s, 1, diff);
++    gen_goto_tb(s, 1, a->imm);
++    return true;
+ }
+ /* Test and branch (immediate)
+@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
+ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
+ {
+     switch (extract32(insn, 25, 7)) {
+-    case 0x1a: case 0x5a: /* Compare & branch (immediate) */
+-        disas_comp_b_imm(s, insn);
+-        break;
+     case 0x1b: case 0x5b: /* Test & branch (immediate) */
+         disas_test_b_imm(s, insn);
+         break;
+--
+.34.1

-[PULL 24/26] scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
+[PULL 21/29] target/arm: Convert TBZ, TBNZ to decodetree
-The kerneldoc script currently emits Sphinx markup for a macro with
+Convert the test-and-branch-immediate insns TBZ and TBNZ
-arguments that uses the c:function directive. This is correct for
+to decodetree.
 Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
 documentation of macros with arguments and c:function is not picky
 about the syntax of what it is passed. However, in Sphinx 3 the
 c:macro directive was enhanced to support macros with arguments,
 and c:function was made more picky about what syntax it accepted.
 When kerneldoc is told that it needs to produce output for Sphinx
 or later, make it emit c:function only for functions and c:macro
 for macros with arguments. We assume that anything with a return
 type is a function and anything without is a macro.
 This fixes the Sphinx error:
 /home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
 If declarator-id with parameters (e.g., 'void f(int arg)'):
   Invalid C declaration: Expected identifier in nested name. [error at 25]
     DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
     -------------------------^
 If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
   Error in declarator or parameters
   Invalid C declaration: Expecting "(" in parameters. [error at 39]
     DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
     ---------------------------------------^
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
+Message-id: 20230512144106.3608981-16-peter.maydell@linaro.org
 Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
 ---
- scripts/kernel-doc | 18 +++++++++++++++++-
+ target/arm/tcg/a64.decode      |  6 ++++++
-file changed, 17 insertions(+), 1 deletion(-)
+ target/arm/tcg/translate-a64.c | 25 +++++--------------------
 files changed, 11 insertions(+), 20 deletions(-)
-diff --git a/scripts/kernel-doc b/scripts/kernel-doc
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/scripts/kernel-doc
+--- a/target/arm/tcg/a64.decode
-+++ b/scripts/kernel-doc
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
+@@ -XXX,XX +XXX,XX @@ BL              1 00101 .......................... @branch
-     output_highlight_rst($args{'purpose'});
+ &cbz     rt imm sf nz
-     $start = "\n\n**Syntax**\n\n  ``";
-     } else {
+ CBZ             sf:1 011010 nz:1 ................... rt:5 &cbz imm=%imm19
--    print ".. c:function:: ";
++
-+        if ((split(/\./, $sphinx_version))[0] >= 3) {
++%imm14     5:s14 !function=times_4
-+            # Sphinx 3 and later distinguish macros and functions and
++%imm31_19  31:1 19:5
-+            # complain if you use c:function with something that's not
++&tbz       rt imm nz bitpos
-+            # syntactically valid as a function declaration.
++
-+            # We assume that anything with a return type is a function
++TBZ             . 011011 nz:1 ..... .............. rt:5 &tbz  imm=%imm14 bitpos=%imm31_19
-+            # and anything without is a macro.
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
-+            if ($args{'functiontype'} ne "") {
+index XXXXXXX..XXXXXXX 100644
-+                print ".. c:function:: ";
+--- a/target/arm/tcg/translate-a64.c
-+            } else {
++++ b/target/arm/tcg/translate-a64.c
-+                print ".. c:macro:: ";
+@@ -XXX,XX +XXX,XX @@ static bool trans_CBZ(DisasContext *s, arg_cbz *a)
-+            }
+     return true;
-+        } else {
+ }
-+            # Older Sphinx don't support documenting macros that take
-+            # arguments with c:macro, and don't complain about the use
+-/* Test and branch (immediate)
-+            # of c:function for this.
+- *   31  30         25  24  23   19 18          5 4    0
-+            print ".. c:function:: ";
+- * +----+-------------+----+-------+-------------+------+
-+        }
+- * | b5 | 0 1 1 0 1 1 | op |  b40  |    imm14    |  Rt  |
-     }
+- * +----+-------------+----+-------+-------------+------+
-     if ($args{'functiontype'} ne "") {
+- */
-     $start .= $args{'functiontype'} . " " . $args{'function'} . " (";
+-static void disas_test_b_imm(DisasContext *s, uint32_t insn)
 +static bool trans_TBZ(DisasContext *s, arg_tbz *a)
  {
 -    unsigned int bit_pos, op, rt;
 -    int64_t diff;
      DisasLabel match;
      TCGv_i64 tcg_cmp;
 -    bit_pos = (extract32(insn, 31, 1) << 5) | extract32(insn, 19, 5);
 -    op = extract32(insn, 24, 1); /* 0: TBZ; 1: TBNZ */
 -    diff = sextract32(insn, 5, 14) * 4;
 -    rt = extract32(insn, 0, 5);
 -
      tcg_cmp = tcg_temp_new_i64();
 -    tcg_gen_andi_i64(tcg_cmp, cpu_reg(s, rt), (1ULL << bit_pos));
 +    tcg_gen_andi_i64(tcg_cmp, cpu_reg(s, a->rt), 1ULL << a->bitpos);
      reset_btype(s);
      match = gen_disas_label(s);
 -    tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
 +    tcg_gen_brcondi_i64(a->nz ? TCG_COND_NE : TCG_COND_EQ,
                          tcg_cmp, 0, match.label);
      gen_goto_tb(s, 0, 4);
      set_disas_label(s, match);
 -    gen_goto_tb(s, 1, diff);
 +    gen_goto_tb(s, 1, a->imm);
 +    return true;
  }
  /* Conditional branch (immediate)
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
  static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
  {
      switch (extract32(insn, 25, 7)) {
 -    case 0x1b: case 0x5b: /* Test & branch (immediate) */
 -        disas_test_b_imm(s, insn);
 -        break;
      case 0x2a: /* Conditional branch (immediate) */
          disas_cond_b_imm(s, insn);
          break;
 --
-.20.1
+.34.1

-[PULL 23/26] hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
+[PULL 22/29] target/arm: Convert conditional branch insns to decodetree
-In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
+Convert the immediate conditional branch insn B.cond to
-into the GICv3CPUState struct's maintenance_irq field.  This will
+decodetree.
 only work if the board happens to have already wired up the CPU
 maintenance IRQ before the GIC was realized.  Unfortunately this is
 not the case for the 'virt' board, and so the value that gets copied
 is NULL (since a qemu_irq is really a pointer to an IRQState struct
 under the hood).  The effect is that the CPU interface code never
 actually raises the maintenance interrupt line.
-Instead, since the GICv3CPUState has a pointer to the CPUState, make
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-the dereference at the point where we want to raise the interrupt, to
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-avoid an implicit requirement on board code to wire things up in a
+Message-id: 20230512144106.3608981-17-peter.maydell@linaro.org
-particular order.
+---
  target/arm/tcg/a64.decode      |  2 ++
  target/arm/tcg/translate-a64.c | 30 ++++++------------------------
 files changed, 8 insertions(+), 24 deletions(-)
-Reported-by: Jose Martins <josemartins90@gmail.com>
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
 Reviewed-by: Luc Michel <luc@lmichel.fr>
 ---
  include/hw/intc/arm_gicv3_common.h | 1 -
  hw/intc/arm_gicv3_cpuif.c          | 5 ++---
 files changed, 2 insertions(+), 4 deletions(-)
 diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/intc/arm_gicv3_common.h
+--- a/target/arm/tcg/a64.decode
-+++ b/include/hw/intc/arm_gicv3_common.h
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
+@@ -XXX,XX +XXX,XX @@ CBZ             sf:1 011010 nz:1 ................... rt:5 &cbz imm=%imm19
-     qemu_irq parent_fiq;
+ &tbz       rt imm nz bitpos
-     qemu_irq parent_virq;
-     qemu_irq parent_vfiq;
+ TBZ             . 011011 nz:1 ..... .............. rt:5 &tbz  imm=%imm14 bitpos=%imm31_19
--    qemu_irq maintenance_irq;
++
++B_cond          0101010 0 ................... 0 cond:4 imm=%imm19
-     /* Redistributor */
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
      uint32_t level;                  /* Current IRQ level */
 diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/arm_gicv3_cpuif.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/hw/intc/arm_gicv3_cpuif.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
+@@ -XXX,XX +XXX,XX @@ static bool trans_TBZ(DisasContext *s, arg_tbz *a)
-     int irqlevel = 0;
+     return true;
      int fiqlevel = 0;
      int maintlevel = 0;
 +    ARMCPU *cpu = ARM_CPU(cs->cpu);
      idx = hppvi_index(cs);
      trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
      qemu_set_irq(cs->parent_vfiq, fiqlevel);
      qemu_set_irq(cs->parent_virq, irqlevel);
 -    qemu_set_irq(cs->maintenance_irq, maintlevel);
 +    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
  }
- static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
+-/* Conditional branch (immediate)
-@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
+- *  31           25  24  23                  5   4  3    0
-             && cpu->gic_num_lrs) {
+- * +---------------+----+---------------------+----+------+
-             int j;
+- * | 0 1 0 1 0 1 0 | o1 |         imm19       | o0 | cond |
+- * +---------------+----+---------------------+----+------+
--            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
+- */
 -static void disas_cond_b_imm(DisasContext *s, uint32_t insn)
 +static bool trans_B_cond(DisasContext *s, arg_B_cond *a)
  {
 -    unsigned int cond;
 -    int64_t diff;
 -
-             cs->num_list_regs = cpu->gic_num_lrs;
+-    if ((insn & (1 << 4)) || (insn & (1 << 24))) {
-             cs->vpribits = cpu->gic_vpribits;
+-        unallocated_encoding(s);
-             cs->vprebits = cpu->gic_vprebits;
+-        return;
 -    }
 -    diff = sextract32(insn, 5, 19) * 4;
 -    cond = extract32(insn, 0, 4);
 -
      reset_btype(s);
 -    if (cond < 0x0e) {
 +    if (a->cond < 0x0e) {
          /* genuinely conditional branches */
          DisasLabel match = gen_disas_label(s);
 -        arm_gen_test_cc(cond, match.label);
 +        arm_gen_test_cc(a->cond, match.label);
          gen_goto_tb(s, 0, 4);
          set_disas_label(s, match);
 -        gen_goto_tb(s, 1, diff);
 +        gen_goto_tb(s, 1, a->imm);
      } else {
          /* 0xe and 0xf are both "always" conditions */
 -        gen_goto_tb(s, 0, diff);
 +        gen_goto_tb(s, 0, a->imm);
      }
 +    return true;
  }
  /* HINT instruction group, including various allocated HINTs */
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
  static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
  {
      switch (extract32(insn, 25, 7)) {
 -    case 0x2a: /* Conditional branch (immediate) */
 -        disas_cond_b_imm(s, insn);
 -        break;
      case 0x6a: /* Exception generation / System */
          if (insn & (1 << 24)) {
              if (extract32(insn, 22, 2) == 0) {
 --
-.20.1
+.34.1

-[PULL 08/26] target/arm: Add read/write_neon_element64
+[PULL 23/29] target/arm: Convert BR, BLR, RET to decodetree
-From: Richard Henderson <richard.henderson@linaro.org>
+Convert the simple (non-pointer-auth) BR, BLR and RET insns
 to decodetree.
-Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20230512144106.3608981-18-peter.maydell@linaro.org
 ---
  target/arm/tcg/a64.decode      |  5 ++++
  target/arm/tcg/translate-a64.c | 55 ++++++++++++++++++++++++++++++----
 files changed, 54 insertions(+), 6 deletions(-)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/translate.c          | 26 +++++++++
  target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
 files changed, 73 insertions(+), 47 deletions(-)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/a64.decode
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@
-     }
+ # This file is processed by scripts/decodetree.py
  #
 +&r               rn
  &ri              rd imm
  &rri_sf          rd rn imm sf
  &i               imm
@@ -XXX,XX +XXX,XX @@ CBZ             sf:1 011010 nz:1 ................... rt:5 &cbz imm=%imm19
  TBZ             . 011011 nz:1 ..... .............. rt:5 &tbz  imm=%imm14 bitpos=%imm31_19
  B_cond          0101010 0 ................... 0 cond:4 imm=%imm19
 +
 +BR              1101011 0000 11111 000000 rn:5 00000 &r
 +BLR             1101011 0001 11111 000000 rn:5 00000 &r
 +RET             1101011 0010 11111 000000 rn:5 00000 &r
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_B_cond(DisasContext *s, arg_B_cond *a)
      return true;
  }
-+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
++static void set_btype_for_br(DisasContext *s, int rn)
 +{
-+    long off = neon_element_offset(reg, ele, memop);
++    if (dc_isar_feature(aa64_bti, s)) {
-+
++        /* BR to {x16,x17} or !guard -> 1, else 3.  */
-+    switch (memop) {
++        set_btype(s, rn == 16 || rn == 17 || !s->guarded_page ? 1 : 3);
 +    case MO_Q:
 +        tcg_gen_ld_i64(dest, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
- static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
++static void set_btype_for_blr(DisasContext *s)
  {
      long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
      }
  }
 +static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
 +{
-+    long off = neon_element_offset(reg, ele, memop);
++    if (dc_isar_feature(aa64_bti, s)) {
-+
++        /* BLR sets BTYPE to 2, regardless of source guarded page.  */
-+    switch (memop) {
++        set_btype(s, 2);
 +    case MO_64:
 +        tcg_gen_st_i64(src, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
- static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
++static bool trans_BR(DisasContext *s, arg_r *a)
- {
++{
-     TCGv_ptr ret = tcg_temp_new_ptr();
++    gen_a64_set_pc(s, cpu_reg(s, a->rn));
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
++    set_btype_for_br(s, a->rn);
-index XXXXXXX..XXXXXXX 100644
++    s->base.is_jmp = DISAS_JUMP;
---- a/target/arm/translate-neon.c.inc
++    return true;
-+++ b/target/arm/translate-neon.c.inc
++}
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
++
-     for (pass = 0; pass < a->q + 1; pass++) {
++static bool trans_BLR(DisasContext *s, arg_r *a)
-         TCGv_i64 tmp = tcg_temp_new_i64();
++{
++    TCGv_i64 dst = cpu_reg(s, a->rn);
--        neon_load_reg64(tmp, a->vm + pass);
++    TCGv_i64 lr = cpu_reg(s, 30);
-+        read_neon_element64(tmp, a->vm, pass, MO_64);
++    if (dst == lr) {
-         fn(tmp, cpu_env, tmp, constimm);
++        TCGv_i64 tmp = tcg_temp_new_i64();
--        neon_store_reg64(tmp, a->vd + pass);
++        tcg_gen_mov_i64(tmp, dst);
-+        write_neon_element64(tmp, a->vd, pass, MO_64);
++        dst = tmp;
-         tcg_temp_free_i64(tmp);
++    }
-     }
++    gen_pc_plus_diff(s, lr, curr_insn_len(s));
-     tcg_temp_free_i64(constimm);
++    gen_a64_set_pc(s, dst);
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
++    set_btype_for_blr(s);
-     rd = tcg_temp_new_i32();
++    s->base.is_jmp = DISAS_JUMP;
++    return true;
-     /* Load both inputs first to avoid potential overwrite if rm == rd */
++}
--    neon_load_reg64(rm1, a->vm);
++
--    neon_load_reg64(rm2, a->vm + 1);
++static bool trans_RET(DisasContext *s, arg_r *a)
-+    read_neon_element64(rm1, a->vm, 0, MO_64);
++{
-+    read_neon_element64(rm2, a->vm, 1, MO_64);
++    gen_a64_set_pc(s, cpu_reg(s, a->rn));
++    s->base.is_jmp = DISAS_JUMP;
-     shiftfn(rm1, rm1, constimm);
++    return true;
-     narrowfn(rd, cpu_env, rm1);
++}
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
++
-         tcg_gen_shli_i64(tmp, tmp, a->shift);
+ /* HINT instruction group, including various allocated HINTs */
-         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
+ static void handle_hint(DisasContext *s, uint32_t insn,
-     }
+                         unsigned int op1, unsigned int op2, unsigned int crm)
--    neon_store_reg64(tmp, a->vd);
+@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
-+    write_neon_element64(tmp, a->vd, 0, MO_64);
+         btype_mod = opc;
+         switch (op3) {
-     widenfn(tmp, rm1);
+         case 0:
-     tcg_temp_free_i32(rm1);
+-            /* BR, BLR, RET */
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
+-            if (op4 != 0) {
-         tcg_gen_shli_i64(tmp, tmp, a->shift);
+-                goto do_unallocated;
-         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
+-            }
-     }
+-            dst = cpu_reg(s, rn);
--    neon_store_reg64(tmp, a->vd + 1);
+-            break;
-+    write_neon_element64(tmp, a->vd, 1, MO_64);
++            /* BR, BLR, RET : handled in decodetree */
-     tcg_temp_free_i64(tmp);
++            goto do_unallocated;
-     return true;
- }
+         case 2:
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+         case 3:
      rm_64 = tcg_temp_new_i64();
      if (src1_wide) {
 -        neon_load_reg64(rn0_64, a->vn);
 +        read_neon_element64(rn0_64, a->vn, 0, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
       * avoid incorrect results if a narrow input overlaps with the result.
       */
      if (src1_wide) {
 -        neon_load_reg64(rn1_64, a->vn + 1);
 +        read_neon_element64(rn1_64, a->vn, 1, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm = tcg_temp_new_i32();
      read_neon_element32(rm, a->vm, 1, MO_32);
 -    neon_store_reg64(rn0_64, a->vd);
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
 -    neon_store_reg64(rn1_64, a->vd + 1);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rn_64, a->vn);
 -    neon_load_reg64(rm_64, a->vm);
 +    read_neon_element64(rn_64, a->vn, 0, MO_64);
 +    read_neon_element64(rm_64, a->vm, 0, MO_64);
      opfn(rn_64, rn_64, rm_64);
      narrowfn(rd0, rn_64);
 -    neon_load_reg64(rn_64, a->vn + 1);
 -    neon_load_reg64(rm_64, a->vm + 1);
 +    read_neon_element64(rn_64, a->vn, 1, MO_64);
 +    read_neon_element64(rm_64, a->vm, 1, MO_64);
      opfn(rn_64, rn_64, rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      /* Don't store results until after all loads: they might overlap */
      if (accfn) {
          tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vd);
 +        read_neon_element64(tmp, a->vd, 0, MO_64);
          accfn(tmp, tmp, rd0);
 -        neon_store_reg64(tmp, a->vd);
 -        neon_load_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 0, MO_64);
 +        read_neon_element64(tmp, a->vd, 1, MO_64);
          accfn(tmp, tmp, rd1);
 -        neon_store_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 1, MO_64);
          tcg_temp_free_i64(tmp);
      } else {
 -        neon_store_reg64(rd0, a->vd);
 -        neon_store_reg64(rd1, a->vd + 1);
 +        write_neon_element64(rd0, a->vd, 0, MO_64);
 +        write_neon_element64(rd1, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
 -        neon_load_reg64(t64, a->vd);
 +        read_neon_element64(t64, a->vd, 0, MO_64);
          accfn(t64, t64, rn0_64);
 -        neon_store_reg64(t64, a->vd);
 -        neon_load_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 0, MO_64);
 +        read_neon_element64(t64, a->vd, 1, MO_64);
          accfn(t64, t64, rn1_64);
 -        neon_store_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 1, MO_64);
          tcg_temp_free_i64(t64);
      } else {
 -        neon_store_reg64(rn0_64, a->vd);
 -        neon_store_reg64(rn1_64, a->vd + 1);
 +        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          right = tcg_temp_new_i64();
          dest = tcg_temp_new_i64();
 -        neon_load_reg64(right, a->vn);
 -        neon_load_reg64(left, a->vm);
 +        read_neon_element64(right, a->vn, 0, MO_64);
 +        read_neon_element64(left, a->vm, 0, MO_64);
          tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 -        neon_store_reg64(dest, a->vd);
 +        write_neon_element64(dest, a->vd, 0, MO_64);
          tcg_temp_free_i64(left);
          tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          destright = tcg_temp_new_i64();
          if (a->imm < 8) {
 -            neon_load_reg64(right, a->vn);
 -            neon_load_reg64(middle, a->vn + 1);
 +            read_neon_element64(right, a->vn, 0, MO_64);
 +            read_neon_element64(middle, a->vn, 1, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 -            neon_load_reg64(left, a->vm);
 +            read_neon_element64(left, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
          } else {
 -            neon_load_reg64(right, a->vn + 1);
 -            neon_load_reg64(middle, a->vm);
 +            read_neon_element64(right, a->vn, 1, MO_64);
 +            read_neon_element64(middle, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 -            neon_load_reg64(left, a->vm + 1);
 +            read_neon_element64(left, a->vm, 1, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
          }
 -        neon_store_reg64(destright, a->vd);
 -        neon_store_reg64(destleft, a->vd + 1);
 +        write_neon_element64(destright, a->vd, 0, MO_64);
 +        write_neon_element64(destleft, a->vd, 1, MO_64);
          tcg_temp_free_i64(destright);
          tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          if (accfn) {
              TCGv_i64 tmp64 = tcg_temp_new_i64();
 -            neon_load_reg64(tmp64, a->vd + pass);
 +            read_neon_element64(tmp64, a->vd, pass, MO_64);
              accfn(rd_64, tmp64, rd_64);
              tcg_temp_free_i64(tmp64);
          }
 -        neon_store_reg64(rd_64, a->vd + pass);
 +        write_neon_element64(rd_64, a->vd, pass, MO_64);
          tcg_temp_free_i64(rd_64);
      }
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rm, a->vm);
 +    read_neon_element64(rm, a->vm, 0, MO_64);
      narrowfn(rd0, cpu_env, rm);
 -    neon_load_reg64(rm, a->vm + 1);
 +    read_neon_element64(rm, a->vm, 1, MO_64);
      narrowfn(rd1, cpu_env, rm);
      write_neon_element32(rd0, a->vd, 0, MO_32);
      write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd);
 +    write_neon_element64(rd, a->vd, 0, MO_64);
      widenfn(rd, rm1);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd + 1);
 +    write_neon_element64(rd, a->vd, 1, MO_64);
      tcg_temp_free_i64(rd);
      tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
      rm = tcg_temp_new_i64();
      rd = tcg_temp_new_i64();
      for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        neon_load_reg64(rm, a->vm + pass);
 -        neon_load_reg64(rd, a->vd + pass);
 -        neon_store_reg64(rm, a->vd + pass);
 -        neon_store_reg64(rd, a->vm + pass);
 +        read_neon_element64(rm, a->vm, pass, MO_64);
 +        read_neon_element64(rd, a->vd, pass, MO_64);
 +        write_neon_element64(rm, a->vd, pass, MO_64);
 +        write_neon_element64(rd, a->vm, pass, MO_64);
      }
      tcg_temp_free_i64(rm);
      tcg_temp_free_i64(rd);
 --
-.20.1
+.34.1

-[PULL 12/26] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+[PULL 24/29] target/arm: Convert BRA[AB]Z, BLR[AB]Z, RETA[AB] to decodetree
-In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
+Convert the single-register pointer-authentication variants of BR,
-meant we were using the H4() address swizzler macro rather than the
+BLR, RET to decodetree. (BRAA/BLRAA are in a different branch of
-H2() which is required for 2-byte data.  This had no effect on
+the legacy decoder and will be dealt with in the next commit.)
 little-endian hosts but meant we put the result data into the
 destination Dreg in the wrong order on big-endian hosts.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20230512144106.3608981-19-peter.maydell@linaro.org
 Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
 ---
- target/arm/vec_helper.c | 8 ++++----
+ target/arm/tcg/a64.decode      |   7 ++
-file changed, 4 insertions(+), 4 deletions(-)
+ target/arm/tcg/translate-a64.c | 132 +++++++++++++++++++--------------
 files changed, 84 insertions(+), 55 deletions(-)
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+--- a/target/arm/tcg/a64.decode
-+++ b/target/arm/vec_helper.c
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
+@@ -XXX,XX +XXX,XX @@ B_cond          0101010 0 ................... 0 cond:4 imm=%imm19
-         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
+ BR              1101011 0000 11111 000000 rn:5 00000 &r
-         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
+ BLR             1101011 0001 11111 000000 rn:5 00000 &r
-                                                                         \
+ RET             1101011 0010 11111 000000 rn:5 00000 &r
--        d[H4(0)] = r0;                                                  \
++
--        d[H4(1)] = r1;                                                  \
++&braz       rn m
--        d[H4(2)] = r2;                                                  \
++BRAZ            1101011 0000 11111 00001 m:1 rn:5 11111 &braz   # BRAAZ, BRABZ
--        d[H4(3)] = r3;                                                  \
++BLRAZ           1101011 0001 11111 00001 m:1 rn:5 11111 &braz   # BLRAAZ, BLRABZ
-+        d[H2(0)] = r0;                                                  \
++
-+        d[H2(1)] = r1;                                                  \
++&reta       m
-+        d[H2(2)] = r2;                                                  \
++RETA            1101011 0010 11111 00001 m:1 11111 11111 &reta  # RETAA, RETAB
-+        d[H2(3)] = r3;                                                  \
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_RET(DisasContext *s, arg_r *a)
      return true;
  }
 +static TCGv_i64 auth_branch_target(DisasContext *s, TCGv_i64 dst,
 +                                   TCGv_i64 modifier, bool use_key_a)
 +{
 +    TCGv_i64 truedst;
 +    /*
 +     * Return the branch target for a BRAA/RETA/etc, which is either
 +     * just the destination dst, or that value with the pauth check
 +     * done and the code removed from the high bits.
 +     */
 +    if (!s->pauth_active) {
 +        return dst;
 +    }
 +
 +    truedst = tcg_temp_new_i64();
 +    if (use_key_a) {
 +        gen_helper_autia(truedst, cpu_env, dst, modifier);
 +    } else {
 +        gen_helper_autib(truedst, cpu_env, dst, modifier);
 +    }
 +    return truedst;
 +}
 +
 +static bool trans_BRAZ(DisasContext *s, arg_braz *a)
 +{
 +    TCGv_i64 dst;
 +
 +    if (!dc_isar_feature(aa64_pauth, s)) {
 +        return false;
 +    }
 +
 +    dst = auth_branch_target(s, cpu_reg(s, a->rn), tcg_constant_i64(0), !a->m);
 +    gen_a64_set_pc(s, dst);
 +    set_btype_for_br(s, a->rn);
 +    s->base.is_jmp = DISAS_JUMP;
 +    return true;
 +}
 +
 +static bool trans_BLRAZ(DisasContext *s, arg_braz *a)
 +{
 +    TCGv_i64 dst, lr;
 +
 +    if (!dc_isar_feature(aa64_pauth, s)) {
 +        return false;
 +    }
 +
 +    dst = auth_branch_target(s, cpu_reg(s, a->rn), tcg_constant_i64(0), !a->m);
 +    lr = cpu_reg(s, 30);
 +    if (dst == lr) {
 +        TCGv_i64 tmp = tcg_temp_new_i64();
 +        tcg_gen_mov_i64(tmp, dst);
 +        dst = tmp;
 +    }
 +    gen_pc_plus_diff(s, lr, curr_insn_len(s));
 +    gen_a64_set_pc(s, dst);
 +    set_btype_for_blr(s);
 +    s->base.is_jmp = DISAS_JUMP;
 +    return true;
 +}
 +
 +static bool trans_RETA(DisasContext *s, arg_reta *a)
 +{
 +    TCGv_i64 dst;
 +
 +    dst = auth_branch_target(s, cpu_reg(s, 30), cpu_X[31], !a->m);
 +    gen_a64_set_pc(s, dst);
 +    s->base.is_jmp = DISAS_JUMP;
 +    return true;
 +}
 +
  /* HINT instruction group, including various allocated HINTs */
  static void handle_hint(DisasContext *s, uint32_t insn,
                          unsigned int op1, unsigned int op2, unsigned int crm)
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
      }
- DO_NEON_PAIRWISE(neon_padd, add)
+     switch (opc) {
 -    case 0: /* BR */
 -    case 1: /* BLR */
 -    case 2: /* RET */
 -        btype_mod = opc;
 -        switch (op3) {
 -        case 0:
 -            /* BR, BLR, RET : handled in decodetree */
 -            goto do_unallocated;
 -
 -        case 2:
 -        case 3:
 -            if (!dc_isar_feature(aa64_pauth, s)) {
 -                goto do_unallocated;
 -            }
 -            if (opc == 2) {
 -                /* RETAA, RETAB */
 -                if (rn != 0x1f || op4 != 0x1f) {
 -                    goto do_unallocated;
 -                }
 -                rn = 30;
 -                modifier = cpu_X[31];
 -            } else {
 -                /* BRAAZ, BRABZ, BLRAAZ, BLRABZ */
 -                if (op4 != 0x1f) {
 -                    goto do_unallocated;
 -                }
 -                modifier = tcg_constant_i64(0);
 -            }
 -            if (s->pauth_active) {
 -                dst = tcg_temp_new_i64();
 -                if (op3 == 2) {
 -                    gen_helper_autia(dst, cpu_env, cpu_reg(s, rn), modifier);
 -                } else {
 -                    gen_helper_autib(dst, cpu_env, cpu_reg(s, rn), modifier);
 -                }
 -            } else {
 -                dst = cpu_reg(s, rn);
 -            }
 -            break;
 -
 -        default:
 -            goto do_unallocated;
 -        }
 -        /* BLR also needs to load return address */
 -        if (opc == 1) {
 -            TCGv_i64 lr = cpu_reg(s, 30);
 -            if (dst == lr) {
 -                TCGv_i64 tmp = tcg_temp_new_i64();
 -                tcg_gen_mov_i64(tmp, dst);
 -                dst = tmp;
 -            }
 -            gen_pc_plus_diff(s, lr, curr_insn_len(s));
 -        }
 -        gen_a64_set_pc(s, dst);
 -        break;
 +    case 0:
 +    case 1:
 +    case 2:
 +        /*
 +         * BR, BLR, RET, RETAA, RETAB, BRAAZ, BRABZ, BLRAAZ, BLRABZ:
 +         * handled in decodetree
 +         */
 +        goto do_unallocated;
      case 8: /* BRAA */
      case 9: /* BLRAA */
 --
-.20.1
+.34.1

-[PULL 15/26] target/arm: fix LORID_EL1 access check
+[PULL 25/29] target/arm: Convert BRAA, BRAB, BLRAA, BLRAB to decodetree
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+Convert the last four BR-with-pointer-auth insns to decodetree.
 The remaining cases in the outer switch in disas_uncond_b_reg()
 all return early rather than leaving the case statement, so we
 can delete the now-unused code at the end of that function.
-Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-future HCR_EL2.TLOR when S-EL2 is enabled.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20230512144106.3608981-20-peter.maydell@linaro.org
 ---
  target/arm/tcg/a64.decode      |  4 ++
  target/arm/tcg/translate-a64.c | 97 ++++++++++++++--------------------
 files changed, 43 insertions(+), 58 deletions(-)
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.c | 19 +++++--------------
 file changed, 5 insertions(+), 14 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/a64.decode
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
+@@ -XXX,XX +XXX,XX @@ BLRAZ           1101011 0001 11111 00001 m:1 rn:5 11111 &braz   # BLRAAZ, BLRABZ
- #endif
+ &reta       m
- /* Shared logic between LORID and the rest of the LOR* registers.
+ RETA            1101011 0010 11111 00001 m:1 11111 11111 &reta  # RETAA, RETAB
-- * Secure state has already been delt with.
++
-+ * Secure state exclusion has already been dealt with.
++&bra        rn rm m
-  */
++BRA             1101011 1000 11111 00001 m:1 rn:5 rm:5 &bra # BRAA, BRAB
--static CPAccessResult access_lor_ns(CPUARMState *env)
++BLRA            1101011 1001 11111 00001 m:1 rn:5 rm:5 &bra # BLRAA, BLRAB
-+static CPAccessResult access_lor_ns(CPUARMState *env,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
-+                                    const ARMCPRegInfo *ri, bool isread)
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_RETA(DisasContext *s, arg_reta *a)
      return true;
  }
 +static bool trans_BRA(DisasContext *s, arg_bra *a)
 +{
 +    TCGv_i64 dst;
 +
 +    if (!dc_isar_feature(aa64_pauth, s)) {
 +        return false;
 +    }
 +    dst = auth_branch_target(s, cpu_reg(s,a->rn), cpu_reg_sp(s, a->rm), !a->m);
 +    gen_a64_set_pc(s, dst);
 +    set_btype_for_br(s, a->rn);
 +    s->base.is_jmp = DISAS_JUMP;
 +    return true;
 +}
 +
 +static bool trans_BLRA(DisasContext *s, arg_bra *a)
 +{
 +    TCGv_i64 dst, lr;
 +
 +    if (!dc_isar_feature(aa64_pauth, s)) {
 +        return false;
 +    }
 +    dst = auth_branch_target(s, cpu_reg(s, a->rn), cpu_reg_sp(s, a->rm), !a->m);
 +    lr = cpu_reg(s, 30);
 +    if (dst == lr) {
 +        TCGv_i64 tmp = tcg_temp_new_i64();
 +        tcg_gen_mov_i64(tmp, dst);
 +        dst = tmp;
 +    }
 +    gen_pc_plus_diff(s, lr, curr_insn_len(s));
 +    gen_a64_set_pc(s, dst);
 +    set_btype_for_blr(s);
 +    s->base.is_jmp = DISAS_JUMP;
 +    return true;
 +}
 +
  /* HINT instruction group, including various allocated HINTs */
  static void handle_hint(DisasContext *s, uint32_t insn,
                          unsigned int op1, unsigned int op2, unsigned int crm)
@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
  static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
  {
-     int el = arm_current_el(env);
+     unsigned int opc, op2, op3, rn, op4;
+-    unsigned btype_mod = 2;   /* 0: BR, 1: BLR, 2: other */
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
+     TCGv_i64 dst;
-     return CP_ACCESS_OK;
+     TCGv_i64 modifier;
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
      case 0:
      case 1:
      case 2:
 +    case 8:
 +    case 9:
          /*
 -         * BR, BLR, RET, RETAA, RETAB, BRAAZ, BRABZ, BLRAAZ, BLRABZ:
 -         * handled in decodetree
 +         * BR, BLR, RET, RETAA, RETAB, BRAAZ, BRABZ, BLRAAZ, BLRABZ,
 +         * BRAA, BLRAA: handled in decodetree
           */
          goto do_unallocated;
 -    case 8: /* BRAA */
 -    case 9: /* BLRAA */
 -        if (!dc_isar_feature(aa64_pauth, s)) {
 -            goto do_unallocated;
 -        }
 -        if ((op3 & ~1) != 2) {
 -            goto do_unallocated;
 -        }
 -        btype_mod = opc & 1;
 -        if (s->pauth_active) {
 -            dst = tcg_temp_new_i64();
 -            modifier = cpu_reg_sp(s, op4);
 -            if (op3 == 2) {
 -                gen_helper_autia(dst, cpu_env, cpu_reg(s, rn), modifier);
 -            } else {
 -                gen_helper_autib(dst, cpu_env, cpu_reg(s, rn), modifier);
 -            }
 -        } else {
 -            dst = cpu_reg(s, rn);
 -        }
 -        /* BLRAA also needs to load return address */
 -        if (opc == 9) {
 -            TCGv_i64 lr = cpu_reg(s, 30);
 -            if (dst == lr) {
 -                TCGv_i64 tmp = tcg_temp_new_i64();
 -                tcg_gen_mov_i64(tmp, dst);
 -                dst = tmp;
 -            }
 -            gen_pc_plus_diff(s, lr, curr_insn_len(s));
 -        }
 -        gen_a64_set_pc(s, dst);
 -        break;
 -
      case 4: /* ERET */
          if (s->current_el == 0) {
              goto do_unallocated;
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
          unallocated_encoding(s);
          return;
      }
 -
 -    switch (btype_mod) {
 -    case 0: /* BR */
 -        if (dc_isar_feature(aa64_bti, s)) {
 -            /* BR to {x16,x17} or !guard -> 1, else 3.  */
 -            set_btype(s, rn == 16 || rn == 17 || !s->guarded_page ? 1 : 3);
 -        }
 -        break;
 -
 -    case 1: /* BLR */
 -        if (dc_isar_feature(aa64_bti, s)) {
 -            /* BLR sets BTYPE to 2, regardless of source guarded page.  */
 -            set_btype(s, 2);
 -        }
 -        break;
 -
 -    default: /* RET or none of the above.  */
 -        /* BTYPE will be set to 0 by normal end-of-insn processing.  */
 -        break;
 -    }
 -
 -    s->base.is_jmp = DISAS_JUMP;
  }
--static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
+ /* Branches, exception generating and system instructions */
 -                                   bool isread)
 -{
 -    if (arm_is_secure_below_el3(env)) {
 -        /* Access ok in secure mode.  */
 -        return CP_ACCESS_OK;
 -    }
 -    return access_lor_ns(env);
 -}
 -
  static CPAccessResult access_lor_other(CPUARMState *env,
                                         const ARMCPRegInfo *ri, bool isread)
  {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
          /* Access denied in secure mode.  */
          return CP_ACCESS_TRAP;
      }
 -    return access_lor_ns(env);
 +    return access_lor_ns(env, ri, isread);
  }
  /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
        .type = ARM_CP_CONST, .resetvalue = 0 },
      { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
 -      .access = PL1_R, .accessfn = access_lorid,
 +      .access = PL1_R, .accessfn = access_lor_ns,
        .type = ARM_CP_CONST, .resetvalue = 0 },
      REGINFO_SENTINEL
  };
 --
-.20.1
+.34.1

-[PULL 03/26] target/arm: Use neon_element_offset in neon_load/store_reg
+[PULL 26/29] target/arm: Convert ERET, ERETAA, ERETAB to decodetree
-From: Richard Henderson <richard.henderson@linaro.org>
+Convert the exception-return insns ERET, ERETA and ERETB to
 decodetree. These were the last insns left in the legacy
 decoder function disas_uncond_reg_b(), which allows us to
 remove it.
-These are the only users of neon_reg_offset, so remove that.
+The old decoder explicitly decoded the DRPS instruction,
 only in order to call unallocated_encoding() on it, exactly
 as would have happened if it hadn't decoded it. This is
 because this insn always UNDEFs unless the CPU is in
 halting-debug state, which we don't emulate. So we list
 the pattern in a comment in a64.decode, but don't actively
 decode it.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230512144106.3608981-21-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 14 ++------------
+ target/arm/tcg/a64.decode      |   8 ++
-file changed, 2 insertions(+), 12 deletions(-)
+ target/arm/tcg/translate-a64.c | 163 +++++++++++----------------------
 files changed, 63 insertions(+), 108 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/a64.decode
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/a64.decode
-@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@ RETA            1101011 0010 11111 00001 m:1 11111 11111 &reta  # RETAA, RETAB
  &bra        rn rm m
  BRA             1101011 1000 11111 00001 m:1 rn:5 rm:5 &bra # BRAA, BRAB
  BLRA            1101011 1001 11111 00001 m:1 rn:5 rm:5 &bra # BLRAA, BLRAB
 +
 +ERET            1101011 0100 11111 000000 11111 00000
 +ERETA           1101011 0100 11111 00001 m:1 11111 11111 &reta  # ERETAA, ERETAB
 +
 +# We don't need to decode DRPS because it always UNDEFs except when
 +# the processor is in halting debug state (which we don't implement).
 +# The pattern is listed here as documentation.
 +# DRPS            1101011 0101 11111 000000 11111 00000
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BLRA(DisasContext *s, arg_bra *a)
      return true;
  }
 +static bool trans_ERET(DisasContext *s, arg_ERET *a)
 +{
 +    TCGv_i64 dst;
 +
 +    if (s->current_el == 0) {
 +        return false;
 +    }
 +    if (s->fgt_eret) {
 +        gen_exception_insn_el(s, 0, EXCP_UDEF, 0, 2);
 +        return true;
 +    }
 +    dst = tcg_temp_new_i64();
 +    tcg_gen_ld_i64(dst, cpu_env,
 +                   offsetof(CPUARMState, elr_el[s->current_el]));
 +
 +    if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 +        gen_io_start();
 +    }
 +
 +    gen_helper_exception_return(cpu_env, dst);
 +    /* Must exit loop to check un-masked IRQs */
 +    s->base.is_jmp = DISAS_EXIT;
 +    return true;
 +}
 +
 +static bool trans_ERETA(DisasContext *s, arg_reta *a)
 +{
 +    TCGv_i64 dst;
 +
 +    if (!dc_isar_feature(aa64_pauth, s)) {
 +        return false;
 +    }
 +    if (s->current_el == 0) {
 +        return false;
 +    }
 +    /* The FGT trap takes precedence over an auth trap. */
 +    if (s->fgt_eret) {
 +        gen_exception_insn_el(s, 0, EXCP_UDEF, a->m ? 3 : 2, 2);
 +        return true;
 +    }
 +    dst = tcg_temp_new_i64();
 +    tcg_gen_ld_i64(dst, cpu_env,
 +                   offsetof(CPUARMState, elr_el[s->current_el]));
 +
 +    dst = auth_branch_target(s, dst, cpu_X[31], !a->m);
 +    if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 +        gen_io_start();
 +    }
 +
 +    gen_helper_exception_return(cpu_env, dst);
 +    /* Must exit loop to check un-masked IRQs */
 +    s->base.is_jmp = DISAS_EXIT;
 +    return true;
 +}
 +
  /* HINT instruction group, including various allocated HINTs */
  static void handle_hint(DisasContext *s, uint32_t insn,
                          unsigned int op1, unsigned int op2, unsigned int crm)
@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
      }
  }
--/* Return the offset of a 32-bit piece of a NEON register.
+-/* Unconditional branch (register)
--   zero is the least significant end of the register.  */
+- *  31           25 24   21 20   16 15   10 9    5 4     0
--static inline long
+- * +---------------+-------+-------+-------+------+-------+
--neon_reg_offset (int reg, int n)
+- * | 1 1 0 1 0 1 1 |  opc  |  op2  |  op3  |  Rn  |  op4  |
 - * +---------------+-------+-------+-------+------+-------+
 - */
 -static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 -{
--    int sreg;
+-    unsigned int opc, op2, op3, rn, op4;
--    sreg = reg * 2 + n;
+-    TCGv_i64 dst;
--    return vfp_reg_offset(0, sreg);
+-    TCGv_i64 modifier;
 -
 -    opc = extract32(insn, 21, 4);
 -    op2 = extract32(insn, 16, 5);
 -    op3 = extract32(insn, 10, 6);
 -    rn = extract32(insn, 5, 5);
 -    op4 = extract32(insn, 0, 5);
 -
 -    if (op2 != 0x1f) {
 -        goto do_unallocated;
 -    }
 -
 -    switch (opc) {
 -    case 0:
 -    case 1:
 -    case 2:
 -    case 8:
 -    case 9:
 -        /*
 -         * BR, BLR, RET, RETAA, RETAB, BRAAZ, BRABZ, BLRAAZ, BLRABZ,
 -         * BRAA, BLRAA: handled in decodetree
 -         */
 -        goto do_unallocated;
 -
 -    case 4: /* ERET */
 -        if (s->current_el == 0) {
 -            goto do_unallocated;
 -        }
 -        switch (op3) {
 -        case 0: /* ERET */
 -            if (op4 != 0) {
 -                goto do_unallocated;
 -            }
 -            if (s->fgt_eret) {
 -                gen_exception_insn_el(s, 0, EXCP_UDEF, syn_erettrap(op3), 2);
 -                return;
 -            }
 -            dst = tcg_temp_new_i64();
 -            tcg_gen_ld_i64(dst, cpu_env,
 -                           offsetof(CPUARMState, elr_el[s->current_el]));
 -            break;
 -
 -        case 2: /* ERETAA */
 -        case 3: /* ERETAB */
 -            if (!dc_isar_feature(aa64_pauth, s)) {
 -                goto do_unallocated;
 -            }
 -            if (rn != 0x1f || op4 != 0x1f) {
 -                goto do_unallocated;
 -            }
 -            /* The FGT trap takes precedence over an auth trap. */
 -            if (s->fgt_eret) {
 -                gen_exception_insn_el(s, 0, EXCP_UDEF, syn_erettrap(op3), 2);
 -                return;
 -            }
 -            dst = tcg_temp_new_i64();
 -            tcg_gen_ld_i64(dst, cpu_env,
 -                           offsetof(CPUARMState, elr_el[s->current_el]));
 -            if (s->pauth_active) {
 -                modifier = cpu_X[31];
 -                if (op3 == 2) {
 -                    gen_helper_autia(dst, cpu_env, dst, modifier);
 -                } else {
 -                    gen_helper_autib(dst, cpu_env, dst, modifier);
 -                }
 -            }
 -            break;
 -
 -        default:
 -            goto do_unallocated;
 -        }
 -        if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 -            gen_io_start();
 -        }
 -
 -        gen_helper_exception_return(cpu_env, dst);
 -        /* Must exit loop to check un-masked IRQs */
 -        s->base.is_jmp = DISAS_EXIT;
 -        return;
 -
 -    case 5: /* DRPS */
 -        if (op3 != 0 || op4 != 0 || rn != 0x1f) {
 -            goto do_unallocated;
 -        } else {
 -            unallocated_encoding(s);
 -        }
 -        return;
 -
 -    default:
 -    do_unallocated:
 -        unallocated_encoding(s);
 -        return;
 -    }
 -}
 -
- static TCGv_i32 neon_load_reg(int reg, int pass)
+ /* Branches, exception generating and system instructions */
  static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
  {
-     TCGv_i32 tmp = tcg_temp_new_i32();
+@@ -XXX,XX +XXX,XX @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
--    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
+             disas_exc(s, insn);
-+    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
+         }
-     return tmp;
+         break;
- }
+-    case 0x6b: /* Unconditional branch (register) */
+-        disas_uncond_b_reg(s, insn);
- static void neon_store_reg(int reg, int pass, TCGv_i32 var)
+-        break;
- {
+     default:
--    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
+         unallocated_encoding(s);
-+    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
+         break;
      tcg_temp_free_i32(var);
  }
 --
-.20.1
+.34.1

-[PULL 21/26] target/arm: Get correct MMU index for other-security-state
+[PULL 27/29] target/arm: Saturate L2CTLR_EL1 core count field rather than overflowing
-In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
+The IMPDEF sysreg L2CTLR_EL1 found on the Cortex-A35, A53, A57, A72
-armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
+and which we (arguably dubiously) also provide in '-cpu max' has a
-This is incorrect when the security state being queried is not the
+bit field for the number of processors in the cluster. On real
-current one, because arm_current_el() uses the current security state
+hardware this must be sufficient because it can only be configured
-to determine which of the banked CONTROL.nPRIV bits to look at.
+with up to 4 CPUs in the cluster. However on QEMU if the board code
-The effect was that if (for instance) Secure state was in privileged
+does not explicitly configure the code into clusters with the right
-mode but Non-Secure was not then we would return the wrong MMU index.
+CPU count we default to "give the value assuming that all CPUs in
 the system are in a single cluster", which might be too big to fit
 in the field.
-The only places where we are using this function in a way that could
+Instead of just overflowing this 2-bit field, saturate to 3 (meaning
-trigger this bug are for the stack loads during a v8M function-return
+"4 CPUs", so at least we don't overwrite other fields in the register.
-and for the instruction fetch of a v8M SG insn.
+It's unlikely that any guest code really cares about the value in
 this field; at least, if it does it probably also wants the system
 to be more closely matching real hardware, i.e. not to have more
 than 4 CPUs.
-Fix the bug by expanding out the M-profile version of the
+This issue has been present since the L2CTLR was first added in
-arm_current_el() logic inline so it can use the passed in secstate
+commit 377a44ec8f2fac5b back in 2014. It was only noticed because
-rather than env->v7m.secure.
+Coverity complains (CID 1509227) that the shift might overflow 32 bits
 and inadvertently sign extend into the top half of the 64 bit value.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
+Message-id: 20230512170223.3801643-2-peter.maydell@linaro.org
 ---
- target/arm/m_helper.c | 3 ++-
+ target/arm/cortex-regs.c | 11 +++++++++--
-file changed, 2 insertions(+), 1 deletion(-)
+file changed, 9 insertions(+), 2 deletions(-)
-diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
+diff --git a/target/arm/cortex-regs.c b/target/arm/cortex-regs.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/m_helper.c
+--- a/target/arm/cortex-regs.c
-+++ b/target/arm/m_helper.c
++++ b/target/arm/cortex-regs.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ static uint64_t l2ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
  /* Return the MMU index for a v7M CPU in the specified security state */
  ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
  {
--    bool priv = arm_current_el(env) != 0;
+     ARMCPU *cpu = env_archcpu(env);
-+    bool priv = arm_v7m_is_handler_mode(env) ||
-+        !(env->v7m.control[secstate] & 1);
+-    /* Number of cores is in [25:24]; otherwise we RAZ */
+-    return (cpu->core_count - 1) << 24;
-     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
++    /*
 +     * Number of cores is in [25:24]; otherwise we RAZ.
 +     * If the board didn't configure the CPUs into clusters,
 +     * we default to "all CPUs in one cluster", which might be
 +     * more than the 4 that the hardware permits and which is
 +     * all you can report in this two-bit field. Saturate to
 +     * 0b11 (== 4 CPUs) rather than overflowing the field.
 +     */
 +    return MIN(cpu->core_count - 1, 3) << 24;
  }
+ static const ARMCPRegInfo cortex_a72_a57_a53_cp_reginfo[] = {
 --
-.20.1
+.34.1

-[PULL 13/26] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
+[PULL 28/29] hw/arm/vexpress: Avoid trivial memory leak of 'flashalias'
-The helper functions for performing the udot/sdot operations against
+In the vexpress board code, we allocate a new MemoryRegion at the top
-a scalar were not using an address-swizzling macro when converting
+of vexpress_common_init() but only set it up and use it inside the
-the index of the scalar element into a pointer into the vm array.
+"if (map[VE_NORFLASHALIAS] != -1)" conditional, so we leak it if not.
-This had no effect on little-endian hosts but meant we generated
+This isn't a very interesting leak as it's a tiny amount of memory
-incorrect results on big-endian hosts.
+once at startup, but it's easy to fix.
-For these insns, the index is indexing over group of 4 8-bit values,
+We could silence Coverity simply by moving the g_new() into the
-so 32 bits per indexed entity, and H4() is therefore what we want.
+if() block, but this use of g_new(MemoryRegion, 1) is a legacy from
-(For Neon the only possible input indexes are 0 and 1.)
+when this board model was originally written; we wouldn't do that
 if we wrote it today. The MemoryRegions are conceptually a part of
 the board and must not go away until the whole board is done with
 (at the end of the simulation), so they belong in its state struct.
 This machine already has a VexpressMachineState struct that extends
 MachineState, so statically put the MemoryRegions in there instead of
 dynamically allocating them separately at runtime.
 Spotted by Coverity (CID 1509083).
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
+Message-id: 20230512170223.3801643-3-peter.maydell@linaro.org
 ---
- target/arm/vec_helper.c | 4 ++--
+ hw/arm/vexpress.c | 40 ++++++++++++++++++++--------------------
-file changed, 2 insertions(+), 2 deletions(-)
+file changed, 20 insertions(+), 20 deletions(-)
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+--- a/hw/arm/vexpress.c
-+++ b/target/arm/vec_helper.c
++++ b/hw/arm/vexpress.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+@@ -XXX,XX +XXX,XX @@ struct VexpressMachineClass {
-     intptr_t index = simd_data(desc);
-     uint32_t *d = vd;
+ struct VexpressMachineState {
-     int8_t *n = vn;
+     MachineState parent;
--    int8_t *m_indexed = (int8_t *)vm + index * 4;
++    MemoryRegion vram;
-+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
++    MemoryRegion sram;
++    MemoryRegion flashalias;
-     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
++    MemoryRegion lowram;
-      * Otherwise opr_sz is a multiple of 16.
++    MemoryRegion a15sram;
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+     bool secure;
-     intptr_t index = simd_data(desc);
+     bool virt;
-     uint32_t *d = vd;
+ };
-     uint8_t *n = vn;
+@@ -XXX,XX +XXX,XX @@ struct VexpressMachineState {
--    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+ #define TYPE_VEXPRESS_A15_MACHINE   MACHINE_TYPE_NAME("vexpress-a15")
-+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
+ OBJECT_DECLARE_TYPE(VexpressMachineState, VexpressMachineClass, VEXPRESS_MACHINE)
-     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
+-typedef void DBoardInitFn(const VexpressMachineState *machine,
-      * Otherwise opr_sz is a multiple of 16.
++typedef void DBoardInitFn(VexpressMachineState *machine,
                            ram_addr_t ram_size,
                            const char *cpu_type,
                            qemu_irq *pic);
@@ -XXX,XX +XXX,XX @@ static void init_cpus(MachineState *ms, const char *cpu_type,
      }
  }
 -static void a9_daughterboard_init(const VexpressMachineState *vms,
 +static void a9_daughterboard_init(VexpressMachineState *vms,
                                    ram_addr_t ram_size,
                                    const char *cpu_type,
                                    qemu_irq *pic)
  {
      MachineState *machine = MACHINE(vms);
      MemoryRegion *sysmem = get_system_memory();
 -    MemoryRegion *lowram = g_new(MemoryRegion, 1);
      ram_addr_t low_ram_size;
      if (ram_size > 0x40000000) {
@@ -XXX,XX +XXX,XX @@ static void a9_daughterboard_init(const VexpressMachineState *vms,
       * address space should in theory be remappable to various
       * things including ROM or RAM; we always map the RAM there.
       */
 -    memory_region_init_alias(lowram, NULL, "vexpress.lowmem", machine->ram,
 -                             0, low_ram_size);
 -    memory_region_add_subregion(sysmem, 0x0, lowram);
 +    memory_region_init_alias(&vms->lowram, NULL, "vexpress.lowmem",
 +                             machine->ram, 0, low_ram_size);
 +    memory_region_add_subregion(sysmem, 0x0, &vms->lowram);
      memory_region_add_subregion(sysmem, 0x60000000, machine->ram);
      /* 0x1e000000 A9MPCore (SCU) private memory region */
@@ -XXX,XX +XXX,XX @@ static VEDBoardInfo a9_daughterboard = {
      .init = a9_daughterboard_init,
  };
 -static void a15_daughterboard_init(const VexpressMachineState *vms,
 +static void a15_daughterboard_init(VexpressMachineState *vms,
                                     ram_addr_t ram_size,
                                     const char *cpu_type,
                                     qemu_irq *pic)
  {
      MachineState *machine = MACHINE(vms);
      MemoryRegion *sysmem = get_system_memory();
 -    MemoryRegion *sram = g_new(MemoryRegion, 1);
      {
          /* We have to use a separate 64 bit variable here to avoid the gcc
@@ -XXX,XX +XXX,XX @@ static void a15_daughterboard_init(const VexpressMachineState *vms,
      /* 0x2b060000: SP805 watchdog: not modelled */
      /* 0x2b0a0000: PL341 dynamic memory controller: not modelled */
      /* 0x2e000000: system SRAM */
 -    memory_region_init_ram(sram, NULL, "vexpress.a15sram", 0x10000,
 +    memory_region_init_ram(&vms->a15sram, NULL, "vexpress.a15sram", 0x10000,
                             &error_fatal);
 -    memory_region_add_subregion(sysmem, 0x2e000000, sram);
 +    memory_region_add_subregion(sysmem, 0x2e000000, &vms->a15sram);
      /* 0x7ffb0000: DMA330 DMA controller: not modelled */
      /* 0x7ffd0000: PL354 static memory controller: not modelled */
@@ -XXX,XX +XXX,XX @@ static void vexpress_common_init(MachineState *machine)
      I2CBus *i2c;
      ram_addr_t vram_size, sram_size;
      MemoryRegion *sysmem = get_system_memory();
 -    MemoryRegion *vram = g_new(MemoryRegion, 1);
 -    MemoryRegion *sram = g_new(MemoryRegion, 1);
 -    MemoryRegion *flashalias = g_new(MemoryRegion, 1);
 -    MemoryRegion *flash0mem;
      const hwaddr *map = daughterboard->motherboard_map;
      int i;
@@ -XXX,XX +XXX,XX @@ static void vexpress_common_init(MachineState *machine)
      if (map[VE_NORFLASHALIAS] != -1) {
          /* Map flash 0 as an alias into low memory */
 +        MemoryRegion *flash0mem;
          flash0mem = sysbus_mmio_get_region(SYS_BUS_DEVICE(pflash0), 0);
 -        memory_region_init_alias(flashalias, NULL, "vexpress.flashalias",
 +        memory_region_init_alias(&vms->flashalias, NULL, "vexpress.flashalias",
                                   flash0mem, 0, VEXPRESS_FLASH_SIZE);
 -        memory_region_add_subregion(sysmem, map[VE_NORFLASHALIAS], flashalias);
 +        memory_region_add_subregion(sysmem, map[VE_NORFLASHALIAS], &vms->flashalias);
      }
      dinfo = drive_get(IF_PFLASH, 0, 1);
      ve_pflash_cfi01_register(map[VE_NORFLASH1], "vexpress.flash1", dinfo);
      sram_size = 0x2000000;
 -    memory_region_init_ram(sram, NULL, "vexpress.sram", sram_size,
 +    memory_region_init_ram(&vms->sram, NULL, "vexpress.sram", sram_size,
                             &error_fatal);
 -    memory_region_add_subregion(sysmem, map[VE_SRAM], sram);
 +    memory_region_add_subregion(sysmem, map[VE_SRAM], &vms->sram);
      vram_size = 0x800000;
 -    memory_region_init_ram(vram, NULL, "vexpress.vram", vram_size,
 +    memory_region_init_ram(&vms->vram, NULL, "vexpress.vram", vram_size,
                             &error_fatal);
 -    memory_region_add_subregion(sysmem, map[VE_VIDEORAM], vram);
 +    memory_region_add_subregion(sysmem, map[VE_VIDEORAM], &vms->vram);
      /* 0x4e000000 LAN9118 Ethernet */
      if (nd_table[0].used) {
 --
-.20.1
+.34.1

-[PULL 22/26] configure: Test that gio libs from pkg-config work
+[PULL 29/29] docs: Convert u2f.txt to rST
-On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
+Convert the u2f.txt file to rST, and place it in the right place
-libraries for gio-2.0 which don't actually work when compiling
+in our manual layout. The old text didn't fit very well into our
-statically. (Specifically, the returned library string includes
+manual style, so the new version ends up looking like a rewrite,
--lmount, but not -lblkid which -lmount depends upon, so linking
+although some of the original text is preserved:
-fails due to missing symbols.)
+ * the 'building' section of the old file is removed, since we
-Check that the libraries work, and don't enable gio if they don't,
+   generally assume that users have already built QEMU
-in the same way we do for gnutls.
+ * some rather verbose text has been cut back
  * document the passthrough device first, on the assumption
    that's most likely to be of interest to users
  * cut back on the duplication of text between sections
  * format example command lines etc with rST
 As it's a short document it seemed simplest to do this all
 in one go rather than try to do a minimal syntactic conversion
 and then clean up the wording and layout.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
+Reviewed-by: Thomas Huth <thuth@redhat.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20230421163734.1152076-1-peter.maydell@linaro.org
 Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
 ---
- configure | 10 +++++++++-
+ docs/system/device-emulation.rst |   1 +
-file changed, 9 insertions(+), 1 deletion(-)
+ docs/system/devices/usb-u2f.rst  |  93 ++++++++++++++++++++++++++
+ docs/system/devices/usb.rst      |   2 +-
-diff --git a/configure b/configure
+ docs/u2f.txt                     | 110 -------------------------------
-index XXXXXXX..XXXXXXX 100755
+files changed, 95 insertions(+), 111 deletions(-)
---- a/configure
+ create mode 100644 docs/system/devices/usb-u2f.rst
-+++ b/configure
+ delete mode 100644 docs/u2f.txt
-@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
- fi
+diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
+index XXXXXXX..XXXXXXX 100644
- if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
+--- a/docs/system/device-emulation.rst
--    gio=yes
++++ b/docs/system/device-emulation.rst
-     gio_cflags=$($pkg_config --cflags gio-2.0)
+@@ -XXX,XX +XXX,XX @@ Emulated Devices
-     gio_libs=$($pkg_config --libs gio-2.0)
+    devices/virtio-pmem.rst
-     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
+    devices/vhost-user-rng.rst
-     if [ ! -x "$gdbus_codegen" ]; then
+    devices/canokey.rst
-         gdbus_codegen=
++   devices/usb-u2f.rst
-     fi
+    devices/igb.rst
-+    # Check that the libraries actually work -- Ubuntu 18.04 ships
+diff --git a/docs/system/devices/usb-u2f.rst b/docs/system/devices/usb-u2f.rst
-+    # with pkg-config --static --libs data for gio-2.0 that is missing
+new file mode 100644
-+    # -lblkid and will give a link error.
+index XXXXXXX..XXXXXXX
-+    write_c_skeleton
+--- /dev/null
-+    if compile_prog "" "gio_libs" ; then
++++ b/docs/system/devices/usb-u2f.rst
-+        gio=yes
+@@ -XXX,XX +XXX,XX @@
-+    else
++Universal Second Factor (U2F) USB Key Device
-+        gio=no
++============================================
-+    fi
++
- else
++U2F is an open authentication standard that enables relying parties
-     gio=no
++exposed to the internet to offer a strong second factor option for end
- fi
++user authentication.
 +
 +The second factor is provided by a device implementing the U2F
 +protocol. In case of a USB U2F security key, it is a USB HID device
 +that implements the U2F protocol.
 +
 +QEMU supports both pass-through of a host U2F key device to a VM,
 +and software emulation of a U2F key.
 +
 +``u2f-passthru``
 +----------------
 +
 +The ``u2f-passthru`` device allows you to connect a real hardware
 +U2F key on your host to a guest VM. All requests made from the guest
 +are passed through to the physical security key connected to the
 +host machine and vice versa.
 +
 +In addition, the dedicated pass-through allows you to share a single
 +U2F security key with several guest VMs, which is not possible with a
 +simple host device assignment pass-through.
 +
 +You can specify the host U2F key to use with the ``hidraw``
 +option, which takes the host path to a Linux ``/dev/hidrawN`` device:
 +
 +.. parsed-literal::
 +   |qemu_system| -usb -device u2f-passthru,hidraw=/dev/hidraw0
 +
 +If you don't specify the device, the ``u2f-passthru`` device will
 +autoscan to take the first U2F device it finds on the host (this
 +requires a working libudev):
 +
 +.. parsed-literal::
 +   |qemu_system| -usb -device u2f-passthru
 +
 +``u2f-emulated``
 +----------------
 +
 +``u2f-emulated`` is a completely software emulated U2F device.
 +It uses `libu2f-emu <https://github.com/MattGorko/libu2f-emu>`__
 +for the U2F key emulation. libu2f-emu
 +provides a complete implementation of the U2F protocol device part for
 +all specified transports given by the FIDO Alliance.
 +
 +To work, an emulated U2F device must have four elements:
 +
 + * ec x509 certificate
 + * ec private key
 + * counter (four bytes value)
 + * 48 bytes of entropy (random bits)
 +
 +To use this type of device, these have to be configured, and these
 +four elements must be passed one way or another.
 +
 +Assuming that you have a working libu2f-emu installed on the host,
 +there are three possible ways to configure the ``u2f-emulated`` device:
 +
 + * ephemeral
 + * setup directory
 + * manual
 +
 +Ephemeral is the simplest way to configure; it lets the device generate
 +all the elements it needs for a single use of the lifetime of the device.
 +It is the default if you do not pass any other options to the device.
 +
 +.. parsed-literal::
 +   |qemu_system| -usb -device u2f-emulated
 +
 +You can pass the device the path of a setup directory on the host
 +using the ``dir`` option; the directory must contain these four files:
 +
 + * ``certificate.pem``: ec x509 certificate
 + * ``private-key.pem``: ec private key
 + * ``counter``: counter value
 + * ``entropy``: 48 bytes of entropy
 +
 +.. parsed-literal::
 +   |qemu_system| -usb -device u2f-emulated,dir=$dir
 +
 +You can also manually pass the device the paths to each of these files,
 +if you don't want them all to be in the same directory, using the options
 +
 + * ``cert``
 + * ``priv``
 + * ``counter``
 + * ``entropy``
 +
 +.. parsed-literal::
 +   |qemu_system| -usb -device u2f-emulated,cert=$DIR1/$FILE1,priv=$DIR2/$FILE2,counter=$DIR3/$FILE3,entropy=$DIR4/$FILE4
 diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/devices/usb.rst
 +++ b/docs/system/devices/usb.rst
@@ -XXX,XX +XXX,XX @@ option or the ``device_add`` monitor command. Available devices are:
     USB audio device
  ``u2f-{emulated,passthru}``
 -   Universal Second Factor device
 +   :doc:`usb-u2f`
  ``canokey``
     An Open-source Secure Key implementing FIDO2, OpenPGP, PIV and more.
 diff --git a/docs/u2f.txt b/docs/u2f.txt
 deleted file mode 100644
 index XXXXXXX..XXXXXXX
 --- a/docs/u2f.txt
 +++ /dev/null
@@ -XXX,XX +XXX,XX @@
 -QEMU U2F Key Device Documentation.
 -
 -Contents
 -1. USB U2F key device
 -2. Building
 -3. Using u2f-emulated
 -4. Using u2f-passthru
 -5. Libu2f-emu
 -
 -1. USB U2F key device
 -
 -U2F is an open authentication standard that enables relying parties
 -exposed to the internet to offer a strong second factor option for end
 -user authentication.
 -
 -The standard brings many advantages to both parties, client and server,
 -allowing to reduce over-reliance on passwords, it increases authentication
 -security and simplifies passwords.
 -
 -The second factor is materialized by a device implementing the U2F
 -protocol. In case of a USB U2F security key, it is a USB HID device
 -that implements the U2F protocol.
 -
 -In QEMU, the USB U2F key device offers a dedicated support of U2F, allowing
 -guest USB FIDO/U2F security keys operating in two possible modes:
 -pass-through and emulated.
 -
 -The pass-through mode consists of passing all requests made from the guest
 -to the physical security key connected to the host machine and vice versa.
 -In addition, the dedicated pass-through allows to have a U2F security key
 -shared on several guests which is not possible with a simple host device
 -assignment pass-through.
 -
 -The emulated mode consists of completely emulating the behavior of an
 -U2F device through software part. Libu2f-emu is used for that.
 -
 -
 -2. Building
 -
 -To ensure the build of the u2f-emulated device variant which depends
 -on libu2f-emu: configuring and building:
 -
 -    ./configure --enable-u2f && make
 -
 -The pass-through mode is built by default on Linux. To take advantage
 -of the autoscan option it provides, make sure you have a working libudev
 -installed on the host.
 -
 -
 -3. Using u2f-emulated
 -
 -To work, an emulated U2F device must have four elements:
 - * ec x509 certificate
 - * ec private key
 - * counter (four bytes value)
 - * 48 bytes of entropy (random bits)
 -
 -To use this type of device, this one has to be configured, and these
 -four elements must be passed one way or another.
 -
 -Assuming that you have a working libu2f-emu installed on the host.
 -There are three possible ways of configurations:
 - * ephemeral
 - * setup directory
 - * manual
 -
 -Ephemeral is the simplest way to configure, it lets the device generate
 -all the elements it needs for a single use of the lifetime of the device.
 -
 -    qemu -usb -device u2f-emulated
 -
 -Setup directory allows to configure the device from a directory containing
 -four files:
 - * certificate.pem: ec x509 certificate
 - * private-key.pem: ec private key
 - * counter: counter value
 - * entropy: 48 bytes of entropy
 -
 -    qemu -usb -device u2f-emulated,dir=$dir
 -
 -Manual allows to configure the device more finely by specifying each
 -of the elements necessary for the device:
 - * cert
 - * priv
 - * counter
 - * entropy
 -
 -    qemu -usb -device u2f-emulated,cert=$DIR1/$FILE1,priv=$DIR2/$FILE2,counter=$DIR3/$FILE3,entropy=$DIR4/$FILE4
 -
 -
 -4. Using u2f-passthru
 -
 -On the host specify the u2f-passthru device with a suitable hidraw:
 -
 -    qemu -usb -device u2f-passthru,hidraw=/dev/hidraw0
 -
 -Alternately, the u2f-passthru device can autoscan to take the first
 -U2F device it finds on the host (this requires a working libudev):
 -
 -    qemu -usb -device u2f-passthru
 -
 -
 -5. Libu2f-emu
 -
 -The u2f-emulated device uses libu2f-emu for the U2F key emulation. Libu2f-emu
 -implements completely the U2F protocol device part for all specified
 -transport given by the FIDO Alliance.
 -
 -For more information about libu2f-emu see this page:
 -https://github.com/MattGorko/libu2f-emu.
 --
-.20.1
+.34.1

Small pile of bug fixes for rc1. I've included my patches to get
our docs building with Sphinx 3, just for convenience...

-- PMM

The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:

Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102

for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:

tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/arm: Fix Neon emulation bugs on big-endian hosts
 * target/arm: fix handling of HCR.FB
 * target/arm: fix LORID_EL1 access check
 * disas/capstone: Fix monitor disassembly of >32 bytes
 * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
 * hw/arm/boot: fix SVE for EL3 direct kernel boot
 * hw/display/omap_lcdc: Fix potential NULL pointer dereference
 * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
 * target/arm: Get correct MMU index for other-security-state
 * configure: Test that gio libs from pkg-config work
 * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
 * docs: Fix building with Sphinx 3
 * tests/qtest/npcm7xx_rng-test: Disable randomness tests

----------------------------------------------------------------
AlexChen (2):
      hw/display/omap_lcdc: Fix potential NULL pointer dereference
      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference

Peter Maydell (9):
      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
      disas/capstone: Fix monitor disassembly of >32 bytes
      target/arm: Get correct MMU index for other-security-state
      configure: Test that gio libs from pkg-config work
      hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
      scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
      qemu-option-trace.rst.inc: Don't use option:: markup
      tests/qtest/npcm7xx_rng-test: Disable randomness tests

Philippe Mathieu-Daudé (1):
      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)

Richard Henderson (11):
      target/arm: Introduce neon_full_reg_offset
      target/arm: Move neon_element_offset to translate.c
      target/arm: Use neon_element_offset in neon_load/store_reg
      target/arm: Use neon_element_offset in vfp_reg_offset
      target/arm: Add read/write_neon_element32
      target/arm: Expand read/write_neon_element32 to all MemOp
      target/arm: Rename neon_load_reg32 to vfp_load_reg32
      target/arm: Add read/write_neon_element64
      target/arm: Rename neon_load_reg64 to vfp_load_reg64
      target/arm: Simplify do_long_3d and do_2scalar_long
      target/arm: Improve do_prewiden_3d

Rémi Denis-Courmont (3):
      target/arm: fix handling of HCR.FB
      target/arm: fix LORID_EL1 access check
      hw/arm/boot: fix SVE for EL3 direct kernel boot

From: Richard Henderson <richard.henderson@linaro.org>

This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0.  This fixes a bug
when running on a big-endian host.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  8 ++++++
 target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
 target/arm/translate-vfp.c.inc  |  2 +-
 3 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
     unallocated_encoding(s);
 }
 
+/*
+ * Return the offset of a "full" NEON Dreg.
+ */
+static long neon_full_reg_offset(unsigned reg)
+{
+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
         ofs ^= 8 - element_size;
     }
 #endif
-    return neon_reg_offset(reg, 0) + ofs;
+    return neon_full_reg_offset(reg) + ofs;
 }
 
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
              * We cannot write 16 bytes at once because the
              * destination is unaligned.
              */
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  8, 8, tmp);
-            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
-                             neon_reg_offset(vd, 0), 8, 8);
+            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
+                             neon_full_reg_offset(vd), 8, 8);
         } else {
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  vec_size, vec_size, tmp);
         }
         tcg_gen_addi_i32(addr, addr, 1 << size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
 {
     /* Handle a 2-reg-shift insn which can be vectorized. */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
 {
     /* FP operations in 2-reg-and-shift group */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
     TCGv_ptr fpst;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
         return true;
     }
 
-    reg_ofs = neon_reg_offset(a->vd, 0);
+    reg_ofs = neon_full_reg_offset(a->vd);
     vec_size = a->q ? 16 : 8;
     imm = asimd_imm_const(a->imm, a->cmode, a->op);
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
         return true;
     }
 
-    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
-                       neon_reg_offset(a->vn, 0),
-                       neon_reg_offset(a->vm, 0),
+    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
+                       neon_full_reg_offset(a->vn),
+                       neon_full_reg_offset(a->vm),
                        16, 16, 0, fn_gvec);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
 {
     /* Two registers and a scalar, using gvec */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
     int rm_ofs;
     int idx;
     TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
     /* a->vm is M:Vm, which encodes both register and index */
     idx = extract32(a->vm, a->size + 2, 2);
     a->vm = extract32(a->vm, 0, a->size + 2);
-    rm_ofs = neon_reg_offset(a->vm, 0);
+    rm_ofs = neon_full_reg_offset(a->vm);
 
     fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
     tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
         return true;
     }
 
-    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
+    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                          neon_element_offset(a->vm, a->index, a->size),
                          a->q ? 16 : 8, a->q ? 16 : 8);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
 static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
     }
 
     tmp = load_reg(s, a->rt);
-    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
+    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                          vec_size, vec_size, tmp);
     tcg_temp_free_i32(tmp);
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This will shortly have users outside of translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 20 ++++++++++++++++++++
 target/arm/translate-neon.c.inc | 19 -------------------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 }
 
+/*
+ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+ * where 0 is the least significant end of the register.
+ */
+static long neon_element_offset(int reg, int element, MemOp size)
+{
+    int element_size = 1 << size;
+    int ofs = element * element_size;
+#ifdef HOST_WORDS_BIGENDIAN
+    /*
+     * Calculate the offset assuming fully little-endian,
+     * then XOR to account for the order of the 8-byte units.
+     */
+    if (element_size < 8) {
+        ofs ^= 8 - element_size;
+    }
+#endif
+    return neon_full_reg_offset(reg) + ofs;
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
 #include "decode-neon-ls.c.inc"
 #include "decode-neon-shared.c.inc"
 
-/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
- * where 0 is the least significant end of the register.
- */
-static inline long
-neon_element_offset(int reg, int element, MemOp size)
-{
-    int element_size = 1 << size;
-    int ofs = element * element_size;
-#ifdef HOST_WORDS_BIGENDIAN
-    /* Calculate the offset assuming fully little-endian,
-     * then XOR to account for the order of the 8-byte units.
-     */
-    if (element_size < 8) {
-        ofs ^= 8 - element_size;
-    }
-#endif
-    return neon_full_reg_offset(reg) + ofs;
-}
-
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
 {
     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

These are the only users of neon_reg_offset, so remove that.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-/* Return the offset of a 32-bit piece of a NEON register.
-   zero is the least significant end of the register.  */
-static inline long
-neon_reg_offset (int reg, int n)
-{
-    int sreg;
-    sreg = reg * 2 + n;
-    return vfp_reg_offset(0, sreg);
-}
-
 static TCGv_i32 neon_load_reg(int reg, int pass)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
     return tmp;
 }
 
 static void neon_store_reg(int reg, int pass, TCGv_i32 var)
 {
-    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
     tcg_temp_free_i32(var);
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This seems a bit more readable than using offsetof CPU_DoubleU.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
     return neon_full_reg_offset(reg) + ofs;
 }
 
-static inline long vfp_reg_offset(bool dp, unsigned reg)
+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+static long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
-        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+        return neon_element_offset(reg, 0, MO_64);
     } else {
-        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
-        if (reg & 1) {
-            ofs += offsetof(CPU_DoubleU, l.upper);
-        } else {
-            ofs += offsetof(CPU_DoubleU, l.lower);
-        }
-        return ofs;
+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
     }
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Model these off the aa64 read/write_vec_element functions.
Use it within translate-neon.c.inc.  The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  26 ++++
 target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
 2 files changed, 183 insertions(+), 99 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_ld_i32(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_st_i32(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
      * early. Since Q is 0 there are always just two passes, so instead
      * of a complicated loop over each pass we just unroll.
      */
-    tmp = neon_load_reg(a->vn, 0);
-    tmp2 = neon_load_reg(a->vn, 1);
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    tmp3 = tcg_temp_new_i32();
+
+    read_neon_element32(tmp, a->vn, 0, MO_32);
+    read_neon_element32(tmp2, a->vn, 1, MO_32);
     fn(tmp, tmp, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    tmp3 = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    read_neon_element32(tmp3, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     fn(tmp3, tmp3, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    neon_store_reg(a->vd, 0, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * 2-reg-and-shift operations, size < 3 case, where the
      * helper needs to be passed cpu_env.
      */
-    TCGv_i32 constimm;
+    TCGv_i32 constimm, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * by immediate using the variable shift operations.
      */
     constimm = tcg_const_i32(dup_const(a->size, a->shift));
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(constimm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i64(-a->shift);
     rm1 = tcg_temp_new_i64();
     rm2 = tcg_temp_new_i64();
+    rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
     neon_load_reg64(rm1, a->vm);
     neon_load_reg64(rm2, a->vm + 1);
 
     shiftfn(rm1, rm1, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm1);
-    neon_store_reg(a->vd, 0, rd);
+    write_neon_element32(rd, a->vd, 0, MO_32);
 
     shiftfn(rm2, rm2, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm2);
-    neon_store_reg(a->vd, 1, rd);
+    write_neon_element32(rd, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i64(rm1);
     tcg_temp_free_i64(rm2);
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i32(imm);
 
     /* Load all inputs first to avoid potential overwrite */
-    rm1 = neon_load_reg(a->vm, 0);
-    rm2 = neon_load_reg(a->vm, 1);
-    rm3 = neon_load_reg(a->vm + 1, 0);
-    rm4 = neon_load_reg(a->vm + 1, 1);
+    rm1 = tcg_temp_new_i32();
+    rm2 = tcg_temp_new_i32();
+    rm3 = tcg_temp_new_i32();
+    rm4 = tcg_temp_new_i32();
+    read_neon_element32(rm1, a->vm, 0, MO_32);
+    read_neon_element32(rm2, a->vm, 1, MO_32);
+    read_neon_element32(rm3, a->vm, 2, MO_32);
+    read_neon_element32(rm4, a->vm, 3, MO_32);
     rtmp = tcg_temp_new_i64();
 
     shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     tcg_temp_free_i32(rm2);
 
     narrowfn(rm1, cpu_env, rtmp);
-    neon_store_reg(a->vd, 0, rm1);
+    write_neon_element32(rm1, a->vd, 0, MO_32);
+    tcg_temp_free_i32(rm1);
 
     shiftfn(rm3, rm3, constimm);
     shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
 
     narrowfn(rm3, cpu_env, rtmp);
     tcg_temp_free_i64(rtmp);
-    neon_store_reg(a->vd, 1, rm3);
+    write_neon_element32(rm3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rm3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         widen_mask = dup_const(a->size + 1, widen_mask);
     }
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn0_64, a->vn);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 0);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 0, MO_32);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn1_64, a->vn + 1);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 1);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 1, MO_32);
 
     neon_store_reg64(rn0_64, a->vd);
 
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
 
     narrowfn(rd1, rn_64);
 
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rn_64);
     tcg_temp_free_i64(rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i64();
     rd1 = tcg_temp_new_i64();
 
-    rn = neon_load_reg(a->vn, 0);
-    rm = neon_load_reg(a->vm, 0);
+    rn = tcg_temp_new_i32();
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
+    read_neon_element32(rm, a->vm, 0, MO_32);
     opfn(rd0, rn, rm);
-    tcg_temp_free_i32(rn);
-    tcg_temp_free_i32(rm);
 
-    rn = neon_load_reg(a->vn, 1);
-    rm = neon_load_reg(a->vm, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
+    read_neon_element32(rm, a->vm, 1, MO_32);
     opfn(rd1, rn, rm);
     tcg_temp_free_i32(rn);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
 
 static inline TCGv_i32 neon_get_scalar(int size, int reg)
 {
-    TCGv_i32 tmp;
-    if (size == 1) {
-        tmp = neon_load_reg(reg & 7, reg >> 4);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    if (size == MO_16) {
+        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
         if (reg & 8) {
             gen_neon_dup_high16(tmp);
         } else {
             gen_neon_dup_low16(tmp);
         }
     } else {
-        tmp = neon_load_reg(reg & 15, reg >> 4);
+        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
     }
     return tmp;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      * perform an accumulation operation of that result into the
      * destination.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
+        read_neon_element32(tmp, a->vn, pass, MO_32);
         opfn(tmp, tmp, scalar);
         if (accfn) {
-            TCGv_i32 rd = neon_load_reg(a->vd, pass);
+            TCGv_i32 rd = tcg_temp_new_i32();
+            read_neon_element32(rd, a->vd, pass, MO_32);
             accfn(tmp, rd, tmp);
             tcg_temp_free_i32(rd);
         }
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(scalar);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      * performs a kind of fused op-then-accumulate using a helper
      * function that takes all of rd, rn and the scalar at once.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, rn, rd;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    rn = tcg_temp_new_i32();
+    rd = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 rn = neon_load_reg(a->vn, pass);
-        TCGv_i32 rd = neon_load_reg(a->vd, pass);
+        read_neon_element32(rn, a->vn, pass, MO_32);
+        read_neon_element32(rd, a->vd, pass, MO_32);
         opfn(rd, cpu_env, rn, scalar, rd);
-        tcg_temp_free_i32(rn);
-        neon_store_reg(a->vd, pass, rd);
+        write_neon_element32(rd, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i32(scalar);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     scalar = neon_get_scalar(a->size, a->vm);
 
     /* Load all inputs before writing any outputs, in case of overlap */
-    rn = neon_load_reg(a->vn, 0);
+    rn = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
     rn0_64 = tcg_temp_new_i64();
     opfn(rn0_64, rn, scalar);
-    tcg_temp_free_i32(rn);
 
-    rn = neon_load_reg(a->vn, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
     rn1_64 = tcg_temp_new_i64();
     opfn(rn1_64, rn, scalar);
     tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
         return false;
     }
     n <<= 3;
+    tmp = tcg_temp_new_i32();
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 0);
+        read_neon_element32(tmp, a->vd, 0, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp2 = neon_load_reg(a->vm, 0);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 0, MO_32);
     ptr1 = vfp_reg_ptr(true, a->vn);
     tmp4 = tcg_const_i32(n);
     gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
-    tcg_temp_free_i32(tmp);
+
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 1);
+        read_neon_element32(tmp, a->vd, 1, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp3 = neon_load_reg(a->vm, 1);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 1, MO_32);
     gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(tmp4);
     tcg_temp_free_ptr(ptr1);
-    neon_store_reg(a->vd, 0, tmp2);
-    neon_store_reg(a->vd, 1, tmp3);
-    tcg_temp_free_i32(tmp);
+
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
 static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
 {
     int pass, half;
+    TCGv_i32 tmp[2];
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
         return true;
     }
 
-    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        TCGv_i32 tmp[2];
+    tmp[0] = tcg_temp_new_i32();
+    tmp[1] = tcg_temp_new_i32();
 
+    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
         for (half = 0; half < 2; half++) {
-            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
+            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
             switch (a->size) {
             case 0:
                 tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                 g_assert_not_reached();
             }
         }
-        neon_store_reg(a->vd, pass * 2, tmp[1]);
-        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
+        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
+        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
     }
+
+    tcg_temp_free_i32(tmp[0]);
+    tcg_temp_free_i32(tmp[1]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
         rm0_64 = tcg_temp_new_i64();
         rm1_64 = tcg_temp_new_i64();
         rd_64 = tcg_temp_new_i64();
-        tmp = neon_load_reg(a->vm, pass * 2);
+
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
         widenfn(rm0_64, tmp);
-        tcg_temp_free_i32(tmp);
-        tmp = neon_load_reg(a->vm, pass * 2 + 1);
+        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
         widenfn(rm1_64, tmp);
         tcg_temp_free_i32(tmp);
+
         opfn(rd_64, rm0_64, rm1_64);
         tcg_temp_free_i64(rm0_64);
         tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     narrowfn(rd0, cpu_env, rm);
     neon_load_reg64(rm, a->vm + 1);
     narrowfn(rd1, cpu_env, rm);
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
     }
 
     rd = tcg_temp_new_i64();
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
 
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
-    tmp = neon_load_reg(a->vm, 0);
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
     tcg_gen_shli_i32(tmp2, tmp2, 16);
     tcg_gen_or_i32(tmp2, tmp2, tmp);
-    tcg_temp_free_i32(tmp);
-    tmp = neon_load_reg(a->vm, 2);
+    read_neon_element32(tmp, a->vm, 2, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp3 = neon_load_reg(a->vm, 3);
-    neon_store_reg(a->vd, 0, tmp2);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 3, MO_32);
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    tcg_temp_free_i32(tmp2);
     gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
     tcg_gen_shli_i32(tmp3, tmp3, 16);
     tcg_gen_or_i32(tmp3, tmp3, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
     tmp3 = tcg_temp_new_i32();
-    tmp = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     tcg_gen_ext16u_i32(tmp3, tmp);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 0, tmp3);
+    write_neon_element32(tmp3, a->vd, 0, MO_32);
     tcg_gen_shri_i32(tmp, tmp, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
-    neon_store_reg(a->vd, 1, tmp);
-    tmp3 = tcg_temp_new_i32();
+    write_neon_element32(tmp, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp);
     tcg_gen_ext16u_i32(tmp3, tmp2);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 2, tmp3);
+    write_neon_element32(tmp3, a->vd, 2, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_gen_shri_i32(tmp2, tmp2, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
-    neon_store_reg(a->vd, 3, tmp2);
+    write_neon_element32(tmp2, a->vd, 3, MO_32);
+    tcg_temp_free_i32(tmp2);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
 
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
 
 static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
 {
+    TCGv_i32 tmp;
     int pass;
 
     /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
         return true;
     }
 
+    tmp = tcg_temp_new_i32();
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, tmp);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
         return true;
     }
 
-    if (a->size == 2) {
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    if (a->size == MO_32) {
         for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass + 1);
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass + 1, tmp);
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
         }
     } else {
         for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass);
-            if (a->size == 0) {
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass, MO_32);
+            if (a->size == MO_8) {
                 gen_neon_trn_u8(tmp, tmp2);
             } else {
                 gen_neon_trn_u16(tmp, tmp2);
             }
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass, tmp);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass, MO_32);
         }
     }
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We can then use this to improve VMOV (scalar to gp) and
VMOV (gp to scalar) so that we simply perform the memory
operation that we wanted, rather than inserting or
extracting from a 32-bit quantity.

These were the last uses of neon_load/store_reg, so remove them.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         | 50 +++++++++++++-----------
 target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
 2 files changed, 37 insertions(+), 84 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
  * where 0 is the least significant end of the register.
  */
-static long neon_element_offset(int reg, int element, MemOp size)
+static long neon_element_offset(int reg, int element, MemOp memop)
 {
-    int element_size = 1 << size;
+    int element_size = 1 << (memop & MO_SIZE);
     int ofs = element * element_size;
 #ifdef HOST_WORDS_BIGENDIAN
     /*
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static TCGv_i32 neon_load_reg(int reg, int pass)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
-    return tmp;
-}
-
-static void neon_store_reg(int reg, int pass, TCGv_i32 var)
-{
-    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
-    tcg_temp_free_i32(var);
-}
-
 static inline void neon_load_reg64(TCGv_i64 var, int reg)
 {
     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
-    case MO_32:
+    switch (memop) {
+    case MO_SB:
+        tcg_gen_ld8s_i32(dest, cpu_env, off);
+        break;
+    case MO_UB:
+        tcg_gen_ld8u_i32(dest, cpu_env, off);
+        break;
+    case MO_SW:
+        tcg_gen_ld16s_i32(dest, cpu_env, off);
+        break;
+    case MO_UW:
+        tcg_gen_ld16u_i32(dest, cpu_env, off);
+        break;
+    case MO_UL:
+    case MO_SL:
         tcg_gen_ld_i32(dest, cpu_env, off);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
     }
 }
 
-static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i32(src, cpu_env, off);
+        break;
+    case MO_16:
+        tcg_gen_st16_i32(src, cpu_env, off);
+        break;
     case MO_32:
         tcg_gen_st_i32(src, cpu_env, off);
         break;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 {
     /* VMOV scalar to general purpose register */
     TCGv_i32 tmp;
-    int pass;
-    uint32_t offset;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = neon_load_reg(a->vn, pass);
-    switch (a->size) {
-    case 0:
-        if (offset) {
-            tcg_gen_shri_i32(tmp, tmp, offset);
-        }
-        if (a->u) {
-            gen_uxtb(tmp);
-        } else {
-            gen_sxtb(tmp);
-        }
-        break;
-    case 1:
-        if (a->u) {
-            if (offset) {
-                tcg_gen_shri_i32(tmp, tmp, 16);
-            } else {
-                gen_uxth(tmp);
-            }
-        } else {
-            if (offset) {
-                tcg_gen_sari_i32(tmp, tmp, 16);
-            } else {
-                gen_sxth(tmp);
-            }
-        }
-        break;
-    case 2:
-        break;
-    }
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
     store_reg(s, a->rt, tmp);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
 {
     /* VMOV general purpose register to scalar */
-    TCGv_i32 tmp, tmp2;
-    int pass;
-    uint32_t offset;
+    TCGv_i32 tmp;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
     tmp = load_reg(s, a->rt);
-    switch (a->size) {
-    case 0:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 1:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 2:
-        break;
-    }
-    neon_store_reg(a->vn, pass, tmp);
+    write_neon_element32(tmp, a->vn, a->index, a->size);
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only uses of this function are for loading VFP
single-precision values, and nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |   4 +-
 target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
 2 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 }
 
-static inline void neon_load_reg32(TCGv_i32 var, int reg)
+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static inline void neon_store_reg32(TCGv_i32 var, int reg)
+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         frn = tcg_temp_new_i32();
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
-        neon_load_reg32(frn, rn);
-        neon_load_reg32(frm, rm);
+        vfp_load_reg32(frn, rn);
+        vfp_load_reg32(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         if (sz == 1) {
             tcg_gen_andi_i32(dest, dest, 0xffff);
         }
-        neon_store_reg32(dest, rd);
+        vfp_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i32 tcg_res;
         tcg_op = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_op, rm);
+        vfp_load_reg32(tcg_op, rm);
         if (sz == 1) {
             gen_helper_rinth(tcg_res, tcg_op, fpst);
         } else {
             gen_helper_rints(tcg_res, tcg_op, fpst);
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_op);
         tcg_temp_free_i32(tcg_res);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
         }
         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
-        neon_store_reg32(tcg_tmp, rd);
+        vfp_store_reg32(tcg_tmp, rd);
         tcg_temp_free_i32(tcg_tmp);
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         TCGv_i32 tcg_single, tcg_res;
         tcg_single = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_single, rm);
+        vfp_load_reg32(tcg_single, rm);
         if (sz == 1) {
             if (is_signed) {
                 gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                 gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
             }
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_single);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
         store_reg(s, a->rt, tmp);
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         if (a->rt == 15) {
             /* Set the 4 flag bits in the CPSR.  */
             gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm);
+        vfp_load_reg32(tmp, a->vm);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm + 1);
+        vfp_load_reg32(tmp, a->vm + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm);
+        vfp_store_reg32(tmp, a->vm);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm + 1);
+        vfp_store_reg32(tmp, a->vm + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2);
+        vfp_load_reg32(tmp, a->vm * 2);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2 + 1);
+        vfp_load_reg32(tmp, a->vm * 2 + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm * 2);
+        vfp_store_reg32(tmp, a->vm * 2);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm * 2 + 1);
+        vfp_store_reg32(tmp, a->vm * 2 + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st16(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st32(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-            neon_store_reg32(tmp, a->vd + i);
+            vfp_store_reg32(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg32(tmp, a->vd + i);
+            vfp_load_reg32(tmp, a->vd + i);
             gen_aa32_st32(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg32(fd, vd);
+            vfp_load_reg32(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vn = vfp_advance_sreg(vn, delta_d);
-        neon_load_reg32(f0, vn);
+        vfp_load_reg32(f0, vn);
         if (delta_m) {
             vm = vfp_advance_sreg(vm, delta_m);
-            neon_load_reg32(f1, vm);
+            vfp_load_reg32(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR_F16);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     if (reads_vd) {
-        neon_load_reg32(fd, vd);
+        vfp_load_reg32(fd, vd);
     }
     fn(fd, f0, f1, fpst);
-    neon_store_reg32(fd, vd);
+    vfp_store_reg32(fd, vd);
 
     tcg_temp_free_i32(f0);
     tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i32();
     fd = tcg_temp_new_i32();
 
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_sreg(vd, delta_d);
-                neon_store_reg32(fd, vd);
+                vfp_store_reg32(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vm = vfp_advance_sreg(vm, delta_m);
-        neon_load_reg32(f0, vm);
+        vfp_load_reg32(f0, vm);
     }
 
     tcg_temp_free_i32(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     }
 
     f0 = tcg_temp_new_i32();
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
     fn(f0, f0);
-    neon_store_reg32(f0, vd);
+    vfp_store_reg32(f0, vd);
     tcg_temp_free_i32(f0);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negh(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negh(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negs(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negs(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
-    neon_store_reg32(fd, a->vd);
+    vfp_store_reg32(fd, a->vd);
     tcg_temp_free_i32(fd);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
 
     for (;;) {
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
     /* The T bit tells us if we want the low or high 16 bits of Vm */
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
 
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
     tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rinth(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rints(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
     neon_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     vm = tcg_temp_new_i64();
     neon_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     if (a->s) {
         /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f16 */
         gen_helper_vfp_uitoh(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f32 */
         gen_helper_vfp_uitos(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     vd = tcg_temp_new_i32();
     neon_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_i32(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touih(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touis(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
             gen_helper_vfp_touid(vd, vm, fpst);
         }
     }
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
     /* Insert low half of Vm into high half of Vd */
     rm = tcg_temp_new_i32();
     rd = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
-    neon_load_reg32(rd, a->vd);
+    vfp_load_reg32(rm, a->vm);
+    vfp_load_reg32(rd, a->vd);
     tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
-    neon_store_reg32(rd, a->vd);
+    vfp_store_reg32(rd, a->vd);
     tcg_temp_free_i32(rm);
     tcg_temp_free_i32(rd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
 
     /* Set Vd to high half of Vm */
     rm = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
+    vfp_load_reg32(rm, a->vm);
     tcg_gen_shri_i32(rm, rm, 16);
-    neon_store_reg32(rm, a->vd);
+    vfp_store_reg32(rm, a->vd);
     tcg_temp_free_i32(rm);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 26 +++++++++
 target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
 2 files changed, 73 insertions(+), 47 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
     }
 }
 
+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_Q:
+        tcg_gen_ld_i64(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
     }
 }
 
+static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_64:
+        tcg_gen_st_i64(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
     for (pass = 0; pass < a->q + 1; pass++) {
         TCGv_i64 tmp = tcg_temp_new_i64();
 
-        neon_load_reg64(tmp, a->vm + pass);
+        read_neon_element64(tmp, a->vm, pass, MO_64);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg64(tmp, a->vd + pass);
+        write_neon_element64(tmp, a->vd, pass, MO_64);
         tcg_temp_free_i64(tmp);
     }
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
-    neon_load_reg64(rm1, a->vm);
-    neon_load_reg64(rm2, a->vm + 1);
+    read_neon_element64(rm1, a->vm, 0, MO_64);
+    read_neon_element64(rm2, a->vm, 1, MO_64);
 
     shiftfn(rm1, rm1, constimm);
     narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd);
+    write_neon_element64(tmp, a->vd, 0, MO_64);
 
     widenfn(tmp, rm1);
     tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd + 1);
+    write_neon_element64(tmp, a->vd, 1, MO_64);
     tcg_temp_free_i64(tmp);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm_64 = tcg_temp_new_i64();
 
     if (src1_wide) {
-        neon_load_reg64(rn0_64, a->vn);
+        read_neon_element64(rn0_64, a->vn, 0, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      * avoid incorrect results if a narrow input overlaps with the result.
      */
     if (src1_wide) {
-        neon_load_reg64(rn1_64, a->vn + 1);
+        read_neon_element64(rn1_64, a->vn, 1, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm = tcg_temp_new_i32();
     read_neon_element32(rm, a->vm, 1, MO_32);
 
-    neon_store_reg64(rn0_64, a->vd);
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
-    neon_store_reg64(rn1_64, a->vd + 1);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rn_64, a->vn);
-    neon_load_reg64(rm_64, a->vm);
+    read_neon_element64(rn_64, a->vn, 0, MO_64);
+    read_neon_element64(rm_64, a->vm, 0, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
     narrowfn(rd0, rn_64);
 
-    neon_load_reg64(rn_64, a->vn + 1);
-    neon_load_reg64(rm_64, a->vm + 1);
+    read_neon_element64(rn_64, a->vn, 1, MO_64);
+    read_neon_element64(rm_64, a->vm, 1, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     /* Don't store results until after all loads: they might overlap */
     if (accfn) {
         tmp = tcg_temp_new_i64();
-        neon_load_reg64(tmp, a->vd);
+        read_neon_element64(tmp, a->vd, 0, MO_64);
         accfn(tmp, tmp, rd0);
-        neon_store_reg64(tmp, a->vd);
-        neon_load_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 0, MO_64);
+        read_neon_element64(tmp, a->vd, 1, MO_64);
         accfn(tmp, tmp, rd1);
-        neon_store_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 1, MO_64);
         tcg_temp_free_i64(tmp);
     } else {
-        neon_store_reg64(rd0, a->vd);
-        neon_store_reg64(rd1, a->vd + 1);
+        write_neon_element64(rd0, a->vd, 0, MO_64);
+        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
     tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
 
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
-        neon_load_reg64(t64, a->vd);
+        read_neon_element64(t64, a->vd, 0, MO_64);
         accfn(t64, t64, rn0_64);
-        neon_store_reg64(t64, a->vd);
-        neon_load_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 0, MO_64);
+        read_neon_element64(t64, a->vd, 1, MO_64);
         accfn(t64, t64, rn1_64);
-        neon_store_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 1, MO_64);
         tcg_temp_free_i64(t64);
     } else {
-        neon_store_reg64(rn0_64, a->vd);
-        neon_store_reg64(rn1_64, a->vd + 1);
+        write_neon_element64(rn0_64, a->vd, 0, MO_64);
+        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         right = tcg_temp_new_i64();
         dest = tcg_temp_new_i64();
 
-        neon_load_reg64(right, a->vn);
-        neon_load_reg64(left, a->vm);
+        read_neon_element64(right, a->vn, 0, MO_64);
+        read_neon_element64(left, a->vm, 0, MO_64);
         tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
-        neon_store_reg64(dest, a->vd);
+        write_neon_element64(dest, a->vd, 0, MO_64);
 
         tcg_temp_free_i64(left);
         tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         destright = tcg_temp_new_i64();
 
         if (a->imm < 8) {
-            neon_load_reg64(right, a->vn);
-            neon_load_reg64(middle, a->vn + 1);
+            read_neon_element64(right, a->vn, 0, MO_64);
+            read_neon_element64(middle, a->vn, 1, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
-            neon_load_reg64(left, a->vm);
+            read_neon_element64(left, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
         } else {
-            neon_load_reg64(right, a->vn + 1);
-            neon_load_reg64(middle, a->vm);
+            read_neon_element64(right, a->vn, 1, MO_64);
+            read_neon_element64(middle, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
-            neon_load_reg64(left, a->vm + 1);
+            read_neon_element64(left, a->vm, 1, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
         }
 
-        neon_store_reg64(destright, a->vd);
-        neon_store_reg64(destleft, a->vd + 1);
+        write_neon_element64(destright, a->vd, 0, MO_64);
+        write_neon_element64(destleft, a->vd, 1, MO_64);
 
         tcg_temp_free_i64(destright);
         tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
 
         if (accfn) {
             TCGv_i64 tmp64 = tcg_temp_new_i64();
-            neon_load_reg64(tmp64, a->vd + pass);
+            read_neon_element64(tmp64, a->vd, pass, MO_64);
             accfn(rd_64, tmp64, rd_64);
             tcg_temp_free_i64(tmp64);
         }
-        neon_store_reg64(rd_64, a->vd + pass);
+        write_neon_element64(rd_64, a->vd, pass, MO_64);
         tcg_temp_free_i64(rd_64);
     }
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rm, a->vm);
+    read_neon_element64(rm, a->vm, 0, MO_64);
     narrowfn(rd0, cpu_env, rm);
-    neon_load_reg64(rm, a->vm + 1);
+    read_neon_element64(rm, a->vm, 1, MO_64);
     narrowfn(rd1, cpu_env, rm);
     write_neon_element32(rd0, a->vd, 0, MO_32);
     write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd);
+    write_neon_element64(rd, a->vd, 0, MO_64);
     widenfn(rd, rm1);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd + 1);
+    write_neon_element64(rd, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rd);
     tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
     rm = tcg_temp_new_i64();
     rd = tcg_temp_new_i64();
     for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        neon_load_reg64(rm, a->vm + pass);
-        neon_load_reg64(rd, a->vd + pass);
-        neon_store_reg64(rm, a->vd + pass);
-        neon_store_reg64(rd, a->vm + pass);
+        read_neon_element64(rm, a->vm, pass, MO_64);
+        read_neon_element64(rd, a->vd, pass, MO_64);
+        write_neon_element64(rm, a->vd, pass, MO_64);
+        write_neon_element64(rd, a->vm, pass, MO_64);
     }
     tcg_temp_free_i64(rm);
     tcg_temp_free_i64(rd);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only uses of this function are for loading VFP
double-precision values, and nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |  8 ++--
 target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
 2 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static inline void neon_load_reg64(TCGv_i64 var, int reg)
+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
-static inline void neon_store_reg64(TCGv_i64 var, int reg)
+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
 static inline void vfp_load_reg32(TCGv_i32 var, int reg)
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         tcg_gen_ext_i32_i64(nf, cpu_NF);
         tcg_gen_ext_i32_i64(vf, cpu_VF);
 
-        neon_load_reg64(frn, rn);
-        neon_load_reg64(frm, rm);
+        vfp_load_reg64(frn, rn);
+        vfp_load_reg64(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i64(tmp);
             break;
         }
-        neon_store_reg64(dest, rd);
+        vfp_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i64 tcg_res;
         tcg_op = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
-        neon_load_reg64(tcg_op, rm);
+        vfp_load_reg64(tcg_op, rm);
         gen_helper_rintd(tcg_res, tcg_op, fpst);
-        neon_store_reg64(tcg_res, rd);
+        vfp_store_reg64(tcg_res, rd);
         tcg_temp_free_i64(tcg_op);
         tcg_temp_free_i64(tcg_res);
     } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         tcg_double = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
         tcg_tmp = tcg_temp_new_i32();
-        neon_load_reg64(tcg_double, rm);
+        vfp_load_reg64(tcg_double, rm);
         if (is_signed) {
             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     tmp = tcg_temp_new_i64();
     if (a->l) {
         gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-        neon_store_reg64(tmp, a->vd);
+        vfp_store_reg64(tmp, a->vd);
     } else {
-        neon_load_reg64(tmp, a->vd);
+        vfp_load_reg64(tmp, a->vd);
         gen_aa32_st64(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-            neon_store_reg64(tmp, a->vd + i);
+            vfp_store_reg64(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg64(tmp, a->vd + i);
+            vfp_load_reg64(tmp, a->vd + i);
             gen_aa32_st64(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     fd = tcg_temp_new_i64();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg64(f0, vn);
-    neon_load_reg64(f1, vm);
+    vfp_load_reg64(f0, vn);
+    vfp_load_reg64(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg64(fd, vd);
+            vfp_load_reg64(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vn = vfp_advance_dreg(vn, delta_d);
-        neon_load_reg64(f0, vn);
+        vfp_load_reg64(f0, vn);
         if (delta_m) {
             vm = vfp_advance_dreg(vm, delta_m);
-            neon_load_reg64(f1, vm);
+            vfp_load_reg64(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i64();
     fd = tcg_temp_new_i64();
 
-    neon_load_reg64(f0, vm);
+    vfp_load_reg64(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_dreg(vd, delta_d);
-                neon_store_reg64(fd, vd);
+                vfp_store_reg64(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vd = vfp_advance_dreg(vm, delta_m);
-        neon_load_reg64(f0, vm);
+        vfp_load_reg64(f0, vm);
     }
 
     tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i64();
 
-    neon_load_reg64(vn, a->vn);
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vn, a->vn);
+    vfp_load_reg64(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negd(vn, vn);
     }
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negd(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
 
     for (;;) {
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
     vd = tcg_temp_new_i64();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i64(vm, 0);
     } else {
-        neon_load_reg64(vm, a->vm);
+        vfp_load_reg64(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     vd = tcg_temp_new_i64();
     gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     tmp = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
     tcg_temp_free_i64(vm);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rintd(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd_exact(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
     vd = tcg_temp_new_i64();
     vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
 
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
         /* u32 -> f64 */
         gen_helper_vfp_uitod(vd, vm, fpst);
     }
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
 
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i64();
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i64(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

In both cases, we can sink the write-back and perform
the accumulate into the normal destination temps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.c.inc | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     if (accfn) {
         tmp = tcg_temp_new_i64();
         read_neon_element64(tmp, a->vd, 0, MO_64);
-        accfn(tmp, tmp, rd0);
-        write_neon_element64(tmp, a->vd, 0, MO_64);
+        accfn(rd0, tmp, rd0);
         read_neon_element64(tmp, a->vd, 1, MO_64);
-        accfn(tmp, tmp, rd1);
-        write_neon_element64(tmp, a->vd, 1, MO_64);
+        accfn(rd1, tmp, rd1);
         tcg_temp_free_i64(tmp);
-    } else {
-        write_neon_element64(rd0, a->vd, 0, MO_64);
-        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
+    write_neon_element64(rd0, a->vd, 0, MO_64);
+    write_neon_element64(rd1, a->vd, 1, MO_64);
     tcg_temp_free_i64(rd0);
     tcg_temp_free_i64(rd1);
 
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
         read_neon_element64(t64, a->vd, 0, MO_64);
-        accfn(t64, t64, rn0_64);
-        write_neon_element64(t64, a->vd, 0, MO_64);
+        accfn(rn0_64, t64, rn0_64);
         read_neon_element64(t64, a->vd, 1, MO_64);
-        accfn(t64, t64, rn1_64);
-        write_neon_element64(t64, a->vd, 1, MO_64);
+        accfn(rn1_64, t64, rn1_64);
         tcg_temp_free_i64(t64);
-    } else {
-        write_neon_element64(rn0_64, a->vd, 0, MO_64);
-        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
+
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
     return true;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  6 +++
 target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
 2 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
     long off = neon_element_offset(reg, ele, memop);
 
     switch (memop) {
+    case MO_SL:
+        tcg_gen_ld32s_i64(dest, cpu_env, off);
+        break;
+    case MO_UL:
+        tcg_gen_ld32u_i64(dest, cpu_env, off);
+        break;
     case MO_Q:
         tcg_gen_ld_i64(dest, cpu_env, off);
         break;
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
 static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                            NeonGenWidenFn *widenfn,
                            NeonGenTwo64OpFn *opfn,
-                           bool src1_wide)
+                           int src1_mop, int src2_mop)
 {
     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
     TCGv_i64 rn0_64, rn1_64, rm_64;
-    TCGv_i32 rm;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
         return false;
     }
 
-    if (!widenfn || !opfn) {
+    if (!opfn) {
         /* size == 3 case, which is an entirely different insn group */
         return false;
     }
 
-    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rn1_64 = tcg_temp_new_i64();
     rm_64 = tcg_temp_new_i64();
 
-    if (src1_wide) {
-        read_neon_element64(rn0_64, a->vn, 0, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 0, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 0, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 0, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn0_64, rn0_64, rm_64);
 
     /*
      * Load second pass inputs before storing the first pass result, to
      * avoid incorrect results if a narrow input overlaps with the result.
      */
-    if (src1_wide) {
-        read_neon_element64(rn1_64, a->vn, 1, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 1, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 1, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 1, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
     write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
     write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     return true;
 }
 
-#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
     static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
     {                                                                   \
         static NeonGenWidenFn * const widenfn[] = {                     \
             gen_helper_neon_widen_##S##8,                               \
             gen_helper_neon_widen_##S##16,                              \
-            tcg_gen_##EXT##_i32_i64,                                    \
-            NULL,                                                       \
+            NULL, NULL,                                                 \
         };                                                              \
         static NeonGenTwo64OpFn * const addfn[] = {                     \
             gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
             tcg_gen_##OP##_i64,                                         \
             NULL,                                                       \
         };                                                              \
-        return do_prewiden_3d(s, a, widenfn[a->size],                   \
-                              addfn[a->size], SRC1WIDE);                \
+        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
+        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
+                              SRC1WIDE ? MO_Q : narrow_mop,             \
+                              narrow_mop);                              \
     }
 
-DO_PREWIDEN(VADDL_S, s, ext, add, false)
-DO_PREWIDEN(VADDL_U, u, extu, add, false)
-DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
-DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
-DO_PREWIDEN(VADDW_S, s, ext, add, true)
-DO_PREWIDEN(VADDW_U, u, extu, add, true)
-DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
-DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
+DO_PREWIDEN(VADDL_U, u, add, false, 0)
+DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
+DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
+DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
+DO_PREWIDEN(VADDW_U, u, add, true, 0)
+DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
+DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
 
 static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                          NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
-- 
2.20.1

In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data.  This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
                                                                         \
-        d[H4(0)] = r0;                                                  \
-        d[H4(1)] = r1;                                                  \
-        d[H4(2)] = r2;                                                  \
-        d[H4(3)] = r3;                                                  \
+        d[H2(0)] = r0;                                                  \
+        d[H2(1)] = r1;                                                  \
+        d[H2(2)] = r2;                                                  \
+        d[H2(3)] = r3;                                                  \
     }
 
 DO_NEON_PAIRWISE(neon_padd, add)
-- 
2.20.1

The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index is indexing over group of 4 8-bit values,
so 32 bits per indexed entity, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     int8_t *n = vn;
-    int8_t *m_indexed = (int8_t *)vm + index * 4;
+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     uint8_t *n = vn;
-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

HCR should be applied when NS is set, not when it is cleared.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 /*
  * Non-IS variants of TLB operations are upgraded to
- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
+ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
  * force broadcast of these operations.
  */
 static bool tlb_force_broadcast(CPUARMState *env)
 {
-    return (env->cp15.hcr_el2 & HCR_FB) &&
-        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
 }
 
 static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
future HCR_EL2.TLOR when S-EL2 is enabled.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
 #endif
 
 /* Shared logic between LORID and the rest of the LOR* registers.
- * Secure state has already been delt with.
+ * Secure state exclusion has already been dealt with.
  */
-static CPAccessResult access_lor_ns(CPUARMState *env)
+static CPAccessResult access_lor_ns(CPUARMState *env,
+                                    const ARMCPRegInfo *ri, bool isread)
 {
     int el = arm_current_el(env);
 
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
     return CP_ACCESS_OK;
 }
 
-static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
-                                   bool isread)
-{
-    if (arm_is_secure_below_el3(env)) {
-        /* Access ok in secure mode.  */
-        return CP_ACCESS_OK;
-    }
-    return access_lor_ns(env);
-}
-
 static CPAccessResult access_lor_other(CPUARMState *env,
                                        const ARMCPRegInfo *ri, bool isread)
 {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
         /* Access denied in secure mode.  */
         return CP_ACCESS_TRAP;
     }
-    return access_lor_ns(env);
+    return access_lor_ns(env, ri, isread);
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
-      .access = PL1_R, .accessfn = access_lorid,
+      .access = PL1_R, .accessfn = access_lor_ns,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     REGINFO_SENTINEL
 };
-- 
2.20.1

If we're using the capstone disassembler, disassembly of a run of
instructions more than 32 bytes long disassembles the wrong data for
instructions beyond the 32 byte mark:

(qemu) xp /16x 0x100
0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
(qemu) xp /16i 0x100
0x00000100: 00000005 andeq r0, r0, r5
0x00000104: 54410001 strbpl r0, [r1], #-1
0x00000108: 00000001 andeq r0, r0, r1
0x0000010c: 00001000 andeq r1, r0, r0
0x00000110: 00000000 andeq r0, r0, r0
0x00000114: 00000004 andeq r0, r0, r4
0x00000118: 54410002 strbpl r0, [r1], #-2
0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x00000120: 54410001 strbpl r0, [r1], #-1
0x00000124: 00000001 andeq r0, r0, r1
0x00000128: 00001000 andeq r1, r0, r0
0x0000012c: 00000000 andeq r0, r0, r0
0x00000130: 00000004 andeq r0, r0, r4
0x00000134: 54410002 strbpl r0, [r1], #-2
0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x0000013c: 00000000 andeq r0, r0, r0

Here the disassembly of 0x120..0x13f is using the data that is in
0x104..0x123.

This is caused by passing the wrong value to the read_memory_func().
The intention is that at this point in the loop the 'cap_buf' buffer
already contains 'csize' bytes of data for the instruction at guest
addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
time through the loop 'csize' happens to be zero, so the initial read
of 32 bytes into cap_buf is correct and as long as the disassembly
never needs to read more data we return the correct information.

Use the correct guest address in the call to read_memory_func().

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
---
 disas/capstone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disas/capstone.c b/disas/capstone.c
index XXXXXXX..XXXXXXX 100644
--- a/disas/capstone.c
+++ b/disas/capstone.c
@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
 
         /* Make certain that we can make progress.  */
         assert(tsize != 0);
-        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
         csize += tsize;
 
         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):

CID 1432363 (#1 of 1): Unintentional integer overflow:

overflow_before_widen:
    Potentially overflowing expression 1 << scale with type int
    (32 bits, signed) is evaluated using 32-bit arithmetic, and
    then used in a context that expects an expression of type
    hwaddr (64 bits, unsigned).

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20201030144617.1535064-1-philmd@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/smmuv3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bitops.h"
 #include "hw/irq.h"
 #include "hw/sysbus.h"
 #include "migration/vmstate.h"
@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
         scale = CMD_SCALE(cmd);
         num = CMD_NUM(cmd);
         ttl = CMD_TTL(cmd);
-        num_pages = (num + 1) * (1 << (scale));
+        num_pages = (num + 1) * BIT_ULL(scale);
     }
 
     if (type == SMMU_CMD_TLBI_NH_VA) {
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
that SVE will not trap to EL3.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030151541.11976-1-remi@remlab.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
                     if (cpu_isar_feature(aa64_mte, cpu)) {
                         env->cp15.scr_el3 |= SCR_ATA;
                     }
+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+                    }
                     /* AArch64 kernels never boot in secure mode */
                     assert(!info->secure_boot);
                     /* This hook is only supported for AArch32 currently:
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In omap_lcd_interrupts(), the pointer omap_lcd is dereferinced before
being check if it is valid, which may lead to NULL pointer dereference.
So move the assignment to surface after checking that the omap_lcd is valid
and move surface_bits_per_pixel(surface) to after the surface assignment.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-id: 5F9CDB8A.9000001@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/omap_lcdc.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcdc.c
+++ b/hw/display/omap_lcdc.c
@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
 static void omap_update_display(void *opaque)
 {
     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
-    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
+    DisplaySurface *surface;
     draw_line_func draw_line;
     int size, height, first, last;
     int width, linesize, step, bpp, frame_offset;
     hwaddr frame_base;
 
-    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
-        !surface_bits_per_pixel(surface)) {
+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
+        return;
+    }
+
+    surface = qemu_console_surface(omap_lcd->con);
+    if (!surface_bits_per_pixel(surface)) {
         return;
     }
 
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In exynos4210_fimd_update(), the pointer s is dereferinced before
being check if it is valid, which may lead to NULL pointer dereference.
So move the assignment to global_width after checking that the s is valid.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 5F9F8D88.9030102@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/exynos4210_fimd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
     bool blend = false;
     uint8_t *host_fb_addr;
     bool is_dirty = false;
-    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+    int global_width;
 
     if (!s || !s->console || !s->enabled ||
         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
         return;
     }
+
+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
     exynos4210_update_resolution(s);
     surface = qemu_console_surface(s->console);
 
-- 
2.20.1

In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
This is incorrect when the security state being queried is not the
current one, because arm_current_el() uses the current security state
to determine which of the banked CONTROL.nPRIV bits to look at.
The effect was that if (for instance) Secure state was in privileged
mode but Non-Secure was not then we would return the wrong MMU index.

The only places where we are using this function in a way that could
trigger this bug are for the stack loads during a v8M function-return
and for the instruction fetch of a v8M SG insn.

Fix the bug by expanding out the M-profile version of the
arm_current_el() logic inline so it can use the passed in secstate
rather than env->v7m.secure.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
---
 target/arm/m_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 {
-    bool priv = arm_current_el(env) != 0;
+    bool priv = arm_v7m_is_handler_mode(env) ||
+        !(env->v7m.control[secstate] & 1);
 
     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
 }
-- 
2.20.1

On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
libraries for gio-2.0 which don't actually work when compiling
statically. (Specifically, the returned library string includes
-lmount, but not -lblkid which -lmount depends upon, so linking
fails due to missing symbols.)

Check that the libraries work, and don't enable gio if they don't,
in the same way we do for gnutls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
---
 configure | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index XXXXXXX..XXXXXXX 100755
--- a/configure
+++ b/configure
@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
 fi
 
 if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
-    gio=yes
     gio_cflags=$($pkg_config --cflags gio-2.0)
     gio_libs=$($pkg_config --libs gio-2.0)
     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
     if [ ! -x "$gdbus_codegen" ]; then
         gdbus_codegen=
     fi
+    # Check that the libraries actually work -- Ubuntu 18.04 ships
+    # with pkg-config --static --libs data for gio-2.0 that is missing
+    # -lblkid and will give a link error.
+    write_c_skeleton
+    if compile_prog "" "gio_libs" ; then
+        gio=yes
+    else
+        gio=no
+    fi
 else
     gio=no
 fi
-- 
2.20.1

In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
into the GICv3CPUState struct's maintenance_irq field.  This will
only work if the board happens to have already wired up the CPU
maintenance IRQ before the GIC was realized.  Unfortunately this is
not the case for the 'virt' board, and so the value that gets copied
is NULL (since a qemu_irq is really a pointer to an IRQState struct
under the hood).  The effect is that the CPU interface code never
actually raises the maintenance interrupt line.

Instead, since the GICv3CPUState has a pointer to the CPUState, make
the dereference at the point where we want to raise the interrupt, to
avoid an implicit requirement on board code to wire things up in a
particular order.

Reported-by: Jose Martins <josemartins90@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
Reviewed-by: Luc Michel <luc@lmichel.fr>
---
 include/hw/intc/arm_gicv3_common.h | 1 -
 hw/intc/arm_gicv3_cpuif.c          | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
     qemu_irq parent_fiq;
     qemu_irq parent_virq;
     qemu_irq parent_vfiq;
-    qemu_irq maintenance_irq;
 
     /* Redistributor */
     uint32_t level;                  /* Current IRQ level */
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
     int irqlevel = 0;
     int fiqlevel = 0;
     int maintlevel = 0;
+    ARMCPU *cpu = ARM_CPU(cs->cpu);
 
     idx = hppvi_index(cs);
     trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
 
     qemu_set_irq(cs->parent_vfiq, fiqlevel);
     qemu_set_irq(cs->parent_virq, irqlevel);
-    qemu_set_irq(cs->maintenance_irq, maintlevel);
+    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
 }
 
 static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
             && cpu->gic_num_lrs) {
             int j;
 
-            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
-
             cs->num_list_regs = cpu->gic_num_lrs;
             cs->vpribits = cpu->gic_vpribits;
             cs->vprebits = cpu->gic_vprebits;
-- 
2.20.1

The kerneldoc script currently emits Sphinx markup for a macro with
arguments that uses the c:function directive. This is correct for
Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
documentation of macros with arguments and c:function is not picky
about the syntax of what it is passed. However, in Sphinx 3 the
c:macro directive was enhanced to support macros with arguments,
and c:function was made more picky about what syntax it accepted.

When kerneldoc is told that it needs to produce output for Sphinx
3 or later, make it emit c:function only for functions and c:macro
for macros with arguments. We assume that anything with a return
type is a function and anything without is a macro.

This fixes the Sphinx error:

/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
If declarator-id with parameters (e.g., 'void f(int arg)'):
  Invalid C declaration: Expected identifier in nested name. [error at 25]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    -------------------------^
If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
  Error in declarator or parameters
  Invalid C declaration: Expecting "(" in parameters. [error at 39]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    ---------------------------------------^

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
---
 scripts/kernel-doc | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index XXXXXXX..XXXXXXX 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
 	output_highlight_rst($args{'purpose'});
 	$start = "\n\n**Syntax**\n\n  ``";
     } else {
-	print ".. c:function:: ";
+        if ((split(/\./, $sphinx_version))[0] >= 3) {
+            # Sphinx 3 and later distinguish macros and functions and
+            # complain if you use c:function with something that's not
+            # syntactically valid as a function declaration.
+            # We assume that anything with a return type is a function
+            # and anything without is a macro.
+            if ($args{'functiontype'} ne "") {
+                print ".. c:function:: ";
+            } else {
+                print ".. c:macro:: ";
+            }
+        } else {
+            # Older Sphinx don't support documenting macros that take
+            # arguments with c:macro, and don't complain about the use
+            # of c:function for this.
+            print ".. c:function:: ";
+        }
     }
     if ($args{'functiontype'} ne "") {
 	$start .= $args{'functiontype'} . " " . $args{'function'} . " (";
-- 
2.20.1

Sphinx 3.2 is pickier than earlier versions about the option:: markup,
and complains about our usage in qemu-option-trace.rst:

../../docs/qemu-option-trace.rst.inc:4:Malformed option description
  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
  "/opt args" or "+opt args"

In this file, we're really trying to document the different parts of
the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
have already introduced with an option:: markup.  So it's not right
to use option:: here anyway.  Switch to a different markup
(definition lists) which gives about the same formatted output.

(Unlike option::, this markup doesn't produce index entries; but
at the moment we don't do anything much with indexes anyway, and
in any case I think it doesn't make much sense to have individual
index entries for the sub-parts of the --trace option.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
---
 docs/qemu-option-trace.rst.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
index XXXXXXX..XXXXXXX 100644
--- a/docs/qemu-option-trace.rst.inc
+++ b/docs/qemu-option-trace.rst.inc
@@ -XXX,XX +XXX,XX @@
 
 Specify tracing options.
 
-.. option:: [enable=]PATTERN
+``[enable=]PATTERN``
 
   Immediately enable events matching *PATTERN*
   (either event name or a globbing pattern).  This option is only
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
 
   Use :option:`-trace help` to print a list of names of trace points.
 
-.. option:: events=FILE
+``events=FILE``
 
   Immediately enable events listed in *FILE*.
   The file must contain one event name (as listed in the ``trace-events-all``
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
   available if QEMU has been compiled with the ``simple``, ``log`` or
   ``ftrace`` tracing backend.
 
-.. option:: file=FILE
+``file=FILE``
 
   Log output traces to *FILE*.
   This option is only available if QEMU has been compiled with
-- 
2.20.1

The randomness tests in the NPCM7xx RNG test fail intermittently
but fairly frequently. On my machine running the test in a loop:
 while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done

will fail in less than a minute with an error like:
ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)

(Failures have been observed on all 4 of the randomness tests,
not just first_byte_runs.)

It's not clear why these tests are failing like this, but intermittent
failures make CI and merge testing awkward, so disable running them
unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
running the test suite, until we work out the cause.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
---
 tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/npcm7xx_rng-test.c
+++ b/tests/qtest/npcm7xx_rng-test.c
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
 
     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
-    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
-    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
-    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
-    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    /*
+     * These tests fail intermittently; only run them on explicit
+     * request until we figure out why.
+     */
+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    }
 
     qtest_start("-machine npcm750-evb");
     ret = g_test_run();
-- 
2.20.1

Hi; this mostly contains the first slice of A64 decodetree
patches, plus some other minor pieces. It also has the
enablement of MTE for KVM guests.

thanks
-- PMM

The following changes since commit d27e7c359330ba7020bdbed7ed2316cb4cf6ffc1:

qapi/parser: Drop two bad type hints for now (2023-05-17 10:18:33 -0700)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230518

for you to fetch changes up to 91608e2a44f36e79cb83f863b8a7bb57d2c98061:

docs: Convert u2f.txt to rST (2023-05-18 11:40:32 +0100)

----------------------------------------------------------------
target-arm queue:
 * Fix vd == vm overlap in sve_ldff1_z
 * Add support for MTE with KVM guests
 * Add RAZ/WI handling for DBGDTR[TX|RX]
 * Start of conversion of A64 decoder to decodetree
 * Saturate L2CTLR_EL1 core count field rather than overflowing
 * vexpress: Avoid trivial memory leak of 'flashalias'
 * sbsa-ref: switch default cpu core to Neoverse-N1
 * sbsa-ref: use Bochs graphics card instead of VGA
 * MAINTAINERS: Add Marcin Juszkiewicz to sbsa-ref reviewer list
 * docs: Convert u2f.txt to rST

----------------------------------------------------------------
Alex Bennée (1):
      target/arm: add RAZ/WI handling for DBGDTR[TX|RX]

Cornelia Huck (1):
      arm/kvm: add support for MTE

Marcin Juszkiewicz (3):
      sbsa-ref: switch default cpu core to Neoverse-N1
      Maintainers: add myself as reviewer for sbsa-ref
      sbsa-ref: use Bochs graphics card instead of VGA

Peter Maydell (14):
      target/arm: Create decodetree skeleton for A64
      target/arm: Pull calls to disas_sve() and disas_sme() out of legacy decoder
      target/arm: Convert Extract instructions to decodetree
      target/arm: Convert unconditional branch immediate to decodetree
      target/arm: Convert CBZ, CBNZ to decodetree
      target/arm: Convert TBZ, TBNZ to decodetree
      target/arm: Convert conditional branch insns to decodetree
      target/arm: Convert BR, BLR, RET to decodetree
      target/arm: Convert BRA[AB]Z, BLR[AB]Z, RETA[AB] to decodetree
      target/arm: Convert BRAA, BRAB, BLRAA, BLRAB to decodetree
      target/arm: Convert ERET, ERETAA, ERETAB to decodetree
      target/arm: Saturate L2CTLR_EL1 core count field rather than overflowing
      hw/arm/vexpress: Avoid trivial memory leak of 'flashalias'
      docs: Convert u2f.txt to rST

Richard Henderson (10):
      target/arm: Fix vd == vm overlap in sve_ldff1_z
      target/arm: Split out disas_a64_legacy
      target/arm: Convert PC-rel addressing to decodetree
      target/arm: Split gen_add_CC and gen_sub_CC
      target/arm: Convert Add/subtract (immediate) to decodetree
      target/arm: Convert Add/subtract (immediate with tags) to decodetree
      target/arm: Replace bitmask64 with MAKE_64BIT_MASK
      target/arm: Convert Logical (immediate) to decodetree
      target/arm: Convert Move wide (immediate) to decodetree
      target/arm: Convert Bitfield to decodetree

From: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>

The world outside moves to newer and newer cpu cores. Let move SBSA
Reference Platform to something newer as well.

Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Reviewed-by: Leif Lindholm <quic_llindhol@quicinc.com>
Message-id: 20230506183417.1360427-1-marcin.juszkiewicz@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/sbsa-ref.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
 
     mc->init = sbsa_ref_init;
     mc->desc = "QEMU 'SBSA Reference' ARM Virtual Machine";
-    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a57");
+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("neoverse-n1");
     mc->max_cpus = 512;
     mc->pci_allow_0_address = true;
     mc->minimum_page_bits = 12;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

If vd == vm, copy vm to scratch, so that we can pre-zero
the output and still access the gather indicies.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1612
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230504104232.1877774-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sve_helper.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void sve_ldff1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
     intptr_t reg_off;
     SVEHostPage info;
     target_ulong addr, in_page;
+    ARMVectorReg scratch;
 
     /* Skip to the first true predicate.  */
     reg_off = find_next_active(vg, 0, reg_max, esz);
@@ -XXX,XX +XXX,XX @@ void sve_ldff1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
         return;
     }
 
+    /* Protect against overlap between vd and vm. */
+    if (unlikely(vd == vm)) {
+        vm = memcpy(&scratch, vm, reg_max);
+    }
+
     /*
      * Probe the first element, allowing faults.
      */
-- 
2.34.1

From: Cornelia Huck <cohuck@redhat.com>

Extend the 'mte' property for the virt machine to cover KVM as
well. For KVM, we don't allocate tag memory, but instead enable the
capability.

If MTE has been enabled, we need to disable migration, as we do not
yet have a way to migrate the tags as well. Therefore, MTE will stay
off with KVM unless requested explicitly.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230428095533.21747-2-cohuck@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h     |  4 +++
 target/arm/kvm_arm.h | 19 ++++++++++++
 hw/arm/virt.c        | 73 +++++++++++++++++++++++++-------------------
 target/arm/cpu.c     |  9 +++---
 target/arm/kvm.c     | 35 +++++++++++++++++++++
 target/arm/kvm64.c   |  5 +++
 6 files changed, 109 insertions(+), 36 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
      */
     uint32_t psci_conduit;
 
+    /* CPU has Memory Tag Extension */
+    bool has_mte;
+
     /* For v8M, initial value of the Secure VTOR */
     uint32_t init_svtor;
     /* For v8M, initial value of the Non-secure VTOR */
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
     bool prop_pauth;
     bool prop_pauth_impdef;
     bool prop_lpa2;
+    OnOffAuto prop_mte;
 
     /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
     uint32_t dcz_blocksize;
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_pmu_supported(void);
  */
 bool kvm_arm_sve_supported(void);
 
+/**
+ * kvm_arm_mte_supported:
+ *
+ * Returns: true if KVM can enable MTE, and false otherwise.
+ */
+bool kvm_arm_mte_supported(void);
+
 /**
  * kvm_arm_get_max_vm_ipa_size:
  * @ms: Machine state handle
@@ -XXX,XX +XXX,XX @@ void kvm_arm_pvtime_init(CPUState *cs, uint64_t ipa);
 
 int kvm_arm_set_irq(int cpu, int irqtype, int irq, int level);
 
+void kvm_arm_enable_mte(Object *cpuobj, Error **errp);
+
 #else
 
 /*
@@ -XXX,XX +XXX,XX @@ static inline bool kvm_arm_steal_time_supported(void)
     return false;
 }
 
+static inline bool kvm_arm_mte_supported(void)
+{
+    return false;
+}
+
 /*
  * These functions should never actually be called without KVM support.
  */
@@ -XXX,XX +XXX,XX @@ static inline uint32_t kvm_arm_sve_get_vls(CPUState *cs)
     g_assert_not_reached();
 }
 
+static inline void kvm_arm_enable_mte(Object *cpuobj, Error **errp)
+{
+    g_assert_not_reached();
+}
+
 #endif
 
 static inline const char *gic_class_name(void)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
         exit(1);
     }
 
-    if (vms->mte && (kvm_enabled() || hvf_enabled())) {
+    if (vms->mte && hvf_enabled()) {
         error_report("mach-virt: %s does not support providing "
                      "MTE to the guest CPU",
                      current_accel_name());
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
         }
 
         if (vms->mte) {
-            /* Create the memory region only once, but link to all cpus. */
-            if (!tag_sysmem) {
-                /*
-                 * The property exists only if MemTag is supported.
-                 * If it is, we must allocate the ram to back that up.
-                 */
-                if (!object_property_find(cpuobj, "tag-memory")) {
-                    error_report("MTE requested, but not supported "
-                                 "by the guest CPU");
+            if (tcg_enabled()) {
+                /* Create the memory region only once, but link to all cpus. */
+                if (!tag_sysmem) {
+                    /*
+                     * The property exists only if MemTag is supported.
+                     * If it is, we must allocate the ram to back that up.
+                     */
+                    if (!object_property_find(cpuobj, "tag-memory")) {
+                        error_report("MTE requested, but not supported "
+                                     "by the guest CPU");
+                        exit(1);
+                    }
+
+                    tag_sysmem = g_new(MemoryRegion, 1);
+                    memory_region_init(tag_sysmem, OBJECT(machine),
+                                       "tag-memory", UINT64_MAX / 32);
+
+                    if (vms->secure) {
+                        secure_tag_sysmem = g_new(MemoryRegion, 1);
+                        memory_region_init(secure_tag_sysmem, OBJECT(machine),
+                                           "secure-tag-memory",
+                                           UINT64_MAX / 32);
+
+                        /* As with ram, secure-tag takes precedence over tag. */
+                        memory_region_add_subregion_overlap(secure_tag_sysmem,
+                                                            0, tag_sysmem, -1);
+                    }
+                }
+
+                object_property_set_link(cpuobj, "tag-memory",
+                                         OBJECT(tag_sysmem), &error_abort);
+                if (vms->secure) {
+                    object_property_set_link(cpuobj, "secure-tag-memory",
+                                             OBJECT(secure_tag_sysmem),
+                                             &error_abort);
+                }
+            } else if (kvm_enabled()) {
+                if (!kvm_arm_mte_supported()) {
+                    error_report("MTE requested, but not supported by KVM");
                     exit(1);
                 }
-
-                tag_sysmem = g_new(MemoryRegion, 1);
-                memory_region_init(tag_sysmem, OBJECT(machine),
-                                   "tag-memory", UINT64_MAX / 32);
-
-                if (vms->secure) {
-                    secure_tag_sysmem = g_new(MemoryRegion, 1);
-                    memory_region_init(secure_tag_sysmem, OBJECT(machine),
-                                       "secure-tag-memory", UINT64_MAX / 32);
-
-                    /* As with ram, secure-tag takes precedence over tag.  */
-                    memory_region_add_subregion_overlap(secure_tag_sysmem, 0,
-                                                        tag_sysmem, -1);
-                }
-            }
-
-            object_property_set_link(cpuobj, "tag-memory", OBJECT(tag_sysmem),
-                                     &error_abort);
-            if (vms->secure) {
-                object_property_set_link(cpuobj, "secure-tag-memory",
-                                         OBJECT(secure_tag_sysmem),
-                                         &error_abort);
+                kvm_arm_enable_mte(cpuobj, &error_abort);
             }
         }
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
                                      qdev_prop_allow_set_link_before_realize,
                                      OBJ_PROP_LINK_STRONG);
         }
+        cpu->has_mte = true;
     }
 #endif
 }
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         }
         if (cpu->tag_memory) {
             error_setg(errp,
-                       "Cannot enable %s when guest CPUs has MTE enabled",
+                       "Cannot enable %s when guest CPUs has tag memory enabled",
                        current_accel_name());
             return;
         }
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 
 #ifndef CONFIG_USER_ONLY
-    if (cpu->tag_memory == NULL && cpu_isar_feature(aa64_mte, cpu)) {
+    if (!cpu->has_mte && cpu_isar_feature(aa64_mte, cpu)) {
         /*
-         * Disable the MTE feature bits if we do not have tag-memory
-         * provided by the machine.
+         * Disable the MTE feature bits if we do not have the feature
+         * setup by the machine.
          */
         cpu->isar.id_aa64pfr1 =
             FIELD_DP64(cpu->isar.id_aa64pfr1, ID_AA64PFR1, MTE, 0);
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/boards.h"
 #include "hw/irq.h"
 #include "qemu/log.h"
+#include "migration/blocker.h"
 
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
     KVM_CAP_LAST_INFO
@@ -XXX,XX +XXX,XX @@ bool kvm_arch_cpu_check_are_resettable(void)
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 }
+
+void kvm_arm_enable_mte(Object *cpuobj, Error **errp)
+{
+    static bool tried_to_enable;
+    static bool succeeded_to_enable;
+    Error *mte_migration_blocker = NULL;
+    int ret;
+
+    if (!tried_to_enable) {
+        /*
+         * MTE on KVM is enabled on a per-VM basis (and retrying doesn't make
+         * sense), and we only want a single migration blocker as well.
+         */
+        tried_to_enable = true;
+
+        ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_ARM_MTE, 0);
+        if (ret) {
+            error_setg_errno(errp, -ret, "Failed to enable KVM_CAP_ARM_MTE");
+            return;
+        }
+
+        /* TODO: add proper migration support with MTE enabled */
+        error_setg(&mte_migration_blocker,
+                   "Live migration disabled due to MTE enabled");
+        if (migrate_add_blocker(mte_migration_blocker, errp)) {
+            error_free(mte_migration_blocker);
+            return;
+        }
+        succeeded_to_enable = true;
+    }
+    if (succeeded_to_enable) {
+        object_property_set_bool(cpuobj, "has_mte", true, NULL);
+    }
+}
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_steal_time_supported(void)
     return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME);
 }
 
+bool kvm_arm_mte_supported(void)
+{
+    return kvm_check_extension(kvm_state, KVM_CAP_ARM_MTE);
+}
+
 QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 
 uint32_t kvm_arm_sve_get_vls(CPUState *cs)
-- 
2.34.1

From: Alex Bennée <alex.bennee@linaro.org>

The commit b3aa2f2128 (target/arm: provide stubs for more external
debug registers) was added to handle HyperV's unconditional usage of
Debug Communications Channel. It turns out that Linux will similarly
break if you enable CONFIG_HVC_DCC "ARM JTAG DCC console".

Extend the registers we RAZ/WI set to avoid this.

Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230516104420.407912-1-alex.bennee@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/debug_helper.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
       .access = PL0_R, .accessfn = access_tdcc,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     /*
-     * OSDTRRX_EL1/OSDTRTX_EL1 are used for save and restore of DBGDTRRX_EL0.
-     * It is a component of the Debug Communications Channel, which is not implemented.
+     * These registers belong to the Debug Communications Channel,
+     * which is not implemented. However we implement RAZ/WI behaviour
+     * with trapping to prevent spurious SIGILLs if the guest OS does
+     * access them as the support cannot be probed for.
      */
     { .name = "OSDTRRX_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
       .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 2,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
       .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 2,
       .access = PL1_RW, .accessfn = access_tdcc,
       .type = ARM_CP_CONST, .resetvalue = 0 },
+    /* DBGDTRTX_EL0/DBGDTRRX_EL0 depend on direction */
+    { .name = "DBGDTR_EL0", .state = ARM_CP_STATE_BOTH, .cp = 14,
+      .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 5, .opc2 = 0,
+      .access = PL0_RW, .accessfn = access_tdcc,
+      .type = ARM_CP_CONST, .resetvalue = 0 },
     /*
      * OSECCR_EL1 provides a mechanism for an operating system
      * to access the contents of EDECCR. EDECCR is not implemented though,
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Split out all of the decode stuff from aarch64_tr_translate_insn.
Call it disas_a64_legacy to indicate it will be replaced.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230512144106.3608981-2-peter.maydell@linaro.org
[PMM: Rebased]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 82 ++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 38 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool btype_destination_ok(uint32_t insn, bool bt, int btype)
     return false;
 }
 
+/* C3.1 A64 instruction index by encoding */
+static void disas_a64_legacy(DisasContext *s, uint32_t insn)
+{
+    switch (extract32(insn, 25, 4)) {
+    case 0x0:
+        if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
+            unallocated_encoding(s);
+        }
+        break;
+    case 0x1: case 0x3: /* UNALLOCATED */
+        unallocated_encoding(s);
+        break;
+    case 0x2:
+        if (!disas_sve(s, insn)) {
+            unallocated_encoding(s);
+        }
+        break;
+    case 0x8: case 0x9: /* Data processing - immediate */
+        disas_data_proc_imm(s, insn);
+        break;
+    case 0xa: case 0xb: /* Branch, exception generation and system insns */
+        disas_b_exc_sys(s, insn);
+        break;
+    case 0x4:
+    case 0x6:
+    case 0xc:
+    case 0xe:      /* Loads and stores */
+        disas_ldst(s, insn);
+        break;
+    case 0x5:
+    case 0xd:      /* Data processing - register */
+        disas_data_proc_reg(s, insn);
+        break;
+    case 0x7:
+    case 0xf:      /* Data processing - SIMD and floating point */
+        disas_data_proc_simd_fp(s, insn);
+        break;
+    default:
+        assert(FALSE); /* all 15 cases should be handled above */
+        break;
+    }
+}
+
 static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
                                           CPUState *cpu)
 {
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
         disas_sme_fa64(s, insn);
     }
 
-    switch (extract32(insn, 25, 4)) {
-    case 0x0:
-        if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
-            unallocated_encoding(s);
-        }
-        break;
-    case 0x1: case 0x3: /* UNALLOCATED */
-        unallocated_encoding(s);
-        break;
-    case 0x2:
-        if (!disas_sve(s, insn)) {
-            unallocated_encoding(s);
-        }
-        break;
-    case 0x8: case 0x9: /* Data processing - immediate */
-        disas_data_proc_imm(s, insn);
-        break;
-    case 0xa: case 0xb: /* Branch, exception generation and system insns */
-        disas_b_exc_sys(s, insn);
-        break;
-    case 0x4:
-    case 0x6:
-    case 0xc:
-    case 0xe:      /* Loads and stores */
-        disas_ldst(s, insn);
-        break;
-    case 0x5:
-    case 0xd:      /* Data processing - register */
-        disas_data_proc_reg(s, insn);
-        break;
-    case 0x7:
-    case 0xf:      /* Data processing - SIMD and floating point */
-        disas_data_proc_simd_fp(s, insn);
-        break;
-    default:
-        assert(FALSE); /* all 15 cases should be handled above */
-        break;
-    }
+    disas_a64_legacy(s, insn);
 
     /*
      * After execution of most insns, btype is reset to 0.
-- 
2.34.1

The A64 translator uses a hand-written decoder for everything except
SVE or SME.  It's fairly well structured, but it's becoming obvious
that it's still more painful to add instructions to than the A32
translator, because putting a new instruction into the right place in
a hand-written decoder is much harder than adding new instruction
patterns to a decodetree file.

As the first step in conversion to decodetree, create the skeleton of
the decodetree decoder; where it does not handle instructions we will
fall back to the legacy decoder (which will be for everything at the
moment, since there are no patterns in a64.decode).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-3-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      | 20 ++++++++++++++++++++
 target/arm/tcg/translate-a64.c | 18 +++++++++++-------
 target/arm/tcg/meson.build     |  1 +
 3 files changed, 32 insertions(+), 7 deletions(-)
 create mode 100644 target/arm/tcg/a64.decode

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@
+# AArch64 A64 allowed instruction decoding
+#
+#  Copyright (c) 2023 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ enum a64_shift_type {
     A64_SHIFT_TYPE_ROR = 3
 };
 
+/*
+ * Include the generated decoders.
+ */
+
+#include "decode-sme-fa64.c.inc"
+#include "decode-a64.c.inc"
+
 /* Table based decoder typedefs - used when the relevant bits for decode
  * are too awkwardly scattered across the instruction (eg SIMD).
  */
@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn)
     }
 }
 
-/*
- * Include the generated SME FA64 decoder.
- */
-
-#include "decode-sme-fa64.c.inc"
-
 static bool trans_OK(DisasContext *s, arg_OK *a)
 {
     return true;
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
         disas_sme_fa64(s, insn);
     }
 
-    disas_a64_legacy(s, insn);
+
+    if (!disas_a64(s, insn)) {
+        disas_a64_legacy(s, insn);
+    }
 
     /*
      * After execution of most insns, btype is reset to 0.
diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/meson.build
+++ b/target/arm/tcg/meson.build
@@ -XXX,XX +XXX,XX @@ gen = [
   decodetree.process('a32-uncond.decode', extra_args: '--static-decode=disas_a32_uncond'),
   decodetree.process('t32.decode', extra_args: '--static-decode=disas_t32'),
   decodetree.process('t16.decode', extra_args: ['-w', '16', '--static-decode=disas_t16']),
+  decodetree.process('a64.decode', extra_args: ['--static-decode=disas_a64']),
 ]
 
 arm_ss.add(gen)
-- 
2.34.1

The SVE and SME decode is already done by decodetree.  Pull the calls
to these decoders out of the legacy decoder.  This doesn't change
behaviour because all the patterns in sve.decode and sme.decode
already require the bits that the legacy decoder is decoding to have
the correct values.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-4-peter.maydell@linaro.org
---
 target/arm/tcg/translate-a64.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool btype_destination_ok(uint32_t insn, bool bt, int btype)
 static void disas_a64_legacy(DisasContext *s, uint32_t insn)
 {
     switch (extract32(insn, 25, 4)) {
-    case 0x0:
-        if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
-            unallocated_encoding(s);
-        }
-        break;
-    case 0x1: case 0x3: /* UNALLOCATED */
-        unallocated_encoding(s);
-        break;
-    case 0x2:
-        if (!disas_sve(s, insn)) {
-            unallocated_encoding(s);
-        }
-        break;
     case 0x8: case 0x9: /* Data processing - immediate */
         disas_data_proc_imm(s, insn);
         break;
@@ -XXX,XX +XXX,XX @@ static void disas_a64_legacy(DisasContext *s, uint32_t insn)
         disas_data_proc_simd_fp(s, insn);
         break;
     default:
-        assert(FALSE); /* all 15 cases should be handled above */
+        unallocated_encoding(s);
         break;
     }
 }
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
         disas_sme_fa64(s, insn);
     }
 
-
-    if (!disas_a64(s, insn)) {
+    if (!disas_a64(s, insn) &&
+        !disas_sme(s, insn) &&
+        !disas_sve(s, insn)) {
         disas_a64_legacy(s, insn);
     }
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Convert the ADR and ADRP instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230512144106.3608981-5-peter.maydell@linaro.org
[PMM: Rebased]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/a64.decode      | 13 ++++++++++++
 target/arm/tcg/translate-a64.c | 38 +++++++++++++---------------------
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@
 #
 # This file is processed by scripts/decodetree.py
 #
+
+&ri              rd imm
+
+
+### Data Processing - Immediate
+
+# PC-rel addressing
+
+%imm_pcrel      5:s19 29:2
+@pcrel          . .. ..... ................... rd:5     &ri imm=%imm_pcrel
+
+ADR             0 .. 10000 ................... .....    @pcrel
+ADRP            1 .. 10000 ................... .....    @pcrel
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_ldst(DisasContext *s, uint32_t insn)
     }
 }
 
-/* PC-rel. addressing
- *   31  30   29 28       24 23                5 4    0
- * +----+-------+-----------+-------------------+------+
- * | op | immlo | 1 0 0 0 0 |       immhi       |  Rd  |
- * +----+-------+-----------+-------------------+------+
+/*
+ * PC-rel. addressing
  */
-static void disas_pc_rel_adr(DisasContext *s, uint32_t insn)
+
+static bool trans_ADR(DisasContext *s, arg_ri *a)
 {
-    unsigned int page, rd;
-    int64_t offset;
+    gen_pc_plus_diff(s, cpu_reg(s, a->rd), a->imm);
+    return true;
+}
 
-    page = extract32(insn, 31, 1);
-    /* SignExtend(immhi:immlo) -> offset */
-    offset = sextract64(insn, 5, 19);
-    offset = offset << 2 | extract32(insn, 29, 2);
-    rd = extract32(insn, 0, 5);
+static bool trans_ADRP(DisasContext *s, arg_ri *a)
+{
+    int64_t offset = (int64_t)a->imm << 12;
 
-    if (page) {
-        /* ADRP (page based) */
-        offset <<= 12;
-        /* The page offset is ok for CF_PCREL. */
-        offset -= s->pc_curr & 0xfff;
-    }
-
-    gen_pc_plus_diff(s, cpu_reg(s, rd), offset);
+    /* The page offset is ok for CF_PCREL. */
+    offset -= s->pc_curr & 0xfff;
+    gen_pc_plus_diff(s, cpu_reg(s, a->rd), offset);
+    return true;
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
 static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
 {
     switch (extract32(insn, 23, 6)) {
-    case 0x20: case 0x21: /* PC-rel. addressing */
-        disas_pc_rel_adr(s, insn);
-        break;
     case 0x22: /* Add/subtract (immediate) */
         disas_add_sub_imm(s, insn);
         break;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Split out specific 32-bit and 64-bit functions.
These carry the same signature as tcg_gen_add_i64,
and so will be easier to pass as callbacks.

Retain gen_add_CC and gen_sub_CC during conversion.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230512144106.3608981-6-peter.maydell@linaro.org
[PMM: rebased]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 149 +++++++++++++++++++--------------
 1 file changed, 84 insertions(+), 65 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_logic_CC(int sf, TCGv_i64 result)
 }
 
 /* dest = T0 + T1; compute C, N, V and Z flags */
+static void gen_add64_CC(TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
+{
+    TCGv_i64 result, flag, tmp;
+    result = tcg_temp_new_i64();
+    flag = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
+
+    tcg_gen_movi_i64(tmp, 0);
+    tcg_gen_add2_i64(result, flag, t0, tmp, t1, tmp);
+
+    tcg_gen_extrl_i64_i32(cpu_CF, flag);
+
+    gen_set_NZ64(result);
+
+    tcg_gen_xor_i64(flag, result, t0);
+    tcg_gen_xor_i64(tmp, t0, t1);
+    tcg_gen_andc_i64(flag, flag, tmp);
+    tcg_gen_extrh_i64_i32(cpu_VF, flag);
+
+    tcg_gen_mov_i64(dest, result);
+}
+
+static void gen_add32_CC(TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
+{
+    TCGv_i32 t0_32 = tcg_temp_new_i32();
+    TCGv_i32 t1_32 = tcg_temp_new_i32();
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    tcg_gen_movi_i32(tmp, 0);
+    tcg_gen_extrl_i64_i32(t0_32, t0);
+    tcg_gen_extrl_i64_i32(t1_32, t1);
+    tcg_gen_add2_i32(cpu_NF, cpu_CF, t0_32, tmp, t1_32, tmp);
+    tcg_gen_mov_i32(cpu_ZF, cpu_NF);
+    tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
+    tcg_gen_xor_i32(tmp, t0_32, t1_32);
+    tcg_gen_andc_i32(cpu_VF, cpu_VF, tmp);
+    tcg_gen_extu_i32_i64(dest, cpu_NF);
+}
+
 static void gen_add_CC(int sf, TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
 {
     if (sf) {
-        TCGv_i64 result, flag, tmp;
-        result = tcg_temp_new_i64();
-        flag = tcg_temp_new_i64();
-        tmp = tcg_temp_new_i64();
-
-        tcg_gen_movi_i64(tmp, 0);
-        tcg_gen_add2_i64(result, flag, t0, tmp, t1, tmp);
-
-        tcg_gen_extrl_i64_i32(cpu_CF, flag);
-
-        gen_set_NZ64(result);
-
-        tcg_gen_xor_i64(flag, result, t0);
-        tcg_gen_xor_i64(tmp, t0, t1);
-        tcg_gen_andc_i64(flag, flag, tmp);
-        tcg_gen_extrh_i64_i32(cpu_VF, flag);
-
-        tcg_gen_mov_i64(dest, result);
+        gen_add64_CC(dest, t0, t1);
     } else {
-        /* 32 bit arithmetic */
-        TCGv_i32 t0_32 = tcg_temp_new_i32();
-        TCGv_i32 t1_32 = tcg_temp_new_i32();
-        TCGv_i32 tmp = tcg_temp_new_i32();
-
-        tcg_gen_movi_i32(tmp, 0);
-        tcg_gen_extrl_i64_i32(t0_32, t0);
-        tcg_gen_extrl_i64_i32(t1_32, t1);
-        tcg_gen_add2_i32(cpu_NF, cpu_CF, t0_32, tmp, t1_32, tmp);
-        tcg_gen_mov_i32(cpu_ZF, cpu_NF);
-        tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
-        tcg_gen_xor_i32(tmp, t0_32, t1_32);
-        tcg_gen_andc_i32(cpu_VF, cpu_VF, tmp);
-        tcg_gen_extu_i32_i64(dest, cpu_NF);
+        gen_add32_CC(dest, t0, t1);
     }
 }
 
 /* dest = T0 - T1; compute C, N, V and Z flags */
+static void gen_sub64_CC(TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
+{
+    /* 64 bit arithmetic */
+    TCGv_i64 result, flag, tmp;
+
+    result = tcg_temp_new_i64();
+    flag = tcg_temp_new_i64();
+    tcg_gen_sub_i64(result, t0, t1);
+
+    gen_set_NZ64(result);
+
+    tcg_gen_setcond_i64(TCG_COND_GEU, flag, t0, t1);
+    tcg_gen_extrl_i64_i32(cpu_CF, flag);
+
+    tcg_gen_xor_i64(flag, result, t0);
+    tmp = tcg_temp_new_i64();
+    tcg_gen_xor_i64(tmp, t0, t1);
+    tcg_gen_and_i64(flag, flag, tmp);
+    tcg_gen_extrh_i64_i32(cpu_VF, flag);
+    tcg_gen_mov_i64(dest, result);
+}
+
+static void gen_sub32_CC(TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
+{
+    /* 32 bit arithmetic */
+    TCGv_i32 t0_32 = tcg_temp_new_i32();
+    TCGv_i32 t1_32 = tcg_temp_new_i32();
+    TCGv_i32 tmp;
+
+    tcg_gen_extrl_i64_i32(t0_32, t0);
+    tcg_gen_extrl_i64_i32(t1_32, t1);
+    tcg_gen_sub_i32(cpu_NF, t0_32, t1_32);
+    tcg_gen_mov_i32(cpu_ZF, cpu_NF);
+    tcg_gen_setcond_i32(TCG_COND_GEU, cpu_CF, t0_32, t1_32);
+    tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
+    tmp = tcg_temp_new_i32();
+    tcg_gen_xor_i32(tmp, t0_32, t1_32);
+    tcg_gen_and_i32(cpu_VF, cpu_VF, tmp);
+    tcg_gen_extu_i32_i64(dest, cpu_NF);
+}
+
 static void gen_sub_CC(int sf, TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
 {
     if (sf) {
-        /* 64 bit arithmetic */
-        TCGv_i64 result, flag, tmp;
-
-        result = tcg_temp_new_i64();
-        flag = tcg_temp_new_i64();
-        tcg_gen_sub_i64(result, t0, t1);
-
-        gen_set_NZ64(result);
-
-        tcg_gen_setcond_i64(TCG_COND_GEU, flag, t0, t1);
-        tcg_gen_extrl_i64_i32(cpu_CF, flag);
-
-        tcg_gen_xor_i64(flag, result, t0);
-        tmp = tcg_temp_new_i64();
-        tcg_gen_xor_i64(tmp, t0, t1);
-        tcg_gen_and_i64(flag, flag, tmp);
-        tcg_gen_extrh_i64_i32(cpu_VF, flag);
-        tcg_gen_mov_i64(dest, result);
+        gen_sub64_CC(dest, t0, t1);
     } else {
-        /* 32 bit arithmetic */
-        TCGv_i32 t0_32 = tcg_temp_new_i32();
-        TCGv_i32 t1_32 = tcg_temp_new_i32();
-        TCGv_i32 tmp;
-
-        tcg_gen_extrl_i64_i32(t0_32, t0);
-        tcg_gen_extrl_i64_i32(t1_32, t1);
-        tcg_gen_sub_i32(cpu_NF, t0_32, t1_32);
-        tcg_gen_mov_i32(cpu_ZF, cpu_NF);
-        tcg_gen_setcond_i32(TCG_COND_GEU, cpu_CF, t0_32, t1_32);
-        tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
-        tmp = tcg_temp_new_i32();
-        tcg_gen_xor_i32(tmp, t0_32, t1_32);
-        tcg_gen_and_i32(cpu_VF, cpu_VF, tmp);
-        tcg_gen_extu_i32_i64(dest, cpu_NF);
+        gen_sub32_CC(dest, t0, t1);
     }
 }
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Convert the ADD and SUB (immediate) instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230512144106.3608981-7-peter.maydell@linaro.org
[PMM: Rebased; adjusted to use translate.h's TRANS macro]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate.h     |  5 +++
 target/arm/tcg/a64.decode      | 17 ++++++++
 target/arm/tcg/translate-a64.c | 73 ++++++++++------------------------
 3 files changed, 42 insertions(+), 53 deletions(-)

diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ static inline int rsub_8(DisasContext *s, int x)
     return 8 - x;
 }
 
+static inline int shl_12(DisasContext *s, int x)
+{
+    return x << 12;
+}
+
 static inline int neon_3same_fp_size(DisasContext *s, int x)
 {
     /* Convert 0==fp32, 1==fp16 into a MO_* value */
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@
 #
 
 &ri              rd imm
+&rri_sf          rd rn imm sf
 
 
 ### Data Processing - Immediate
@@ -XXX,XX +XXX,XX @@
 
 ADR             0 .. 10000 ................... .....    @pcrel
 ADRP            1 .. 10000 ................... .....    @pcrel
+
+# Add/subtract (immediate)
+
+%imm12_sh12     10:12 !function=shl_12
+@addsub_imm     sf:1 .. ...... . imm:12 rn:5 rd:5
+@addsub_imm12   sf:1 .. ...... . ............ rn:5 rd:5 imm=%imm12_sh12
+
+ADD_i           . 00 100010 0 ............ ..... .....  @addsub_imm
+ADD_i           . 00 100010 1 ............ ..... .....  @addsub_imm12
+ADDS_i          . 01 100010 0 ............ ..... .....  @addsub_imm
+ADDS_i          . 01 100010 1 ............ ..... .....  @addsub_imm12
+
+SUB_i           . 10 100010 0 ............ ..... .....  @addsub_imm
+SUB_i           . 10 100010 1 ............ ..... .....  @addsub_imm12
+SUBS_i          . 11 100010 0 ............ ..... .....  @addsub_imm
+SUBS_i          . 11 100010 1 ............ ..... .....  @addsub_imm12
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_ldst(DisasContext *s, uint32_t insn)
     }
 }
 
+typedef void ArithTwoOp(TCGv_i64, TCGv_i64, TCGv_i64);
+
+static bool gen_rri(DisasContext *s, arg_rri_sf *a,
+                    bool rd_sp, bool rn_sp, ArithTwoOp *fn)
+{
+    TCGv_i64 tcg_rn = rn_sp ? cpu_reg_sp(s, a->rn) : cpu_reg(s, a->rn);
+    TCGv_i64 tcg_rd = rd_sp ? cpu_reg_sp(s, a->rd) : cpu_reg(s, a->rd);
+    TCGv_i64 tcg_imm = tcg_constant_i64(a->imm);
+
+    fn(tcg_rd, tcg_rn, tcg_imm);
+    if (!a->sf) {
+        tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
+    }
+    return true;
+}
+
 /*
  * PC-rel. addressing
  */
@@ -XXX,XX +XXX,XX @@ static bool trans_ADRP(DisasContext *s, arg_ri *a)
 
 /*
  * Add/subtract (immediate)
- *
- *  31 30 29 28         23 22 21         10 9   5 4   0
- * +--+--+--+-------------+--+-------------+-----+-----+
- * |sf|op| S| 1 0 0 0 1 0 |sh|    imm12    |  Rn | Rd  |
- * +--+--+--+-------------+--+-------------+-----+-----+
- *
- *    sf: 0 -> 32bit, 1 -> 64bit
- *    op: 0 -> add  , 1 -> sub
- *     S: 1 -> set flags
- *    sh: 1 -> LSL imm by 12
  */
-static void disas_add_sub_imm(DisasContext *s, uint32_t insn)
-{
-    int rd = extract32(insn, 0, 5);
-    int rn = extract32(insn, 5, 5);
-    uint64_t imm = extract32(insn, 10, 12);
-    bool shift = extract32(insn, 22, 1);
-    bool setflags = extract32(insn, 29, 1);
-    bool sub_op = extract32(insn, 30, 1);
-    bool is_64bit = extract32(insn, 31, 1);
-
-    TCGv_i64 tcg_rn = cpu_reg_sp(s, rn);
-    TCGv_i64 tcg_rd = setflags ? cpu_reg(s, rd) : cpu_reg_sp(s, rd);
-    TCGv_i64 tcg_result;
-
-    if (shift) {
-        imm <<= 12;
-    }
-
-    tcg_result = tcg_temp_new_i64();
-    if (!setflags) {
-        if (sub_op) {
-            tcg_gen_subi_i64(tcg_result, tcg_rn, imm);
-        } else {
-            tcg_gen_addi_i64(tcg_result, tcg_rn, imm);
-        }
-    } else {
-        TCGv_i64 tcg_imm = tcg_constant_i64(imm);
-        if (sub_op) {
-            gen_sub_CC(is_64bit, tcg_result, tcg_rn, tcg_imm);
-        } else {
-            gen_add_CC(is_64bit, tcg_result, tcg_rn, tcg_imm);
-        }
-    }
-
-    if (is_64bit) {
-        tcg_gen_mov_i64(tcg_rd, tcg_result);
-    } else {
-        tcg_gen_ext32u_i64(tcg_rd, tcg_result);
-    }
-}
+TRANS(ADD_i, gen_rri, a, 1, 1, tcg_gen_add_i64)
+TRANS(SUB_i, gen_rri, a, 1, 1, tcg_gen_sub_i64)
+TRANS(ADDS_i, gen_rri, a, 0, 1, a->sf ? gen_add64_CC : gen_add32_CC)
+TRANS(SUBS_i, gen_rri, a, 0, 1, a->sf ? gen_sub64_CC : gen_sub32_CC)
 
 /*
  * Add/subtract (immediate, with tags)
@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
 static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
 {
     switch (extract32(insn, 23, 6)) {
-    case 0x22: /* Add/subtract (immediate) */
-        disas_add_sub_imm(s, insn);
-        break;
     case 0x23: /* Add/subtract (immediate, with tags) */
         disas_add_sub_imm_with_tags(s, insn);
         break;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Convert the ADDG and SUBG (immediate) instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230512144106.3608981-8-peter.maydell@linaro.org
[PMM: Rebased; use TRANS_FEAT()]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/a64.decode      |  8 +++++++
 target/arm/tcg/translate-a64.c | 38 ++++++++++------------------------
 2 files changed, 19 insertions(+), 27 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Use the bitops.h macro rather than rolling our own here.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230512144106.3608981-9-peter.maydell@linaro.org
---
 target/arm/tcg/translate-a64.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static uint64_t bitfield_replicate(uint64_t mask, unsigned int e)
     return mask;
 }
 
-/* Return a value with the bottom len bits set (where 0 < len <= 64) */
-static inline uint64_t bitmask64(unsigned int length)
-{
-    assert(length > 0 && length <= 64);
-    return ~0ULL >> (64 - length);
-}
-
 /* Simplified variant of pseudocode DecodeBitMasks() for the case where we
  * only require the wmask. Returns false if the imms/immr/immn are a reserved
  * value (ie should cause a guest UNDEF exception), and true if they are
@@ -XXX,XX +XXX,XX @@ bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
     /* Create the value of one element: s+1 set bits rotated
      * by r within the element (which is e bits wide)...
      */
-    mask = bitmask64(s + 1);
+    mask = MAKE_64BIT_MASK(0, s + 1);
     if (r) {
         mask = (mask >> r) | (mask << (e - r));
-        mask &= bitmask64(e);
+        mask &= MAKE_64BIT_MASK(0, e);
     }
     /* ...then replicate the element over the whole 64 bit value */
     mask = bitfield_replicate(mask, e);
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Convert the ADD, ORR, EOR, ANDS (immediate) instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230512144106.3608981-10-peter.maydell@linaro.org
[PMM: rebased]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/a64.decode      | 15 ++++++
 target/arm/tcg/translate-a64.c | 94 +++++++++++-----------------------
 2 files changed, 44 insertions(+), 65 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@ SUBS_i          . 11 100010 1 ............ ..... .....  @addsub_imm12
 
 ADDG_i          1 00 100011 0 ...... 00 .... ..... ..... @addsub_imm_tag
 SUBG_i          1 10 100011 0 ...... 00 .... ..... ..... @addsub_imm_tag
+
+# Logical (immediate)
+
+&rri_log        rd rn sf dbm
+@logic_imm_64   1 .. ...... dbm:13 rn:5 rd:5            &rri_log sf=1
+@logic_imm_32   0 .. ...... 0 dbm:12 rn:5 rd:5          &rri_log sf=0
+
+AND_i           . 00 100100 . ...... ...... ..... ..... @logic_imm_64
+AND_i           . 00 100100 . ...... ...... ..... ..... @logic_imm_32
+ORR_i           . 01 100100 . ...... ...... ..... ..... @logic_imm_64
+ORR_i           . 01 100100 . ...... ...... ..... ..... @logic_imm_32
+EOR_i           . 10 100100 . ...... ...... ..... ..... @logic_imm_64
+EOR_i           . 10 100100 . ...... ...... ..... ..... @logic_imm_32
+ANDS_i          . 11 100100 . ...... ...... ..... ..... @logic_imm_64
+ANDS_i          . 11 100100 . ...... ...... ..... ..... @logic_imm_32
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static uint64_t bitfield_replicate(uint64_t mask, unsigned int e)
     return mask;
 }
 
-/* Simplified variant of pseudocode DecodeBitMasks() for the case where we
+/*
+ * Logical (immediate)
+ */
+
+/*
+ * Simplified variant of pseudocode DecodeBitMasks() for the case where we
  * only require the wmask. Returns false if the imms/immr/immn are a reserved
  * value (ie should cause a guest UNDEF exception), and true if they are
  * valid, in which case the decoded bit pattern is written to result.
@@ -XXX,XX +XXX,XX @@ bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
     return true;
 }
 
-/* Logical (immediate)
- *   31  30 29 28         23 22  21  16 15  10 9    5 4    0
- * +----+-----+-------------+---+------+------+------+------+
- * | sf | opc | 1 0 0 1 0 0 | N | immr | imms |  Rn  |  Rd  |
- * +----+-----+-------------+---+------+------+------+------+
- */
-static void disas_logic_imm(DisasContext *s, uint32_t insn)
+static bool gen_rri_log(DisasContext *s, arg_rri_log *a, bool set_cc,
+                        void (*fn)(TCGv_i64, TCGv_i64, int64_t))
 {
-    unsigned int sf, opc, is_n, immr, imms, rn, rd;
     TCGv_i64 tcg_rd, tcg_rn;
-    uint64_t wmask;
-    bool is_and = false;
+    uint64_t imm;
 
-    sf = extract32(insn, 31, 1);
-    opc = extract32(insn, 29, 2);
-    is_n = extract32(insn, 22, 1);
-    immr = extract32(insn, 16, 6);
-    imms = extract32(insn, 10, 6);
-    rn = extract32(insn, 5, 5);
-    rd = extract32(insn, 0, 5);
-
-    if (!sf && is_n) {
-        unallocated_encoding(s);
-        return;
+    /* Some immediate field values are reserved. */
+    if (!logic_imm_decode_wmask(&imm, extract32(a->dbm, 12, 1),
+                                extract32(a->dbm, 0, 6),
+                                extract32(a->dbm, 6, 6))) {
+        return false;
+    }
+    if (!a->sf) {
+        imm &= 0xffffffffull;
     }
 
-    if (opc == 0x3) { /* ANDS */
-        tcg_rd = cpu_reg(s, rd);
-    } else {
-        tcg_rd = cpu_reg_sp(s, rd);
-    }
-    tcg_rn = cpu_reg(s, rn);
+    tcg_rd = set_cc ? cpu_reg(s, a->rd) : cpu_reg_sp(s, a->rd);
+    tcg_rn = cpu_reg(s, a->rn);
 
-    if (!logic_imm_decode_wmask(&wmask, is_n, imms, immr)) {
-        /* some immediate field values are reserved */
-        unallocated_encoding(s);
-        return;
+    fn(tcg_rd, tcg_rn, imm);
+    if (set_cc) {
+        gen_logic_CC(a->sf, tcg_rd);
     }
-
-    if (!sf) {
-        wmask &= 0xffffffff;
-    }
-
-    switch (opc) {
-    case 0x3: /* ANDS */
-    case 0x0: /* AND */
-        tcg_gen_andi_i64(tcg_rd, tcg_rn, wmask);
-        is_and = true;
-        break;
-    case 0x1: /* ORR */
-        tcg_gen_ori_i64(tcg_rd, tcg_rn, wmask);
-        break;
-    case 0x2: /* EOR */
-        tcg_gen_xori_i64(tcg_rd, tcg_rn, wmask);
-        break;
-    default:
-        assert(FALSE); /* must handle all above */
-        break;
-    }
-
-    if (!sf && !is_and) {
-        /* zero extend final result; we know we can skip this for AND
-         * since the immediate had the high 32 bits clear.
-         */
+    if (!a->sf) {
         tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
     }
-
-    if (opc == 3) { /* ANDS */
-        gen_logic_CC(sf, tcg_rd);
-    }
+    return true;
 }
 
+TRANS(AND_i, gen_rri_log, a, false, tcg_gen_andi_i64)
+TRANS(ORR_i, gen_rri_log, a, false, tcg_gen_ori_i64)
+TRANS(EOR_i, gen_rri_log, a, false, tcg_gen_xori_i64)
+TRANS(ANDS_i, gen_rri_log, a, true, tcg_gen_andi_i64)
+
 /*
  * Move wide (immediate)
  *
@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
 static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
 {
     switch (extract32(insn, 23, 6)) {
-    case 0x24: /* Logical (immediate) */
-        disas_logic_imm(s, insn);
-        break;
     case 0x25: /* Move wide (immediate) */
         disas_movw_imm(s, insn);
         break;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Convert the MON, MOVZ, MOVK instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230512144106.3608981-11-peter.maydell@linaro.org
[PMM: Rebased]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/a64.decode      | 13 ++++++
 target/arm/tcg/translate-a64.c | 73 ++++++++++++++--------------------
 2 files changed, 42 insertions(+), 44 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Convert the BFM, SBFM, UBFM instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230512144106.3608981-12-peter.maydell@linaro.org
[PMM: Rebased]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/a64.decode      |  13 +++
 target/arm/tcg/translate-a64.c | 144 ++++++++++++++++++---------------
 2 files changed, 94 insertions(+), 63 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@ MOVZ            . 10 100101 .. ................ .....   @movw_64
 MOVZ            . 10 100101 .. ................ .....   @movw_32
 MOVK            . 11 100101 .. ................ .....   @movw_64
 MOVK            . 11 100101 .. ................ .....   @movw_32
+
+# Bitfield
+
+&bitfield       rd rn sf immr imms
+@bitfield_64    1 .. ...... 1 immr:6 imms:6 rn:5 rd:5      &bitfield sf=1
+@bitfield_32    0 .. ...... 0 0 immr:5 0 imms:5 rn:5 rd:5  &bitfield sf=0
+
+SBFM            . 00 100110 . ...... ...... ..... ..... @bitfield_64
+SBFM            . 00 100110 . ...... ...... ..... ..... @bitfield_32
+BFM             . 01 100110 . ...... ...... ..... ..... @bitfield_64
+BFM             . 01 100110 . ...... ...... ..... ..... @bitfield_32
+UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_64
+UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_32
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_MOVK(DisasContext *s, arg_movw *a)
     return true;
 }
 
-/* Bitfield
- *   31  30 29 28         23 22  21  16 15  10 9    5 4    0
- * +----+-----+-------------+---+------+------+------+------+
- * | sf | opc | 1 0 0 1 1 0 | N | immr | imms |  Rn  |  Rd  |
- * +----+-----+-------------+---+------+------+------+------+
+/*
+ * Bitfield
  */
-static void disas_bitfield(DisasContext *s, uint32_t insn)
+
+static bool trans_SBFM(DisasContext *s, arg_SBFM *a)
 {
-    unsigned int sf, n, opc, ri, si, rn, rd, bitsize, pos, len;
-    TCGv_i64 tcg_rd, tcg_tmp;
+    TCGv_i64 tcg_rd = cpu_reg(s, a->rd);
+    TCGv_i64 tcg_tmp = read_cpu_reg(s, a->rn, 1);
+    unsigned int bitsize = a->sf ? 64 : 32;
+    unsigned int ri = a->immr;
+    unsigned int si = a->imms;
+    unsigned int pos, len;
 
-    sf = extract32(insn, 31, 1);
-    opc = extract32(insn, 29, 2);
-    n = extract32(insn, 22, 1);
-    ri = extract32(insn, 16, 6);
-    si = extract32(insn, 10, 6);
-    rn = extract32(insn, 5, 5);
-    rd = extract32(insn, 0, 5);
-    bitsize = sf ? 64 : 32;
-
-    if (sf != n || ri >= bitsize || si >= bitsize || opc > 2) {
-        unallocated_encoding(s);
-        return;
-    }
-
-    tcg_rd = cpu_reg(s, rd);
-
-    /* Suppress the zero-extend for !sf.  Since RI and SI are constrained
-       to be smaller than bitsize, we'll never reference data outside the
-       low 32-bits anyway.  */
-    tcg_tmp = read_cpu_reg(s, rn, 1);
-
-    /* Recognize simple(r) extractions.  */
     if (si >= ri) {
         /* Wd<s-r:0> = Wn<s:r> */
         len = (si - ri) + 1;
-        if (opc == 0) { /* SBFM: ASR, SBFX, SXTB, SXTH, SXTW */
-            tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
-            goto done;
-        } else if (opc == 2) { /* UBFM: UBFX, LSR, UXTB, UXTH */
-            tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
-            return;
+        tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
+        if (!a->sf) {
+            tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
         }
-        /* opc == 1, BFXIL fall through to deposit */
+    } else {
+        /* Wd<32+s-r,32-r> = Wn<s:0> */
+        len = si + 1;
+        pos = (bitsize - ri) & (bitsize - 1);
+
+        if (len < ri) {
+            /*
+             * Sign extend the destination field from len to fill the
+             * balance of the word.  Let the deposit below insert all
+             * of those sign bits.
+             */
+            tcg_gen_sextract_i64(tcg_tmp, tcg_tmp, 0, len);
+            len = ri;
+        }
+
+        /*
+         * We start with zero, and we haven't modified any bits outside
+         * bitsize, therefore no final zero-extension is unneeded for !sf.
+         */
+        tcg_gen_deposit_z_i64(tcg_rd, tcg_tmp, pos, len);
+    }
+    return true;
+}
+
+static bool trans_UBFM(DisasContext *s, arg_UBFM *a)
+{
+    TCGv_i64 tcg_rd = cpu_reg(s, a->rd);
+    TCGv_i64 tcg_tmp = read_cpu_reg(s, a->rn, 1);
+    unsigned int bitsize = a->sf ? 64 : 32;
+    unsigned int ri = a->immr;
+    unsigned int si = a->imms;
+    unsigned int pos, len;
+
+    tcg_rd = cpu_reg(s, a->rd);
+    tcg_tmp = read_cpu_reg(s, a->rn, 1);
+
+    if (si >= ri) {
+        /* Wd<s-r:0> = Wn<s:r> */
+        len = (si - ri) + 1;
+        tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
+    } else {
+        /* Wd<32+s-r,32-r> = Wn<s:0> */
+        len = si + 1;
+        pos = (bitsize - ri) & (bitsize - 1);
+        tcg_gen_deposit_z_i64(tcg_rd, tcg_tmp, pos, len);
+    }
+    return true;
+}
+
+static bool trans_BFM(DisasContext *s, arg_BFM *a)
+{
+    TCGv_i64 tcg_rd = cpu_reg(s, a->rd);
+    TCGv_i64 tcg_tmp = read_cpu_reg(s, a->rn, 1);
+    unsigned int bitsize = a->sf ? 64 : 32;
+    unsigned int ri = a->immr;
+    unsigned int si = a->imms;
+    unsigned int pos, len;
+
+    tcg_rd = cpu_reg(s, a->rd);
+    tcg_tmp = read_cpu_reg(s, a->rn, 1);
+
+    if (si >= ri) {
+        /* Wd<s-r:0> = Wn<s:r> */
         tcg_gen_shri_i64(tcg_tmp, tcg_tmp, ri);
+        len = (si - ri) + 1;
         pos = 0;
     } else {
-        /* Handle the ri > si case with a deposit
-         * Wd<32+s-r,32-r> = Wn<s:0>
-         */
+        /* Wd<32+s-r,32-r> = Wn<s:0> */
         len = si + 1;
         pos = (bitsize - ri) & (bitsize - 1);
     }
 
-    if (opc == 0 && len < ri) {
-        /* SBFM: sign extend the destination field from len to fill
-           the balance of the word.  Let the deposit below insert all
-           of those sign bits.  */
-        tcg_gen_sextract_i64(tcg_tmp, tcg_tmp, 0, len);
-        len = ri;
-    }
-
-    if (opc == 1) { /* BFM, BFXIL */
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
-    } else {
-        /* SBFM or UBFM: We start with zero, and we haven't modified
-           any bits outside bitsize, therefore the zero-extension
-           below is unneeded.  */
-        tcg_gen_deposit_z_i64(tcg_rd, tcg_tmp, pos, len);
-        return;
-    }
-
- done:
-    if (!sf) { /* zero extend final result */
+    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
+    if (!a->sf) {
         tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
     }
+    return true;
 }
 
 /* Extract
@@ -XXX,XX +XXX,XX @@ static void disas_extract(DisasContext *s, uint32_t insn)
 static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
 {
     switch (extract32(insn, 23, 6)) {
-    case 0x26: /* Bitfield */
-        disas_bitfield(s, insn);
-        break;
     case 0x27: /* Extract */
         disas_extract(s, insn);
         break;
-- 
2.34.1

Convert the EXTR instruction to decodetree (this is the
only one in the 'Extract" class). This is the last of
the dp-immediate insns in the legacy decoder, so we
can now remove disas_data_proc_imm().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-13-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      |  7 +++
 target/arm/tcg/translate-a64.c | 94 +++++++++++-----------------------
 2 files changed, 36 insertions(+), 65 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@ BFM             . 01 100110 . ...... ...... ..... ..... @bitfield_64
 BFM             . 01 100110 . ...... ...... ..... ..... @bitfield_32
 UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_64
 UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_32
+
+# Extract
+
+&extract        rd rn rm imm sf
+
+EXTR            1 00 100111 1 0 rm:5 imm:6 rn:5 rd:5     &extract sf=1
+EXTR            0 00 100111 0 0 rm:5 0 imm:5 rn:5 rd:5   &extract sf=0
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFM(DisasContext *s, arg_BFM *a)
     return true;
 }
 
-/* Extract
- *   31  30  29 28         23 22   21  20  16 15    10 9    5 4    0
- * +----+------+-------------+---+----+------+--------+------+------+
- * | sf | op21 | 1 0 0 1 1 1 | N | o0 |  Rm  |  imms  |  Rn  |  Rd  |
- * +----+------+-------------+---+----+------+--------+------+------+
- */
-static void disas_extract(DisasContext *s, uint32_t insn)
+static bool trans_EXTR(DisasContext *s, arg_extract *a)
 {
-    unsigned int sf, n, rm, imm, rn, rd, bitsize, op21, op0;
+    TCGv_i64 tcg_rd, tcg_rm, tcg_rn;
 
-    sf = extract32(insn, 31, 1);
-    n = extract32(insn, 22, 1);
-    rm = extract32(insn, 16, 5);
-    imm = extract32(insn, 10, 6);
-    rn = extract32(insn, 5, 5);
-    rd = extract32(insn, 0, 5);
-    op21 = extract32(insn, 29, 2);
-    op0 = extract32(insn, 21, 1);
-    bitsize = sf ? 64 : 32;
+    tcg_rd = cpu_reg(s, a->rd);
 
-    if (sf != n || op21 || op0 || imm >= bitsize) {
-        unallocated_encoding(s);
-    } else {
-        TCGv_i64 tcg_rd, tcg_rm, tcg_rn;
-
-        tcg_rd = cpu_reg(s, rd);
-
-        if (unlikely(imm == 0)) {
-            /* tcg shl_i32/shl_i64 is undefined for 32/64 bit shifts,
-             * so an extract from bit 0 is a special case.
-             */
-            if (sf) {
-                tcg_gen_mov_i64(tcg_rd, cpu_reg(s, rm));
-            } else {
-                tcg_gen_ext32u_i64(tcg_rd, cpu_reg(s, rm));
-            }
+    if (unlikely(a->imm == 0)) {
+        /*
+         * tcg shl_i32/shl_i64 is undefined for 32/64 bit shifts,
+         * so an extract from bit 0 is a special case.
+         */
+        if (a->sf) {
+            tcg_gen_mov_i64(tcg_rd, cpu_reg(s, a->rm));
         } else {
-            tcg_rm = cpu_reg(s, rm);
-            tcg_rn = cpu_reg(s, rn);
+            tcg_gen_ext32u_i64(tcg_rd, cpu_reg(s, a->rm));
+        }
+    } else {
+        tcg_rm = cpu_reg(s, a->rm);
+        tcg_rn = cpu_reg(s, a->rn);
 
-            if (sf) {
-                /* Specialization to ROR happens in EXTRACT2.  */
-                tcg_gen_extract2_i64(tcg_rd, tcg_rm, tcg_rn, imm);
+        if (a->sf) {
+            /* Specialization to ROR happens in EXTRACT2.  */
+            tcg_gen_extract2_i64(tcg_rd, tcg_rm, tcg_rn, a->imm);
+        } else {
+            TCGv_i32 t0 = tcg_temp_new_i32();
+
+            tcg_gen_extrl_i64_i32(t0, tcg_rm);
+            if (a->rm == a->rn) {
+                tcg_gen_rotri_i32(t0, t0, a->imm);
             } else {
-                TCGv_i32 t0 = tcg_temp_new_i32();
-
-                tcg_gen_extrl_i64_i32(t0, tcg_rm);
-                if (rm == rn) {
-                    tcg_gen_rotri_i32(t0, t0, imm);
-                } else {
-                    TCGv_i32 t1 = tcg_temp_new_i32();
-                    tcg_gen_extrl_i64_i32(t1, tcg_rn);
-                    tcg_gen_extract2_i32(t0, t0, t1, imm);
-                }
-                tcg_gen_extu_i32_i64(tcg_rd, t0);
+                TCGv_i32 t1 = tcg_temp_new_i32();
+                tcg_gen_extrl_i64_i32(t1, tcg_rn);
+                tcg_gen_extract2_i32(t0, t0, t1, a->imm);
             }
+            tcg_gen_extu_i32_i64(tcg_rd, t0);
         }
     }
-}
-
-/* Data processing - immediate */
-static void disas_data_proc_imm(DisasContext *s, uint32_t insn)
-{
-    switch (extract32(insn, 23, 6)) {
-    case 0x27: /* Extract */
-        disas_extract(s, insn);
-        break;
-    default:
-        unallocated_encoding(s);
-        break;
-    }
+    return true;
 }
 
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
@@ -XXX,XX +XXX,XX @@ static bool btype_destination_ok(uint32_t insn, bool bt, int btype)
 static void disas_a64_legacy(DisasContext *s, uint32_t insn)
 {
     switch (extract32(insn, 25, 4)) {
-    case 0x8: case 0x9: /* Data processing - immediate */
-        disas_data_proc_imm(s, insn);
-        break;
     case 0xa: case 0xb: /* Branch, exception generation and system insns */
         disas_b_exc_sys(s, insn);
         break;
-- 
2.34.1

Convert the unconditional branch immediate insns B and BL to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-14-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      |  9 +++++++++
 target/arm/tcg/translate-a64.c | 31 +++++++++++--------------------
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@
 
 &ri              rd imm
 &rri_sf          rd rn imm sf
+&i               imm
 
 
 ### Data Processing - Immediate
@@ -XXX,XX +XXX,XX @@ UBFM            . 10 100110 . ...... ...... ..... ..... @bitfield_32
 
 EXTR            1 00 100111 1 0 rm:5 imm:6 rn:5 rd:5     &extract sf=1
 EXTR            0 00 100111 0 0 rm:5 0 imm:5 rn:5 rd:5   &extract sf=0
+
+# Branches
+
+%imm26   0:s26 !function=times_4
+@branch         . ..... .......................... &i imm=%imm26
+
+B               0 00101 .......................... @branch
+BL              1 00101 .......................... @branch
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static inline AArch64DecodeFn *lookup_disas_fn(const AArch64DecodeTable *table,
  * match up with those in the manual.
  */
 
-/* Unconditional branch (immediate)
- *   31  30       26 25                                  0
- * +----+-----------+-------------------------------------+
- * | op | 0 0 1 0 1 |                 imm26               |
- * +----+-----------+-------------------------------------+
- */
-static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
+static bool trans_B(DisasContext *s, arg_i *a)
 {
-    int64_t diff = sextract32(insn, 0, 26) * 4;
-
-    if (insn & (1U << 31)) {
-        /* BL Branch with link */
-        gen_pc_plus_diff(s, cpu_reg(s, 30), curr_insn_len(s));
-    }
-
-    /* B Branch / BL Branch with link */
     reset_btype(s);
-    gen_goto_tb(s, 0, diff);
+    gen_goto_tb(s, 0, a->imm);
+    return true;
+}
+
+static bool trans_BL(DisasContext *s, arg_i *a)
+{
+    gen_pc_plus_diff(s, cpu_reg(s, 30), curr_insn_len(s));
+    reset_btype(s);
+    gen_goto_tb(s, 0, a->imm);
+    return true;
 }
 
 /* Compare and branch (immediate)
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
 {
     switch (extract32(insn, 25, 7)) {
-    case 0x0a: case 0x0b:
-    case 0x4a: case 0x4b: /* Unconditional branch (immediate) */
-        disas_uncond_b_imm(s, insn);
-        break;
     case 0x1a: case 0x5a: /* Compare & branch (immediate) */
         disas_comp_b_imm(s, insn);
         break;
-- 
2.34.1

Convert the compare-and-branch-immediate insns CBZ and CBNZ
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-15-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      |  5 +++++
 target/arm/tcg/translate-a64.c | 26 ++++++--------------------
 2 files changed, 11 insertions(+), 20 deletions(-)

Convert the test-and-branch-immediate insns TBZ and TBNZ
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-16-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      |  6 ++++++
 target/arm/tcg/translate-a64.c | 25 +++++--------------------
 2 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@ BL              1 00101 .......................... @branch
 &cbz     rt imm sf nz
 
 CBZ             sf:1 011010 nz:1 ................... rt:5 &cbz imm=%imm19
+
+%imm14     5:s14 !function=times_4
+%imm31_19  31:1 19:5
+&tbz       rt imm nz bitpos
+
+TBZ             . 011011 nz:1 ..... .............. rt:5 &tbz  imm=%imm14 bitpos=%imm31_19
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_CBZ(DisasContext *s, arg_cbz *a)
     return true;
 }
 
-/* Test and branch (immediate)
- *   31  30         25  24  23   19 18          5 4    0
- * +----+-------------+----+-------+-------------+------+
- * | b5 | 0 1 1 0 1 1 | op |  b40  |    imm14    |  Rt  |
- * +----+-------------+----+-------+-------------+------+
- */
-static void disas_test_b_imm(DisasContext *s, uint32_t insn)
+static bool trans_TBZ(DisasContext *s, arg_tbz *a)
 {
-    unsigned int bit_pos, op, rt;
-    int64_t diff;
     DisasLabel match;
     TCGv_i64 tcg_cmp;
 
-    bit_pos = (extract32(insn, 31, 1) << 5) | extract32(insn, 19, 5);
-    op = extract32(insn, 24, 1); /* 0: TBZ; 1: TBNZ */
-    diff = sextract32(insn, 5, 14) * 4;
-    rt = extract32(insn, 0, 5);
-
     tcg_cmp = tcg_temp_new_i64();
-    tcg_gen_andi_i64(tcg_cmp, cpu_reg(s, rt), (1ULL << bit_pos));
+    tcg_gen_andi_i64(tcg_cmp, cpu_reg(s, a->rt), 1ULL << a->bitpos);
 
     reset_btype(s);
 
     match = gen_disas_label(s);
-    tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
+    tcg_gen_brcondi_i64(a->nz ? TCG_COND_NE : TCG_COND_EQ,
                         tcg_cmp, 0, match.label);
     gen_goto_tb(s, 0, 4);
     set_disas_label(s, match);
-    gen_goto_tb(s, 1, diff);
+    gen_goto_tb(s, 1, a->imm);
+    return true;
 }
 
 /* Conditional branch (immediate)
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
 {
     switch (extract32(insn, 25, 7)) {
-    case 0x1b: case 0x5b: /* Test & branch (immediate) */
-        disas_test_b_imm(s, insn);
-        break;
     case 0x2a: /* Conditional branch (immediate) */
         disas_cond_b_imm(s, insn);
         break;
-- 
2.34.1

Convert the immediate conditional branch insn B.cond to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-17-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      |  2 ++
 target/arm/tcg/translate-a64.c | 30 ++++++------------------------
 2 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@ CBZ             sf:1 011010 nz:1 ................... rt:5 &cbz imm=%imm19
 &tbz       rt imm nz bitpos
 
 TBZ             . 011011 nz:1 ..... .............. rt:5 &tbz  imm=%imm14 bitpos=%imm31_19
+
+B_cond          0101010 0 ................... 0 cond:4 imm=%imm19
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_TBZ(DisasContext *s, arg_tbz *a)
     return true;
 }
 
-/* Conditional branch (immediate)
- *  31           25  24  23                  5   4  3    0
- * +---------------+----+---------------------+----+------+
- * | 0 1 0 1 0 1 0 | o1 |         imm19       | o0 | cond |
- * +---------------+----+---------------------+----+------+
- */
-static void disas_cond_b_imm(DisasContext *s, uint32_t insn)
+static bool trans_B_cond(DisasContext *s, arg_B_cond *a)
 {
-    unsigned int cond;
-    int64_t diff;
-
-    if ((insn & (1 << 4)) || (insn & (1 << 24))) {
-        unallocated_encoding(s);
-        return;
-    }
-    diff = sextract32(insn, 5, 19) * 4;
-    cond = extract32(insn, 0, 4);
-
     reset_btype(s);
-    if (cond < 0x0e) {
+    if (a->cond < 0x0e) {
         /* genuinely conditional branches */
         DisasLabel match = gen_disas_label(s);
-        arm_gen_test_cc(cond, match.label);
+        arm_gen_test_cc(a->cond, match.label);
         gen_goto_tb(s, 0, 4);
         set_disas_label(s, match);
-        gen_goto_tb(s, 1, diff);
+        gen_goto_tb(s, 1, a->imm);
     } else {
         /* 0xe and 0xf are both "always" conditions */
-        gen_goto_tb(s, 0, diff);
+        gen_goto_tb(s, 0, a->imm);
     }
+    return true;
 }
 
 /* HINT instruction group, including various allocated HINTs */
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
 {
     switch (extract32(insn, 25, 7)) {
-    case 0x2a: /* Conditional branch (immediate) */
-        disas_cond_b_imm(s, insn);
-        break;
     case 0x6a: /* Exception generation / System */
         if (insn & (1 << 24)) {
             if (extract32(insn, 22, 2) == 0) {
-- 
2.34.1

Convert the simple (non-pointer-auth) BR, BLR and RET insns
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-18-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      |  5 ++++
 target/arm/tcg/translate-a64.c | 55 ++++++++++++++++++++++++++++++----
 2 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@
 # This file is processed by scripts/decodetree.py
 #
 
+&r               rn
 &ri              rd imm
 &rri_sf          rd rn imm sf
 &i               imm
@@ -XXX,XX +XXX,XX @@ CBZ             sf:1 011010 nz:1 ................... rt:5 &cbz imm=%imm19
 TBZ             . 011011 nz:1 ..... .............. rt:5 &tbz  imm=%imm14 bitpos=%imm31_19
 
 B_cond          0101010 0 ................... 0 cond:4 imm=%imm19
+
+BR              1101011 0000 11111 000000 rn:5 00000 &r
+BLR             1101011 0001 11111 000000 rn:5 00000 &r
+RET             1101011 0010 11111 000000 rn:5 00000 &r
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_B_cond(DisasContext *s, arg_B_cond *a)
     return true;
 }
 
+static void set_btype_for_br(DisasContext *s, int rn)
+{
+    if (dc_isar_feature(aa64_bti, s)) {
+        /* BR to {x16,x17} or !guard -> 1, else 3.  */
+        set_btype(s, rn == 16 || rn == 17 || !s->guarded_page ? 1 : 3);
+    }
+}
+
+static void set_btype_for_blr(DisasContext *s)
+{
+    if (dc_isar_feature(aa64_bti, s)) {
+        /* BLR sets BTYPE to 2, regardless of source guarded page.  */
+        set_btype(s, 2);
+    }
+}
+
+static bool trans_BR(DisasContext *s, arg_r *a)
+{
+    gen_a64_set_pc(s, cpu_reg(s, a->rn));
+    set_btype_for_br(s, a->rn);
+    s->base.is_jmp = DISAS_JUMP;
+    return true;
+}
+
+static bool trans_BLR(DisasContext *s, arg_r *a)
+{
+    TCGv_i64 dst = cpu_reg(s, a->rn);
+    TCGv_i64 lr = cpu_reg(s, 30);
+    if (dst == lr) {
+        TCGv_i64 tmp = tcg_temp_new_i64();
+        tcg_gen_mov_i64(tmp, dst);
+        dst = tmp;
+    }
+    gen_pc_plus_diff(s, lr, curr_insn_len(s));
+    gen_a64_set_pc(s, dst);
+    set_btype_for_blr(s);
+    s->base.is_jmp = DISAS_JUMP;
+    return true;
+}
+
+static bool trans_RET(DisasContext *s, arg_r *a)
+{
+    gen_a64_set_pc(s, cpu_reg(s, a->rn));
+    s->base.is_jmp = DISAS_JUMP;
+    return true;
+}
+
 /* HINT instruction group, including various allocated HINTs */
 static void handle_hint(DisasContext *s, uint32_t insn,
                         unsigned int op1, unsigned int op2, unsigned int crm)
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
         btype_mod = opc;
         switch (op3) {
         case 0:
-            /* BR, BLR, RET */
-            if (op4 != 0) {
-                goto do_unallocated;
-            }
-            dst = cpu_reg(s, rn);
-            break;
+            /* BR, BLR, RET : handled in decodetree */
+            goto do_unallocated;
 
         case 2:
         case 3:
-- 
2.34.1

Convert the single-register pointer-authentication variants of BR,
BLR, RET to decodetree. (BRAA/BLRAA are in a different branch of
the legacy decoder and will be dealt with in the next commit.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-19-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      |   7 ++
 target/arm/tcg/translate-a64.c | 132 +++++++++++++++++++--------------
 2 files changed, 84 insertions(+), 55 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@ B_cond          0101010 0 ................... 0 cond:4 imm=%imm19
 BR              1101011 0000 11111 000000 rn:5 00000 &r
 BLR             1101011 0001 11111 000000 rn:5 00000 &r
 RET             1101011 0010 11111 000000 rn:5 00000 &r
+
+&braz       rn m
+BRAZ            1101011 0000 11111 00001 m:1 rn:5 11111 &braz   # BRAAZ, BRABZ
+BLRAZ           1101011 0001 11111 00001 m:1 rn:5 11111 &braz   # BLRAAZ, BLRABZ
+
+&reta       m
+RETA            1101011 0010 11111 00001 m:1 11111 11111 &reta  # RETAA, RETAB
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_RET(DisasContext *s, arg_r *a)
     return true;
 }
 
+static TCGv_i64 auth_branch_target(DisasContext *s, TCGv_i64 dst,
+                                   TCGv_i64 modifier, bool use_key_a)
+{
+    TCGv_i64 truedst;
+    /*
+     * Return the branch target for a BRAA/RETA/etc, which is either
+     * just the destination dst, or that value with the pauth check
+     * done and the code removed from the high bits.
+     */
+    if (!s->pauth_active) {
+        return dst;
+    }
+
+    truedst = tcg_temp_new_i64();
+    if (use_key_a) {
+        gen_helper_autia(truedst, cpu_env, dst, modifier);
+    } else {
+        gen_helper_autib(truedst, cpu_env, dst, modifier);
+    }
+    return truedst;
+}
+
+static bool trans_BRAZ(DisasContext *s, arg_braz *a)
+{
+    TCGv_i64 dst;
+
+    if (!dc_isar_feature(aa64_pauth, s)) {
+        return false;
+    }
+
+    dst = auth_branch_target(s, cpu_reg(s, a->rn), tcg_constant_i64(0), !a->m);
+    gen_a64_set_pc(s, dst);
+    set_btype_for_br(s, a->rn);
+    s->base.is_jmp = DISAS_JUMP;
+    return true;
+}
+
+static bool trans_BLRAZ(DisasContext *s, arg_braz *a)
+{
+    TCGv_i64 dst, lr;
+
+    if (!dc_isar_feature(aa64_pauth, s)) {
+        return false;
+    }
+
+    dst = auth_branch_target(s, cpu_reg(s, a->rn), tcg_constant_i64(0), !a->m);
+    lr = cpu_reg(s, 30);
+    if (dst == lr) {
+        TCGv_i64 tmp = tcg_temp_new_i64();
+        tcg_gen_mov_i64(tmp, dst);
+        dst = tmp;
+    }
+    gen_pc_plus_diff(s, lr, curr_insn_len(s));
+    gen_a64_set_pc(s, dst);
+    set_btype_for_blr(s);
+    s->base.is_jmp = DISAS_JUMP;
+    return true;
+}
+
+static bool trans_RETA(DisasContext *s, arg_reta *a)
+{
+    TCGv_i64 dst;
+
+    dst = auth_branch_target(s, cpu_reg(s, 30), cpu_X[31], !a->m);
+    gen_a64_set_pc(s, dst);
+    s->base.is_jmp = DISAS_JUMP;
+    return true;
+}
+
 /* HINT instruction group, including various allocated HINTs */
 static void handle_hint(DisasContext *s, uint32_t insn,
                         unsigned int op1, unsigned int op2, unsigned int crm)
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
     }
 
     switch (opc) {
-    case 0: /* BR */
-    case 1: /* BLR */
-    case 2: /* RET */
-        btype_mod = opc;
-        switch (op3) {
-        case 0:
-            /* BR, BLR, RET : handled in decodetree */
-            goto do_unallocated;
-
-        case 2:
-        case 3:
-            if (!dc_isar_feature(aa64_pauth, s)) {
-                goto do_unallocated;
-            }
-            if (opc == 2) {
-                /* RETAA, RETAB */
-                if (rn != 0x1f || op4 != 0x1f) {
-                    goto do_unallocated;
-                }
-                rn = 30;
-                modifier = cpu_X[31];
-            } else {
-                /* BRAAZ, BRABZ, BLRAAZ, BLRABZ */
-                if (op4 != 0x1f) {
-                    goto do_unallocated;
-                }
-                modifier = tcg_constant_i64(0);
-            }
-            if (s->pauth_active) {
-                dst = tcg_temp_new_i64();
-                if (op3 == 2) {
-                    gen_helper_autia(dst, cpu_env, cpu_reg(s, rn), modifier);
-                } else {
-                    gen_helper_autib(dst, cpu_env, cpu_reg(s, rn), modifier);
-                }
-            } else {
-                dst = cpu_reg(s, rn);
-            }
-            break;
-
-        default:
-            goto do_unallocated;
-        }
-        /* BLR also needs to load return address */
-        if (opc == 1) {
-            TCGv_i64 lr = cpu_reg(s, 30);
-            if (dst == lr) {
-                TCGv_i64 tmp = tcg_temp_new_i64();
-                tcg_gen_mov_i64(tmp, dst);
-                dst = tmp;
-            }
-            gen_pc_plus_diff(s, lr, curr_insn_len(s));
-        }
-        gen_a64_set_pc(s, dst);
-        break;
+    case 0:
+    case 1:
+    case 2:
+        /*
+         * BR, BLR, RET, RETAA, RETAB, BRAAZ, BRABZ, BLRAAZ, BLRABZ:
+         * handled in decodetree
+         */
+        goto do_unallocated;
 
     case 8: /* BRAA */
     case 9: /* BLRAA */
-- 
2.34.1

Convert the last four BR-with-pointer-auth insns to decodetree.
The remaining cases in the outer switch in disas_uncond_b_reg()
all return early rather than leaving the case statement, so we
can delete the now-unused code at the end of that function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-20-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      |  4 ++
 target/arm/tcg/translate-a64.c | 97 ++++++++++++++--------------------
 2 files changed, 43 insertions(+), 58 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@ BLRAZ           1101011 0001 11111 00001 m:1 rn:5 11111 &braz   # BLRAAZ, BLRABZ
 
 &reta       m
 RETA            1101011 0010 11111 00001 m:1 11111 11111 &reta  # RETAA, RETAB
+
+&bra        rn rm m
+BRA             1101011 1000 11111 00001 m:1 rn:5 rm:5 &bra # BRAA, BRAB
+BLRA            1101011 1001 11111 00001 m:1 rn:5 rm:5 &bra # BLRAA, BLRAB
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_RETA(DisasContext *s, arg_reta *a)
     return true;
 }
 
+static bool trans_BRA(DisasContext *s, arg_bra *a)
+{
+    TCGv_i64 dst;
+
+    if (!dc_isar_feature(aa64_pauth, s)) {
+        return false;
+    }
+    dst = auth_branch_target(s, cpu_reg(s,a->rn), cpu_reg_sp(s, a->rm), !a->m);
+    gen_a64_set_pc(s, dst);
+    set_btype_for_br(s, a->rn);
+    s->base.is_jmp = DISAS_JUMP;
+    return true;
+}
+
+static bool trans_BLRA(DisasContext *s, arg_bra *a)
+{
+    TCGv_i64 dst, lr;
+
+    if (!dc_isar_feature(aa64_pauth, s)) {
+        return false;
+    }
+    dst = auth_branch_target(s, cpu_reg(s, a->rn), cpu_reg_sp(s, a->rm), !a->m);
+    lr = cpu_reg(s, 30);
+    if (dst == lr) {
+        TCGv_i64 tmp = tcg_temp_new_i64();
+        tcg_gen_mov_i64(tmp, dst);
+        dst = tmp;
+    }
+    gen_pc_plus_diff(s, lr, curr_insn_len(s));
+    gen_a64_set_pc(s, dst);
+    set_btype_for_blr(s);
+    s->base.is_jmp = DISAS_JUMP;
+    return true;
+}
+
 /* HINT instruction group, including various allocated HINTs */
 static void handle_hint(DisasContext *s, uint32_t insn,
                         unsigned int op1, unsigned int op2, unsigned int crm)
@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
 static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 {
     unsigned int opc, op2, op3, rn, op4;
-    unsigned btype_mod = 2;   /* 0: BR, 1: BLR, 2: other */
     TCGv_i64 dst;
     TCGv_i64 modifier;
 
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
     case 0:
     case 1:
     case 2:
+    case 8:
+    case 9:
         /*
-         * BR, BLR, RET, RETAA, RETAB, BRAAZ, BRABZ, BLRAAZ, BLRABZ:
-         * handled in decodetree
+         * BR, BLR, RET, RETAA, RETAB, BRAAZ, BRABZ, BLRAAZ, BLRABZ,
+         * BRAA, BLRAA: handled in decodetree
          */
         goto do_unallocated;
 
-    case 8: /* BRAA */
-    case 9: /* BLRAA */
-        if (!dc_isar_feature(aa64_pauth, s)) {
-            goto do_unallocated;
-        }
-        if ((op3 & ~1) != 2) {
-            goto do_unallocated;
-        }
-        btype_mod = opc & 1;
-        if (s->pauth_active) {
-            dst = tcg_temp_new_i64();
-            modifier = cpu_reg_sp(s, op4);
-            if (op3 == 2) {
-                gen_helper_autia(dst, cpu_env, cpu_reg(s, rn), modifier);
-            } else {
-                gen_helper_autib(dst, cpu_env, cpu_reg(s, rn), modifier);
-            }
-        } else {
-            dst = cpu_reg(s, rn);
-        }
-        /* BLRAA also needs to load return address */
-        if (opc == 9) {
-            TCGv_i64 lr = cpu_reg(s, 30);
-            if (dst == lr) {
-                TCGv_i64 tmp = tcg_temp_new_i64();
-                tcg_gen_mov_i64(tmp, dst);
-                dst = tmp;
-            }
-            gen_pc_plus_diff(s, lr, curr_insn_len(s));
-        }
-        gen_a64_set_pc(s, dst);
-        break;
-
     case 4: /* ERET */
         if (s->current_el == 0) {
             goto do_unallocated;
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
         unallocated_encoding(s);
         return;
     }
-
-    switch (btype_mod) {
-    case 0: /* BR */
-        if (dc_isar_feature(aa64_bti, s)) {
-            /* BR to {x16,x17} or !guard -> 1, else 3.  */
-            set_btype(s, rn == 16 || rn == 17 || !s->guarded_page ? 1 : 3);
-        }
-        break;
-
-    case 1: /* BLR */
-        if (dc_isar_feature(aa64_bti, s)) {
-            /* BLR sets BTYPE to 2, regardless of source guarded page.  */
-            set_btype(s, 2);
-        }
-        break;
-
-    default: /* RET or none of the above.  */
-        /* BTYPE will be set to 0 by normal end-of-insn processing.  */
-        break;
-    }
-
-    s->base.is_jmp = DISAS_JUMP;
 }
 
 /* Branches, exception generating and system instructions */
-- 
2.34.1

Convert the exception-return insns ERET, ERETA and ERETB to
decodetree. These were the last insns left in the legacy
decoder function disas_uncond_reg_b(), which allows us to
remove it.

The old decoder explicitly decoded the DRPS instruction,
only in order to call unallocated_encoding() on it, exactly
as would have happened if it hadn't decoded it. This is
because this insn always UNDEFs unless the CPU is in
halting-debug state, which we don't emulate. So we list
the pattern in a comment in a64.decode, but don't actively
decode it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512144106.3608981-21-peter.maydell@linaro.org
---
 target/arm/tcg/a64.decode      |   8 ++
 target/arm/tcg/translate-a64.c | 163 +++++++++++----------------------
 2 files changed, 63 insertions(+), 108 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -XXX,XX +XXX,XX @@ RETA            1101011 0010 11111 00001 m:1 11111 11111 &reta  # RETAA, RETAB
 &bra        rn rm m
 BRA             1101011 1000 11111 00001 m:1 rn:5 rm:5 &bra # BRAA, BRAB
 BLRA            1101011 1001 11111 00001 m:1 rn:5 rm:5 &bra # BLRAA, BLRAB
+
+ERET            1101011 0100 11111 000000 11111 00000
+ERETA           1101011 0100 11111 00001 m:1 11111 11111 &reta  # ERETAA, ERETAB
+
+# We don't need to decode DRPS because it always UNDEFs except when
+# the processor is in halting debug state (which we don't implement).
+# The pattern is listed here as documentation.
+# DRPS            1101011 0101 11111 000000 11111 00000
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BLRA(DisasContext *s, arg_bra *a)
     return true;
 }
 
+static bool trans_ERET(DisasContext *s, arg_ERET *a)
+{
+    TCGv_i64 dst;
+
+    if (s->current_el == 0) {
+        return false;
+    }
+    if (s->fgt_eret) {
+        gen_exception_insn_el(s, 0, EXCP_UDEF, 0, 2);
+        return true;
+    }
+    dst = tcg_temp_new_i64();
+    tcg_gen_ld_i64(dst, cpu_env,
+                   offsetof(CPUARMState, elr_el[s->current_el]));
+
+    if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+        gen_io_start();
+    }
+
+    gen_helper_exception_return(cpu_env, dst);
+    /* Must exit loop to check un-masked IRQs */
+    s->base.is_jmp = DISAS_EXIT;
+    return true;
+}
+
+static bool trans_ERETA(DisasContext *s, arg_reta *a)
+{
+    TCGv_i64 dst;
+
+    if (!dc_isar_feature(aa64_pauth, s)) {
+        return false;
+    }
+    if (s->current_el == 0) {
+        return false;
+    }
+    /* The FGT trap takes precedence over an auth trap. */
+    if (s->fgt_eret) {
+        gen_exception_insn_el(s, 0, EXCP_UDEF, a->m ? 3 : 2, 2);
+        return true;
+    }
+    dst = tcg_temp_new_i64();
+    tcg_gen_ld_i64(dst, cpu_env,
+                   offsetof(CPUARMState, elr_el[s->current_el]));
+
+    dst = auth_branch_target(s, dst, cpu_X[31], !a->m);
+    if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+        gen_io_start();
+    }
+
+    gen_helper_exception_return(cpu_env, dst);
+    /* Must exit loop to check un-masked IRQs */
+    s->base.is_jmp = DISAS_EXIT;
+    return true;
+}
+
 /* HINT instruction group, including various allocated HINTs */
 static void handle_hint(DisasContext *s, uint32_t insn,
                         unsigned int op1, unsigned int op2, unsigned int crm)
@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
     }
 }
 
-/* Unconditional branch (register)
- *  31           25 24   21 20   16 15   10 9    5 4     0
- * +---------------+-------+-------+-------+------+-------+
- * | 1 1 0 1 0 1 1 |  opc  |  op2  |  op3  |  Rn  |  op4  |
- * +---------------+-------+-------+-------+------+-------+
- */
-static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
-{
-    unsigned int opc, op2, op3, rn, op4;
-    TCGv_i64 dst;
-    TCGv_i64 modifier;
-
-    opc = extract32(insn, 21, 4);
-    op2 = extract32(insn, 16, 5);
-    op3 = extract32(insn, 10, 6);
-    rn = extract32(insn, 5, 5);
-    op4 = extract32(insn, 0, 5);
-
-    if (op2 != 0x1f) {
-        goto do_unallocated;
-    }
-
-    switch (opc) {
-    case 0:
-    case 1:
-    case 2:
-    case 8:
-    case 9:
-        /*
-         * BR, BLR, RET, RETAA, RETAB, BRAAZ, BRABZ, BLRAAZ, BLRABZ,
-         * BRAA, BLRAA: handled in decodetree
-         */
-        goto do_unallocated;
-
-    case 4: /* ERET */
-        if (s->current_el == 0) {
-            goto do_unallocated;
-        }
-        switch (op3) {
-        case 0: /* ERET */
-            if (op4 != 0) {
-                goto do_unallocated;
-            }
-            if (s->fgt_eret) {
-                gen_exception_insn_el(s, 0, EXCP_UDEF, syn_erettrap(op3), 2);
-                return;
-            }
-            dst = tcg_temp_new_i64();
-            tcg_gen_ld_i64(dst, cpu_env,
-                           offsetof(CPUARMState, elr_el[s->current_el]));
-            break;
-
-        case 2: /* ERETAA */
-        case 3: /* ERETAB */
-            if (!dc_isar_feature(aa64_pauth, s)) {
-                goto do_unallocated;
-            }
-            if (rn != 0x1f || op4 != 0x1f) {
-                goto do_unallocated;
-            }
-            /* The FGT trap takes precedence over an auth trap. */
-            if (s->fgt_eret) {
-                gen_exception_insn_el(s, 0, EXCP_UDEF, syn_erettrap(op3), 2);
-                return;
-            }
-            dst = tcg_temp_new_i64();
-            tcg_gen_ld_i64(dst, cpu_env,
-                           offsetof(CPUARMState, elr_el[s->current_el]));
-            if (s->pauth_active) {
-                modifier = cpu_X[31];
-                if (op3 == 2) {
-                    gen_helper_autia(dst, cpu_env, dst, modifier);
-                } else {
-                    gen_helper_autib(dst, cpu_env, dst, modifier);
-                }
-            }
-            break;
-
-        default:
-            goto do_unallocated;
-        }
-        if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-            gen_io_start();
-        }
-
-        gen_helper_exception_return(cpu_env, dst);
-        /* Must exit loop to check un-masked IRQs */
-        s->base.is_jmp = DISAS_EXIT;
-        return;
-
-    case 5: /* DRPS */
-        if (op3 != 0 || op4 != 0 || rn != 0x1f) {
-            goto do_unallocated;
-        } else {
-            unallocated_encoding(s);
-        }
-        return;
-
-    default:
-    do_unallocated:
-        unallocated_encoding(s);
-        return;
-    }
-}
-
 /* Branches, exception generating and system instructions */
 static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
 {
@@ -XXX,XX +XXX,XX @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
             disas_exc(s, insn);
         }
         break;
-    case 0x6b: /* Unconditional branch (register) */
-        disas_uncond_b_reg(s, insn);
-        break;
     default:
         unallocated_encoding(s);
         break;
-- 
2.34.1

The IMPDEF sysreg L2CTLR_EL1 found on the Cortex-A35, A53, A57, A72
and which we (arguably dubiously) also provide in '-cpu max' has a
2 bit field for the number of processors in the cluster. On real
hardware this must be sufficient because it can only be configured
with up to 4 CPUs in the cluster. However on QEMU if the board code
does not explicitly configure the code into clusters with the right
CPU count we default to "give the value assuming that all CPUs in
the system are in a single cluster", which might be too big to fit
in the field.

Instead of just overflowing this 2-bit field, saturate to 3 (meaning
"4 CPUs", so at least we don't overwrite other fields in the register.
It's unlikely that any guest code really cares about the value in
this field; at least, if it does it probably also wants the system
to be more closely matching real hardware, i.e. not to have more
than 4 CPUs.

This issue has been present since the L2CTLR was first added in
commit 377a44ec8f2fac5b back in 2014. It was only noticed because
Coverity complains (CID 1509227) that the shift might overflow 32 bits
and inadvertently sign extend into the top half of the 64 bit value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230512170223.3801643-2-peter.maydell@linaro.org
---
 target/arm/cortex-regs.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/cortex-regs.c b/target/arm/cortex-regs.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cortex-regs.c
+++ b/target/arm/cortex-regs.c
@@ -XXX,XX +XXX,XX @@ static uint64_t l2ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
     ARMCPU *cpu = env_archcpu(env);
 
-    /* Number of cores is in [25:24]; otherwise we RAZ */
-    return (cpu->core_count - 1) << 24;
+    /*
+     * Number of cores is in [25:24]; otherwise we RAZ.
+     * If the board didn't configure the CPUs into clusters,
+     * we default to "all CPUs in one cluster", which might be
+     * more than the 4 that the hardware permits and which is
+     * all you can report in this two-bit field. Saturate to
+     * 0b11 (== 4 CPUs) rather than overflowing the field.
+     */
+    return MIN(cpu->core_count - 1, 3) << 24;
 }
 
 static const ARMCPRegInfo cortex_a72_a57_a53_cp_reginfo[] = {
-- 
2.34.1

In the vexpress board code, we allocate a new MemoryRegion at the top
of vexpress_common_init() but only set it up and use it inside the
"if (map[VE_NORFLASHALIAS] != -1)" conditional, so we leak it if not.
This isn't a very interesting leak as it's a tiny amount of memory
once at startup, but it's easy to fix.

We could silence Coverity simply by moving the g_new() into the
if() block, but this use of g_new(MemoryRegion, 1) is a legacy from
when this board model was originally written; we wouldn't do that
if we wrote it today. The MemoryRegions are conceptually a part of
the board and must not go away until the whole board is done with
(at the end of the simulation), so they belong in its state struct.

This machine already has a VexpressMachineState struct that extends
MachineState, so statically put the MemoryRegions in there instead of
dynamically allocating them separately at runtime.

Spotted by Coverity (CID 1509083).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230512170223.3801643-3-peter.maydell@linaro.org
---
 hw/arm/vexpress.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -XXX,XX +XXX,XX @@ struct VexpressMachineClass {
 
 struct VexpressMachineState {
     MachineState parent;
+    MemoryRegion vram;
+    MemoryRegion sram;
+    MemoryRegion flashalias;
+    MemoryRegion lowram;
+    MemoryRegion a15sram;
     bool secure;
     bool virt;
 };
@@ -XXX,XX +XXX,XX @@ struct VexpressMachineState {
 #define TYPE_VEXPRESS_A15_MACHINE   MACHINE_TYPE_NAME("vexpress-a15")
 OBJECT_DECLARE_TYPE(VexpressMachineState, VexpressMachineClass, VEXPRESS_MACHINE)
 
-typedef void DBoardInitFn(const VexpressMachineState *machine,
+typedef void DBoardInitFn(VexpressMachineState *machine,
                           ram_addr_t ram_size,
                           const char *cpu_type,
                           qemu_irq *pic);
@@ -XXX,XX +XXX,XX @@ static void init_cpus(MachineState *ms, const char *cpu_type,
     }
 }
 
-static void a9_daughterboard_init(const VexpressMachineState *vms,
+static void a9_daughterboard_init(VexpressMachineState *vms,
                                   ram_addr_t ram_size,
                                   const char *cpu_type,
                                   qemu_irq *pic)
 {
     MachineState *machine = MACHINE(vms);
     MemoryRegion *sysmem = get_system_memory();
-    MemoryRegion *lowram = g_new(MemoryRegion, 1);
     ram_addr_t low_ram_size;
 
     if (ram_size > 0x40000000) {
@@ -XXX,XX +XXX,XX @@ static void a9_daughterboard_init(const VexpressMachineState *vms,
      * address space should in theory be remappable to various
      * things including ROM or RAM; we always map the RAM there.
      */
-    memory_region_init_alias(lowram, NULL, "vexpress.lowmem", machine->ram,
-                             0, low_ram_size);
-    memory_region_add_subregion(sysmem, 0x0, lowram);
+    memory_region_init_alias(&vms->lowram, NULL, "vexpress.lowmem",
+                             machine->ram, 0, low_ram_size);
+    memory_region_add_subregion(sysmem, 0x0, &vms->lowram);
     memory_region_add_subregion(sysmem, 0x60000000, machine->ram);
 
     /* 0x1e000000 A9MPCore (SCU) private memory region */
@@ -XXX,XX +XXX,XX @@ static VEDBoardInfo a9_daughterboard = {
     .init = a9_daughterboard_init,
 };
 
-static void a15_daughterboard_init(const VexpressMachineState *vms,
+static void a15_daughterboard_init(VexpressMachineState *vms,
                                    ram_addr_t ram_size,
                                    const char *cpu_type,
                                    qemu_irq *pic)
 {
     MachineState *machine = MACHINE(vms);
     MemoryRegion *sysmem = get_system_memory();
-    MemoryRegion *sram = g_new(MemoryRegion, 1);
 
     {
         /* We have to use a separate 64 bit variable here to avoid the gcc
@@ -XXX,XX +XXX,XX @@ static void a15_daughterboard_init(const VexpressMachineState *vms,
     /* 0x2b060000: SP805 watchdog: not modelled */
     /* 0x2b0a0000: PL341 dynamic memory controller: not modelled */
     /* 0x2e000000: system SRAM */
-    memory_region_init_ram(sram, NULL, "vexpress.a15sram", 0x10000,
+    memory_region_init_ram(&vms->a15sram, NULL, "vexpress.a15sram", 0x10000,
                            &error_fatal);
-    memory_region_add_subregion(sysmem, 0x2e000000, sram);
+    memory_region_add_subregion(sysmem, 0x2e000000, &vms->a15sram);
 
     /* 0x7ffb0000: DMA330 DMA controller: not modelled */
     /* 0x7ffd0000: PL354 static memory controller: not modelled */
@@ -XXX,XX +XXX,XX @@ static void vexpress_common_init(MachineState *machine)
     I2CBus *i2c;
     ram_addr_t vram_size, sram_size;
     MemoryRegion *sysmem = get_system_memory();
-    MemoryRegion *vram = g_new(MemoryRegion, 1);
-    MemoryRegion *sram = g_new(MemoryRegion, 1);
-    MemoryRegion *flashalias = g_new(MemoryRegion, 1);
-    MemoryRegion *flash0mem;
     const hwaddr *map = daughterboard->motherboard_map;
     int i;
 
@@ -XXX,XX +XXX,XX @@ static void vexpress_common_init(MachineState *machine)
 
     if (map[VE_NORFLASHALIAS] != -1) {
         /* Map flash 0 as an alias into low memory */
+        MemoryRegion *flash0mem;
         flash0mem = sysbus_mmio_get_region(SYS_BUS_DEVICE(pflash0), 0);
-        memory_region_init_alias(flashalias, NULL, "vexpress.flashalias",
+        memory_region_init_alias(&vms->flashalias, NULL, "vexpress.flashalias",
                                  flash0mem, 0, VEXPRESS_FLASH_SIZE);
-        memory_region_add_subregion(sysmem, map[VE_NORFLASHALIAS], flashalias);
+        memory_region_add_subregion(sysmem, map[VE_NORFLASHALIAS], &vms->flashalias);
     }
 
     dinfo = drive_get(IF_PFLASH, 0, 1);
     ve_pflash_cfi01_register(map[VE_NORFLASH1], "vexpress.flash1", dinfo);
 
     sram_size = 0x2000000;
-    memory_region_init_ram(sram, NULL, "vexpress.sram", sram_size,
+    memory_region_init_ram(&vms->sram, NULL, "vexpress.sram", sram_size,
                            &error_fatal);
-    memory_region_add_subregion(sysmem, map[VE_SRAM], sram);
+    memory_region_add_subregion(sysmem, map[VE_SRAM], &vms->sram);
 
     vram_size = 0x800000;
-    memory_region_init_ram(vram, NULL, "vexpress.vram", vram_size,
+    memory_region_init_ram(&vms->vram, NULL, "vexpress.vram", vram_size,
                            &error_fatal);
-    memory_region_add_subregion(sysmem, map[VE_VIDEORAM], vram);
+    memory_region_add_subregion(sysmem, map[VE_VIDEORAM], &vms->vram);
 
     /* 0x4e000000 LAN9118 Ethernet */
     if (nd_table[0].used) {
-- 
2.34.1

Convert the u2f.txt file to rST, and place it in the right place
in our manual layout. The old text didn't fit very well into our
manual style, so the new version ends up looking like a rewrite,
although some of the original text is preserved:

* the 'building' section of the old file is removed, since we
   generally assume that users have already built QEMU
 * some rather verbose text has been cut back
 * document the passthrough device first, on the assumption
   that's most likely to be of interest to users
 * cut back on the duplication of text between sections
 * format example command lines etc with rST

As it's a short document it seemed simplest to do this all
in one go rather than try to do a minimal syntactic conversion
and then clean up the wording and layout.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20230421163734.1152076-1-peter.maydell@linaro.org
---
 docs/system/device-emulation.rst |   1 +
 docs/system/devices/usb-u2f.rst  |  93 ++++++++++++++++++++++++++
 docs/system/devices/usb.rst      |   2 +-
 docs/u2f.txt                     | 110 -------------------------------
 4 files changed, 95 insertions(+), 111 deletions(-)
 create mode 100644 docs/system/devices/usb-u2f.rst
 delete mode 100644 docs/u2f.txt

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -XXX,XX +XXX,XX @@ Emulated Devices
    devices/virtio-pmem.rst
    devices/vhost-user-rng.rst
    devices/canokey.rst
+   devices/usb-u2f.rst
    devices/igb.rst
diff --git a/docs/system/devices/usb-u2f.rst b/docs/system/devices/usb-u2f.rst
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/docs/system/devices/usb-u2f.rst
@@ -XXX,XX +XXX,XX @@
+Universal Second Factor (U2F) USB Key Device
+============================================
+
+U2F is an open authentication standard that enables relying parties
+exposed to the internet to offer a strong second factor option for end
+user authentication.
+
+The second factor is provided by a device implementing the U2F
+protocol. In case of a USB U2F security key, it is a USB HID device
+that implements the U2F protocol.
+
+QEMU supports both pass-through of a host U2F key device to a VM,
+and software emulation of a U2F key.
+
+``u2f-passthru``
+----------------
+
+The ``u2f-passthru`` device allows you to connect a real hardware
+U2F key on your host to a guest VM. All requests made from the guest
+are passed through to the physical security key connected to the
+host machine and vice versa.
+
+In addition, the dedicated pass-through allows you to share a single
+U2F security key with several guest VMs, which is not possible with a
+simple host device assignment pass-through.
+
+You can specify the host U2F key to use with the ``hidraw``
+option, which takes the host path to a Linux ``/dev/hidrawN`` device:
+
+.. parsed-literal::
+   |qemu_system| -usb -device u2f-passthru,hidraw=/dev/hidraw0
+
+If you don't specify the device, the ``u2f-passthru`` device will
+autoscan to take the first U2F device it finds on the host (this
+requires a working libudev):
+
+.. parsed-literal::
+   |qemu_system| -usb -device u2f-passthru
+
+``u2f-emulated``
+----------------
+
+``u2f-emulated`` is a completely software emulated U2F device.
+It uses `libu2f-emu <https://github.com/MattGorko/libu2f-emu>`__
+for the U2F key emulation. libu2f-emu
+provides a complete implementation of the U2F protocol device part for
+all specified transports given by the FIDO Alliance.
+
+To work, an emulated U2F device must have four elements:
+
+ * ec x509 certificate
+ * ec private key
+ * counter (four bytes value)
+ * 48 bytes of entropy (random bits)
+
+To use this type of device, these have to be configured, and these
+four elements must be passed one way or another.
+
+Assuming that you have a working libu2f-emu installed on the host,
+there are three possible ways to configure the ``u2f-emulated`` device:
+
+ * ephemeral
+ * setup directory
+ * manual
+
+Ephemeral is the simplest way to configure; it lets the device generate
+all the elements it needs for a single use of the lifetime of the device.
+It is the default if you do not pass any other options to the device.
+
+.. parsed-literal::
+   |qemu_system| -usb -device u2f-emulated
+
+You can pass the device the path of a setup directory on the host
+using the ``dir`` option; the directory must contain these four files:
+
+ * ``certificate.pem``: ec x509 certificate
+ * ``private-key.pem``: ec private key
+ * ``counter``: counter value
+ * ``entropy``: 48 bytes of entropy
+
+.. parsed-literal::
+   |qemu_system| -usb -device u2f-emulated,dir=$dir
+
+You can also manually pass the device the paths to each of these files,
+if you don't want them all to be in the same directory, using the options
+
+ * ``cert``
+ * ``priv``
+ * ``counter``
+ * ``entropy``
+
+.. parsed-literal::
+   |qemu_system| -usb -device u2f-emulated,cert=$DIR1/$FILE1,priv=$DIR2/$FILE2,counter=$DIR3/$FILE3,entropy=$DIR4/$FILE4
diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/devices/usb.rst
+++ b/docs/system/devices/usb.rst
@@ -XXX,XX +XXX,XX @@ option or the ``device_add`` monitor command. Available devices are:
    USB audio device
 
 ``u2f-{emulated,passthru}``
-   Universal Second Factor device
+   :doc:`usb-u2f`
 
 ``canokey``
    An Open-source Secure Key implementing FIDO2, OpenPGP, PIV and more.
diff --git a/docs/u2f.txt b/docs/u2f.txt
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/docs/u2f.txt
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-QEMU U2F Key Device Documentation.
-
-Contents
-1. USB U2F key device
-2. Building
-3. Using u2f-emulated
-4. Using u2f-passthru
-5. Libu2f-emu
-
-1. USB U2F key device
-
-U2F is an open authentication standard that enables relying parties
-exposed to the internet to offer a strong second factor option for end
-user authentication.
-
-The standard brings many advantages to both parties, client and server,
-allowing to reduce over-reliance on passwords, it increases authentication
-security and simplifies passwords.
-
-The second factor is materialized by a device implementing the U2F
-protocol. In case of a USB U2F security key, it is a USB HID device
-that implements the U2F protocol.
-
-In QEMU, the USB U2F key device offers a dedicated support of U2F, allowing
-guest USB FIDO/U2F security keys operating in two possible modes:
-pass-through and emulated.
-
-The pass-through mode consists of passing all requests made from the guest
-to the physical security key connected to the host machine and vice versa.
-In addition, the dedicated pass-through allows to have a U2F security key
-shared on several guests which is not possible with a simple host device
-assignment pass-through.
-
-The emulated mode consists of completely emulating the behavior of an
-U2F device through software part. Libu2f-emu is used for that.
-
-
-2. Building
-
-To ensure the build of the u2f-emulated device variant which depends
-on libu2f-emu: configuring and building:
-
-    ./configure --enable-u2f && make
-
-The pass-through mode is built by default on Linux. To take advantage
-of the autoscan option it provides, make sure you have a working libudev
-installed on the host.
-
-
-3. Using u2f-emulated
-
-To work, an emulated U2F device must have four elements:
- * ec x509 certificate
- * ec private key
- * counter (four bytes value)
- * 48 bytes of entropy (random bits)
-
-To use this type of device, this one has to be configured, and these
-four elements must be passed one way or another.
-
-Assuming that you have a working libu2f-emu installed on the host.
-There are three possible ways of configurations:
- * ephemeral
- * setup directory
- * manual
-
-Ephemeral is the simplest way to configure, it lets the device generate
-all the elements it needs for a single use of the lifetime of the device.
-
-    qemu -usb -device u2f-emulated
-
-Setup directory allows to configure the device from a directory containing
-four files:
- * certificate.pem: ec x509 certificate
- * private-key.pem: ec private key
- * counter: counter value
- * entropy: 48 bytes of entropy
-
-    qemu -usb -device u2f-emulated,dir=$dir
-
-Manual allows to configure the device more finely by specifying each
-of the elements necessary for the device:
- * cert
- * priv
- * counter
- * entropy
-
-    qemu -usb -device u2f-emulated,cert=$DIR1/$FILE1,priv=$DIR2/$FILE2,counter=$DIR3/$FILE3,entropy=$DIR4/$FILE4
-
-
-4. Using u2f-passthru
-
-On the host specify the u2f-passthru device with a suitable hidraw:
-
-    qemu -usb -device u2f-passthru,hidraw=/dev/hidraw0
-
-Alternately, the u2f-passthru device can autoscan to take the first
-U2F device it finds on the host (this requires a working libudev):
-
-    qemu -usb -device u2f-passthru
-
-
-5. Libu2f-emu
-
-The u2f-emulated device uses libu2f-emu for the U2F key emulation. Libu2f-emu
-implements completely the U2F protocol device part for all specified
-transport given by the FIDO Alliance.
-
-For more information about libu2f-emu see this page:
-https://github.com/MattGorko/libu2f-emu.
-- 
2.34.1