Series comparison

-[Qemu-devel] [PULL 00/22] target-arm queue
+[PULL 00/39] target-arm queue
-Arm stuff, mostly patches from RTH.
+Most of this is the Neon decodetree patches, followed by Edgar's versal cleanups.
 thanks
 -- PMM
-The following changes since commit 01a9a51ffaf4699827ea6425cb2b834a356e159d:
-  Merge remote-tracking branch 'remotes/kraxel/tags/ui-20190205-pull-request' into staging (2019-02-05 14:01:29 +0000)
+The following changes since commit 2ef486e76d64436be90f7359a3071fb2a56ce835:
   Merge remote-tracking branch 'remotes/marcel/tags/rdma-pull-request' into staging (2020-05-03 14:12:56 +0100)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190205
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200504
-for you to fetch changes up to a15945d98d3a3390c3da344d1b47218e91e49d8b:
+for you to fetch changes up to 9aefc6cf9b73f66062d2f914a0136756e7a28211:
-  target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI (2019-02-05 16:52:42 +0000)
+  target/arm: Move gen_ function typedefs to translate.h (2020-05-04 12:59:26 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * Implement Armv8.5-BTI extension for system emulation mode
+ * Start of conversion of Neon insns to decodetree
- * Implement the PR_PAC_RESET_KEYS prctl() for linux-user mode's Armv8.3-PAuth support
+ * versal board: support SD and RTC
- * Support TBI (top-byte-ignore) properly for linux-user mode
+ * Implement ARMv8.2-TTS2UXN
- * gdbstub: allow killing QEMU via vKill command
+ * Make VQDMULL undefined when U=1
- * hw/arm/boot: Support DTB autoload for firmware-only boots
+ * Some minor code cleanups
  * target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI
 ----------------------------------------------------------------
-Max Filippov (1):
+Edgar E. Iglesias (11):
-      gdbstub: allow killing QEMU via vKill command
+      hw/arm: versal: Remove inclusion of arm_gicv3_common.h
       hw/arm: versal: Move misplaced comment
       hw/arm: versal-virt: Fix typo xlnx-ve -> xlnx-versal
       hw/arm: versal: Embed the UARTs into the SoC type
       hw/arm: versal: Embed the GEMs into the SoC type
       hw/arm: versal: Embed the ADMAs into the SoC type
       hw/arm: versal: Embed the APUs into the SoC type
       hw/arm: versal: Add support for SD
       hw/arm: versal: Add support for the RTC
       hw/arm: versal-virt: Add support for SD
       hw/arm: versal-virt: Add support for the RTC
-Peter Maydell (7):
+Fredrik Strupe (1):
-      target/arm: Compute TB_FLAGS for TBI for user-only
+      target/arm: Make VQDMULL undefined when U=1
       hw/arm/boot: Fix block comment style in arm_load_kernel()
       hw/arm/boot: Factor out "direct kernel boot" code into its own function
       hw/arm/boot: Factor out "set up firmware boot" code
       hw/arm/boot: Clarify why arm_setup_firmware_boot() doesn't set env->boot_info
       hw/arm/boot: Support DTB autoload for firmware-only boots
       target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI
-Richard Henderson (14):
+Peter Maydell (25):
-      target/arm: Introduce isar_feature_aa64_bti
+      target/arm: Don't use a TLB for ARMMMUIdx_Stage2
-      target/arm: Add PSTATE.BTYPE
+      target/arm: Use enum constant in get_phys_addr_lpae() call
-      target/arm: Add BT and BTYPE to tb->flags
+      target/arm: Add new 's1_is_el0' argument to get_phys_addr_lpae()
-      exec: Add target-specific tlb bits to MemTxAttrs
+      target/arm: Implement ARMv8.2-TTS2UXN
-      target/arm: Cache the GP bit for a page in MemTxAttrs
+      target/arm: Use correct variable for setting 'max' cpu's ID_AA64DFR0
-      target/arm: Default handling of BTYPE during translation
+      target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
-      target/arm: Reset btype for direct branches
+      target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
-      target/arm: Set btype for indirect branches
+      target/arm: Add stubs for AArch32 Neon decodetree
-      target/arm: Enable BTI for -cpu max
+      target/arm: Convert VCMLA (vector) to decodetree
-      linux-user: Implement PR_PAC_RESET_KEYS
+      target/arm: Convert VCADD (vector) to decodetree
-      tests/tcg/aarch64: Add pauth smoke test
+      target/arm: Convert V[US]DOT (vector) to decodetree
-      target/arm: Add TBFLAG_A64_TBID, split out gen_top_byte_ignore
+      target/arm: Convert VFM[AS]L (vector) to decodetree
-      target/arm: Clean TBI for data operations in the translator
+      target/arm: Convert VCMLA (scalar) to decodetree
-      target/arm: Enable TBI for user-only
+      target/arm: Convert V[US]DOT (scalar) to decodetree
       target/arm: Convert VFM[AS]L (scalar) to decodetree
       target/arm: Convert Neon load/store multiple structures to decodetree
       target/arm: Convert Neon 'load single structure to all lanes' to decodetree
       target/arm: Convert Neon 'load/store single structure' to decodetree
       target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
       target/arm: Convert Neon 3-reg-same logic ops to decodetree
       target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
       target/arm: Convert Neon 3-reg-same comparisons to decodetree
       target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
       target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
       target/arm: Move gen_ function typedefs to translate.h
- tests/tcg/aarch64/Makefile.target   |   6 +-
+Philippe Mathieu-Daudé (2):
- include/exec/memattrs.h             |  10 +
+      hw/arm/mps2-tz: Use TYPE_IOTKIT instead of hardcoded string
- linux-user/aarch64/target_syscall.h |   7 +
+      target/arm: Use uint64_t for midr field in CPU state struct
  target/arm/cpu.h                    |  27 +-
  target/arm/internals.h              |  27 +-
  target/arm/translate.h              |  12 +-
  gdbstub.c                           |   4 +
  hw/arm/boot.c                       | 166 +++++++------
  linux-user/syscall.c                |  36 +++
  target/arm/cpu.c                    |   6 +
  target/arm/cpu64.c                  |   4 +
  target/arm/helper.c                 |  80 +++---
  target/arm/translate-a64.c          | 476 +++++++++++++++++++++++++-----------
  tests/tcg/aarch64/pauth-1.c         |  23 ++
 files changed, 623 insertions(+), 261 deletions(-)
  create mode 100644 tests/tcg/aarch64/pauth-1.c
+ include/hw/arm/xlnx-versal.h    |  31 +-
+ target/arm/cpu-param.h          |   2 +-
+ target/arm/cpu.h                |  38 ++-
+ target/arm/translate-a64.h      |   9 -
+ target/arm/translate.h          |  26 ++
+ target/arm/neon-dp.decode       |  86 +++++
+ target/arm/neon-ls.decode       |  52 +++
+ target/arm/neon-shared.decode   |  66 ++++
+ hw/arm/mps2-tz.c                |   2 +-
+ hw/arm/xlnx-versal-virt.c       |  74 ++++-
+ hw/arm/xlnx-versal.c            | 115 +++++--
+ target/arm/cpu.c                |   3 +-
+ target/arm/cpu64.c              |   8 +-
+ target/arm/helper.c             | 183 ++++------
+ target/arm/translate-a64.c      |  17 -
+ target/arm/translate-neon.inc.c | 714 +++++++++++++++++++++++++++++++++++++++
+ target/arm/translate-vfp.inc.c  |   6 -
+ target/arm/translate.c          | 716 +++-------------------------------------
+ target/arm/Makefile.objs        |  18 +
+files changed, 1302 insertions(+), 864 deletions(-)
+ create mode 100644 target/arm/neon-dp.decode
+ create mode 100644 target/arm/neon-ls.decode
+ create mode 100644 target/arm/neon-shared.decode
+ create mode 100644 target/arm/translate-neon.inc.c

-New patch
+[PULL 01/39] target/arm: Make VQDMULL undefined when U=1
+From: Fredrik Strupe <fredrik@strupe.net>
+According to Arm ARM, VQDMULL is only valid when U=0, while having
+U=1 is unallocated.
+Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
+Fixes: 695272dcb976 ("target-arm: Handle UNDEF cases for Neon 3-regs-different-widths")
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/translate.c | 2 +-
+file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+                     {0, 0, 0, 0}, /* VMLSL */
+                     {0, 0, 0, 9}, /* VQDMLSL */
+                     {0, 0, 0, 0}, /* Integer VMULL */
+-                    {0, 0, 0, 1}, /* VQDMULL */
++                    {0, 0, 0, 9}, /* VQDMULL */
+                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
+                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
+                 };
+--
+.20.1
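
The hunk above changes only the last column of the VQDMULL row, from 1 to 9. A minimal sketch of how such a column could be consumed follows. It assumes the column is a bitmask where bit n (n = 0..2) means "UNDEF if size == n" and bit 3 means "UNDEF if U == 1"; that encoding is not shown in this excerpt, and the names `neon_3diff_is_undef` and `UNDEF_IF_U` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed meaning of bit 3 of the per-op mask: UNDEF when U == 1. */
#define UNDEF_IF_U 8u

/*
 * Hypothetical helper: returns true if the insn should UNDEF.
 * The caller is assumed to guarantee size <= 2, so that (1u << size)
 * only ever selects one of the three per-size bits.
 */
static bool neon_3diff_is_undef(uint32_t undefreq, unsigned size, bool u)
{
    assert(size <= 2);
    return (undefreq & (1u << size)) || ((undefreq & UNDEF_IF_U) && u);
}
```

Under these assumptions the old mask 1 accepts U=1 for sizes 1 and 2, while the new mask 9 rejects U=1 for every size and still rejects size 0.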

-New patch
+[PULL 02/39] hw/arm/mps2-tz: Use TYPE_IOTKIT instead of hardcoded string
+From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+By using the TYPE_* definitions for devices, we can:
+ - quickly find where devices are used with 'git-grep'
+ - easily rename a device (one-line change).
+Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20200428154650.21991-1-f4bug@amsat.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/arm/mps2-tz.c | 2 +-
+file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/mps2-tz.c
++++ b/hw/arm/mps2-tz.c
+@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+         exit(EXIT_FAILURE);
+     }
+-    sysbus_init_child_obj(OBJECT(machine), "iotkit", &mms->iotkit,
++    sysbus_init_child_obj(OBJECT(machine), TYPE_IOTKIT, &mms->iotkit,
+                           sizeof(mms->iotkit), mmc->armsse_type);
+     iotkitdev = DEVICE(&mms->iotkit);
+     object_property_set_link(OBJECT(&mms->iotkit), OBJECT(system_memory),
+--
+.20.1

-[Qemu-devel] [PULL 12/22] target/arm: Add TBFLAG_A64_TBID, split out gen_top_byte_ignore
+[PULL 03/39] target/arm: Don't use a TLB for ARMMMUIdx_Stage2
-From: Richard Henderson <richard.henderson@linaro.org>
+We define ARMMMUIdx_Stage2 as being an MMU index which uses a QEMU
+TLB.  However we never actually use the TLB -- all stage 2 lookups
-Split out gen_top_byte_ignore in preparation of handling these
+are done by direct calls to get_phys_addr_lpae() followed by a
-data accesses; the new tbflags field is not yet honored.
+physical address load via address_space_ld*().
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Remove Stage2 from the list of ARM MMU indexes which correspond to
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+real core MMU indexes, and instead put it in the set of "NOTLB" ARM
-Message-id: 20190204132126.3255-2-richard.henderson@linaro.org
+MMU indexes.
 This allows us to drop NB_MMU_MODES to 11.  It also means we can
 safely add support for the ARMv8.2-TTS2UXN extension, which adds
 permission bits to the stage 2 descriptors which define execute
 permission separately for EL0 and EL1; supporting that while keeping
 Stage2 in a QEMU TLB would require us to use separate TLBs for
 "Stage2 for an EL0 access" and "Stage2 for an EL1 access", which is a
 lot of extra complication given we aren't even using the QEMU TLB.
 In the process of updating the comment on our MMU index use,
 fix a couple of other minor errors:
  * NS EL2 EL2&0 was missing from the list in the comment
  * some text hadn't been updated from when we bumped NB_MMU_MODES
    above 8
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200330210400.11724-2-peter.maydell@linaro.org
 ---
- target/arm/cpu.h           |  1 +
+ target/arm/cpu-param.h |   2 +-
- target/arm/translate.h     |  3 +-
+ target/arm/cpu.h       |  21 +++++---
- target/arm/helper.c        |  1 +
+ target/arm/helper.c    | 112 ++++-------------------------------------
- target/arm/translate-a64.c | 72 +++++++++++++++++++-------------------
+files changed, 27 insertions(+), 108 deletions(-)
-files changed, 40 insertions(+), 37 deletions(-)
+diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu-param.h
 +++ b/target/arm/cpu-param.h
@@ -XXX,XX +XXX,XX @@
  # define TARGET_PAGE_BITS_MIN  10
  #endif
 -#define NB_MMU_MODES 12
 +#define NB_MMU_MODES 11
  #endif
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
+@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
- FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
+  *     handling via the TLB. The only way to do a stage 1 translation without
- FIELD(TBFLAG_A64, BT, 9, 1)
+  *     the immediate stage 2 translation is via the ATS or AT system insns,
- FIELD(TBFLAG_A64, BTYPE, 10, 2)
+  *     which can be slow-pathed and always do a page table walk.
-+FIELD(TBFLAG_A64, TBID, 12, 2)
++ *     The only use of stage 2 translations is either as part of an s1+2
++ *     lookup or when loading the descriptors during a stage 1 page table walk,
- static inline bool bswap_code(bool sctlr_b)
++ *     and in both those cases we don't use the TLB.
- {
+  *  4. we can also safely fold together the "32 bit EL3" and "64 bit EL3"
-diff --git a/target/arm/translate.h b/target/arm/translate.h
+  *     translation regimes, because they map reasonably well to each other
-index XXXXXXX..XXXXXXX 100644
+  *     and they can't both be active at the same time.
---- a/target/arm/translate.h
+@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
-+++ b/target/arm/translate.h
+  * NS EL1 EL1&0 stage 1+2 (aka NS PL1)
-@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
+  * NS EL1 EL1&0 stage 1+2 +PAN
-     int user;
+  * NS EL0 EL2&0
- #endif
++ * NS EL2 EL2&0
-     ARMMMUIdx mmu_idx; /* MMU index to use for normal loads/stores */
+  * NS EL2 EL2&0 +PAN
--    uint8_t tbii;      /* TBI1|TBI0 for EL0/1 or TBI for EL2/3 */
+  * NS EL2 (aka NS PL2)
-+    uint8_t tbii;      /* TBI1|TBI0 for insns */
+  * S EL0 EL1&0 (aka S PL0)
-+    uint8_t tbid;      /* TBI1|TBI0 for data */
+  * S EL1 EL1&0 (not used if EL3 is 32 bit)
-     bool ns;        /* Use non-secure CPREG bank on access */
+  * S EL1 EL1&0 +PAN
-     int fp_excp_el; /* FP exception EL or 0 if enabled */
+  * S EL3 (aka S PL1)
-     int sve_excp_el; /* SVE exception EL or 0 if enabled */
+- * NS EL1&0 stage 2
   *
 - * for a total of 12 different mmu_idx.
 + * for a total of 11 different mmu_idx.
   *
   * R profile CPUs have an MPU, but can use the same set of MMU indexes
   * as A profile. They only need to distinguish NS EL0 and NS EL1 (and
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
   * are not quite the same -- different CPU types (most notably M profile
   * vs A/R profile) would like to use MMU indexes with different semantics,
   * but since we don't ever need to use all of those in a single CPU we
 - * can avoid setting NB_MMU_MODES to more than 8. The lower bits of
 + * can avoid having to set NB_MMU_MODES to "total number of A profile MMU
 + * modes + total number of M profile MMU modes". The lower bits of
   * ARMMMUIdx are the core TLB mmu index, and the higher bits are always
   * the same for any particular CPU.
   * Variables of type ARMMMUIdx are always full values, and the core
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
      ARMMMUIdx_SE10_1_PAN = 9 | ARM_MMU_IDX_A,
      ARMMMUIdx_SE3        = 10 | ARM_MMU_IDX_A,
 -    ARMMMUIdx_Stage2     = 11 | ARM_MMU_IDX_A,
 -
      /*
       * These are not allocated TLBs and are used only for AT system
       * instructions or for the first stage of an S12 page table walk.
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
      ARMMMUIdx_Stage1_E0 = 0 | ARM_MMU_IDX_NOTLB,
      ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
      ARMMMUIdx_Stage1_E1_PAN = 2 | ARM_MMU_IDX_NOTLB,
 +    /*
 +     * Not allocated a TLB: used only for second stage of an S12 page
 +     * table walk, or for descriptor loads during first stage of an S1
 +     * page table walk. Note that if we ever want to have a TLB for this
 +     * then various TLB flush insns which currently are no-ops or flush
 +     * only stage 1 MMU indexes will need to change to flush stage 2.
 +     */
 +    ARMMMUIdx_Stage2     = 3 | ARM_MMU_IDX_NOTLB,
      /*
       * M-profile.
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
      TO_CORE_BIT(SE10_1),
      TO_CORE_BIT(SE10_1_PAN),
      TO_CORE_BIT(SE3),
 -    TO_CORE_BIT(Stage2),
      TO_CORE_BIT(MUser),
      TO_CORE_BIT(MPriv),
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
+@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri,
-             }
+     tlb_flush_by_mmuidx(cs,
+                         ARMMMUIdxBit_E10_1 |
-             flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
+                         ARMMMUIdxBit_E10_1_PAN |
-+            flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
+-                        ARMMMUIdxBit_E10_0 |
-         }
+-                        ARMMMUIdxBit_Stage2);
- #endif
++                        ARMMMUIdxBit_E10_0);
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ void gen_a64_set_pc_im(uint64_t val)
      tcg_gen_movi_i64(cpu_pc, val);
  }
--/* Load the PC from a generic TCG variable.
+ static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
-+/*
+@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
-+ * Handle Top Byte Ignore (TBI) bits.
+     tlb_flush_by_mmuidx_all_cpus_synced(cs,
-  *
+                                         ARMMMUIdxBit_E10_1 |
-- * If address tagging is enabled via the TCR TBI bits, then loading
+                                         ARMMMUIdxBit_E10_1_PAN |
-- * an address into the PC will clear out any tag in it:
+-                                        ARMMMUIdxBit_E10_0 |
-+ * If address tagging is enabled via the TCR TBI bits:
+-                                        ARMMMUIdxBit_Stage2);
-  *  + for EL2 and EL3 there is only one TBI bit, and if it is set
++                                        ARMMMUIdxBit_E10_0);
-  *    then the address is zero-extended, clearing bits [63:56]
+ }
-  *  + for EL0 and EL1, TBI0 controls addresses with bit 55 == 0
-@@ -XXX,XX +XXX,XX @@ void gen_a64_set_pc_im(uint64_t val)
+-static void tlbiipas2_write(CPUARMState *env, const ARMCPRegInfo *ri,
-  *    If the appropriate TBI bit is set for the address then
+-                            uint64_t value)
-  *    the address is sign-extended from bit 55 into bits [63:56]
+-{
-  *
+-    /* Invalidate by IPA. This has to invalidate any structures that
-- * We can avoid doing this for relative-branches, because the
+-     * contain only stage 2 translation information, but does not need
-- * PC + offset can never overflow into the tag bits (assuming
+-     * to apply to structures that contain combined stage 1 and stage 2
-- * that virtual addresses are less than 56 bits wide, as they
+-     * translation information.
-- * are currently), but we must handle it for branch-to-register.
+-     * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
-+ * Here We have concatenated TBI{1,0} into tbi.
+-     */
-  */
+-    CPUState *cs = env_cpu(env);
--static void gen_a64_set_pc(DisasContext *s, TCGv_i64 src)
+-    uint64_t pageaddr;
-+static void gen_top_byte_ignore(DisasContext *s, TCGv_i64 dst,
+-
-+                                TCGv_i64 src, int tbi)
+-    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
 -        return;
 -    }
 -
 -    pageaddr = sextract64(value << 12, 0, 40);
 -
 -    tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
 -}
 -
 -static void tlbiipas2_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 -                               uint64_t value)
 -{
 -    CPUState *cs = env_cpu(env);
 -    uint64_t pageaddr;
 -
 -    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
 -        return;
 -    }
 -
 -    pageaddr = sextract64(value << 12, 0, 40);
 -
 -    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
 -                                             ARMMMUIdxBit_Stage2);
 -}
  static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                uint64_t value)
@@ -XXX,XX +XXX,XX @@ static void vttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
          tlb_flush_by_mmuidx(cs,
                              ARMMMUIdxBit_E10_1 |
                              ARMMMUIdxBit_E10_1_PAN |
 -                            ARMMMUIdxBit_E10_0 |
 -                            ARMMMUIdxBit_Stage2);
 +                            ARMMMUIdxBit_E10_0);
          raw_write(env, ri, value);
      }
  }
@@ -XXX,XX +XXX,XX @@ static int alle1_tlbmask(CPUARMState *env)
          return ARMMMUIdxBit_SE10_1 |
                 ARMMMUIdxBit_SE10_1_PAN |
                 ARMMMUIdxBit_SE10_0;
 -    } else if (arm_feature(env, ARM_FEATURE_EL2)) {
 -        return ARMMMUIdxBit_E10_1 |
 -               ARMMMUIdxBit_E10_1_PAN |
 -               ARMMMUIdxBit_E10_0 |
 -               ARMMMUIdxBit_Stage2;
      } else {
          return ARMMMUIdxBit_E10_1 |
                 ARMMMUIdxBit_E10_1_PAN |
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                               ARMMMUIdxBit_SE3);
  }
 -static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri,
 -                                    uint64_t value)
 -{
 -    /* Invalidate by IPA. This has to invalidate any structures that
 -     * contain only stage 2 translation information, but does not need
 -     * to apply to structures that contain combined stage 1 and stage 2
 -     * translation information.
 -     * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
 -     */
 -    ARMCPU *cpu = env_archcpu(env);
 -    CPUState *cs = CPU(cpu);
 -    uint64_t pageaddr;
 -
 -    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
 -        return;
 -    }
 -
 -    pageaddr = sextract64(value << 12, 0, 48);
 -
 -    tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
 -}
 -
 -static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 -                                      uint64_t value)
 -{
 -    CPUState *cs = env_cpu(env);
 -    uint64_t pageaddr;
 -
 -    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
 -        return;
 -    }
 -
 -    pageaddr = sextract64(value << 12, 0, 48);
 -
 -    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
 -                                             ARMMMUIdxBit_Stage2);
 -}
 -
  static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
                                        bool isread)
  {
--    /* Note that TBII is TBI1:TBI0.  */
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
--    int tbi = s->tbii;
+       .writefn = tlbi_aa64_vae1_write },
--
+     { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
--    if (s->current_el <= 1) {
+       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
--        if (tbi != 0) {
+-      .access = PL2_W, .type = ARM_CP_NO_RAW,
--            /* Sign-extend from bit 55.  */
+-      .writefn = tlbi_aa64_ipas2e1is_write },
--            tcg_gen_sextract_i64(cpu_pc, src, 0, 56);
++      .access = PL2_W, .type = ARM_CP_NOP },
--
+     { .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64,
--            if (tbi != 3) {
+       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
--                TCGv_i64 tcg_zero = tcg_const_i64(0);
+-      .access = PL2_W, .type = ARM_CP_NO_RAW,
--
+-      .writefn = tlbi_aa64_ipas2e1is_write },
--                /*
++      .access = PL2_W, .type = ARM_CP_NOP },
--                 * The two TBI bits differ.
+     { .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
--                 * If tbi0, then !tbi1: only use the extension if positive.
+       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
--                 * if !tbi0, then tbi1: only use the extension if negative.
+       .access = PL2_W, .type = ARM_CP_NO_RAW,
--                 */
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
--                tcg_gen_movcond_i64(tbi == 1 ? TCG_COND_GE : TCG_COND_LT,
+       .writefn = tlbi_aa64_alle1is_write },
--                                    cpu_pc, cpu_pc, tcg_zero, cpu_pc, src);
+     { .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64,
--                tcg_temp_free_i64(tcg_zero);
+       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
--            }
+-      .access = PL2_W, .type = ARM_CP_NO_RAW,
--            return;
+-      .writefn = tlbi_aa64_ipas2e1_write },
--        }
++      .access = PL2_W, .type = ARM_CP_NOP },
-+    if (tbi == 0) {
+     { .name = "TLBI_IPAS2LE1", .state = ARM_CP_STATE_AA64,
-+        /* Load unmodified address */
+       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
-+        tcg_gen_mov_i64(dst, src);
+-      .access = PL2_W, .type = ARM_CP_NO_RAW,
-+    } else if (s->current_el >= 2) {
+-      .writefn = tlbi_aa64_ipas2e1_write },
-+        /* FIXME: ARMv8.1-VHE S2 translation regime.  */
++      .access = PL2_W, .type = ARM_CP_NOP },
-+        /* Force tag byte to all zero */
+     { .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
-+        tcg_gen_extract_i64(dst, src, 0, 56);
+       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
-     } else {
+       .access = PL2_W, .type = ARM_CP_NO_RAW,
--        if (tbi != 0) {
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
--            /* Force tag byte to all zero */
+       .writefn = tlbimva_hyp_is_write },
--            tcg_gen_extract_i64(cpu_pc, src, 0, 56);
+     { .name = "TLBIIPAS2",
--            return;
+       .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
-+        /* Sign-extend from bit 55.  */
+-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-+        tcg_gen_sextract_i64(dst, src, 0, 56);
+-      .writefn = tlbiipas2_write },
-+
++      .type = ARM_CP_NOP, .access = PL2_W },
-+        if (tbi != 3) {
+     { .name = "TLBIIPAS2IS",
-+            TCGv_i64 tcg_zero = tcg_const_i64(0);
+       .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
-+
+-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-+            /*
+-      .writefn = tlbiipas2_is_write },
-+             * The two TBI bits differ.
++      .type = ARM_CP_NOP, .access = PL2_W },
-+             * If tbi0, then !tbi1: only use the extension if positive.
+     { .name = "TLBIIPAS2L",
-+             * if !tbi0, then tbi1: only use the extension if negative.
+       .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
-+             */
+-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-+            tcg_gen_movcond_i64(tbi == 1 ? TCG_COND_GE : TCG_COND_LT,
+-      .writefn = tlbiipas2_write },
-+                                dst, dst, tcg_zero, dst, src);
++      .type = ARM_CP_NOP, .access = PL2_W },
-+            tcg_temp_free_i64(tcg_zero);
+     { .name = "TLBIIPAS2LIS",
-         }
+       .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
-     }
+-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-+}
+-      .writefn = tlbiipas2_is_write },
++      .type = ARM_CP_NOP, .access = PL2_W },
--    /* Load unmodified address */
+     /* 32 bit cache operations */
--    tcg_gen_mov_i64(cpu_pc, src);
+     { .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
-+static void gen_a64_set_pc(DisasContext *s, TCGv_i64 src)
+       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
 +{
 +    /*
 +     * If address tagging is enabled for instructions via the TCR TBI bits,
 +     * then loading an address into the PC will clear out any tag.
 +     */
 +    gen_top_byte_ignore(s, cpu_pc, src, s->tbii);
  }
  typedef struct DisasCompare64 {
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
      core_mmu_idx = FIELD_EX32(tb_flags, TBFLAG_ANY, MMUIDX);
      dc->mmu_idx = core_to_arm_mmu_idx(env, core_mmu_idx);
      dc->tbii = FIELD_EX32(tb_flags, TBFLAG_A64, TBII);
 +    dc->tbid = FIELD_EX32(tb_flags, TBFLAG_A64, TBID);
      dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx);
  #if !defined(CONFIG_USER_ONLY)
      dc->user = (dc->current_el == 0);
 --
 .20.1
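
The ARMMMUIdx layout described in the comment above (low bits are the core TLB index, high bits mark the kind of index) can be sketched as follows. The enum values `10 | A`, `0 | NOTLB` and `3 | NOTLB` are taken from the hunk; the numeric flag values, the mask and all `SK_`-prefixed names are hypothetical stand-ins, not the definitions in target/arm/cpu.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch only. */
#define SK_MMU_IDX_A        0x10   /* A-profile index backed by a core TLB */
#define SK_MMU_IDX_NOTLB    0x20   /* index with no core TLB behind it */
#define SK_COREIDX_MASK     0x0f   /* low bits: core TLB index */
#define SK_NB_MMU_MODES     11     /* value after this patch */

enum {
    SK_IDX_SE3       = 10 | SK_MMU_IDX_A,
    SK_IDX_STAGE1_E0 = 0  | SK_MMU_IDX_NOTLB,
    SK_IDX_STAGE2    = 3  | SK_MMU_IDX_NOTLB,   /* moved here by the patch */
};

/* True if the index corresponds to a real core TLB. */
static bool sk_mmu_idx_has_tlb(int idx)
{
    return !(idx & SK_MMU_IDX_NOTLB);
}

/* Extract the core TLB index; only meaningful when sk_mmu_idx_has_tlb(). */
static int sk_mmu_idx_to_core(int idx)
{
    return idx & SK_COREIDX_MASK;
}
```

With Stage2 in the NOTLB group, the highest core index in this sketch is 10, which is why 11 modes suffice.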

-[Qemu-devel] [PULL 05/22] target/arm: Cache the GP bit for a page in MemTxAttrs
+[PULL 04/39] target/arm: Use enum constant in get_phys_addr_lpae() call
-From: Richard Henderson <richard.henderson@linaro.org>
+The access_type argument to get_phys_addr_lpae() is an MMUAccessType;
 use the enum constant MMU_DATA_LOAD rather than a literal 0 when we
 call it in S1_ptw_translate().
-Caching the bit means that we will not have to re-walk the
-page tables to look up the bit during translation.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20190128223118.5255-6-richard.henderson@linaro.org
-[PMM: no need to OR in guarded bit status]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200330210400.11724-3-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 6 ++++++
+ target/arm/helper.c | 5 +++--
-file changed, 6 insertions(+)
+file changed, 3 insertions(+), 2 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
+@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
-     bool ttbr1_valid;
+             pcacheattrs = &cacheattrs;
      uint64_t descaddrmask;
      bool aarch64 = arm_el_is_aa64(env, el);
 +    bool guarded = false;
      /* TODO:
       * This code does not handle the different format TCR for VTCR_EL2.
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
          }
-         /* Merge in attributes from table descriptors */
-         attrs |= nstable << 3; /* NS */
+-        ret = get_phys_addr_lpae(env, addr, 0, ARMMMUIdx_Stage2, &s2pa,
-+        guarded = extract64(descriptor, 50, 1);  /* GP */
+-                                 &txattrs, &s2prot, &s2size, fi, pcacheattrs);
-         if (param.hpd) {
++        ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
-             /* HPD disables all the table attributes except NSTable.  */
++                                 &s2pa, &txattrs, &s2prot, &s2size, fi,
-             break;
++                                 pcacheattrs);
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
+         if (ret) {
-          */
+             assert(fi->type != ARMFault_None);
-         txattrs->secure = false;
+             fi->s2addr = addr;
      }
 +    /* When in aarch64 mode, and BTI is enabled, remember GP in the IOTLB.  */
 +    if (aarch64 && guarded && cpu_isar_feature(aa64_bti, cpu)) {
 +        txattrs->target_tlb_bit0 = true;
 +    }
      if (cacheattrs != NULL) {
          if (mmu_idx == ARMMMUIdx_S2NS) {
 --
 .20.1

-[Qemu-devel] [PULL 22/22] target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI
+[PULL 05/39] target/arm: Add new 's1_is_el0' argument to get_phys_addr_lpae()
-The {IOE, DZE, OFE, UFE, IXE, IDE} bits in the FPSCR/FPCR are for
+For ARMv8.2-TTS2UXN, the stage 2 page table walk wants to know
-enabling trapped IEEE floating point exceptions (where IEEE exception
+whether the stage 1 access is for EL0 or not, because whether
-conditions cause a CPU exception rather than updating the FPSR status
+exec permission is given can depend on whether this is an EL0
-bits). QEMU doesn't implement this (and nor does the hardware we're
+or EL1 access. Add a new argument to get_phys_addr_lpae() so
-modelling), but for implementations which don't implement trapped
+the call sites can pass this information in.
 exception handling these control bits are supposed to be RAZ/WI.
 This allows guest code to test for whether the feature is present
 by trying to write to the bit and checking whether it sticks.
-QEMU is incorrectly making these bits read as written. Make them
+Since get_phys_addr_lpae() doesn't already have a doc comment,
-RAZ/WI as the architecture requires.
+add one so we have a place to put the documentation of the
 semantics of the new s1_is_el0 argument.
-In particular this was causing problems for the NetBSD automatic
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-test suite.
+Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200330210400.11724-4-peter.maydell@linaro.org
 ---
  target/arm/helper.c | 29 ++++++++++++++++++++++++++++-
 file changed, 28 insertions(+), 1 deletion(-)
-Reported-by: Martin Husemann <martin@netbsd.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190131130700.28392-1-peter.maydell@linaro.org
----
- target/arm/cpu.h    | 6 ++++++
- target/arm/helper.c | 6 ++++++
-files changed, 12 insertions(+)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
- #define FPSR_MASK 0xf800009f
- #define FPCR_MASK 0x07ff9f00
-+#define FPCR_IOE    (1 << 8)    /* Invalid Operation exception trap enable */
-+#define FPCR_DZE    (1 << 9)    /* Divide by Zero exception trap enable */
-+#define FPCR_OFE    (1 << 10)   /* Overflow exception trap enable */
-+#define FPCR_UFE    (1 << 11)   /* Underflow exception trap enable */
-+#define FPCR_IXE    (1 << 12)   /* Inexact exception trap enable */
-+#define FPCR_IDE    (1 << 15)   /* Input Denormal exception trap enable */
- #define FPCR_FZ16   (1 << 19)   /* ARMv8.2+, FP16 flush-to-zero */
- #define FPCR_FZ     (1 << 24)   /* Flush-to-zero enable bit */
- #define FPCR_DN     (1 << 25)   /* Default NaN enable bit */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+@@ -XXX,XX +XXX,XX @@
-         val &= ~FPCR_FZ16;
  static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                               bool s1_is_el0,
                                 hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
                                 target_ulong *page_size_ptr,
                                 ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs);
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
          }
          ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
 +                                 false,
                                   &s2pa, &txattrs, &s2prot, &s2size, fi,
                                   pcacheattrs);
          if (ret) {
@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
      };
  }
 +/**
 + * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
 + *
 + * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
 + * prot and page_size may not be filled in, and the populated fsr value provides
 + * information on why the translation aborted, in the format of a long-format
 + * DFSR/IFSR fault register, with the following caveats:
 + *  * the WnR bit is never set (the caller must do this).
 + *
 + * @env: CPUARMState
 + * @address: virtual address to get physical address for
 + * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
 + * @mmu_idx: MMU index indicating required translation regime
 + * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page table
 + *             walk), must be true if this is stage 2 of a stage 1+2 walk for an
 + *             EL0 access. If @mmu_idx is anything else, @s1_is_el0 is ignored.
 + * @phys_ptr: set to the physical address corresponding to the virtual address
 + * @attrs: set to the memory transaction attributes to use
 + * @prot: set to the permissions for the page containing phys_ptr
 + * @page_size_ptr: set to the size of the page containing phys_ptr
 + * @fi: set to fault info if the translation fails
 + * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
 + */
  static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                               bool s1_is_el0,
                                 hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
                                 target_ulong *page_size_ptr,
                                 ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
              /* S1 is done. Now do S2 translation.  */
              ret = get_phys_addr_lpae(env, ipa, access_type, ARMMMUIdx_Stage2,
 +                                     mmu_idx == ARMMMUIdx_E10_0,
                                       phys_ptr, attrs, &s2_prot,
                                       page_size, fi,
                                       cacheattrs != NULL ? &cacheattrs2 : NULL);
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
      }
-+    /*
+     if (regime_using_lpae_format(env, mmu_idx)) {
-+     * We don't implement trapped exception handling, so the
+-        return get_phys_addr_lpae(env, address, access_type, mmu_idx,
-+     * trap enable bits are all RAZ/WI (not RES0!)
++        return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
-+     */
+                                   phys_ptr, attrs, prot, page_size,
-+    val &= ~(FPCR_IDE | FPCR_IXE | FPCR_UFE | FPCR_OFE | FPCR_DZE | FPCR_IOE);
+                                   fi, cacheattrs);
-+
+     } else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
      changed = env->vfp.xregs[ARM_VFP_FPSCR];
      env->vfp.xregs[ARM_VFP_FPSCR] = (val & 0xffc8ffff);
      env->vfp.vec_len = (val >> 16) & 7;
 --
 .20.1

-[Qemu-devel] [PULL 01/22] target/arm: Introduce isar_feature_aa64_bti
+[PULL 06/39] target/arm: Implement ARMv8.2-TTS2UXN
-From: Richard Henderson <richard.henderson@linaro.org>
+The ARMv8.2-TTS2UXN feature extends the XN field in stage 2
 translation table descriptors from just bit [54] to bits [54:53],
 allowing stage 2 to control execution permissions separately for EL0
 and EL1. Implement the new semantics of the XN field and enable
 the feature for our 'max' CPU.
-Also create field definitions for id_aa64pfr1 from ARMv8.5.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190128223118.5255-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200330210400.11724-5-peter.maydell@linaro.org
 ---
- target/arm/cpu.h | 10 ++++++++++
+ target/arm/cpu.h    | 15 +++++++++++++++
-file changed, 10 insertions(+)
+ target/arm/cpu.c    |  1 +
  target/arm/cpu64.c  |  2 ++
  target/arm/helper.c | 37 +++++++++++++++++++++++++++++++------
 files changed, 49 insertions(+), 6 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64PFR0, GIC, 24, 4)
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_ccidx(const ARMISARegisters *id)
- FIELD(ID_AA64PFR0, RAS, 28, 4)
+     return FIELD_EX32(id->id_mmfr4, ID_MMFR4, CCIDX) != 0;
- FIELD(ID_AA64PFR0, SVE, 32, 4)
+ }
-+FIELD(ID_AA64PFR1, BT, 0, 4)
++static inline bool isar_feature_aa32_tts2uxn(const ARMISARegisters *id)
-+FIELD(ID_AA64PFR1, SBSS, 4, 4)
++{
-+FIELD(ID_AA64PFR1, MTE, 8, 4)
++    return FIELD_EX32(id->id_mmfr4, ID_MMFR4, XNX) != 0;
-+FIELD(ID_AA64PFR1, RAS_FRAC, 12, 4)
++}
 +
- FIELD(ID_AA64MMFR0, PARANGE, 0, 4)
+ /*
- FIELD(ID_AA64MMFR0, ASIDBITS, 4, 4)
+  * 64-bit feature tests via id registers.
- FIELD(ID_AA64MMFR0, BIGEND, 8, 4)
+  */
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_lor(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
-     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, LO) != 0;
+     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
  }
-+static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
++static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
 +{
-+    return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
++    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
 +}
 +
  /*
   * Feature tests for "does this exist in either 32-bit or 64-bit?"
   */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_ccidx(const ARMISARegisters *id)
      return isar_feature_aa64_ccidx(id) || isar_feature_aa32_ccidx(id);
  }
 +static inline bool isar_feature_any_tts2uxn(const ARMISARegisters *id)
 +{
 +    return isar_feature_aa64_tts2uxn(id) || isar_feature_aa32_tts2uxn(id);
 +}
 +
  /*
   * Forward to the above feature tests given an ARMCPU pointer.
   */
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
+             t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
+             t = FIELD_DP32(t, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
+             t = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
++            t = FIELD_DP32(t, ID_MMFR4, XNX, 1); /* TTS2UXN */
+             cpu->isar.id_mmfr4 = t;
+         }
+ #endif
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu64.c
++++ b/target/arm/cpu64.c
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
+         t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);
+         t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 2); /* ATS1E1 */
+         t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* VMID16 */
++        t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1); /* TTS2UXN */
+         cpu->isar.id_aa64mmfr1 = t;
+         t = cpu->isar.id_aa64mmfr2;
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
+         u = FIELD_DP32(u, ID_MMFR4, HPDS, 1); /* AA32HPD */
+         u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
+         u = FIELD_DP32(u, ID_MMFR4, CNP, 1); /* TTCNP */
++        u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
+         cpu->isar.id_mmfr4 = u;
+         u = cpu->isar.id_aa64dfr0;
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+  *
+  * @env:     CPUARMState
+  * @s2ap:    The 2-bit stage2 access permissions (S2AP)
+- * @xn:      XN (execute-never) bit
++ * @xn:      XN (execute-never) bits
++ * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
+  */
+-static int get_S2prot(CPUARMState *env, int s2ap, int xn)
++static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
+ {
+     int prot = 0;
+@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn)
+     if (s2ap & 2) {
+         prot |= PAGE_WRITE;
+     }
+-    if (!xn) {
+-        if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
++
++    if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
++        switch (xn) {
++        case 0:
+             prot |= PAGE_EXEC;
++            break;
++        case 1:
++            if (s1_is_el0) {
++                prot |= PAGE_EXEC;
++            }
++            break;
++        case 2:
++            break;
++        case 3:
++            if (!s1_is_el0) {
++                prot |= PAGE_EXEC;
++            }
++            break;
++        default:
++            g_assert_not_reached();
++        }
++    } else {
++        if (!extract32(xn, 1, 1)) {
++            if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
++                prot |= PAGE_EXEC;
++            }
+         }
+     }
+     return prot;
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
+     }
+     ap = extract32(attrs, 4, 2);
+-    xn = extract32(attrs, 12, 1);
+     if (mmu_idx == ARMMMUIdx_Stage2) {
+         ns = true;
+-        *prot = get_S2prot(env, ap, xn);
++        xn = extract32(attrs, 11, 2);
++        *prot = get_S2prot(env, ap, xn, s1_is_el0);
+     } else {
+         ns = extract32(attrs, 3, 1);
++        xn = extract32(attrs, 12, 1);
+         pxn = extract32(attrs, 11, 1);
+         *prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
+     }
 --
 .20.1
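
For readers following the TTS2UXN change, the decode of the two-bit stage 2 XN field can be sketched in isolation. This is only an illustration of the switch added to get_S2prot() above, not QEMU code: the function name s2_exec_allowed() and the has_tts2uxn flag are hypothetical, and the sketch deliberately leaves out the arm_el_is_aa64()/PAGE_READ condition that the real non-TTS2UXN path also applies.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical standalone helper (not part of QEMU).
 * xn:          the two-bit XN field, descriptor bits [54:53]
 * s1_is_el0:   true if the stage 1 access was made from EL0
 * has_tts2uxn: true if the CPU implements ARMv8.2-TTS2UXN
 * Returns true if stage 2 permits execution.
 */
bool s2_exec_allowed(unsigned xn, bool s1_is_el0, bool has_tts2uxn)
{
    assert(xn < 4);

    if (!has_tts2uxn) {
        /* Without the feature only XN[1] (descriptor bit 54) matters. */
        return (xn & 2) == 0;
    }

    switch (xn) {
    case 0:
        return true;        /* executable at EL1 and EL0 */
    case 1:
        return s1_is_el0;   /* executable at EL0 only */
    case 2:
        return false;       /* never executable */
    case 3:
        return !s1_is_el0;  /* executable at EL1 only */
    }
    return false;           /* unreachable given the assert above */
}
```

The second argument corresponds to what get_phys_addr() passes as s1_is_el0 (mmu_idx == ARMMMUIdx_E10_0 in the hunk above); the other call sites pass false.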

-[Qemu-devel] [PULL 09/22] target/arm: Enable BTI for -cpu max
+[PULL 07/39] target/arm: Use correct variable for setting 'max' cpu's ID_AA64DFR0
-From: Richard Henderson <richard.henderson@linaro.org>
+In aarch64_max_initfn() we update both 32-bit and 64-bit ID
 registers.  The intended pattern is that for 64-bit ID registers we
 use FIELD_DP64 and the uint64_t 't' register, while 32-bit ID
 registers use FIELD_DP32 and the uint32_t 'u' register.  For
 ID_AA64DFR0 we accidentally used 'u', meaning that the top 32 bits of
 this 64-bit ID register would end up always zero.  Luckily at the
 moment that's what they should be anyway, so this bug has no visible
 effects.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Use the right-sized variable.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190128223118.5255-11-richard.henderson@linaro.org
+Fixes: 3bec78447a958d481991
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20200423110915.10527-1-peter.maydell@linaro.org
 ---
- target/arm/cpu64.c | 4 ++++
+ target/arm/cpu64.c | 6 +++---
-file changed, 4 insertions(+)
+file changed, 3 insertions(+), 3 deletions(-)
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
 @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-         t = FIELD_DP64(t, ID_AA64PFR0, ADVSIMD, 1);
+         u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
-         cpu->isar.id_aa64pfr0 = t;
+         cpu->isar.id_mmfr4 = u;
-+        t = cpu->isar.id_aa64pfr1;
+-        u = cpu->isar.id_aa64dfr0;
-+        t = FIELD_DP64(t, ID_AA64PFR1, BT, 1);
+-        u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
-+        cpu->isar.id_aa64pfr1 = t;
+-        cpu->isar.id_aa64dfr0 = u;
-+
++        t = cpu->isar.id_aa64dfr0;
-         t = cpu->isar.id_aa64mmfr1;
++        t = FIELD_DP64(t, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
-         t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 1); /* HPD */
++        cpu->isar.id_aa64dfr0 = t;
-         t = FIELD_DP64(t, ID_AA64MMFR1, LO, 1);
          u = cpu->isar.id_dfr0;
          u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
 --
 .20.1
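
The truncation described in this commit message can be reproduced outside QEMU with a small sketch. Everything here is an assumption made for illustration: deposit_field() is a hypothetical stand-in for a field-deposit macro (it is not QEMU's FIELD_DP64), and the shift/length values 8 and 4 are used only as a plausible position for a 4-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write 'val' into the 'len'-bit field at 'shift'. */
uint64_t deposit_field(uint64_t reg, unsigned shift, unsigned len,
                       uint64_t val)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    uint64_t mask = ((UINT64_C(1) << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Buggy pattern: a 64-bit register staged in a 32-bit temporary. */
uint64_t update_via_u32(uint64_t reg)
{
    uint32_t u = (uint32_t)reg;              /* top 32 bits are dropped */
    u = (uint32_t)deposit_field(u, 8, 4, 5);
    return u;                                /* zero-extended on return */
}

/* Fixed pattern: the temporary is as wide as the register. */
uint64_t update_via_u64(uint64_t reg)
{
    uint64_t t = reg;
    t = deposit_field(t, 8, 4, 5);
    return t;
}
```

With a register value whose top 32 bits are all zero the two functions agree, which matches the observation that the bug currently has no visible effect; they differ as soon as any of bits [63:32] is set.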

-[Qemu-devel] [PULL 02/22] target/arm: Add PSTATE.BTYPE
+[PULL 08/39] target/arm: Use uint64_t for midr field in CPU state struct
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Place this in its own field within ENV, as that will
+MIDR_EL1 is a 64-bit system register with the top 32 bits being RES0.
-make it easier to reset from within TCG generated code.
+Represent it in QEMU's ARMCPU struct with a uint64_t, not a
 uint32_t.
-With the change to pstate_read/write, exception entry
+This fixes an error when compiling with -Werror=conversion
-and return are automatically handled.
+because we were manipulating the register value using a
 local uint64_t variable:
+  target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
+  target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value [-Werror=conversion]
+    628 |         cpu->midr = t;
+        |                     ^
+and future-proofs us against a possible future architecture
+change using some of the top 32 bits.
+Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
+Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
+Message-id: 20200428172634.29707-1-f4bug@amsat.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190128223118.5255-3-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h           | 8 ++++++--
+ target/arm/cpu.h | 2 +-
- target/arm/translate-a64.c | 3 +++
+ target/arm/cpu.c | 2 +-
-files changed, 9 insertions(+), 2 deletions(-)
+files changed, 2 insertions(+), 2 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
+@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
-      *    semantics as for AArch32, as described in the comments on each field)
+         uint64_t id_aa64dfr0;
-      *  nRW (also known as M[4]) is kept, inverted, in env->aarch64
+         uint64_t id_aa64dfr1;
-      *  DAIF (exception masks) are kept in env->daif
+     } isar;
-+     *  BTYPE is kept in env->btype
+-    uint32_t midr;
-      *  all other bits are stored in their correct places in env->pstate
++    uint64_t midr;
-      */
+     uint32_t revidr;
-     uint32_t pstate;
+     uint32_t reset_fpsid;
-@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
+     uint32_t ctr;
-     uint32_t GE; /* cpsr[19:16] */
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
      uint32_t thumb; /* cpsr[5]. 0 = arm mode, 1 = thumb mode. */
      uint32_t condexec_bits; /* IT bits.  cpsr[15:10,26:25].  */
 +    uint32_t btype;  /* BTI branch type.  spsr[11:10].  */
      uint64_t daif; /* exception masks, in the bits they are in PSTATE */
      uint64_t elr_el[4]; /* AArch64 exception link regs  */
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
  #define PSTATE_I (1U << 7)
  #define PSTATE_A (1U << 8)
  #define PSTATE_D (1U << 9)
 +#define PSTATE_BTYPE (3U << 10)
  #define PSTATE_IL (1U << 20)
  #define PSTATE_SS (1U << 21)
  #define PSTATE_V (1U << 28)
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
  #define PSTATE_N (1U << 31)
  #define PSTATE_NZCV (PSTATE_N | PSTATE_Z | PSTATE_C | PSTATE_V)
  #define PSTATE_DAIF (PSTATE_D | PSTATE_A | PSTATE_I | PSTATE_F)
 -#define CACHED_PSTATE_BITS (PSTATE_NZCV | PSTATE_DAIF)
 +#define CACHED_PSTATE_BITS (PSTATE_NZCV | PSTATE_DAIF | PSTATE_BTYPE)
  /* Mode values for AArch64 */
  #define PSTATE_MODE_EL3h 13
  #define PSTATE_MODE_EL3t 12
@@ -XXX,XX +XXX,XX @@ static inline uint32_t pstate_read(CPUARMState *env)
      ZF = (env->ZF == 0);
      return (env->NF & 0x80000000) | (ZF << 30)
          | (env->CF << 29) | ((env->VF & 0x80000000) >> 3)
 -        | env->pstate | env->daif;
 +        | env->pstate | env->daif | (env->btype << 10);
  }
  static inline void pstate_write(CPUARMState *env, uint32_t val)
@@ -XXX,XX +XXX,XX @@ static inline void pstate_write(CPUARMState *env, uint32_t val)
      env->CF = (val >> 29) & 1;
      env->VF = (val << 3) & 0x80000000;
      env->daif = val & PSTATE_DAIF;
 +    env->btype = (val >> 10) & 3;
      env->pstate = val & ~CACHED_PSTATE_BITS;
  }
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/cpu.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
-                 el,
+ static Property arm_cpu_properties[] = {
-                 psr & PSTATE_SP ? 'h' : 't');
+     DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
+     DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
-+    if (cpu_isar_feature(aa64_bti, cpu)) {
+-    DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
-+        cpu_fprintf(f, "  BTYPE=%d", (psr & PSTATE_BTYPE) >> 10);
++    DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
-+    }
+     DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
-     if (!(flags & CPU_DUMP_FPU)) {
+                         mp_affinity, ARM64_AFFINITY_INVALID),
-         cpu_fprintf(f, "\n");
+     DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
          return;
 --
 .20.1

-New patch
+[PULL 09/39] hw/arm: versal: Remove inclusion of arm_gicv3_common.h
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+Remove inclusion of arm_gicv3_common.h; this already gets
+included via xlnx-versal.h.
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Reviewed-by: Luc Michel <luc.michel@greensocs.com>
+Message-id: 20200427181649.26851-2-edgar.iglesias@gmail.com
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/arm/xlnx-versal.c | 1 -
+file changed, 1 deletion(-)
+diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/xlnx-versal.c
++++ b/hw/arm/xlnx-versal.c
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/arm/boot.h"
+ #include "kvm_arm.h"
+ #include "hw/misc/unimp.h"
+-#include "hw/intc/arm_gicv3_common.h"
+ #include "hw/arm/xlnx-versal.h"
+ #include "hw/char/pl011.h"
+--
+.20.1

-[Qemu-devel] [PULL 16/22] gdbstub: allow killing QEMU via vKill command
+[PULL 10/39] hw/arm: versal: Move misplaced comment
-From: Max Filippov <jcmvbkbc@gmail.com>
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
-With multiprocess extensions gdb uses 'vKill' packet instead of 'k' to
+Move misplaced comment.
 kill the inferior. Handle 'vKill' the same way 'k' was handled in the
 presence of single process.
-Fixes: 7cf48f6752e5 ("gdbstub: add multiprocess support to
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-(f|s)ThreadInfo and ThreadExtraInfo")
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Cc: Luc Michel <luc.michel@greensocs.com>
 Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
-Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
+Message-id: 20200427181649.26851-3-edgar.iglesias@gmail.com
 Tested-by: KONRAD Frederic <frederic.konrad@adacore.com>
 Message-id: 20190130192403.13754-1-jcmvbkbc@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- gdbstub.c | 4 ++++
+ hw/arm/xlnx-versal.c | 2 +-
-file changed, 4 insertions(+)
+file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/gdbstub.c b/gdbstub.c
+diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
---- a/gdbstub.c
+--- a/hw/arm/xlnx-versal.c
-+++ b/gdbstub.c
++++ b/hw/arm/xlnx-versal.c
-@@ -XXX,XX +XXX,XX @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
+@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
-             put_packet(s, buf);
+         obj = object_new(XLNX_VERSAL_ACPU_TYPE);
-             break;
+         if (!obj) {
-+        } else if (strncmp(p, "Kill;", 5) == 0) {
+-            /* Secondary CPUs start in PSCI powered-down state */
-+            /* Kill the target */
+             error_report("Unable to create apu.cpu[%d] of type %s",
-+            error_report("QEMU: Terminated via GDBstub");
+                          i, XLNX_VERSAL_ACPU_TYPE);
-+            exit(0);
+             exit(EXIT_FAILURE);
-         } else {
+@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
-             goto unknown_command;
+         object_property_set_int(obj, s->cfg.psci_conduit,
                                  "psci-conduit", &error_abort);
          if (i) {
 +            /* Secondary CPUs start in PSCI powered-down state */
              object_property_set_bool(obj, true,
                                       "start-powered-off", &error_abort);
          }
 --
 .20.1

-New patch
+[PULL 11/39] hw/arm: versal-virt: Fix typo xlnx-ve -> xlnx-versal
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+Fix typo xlnx-ve -> xlnx-versal.
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Luc Michel <luc.michel@greensocs.com>
+Message-id: 20200427181649.26851-4-edgar.iglesias@gmail.com
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/arm/xlnx-versal-virt.c | 2 +-
+file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/xlnx-versal-virt.c
++++ b/hw/arm/xlnx-versal-virt.c
+@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
+         psci_conduit = QEMU_PSCI_CONDUIT_SMC;
+     }
+-    sysbus_init_child_obj(OBJECT(machine), "xlnx-ve", &s->soc,
++    sysbus_init_child_obj(OBJECT(machine), "xlnx-versal", &s->soc,
+                           sizeof(s->soc), TYPE_XLNX_VERSAL);
+     object_property_set_link(OBJECT(&s->soc), OBJECT(machine->ram),
+                              "ddr", &error_abort);
+--
+.20.1

-[Qemu-devel] [PULL 15/22] target/arm: Enable TBI for user-only
+[PULL 12/39] hw/arm: versal: Embed the UARTs into the SoC type
-From: Richard Henderson <richard.henderson@linaro.org>
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
-This has been enabled in the linux kernel since v3.11
+Embed the UARTs into the SoC type.
 (commit d50240a5f6cea, 2013-09-03,
 "arm64: mm: permit use of tagged pointers at EL0").
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Suggested-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-Message-id: 20190204132126.3255-5-richard.henderson@linaro.org
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-5-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.c | 6 ++++++
+ include/hw/arm/xlnx-versal.h |  3 ++-
-file changed, 6 insertions(+)
+ hw/arm/xlnx-versal.c         | 12 ++++++------
 files changed, 8 insertions(+), 7 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/include/hw/arm/xlnx-versal.h
-+++ b/target/arm/cpu.c
++++ b/include/hw/arm/xlnx-versal.h
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
+@@ -XXX,XX +XXX,XX @@
-         env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
+ #include "hw/sysbus.h"
-         env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
+ #include "hw/arm/boot.h"
-         env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
+ #include "hw/intc/arm_gicv3.h"
-+        /*
++#include "hw/char/pl011.h"
-+         * Enable TBI0 and TBI1.  While the real kernel only enables TBI0,
-+         * turning on both here will produce smaller code and otherwise
+ #define TYPE_XLNX_VERSAL "xlnx-versal"
-+         * make no difference to the user-level emulation.
+ #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
-+         */
+@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
-+        env->cp15.tcr_el[1].raw_tcr = (3ULL << 37);
+         MemoryRegion mr_ocm;
- #else
-         /* Reset into the highest available EL */
+         struct {
-         if (arm_feature(env, ARM_FEATURE_EL3)) {
+-            SysBusDevice *uart[XLNX_VERSAL_NR_UARTS];
 +            PL011State uart[XLNX_VERSAL_NR_UARTS];
              SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
              SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
          } iou;
 diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xlnx-versal.c
 +++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@
  #include "kvm_arm.h"
  #include "hw/misc/unimp.h"
  #include "hw/arm/xlnx-versal.h"
 -#include "hw/char/pl011.h"
  #define XLNX_VERSAL_ACPU_TYPE ARM_CPU_TYPE_NAME("cortex-a72")
  #define GEM_REVISION        0x40070106
@@ -XXX,XX +XXX,XX @@ static void versal_create_uarts(Versal *s, qemu_irq *pic)
          DeviceState *dev;
          MemoryRegion *mr;
 -        dev = qdev_create(NULL, TYPE_PL011);
 -        s->lpd.iou.uart[i] = SYS_BUS_DEVICE(dev);
 +        sysbus_init_child_obj(OBJECT(s), name,
 +                              &s->lpd.iou.uart[i], sizeof(s->lpd.iou.uart[i]),
 +                              TYPE_PL011);
 +        dev = DEVICE(&s->lpd.iou.uart[i]);
          qdev_prop_set_chr(dev, "chardev", serial_hd(i));
 -        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
          qdev_init_nofail(dev);
 -        mr = sysbus_mmio_get_region(s->lpd.iou.uart[i], 0);
 +        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
          memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
 -        sysbus_connect_irq(s->lpd.iou.uart[i], 0, pic[irqs[i]]);
 +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
          g_free(name);
      }
  }
 --
 .20.1
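
The shape of this change (an array of pointers to separately created devices replaced by an array of device state structs embedded in the SoC struct and initialized in place) can be sketched without any QEMU types. All names below (DemoUart, DemoSoc, demo_uart_init, demo_soc_init) are hypothetical and the base addresses and IRQ numbers are arbitrary; the sketch is only meant to show that embedding ties the children's storage to the parent's.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_NR_UARTS 2

/* Hypothetical child device state. */
typedef struct DemoUart {
    uint64_t base;
    int irq;
} DemoUart;

/* Hypothetical SoC state with the children embedded, not pointed to. */
typedef struct DemoSoc {
    DemoUart uart[DEMO_NR_UARTS];
} DemoSoc;

/* Initialize a child in storage owned by its parent. */
void demo_uart_init(DemoUart *u, uint64_t base, int irq)
{
    memset(u, 0, sizeof(*u));
    u->base = base;
    u->irq = irq;
}

void demo_soc_init(DemoSoc *s, const uint64_t *bases, const int *irqs)
{
    for (int i = 0; i < DEMO_NR_UARTS; i++) {
        /* No separate allocation: the child lives inside *s. */
        demo_uart_init(&s->uart[i], bases[i], irqs[i]);
    }
}
```

Because the children are members of the parent struct, there is no per-child allocation to track or free, and code holding the SoC pointer can reach each UART's full state directly.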

-[Qemu-devel] [PULL 11/22] tests/tcg/aarch64: Add pauth smoke test
+[PULL 13/39] hw/arm: versal: Embed the GEMs into the SoC type
-From: Richard Henderson <richard.henderson@linaro.org>
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Embed the GEMs into the SoC type.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190201195404.30486-3-richard.henderson@linaro.org
+Suggested-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-6-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- tests/tcg/aarch64/Makefile.target |  6 +++++-
+ include/hw/arm/xlnx-versal.h |  3 ++-
- tests/tcg/aarch64/pauth-1.c       | 23 +++++++++++++++++++++++
+ hw/arm/xlnx-versal.c         | 15 ++++++++-------
- 2 files changed, 28 insertions(+), 1 deletion(-)
+ 2 files changed, 10 insertions(+), 8 deletions(-)
  create mode 100644 tests/tcg/aarch64/pauth-1.c
-diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
+diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
 index XXXXXXX..XXXXXXX 100644
---- a/tests/tcg/aarch64/Makefile.target
+--- a/include/hw/arm/xlnx-versal.h
-+++ b/tests/tcg/aarch64/Makefile.target
++++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@ VPATH         += $(AARCH64_SRC)
  # we don't build any of the ARM tests
  AARCH64_TESTS=$(filter-out $(ARM_TESTS), $(TESTS))
  AARCH64_TESTS+=fcvt
 -TESTS:=$(AARCH64_TESTS)
  fcvt: LDFLAGS+=-lm
  run-fcvt: fcvt
      $(call run-test,$<,$(QEMU) $<, "$< on $(TARGET_NAME)")
      $(call diff-out,$<,$(AARCH64_SRC)/fcvt.ref)
 +
 +AARCH64_TESTS += pauth-1
 +run-pauth-%: QEMU += -cpu max
 +
 +TESTS:=$(AARCH64_TESTS)
 diff --git a/tests/tcg/aarch64/pauth-1.c b/tests/tcg/aarch64/pauth-1.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/tests/tcg/aarch64/pauth-1.c
 @@ -XXX,XX +XXX,XX @@
-+#include <assert.h>
+ #include "hw/arm/boot.h"
-+#include <sys/prctl.h>
+ #include "hw/intc/arm_gicv3.h"
-+
+ #include "hw/char/pl011.h"
-+asm(".arch armv8.4-a");
++#include "hw/net/cadence_gem.h"
-+
-+#ifndef PR_PAC_RESET_KEYS
+ #define TYPE_XLNX_VERSAL "xlnx-versal"
-+#define PR_PAC_RESET_KEYS  54
+ #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
-+#define PR_PAC_APDAKEY     (1 << 2)
+@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
-+#endif
-+
+         struct {
-+int main()
+             PL011State uart[XLNX_VERSAL_NR_UARTS];
-+{
+-            SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
-+    int x;
++            CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
-+    void *p0 = &x, *p1, *p2;
+             SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
-+
+         } iou;
-+    asm volatile("pacdza %0" : "=r"(p1) : "0"(p0));
+     } lpd;
-+    prctl(PR_PAC_RESET_KEYS, PR_PAC_APDAKEY, 0, 0, 0);
+diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
-+    asm volatile("pacdza %0" : "=r"(p2) : "0"(p0));
+index XXXXXXX..XXXXXXX 100644
-+
+--- a/hw/arm/xlnx-versal.c
-+    assert(p1 != p0);
++++ b/hw/arm/xlnx-versal.c
-+    assert(p1 != p2);
+@@ -XXX,XX +XXX,XX @@ static void versal_create_gems(Versal *s, qemu_irq *pic)
-+    return 0;
+         DeviceState *dev;
-+}
+         MemoryRegion *mr;
 -        dev = qdev_create(NULL, "cadence_gem");
 -        s->lpd.iou.gem[i] = SYS_BUS_DEVICE(dev);
 -        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
 +        sysbus_init_child_obj(OBJECT(s), name,
 +                              &s->lpd.iou.gem[i], sizeof(s->lpd.iou.gem[i]),
 +                              TYPE_CADENCE_GEM);
 +        dev = DEVICE(&s->lpd.iou.gem[i]);
          if (nd->used) {
              qemu_check_nic_model(nd, "cadence_gem");
              qdev_set_nic_properties(dev, nd);
          }
 -        object_property_set_int(OBJECT(s->lpd.iou.gem[i]),
 +        object_property_set_int(OBJECT(dev),
                                 2, "num-priority-queues",
                                  &error_abort);
 -        object_property_set_link(OBJECT(s->lpd.iou.gem[i]),
 +        object_property_set_link(OBJECT(dev),
                                   OBJECT(&s->mr_ps), "dma",
                                   &error_abort);
          qdev_init_nofail(dev);
 -        mr = sysbus_mmio_get_region(s->lpd.iou.gem[i], 0);
 +        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
          memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
 -        sysbus_connect_irq(s->lpd.iou.gem[i], 0, pic[irqs[i]]);
 +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
          g_free(name);
      }
  }
 --
 2.20.1

-[Qemu-devel] [PULL 04/22] exec: Add target-specific tlb bits to MemTxAttrs
+[PULL 14/39] hw/arm: versal: Embed the ADMAs into the SoC type
-From: Richard Henderson <richard.henderson@linaro.org>
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
-These bits can be used to cache target-specific data in cputlb
+Embed the ADMAs into the SoC type.
 read from the page tables.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Suggested-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-Message-id: 20190128223118.5255-5-richard.henderson@linaro.org
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-7-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/exec/memattrs.h | 10 ++++++++++
+ include/hw/arm/xlnx-versal.h |  3 ++-
- 1 file changed, 10 insertions(+)
+ hw/arm/xlnx-versal.c         | 14 +++++++-------
  2 files changed, 9 insertions(+), 8 deletions(-)
-diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
+diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/exec/memattrs.h
+--- a/include/hw/arm/xlnx-versal.h
-+++ b/include/exec/memattrs.h
++++ b/include/hw/arm/xlnx-versal.h
-@@ -XXX,XX +XXX,XX @@ typedef struct MemTxAttrs {
+@@ -XXX,XX +XXX,XX @@
-     unsigned int user:1;
+ #include "hw/arm/boot.h"
-     /* Requester ID (for MSI for example) */
+ #include "hw/intc/arm_gicv3.h"
-     unsigned int requester_id:16;
+ #include "hw/char/pl011.h"
-+    /*
++#include "hw/dma/xlnx-zdma.h"
-+     * The following are target-specific page-table bits.  These are not
+ #include "hw/net/cadence_gem.h"
-+     * related to actual memory transactions at all.  However, this structure
-+     * is part of the tlb_fill interface, cached in the cputlb structure,
+ #define TYPE_XLNX_VERSAL "xlnx-versal"
-+     * and has unused bits.  These fields will be read by target-specific
+@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
-+     * helpers using env->iotlb[mmu_idx][tlb_index()].attrs.target_tlb_bitN.
+         struct {
-+     */
+             PL011State uart[XLNX_VERSAL_NR_UARTS];
-+    unsigned int target_tlb_bit0 : 1;
+             CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
-+    unsigned int target_tlb_bit1 : 1;
+-            SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
-+    unsigned int target_tlb_bit2 : 1;
++            XlnxZDMA adma[XLNX_VERSAL_NR_ADMAS];
- } MemTxAttrs;
+         } iou;
+     } lpd;
- /* Bus masters which don't specify any attributes will get this,
 diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xlnx-versal.c
 +++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
          DeviceState *dev;
          MemoryRegion *mr;
 -        dev = qdev_create(NULL, "xlnx.zdma");
 -        s->lpd.iou.adma[i] = SYS_BUS_DEVICE(dev);
 -        object_property_set_int(OBJECT(s->lpd.iou.adma[i]), 128, "bus-width",
 -                                &error_abort);
 -        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
 +        sysbus_init_child_obj(OBJECT(s), name,
 +                              &s->lpd.iou.adma[i], sizeof(s->lpd.iou.adma[i]),
 +                              TYPE_XLNX_ZDMA);
 +        dev = DEVICE(&s->lpd.iou.adma[i]);
 +        object_property_set_int(OBJECT(dev), 128, "bus-width", &error_abort);
          qdev_init_nofail(dev);
 -        mr = sysbus_mmio_get_region(s->lpd.iou.adma[i], 0);
 +        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
          memory_region_add_subregion(&s->mr_ps,
                                      MM_ADMA_CH0 + i * MM_ADMA_CH0_SIZE, mr);
 -        sysbus_connect_irq(s->lpd.iou.adma[i], 0, pic[VERSAL_ADMA_IRQ_0 + i]);
 +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[VERSAL_ADMA_IRQ_0 + i]);
          g_free(name);
      }
  }
 --
 2.20.1

-[Qemu-devel] [PULL 13/22] target/arm: Clean TBI for data operations in the translator
+[PULL 15/39] hw/arm: versal: Embed the APUs into the SoC type
-From: Richard Henderson <richard.henderson@linaro.org>
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
-This will allow TBI to be used in user-only mode, as well as
+Embed the APUs into the SoC type.
 avoid ping-ponging the softmmu TLB when TBI is in use.  It
 will also enable other armv8 extensions.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Suggested-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-Message-id: 20190204132126.3255-3-richard.henderson@linaro.org
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-8-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
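Note (illustrative only, not part of either patch): the removed side of
this comparison is the TBI patch, which computes a "clean" address for each
data access while keeping the "dirty" (still tagged) address for base
register writeback. Below is a small plain-C sketch of that split.
toy_clean_tbi, ToyAccess and toy_post_index are hypothetical names invented
for this note. The sketch assumes the simplest case, where the top byte is
replaced by copies of bit 55; the real translator handles further cases
that are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified top-byte-ignore: replace bits [63:56] with copies of
 * bit 55. Assumption: only this sign-extension case is modelled.
 */
static uint64_t toy_clean_tbi(uint64_t addr)
{
    uint64_t low = addr & 0x00ffffffffffffffULL;

    if (addr & (1ULL << 55)) {
        low |= 0xff00000000000000ULL;
    }
    return low;
}

typedef struct ToyAccess {
    uint64_t accessed;   /* address handed to the memory access */
    uint64_t writeback;  /* value written back to the base register */
} ToyAccess;

/*
 * Post-indexed access: the memory operation uses the clean address,
 * while the writeback is computed from the dirty address, so the
 * tag byte survives in the base register.
 */
static ToyAccess toy_post_index(uint64_t base, int64_t offset)
{
    ToyAccess r;
    uint64_t dirty = base;

    assert(offset >= 0);                 /* keep the toy example simple */
    r.accessed = toy_clean_tbi(dirty);
    r.writeback = dirty + (uint64_t)offset;
    return r;
}
```

Keeping two values mirrors the clean_addr / dirty_addr pair in the diff:
the access sees an untagged address, and the register update is
independent of the cleaning step.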
- target/arm/translate-a64.c | 217 ++++++++++++++++++++-----------------
+ include/hw/arm/xlnx-versal.h |  2 +-
- 1 file changed, 116 insertions(+), 101 deletions(-)
+ hw/arm/xlnx-versal-virt.c    |  4 ++--
  hw/arm/xlnx-versal.c         | 19 +++++--------------
  3 files changed, 8 insertions(+), 17 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/include/hw/arm/xlnx-versal.h
-+++ b/target/arm/translate-a64.c
++++ b/include/hw/arm/xlnx-versal.h
-@@ -XXX,XX +XXX,XX @@ static void gen_a64_set_pc(DisasContext *s, TCGv_i64 src)
+@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
-     gen_top_byte_ignore(s, cpu_pc, src, s->tbii);
+     struct {
- }
+         struct {
+             MemoryRegion mr;
-+/*
+-            ARMCPU *cpu[XLNX_VERSAL_NR_ACPUS];
-+ * Return a "clean" address for ADDR according to TBID.
++            ARMCPU cpu[XLNX_VERSAL_NR_ACPUS];
-+ * This is always a fresh temporary, as we need to be able to
+             GICv3State gic;
-+ * increment this independently of a dirty write-back address.
+         } apu;
-+ */
+     } fpd;
-+static TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr)
+diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
-+{
+index XXXXXXX..XXXXXXX 100644
-+    TCGv_i64 clean = new_tmp_a64(s);
+--- a/hw/arm/xlnx-versal-virt.c
-+    gen_top_byte_ignore(s, clean, addr, s->tbid);
++++ b/hw/arm/xlnx-versal-virt.c
-+    return clean;
+@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
-+}
+     s->binfo.get_dtb = versal_virt_get_dtb;
-+
+     s->binfo.modify_dtb = versal_virt_modify_dtb;
- typedef struct DisasCompare64 {
+     if (machine->kernel_filename) {
-     TCGCond cond;
+-        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
-     TCGv_i64 value;
++        arm_load_kernel(&s->soc.fpd.apu.cpu[0], machine, &s->binfo);
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap(DisasContext *s, int rs, int rt,
      TCGv_i64 tcg_rs = cpu_reg(s, rs);
      TCGv_i64 tcg_rt = cpu_reg(s, rt);
      int memidx = get_mem_index(s);
 -    TCGv_i64 addr = cpu_reg_sp(s, rn);
 +    TCGv_i64 clean_addr;
      if (rn == 31) {
          gen_check_sp_alignment(s);
      }
 -    tcg_gen_atomic_cmpxchg_i64(tcg_rs, addr, tcg_rs, tcg_rt, memidx,
 +    clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
 +    tcg_gen_atomic_cmpxchg_i64(tcg_rs, clean_addr, tcg_rs, tcg_rt, memidx,
                                 size | MO_ALIGN | s->be_data);
  }
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
      TCGv_i64 s2 = cpu_reg(s, rs + 1);
      TCGv_i64 t1 = cpu_reg(s, rt);
      TCGv_i64 t2 = cpu_reg(s, rt + 1);
 -    TCGv_i64 addr = cpu_reg_sp(s, rn);
 +    TCGv_i64 clean_addr;
      int memidx = get_mem_index(s);
      if (rn == 31) {
          gen_check_sp_alignment(s);
      }
 +    clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
      if (size == 2) {
          TCGv_i64 cmp = tcg_temp_new_i64();
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
              tcg_gen_concat32_i64(cmp, s2, s1);
          }
 -        tcg_gen_atomic_cmpxchg_i64(cmp, addr, cmp, val, memidx,
 +        tcg_gen_atomic_cmpxchg_i64(cmp, clean_addr, cmp, val, memidx,
                                     MO_64 | MO_ALIGN | s->be_data);
          tcg_temp_free_i64(val);
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
          if (HAVE_CMPXCHG128) {
              TCGv_i32 tcg_rs = tcg_const_i32(rs);
              if (s->be_data == MO_LE) {
 -                gen_helper_casp_le_parallel(cpu_env, tcg_rs, addr, t1, t2);
 +                gen_helper_casp_le_parallel(cpu_env, tcg_rs,
 +                                            clean_addr, t1, t2);
              } else {
 -                gen_helper_casp_be_parallel(cpu_env, tcg_rs, addr, t1, t2);
 +                gen_helper_casp_be_parallel(cpu_env, tcg_rs,
 +                                            clean_addr, t1, t2);
              }
              tcg_temp_free_i32(tcg_rs);
          } else {
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
          TCGv_i64 zero = tcg_const_i64(0);
          /* Load the two words, in memory order.  */
 -        tcg_gen_qemu_ld_i64(d1, addr, memidx,
 +        tcg_gen_qemu_ld_i64(d1, clean_addr, memidx,
                              MO_64 | MO_ALIGN_16 | s->be_data);
 -        tcg_gen_addi_i64(a2, addr, 8);
 -        tcg_gen_qemu_ld_i64(d2, addr, memidx, MO_64 | s->be_data);
 +        tcg_gen_addi_i64(a2, clean_addr, 8);
 +        tcg_gen_qemu_ld_i64(d2, clean_addr, memidx, MO_64 | s->be_data);
          /* Compare the two words, also in memory order.  */
          tcg_gen_setcond_i64(TCG_COND_EQ, c1, d1, s1);
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
          /* If compare equal, write back new data, else write back old data.  */
          tcg_gen_movcond_i64(TCG_COND_NE, c1, c2, zero, t1, d1);
          tcg_gen_movcond_i64(TCG_COND_NE, c2, c2, zero, t2, d2);
 -        tcg_gen_qemu_st_i64(c1, addr, memidx, MO_64 | s->be_data);
 +        tcg_gen_qemu_st_i64(c1, clean_addr, memidx, MO_64 | s->be_data);
          tcg_gen_qemu_st_i64(c2, a2, memidx, MO_64 | s->be_data);
          tcg_temp_free_i64(a2);
          tcg_temp_free_i64(c1);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
      int is_lasr = extract32(insn, 15, 1);
      int o2_L_o1_o0 = extract32(insn, 21, 3) * 2 | is_lasr;
      int size = extract32(insn, 30, 2);
 -    TCGv_i64 tcg_addr;
 +    TCGv_i64 clean_addr;
      switch (o2_L_o1_o0) {
      case 0x0: /* STXR */
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
          if (is_lasr) {
              tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
          }
 -        tcg_addr = read_cpu_reg_sp(s, rn, 1);
 -        gen_store_exclusive(s, rs, rt, rt2, tcg_addr, size, false);
 +        clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
 +        gen_store_exclusive(s, rs, rt, rt2, clean_addr, size, false);
          return;
      case 0x4: /* LDXR */
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
          if (rn == 31) {
              gen_check_sp_alignment(s);
          }
 -        tcg_addr = read_cpu_reg_sp(s, rn, 1);
 +        clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
          s->is_ldex = true;
 -        gen_load_exclusive(s, rt, rt2, tcg_addr, size, false);
 +        gen_load_exclusive(s, rt, rt2, clean_addr, size, false);
          if (is_lasr) {
              tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
          }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
              gen_check_sp_alignment(s);
          }
          tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
 -        tcg_addr = read_cpu_reg_sp(s, rn, 1);
 -        do_gpr_st(s, cpu_reg(s, rt), tcg_addr, size, true, rt,
 +        clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
 +        do_gpr_st(s, cpu_reg(s, rt), clean_addr, size, true, rt,
                    disas_ldst_compute_iss_sf(size, false, 0), is_lasr);
          return;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
          if (rn == 31) {
              gen_check_sp_alignment(s);
          }
 -        tcg_addr = read_cpu_reg_sp(s, rn, 1);
 -        do_gpr_ld(s, cpu_reg(s, rt), tcg_addr, size, false, false, true, rt,
 +        clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
 +        do_gpr_ld(s, cpu_reg(s, rt), clean_addr, size, false, false, true, rt,
                    disas_ldst_compute_iss_sf(size, false, 0), is_lasr);
          tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
          return;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
              if (is_lasr) {
                  tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
              }
 -            tcg_addr = read_cpu_reg_sp(s, rn, 1);
 -            gen_store_exclusive(s, rs, rt, rt2, tcg_addr, size, true);
 +            clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
 +            gen_store_exclusive(s, rs, rt, rt2, clean_addr, size, true);
              return;
          }
          if (rt2 == 31
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
              if (rn == 31) {
                  gen_check_sp_alignment(s);
              }
 -            tcg_addr = read_cpu_reg_sp(s, rn, 1);
 +            clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
              s->is_ldex = true;
 -            gen_load_exclusive(s, rt, rt2, tcg_addr, size, true);
 +            gen_load_exclusive(s, rt, rt2, clean_addr, size, true);
              if (is_lasr) {
                  tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
              }
@@ -XXX,XX +XXX,XX @@ static void disas_ld_lit(DisasContext *s, uint32_t insn)
      int opc = extract32(insn, 30, 2);
      bool is_signed = false;
      int size = 2;
 -    TCGv_i64 tcg_rt, tcg_addr;
 +    TCGv_i64 tcg_rt, clean_addr;
      if (is_vector) {
          if (opc == 3) {
@@ -XXX,XX +XXX,XX @@ static void disas_ld_lit(DisasContext *s, uint32_t insn)
      tcg_rt = cpu_reg(s, rt);
 -    tcg_addr = tcg_const_i64((s->pc - 4) + imm);
 +    clean_addr = tcg_const_i64((s->pc - 4) + imm);
      if (is_vector) {
 -        do_fp_ld(s, rt, tcg_addr, size);
 +        do_fp_ld(s, rt, clean_addr, size);
      } else {
-         /* Only unsigned 32bit loads target 32bit registers.  */
+-        AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
-         bool iss_sf = opc != 0;
++        AddressSpace *as = arm_boot_address_space(&s->soc.fpd.apu.cpu[0],
+                                                   &s->binfo);
--        do_gpr_ld(s, tcg_rt, tcg_addr, size, is_signed, false,
+         /* Some boot-loaders (e.g u-boot) don't like blobs at address 0 (NULL).
-+        do_gpr_ld(s, tcg_rt, clean_addr, size, is_signed, false,
+          * Offset things by 4K.  */
-                   true, rt, iss_sf, false);
+diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
-     }
+index XXXXXXX..XXXXXXX 100644
--    tcg_temp_free_i64(tcg_addr);
+--- a/hw/arm/xlnx-versal.c
-+    tcg_temp_free_i64(clean_addr);
++++ b/hw/arm/xlnx-versal.c
- }
+@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
- /*
+     for (i = 0; i < ARRAY_SIZE(s->fpd.apu.cpu); i++) {
-@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
+         Object *obj;
-     bool postindex = false;
+-        char *name;
      bool wback = false;
 -    TCGv_i64 tcg_addr; /* calculated address */
 +    TCGv_i64 clean_addr, dirty_addr;
 +
      int size;
      if (opc == 3) {
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
          gen_check_sp_alignment(s);
      }
 -    tcg_addr = read_cpu_reg_sp(s, rn, 1);
 -
-+    dirty_addr = read_cpu_reg_sp(s, rn, 1);
+-        obj = object_new(XLNX_VERSAL_ACPU_TYPE);
-     if (!postindex) {
+-        if (!obj) {
--        tcg_gen_addi_i64(tcg_addr, tcg_addr, offset);
+-            error_report("Unable to create apu.cpu[%d] of type %s",
-+        tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
+-                         i, XLNX_VERSAL_ACPU_TYPE);
-     }
+-            exit(EXIT_FAILURE);
-+    clean_addr = clean_data_tbi(s, dirty_addr);
+-        }
+-
-     if (is_vector) {
+-        name = g_strdup_printf("apu-cpu[%d]", i);
-         if (is_load) {
+-        object_property_add_child(OBJECT(s), name, obj, &error_fatal);
--            do_fp_ld(s, rt, tcg_addr, size);
+-        g_free(name);
-+            do_fp_ld(s, rt, clean_addr, size);
-         } else {
++        object_initialize_child(OBJECT(s), "apu-cpu[*]",
--            do_fp_st(s, rt, tcg_addr, size);
++                                &s->fpd.apu.cpu[i], sizeof(s->fpd.apu.cpu[i]),
-+            do_fp_st(s, rt, clean_addr, size);
++                                XLNX_VERSAL_ACPU_TYPE, &error_abort, NULL);
-         }
++        obj = OBJECT(&s->fpd.apu.cpu[i]);
--        tcg_gen_addi_i64(tcg_addr, tcg_addr, 1 << size);
+         object_property_set_int(obj, s->cfg.psci_conduit,
-+        tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size);
+                                 "psci-conduit", &error_abort);
-         if (is_load) {
+         if (i) {
--            do_fp_ld(s, rt2, tcg_addr, size);
+@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
-+            do_fp_ld(s, rt2, clean_addr, size);
+         object_property_set_link(obj, OBJECT(&s->fpd.apu.mr), "memory",
-         } else {
+                                  &error_abort);
--            do_fp_st(s, rt2, tcg_addr, size);
+         object_property_set_bool(obj, true, "realized", &error_fatal);
-+            do_fp_st(s, rt2, clean_addr, size);
+-        s->fpd.apu.cpu[i] = ARM_CPU(obj);
          }
      } else {
          TCGv_i64 tcg_rt = cpu_reg(s, rt);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
              /* Do not modify tcg_rt before recognizing any exception
               * from the second load.
               */
 -            do_gpr_ld(s, tmp, tcg_addr, size, is_signed, false,
 +            do_gpr_ld(s, tmp, clean_addr, size, is_signed, false,
                        false, 0, false, false);
 -            tcg_gen_addi_i64(tcg_addr, tcg_addr, 1 << size);
 -            do_gpr_ld(s, tcg_rt2, tcg_addr, size, is_signed, false,
 +            tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size);
 +            do_gpr_ld(s, tcg_rt2, clean_addr, size, is_signed, false,
                        false, 0, false, false);
              tcg_gen_mov_i64(tcg_rt, tmp);
              tcg_temp_free_i64(tmp);
          } else {
 -            do_gpr_st(s, tcg_rt, tcg_addr, size,
 +            do_gpr_st(s, tcg_rt, clean_addr, size,
                        false, 0, false, false);
 -            tcg_gen_addi_i64(tcg_addr, tcg_addr, 1 << size);
 -            do_gpr_st(s, tcg_rt2, tcg_addr, size,
 +            tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size);
 +            do_gpr_st(s, tcg_rt2, clean_addr, size,
                        false, 0, false, false);
          }
      }
      if (wback) {
          if (postindex) {
 -            tcg_gen_addi_i64(tcg_addr, tcg_addr, offset - (1 << size));
 -        } else {
 -            tcg_gen_subi_i64(tcg_addr, tcg_addr, 1 << size);
 +            tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
          }
 -        tcg_gen_mov_i64(cpu_reg_sp(s, rn), tcg_addr);
 +        tcg_gen_mov_i64(cpu_reg_sp(s, rn), dirty_addr);
      }
  }
-@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
+@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_gic(Versal *s, qemu_irq *pic)
      bool post_index;
      bool writeback;
 -    TCGv_i64 tcg_addr;
 +    TCGv_i64 clean_addr, dirty_addr;
      if (is_vector) {
          size |= (opc & 2) << 1;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
      if (rn == 31) {
          gen_check_sp_alignment(s);
      }
--    tcg_addr = read_cpu_reg_sp(s, rn, 1);
+     for (i = 0; i < nr_apu_cpus; i++) {
-+    dirty_addr = read_cpu_reg_sp(s, rn, 1);
+-        DeviceState *cpudev = DEVICE(s->fpd.apu.cpu[i]);
-     if (!post_index) {
++        DeviceState *cpudev = DEVICE(&s->fpd.apu.cpu[i]);
--        tcg_gen_addi_i64(tcg_addr, tcg_addr, imm9);
+         int ppibase = XLNX_VERSAL_NR_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS;
-+        tcg_gen_addi_i64(dirty_addr, dirty_addr, imm9);
+         qemu_irq maint_irq;
-     }
+         int ti;
 +    clean_addr = clean_data_tbi(s, dirty_addr);
      if (is_vector) {
          if (is_store) {
 -            do_fp_st(s, rt, tcg_addr, size);
 +            do_fp_st(s, rt, clean_addr, size);
          } else {
 -            do_fp_ld(s, rt, tcg_addr, size);
 +            do_fp_ld(s, rt, clean_addr, size);
          }
      } else {
          TCGv_i64 tcg_rt = cpu_reg(s, rt);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
          bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc);
          if (is_store) {
 -            do_gpr_st_memidx(s, tcg_rt, tcg_addr, size, memidx,
 +            do_gpr_st_memidx(s, tcg_rt, clean_addr, size, memidx,
                               iss_valid, rt, iss_sf, false);
          } else {
 -            do_gpr_ld_memidx(s, tcg_rt, tcg_addr, size,
 +            do_gpr_ld_memidx(s, tcg_rt, clean_addr, size,
                               is_signed, is_extended, memidx,
                               iss_valid, rt, iss_sf, false);
          }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
      if (writeback) {
          TCGv_i64 tcg_rn = cpu_reg_sp(s, rn);
          if (post_index) {
 -            tcg_gen_addi_i64(tcg_addr, tcg_addr, imm9);
 +            tcg_gen_addi_i64(dirty_addr, dirty_addr, imm9);
          }
 -        tcg_gen_mov_i64(tcg_rn, tcg_addr);
 +        tcg_gen_mov_i64(tcg_rn, dirty_addr);
      }
  }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
      bool is_store = false;
      bool is_extended = false;
 -    TCGv_i64 tcg_rm;
 -    TCGv_i64 tcg_addr;
 +    TCGv_i64 tcg_rm, clean_addr, dirty_addr;
      if (extract32(opt, 1, 1) == 0) {
          unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
      if (rn == 31) {
          gen_check_sp_alignment(s);
      }
 -    tcg_addr = read_cpu_reg_sp(s, rn, 1);
 +    dirty_addr = read_cpu_reg_sp(s, rn, 1);
      tcg_rm = read_cpu_reg(s, rm, 1);
      ext_and_shift_reg(tcg_rm, tcg_rm, opt, shift ? size : 0);
 -    tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_rm);
 +    tcg_gen_add_i64(dirty_addr, dirty_addr, tcg_rm);
 +    clean_addr = clean_data_tbi(s, dirty_addr);
      if (is_vector) {
          if (is_store) {
 -            do_fp_st(s, rt, tcg_addr, size);
 +            do_fp_st(s, rt, clean_addr, size);
          } else {
 -            do_fp_ld(s, rt, tcg_addr, size);
 +            do_fp_ld(s, rt, clean_addr, size);
          }
      } else {
          TCGv_i64 tcg_rt = cpu_reg(s, rt);
          bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc);
          if (is_store) {
 -            do_gpr_st(s, tcg_rt, tcg_addr, size,
 +            do_gpr_st(s, tcg_rt, clean_addr, size,
                        true, rt, iss_sf, false);
          } else {
 -            do_gpr_ld(s, tcg_rt, tcg_addr, size,
 +            do_gpr_ld(s, tcg_rt, clean_addr, size,
                        is_signed, is_extended,
                        true, rt, iss_sf, false);
          }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
      unsigned int imm12 = extract32(insn, 10, 12);
      unsigned int offset;
 -    TCGv_i64 tcg_addr;
 +    TCGv_i64 clean_addr, dirty_addr;
      bool is_store;
      bool is_signed = false;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
      if (rn == 31) {
          gen_check_sp_alignment(s);
      }
 -    tcg_addr = read_cpu_reg_sp(s, rn, 1);
 +    dirty_addr = read_cpu_reg_sp(s, rn, 1);
      offset = imm12 << size;
 -    tcg_gen_addi_i64(tcg_addr, tcg_addr, offset);
 +    tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
 +    clean_addr = clean_data_tbi(s, dirty_addr);
      if (is_vector) {
          if (is_store) {
 -            do_fp_st(s, rt, tcg_addr, size);
 +            do_fp_st(s, rt, clean_addr, size);
          } else {
 -            do_fp_ld(s, rt, tcg_addr, size);
 +            do_fp_ld(s, rt, clean_addr, size);
          }
      } else {
          TCGv_i64 tcg_rt = cpu_reg(s, rt);
          bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc);
          if (is_store) {
 -            do_gpr_st(s, tcg_rt, tcg_addr, size,
 +            do_gpr_st(s, tcg_rt, clean_addr, size,
                        true, rt, iss_sf, false);
          } else {
 -            do_gpr_ld(s, tcg_rt, tcg_addr, size, is_signed, is_extended,
 +            do_gpr_ld(s, tcg_rt, clean_addr, size, is_signed, is_extended,
                        true, rt, iss_sf, false);
          }
      }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
      int rs = extract32(insn, 16, 5);
      int rn = extract32(insn, 5, 5);
      int o3_opc = extract32(insn, 12, 4);
 -    TCGv_i64 tcg_rn, tcg_rs;
 +    TCGv_i64 tcg_rs, clean_addr;
      AtomicThreeOpFn *fn;
      if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
      if (rn == 31) {
          gen_check_sp_alignment(s);
      }
 -    tcg_rn = cpu_reg_sp(s, rn);
 +    clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
      tcg_rs = read_cpu_reg(s, rs, true);
      if (o3_opc == 1) { /* LDCLR */
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
      /* The tcg atomic primitives are all full barriers.  Therefore we
       * can ignore the Acquire and Release bits of this instruction.
       */
 -    fn(cpu_reg(s, rt), tcg_rn, tcg_rs, get_mem_index(s),
 +    fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
         s->be_data | size | MO_ALIGN);
  }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn,
      bool is_wback = extract32(insn, 11, 1);
      bool use_key_a = !extract32(insn, 23, 1);
      int offset;
 -    TCGv_i64 tcg_addr, tcg_rt;
 +    TCGv_i64 clean_addr, dirty_addr, tcg_rt;
      if (size != 3 || is_vector || !dc_isar_feature(aa64_pauth, s)) {
          unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn,
      if (rn == 31) {
          gen_check_sp_alignment(s);
      }
 -    tcg_addr = read_cpu_reg_sp(s, rn, 1);
 +    dirty_addr = read_cpu_reg_sp(s, rn, 1);
      if (s->pauth_active) {
          if (use_key_a) {
 -            gen_helper_autda(tcg_addr, cpu_env, tcg_addr, cpu_X[31]);
 +            gen_helper_autda(dirty_addr, cpu_env, dirty_addr, cpu_X[31]);
          } else {
 -            gen_helper_autdb(tcg_addr, cpu_env, tcg_addr, cpu_X[31]);
 +            gen_helper_autdb(dirty_addr, cpu_env, dirty_addr, cpu_X[31]);
          }
      }
      /* Form the 10-bit signed, scaled offset.  */
      offset = (extract32(insn, 22, 1) << 9) | extract32(insn, 12, 9);
      offset = sextract32(offset << size, 0, 10 + size);
 -    tcg_gen_addi_i64(tcg_addr, tcg_addr, offset);
 +    tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
 +
 +    /* Note that "clean" and "dirty" here refer to TBI not PAC.  */
 +    clean_addr = clean_data_tbi(s, dirty_addr);
      tcg_rt = cpu_reg(s, rt);
 -
 -    do_gpr_ld(s, tcg_rt, tcg_addr, size, /* is_signed */ false,
 +    do_gpr_ld(s, tcg_rt, clean_addr, size, /* is_signed */ false,
                /* extend */ false, /* iss_valid */ !is_wback,
                /* iss_srt */ rt, /* iss_sf */ true, /* iss_ar */ false);
      if (is_wback) {
 -        tcg_gen_mov_i64(cpu_reg_sp(s, rn), tcg_addr);
 +        tcg_gen_mov_i64(cpu_reg_sp(s, rn), dirty_addr);
      }
  }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
      bool is_store = !extract32(insn, 22, 1);
      bool is_postidx = extract32(insn, 23, 1);
      bool is_q = extract32(insn, 30, 1);
 -    TCGv_i64 tcg_addr, tcg_rn, tcg_ebytes;
 +    TCGv_i64 clean_addr, tcg_rn, tcg_ebytes;
      TCGMemOp endian = s->be_data;
      int ebytes;   /* bytes per element */
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
      elements = (is_q ? 16 : 8) / ebytes;
      tcg_rn = cpu_reg_sp(s, rn);
 -    tcg_addr = tcg_temp_new_i64();
 -    tcg_gen_mov_i64(tcg_addr, tcg_rn);
 +    clean_addr = clean_data_tbi(s, tcg_rn);
      tcg_ebytes = tcg_const_i64(ebytes);
      for (r = 0; r < rpt; r++) {
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
              for (xs = 0; xs < selem; xs++) {
                  int tt = (rt + r + xs) % 32;
                  if (is_store) {
 -                    do_vec_st(s, tt, e, tcg_addr, size, endian);
 +                    do_vec_st(s, tt, e, clean_addr, size, endian);
                  } else {
 -                    do_vec_ld(s, tt, e, tcg_addr, size, endian);
 +                    do_vec_ld(s, tt, e, clean_addr, size, endian);
                  }
 -                tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_ebytes);
 +                tcg_gen_add_i64(clean_addr, clean_addr, tcg_ebytes);
              }
          }
      }
 +    tcg_temp_free_i64(tcg_ebytes);
      if (!is_store) {
          /* For non-quad operations, setting a slice of the low
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
      if (is_postidx) {
          if (rm == 31) {
 -            tcg_gen_mov_i64(tcg_rn, tcg_addr);
 +            tcg_gen_addi_i64(tcg_rn, tcg_rn, rpt * elements * selem * ebytes);
          } else {
              tcg_gen_add_i64(tcg_rn, tcg_rn, cpu_reg(s, rm));
          }
      }
 -    tcg_temp_free_i64(tcg_ebytes);
 -    tcg_temp_free_i64(tcg_addr);
  }
  /* AdvSIMD load/store single structure
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
      bool replicate = false;
      int index = is_q << 3 | S << 2 | size;
      int ebytes, xs;
 -    TCGv_i64 tcg_addr, tcg_rn, tcg_ebytes;
 +    TCGv_i64 clean_addr, tcg_rn, tcg_ebytes;
      if (extract32(insn, 31, 1)) {
          unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
      }
      tcg_rn = cpu_reg_sp(s, rn);
 -    tcg_addr = tcg_temp_new_i64();
 -    tcg_gen_mov_i64(tcg_addr, tcg_rn);
 +    clean_addr = clean_data_tbi(s, tcg_rn);
      tcg_ebytes = tcg_const_i64(ebytes);
      for (xs = 0; xs < selem; xs++) {
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
              /* Load and replicate to all elements */
              TCGv_i64 tcg_tmp = tcg_temp_new_i64();
 -            tcg_gen_qemu_ld_i64(tcg_tmp, tcg_addr,
 +            tcg_gen_qemu_ld_i64(tcg_tmp, clean_addr,
                                  get_mem_index(s), s->be_data + scale);
              tcg_gen_gvec_dup_i64(scale, vec_full_reg_offset(s, rt),
                                   (is_q + 1) * 8, vec_full_reg_size(s),
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
          } else {
              /* Load/store one element per register */
              if (is_load) {
 -                do_vec_ld(s, rt, index, tcg_addr, scale, s->be_data);
 +                do_vec_ld(s, rt, index, clean_addr, scale, s->be_data);
              } else {
 -                do_vec_st(s, rt, index, tcg_addr, scale, s->be_data);
 +                do_vec_st(s, rt, index, clean_addr, scale, s->be_data);
              }
          }
 -        tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_ebytes);
 +        tcg_gen_add_i64(clean_addr, clean_addr, tcg_ebytes);
          rt = (rt + 1) % 32;
      }
 +    tcg_temp_free_i64(tcg_ebytes);
      if (is_postidx) {
          if (rm == 31) {
 -            tcg_gen_mov_i64(tcg_rn, tcg_addr);
 +            tcg_gen_addi_i64(tcg_rn, tcg_rn, selem * ebytes);
          } else {
              tcg_gen_add_i64(tcg_rn, tcg_rn, cpu_reg(s, rm));
          }
      }
 -    tcg_temp_free_i64(tcg_ebytes);
 -    tcg_temp_free_i64(tcg_addr);
  }
  /* Loads and stores */
 --
 2.20.1
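
Note on the hunks above: the loads and stores now go through a "clean" address, while post-index writeback adds a byte count to the original base register instead of copying the incremented address. Presumably this keeps the tag byte in the written-back register. A minimal plain-C sketch of that idea follows. It is not QEMU's TCG code: the `model_*` names and the 2-bit `tbi` encoding (bit 0 for TBI0, bit 1 for TBI1) are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: copy bit 55 into bits [63:56] of a 64-bit address. */
static uint64_t model_sext56(uint64_t addr)
{
    const uint64_t top = 0xff00000000000000ULL;

    return (addr & (1ULL << 55)) ? (addr | top) : (addr & ~top);
}

/*
 * Hypothetical model of "cleaning" a data address for a two-range
 * translation regime.  Bit 0 of tbi stands for TBI0 (bit 55 clear),
 * bit 1 of tbi stands for TBI1 (bit 55 set).
 */
uint64_t model_clean_tbi(uint64_t addr, unsigned tbi)
{
    uint64_t ext = model_sext56(addr);

    switch (tbi & 3) {
    case 0:
        return addr;        /* no tagging: address used as-is */
    case 1:
        return addr & ext;  /* tag ignored only when bit 55 is clear */
    case 2:
        return addr | ext;  /* tag ignored only when bit 55 is set */
    default:
        return ext;         /* tag ignored in both halves */
    }
}

/*
 * Post-index writeback in the style of the patch: add the transfer size
 * to the original (still tagged) base rather than copy the clean address.
 */
uint64_t model_postidx_writeback(uint64_t dirty_base, unsigned total_bytes)
{
    return dirty_base + total_bytes;
}
```

With `tbi == 1`, a low-half address such as 0x1200000000001000 loses its top byte for the memory access, while the writeback value computed from the dirty base still carries the 0x12 tag.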

-[Qemu-devel] [PULL 14/22] target/arm: Compute TB_FLAGS for TBI for user-only
+[PULL 16/39] hw/arm: versal: Add support for SD
-Enables, but does not turn on, TBI for CONFIG_USER_ONLY.
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Add support for SD.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190204132126.3255-4-richard.henderson@linaro.org
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-[PMM: adjusted #ifdeffery to placate clang, which otherwise complains
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-about static functions that are unused in the CONFIG_USER_ONLY build]
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-9-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/internals.h | 21 --------------------
+ include/hw/arm/xlnx-versal.h | 12 ++++++++++++
- target/arm/helper.c    | 45 ++++++++++++++++++++++--------------------
+ hw/arm/xlnx-versal.c         | 31 +++++++++++++++++++++++++++++++
- 2 files changed, 24 insertions(+), 42 deletions(-)
+ 2 files changed, 43 insertions(+)
-diff --git a/target/arm/internals.h b/target/arm/internals.h
+diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
+--- a/include/hw/arm/xlnx-versal.h
-+++ b/target/arm/internals.h
++++ b/include/hw/arm/xlnx-versal.h
-@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
+@@ -XXX,XX +XXX,XX @@
-     bool using64k   : 1;
- } ARMVAParameters;
+ #include "hw/sysbus.h"
+ #include "hw/arm/boot.h"
--#ifdef CONFIG_USER_ONLY
++#include "hw/sd/sdhci.h"
--static inline ARMVAParameters aa64_va_parameters_both(CPUARMState *env,
+ #include "hw/intc/arm_gicv3.h"
--                                                      uint64_t va,
+ #include "hw/char/pl011.h"
--                                                      ARMMMUIdx mmu_idx)
+ #include "hw/dma/xlnx-zdma.h"
--{
+@@ -XXX,XX +XXX,XX @@
--    return (ARMVAParameters) {
+ #define XLNX_VERSAL_NR_UARTS   2
--        /* 48-bit address space */
+ #define XLNX_VERSAL_NR_GEMS    2
--        .tsz = 16,
+ #define XLNX_VERSAL_NR_ADMAS   8
--        /* We can't handle tagged addresses properly in user-only mode */
++#define XLNX_VERSAL_NR_SDS     2
--        .tbi = false,
+ #define XLNX_VERSAL_NR_IRQS    192
--    };
--}
+ typedef struct Versal {
--
+@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
--static inline ARMVAParameters aa64_va_parameters(CPUARMState *env,
+         } iou;
--                                                 uint64_t va,
+     } lpd;
--                                                 ARMMMUIdx mmu_idx, bool data)
--{
++    /* The Platform Management Controller subsystem.  */
--    return aa64_va_parameters_both(env, va, mmu_idx);
++    struct {
--}
++        struct {
--#else
++            SDHCIState sd[XLNX_VERSAL_NR_SDS];
- ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
++        } iou;
-                                         ARMMMUIdx mmu_idx);
++    } pmc;
- ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
++
-                                    ARMMMUIdx mmu_idx, bool data);
+     struct {
--#endif
+         MemoryRegion *mr_ddr;
+         uint32_t psci_conduit;
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
  #define VERSAL_GEM1_IRQ_0          58
  #define VERSAL_GEM1_WAKE_IRQ_0     59
  #define VERSAL_ADMA_IRQ_0          60
 +#define VERSAL_SD0_IRQ_0           126
  /* Architecturally reserved IRQs suitable for virtualization.  */
  #define VERSAL_RSVD_IRQ_FIRST 111
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
  #define MM_FPD_CRF                  0xfd1a0000U
  #define MM_FPD_CRF_SIZE             0x140000
 +#define MM_PMC_SD0                  0xf1040000U
 +#define MM_PMC_SD0_SIZE             0x10000
  #define MM_PMC_CRP                  0xf1260000U
  #define MM_PMC_CRP_SIZE             0x10000
  #endif
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/hw/arm/xlnx-versal.c
-+++ b/target/arm/helper.c
++++ b/hw/arm/xlnx-versal.c
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rbit)(uint32_t x)
+@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
      return revbit32(x);
  }
 -#if defined(CONFIG_USER_ONLY)
 +#ifdef CONFIG_USER_ONLY
  /* These should probably raise undefined insn exceptions.  */
  void HELPER(v7m_msr)(CPUARMState *env, uint32_t reg, uint32_t val)
@@ -XXX,XX +XXX,XX @@ void arm_cpu_do_interrupt(CPUState *cs)
          cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
      }
  }
-+#endif /* !CONFIG_USER_ONLY */
++#define SDHCI_CAPABILITIES  0x280737ec6481 /* Same as on ZynqMP.  */
- /* Return the exception level which controls this address translation regime */
++static void versal_create_sds(Versal *s, qemu_irq *pic)
- static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
++{
-@@ -XXX,XX +XXX,XX @@ static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
++    int i;
      }
  }
 +#ifndef CONFIG_USER_ONLY
 +
- /* Return the SCTLR value which controls this address translation regime */
++    for (i = 0; i < ARRAY_SIZE(s->pmc.iou.sd); i++) {
- static inline uint32_t regime_sctlr(CPUARMState *env, ARMMMUIdx mmu_idx)
++        DeviceState *dev;
- {
++        MemoryRegion *mr;
-@@ -XXX,XX +XXX,XX @@ static inline bool regime_translation_big_endian(CPUARMState *env,
++
-     return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
++        sysbus_init_child_obj(OBJECT(s), "sd[*]",
- }
++                              &s->pmc.iou.sd[i], sizeof(s->pmc.iou.sd[i]),
++                              TYPE_SYSBUS_SDHCI);
-+/* Return the TTBR associated with this translation regime */
++        dev = DEVICE(&s->pmc.iou.sd[i]);
-+static inline uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx,
++
-+                                   int ttbrn)
++        object_property_set_uint(OBJECT(dev),
-+{
++                                 3, "sd-spec-version", &error_fatal);
-+    if (mmu_idx == ARMMMUIdx_S2NS) {
++        object_property_set_uint(OBJECT(dev), SDHCI_CAPABILITIES, "capareg",
-+        return env->cp15.vttbr_el2;
++                                 &error_fatal);
-+    }
++        object_property_set_uint(OBJECT(dev), UHS_I, "uhs", &error_fatal);
-+    if (ttbrn == 0) {
++        qdev_init_nofail(dev);
-+        return env->cp15.ttbr0_el[regime_el(env, mmu_idx)];
++
-+    } else {
++        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
-+        return env->cp15.ttbr1_el[regime_el(env, mmu_idx)];
++        memory_region_add_subregion(&s->mr_ps,
 +                                    MM_PMC_SD0 + i * MM_PMC_SD0_SIZE, mr);
 +
 +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0,
 +                           pic[VERSAL_SD0_IRQ_0 + i * 2]);
 +    }
 +}
 +
-+#endif /* !CONFIG_USER_ONLY */
+ /* This takes the board allocated linear DDR memory and creates aliases
-+
+  * for each split DDR range/aperture on the Versal address map.
- /* Return the TCR controlling this translation regime */
+  */
- static inline TCR *regime_tcr(CPUARMState *env, ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
- {
+     versal_create_uarts(s, pic);
-@@ -XXX,XX +XXX,XX @@ static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
+     versal_create_gems(s, pic);
-     return mmu_idx;
+     versal_create_admas(s, pic);
- }
++    versal_create_sds(s, pic);
+     versal_map_ddr(s);
--/* Return the TTBR associated with this translation regime */
+     versal_unimp(s);
--static inline uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx,
 -                                   int ttbrn)
 -{
 -    if (mmu_idx == ARMMMUIdx_S2NS) {
 -        return env->cp15.vttbr_el2;
 -    }
 -    if (ttbrn == 0) {
 -        return env->cp15.ttbr0_el[regime_el(env, mmu_idx)];
 -    } else {
 -        return env->cp15.ttbr1_el[regime_el(env, mmu_idx)];
 -    }
 -}
 -
  /* Return true if the translation regime is using LPAE format page tables */
  static inline bool regime_using_lpae_format(CPUARMState *env,
                                              ARMMMUIdx mmu_idx)
@@ -XXX,XX +XXX,XX @@ bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
      return regime_using_lpae_format(env, mmu_idx);
  }
 +#ifndef CONFIG_USER_ONLY
  static inline bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
  {
      switch (mmu_idx) {
@@ -XXX,XX +XXX,XX @@ static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
      return (hiattr << 6) | (hihint << 4) | (loattr << 2) | lohint;
  }
 +#endif /* !CONFIG_USER_ONLY */
  ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
                                          ARMMMUIdx mmu_idx)
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
      return ret;
  }
 +#ifndef CONFIG_USER_ONLY
  static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
                                            ARMMMUIdx mmu_idx)
  {
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
          *pc = env->pc;
          flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
 -#ifndef CONFIG_USER_ONLY
 -        /*
 -         * Get control bits for tagged addresses.  Note that the
 -         * translator only uses this for instruction addresses.
 -         */
 +        /* Get control bits for tagged addresses.  */
          {
              ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
              ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
              flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
              flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
          }
 -#endif
          if (cpu_isar_feature(aa64_sve, cpu)) {
              int sve_el = sve_exception_el(env, current_el);
 --
 2.20.1
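
Note on versal_create_sds() above: each SD controller gets its own 64 KiB MMIO window and an interrupt line two apart from its neighbour. The sketch below only restates that arithmetic in standalone C. The macro values are copied from the xlnx-versal.h hunk, and the `model_*` helper names are hypothetical, not part of QEMU.

```c
#include <stdint.h>

/* Values copied from the xlnx-versal.h hunk in this patch. */
#define MM_PMC_SD0          0xf1040000U
#define MM_PMC_SD0_SIZE     0x10000
#define VERSAL_SD0_IRQ_0    126
#define XLNX_VERSAL_NR_SDS  2

/* Hypothetical helper: MMIO base of SD controller i. */
uint64_t model_sd_mmio_base(unsigned i)
{
    return (uint64_t)MM_PMC_SD0 + (uint64_t)i * MM_PMC_SD0_SIZE;
}

/* Hypothetical helper: interrupt line of SD controller i (stride of 2). */
unsigned model_sd_irq(unsigned i)
{
    return VERSAL_SD0_IRQ_0 + i * 2;
}
```

For the two instances this gives 0xf1040000 with IRQ 126 and 0xf1050000 with IRQ 128.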

-[Qemu-devel] [PULL 08/22] target/arm: Set btype for indirect branches
+[PULL 17/39] hw/arm: versal: Add support for the RTC
-From: Richard Henderson <richard.henderson@linaro.org>
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+hw/arm: versal: Add support for the RTC.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190128223118.5255-9-richard.henderson@linaro.org
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-10-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 37 ++++++++++++++++++++++++++++++++++++-
+ include/hw/arm/xlnx-versal.h |  8 ++++++++
- 1 file changed, 36 insertions(+), 1 deletion(-)
+ hw/arm/xlnx-versal.c         | 21 +++++++++++++++++++++
  2 files changed, 29 insertions(+)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/include/hw/arm/xlnx-versal.h
-+++ b/target/arm/translate-a64.c
++++ b/include/hw/arm/xlnx-versal.h
-@@ -XXX,XX +XXX,XX @@ static void reset_btype(DisasContext *s)
+@@ -XXX,XX +XXX,XX @@
  #include "hw/char/pl011.h"
  #include "hw/dma/xlnx-zdma.h"
  #include "hw/net/cadence_gem.h"
 +#include "hw/rtc/xlnx-zynqmp-rtc.h"
  #define TYPE_XLNX_VERSAL "xlnx-versal"
  #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
          struct {
              SDHCIState sd[XLNX_VERSAL_NR_SDS];
          } iou;
 +
 +        XlnxZynqMPRTC rtc;
      } pmc;
      struct {
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
  #define VERSAL_GEM1_IRQ_0          58
  #define VERSAL_GEM1_WAKE_IRQ_0     59
  #define VERSAL_ADMA_IRQ_0          60
 +#define VERSAL_RTC_APB_ERR_IRQ     121
  #define VERSAL_SD0_IRQ_0           126
 +#define VERSAL_RTC_ALARM_IRQ       142
 +#define VERSAL_RTC_SECONDS_IRQ     143
  /* Architecturally reserved IRQs suitable for virtualization.  */
  #define VERSAL_RSVD_IRQ_FIRST 111
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
  #define MM_PMC_SD0_SIZE             0x10000
  #define MM_PMC_CRP                  0xf1260000U
  #define MM_PMC_CRP_SIZE             0x10000
 +#define MM_PMC_RTC                  0xf12a0000
 +#define MM_PMC_RTC_SIZE             0x10000
  #endif
 diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xlnx-versal.c
 +++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
      }
  }
-+static void set_btype(DisasContext *s, int val)
++static void versal_create_rtc(Versal *s, qemu_irq *pic)
 +{
-+    TCGv_i32 tcg_val;
++    SysBusDevice *sbd;
 +    MemoryRegion *mr;
 +
-+    /* BTYPE is a 2-bit field, and 0 should be done with reset_btype.  */
++    sysbus_init_child_obj(OBJECT(s), "rtc", &s->pmc.rtc, sizeof(s->pmc.rtc),
-+    tcg_debug_assert(val >= 1 && val <= 3);
++                          TYPE_XLNX_ZYNQMP_RTC);
 +    sbd = SYS_BUS_DEVICE(&s->pmc.rtc);
 +    qdev_init_nofail(DEVICE(sbd));
 +
-+    tcg_val = tcg_const_i32(val);
++    mr = sysbus_mmio_get_region(sbd, 0);
-+    tcg_gen_st_i32(tcg_val, cpu_env, offsetof(CPUARMState, btype));
++    memory_region_add_subregion(&s->mr_ps, MM_PMC_RTC, mr);
-+    tcg_temp_free_i32(tcg_val);
++
-+    s->btype = -1;
++    /*
 +     * TODO: Connect the ALARM and SECONDS interrupts once our RTC model
 +     * supports them.
 +     */
 +    sysbus_connect_irq(sbd, 1, pic[VERSAL_RTC_APB_ERR_IRQ]);
 +}
 +
- void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
+ /* This takes the board allocated linear DDR memory and creates aliases
-                             fprintf_function cpu_fprintf, int flags)
+  * for each split DDR range/aperture on the Versal address map.
- {
+  */
-@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
- static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
+     versal_create_gems(s, pic);
- {
+     versal_create_admas(s, pic);
-     unsigned int opc, op2, op3, rn, op4;
+     versal_create_sds(s, pic);
-+    unsigned btype_mod = 2;   /* 0: BR, 1: BLR, 2: other */
++    versal_create_rtc(s, pic);
-     TCGv_i64 dst;
+     versal_map_ddr(s);
-     TCGv_i64 modifier;
+     versal_unimp(s);
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
      case 0: /* BR */
      case 1: /* BLR */
      case 2: /* RET */
 +        btype_mod = opc;
          switch (op3) {
          case 0:
              /* BR, BLR, RET */
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
          default:
              goto do_unallocated;
          }
 -
          gen_a64_set_pc(s, dst);
          /* BLR also needs to load return address */
          if (opc == 1) {
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
          if ((op3 & ~1) != 2) {
              goto do_unallocated;
          }
 +        btype_mod = opc & 1;
          if (s->pauth_active) {
              dst = new_tmp_a64(s);
              modifier = cpu_reg_sp(s, op4);
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
          return;
      }
 +    switch (btype_mod) {
 +    case 0: /* BR */
 +        if (dc_isar_feature(aa64_bti, s)) {
 +            /* BR to {x16,x17} or !guard -> 1, else 3.  */
 +            set_btype(s, rn == 16 || rn == 17 || !s->guarded_page ? 1 : 3);
 +        }
 +        break;
 +
 +    case 1: /* BLR */
 +        if (dc_isar_feature(aa64_bti, s)) {
 +            /* BLR sets BTYPE to 2, regardless of source guarded page.  */
 +            set_btype(s, 2);
 +        }
 +        break;
 +
 +    default: /* RET or none of the above.  */
 +        /* BTYPE will be set to 0 by normal end-of-insn processing.  */
 +        break;
 +    }
 +
      s->base.is_jmp = DISAS_JUMP;
  }
 --
 2.20.1
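
Note on the switch over btype_mod above: the BTYPE value chosen after an indirect branch depends only on the branch kind, the source register and whether the branch sits on a guarded page. A minimal standalone sketch of that decision follows. It assumes the BTI feature is present, and the `model_*` names are hypothetical.

```c
#include <stdbool.h>

/* Mirrors the btype_mod values in the patch: 0 = BR, 1 = BLR, 2 = other. */
enum model_branch {
    MODEL_BR = 0,
    MODEL_BLR = 1,
    MODEL_OTHER = 2,
};

/*
 * Hypothetical model of the BTYPE value left behind by an indirect
 * branch, assuming the CPU implements BTI.
 */
int model_btype_after_branch(enum model_branch kind, unsigned rn,
                             bool guarded_page)
{
    switch (kind) {
    case MODEL_BR:
        /* BR via x16/x17, or from a non-guarded page, gives 1; else 3. */
        return (rn == 16 || rn == 17 || !guarded_page) ? 1 : 3;
    case MODEL_BLR:
        /* BLR always gives 2, whatever the source page. */
        return 2;
    default:
        /* RET and everything else: BTYPE ends up as 0. */
        return 0;
    }
}
```

In the patch itself the last case is not written out explicitly: BTYPE is cleared by the normal end-of-insn processing.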

-[Qemu-devel] [PULL 06/22] target/arm: Default handling of BTYPE during translation
+[PULL 18/39] hw/arm: versal-virt: Add support for SD
-From: Richard Henderson <richard.henderson@linaro.org>
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
-The branch target exception for guarded pages has high priority,
+Add support for SD.
 and only 8 instructions are valid for that case.  Perform this
 check before doing any other decode.
-Clear BTYPE after all insns that neither set BTYPE nor exit via
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-exception (DISAS_NORETURN).
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Reviewed-by: Luc Michel <luc.michel@greensocs.com>
-Not yet handled are insns that exit via DISAS_NORETURN for some
+Message-id: 20200427181649.26851-11-edgar.iglesias@gmail.com
 other reason, like direct branches.
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20190128223118.5255-7-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/internals.h     |   6 ++
+ hw/arm/xlnx-versal-virt.c | 46 +++++++++++++++++++++++++++++++++++++++
- target/arm/translate.h     |   9 ++-
+ 1 file changed, 46 insertions(+)
  target/arm/translate-a64.c | 139 +++++++++++++++++++++++++++++++++++++
  3 files changed, 152 insertions(+), 2 deletions(-)
-diff --git a/target/arm/internals.h b/target/arm/internals.h
+diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
+--- a/hw/arm/xlnx-versal-virt.c
-+++ b/target/arm/internals.h
++++ b/hw/arm/xlnx-versal-virt.c
-@@ -XXX,XX +XXX,XX @@ enum arm_exception_class {
+@@ -XXX,XX +XXX,XX @@
-     EC_FPIDTRAP               = 0x08,
+ #include "hw/arm/sysbus-fdt.h"
-     EC_PACTRAP                = 0x09,
+ #include "hw/arm/fdt.h"
-     EC_CP14RRTTRAP            = 0x0c,
+ #include "cpu.h"
-+    EC_BTITRAP                = 0x0d,
++#include "hw/qdev-properties.h"
-     EC_ILLEGALSTATE           = 0x0e,
+ #include "hw/arm/xlnx-versal.h"
-     EC_AA32_SVC               = 0x11,
-     EC_AA32_HVC               = 0x12,
+ #define TYPE_XLNX_VERSAL_VIRT_MACHINE MACHINE_TYPE_NAME("xlnx-versal-virt")
-@@ -XXX,XX +XXX,XX @@ static inline uint32_t syn_pactrap(void)
+@@ -XXX,XX +XXX,XX @@ static void fdt_add_zdma_nodes(VersalVirt *s)
-     return EC_PACTRAP << ARM_EL_EC_SHIFT;
+     }
  }
-+static inline uint32_t syn_btitrap(int btype)
++static void fdt_add_sd_nodes(VersalVirt *s)
 +{
-+    return (EC_BTITRAP << ARM_EL_EC_SHIFT) | btype;
++    const char clocknames[] = "clk_xin\0clk_ahb";
-+}
++    const char compat[] = "arasan,sdhci-8.9a";
 +    int i;
 +
- static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
++    for (i = ARRAY_SIZE(s->soc.pmc.iou.sd) - 1; i >= 0; i--) {
- {
++        uint64_t addr = MM_PMC_SD0 + MM_PMC_SD0_SIZE * i;
-     return (EC_INSNABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
++        char *name = g_strdup_printf("/sdhci@%" PRIx64, addr);
-diff --git a/target/arm/translate.h b/target/arm/translate.h
++
-index XXXXXXX..XXXXXXX 100644
++        qemu_fdt_add_subnode(s->fdt, name);
---- a/target/arm/translate.h
++
-+++ b/target/arm/translate.h
++        qemu_fdt_setprop_cells(s->fdt, name, "clocks",
-@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
++                               s->phandle.clk_25Mhz, s->phandle.clk_25Mhz);
-     bool pauth_active;
++        qemu_fdt_setprop(s->fdt, name, "clock-names",
-     /* True with v8.5-BTI and SCTLR_ELx.BT* set.  */
++                         clocknames, sizeof(clocknames));
-     bool bt;
++        qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
--    /* A copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.  */
++                               GIC_FDT_IRQ_TYPE_SPI, VERSAL_SD0_IRQ_0 + i * 2,
--    uint8_t btype;
++                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
-+    /*
++        qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
-+     * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
++                                     2, addr, 2, MM_PMC_SD0_SIZE);
-+     *  < 0, set by the current instruction.
++        qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
-+     */
++        g_free(name);
 +    int8_t btype;
 +    /* True if this page is guarded.  */
 +    bool guarded_page;
      /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
      int c15_cpar;
      /* TCG op of the current insn_start.  */
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static inline int get_a64_user_mem_index(DisasContext *s)
      return arm_to_core_mmu_idx(useridx);
  }
 +static void reset_btype(DisasContext *s)
 +{
 +    if (s->btype != 0) {
 +        TCGv_i32 zero = tcg_const_i32(0);
 +        tcg_gen_st_i32(zero, cpu_env, offsetof(CPUARMState, btype));
 +        tcg_temp_free_i32(zero);
 +        s->btype = 0;
 +    }
 +}
 +
- void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
+ static void fdt_nop_memory_nodes(void *fdt, Error **errp)
                              fprintf_function cpu_fprintf, int flags)
  {
-@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn)
+     Error *err = NULL;
@@ -XXX,XX +XXX,XX @@ static void create_virtio_regions(VersalVirt *s)
      }
  }
-+/**
++static void sd_plugin_card(SDHCIState *sd, DriveInfo *di)
 + * is_guarded_page:
 + * @env: The cpu environment
 + * @s: The DisasContext
 + *
 + * Return true if the page is guarded.
 + */
 +static bool is_guarded_page(CPUARMState *env, DisasContext *s)
 +{
-+#ifdef CONFIG_USER_ONLY
++    BlockBackend *blk = di ? blk_by_legacy_dinfo(di) : NULL;
-+    return false;  /* FIXME */
++    DeviceState *card;
 +#else
 +    uint64_t addr = s->base.pc_first;
 +    int mmu_idx = arm_to_core_mmu_idx(s->mmu_idx);
 +    unsigned int index = tlb_index(env, mmu_idx, addr);
 +    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
 +
-+    /*
++    card = qdev_create(qdev_get_child_bus(DEVICE(sd), "sd-bus"), TYPE_SD_CARD);
-+     * We test this immediately after reading an insn, which means
++    object_property_add_child(OBJECT(sd), "card[*]", OBJECT(card),
-+     * that any normal page must be in the TLB.  The only exception
++                              &error_fatal);
-+     * would be for executing from flash or device memory, which
++    qdev_prop_set_drive(card, "drive", blk, &error_fatal);
-+     * does not retain the TLB entry.
++    object_property_set_bool(OBJECT(card), true, "realized", &error_fatal);
 +     *
 +     * FIXME: Assume false for those, for now.  We could use
 +     * arm_cpu_get_phys_page_attrs_debug to re-read the page
 +     * table entry even for that case.
 +     */
 +    return (tlb_hit(entry->addr_code, addr) &&
 +            env->iotlb[mmu_idx][index].attrs.target_tlb_bit0);
 +#endif
 +}
 +
-+/**
+ static void versal_virt_init(MachineState *machine)
 + * btype_destination_ok:
 + * @insn: The instruction at the branch destination
 + * @bt: SCTLR_ELx.BT
 + * @btype: PSTATE.BTYPE, and is non-zero
 + *
 + * On a guarded page, there are a limited number of insns
 + * that may be present at the branch target:
 + *   - branch target identifiers,
 + *   - paciasp, pacibsp,
 + *   - BRK insn
 + *   - HLT insn
 + * Anything else causes a Branch Target Exception.
 + *
 + * Return true if the branch is compatible, false to raise BTITRAP.
 + */
 +static bool btype_destination_ok(uint32_t insn, bool bt, int btype)
 +{
 +    if ((insn & 0xfffff01fu) == 0xd503201fu) {
 +        /* HINT space */
 +        switch (extract32(insn, 5, 7)) {
 +        case 0b011001: /* PACIASP */
 +        case 0b011011: /* PACIBSP */
 +            /*
 +             * If SCTLR_ELx.BT, then PACI*SP are not compatible
 +             * with btype == 3.  Otherwise all btype are ok.
 +             */
 +            return !bt || btype != 3;
 +        case 0b100000: /* BTI */
 +            /* Not compatible with any btype.  */
 +            return false;
 +        case 0b100010: /* BTI c */
 +            /* Not compatible with btype == 3 */
 +            return btype != 3;
 +        case 0b100100: /* BTI j */
 +            /* Not compatible with btype == 2 */
 +            return btype != 2;
 +        case 0b100110: /* BTI jc */
 +            /* Compatible with any btype.  */
 +            return true;
 +        }
 +    } else {
 +        switch (insn & 0xffe0001fu) {
 +        case 0xd4200000u: /* BRK */
 +        case 0xd4400000u: /* HLT */
 +            /* Give priority to the breakpoint exception.  */
 +            return true;
 +        }
 +    }
 +    return false;
 +}
 +
  /* C3.1 A64 instruction index by encoding */
  static void disas_a64_insn(CPUARMState *env, DisasContext *s)
  {
-@@ -XXX,XX +XXX,XX @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
+     VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(machine);
+     int psci_conduit = QEMU_PSCI_CONDUIT_DISABLED;
-     s->fp_access_checked = false;
++    int i;
-+    if (dc_isar_feature(aa64_bti, s)) {
+     /*
-+        if (s->base.num_insns == 1) {
+      * If the user provides an Operating System to be loaded, we expect them
-+            /*
+@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
-+             * At the first insn of the TB, compute s->guarded_page.
+     fdt_add_gic_nodes(s);
-+             * We delayed computing this until successfully reading
+     fdt_add_timer_nodes(s);
-+             * the first insn of the TB, above.  This (mostly) ensures
+     fdt_add_zdma_nodes(s);
-+             * that the softmmu tlb entry has been populated, and the
++    fdt_add_sd_nodes(s);
-+             * page table GP bit is available.
+     fdt_add_cpu_nodes(s, psci_conduit);
-+             *
+     fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
-+             * Note that we need to compute this even if btype == 0,
+     fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
-+             * because this value is used for BR instructions later
+@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
-+             * where ENV is not available.
+     memory_region_add_subregion_overlap(get_system_memory(),
-+             */
+                                         0, &s->soc.fpd.apu.mr, 0);
-+            s->guarded_page = is_guarded_page(env, s);
-+
++    /* Plugin SD cards.  */
-+            /* First insn can have btype set to non-zero.  */
++    for (i = 0; i < ARRAY_SIZE(s->soc.pmc.iou.sd); i++) {
-+            tcg_debug_assert(s->btype >= 0);
++        sd_plugin_card(&s->soc.pmc.iou.sd[i], drive_get_next(IF_SD));
 +
 +            /*
 +             * Note that the Branch Target Exception has fairly high
 +             * priority -- below debugging exceptions but above most
 +             * everything else.  This allows us to handle this now
 +             * instead of waiting until the insn is otherwise decoded.
 +             */
 +            if (s->btype != 0
 +                && s->guarded_page
 +                && !btype_destination_ok(insn, s->bt, s->btype)) {
 +                gen_exception_insn(s, 4, EXCP_UDEF, syn_btitrap(s->btype),
 +                                   default_exception_el(s));
 +                return;
 +            }
 +        } else {
 +            /* Not the first insn: btype must be 0.  */
 +            tcg_debug_assert(s->btype == 0);
 +        }
 +    }
 +
-     switch (extract32(insn, 25, 4)) {
+     s->binfo.ram_size = machine->ram_size;
-     case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
+     s->binfo.loader_start = 0x0;
-         unallocated_encoding(s);
+     s->binfo.get_dtb = versal_virt_get_dtb;
@@ -XXX,XX +XXX,XX @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
      /* if we allocated any temporaries, free them here */
      free_tmp_a64(s);
 +
 +    /*
 +     * After execution of most insns, btype is reset to 0.
 +     * Note that we set btype == -1 when the insn sets btype.
 +     */
 +    if (s->btype > 0 && s->base.is_jmp != DISAS_NORETURN) {
 +        reset_btype(s);
 +    }
  }
  static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
 --
 2.20.1

-[Qemu-devel] [PULL 07/22] target/arm: Reset btype for direct branches
+[PULL 19/39] hw/arm: versal-virt: Add support for the RTC
-From: Richard Henderson <richard.henderson@linaro.org>
+From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
-This is all of the non-exception cases of DISAS_NORETURN.
+Add support for the RTC.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-Message-id: 20190128223118.5255-8-richard.henderson@linaro.org
+Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-12-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 6 ++++++
+ hw/arm/xlnx-versal-virt.c | 22 ++++++++++++++++++++++
- 1 file changed, 6 insertions(+)
+ 1 file changed, 22 insertions(+)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/hw/arm/xlnx-versal-virt.c
-+++ b/target/arm/translate-a64.c
++++ b/hw/arm/xlnx-versal-virt.c
-@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void fdt_add_sd_nodes(VersalVirt *s)
      }
-     /* B Branch / BL Branch with link */
-+    reset_btype(s);
-     gen_goto_tb(s, 0, addr);
  }
-@@ -XXX,XX +XXX,XX @@ static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
++static void fdt_add_rtc_node(VersalVirt *s)
-     tcg_cmp = read_cpu_reg(s, rt, sf);
++{
-     label_match = gen_new_label();
++    const char compat[] = "xlnx,zynqmp-rtc";
++    const char interrupt_names[] = "alarm\0sec";
-+    reset_btype(s);
++    char *name = g_strdup_printf("/rtc@%x", MM_PMC_RTC);
      tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
                          tcg_cmp, 0, label_match);
@@ -XXX,XX +XXX,XX @@ static void disas_test_b_imm(DisasContext *s, uint32_t insn)
      tcg_cmp = tcg_temp_new_i64();
      tcg_gen_andi_i64(tcg_cmp, cpu_reg(s, rt), (1ULL << bit_pos));
      label_match = gen_new_label();
 +
-+    reset_btype(s);
++    qemu_fdt_add_subnode(s->fdt, name);
-     tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
++
-                         tcg_cmp, 0, label_match);
++    qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
-     tcg_temp_free_i64(tcg_cmp);
++                           GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_ALARM_IRQ,
-@@ -XXX,XX +XXX,XX @@ static void disas_cond_b_imm(DisasContext *s, uint32_t insn)
++                           GIC_FDT_IRQ_FLAGS_LEVEL_HI,
-     addr = s->pc + sextract32(insn, 5, 19) * 4 - 4;
++                           GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_SECONDS_IRQ,
-     cond = extract32(insn, 0, 4);
++                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
++    qemu_fdt_setprop(s->fdt, name, "interrupt-names",
-+    reset_btype(s);
++                     interrupt_names, sizeof(interrupt_names));
-     if (cond < 0x0e) {
++    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
-         /* genuinely conditional branches */
++                                 2, MM_PMC_RTC, 2, MM_PMC_RTC_SIZE);
-         TCGLabel *label_match = gen_new_label();
++    qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
-@@ -XXX,XX +XXX,XX @@ static void handle_sync(DisasContext *s, uint32_t insn,
++    g_free(name);
-          * a self-modified code correctly and also to take
++}
-          * any pending interrupts immediately.
++
-          */
+ static void fdt_nop_memory_nodes(void *fdt, Error **errp)
-+        reset_btype(s);
+ {
-         gen_goto_tb(s, 0, s->pc);
+     Error *err = NULL;
-         return;
+@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
-     default:
+     fdt_add_timer_nodes(s);
      fdt_add_zdma_nodes(s);
      fdt_add_sd_nodes(s);
 +    fdt_add_rtc_node(s);
      fdt_add_cpu_nodes(s, psci_conduit);
      fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
      fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
 --
 2.20.1
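
A note on the "interrupt-names" property set in fdt_add_rtc_node() above: a device tree string list is stored as consecutive NUL-terminated strings, which is why the patch embeds "\0" in the literal and passes sizeof(interrupt_names) so that the final terminator is included. The sketch below is a standalone illustration only; count_fdt_strings() is a hypothetical helper, not a QEMU or libfdt function.

```c
#include <stddef.h>
#include <string.h>

/* Same literal as in the patch: two strings, "alarm" and "sec". */
static const char interrupt_names[] = "alarm\0sec";

/*
 * Hypothetical helper (not part of QEMU or libfdt): count the
 * NUL-terminated strings packed into a buffer of 'len' bytes.
 * Returns -1 if the last string is not terminated within 'len'.
 */
static int count_fdt_strings(const char *buf, size_t len)
{
    int count = 0;
    size_t pos = 0;

    while (pos < len) {
        const char *end = memchr(buf + pos, '\0', len - pos);

        if (!end) {
            return -1;
        }
        count++;
        pos = (size_t)(end - buf) + 1;
    }
    return count;
}
```

With this literal, sizeof(interrupt_names) is 10 bytes (5 + 1 + 3 + 1) and the helper reports two strings. Passing strlen(interrupt_names) instead would stop at the first NUL and drop "sec".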

-[Qemu-devel] [PULL 21/22] hw/arm/boot: Support DTB autoload for firmware-only boots
+[PULL 20/39] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
-The arm_boot_info struct has a skip_dtb_autoload flag: if this is
+Somewhere along the line we accidentally added a duplicate
-set to true by the board code then arm_load_kernel() will not
+"using D16-D31 when they don't exist" check to do_vfm_dp()
-load the DTB itself, but will leave this for the board code to
+(probably an artifact of a patch series rebase). Remove it.
 do itself later. However, the check for this is done in a
 code path which is only executed for the case where we load
 a kernel image file. If we're taking the "boot via firmware"
 code path then the flag isn't honoured and the DTB is never
 loaded.
 We didn't notice this because the only real user of "boot
 via firmware" that cares about the DTB is the virt board
 (for UEFI boot), and that always wants skip_dtb_autoload
 anyway. But the SBSA reference board model we're planning to
 add will want the flag to behave correctly.
 Now we've refactored the arm_load_kernel() function, the
 fix is simple: drop the early 'return' so we fall into
 the same "load the DTB" code the boot-direct-kernel path uses.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20190131112240.8395-6-peter.maydell@linaro.org
+Message-id: 20200430181003.21682-2-peter.maydell@linaro.org
 ---
- hw/arm/boot.c | 1 -
+ target/arm/translate-vfp.inc.c | 6 ------
- 1 file changed, 1 deletion(-)
+ 1 file changed, 6 deletions(-)
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/target/arm/translate-vfp.inc.c
-+++ b/hw/arm/boot.c
++++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
-     /* Load the kernel.  */
+         return false;
-     if (!info->kernel_filename || info->firmware_loaded) {
+     }
-         arm_setup_firmware_boot(cpu, info);
--        return;
+-    /* UNDEF accesses to D16-D31 if they don't exist. */
-     } else {
+-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-         arm_setup_direct_kernel_boot(cpu, info);
+-        ((a->vd | a->vn | a->vm) & 0x10)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
 --
 2.20.1

-New patch
+[PULL 21/39] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
+We were accidentally permitting decode of Thumb Neon insns even if
+the CPU didn't have the FEATURE_NEON bit set, because the feature
+check was being done before the call to disas_neon_data_insn() and
+disas_neon_ls_insn() in the Arm decoder but was omitted from the
+Thumb decoder.  Push the feature bit check down into the called
+functions so it is done for both Arm and Thumb encodings.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20200430181003.21682-3-peter.maydell@linaro.org
+---
+ target/arm/translate.c | 16 ++++++++--------
+ 1 file changed, 8 insertions(+), 8 deletions(-)
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
+     TCGv_i32 tmp2;
+     TCGv_i64 tmp64;
++    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
++        return 1;
++    }
++
+     /* FIXME: this access check should not take precedence over UNDEF
+      * for invalid encodings; we will generate incorrect syndrome information
+      * for attempts to execute invalid vfp/neon encodings with FP disabled.
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     TCGv_ptr ptr1, ptr2, ptr3;
+     TCGv_i64 tmp64;
++    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
++        return 1;
++    }
++
+     /* FIXME: this access check should not take precedence over UNDEF
+      * for invalid encodings; we will generate incorrect syndrome information
+      * for attempts to execute invalid vfp/neon encodings with FP disabled.
+@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
+         if (((insn >> 25) & 7) == 1) {
+             /* NEON Data processing.  */
+-            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+-                goto illegal_op;
+-            }
+-
+             if (disas_neon_data_insn(s, insn)) {
+                 goto illegal_op;
+             }
+@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
+         }
+         if ((insn & 0x0f100000) == 0x04000000) {
+             /* NEON load/store.  */
+-            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+-                goto illegal_op;
+-            }
+-
+             if (disas_neon_ls_insn(s, insn)) {
+                 goto illegal_op;
+             }
+--
+2.20.1

-[Qemu-devel] [PULL 17/22] hw/arm/boot: Fix block comment style in arm_load_kernel()
+[PULL 22/39] target/arm: Add stubs for AArch32 Neon decodetree
-Fix the block comment style in arm_load_kernel() to QEMU's
+Add the infrastructure for building and invoking a decodetree decoder
-current style preferences. This will allow us to do some
+for the AArch32 Neon encodings.  At the moment the new decoder covers
-refactoring of this function without checkpatch complaining
+nothing, so we always fall back to the existing hand-written decode.
-about the code-motion patches.
 We follow the same pattern we did for the VFP decodetree conversion
 (commit 78e138bc1f672c145ef6ace74617d and following): code that deals
 with Neon will be moving gradually out to translate-neon.inc.c,
 which we #include into translate.c.
 In order to share the decode files between A32 and T32, we
 split Neon into 3 parts:
  * data-processing
  * load-store
  * 'shared' encodings
 The first two groups of instructions have similar but not identical
 A32 and T32 encodings, so we need to manually transform the T32
 encoding into the A32 one before calling the decoder; the third group
 covers the Neon instructions which are identical in A32 and T32.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+Message-id: 20200430181003.21682-4-peter.maydell@linaro.org
 Message-id: 20190131112240.8395-2-peter.maydell@linaro.org
 ---
- hw/arm/boot.c | 30 ++++++++++++++++++++----------
+ target/arm/neon-dp.decode       | 29 ++++++++++++++++++++++++++
- 1 file changed, 20 insertions(+), 10 deletions(-)
+ target/arm/neon-ls.decode       | 29 ++++++++++++++++++++++++++
+ target/arm/neon-shared.decode   | 27 +++++++++++++++++++++++++
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+ target/arm/translate-neon.inc.c | 32 +++++++++++++++++++++++++++++
  target/arm/translate.c          | 36 +++++++++++++++++++++++++++++++--
  target/arm/Makefile.objs        | 18 +++++++++++++++++
  6 files changed, 169 insertions(+), 2 deletions(-)
  create mode 100644 target/arm/neon-dp.decode
  create mode 100644 target/arm/neon-ls.decode
  create mode 100644 target/arm/neon-shared.decode
  create mode 100644 target/arm/translate-neon.inc.c
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 +# AArch32 Neon data-processing instruction descriptions
 +#
 +#  Copyright (c) 2020 Linaro, Ltd
 +#
 +# This library is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU Lesser General Public
 +# License as published by the Free Software Foundation; either
 +# version 2 of the License, or (at your option) any later version.
 +#
 +# This library is distributed in the hope that it will be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 +# Lesser General Public License for more details.
 +#
 +# You should have received a copy of the GNU Lesser General Public
 +# License along with this library; if not, see <http://www.gnu.org/licenses/>.
 +
 +#
 +# This file is processed by scripts/decodetree.py
 +#
 +
 +# Encodings for Neon data processing instructions where the T32 encoding
 +# is a simple transformation of the A32 encoding.
 +# More specifically, this file covers instructions where the A32 encoding is
 +#   0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 +# and the T32 encoding is
 +#   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 +# This file works on the A32 encoding only; calling code for T32 has to
 +# transform the insn into the A32 version first.
 diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@
 +# AArch32 Neon load/store instruction descriptions
 +#
 +#  Copyright (c) 2020 Linaro, Ltd
 +#
 +# This library is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU Lesser General Public
 +# License as published by the Free Software Foundation; either
 +# version 2 of the License, or (at your option) any later version.
 +#
 +# This library is distributed in the hope that it will be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 +# Lesser General Public License for more details.
 +#
 +# You should have received a copy of the GNU Lesser General Public
 +# License along with this library; if not, see <http://www.gnu.org/licenses/>.
 +
 +#
 +# This file is processed by scripts/decodetree.py
 +#
 +
 +# Encodings for Neon load/store instructions where the T32 encoding
 +# is a simple transformation of the A32 encoding.
 +# More specifically, this file covers instructions where the A32 encoding is
 +#   0b1111_0100_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
 +# and the T32 encoding is
 +#   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
 +# This file works on the A32 encoding only; calling code for T32 has to
 +# transform the insn into the A32 version first.
 diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@
 +# AArch32 Neon instruction descriptions
 +#
 +#  Copyright (c) 2020 Linaro, Ltd
 +#
 +# This library is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU Lesser General Public
 +# License as published by the Free Software Foundation; either
 +# version 2 of the License, or (at your option) any later version.
 +#
 +# This library is distributed in the hope that it will be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 +# Lesser General Public License for more details.
 +#
 +# You should have received a copy of the GNU Lesser General Public
 +# License along with this library; if not, see <http://www.gnu.org/licenses/>.
 +
 +#
 +# This file is processed by scripts/decodetree.py
 +#
 +
 +# Encodings for Neon instructions whose encoding is the same for
 +# both A32 and T32.
 +
 +# More specifically, this covers:
 +# 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
 +# 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + *  ARM translation: AArch32 Neon instructions
 + *
 + *  Copyright (c) 2003 Fabrice Bellard
 + *  Copyright (c) 2005-2007 CodeSourcery
 + *  Copyright (c) 2007 OpenedHand, Ltd.
 + *  Copyright (c) 2020 Linaro, Ltd.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +/*
 + * This file is intended to be included from translate.c; it uses
 + * some macros and definitions provided by that file.
 + * It might be possible to convert it to a standalone .c file eventually.
 + */
 +
 +/* Include the generated Neon decoder */
 +#include "decode-neon-dp.inc.c"
 +#include "decode-neon-ls.inc.c"
 +#include "decode-neon-shared.inc.c"
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/target/arm/translate.c
-+++ b/hw/arm/boot.c
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
-     static const ARMInsnFixup *primary_loader;
-     AddressSpace *as = arm_boot_address_space(cpu, info);
+ #define ARM_CP_RW_BIT   (1 << 20)
--    /* CPU objects (unlike devices) are not automatically reset on system
+-/* Include the VFP decoder */
-+    /*
++/* Include the VFP and Neon decoders */
-+     * CPU objects (unlike devices) are not automatically reset on system
+ #include "translate-vfp.inc.c"
-      * reset, so we must always register a handler to do so. If we're
++#include "translate-neon.inc.c"
-      * actually loading a kernel, the handler is also responsible for
-      * arranging that we start it correctly.
+ static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
-@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+ {
-         qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
+@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
          /* Unconditional instructions.  */
          /* TODO: Perhaps merge these into one decodetree output file.  */
          if (disas_a32_uncond(s, insn) ||
 -            disas_vfp_uncond(s, insn)) {
 +            disas_vfp_uncond(s, insn) ||
 +            disas_neon_dp(s, insn) ||
 +            disas_neon_ls(s, insn) ||
 +            disas_neon_shared(s, insn)) {
              return;
          }
          /* fall back to legacy decoder */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
          ARCH(6T2);
      }
--    /* The board code is not supposed to set secure_board_setup unless
++    if ((insn & 0xef000000) == 0xef000000) {
-+    /*
++        /*
-+     * The board code is not supposed to set secure_board_setup unless
++         * T32 encodings 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
-      * running its code in secure mode is actually possible, and KVM
++         * transform into
-      * doesn't support secure.
++         * A32 encodings 0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 +         */
 +        uint32_t a32_insn = (insn & 0xe2ffffff) |
 +            ((insn & (1 << 28)) >> 4) | (1 << 28);
 +
 +        if (disas_neon_dp(s, a32_insn)) {
 +            return;
 +        }
 +    }
 +
 +    if ((insn & 0xff100000) == 0xf9000000) {
 +        /*
 +         * T32 encodings 0b1111_1001_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
 +         * transform into
 +         * A32 encodings 0b1111_0100_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
 +         */
 +        uint32_t a32_insn = (insn & 0x00ffffff) | 0xf4000000;
 +
 +        if (disas_neon_ls(s, a32_insn)) {
 +            return;
 +        }
 +    }
 +
      /*
       * TODO: Perhaps merge these into one decodetree output file.
       * Note disas_vfp is written for a32 with cond field in the
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
       */
-@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+     if (disas_t32(s, insn) ||
-     if (!info->kernel_filename || info->firmware_loaded) {
+         disas_vfp_uncond(s, insn) ||
++        disas_neon_shared(s, insn) ||
-         if (have_dtb(info)) {
+         ((insn >> 28) == 0xe && disas_vfp(s, insn))) {
 -            /* If we have a device tree blob, but no kernel to supply it to (or
 +            /*
 +             * If we have a device tree blob, but no kernel to supply it to (or
               * the kernel is supposed to be loaded by the bootloader), copy the
               * DTB to the base of RAM for the bootloader to pick up.
               */
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
              try_decompressing_kernel = arm_feature(&cpu->env,
                                                     ARM_FEATURE_AARCH64);
 -            /* Expose the kernel, the command line, and the initrd in fw_cfg.
 +            /*
 +             * Expose the kernel, the command line, and the initrd in fw_cfg.
               * We don't process them here at all, it's all left to the
               * firmware.
               */
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
              }
          }
 -        /* We will start from address 0 (typically a boot ROM image) in the
 +        /*
 +         * We will start from address 0 (typically a boot ROM image) in the
           * same way as hardware.
           */
          return;
-@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+     }
-     if (info->nb_cpus == 0)
+diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
-         info->nb_cpus = 1;
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/Makefile.objs
--    /* We want to put the initrd far enough into RAM that when the
++++ b/target/arm/Makefile.objs
-+    /*
+@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
-+     * We want to put the initrd far enough into RAM that when the
+       $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
-      * kernel is uncompressed it will not clobber the initrd. However
+       "GEN", $(TARGET_DIR)$@)
-      * on boards without much RAM we must ensure that we still leave
-      * enough room for a decent sized initrd, and on boards with large
++target/arm/decode-neon-shared.inc.c: $(SRC_PATH)/target/arm/neon-shared.decode $(DECODETREE)
-@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
++    $(call quiet-command,\
-     kernel_size = arm_load_elf(info, &elf_entry, &elf_low_addr,
++      $(PYTHON) $(DECODETREE) --static-decode disas_neon_shared -o $@ $<,\
-                                &elf_high_addr, elf_machine, as);
++      "GEN", $(TARGET_DIR)$@)
-     if (kernel_size > 0 && have_dtb(info)) {
++
--        /* If there is still some room left at the base of RAM, try and put
++target/arm/decode-neon-dp.inc.c: $(SRC_PATH)/target/arm/neon-dp.decode $(DECODETREE)
-+        /*
++    $(call quiet-command,\
-+         * If there is still some room left at the base of RAM, try and put
++      $(PYTHON) $(DECODETREE) --static-decode disas_neon_dp -o $@ $<,\
-          * the DTB there like we do for images loaded with -bios or -pflash.
++      "GEN", $(TARGET_DIR)$@)
-          */
++
-         if (elf_low_addr > info->loader_start
++target/arm/decode-neon-ls.inc.c: $(SRC_PATH)/target/arm/neon-ls.decode $(DECODETREE)
-             || elf_high_addr < info->loader_start) {
++    $(call quiet-command,\
--            /* Set elf_low_addr as address limit for arm_load_dtb if it may be
++      $(PYTHON) $(DECODETREE) --static-decode disas_neon_ls -o $@ $<,\
-+            /*
++      "GEN", $(TARGET_DIR)$@)
-+             * Set elf_low_addr as address limit for arm_load_dtb if it may be
++
-              * pointing into RAM, otherwise pass '0' (no limit)
+ target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
-              */
+     $(call quiet-command,\
-             if (elf_low_addr < info->loader_start) {
+       $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
-@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+@@ -XXX,XX +XXX,XX @@ target/arm/decode-t16.inc.c: $(SRC_PATH)/target/arm/t16.decode $(DECODETREE)
-         fixupcontext[FIXUP_BOARDID] = info->board_id;
+       "GEN", $(TARGET_DIR)$@)
-         fixupcontext[FIXUP_BOARD_SETUP] = info->board_setup_addr;
+ target/arm/translate-sve.o: target/arm/decode-sve.inc.c
--        /* for device tree boot, we pass the DTB directly in r2. Otherwise
++target/arm/translate.o: target/arm/decode-neon-shared.inc.c
-+        /*
++target/arm/translate.o: target/arm/decode-neon-dp.inc.c
-+         * for device tree boot, we pass the DTB directly in r2. Otherwise
++target/arm/translate.o: target/arm/decode-neon-ls.inc.c
-          * we point to the kernel args.
+ target/arm/translate.o: target/arm/decode-vfp.inc.c
-          */
+ target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
-         if (have_dtb(info)) {
+ target/arm/translate.o: target/arm/decode-a32.inc.c
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
              info->write_board_setup(cpu, info);
          }
 -        /* Notify devices which need to fake up firmware initialization
 +        /*
 +         * Notify devices which need to fake up firmware initialization
           * that we're doing a direct kernel boot.
           */
          object_child_foreach_recursive(object_get_root(),
 --
 2.20.1
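
The T32-to-A32 rewriting that this patch adds to disas_thumb2_insn() can be checked in isolation. The sketch below lifts the two mask expressions from the hunks above into standalone functions; the function names are hypothetical and chosen only for this illustration, and the asserts restate the match conditions the patch tests before transforming.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Neon data-processing:
 *   T32 0b111p_1111_qqqq_... -> A32 0b1111_001p_qqqq_...
 * Hypothetical wrapper around the expression used in the patch.
 */
static uint32_t t32_neon_dp_to_a32(uint32_t insn)
{
    /* Same precondition the patch checks before transforming. */
    assert((insn & 0xef000000u) == 0xef000000u);

    /*
     * Keep bits [31:29], bit 25 and the low 24 bits, move the 'p'
     * bit from bit 28 down to bit 24, then force bit 28 to 1.
     */
    return (insn & 0xe2ffffffu) | ((insn & (1u << 28)) >> 4) | (1u << 28);
}

/*
 * Neon load/store:
 *   T32 0b1111_1001_ppp0_... -> A32 0b1111_0100_ppp0_...
 */
static uint32_t t32_neon_ls_to_a32(uint32_t insn)
{
    assert((insn & 0xff100000u) == 0xf9000000u);

    /* Only the top byte differs between the two encodings. */
    return (insn & 0x00ffffffu) | 0xf4000000u;
}
```

For example, a T32 word of 0xff123456 (p = 1) maps to 0xf3123456, and 0xef123456 (p = 0) maps to 0xf2123456; the low 24 bits pass through unchanged in both cases.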

-New patch
+[PULL 23/39] target/arm: Convert VCMLA (vector) to decodetree
+Convert the VCMLA (vector) insns in the 3same extension group to
+decodetree.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-5-peter.maydell@linaro.org
+---
+ target/arm/neon-shared.decode   | 11 ++++++++++
+ target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
+ target/arm/translate.c          | 11 +---------
+ 3 files changed, 49 insertions(+), 10 deletions(-)
+diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-shared.decode
++++ b/target/arm/neon-shared.decode
+@@ -XXX,XX +XXX,XX @@
+ # More specifically, this covers:
+ # 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+ # 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
++
++# VFP/Neon register fields; same as vfp.decode
++%vm_dp  5:1 0:4
++%vm_sp  0:4 5:1
++%vn_dp  7:1 16:4
++%vn_sp  16:4 7:1
++%vd_dp  22:1 12:4
++%vd_sp  12:4 22:1
++
++VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
++               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@
+ #include "decode-neon-dp.inc.c"
+ #include "decode-neon-ls.inc.c"
+ #include "decode-neon-shared.inc.c"
++
++static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
++{
++    int opr_sz;
++    TCGv_ptr fpst;
++    gen_helper_gvec_3_ptr *fn_gvec_ptr;
++
++    if (!dc_isar_feature(aa32_vcma, s)
++        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
++        return false;
++    }
++
++    /* UNDEF accesses to D16-D31 if they don't exist. */
++    if (!dc_isar_feature(aa32_simd_r32, s) &&
++        ((a->vd | a->vn | a->vm) & 0x10)) {
++        return false;
++    }
++
++    if ((a->vn | a->vm | a->vd) & a->q) {
++        return false;
++    }
++
++    if (!vfp_access_check(s)) {
++        return true;
++    }
++
++    opr_sz = (1 + a->q) * 8;
++    fpst = get_fpstatus_ptr(1);
++    fn_gvec_ptr = a->size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
++    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
++                       vfp_reg_offset(1, a->vn),
++                       vfp_reg_offset(1, a->vm),
++                       fpst, opr_sz, opr_sz, a->rot,
++                       fn_gvec_ptr);
++    tcg_temp_free_ptr(fpst);
++    return true;
++}
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+     bool is_long = false, q = extract32(insn, 6, 1);
+     bool ptr_is_env = false;
+-    if ((insn & 0xfe200f10) == 0xfc200800) {
+-        /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
+-        int size = extract32(insn, 20, 1);
+-        data = extract32(insn, 23, 2); /* rot */
+-        if (!dc_isar_feature(aa32_vcma, s)
+-            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
+-            return 1;
+-        }
+-        fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
+-    } else if ((insn & 0xfea00f10) == 0xfc800800) {
++    if ((insn & 0xfea00f10) == 0xfc800800) {
+         /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
+         int size = extract32(insn, 20, 1);
+         data = extract32(insn, 24, 1); /* rot */
+--
+2.20.1
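
The register field definitions added to neon-shared.decode in this patch (for example "%vm_dp 5:1 0:4") concatenate sub-fields, with the first one listed being the most significant. A minimal sketch of what that means in plain C follows; it is not the code decodetree.py generates, field32() is a local stand-in written for this example, and the other function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in: extract 'len' bits (1..32) starting at bit 'pos'. */
static uint32_t field32(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & (0xffffffffu >> (32 - len));
}

/* %vm_dp  5:1 0:4   -> M:Vm, a D register number 0..31 */
static uint32_t vm_dp(uint32_t insn)
{
    return (field32(insn, 5, 1) << 4) | field32(insn, 0, 4);
}

/* %vm_sp  0:4 5:1   -> Vm:M, an S register number 0..31 */
static uint32_t vm_sp(uint32_t insn)
{
    return (field32(insn, 0, 4) << 1) | field32(insn, 5, 1);
}

/* %vn_dp  7:1 16:4  -> N:Vn */
static uint32_t vn_dp(uint32_t insn)
{
    return (field32(insn, 7, 1) << 4) | field32(insn, 16, 4);
}

/* %vd_dp  22:1 12:4 -> D:Vd */
static uint32_t vd_dp(uint32_t insn)
{
    return (field32(insn, 22, 1) << 4) | field32(insn, 12, 4);
}

/*
 * Mirrors the "(a->vd | a->vn | a->vm) & 0x10" test in trans_VCMLA():
 * true if any operand names D16-D31.
 */
static bool uses_high_dregs(uint32_t vd, uint32_t vn, uint32_t vm)
{
    return ((vd | vn | vm) & 0x10) != 0;
}
```

Because the extra bit lands in bit 4 of the D register number, the single mask 0x10 is enough for trans_VCMLA() to detect any use of D16-D31 across all three operands.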

-New patch
+[PULL 24/39] target/arm: Convert VCADD (vector) to decodetree
+Convert the VCADD (vector) insns to decodetree.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-6-peter.maydell@linaro.org
+---
+ target/arm/neon-shared.decode   |  3 +++
+ target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
+ target/arm/translate.c          | 11 +---------
+files changed, 41 insertions(+), 10 deletions(-)
+diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-shared.decode
++++ b/target/arm/neon-shared.decode
+@@ -XXX,XX +XXX,XX @@
+ VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
++
++VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
++               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
+     tcg_temp_free_ptr(fpst);
+     return true;
+ }
++
++static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
++{
++    int opr_sz;
++    TCGv_ptr fpst;
++    gen_helper_gvec_3_ptr *fn_gvec_ptr;
++
++    if (!dc_isar_feature(aa32_vcma, s)
++        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
++        return false;
++    }
++
++    /* UNDEF accesses to D16-D31 if they don't exist. */
++    if (!dc_isar_feature(aa32_simd_r32, s) &&
++        ((a->vd | a->vn | a->vm) & 0x10)) {
++        return false;
++    }
++
++    if ((a->vn | a->vm | a->vd) & a->q) {
++        return false;
++    }
++
++    if (!vfp_access_check(s)) {
++        return true;
++    }
++
++    opr_sz = (1 + a->q) * 8;
++    fpst = get_fpstatus_ptr(1);
++    fn_gvec_ptr = a->size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
++    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
++                       vfp_reg_offset(1, a->vn),
++                       vfp_reg_offset(1, a->vm),
++                       fpst, opr_sz, opr_sz, a->rot,
++                       fn_gvec_ptr);
++    tcg_temp_free_ptr(fpst);
++    return true;
++}
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+     bool is_long = false, q = extract32(insn, 6, 1);
+     bool ptr_is_env = false;
+-    if ((insn & 0xfea00f10) == 0xfc800800) {
+-        /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
+-        int size = extract32(insn, 20, 1);
+-        data = extract32(insn, 24, 1); /* rot */
+-        if (!dc_isar_feature(aa32_vcma, s)
+-            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
+-            return 1;
+-        }
+-        fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
+-    } else if ((insn & 0xfeb00f00) == 0xfc200d00) {
++    if ((insn & 0xfeb00f00) == 0xfc200d00) {
+         /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
+         bool u = extract32(insn, 4, 1);
+         if (!dc_isar_feature(aa32_dp, s)) {
+--
+.20.1
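
The operand checks that trans_VCMLA() and trans_VCADD() perform before emitting code can be read in isolation. The following is a minimal sketch, not QEMU code: neon_3same_regs_ok(), neon_opr_size() and example_checks() are hypothetical names, and the has_d32 parameter stands in for dc_isar_feature(aa32_simd_r32, s).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of QEMU): mirrors the register checks in
 * trans_VCADD(). vd/vn/vm are 5-bit D-register numbers, q is 0 or 1.
 */
static bool neon_3same_regs_ok(int vd, int vn, int vm, int q, bool has_d32)
{
    /* D16-D31 are only accessible when the CPU has 32 D registers. */
    if (!has_d32 && ((vd | vn | vm) & 0x10)) {
        return false;
    }
    /* A Q operation (q == 1) needs even D-register numbers: bit 0 clear. */
    if ((vd | vn | vm) & q) {
        return false;
    }
    return true;
}

/* Operation size in bytes: 8 for a D register, 16 for a Q register. */
static int neon_opr_size(int q)
{
    return (1 + q) * 8;
}

/* Usage example for the two helpers above. */
static void example_checks(void)
{
    assert(neon_3same_regs_ok(0, 2, 4, 1, true));    /* Q0, Q1, Q2 */
    assert(!neon_3same_regs_ok(1, 2, 4, 1, true));   /* D1 is odd, q == 1 */
    assert(!neon_3same_regs_ok(16, 0, 0, 0, false)); /* D16 without D32 support */
    assert(neon_opr_size(1) == 16);
}
```

In the real translator a failed check makes the trans_ function return false, which the caller treats as an undefined instruction.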

-New patch
+[PULL 25/39] target/arm: Convert V[US]DOT (vector) to decodetree
+Convert the V[US]DOT (vector) insns to decodetree.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-7-peter.maydell@linaro.org
+---
+ target/arm/neon-shared.decode   |  4 ++++
+ target/arm/translate-neon.inc.c | 32 ++++++++++++++++++++++++++++++++
+ target/arm/translate.c          |  9 +--------
+files changed, 37 insertions(+), 8 deletions(-)
+diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-shared.decode
++++ b/target/arm/neon-shared.decode
+@@ -XXX,XX +XXX,XX @@ VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
+ VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
++
++# VUDOT and VSDOT
++VDOT           1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
++               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
+     tcg_temp_free_ptr(fpst);
+     return true;
+ }
++
++static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
++{
++    int opr_sz;
++    gen_helper_gvec_3 *fn_gvec;
++
++    if (!dc_isar_feature(aa32_dp, s)) {
++        return false;
++    }
++
++    /* UNDEF accesses to D16-D31 if they don't exist. */
++    if (!dc_isar_feature(aa32_simd_r32, s) &&
++        ((a->vd | a->vn | a->vm) & 0x10)) {
++        return false;
++    }
++
++    if ((a->vn | a->vm | a->vd) & a->q) {
++        return false;
++    }
++
++    if (!vfp_access_check(s)) {
++        return true;
++    }
++
++    opr_sz = (1 + a->q) * 8;
++    fn_gvec = a->u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
++    tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
++                       vfp_reg_offset(1, a->vn),
++                       vfp_reg_offset(1, a->vm),
++                       opr_sz, opr_sz, 0, fn_gvec);
++    return true;
++}
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+     bool is_long = false, q = extract32(insn, 6, 1);
+     bool ptr_is_env = false;
+-    if ((insn & 0xfeb00f00) == 0xfc200d00) {
+-        /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
+-        bool u = extract32(insn, 4, 1);
+-        if (!dc_isar_feature(aa32_dp, s)) {
+-            return 1;
+-        }
+-        fn_gvec = u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
+-    } else if ((insn & 0xff300f10) == 0xfc200810) {
++    if ((insn & 0xff300f10) == 0xfc200810) {
+         /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
+         int is_s = extract32(insn, 23, 1);
+         if (!dc_isar_feature(aa32_fhm, s)) {
+--
+.20.1

-New patch
+[PULL 26/39] target/arm: Convert VFM[AS]L (vector) to decodetree
+Convert the VFM[AS]L (vector) insns to decodetree.  This is the last
+insn in the legacy decoder for the 3same_ext group, so we can
+delete the legacy decoder function for the group entirely.
+Note that in disas_thumb2_insn() the parts of this encoding space
+where the decodetree decoder returns false will correctly be directed
+to illegal_op by the "(insn & (1 << 28))" check so they won't fall
+into disas_coproc_insn() by mistake.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-8-peter.maydell@linaro.org
+---
+ target/arm/neon-shared.decode   |  6 +++
+ target/arm/translate-neon.inc.c | 31 +++++++++++
+ target/arm/translate.c          | 92 +--------------------------------
+files changed, 38 insertions(+), 91 deletions(-)
+diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-shared.decode
++++ b/target/arm/neon-shared.decode
+@@ -XXX,XX +XXX,XX @@ VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
+ # VUDOT and VSDOT
+ VDOT           1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
++
++# VFM[AS]L
++VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
++               vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
++VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
++               vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
+                        opr_sz, opr_sz, 0, fn_gvec);
+     return true;
+ }
++
++static bool trans_VFML(DisasContext *s, arg_VFML *a)
++{
++    int opr_sz;
++
++    if (!dc_isar_feature(aa32_fhm, s)) {
++        return false;
++    }
++
++    /* UNDEF accesses to D16-D31 if they don't exist. */
++    if (!dc_isar_feature(aa32_simd_r32, s) &&
++        (a->vd & 0x10)) {
++        return false;
++    }
++
++    if (a->vd & a->q) {
++        return false;
++    }
++
++    if (!vfp_access_check(s)) {
++        return true;
++    }
++
++    opr_sz = (1 + a->q) * 8;
++    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
++                       vfp_reg_offset(a->q, a->vn),
++                       vfp_reg_offset(a->q, a->vm),
++                       cpu_env, opr_sz, opr_sz, a->s, /* is_2 == 0 */
++                       gen_helper_gvec_fmlal_a32);
++    return true;
++}
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     return 0;
+ }
+-/* Advanced SIMD three registers of the same length extension.
+- *  31           25    23  22    20   16   12  11   10   9    8        3     0
+- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
+- * | 1 1 1 1 1 1 0 | op1 | D | op2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
+- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
+- */
+-static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+-{
+-    gen_helper_gvec_3 *fn_gvec = NULL;
+-    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
+-    int rd, rn, rm, opr_sz;
+-    int data = 0;
+-    int off_rn, off_rm;
+-    bool is_long = false, q = extract32(insn, 6, 1);
+-    bool ptr_is_env = false;
+-
+-    if ((insn & 0xff300f10) == 0xfc200810) {
+-        /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
+-        int is_s = extract32(insn, 23, 1);
+-        if (!dc_isar_feature(aa32_fhm, s)) {
+-            return 1;
+-        }
+-        is_long = true;
+-        data = is_s; /* is_2 == 0 */
+-        fn_gvec_ptr = gen_helper_gvec_fmlal_a32;
+-        ptr_is_env = true;
+-    } else {
+-        return 1;
+-    }
+-
+-    VFP_DREG_D(rd, insn);
+-    if (rd & q) {
+-        return 1;
+-    }
+-    if (q || !is_long) {
+-        VFP_DREG_N(rn, insn);
+-        VFP_DREG_M(rm, insn);
+-        if ((rn | rm) & q & !is_long) {
+-            return 1;
+-        }
+-        off_rn = vfp_reg_offset(1, rn);
+-        off_rm = vfp_reg_offset(1, rm);
+-    } else {
+-        rn = VFP_SREG_N(insn);
+-        rm = VFP_SREG_M(insn);
+-        off_rn = vfp_reg_offset(0, rn);
+-        off_rm = vfp_reg_offset(0, rm);
+-    }
+-
+-    if (s->fp_excp_el) {
+-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
+-        return 0;
+-    }
+-    if (!s->vfp_enabled) {
+-        return 1;
+-    }
+-
+-    opr_sz = (1 + q) * 8;
+-    if (fn_gvec_ptr) {
+-        TCGv_ptr ptr;
+-        if (ptr_is_env) {
+-            ptr = cpu_env;
+-        } else {
+-            ptr = get_fpstatus_ptr(1);
+-        }
+-        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
+-                           opr_sz, opr_sz, data, fn_gvec_ptr);
+-        if (!ptr_is_env) {
+-            tcg_temp_free_ptr(ptr);
+-        }
+-    } else {
+-        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
+-                           opr_sz, opr_sz, data, fn_gvec);
+-    }
+-    return 0;
+-}
+-
+ /* Advanced SIMD two registers and a scalar extension.
+  *  31             24   23  22   20   16   12  11   10   9    8        3     0
+  * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
+@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
+                     }
+                 }
+             }
+-        } else if ((insn & 0x0e000a00) == 0x0c000800
+-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
+-            if (disas_neon_insn_3same_ext(s, insn)) {
+-                goto illegal_op;
+-            }
+-            return;
+         } else if ((insn & 0x0f000a00) == 0x0e000800
+                    && arm_dc_feature(s, ARM_FEATURE_V8)) {
+             if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
+@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
+             }
+             break;
+         }
+-        if ((insn & 0xfe000a00) == 0xfc000800
++        if ((insn & 0xff000a00) == 0xfe000800
+             && arm_dc_feature(s, ARM_FEATURE_V8)) {
+             /* The Thumb2 and ARM encodings are identical.  */
+-            if (disas_neon_insn_3same_ext(s, insn)) {
+-                goto illegal_op;
+-            }
+-        } else if ((insn & 0xff000a00) == 0xfe000800
+-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
+-            /* The Thumb2 and ARM encodings are identical.  */
+             if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
+                 goto illegal_op;
+             }
+--
+.20.1

-[Qemu-devel] [PULL 20/22] hw/arm/boot: Clarify why arm_setup_firmware_boot() doesn't set env->boot_info
+[PULL 27/39] target/arm: Convert VCMLA (scalar) to decodetree
-The code path for booting firmware doesn't set env->boot_info. At
+Convert VCMLA (scalar) in the 2reg-scalar-ext group to decodetree.
-first sight this looks odd, so add a comment saying why we don't.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+Message-id: 20200430181003.21682-9-peter.maydell@linaro.org
-Message-id: 20190131112240.8395-5-peter.maydell@linaro.org
 ---
- hw/arm/boot.c | 3 ++-
+ target/arm/neon-shared.decode   |  5 +++++
-file changed, 2 insertions(+), 1 deletion(-)
+ target/arm/translate-neon.inc.c | 40 +++++++++++++++++++++++++++++++++
  target/arm/translate.c          | 26 +--------------------
 files changed, 46 insertions(+), 25 deletions(-)
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/target/arm/neon-shared.decode
-+++ b/hw/arm/boot.c
++++ b/target/arm/neon-shared.decode
-@@ -XXX,XX +XXX,XX @@ static void arm_setup_firmware_boot(ARMCPU *cpu, struct arm_boot_info *info)
+@@ -XXX,XX +XXX,XX @@ VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
+                vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
-     /*
+ VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
-      * We will start from address 0 (typically a boot ROM image) in the
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
--     * same way as hardware.
++
-+     * same way as hardware. Leave env->boot_info NULL, so that
++VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
-+     * do_cpu_reset() knows it does not need to alter the PC on reset.
++               vn=%vn_dp vd=%vd_dp size=0
-      */
++VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
 +               vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML(DisasContext *s, arg_VFML *a)
                         gen_helper_gvec_fmlal_a32);
      return true;
  }
++
++static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
++{
++    gen_helper_gvec_3_ptr *fn_gvec_ptr;
++    int opr_sz;
++    TCGv_ptr fpst;
++
++    if (!dc_isar_feature(aa32_vcma, s)) {
++        return false;
++    }
++    if (a->size == 0 && !dc_isar_feature(aa32_fp16_arith, s)) {
++        return false;
++    }
++
++    /* UNDEF accesses to D16-D31 if they don't exist. */
++    if (!dc_isar_feature(aa32_simd_r32, s) &&
++        ((a->vd | a->vn | a->vm) & 0x10)) {
++        return false;
++    }
++
++    if ((a->vd | a->vn) & a->q) {
++        return false;
++    }
++
++    if (!vfp_access_check(s)) {
++        return true;
++    }
++
++    fn_gvec_ptr = (a->size ? gen_helper_gvec_fcmlas_idx
++                   : gen_helper_gvec_fcmlah_idx);
++    opr_sz = (1 + a->q) * 8;
++    fpst = get_fpstatus_ptr(1);
++    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
++                       vfp_reg_offset(1, a->vn),
++                       vfp_reg_offset(1, a->vm),
++                       fpst, opr_sz, opr_sz,
++                       (a->index << 2) | a->rot, fn_gvec_ptr);
++    tcg_temp_free_ptr(fpst);
++    return true;
++}
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
+     bool is_long = false, q = extract32(insn, 6, 1);
+     bool ptr_is_env = false;
+-    if ((insn & 0xff000f10) == 0xfe000800) {
+-        /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... */
+-        int rot = extract32(insn, 20, 2);
+-        int size = extract32(insn, 23, 1);
+-        int index;
+-
+-        if (!dc_isar_feature(aa32_vcma, s)) {
+-            return 1;
+-        }
+-        if (size == 0) {
+-            if (!dc_isar_feature(aa32_fp16_arith, s)) {
+-                return 1;
+-            }
+-            /* For fp16, rm is just Vm, and index is M.  */
+-            rm = extract32(insn, 0, 4);
+-            index = extract32(insn, 5, 1);
+-        } else {
+-            /* For fp32, rm is the usual M:Vm, and index is 0.  */
+-            VFP_DREG_M(rm, insn);
+-            index = 0;
+-        }
+-        data = (index << 2) | rot;
+-        fn_gvec_ptr = (size ? gen_helper_gvec_fcmlas_idx
+-                       : gen_helper_gvec_fcmlah_idx);
+-    } else if ((insn & 0xffb00f00) == 0xfe200d00) {
++    if ((insn & 0xffb00f00) == 0xfe200d00) {
+         /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
+         int u = extract32(insn, 4, 1);
 --
 .20.1

-New patch
+[PULL 28/39] target/arm: Convert V[US]DOT (scalar) to decodetree
+Convert the V[US]DOT (scalar) insns in the 2reg-scalar-ext group
+to decodetree.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-10-peter.maydell@linaro.org
+---
+ target/arm/neon-shared.decode   |  3 +++
+ target/arm/translate-neon.inc.c | 35 +++++++++++++++++++++++++++++++++
+ target/arm/translate.c          | 13 +-----------
+files changed, 39 insertions(+), 12 deletions(-)
+diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-shared.decode
++++ b/target/arm/neon-shared.decode
+@@ -XXX,XX +XXX,XX @@ VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
+                vn=%vn_dp vd=%vd_dp size=0
+ VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
++
++VDOT_scalar    1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
++               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
+     tcg_temp_free_ptr(fpst);
+     return true;
+ }
++
++static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
++{
++    gen_helper_gvec_3 *fn_gvec;
++    int opr_sz;
++    TCGv_ptr fpst;
++
++    if (!dc_isar_feature(aa32_dp, s)) {
++        return false;
++    }
++
++    /* UNDEF accesses to D16-D31 if they don't exist. */
++    if (!dc_isar_feature(aa32_simd_r32, s) &&
++        ((a->vd | a->vn) & 0x10)) {
++        return false;
++    }
++
++    if ((a->vd | a->vn) & a->q) {
++        return false;
++    }
++
++    if (!vfp_access_check(s)) {
++        return true;
++    }
++
++    fn_gvec = a->u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
++    opr_sz = (1 + a->q) * 8;
++    fpst = get_fpstatus_ptr(1);
++    tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
++                       vfp_reg_offset(1, a->vn),
++                       vfp_reg_offset(1, a->rm),
++                       opr_sz, opr_sz, a->index, fn_gvec);
++    tcg_temp_free_ptr(fpst);
++    return true;
++}
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
+     bool is_long = false, q = extract32(insn, 6, 1);
+     bool ptr_is_env = false;
+-    if ((insn & 0xffb00f00) == 0xfe200d00) {
+-        /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
+-        int u = extract32(insn, 4, 1);
+-
+-        if (!dc_isar_feature(aa32_dp, s)) {
+-            return 1;
+-        }
+-        fn_gvec = u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
+-        /* rm is just Vm, and index is M.  */
+-        data = extract32(insn, 5, 1); /* index */
+-        rm = extract32(insn, 0, 4);
+-    } else if ((insn & 0xffa00f10) == 0xfe000810) {
++    if ((insn & 0xffa00f10) == 0xfe000810) {
+         /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
+         int is_s = extract32(insn, 20, 1);
+         int vm20 = extract32(insn, 0, 3);
+--
+.20.1

-[Qemu-devel] [PULL 10/22] linux-user: Implement PR_PAC_RESET_KEYS
+[PULL 29/39] target/arm: Convert VFM[AS]L (scalar) to decodetree
-From: Richard Henderson <richard.henderson@linaro.org>
+Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
+to decodetree. These are the last ones in the group so we can remove
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+all the legacy decode for the group.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190201195404.30486-2-richard.henderson@linaro.org
+Note that in disas_thumb2_insn() the parts of this encoding space
 where the decodetree decoder returns false will correctly be directed
 to illegal_op by the "(insn & (1 << 28))" check so they won't fall
 into disas_coproc_insn() by mistake.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-11-peter.maydell@linaro.org
 ---
- linux-user/aarch64/target_syscall.h |  7 ++++++
+ target/arm/neon-shared.decode   |   7 +++
- linux-user/syscall.c                | 36 +++++++++++++++++++++++++++++
+ target/arm/translate-neon.inc.c |  32 ++++++++++
-files changed, 43 insertions(+)
+ target/arm/translate.c          | 107 +-------------------------------
+files changed, 40 insertions(+), 106 deletions(-)
-diff --git a/linux-user/aarch64/target_syscall.h b/linux-user/aarch64/target_syscall.h
 diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/aarch64/target_syscall.h
+--- a/target/arm/neon-shared.decode
-+++ b/linux-user/aarch64/target_syscall.h
++++ b/target/arm/neon-shared.decode
-@@ -XXX,XX +XXX,XX @@ struct target_pt_regs {
+@@ -XXX,XX +XXX,XX @@ VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
- #define TARGET_PR_SVE_SET_VL  50
- #define TARGET_PR_SVE_GET_VL  51
+ VDOT_scalar    1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+#define TARGET_PR_PAC_RESET_KEYS 54
++
-+# define TARGET_PR_PAC_APIAKEY   (1 << 0)
++%vfml_scalar_q0_rm 0:3 5:1
-+# define TARGET_PR_PAC_APIBKEY   (1 << 1)
++%vfml_scalar_q1_index 5:1 3:1
-+# define TARGET_PR_PAC_APDAKEY   (1 << 2)
++VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
-+# define TARGET_PR_PAC_APDBKEY   (1 << 3)
++               rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
-+# define TARGET_PR_PAC_APGAKEY   (1 << 4)
++VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
-+
++               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
- void arm_init_pauth_key(ARMPACKey *key);
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
- #endif /* AARCH64_TARGET_SYSCALL_H */
-diff --git a/linux-user/syscall.c b/linux-user/syscall.c
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/syscall.c
+--- a/target/arm/translate-neon.inc.c
-+++ b/linux-user/syscall.c
++++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
+@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
      tcg_temp_free_ptr(fpst);
      return true;
  }
 +
 +static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
 +{
 +    int opr_sz;
 +
 +    if (!dc_isar_feature(aa32_fhm, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd & 0x10) || (a->q && (a->vn & 0x10)))) {
 +        return false;
 +    }
 +
 +    if (a->vd & a->q) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    opr_sz = (1 + a->q) * 8;
 +    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
 +                       vfp_reg_offset(a->q, a->vn),
 +                       vfp_reg_offset(a->q, a->rm),
 +                       cpu_env, opr_sz, opr_sz,
 +                       (a->index << 2) | a->s, /* is_2 == 0 */
 +                       gen_helper_gvec_fmlal_idx_a32);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
  }
  #define VFP_REG_SHR(x, n) (((n) > 0) ? (x) >> (n) : (x) << -(n))
 -#define VFP_SREG(insn, bigbit, smallbit) \
 -  ((VFP_REG_SHR(insn, bigbit - 1) & 0x1e) | (((insn) >> (smallbit)) & 1))
  #define VFP_DREG(reg, insn, bigbit, smallbit) do { \
      if (dc_isar_feature(aa32_simd_r32, s)) { \
          reg = (((insn) >> (bigbit)) & 0x0f) \
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
          reg = ((insn) >> (bigbit)) & 0x0f; \
      }} while (0)
 -#define VFP_SREG_D(insn) VFP_SREG(insn, 12, 22)
  #define VFP_DREG_D(reg, insn) VFP_DREG(reg, insn, 12, 22)
 -#define VFP_SREG_N(insn) VFP_SREG(insn, 16,  7)
  #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
 -#define VFP_SREG_M(insn) VFP_SREG(insn,  0,  5)
  #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
  static void gen_neon_dup_low16(TCGv_i32 var)
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
      return 0;
  }
 -/* Advanced SIMD two registers and a scalar extension.
 - *  31             24   23  22   20   16   12  11   10   9    8        3     0
 - * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
 - * | 1 1 1 1 1 1 1 0 | o1 | D | o2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
 - * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
 - *
 - */
 -
 -static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
 -{
 -    gen_helper_gvec_3 *fn_gvec = NULL;
 -    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
 -    int rd, rn, rm, opr_sz, data;
 -    int off_rn, off_rm;
 -    bool is_long = false, q = extract32(insn, 6, 1);
 -    bool ptr_is_env = false;
 -
 -    if ((insn & 0xffa00f10) == 0xfe000810) {
 -        /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
 -        int is_s = extract32(insn, 20, 1);
 -        int vm20 = extract32(insn, 0, 3);
 -        int vm3 = extract32(insn, 3, 1);
 -        int m = extract32(insn, 5, 1);
 -        int index;
 -
 -        if (!dc_isar_feature(aa32_fhm, s)) {
 -            return 1;
 -        }
 -        if (q) {
 -            rm = vm20;
 -            index = m * 2 + vm3;
 -        } else {
 -            rm = vm20 * 2 + m;
 -            index = vm3;
 -        }
 -        is_long = true;
 -        data = (index << 2) | is_s; /* is_2 == 0 */
 -        fn_gvec_ptr = gen_helper_gvec_fmlal_idx_a32;
 -        ptr_is_env = true;
 -    } else {
 -        return 1;
 -    }
 -
 -    VFP_DREG_D(rd, insn);
 -    if (rd & q) {
 -        return 1;
 -    }
 -    if (q || !is_long) {
 -        VFP_DREG_N(rn, insn);
 -        if (rn & q & !is_long) {
 -            return 1;
 -        }
 -        off_rn = vfp_reg_offset(1, rn);
 -        off_rm = vfp_reg_offset(1, rm);
 -    } else {
 -        rn = VFP_SREG_N(insn);
 -        off_rn = vfp_reg_offset(0, rn);
 -        off_rm = vfp_reg_offset(0, rm);
 -    }
 -    if (s->fp_excp_el) {
 -        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
 -                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
 -        return 0;
 -    }
 -    if (!s->vfp_enabled) {
 -        return 1;
 -    }
 -
 -    opr_sz = (1 + q) * 8;
 -    if (fn_gvec_ptr) {
 -        TCGv_ptr ptr;
 -        if (ptr_is_env) {
 -            ptr = cpu_env;
 -        } else {
 -            ptr = get_fpstatus_ptr(1);
 -        }
 -        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
 -                           opr_sz, opr_sz, data, fn_gvec_ptr);
 -        if (!ptr_is_env) {
 -            tcg_temp_free_ptr(ptr);
 -        }
 -    } else {
 -        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
 -                           opr_sz, opr_sz, data, fn_gvec);
 -    }
 -    return 0;
 -}
 -
  static int disas_coproc_insn(DisasContext *s, uint32_t insn)
  {
      int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                      }
                  }
              }
-             return ret;
+-        } else if ((insn & 0x0f000a00) == 0x0e000800
-+        case TARGET_PR_PAC_RESET_KEYS:
+-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-+            {
+-            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
-+                CPUARMState *env = cpu_env;
+-                goto illegal_op;
-+                ARMCPU *cpu = arm_env_get_cpu(env);
+-            }
-+
+-            return;
-+                if (arg3 || arg4 || arg5) {
+         }
-+                    return -TARGET_EINVAL;
+         goto illegal_op;
-+                }
+     }
-+                if (cpu_isar_feature(aa64_pauth, cpu)) {
+@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
-+                    int all = (TARGET_PR_PAC_APIAKEY | TARGET_PR_PAC_APIBKEY |
+             }
-+                               TARGET_PR_PAC_APDAKEY | TARGET_PR_PAC_APDBKEY |
+             break;
-+                               TARGET_PR_PAC_APGAKEY);
+         }
-+                    if (arg2 == 0) {
+-        if ((insn & 0xff000a00) == 0xfe000800
-+                        arg2 = all;
+-            && arm_dc_feature(s, ARM_FEATURE_V8)) {
-+                    } else if (arg2 & ~all) {
+-            /* The Thumb2 and ARM encodings are identical.  */
-+                        return -TARGET_EINVAL;
+-            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
-+                    }
+-                goto illegal_op;
-+                    if (arg2 & TARGET_PR_PAC_APIAKEY) {
+-            }
-+                        arm_init_pauth_key(&env->apia_key);
+-        } else if (((insn >> 24) & 3) == 3) {
-+                    }
++        if (((insn >> 24) & 3) == 3) {
-+                    if (arg2 & TARGET_PR_PAC_APIBKEY) {
+             /* Translate into the equivalent ARM encoding.  */
-+                        arm_init_pauth_key(&env->apib_key);
+             insn = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
-+                    }
+             if (disas_neon_data_insn(s, insn)) {
 +                    if (arg2 & TARGET_PR_PAC_APDAKEY) {
 +                        arm_init_pauth_key(&env->apda_key);
 +                    }
 +                    if (arg2 & TARGET_PR_PAC_APDBKEY) {
 +                        arm_init_pauth_key(&env->apdb_key);
 +                    }
 +                    if (arg2 & TARGET_PR_PAC_APGAKEY) {
 +                        arm_init_pauth_key(&env->apga_key);
 +                    }
 +                    return 0;
 +                }
 +            }
 +            return -TARGET_EINVAL;
  #endif /* AARCH64 */
          case PR_GET_SECCOMP:
          case PR_SET_SECCOMP:
 --
 .20.1

-New patch
+[PULL 30/39] target/arm: Convert Neon load/store multiple structures to decodetree
+Convert the Neon "load/store multiple structures" insns to decodetree.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200430181003.21682-12-peter.maydell@linaro.org
 ---
  target/arm/neon-ls.decode       |   7 ++
  target/arm/translate-neon.inc.c | 124 ++++++++++++++++++++++++++++++++
  target/arm/translate.c          |  91 +----------------------
 3 files changed, 133 insertions(+), 89 deletions(-)
 diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-ls.decode
 +++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@
  #   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
  # This file works on the A32 encoding only; calling code for T32 has to
  # transform the insn into the A32 version first.
 +
 +%vd_dp  22:1 12:4
 +
 +# Neon load/store multiple structures
 +
 +VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
 +               vd=%vd_dp
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
                         gen_helper_gvec_fmlal_idx_a32);
      return true;
  }
 +
 +static struct {
 +    int nregs;
 +    int interleave;
 +    int spacing;
 +} const neon_ls_element_type[11] = {
 +    {1, 4, 1},
 +    {1, 4, 2},
 +    {4, 1, 1},
 +    {2, 2, 2},
 +    {1, 3, 1},
 +    {1, 3, 2},
 +    {3, 1, 1},
 +    {1, 1, 1},
 +    {1, 2, 1},
 +    {1, 2, 2},
 +    {2, 1, 1}
 +};
 +
 +static void gen_neon_ldst_base_update(DisasContext *s, int rm, int rn,
 +                                      int stride)
 +{
 +    if (rm != 15) {
 +        TCGv_i32 base;
 +
 +        base = load_reg(s, rn);
 +        if (rm == 13) {
 +            tcg_gen_addi_i32(base, base, stride);
 +        } else {
 +            TCGv_i32 index;
 +            index = load_reg(s, rm);
 +            tcg_gen_add_i32(base, base, index);
 +            tcg_temp_free_i32(index);
 +        }
 +        store_reg(s, rn, base);
 +    }
 +}
 +
 +static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
 +{
 +    /* Neon load/store multiple structures */
 +    int nregs, interleave, spacing, reg, n;
 +    MemOp endian = s->be_data;
 +    int mmu_idx = get_mem_index(s);
 +    int size = a->size;
 +    TCGv_i64 tmp64;
 +    TCGv_i32 addr, tmp;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
 +        return false;
 +    }
 +    if (a->itype > 10) {
 +        return false;
 +    }
 +    /* Catch UNDEF cases for bad values of align field */
 +    switch (a->itype & 0xc) {
 +    case 4:
 +        if (a->align >= 2) {
 +            return false;
 +        }
 +        break;
 +    case 8:
 +        if (a->align == 3) {
 +            return false;
 +        }
 +        break;
 +    default:
 +        break;
 +    }
 +    nregs = neon_ls_element_type[a->itype].nregs;
 +    interleave = neon_ls_element_type[a->itype].interleave;
 +    spacing = neon_ls_element_type[a->itype].spacing;
 +    if (size == 3 && (interleave | spacing) != 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    /* For our purposes, bytes are always little-endian.  */
 +    if (size == 0) {
 +        endian = MO_LE;
 +    }
 +    /*
 +     * Consecutive little-endian elements from a single register
 +     * can be promoted to a larger little-endian operation.
 +     */
 +    if (interleave == 1 && endian == MO_LE) {
 +        size = 3;
 +    }
 +    tmp64 = tcg_temp_new_i64();
 +    addr = tcg_temp_new_i32();
 +    tmp = tcg_const_i32(1 << size);
 +    load_reg_var(s, addr, a->rn);
 +    for (reg = 0; reg < nregs; reg++) {
 +        for (n = 0; n < 8 >> size; n++) {
 +            int xs;
 +            for (xs = 0; xs < interleave; xs++) {
 +                int tt = a->vd + reg + spacing * xs;
 +
 +                if (a->l) {
 +                    gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
 +                    neon_store_element64(tt, n, size, tmp64);
 +                } else {
 +                    neon_load_element64(tmp64, tt, n, size);
 +                    gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
 +                }
 +                tcg_gen_add_i32(addr, addr, tmp);
 +            }
 +        }
 +    }
 +    tcg_temp_free_i32(addr);
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i64(tmp64);
 +
 +    gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
  }
 -static struct {
 -    int nregs;
 -    int interleave;
 -    int spacing;
 -} const neon_ls_element_type[11] = {
 -    {1, 4, 1},
 -    {1, 4, 2},
 -    {4, 1, 1},
 -    {2, 2, 2},
 -    {1, 3, 1},
 -    {1, 3, 2},
 -    {3, 1, 1},
 -    {1, 1, 1},
 -    {1, 2, 1},
 -    {1, 2, 2},
 -    {2, 1, 1}
 -};
 -
  /* Translate a NEON load/store element instruction.  Return nonzero if the
     instruction is invalid.  */
  static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
  {
      int rd, rn, rm;
 -    int op;
      int nregs;
 -    int interleave;
 -    int spacing;
      int stride;
      int size;
      int reg;
      int load;
 -    int n;
      int vec_size;
 -    int mmu_idx;
 -    MemOp endian;
      TCGv_i32 addr;
      TCGv_i32 tmp;
 -    TCGv_i32 tmp2;
 -    TCGv_i64 tmp64;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
      rn = (insn >> 16) & 0xf;
      rm = insn & 0xf;
      load = (insn & (1 << 21)) != 0;
 -    endian = s->be_data;
 -    mmu_idx = get_mem_index(s);
      if ((insn & (1 << 23)) == 0) {
 -        /* Load store all elements.  */
 -        op = (insn >> 8) & 0xf;
 -        size = (insn >> 6) & 3;
 -        if (op > 10)
 -            return 1;
 -        /* Catch UNDEF cases for bad values of align field */
 -        switch (op & 0xc) {
 -        case 4:
 -            if (((insn >> 5) & 1) == 1) {
 -                return 1;
 -            }
 -            break;
 -        case 8:
 -            if (((insn >> 4) & 3) == 3) {
 -                return 1;
 -            }
 -            break;
 -        default:
 -            break;
 -        }
 -        nregs = neon_ls_element_type[op].nregs;
 -        interleave = neon_ls_element_type[op].interleave;
 -        spacing = neon_ls_element_type[op].spacing;
 -        if (size == 3 && (interleave | spacing) != 1) {
 -            return 1;
 -        }
 -        /* For our purposes, bytes are always little-endian.  */
 -        if (size == 0) {
 -            endian = MO_LE;
 -        }
 -        /* Consecutive little-endian elements from a single register
 -         * can be promoted to a larger little-endian operation.
 -         */
 -        if (interleave == 1 && endian == MO_LE) {
 -            size = 3;
 -        }
 -        tmp64 = tcg_temp_new_i64();
 -        addr = tcg_temp_new_i32();
 -        tmp2 = tcg_const_i32(1 << size);
 -        load_reg_var(s, addr, rn);
 -        for (reg = 0; reg < nregs; reg++) {
 -            for (n = 0; n < 8 >> size; n++) {
 -                int xs;
 -                for (xs = 0; xs < interleave; xs++) {
 -                    int tt = rd + reg + spacing * xs;
 -
 -                    if (load) {
 -                        gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
 -                        neon_store_element64(tt, n, size, tmp64);
 -                    } else {
 -                        neon_load_element64(tmp64, tt, n, size);
 -                        gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
 -                    }
 -                    tcg_gen_add_i32(addr, addr, tmp2);
 -                }
 -            }
 -        }
 -        tcg_temp_free_i32(addr);
 -        tcg_temp_free_i32(tmp2);
 -        tcg_temp_free_i64(tmp64);
 -        stride = nregs * interleave * 8;
 +        /* Load store all elements -- handled already by decodetree */
 +        return 1;
      } else {
          size = (insn >> 10) & 3;
          if (size == 3) {
 --
 .20.1
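
The loop nest in trans_VLDST_multiple() is easier to review with the access order written out. The following is a minimal standalone sketch, not QEMU code: it copies the neon_ls_element_type values from the patch, extracts vd the way the %vd_dp field (22:1 12:4) describes, and lists which D register and element each memory access touches. The names extract_vd_dp, struct ls_access and plan_accesses are hypothetical and exist only for illustration; the size argument is assumed to be the value after any promotion to 3.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as neon_ls_element_type[] in the patch. */
static const struct {
    int nregs;
    int interleave;
    int spacing;
} ls_type[11] = {
    {1, 4, 1}, {1, 4, 2}, {4, 1, 1}, {2, 2, 2}, {1, 3, 1}, {1, 3, 2},
    {3, 1, 1}, {1, 1, 1}, {1, 2, 1}, {1, 2, 2}, {2, 1, 1}
};

/* %vd_dp 22:1 12:4: bit 22 is the top bit, bits [15:12] are the low four. */
static int extract_vd_dp(uint32_t insn)
{
    return (int)((((insn >> 22) & 1) << 4) | ((insn >> 12) & 0xf));
}

/* Hypothetical record of one memory access (illustration only). */
struct ls_access {
    int dreg;    /* D register number */
    int element; /* element index within that register */
    int offset;  /* byte offset from the base address in Rn */
};

/*
 * Fill out[] in the order the translator loop emits accesses and return
 * the number of accesses. size is log2 of the element size in bytes.
 * out[] needs room for at most 32 entries.
 */
static int plan_accesses(int vd, int itype, int size, struct ls_access *out)
{
    int count = 0;
    int offset = 0;

    assert(itype >= 0 && itype <= 10 && size >= 0 && size <= 3);
    for (int reg = 0; reg < ls_type[itype].nregs; reg++) {
        for (int n = 0; n < (8 >> size); n++) {
            for (int xs = 0; xs < ls_type[itype].interleave; xs++) {
                out[count].dreg = vd + reg + ls_type[itype].spacing * xs;
                out[count].element = n;
                out[count].offset = offset;
                offset += 1 << size;
                count++;
            }
        }
    }
    return count;
}
```

Whatever the element size, the offsets cover nregs * interleave * 8 bytes in total, which matches the stride passed to gen_neon_ldst_base_update() in the patch.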

-[Qemu-devel] [PULL 19/22] hw/arm/boot: Factor out "set up firmware boot" code
+[PULL 31/39] target/arm: Convert Neon 'load single structure to all lanes' to decodetree
-Factor out the "boot via firmware" code path from arm_load_kernel()
+Convert the Neon "load single structure to all lanes" insns to
-into its own function.
+decodetree.
 This commit only moves code around; no semantic changes.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+Message-id: 20200430181003.21682-13-peter.maydell@linaro.org
 Message-id: 20190131112240.8395-4-peter.maydell@linaro.org
 ---
- hw/arm/boot.c | 92 +++++++++++++++++++++++++++------------------------
+ target/arm/neon-ls.decode       |  5 +++
-1 file changed, 49 insertions(+), 43 deletions(-)
+ target/arm/translate-neon.inc.c | 73 +++++++++++++++++++++++++++++++++
  target/arm/translate.c          | 55 +------------------------
 3 files changed, 80 insertions(+), 53 deletions(-)
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/target/arm/neon-ls.decode
-+++ b/hw/arm/boot.c
++++ b/target/arm/neon-ls.decode
-@@ -XXX,XX +XXX,XX @@ static void arm_setup_direct_kernel_boot(ARMCPU *cpu,
+@@ -XXX,XX +XXX,XX @@
-     }
  VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
                 vd=%vd_dp
 +
 +# Neon load single element to all lanes
 +
 +VLD_all_lanes  1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
 +               vd=%vd_dp
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
      gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
      return true;
  }
++
-+static void arm_setup_firmware_boot(ARMCPU *cpu, struct arm_boot_info *info)
++static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
 +{
-+    /* Set up for booting firmware (which might load a kernel via fw_cfg) */
++    /* Neon load single structure to all lanes */
 +    int reg, stride, vec_size;
 +    int vd = a->vd;
 +    int size = a->size;
 +    int nregs = a->n + 1;
 +    TCGv_i32 addr, tmp;
 +
-+    if (have_dtb(info)) {
++    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        /*
++        return false;
 +         * If we have a device tree blob, but no kernel to supply it to (or
 +         * the kernel is supposed to be loaded by the bootloader), copy the
 +         * DTB to the base of RAM for the bootloader to pick up.
 +         */
 +        info->dtb_start = info->loader_start;
 +    }
 +
-+    if (info->kernel_filename) {
++    /* UNDEF accesses to D16-D31 if they don't exist */
-+        FWCfgState *fw_cfg;
++    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
-+        bool try_decompressing_kernel;
++        return false;
 +    }
 +
-+        fw_cfg = fw_cfg_find();
++    if (size == 3) {
-+        try_decompressing_kernel = arm_feature(&cpu->env,
++        if (nregs != 4 || a->a == 0) {
-+                                               ARM_FEATURE_AARCH64);
++            return false;
 +        }
 +        /* For VLD4 size == 3 a == 1 means 32 bits at 16 byte alignment */
 +        size = 2;
 +    }
 +    if (nregs == 1 && a->a == 1 && size == 0) {
 +        return false;
 +    }
 +    if (nregs == 3 && a->a == 1) {
 +        return false;
 +    }
 +
-+        /*
++    if (!vfp_access_check(s)) {
-+         * Expose the kernel, the command line, and the initrd in fw_cfg.
++        return true;
 +         * We don't process them here at all, it's all left to the
 +         * firmware.
 +         */
 +        load_image_to_fw_cfg(fw_cfg,
 +                             FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
 +                             info->kernel_filename,
 +                             try_decompressing_kernel);
 +        load_image_to_fw_cfg(fw_cfg,
 +                             FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
 +                             info->initrd_filename, false);
 +
 +        if (info->kernel_cmdline) {
 +            fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
 +                           strlen(info->kernel_cmdline) + 1);
 +            fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
 +                              info->kernel_cmdline);
 +        }
 +    }
 +
 +    /*
-+     * We will start from address 0 (typically a boot ROM image) in the
++     * VLD1 to all lanes: T bit indicates how many Dregs to write.
-+     * same way as hardware.
++     * VLD2/3/4 to all lanes: T bit indicates register stride.
 +     */
++    stride = a->t ? 2 : 1;
++    vec_size = nregs == 1 ? stride * 8 : 8;
++
++    tmp = tcg_temp_new_i32();
++    addr = tcg_temp_new_i32();
++    load_reg_var(s, addr, a->rn);
++    for (reg = 0; reg < nregs; reg++) {
++        gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
++                        s->be_data | size);
++        if ((vd & 1) && vec_size == 16) {
++            /*
++             * We cannot write 16 bytes at once because the
++             * destination is unaligned.
++             */
++            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
++                                 8, 8, tmp);
++            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
++                             neon_reg_offset(vd, 0), 8, 8);
++        } else {
++            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
++                                 vec_size, vec_size, tmp);
++        }
++        tcg_gen_addi_i32(addr, addr, 1 << size);
++        vd += stride;
++    }
++    tcg_temp_free_i32(tmp);
++    tcg_temp_free_i32(addr);
++
++    gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << size) * nregs);
++
++    return true;
 +}
-+
+diff --git a/target/arm/translate.c b/target/arm/translate.c
- void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+index XXXXXXX..XXXXXXX 100644
- {
+--- a/target/arm/translate.c
-     CPUState *cs;
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
+     int size;
-     /* Load the kernel.  */
+     int reg;
-     if (!info->kernel_filename || info->firmware_loaded) {
+     int load;
 -    int vec_size;
      TCGv_i32 addr;
      TCGv_i32 tmp;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
      } else {
          size = (insn >> 10) & 3;
          if (size == 3) {
 -            /* Load single element to all lanes.  */
 -            int a = (insn >> 4) & 1;
 -            if (!load) {
 -                return 1;
 -            }
 -            size = (insn >> 6) & 3;
 -            nregs = ((insn >> 8) & 3) + 1;
 -
--        if (have_dtb(info)) {
+-            if (size == 3) {
--            /*
+-                if (nregs != 4 || a == 0) {
--             * If we have a device tree blob, but no kernel to supply it to (or
+-                    return 1;
--             * the kernel is supposed to be loaded by the bootloader), copy the
+-                }
--             * DTB to the base of RAM for the bootloader to pick up.
+-                /* For VLD4 size==3 a == 1 means 32 bits at 16 byte alignment */
 -                size = 2;
 -            }
 -            if (nregs == 1 && a == 1 && size == 0) {
 -                return 1;
 -            }
 -            if (nregs == 3 && a == 1) {
 -                return 1;
 -            }
 -            addr = tcg_temp_new_i32();
 -            load_reg_var(s, addr, rn);
 -
 -            /* VLD1 to all lanes: bit 5 indicates how many Dregs to write.
 -             * VLD2/3/4 to all lanes: bit 5 indicates register stride.
 -             */
--            info->dtb_start = info->loader_start;
+-            stride = (insn & (1 << 5)) ? 2 : 1;
--        }
+-            vec_size = nregs == 1 ? stride * 8 : 8;
 -
--        if (info->kernel_filename) {
+-            tmp = tcg_temp_new_i32();
--            FWCfgState *fw_cfg;
+-            for (reg = 0; reg < nregs; reg++) {
--            bool try_decompressing_kernel;
+-                gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
--
+-                                s->be_data | size);
--            fw_cfg = fw_cfg_find();
+-                if ((rd & 1) && vec_size == 16) {
--            try_decompressing_kernel = arm_feature(&cpu->env,
+-                    /* We cannot write 16 bytes at once because the
--                                                   ARM_FEATURE_AARCH64);
+-                     * destination is unaligned.
--
+-                     */
--            /*
+-                    tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
--             * Expose the kernel, the command line, and the initrd in fw_cfg.
+-                                         8, 8, tmp);
--             * We don't process them here at all, it's all left to the
+-                    tcg_gen_gvec_mov(0, neon_reg_offset(rd + 1, 0),
--             * firmware.
+-                                     neon_reg_offset(rd, 0), 8, 8);
--             */
+-                } else {
--            load_image_to_fw_cfg(fw_cfg,
+-                    tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
--                                 FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
+-                                         vec_size, vec_size, tmp);
--                                 info->kernel_filename,
+-                }
--                                 try_decompressing_kernel);
+-                tcg_gen_addi_i32(addr, addr, 1 << size);
--            load_image_to_fw_cfg(fw_cfg,
+-                rd += stride;
 -                                 FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
 -                                 info->initrd_filename, false);
 -
 -            if (info->kernel_cmdline) {
 -                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
 -                               strlen(info->kernel_cmdline) + 1);
 -                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
 -                                  info->kernel_cmdline);
 -            }
--        }
+-            tcg_temp_free_i32(tmp);
--
+-            tcg_temp_free_i32(addr);
--        /*
+-            stride = (1 << size) * nregs;
--         * We will start from address 0 (typically a boot ROM image) in the
++            /* Load single element to all lanes -- handled by decodetree  */
--         * same way as hardware.
++            return 1;
--         */
+         } else {
-+        arm_setup_firmware_boot(cpu, info);
+             /* Single element.  */
-         return;
+             int idx = (insn >> 4) & 0xf;
      } else {
          arm_setup_direct_kernel_boot(cpu, info);
 --
 .20.1
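
The UNDEF conditions and the stride/vec_size computation in trans_VLD_all_lanes() can be checked in isolation. The following is a minimal sketch under stated assumptions, not QEMU code: it takes the decoded n, size, t and a fields as plain integers, omits the ARM_FEATURE_NEON, aa32_simd_r32 and vfp_access_check() tests, and returns the values the translator would go on to use. struct all_lanes_plan and plan_vld_all_lanes are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type for illustration; not a QEMU structure. */
struct all_lanes_plan {
    int size;      /* log2 of the element size in bytes actually loaded */
    int stride;    /* register step between successive destinations */
    int vec_size;  /* bytes written per destination, 8 or 16 */
    int writeback; /* bytes added to Rn when Rm == 13 */
};

/* Returns false where the patch returns false (the insn UNDEFs). */
static bool plan_vld_all_lanes(int n, int size, int t, int a,
                               struct all_lanes_plan *plan)
{
    int nregs = n + 1;

    assert(n >= 0 && n <= 3 && size >= 0 && size <= 3);

    if (size == 3) {
        if (nregs != 4 || a == 0) {
            return false;
        }
        /* VLD4 with size == 3, a == 1: 32 bits at 16 byte alignment */
        size = 2;
    }
    if (nregs == 1 && a == 1 && size == 0) {
        return false;
    }
    if (nregs == 3 && a == 1) {
        return false;
    }

    plan->size = size;
    /* VLD1: T selects one or two Dregs. VLD2/3/4: T selects the stride. */
    plan->stride = t ? 2 : 1;
    plan->vec_size = (nregs == 1) ? plan->stride * 8 : 8;
    plan->writeback = (1 << size) * nregs;
    return true;
}
```

For example, VLD1 with t == 1 gives vec_size 16, which is the case where the patch has to split the write when vd is odd.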

-New patch
+[PULL 32/39] target/arm: Convert Neon 'load/store single structure' to decodetree
+Convert the Neon "load/store single structure to one lane" insns to
 decodetree.
 As this is the last set of insns in the neon load/store group,
 we can remove the whole disas_neon_ls_insn() function.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200430181003.21682-14-peter.maydell@linaro.org
 ---
  target/arm/neon-ls.decode       |  11 +++
  target/arm/translate-neon.inc.c |  89 +++++++++++++++++++
  target/arm/translate.c          | 147 --------------------------------
 3 files changed, 100 insertions(+), 147 deletions(-)
 diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-ls.decode
 +++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@ VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
  VLD_all_lanes  1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
                 vd=%vd_dp
 +
 +# Neon load/store single structure to one lane
 +%imm1_5_p1 5:1 !function=plus1
 +%imm1_6_p1 6:1 !function=plus1
 +
 +VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 00 n:2 reg_idx:3 align:1 rm:4 \
 +               vd=%vd_dp size=0 stride=1
 +VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 01 n:2 reg_idx:2 align:2 rm:4 \
 +               vd=%vd_dp size=1 stride=%imm1_5_p1
 +VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 10 n:2 reg_idx:1 align:3 rm:4 \
 +               vd=%vd_dp size=2 stride=%imm1_6_p1
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
   * It might be possible to convert it to a standalone .c file eventually.
   */
 +static inline int plus1(DisasContext *s, int x)
 +{
 +    return x + 1;
 +}
 +
  /* Include the generated Neon decoder */
  #include "decode-neon-dp.inc.c"
  #include "decode-neon-ls.inc.c"
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
      return true;
  }
 +
 +static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 +{
 +    /* Neon load/store single structure to one lane */
 +    int reg;
 +    int nregs = a->n + 1;
 +    int vd = a->vd;
 +    TCGv_i32 addr, tmp;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
 +        return false;
 +    }
 +
 +    /* Catch the UNDEF cases. This is unavoidably a bit messy. */
 +    switch (nregs) {
 +    case 1:
 +        if (((a->align & (1 << a->size)) != 0) ||
 +            (a->size == 2 && ((a->align & 3) == 1 || (a->align & 3) == 2))) {
 +            return false;
 +        }
 +        break;
 +    case 3:
 +        if ((a->align & 1) != 0) {
 +            return false;
 +        }
 +        /* fall through */
 +    case 2:
 +        if (a->size == 2 && (a->align & 2) != 0) {
 +            return false;
 +        }
 +        break;
 +    case 4:
 +        if ((a->size == 2) && ((a->align & 3) == 3)) {
 +            return false;
 +        }
 +        break;
 +    default:
 +        abort();
 +    }
 +    if ((vd + a->stride * (nregs - 1)) > 31) {
 +        /*
 +         * Attempts to write off the end of the register file are
 +         * UNPREDICTABLE; we choose to UNDEF because otherwise we would
 +         * access off the end of the array that holds the register data.
 +         */
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i32();
 +    addr = tcg_temp_new_i32();
 +    load_reg_var(s, addr, a->rn);
 +    /*
 +     * TODO: if we implemented alignment exceptions, we should check
 +     * addr against the alignment encoded in a->align here.
 +     */
 +    for (reg = 0; reg < nregs; reg++) {
 +        if (a->l) {
 +            gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
 +                            s->be_data | a->size);
 +            neon_store_element(vd, a->reg_idx, a->size, tmp);
 +        } else { /* Store */
 +            neon_load_element(tmp, vd, a->reg_idx, a->size);
 +            gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
 +                            s->be_data | a->size);
 +        }
 +        vd += a->stride;
 +        tcg_gen_addi_i32(addr, addr, 1 << a->size);
 +    }
 +    tcg_temp_free_i32(addr);
 +    tcg_temp_free_i32(tmp);
 +
 +    gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << a->size) * nregs);
 +
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
      tcg_temp_free_i32(rd);
  }
 -
 -/* Translate a NEON load/store element instruction.  Return nonzero if the
 -   instruction is invalid.  */
 -static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
 -{
 -    int rd, rn, rm;
 -    int nregs;
 -    int stride;
 -    int size;
 -    int reg;
 -    int load;
 -    TCGv_i32 addr;
 -    TCGv_i32 tmp;
 -
 -    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -        return 1;
 -    }
 -
 -    /* FIXME: this access check should not take precedence over UNDEF
 -     * for invalid encodings; we will generate incorrect syndrome information
 -     * for attempts to execute invalid vfp/neon encodings with FP disabled.
 -     */
 -    if (s->fp_excp_el) {
 -        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
 -                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
 -        return 0;
 -    }
 -
 -    if (!s->vfp_enabled)
 -      return 1;
 -    VFP_DREG_D(rd, insn);
 -    rn = (insn >> 16) & 0xf;
 -    rm = insn & 0xf;
 -    load = (insn & (1 << 21)) != 0;
 -    if ((insn & (1 << 23)) == 0) {
 -        /* Load store all elements -- handled already by decodetree */
 -        return 1;
 -    } else {
 -        size = (insn >> 10) & 3;
 -        if (size == 3) {
 -            /* Load single element to all lanes -- handled by decodetree  */
 -            return 1;
 -        } else {
 -            /* Single element.  */
 -            int idx = (insn >> 4) & 0xf;
 -            int reg_idx;
 -            switch (size) {
 -            case 0:
 -                reg_idx = (insn >> 5) & 7;
 -                stride = 1;
 -                break;
 -            case 1:
 -                reg_idx = (insn >> 6) & 3;
 -                stride = (insn & (1 << 5)) ? 2 : 1;
 -                break;
 -            case 2:
 -                reg_idx = (insn >> 7) & 1;
 -                stride = (insn & (1 << 6)) ? 2 : 1;
 -                break;
 -            default:
 -                abort();
 -            }
 -            nregs = ((insn >> 8) & 3) + 1;
 -            /* Catch the UNDEF cases. This is unavoidably a bit messy. */
 -            switch (nregs) {
 -            case 1:
 -                if (((idx & (1 << size)) != 0) ||
 -                    (size == 2 && ((idx & 3) == 1 || (idx & 3) == 2))) {
 -                    return 1;
 -                }
 -                break;
 -            case 3:
 -                if ((idx & 1) != 0) {
 -                    return 1;
 -                }
 -                /* fall through */
 -            case 2:
 -                if (size == 2 && (idx & 2) != 0) {
 -                    return 1;
 -                }
 -                break;
 -            case 4:
 -                if ((size == 2) && ((idx & 3) == 3)) {
 -                    return 1;
 -                }
 -                break;
 -            default:
 -                abort();
 -            }
 -            if ((rd + stride * (nregs - 1)) > 31) {
 -                /* Attempts to write off the end of the register file
 -                 * are UNPREDICTABLE; we choose to UNDEF because otherwise
 -                 * the neon_load_reg() would write off the end of the array.
 -                 */
 -                return 1;
 -            }
 -            tmp = tcg_temp_new_i32();
 -            addr = tcg_temp_new_i32();
 -            load_reg_var(s, addr, rn);
 -            for (reg = 0; reg < nregs; reg++) {
 -                if (load) {
 -                    gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
 -                                    s->be_data | size);
 -                    neon_store_element(rd, reg_idx, size, tmp);
 -                } else { /* Store */
 -                    neon_load_element(tmp, rd, reg_idx, size);
 -                    gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
 -                                    s->be_data | size);
 -                }
 -                rd += stride;
 -                tcg_gen_addi_i32(addr, addr, 1 << size);
 -            }
 -            tcg_temp_free_i32(addr);
 -            tcg_temp_free_i32(tmp);
 -            stride = nregs * (1 << size);
 -        }
 -    }
 -    if (rm != 15) {
 -        TCGv_i32 base;
 -
 -        base = load_reg(s, rn);
 -        if (rm == 13) {
 -            tcg_gen_addi_i32(base, base, stride);
 -        } else {
 -            TCGv_i32 index;
 -            index = load_reg(s, rm);
 -            tcg_gen_add_i32(base, base, index);
 -            tcg_temp_free_i32(index);
 -        }
 -        store_reg(s, rn, base);
 -    }
 -    return 0;
 -}
 -
  static inline void gen_neon_narrow(int size, TCGv_i32 dest, TCGv_i64 src)
  {
      switch (size) {
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
              }
              return;
          }
 -        if ((insn & 0x0f100000) == 0x04000000) {
 -            /* NEON load/store.  */
 -            if (disas_neon_ls_insn(s, insn)) {
 -                goto illegal_op;
 -            }
 -            return;
 -        }
          if ((insn & 0x0e000f00) == 0x0c000100) {
              if (arm_dc_feature(s, ARM_FEATURE_IWMMXT)) {
                  /* iWMMXt register transfer.  */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
          }
          break;
      case 12:
 -        if ((insn & 0x01100000) == 0x01000000) {
 -            if (disas_neon_ls_insn(s, insn)) {
 -                goto illegal_op;
 -            }
 -            break;
 -        }
          goto illegal_op;
      default:
      illegal_op:
 --
 .20.1
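
The three VLDST_single patterns differ only in how bits [7:4] are split between reg_idx and align, and in which bit feeds the plus1 stride. The following is a hand-written sketch of that field layout, assuming the A32 encoding shown in neon-ls.decode. It is not the generated decoder; struct vldst_single, decode_vldst_single and base_update are hypothetical names. base_update is a simplified model of gen_neon_ldst_base_update() operating on a plain register array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the fields passed in arg_VLDST_single. */
struct vldst_single {
    int l, rn, rm, vd, n, size, reg_idx, align, stride;
};

static bool decode_vldst_single(uint32_t insn, struct vldst_single *a)
{
    /* Fixed bits: 1111 0100 1 . l 0 (bits 22 and 21 are fields). */
    if ((insn & 0xff900000u) != 0xf4800000u) {
        return false;
    }
    a->size = (insn >> 10) & 3;
    if (a->size == 3) {
        return false; /* bits [11:10] == 11 is VLD_all_lanes, not this insn */
    }
    a->l = (insn >> 21) & 1;
    a->rn = (insn >> 16) & 0xf;
    a->rm = insn & 0xf;
    a->vd = (int)((((insn >> 22) & 1) << 4) | ((insn >> 12) & 0xf));
    a->n = (insn >> 8) & 3;

    switch (a->size) {
    case 0: /* reg_idx:3 align:1, stride=1 */
        a->reg_idx = (insn >> 5) & 7;
        a->align = (insn >> 4) & 1;
        a->stride = 1;
        break;
    case 1: /* reg_idx:2 align:2, stride=%imm1_5_p1 */
        a->reg_idx = (insn >> 6) & 3;
        a->align = (insn >> 4) & 3;
        a->stride = (int)((insn >> 5) & 1) + 1;
        break;
    default: /* reg_idx:1 align:3, stride=%imm1_6_p1 */
        a->reg_idx = (insn >> 7) & 1;
        a->align = (insn >> 4) & 7;
        a->stride = (int)((insn >> 6) & 1) + 1;
        break;
    }
    return true;
}

/* Simplified writeback model: Rm == 15 none, Rm == 13 immediate, else Rm. */
static void base_update(uint32_t regs[16], int rm, int rn, int stride)
{
    if (rm == 15) {
        return;
    }
    regs[rn] += (rm == 13) ? (uint32_t)stride : regs[rm];
}
```

Note that the stride bit overlaps the align field for size 1 and size 2, which is why the UNDEF checks in the patch can test a->align where the old code tested idx.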

-[Qemu-devel] [PULL 18/22] hw/arm/boot: Factor out "direct kernel boot" code into its own function
+[PULL 33/39] target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
-Factor out the "direct kernel boot" code path from arm_load_kernel()
+Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.
 into its own function; this function is getting long enough that
 the code flow is a bit confusing.
-This commit only moves code around; no semantic changes.
+Note that we don't need the neon_3r_sizes[op] check here because all
 size values are OK for VADD and VSUB; we'll add this when we convert
 the first insn that has size restrictions.
-We leave the "load the dtb" code in arm_load_kernel() -- this
+For this we need one of the GVecGen*Fn typedefs currently in
-is currently only used by the "direct kernel boot" path, but
+translate-a64.h; move them all to translate.h as a block so they
-this is a bug which we will fix shortly.
+are visible to the 32-bit decoder.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+Message-id: 20200430181003.21682-15-peter.maydell@linaro.org
 Message-id: 20190131112240.8395-3-peter.maydell@linaro.org
 ---
- hw/arm/boot.c | 150 +++++++++++++++++++++++++++-----------------------
+ target/arm/translate-a64.h      |  9 --------
-file changed, 80 insertions(+), 70 deletions(-)
+ target/arm/translate.h          |  9 ++++++++
  target/arm/neon-dp.decode       | 17 +++++++++++++++
  target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
  target/arm/translate.c          | 14 ++++--------
  5 files changed, 68 insertions(+), 19 deletions(-)
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/target/arm/translate-a64.h
-+++ b/hw/arm/boot.c
++++ b/target/arm/translate-a64.h
-@@ -XXX,XX +XXX,XX @@ static uint64_t load_aarch64_image(const char *filename, hwaddr mem_base,
+@@ -XXX,XX +XXX,XX @@ static inline int vec_full_reg_size(DisasContext *s)
-     return size;
  bool disas_sve(DisasContext *, uint32_t);
 -/* Note that the gvec expanders operate on offsets + sizes.  */
 -typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
 -typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
 -                         uint32_t, uint32_t);
 -typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
 -                        uint32_t, uint32_t, uint32_t);
 -typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
 -                        uint32_t, uint32_t, uint32_t);
 -
  #endif /* TARGET_ARM_TRANSLATE_A64_H */
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
  #define dc_isar_feature(name, ctx) \
      ({ DisasContext *ctx_ = (ctx); isar_feature_##name(ctx_->isar); })
 +/* Note that the gvec expanders operate on offsets + sizes.  */
 +typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
 +typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
 +                         uint32_t, uint32_t);
 +typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
 +                        uint32_t, uint32_t, uint32_t);
 +typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
 +                        uint32_t, uint32_t, uint32_t);
 +
  #endif /* TARGET_ARM_TRANSLATE_H */
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-dp.decode
 +++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
  #
  # This file is processed by scripts/decodetree.py
  #
 +# VFP/Neon register fields; same as vfp.decode
 +%vm_dp  5:1 0:4
 +%vn_dp  7:1 16:4
 +%vd_dp  22:1 12:4
  # Encodings for Neon data processing instructions where the T32 encoding
  # is a simple transformation of the A32 encoding.
@@ -XXX,XX +XXX,XX @@
  #   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
  # This file works on the A32 encoding only; calling code for T32 has to
  # transform the insn into the A32 version first.
 +
 +######################################################################
 +# 3-reg-same grouping:
 +# 1111 001 U 0 D sz:2 Vn:4 Vd:4 opc:4 N Q M op Vm:4
 +######################################################################
 +
 +&3same vm vn vd q size
 +
 +@3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
 +                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +
 +VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 +VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
      return true;
  }
++
--void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
++static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
-+static void arm_setup_direct_kernel_boot(ARMCPU *cpu,
++{
-+                                         struct arm_boot_info *info)
++    int vec_size = a->q ? 16 : 8;
- {
++    int rd_ofs = neon_reg_offset(a->vd, 0);
-+    /* Set up for a direct boot of a kernel image file. */
++    int rn_ofs = neon_reg_offset(a->vn, 0);
-     CPUState *cs;
++    int rm_ofs = neon_reg_offset(a->vm, 0);
-+    AddressSpace *as = arm_boot_address_space(cpu, info);
++
-     int kernel_size;
++    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-     int initrd_size;
++        return false;
-     int is_linux = 0;
++    }
-@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
++
-     int elf_machine;
++    /* UNDEF accesses to D16-D31 if they don't exist. */
-     hwaddr entry;
++    if (!dc_isar_feature(aa32_simd_r32, s) &&
-     static const ARMInsnFixup *primary_loader;
++        ((a->vd | a->vn | a->vm) & 0x10)) {
--    AddressSpace *as = arm_boot_address_space(cpu, info);
++        return false;
--
++    }
--    /*
++
--     * CPU objects (unlike devices) are not automatically reset on system
++    if ((a->vn | a->vm | a->vd) & a->q) {
--     * reset, so we must always register a handler to do so. If we're
++        return false;
--     * actually loading a kernel, the handler is also responsible for
++    }
--     * arranging that we start it correctly.
++
--     */
++    if (!vfp_access_check(s)) {
--    for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
++        return true;
--        qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
++    }
--    }
++
--
++    fn(a->size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
--    /*
++    return true;
 -     * The board code is not supposed to set secure_board_setup unless
 -     * running its code in secure mode is actually possible, and KVM
 -     * doesn't support secure.
 -     */
 -    assert(!(info->secure_board_setup && kvm_enabled()));
 -
 -    info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
 -    info->dtb_limit = 0;
 -
 -    /* Load the kernel.  */
 -    if (!info->kernel_filename || info->firmware_loaded) {
 -
 -        if (have_dtb(info)) {
 -            /*
 -             * If we have a device tree blob, but no kernel to supply it to (or
 -             * the kernel is supposed to be loaded by the bootloader), copy the
 -             * DTB to the base of RAM for the bootloader to pick up.
 -             */
 -            info->dtb_start = info->loader_start;
 -        }
 -
 -        if (info->kernel_filename) {
 -            FWCfgState *fw_cfg;
 -            bool try_decompressing_kernel;
 -
 -            fw_cfg = fw_cfg_find();
 -            try_decompressing_kernel = arm_feature(&cpu->env,
 -                                                   ARM_FEATURE_AARCH64);
 -
 -            /*
 -             * Expose the kernel, the command line, and the initrd in fw_cfg.
 -             * We don't process them here at all, it's all left to the
 -             * firmware.
 -             */
 -            load_image_to_fw_cfg(fw_cfg,
 -                                 FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
 -                                 info->kernel_filename,
 -                                 try_decompressing_kernel);
 -            load_image_to_fw_cfg(fw_cfg,
 -                                 FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
 -                                 info->initrd_filename, false);
 -
 -            if (info->kernel_cmdline) {
 -                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
 -                               strlen(info->kernel_cmdline) + 1);
 -                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
 -                                  info->kernel_cmdline);
 -            }
 -        }
 -
 -        /*
 -         * We will start from address 0 (typically a boot ROM image) in the
 -         * same way as hardware.
 -         */
 -        return;
 -    }
      if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
          primary_loader = bootloader_aarch64;
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
      for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
          ARM_CPU(cs)->env.boot_info = info;
      }
 +}
 +
-+void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
++#define DO_3SAME(INSN, FUNC)                                            \
-+{
++    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
-+    CPUState *cs;
++    {                                                                   \
-+    AddressSpace *as = arm_boot_address_space(cpu, info);
++        return do_3same(s, a, FUNC);                                    \
 +
 +    /*
 +     * CPU objects (unlike devices) are not automatically reset on system
 +     * reset, so we must always register a handler to do so. If we're
 +     * actually loading a kernel, the handler is also responsible for
 +     * arranging that we start it correctly.
 +     */
 +    for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
 +        qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
 +    }
 +
-+    /*
++DO_3SAME(VADD, tcg_gen_gvec_add)
-+     * The board code is not supposed to set secure_board_setup unless
++DO_3SAME(VSUB, tcg_gen_gvec_sub)
-+     * running its code in secure mode is actually possible, and KVM
+diff --git a/target/arm/translate.c b/target/arm/translate.c
-+     * doesn't support secure.
+index XXXXXXX..XXXXXXX 100644
-+     */
+--- a/target/arm/translate.c
-+    assert(!(info->secure_board_setup && kvm_enabled()));
++++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              }
              return 0;
 -        case NEON_3R_VADD_VSUB:
 -            if (u) {
 -                tcg_gen_gvec_sub(size, rd_ofs, rn_ofs, rm_ofs,
 -                                 vec_size, vec_size);
 -            } else {
 -                tcg_gen_gvec_add(size, rd_ofs, rn_ofs, rm_ofs,
 -                                 vec_size, vec_size);
 -            }
 -            return 0;
 -
          case NEON_3R_VQADD:
              tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
                             rn_ofs, rm_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
                             u ? &ushl_op[size] : &sshl_op[size]);
              return 0;
 +
-+    info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
++        case NEON_3R_VADD_VSUB:
-+    info->dtb_limit = 0;
++            /* Already handled by decodetree */
-+
++            return 1;
-+    /* Load the kernel.  */
+         }
-+    if (!info->kernel_filename || info->firmware_loaded) {
-+
+         if (size == 3) {
 +        if (have_dtb(info)) {
 +            /*
 +             * If we have a device tree blob, but no kernel to supply it to (or
 +             * the kernel is supposed to be loaded by the bootloader), copy the
 +             * DTB to the base of RAM for the bootloader to pick up.
 +             */
 +            info->dtb_start = info->loader_start;
 +        }
 +
 +        if (info->kernel_filename) {
 +            FWCfgState *fw_cfg;
 +            bool try_decompressing_kernel;
 +
 +            fw_cfg = fw_cfg_find();
 +            try_decompressing_kernel = arm_feature(&cpu->env,
 +                                                   ARM_FEATURE_AARCH64);
 +
 +            /*
 +             * Expose the kernel, the command line, and the initrd in fw_cfg.
 +             * We don't process them here at all, it's all left to the
 +             * firmware.
 +             */
 +            load_image_to_fw_cfg(fw_cfg,
 +                                 FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
 +                                 info->kernel_filename,
 +                                 try_decompressing_kernel);
 +            load_image_to_fw_cfg(fw_cfg,
 +                                 FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
 +                                 info->initrd_filename, false);
 +
 +            if (info->kernel_cmdline) {
 +                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
 +                               strlen(info->kernel_cmdline) + 1);
 +                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
 +                                  info->kernel_cmdline);
 +            }
 +        }
 +
 +        /*
 +         * We will start from address 0 (typically a boot ROM image) in the
 +         * same way as hardware.
 +         */
 +        return;
 +    } else {
 +        arm_setup_direct_kernel_boot(cpu, info);
 +    }
      if (!info->skip_dtb_autoload && have_dtb(info)) {
          if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
 --
 .20.1
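
Illustrative note, not part of the patch: the field definitions and register checks introduced above can be modelled in standalone C. This is a minimal sketch, assuming that decodetree concatenates the sub-fields of a %field with the first one most significant (so "5:1 0:4" is bit 5 followed by bits 3..0). Args3Same, decode_3same, regs_valid_3same and has_d32 are hypothetical names; has_d32 stands in for the aa32_simd_r32 feature test, and the Neon feature and vfp_access_check() steps of do_3same() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container mirroring the &3same argument set. */
typedef struct {
    int vm, vn, vd, q, size;
} Args3Same;

/* Extract 'len' bits of 'insn' starting at bit position 'pos'. */
static uint32_t bits(uint32_t insn, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len < 32 && pos + len <= 32);
    return (insn >> pos) & ((1u << len) - 1);
}

/* Pull out the fields named by the @3same format line. */
Args3Same decode_3same(uint32_t insn)
{
    Args3Same a;

    a.vm = (int)((bits(insn, 5, 1) << 4) | bits(insn, 0, 4));   /* %vm_dp */
    a.vn = (int)((bits(insn, 7, 1) << 4) | bits(insn, 16, 4));  /* %vn_dp */
    a.vd = (int)((bits(insn, 22, 1) << 4) | bits(insn, 12, 4)); /* %vd_dp */
    a.q = (int)bits(insn, 6, 1);
    a.size = (int)bits(insn, 20, 2);
    return a;
}

/* Model of the two register-number checks made in do_3same(). */
bool regs_valid_3same(Args3Same a, bool has_d32)
{
    /* D16-D31 are only usable if the CPU has 32 D registers. */
    if (!has_d32 && ((a.vd | a.vn | a.vm) & 0x10)) {
        return false;
    }
    /* With Q=1 every register number must be even. */
    if ((a.vn | a.vm | a.vd) & a.q) {
        return false;
    }
    return true;
}
```

The second check works because q is a single bit: ORing the three register numbers and masking with q is non-zero only when Q=1 and at least one register number is odd.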

-New patch
+[PULL 34/39] target/arm: Convert Neon 3-reg-same logic ops to decodetree
+Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
+Note that for the logic ops the 'size' field forms part of their
+decode and the actual operations are always bitwise.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-16-peter.maydell@linaro.org
+---
+ target/arm/neon-dp.decode       | 12 +++++++++++
+ target/arm/translate-neon.inc.c | 19 +++++++++++++++++
+ target/arm/translate.c          | 38 +--------------------------------
+ 3 files changed, 32 insertions(+), 37 deletions(-)
+diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-dp.decode
++++ b/target/arm/neon-dp.decode
+@@ -XXX,XX +XXX,XX @@
+ @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
+                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
++@3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
++                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
++
++VAND_3s          1111 001 0 0 . 00 .... .... 0001 ... 1 .... @3same_logic
++VBIC_3s          1111 001 0 0 . 01 .... .... 0001 ... 1 .... @3same_logic
++VORR_3s          1111 001 0 0 . 10 .... .... 0001 ... 1 .... @3same_logic
++VORN_3s          1111 001 0 0 . 11 .... .... 0001 ... 1 .... @3same_logic
++VEOR_3s          1111 001 1 0 . 00 .... .... 0001 ... 1 .... @3same_logic
++VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
++VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
++VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
++
+ VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
+ VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
+ DO_3SAME(VADD, tcg_gen_gvec_add)
+ DO_3SAME(VSUB, tcg_gen_gvec_sub)
++DO_3SAME(VAND, tcg_gen_gvec_and)
++DO_3SAME(VBIC, tcg_gen_gvec_andc)
++DO_3SAME(VORR, tcg_gen_gvec_or)
++DO_3SAME(VORN, tcg_gen_gvec_orc)
++DO_3SAME(VEOR, tcg_gen_gvec_xor)
++
++/* These insns are all gvec_bitsel but with the inputs in various orders. */
++#define DO_3SAME_BITSEL(INSN, O1, O2, O3)                               \
++    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
++                                uint32_t rn_ofs, uint32_t rm_ofs,       \
++                                uint32_t oprsz, uint32_t maxsz)         \
++    {                                                                   \
++        tcg_gen_gvec_bitsel(vece, rd_ofs, O1, O2, O3, oprsz, maxsz);    \
++    }                                                                   \
++    DO_3SAME(INSN, gen_##INSN##_3s)
++
++DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
++DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
++DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+             }
+             return 1;
+-        case NEON_3R_LOGIC: /* Logic ops.  */
+-            switch ((u << 2) | size) {
+-            case 0: /* VAND */
+-                tcg_gen_gvec_and(0, rd_ofs, rn_ofs, rm_ofs,
+-                                 vec_size, vec_size);
+-                break;
+-            case 1: /* VBIC */
+-                tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
+-                                  vec_size, vec_size);
+-                break;
+-            case 2: /* VORR */
+-                tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
+-                                vec_size, vec_size);
+-                break;
+-            case 3: /* VORN */
+-                tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,
+-                                 vec_size, vec_size);
+-                break;
+-            case 4: /* VEOR */
+-                tcg_gen_gvec_xor(0, rd_ofs, rn_ofs, rm_ofs,
+-                                 vec_size, vec_size);
+-                break;
+-            case 5: /* VBSL */
+-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
+-                                    vec_size, vec_size);
+-                break;
+-            case 6: /* VBIT */
+-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
+-                                    vec_size, vec_size);
+-                break;
+-            case 7: /* VBIF */
+-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
+-                                    vec_size, vec_size);
+-                break;
+-            }
+-            return 0;
+-
+         case NEON_3R_VQADD:
+             tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+                            rn_ofs, rm_ofs, vec_size, vec_size,
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+             return 0;
+         case NEON_3R_VADD_VSUB:
++        case NEON_3R_LOGIC:
+             /* Already handled by decodetree */
+             return 1;
+         }
+--
+.20.1
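
Illustrative note, not part of the patch: the logic-op decode and the three bitsel operand orders can be modelled on plain 64-bit values. This is a minimal sketch, assuming the bitsel expander computes (b & a) | (c & ~a) for its three inputs a, b, c in that order. LogicOp, logic_op, bitsel, vbsl, vbit and vbif are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Ordered so that the enum value equals (u << 2) | size. */
typedef enum {
    OP_VAND, OP_VBIC, OP_VORR, OP_VORN,
    OP_VEOR, OP_VBSL, OP_VBIT, OP_VBIF
} LogicOp;

/* For the logic ops, U and the 2-bit size field together select the insn. */
LogicOp logic_op(unsigned u, unsigned size)
{
    assert(u < 2 && size < 4);
    return (LogicOp)((u << 2) | size);
}

/* Bitwise select: bits of b where a is 1, bits of c where a is 0. */
static uint64_t bitsel(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & a) | (c & ~a);
}

/* Each function returns the new Vd, using the operand order from the patch. */
uint64_t vbsl(uint64_t d, uint64_t n, uint64_t m)
{
    return bitsel(d, n, m);   /* old Vd is the selector */
}

uint64_t vbit(uint64_t d, uint64_t n, uint64_t m)
{
    return bitsel(m, n, d);   /* insert Vn bits where Vm is 1 */
}

uint64_t vbif(uint64_t d, uint64_t n, uint64_t m)
{
    return bitsel(m, d, n);   /* insert Vn bits where Vm is 0 */
}
```

Because the operation is bitwise, the element size makes no difference to the result, which is consistent with the @3same_logic format forcing size=0.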

-New patch
+[PULL 35/39] target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
+Convert the Neon 3-reg-same VMAX and VMIN insns to decodetree.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-17-peter.maydell@linaro.org
+---
+ target/arm/neon-dp.decode       |  5 +++++
+ target/arm/translate-neon.inc.c | 14 ++++++++++++++
+ target/arm/translate.c          | 21 ++-------------------
+ 3 files changed, 21 insertions(+), 19 deletions(-)
+diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-dp.decode
++++ b/target/arm/neon-dp.decode
+@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+ VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+ VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
++VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
++VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
++VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
++VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
++
+ VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
+ VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ DO_3SAME(VEOR, tcg_gen_gvec_xor)
+ DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
+ DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
+ DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
++
++#define DO_3SAME_NO_SZ_3(INSN, FUNC)                                    \
++    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
++    {                                                                   \
++        if (a->size == 3) {                                             \
++            return false;                                               \
++        }                                                               \
++        return do_3same(s, a, FUNC);                                    \
++    }
++
++DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
++DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
++DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
++DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+                              rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
+             return 0;
+-        case NEON_3R_VMAX:
+-            if (u) {
+-                tcg_gen_gvec_umax(size, rd_ofs, rn_ofs, rm_ofs,
+-                                  vec_size, vec_size);
+-            } else {
+-                tcg_gen_gvec_smax(size, rd_ofs, rn_ofs, rm_ofs,
+-                                  vec_size, vec_size);
+-            }
+-            return 0;
+-        case NEON_3R_VMIN:
+-            if (u) {
+-                tcg_gen_gvec_umin(size, rd_ofs, rn_ofs, rm_ofs,
+-                                  vec_size, vec_size);
+-            } else {
+-                tcg_gen_gvec_smin(size, rd_ofs, rn_ofs, rm_ofs,
+-                                  vec_size, vec_size);
+-            }
+-            return 0;
+-
+         case NEON_3R_VSHL:
+             /* Note the operation is vshl vd,vm,vn */
+             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+         case NEON_3R_VADD_VSUB:
+         case NEON_3R_LOGIC:
++        case NEON_3R_VMAX:
++        case NEON_3R_VMIN:
+             /* Already handled by decodetree */
+             return 1;
+         }
+--
+.20.1

-New patch
+[PULL 36/39] target/arm: Convert Neon 3-reg-same comparisons to decodetree
+Convert the Neon comparison ops in the 3-reg-same grouping
+to decodetree.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-18-peter.maydell@linaro.org
+---
+ target/arm/neon-dp.decode       |  8 ++++++++
+ target/arm/translate-neon.inc.c | 22 ++++++++++++++++++++++
+ target/arm/translate.c          | 23 +++--------------------
+ 3 files changed, 33 insertions(+), 20 deletions(-)
+diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-dp.decode
++++ b/target/arm/neon-dp.decode
+@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+ VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+ VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
++VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
++VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
++VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
++VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
++
+ VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
+ VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
+ VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
+@@ -XXX,XX +XXX,XX @@ VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
+ VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
+ VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
++
++VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
++VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
+ DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
+ DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
+ DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
++
++#define DO_3SAME_CMP(INSN, COND)                                        \
++    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
++                                uint32_t rn_ofs, uint32_t rm_ofs,       \
++                                uint32_t oprsz, uint32_t maxsz)         \
++    {                                                                   \
++        tcg_gen_gvec_cmp(COND, vece, rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz); \
++    }                                                                   \
++    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
++
++DO_3SAME_CMP(VCGT_S, TCG_COND_GT)
++DO_3SAME_CMP(VCGT_U, TCG_COND_GTU)
++DO_3SAME_CMP(VCGE_S, TCG_COND_GE)
++DO_3SAME_CMP(VCGE_U, TCG_COND_GEU)
++DO_3SAME_CMP(VCEQ, TCG_COND_EQ)
++
++static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
++                         uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
++{
++    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
++}
++DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+                            u ? &mls_op[size] : &mla_op[size]);
+             return 0;
+-        case NEON_3R_VTST_VCEQ:
+-            if (u) { /* VCEQ */
+-                tcg_gen_gvec_cmp(TCG_COND_EQ, size, rd_ofs, rn_ofs, rm_ofs,
+-                                 vec_size, vec_size);
+-            } else { /* VTST */
+-                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
+-                               vec_size, vec_size, &cmtst_op[size]);
+-            }
+-            return 0;
+-
+-        case NEON_3R_VCGT:
+-            tcg_gen_gvec_cmp(u ? TCG_COND_GTU : TCG_COND_GT, size,
+-                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
+-            return 0;
+-
+-        case NEON_3R_VCGE:
+-            tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size,
+-                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
+-            return 0;
+-
+         case NEON_3R_VSHL:
+             /* Note the operation is vshl vd,vm,vn */
+             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+         case NEON_3R_LOGIC:
+         case NEON_3R_VMAX:
+         case NEON_3R_VMIN:
++        case NEON_3R_VTST_VCEQ:
++        case NEON_3R_VCGT:
++        case NEON_3R_VCGE:
+             /* Already handled by decodetree */
+             return 1;
+         }
+--
+.20.1
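
Illustrative note, not part of the patch: the lane-wise behaviour of the comparison insns can be modelled on a 64-bit value holding eight byte lanes. This is a minimal sketch, assuming a true comparison sets every bit of its lane and a false one clears it, and that VTST tests whether (n & m) is non-zero. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return byte lane 'i' (0 = least significant) of a 64-bit vector. */
static uint8_t lane8(uint64_t v, unsigned i)
{
    assert(i < 8);
    return (uint8_t)(v >> (8 * i));
}

typedef int LanePred(uint8_t n, uint8_t m);

/* The int8_t casts assume the usual two's complement conversion. */
static int pred_gt_s(uint8_t n, uint8_t m) { return (int8_t)n > (int8_t)m; }
static int pred_gt_u(uint8_t n, uint8_t m) { return n > m; }
static int pred_eq(uint8_t n, uint8_t m)   { return n == m; }
static int pred_tst(uint8_t n, uint8_t m)  { return (n & m) != 0; }

/* Apply a predicate per byte lane: true lanes become 0xff, false 0x00. */
static uint64_t cmp_lanes8(uint64_t n, uint64_t m, LanePred *pred)
{
    uint64_t d = 0;

    for (unsigned i = 0; i < 8; i++) {
        if (pred(lane8(n, i), lane8(m, i))) {
            d |= (uint64_t)0xff << (8 * i);
        }
    }
    return d;
}

uint64_t vcgt_s8(uint64_t n, uint64_t m) { return cmp_lanes8(n, m, pred_gt_s); }
uint64_t vcgt_u8(uint64_t n, uint64_t m) { return cmp_lanes8(n, m, pred_gt_u); }
uint64_t vceq_8(uint64_t n, uint64_t m)  { return cmp_lanes8(n, m, pred_eq); }
uint64_t vtst_8(uint64_t n, uint64_t m)  { return cmp_lanes8(n, m, pred_tst); }
```

The signed and unsigned variants differ only in the predicate, which mirrors how the patch passes TCG_COND_GT or TCG_COND_GTU to the same expander.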

-New patch
+[PULL 37/39] target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
+Convert the Neon VQADD/VQSUB insns in the 3-reg-same grouping
+to decodetree.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-19-peter.maydell@linaro.org
+---
+ target/arm/neon-dp.decode       |  6 ++++++
+ target/arm/translate-neon.inc.c | 15 +++++++++++++++
+ target/arm/translate.c          | 14 ++------------
+ 3 files changed, 23 insertions(+), 12 deletions(-)
+diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-dp.decode
++++ b/target/arm/neon-dp.decode
+@@ -XXX,XX +XXX,XX @@
+ @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
+                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
++VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
++VQADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
++
+ @3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
+                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
+@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+ VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+ VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
++VQSUB_S_3s       1111 001 0 0 . .. .... .... 0010 . . . 1 .... @3same
++VQSUB_U_3s       1111 001 1 0 . .. .... .... 0010 . . . 1 .... @3same
++
+ VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
+ VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
+ VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+     tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
+ }
+ DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
++
++#define DO_3SAME_GVEC4(INSN, OPARRAY)                                   \
++    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
++                                uint32_t rn_ofs, uint32_t rm_ofs,       \
++                                uint32_t oprsz, uint32_t maxsz)         \
++    {                                                                   \
++        tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),           \
++                       rn_ofs, rm_ofs, oprsz, maxsz, &OPARRAY[vece]);   \
++    }                                                                   \
++    DO_3SAME(INSN, gen_##INSN##_3s)
++
++DO_3SAME_GVEC4(VQADD_S, sqadd_op)
++DO_3SAME_GVEC4(VQADD_U, uqadd_op)
++DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
++DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+             }
+             return 1;
+-        case NEON_3R_VQADD:
+-            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+-                           rn_ofs, rm_ofs, vec_size, vec_size,
+-                           (u ? uqadd_op : sqadd_op) + size);
+-            return 0;
+-
+-        case NEON_3R_VQSUB:
+-            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+-                           rn_ofs, rm_ofs, vec_size, vec_size,
+-                           (u ? uqsub_op : sqsub_op) + size);
+-            return 0;
+-
+         case NEON_3R_VMUL: /* VMUL */
+             if (u) {
+                 /* Polynomial case allows only P8.  */
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+         case NEON_3R_VTST_VCEQ:
+         case NEON_3R_VCGT:
+         case NEON_3R_VCGE:
++        case NEON_3R_VQADD:
++        case NEON_3R_VQSUB:
+             /* Already handled by decodetree */
+             return 1;
+         }
+--
+2.20.1
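
Editorial sketch (not part of the patch): the gvec expansion above passes
offsetof(CPUARMState, vfp.qc) as an extra operand, which suggests the
saturating ops accumulate a sticky saturation flag alongside the result.
The scalar model below illustrates that idea for signed 8-bit lanes only.
The names sat_add_s8 and vqadd_s8_model are hypothetical and do not exist
in QEMU; the real expansion goes through sqadd_op and the TCG vector
infrastructure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one signed 8-bit saturating add lane. */
static inline int8_t sat_add_s8(int8_t a, int8_t b, bool *qc)
{
    int sum = (int)a + (int)b;

    if (sum > INT8_MAX) {
        *qc = true;            /* sticky: set on saturation, never cleared here */
        return INT8_MAX;
    }
    if (sum < INT8_MIN) {
        *qc = true;
        return INT8_MIN;
    }
    return (int8_t)sum;
}

/* Apply the lane operation across a vector, accumulating into one QC flag. */
static inline void vqadd_s8_model(int8_t *d, const int8_t *n, const int8_t *m,
                                  size_t lanes, bool *qc)
{
    for (size_t i = 0; i < lanes; i++) {
        d[i] = sat_add_s8(n[i], m[i], qc);
    }
}
```

A caller would start with the flag cleared, run the vector operation, and
then inspect the flag to learn whether any lane saturated.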

-New patch
+[PULL 38/39] target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
+Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
+3-reg-same grouping to decodetree.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-20-peter.maydell@linaro.org
+---
+ target/arm/neon-dp.decode       |  9 +++++++
+ target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
+ target/arm/translate.c          | 28 +++------------------
+3 files changed, 56 insertions(+), 25 deletions(-)
+diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-dp.decode
++++ b/target/arm/neon-dp.decode
+@@ -XXX,XX +XXX,XX @@ VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
+ VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
+ VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
++VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same
++VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same
++
+ VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
+ VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
+ VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
+@@ -XXX,XX +XXX,XX @@ VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
+ VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
+ VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
++
++VMLA_3s          1111 001 0 0 . .. .... .... 1001 . . . 0 .... @3same
++VMLS_3s          1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
++
++VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
++VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
+diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
+ DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
+ DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
+ DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
++DO_3SAME_NO_SZ_3(VMUL, tcg_gen_gvec_mul)
+ #define DO_3SAME_CMP(INSN, COND)                                        \
+     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+@@ -XXX,XX +XXX,XX @@ DO_3SAME_GVEC4(VQADD_S, sqadd_op)
+ DO_3SAME_GVEC4(VQADD_U, uqadd_op)
+ DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
+ DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
++
++static void gen_VMUL_p_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
++                           uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
++{
++    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz,
++                       0, gen_helper_gvec_pmul_b);
++}
++
++static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
++{
++    if (a->size != 0) {
++        return false;
++    }
++    return do_3same(s, a, gen_VMUL_p_3s);
++}
++
++#define DO_3SAME_GVEC3_NO_SZ_3(INSN, OPARRAY)                           \
++    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
++                                uint32_t rn_ofs, uint32_t rm_ofs,       \
++                                uint32_t oprsz, uint32_t maxsz)         \
++    {                                                                   \
++        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,                          \
++                       oprsz, maxsz, &OPARRAY[vece]);                   \
++    }                                                                   \
++    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
++
++
++DO_3SAME_GVEC3_NO_SZ_3(VMLA, mla_op)
++DO_3SAME_GVEC3_NO_SZ_3(VMLS, mls_op)
++
++#define DO_3SAME_GVEC3_SHIFT(INSN, OPARRAY)                             \
++    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
++                                uint32_t rn_ofs, uint32_t rm_ofs,       \
++                                uint32_t oprsz, uint32_t maxsz)         \
++    {                                                                   \
++        /* Note the operation is vshl vd,vm,vn */                       \
++        tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs,                          \
++                       oprsz, maxsz, &OPARRAY[vece]);                   \
++    }                                                                   \
++    DO_3SAME(INSN, gen_##INSN##_3s)
++
++DO_3SAME_GVEC3_SHIFT(VSHL_S, sshl_op)
++DO_3SAME_GVEC3_SHIFT(VSHL_U, ushl_op)
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+             }
+             return 1;
+-        case NEON_3R_VMUL: /* VMUL */
+-            if (u) {
+-                /* Polynomial case allows only P8.  */
+-                if (size != 0) {
+-                    return 1;
+-                }
+-                tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
+-                                   0, gen_helper_gvec_pmul_b);
+-            } else {
+-                tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs,
+-                                 vec_size, vec_size);
+-            }
+-            return 0;
+-
+-        case NEON_3R_VML: /* VMLA, VMLS */
+-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
+-                           u ? &mls_op[size] : &mla_op[size]);
+-            return 0;
+-
+-        case NEON_3R_VSHL:
+-            /* Note the operation is vshl vd,vm,vn */
+-            tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
+-                           u ? &ushl_op[size] : &sshl_op[size]);
+-            return 0;
+-
+         case NEON_3R_VADD_VSUB:
+         case NEON_3R_LOGIC:
+         case NEON_3R_VMAX:
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+         case NEON_3R_VCGE:
+         case NEON_3R_VQADD:
+         case NEON_3R_VQSUB:
++        case NEON_3R_VMUL:
++        case NEON_3R_VML:
++        case NEON_3R_VSHL:
+             /* Already handled by decodetree */
+             return 1;
+         }
+--
+2.20.1
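
Editorial sketch (not part of the patch): trans_VMUL_p_3s above rejects
every size except 0, so only the 8-bit polynomial multiply is accepted.
The model below shows what an 8-bit polynomial (carry-less) multiply
computes per lane, keeping the low 8 bits of the product. The names pmul8
and pmul8_vec are hypothetical; the real work is done by the out-of-line
helper gen_helper_gvec_pmul_b, whose implementation is not shown in this
series excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: multiply two polynomials over GF(2), one bit per
 * coefficient, and keep the low 8 bits of the result.  Partial products
 * are combined with XOR rather than addition, so there are no carries.
 */
static inline uint8_t pmul8(uint8_t n, uint8_t m)
{
    uint8_t result = 0;

    for (int i = 0; i < 8; i++) {
        if (m & (1u << i)) {
            result ^= (uint8_t)(n << i);
        }
    }
    return result;
}

/* Apply the lane operation across a vector of bytes. */
static inline void pmul8_vec(uint8_t *d, const uint8_t *n, const uint8_t *m,
                             size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        d[i] = pmul8(n[i], m[i]);
    }
}
```

For example, 3 times 3 is (x + 1)(x + 1) = x^2 + 1 over GF(2), which is 5
rather than the integer product 9.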

-[Qemu-devel] [PULL 03/22] target/arm: Add BT and BTYPE to tb->flags
+[PULL 39/39] target/arm: Move gen_ function typedefs to translate.h
-From: Richard Henderson <richard.henderson@linaro.org>
+We're going to want at least some of the NeonGen* typedefs
 for the refactored 32-bit Neon decoder, so move them all
 to translate.h since it makes more sense to keep them in
 one group.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190128223118.5255-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200430181003.21682-23-peter.maydell@linaro.org
 ---
- target/arm/cpu.h           |  2 ++
+ target/arm/translate.h     | 17 +++++++++++++++++
- target/arm/translate.h     |  4 ++++
+ target/arm/translate-a64.c | 17 -----------------
- target/arm/helper.c        | 22 +++++++++++++++-------
+2 files changed, 17 insertions(+), 17 deletions(-)
  target/arm/translate-a64.c |  2 ++
 4 files changed, 23 insertions(+), 7 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, TBII, 0, 2)
- FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
- FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
- FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
-+FIELD(TBFLAG_A64, BT, 9, 1)
-+FIELD(TBFLAG_A64, BTYPE, 10, 2)
- static inline bool bswap_code(bool sctlr_b)
- {
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
+@@ -XXX,XX +XXX,XX @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
-     bool ss_same_el;
+ typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
-     /* True if v8.3-PAuth is active.  */
+                         uint32_t, uint32_t, uint32_t);
-     bool pauth_active;
-+    /* True with v8.5-BTI and SCTLR_ELx.BT* set.  */
++/* Function prototype for gen_ functions for calling Neon helpers */
-+    bool bt;
++typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
-+    /* A copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.  */
++typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
-+    uint8_t btype;
++typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
-     /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
++typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
-     int c15_cpar;
++typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
-     /* TCG op of the current insn_start.  */
++typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
-index XXXXXXX..XXXXXXX 100644
++typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
---- a/target/arm/helper.c
++typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
-+++ b/target/arm/helper.c
++typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
++typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
++typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
-     if (is_a64(env)) {
++typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-         ARMCPU *cpu = arm_env_get_cpu(env);
++typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-+        uint64_t sctlr;
++typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
          *pc = env->pc;
          flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
              flags = FIELD_DP32(flags, TBFLAG_A64, ZCR_LEN, zcr_len);
          }
 +        if (current_el == 0) {
 +            /* FIXME: ARMv8.1-VHE S2 translation regime.  */
 +            sctlr = env->cp15.sctlr_el[1];
 +        } else {
 +            sctlr = env->cp15.sctlr_el[current_el];
 +        }
          if (cpu_isar_feature(aa64_pauth, cpu)) {
              /*
               * In order to save space in flags, we record only whether
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
               * a nop, or "active" when some action must be performed.
               * The decision of which action to take is left to a helper.
               */
 -            uint64_t sctlr;
 -            if (current_el == 0) {
 -                /* FIXME: ARMv8.1-VHE S2 translation regime.  */
 -                sctlr = env->cp15.sctlr_el[1];
 -            } else {
 -                sctlr = env->cp15.sctlr_el[current_el];
 -            }
              if (sctlr & (SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)) {
                  flags = FIELD_DP32(flags, TBFLAG_A64, PAUTH_ACTIVE, 1);
              }
          }
 +
-+        if (cpu_isar_feature(aa64_bti, cpu)) {
+ #endif /* TARGET_ARM_TRANSLATE_H */
 +            /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
 +            if (sctlr & (current_el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
 +                flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
 +            }
 +            flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
 +        }
      } else {
          *pc = env->regs[15];
          flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
+@@ -XXX,XX +XXX,XX @@ typedef struct AArch64DecodeTable {
-     dc->sve_excp_el = FIELD_EX32(tb_flags, TBFLAG_A64, SVEEXC_EL);
+     AArch64DecodeFn *disas_fn;
-     dc->sve_len = (FIELD_EX32(tb_flags, TBFLAG_A64, ZCR_LEN) + 1) * 16;
+ } AArch64DecodeTable;
-     dc->pauth_active = FIELD_EX32(tb_flags, TBFLAG_A64, PAUTH_ACTIVE);
-+    dc->bt = FIELD_EX32(tb_flags, TBFLAG_A64, BT);
+-/* Function prototype for gen_ functions for calling Neon helpers */
-+    dc->btype = FIELD_EX32(tb_flags, TBFLAG_A64, BTYPE);
+-typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
-     dc->vec_len = 0;
+-typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
-     dc->vec_stride = 0;
+-typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
-     dc->cp_regs = arm_cpu->cp_regs;
+-typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
 -typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
 -typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
 -typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
 -typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
 -typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
 -typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 -typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
 -typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
 -typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 -typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 -typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
 -
  /* initialize TCG globals.  */
  void a64_translate_init(void)
  {
 --
 2.20.1

Arm stuff, mostly patches from RTH.

thanks
-- PMM

The following changes since commit 01a9a51ffaf4699827ea6425cb2b834a356e159d:

Merge remote-tracking branch 'remotes/kraxel/tags/ui-20190205-pull-request' into staging (2019-02-05 14:01:29 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190205

for you to fetch changes up to a15945d98d3a3390c3da344d1b47218e91e49d8b:

target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI (2019-02-05 16:52:42 +0000)

----------------------------------------------------------------
target-arm queue:
 * Implement Armv8.5-BTI extension for system emulation mode
 * Implement the PR_PAC_RESET_KEYS prctl() for linux-user mode's Armv8.3-PAuth support
 * Support TBI (top-byte-ignore) properly for linux-user mode
 * gdbstub: allow killing QEMU via vKill command
 * hw/arm/boot: Support DTB autoload for firmware-only boots
 * target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI

----------------------------------------------------------------
Max Filippov (1):
      gdbstub: allow killing QEMU via vKill command

Peter Maydell (7):
      target/arm: Compute TB_FLAGS for TBI for user-only
      hw/arm/boot: Fix block comment style in arm_load_kernel()
      hw/arm/boot: Factor out "direct kernel boot" code into its own function
      hw/arm/boot: Factor out "set up firmware boot" code
      hw/arm/boot: Clarify why arm_setup_firmware_boot() doesn't set env->boot_info
      hw/arm/boot: Support DTB autoload for firmware-only boots
      target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI

Richard Henderson (14):
      target/arm: Introduce isar_feature_aa64_bti
      target/arm: Add PSTATE.BTYPE
      target/arm: Add BT and BTYPE to tb->flags
      exec: Add target-specific tlb bits to MemTxAttrs
      target/arm: Cache the GP bit for a page in MemTxAttrs
      target/arm: Default handling of BTYPE during translation
      target/arm: Reset btype for direct branches
      target/arm: Set btype for indirect branches
      target/arm: Enable BTI for -cpu max
      linux-user: Implement PR_PAC_RESET_KEYS
      tests/tcg/aarch64: Add pauth smoke test
      target/arm: Add TBFLAG_A64_TBID, split out gen_top_byte_ignore
      target/arm: Clean TBI for data operations in the translator
      target/arm: Enable TBI for user-only

From: Richard Henderson <richard.henderson@linaro.org>

Also create field definitions for id_aa64pfr1 from ARMv8.5.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190128223118.5255-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64PFR0, GIC, 24, 4)
 FIELD(ID_AA64PFR0, RAS, 28, 4)
 FIELD(ID_AA64PFR0, SVE, 32, 4)
 
+FIELD(ID_AA64PFR1, BT, 0, 4)
+FIELD(ID_AA64PFR1, SBSS, 4, 4)
+FIELD(ID_AA64PFR1, MTE, 8, 4)
+FIELD(ID_AA64PFR1, RAS_FRAC, 12, 4)
+
 FIELD(ID_AA64MMFR0, PARANGE, 0, 4)
 FIELD(ID_AA64MMFR0, ASIDBITS, 4, 4)
 FIELD(ID_AA64MMFR0, BIGEND, 8, 4)
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_lor(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, LO) != 0;
 }
 
+static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
+}
+
 /*
  * Forward to the above feature tests given an ARMCPU pointer.
  */
-- 
2.20.1
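
Editorial sketch (not part of the patch): the feature test above reads a
4-bit field at bit 0 of ID_AA64PFR1 and treats any non-zero value as "BTI
present". The standalone model below shows that extraction. The names
field_ex64 and has_bti are hypothetical stand-ins for QEMU's FIELD_EX64
macro and isar_feature_aa64_bti(); only the field position (shift 0,
length 4) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract 'len' bits starting at bit 'shift'; valid for 1 <= len <= 64. */
static inline uint64_t field_ex64(uint64_t reg, unsigned shift, unsigned len)
{
    return (reg >> shift) & (~0ULL >> (64 - len));
}

/* Hypothetical model of the BTI feature test: ID_AA64PFR1.BT is bits [3:0]. */
static inline bool has_bti(uint64_t id_aa64pfr1)
{
    return field_ex64(id_aa64pfr1, 0, 4) != 0;
}
```

A register value with only the neighbouring SBSS field (bits [7:4]) set
does not report BTI, because the mask stops at bit 3.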

From: Richard Henderson <richard.henderson@linaro.org>

Place this in its own field within ENV, as that will
make it easier to reset from within TCG generated code.

With the change to pstate_read/write, exception entry
and return are automatically handled.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190128223118.5255-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           | 8 ++++++--
 target/arm/translate-a64.c | 3 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
      *    semantics as for AArch32, as described in the comments on each field)
      *  nRW (also known as M[4]) is kept, inverted, in env->aarch64
      *  DAIF (exception masks) are kept in env->daif
+     *  BTYPE is kept in env->btype
      *  all other bits are stored in their correct places in env->pstate
      */
     uint32_t pstate;
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
     uint32_t GE; /* cpsr[19:16] */
     uint32_t thumb; /* cpsr[5]. 0 = arm mode, 1 = thumb mode. */
     uint32_t condexec_bits; /* IT bits.  cpsr[15:10,26:25].  */
+    uint32_t btype;  /* BTI branch type.  spsr[11:10].  */
     uint64_t daif; /* exception masks, in the bits they are in PSTATE */
 
     uint64_t elr_el[4]; /* AArch64 exception link regs  */
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
 #define PSTATE_I (1U << 7)
 #define PSTATE_A (1U << 8)
 #define PSTATE_D (1U << 9)
+#define PSTATE_BTYPE (3U << 10)
 #define PSTATE_IL (1U << 20)
 #define PSTATE_SS (1U << 21)
 #define PSTATE_V (1U << 28)
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
 #define PSTATE_N (1U << 31)
 #define PSTATE_NZCV (PSTATE_N | PSTATE_Z | PSTATE_C | PSTATE_V)
 #define PSTATE_DAIF (PSTATE_D | PSTATE_A | PSTATE_I | PSTATE_F)
-#define CACHED_PSTATE_BITS (PSTATE_NZCV | PSTATE_DAIF)
+#define CACHED_PSTATE_BITS (PSTATE_NZCV | PSTATE_DAIF | PSTATE_BTYPE)
 /* Mode values for AArch64 */
 #define PSTATE_MODE_EL3h 13
 #define PSTATE_MODE_EL3t 12
@@ -XXX,XX +XXX,XX @@ static inline uint32_t pstate_read(CPUARMState *env)
     ZF = (env->ZF == 0);
     return (env->NF & 0x80000000) | (ZF << 30)
         | (env->CF << 29) | ((env->VF & 0x80000000) >> 3)
-        | env->pstate | env->daif;
+        | env->pstate | env->daif | (env->btype << 10);
 }
 
 static inline void pstate_write(CPUARMState *env, uint32_t val)
@@ -XXX,XX +XXX,XX @@ static inline void pstate_write(CPUARMState *env, uint32_t val)
     env->CF = (val >> 29) & 1;
     env->VF = (val << 3) & 0x80000000;
     env->daif = val & PSTATE_DAIF;
+    env->btype = (val >> 10) & 3;
     env->pstate = val & ~CACHED_PSTATE_BITS;
 }
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
                 el,
                 psr & PSTATE_SP ? 'h' : 't');
 
+    if (cpu_isar_feature(aa64_bti, cpu)) {
+        cpu_fprintf(f, "  BTYPE=%d", (psr & PSTATE_BTYPE) >> 10);
+    }
     if (!(flags & CPU_DUMP_FPU)) {
         cpu_fprintf(f, "\n");
         return;
-- 
2.20.1
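
Editorial sketch (not part of the patch): the change above keeps BTYPE in
its own field and reassembles it into bits [11:10] whenever PSTATE is
read. The model below shows only that split and merge; NZCV and DAIF,
which the real pstate_read()/pstate_write() also handle, are left out.
MiniState, mini_pstate_write and mini_pstate_read are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PSTATE_BTYPE (3U << 10)   /* same bit position as in the patch */

/* Hypothetical cut-down CPU state: BTYPE lives outside 'pstate'. */
typedef struct MiniState {
    uint32_t pstate;   /* all bits except the cached ones */
    uint32_t btype;    /* 2-bit branch type, stored unshifted */
} MiniState;

/* Split an architectural PSTATE value into its stored pieces. */
static inline void mini_pstate_write(MiniState *s, uint32_t val)
{
    s->btype = (val >> 10) & 3;
    s->pstate = val & ~PSTATE_BTYPE;
}

/* Reassemble the architectural PSTATE value from the stored pieces. */
static inline uint32_t mini_pstate_read(const MiniState *s)
{
    return s->pstate | (s->btype << 10);
}
```

Because the field is separate, code that wants to clear BTYPE can store
zero to it directly without a read-modify-write of the whole PSTATE word.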

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190128223118.5255-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           |  2 ++
 target/arm/translate.h     |  4 ++++
 target/arm/helper.c        | 22 +++++++++++++++-------
 target/arm/translate-a64.c |  2 ++
 4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, TBII, 0, 2)
 FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
 FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
 FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
+FIELD(TBFLAG_A64, BT, 9, 1)
+FIELD(TBFLAG_A64, BTYPE, 10, 2)
 
 static inline bool bswap_code(bool sctlr_b)
 {
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool ss_same_el;
     /* True if v8.3-PAuth is active.  */
     bool pauth_active;
+    /* True with v8.5-BTI and SCTLR_ELx.BT* set.  */
+    bool bt;
+    /* A copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.  */
+    uint8_t btype;
     /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
     int c15_cpar;
     /* TCG op of the current insn_start.  */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
 
     if (is_a64(env)) {
         ARMCPU *cpu = arm_env_get_cpu(env);
+        uint64_t sctlr;
 
         *pc = env->pc;
         flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             flags = FIELD_DP32(flags, TBFLAG_A64, ZCR_LEN, zcr_len);
         }
 
+        if (current_el == 0) {
+            /* FIXME: ARMv8.1-VHE S2 translation regime.  */
+            sctlr = env->cp15.sctlr_el[1];
+        } else {
+            sctlr = env->cp15.sctlr_el[current_el];
+        }
         if (cpu_isar_feature(aa64_pauth, cpu)) {
             /*
              * In order to save space in flags, we record only whether
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
              * a nop, or "active" when some action must be performed.
              * The decision of which action to take is left to a helper.
              */
-            uint64_t sctlr;
-            if (current_el == 0) {
-                /* FIXME: ARMv8.1-VHE S2 translation regime.  */
-                sctlr = env->cp15.sctlr_el[1];
-            } else {
-                sctlr = env->cp15.sctlr_el[current_el];
-            }
             if (sctlr & (SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)) {
                 flags = FIELD_DP32(flags, TBFLAG_A64, PAUTH_ACTIVE, 1);
             }
         }
+
+        if (cpu_isar_feature(aa64_bti, cpu)) {
+            /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
+            if (sctlr & (current_el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
+                flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
+            }
+            flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
+        }
     } else {
         *pc = env->regs[15];
         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->sve_excp_el = FIELD_EX32(tb_flags, TBFLAG_A64, SVEEXC_EL);
     dc->sve_len = (FIELD_EX32(tb_flags, TBFLAG_A64, ZCR_LEN) + 1) * 16;
     dc->pauth_active = FIELD_EX32(tb_flags, TBFLAG_A64, PAUTH_ACTIVE);
+    dc->bt = FIELD_EX32(tb_flags, TBFLAG_A64, BT);
+    dc->btype = FIELD_EX32(tb_flags, TBFLAG_A64, BTYPE);
     dc->vec_len = 0;
     dc->vec_stride = 0;
     dc->cp_regs = arm_cpu->cp_regs;
-- 
2.20.1
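
Editorial sketch (not part of the patch): the helper.c hunk above picks
SCTLR_BT0 at EL0 and SCTLR_BT1 otherwise, then records BT at bit 9 and
BTYPE at bits [11:10] of the TB flags. The model below reproduces that
packing. deposit32_model and bti_flags are hypothetical names, and the
bit positions used for SCTLR_BT0 and SCTLR_BT1 (35 and 36) are an
assumption here, since their definitions are not part of this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions; the definitions are not shown in this series. */
#define SCTLR_BT0 (1ULL << 35)
#define SCTLR_BT1 (1ULL << 36)

/* Replace 'len' bits of 'value' at 'shift' with 'field' (len < 32). */
static inline uint32_t deposit32_model(uint32_t value, unsigned shift,
                                       unsigned len, uint32_t field)
{
    uint32_t mask = ((1u << len) - 1) << shift;

    return (value & ~mask) | ((field << shift) & mask);
}

/* Pack BT (bit 9) and BTYPE (bits [11:10]) as the FIELD() lines describe. */
static inline uint32_t bti_flags(uint32_t flags, int current_el,
                                 uint64_t sctlr, uint32_t btype)
{
    if (sctlr & (current_el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
        flags = deposit32_model(flags, 9, 1, 1);
    }
    return deposit32_model(flags, 10, 2, btype);
}
```

The translator side then only needs the inverse extraction, as in the
dc->bt and dc->btype assignments in the translate-a64.c hunk.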

From: Richard Henderson <richard.henderson@linaro.org>

These bits can be used to cache target-specific data in cputlb
read from the page tables.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190128223118.5255-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/exec/memattrs.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memattrs.h
+++ b/include/exec/memattrs.h
@@ -XXX,XX +XXX,XX @@ typedef struct MemTxAttrs {
     unsigned int user:1;
     /* Requester ID (for MSI for example) */
     unsigned int requester_id:16;
+    /*
+     * The following are target-specific page-table bits.  These are not
+     * related to actual memory transactions at all.  However, this structure
+     * is part of the tlb_fill interface, cached in the cputlb structure,
+     * and has unused bits.  These fields will be read by target-specific
+     * helpers using env->iotlb[mmu_idx][tlb_index()].attrs.target_tlb_bitN.
+     */
+    unsigned int target_tlb_bit0 : 1;
+    unsigned int target_tlb_bit1 : 1;
+    unsigned int target_tlb_bit2 : 1;
 } MemTxAttrs;
 
 /* Bus masters which don't specify any attributes will get this,
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Caching the bit means that we will not have to re-walk the
page tables to look up the bit during translation.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190128223118.5255-6-richard.henderson@linaro.org
[PMM: no need to OR in guarded bit status]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
     bool ttbr1_valid;
     uint64_t descaddrmask;
     bool aarch64 = arm_el_is_aa64(env, el);
+    bool guarded = false;
 
     /* TODO:
      * This code does not handle the different format TCR for VTCR_EL2.
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
         }
         /* Merge in attributes from table descriptors */
         attrs |= nstable << 3; /* NS */
+        guarded = extract64(descriptor, 50, 1);  /* GP */
         if (param.hpd) {
             /* HPD disables all the table attributes except NSTable.  */
             break;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
          */
         txattrs->secure = false;
     }
+    /* When in aarch64 mode, and BTI is enabled, remember GP in the IOTLB.  */
+    if (aarch64 && guarded && cpu_isar_feature(aa64_bti, cpu)) {
+        txattrs->target_tlb_bit0 = true;
+    }
 
     if (cacheattrs != NULL) {
         if (mmu_idx == ARMMMUIdx_S2NS) {
-- 
2.20.1
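
Editorial sketch (not part of the patch): the two patches above add spare
one-bit fields to MemTxAttrs and then use the first of them to remember
the GP bit (descriptor bit 50) found during the page table walk. The
model below shows that flow with a cut-down attribute struct. MiniAttrs
and cache_guarded are hypothetical names; only the bit position and the
three-way condition are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down attribute struct with one spare target bit. */
typedef struct MiniAttrs {
    unsigned int secure : 1;
    unsigned int user : 1;
    unsigned int target_tlb_bit0 : 1;   /* used here to cache GP */
} MiniAttrs;

/*
 * Record the guarded-page bit only when it is meaningful: the regime is
 * AArch64, the descriptor has GP (bit 50) set, and BTI is implemented.
 */
static inline void cache_guarded(MiniAttrs *attrs, uint64_t descriptor,
                                 bool aarch64, bool have_bti)
{
    bool guarded = (descriptor >> 50) & 1;

    if (aarch64 && guarded && have_bti) {
        attrs->target_tlb_bit0 = 1;
    }
}
```

With the bit cached next to the TLB entry, a later check at translation
time can read it back without walking the page tables again.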

From: Richard Henderson <richard.henderson@linaro.org>

The branch target exception for guarded pages has high priority,
and only 8 instructions are valid for that case.  Perform this
check before doing any other decode.

Clear BTYPE after all insns that neither set BTYPE nor exit via
exception (DISAS_NORETURN).

Not yet handled are insns that exit via DISAS_NORETURN for some
other reason, like direct branches.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190128223118.5255-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h     |   6 ++
 target/arm/translate.h     |   9 ++-
 target/arm/translate-a64.c | 139 +++++++++++++++++++++++++++++++++++++
 3 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ enum arm_exception_class {
     EC_FPIDTRAP               = 0x08,
     EC_PACTRAP                = 0x09,
     EC_CP14RRTTRAP            = 0x0c,
+    EC_BTITRAP                = 0x0d,
     EC_ILLEGALSTATE           = 0x0e,
     EC_AA32_SVC               = 0x11,
     EC_AA32_HVC               = 0x12,
@@ -XXX,XX +XXX,XX @@ static inline uint32_t syn_pactrap(void)
     return EC_PACTRAP << ARM_EL_EC_SHIFT;
 }
 
+static inline uint32_t syn_btitrap(int btype)
+{
+    return (EC_BTITRAP << ARM_EL_EC_SHIFT) | btype;
+}
+
 static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
 {
     return (EC_INSNABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool pauth_active;
     /* True with v8.5-BTI and SCTLR_ELx.BT* set.  */
     bool bt;
-    /* A copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.  */
-    uint8_t btype;
+    /*
+     * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
+     *  < 0, set by the current instruction.
+     */
+    int8_t btype;
+    /* True if this page is guarded.  */
+    bool guarded_page;
     /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
     int c15_cpar;
     /* TCG op of the current insn_start.  */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static inline int get_a64_user_mem_index(DisasContext *s)
     return arm_to_core_mmu_idx(useridx);
 }
 
+static void reset_btype(DisasContext *s)
+{
+    if (s->btype != 0) {
+        TCGv_i32 zero = tcg_const_i32(0);
+        tcg_gen_st_i32(zero, cpu_env, offsetof(CPUARMState, btype));
+        tcg_temp_free_i32(zero);
+        s->btype = 0;
+    }
+}
+
 void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
                             fprintf_function cpu_fprintf, int flags)
 {
@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn)
     }
 }
 
+/**
+ * is_guarded_page:
+ * @env: The cpu environment
+ * @s: The DisasContext
+ *
+ * Return true if the page is guarded.
+ */
+static bool is_guarded_page(CPUARMState *env, DisasContext *s)
+{
+#ifdef CONFIG_USER_ONLY
+    return false;  /* FIXME */
+#else
+    uint64_t addr = s->base.pc_first;
+    int mmu_idx = arm_to_core_mmu_idx(s->mmu_idx);
+    unsigned int index = tlb_index(env, mmu_idx, addr);
+    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
+
+    /*
+     * We test this immediately after reading an insn, which means
+     * that any normal page must be in the TLB.  The only exception
+     * would be for executing from flash or device memory, which
+     * does not retain the TLB entry.
+     *
+     * FIXME: Assume false for those, for now.  We could use
+     * arm_cpu_get_phys_page_attrs_debug to re-read the page
+     * table entry even for that case.
+     */
+    return (tlb_hit(entry->addr_code, addr) &&
+            env->iotlb[mmu_idx][index].attrs.target_tlb_bit0);
+#endif
+}
+
+/**
+ * btype_destination_ok:
+ * @insn: The instruction at the branch destination
+ * @bt: SCTLR_ELx.BT
+ * @btype: PSTATE.BTYPE, and is non-zero
+ *
+ * On a guarded page, there are a limited number of insns
+ * that may be present at the branch target:
+ *   - branch target identifiers,
+ *   - paciasp, pacibsp,
+ *   - BRK insn
+ *   - HLT insn
+ * Anything else causes a Branch Target Exception.
+ *
+ * Return true if the branch is compatible, false to raise BTITRAP.
+ */
+static bool btype_destination_ok(uint32_t insn, bool bt, int btype)
+{
+    if ((insn & 0xfffff01fu) == 0xd503201fu) {
+        /* HINT space */
+        switch (extract32(insn, 5, 7)) {
+        case 0b011001: /* PACIASP */
+        case 0b011011: /* PACIBSP */
+            /*
+             * If SCTLR_ELx.BT, then PACI*SP are not compatible
+             * with btype == 3.  Otherwise all btype are ok.
+             */
+            return !bt || btype != 3;
+        case 0b100000: /* BTI */
+            /* Not compatible with any btype.  */
+            return false;
+        case 0b100010: /* BTI c */
+            /* Not compatible with btype == 3 */
+            return btype != 3;
+        case 0b100100: /* BTI j */
+            /* Not compatible with btype == 2 */
+            return btype != 2;
+        case 0b100110: /* BTI jc */
+            /* Compatible with any btype.  */
+            return true;
+        }
+    } else {
+        switch (insn & 0xffe0001fu) {
+        case 0xd4200000u: /* BRK */
+        case 0xd4400000u: /* HLT */
+            /* Give priority to the breakpoint exception.  */
+            return true;
+        }
+    }
+    return false;
+}
+
 /* C3.1 A64 instruction index by encoding */
 static void disas_a64_insn(CPUARMState *env, DisasContext *s)
 {
@@ -XXX,XX +XXX,XX @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
 
     s->fp_access_checked = false;
 
+    if (dc_isar_feature(aa64_bti, s)) {
+        if (s->base.num_insns == 1) {
+            /*
+             * At the first insn of the TB, compute s->guarded_page.
+             * We delayed computing this until successfully reading
+             * the first insn of the TB, above.  This (mostly) ensures
+             * that the softmmu tlb entry has been populated, and the
+             * page table GP bit is available.
+             *
+             * Note that we need to compute this even if btype == 0,
+             * because this value is used for BR instructions later
+             * where ENV is not available.
+             */
+            s->guarded_page = is_guarded_page(env, s);
+
+            /* First insn can have btype set to non-zero.  */
+            tcg_debug_assert(s->btype >= 0);
+
+            /*
+             * Note that the Branch Target Exception has fairly high
+             * priority -- below debugging exceptions but above most
+             * everything else.  This allows us to handle this now
+             * instead of waiting until the insn is otherwise decoded.
+             */
+            if (s->btype != 0
+                && s->guarded_page
+                && !btype_destination_ok(insn, s->bt, s->btype)) {
+                gen_exception_insn(s, 4, EXCP_UDEF, syn_btitrap(s->btype),
+                                   default_exception_el(s));
+                return;
+            }
+        } else {
+            /* Not the first insn: btype must be 0.  */
+            tcg_debug_assert(s->btype == 0);
+        }
+    }
+
     switch (extract32(insn, 25, 4)) {
     case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
         unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
 
     /* if we allocated any temporaries, free them here */
     free_tmp_a64(s);
+
+    /*
+     * After execution of most insns, btype is reset to 0.
+     * Note that we set btype == -1 when the insn sets btype.
+     */
+    if (s->btype > 0 && s->base.is_jmp != DISAS_NORETURN) {
+        reset_btype(s);
+    }
 }
 
 static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Call reset_btype in all of the non-exception cases of DISAS_NORETURN.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190128223118.5255-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
     }
 
     /* B Branch / BL Branch with link */
+    reset_btype(s);
     gen_goto_tb(s, 0, addr);
 }
 
@@ -XXX,XX +XXX,XX @@ static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
     tcg_cmp = read_cpu_reg(s, rt, sf);
     label_match = gen_new_label();
 
+    reset_btype(s);
     tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
                         tcg_cmp, 0, label_match);
 
@@ -XXX,XX +XXX,XX @@ static void disas_test_b_imm(DisasContext *s, uint32_t insn)
     tcg_cmp = tcg_temp_new_i64();
     tcg_gen_andi_i64(tcg_cmp, cpu_reg(s, rt), (1ULL << bit_pos));
     label_match = gen_new_label();
+
+    reset_btype(s);
     tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
                         tcg_cmp, 0, label_match);
     tcg_temp_free_i64(tcg_cmp);
@@ -XXX,XX +XXX,XX @@ static void disas_cond_b_imm(DisasContext *s, uint32_t insn)
     addr = s->pc + sextract32(insn, 5, 19) * 4 - 4;
     cond = extract32(insn, 0, 4);
 
+    reset_btype(s);
     if (cond < 0x0e) {
         /* genuinely conditional branches */
         TCGLabel *label_match = gen_new_label();
@@ -XXX,XX +XXX,XX @@ static void handle_sync(DisasContext *s, uint32_t insn,
          * a self-modified code correctly and also to take
          * any pending interrupts immediately.
          */
+        reset_btype(s);
         gen_goto_tb(s, 0, s->pc);
         return;
     default:
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190128223118.5255-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)
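
The BTYPE selection added to disas_uncond_b_reg below can be modelled
outside the translator roughly as follows. This is only a sketch: the
enum and the function name next_btype are hypothetical and not QEMU
identifiers, and it assumes the aa64_bti feature is present (without
it the patch never calls set_btype).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not QEMU identifiers. */
enum branch_kind { BRANCH_BR, BRANCH_BLR, BRANCH_OTHER };

/*
 * Return the BTYPE value that an indirect branch leaves for the next
 * instruction.  0 stands for "left to normal end-of-insn processing",
 * which is what the patch does for RET and everything else.
 */
static int next_btype(enum branch_kind kind, unsigned rn, bool guarded_page)
{
    assert(rn < 32);

    switch (kind) {
    case BRANCH_BR:
        /* BR via x16/x17, or from a non-guarded page, gives 1; else 3. */
        return (rn == 16 || rn == 17 || !guarded_page) ? 1 : 3;
    case BRANCH_BLR:
        /* BLR gives 2 regardless of the source page. */
        return 2;
    default:
        return 0;
    }
}
```

For instance, next_btype(BRANCH_BR, 3, true) yields 3, while the same
branch through x16 yields 1.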

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void reset_btype(DisasContext *s)
     }
 }
 
+static void set_btype(DisasContext *s, int val)
+{
+    TCGv_i32 tcg_val;
+
+    /* BTYPE is a 2-bit field, and 0 should be done with reset_btype.  */
+    tcg_debug_assert(val >= 1 && val <= 3);
+
+    tcg_val = tcg_const_i32(val);
+    tcg_gen_st_i32(tcg_val, cpu_env, offsetof(CPUARMState, btype));
+    tcg_temp_free_i32(tcg_val);
+    s->btype = -1;
+}
+
 void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
                             fprintf_function cpu_fprintf, int flags)
 {
@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
 static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 {
     unsigned int opc, op2, op3, rn, op4;
+    unsigned btype_mod = 2;   /* 0: BR, 1: BLR, 2: other */
     TCGv_i64 dst;
     TCGv_i64 modifier;
 
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
     case 0: /* BR */
     case 1: /* BLR */
     case 2: /* RET */
+        btype_mod = opc;
         switch (op3) {
         case 0:
             /* BR, BLR, RET */
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
         default:
             goto do_unallocated;
         }
-
         gen_a64_set_pc(s, dst);
         /* BLR also needs to load return address */
         if (opc == 1) {
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
         if ((op3 & ~1) != 2) {
             goto do_unallocated;
         }
+        btype_mod = opc & 1;
         if (s->pauth_active) {
             dst = new_tmp_a64(s);
             modifier = cpu_reg_sp(s, op4);
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
         return;
     }
 
+    switch (btype_mod) {
+    case 0: /* BR */
+        if (dc_isar_feature(aa64_bti, s)) {
+            /* BR to {x16,x17} or !guard -> 1, else 3.  */
+            set_btype(s, rn == 16 || rn == 17 || !s->guarded_page ? 1 : 3);
+        }
+        break;
+
+    case 1: /* BLR */
+        if (dc_isar_feature(aa64_bti, s)) {
+            /* BLR sets BTYPE to 2, regardless of source guarded page.  */
+            set_btype(s, 2);
+        }
+        break;
+
+    default: /* RET or none of the above.  */
+        /* BTYPE will be set to 0 by normal end-of-insn processing.  */
+        break;
+    }
+
     s->base.is_jmp = DISAS_JUMP;
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190201195404.30486-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/aarch64/target_syscall.h |  7 ++++++
 linux-user/syscall.c                | 36 +++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)
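
The argument checking in the new TARGET_PR_PAC_RESET_KEYS case can be
read as the small standalone function below. It is a sketch under
assumptions: the PAC_* macros and pac_reset_mask are hypothetical
names, the host EINVAL stands in for TARGET_EINVAL, and the CPU is
assumed to implement pauth (the patch returns -TARGET_EINVAL when it
does not).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TARGET_PR_PAC_* key bits. */
#define PAC_APIAKEY  (1u << 0)
#define PAC_APIBKEY  (1u << 1)
#define PAC_APDAKEY  (1u << 2)
#define PAC_APDBKEY  (1u << 3)
#define PAC_APGAKEY  (1u << 4)
#define PAC_ALL_KEYS (PAC_APIAKEY | PAC_APIBKEY | PAC_APDAKEY | \
                      PAC_APDBKEY | PAC_APGAKEY)

/*
 * Validate the prctl arguments.  On success return 0 and store the set
 * of keys to reinitialize in *keys; otherwise return -EINVAL and leave
 * *keys untouched.
 */
static int pac_reset_mask(unsigned long arg2, unsigned long arg3,
                          unsigned long arg4, unsigned long arg5,
                          unsigned long *keys)
{
    assert(keys != NULL);

    /* The unused arguments must all be zero. */
    if (arg3 || arg4 || arg5) {
        return -EINVAL;
    }
    if (arg2 == 0) {
        /* A zero mask means "reset every key". */
        *keys = PAC_ALL_KEYS;
        return 0;
    }
    if (arg2 & ~(unsigned long)PAC_ALL_KEYS) {
        /* Unknown key bits are rejected. */
        return -EINVAL;
    }
    *keys = arg2;
    return 0;
}
```

A caller would then reinitialize one key per bit set in *keys, as the
patch does with arm_init_pauth_key.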

diff --git a/linux-user/aarch64/target_syscall.h b/linux-user/aarch64/target_syscall.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/aarch64/target_syscall.h
+++ b/linux-user/aarch64/target_syscall.h
@@ -XXX,XX +XXX,XX @@ struct target_pt_regs {
 #define TARGET_PR_SVE_SET_VL  50
 #define TARGET_PR_SVE_GET_VL  51
 
+#define TARGET_PR_PAC_RESET_KEYS 54
+# define TARGET_PR_PAC_APIAKEY   (1 << 0)
+# define TARGET_PR_PAC_APIBKEY   (1 << 1)
+# define TARGET_PR_PAC_APDAKEY   (1 << 2)
+# define TARGET_PR_PAC_APDBKEY   (1 << 3)
+# define TARGET_PR_PAC_APGAKEY   (1 << 4)
+
 void arm_init_pauth_key(ARMPACKey *key);
 
 #endif /* AARCH64_TARGET_SYSCALL_H */
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -XXX,XX +XXX,XX @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
                 }
             }
             return ret;
+        case TARGET_PR_PAC_RESET_KEYS:
+            {
+                CPUARMState *env = cpu_env;
+                ARMCPU *cpu = arm_env_get_cpu(env);
+
+                if (arg3 || arg4 || arg5) {
+                    return -TARGET_EINVAL;
+                }
+                if (cpu_isar_feature(aa64_pauth, cpu)) {
+                    int all = (TARGET_PR_PAC_APIAKEY | TARGET_PR_PAC_APIBKEY |
+                               TARGET_PR_PAC_APDAKEY | TARGET_PR_PAC_APDBKEY |
+                               TARGET_PR_PAC_APGAKEY);
+                    if (arg2 == 0) {
+                        arg2 = all;
+                    } else if (arg2 & ~all) {
+                        return -TARGET_EINVAL;
+                    }
+                    if (arg2 & TARGET_PR_PAC_APIAKEY) {
+                        arm_init_pauth_key(&env->apia_key);
+                    }
+                    if (arg2 & TARGET_PR_PAC_APIBKEY) {
+                        arm_init_pauth_key(&env->apib_key);
+                    }
+                    if (arg2 & TARGET_PR_PAC_APDAKEY) {
+                        arm_init_pauth_key(&env->apda_key);
+                    }
+                    if (arg2 & TARGET_PR_PAC_APDBKEY) {
+                        arm_init_pauth_key(&env->apdb_key);
+                    }
+                    if (arg2 & TARGET_PR_PAC_APGAKEY) {
+                        arm_init_pauth_key(&env->apga_key);
+                    }
+                    return 0;
+                }
+            }
+            return -TARGET_EINVAL;
 #endif /* AARCH64 */
         case PR_GET_SECCOMP:
         case PR_SET_SECCOMP:
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190201195404.30486-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/tcg/aarch64/Makefile.target |  6 +++++-
 tests/tcg/aarch64/pauth-1.c       | 23 +++++++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/aarch64/pauth-1.c

diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -XXX,XX +XXX,XX @@ VPATH 		+= $(AARCH64_SRC)
 # we don't build any of the ARM tests
 AARCH64_TESTS=$(filter-out $(ARM_TESTS), $(TESTS))
 AARCH64_TESTS+=fcvt
-TESTS:=$(AARCH64_TESTS)
 
 fcvt: LDFLAGS+=-lm
 
 run-fcvt: fcvt
 	$(call run-test,$<,$(QEMU) $<, "$< on $(TARGET_NAME)")
 	$(call diff-out,$<,$(AARCH64_SRC)/fcvt.ref)
+
+AARCH64_TESTS += pauth-1
+run-pauth-%: QEMU += -cpu max
+
+TESTS:=$(AARCH64_TESTS)
diff --git a/tests/tcg/aarch64/pauth-1.c b/tests/tcg/aarch64/pauth-1.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/pauth-1.c
@@ -XXX,XX +XXX,XX @@
+#include <assert.h>
+#include <sys/prctl.h>
+
+asm(".arch armv8.4-a");
+
+#ifndef PR_PAC_RESET_KEYS
+#define PR_PAC_RESET_KEYS  54
+#define PR_PAC_APDAKEY     (1 << 2)
+#endif
+
+int main()
+{
+    int x;
+    void *p0 = &x, *p1, *p2;
+
+    asm volatile("pacdza %0" : "=r"(p1) : "0"(p0));
+    prctl(PR_PAC_RESET_KEYS, PR_PAC_APDAKEY, 0, 0, 0);
+    asm volatile("pacdza %0" : "=r"(p2) : "0"(p0));
+
+    assert(p1 != p0);
+    assert(p1 != p2);
+    return 0;
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Split out gen_top_byte_ignore in preparation for handling TBI for
data accesses; the new tbflags field (TBID) is not yet honored.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190204132126.3255-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           |  1 +
 target/arm/translate.h     |  3 +-
 target/arm/helper.c        |  1 +
 target/arm/translate-a64.c | 72 +++++++++++++++++++-------------------
 4 files changed, 40 insertions(+), 37 deletions(-)
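
For reviewers, the address transformation that gen_top_byte_ignore
emits as TCG ops can be modelled on plain integers as below. This is a
sketch only: top_byte_ignore is a hypothetical helper, not part of the
patch, and like the patch it treats EL2/EL3 as having a single TBI bit
(the VHE case marked FIXME is not modelled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_MASK 0xff00000000000000ull

/*
 * Hypothetical integer model of gen_top_byte_ignore.
 * tbi is TBI1:TBI0 concatenated (0..3); current_el is 0..3.
 */
static uint64_t top_byte_ignore(uint64_t src, int tbi, int current_el)
{
    assert(tbi >= 0 && tbi <= 3);
    assert(current_el >= 0 && current_el <= 3);

    if (tbi == 0) {
        /* Tagging disabled: address is used unmodified. */
        return src;
    }
    if (current_el >= 2) {
        /* Single TBI bit: force the tag byte to zero. */
        return src & ~TAG_MASK;
    }

    /* EL0/EL1: sign-extend from bit 55 into bits [63:56]. */
    bool bit55 = (src >> 55) & 1;
    uint64_t ext = bit55 ? (src | TAG_MASK) : (src & ~TAG_MASK);

    if (tbi == 3) {
        return ext;
    }
    if (tbi == 1) {
        /* Only TBI0 set: use the extension for bit 55 == 0 addresses. */
        return bit55 ? src : ext;
    }
    /* Only TBI1 set: use the extension for bit 55 == 1 addresses. */
    return bit55 ? ext : src;
}
```

With tbi == 1 at EL0, an address such as 0xab00000000001000 loses its
tag, whereas 0xab80000000001000 (bit 55 set) passes through unchanged.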

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
 FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
 FIELD(TBFLAG_A64, BT, 9, 1)
 FIELD(TBFLAG_A64, BTYPE, 10, 2)
+FIELD(TBFLAG_A64, TBID, 12, 2)
 
 static inline bool bswap_code(bool sctlr_b)
 {
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     int user;
 #endif
     ARMMMUIdx mmu_idx; /* MMU index to use for normal loads/stores */
-    uint8_t tbii;      /* TBI1|TBI0 for EL0/1 or TBI for EL2/3 */
+    uint8_t tbii;      /* TBI1|TBI0 for insns */
+    uint8_t tbid;      /* TBI1|TBI0 for data */
     bool ns;        /* Use non-secure CPREG bank on access */
     int fp_excp_el; /* FP exception EL or 0 if enabled */
     int sve_excp_el; /* SVE exception EL or 0 if enabled */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             }
 
             flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
+            flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
         }
 #endif
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ void gen_a64_set_pc_im(uint64_t val)
     tcg_gen_movi_i64(cpu_pc, val);
 }
 
-/* Load the PC from a generic TCG variable.
+/*
+ * Handle Top Byte Ignore (TBI) bits.
  *
- * If address tagging is enabled via the TCR TBI bits, then loading
- * an address into the PC will clear out any tag in it:
+ * If address tagging is enabled via the TCR TBI bits:
  *  + for EL2 and EL3 there is only one TBI bit, and if it is set
  *    then the address is zero-extended, clearing bits [63:56]
  *  + for EL0 and EL1, TBI0 controls addresses with bit 55 == 0
@@ -XXX,XX +XXX,XX @@ void gen_a64_set_pc_im(uint64_t val)
  *    If the appropriate TBI bit is set for the address then
  *    the address is sign-extended from bit 55 into bits [63:56]
  *
- * We can avoid doing this for relative-branches, because the
- * PC + offset can never overflow into the tag bits (assuming
- * that virtual addresses are less than 56 bits wide, as they
- * are currently), but we must handle it for branch-to-register.
+ * Here we have concatenated TBI{1,0} into tbi.
  */
-static void gen_a64_set_pc(DisasContext *s, TCGv_i64 src)
+static void gen_top_byte_ignore(DisasContext *s, TCGv_i64 dst,
+                                TCGv_i64 src, int tbi)
 {
-    /* Note that TBII is TBI1:TBI0.  */
-    int tbi = s->tbii;
-
-    if (s->current_el <= 1) {
-        if (tbi != 0) {
-            /* Sign-extend from bit 55.  */
-            tcg_gen_sextract_i64(cpu_pc, src, 0, 56);
-
-            if (tbi != 3) {
-                TCGv_i64 tcg_zero = tcg_const_i64(0);
-
-                /*
-                 * The two TBI bits differ.
-                 * If tbi0, then !tbi1: only use the extension if positive.
-                 * if !tbi0, then tbi1: only use the extension if negative.
-                 */
-                tcg_gen_movcond_i64(tbi == 1 ? TCG_COND_GE : TCG_COND_LT,
-                                    cpu_pc, cpu_pc, tcg_zero, cpu_pc, src);
-                tcg_temp_free_i64(tcg_zero);
-            }
-            return;
-        }
+    if (tbi == 0) {
+        /* Load unmodified address */
+        tcg_gen_mov_i64(dst, src);
+    } else if (s->current_el >= 2) {
+        /* FIXME: ARMv8.1-VHE S2 translation regime.  */
+        /* Force tag byte to all zero */
+        tcg_gen_extract_i64(dst, src, 0, 56);
     } else {
-        if (tbi != 0) {
-            /* Force tag byte to all zero */
-            tcg_gen_extract_i64(cpu_pc, src, 0, 56);
-            return;
+        /* Sign-extend from bit 55.  */
+        tcg_gen_sextract_i64(dst, src, 0, 56);
+
+        if (tbi != 3) {
+            TCGv_i64 tcg_zero = tcg_const_i64(0);
+
+            /*
+             * The two TBI bits differ.
+             * If tbi0, then !tbi1: only use the extension if positive.
+             * if !tbi0, then tbi1: only use the extension if negative.
+             */
+            tcg_gen_movcond_i64(tbi == 1 ? TCG_COND_GE : TCG_COND_LT,
+                                dst, dst, tcg_zero, dst, src);
+            tcg_temp_free_i64(tcg_zero);
         }
     }
+}
 
-    /* Load unmodified address */
-    tcg_gen_mov_i64(cpu_pc, src);
+static void gen_a64_set_pc(DisasContext *s, TCGv_i64 src)
+{
+    /*
+     * If address tagging is enabled for instructions via the TCR TBI bits,
+     * then loading an address into the PC will clear out any tag.
+     */
+    gen_top_byte_ignore(s, cpu_pc, src, s->tbii);
 }
 
 typedef struct DisasCompare64 {
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     core_mmu_idx = FIELD_EX32(tb_flags, TBFLAG_ANY, MMUIDX);
     dc->mmu_idx = core_to_arm_mmu_idx(env, core_mmu_idx);
     dc->tbii = FIELD_EX32(tb_flags, TBFLAG_A64, TBII);
+    dc->tbid = FIELD_EX32(tb_flags, TBFLAG_A64, TBID);
     dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx);
 #if !defined(CONFIG_USER_ONLY)
     dc->user = (dc->current_el == 0);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This will allow TBI to be used in user-only mode, and will avoid
ping-ponging the softmmu TLB when TBI is in use.  It will also
enable other Armv8 extensions.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190204132126.3255-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 217 ++++++++++++++++++++-----------------
 1 file changed, 116 insertions(+), 101 deletions(-)
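
The split between the "dirty" address (written back to Rn with its tag
intact) and the "clean" address (used for the memory accesses) in
disas_ldst_pair can be illustrated with the integer model below. It is
a sketch with hypothetical names (struct pair_access, pair_addresses,
clean_data_tbi_model), and it assumes TBI is enabled for both address
halves at EL0/1, i.e. tbid == 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for illustration. */
struct pair_access {
    uint64_t first;      /* clean address of the first element */
    uint64_t second;     /* clean address of the second element */
    uint64_t writeback;  /* value left in Rn (tag preserved) */
};

/* Assumption: tbid == 3 at EL0/1, so always sign-extend from bit 55. */
static uint64_t clean_data_tbi_model(uint64_t addr)
{
    bool bit55 = (addr >> 55) & 1;
    return bit55 ? (addr | 0xff00000000000000ull)
                 : (addr & 0x00ffffffffffffffull);
}

static struct pair_access pair_addresses(uint64_t base, int64_t offset,
                                         unsigned size, bool postindex,
                                         bool wback)
{
    struct pair_access r;
    uint64_t dirty = base;

    assert(size <= 4);

    if (!postindex) {
        dirty += (uint64_t)offset;
    }
    /* The clean copy is advanced independently of the dirty one. */
    r.first = clean_data_tbi_model(dirty);
    r.second = r.first + (1ull << size);

    if (wback && postindex) {
        dirty += (uint64_t)offset;
    }
    /* Without writeback, Rn keeps its original value. */
    r.writeback = wback ? dirty : base;
    return r;
}
```

Because the element increment is applied to the clean copy only, the
writeback no longer needs the old "offset - (1 << size)" correction.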

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_a64_set_pc(DisasContext *s, TCGv_i64 src)
     gen_top_byte_ignore(s, cpu_pc, src, s->tbii);
 }
 
+/*
+ * Return a "clean" address for ADDR according to TBID.
+ * This is always a fresh temporary, as we need to be able to
+ * increment this independently of a dirty write-back address.
+ */
+static TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr)
+{
+    TCGv_i64 clean = new_tmp_a64(s);
+    gen_top_byte_ignore(s, clean, addr, s->tbid);
+    return clean;
+}
+
 typedef struct DisasCompare64 {
     TCGCond cond;
     TCGv_i64 value;
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap(DisasContext *s, int rs, int rt,
     TCGv_i64 tcg_rs = cpu_reg(s, rs);
     TCGv_i64 tcg_rt = cpu_reg(s, rt);
     int memidx = get_mem_index(s);
-    TCGv_i64 addr = cpu_reg_sp(s, rn);
+    TCGv_i64 clean_addr;
 
     if (rn == 31) {
         gen_check_sp_alignment(s);
     }
-    tcg_gen_atomic_cmpxchg_i64(tcg_rs, addr, tcg_rs, tcg_rt, memidx,
+    clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
+    tcg_gen_atomic_cmpxchg_i64(tcg_rs, clean_addr, tcg_rs, tcg_rt, memidx,
                                size | MO_ALIGN | s->be_data);
 }
 
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
     TCGv_i64 s2 = cpu_reg(s, rs + 1);
     TCGv_i64 t1 = cpu_reg(s, rt);
     TCGv_i64 t2 = cpu_reg(s, rt + 1);
-    TCGv_i64 addr = cpu_reg_sp(s, rn);
+    TCGv_i64 clean_addr;
     int memidx = get_mem_index(s);
 
     if (rn == 31) {
         gen_check_sp_alignment(s);
     }
+    clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
 
     if (size == 2) {
         TCGv_i64 cmp = tcg_temp_new_i64();
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
             tcg_gen_concat32_i64(cmp, s2, s1);
         }
 
-        tcg_gen_atomic_cmpxchg_i64(cmp, addr, cmp, val, memidx,
+        tcg_gen_atomic_cmpxchg_i64(cmp, clean_addr, cmp, val, memidx,
                                    MO_64 | MO_ALIGN | s->be_data);
         tcg_temp_free_i64(val);
 
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
         if (HAVE_CMPXCHG128) {
             TCGv_i32 tcg_rs = tcg_const_i32(rs);
             if (s->be_data == MO_LE) {
-                gen_helper_casp_le_parallel(cpu_env, tcg_rs, addr, t1, t2);
+                gen_helper_casp_le_parallel(cpu_env, tcg_rs,
+                                            clean_addr, t1, t2);
             } else {
-                gen_helper_casp_be_parallel(cpu_env, tcg_rs, addr, t1, t2);
+                gen_helper_casp_be_parallel(cpu_env, tcg_rs,
+                                            clean_addr, t1, t2);
             }
             tcg_temp_free_i32(tcg_rs);
         } else {
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
         TCGv_i64 zero = tcg_const_i64(0);
 
         /* Load the two words, in memory order.  */
-        tcg_gen_qemu_ld_i64(d1, addr, memidx,
+        tcg_gen_qemu_ld_i64(d1, clean_addr, memidx,
                             MO_64 | MO_ALIGN_16 | s->be_data);
-        tcg_gen_addi_i64(a2, addr, 8);
-        tcg_gen_qemu_ld_i64(d2, addr, memidx, MO_64 | s->be_data);
+        tcg_gen_addi_i64(a2, clean_addr, 8);
+        tcg_gen_qemu_ld_i64(d2, clean_addr, memidx, MO_64 | s->be_data);
 
         /* Compare the two words, also in memory order.  */
         tcg_gen_setcond_i64(TCG_COND_EQ, c1, d1, s1);
@@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
         /* If compare equal, write back new data, else write back old data.  */
         tcg_gen_movcond_i64(TCG_COND_NE, c1, c2, zero, t1, d1);
         tcg_gen_movcond_i64(TCG_COND_NE, c2, c2, zero, t2, d2);
-        tcg_gen_qemu_st_i64(c1, addr, memidx, MO_64 | s->be_data);
+        tcg_gen_qemu_st_i64(c1, clean_addr, memidx, MO_64 | s->be_data);
         tcg_gen_qemu_st_i64(c2, a2, memidx, MO_64 | s->be_data);
         tcg_temp_free_i64(a2);
         tcg_temp_free_i64(c1);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
     int is_lasr = extract32(insn, 15, 1);
     int o2_L_o1_o0 = extract32(insn, 21, 3) * 2 | is_lasr;
     int size = extract32(insn, 30, 2);
-    TCGv_i64 tcg_addr;
+    TCGv_i64 clean_addr;
 
     switch (o2_L_o1_o0) {
     case 0x0: /* STXR */
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
         if (is_lasr) {
             tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
         }
-        tcg_addr = read_cpu_reg_sp(s, rn, 1);
-        gen_store_exclusive(s, rs, rt, rt2, tcg_addr, size, false);
+        clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
+        gen_store_exclusive(s, rs, rt, rt2, clean_addr, size, false);
         return;
 
     case 0x4: /* LDXR */
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
         if (rn == 31) {
             gen_check_sp_alignment(s);
         }
-        tcg_addr = read_cpu_reg_sp(s, rn, 1);
+        clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
         s->is_ldex = true;
-        gen_load_exclusive(s, rt, rt2, tcg_addr, size, false);
+        gen_load_exclusive(s, rt, rt2, clean_addr, size, false);
         if (is_lasr) {
             tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
         }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
             gen_check_sp_alignment(s);
         }
         tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
-        tcg_addr = read_cpu_reg_sp(s, rn, 1);
-        do_gpr_st(s, cpu_reg(s, rt), tcg_addr, size, true, rt,
+        clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
+        do_gpr_st(s, cpu_reg(s, rt), clean_addr, size, true, rt,
                   disas_ldst_compute_iss_sf(size, false, 0), is_lasr);
         return;
 
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
         if (rn == 31) {
             gen_check_sp_alignment(s);
         }
-        tcg_addr = read_cpu_reg_sp(s, rn, 1);
-        do_gpr_ld(s, cpu_reg(s, rt), tcg_addr, size, false, false, true, rt,
+        clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
+        do_gpr_ld(s, cpu_reg(s, rt), clean_addr, size, false, false, true, rt,
                   disas_ldst_compute_iss_sf(size, false, 0), is_lasr);
         tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
         return;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
             if (is_lasr) {
                 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
             }
-            tcg_addr = read_cpu_reg_sp(s, rn, 1);
-            gen_store_exclusive(s, rs, rt, rt2, tcg_addr, size, true);
+            clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
+            gen_store_exclusive(s, rs, rt, rt2, clean_addr, size, true);
             return;
         }
         if (rt2 == 31
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
             if (rn == 31) {
                 gen_check_sp_alignment(s);
             }
-            tcg_addr = read_cpu_reg_sp(s, rn, 1);
+            clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
             s->is_ldex = true;
-            gen_load_exclusive(s, rt, rt2, tcg_addr, size, true);
+            gen_load_exclusive(s, rt, rt2, clean_addr, size, true);
             if (is_lasr) {
                 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
             }
@@ -XXX,XX +XXX,XX @@ static void disas_ld_lit(DisasContext *s, uint32_t insn)
     int opc = extract32(insn, 30, 2);
     bool is_signed = false;
     int size = 2;
-    TCGv_i64 tcg_rt, tcg_addr;
+    TCGv_i64 tcg_rt, clean_addr;
 
     if (is_vector) {
         if (opc == 3) {
@@ -XXX,XX +XXX,XX @@ static void disas_ld_lit(DisasContext *s, uint32_t insn)
 
     tcg_rt = cpu_reg(s, rt);
 
-    tcg_addr = tcg_const_i64((s->pc - 4) + imm);
+    clean_addr = tcg_const_i64((s->pc - 4) + imm);
     if (is_vector) {
-        do_fp_ld(s, rt, tcg_addr, size);
+        do_fp_ld(s, rt, clean_addr, size);
     } else {
         /* Only unsigned 32bit loads target 32bit registers.  */
         bool iss_sf = opc != 0;
 
-        do_gpr_ld(s, tcg_rt, tcg_addr, size, is_signed, false,
+        do_gpr_ld(s, tcg_rt, clean_addr, size, is_signed, false,
                   true, rt, iss_sf, false);
     }
-    tcg_temp_free_i64(tcg_addr);
+    tcg_temp_free_i64(clean_addr);
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
     bool postindex = false;
     bool wback = false;
 
-    TCGv_i64 tcg_addr; /* calculated address */
+    TCGv_i64 clean_addr, dirty_addr;
+
     int size;
 
     if (opc == 3) {
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
         gen_check_sp_alignment(s);
     }
 
-    tcg_addr = read_cpu_reg_sp(s, rn, 1);
-
+    dirty_addr = read_cpu_reg_sp(s, rn, 1);
     if (!postindex) {
-        tcg_gen_addi_i64(tcg_addr, tcg_addr, offset);
+        tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
     }
+    clean_addr = clean_data_tbi(s, dirty_addr);
 
     if (is_vector) {
         if (is_load) {
-            do_fp_ld(s, rt, tcg_addr, size);
+            do_fp_ld(s, rt, clean_addr, size);
         } else {
-            do_fp_st(s, rt, tcg_addr, size);
+            do_fp_st(s, rt, clean_addr, size);
         }
-        tcg_gen_addi_i64(tcg_addr, tcg_addr, 1 << size);
+        tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size);
         if (is_load) {
-            do_fp_ld(s, rt2, tcg_addr, size);
+            do_fp_ld(s, rt2, clean_addr, size);
         } else {
-            do_fp_st(s, rt2, tcg_addr, size);
+            do_fp_st(s, rt2, clean_addr, size);
         }
     } else {
         TCGv_i64 tcg_rt = cpu_reg(s, rt);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
             /* Do not modify tcg_rt before recognizing any exception
              * from the second load.
              */
-            do_gpr_ld(s, tmp, tcg_addr, size, is_signed, false,
+            do_gpr_ld(s, tmp, clean_addr, size, is_signed, false,
                       false, 0, false, false);
-            tcg_gen_addi_i64(tcg_addr, tcg_addr, 1 << size);
-            do_gpr_ld(s, tcg_rt2, tcg_addr, size, is_signed, false,
+            tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size);
+            do_gpr_ld(s, tcg_rt2, clean_addr, size, is_signed, false,
                       false, 0, false, false);
 
             tcg_gen_mov_i64(tcg_rt, tmp);
             tcg_temp_free_i64(tmp);
         } else {
-            do_gpr_st(s, tcg_rt, tcg_addr, size,
+            do_gpr_st(s, tcg_rt, clean_addr, size,
                       false, 0, false, false);
-            tcg_gen_addi_i64(tcg_addr, tcg_addr, 1 << size);
-            do_gpr_st(s, tcg_rt2, tcg_addr, size,
+            tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size);
+            do_gpr_st(s, tcg_rt2, clean_addr, size,
                       false, 0, false, false);
         }
     }
 
     if (wback) {
         if (postindex) {
-            tcg_gen_addi_i64(tcg_addr, tcg_addr, offset - (1 << size));
-        } else {
-            tcg_gen_subi_i64(tcg_addr, tcg_addr, 1 << size);
+            tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
         }
-        tcg_gen_mov_i64(cpu_reg_sp(s, rn), tcg_addr);
+        tcg_gen_mov_i64(cpu_reg_sp(s, rn), dirty_addr);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
     bool post_index;
     bool writeback;
 
-    TCGv_i64 tcg_addr;
+    TCGv_i64 clean_addr, dirty_addr;
 
     if (is_vector) {
         size |= (opc & 2) << 1;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
     if (rn == 31) {
         gen_check_sp_alignment(s);
     }
-    tcg_addr = read_cpu_reg_sp(s, rn, 1);
 
+    dirty_addr = read_cpu_reg_sp(s, rn, 1);
     if (!post_index) {
-        tcg_gen_addi_i64(tcg_addr, tcg_addr, imm9);
+        tcg_gen_addi_i64(dirty_addr, dirty_addr, imm9);
     }
+    clean_addr = clean_data_tbi(s, dirty_addr);
 
     if (is_vector) {
         if (is_store) {
-            do_fp_st(s, rt, tcg_addr, size);
+            do_fp_st(s, rt, clean_addr, size);
         } else {
-            do_fp_ld(s, rt, tcg_addr, size);
+            do_fp_ld(s, rt, clean_addr, size);
         }
     } else {
         TCGv_i64 tcg_rt = cpu_reg(s, rt);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
         bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc);
 
         if (is_store) {
-            do_gpr_st_memidx(s, tcg_rt, tcg_addr, size, memidx,
+            do_gpr_st_memidx(s, tcg_rt, clean_addr, size, memidx,
                              iss_valid, rt, iss_sf, false);
         } else {
-            do_gpr_ld_memidx(s, tcg_rt, tcg_addr, size,
+            do_gpr_ld_memidx(s, tcg_rt, clean_addr, size,
                              is_signed, is_extended, memidx,
                              iss_valid, rt, iss_sf, false);
         }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
     if (writeback) {
         TCGv_i64 tcg_rn = cpu_reg_sp(s, rn);
         if (post_index) {
-            tcg_gen_addi_i64(tcg_addr, tcg_addr, imm9);
+            tcg_gen_addi_i64(dirty_addr, dirty_addr, imm9);
         }
-        tcg_gen_mov_i64(tcg_rn, tcg_addr);
+        tcg_gen_mov_i64(tcg_rn, dirty_addr);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
     bool is_store = false;
     bool is_extended = false;
 
-    TCGv_i64 tcg_rm;
-    TCGv_i64 tcg_addr;
+    TCGv_i64 tcg_rm, clean_addr, dirty_addr;
 
     if (extract32(opt, 1, 1) == 0) {
         unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
     if (rn == 31) {
         gen_check_sp_alignment(s);
     }
-    tcg_addr = read_cpu_reg_sp(s, rn, 1);
+    dirty_addr = read_cpu_reg_sp(s, rn, 1);
 
     tcg_rm = read_cpu_reg(s, rm, 1);
     ext_and_shift_reg(tcg_rm, tcg_rm, opt, shift ? size : 0);
 
-    tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_rm);
+    tcg_gen_add_i64(dirty_addr, dirty_addr, tcg_rm);
+    clean_addr = clean_data_tbi(s, dirty_addr);
 
     if (is_vector) {
         if (is_store) {
-            do_fp_st(s, rt, tcg_addr, size);
+            do_fp_st(s, rt, clean_addr, size);
         } else {
-            do_fp_ld(s, rt, tcg_addr, size);
+            do_fp_ld(s, rt, clean_addr, size);
         }
     } else {
         TCGv_i64 tcg_rt = cpu_reg(s, rt);
         bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc);
         if (is_store) {
-            do_gpr_st(s, tcg_rt, tcg_addr, size,
+            do_gpr_st(s, tcg_rt, clean_addr, size,
                       true, rt, iss_sf, false);
         } else {
-            do_gpr_ld(s, tcg_rt, tcg_addr, size,
+            do_gpr_ld(s, tcg_rt, clean_addr, size,
                       is_signed, is_extended,
                       true, rt, iss_sf, false);
         }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
     unsigned int imm12 = extract32(insn, 10, 12);
     unsigned int offset;
 
-    TCGv_i64 tcg_addr;
+    TCGv_i64 clean_addr, dirty_addr;
 
     bool is_store;
     bool is_signed = false;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
     if (rn == 31) {
         gen_check_sp_alignment(s);
     }
-    tcg_addr = read_cpu_reg_sp(s, rn, 1);
+    dirty_addr = read_cpu_reg_sp(s, rn, 1);
     offset = imm12 << size;
-    tcg_gen_addi_i64(tcg_addr, tcg_addr, offset);
+    tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
+    clean_addr = clean_data_tbi(s, dirty_addr);
 
     if (is_vector) {
         if (is_store) {
-            do_fp_st(s, rt, tcg_addr, size);
+            do_fp_st(s, rt, clean_addr, size);
         } else {
-            do_fp_ld(s, rt, tcg_addr, size);
+            do_fp_ld(s, rt, clean_addr, size);
         }
     } else {
         TCGv_i64 tcg_rt = cpu_reg(s, rt);
         bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc);
         if (is_store) {
-            do_gpr_st(s, tcg_rt, tcg_addr, size,
+            do_gpr_st(s, tcg_rt, clean_addr, size,
                       true, rt, iss_sf, false);
         } else {
-            do_gpr_ld(s, tcg_rt, tcg_addr, size, is_signed, is_extended,
+            do_gpr_ld(s, tcg_rt, clean_addr, size, is_signed, is_extended,
                       true, rt, iss_sf, false);
         }
     }
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     int rs = extract32(insn, 16, 5);
     int rn = extract32(insn, 5, 5);
     int o3_opc = extract32(insn, 12, 4);
-    TCGv_i64 tcg_rn, tcg_rs;
+    TCGv_i64 tcg_rs, clean_addr;
     AtomicThreeOpFn *fn;
 
     if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     if (rn == 31) {
         gen_check_sp_alignment(s);
     }
-    tcg_rn = cpu_reg_sp(s, rn);
+    clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
     tcg_rs = read_cpu_reg(s, rs, true);
 
     if (o3_opc == 1) { /* LDCLR */
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     /* The tcg atomic primitives are all full barriers.  Therefore we
      * can ignore the Acquire and Release bits of this instruction.
      */
-    fn(cpu_reg(s, rt), tcg_rn, tcg_rs, get_mem_index(s),
+    fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
        s->be_data | size | MO_ALIGN);
 }
 
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn,
     bool is_wback = extract32(insn, 11, 1);
     bool use_key_a = !extract32(insn, 23, 1);
     int offset;
-    TCGv_i64 tcg_addr, tcg_rt;
+    TCGv_i64 clean_addr, dirty_addr, tcg_rt;
 
     if (size != 3 || is_vector || !dc_isar_feature(aa64_pauth, s)) {
         unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn,
     if (rn == 31) {
         gen_check_sp_alignment(s);
     }
-    tcg_addr = read_cpu_reg_sp(s, rn, 1);
+    dirty_addr = read_cpu_reg_sp(s, rn, 1);
 
     if (s->pauth_active) {
         if (use_key_a) {
-            gen_helper_autda(tcg_addr, cpu_env, tcg_addr, cpu_X[31]);
+            gen_helper_autda(dirty_addr, cpu_env, dirty_addr, cpu_X[31]);
         } else {
-            gen_helper_autdb(tcg_addr, cpu_env, tcg_addr, cpu_X[31]);
+            gen_helper_autdb(dirty_addr, cpu_env, dirty_addr, cpu_X[31]);
         }
     }
 
     /* Form the 10-bit signed, scaled offset.  */
     offset = (extract32(insn, 22, 1) << 9) | extract32(insn, 12, 9);
     offset = sextract32(offset << size, 0, 10 + size);
-    tcg_gen_addi_i64(tcg_addr, tcg_addr, offset);
+    tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
+
+    /* Note that "clean" and "dirty" here refer to TBI not PAC.  */
+    clean_addr = clean_data_tbi(s, dirty_addr);
 
     tcg_rt = cpu_reg(s, rt);
-
-    do_gpr_ld(s, tcg_rt, tcg_addr, size, /* is_signed */ false,
+    do_gpr_ld(s, tcg_rt, clean_addr, size, /* is_signed */ false,
               /* extend */ false, /* iss_valid */ !is_wback,
               /* iss_srt */ rt, /* iss_sf */ true, /* iss_ar */ false);
 
     if (is_wback) {
-        tcg_gen_mov_i64(cpu_reg_sp(s, rn), tcg_addr);
+        tcg_gen_mov_i64(cpu_reg_sp(s, rn), dirty_addr);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
     bool is_store = !extract32(insn, 22, 1);
     bool is_postidx = extract32(insn, 23, 1);
     bool is_q = extract32(insn, 30, 1);
-    TCGv_i64 tcg_addr, tcg_rn, tcg_ebytes;
+    TCGv_i64 clean_addr, tcg_rn, tcg_ebytes;
     TCGMemOp endian = s->be_data;
 
     int ebytes;   /* bytes per element */
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
     elements = (is_q ? 16 : 8) / ebytes;
 
     tcg_rn = cpu_reg_sp(s, rn);
-    tcg_addr = tcg_temp_new_i64();
-    tcg_gen_mov_i64(tcg_addr, tcg_rn);
+    clean_addr = clean_data_tbi(s, tcg_rn);
     tcg_ebytes = tcg_const_i64(ebytes);
 
     for (r = 0; r < rpt; r++) {
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
             for (xs = 0; xs < selem; xs++) {
                 int tt = (rt + r + xs) % 32;
                 if (is_store) {
-                    do_vec_st(s, tt, e, tcg_addr, size, endian);
+                    do_vec_st(s, tt, e, clean_addr, size, endian);
                 } else {
-                    do_vec_ld(s, tt, e, tcg_addr, size, endian);
+                    do_vec_ld(s, tt, e, clean_addr, size, endian);
                 }
-                tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_ebytes);
+                tcg_gen_add_i64(clean_addr, clean_addr, tcg_ebytes);
             }
         }
     }
+    tcg_temp_free_i64(tcg_ebytes);
 
     if (!is_store) {
         /* For non-quad operations, setting a slice of the low
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
 
     if (is_postidx) {
         if (rm == 31) {
-            tcg_gen_mov_i64(tcg_rn, tcg_addr);
+            tcg_gen_addi_i64(tcg_rn, tcg_rn, rpt * elements * selem * ebytes);
         } else {
             tcg_gen_add_i64(tcg_rn, tcg_rn, cpu_reg(s, rm));
         }
     }
-    tcg_temp_free_i64(tcg_ebytes);
-    tcg_temp_free_i64(tcg_addr);
 }
 
 /* AdvSIMD load/store single structure
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
     bool replicate = false;
     int index = is_q << 3 | S << 2 | size;
     int ebytes, xs;
-    TCGv_i64 tcg_addr, tcg_rn, tcg_ebytes;
+    TCGv_i64 clean_addr, tcg_rn, tcg_ebytes;
 
     if (extract32(insn, 31, 1)) {
         unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
     }
 
     tcg_rn = cpu_reg_sp(s, rn);
-    tcg_addr = tcg_temp_new_i64();
-    tcg_gen_mov_i64(tcg_addr, tcg_rn);
+    clean_addr = clean_data_tbi(s, tcg_rn);
     tcg_ebytes = tcg_const_i64(ebytes);
 
     for (xs = 0; xs < selem; xs++) {
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
             /* Load and replicate to all elements */
             TCGv_i64 tcg_tmp = tcg_temp_new_i64();
 
-            tcg_gen_qemu_ld_i64(tcg_tmp, tcg_addr,
+            tcg_gen_qemu_ld_i64(tcg_tmp, clean_addr,
                                 get_mem_index(s), s->be_data + scale);
             tcg_gen_gvec_dup_i64(scale, vec_full_reg_offset(s, rt),
                                  (is_q + 1) * 8, vec_full_reg_size(s),
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
         } else {
             /* Load/store one element per register */
             if (is_load) {
-                do_vec_ld(s, rt, index, tcg_addr, scale, s->be_data);
+                do_vec_ld(s, rt, index, clean_addr, scale, s->be_data);
             } else {
-                do_vec_st(s, rt, index, tcg_addr, scale, s->be_data);
+                do_vec_st(s, rt, index, clean_addr, scale, s->be_data);
             }
         }
-        tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_ebytes);
+        tcg_gen_add_i64(clean_addr, clean_addr, tcg_ebytes);
         rt = (rt + 1) % 32;
     }
+    tcg_temp_free_i64(tcg_ebytes);
 
     if (is_postidx) {
         if (rm == 31) {
-            tcg_gen_mov_i64(tcg_rn, tcg_addr);
+            tcg_gen_addi_i64(tcg_rn, tcg_rn, selem * ebytes);
         } else {
             tcg_gen_add_i64(tcg_rn, tcg_rn, cpu_reg(s, rm));
         }
     }
-    tcg_temp_free_i64(tcg_ebytes);
-    tcg_temp_free_i64(tcg_addr);
 }
 
 /* Loads and stores */
-- 
2.20.1

Enables, but does not turn on, TBI for CONFIG_USER_ONLY.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190204132126.3255-4-richard.henderson@linaro.org
[PMM: adjusted #ifdeffery to placate clang, which otherwise complains
about static functions that are unused in the CONFIG_USER_ONLY build]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h | 21 --------------------
 target/arm/helper.c    | 45 ++++++++++++++++++++++--------------------
 2 files changed, 24 insertions(+), 42 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
     bool using64k   : 1;
 } ARMVAParameters;
 
-#ifdef CONFIG_USER_ONLY
-static inline ARMVAParameters aa64_va_parameters_both(CPUARMState *env,
-                                                      uint64_t va,
-                                                      ARMMMUIdx mmu_idx)
-{
-    return (ARMVAParameters) {
-        /* 48-bit address space */
-        .tsz = 16,
-        /* We can't handle tagged addresses properly in user-only mode */
-        .tbi = false,
-    };
-}
-
-static inline ARMVAParameters aa64_va_parameters(CPUARMState *env,
-                                                 uint64_t va,
-                                                 ARMMMUIdx mmu_idx, bool data)
-{
-    return aa64_va_parameters_both(env, va, mmu_idx);
-}
-#else
 ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
                                         ARMMMUIdx mmu_idx);
 ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
                                    ARMMMUIdx mmu_idx, bool data);
-#endif
 
 #endif
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rbit)(uint32_t x)
     return revbit32(x);
 }
 
-#if defined(CONFIG_USER_ONLY)
+#ifdef CONFIG_USER_ONLY
 
 /* These should probably raise undefined insn exceptions.  */
 void HELPER(v7m_msr)(CPUARMState *env, uint32_t reg, uint32_t val)
@@ -XXX,XX +XXX,XX @@ void arm_cpu_do_interrupt(CPUState *cs)
         cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
     }
 }
+#endif /* !CONFIG_USER_ONLY */
 
 /* Return the exception level which controls this address translation regime */
 static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
@@ -XXX,XX +XXX,XX @@ static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
     }
 }
 
+#ifndef CONFIG_USER_ONLY
+
 /* Return the SCTLR value which controls this address translation regime */
 static inline uint32_t regime_sctlr(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
@@ -XXX,XX +XXX,XX @@ static inline bool regime_translation_big_endian(CPUARMState *env,
     return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
 }
 
+/* Return the TTBR associated with this translation regime */
+static inline uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx,
+                                   int ttbrn)
+{
+    if (mmu_idx == ARMMMUIdx_S2NS) {
+        return env->cp15.vttbr_el2;
+    }
+    if (ttbrn == 0) {
+        return env->cp15.ttbr0_el[regime_el(env, mmu_idx)];
+    } else {
+        return env->cp15.ttbr1_el[regime_el(env, mmu_idx)];
+    }
+}
+
+#endif /* !CONFIG_USER_ONLY */
+
 /* Return the TCR controlling this translation regime */
 static inline TCR *regime_tcr(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
@@ -XXX,XX +XXX,XX @@ static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
     return mmu_idx;
 }
 
-/* Return the TTBR associated with this translation regime */
-static inline uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx,
-                                   int ttbrn)
-{
-    if (mmu_idx == ARMMMUIdx_S2NS) {
-        return env->cp15.vttbr_el2;
-    }
-    if (ttbrn == 0) {
-        return env->cp15.ttbr0_el[regime_el(env, mmu_idx)];
-    } else {
-        return env->cp15.ttbr1_el[regime_el(env, mmu_idx)];
-    }
-}
-
 /* Return true if the translation regime is using LPAE format page tables */
 static inline bool regime_using_lpae_format(CPUARMState *env,
                                             ARMMMUIdx mmu_idx)
@@ -XXX,XX +XXX,XX @@ bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
     return regime_using_lpae_format(env, mmu_idx);
 }
 
+#ifndef CONFIG_USER_ONLY
 static inline bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
     switch (mmu_idx) {
@@ -XXX,XX +XXX,XX @@ static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
 
     return (hiattr << 6) | (hihint << 4) | (loattr << 2) | lohint;
 }
+#endif /* !CONFIG_USER_ONLY */
 
 ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
                                         ARMMMUIdx mmu_idx)
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
     return ret;
 }
 
+#ifndef CONFIG_USER_ONLY
 static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
                                           ARMMMUIdx mmu_idx)
 {
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         *pc = env->pc;
         flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
 
-#ifndef CONFIG_USER_ONLY
-        /*
-         * Get control bits for tagged addresses.  Note that the
-         * translator only uses this for instruction addresses.
-         */
+        /* Get control bits for tagged addresses.  */
         {
             ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
             ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
             flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
         }
-#endif
 
         if (cpu_isar_feature(aa64_sve, cpu)) {
             int sve_el = sve_exception_el(env, current_el);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This has been enabled in the Linux kernel since v3.11
(commit d50240a5f6cea, 2013-09-03,
"arm64: mm: permit use of tagged pointers at EL0").

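As a rough illustration of what enabling TBI means for a data address, here
is a minimal sketch. It assumes the TCR_EL1 layout from the Arm ARM (TBI0 at
bit 37, TBI1 at bit 38), which matches the (3ULL << 37) value written by this
patch. The helper sketch_clean_tbi() is hypothetical and is not the
translator's clean_data_tbi(); it only models the effect on a plain integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCR_EL1 top-byte-ignore control bits (assumed from the Arm ARM). */
#define TCR_TBI0 (1ULL << 37)
#define TCR_TBI1 (1ULL << 38)

/*
 * Hypothetical helper: strip the tag byte from a data address.
 * Bit 55 selects the TTBR0 (low) or TTBR1 (high) half of the
 * address space, and with it which TBI bit applies.
 */
static uint64_t sketch_clean_tbi(uint64_t tcr, uint64_t addr)
{
    bool high = (addr >> 55) & 1;
    bool tbi = high ? (tcr & TCR_TBI1) != 0 : (tcr & TCR_TBI0) != 0;

    if (!tbi) {
        /* Tagging is disabled for this half: use the address as-is. */
        return addr;
    }
    /* Replace bits [63:56] with copies of bit 55. */
    return high ? (addr | 0xff00000000000000ULL)
                : (addr & 0x00ffffffffffffffULL);
}
```

With both bits set, the two branches collapse into one sign extension from
bit 55, which is presumably why the comment in the patch expects smaller
generated code when TBI0 and TBI1 are both enabled.
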
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190204132126.3255-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
         env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
         env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
         env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
+        /*
+         * Enable TBI0 and TBI1.  While the real kernel only enables TBI0,
+         * turning on both here will produce smaller code and otherwise
+         * make no difference to the user-level emulation.
+         */
+        env->cp15.tcr_el[1].raw_tcr = (3ULL << 37);
 #else
         /* Reset into the highest available EL */
         if (arm_feature(env, ARM_FEATURE_EL3)) {
-- 
2.20.1

From: Max Filippov <jcmvbkbc@gmail.com>

With the multiprocess extensions, gdb uses the 'vKill' packet instead of
'k' to kill the inferior. Handle 'vKill' the same way 'k' is handled in
the single-process case.
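
The packet match added below can be sketched in isolation as follows. This
assumes the GDB remote protocol form "vKill;pid" and that the caller has
already consumed the leading 'v', as the surrounding packet handler appears
to do. The name sketch_is_vkill() is hypothetical and does not exist in
gdbstub.c.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: report whether the text following the leading
 * 'v' of a packet is a vKill request ("vKill;pid" on the wire).
 */
static bool sketch_is_vkill(const char *p)
{
    /* The trailing ';' is part of the match, so a bare "Kill" is rejected. */
    return strncmp(p, "Kill;", 5) == 0;
}
```

The pid that follows the semicolon is not inspected in this sketch, and the
patch likewise terminates QEMU without parsing it.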

Fixes: 7cf48f6752e5 ("gdbstub: add multiprocess support to
(f|s)ThreadInfo and ThreadExtraInfo")

Cc: Luc Michel <luc.michel@greensocs.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
Tested-by: KONRAD Frederic <frederic.konrad@adacore.com>
Message-id: 20190130192403.13754-1-jcmvbkbc@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 gdbstub.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gdbstub.c b/gdbstub.c
index XXXXXXX..XXXXXXX 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -XXX,XX +XXX,XX @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
 
             put_packet(s, buf);
             break;
+        } else if (strncmp(p, "Kill;", 5) == 0) {
+            /* Kill the target */
+            error_report("QEMU: Terminated via GDBstub");
+            exit(0);
         } else {
             goto unknown_command;
         }
-- 
2.20.1

Fix the block comment style in arm_load_kernel() to QEMU's
current style preferences. This will allow us to do some
refactoring of this function without checkpatch complaining
about the code-motion patches.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20190131112240.8395-2-peter.maydell@linaro.org
---
 hw/arm/boot.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     static const ARMInsnFixup *primary_loader;
     AddressSpace *as = arm_boot_address_space(cpu, info);
 
-    /* CPU objects (unlike devices) are not automatically reset on system
+    /*
+     * CPU objects (unlike devices) are not automatically reset on system
      * reset, so we must always register a handler to do so. If we're
      * actually loading a kernel, the handler is also responsible for
      * arranging that we start it correctly.
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
         qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
     }
 
-    /* The board code is not supposed to set secure_board_setup unless
+    /*
+     * The board code is not supposed to set secure_board_setup unless
      * running its code in secure mode is actually possible, and KVM
      * doesn't support secure.
      */
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     if (!info->kernel_filename || info->firmware_loaded) {
 
         if (have_dtb(info)) {
-            /* If we have a device tree blob, but no kernel to supply it to (or
+            /*
+             * If we have a device tree blob, but no kernel to supply it to (or
              * the kernel is supposed to be loaded by the bootloader), copy the
              * DTB to the base of RAM for the bootloader to pick up.
              */
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
             try_decompressing_kernel = arm_feature(&cpu->env,
                                                    ARM_FEATURE_AARCH64);
 
-            /* Expose the kernel, the command line, and the initrd in fw_cfg.
+            /*
+             * Expose the kernel, the command line, and the initrd in fw_cfg.
              * We don't process them here at all, it's all left to the
              * firmware.
              */
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
             }
         }
 
-        /* We will start from address 0 (typically a boot ROM image) in the
+        /*
+         * We will start from address 0 (typically a boot ROM image) in the
          * same way as hardware.
          */
         return;
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     if (info->nb_cpus == 0)
         info->nb_cpus = 1;
 
-    /* We want to put the initrd far enough into RAM that when the
+    /*
+     * We want to put the initrd far enough into RAM that when the
      * kernel is uncompressed it will not clobber the initrd. However
      * on boards without much RAM we must ensure that we still leave
      * enough room for a decent sized initrd, and on boards with large
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     kernel_size = arm_load_elf(info, &elf_entry, &elf_low_addr,
                                &elf_high_addr, elf_machine, as);
     if (kernel_size > 0 && have_dtb(info)) {
-        /* If there is still some room left at the base of RAM, try and put
+        /*
+         * If there is still some room left at the base of RAM, try and put
          * the DTB there like we do for images loaded with -bios or -pflash.
          */
         if (elf_low_addr > info->loader_start
             || elf_high_addr < info->loader_start) {
-            /* Set elf_low_addr as address limit for arm_load_dtb if it may be
+            /*
+             * Set elf_low_addr as address limit for arm_load_dtb if it may be
              * pointing into RAM, otherwise pass '0' (no limit)
              */
             if (elf_low_addr < info->loader_start) {
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
         fixupcontext[FIXUP_BOARDID] = info->board_id;
         fixupcontext[FIXUP_BOARD_SETUP] = info->board_setup_addr;
 
-        /* for device tree boot, we pass the DTB directly in r2. Otherwise
+        /*
+         * for device tree boot, we pass the DTB directly in r2. Otherwise
          * we point to the kernel args.
          */
         if (have_dtb(info)) {
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
             info->write_board_setup(cpu, info);
         }
 
-        /* Notify devices which need to fake up firmware initialization
+        /*
+         * Notify devices which need to fake up firmware initialization
          * that we're doing a direct kernel boot.
          */
         object_child_foreach_recursive(object_get_root(),
-- 
2.20.1

Factor out the "direct kernel boot" code path from arm_load_kernel()
into its own function; this function is getting long enough that
the code flow is a bit confusing.

This commit only moves code around; no semantic changes.

We leave the "load the dtb" code in arm_load_kernel() -- this
is currently only used by the "direct kernel boot" path, but
this is a bug which we will fix shortly.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20190131112240.8395-3-peter.maydell@linaro.org
---
 hw/arm/boot.c | 150 +++++++++++++++++++++++++++-----------------------
 1 file changed, 80 insertions(+), 70 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static uint64_t load_aarch64_image(const char *filename, hwaddr mem_base,
     return size;
 }
 
-void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+static void arm_setup_direct_kernel_boot(ARMCPU *cpu,
+                                         struct arm_boot_info *info)
 {
+    /* Set up for a direct boot of a kernel image file. */
     CPUState *cs;
+    AddressSpace *as = arm_boot_address_space(cpu, info);
     int kernel_size;
     int initrd_size;
     int is_linux = 0;
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     int elf_machine;
     hwaddr entry;
     static const ARMInsnFixup *primary_loader;
-    AddressSpace *as = arm_boot_address_space(cpu, info);
-
-    /*
-     * CPU objects (unlike devices) are not automatically reset on system
-     * reset, so we must always register a handler to do so. If we're
-     * actually loading a kernel, the handler is also responsible for
-     * arranging that we start it correctly.
-     */
-    for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
-        qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
-    }
-
-    /*
-     * The board code is not supposed to set secure_board_setup unless
-     * running its code in secure mode is actually possible, and KVM
-     * doesn't support secure.
-     */
-    assert(!(info->secure_board_setup && kvm_enabled()));
-
-    info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
-    info->dtb_limit = 0;
-
-    /* Load the kernel.  */
-    if (!info->kernel_filename || info->firmware_loaded) {
-
-        if (have_dtb(info)) {
-            /*
-             * If we have a device tree blob, but no kernel to supply it to (or
-             * the kernel is supposed to be loaded by the bootloader), copy the
-             * DTB to the base of RAM for the bootloader to pick up.
-             */
-            info->dtb_start = info->loader_start;
-        }
-
-        if (info->kernel_filename) {
-            FWCfgState *fw_cfg;
-            bool try_decompressing_kernel;
-
-            fw_cfg = fw_cfg_find();
-            try_decompressing_kernel = arm_feature(&cpu->env,
-                                                   ARM_FEATURE_AARCH64);
-
-            /*
-             * Expose the kernel, the command line, and the initrd in fw_cfg.
-             * We don't process them here at all, it's all left to the
-             * firmware.
-             */
-            load_image_to_fw_cfg(fw_cfg,
-                                 FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
-                                 info->kernel_filename,
-                                 try_decompressing_kernel);
-            load_image_to_fw_cfg(fw_cfg,
-                                 FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
-                                 info->initrd_filename, false);
-
-            if (info->kernel_cmdline) {
-                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
-                               strlen(info->kernel_cmdline) + 1);
-                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
-                                  info->kernel_cmdline);
-            }
-        }
-
-        /*
-         * We will start from address 0 (typically a boot ROM image) in the
-         * same way as hardware.
-         */
-        return;
-    }
 
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
         primary_loader = bootloader_aarch64;
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
         ARM_CPU(cs)->env.boot_info = info;
     }
+}
+
+void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+{
+    CPUState *cs;
+    AddressSpace *as = arm_boot_address_space(cpu, info);
+
+    /*
+     * CPU objects (unlike devices) are not automatically reset on system
+     * reset, so we must always register a handler to do so. If we're
+     * actually loading a kernel, the handler is also responsible for
+     * arranging that we start it correctly.
+     */
+    for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
+        qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
+    }
+
+    /*
+     * The board code is not supposed to set secure_board_setup unless
+     * running its code in secure mode is actually possible, and KVM
+     * doesn't support secure.
+     */
+    assert(!(info->secure_board_setup && kvm_enabled()));
+
+    info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
+    info->dtb_limit = 0;
+
+    /* Load the kernel.  */
+    if (!info->kernel_filename || info->firmware_loaded) {
+
+        if (have_dtb(info)) {
+            /*
+             * If we have a device tree blob, but no kernel to supply it to (or
+             * the kernel is supposed to be loaded by the bootloader), copy the
+             * DTB to the base of RAM for the bootloader to pick up.
+             */
+            info->dtb_start = info->loader_start;
+        }
+
+        if (info->kernel_filename) {
+            FWCfgState *fw_cfg;
+            bool try_decompressing_kernel;
+
+            fw_cfg = fw_cfg_find();
+            try_decompressing_kernel = arm_feature(&cpu->env,
+                                                   ARM_FEATURE_AARCH64);
+
+            /*
+             * Expose the kernel, the command line, and the initrd in fw_cfg.
+             * We don't process them here at all, it's all left to the
+             * firmware.
+             */
+            load_image_to_fw_cfg(fw_cfg,
+                                 FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
+                                 info->kernel_filename,
+                                 try_decompressing_kernel);
+            load_image_to_fw_cfg(fw_cfg,
+                                 FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
+                                 info->initrd_filename, false);
+
+            if (info->kernel_cmdline) {
+                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
+                               strlen(info->kernel_cmdline) + 1);
+                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
+                                  info->kernel_cmdline);
+            }
+        }
+
+        /*
+         * We will start from address 0 (typically a boot ROM image) in the
+         * same way as hardware.
+         */
+        return;
+    } else {
+        arm_setup_direct_kernel_boot(cpu, info);
+    }
 
     if (!info->skip_dtb_autoload && have_dtb(info)) {
         if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
-- 
2.20.1

Factor out the "boot via firmware" code path from arm_load_kernel()
into its own function.

This commit only moves code around; no semantic changes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20190131112240.8395-4-peter.maydell@linaro.org
---
 hw/arm/boot.c | 92 +++++++++++++++++++++++++++------------------------
 1 file changed, 49 insertions(+), 43 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void arm_setup_direct_kernel_boot(ARMCPU *cpu,
     }
 }
 
+static void arm_setup_firmware_boot(ARMCPU *cpu, struct arm_boot_info *info)
+{
+    /* Set up for booting firmware (which might load a kernel via fw_cfg) */
+
+    if (have_dtb(info)) {
+        /*
+         * If we have a device tree blob, but no kernel to supply it to (or
+         * the kernel is supposed to be loaded by the bootloader), copy the
+         * DTB to the base of RAM for the bootloader to pick up.
+         */
+        info->dtb_start = info->loader_start;
+    }
+
+    if (info->kernel_filename) {
+        FWCfgState *fw_cfg;
+        bool try_decompressing_kernel;
+
+        fw_cfg = fw_cfg_find();
+        try_decompressing_kernel = arm_feature(&cpu->env,
+                                               ARM_FEATURE_AARCH64);
+
+        /*
+         * Expose the kernel, the command line, and the initrd in fw_cfg.
+         * We don't process them here at all, it's all left to the
+         * firmware.
+         */
+        load_image_to_fw_cfg(fw_cfg,
+                             FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
+                             info->kernel_filename,
+                             try_decompressing_kernel);
+        load_image_to_fw_cfg(fw_cfg,
+                             FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
+                             info->initrd_filename, false);
+
+        if (info->kernel_cmdline) {
+            fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
+                           strlen(info->kernel_cmdline) + 1);
+            fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
+                              info->kernel_cmdline);
+        }
+    }
+
+    /*
+     * We will start from address 0 (typically a boot ROM image) in the
+     * same way as hardware.
+     */
+}
+
 void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
 {
     CPUState *cs;
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
 
     /* Load the kernel.  */
     if (!info->kernel_filename || info->firmware_loaded) {
-
-        if (have_dtb(info)) {
-            /*
-             * If we have a device tree blob, but no kernel to supply it to (or
-             * the kernel is supposed to be loaded by the bootloader), copy the
-             * DTB to the base of RAM for the bootloader to pick up.
-             */
-            info->dtb_start = info->loader_start;
-        }
-
-        if (info->kernel_filename) {
-            FWCfgState *fw_cfg;
-            bool try_decompressing_kernel;
-
-            fw_cfg = fw_cfg_find();
-            try_decompressing_kernel = arm_feature(&cpu->env,
-                                                   ARM_FEATURE_AARCH64);
-
-            /*
-             * Expose the kernel, the command line, and the initrd in fw_cfg.
-             * We don't process them here at all, it's all left to the
-             * firmware.
-             */
-            load_image_to_fw_cfg(fw_cfg,
-                                 FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
-                                 info->kernel_filename,
-                                 try_decompressing_kernel);
-            load_image_to_fw_cfg(fw_cfg,
-                                 FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
-                                 info->initrd_filename, false);
-
-            if (info->kernel_cmdline) {
-                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
-                               strlen(info->kernel_cmdline) + 1);
-                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
-                                  info->kernel_cmdline);
-            }
-        }
-
-        /*
-         * We will start from address 0 (typically a boot ROM image) in the
-         * same way as hardware.
-         */
+        arm_setup_firmware_boot(cpu, info);
         return;
     } else {
         arm_setup_direct_kernel_boot(cpu, info);
-- 
2.20.1

The arm_boot_info struct has a skip_dtb_autoload flag: if this is
set to true by the board code then arm_load_kernel() will not
load the DTB itself, but will leave this for the board code to
do later. However, the check for this is done in a
code path which is only executed for the case where we load
a kernel image file. If we're taking the "boot via firmware"
code path then the flag isn't honoured and the DTB is never
loaded.

We didn't notice this because the only real user of "boot
via firmware" that cares about the DTB is the virt board
(for UEFI boot), and that always wants skip_dtb_autoload
anyway. But the SBSA reference board model we're planning to
add will want the flag to behave correctly.

Now we've refactored the arm_load_kernel() function, the
fix is simple: drop the early 'return' so we fall into
the same "load the DTB" code the boot-direct-kernel path uses.
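
As a rough illustration of the control-flow change described above, here
is a cut-down model of the tail of arm_load_kernel(). The struct
boot_info_model type, its fields, and both functions are hypothetical
stand-ins invented for this sketch, not QEMU code; the only point is to
show that with the early return in place the shared DTB check is never
reached on the firmware path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct arm_boot_info. */
struct boot_info_model {
    const char *kernel_filename; /* NULL when no kernel was given */
    bool firmware_loaded;
    bool skip_dtb_autoload;
    bool have_dtb;
    bool dtb_loaded;             /* set when the model "loads" the DTB */
};

/*
 * Model of the end of arm_load_kernel(). keep_early_return selects the
 * old behaviour (return right after the firmware setup) or the new one
 * (fall through to the shared DTB-loading check).
 */
static void load_kernel_model(struct boot_info_model *info,
                              bool keep_early_return)
{
    assert(info != NULL);
    info->dtb_loaded = false;

    if (!info->kernel_filename || info->firmware_loaded) {
        /* arm_setup_firmware_boot() would run here */
        if (keep_early_return) {
            return; /* old code: the DTB check below is skipped */
        }
    } else {
        /* arm_setup_direct_kernel_boot() would run here */
    }

    if (!info->skip_dtb_autoload && info->have_dtb) {
        info->dtb_loaded = true; /* stands in for arm_load_dtb() */
    }
}

/* Firmware-only boot with a DTB present: was the DTB loaded? */
static bool firmware_boot_loads_dtb(bool skip_dtb_autoload,
                                    bool keep_early_return)
{
    struct boot_info_model info = {
        .kernel_filename = NULL,
        .firmware_loaded = true,
        .skip_dtb_autoload = skip_dtb_autoload,
        .have_dtb = true,
    };

    load_kernel_model(&info, keep_early_return);
    return info.dtb_loaded;
}
```

In this model a board that leaves skip_dtb_autoload false only gets its
DTB loaded on the firmware path once the early return is gone, while a
board that sets the flag (as virt does) sees no change.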

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20190131112240.8395-6-peter.maydell@linaro.org
---
 hw/arm/boot.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     /* Load the kernel.  */
     if (!info->kernel_filename || info->firmware_loaded) {
         arm_setup_firmware_boot(cpu, info);
-        return;
     } else {
         arm_setup_direct_kernel_boot(cpu, info);
     }
-- 
2.20.1

The {IOE, DZE, OFE, UFE, IXE, IDE} bits in the FPSCR/FPCR are for
enabling trapped IEEE floating point exceptions (where IEEE exception
conditions cause a CPU exception rather than updating the FPSR status
bits). QEMU doesn't implement this (and nor does the hardware we're
modelling), but for implementations which don't implement trapped
exception handling these control bits are supposed to be RAZ/WI.
This allows guest code to test for whether the feature is present
by trying to write to the bit and checking whether it sticks.

QEMU is incorrectly making these bits read as written. Make them
RAZ/WI as the architecture requires.

In particular this was causing problems for the NetBSD automatic
test suite.
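
The "write the bit and see whether it sticks" probe mentioned above can
be sketched as follows. This is a minimal model, not the QEMU helper:
fpcr_write() and trap_enable_sticks() are hypothetical names, and the
model only applies the trap-enable mask added by this patch, ignoring
every other field of the register. The bit positions are the ones
defined in the patch (written here as unsigned constants).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trap-enable bit positions, matching the defines added to cpu.h. */
#define FPCR_IOE (1u << 8)   /* Invalid Operation */
#define FPCR_DZE (1u << 9)   /* Divide by Zero */
#define FPCR_OFE (1u << 10)  /* Overflow */
#define FPCR_UFE (1u << 11)  /* Underflow */
#define FPCR_IXE (1u << 12)  /* Inexact */
#define FPCR_IDE (1u << 15)  /* Input Denormal */
#define FPCR_FZ  (1u << 24)  /* Flush-to-zero, an ordinary writable bit */

#define FPCR_TRAP_ENABLES \
    (FPCR_IDE | FPCR_IXE | FPCR_UFE | FPCR_OFE | FPCR_DZE | FPCR_IOE)

/*
 * Model of a register write when trapped exceptions are not
 * implemented: the trap-enable bits are write-ignored, so the value
 * that is stored (and later read back) has them clear.
 */
static uint32_t fpcr_write(uint32_t val)
{
    return val & ~FPCR_TRAP_ENABLES;
}

/*
 * Guest-style feature probe: set one bit on top of the current value
 * and report whether it is still set in the value read back.
 */
static bool trap_enable_sticks(uint32_t current, uint32_t bit)
{
    assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit */
    return (fpcr_write(current | bit) & bit) != 0;
}
```

With the mask applied, a probe of any trap-enable bit reports "not
implemented", whereas a normal control bit such as FZ still reads back
as written.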

Reported-by: Martin Husemann <martin@netbsd.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190131130700.28392-1-peter.maydell@linaro.org
---
 target/arm/cpu.h    | 6 ++++++
 target/arm/helper.c | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
 #define FPSR_MASK 0xf800009f
 #define FPCR_MASK 0x07ff9f00
 
+#define FPCR_IOE    (1 << 8)    /* Invalid Operation exception trap enable */
+#define FPCR_DZE    (1 << 9)    /* Divide by Zero exception trap enable */
+#define FPCR_OFE    (1 << 10)   /* Overflow exception trap enable */
+#define FPCR_UFE    (1 << 11)   /* Underflow exception trap enable */
+#define FPCR_IXE    (1 << 12)   /* Inexact exception trap enable */
+#define FPCR_IDE    (1 << 15)   /* Input Denormal exception trap enable */
 #define FPCR_FZ16   (1 << 19)   /* ARMv8.2+, FP16 flush-to-zero */
 #define FPCR_FZ     (1 << 24)   /* Flush-to-zero enable bit */
 #define FPCR_DN     (1 << 25)   /* Default NaN enable bit */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
         val &= ~FPCR_FZ16;
     }
 
+    /*
+     * We don't implement trapped exception handling, so the
+     * trap enable bits are all RAZ/WI (not RES0!)
+     */
+    val &= ~(FPCR_IDE | FPCR_IXE | FPCR_UFE | FPCR_OFE | FPCR_DZE | FPCR_IOE);
+
     changed = env->vfp.xregs[ARM_VFP_FPSCR];
     env->vfp.xregs[ARM_VFP_FPSCR] = (val & 0xffc8ffff);
     env->vfp.vec_len = (val >> 16) & 7;
-- 
2.20.1

Most of this is the Neon decodetree patches, followed by Edgar's versal cleanups.

thanks
-- PMM

The following changes since commit 2ef486e76d64436be90f7359a3071fb2a56ce835:

  Merge remote-tracking branch 'remotes/marcel/tags/rdma-pull-request' into staging (2020-05-03 14:12:56 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200504

for you to fetch changes up to 9aefc6cf9b73f66062d2f914a0136756e7a28211:

  target/arm: Move gen_ function typedefs to translate.h (2020-05-04 12:59:26 +0100)

----------------------------------------------------------------
target-arm queue:
 * Start of conversion of Neon insns to decodetree
 * versal board: support SD and RTC
 * Implement ARMv8.2-TTS2UXN
 * Make VQDMULL undefined when U=1
 * Some minor code cleanups

----------------------------------------------------------------
Edgar E. Iglesias (11):
      hw/arm: versal: Remove inclusion of arm_gicv3_common.h
      hw/arm: versal: Move misplaced comment
      hw/arm: versal-virt: Fix typo xlnx-ve -> xlnx-versal
      hw/arm: versal: Embed the UARTs into the SoC type
      hw/arm: versal: Embed the GEMs into the SoC type
      hw/arm: versal: Embed the ADMAs into the SoC type
      hw/arm: versal: Embed the APUs into the SoC type
      hw/arm: versal: Add support for SD
      hw/arm: versal: Add support for the RTC
      hw/arm: versal-virt: Add support for SD
      hw/arm: versal-virt: Add support for the RTC

Fredrik Strupe (1):
      target/arm: Make VQDMULL undefined when U=1

Peter Maydell (25):
      target/arm: Don't use a TLB for ARMMMUIdx_Stage2
      target/arm: Use enum constant in get_phys_addr_lpae() call
      target/arm: Add new 's1_is_el0' argument to get_phys_addr_lpae()
      target/arm: Implement ARMv8.2-TTS2UXN
      target/arm: Use correct variable for setting 'max' cpu's ID_AA64DFR0
      target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
      target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
      target/arm: Add stubs for AArch32 Neon decodetree
      target/arm: Convert VCMLA (vector) to decodetree
      target/arm: Convert VCADD (vector) to decodetree
      target/arm: Convert V[US]DOT (vector) to decodetree
      target/arm: Convert VFM[AS]L (vector) to decodetree
      target/arm: Convert VCMLA (scalar) to decodetree
      target/arm: Convert V[US]DOT (scalar) to decodetree
      target/arm: Convert VFM[AS]L (scalar) to decodetree
      target/arm: Convert Neon load/store multiple structures to decodetree
      target/arm: Convert Neon 'load single structure to all lanes' to decodetree
      target/arm: Convert Neon 'load/store single structure' to decodetree
      target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
      target/arm: Convert Neon 3-reg-same logic ops to decodetree
      target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
      target/arm: Convert Neon 3-reg-same comparisons to decodetree
      target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
      target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
      target/arm: Move gen_ function typedefs to translate.h

Philippe Mathieu-Daudé (2):
      hw/arm/mps2-tz: Use TYPE_IOTKIT instead of hardcoded string
      target/arm: Use uint64_t for midr field in CPU state struct

 include/hw/arm/xlnx-versal.h    |  31 +-
 target/arm/cpu-param.h          |   2 +-
 target/arm/cpu.h                |  38 ++-
 target/arm/translate-a64.h      |   9 -
 target/arm/translate.h          |  26 ++
 target/arm/neon-dp.decode       |  86 +++++
 target/arm/neon-ls.decode       |  52 +++
 target/arm/neon-shared.decode   |  66 ++++
 hw/arm/mps2-tz.c                |   2 +-
 hw/arm/xlnx-versal-virt.c       |  74 ++++-
 hw/arm/xlnx-versal.c            | 115 +++++--
 target/arm/cpu.c                |   3 +-
 target/arm/cpu64.c              |   8 +-
 target/arm/helper.c             | 183 ++++------
 target/arm/translate-a64.c      |  17 -
 target/arm/translate-neon.inc.c | 714 +++++++++++++++++++++++++++++++++++++++
 target/arm/translate-vfp.inc.c  |   6 -
 target/arm/translate.c          | 716 +++-------------------------------------
 target/arm/Makefile.objs        |  18 +
 19 files changed, 1302 insertions(+), 864 deletions(-)
 create mode 100644 target/arm/neon-dp.decode
 create mode 100644 target/arm/neon-ls.decode
 create mode 100644 target/arm/neon-shared.decode
 create mode 100644 target/arm/translate-neon.inc.c

From: Fredrik Strupe <fredrik@strupe.net>

According to the Arm ARM, VQDMULL is only valid when U=0; the U=1
encoding is unallocated.
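
To make the one-digit change in the table easier to read, here is a
sketch of how the last column appears to be interpreted. It assumes the
column is a bitmask in which bits 0 to 2 mean "UNDEF if size is 0, 1 or
2" and bit 3 means "UNDEF if U is 1". That reading is an assumption that
happens to agree with the neighbouring rows (7 for "always UNDEF", 9 for
VQDMLSL); the macro and function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed meaning of the mask bits in the table's last column. */
#define UNDEF_IF_SIZE0 (1 << 0)
#define UNDEF_IF_SIZE1 (1 << 1)
#define UNDEF_IF_SIZE2 (1 << 2)
#define UNDEF_IF_U1    (1 << 3)

/* Return true if the encoding should be treated as UNDEF. */
static bool neon_3reg_wide_is_undef(int undefreq, int size, bool u)
{
    assert(size >= 0 && size <= 2);
    return (undefreq & (1 << size)) != 0 ||
           ((undefreq & UNDEF_IF_U1) != 0 && u);
}
```

Under that reading the old value 1 rejected only size 0, so a U=1
encoding with size 1 or 2 was accepted; the new value 9 rejects it.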

Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
Fixes: 695272dcb976 ("target-arm: Handle UNDEF cases for Neon 3-regs-different-widths")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 0}, /* VMLSL */
                     {0, 0, 0, 9}, /* VQDMLSL */
                     {0, 0, 0, 0}, /* Integer VMULL */
-                    {0, 0, 0, 1}, /* VQDMULL */
+                    {0, 0, 0, 9}, /* VQDMULL */
                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

By using the TYPE_* definitions for devices, we can:
 - quickly find where devices are used with 'git-grep'
 - easily rename a device (one-line change).

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200428154650.21991-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/mps2-tz.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
         exit(EXIT_FAILURE);
     }
 
-    sysbus_init_child_obj(OBJECT(machine), "iotkit", &mms->iotkit,
+    sysbus_init_child_obj(OBJECT(machine), TYPE_IOTKIT, &mms->iotkit,
                           sizeof(mms->iotkit), mmc->armsse_type);
     iotkitdev = DEVICE(&mms->iotkit);
     object_property_set_link(OBJECT(&mms->iotkit), OBJECT(system_memory),
-- 
2.20.1

We define ARMMMUIdx_Stage2 as being an MMU index which uses a QEMU
TLB.  However we never actually use the TLB -- all stage 2 lookups
are done by direct calls to get_phys_addr_lpae() followed by a
physical address load via address_space_ld*().

Remove Stage2 from the list of ARM MMU indexes which correspond to
real core MMU indexes, and instead put it in the set of "NOTLB" ARM
MMU indexes.

This allows us to drop NB_MMU_MODES to 11.  It also means we can
safely add support for the ARMv8.2-TTS2UXN extension, which adds
permission bits to the stage 2 descriptors which define execute
permission separately for EL0 and EL1; supporting that while keeping
Stage2 in a QEMU TLB would require us to use separate TLBs for
"Stage2 for an EL0 access" and "Stage2 for an EL1 access", which is a
lot of extra complication given we aren't even using the QEMU TLB.
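
The effect of the renumbering can be sketched with a small model of the
index encoding. The flag values and the MODEL_* names here are
assumptions made for the sketch (the patch does not show the real
ARM_MMU_IDX_* values); only the numbers 10, 11 and 3 are taken from the
diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag layout: low bits are the core TLB index. */
#define MODEL_IDX_A        0x10 /* A-profile index backed by a core TLB */
#define MODEL_IDX_NOTLB    0x20 /* index with no core TLB behind it */
#define MODEL_COREIDX_MASK 0x0f

#define MODEL_NB_MMU_MODES 11   /* new value from the patch */

enum model_mmu_idx {
    MODEL_SE3        = 10 | MODEL_IDX_A,    /* highest TLB-backed index */
    MODEL_OLD_STAGE2 = 11 | MODEL_IDX_A,    /* before the patch */
    MODEL_STAGE2     = 3 | MODEL_IDX_NOTLB, /* after the patch */
};

/* Does this index correspond to a real core TLB? */
static bool model_has_core_tlb(int idx)
{
    return (idx & MODEL_IDX_NOTLB) == 0;
}

/* Core TLB index for a TLB-backed index. */
static int model_core_idx(int idx)
{
    assert(model_has_core_tlb(idx));
    return idx & MODEL_COREIDX_MASK;
}
```

Once Stage2 moves into the NOTLB group, the largest core index in use is
10, which is why 11 core MMU modes are enough.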

In the process of updating the comment on our MMU index use,
fix a couple of other minor errors:
 * NS EL2 EL2&0 was missing from the list in the comment
 * some text hadn't been updated from when we bumped NB_MMU_MODES
   above 8

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-2-peter.maydell@linaro.org
---
 target/arm/cpu-param.h |   2 +-
 target/arm/cpu.h       |  21 +++++---
 target/arm/helper.c    | 112 ++++-------------------------------------
 3 files changed, 27 insertions(+), 108 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -XXX,XX +XXX,XX @@
 # define TARGET_PAGE_BITS_MIN  10
 #endif
 
-#define NB_MMU_MODES 12
+#define NB_MMU_MODES 11
 
 #endif
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  *     handling via the TLB. The only way to do a stage 1 translation without
  *     the immediate stage 2 translation is via the ATS or AT system insns,
  *     which can be slow-pathed and always do a page table walk.
+ *     The only use of stage 2 translations is either as part of an s1+2
+ *     lookup or when loading the descriptors during a stage 1 page table walk,
+ *     and in both those cases we don't use the TLB.
  *  4. we can also safely fold together the "32 bit EL3" and "64 bit EL3"
  *     translation regimes, because they map reasonably well to each other
  *     and they can't both be active at the same time.
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  * NS EL1 EL1&0 stage 1+2 (aka NS PL1)
  * NS EL1 EL1&0 stage 1+2 +PAN
  * NS EL0 EL2&0
+ * NS EL2 EL2&0
  * NS EL2 EL2&0 +PAN
  * NS EL2 (aka NS PL2)
  * S EL0 EL1&0 (aka S PL0)
  * S EL1 EL1&0 (not used if EL3 is 32 bit)
  * S EL1 EL1&0 +PAN
  * S EL3 (aka S PL1)
- * NS EL1&0 stage 2
  *
- * for a total of 12 different mmu_idx.
+ * for a total of 11 different mmu_idx.
  *
  * R profile CPUs have an MPU, but can use the same set of MMU indexes
  * as A profile. They only need to distinguish NS EL0 and NS EL1 (and
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  * are not quite the same -- different CPU types (most notably M profile
  * vs A/R profile) would like to use MMU indexes with different semantics,
  * but since we don't ever need to use all of those in a single CPU we
- * can avoid setting NB_MMU_MODES to more than 8. The lower bits of
+ * can avoid having to set NB_MMU_MODES to "total number of A profile MMU
+ * modes + total number of M profile MMU modes". The lower bits of
  * ARMMMUIdx are the core TLB mmu index, and the higher bits are always
  * the same for any particular CPU.
  * Variables of type ARMMUIdx are always full values, and the core
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
     ARMMMUIdx_SE10_1_PAN = 9 | ARM_MMU_IDX_A,
     ARMMMUIdx_SE3        = 10 | ARM_MMU_IDX_A,
 
-    ARMMMUIdx_Stage2     = 11 | ARM_MMU_IDX_A,
-
     /*
      * These are not allocated TLBs and are used only for AT system
      * instructions or for the first stage of an S12 page table walk.
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
     ARMMMUIdx_Stage1_E0 = 0 | ARM_MMU_IDX_NOTLB,
     ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
     ARMMMUIdx_Stage1_E1_PAN = 2 | ARM_MMU_IDX_NOTLB,
+    /*
+     * Not allocated a TLB: used only for second stage of an S12 page
+     * table walk, or for descriptor loads during first stage of an S1
+     * page table walk. Note that if we ever want to have a TLB for this
+     * then various TLB flush insns which currently are no-ops or flush
+     * only stage 1 MMU indexes will need to change to flush stage 2.
+     */
+    ARMMMUIdx_Stage2     = 3 | ARM_MMU_IDX_NOTLB,
 
     /*
      * M-profile.
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
     TO_CORE_BIT(SE10_1),
     TO_CORE_BIT(SE10_1_PAN),
     TO_CORE_BIT(SE3),
-    TO_CORE_BIT(Stage2),
 
     TO_CORE_BIT(MUser),
     TO_CORE_BIT(MPriv),
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri,
     tlb_flush_by_mmuidx(cs,
                         ARMMMUIdxBit_E10_1 |
                         ARMMMUIdxBit_E10_1_PAN |
-                        ARMMMUIdxBit_E10_0 |
-                        ARMMMUIdxBit_Stage2);
+                        ARMMMUIdxBit_E10_0);
 }
 
 static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
     tlb_flush_by_mmuidx_all_cpus_synced(cs,
                                         ARMMMUIdxBit_E10_1 |
                                         ARMMMUIdxBit_E10_1_PAN |
-                                        ARMMMUIdxBit_E10_0 |
-                                        ARMMMUIdxBit_Stage2);
+                                        ARMMMUIdxBit_E10_0);
 }
 
-static void tlbiipas2_write(CPUARMState *env, const ARMCPRegInfo *ri,
-                            uint64_t value)
-{
-    /* Invalidate by IPA. This has to invalidate any structures that
-     * contain only stage 2 translation information, but does not need
-     * to apply to structures that contain combined stage 1 and stage 2
-     * translation information.
-     * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
-     */
-    CPUState *cs = env_cpu(env);
-    uint64_t pageaddr;
-
-    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
-        return;
-    }
-
-    pageaddr = sextract64(value << 12, 0, 40);
-
-    tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
-}
-
-static void tlbiipas2_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
-                               uint64_t value)
-{
-    CPUState *cs = env_cpu(env);
-    uint64_t pageaddr;
-
-    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
-        return;
-    }
-
-    pageaddr = sextract64(value << 12, 0, 40);
-
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
-                                             ARMMMUIdxBit_Stage2);
-}
 
 static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
                               uint64_t value)
@@ -XXX,XX +XXX,XX @@ static void vttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
         tlb_flush_by_mmuidx(cs,
                             ARMMMUIdxBit_E10_1 |
                             ARMMMUIdxBit_E10_1_PAN |
-                            ARMMMUIdxBit_E10_0 |
-                            ARMMMUIdxBit_Stage2);
+                            ARMMMUIdxBit_E10_0);
         raw_write(env, ri, value);
     }
 }
@@ -XXX,XX +XXX,XX @@ static int alle1_tlbmask(CPUARMState *env)
         return ARMMMUIdxBit_SE10_1 |
                ARMMMUIdxBit_SE10_1_PAN |
                ARMMMUIdxBit_SE10_0;
-    } else if (arm_feature(env, ARM_FEATURE_EL2)) {
-        return ARMMMUIdxBit_E10_1 |
-               ARMMMUIdxBit_E10_1_PAN |
-               ARMMMUIdxBit_E10_0 |
-               ARMMMUIdxBit_Stage2;
     } else {
         return ARMMMUIdxBit_E10_1 |
                ARMMMUIdxBit_E10_1_PAN |
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                              ARMMMUIdxBit_SE3);
 }
 
-static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri,
-                                    uint64_t value)
-{
-    /* Invalidate by IPA. This has to invalidate any structures that
-     * contain only stage 2 translation information, but does not need
-     * to apply to structures that contain combined stage 1 and stage 2
-     * translation information.
-     * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
-     */
-    ARMCPU *cpu = env_archcpu(env);
-    CPUState *cs = CPU(cpu);
-    uint64_t pageaddr;
-
-    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
-        return;
-    }
-
-    pageaddr = sextract64(value << 12, 0, 48);
-
-    tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
-}
-
-static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
-                                      uint64_t value)
-{
-    CPUState *cs = env_cpu(env);
-    uint64_t pageaddr;
-
-    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
-        return;
-    }
-
-    pageaddr = sextract64(value << 12, 0, 48);
-
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
-                                             ARMMMUIdxBit_Stage2);
-}
-
 static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
                                       bool isread)
 {
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
-      .writefn = tlbi_aa64_ipas2e1is_write },
+      .access = PL2_W, .type = ARM_CP_NOP },
     { .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
-      .writefn = tlbi_aa64_ipas2e1is_write },
+      .access = PL2_W, .type = ARM_CP_NOP },
     { .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
       .access = PL2_W, .type = ARM_CP_NO_RAW,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbi_aa64_alle1is_write },
     { .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
-      .writefn = tlbi_aa64_ipas2e1_write },
+      .access = PL2_W, .type = ARM_CP_NOP },
     { .name = "TLBI_IPAS2LE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
-      .writefn = tlbi_aa64_ipas2e1_write },
+      .access = PL2_W, .type = ARM_CP_NOP },
     { .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
       .access = PL2_W, .type = ARM_CP_NO_RAW,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbimva_hyp_is_write },
     { .name = "TLBIIPAS2",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-      .writefn = tlbiipas2_write },
+      .type = ARM_CP_NOP, .access = PL2_W },
     { .name = "TLBIIPAS2IS",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-      .writefn = tlbiipas2_is_write },
+      .type = ARM_CP_NOP, .access = PL2_W },
     { .name = "TLBIIPAS2L",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-      .writefn = tlbiipas2_write },
+      .type = ARM_CP_NOP, .access = PL2_W },
     { .name = "TLBIIPAS2LIS",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-      .writefn = tlbiipas2_is_write },
+      .type = ARM_CP_NOP, .access = PL2_W },
     /* 32 bit cache operations */
     { .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
-- 
2.20.1

The access_type argument to get_phys_addr_lpae() is an MMUAccessType;
use the enum constant MMU_DATA_LOAD rather than a literal 0 when we
call it in S1_ptw_translate().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-3-peter.maydell@linaro.org
---
 target/arm/helper.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
             pcacheattrs = &cacheattrs;
         }
 
-        ret = get_phys_addr_lpae(env, addr, 0, ARMMMUIdx_Stage2, &s2pa,
-                                 &txattrs, &s2prot, &s2size, fi, pcacheattrs);
+        ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
+                                 &s2pa, &txattrs, &s2prot, &s2size, fi,
+                                 pcacheattrs);
         if (ret) {
             assert(fi->type != ARMFault_None);
             fi->s2addr = addr;
-- 
2.20.1

For ARMv8.2-TTS2UXN, the stage 2 page table walk needs to know
whether the stage 1 access is for EL0 or not, because execute
permission can depend on whether this is an EL0 or an EL1 access.
Add a new argument to get_phys_addr_lpae() so the call sites can
pass this information in.

Since get_phys_addr_lpae() doesn't already have a doc comment,
add one so we have a place to put the documentation of the
semantics of the new s1_is_el0 argument.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-4-peter.maydell@linaro.org
---
 target/arm/helper.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
 
 static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                               bool s1_is_el0,
                                hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
                                target_ulong *page_size_ptr,
                                ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs);
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
         }
 
         ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
+                                 false,
                                  &s2pa, &txattrs, &s2prot, &s2size, fi,
                                  pcacheattrs);
         if (ret) {
@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
     };
 }
 
+/**
+ * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
+ *
+ * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
+ * prot and page_size may not be filled in, and the populated fsr value provides
+ * information on why the translation aborted, in the format of a long-format
+ * DFSR/IFSR fault register, with the following caveats:
+ *  * the WnR bit is never set (the caller must do this).
+ *
+ * @env: CPUARMState
+ * @address: virtual address to get physical address for
+ * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
+ * @mmu_idx: MMU index indicating required translation regime
+ * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page table
+ *             walk), must be true if this is stage 2 of a stage 1+2 walk for an
+ *             EL0 access. If @mmu_idx is anything else, @s1_is_el0 is ignored.
+ * @phys_ptr: set to the physical address corresponding to the virtual address
+ * @txattrs: set to the memory transaction attributes to use
+ * @prot: set to the permissions for the page containing phys_ptr
+ * @page_size_ptr: set to the size of the page containing phys_ptr
+ * @fi: set to fault info if the translation fails
+ * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
+ */
 static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                               bool s1_is_el0,
                                hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
                                target_ulong *page_size_ptr,
                                ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 
             /* S1 is done. Now do S2 translation.  */
             ret = get_phys_addr_lpae(env, ipa, access_type, ARMMMUIdx_Stage2,
+                                     mmu_idx == ARMMMUIdx_E10_0,
                                      phys_ptr, attrs, &s2_prot,
                                      page_size, fi,
                                      cacheattrs != NULL ? &cacheattrs2 : NULL);
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
     }
 
     if (regime_using_lpae_format(env, mmu_idx)) {
-        return get_phys_addr_lpae(env, address, access_type, mmu_idx,
+        return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
                                   phys_ptr, attrs, prot, page_size,
                                   fi, cacheattrs);
     } else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
-- 
2.20.1

The ARMv8.2-TTS2UXN feature extends the XN field in stage 2
translation table descriptors from just bit [54] to bits [54:53],
allowing stage 2 to control execution permissions separately for EL0
and EL1. Implement the new semantics of the XN field and enable
the feature for our 'max' CPU.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-5-peter.maydell@linaro.org
---
 target/arm/cpu.h    | 15 +++++++++++++++
 target/arm/cpu.c    |  1 +
 target/arm/cpu64.c  |  2 ++
 target/arm/helper.c | 37 +++++++++++++++++++++++++++++++------
 4 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_ccidx(const ARMISARegisters *id)
     return FIELD_EX32(id->id_mmfr4, ID_MMFR4, CCIDX) != 0;
 }
 
+static inline bool isar_feature_aa32_tts2uxn(const ARMISARegisters *id)
+{
+    return FIELD_EX32(id->id_mmfr4, ID_MMFR4, XNX) != 0;
+}
+
 /*
  * 64-bit feature tests via id registers.
  */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
 }
 
+static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
+}
+
 /*
  * Feature tests for "does this exist in either 32-bit or 64-bit?"
  */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_ccidx(const ARMISARegisters *id)
     return isar_feature_aa64_ccidx(id) || isar_feature_aa32_ccidx(id);
 }
 
+static inline bool isar_feature_any_tts2uxn(const ARMISARegisters *id)
+{
+    return isar_feature_aa64_tts2uxn(id) || isar_feature_aa32_tts2uxn(id);
+}
+
 /*
  * Forward to the above feature tests given an ARMCPU pointer.
  */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
             t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
             t = FIELD_DP32(t, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
             t = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
+            t = FIELD_DP32(t, ID_MMFR4, XNX, 1); /* TTS2UXN */
             cpu->isar.id_mmfr4 = t;
         }
 #endif
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);
         t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 2); /* ATS1E1 */
         t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* VMID16 */
+        t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1); /* TTS2UXN */
         cpu->isar.id_aa64mmfr1 = t;
 
         t = cpu->isar.id_aa64mmfr2;
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_MMFR4, HPDS, 1); /* AA32HPD */
         u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
         u = FIELD_DP32(u, ID_MMFR4, CNP, 1); /* TTCNP */
+        u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
         cpu->isar.id_mmfr4 = u;
 
         u = cpu->isar.id_aa64dfr0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
  *
  * @env:     CPUARMState
  * @s2ap:    The 2-bit stage2 access permissions (S2AP)
- * @xn:      XN (execute-never) bit
+ * @xn:      XN (execute-never) bits
+ * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
  */
-static int get_S2prot(CPUARMState *env, int s2ap, int xn)
+static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
 {
     int prot = 0;
 
@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn)
     if (s2ap & 2) {
         prot |= PAGE_WRITE;
     }
-    if (!xn) {
-        if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
+
+    if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
+        switch (xn) {
+        case 0:
             prot |= PAGE_EXEC;
+            break;
+        case 1:
+            if (s1_is_el0) {
+                prot |= PAGE_EXEC;
+            }
+            break;
+        case 2:
+            break;
+        case 3:
+            if (!s1_is_el0) {
+                prot |= PAGE_EXEC;
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    } else {
+        if (!extract32(xn, 1, 1)) {
+            if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
+                prot |= PAGE_EXEC;
+            }
         }
     }
     return prot;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
     }
 
     ap = extract32(attrs, 4, 2);
-    xn = extract32(attrs, 12, 1);
 
     if (mmu_idx == ARMMMUIdx_Stage2) {
         ns = true;
-        *prot = get_S2prot(env, ap, xn);
+        xn = extract32(attrs, 11, 2);
+        *prot = get_S2prot(env, ap, xn, s1_is_el0);
     } else {
         ns = extract32(attrs, 3, 1);
+        xn = extract32(attrs, 12, 1);
         pxn = extract32(attrs, 11, 1);
         *prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
     }
-- 
2.20.1
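
The XN decoding implemented by the patch above can be restated as a
standalone sketch. This is an illustration only, not the QEMU code:
bits32() and s2_exec_allowed() are hypothetical names, the feature test
is passed in as a plain bool, and the AArch32 EL2 special case in the
legacy path (execute permission also requiring read permission) is
deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a bitfield extract: bits [start+length-1:start]. */
static uint32_t bits32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length < 32 && start + length <= 32);
    return (value >> start) & ((1u << length) - 1);
}

/*
 * Return true if a stage 2 descriptor with the given 2-bit XN field
 * permits execution. With TTS2UXN, XN[1:0] is decoded as in the patch:
 *   0: executable for both EL1 and EL0
 *   1: executable for EL0 only
 *   2: never executable
 *   3: executable for EL1 only
 * Without the feature only XN[1] (descriptor bit [54]) is honoured.
 */
static bool s2_exec_allowed(bool have_tts2uxn, uint32_t xn, bool s1_is_el0)
{
    assert(xn <= 3);
    if (!have_tts2uxn) {
        return bits32(xn, 1, 1) == 0;
    }
    switch (xn) {
    case 0:
        return true;
    case 1:
        return s1_is_el0;
    case 2:
        return false;
    default: /* xn == 3 */
        return !s1_is_el0;
    }
}
```

Note that the encoding is not a simple pair of per-EL deny bits: value 1
removes execute permission for EL1 only, while value 3 removes it for
EL0 only.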

In aarch64_max_initfn() we update both 32-bit and 64-bit ID
registers.  The intended pattern is that for 64-bit ID registers we
use FIELD_DP64 and the uint64_t 't' register, while 32-bit ID
registers use FIELD_DP32 and the uint32_t 'u' register.  For
ID_AA64DFR0 we accidentally used 'u', meaning that the top 32 bits of
this 64-bit ID register would end up always zero.  Luckily at the
moment that's what they should be anyway, so this bug has no visible
effects.

Use the right-sized variable.

Fixes: 3bec78447a958d481991
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200423110915.10527-1-peter.maydell@linaro.org
---
 target/arm/cpu64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
         cpu->isar.id_mmfr4 = u;
 
-        u = cpu->isar.id_aa64dfr0;
-        u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
-        cpu->isar.id_aa64dfr0 = u;
+        t = cpu->isar.id_aa64dfr0;
+        t = FIELD_DP64(t, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
+        cpu->isar.id_aa64dfr0 = t;
 
         u = cpu->isar.id_dfr0;
         u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
-- 
2.20.1
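
The effect of the wrong-sized temporary fixed above can be reproduced in
isolation. The sketch below is an assumption-laden illustration:
deposit_field64() is a hypothetical helper rather than QEMU's
FIELD_DP64 macro, and the register value used in the usage note is
invented to make the truncation visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write 'val' into the field [shift+len-1:shift]. */
static uint64_t deposit_field64(uint64_t reg, unsigned shift, unsigned len,
                                uint64_t val)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    uint64_t mask = ((UINT64_C(1) << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Buggy pattern: the 64-bit register round-trips through a uint32_t. */
static uint64_t update_via_u32(uint64_t reg, unsigned shift, unsigned len,
                               uint64_t val)
{
    uint32_t u = (uint32_t)reg;            /* top 32 bits are dropped here */
    u = (uint32_t)deposit_field64(u, shift, len, val);
    return u;                              /* zero-extended on the way back */
}

/* Fixed pattern: the temporary matches the register width. */
static uint64_t update_via_u64(uint64_t reg, unsigned shift, unsigned len,
                               uint64_t val)
{
    uint64_t t = reg;
    t = deposit_field64(t, shift, len, val);
    return t;
}
```

With a register value that has bit 32 set, for example
0x0000000100000006, writing 5 into a 4-bit field at bit 8 yields
0x0000000100000506 through the 64-bit temporary but 0x506 through the
32-bit one. As the commit message notes, the upper half of ID_AA64DFR0
is currently zero, which is why the bug had no visible effect.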

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

MIDR_EL1 is a 64-bit system register with the top 32 bits being RES0.
Represent it in QEMU's ARMCPU struct with a uint64_t, not a
uint32_t.

This fixes an error when compiling with -Werror=conversion
because we were manipulating the register value using a
local uint64_t variable:

target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
  target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value [-Werror=conversion]
    628 |         cpu->midr = t;
        |                     ^

and future-proofs us against a possible future architecture
change using some of the top 32 bits.

Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Message-id: 20200428172634.29707-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 2 +-
 target/arm/cpu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
         uint64_t id_aa64dfr0;
         uint64_t id_aa64dfr1;
     } isar;
-    uint32_t midr;
+    uint64_t midr;
     uint32_t revidr;
     uint32_t reset_fpsid;
     uint32_t ctr;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
 static Property arm_cpu_properties[] = {
     DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
     DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
-    DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
+    DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
     DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
                         mp_affinity, ARM64_AFFINITY_INVALID),
     DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Move misplaced comment.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-3-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
 
         obj = object_new(XLNX_VERSAL_ACPU_TYPE);
         if (!obj) {
-            /* Secondary CPUs start in PSCI powered-down state */
             error_report("Unable to create apu.cpu[%d] of type %s",
                          i, XLNX_VERSAL_ACPU_TYPE);
             exit(EXIT_FAILURE);
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
         object_property_set_int(obj, s->cfg.psci_conduit,
                                 "psci-conduit", &error_abort);
         if (i) {
+            /* Secondary CPUs start in PSCI powered-down state */
             object_property_set_bool(obj, true,
                                      "start-powered-off", &error_abort);
         }
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Fix typo xlnx-ve -> xlnx-versal.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-4-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal-virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
         psci_conduit = QEMU_PSCI_CONDUIT_SMC;
     }
 
-    sysbus_init_child_obj(OBJECT(machine), "xlnx-ve", &s->soc,
+    sysbus_init_child_obj(OBJECT(machine), "xlnx-versal", &s->soc,
                           sizeof(s->soc), TYPE_XLNX_VERSAL);
     object_property_set_link(OBJECT(&s->soc), OBJECT(machine->ram),
                              "ddr", &error_abort);
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Embed the UARTs into the SoC type.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-5-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  3 ++-
 hw/arm/xlnx-versal.c         | 12 ++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Embed the GEMs into the SoC type.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-6-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  3 ++-
 hw/arm/xlnx-versal.c         | 15 ++++++++-------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/boot.h"
 #include "hw/intc/arm_gicv3.h"
 #include "hw/char/pl011.h"
+#include "hw/net/cadence_gem.h"
 
 #define TYPE_XLNX_VERSAL "xlnx-versal"
 #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 
         struct {
             PL011State uart[XLNX_VERSAL_NR_UARTS];
-            SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
+            CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
             SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
         } iou;
     } lpd;
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_gems(Versal *s, qemu_irq *pic)
         DeviceState *dev;
         MemoryRegion *mr;
 
-        dev = qdev_create(NULL, "cadence_gem");
-        s->lpd.iou.gem[i] = SYS_BUS_DEVICE(dev);
-        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
+        sysbus_init_child_obj(OBJECT(s), name,
+                              &s->lpd.iou.gem[i], sizeof(s->lpd.iou.gem[i]),
+                              TYPE_CADENCE_GEM);
+        dev = DEVICE(&s->lpd.iou.gem[i]);
         if (nd->used) {
             qemu_check_nic_model(nd, "cadence_gem");
             qdev_set_nic_properties(dev, nd);
         }
-        object_property_set_int(OBJECT(s->lpd.iou.gem[i]),
+        object_property_set_int(OBJECT(dev),
                                 2, "num-priority-queues",
                                 &error_abort);
-        object_property_set_link(OBJECT(s->lpd.iou.gem[i]),
+        object_property_set_link(OBJECT(dev),
                                  OBJECT(&s->mr_ps), "dma",
                                  &error_abort);
         qdev_init_nofail(dev);
 
-        mr = sysbus_mmio_get_region(s->lpd.iou.gem[i], 0);
+        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
         memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
 
-        sysbus_connect_irq(s->lpd.iou.gem[i], 0, pic[irqs[i]]);
+        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
         g_free(name);
     }
 }
-- 
2.20.1
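
The "embed into the SoC type" pattern used by these patches replaces an
array of pointers to separately allocated devices with an array of
device structs stored inside the parent, initialised in place. The
sketch below shows the idea in plain C with no QEMU dependencies; every
name in it (ToyDevice, ToySoC, toy_init_child, toy_soc_create_gems) is
hypothetical and only loosely mirrors what sysbus_init_child_obj() is
used for in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_GEMS 2

typedef struct ToyDevice {
    int id;
    int realized;
} ToyDevice;

typedef struct ToySoC {
    /* Embedded by value: storage lives and dies with the parent. */
    ToyDevice gem[NR_GEMS];
} ToySoC;

/* Initialise a child in caller-provided storage instead of allocating it. */
static ToyDevice *toy_init_child(void *storage, size_t size, int id)
{
    assert(size == sizeof(ToyDevice));
    memset(storage, 0, size);
    ToyDevice *dev = storage;
    dev->id = id;
    return dev;
}

static void toy_soc_create_gems(ToySoC *s)
{
    for (int i = 0; i < NR_GEMS; i++) {
        /* Take a local pointer once, then configure through it. */
        ToyDevice *dev = toy_init_child(&s->gem[i], sizeof(s->gem[i]), i);
        dev->realized = 1;
    }
}
```

One consequence, visible in the diff as well, is that callers which
previously used the stored pointer directly now take the address of the
embedded member.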

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Embed the ADMAs into the SoC type.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-7-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  3 ++-
 hw/arm/xlnx-versal.c         | 14 +++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Embed the APUs into the SoC type.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-8-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  2 +-
 hw/arm/xlnx-versal-virt.c    |  4 ++--
 hw/arm/xlnx-versal.c         | 19 +++++--------------
 3 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
     struct {
         struct {
             MemoryRegion mr;
-            ARMCPU *cpu[XLNX_VERSAL_NR_ACPUS];
+            ARMCPU cpu[XLNX_VERSAL_NR_ACPUS];
             GICv3State gic;
         } apu;
     } fpd;
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
     s->binfo.get_dtb = versal_virt_get_dtb;
     s->binfo.modify_dtb = versal_virt_modify_dtb;
     if (machine->kernel_filename) {
-        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
+        arm_load_kernel(&s->soc.fpd.apu.cpu[0], machine, &s->binfo);
     } else {
-        AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
+        AddressSpace *as = arm_boot_address_space(&s->soc.fpd.apu.cpu[0],
                                                   &s->binfo);
         /* Some boot-loaders (e.g u-boot) don't like blobs at address 0 (NULL).
          * Offset things by 4K.  */
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
 
     for (i = 0; i < ARRAY_SIZE(s->fpd.apu.cpu); i++) {
         Object *obj;
-        char *name;
-
-        obj = object_new(XLNX_VERSAL_ACPU_TYPE);
-        if (!obj) {
-            error_report("Unable to create apu.cpu[%d] of type %s",
-                         i, XLNX_VERSAL_ACPU_TYPE);
-            exit(EXIT_FAILURE);
-        }
-
-        name = g_strdup_printf("apu-cpu[%d]", i);
-        object_property_add_child(OBJECT(s), name, obj, &error_fatal);
-        g_free(name);
 
+        object_initialize_child(OBJECT(s), "apu-cpu[*]",
+                                &s->fpd.apu.cpu[i], sizeof(s->fpd.apu.cpu[i]),
+                                XLNX_VERSAL_ACPU_TYPE, &error_abort, NULL);
+        obj = OBJECT(&s->fpd.apu.cpu[i]);
         object_property_set_int(obj, s->cfg.psci_conduit,
                                 "psci-conduit", &error_abort);
         if (i) {
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
         object_property_set_link(obj, OBJECT(&s->fpd.apu.mr), "memory",
                                  &error_abort);
         object_property_set_bool(obj, true, "realized", &error_fatal);
-        s->fpd.apu.cpu[i] = ARM_CPU(obj);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_gic(Versal *s, qemu_irq *pic)
     }
 
     for (i = 0; i < nr_apu_cpus; i++) {
-        DeviceState *cpudev = DEVICE(s->fpd.apu.cpu[i]);
+        DeviceState *cpudev = DEVICE(&s->fpd.apu.cpu[i]);
         int ppibase = XLNX_VERSAL_NR_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS;
         qemu_irq maint_irq;
         int ti;
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Add support for SD.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-9-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h | 12 ++++++++++++
 hw/arm/xlnx-versal.c         | 31 +++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@
 
 #include "hw/sysbus.h"
 #include "hw/arm/boot.h"
+#include "hw/sd/sdhci.h"
 #include "hw/intc/arm_gicv3.h"
 #include "hw/char/pl011.h"
 #include "hw/dma/xlnx-zdma.h"
@@ -XXX,XX +XXX,XX @@
 #define XLNX_VERSAL_NR_UARTS   2
 #define XLNX_VERSAL_NR_GEMS    2
 #define XLNX_VERSAL_NR_ADMAS   8
+#define XLNX_VERSAL_NR_SDS     2
 #define XLNX_VERSAL_NR_IRQS    192
 
 typedef struct Versal {
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
         } iou;
     } lpd;
 
+    /* The Platform Management Controller subsystem.  */
+    struct {
+        struct {
+            SDHCIState sd[XLNX_VERSAL_NR_SDS];
+        } iou;
+    } pmc;
+
     struct {
         MemoryRegion *mr_ddr;
         uint32_t psci_conduit;
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define VERSAL_GEM1_IRQ_0          58
 #define VERSAL_GEM1_WAKE_IRQ_0     59
 #define VERSAL_ADMA_IRQ_0          60
+#define VERSAL_SD0_IRQ_0           126
 
 /* Architecturally reserved IRQs suitable for virtualization.  */
 #define VERSAL_RSVD_IRQ_FIRST 111
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define MM_FPD_CRF                  0xfd1a0000U
 #define MM_FPD_CRF_SIZE             0x140000
 
+#define MM_PMC_SD0                  0xf1040000U
+#define MM_PMC_SD0_SIZE             0x10000
 #define MM_PMC_CRP                  0xf1260000U
 #define MM_PMC_CRP_SIZE             0x10000
 #endif
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
     }
 }
 
+#define SDHCI_CAPABILITIES  0x280737ec6481 /* Same as on ZynqMP.  */
+static void versal_create_sds(Versal *s, qemu_irq *pic)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(s->pmc.iou.sd); i++) {
+        DeviceState *dev;
+        MemoryRegion *mr;
+
+        sysbus_init_child_obj(OBJECT(s), "sd[*]",
+                              &s->pmc.iou.sd[i], sizeof(s->pmc.iou.sd[i]),
+                              TYPE_SYSBUS_SDHCI);
+        dev = DEVICE(&s->pmc.iou.sd[i]);
+
+        object_property_set_uint(OBJECT(dev),
+                                 3, "sd-spec-version", &error_fatal);
+        object_property_set_uint(OBJECT(dev), SDHCI_CAPABILITIES, "capareg",
+                                 &error_fatal);
+        object_property_set_uint(OBJECT(dev), UHS_I, "uhs", &error_fatal);
+        qdev_init_nofail(dev);
+
+        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
+        memory_region_add_subregion(&s->mr_ps,
+                                    MM_PMC_SD0 + i * MM_PMC_SD0_SIZE, mr);
+
+        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0,
+                           pic[VERSAL_SD0_IRQ_0 + i * 2]);
+    }
+}
+
 /* This takes the board allocated linear DDR memory and creates aliases
  * for each split DDR range/aperture on the Versal address map.
  */
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
     versal_create_uarts(s, pic);
     versal_create_gems(s, pic);
     versal_create_admas(s, pic);
+    versal_create_sds(s, pic);
     versal_map_ddr(s);
     versal_unimp(s);
 
-- 
2.20.1
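
The per-controller MMIO base and interrupt line computed in
versal_create_sds() follow directly from the constants added to the
header. A small worked sketch, reusing the macro values from the patch;
sd_mmio_base() and sd_irq() are hypothetical helper names, and the IRQ
stride of two is simply taken from the patch, which does not state the
reason for it.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patch above. */
#define MM_PMC_SD0          0xf1040000U
#define MM_PMC_SD0_SIZE     0x10000
#define VERSAL_SD0_IRQ_0    126
#define XLNX_VERSAL_NR_SDS  2

/* Base address of SD controller i: consecutive 64 KiB windows. */
static uint64_t sd_mmio_base(int i)
{
    assert(i >= 0 && i < XLNX_VERSAL_NR_SDS);
    return (uint64_t)MM_PMC_SD0 + (uint64_t)i * MM_PMC_SD0_SIZE;
}

/* Interrupt number of SD controller i: every second line from SD0's IRQ. */
static int sd_irq(int i)
{
    assert(i >= 0 && i < XLNX_VERSAL_NR_SDS);
    return VERSAL_SD0_IRQ_0 + i * 2;
}
```

For the two controllers this gives base 0xf1040000 with IRQ 126 for
sd[0], and base 0xf1050000 with IRQ 128 for sd[1].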

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

hw/arm: versal: Add support for the RTC.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-10-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  8 ++++++++
 hw/arm/xlnx-versal.c         | 21 +++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/char/pl011.h"
 #include "hw/dma/xlnx-zdma.h"
 #include "hw/net/cadence_gem.h"
+#include "hw/rtc/xlnx-zynqmp-rtc.h"
 
 #define TYPE_XLNX_VERSAL "xlnx-versal"
 #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
         struct {
             SDHCIState sd[XLNX_VERSAL_NR_SDS];
         } iou;
+
+        XlnxZynqMPRTC rtc;
     } pmc;
 
     struct {
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define VERSAL_GEM1_IRQ_0          58
 #define VERSAL_GEM1_WAKE_IRQ_0     59
 #define VERSAL_ADMA_IRQ_0          60
+#define VERSAL_RTC_APB_ERR_IRQ     121
 #define VERSAL_SD0_IRQ_0           126
+#define VERSAL_RTC_ALARM_IRQ       142
+#define VERSAL_RTC_SECONDS_IRQ     143
 
 /* Architecturally reserved IRQs suitable for virtualization.  */
 #define VERSAL_RSVD_IRQ_FIRST 111
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define MM_PMC_SD0_SIZE             0x10000
 #define MM_PMC_CRP                  0xf1260000U
 #define MM_PMC_CRP_SIZE             0x10000
+#define MM_PMC_RTC                  0xf12a0000
+#define MM_PMC_RTC_SIZE             0x10000
 #endif
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
     }
 }
 
+static void versal_create_rtc(Versal *s, qemu_irq *pic)
+{
+    SysBusDevice *sbd;
+    MemoryRegion *mr;
+
+    sysbus_init_child_obj(OBJECT(s), "rtc", &s->pmc.rtc, sizeof(s->pmc.rtc),
+                          TYPE_XLNX_ZYNQMP_RTC);
+    sbd = SYS_BUS_DEVICE(&s->pmc.rtc);
+    qdev_init_nofail(DEVICE(sbd));
+
+    mr = sysbus_mmio_get_region(sbd, 0);
+    memory_region_add_subregion(&s->mr_ps, MM_PMC_RTC, mr);
+
+    /*
+     * TODO: Connect the ALARM and SECONDS interrupts once our RTC model
+     * supports them.
+     */
+    sysbus_connect_irq(sbd, 1, pic[VERSAL_RTC_APB_ERR_IRQ]);
+}
+
 /* This takes the board allocated linear DDR memory and creates aliases
  * for each split DDR range/aperture on the Versal address map.
  */
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
     versal_create_gems(s, pic);
     versal_create_admas(s, pic);
     versal_create_sds(s, pic);
+    versal_create_rtc(s, pic);
     versal_map_ddr(s);
     versal_unimp(s);
 
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Add support for SD.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-11-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal-virt.c | 46 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/sysbus-fdt.h"
 #include "hw/arm/fdt.h"
 #include "cpu.h"
+#include "hw/qdev-properties.h"
 #include "hw/arm/xlnx-versal.h"
 
 #define TYPE_XLNX_VERSAL_VIRT_MACHINE MACHINE_TYPE_NAME("xlnx-versal-virt")
@@ -XXX,XX +XXX,XX @@ static void fdt_add_zdma_nodes(VersalVirt *s)
     }
 }
 
+static void fdt_add_sd_nodes(VersalVirt *s)
+{
+    const char clocknames[] = "clk_xin\0clk_ahb";
+    const char compat[] = "arasan,sdhci-8.9a";
+    int i;
+
+    for (i = ARRAY_SIZE(s->soc.pmc.iou.sd) - 1; i >= 0; i--) {
+        uint64_t addr = MM_PMC_SD0 + MM_PMC_SD0_SIZE * i;
+        char *name = g_strdup_printf("/sdhci@%" PRIx64, addr);
+
+        qemu_fdt_add_subnode(s->fdt, name);
+
+        qemu_fdt_setprop_cells(s->fdt, name, "clocks",
+                               s->phandle.clk_25Mhz, s->phandle.clk_25Mhz);
+        qemu_fdt_setprop(s->fdt, name, "clock-names",
+                         clocknames, sizeof(clocknames));
+        qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
+                               GIC_FDT_IRQ_TYPE_SPI, VERSAL_SD0_IRQ_0 + i * 2,
+                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+        qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
+                                     2, addr, 2, MM_PMC_SD0_SIZE);
+        qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
+        g_free(name);
+    }
+}
+
 static void fdt_nop_memory_nodes(void *fdt, Error **errp)
 {
     Error *err = NULL;
@@ -XXX,XX +XXX,XX @@ static void create_virtio_regions(VersalVirt *s)
     }
 }
 
+static void sd_plugin_card(SDHCIState *sd, DriveInfo *di)
+{
+    BlockBackend *blk = di ? blk_by_legacy_dinfo(di) : NULL;
+    DeviceState *card;
+
+    card = qdev_create(qdev_get_child_bus(DEVICE(sd), "sd-bus"), TYPE_SD_CARD);
+    object_property_add_child(OBJECT(sd), "card[*]", OBJECT(card),
+                              &error_fatal);
+    qdev_prop_set_drive(card, "drive", blk, &error_fatal);
+    object_property_set_bool(OBJECT(card), true, "realized", &error_fatal);
+}
+
 static void versal_virt_init(MachineState *machine)
 {
     VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(machine);
     int psci_conduit = QEMU_PSCI_CONDUIT_DISABLED;
+    int i;
 
     /*
      * If the user provides an Operating System to be loaded, we expect them
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
     fdt_add_gic_nodes(s);
     fdt_add_timer_nodes(s);
     fdt_add_zdma_nodes(s);
+    fdt_add_sd_nodes(s);
     fdt_add_cpu_nodes(s, psci_conduit);
     fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
     fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
     memory_region_add_subregion_overlap(get_system_memory(),
                                         0, &s->soc.fpd.apu.mr, 0);
 
+    /* Plugin SD cards.  */
+    for (i = 0; i < ARRAY_SIZE(s->soc.pmc.iou.sd); i++) {
+        sd_plugin_card(&s->soc.pmc.iou.sd[i], drive_get_next(IF_SD));
+    }
+
     s->binfo.ram_size = machine->ram_size;
     s->binfo.loader_start = 0x0;
     s->binfo.get_dtb = versal_virt_get_dtb;
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Add support for the RTC.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-12-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal-virt.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@ static void fdt_add_sd_nodes(VersalVirt *s)
     }
 }
 
+static void fdt_add_rtc_node(VersalVirt *s)
+{
+    const char compat[] = "xlnx,zynqmp-rtc";
+    const char interrupt_names[] = "alarm\0sec";
+    char *name = g_strdup_printf("/rtc@%x", MM_PMC_RTC);
+
+    qemu_fdt_add_subnode(s->fdt, name);
+
+    qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
+                           GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_ALARM_IRQ,
+                           GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+                           GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_SECONDS_IRQ,
+                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+    qemu_fdt_setprop(s->fdt, name, "interrupt-names",
+                     interrupt_names, sizeof(interrupt_names));
+    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
+                                 2, MM_PMC_RTC, 2, MM_PMC_RTC_SIZE);
+    qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
+    g_free(name);
+}
+
 static void fdt_nop_memory_nodes(void *fdt, Error **errp)
 {
     Error *err = NULL;
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
     fdt_add_timer_nodes(s);
     fdt_add_zdma_nodes(s);
     fdt_add_sd_nodes(s);
+    fdt_add_rtc_node(s);
     fdt_add_cpu_nodes(s, psci_conduit);
     fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
     fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
-- 
2.20.1

Somewhere along the line we accidentally added a duplicate
"using D16-D31 when they don't exist" check to do_vfm_dp()
(probably an artifact of a patch series rebase). Remove it.
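
For reference, a minimal standalone sketch of what the check computes.
The function name and the has_simd_r32 parameter are hypothetical;
has_simd_r32 stands in for dc_isar_feature(aa32_simd_r32, s):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the check that appeared twice in do_vfm_dp().
 * vd, vn and vm are the 5-bit D register numbers from the decoded insn.
 */
static inline bool dreg_access_is_undef(bool has_simd_r32,
                                        int vd, int vn, int vm)
{
    assert(vd >= 0 && vd < 32);
    assert(vn >= 0 && vn < 32);
    assert(vm >= 0 && vm < 32);

    /* Bit 4 is set exactly for register numbers 16..31. */
    return !has_simd_r32 && ((vd | vn | vm) & 0x10);
}
```

The result depends only on the feature bit and the register numbers,
so evaluating it a second time cannot change the outcome.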

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200430181003.21682-2-peter.maydell@linaro.org
---
 target/arm/translate-vfp.inc.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
         return false;
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vd | a->vn | a->vm) & 0x10)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
-- 
2.20.1

We were accidentally permitting decode of Thumb Neon insns even if
the CPU didn't have the FEATURE_NEON bit set, because the feature
check was being done before the call to disas_neon_data_insn() and
disas_neon_ls_insn() in the Arm decoder but was omitted from the
Thumb decoder.  Push the feature bit check down into the called
functions so it is done for both Arm and Thumb encodings.
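
A toy sketch of the resulting shape, assuming nothing beyond what is
described above. ToyContext and the toy_* functions are hypothetical
names for illustration only, not QEMU code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical context: has_neon models the ARM_FEATURE_NEON bit. */
typedef struct {
    bool has_neon;
} ToyContext;

/*
 * Like the legacy decoders, return nonzero if the insn must UNDEF.
 * The feature check lives here, so no caller can forget it.
 */
static inline int toy_disas_neon_data_insn(const ToyContext *s, uint32_t insn)
{
    if (!s->has_neon) {
        return 1;
    }
    (void)insn; /* the real decode would happen here */
    return 0;
}

/* Both entry points get the check for free; true means "decoded". */
static inline bool toy_disas_arm(const ToyContext *s, uint32_t insn)
{
    return toy_disas_neon_data_insn(s, insn) == 0;
}

static inline bool toy_disas_thumb(const ToyContext *s, uint32_t insn)
{
    return toy_disas_neon_data_insn(s, insn) == 0;
}
```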

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200430181003.21682-3-peter.maydell@linaro.org
---
 target/arm/translate.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     TCGv_i32 tmp2;
     TCGv_i64 tmp64;
 
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return 1;
+    }
+
     /* FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     TCGv_ptr ptr1, ptr2, ptr3;
     TCGv_i64 tmp64;
 
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return 1;
+    }
+
     /* FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
 
         if (((insn >> 25) & 7) == 1) {
             /* NEON Data processing.  */
-            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-                goto illegal_op;
-            }
-
             if (disas_neon_data_insn(s, insn)) {
                 goto illegal_op;
             }
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
         }
         if ((insn & 0x0f100000) == 0x04000000) {
             /* NEON load/store.  */
-            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-                goto illegal_op;
-            }
-
             if (disas_neon_ls_insn(s, insn)) {
                 goto illegal_op;
             }
-- 
2.20.1

Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 Neon encodings.  At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.

We follow the same pattern we did for the VFP decodetree conversion
(commit 78e138bc1f672c145ef6ace74617d and following): code that deals
with Neon will be moving gradually out to translate-neon.inc.c,
which we #include into translate.c.

In order to share the decode files between A32 and T32, we
split Neon into 3 parts:
 * data-processing
 * load-store
 * 'shared' encodings

The first two groups of instructions have similar but not identical
A32 and T32 encodings, so we need to manually transform the T32
encoding into the A32 one before calling the decoder; the third group
covers the Neon instructions which are identical in A32 and T32.
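
As a standalone illustration of the two transformations, using the
same masks as the disas_thumb2_insn() hunk in this patch (the function
names here are hypothetical; in the patch the logic is open-coded):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Neon data-processing:
 *   T32 0b111p_1111_qqqq_... becomes A32 0b1111_001p_qqqq_...
 */
static inline uint32_t neon_dp_t32_to_a32(uint32_t insn)
{
    /* Caller must already have matched the T32 pattern. */
    assert((insn & 0xef000000u) == 0xef000000u);

    /* Keep the low 24 bits, move 'p' from bit 28 to bit 24, set bit 28. */
    return (insn & 0xe2ffffffu) | ((insn & (1u << 28)) >> 4) | (1u << 28);
}

/*
 * Neon load/store:
 *   T32 0b1111_1001_xxx0_... becomes A32 0b1111_0100_xxx0_...
 */
static inline uint32_t neon_ls_t32_to_a32(uint32_t insn)
{
    assert((insn & 0xff100000u) == 0xf9000000u);

    /* Only the top byte differs between the two encodings. */
    return (insn & 0x00ffffffu) | 0xf4000000u;
}
```

For example, the T32 words 0xef000110 and 0xff000110 map to the A32
words 0xf2000110 and 0xf3000110 respectively.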

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-4-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       | 29 ++++++++++++++++++++++++++
 target/arm/neon-ls.decode       | 29 ++++++++++++++++++++++++++
 target/arm/neon-shared.decode   | 27 +++++++++++++++++++++++++
 target/arm/translate-neon.inc.c | 32 +++++++++++++++++++++++++++++
 target/arm/translate.c          | 36 +++++++++++++++++++++++++++++++--
 target/arm/Makefile.objs        | 18 +++++++++++++++++
 6 files changed, 169 insertions(+), 2 deletions(-)
 create mode 100644 target/arm/neon-dp.decode
 create mode 100644 target/arm/neon-ls.decode
 create mode 100644 target/arm/neon-shared.decode
 create mode 100644 target/arm/translate-neon.inc.c

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 Neon data-processing instruction descriptions
+#
+#  Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon data processing instructions where the T32 encoding
+# is a simple transformation of the A32 encoding.
+# More specifically, this file covers instructions where the A32 encoding is
+#   0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+# and the T32 encoding is
+#   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+# This file works on the A32 encoding only; calling code for T32 has to
+# transform the insn into the A32 version first.
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 Neon load/store instruction descriptions
+#
+#  Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon load/store instructions where the T32 encoding
+# is a simple transformation of the A32 encoding.
+# More specifically, this file covers instructions where the A32 encoding is
+#   0b1111_0100_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
+# and the T32 encoding is
+#   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
+# This file works on the A32 encoding only; calling code for T32 has to
+# transform the insn into the A32 version first.
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 Neon instruction descriptions
+#
+#  Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon instructions whose encoding is the same for
+# both A32 and T32.
+
+# More specifically, this covers:
+# 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+# 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ *  ARM translation: AArch32 Neon instructions
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *  Copyright (c) 2005-2007 CodeSourcery
+ *  Copyright (c) 2007 OpenedHand, Ltd.
+ *  Copyright (c) 2020 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This file is intended to be included from translate.c; it uses
+ * some macros and definitions provided by that file.
+ * It might be possible to convert it to a standalone .c file eventually.
+ */
+
+/* Include the generated Neon decoder */
+#include "decode-neon-dp.inc.c"
+#include "decode-neon-ls.inc.c"
+#include "decode-neon-shared.inc.c"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 
 #define ARM_CP_RW_BIT   (1 << 20)
 
-/* Include the VFP decoder */
+/* Include the VFP and Neon decoders */
 #include "translate-vfp.inc.c"
+#include "translate-neon.inc.c"
 
 static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
 {
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
         /* Unconditional instructions.  */
         /* TODO: Perhaps merge these into one decodetree output file.  */
         if (disas_a32_uncond(s, insn) ||
-            disas_vfp_uncond(s, insn)) {
+            disas_vfp_uncond(s, insn) ||
+            disas_neon_dp(s, insn) ||
+            disas_neon_ls(s, insn) ||
+            disas_neon_shared(s, insn)) {
             return;
         }
         /* fall back to legacy decoder */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
         ARCH(6T2);
     }
 
+    if ((insn & 0xef000000) == 0xef000000) {
+        /*
+         * T32 encodings 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+         * transform into
+         * A32 encodings 0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+         */
+        uint32_t a32_insn = (insn & 0xe2ffffff) |
+            ((insn & (1 << 28)) >> 4) | (1 << 28);
+
+        if (disas_neon_dp(s, a32_insn)) {
+            return;
+        }
+    }
+
+    if ((insn & 0xff100000) == 0xf9000000) {
+        /*
+         * T32 encodings 0b1111_1001_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+         * transform into
+         * A32 encodings 0b1111_0100_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+         */
+        uint32_t a32_insn = (insn & 0x00ffffff) | 0xf4000000;
+
+        if (disas_neon_ls(s, a32_insn)) {
+            return;
+        }
+    }
+
     /*
      * TODO: Perhaps merge these into one decodetree output file.
      * Note disas_vfp is written for a32 with cond field in the
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
      */
     if (disas_t32(s, insn) ||
         disas_vfp_uncond(s, insn) ||
+        disas_neon_shared(s, insn) ||
         ((insn >> 28) == 0xe && disas_vfp(s, insn))) {
         return;
     }
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
 	  $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
 	  "GEN", $(TARGET_DIR)$@)
 
+target/arm/decode-neon-shared.inc.c: $(SRC_PATH)/target/arm/neon-shared.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_neon_shared -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-dp.inc.c: $(SRC_PATH)/target/arm/neon-dp.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_neon_dp -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-ls.inc.c: $(SRC_PATH)/target/arm/neon-ls.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_neon_ls -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
 target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
 	$(call quiet-command,\
 	  $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
@@ -XXX,XX +XXX,XX @@ target/arm/decode-t16.inc.c: $(SRC_PATH)/target/arm/t16.decode $(DECODETREE)
 	  "GEN", $(TARGET_DIR)$@)
 
 target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+target/arm/translate.o: target/arm/decode-neon-shared.inc.c
+target/arm/translate.o: target/arm/decode-neon-dp.inc.c
+target/arm/translate.o: target/arm/decode-neon-ls.inc.c
 target/arm/translate.o: target/arm/decode-vfp.inc.c
 target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
 target/arm/translate.o: target/arm/decode-a32.inc.c
-- 
2.20.1

Convert the VCMLA (vector) insns in the 3same extension group to
decodetree.
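
The register fields added to neon-shared.decode here are multi-part
decodetree fields, where the first-listed subfield ends up in the most
significant bits. A rough C equivalent of the extraction for the Vd
fields, as a sketch with hypothetical function names (the real
extraction code is generated by scripts/decodetree.py):

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'len' bits of insn starting at bit 'pos'. */
static inline uint32_t insn_bits(uint32_t insn, int pos, int len)
{
    assert(pos >= 0 && len > 0 && pos + len <= 32 && len < 32);
    return (insn >> pos) & ((1u << len) - 1);
}

/* %vd_dp 22:1 12:4 gives D:Vd, a D register number in 0..31 */
static inline int vd_dp(uint32_t insn)
{
    return (int)((insn_bits(insn, 22, 1) << 4) | insn_bits(insn, 12, 4));
}

/* %vd_sp 12:4 22:1 gives Vd:D, an S register number in 0..31 */
static inline int vd_sp(uint32_t insn)
{
    return (int)((insn_bits(insn, 12, 4) << 1) | insn_bits(insn, 22, 1));
}
```

With the D bit (bit 22) set and Vd == 3, the same instruction bits
name D19 when read through %vd_dp and S7 when read through %vd_sp.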

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-5-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   | 11 ++++++++++
 target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 11 +---------
 3 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@
 # More specifically, this covers:
 # 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
 # 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+
+# VFP/Neon register fields; same as vfp.decode
+%vm_dp  5:1 0:4
+%vm_sp  0:4 5:1
+%vn_dp  7:1 16:4
+%vn_sp  16:4 7:1
+%vd_dp  22:1 12:4
+%vd_sp  12:4 22:1
+
+VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
 #include "decode-neon-dp.inc.c"
 #include "decode-neon-ls.inc.c"
 #include "decode-neon-shared.inc.c"
+
+static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
+{
+    int opr_sz;
+    TCGv_ptr fpst;
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+
+    if (!dc_isar_feature(aa32_vcma, s)
+        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    fn_gvec_ptr = a->size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       fpst, opr_sz, opr_sz, a->rot,
+                       fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xfe200f10) == 0xfc200800) {
-        /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
-        int size = extract32(insn, 20, 1);
-        data = extract32(insn, 23, 2); /* rot */
-        if (!dc_isar_feature(aa32_vcma, s)
-            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
-            return 1;
-        }
-        fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
-    } else if ((insn & 0xfea00f10) == 0xfc800800) {
+    if ((insn & 0xfea00f10) == 0xfc800800) {
         /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
         int size = extract32(insn, 20, 1);
         data = extract32(insn, 24, 1); /* rot */
-- 
2.20.1

Convert the VCADD (vector) insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-6-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  3 +++
 target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 11 +---------
 3 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@
 
 VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
+{
+    int opr_sz;
+    TCGv_ptr fpst;
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+
+    if (!dc_isar_feature(aa32_vcma, s)
+        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    fn_gvec_ptr = a->size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       fpst, opr_sz, opr_sz, a->rot,
+                       fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xfea00f10) == 0xfc800800) {
-        /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
-        int size = extract32(insn, 20, 1);
-        data = extract32(insn, 24, 1); /* rot */
-        if (!dc_isar_feature(aa32_vcma, s)
-            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
-            return 1;
-        }
-        fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
-    } else if ((insn & 0xfeb00f00) == 0xfc200d00) {
+    if ((insn & 0xfeb00f00) == 0xfc200d00) {
         /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
         bool u = extract32(insn, 4, 1);
         if (!dc_isar_feature(aa32_dp, s)) {
-- 
2.20.1

Convert the V[US]DOT (vector) insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-7-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  4 ++++
 target/arm/translate-neon.inc.c | 32 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  9 +--------
 3 files changed, 37 insertions(+), 8 deletions(-)

Convert the VFM[AS]L (vector) insns to decodetree.  This is the last
insn in the legacy decoder for the 3same_ext group, so we can
delete the legacy decoder function for the group entirely.

Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
into disas_coproc_insn() by mistake.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-8-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  6 +++
 target/arm/translate-neon.inc.c | 31 +++++++++++
 target/arm/translate.c          | 92 +--------------------------------
 3 files changed, 38 insertions(+), 91 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
 # VUDOT and VSDOT
 VDOT           1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+# VFM[AS]L
+VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
+               vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
+VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
                        opr_sz, opr_sz, 0, fn_gvec);
     return true;
 }
+
+static bool trans_VFML(DisasContext *s, arg_VFML *a)
+{
+    int opr_sz;
+
+    if (!dc_isar_feature(aa32_fhm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (a->vd & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(a->q, a->vn),
+                       vfp_reg_offset(a->q, a->vm),
+                       cpu_env, opr_sz, opr_sz, a->s, /* is_2 == 0 */
+                       gen_helper_gvec_fmlal_a32);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     return 0;
 }
 
-/* Advanced SIMD three registers of the same length extension.
- *  31           25    23  22    20   16   12  11   10   9    8        3     0
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
- * | 1 1 1 1 1 1 0 | op1 | D | op2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
- */
-static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
-{
-    gen_helper_gvec_3 *fn_gvec = NULL;
-    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
-    int rd, rn, rm, opr_sz;
-    int data = 0;
-    int off_rn, off_rm;
-    bool is_long = false, q = extract32(insn, 6, 1);
-    bool ptr_is_env = false;
-
-    if ((insn & 0xff300f10) == 0xfc200810) {
-        /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
-        int is_s = extract32(insn, 23, 1);
-        if (!dc_isar_feature(aa32_fhm, s)) {
-            return 1;
-        }
-        is_long = true;
-        data = is_s; /* is_2 == 0 */
-        fn_gvec_ptr = gen_helper_gvec_fmlal_a32;
-        ptr_is_env = true;
-    } else {
-        return 1;
-    }
-
-    VFP_DREG_D(rd, insn);
-    if (rd & q) {
-        return 1;
-    }
-    if (q || !is_long) {
-        VFP_DREG_N(rn, insn);
-        VFP_DREG_M(rm, insn);
-        if ((rn | rm) & q & !is_long) {
-            return 1;
-        }
-        off_rn = vfp_reg_offset(1, rn);
-        off_rm = vfp_reg_offset(1, rm);
-    } else {
-        rn = VFP_SREG_N(insn);
-        rm = VFP_SREG_M(insn);
-        off_rn = vfp_reg_offset(0, rn);
-        off_rm = vfp_reg_offset(0, rm);
-    }
-
-    if (s->fp_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
-        return 0;
-    }
-    if (!s->vfp_enabled) {
-        return 1;
-    }
-
-    opr_sz = (1 + q) * 8;
-    if (fn_gvec_ptr) {
-        TCGv_ptr ptr;
-        if (ptr_is_env) {
-            ptr = cpu_env;
-        } else {
-            ptr = get_fpstatus_ptr(1);
-        }
-        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
-                           opr_sz, opr_sz, data, fn_gvec_ptr);
-        if (!ptr_is_env) {
-            tcg_temp_free_ptr(ptr);
-        }
-    } else {
-        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
-                           opr_sz, opr_sz, data, fn_gvec);
-    }
-    return 0;
-}
-
 /* Advanced SIMD two registers and a scalar extension.
  *  31             24   23  22   20   16   12  11   10   9    8        3     0
  * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                     }
                 }
             }
-        } else if ((insn & 0x0e000a00) == 0x0c000800
-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            if (disas_neon_insn_3same_ext(s, insn)) {
-                goto illegal_op;
-            }
-            return;
         } else if ((insn & 0x0f000a00) == 0x0e000800
                    && arm_dc_feature(s, ARM_FEATURE_V8)) {
             if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
             }
             break;
         }
-        if ((insn & 0xfe000a00) == 0xfc000800
+        if ((insn & 0xff000a00) == 0xfe000800
             && arm_dc_feature(s, ARM_FEATURE_V8)) {
             /* The Thumb2 and ARM encodings are identical.  */
-            if (disas_neon_insn_3same_ext(s, insn)) {
-                goto illegal_op;
-            }
-        } else if ((insn & 0xff000a00) == 0xfe000800
-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            /* The Thumb2 and ARM encodings are identical.  */
             if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
                 goto illegal_op;
             }
-- 
2.20.1

Convert VCMLA (scalar) in the 2reg-scalar-ext group to decodetree.
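
As an illustration only (not part of this patch), the operand decode that
the two VCMLA_scalar patterns express can be sketched in standalone C.
The struct and function names below are hypothetical, extract32() is
reimplemented locally, and the M:Vm layout used for the fp32 case is
taken from the legacy VFP_DREG_M() macro rather than from the %vm_dp
definition, which is outside this patch:

```c
#include <stdint.h>

/* Local stand-in for extract32(): 'length' bits starting at bit 'start'. */
static uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* Hypothetical container for the decoded fields. */
typedef struct {
    int vm;
    int index;
    int rot;
    int size;
} VcmlaScalarFields;

static VcmlaScalarFields vcmla_scalar_fields(uint32_t insn)
{
    VcmlaScalarFields f;

    f.rot = extract32(insn, 20, 2);
    f.size = extract32(insn, 23, 1);
    if (f.size == 0) {
        /* fp16: the register is just Vm, and M selects the element. */
        f.vm = extract32(insn, 0, 4);
        f.index = extract32(insn, 5, 1);
    } else {
        /* fp32: the register is the usual M:Vm, and the index is 0. */
        f.vm = (extract32(insn, 5, 1) << 4) | extract32(insn, 0, 4);
        f.index = 0;
    }
    return f;
}

/* The immediate handed to the gvec helper, packed as in the patch. */
static int vcmla_scalar_data(const VcmlaScalarFields *f)
{
    return (f->index << 2) | f->rot;
}
```

For example, an fp16 encoding with rot=2, M=1 and Vm=3 yields vm=3,
index=1 and a packed immediate of 6.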

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-9-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  5 +++++
 target/arm/translate-neon.inc.c | 40 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 26 +--------------------
 3 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
                vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
 VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
+
+VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
+               vn=%vn_dp vd=%vd_dp size=0
+VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML(DisasContext *s, arg_VFML *a)
                        gen_helper_gvec_fmlal_a32);
     return true;
 }
+
+static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
+{
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+    int opr_sz;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_vcma, s)) {
+        return false;
+    }
+    if (a->size == 0 && !dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vd | a->vn) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fn_gvec_ptr = (a->size ? gen_helper_gvec_fcmlas_idx
+                   : gen_helper_gvec_fcmlah_idx);
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       fpst, opr_sz, opr_sz,
+                       (a->index << 2) | a->rot, fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xff000f10) == 0xfe000800) {
-        /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... */
-        int rot = extract32(insn, 20, 2);
-        int size = extract32(insn, 23, 1);
-        int index;
-
-        if (!dc_isar_feature(aa32_vcma, s)) {
-            return 1;
-        }
-        if (size == 0) {
-            if (!dc_isar_feature(aa32_fp16_arith, s)) {
-                return 1;
-            }
-            /* For fp16, rm is just Vm, and index is M.  */
-            rm = extract32(insn, 0, 4);
-            index = extract32(insn, 5, 1);
-        } else {
-            /* For fp32, rm is the usual M:Vm, and index is 0.  */
-            VFP_DREG_M(rm, insn);
-            index = 0;
-        }
-        data = (index << 2) | rot;
-        fn_gvec_ptr = (size ? gen_helper_gvec_fcmlas_idx
-                       : gen_helper_gvec_fcmlah_idx);
-    } else if ((insn & 0xffb00f00) == 0xfe200d00) {
+    if ((insn & 0xffb00f00) == 0xfe200d00) {
         /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
         int u = extract32(insn, 4, 1);
 
-- 
2.20.1

Convert the V[US]DOT (scalar) insns in the 2reg-scalar-ext group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-10-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  3 +++
 target/arm/translate-neon.inc.c | 35 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 13 +-----------
 3 files changed, 39 insertions(+), 12 deletions(-)

Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
to decodetree. These are the last ones in the group so we can remove
all the legacy decode for the group.
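
As a sanity check on the new field definitions, the following standalone
sketch (not part of the patch; all names are hypothetical and extract32()
is reimplemented locally) compares the legacy rm/index computation with
what the %vfml_scalar_q0_rm and %vfml_scalar_q1_index fields are
understood to produce, assuming that a multi-part decodetree field
concatenates its parts with the first part most significant:

```c
#include <stdint.h>

/* Local stand-in for extract32(): 'length' bits starting at bit 'start'. */
static uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

typedef struct {
    int rm;
    int index;
} VfmlScalarOperand;

/* The computation done by the legacy decoder being removed. */
static VfmlScalarOperand vfml_scalar_legacy(uint32_t insn)
{
    VfmlScalarOperand o;
    int q = extract32(insn, 6, 1);
    int vm20 = extract32(insn, 0, 3);
    int vm3 = extract32(insn, 3, 1);
    int m = extract32(insn, 5, 1);

    if (q) {
        o.rm = vm20;
        o.index = m * 2 + vm3;
    } else {
        o.rm = vm20 * 2 + m;
        o.index = vm3;
    }
    return o;
}

/* The same values expressed as concatenated bitfields. */
static VfmlScalarOperand vfml_scalar_decodetree(uint32_t insn)
{
    VfmlScalarOperand o;

    if (extract32(insn, 6, 1)) {
        /* rm:3 and index=%vfml_scalar_q1_index (5:1 3:1) */
        o.rm = extract32(insn, 0, 3);
        o.index = (extract32(insn, 5, 1) << 1) | extract32(insn, 3, 1);
    } else {
        /* rm=%vfml_scalar_q0_rm (0:3 5:1) and index:1 */
        o.rm = (extract32(insn, 0, 3) << 1) | extract32(insn, 5, 1);
        o.index = extract32(insn, 3, 1);
    }
    return o;
}

/* Returns 1 if both forms agree for every value of insn bits [6:0]. */
static int vfml_scalar_forms_agree(void)
{
    uint32_t low;

    for (low = 0; low < 128; low++) {
        uint32_t insn = 0xfe000810u | low;
        VfmlScalarOperand a = vfml_scalar_legacy(insn);
        VfmlScalarOperand b = vfml_scalar_decodetree(insn);

        if (a.rm != b.rm || a.index != b.index) {
            return 0;
        }
    }
    return 1;
}
```

In the Q=0 form rm names a single-precision register and the index is
one bit; in the Q=1 form rm is a 3-bit D register number and the index
is two bits.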

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-11-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |   7 +++
 target/arm/translate-neon.inc.c |  32 ++++++++++
 target/arm/translate.c          | 107 +-------------------------------
 3 files changed, 40 insertions(+), 106 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
 
 VDOT_scalar    1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+%vfml_scalar_q0_rm 0:3 5:1
+%vfml_scalar_q1_index 5:1 3:1
+VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
+               rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
+VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
+               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
+{
+    int opr_sz;
+
+    if (!dc_isar_feature(aa32_fhm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd & 0x10) || (a->q && (a->vn & 0x10)))) {
+        return false;
+    }
+
+    if (a->vd & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(a->q, a->vn),
+                       vfp_reg_offset(a->q, a->rm),
+                       cpu_env, opr_sz, opr_sz,
+                       (a->index << 2) | a->s, /* is_2 == 0 */
+                       gen_helper_gvec_fmlal_idx_a32);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
 }
 
 #define VFP_REG_SHR(x, n) (((n) > 0) ? (x) >> (n) : (x) << -(n))
-#define VFP_SREG(insn, bigbit, smallbit) \
-  ((VFP_REG_SHR(insn, bigbit - 1) & 0x1e) | (((insn) >> (smallbit)) & 1))
 #define VFP_DREG(reg, insn, bigbit, smallbit) do { \
     if (dc_isar_feature(aa32_simd_r32, s)) { \
         reg = (((insn) >> (bigbit)) & 0x0f) \
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
         reg = ((insn) >> (bigbit)) & 0x0f; \
     }} while (0)
 
-#define VFP_SREG_D(insn) VFP_SREG(insn, 12, 22)
 #define VFP_DREG_D(reg, insn) VFP_DREG(reg, insn, 12, 22)
-#define VFP_SREG_N(insn) VFP_SREG(insn, 16,  7)
 #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
-#define VFP_SREG_M(insn) VFP_SREG(insn,  0,  5)
 #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
 
 static void gen_neon_dup_low16(TCGv_i32 var)
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     return 0;
 }
 
-/* Advanced SIMD two registers and a scalar extension.
- *  31             24   23  22   20   16   12  11   10   9    8        3     0
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
- * | 1 1 1 1 1 1 1 0 | o1 | D | o2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
- *
- */
-
-static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
-{
-    gen_helper_gvec_3 *fn_gvec = NULL;
-    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
-    int rd, rn, rm, opr_sz, data;
-    int off_rn, off_rm;
-    bool is_long = false, q = extract32(insn, 6, 1);
-    bool ptr_is_env = false;
-
-    if ((insn & 0xffa00f10) == 0xfe000810) {
-        /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
-        int is_s = extract32(insn, 20, 1);
-        int vm20 = extract32(insn, 0, 3);
-        int vm3 = extract32(insn, 3, 1);
-        int m = extract32(insn, 5, 1);
-        int index;
-
-        if (!dc_isar_feature(aa32_fhm, s)) {
-            return 1;
-        }
-        if (q) {
-            rm = vm20;
-            index = m * 2 + vm3;
-        } else {
-            rm = vm20 * 2 + m;
-            index = vm3;
-        }
-        is_long = true;
-        data = (index << 2) | is_s; /* is_2 == 0 */
-        fn_gvec_ptr = gen_helper_gvec_fmlal_idx_a32;
-        ptr_is_env = true;
-    } else {
-        return 1;
-    }
-
-    VFP_DREG_D(rd, insn);
-    if (rd & q) {
-        return 1;
-    }
-    if (q || !is_long) {
-        VFP_DREG_N(rn, insn);
-        if (rn & q & !is_long) {
-            return 1;
-        }
-        off_rn = vfp_reg_offset(1, rn);
-        off_rm = vfp_reg_offset(1, rm);
-    } else {
-        rn = VFP_SREG_N(insn);
-        off_rn = vfp_reg_offset(0, rn);
-        off_rm = vfp_reg_offset(0, rm);
-    }
-    if (s->fp_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
-        return 0;
-    }
-    if (!s->vfp_enabled) {
-        return 1;
-    }
-
-    opr_sz = (1 + q) * 8;
-    if (fn_gvec_ptr) {
-        TCGv_ptr ptr;
-        if (ptr_is_env) {
-            ptr = cpu_env;
-        } else {
-            ptr = get_fpstatus_ptr(1);
-        }
-        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
-                           opr_sz, opr_sz, data, fn_gvec_ptr);
-        if (!ptr_is_env) {
-            tcg_temp_free_ptr(ptr);
-        }
-    } else {
-        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
-                           opr_sz, opr_sz, data, fn_gvec);
-    }
-    return 0;
-}
-
 static int disas_coproc_insn(DisasContext *s, uint32_t insn)
 {
     int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                     }
                 }
             }
-        } else if ((insn & 0x0f000a00) == 0x0e000800
-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
-                goto illegal_op;
-            }
-            return;
         }
         goto illegal_op;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
             }
             break;
         }
-        if ((insn & 0xff000a00) == 0xfe000800
-            && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            /* The Thumb2 and ARM encodings are identical.  */
-            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
-                goto illegal_op;
-            }
-        } else if (((insn >> 24) & 3) == 3) {
+        if (((insn >> 24) & 3) == 3) {
             /* Translate into the equivalent ARM encoding.  */
             insn = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
             if (disas_neon_data_insn(s, insn)) {
-- 
2.20.1

Convert the Neon "load/store multiple structures" insns to decodetree.
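
For reference, the way the itype field drives the transfer can be
sketched outside QEMU as below. This is an illustration under stated
assumptions, not the translator code: the table is copied from the
patch, while the two helper functions and their names are hypothetical.
It shows the post-index writeback amount (nregs * interleave * 8 bytes)
and the order in which D registers are visited for one element position:

```c
#include <stdint.h>

typedef struct {
    int nregs;
    int interleave;
    int spacing;
} NeonLsElementType;

/* Same contents as neon_ls_element_type[] in the patch. */
static const NeonLsElementType neon_ls_element_type[11] = {
    {1, 4, 1}, {1, 4, 2}, {4, 1, 1}, {2, 2, 2},
    {1, 3, 1}, {1, 3, 2}, {3, 1, 1}, {1, 1, 1},
    {1, 2, 1}, {1, 2, 2}, {2, 1, 1}
};

/* Bytes transferred (the writeback stride), or -1 if itype UNDEFs. */
static int vldst_multiple_bytes(int itype)
{
    if (itype < 0 || itype > 10) {
        return -1;
    }
    return neon_ls_element_type[itype].nregs *
           neon_ls_element_type[itype].interleave * 8;
}

/*
 * Fill regs[] with the D registers touched, in the order the inner
 * loops of the translator visit them (tt = vd + reg + spacing * xs).
 * Returns the number of registers, or -1 if itype UNDEFs.
 */
static int vldst_multiple_regs(int itype, int vd, int regs[4])
{
    int reg, xs, count = 0;

    if (itype < 0 || itype > 10) {
        return -1;
    }
    for (reg = 0; reg < neon_ls_element_type[itype].nregs; reg++) {
        for (xs = 0; xs < neon_ls_element_type[itype].interleave; xs++) {
            regs[count++] = vd + reg + neon_ls_element_type[itype].spacing * xs;
        }
    }
    return count;
}
```

No entry in the table has nregs * interleave greater than 4, so a
four-entry array is enough for regs[].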

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-12-peter.maydell@linaro.org
---
 target/arm/neon-ls.decode       |   7 ++
 target/arm/translate-neon.inc.c | 124 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  91 +----------------------
 3 files changed, 133 insertions(+), 89 deletions(-)

diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-ls.decode
+++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@
 #   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
 # This file works on the A32 encoding only; calling code for T32 has to
 # transform the insn into the A32 version first.
+
+%vd_dp  22:1 12:4
+
+# Neon load/store multiple structures
+
+VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
+               vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
                        gen_helper_gvec_fmlal_idx_a32);
     return true;
 }
+
+static struct {
+    int nregs;
+    int interleave;
+    int spacing;
+} const neon_ls_element_type[11] = {
+    {1, 4, 1},
+    {1, 4, 2},
+    {4, 1, 1},
+    {2, 2, 2},
+    {1, 3, 1},
+    {1, 3, 2},
+    {3, 1, 1},
+    {1, 1, 1},
+    {1, 2, 1},
+    {1, 2, 2},
+    {2, 1, 1}
+};
+
+static void gen_neon_ldst_base_update(DisasContext *s, int rm, int rn,
+                                      int stride)
+{
+    if (rm != 15) {
+        TCGv_i32 base;
+
+        base = load_reg(s, rn);
+        if (rm == 13) {
+            tcg_gen_addi_i32(base, base, stride);
+        } else {
+            TCGv_i32 index;
+            index = load_reg(s, rm);
+            tcg_gen_add_i32(base, base, index);
+            tcg_temp_free_i32(index);
+        }
+        store_reg(s, rn, base);
+    }
+}
+
+static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
+{
+    /* Neon load/store multiple structures */
+    int nregs, interleave, spacing, reg, n;
+    MemOp endian = s->be_data;
+    int mmu_idx = get_mem_index(s);
+    int size = a->size;
+    TCGv_i64 tmp64;
+    TCGv_i32 addr, tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+    if (a->itype > 10) {
+        return false;
+    }
+    /* Catch UNDEF cases for bad values of align field */
+    switch (a->itype & 0xc) {
+    case 4:
+        if (a->align >= 2) {
+            return false;
+        }
+        break;
+    case 8:
+        if (a->align == 3) {
+            return false;
+        }
+        break;
+    default:
+        break;
+    }
+    nregs = neon_ls_element_type[a->itype].nregs;
+    interleave = neon_ls_element_type[a->itype].interleave;
+    spacing = neon_ls_element_type[a->itype].spacing;
+    if (size == 3 && (interleave | spacing) != 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /* For our purposes, bytes are always little-endian.  */
+    if (size == 0) {
+        endian = MO_LE;
+    }
+    /*
+     * Consecutive little-endian elements from a single register
+     * can be promoted to a larger little-endian operation.
+     */
+    if (interleave == 1 && endian == MO_LE) {
+        size = 3;
+    }
+    tmp64 = tcg_temp_new_i64();
+    addr = tcg_temp_new_i32();
+    tmp = tcg_const_i32(1 << size);
+    load_reg_var(s, addr, a->rn);
+    for (reg = 0; reg < nregs; reg++) {
+        for (n = 0; n < 8 >> size; n++) {
+            int xs;
+            for (xs = 0; xs < interleave; xs++) {
+                int tt = a->vd + reg + spacing * xs;
+
+                if (a->l) {
+                    gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
+                    neon_store_element64(tt, n, size, tmp64);
+                } else {
+                    neon_load_element64(tmp64, tt, n, size);
+                    gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
+                }
+                tcg_gen_add_i32(addr, addr, tmp);
+            }
+        }
+    }
+    tcg_temp_free_i32(addr);
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i64(tmp64);
+
+    gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
 }
 
 
-static struct {
-    int nregs;
-    int interleave;
-    int spacing;
-} const neon_ls_element_type[11] = {
-    {1, 4, 1},
-    {1, 4, 2},
-    {4, 1, 1},
-    {2, 2, 2},
-    {1, 3, 1},
-    {1, 3, 2},
-    {3, 1, 1},
-    {1, 1, 1},
-    {1, 2, 1},
-    {1, 2, 2},
-    {2, 1, 1}
-};
-
 /* Translate a NEON load/store element instruction.  Return nonzero if the
    instruction is invalid.  */
 static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
 {
     int rd, rn, rm;
-    int op;
     int nregs;
-    int interleave;
-    int spacing;
     int stride;
     int size;
     int reg;
     int load;
-    int n;
     int vec_size;
-    int mmu_idx;
-    MemOp endian;
     TCGv_i32 addr;
     TCGv_i32 tmp;
-    TCGv_i32 tmp2;
-    TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     rn = (insn >> 16) & 0xf;
     rm = insn & 0xf;
     load = (insn & (1 << 21)) != 0;
-    endian = s->be_data;
-    mmu_idx = get_mem_index(s);
     if ((insn & (1 << 23)) == 0) {
-        /* Load store all elements.  */
-        op = (insn >> 8) & 0xf;
-        size = (insn >> 6) & 3;
-        if (op > 10)
-            return 1;
-        /* Catch UNDEF cases for bad values of align field */
-        switch (op & 0xc) {
-        case 4:
-            if (((insn >> 5) & 1) == 1) {
-                return 1;
-            }
-            break;
-        case 8:
-            if (((insn >> 4) & 3) == 3) {
-                return 1;
-            }
-            break;
-        default:
-            break;
-        }
-        nregs = neon_ls_element_type[op].nregs;
-        interleave = neon_ls_element_type[op].interleave;
-        spacing = neon_ls_element_type[op].spacing;
-        if (size == 3 && (interleave | spacing) != 1) {
-            return 1;
-        }
-        /* For our purposes, bytes are always little-endian.  */
-        if (size == 0) {
-            endian = MO_LE;
-        }
-        /* Consecutive little-endian elements from a single register
-         * can be promoted to a larger little-endian operation.
-         */
-        if (interleave == 1 && endian == MO_LE) {
-            size = 3;
-        }
-        tmp64 = tcg_temp_new_i64();
-        addr = tcg_temp_new_i32();
-        tmp2 = tcg_const_i32(1 << size);
-        load_reg_var(s, addr, rn);
-        for (reg = 0; reg < nregs; reg++) {
-            for (n = 0; n < 8 >> size; n++) {
-                int xs;
-                for (xs = 0; xs < interleave; xs++) {
-                    int tt = rd + reg + spacing * xs;
-
-                    if (load) {
-                        gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
-                        neon_store_element64(tt, n, size, tmp64);
-                    } else {
-                        neon_load_element64(tmp64, tt, n, size);
-                        gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
-                    }
-                    tcg_gen_add_i32(addr, addr, tmp2);
-                }
-            }
-        }
-        tcg_temp_free_i32(addr);
-        tcg_temp_free_i32(tmp2);
-        tcg_temp_free_i64(tmp64);
-        stride = nregs * interleave * 8;
+        /* Load store all elements -- handled already by decodetree */
+        return 1;
     } else {
         size = (insn >> 10) & 3;
         if (size == 3) {
-- 
2.20.1

Convert the Neon "load single structure to all lanes" insns to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-13-peter.maydell@linaro.org
---
 target/arm/neon-ls.decode       |  5 +++
 target/arm/translate-neon.inc.c | 73 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 55 +------------------------
 3 files changed, 80 insertions(+), 53 deletions(-)

Convert the Neon "load/store single structure to one lane" insns to
decodetree.

As this is the last set of insns in the neon load/store group,
we can remove the whole disas_neon_ls_insn() function.
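
The stride decode and the register-file bounds check can be illustrated
with the standalone sketch below. It is not the translator code: the
function names other than plus1() are hypothetical, extract32() is
reimplemented locally, and plus1() here omits the DisasContext argument
that the real !function callback takes:

```c
#include <stdint.h>

/* Local stand-in for extract32(): 'length' bits starting at bit 'start'. */
static uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* Simplified form of the !function=plus1 callback. */
static int plus1(int x)
{
    return x + 1;
}

/* Register stride for a single-structure access, or -1 for size == 3. */
static int vldst_single_stride(uint32_t insn)
{
    switch (extract32(insn, 10, 2)) {
    case 0:
        return 1;                              /* stride=1 */
    case 1:
        return plus1(extract32(insn, 5, 1));   /* %imm1_5_p1 */
    case 2:
        return plus1(extract32(insn, 6, 1));   /* %imm1_6_p1 */
    default:
        return -1;                             /* load to all lanes */
    }
}

/* Nonzero if the last register accessed is still within D0..D31. */
static int vldst_single_in_range(int vd, int stride, int nregs)
{
    return vd + stride * (nregs - 1) <= 31;
}

/* Amount added to Rn on post-index writeback when rm == 13. */
static int vldst_single_writeback_bytes(int size, int nregs)
{
    return (1 << size) * nregs;
}
```

This matches the legacy "(insn & (1 << 5)) ? 2 : 1" form for 16-bit
elements and the equivalent bit 6 test for 32-bit elements.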

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-14-peter.maydell@linaro.org
---
 target/arm/neon-ls.decode       |  11 +++
 target/arm/translate-neon.inc.c |  89 +++++++++++++++++++
 target/arm/translate.c          | 147 --------------------------------
 3 files changed, 100 insertions(+), 147 deletions(-)

diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-ls.decode
+++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@ VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
 
 VLD_all_lanes  1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
                vd=%vd_dp
+
+# Neon load/store single structure to one lane
+%imm1_5_p1 5:1 !function=plus1
+%imm1_6_p1 6:1 !function=plus1
+
+VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 00 n:2 reg_idx:3 align:1 rm:4 \
+               vd=%vd_dp size=0 stride=1
+VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 01 n:2 reg_idx:2 align:2 rm:4 \
+               vd=%vd_dp size=1 stride=%imm1_5_p1
+VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 10 n:2 reg_idx:1 align:3 rm:4 \
+               vd=%vd_dp size=2 stride=%imm1_6_p1
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
  * It might be possible to convert it to a standalone .c file eventually.
  */
 
+static inline int plus1(DisasContext *s, int x)
+{
+    return x + 1;
+}
+
 /* Include the generated Neon decoder */
 #include "decode-neon-dp.inc.c"
 #include "decode-neon-ls.inc.c"
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
 
     return true;
 }
+
+static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
+{
+    /* Neon load/store single structure to one lane */
+    int reg;
+    int nregs = a->n + 1;
+    int vd = a->vd;
+    TCGv_i32 addr, tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    /* Catch the UNDEF cases. This is unavoidably a bit messy. */
+    switch (nregs) {
+    case 1:
+        if (((a->align & (1 << a->size)) != 0) ||
+            (a->size == 2 && ((a->align & 3) == 1 || (a->align & 3) == 2))) {
+            return false;
+        }
+        break;
+    case 3:
+        if ((a->align & 1) != 0) {
+            return false;
+        }
+        /* fall through */
+    case 2:
+        if (a->size == 2 && (a->align & 2) != 0) {
+            return false;
+        }
+        break;
+    case 4:
+        if ((a->size == 2) && ((a->align & 3) == 3)) {
+            return false;
+        }
+        break;
+    default:
+        abort();
+    }
+    if ((vd + a->stride * (nregs - 1)) > 31) {
+        /*
+         * Attempts to write off the end of the register file are
+         * UNPREDICTABLE; we choose to UNDEF because otherwise we would
+         * access off the end of the array that holds the register data.
+         */
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    addr = tcg_temp_new_i32();
+    load_reg_var(s, addr, a->rn);
+    /*
+     * TODO: if we implemented alignment exceptions, we should check
+     * addr against the alignment encoded in a->align here.
+     */
+    for (reg = 0; reg < nregs; reg++) {
+        if (a->l) {
+            gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
+                            s->be_data | a->size);
+            neon_store_element(vd, a->reg_idx, a->size, tmp);
+        } else { /* Store */
+            neon_load_element(tmp, vd, a->reg_idx, a->size);
+            gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
+                            s->be_data | a->size);
+        }
+        vd += a->stride;
+        tcg_gen_addi_i32(addr, addr, 1 << a->size);
+    }
+    tcg_temp_free_i32(addr);
+    tcg_temp_free_i32(tmp);
+
+    gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << a->size) * nregs);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
     tcg_temp_free_i32(rd);
 }
 
-
-/* Translate a NEON load/store element instruction.  Return nonzero if the
-   instruction is invalid.  */
-static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
-{
-    int rd, rn, rm;
-    int nregs;
-    int stride;
-    int size;
-    int reg;
-    int load;
-    TCGv_i32 addr;
-    TCGv_i32 tmp;
-
-    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return 1;
-    }
-
-    /* FIXME: this access check should not take precedence over UNDEF
-     * for invalid encodings; we will generate incorrect syndrome information
-     * for attempts to execute invalid vfp/neon encodings with FP disabled.
-     */
-    if (s->fp_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
-        return 0;
-    }
-
-    if (!s->vfp_enabled)
-      return 1;
-    VFP_DREG_D(rd, insn);
-    rn = (insn >> 16) & 0xf;
-    rm = insn & 0xf;
-    load = (insn & (1 << 21)) != 0;
-    if ((insn & (1 << 23)) == 0) {
-        /* Load store all elements -- handled already by decodetree */
-        return 1;
-    } else {
-        size = (insn >> 10) & 3;
-        if (size == 3) {
-            /* Load single element to all lanes -- handled by decodetree  */
-            return 1;
-        } else {
-            /* Single element.  */
-            int idx = (insn >> 4) & 0xf;
-            int reg_idx;
-            switch (size) {
-            case 0:
-                reg_idx = (insn >> 5) & 7;
-                stride = 1;
-                break;
-            case 1:
-                reg_idx = (insn >> 6) & 3;
-                stride = (insn & (1 << 5)) ? 2 : 1;
-                break;
-            case 2:
-                reg_idx = (insn >> 7) & 1;
-                stride = (insn & (1 << 6)) ? 2 : 1;
-                break;
-            default:
-                abort();
-            }
-            nregs = ((insn >> 8) & 3) + 1;
-            /* Catch the UNDEF cases. This is unavoidably a bit messy. */
-            switch (nregs) {
-            case 1:
-                if (((idx & (1 << size)) != 0) ||
-                    (size == 2 && ((idx & 3) == 1 || (idx & 3) == 2))) {
-                    return 1;
-                }
-                break;
-            case 3:
-                if ((idx & 1) != 0) {
-                    return 1;
-                }
-                /* fall through */
-            case 2:
-                if (size == 2 && (idx & 2) != 0) {
-                    return 1;
-                }
-                break;
-            case 4:
-                if ((size == 2) && ((idx & 3) == 3)) {
-                    return 1;
-                }
-                break;
-            default:
-                abort();
-            }
-            if ((rd + stride * (nregs - 1)) > 31) {
-                /* Attempts to write off the end of the register file
-                 * are UNPREDICTABLE; we choose to UNDEF because otherwise
-                 * the neon_load_reg() would write off the end of the array.
-                 */
-                return 1;
-            }
-            tmp = tcg_temp_new_i32();
-            addr = tcg_temp_new_i32();
-            load_reg_var(s, addr, rn);
-            for (reg = 0; reg < nregs; reg++) {
-                if (load) {
-                    gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
-                                    s->be_data | size);
-                    neon_store_element(rd, reg_idx, size, tmp);
-                } else { /* Store */
-                    neon_load_element(tmp, rd, reg_idx, size);
-                    gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
-                                    s->be_data | size);
-                }
-                rd += stride;
-                tcg_gen_addi_i32(addr, addr, 1 << size);
-            }
-            tcg_temp_free_i32(addr);
-            tcg_temp_free_i32(tmp);
-            stride = nregs * (1 << size);
-        }
-    }
-    if (rm != 15) {
-        TCGv_i32 base;
-
-        base = load_reg(s, rn);
-        if (rm == 13) {
-            tcg_gen_addi_i32(base, base, stride);
-        } else {
-            TCGv_i32 index;
-            index = load_reg(s, rm);
-            tcg_gen_add_i32(base, base, index);
-            tcg_temp_free_i32(index);
-        }
-        store_reg(s, rn, base);
-    }
-    return 0;
-}
-
 static inline void gen_neon_narrow(int size, TCGv_i32 dest, TCGv_i64 src)
 {
     switch (size) {
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
             }
             return;
         }
-        if ((insn & 0x0f100000) == 0x04000000) {
-            /* NEON load/store.  */
-            if (disas_neon_ls_insn(s, insn)) {
-                goto illegal_op;
-            }
-            return;
-        }
         if ((insn & 0x0e000f00) == 0x0c000100) {
             if (arm_dc_feature(s, ARM_FEATURE_IWMMXT)) {
                 /* iWMMXt register transfer.  */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
         }
         break;
     case 12:
-        if ((insn & 0x01100000) == 0x01000000) {
-            if (disas_neon_ls_insn(s, insn)) {
-                goto illegal_op;
-            }
-            break;
-        }
         goto illegal_op;
     default:
     illegal_op:
-- 
2.20.1

Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.

Note that we don't need the neon_3r_sizes[op] check here because all
size values are OK for VADD and VSUB; we'll add this when we convert
the first insn that has size restrictions.

For this we need one of the GVecGen*Fn typedefs currently in
translate-a64.h; move them all to translate.h as a block so they
are visible to the 32-bit decoder.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-15-peter.maydell@linaro.org
---
 target/arm/translate-a64.h      |  9 --------
 target/arm/translate.h          |  9 ++++++++
 target/arm/neon-dp.decode       | 17 +++++++++++++++
 target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 14 ++++--------
 5 files changed, 68 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -XXX,XX +XXX,XX @@ static inline int vec_full_reg_size(DisasContext *s)
 
 bool disas_sve(DisasContext *, uint32_t);
 
-/* Note that the gvec expanders operate on offsets + sizes.  */
-typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
-typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
-                         uint32_t, uint32_t);
-typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
-                        uint32_t, uint32_t, uint32_t);
-typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
-                        uint32_t, uint32_t, uint32_t);
-
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 #define dc_isar_feature(name, ctx) \
     ({ DisasContext *ctx_ = (ctx); isar_feature_##name(ctx_->isar); })
 
+/* Note that the gvec expanders operate on offsets + sizes.  */
+typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
+                         uint32_t, uint32_t);
+typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 #
 # This file is processed by scripts/decodetree.py
 #
+# VFP/Neon register fields; same as vfp.decode
+%vm_dp  5:1 0:4
+%vn_dp  7:1 16:4
+%vd_dp  22:1 12:4
 
 # Encodings for Neon data processing instructions where the T32 encoding
 # is a simple transformation of the A32 encoding.
@@ -XXX,XX +XXX,XX @@
 #   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 # This file works on the A32 encoding only; calling code for T32 has to
 # transform the insn into the A32 version first.
+
+######################################################################
+# 3-reg-same grouping:
+# 1111 001 U 0 D sz:2 Vn:4 Vd:4 opc:4 N Q M op Vm:4
+######################################################################
+
+&3same vm vn vd q size
+
+@3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
+VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 
     return true;
 }
+
+static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
+{
+    int vec_size = a->q ? 16 : 8;
+    int rd_ofs = neon_reg_offset(a->vd, 0);
+    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rm_ofs = neon_reg_offset(a->vm, 0);
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fn(a->size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
+    return true;
+}
+
+#define DO_3SAME(INSN, FUNC)                                            \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        return do_3same(s, a, FUNC);                                    \
+    }
+
+DO_3SAME(VADD, tcg_gen_gvec_add)
+DO_3SAME(VSUB, tcg_gen_gvec_sub)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 0;
 
-        case NEON_3R_VADD_VSUB:
-            if (u) {
-                tcg_gen_gvec_sub(size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            } else {
-                tcg_gen_gvec_add(size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            }
-            return 0;
-
         case NEON_3R_VQADD:
             tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
                            rn_ofs, rm_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
                            u ? &ushl_op[size] : &sshl_op[size]);
             return 0;
+
+        case NEON_3R_VADD_VSUB:
+            /* Already handled by decodetree */
+            return 1;
         }
 
         if (size == 3) {
-- 
2.20.1
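
As a reading aid for the @3same format in the patch above, here is a minimal
standalone sketch of the same field extraction in plain C. It is not the code
that scripts/decodetree.py generates: Arg3Same, bits(), decode_3same() and
regs_valid() are hypothetical names used only for illustration, and the CPU
feature tests from do_3same() are reduced to a single has_d16_d31 flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the arg_3same struct that decodetree generates. */
typedef struct {
    int vm, vn, vd, q, size;
} Arg3Same;

/* Extract 'len' bits of 'insn' starting at bit 'pos' (illustrative helper). */
static uint32_t bits(uint32_t insn, int pos, int len)
{
    assert(pos >= 0 && len > 0 && pos + len <= 32);
    uint32_t mask = (len == 32) ? 0xffffffffu : ((1u << len) - 1u);
    return (insn >> pos) & mask;
}

/*
 * Mirror of the field definitions in neon-dp.decode:
 *   %vm_dp 5:1 0:4    %vn_dp 7:1 16:4    %vd_dp 22:1 12:4
 * The first sub-field listed supplies the most significant bit.
 */
static Arg3Same decode_3same(uint32_t insn)
{
    Arg3Same a;
    a.vm = (int)((bits(insn, 5, 1) << 4) | bits(insn, 0, 4));
    a.vn = (int)((bits(insn, 7, 1) << 4) | bits(insn, 16, 4));
    a.vd = (int)((bits(insn, 22, 1) << 4) | bits(insn, 12, 4));
    a.q = (int)bits(insn, 6, 1);
    a.size = (int)bits(insn, 20, 2);
    return a;
}

/* The two register-number checks from do_3same(); returns 0 for UNDEF. */
static int regs_valid(const Arg3Same *a, int has_d16_d31)
{
    /* D16-D31 are only usable if the CPU implements them. */
    if (!has_d16_d31 && ((a->vd | a->vn | a->vm) & 0x10)) {
        return 0;
    }
    /* With Q=1 every operand is a Q register, so D numbers must be even. */
    if ((a->vn | a->vm | a->vd) & a->q) {
        return 0;
    }
    return 1;
}
```

For example, the word 0xF2242846 (hand-assembled here as VADD.I32 q1, q2, q3)
should decode to vd=2, vn=4, vm=6, size=2, q=1, and pass both checks.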

Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
Note that for the logic ops the 'size' field forms part of their
decode and the actual operations are always bitwise.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-16-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       | 12 +++++++++++
 target/arm/translate-neon.inc.c | 19 +++++++++++++++++
 target/arm/translate.c          | 38 +--------------------------------
 3 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+@3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
+
+VAND_3s          1111 001 0 0 . 00 .... .... 0001 ... 1 .... @3same_logic
+VBIC_3s          1111 001 0 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+VORR_3s          1111 001 0 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+VORN_3s          1111 001 0 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+VEOR_3s          1111 001 1 0 . 00 .... .... 0001 ... 1 .... @3same_logic
+VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
 
 DO_3SAME(VADD, tcg_gen_gvec_add)
 DO_3SAME(VSUB, tcg_gen_gvec_sub)
+DO_3SAME(VAND, tcg_gen_gvec_and)
+DO_3SAME(VBIC, tcg_gen_gvec_andc)
+DO_3SAME(VORR, tcg_gen_gvec_or)
+DO_3SAME(VORN, tcg_gen_gvec_orc)
+DO_3SAME(VEOR, tcg_gen_gvec_xor)
+
+/* These insns are all gvec_bitsel but with the inputs in various orders. */
+#define DO_3SAME_BITSEL(INSN, O1, O2, O3)                               \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_bitsel(vece, rd_ofs, O1, O2, O3, oprsz, maxsz);    \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
+DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
+DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 1;
 
-        case NEON_3R_LOGIC: /* Logic ops.  */
-            switch ((u << 2) | size) {
-            case 0: /* VAND */
-                tcg_gen_gvec_and(0, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-                break;
-            case 1: /* VBIC */
-                tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
-                                  vec_size, vec_size);
-                break;
-            case 2: /* VORR */
-                tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
-                                vec_size, vec_size);
-                break;
-            case 3: /* VORN */
-                tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-                break;
-            case 4: /* VEOR */
-                tcg_gen_gvec_xor(0, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-                break;
-            case 5: /* VBSL */
-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
-                                    vec_size, vec_size);
-                break;
-            case 6: /* VBIT */
-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
-                                    vec_size, vec_size);
-                break;
-            case 7: /* VBIF */
-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
-                                    vec_size, vec_size);
-                break;
-            }
-            return 0;
-
         case NEON_3R_VQADD:
             tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
                            rn_ofs, rm_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 0;
 
         case NEON_3R_VADD_VSUB:
+        case NEON_3R_LOGIC:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1
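
The operand orderings passed to DO_3SAME_BITSEL above can be checked against a
small byte-wise model. This is only a sketch: bitsel(), vbsl(), vbit() and
vbif() are hypothetical names, and the model assumes the select computes
d = (b & sel) | (c & ~sel) with the selector as the first source operand, which
is how the macro arguments O1, O2, O3 are read here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Byte-wise select: take bits from 'b' where 'sel' is 1 and from 'c' where
 * it is 0. 'd' may alias any input because each byte is read before it is
 * written.
 */
static void bitsel(uint8_t *d, const uint8_t *sel, const uint8_t *b,
                   const uint8_t *c, size_t vec_size)
{
    assert(vec_size == 8 || vec_size == 16);   /* D or Q register */
    for (size_t i = 0; i < vec_size; i++) {
        d[i] = (uint8_t)((b[i] & sel[i]) | (c[i] & ~sel[i]));
    }
}

/* VBSL: the destination itself is the selector (O1=rd, O2=rn, O3=rm). */
static void vbsl(uint8_t *d, const uint8_t *n, const uint8_t *m, size_t sz)
{
    bitsel(d, d, n, m, sz);
}

/* VBIT: insert bits of n where m is 1 (O1=rm, O2=rn, O3=rd). */
static void vbit(uint8_t *d, const uint8_t *n, const uint8_t *m, size_t sz)
{
    bitsel(d, m, n, d, sz);
}

/* VBIF: insert bits of n where m is 0 (O1=rm, O2=rd, O3=rn). */
static void vbif(uint8_t *d, const uint8_t *n, const uint8_t *m, size_t sz)
{
    bitsel(d, m, d, n, sz);
}
```

All three insns share one select primitive; only the role of each register
changes, which is why a single macro with three offset arguments suffices.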

Convert the Neon 3-reg-same VMAX and VMIN insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-17-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  5 +++++
 target/arm/translate-neon.inc.c | 14 ++++++++++++++
 target/arm/translate.c          | 21 ++-------------------
 3 files changed, 21 insertions(+), 19 deletions(-)

Convert the Neon comparison ops in the 3-reg-same grouping
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-18-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  8 ++++++++
 target/arm/translate-neon.inc.c | 22 ++++++++++++++++++++++
 target/arm/translate.c          | 23 +++--------------------
 3 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
 VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 
+VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
+VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
+VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
+VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
+
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
@@ -XXX,XX +XXX,XX @@ VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
 
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
+
+VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
+VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
 DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
 DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
 DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+
+#define DO_3SAME_CMP(INSN, COND)                                        \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_cmp(COND, vece, rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz); \
+    }                                                                   \
+    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
+
+DO_3SAME_CMP(VCGT_S, TCG_COND_GT)
+DO_3SAME_CMP(VCGT_U, TCG_COND_GTU)
+DO_3SAME_CMP(VCGE_S, TCG_COND_GE)
+DO_3SAME_CMP(VCGE_U, TCG_COND_GEU)
+DO_3SAME_CMP(VCEQ, TCG_COND_EQ)
+
+static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                         uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
+}
+DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                            u ? &mls_op[size] : &mla_op[size]);
             return 0;
 
-        case NEON_3R_VTST_VCEQ:
-            if (u) { /* VCEQ */
-                tcg_gen_gvec_cmp(TCG_COND_EQ, size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            } else { /* VTST */
-                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                               vec_size, vec_size, &cmtst_op[size]);
-            }
-            return 0;
-
-        case NEON_3R_VCGT:
-            tcg_gen_gvec_cmp(u ? TCG_COND_GTU : TCG_COND_GT, size,
-                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
-            return 0;
-
-        case NEON_3R_VCGE:
-            tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size,
-                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
-            return 0;
-
         case NEON_3R_VSHL:
             /* Note the operation is vshl vd,vm,vn */
             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_LOGIC:
         case NEON_3R_VMAX:
         case NEON_3R_VMIN:
+        case NEON_3R_VTST_VCEQ:
+        case NEON_3R_VCGT:
+        case NEON_3R_VCGE:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1
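
For readers less familiar with the Neon comparison insns converted above, the
following sketch models their per-element result for 8-bit elements: each lane
becomes all-ones when the condition holds and zero otherwise. The names Cond,
cond_holds(), vcmp8() and vtst8() are hypothetical and stand in for the
TCG_COND_* values and the gvec expanders; only the element semantics are
illustrated, not the TCG code generation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of the TCG_COND_* values used by DO_3SAME_CMP. */
typedef enum { COND_EQ, COND_GT, COND_GE, COND_GTU, COND_GEU } Cond;

static int cond_holds(Cond cond, uint8_t n, uint8_t m)
{
    switch (cond) {
    case COND_EQ:
        return n == m;
    case COND_GT:                       /* signed compare */
        return (int8_t)n > (int8_t)m;
    case COND_GE:
        return (int8_t)n >= (int8_t)m;
    case COND_GTU:                      /* unsigned compare */
        return n > m;
    case COND_GEU:
        return n >= m;
    }
    return 0;
}

/* VCEQ/VCGT/VCGE model: d[i] = (n[i] cond m[i]) ? all-ones : 0. */
static void vcmp8(Cond cond, uint8_t *d, const uint8_t *n, const uint8_t *m,
                  size_t vec_size)
{
    assert(vec_size == 8 || vec_size == 16);   /* D or Q register */
    for (size_t i = 0; i < vec_size; i++) {
        d[i] = cond_holds(cond, n[i], m[i]) ? 0xFF : 0x00;
    }
}

/* VTST model: d[i] = ((n[i] & m[i]) != 0) ? all-ones : 0. */
static void vtst8(uint8_t *d, const uint8_t *n, const uint8_t *m,
                  size_t vec_size)
{
    assert(vec_size == 8 || vec_size == 16);
    for (size_t i = 0; i < vec_size; i++) {
        d[i] = (n[i] & m[i]) ? 0xFF : 0x00;
    }
}
```

VTST is kept separate from the compare family because it is a bit test rather
than an ordering comparison, which matches the patch giving it its own
gen_VTST_3s wrapper instead of a DO_3SAME_CMP instance.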

Convert the Neon VQADD/VQSUB insns in the 3-reg-same grouping
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-19-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  6 ++++++
 target/arm/translate-neon.inc.c | 15 +++++++++++++++
 target/arm/translate.c          | 14 ++------------
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
+VQADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
+
 @3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
 
@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
 VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 
+VQSUB_S_3s       1111 001 0 0 . .. .... .... 0010 . . . 1 .... @3same
+VQSUB_U_3s       1111 001 1 0 . .. .... .... 0010 . . . 1 .... @3same
+
 VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
     tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
 }
 DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
+
+#define DO_3SAME_GVEC4(INSN, OPARRAY)                                   \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),           \
+                       rn_ofs, rm_ofs, oprsz, maxsz, &OPARRAY[vece]);   \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_GVEC4(VQADD_S, sqadd_op)
+DO_3SAME_GVEC4(VQADD_U, uqadd_op)
+DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
+DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 1;
 
-        case NEON_3R_VQADD:
-            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
-                           rn_ofs, rm_ofs, vec_size, vec_size,
-                           (u ? uqadd_op : sqadd_op) + size);
-            return 0;
-
-        case NEON_3R_VQSUB:
-            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
-                           rn_ofs, rm_ofs, vec_size, vec_size,
-                           (u ? uqsub_op : sqsub_op) + size);
-            return 0;
-
         case NEON_3R_VMUL: /* VMUL */
             if (u) {
                 /* Polynomial case allows only P8.  */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VTST_VCEQ:
         case NEON_3R_VCGT:
         case NEON_3R_VCGE:
+        case NEON_3R_VQADD:
+        case NEON_3R_VQSUB:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1
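
The VQADD/VQSUB wrappers above use tcg_gen_gvec_4 because the saturating ops
have an extra operand: the cumulative saturation state at vfp.qc. The sketch
below models that behaviour for single 8-bit elements. It is an illustration
only: sqadd8() and friends are hypothetical names, and the QC state is
collapsed into one bool, whereas the patch passes the offset of a vfp.qc field
in CPUARMState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Signed saturating add; sets *qc (sticky) if the result was clamped. */
static int8_t sqadd8(int8_t a, int8_t b, bool *qc)
{
    assert(qc != NULL);
    int r = a + b;                 /* computed in int, so no overflow */
    if (r > INT8_MAX) {
        r = INT8_MAX;
        *qc = true;
    } else if (r < INT8_MIN) {
        r = INT8_MIN;
        *qc = true;
    }
    return (int8_t)r;
}

/* Unsigned saturating add. */
static uint8_t uqadd8(uint8_t a, uint8_t b, bool *qc)
{
    assert(qc != NULL);
    int r = a + b;
    if (r > UINT8_MAX) {
        r = UINT8_MAX;
        *qc = true;
    }
    return (uint8_t)r;
}

/* Signed saturating subtract. */
static int8_t sqsub8(int8_t a, int8_t b, bool *qc)
{
    assert(qc != NULL);
    int r = a - b;
    if (r > INT8_MAX) {
        r = INT8_MAX;
        *qc = true;
    } else if (r < INT8_MIN) {
        r = INT8_MIN;
        *qc = true;
    }
    return (int8_t)r;
}

/* Unsigned saturating subtract: clamps at zero. */
static uint8_t uqsub8(uint8_t a, uint8_t b, bool *qc)
{
    assert(qc != NULL);
    int r = a - b;
    if (r < 0) {
        r = 0;
        *qc = true;
    }
    return (uint8_t)r;
}
```

The flag is only ever set, never cleared, by these functions, which is the
"sticky" behaviour expected of a cumulative saturation bit.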

Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
3-reg-same grouping to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-20-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  9 +++++++
 target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 28 +++------------------
 3 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
 VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
 
+VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same
+VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same
+
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
@@ -XXX,XX +XXX,XX @@ VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
 
 VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
 VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
+
+VMLA_3s          1111 001 0 0 . .. .... .... 1001 . . . 0 .... @3same
+VMLS_3s          1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
+
+VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
+VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
 DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
 DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
 DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+DO_3SAME_NO_SZ_3(VMUL, tcg_gen_gvec_mul)
 
 #define DO_3SAME_CMP(INSN, COND)                                        \
     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
@@ -XXX,XX +XXX,XX @@ DO_3SAME_GVEC4(VQADD_S, sqadd_op)
 DO_3SAME_GVEC4(VQADD_U, uqadd_op)
 DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
 DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
+
+static void gen_VMUL_p_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                           uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz,
+                       0, gen_helper_gvec_pmul_b);
+}
+
+static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
+{
+    if (a->size != 0) {
+        return false;
+    }
+    return do_3same(s, a, gen_VMUL_p_3s);
+}
+
+#define DO_3SAME_GVEC3_NO_SZ_3(INSN, OPARRAY)                           \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,                          \
+                       oprsz, maxsz, &OPARRAY[vece]);                   \
+    }                                                                   \
+    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
+
+
+DO_3SAME_GVEC3_NO_SZ_3(VMLA, mla_op)
+DO_3SAME_GVEC3_NO_SZ_3(VMLS, mls_op)
+
+#define DO_3SAME_GVEC3_SHIFT(INSN, OPARRAY)                             \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        /* Note the operation is vshl vd,vm,vn */                       \
+        tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs,                          \
+                       oprsz, maxsz, &OPARRAY[vece]);                   \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_GVEC3_SHIFT(VSHL_S, sshl_op)
+DO_3SAME_GVEC3_SHIFT(VSHL_U, ushl_op)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 1;
 
-        case NEON_3R_VMUL: /* VMUL */
-            if (u) {
-                /* Polynomial case allows only P8.  */
-                if (size != 0) {
-                    return 1;
-                }
-                tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
-                                   0, gen_helper_gvec_pmul_b);
-            } else {
-                tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            }
-            return 0;
-
-        case NEON_3R_VML: /* VMLA, VMLS */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
-                           u ? &mls_op[size] : &mla_op[size]);
-            return 0;
-
-        case NEON_3R_VSHL:
-            /* Note the operation is vshl vd,vm,vn */
-            tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
-                           u ? &ushl_op[size] : &sshl_op[size]);
-            return 0;
-
         case NEON_3R_VADD_VSUB:
         case NEON_3R_LOGIC:
         case NEON_3R_VMAX:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VCGE:
         case NEON_3R_VQADD:
         case NEON_3R_VQSUB:
+        case NEON_3R_VMUL:
+        case NEON_3R_VML:
+        case NEON_3R_VSHL:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1
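
Two details of the patch above are easy to miss: VMUL_p is a polynomial
(carry-less) multiply that is only defined for 8-bit elements, and VSHL takes
its data from Vm and its shift count from Vn. The following sketch models both
for 8-bit lanes. pmul8(), vmul_p8(), ushl8() and vshl_u8() are hypothetical
names; the shift model assumes the count is the signed low byte of the Vn
element, with negative counts shifting right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Carry-less multiply over GF(2), keeping the low 8 bits of the product. */
static uint8_t pmul8(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (b & (1u << i)) {
            r ^= (uint8_t)(a << i);    /* XOR instead of add: no carries */
        }
    }
    return r;
}

/* VMUL.P8 model over a whole D or Q register. */
static void vmul_p8(uint8_t *d, const uint8_t *n, const uint8_t *m,
                    size_t vec_size)
{
    assert(vec_size == 8 || vec_size == 16);
    for (size_t i = 0; i < vec_size; i++) {
        d[i] = pmul8(n[i], m[i]);
    }
}

/* Unsigned shift of 'val' by a signed count; out-of-range counts give 0. */
static uint8_t ushl8(uint8_t val, uint8_t count)
{
    int8_t shift = (int8_t)count;
    if (shift >= 8 || shift <= -8) {
        return 0;
    }
    if (shift >= 0) {
        return (uint8_t)(val << shift);
    }
    return (uint8_t)(val >> -shift);
}

/* VSHL.U8 model: note the value comes from m and the count from n. */
static void vshl_u8(uint8_t *d, const uint8_t *n, const uint8_t *m,
                    size_t vec_size)
{
    assert(vec_size == 8 || vec_size == 16);
    for (size_t i = 0; i < vec_size; i++) {
        d[i] = ushl8(m[i], n[i]);
    }
}
```

The swapped roles in vshl_u8() correspond to the swapped rm_ofs and rn_ofs
arguments in DO_3SAME_GVEC3_SHIFT.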

We're going to want at least some of the NeonGen* typedefs
for the refactored 32-bit Neon decoder, so move them all
to translate.h since it makes more sense to keep them in
one group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-23-peter.maydell@linaro.org
---
 target/arm/translate.h     | 17 +++++++++++++++++
 target/arm/translate-a64.c | 17 -----------------
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
 typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
+/* Function prototype for gen_ functions for calling Neon helpers */
+typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
+typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
+typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
+typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
+typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
+typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
+typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ typedef struct AArch64DecodeTable {
     AArch64DecodeFn *disas_fn;
 } AArch64DecodeTable;
 
-/* Function prototype for gen_ functions for calling Neon helpers */
-typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
-typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
-typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
-typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
-typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
-typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
-typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
-typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
-typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
-typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
-typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
-
 /* initialize TCG globals.  */
 void a64_translate_init(void)
 {
-- 
2.20.1
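
The typedefs moved in this series (GVecGen3Fn, NeonGen*Fn and so on) are
function types rather than pointer types, which is what lets do_3same() take a
"GVecGen3Fn fn" parameter and lets the DO_3SAME macro stamp out one trans_
function per insn. The pattern can be sketched in isolation as follows; Vec3Fn,
vec_add(), vec_sub(), do_3same_model() and DO_3SAME_MODEL are hypothetical
names that operate on plain byte arrays instead of TCG register offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A function *type*, analogous in shape to GVecGen3Fn. */
typedef void Vec3Fn(uint8_t *d, const uint8_t *n, const uint8_t *m,
                    size_t vec_size);

static void vec_add(uint8_t *d, const uint8_t *n, const uint8_t *m,
                    size_t vec_size)
{
    for (size_t i = 0; i < vec_size; i++) {
        d[i] = (uint8_t)(n[i] + m[i]);
    }
}

static void vec_sub(uint8_t *d, const uint8_t *n, const uint8_t *m,
                    size_t vec_size)
{
    for (size_t i = 0; i < vec_size; i++) {
        d[i] = (uint8_t)(n[i] - m[i]);
    }
}

/*
 * Shared driver. A parameter declared with a function type is adjusted by
 * the compiler to a pointer to that function type.
 */
static int do_3same_model(int q, uint8_t *d, const uint8_t *n,
                          const uint8_t *m, Vec3Fn fn)
{
    size_t vec_size = q ? 16 : 8;      /* Q register or D register */
    assert(fn != NULL);
    fn(d, n, m, vec_size);
    return 1;
}

/* One thin wrapper per insn, generated by token pasting. */
#define DO_3SAME_MODEL(INSN, FUNC)                                      \
    static int trans_##INSN(int q, uint8_t *d, const uint8_t *n,        \
                            const uint8_t *m)                           \
    {                                                                   \
        return do_3same_model(q, d, n, m, FUNC);                        \
    }

DO_3SAME_MODEL(VADD, vec_add)
DO_3SAME_MODEL(VSUB, vec_sub)
```

Keeping the typedefs in one shared header means any translation unit that
defines such wrappers sees the same function types.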