First arm pullreq of the 8.0 series...

thanks
-- PMM

The following changes since commit 4747524f9f243ca5ff1f146d37e423c00e923ee1:

  Merge tag 'pull-qapi-2022-12-14-v2' of https://repo.or.cz/qemu/armbru into staging (2022-12-14 22:42:14 +0000)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20221215

for you to fetch changes up to 4f3ebdc33618e7c163f769047859d6f34373e3af:

  target/arm: Restrict arm_cpu_exec_interrupt() to TCG accelerator (2022-12-15 11:18:20 +0000)

----------------------------------------------------------------
target-arm queue:
 * hw/arm/virt: Add properties to allow more granular
   configuration of use of highmem space
 * target/arm: Add Cortex-A55 CPU
 * hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement
 * Implement FEAT_EVT
 * Some 3-phase-reset conversions for Arm GIC, SMMU
 * hw/arm/boot: set initrd with #address-cells type in fdt
 * align user-mode exposed ID registers with Linux
 * hw/misc: Move some arm-related files from specific_ss into softmmu_ss
 * Restrict arm_cpu_exec_interrupt() to TCG accelerator

----------------------------------------------------------------
Gavin Shan (7):
      hw/arm/virt: Introduce virt_set_high_memmap() helper
      hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()
      hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
      hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
      hw/arm/virt: Improve high memory region address assignment
      hw/arm/virt: Add 'compact-highmem' property
      hw/arm/virt: Add properties to disable high memory regions

Luke Starrett (1):
      hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement

Mihai Carabas (1):
      hw/arm/virt: build SMBIOS 19 table

Peter Maydell (15):
      target/arm: Allow relevant HCR bits to be written for FEAT_EVT
      target/arm: Implement HCR_EL2.TTLBIS traps
      target/arm: Implement HCR_EL2.TTLBOS traps
      target/arm: Implement HCR_EL2.TICAB,TOCU traps
      target/arm: Implement HCR_EL2.TID4 traps
      target/arm: Report FEAT_EVT for TCG '-cpu max'
      hw/arm: Convert TYPE_ARM_SMMU to 3-phase reset
      hw/arm: Convert TYPE_ARM_SMMUV3 to 3-phase reset
      hw/intc: Convert TYPE_ARM_GIC_COMMON to 3-phase reset
      hw/intc: Convert TYPE_ARM_GIC_KVM to 3-phase reset
      hw/intc: Convert TYPE_ARM_GICV3_COMMON to 3-phase reset
      hw/intc: Convert TYPE_KVM_ARM_GICV3 to 3-phase reset
      hw/intc: Convert TYPE_ARM_GICV3_ITS_COMMON to 3-phase reset
      hw/intc: Convert TYPE_ARM_GICV3_ITS to 3-phase reset
      hw/intc: Convert TYPE_KVM_ARM_ITS to 3-phase reset

Philippe Mathieu-Daudé (1):
      target/arm: Restrict arm_cpu_exec_interrupt() to TCG accelerator

Schspa Shi (1):
      hw/arm/boot: set initrd with #address-cells type in fdt

Thomas Huth (1):
      hw/misc: Move some arm-related files from specific_ss into softmmu_ss

Timofey Kutergin (1):
      target/arm: Add Cortex-A55 CPU

Zhuojia Shen (1):
      target/arm: align exposed ID registers with Linux

 docs/system/arm/emulation.rst          |   1 +
 docs/system/arm/virt.rst               |  18 +++
 include/hw/arm/smmuv3.h                |   2 +-
 include/hw/arm/virt.h                  |   2 +
 include/hw/misc/xlnx-zynqmp-apu-ctrl.h |   2 +-
 target/arm/cpu.h                       |  30 +++++
 target/arm/kvm-consts.h                |   8 +-
 hw/arm/boot.c                          |  10 +-
 hw/arm/smmu-common.c                   |   7 +-
 hw/arm/smmuv3.c                        |  12 +-
 hw/arm/virt.c                          | 202 +++++++++++++++++++++++-----
 hw/intc/arm_gic_common.c               |   7 +-
 hw/intc/arm_gic_kvm.c                  |  14 +-
 hw/intc/arm_gicv3_common.c             |   7 +-
 hw/intc/arm_gicv3_dist.c               |   4 +-
 hw/intc/arm_gicv3_its.c                |  14 +-
 hw/intc/arm_gicv3_its_common.c         |   7 +-
 hw/intc/arm_gicv3_its_kvm.c            |  14 +-
 hw/intc/arm_gicv3_kvm.c                |  14 +-
 hw/misc/imx6_src.c                     |   2 +-
 hw/misc/iotkit-sysctl.c                |   1 -
 target/arm/cpu.c                       |   5 +-
 target/arm/cpu64.c                     |  70 ++++++++++
 target/arm/cpu_tcg.c                   |   1 +
 target/arm/helper.c                    | 231 ++++++++++++++++++---------
 hw/misc/meson.build                    |  11 +-
 26 files changed, 538 insertions(+), 158 deletions(-)

Arm queue; the bulk of this is the VFP decodetree conversion...

thanks
-- PMM

The following changes since commit ae2b87341b5ddb0dcb1b3f2d4f586ef18de75873:

  Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2019-06-12' into staging (2019-06-13 11:58:00 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190613

for you to fetch changes up to 07e4c7f769120c9a5bd6a26c2dc1421f2f838d80:

  target/arm: Fix short-vector increment behaviour (2019-06-13 12:57:37 +0100)

----------------------------------------------------------------
target-arm queue:
 * convert aarch32 VFP decoder to decodetree
   (includes tightening up decode in a few places)
 * fix minor bugs in VFP short-vector handling
 * hw/core/bus.c: Only the main system bus can have no parent
 * smmuv3: Fix decoding of ID register range
 * Implement NSACR gating of floating point
 * Use tcg_gen_gvec_bitsel
 * Vectorize USHL and SSHL

----------------------------------------------------------------
Peter Maydell (44):
      target/arm: Implement NSACR gating of floating point
      hw/arm/smmuv3: Fix decoding of ID register range
      hw/core/bus.c: Only the main system bus can have no parent
      target/arm: Add stubs for AArch32 VFP decodetree
      target/arm: Factor out VFP access checking code
      target/arm: Fix Cortex-R5F MVFR values
      target/arm: Explicitly enable VFP short-vectors for aarch32 -cpu max
      target/arm: Convert the VSEL instructions to decodetree
      target/arm: Convert VMINNM, VMAXNM to decodetree
      target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM to decodetree
      target/arm: Convert VCVTA/VCVTN/VCVTP/VCVTM to decodetree
      target/arm: Move the VFP trans_* functions to translate-vfp.inc.c
      target/arm: Add helpers for VFP register loads and stores
      target/arm: Convert "double-precision" register moves to decodetree
      target/arm: Convert "single-precision" register moves to decodetree
      target/arm: Convert VFP two-register transfer insns to decodetree
      target/arm: Convert VFP VLDR and VSTR to decodetree
      target/arm: Convert the VFP load/store multiple insns to decodetree
      target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d
      target/arm: Convert VFP VMLA to decodetree
      target/arm: Convert VFP VMLS to decodetree
      target/arm: Convert VFP VNMLS to decodetree
      target/arm: Convert VFP VNMLA to decodetree
      target/arm: Convert VMUL to decodetree
      target/arm: Convert VNMUL to decodetree
      target/arm: Convert VADD to decodetree
      target/arm: Convert VSUB to decodetree
      target/arm: Convert VDIV to decodetree
      target/arm: Convert VFP fused multiply-add insns to decodetree
      target/arm: Convert VMOV (imm) to decodetree
      target/arm: Convert VABS to decodetree
      target/arm: Convert VNEG to decodetree
      target/arm: Convert VSQRT to decodetree
      target/arm: Convert VMOV (register) to decodetree
      target/arm: Convert VFP comparison insns to decodetree
      target/arm: Convert the VCVT-from-f16 insns to decodetree
      target/arm: Convert the VCVT-to-f16 insns to decodetree
      target/arm: Convert VFP round insns to decodetree
      target/arm: Convert double-single precision conversion insns to decodetree
      target/arm: Convert integer-to-float insns to decodetree
      target/arm: Convert VJCVT to decodetree
      target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree
      target/arm: Convert float-to-integer VCVT insns to decodetree
      target/arm: Fix short-vector increment behaviour

Richard Henderson (4):
      target/arm: Vectorize USHL and SSHL
      target/arm: Use tcg_gen_gvec_bitsel
      target/arm: Fix output of PAuth Auth
      decodetree: Fix comparison of Field

 target/arm/Makefile.objs          |   13 +
 tests/tcg/aarch64/Makefile.target |    2 +-
 target/arm/cpu.h                  |   11 +
 target/arm/helper.h               |   11 +-
 target/arm/translate-a64.h        |    2 +
 target/arm/translate.h            |    9 +-
 hw/arm/smmuv3.c                   |    2 +-
 hw/core/bus.c                     |   21 +-
 target/arm/cpu.c                  |    6 +
 target/arm/helper.c               |   75 +-
 target/arm/neon_helper.c          |   33 -
 target/arm/pauth_helper.c         |    4 +-
 target/arm/translate-a64.c        |   33 +-
 target/arm/translate-vfp.inc.c    | 2672 +++++++++++++++++++++++++++++++++++++
 target/arm/translate.c            | 1881 +++++---------------------
 target/arm/vec_helper.c           |   88 ++
 tests/tcg/aarch64/pauth-2.c       |   61 +
 scripts/decodetree.py             |    2 +-
 target/arm/vfp-uncond.decode      |   63 +
 target/arm/vfp.decode             |  242 ++++
 20 files changed, 3593 insertions(+), 1638 deletions(-)
 create mode 100644 target/arm/translate-vfp.inc.c
 create mode 100644 tests/tcg/aarch64/pauth-2.c
 create mode 100644 target/arm/vfp-uncond.decode
 create mode 100644 target/arm/vfp.decode
From: Gavin Shan <gshan@redhat.com>

This introduces virt_set_high_memmap() helper. The logic of high
memory region address assignment is moved to the helper. The intention
is to make the subsequent optimization for high memory region address
assignment easier.

No functional change intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-id: 20221029224307.138822-2-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 74 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
     return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_high_memmap(VirtMachineState *vms,
+                                 hwaddr base, int pa_bits)
+{
+    int i;
+
+    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+        hwaddr size = extended_memmap[i].size;
+        bool fits;
+
+        base = ROUND_UP(base, size);
+        vms->memmap[i].base = base;
+        vms->memmap[i].size = size;
+
+        /*
+         * Check each device to see if they fit in the PA space,
+         * moving highest_gpa as we go.
+         *
+         * For each device that doesn't fit, disable it.
+         */
+        fits = (base + size) <= BIT_ULL(pa_bits);
+        if (fits) {
+            vms->highest_gpa = base + size - 1;
+        }
+
+        switch (i) {
+        case VIRT_HIGH_GIC_REDIST2:
+            vms->highmem_redists &= fits;
+            break;
+        case VIRT_HIGH_PCIE_ECAM:
+            vms->highmem_ecam &= fits;
+            break;
+        case VIRT_HIGH_PCIE_MMIO:
+            vms->highmem_mmio &= fits;
+            break;
+        }
+
+        base += size;
+    }
+}
+
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
     MachineState *ms = MACHINE(vms);
@@ -XXX,XX +XXX,XX @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
     /* We know for sure that at least the memory fits in the PA space */
     vms->highest_gpa = memtop - 1;
 
-    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-        hwaddr size = extended_memmap[i].size;
-        bool fits;
-
-        base = ROUND_UP(base, size);
-        vms->memmap[i].base = base;
-        vms->memmap[i].size = size;
-
-        /*
-         * Check each device to see if they fit in the PA space,
-         * moving highest_gpa as we go.
-         *
-         * For each device that doesn't fit, disable it.
-         */
-        fits = (base + size) <= BIT_ULL(pa_bits);
-        if (fits) {
-            vms->highest_gpa = base + size - 1;
-        }
-
-        switch (i) {
-        case VIRT_HIGH_GIC_REDIST2:
-            vms->highmem_redists &= fits;
-            break;
-        case VIRT_HIGH_PCIE_ECAM:
-            vms->highmem_ecam &= fits;
-            break;
-        case VIRT_HIGH_PCIE_MMIO:
-            vms->highmem_mmio &= fits;
-            break;
-        }
-
-        base += size;
-    }
+    virt_set_high_memmap(vms, base, pa_bits);
 
     if (device_memory_size > 0) {
         ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
--
2.25.1

Convert the VFP VMOV (immediate) instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 129 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  27 +------
 target/arm/vfp.decode          |   5 ++
 3 files changed, 136 insertions(+), 25 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
 
     return true;
 }
+
+static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
+{
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i32 fd;
+    uint32_t n, i, vd;
+
+    vd = a->vd;
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0x18;
+        /* Figure out what type of vector operation this is.  */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = s->vec_stride + 1;
+        }
+    }
+
+    n = (a->imm4h << 28) & 0x80000000;
+    i = ((a->imm4h << 4) & 0x70) | a->imm4l;
+    if (i & 0x40) {
+        i |= 0x780;
+    } else {
+        i |= 0x800;
+    }
+    n |= i << 19;
+
+    fd = tcg_temp_new_i32();
+    tcg_gen_movi_i32(fd, n);
+
+    for (;;) {
+        neon_store_reg32(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+    }
+
+    tcg_temp_free_i32(fd);
+    return true;
+}
+
+static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+{
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i64 fd;
+    uint32_t n, i, vd;
+
+    vd = a->vd;
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (vd & 0x10)) {
+        return false;
+    }
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0xc;
+        /* Figure out what type of vector operation this is.  */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = (s->vec_stride >> 1) + 1;
+        }
+    }
+
+    n = (a->imm4h << 28) & 0x80000000;
+    i = ((a->imm4h << 4) & 0x70) | a->imm4l;
+    if (i & 0x40) {
+        i |= 0x3f80;
+    } else {
+        i |= 0x4000;
+    }
+    n |= i << 16;
+
+    fd = tcg_temp_new_i64();
+    tcg_gen_movi_i64(fd, ((uint64_t)n) << 32);
+
+    for (;;) {
+        neon_store_reg64(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+    }
+
+    tcg_temp_free_i64(fd);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  */
 static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
-    uint32_t rd, rn, rm, op, i, n, delta_d, delta_m, bank_mask;
+    uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
     int dp, veclen;
     TCGv_i32 tmp;
     TCGv_i32 tmp2;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 13:
+    case 0 ... 14:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation.  */
         switch (op) {
-        case 14: /* fconst */
-            if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                return 1;
-            }
-
-            n = (insn << 12) & 0x80000000;
-            i = ((insn >> 12) & 0x70) | (insn & 0xf);
-            if (dp) {
-                if (i & 0x40)
-                    i |= 0x3f80;
-                else
-                    i |= 0x4000;
-                n |= i << 16;
-                tcg_gen_movi_i64(cpu_F0d, ((uint64_t)n) << 32);
-            } else {
-                if (i & 0x40)
-                    i |= 0x780;
-                else
-                    i |= 0x800;
-                n |= i << 19;
-                tcg_gen_movi_i32(cpu_F0s, n);
-            }
-            break;
         case 15: /* extension space */
             switch (rn) {
             case 0: /* cpy */
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VFM_sp   ---- 1110 1.10 .... .... 1010 . o2:1 . 0 .... \
                      vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=2
 VFM_dp   ---- 1110 1.10 .... .... 1011 . o2:1 . 0 .... \
                      vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=2
+
+VMOV_imm_sp ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
+             vd=%vd_sp
+VMOV_imm_dp ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
+             vd=%vd_dp
--
2.20.1
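The immediate expansion in trans_VMOV_imm_sp() above packs the ARM ARM
"VFPExpandImm" rule into a few bitwise operations. The following
standalone sketch mirrors the same arithmetic (an editor's illustration
for readers; vfp_expand_imm_sp() is a hypothetical name, not QEMU's)
and checks it against the IEEE encoding of 1.0:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Same arithmetic as trans_VMOV_imm_sp() above: the 8-bit VFP
     * immediate abcdefgh expands to sign a, exponent NOT(b):bbbbb:cd,
     * and fraction efgh followed by zeros.
     */
    static uint32_t vfp_expand_imm_sp(uint32_t imm4h, uint32_t imm4l)
    {
        uint32_t n = (imm4h << 28) & 0x80000000;    /* sign = bit a */
        uint32_t i = ((imm4h << 4) & 0x70) | imm4l; /* bits bcdefgh */
        if (i & 0x40) {          /* b == 1: exponent becomes 0b011111cd */
            i |= 0x780;
        } else {                 /* b == 0: exponent becomes 0b100000cd */
            i |= 0x800;
        }
        return n | (i << 19);
    }

    int main(void)
    {
        /* imm4h = 0x7, imm4l = 0x0 (imm8 = 0x70) encodes 1.0 */
        uint32_t bits = vfp_expand_imm_sp(0x7, 0x0);
        float f;
        memcpy(&f, &bits, sizeof(f));
        printf("0x%08x = %f\n", bits, f);   /* 0x3f800000 = 1.000000 */
        return 0;
    }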
From: Gavin Shan <gshan@redhat.com>

This renames variable 'size' to 'region_size' in virt_set_high_memmap().
Its counterpart ('region_base') will be introduced in the next patch.

No functional change intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-id: 20221029224307.138822-3-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
                                  hwaddr base, int pa_bits)
 {
+    hwaddr region_size;
+    bool fits;
     int i;
 
     for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-        hwaddr size = extended_memmap[i].size;
-        bool fits;
+        region_size = extended_memmap[i].size;
 
-        base = ROUND_UP(base, size);
+        base = ROUND_UP(base, region_size);
         vms->memmap[i].base = base;
-        vms->memmap[i].size = size;
+        vms->memmap[i].size = region_size;
 
         /*
          * Check each device to see if they fit in the PA space,
@@ -XXX,XX +XXX,XX @@ static void virt_set_high_memmap(VirtMachineState *vms,
          *
          * For each device that doesn't fit, disable it.
          */
-        fits = (base + size) <= BIT_ULL(pa_bits);
+        fits = (base + region_size) <= BIT_ULL(pa_bits);
         if (fits) {
-            vms->highest_gpa = base + size - 1;
+            vms->highest_gpa = base + region_size - 1;
         }
 
         switch (i) {
@@ -XXX,XX +XXX,XX @@ static void virt_set_high_memmap(VirtMachineState *vms,
             break;
         }
 
-        base += size;
+        base += region_size;
     }
 }
--
2.25.1

For VFP short vectors, the VFP registers are divided into a
series of banks: for single-precision these are s0-s7, s8-s15,
s16-s23 and s24-s31; for double-precision they are d0-d3,
d4-d7, ... d28-d31. Some banks are "scalar" meaning that
use of a register within them triggers a pure-scalar or
mixed vector-scalar operation rather than a full vector
operation. The scalar banks are s0-s7, d0-d3 and d16-d19.
When using a bank as part of a vector operation, we
iterate through it, increasing the register number by
the specified stride each time, and wrapping around to
the beginning of the bank.

Unfortunately our calculation of the "increment" part of this
was incorrect:
 vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask)
will only do the intended thing if bank_mask has exactly
one set high bit. For instance for doubles (bank_mask = 0xc),
if we start with vd = 6 and delta_d = 2 then vd is updated
to 12 rather than the intended 4.

This only causes problems in the unlikely case that the
starting register is not the first in its bank: if the
register number doesn't have to wrap around then the
expression happens to give the right answer.

Fix this bug by abstracting out the "check whether register
is in a scalar bank" and "advance register within bank"
operations to utility functions which use the right
bit masking operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 100 ++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 40 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ typedef void VFPGen3OpDPFn(TCGv_i64 vd,
 typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
 typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
 
+/*
+ * Return true if the specified S reg is in a scalar bank
+ * (ie if it is s0..s7)
+ */
+static inline bool vfp_sreg_is_scalar(int reg)
+{
+    return (reg & 0x18) == 0;
+}
+
+/*
+ * Return true if the specified D reg is in a scalar bank
+ * (ie if it is d0..d3 or d16..d19)
+ */
+static inline bool vfp_dreg_is_scalar(int reg)
+{
+    return (reg & 0xc) == 0;
+}
+
+/*
+ * Advance the S reg number forwards by delta within its bank
+ * (ie increment the low 3 bits but leave the rest the same)
+ */
+static inline int vfp_advance_sreg(int reg, int delta)
+{
+    return ((reg + delta) & 0x7) | (reg & ~0x7);
+}
+
+/*
+ * Advance the D reg number forwards by delta within its bank
+ * (ie increment the low 2 bits but leave the rest the same)
+ */
+static inline int vfp_advance_dreg(int reg, int delta)
+{
+    return ((reg + delta) & 0x3) | (reg & ~0x3);
+}
+
 /*
  * Perform a 3-operand VFP data processing instruction. fn is the
  * callback to do the actual operation; this function deals with the
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 f0, f1, fd;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 
     if (veclen > 0) {
-        bank_mask = 0x18;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = s->vec_stride + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_sreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
+        vn = vfp_advance_sreg(vn, delta_d);
         neon_load_reg32(f0, vn);
         if (delta_m) {
-            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            vm = vfp_advance_sreg(vm, delta_m);
             neon_load_reg32(f1, vm);
         }
     }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 f0, f1, fd;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
 
     if (veclen > 0) {
-        bank_mask = 0xc;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = (s->vec_stride >> 1) + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_dreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     }
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
+        vn = vfp_advance_dreg(vn, delta_d);
         neon_load_reg64(f0, vn);
         if (delta_m) {
-            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            vm = vfp_advance_dreg(vm, delta_m);
             neon_load_reg64(f1, vm);
         }
     }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 f0, fd;
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 
     if (veclen > 0) {
-        bank_mask = 0x18;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = s->vec_stride + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_sreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         if (delta_m == 0) {
             /* single source one-many */
             while (veclen--) {
-                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                vd = vfp_advance_sreg(vd, delta_d);
                 neon_store_reg32(fd, vd);
             }
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
+        vm = vfp_advance_sreg(vm, delta_m);
         neon_load_reg32(f0, vm);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 f0, fd;
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 
     if (veclen > 0) {
-        bank_mask = 0xc;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = (s->vec_stride >> 1) + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_dreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         if (delta_m == 0) {
             /* single source one-many */
             while (veclen--) {
-                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                vd = vfp_advance_dreg(vd, delta_d);
                 neon_store_reg64(fd, vd);
             }
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
+        vm = vfp_advance_dreg(vm, delta_m);
         neon_load_reg64(f0, vm);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
 static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 {
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 fd;
     uint32_t n, i, vd;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
     }
 
     tcg_temp_free_i32(fd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 {
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 fd;
     uint32_t n, i, vd;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
     }
 
     tcg_temp_free_i64(fd);
--
2.20.1
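The failure mode described in the commit message above is easy to
reproduce in isolation. A minimal sketch (an editor's illustration;
advance_old() is a hypothetical name for the removed expression)
reworking the d6/stride-2 example from the commit message:

    #include <stdio.h>

    /* The removed expression: only correct when bank_mask has a
     * single set high bit. */
    static int advance_old(int reg, int delta, int bank_mask)
    {
        return ((reg + delta) & (bank_mask - 1)) | (reg & bank_mask);
    }

    /* The new helper from the patch: increment the low two bits of a
     * D-register number and leave the bank bits alone. */
    static int vfp_advance_dreg(int reg, int delta)
    {
        return ((reg + delta) & 0x3) | (reg & ~0x3);
    }

    int main(void)
    {
        /* d6 with stride 2 in bank d4-d7 (bank_mask 0xc) should wrap to d4 */
        printf("old: d%d, new: d%d\n",
               advance_old(6, 2, 0xc), vfp_advance_dreg(6, 2));
        /* prints "old: d12, new: d4" */
        return 0;
    }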
From: Gavin Shan <gshan@redhat.com>

This introduces variable 'region_base' for the base address of the
specific high memory region. It's the preparatory work to optimize
high memory region address assignment.

No functional change intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-id: 20221029224307.138822-4-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
                                  hwaddr base, int pa_bits)
 {
-    hwaddr region_size;
+    hwaddr region_base, region_size;
     bool fits;
     int i;
 
     for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+        region_base = ROUND_UP(base, extended_memmap[i].size);
         region_size = extended_memmap[i].size;
 
-        base = ROUND_UP(base, region_size);
-        vms->memmap[i].base = base;
+        vms->memmap[i].base = region_base;
         vms->memmap[i].size = region_size;
 
         /*
@@ -XXX,XX +XXX,XX @@ static void virt_set_high_memmap(VirtMachineState *vms,
          *
          * For each device that doesn't fit, disable it.
          */
-        fits = (base + region_size) <= BIT_ULL(pa_bits);
+        fits = (region_base + region_size) <= BIT_ULL(pa_bits);
         if (fits) {
-            vms->highest_gpa = base + region_size - 1;
+            vms->highest_gpa = region_base + region_size - 1;
         }
 
         switch (i) {
@@ -XXX,XX +XXX,XX @@ static void virt_set_high_memmap(VirtMachineState *vms,
             break;
         }
 
-        base += region_size;
+        base = region_base + region_size;
     }
 }
--
2.25.1

The current VFP code has two different idioms for
loading and storing from the VFP register file:
 1 using the gen_mov_F0_vreg() and similar functions,
   which load and store to a fixed set of TCG globals
   cpu_F0s, CPU_F0d, etc
 2 by direct calls to tcg_gen_ld_f64() and friends

We want to phase out idiom 1 (because the use of the
fixed globals is a relic of a much older version of TCG),
but idiom 2 is quite longwinded:
  tcg_gen_ld_f64(tmp, cpu_env, vfp_reg_offset(true, reg))
requires us to specify the 64-bitness twice, once in
the function name and once by passing 'true' to
vfp_reg_offset(). There's no guard against accidentally
passing the wrong flag.

Instead, let's move to a convention of accessing 64-bit
registers via the existing neon_load_reg64() and
neon_store_reg64(), and provide new neon_load_reg32()
and neon_store_reg32() for the 32-bit equivalents.

Implement the new functions and use them in the code in
translate-vfp.inc.c. We will convert the rest of the VFP
code as we do the decodetree conversion in subsequent
commits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 40 +++++++++++++++++-----------------
 target/arm/translate.c         | 10 +++++++++
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         tcg_gen_ext_i32_i64(nf, cpu_NF);
         tcg_gen_ext_i32_i64(vf, cpu_VF);
 
-        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg64(frn, rn);
+        neon_load_reg64(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i64(tmp);
             break;
         }
-        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         frn = tcg_temp_new_i32();
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
-        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg32(frn, rn);
+        neon_load_reg32(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i32(tmp);
             break;
         }
-        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
         frm = tcg_temp_new_i64();
         dest = tcg_temp_new_i64();
 
-        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg64(frn, rn);
+        neon_load_reg64(frm, rm);
         if (vmin) {
             gen_helper_vfp_minnumd(dest, frn, frm, fpst);
         } else {
             gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
         }
-        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
 
-        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg32(frn, rn);
+        neon_load_reg32(frm, rm);
         if (vmin) {
             gen_helper_vfp_minnums(dest, frn, frm, fpst);
         } else {
             gen_helper_vfp_maxnums(dest, frn, frm, fpst);
         }
-        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i64 tcg_res;
         tcg_op = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
-        tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg64(tcg_op, rm);
         gen_helper_rintd(tcg_res, tcg_op, fpst);
-        tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg64(tcg_res, rd);
         tcg_temp_free_i64(tcg_op);
         tcg_temp_free_i64(tcg_res);
     } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i32 tcg_res;
         tcg_op = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg32(tcg_op, rm);
         gen_helper_rints(tcg_res, tcg_op, fpst);
-        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_op);
         tcg_temp_free_i32(tcg_res);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         tcg_double = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
         tcg_tmp = tcg_temp_new_i32();
-        tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
+        neon_load_reg64(tcg_double, rm);
         if (is_signed) {
             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
         } else {
             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
         }
         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
-        tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
+        neon_store_reg32(tcg_tmp, rd);
         tcg_temp_free_i32(tcg_tmp);
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         TCGv_i32 tcg_single, tcg_res;
         tcg_single = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
+        neon_load_reg32(tcg_single, rm);
         if (is_signed) {
             gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
         } else {
             gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
         }
-        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
+        neon_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_single);
     }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 }
 
+static inline void neon_load_reg32(TCGv_i32 var, int reg)
+{
+    tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
+}
+
+static inline void neon_store_reg32(TCGv_i32 var, int reg)
+{
+    tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
--
2.20.1
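A minimal standalone sketch of why the typed wrappers help (an editor's
illustration with stand-in types and a hypothetical register layout;
nothing below is QEMU code): once the register size is implied by the
helper name and its argument type, the compiler rejects the size
mismatch that vfp_reg_offset()'s bool flag silently allowed.

    #include <stdio.h>

    /* Stand-ins for the TCG value types; only the distinction matters. */
    typedef struct { int idx; } TCGv_i32;
    typedef struct { int idx; } TCGv_i64;

    /* Hypothetical register-file layout, just for the printout. */
    static long vfp_reg_offset(int dp, int reg)
    {
        return dp ? reg * 8L : (reg / 2) * 8L + (reg & 1) * 4L;
    }

    static void neon_load_reg32(TCGv_i32 var, int reg)
    {
        (void)var;
        printf("ld32 s%d @ offset %ld\n", reg, vfp_reg_offset(0, reg));
    }

    static void neon_load_reg64(TCGv_i64 var, int reg)
    {
        (void)var;
        printf("ld64 d%d @ offset %ld\n", reg, vfp_reg_offset(1, reg));
    }

    int main(void)
    {
        TCGv_i32 s = { 0 };
        TCGv_i64 d = { 1 };
        neon_load_reg32(s, 3);   /* ok */
        neon_load_reg64(d, 3);   /* ok */
        /* neon_load_reg32(d, 3); is now a compile error, whereas
         * tcg_gen_ld_f32(..., vfp_reg_offset(true, 3)) compiled fine. */
        return 0;
    }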
From: Gavin Shan <gshan@redhat.com>

This introduces virt_get_high_memmap_enabled() helper, which returns
the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will
be used in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-id: 20221029224307.138822-5-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
     return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms,
+                                                 int index)
+{
+    bool *enabled_array[] = {
+        &vms->highmem_redists,
+        &vms->highmem_ecam,
+        &vms->highmem_mmio,
+    };
+
+    assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST ==
+           ARRAY_SIZE(enabled_array));
+    assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));
+
+    return enabled_array[index - VIRT_LOWMEMMAP_LAST];
+}
+
 static void virt_set_high_memmap(VirtMachineState *vms,
                                  hwaddr base, int pa_bits)
 {
     hwaddr region_base, region_size;
-    bool fits;
+    bool *region_enabled, fits;
     int i;
 
     for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+        region_enabled = virt_get_high_memmap_enabled(vms, i);
         region_base = ROUND_UP(base, extended_memmap[i].size);
         region_size = extended_memmap[i].size;
 
@@ -XXX,XX +XXX,XX @@ static void virt_set_high_memmap(VirtMachineState *vms,
             vms->highest_gpa = region_base + region_size - 1;
         }
 
-        switch (i) {
-        case VIRT_HIGH_GIC_REDIST2:
-            vms->highmem_redists &= fits;
-            break;
-        case VIRT_HIGH_PCIE_ECAM:
-            vms->highmem_ecam &= fits;
-            break;
-        case VIRT_HIGH_PCIE_MMIO:
-            vms->highmem_mmio &= fits;
-            break;
-        }
-
+        *region_enabled &= fits;
         base = region_base + region_size;
     }
 }
--
2.25.1

Move the trans_*() functions we've just created from translate.c
to translate-vfp.inc.c. This is pure code motion with no textual
changes (this can be checked with 'git show --color-moved').

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 337 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 337 ---------------------------------
 2 files changed, 337 insertions(+), 337 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check(DisasContext *s)
 {
     return full_vfp_access_check(s, false);
 }
+
+static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+{
+    uint32_t rd, rn, rm;
+    bool dp = a->dp;
+
+    if (!dc_isar_feature(aa32_vsel, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+        ((a->vm | a->vn | a->vd) & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rn = a->vn;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (dp) {
+        TCGv_i64 frn, frm, dest;
+        TCGv_i64 tmp, zero, zf, nf, vf;
+
+        zero = tcg_const_i64(0);
+
+        frn = tcg_temp_new_i64();
+        frm = tcg_temp_new_i64();
+        dest = tcg_temp_new_i64();
+
+        zf = tcg_temp_new_i64();
+        nf = tcg_temp_new_i64();
+        vf = tcg_temp_new_i64();
+
+        tcg_gen_extu_i32_i64(zf, cpu_ZF);
+        tcg_gen_ext_i32_i64(nf, cpu_NF);
+        tcg_gen_ext_i32_i64(vf, cpu_VF);
+
+        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
+        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+        switch (a->cc) {
+        case 0: /* eq: Z */
+            tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
+                                frn, frm);
+            break;
+        case 1: /* vs: V */
+            tcg_gen_movcond_i64(TCG_COND_LT, dest, vf, zero,
+                                frn, frm);
+            break;
+        case 2: /* ge: N == V -> N ^ V == 0 */
+            tmp = tcg_temp_new_i64();
+            tcg_gen_xor_i64(tmp, vf, nf);
+            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
+                                frn, frm);
+            tcg_temp_free_i64(tmp);
+            break;
+        case 3: /* gt: !Z && N == V */
+            tcg_gen_movcond_i64(TCG_COND_NE, dest, zf, zero,
+                                frn, frm);
+            tmp = tcg_temp_new_i64();
+            tcg_gen_xor_i64(tmp, vf, nf);
+            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
+                                dest, frm);
+            tcg_temp_free_i64(tmp);
+            break;
+        }
+        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i64(frn);
+        tcg_temp_free_i64(frm);
+        tcg_temp_free_i64(dest);
+
+        tcg_temp_free_i64(zf);
+        tcg_temp_free_i64(nf);
+        tcg_temp_free_i64(vf);
+
+        tcg_temp_free_i64(zero);
+    } else {
+        TCGv_i32 frn, frm, dest;
+        TCGv_i32 tmp, zero;
+
+        zero = tcg_const_i32(0);
+
+        frn = tcg_temp_new_i32();
+        frm = tcg_temp_new_i32();
+        dest = tcg_temp_new_i32();
+        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
+        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+        switch (a->cc) {
+        case 0: /* eq: Z */
+            tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
+                                frn, frm);
+            break;
+        case 1: /* vs: V */
+            tcg_gen_movcond_i32(TCG_COND_LT, dest, cpu_VF, zero,
+                                frn, frm);
+            break;
+        case 2: /* ge: N == V -> N ^ V == 0 */
+            tmp = tcg_temp_new_i32();
+            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
+            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
+                                frn, frm);
+            tcg_temp_free_i32(tmp);
+            break;
+        case 3: /* gt: !Z && N == V */
+            tcg_gen_movcond_i32(TCG_COND_NE, dest, cpu_ZF, zero,
+                                frn, frm);
+            tmp = tcg_temp_new_i32();
+            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
+            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
+                                dest, frm);
+            tcg_temp_free_i32(tmp);
+            break;
+        }
+        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i32(frn);
+        tcg_temp_free_i32(frm);
+        tcg_temp_free_i32(dest);
+
+        tcg_temp_free_i32(zero);
+    }
+
+    return true;
+}
+
+static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
+{
+    uint32_t rd, rn, rm;
+    bool dp = a->dp;
+    bool vmin = a->op;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+        ((a->vm | a->vn | a->vd) & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rn = a->vn;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(0);
+
+    if (dp) {
+        TCGv_i64 frn, frm, dest;
+
+        frn = tcg_temp_new_i64();
+        frm = tcg_temp_new_i64();
+        dest = tcg_temp_new_i64();
+
+        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
+        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+        if (vmin) {
+            gen_helper_vfp_minnumd(dest, frn, frm, fpst);
+        } else {
+            gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
+        }
+        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i64(frn);
+        tcg_temp_free_i64(frm);
+        tcg_temp_free_i64(dest);
+    } else {
+        TCGv_i32 frn, frm, dest;
+
+        frn = tcg_temp_new_i32();
+        frm = tcg_temp_new_i32();
+        dest = tcg_temp_new_i32();
+
+        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
+        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+        if (vmin) {
+            gen_helper_vfp_minnums(dest, frn, frm, fpst);
+        } else {
+            gen_helper_vfp_maxnums(dest, frn, frm, fpst);
+        }
+        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i32(frn);
+        tcg_temp_free_i32(frm);
+        tcg_temp_free_i32(dest);
+    }
+
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
+/*
+ * Table for converting the most common AArch32 encoding of
+ * rounding mode to arm_fprounding order (which matches the
+ * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
+ */
+static const uint8_t fp_decode_rm[] = {
+    FPROUNDING_TIEAWAY,
+    FPROUNDING_TIEEVEN,
+    FPROUNDING_POSINF,
+    FPROUNDING_NEGINF,
+};
+
+static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
+{
+    uint32_t rd, rm;
+    bool dp = a->dp;
+    TCGv_ptr fpst;
+    TCGv_i32 tcg_rmode;
+    int rounding = fp_decode_rm[a->rm];
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+        ((a->vm | a->vd) & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(0);
+
+    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+
+    if (dp) {
+        TCGv_i64 tcg_op;
+        TCGv_i64 tcg_res;
+        tcg_op = tcg_temp_new_i64();
+        tcg_res = tcg_temp_new_i64();
+        tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
+        gen_helper_rintd(tcg_res, tcg_op, fpst);
+        tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i64(tcg_op);
+        tcg_temp_free_i64(tcg_res);
+    } else {
+        TCGv_i32 tcg_op;
+        TCGv_i32 tcg_res;
+        tcg_op = tcg_temp_new_i32();
+        tcg_res = tcg_temp_new_i32();
+        tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
+        gen_helper_rints(tcg_res, tcg_op, fpst);
+        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i32(tcg_op);
+        tcg_temp_free_i32(tcg_res);
+    }
+
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    tcg_temp_free_i32(tcg_rmode);
+
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
+static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
+{
+    uint32_t rd, rm;
+    bool dp = a->dp;
+    TCGv_ptr fpst;
+    TCGv_i32 tcg_rmode, tcg_shift;
+    int rounding = fp_decode_rm[a->rm];
+    bool is_signed = a->op;
+
+    if (!dc_isar_feature(aa32_vcvt_dr, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(0);
+
+    tcg_shift = tcg_const_i32(0);
+
+    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+
+    if (dp) {
+        TCGv_i64 tcg_double, tcg_res;
+        TCGv_i32 tcg_tmp;
+        tcg_double = tcg_temp_new_i64();
+        tcg_res = tcg_temp_new_i64();
+        tcg_tmp = tcg_temp_new_i32();
+        tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
+        if (is_signed) {
+            gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
+        } else {
+            gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
+        }
+        tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
+        tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
+        tcg_temp_free_i32(tcg_tmp);
+        tcg_temp_free_i64(tcg_res);
+        tcg_temp_free_i64(tcg_double);
+    } else {
+        TCGv_i32 tcg_single, tcg_res;
+        tcg_single = tcg_temp_new_i32();
+        tcg_res = tcg_temp_new_i32();
+        tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
+        if (is_signed) {
+            gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
+        } else {
+            gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
+        }
+        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
+        tcg_temp_free_i32(tcg_res);
+        tcg_temp_free_i32(tcg_single);
+    }
+
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    tcg_temp_free_i32(tcg_rmode);
+
+    tcg_temp_free_i32(tcg_shift);
+
+    tcg_temp_free_ptr(fpst);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
     tcg_temp_free_i32(tmp);
 }
 
-static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
-{
-    uint32_t rd, rn, rm;
-    bool dp = a->dp;
-
-    if (!dc_isar_feature(aa32_vsel, s)) {
-        return false;
-    }
-
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
-        ((a->vm | a->vn | a->vd) & 0x10)) {
-        return false;
-    }
-    rd = a->vd;
-    rn = a->vn;
-    rm = a->vm;
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    if (dp) {
-        TCGv_i64 frn, frm, dest;
-        TCGv_i64 tmp, zero, zf, nf, vf;
-
-        zero = tcg_const_i64(0);
-
-        frn = tcg_temp_new_i64();
-        frm = tcg_temp_new_i64();
-        dest = tcg_temp_new_i64();
-
-        zf = tcg_temp_new_i64();
-        nf = tcg_temp_new_i64();
-        vf = tcg_temp_new_i64();
-
-        tcg_gen_extu_i32_i64(zf, cpu_ZF);
-        tcg_gen_ext_i32_i64(nf, cpu_NF);
-        tcg_gen_ext_i32_i64(vf, cpu_VF);
-
-        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
-        switch (a->cc) {
-        case 0: /* eq: Z */
-            tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
-                                frn, frm);
-            break;
-        case 1: /* vs: V */
-            tcg_gen_movcond_i64(TCG_COND_LT, dest, vf, zero,
-                                frn, frm);
-            break;
-        case 2: /* ge: N == V -> N ^ V == 0 */
-            tmp = tcg_temp_new_i64();
-            tcg_gen_xor_i64(tmp, vf, nf);
-            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
-                                frn, frm);
-            tcg_temp_free_i64(tmp);
-            break;
-        case 3: /* gt: !Z && N == V */
-            tcg_gen_movcond_i64(TCG_COND_NE, dest, zf, zero,
-                                frn, frm);
-            tmp = tcg_temp_new_i64();
-            tcg_gen_xor_i64(tmp, vf, nf);
-            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
-                                dest, frm);
-            tcg_temp_free_i64(tmp);
-            break;
-        }
-        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i64(frn);
-        tcg_temp_free_i64(frm);
-        tcg_temp_free_i64(dest);
-
-        tcg_temp_free_i64(zf);
-        tcg_temp_free_i64(nf);
-        tcg_temp_free_i64(vf);
-
-        tcg_temp_free_i64(zero);
-    } else {
-        TCGv_i32 frn, frm, dest;
-        TCGv_i32 tmp, zero;
-
-        zero = tcg_const_i32(0);
-
-        frn = tcg_temp_new_i32();
-        frm = tcg_temp_new_i32();
-        dest = tcg_temp_new_i32();
-        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
-        switch (a->cc) {
-        case 0: /* eq: Z */
-            tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
-                                frn, frm);
-            break;
-        case 1: /* vs: V */
-            tcg_gen_movcond_i32(TCG_COND_LT, dest, cpu_VF, zero,
-                                frn, frm);
-            break;
-        case 2: /* ge: N == V -> N ^ V == 0 */
-            tmp = tcg_temp_new_i32();
-            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
-            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
-                                frn, frm);
-            tcg_temp_free_i32(tmp);
-            break;
-        case 3: /* gt: !Z && N == V */
-            tcg_gen_movcond_i32(TCG_COND_NE, dest, cpu_ZF, zero,
-                                frn, frm);
-            tmp = tcg_temp_new_i32();
-            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
-            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
-                                dest, frm);
-            tcg_temp_free_i32(tmp);
-            break;
-        }
-        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i32(frn);
-        tcg_temp_free_i32(frm);
-        tcg_temp_free_i32(dest);
-
-        tcg_temp_free_i32(zero);
-    }
-
-    return true;
-}
-
-static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
-{
-    uint32_t rd, rn, rm;
-    bool dp = a->dp;
-    bool vmin = a->op;
-    TCGv_ptr fpst;
-
-    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
-        return false;
-    }
-
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
-        ((a->vm | a->vn | a->vd) & 0x10)) {
-        return false;
-    }
-    rd = a->vd;
-    rn = a->vn;
-    rm = a->vm;
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    fpst = get_fpstatus_ptr(0);
-
-    if (dp) {
-        TCGv_i64 frn, frm, dest;
-
-        frn = tcg_temp_new_i64();
-        frm = tcg_temp_new_i64();
-        dest = tcg_temp_new_i64();
-
-        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
-        if (vmin) {
-            gen_helper_vfp_minnumd(dest, frn, frm, fpst);
-        } else {
-            gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
-        }
-        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i64(frn);
-        tcg_temp_free_i64(frm);
-        tcg_temp_free_i64(dest);
-    } else {
-        TCGv_i32 frn, frm, dest;
-
-        frn = tcg_temp_new_i32();
-        frm = tcg_temp_new_i32();
-        dest = tcg_temp_new_i32();
-
-        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
-        if (vmin) {
-            gen_helper_vfp_minnums(dest, frn, frm, fpst);
-        } else {
-            gen_helper_vfp_maxnums(dest, frn, frm, fpst);
-        }
-        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i32(frn);
-        tcg_temp_free_i32(frm);
-        tcg_temp_free_i32(dest);
-    }
-
-    tcg_temp_free_ptr(fpst);
-    return true;
-}
-
-/*
- * Table for converting the most common AArch32 encoding of
- * rounding mode to arm_fprounding order (which matches the
- * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
- */
-static const uint8_t fp_decode_rm[] = {
-    FPROUNDING_TIEAWAY,
-    FPROUNDING_TIEEVEN,
-    FPROUNDING_POSINF,
-    FPROUNDING_NEGINF,
-};
-
-static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
-{
-    uint32_t rd, rm;
-    bool dp = a->dp;
-    TCGv_ptr fpst;
-    TCGv_i32 tcg_rmode;
-    int rounding = fp_decode_rm[a->rm];
-
-    if (!dc_isar_feature(aa32_vrint, s)) {
-        return false;
-    }
-
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
-        ((a->vm | a->vd) & 0x10)) {
-        return false;
-    }
-    rd = a->vd;
-    rm = a->vm;
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    fpst = get_fpstatus_ptr(0);
-
-    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
-    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-
-    if (dp) {
-        TCGv_i64 tcg_op;
-        TCGv_i64 tcg_res;
-        tcg_op = tcg_temp_new_i64();
-        tcg_res = tcg_temp_new_i64();
-        tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
-        gen_helper_rintd(tcg_res, tcg_op, fpst);
-        tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i64(tcg_op);
-        tcg_temp_free_i64(tcg_res);
-    } else {
-        TCGv_i32 tcg_op;
-        TCGv_i32 tcg_res;
-        tcg_op = tcg_temp_new_i32();
-        tcg_res = tcg_temp_new_i32();
-        tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
-        gen_helper_rints(tcg_res, tcg_op, fpst);
-        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i32(tcg_op);
-        tcg_temp_free_i32(tcg_res);
-    }
-
-    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    tcg_temp_free_i32(tcg_rmode);
-
-    tcg_temp_free_ptr(fpst);
-    return true;
-}
-
-static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-{
-    uint32_t rd, rm;
-    bool dp = a->dp;
633
- TCGv_ptr fpst;
634
- TCGv_i32 tcg_rmode, tcg_shift;
635
- int rounding = fp_decode_rm[a->rm];
636
- bool is_signed = a->op;
637
-
638
- if (!dc_isar_feature(aa32_vcvt_dr, s)) {
639
- return false;
640
- }
641
-
642
- /* UNDEF accesses to D16-D31 if they don't exist */
643
- if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
644
- return false;
645
- }
646
- rd = a->vd;
647
- rm = a->vm;
648
-
649
- if (!vfp_access_check(s)) {
650
- return true;
651
- }
652
-
653
- fpst = get_fpstatus_ptr(0);
654
-
655
- tcg_shift = tcg_const_i32(0);
656
-
657
- tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
658
- gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
659
-
660
- if (dp) {
661
- TCGv_i64 tcg_double, tcg_res;
662
- TCGv_i32 tcg_tmp;
663
- tcg_double = tcg_temp_new_i64();
664
- tcg_res = tcg_temp_new_i64();
665
- tcg_tmp = tcg_temp_new_i32();
666
- tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
667
- if (is_signed) {
668
- gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
669
- } else {
670
- gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
671
- }
672
- tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
673
- tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
674
- tcg_temp_free_i32(tcg_tmp);
675
- tcg_temp_free_i64(tcg_res);
676
- tcg_temp_free_i64(tcg_double);
677
- } else {
678
- TCGv_i32 tcg_single, tcg_res;
679
- tcg_single = tcg_temp_new_i32();
680
- tcg_res = tcg_temp_new_i32();
681
- tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
682
- if (is_signed) {
683
- gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
684
- } else {
685
- gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
686
- }
687
- tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
688
- tcg_temp_free_i32(tcg_res);
689
- tcg_temp_free_i32(tcg_single);
690
- }
691
-
692
- gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
693
- tcg_temp_free_i32(tcg_rmode);
694
-
695
- tcg_temp_free_i32(tcg_shift);
696
-
697
- tcg_temp_free_ptr(fpst);
698
-
699
- return true;
700
-}
701
-
702
/*
703
* Disassemble a VFP instruction. Returns nonzero if an error occurred
704
* (ie. an undefined instruction).
705
--
77
--
706
2.20.1
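The movcond sequences above are easier to follow with QEMU's flag encoding in mind: cpu_ZF is zero exactly when the Z flag is set, while the N and V flags live in the sign bits of cpu_NF and cpu_VF. A standalone C sketch of the selection logic (a hypothetical helper for illustration, not QEMU code):

  #include <stdio.h>
  #include <stdint.h>

  /* Mirror of the VSEL condition handling emitted via movcond above. */
  static uint64_t vsel(int cc, int32_t zf, int32_t nf, int32_t vf,
                       uint64_t frn, uint64_t frm)
  {
      switch (cc) {
      case 0: /* eq: Z set, i.e. cpu_ZF == 0 */
          return zf == 0 ? frn : frm;
      case 1: /* vs: V set, i.e. sign bit of cpu_VF */
          return vf < 0 ? frn : frm;
      case 2: /* ge: N == V, i.e. (N ^ V) >= 0 */
          return (nf ^ vf) >= 0 ? frn : frm;
      default: /* gt: !Z && N == V */
          return (zf != 0 && (nf ^ vf) >= 0) ? frn : frm;
      }
  }

  int main(void)
  {
      /* Z clear and N == V: "gt" selects the first operand */
      printf("%llx\n",
             (unsigned long long)vsel(3, 1, -1, -1, 0x111, 0x222));
      return 0;
  }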
-        break;
-    case VIRT_HIGH_PCIE_ECAM:
-        vms->highmem_ecam &= fits;
-        break;
-    case VIRT_HIGH_PCIE_MMIO:
-        vms->highmem_mmio &= fits;
-        break;
-    }
-
+        *region_enabled &= fits;
         base = region_base + region_size;
     }
 }
--
2.25.1
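For reference, the fit check in virt_set_high_memmap() boils down to a single comparison against the top of the physical address space. A minimal standalone sketch with made-up values (not the QEMU function itself):

  #include <stdio.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define BIT_ULL(n) (1ULL << (n))

  int main(void)
  {
      unsigned pa_bits = 40;
      uint64_t region_base = 512ULL << 30;   /* 512GB */
      uint64_t region_size = 512ULL << 30;   /* 512GB */

      /* Region is kept enabled only if it ends at or below 2^pa_bits. */
      bool fits = (region_base + region_size) <= BIT_ULL(pa_bits);
      printf("fits=%d\n", fits);  /* 1: exactly reaches 1TB == 2^40 */
      return 0;
  }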
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  8 +-------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     return true;
 }
 
+static bool trans_VMOV_reg_sp(DisasContext *s, arg_VMOV_reg_sp *a)
+{
+    return do_vfp_2op_sp(s, tcg_gen_mov_i32, a->vd, a->vm);
+}
+
+static bool trans_VMOV_reg_dp(DisasContext *s, arg_VMOV_reg_dp *a)
+{
+    return do_vfp_2op_dp(s, tcg_gen_mov_i64, a->vd, a->vm);
+}
+
 static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
 {
     return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         return 1;
     case 15:
         switch (rn) {
-        case 1 ... 3:
+        case 0 ... 3:
             /* Already handled by decodetree */
             return 1;
         default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     if (op == 15) {
         /* rn is opcode, encoded as per VFP_SREG_N. */
         switch (rn) {
-        case 0x00: /* vmov */
-            break;
-
         case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
         case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
             /*
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     switch (op) {
     case 15: /* extension space */
         switch (rn) {
-        case 0: /* cpy */
-            /* no-op */
-            break;
         case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
         {
             TCGv_ptr fpst = get_fpstatus_ptr(false);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_imm_sp  ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
 VMOV_imm_dp  ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
              vd=%vd_dp
 
+VMOV_reg_sp  ---- 1110 1.11 0000 .... 1010 01.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VMOV_reg_dp  ---- 1110 1.11 0000 .... 1011 01.0 .... \
+             vd=%vd_dp vm=%vm_dp
+
 VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 .... \
              vd=%vd_sp vm=%vm_sp
 VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 .... \
--
2.20.1

From: Gavin Shan <gshan@redhat.com>

There are three high memory regions: VIRT_HIGH_GIC_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
float above the highest RAM address. However, they can be disabled
in several cases.

(1) A specific high memory region can be disabled in code by
    toggling vms->highmem_{redists, ecam, mmio}.

(2) The VIRT_HIGH_PCIE_ECAM region is disabled on machine types
    'virt-2.12' and earlier.

(3) The VIRT_HIGH_PCIE_ECAM region is disabled when firmware is
    loaded on a 32-bit system.

(4) A specific high memory region is disabled when it would break
    the PA space limit.

The current implementation of virt_set_{memmap, high_memmap}() isn't
optimal, because the high memory region's PA space is always reserved
regardless of the actual state of the corresponding
vms->highmem_{redists, ecam, mmio} flag: 'base' and
'vms->highest_gpa' are always increased for cases (1), (2) and (3).
This is unnecessary, since the PA space assigned to a disabled high
memory region is never used afterwards.

Improve the address assignment for these three high memory regions by
skipping the address assignment for any region that has been disabled
in case (1), (2) or (3). The memory layout may change after this
improvement is applied, which leads to potential migration breakage,
so 'vms->highmem_compact' is added to control whether the improvement
is applied. For now, 'vms->highmem_compact' is set to false, meaning
that the memory layout does not change until it becomes configurable
through the 'compact-highmem' property in the next patch.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-id: 20221029224307.138822-6-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/virt.h |  1 +
 hw/arm/virt.c         | 15 ++++++++++-----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -XXX,XX +XXX,XX @@ struct VirtMachineState {
     PFlashCFI01 *flash[2];
     bool secure;
     bool highmem;
+    bool highmem_compact;
     bool highmem_ecam;
     bool highmem_mmio;
     bool highmem_redists;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void virt_set_high_memmap(VirtMachineState *vms,
         vms->memmap[i].size = region_size;
 
         /*
-         * Check each device to see if they fit in the PA space,
-         * moving highest_gpa as we go.
+         * Check each device to see if it fits in the PA space,
+         * moving highest_gpa as we go. For compatibility, move
+         * highest_gpa for disabled fitting devices as well, if
+         * the compact layout has been disabled.
          *
         * For each device that doesn't fit, disable it.
         */
        fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-        if (fits) {
-            vms->highest_gpa = region_base + region_size - 1;
+        *region_enabled &= fits;
+        if (vms->highmem_compact && !*region_enabled) {
+            continue;
        }
 
-        *region_enabled &= fits;
         base = region_base + region_size;
+        if (fits) {
+            vms->highest_gpa = base - 1;
+        }
     }
 }
 
--
2.25.1
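The effect of the change is easiest to see as a loop over regions where, in the compact layout, a disabled region no longer advances the base address. A standalone sketch under assumed region sizes (hypothetical, not the QEMU code):

  #include <stdio.h>
  #include <stdbool.h>
  #include <stdint.h>

  struct region { const char *name; uint64_t size; bool enabled; };

  int main(void)
  {
      struct region regions[] = {
          { "HIGH_GIC_REDIST2",  64ULL << 20, false },
          { "HIGH_PCIE_ECAM",   256ULL << 20, false },
          { "HIGH_PCIE_MMIO",   512ULL << 30, true  },
      };
      uint64_t base = 512ULL << 30;  /* just above 512GB of RAM */
      bool compact = true;

      for (int i = 0; i < 3; i++) {
          if (compact && !regions[i].enabled) {
              continue;  /* compact layout: reserve no PA space */
          }
          printf("%s at 0x%llx\n", regions[i].name,
                 (unsigned long long)base);
          base += regions[i].size;
      }
      return 0;
  }

With compact=true this places HIGH_PCIE_MMIO directly at 512GB, matching the table in the commit message below.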
From: Gavin Shan <gshan@redhat.com>

Once the improvement to high memory region address assignment is
applied, the memory layout can change, introducing possible migration
breakage. For example, with the following configuration the
VIRT_HIGH_PCIE_MMIO memory region is enabled when the optimization is
applied and disabled when it is not. The configuration is only
achievable by modifying the source code until more properties are
added to allow users to selectively disable those high memory regions.

  pa_bits              = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam    = false;
  vms->highmem_mmio    = true;

  # qemu-system-aarch64 -accel kvm -cpu host \
    -machine virt-7.2,compact-highmem={on, off} \
    -m 4G,maxmem=511G -monitor stdio

  Region            compact-highmem=off        compact-highmem=on
  ----------------------------------------------------------------
  MEM               [1GB         512GB]        [1GB         512GB]
  HIGH_GIC_REDISTS2 [512GB       512GB+64MB]   [disabled]
  HIGH_PCIE_ECAM    [512GB+256MB 512GB+512MB]  [disabled]
  HIGH_PCIE_MMIO    [disabled]                 [512GB       1TB]

In order to keep backwards compatibility, we need to disable the
optimization on machine types virt-7.1 and earlier, which means the
optimization is enabled by default from virt-7.2 onwards. Besides,
the 'compact-highmem' property is added so that the optimization can
be explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-id: 20221029224307.138822-7-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/virt.rst |  4 ++++
 include/hw/arm/virt.h    |  1 +
 hw/arm/virt.c            | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -XXX,XX +XXX,XX @@ highmem
   address space above 32 bits. The default is ``on`` for machine types
   later than ``virt-2.12``.
 
+compact-highmem
+  Set ``on``/``off`` to enable/disable the compact layout for high memory regions.
+  The default is ``on`` for machine types later than ``virt-7.2``.
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -XXX,XX +XXX,XX @@ struct VirtMachineClass {
     bool no_pmu;
     bool claim_edge_triggered_timers;
     bool smbios_old_sys_ver;
+    bool no_highmem_compact;
     bool no_highmem_ecam;
     bool no_ged;   /* Machines < 4.2 have no support for ACPI GED device */
     bool kvm_no_adjvtime;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static const MemMapEntry base_memmap[] = {
 * Note the extended_memmap is sized so that it eventually also includes the
 * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last
 * index of base_memmap).
+ *
+ * The memory map for these Highmem IO Regions can be in legacy or compact
+ * layout, depending on 'compact-highmem' property. With legacy layout, the
+ * PA space for one specific region is always reserved, even if the region
+ * has been disabled or doesn't fit into the PA space. However, the PA space
+ * for the region won't be reserved in these circumstances with compact layout.
 */
 static MemMapEntry extended_memmap[] = {
     /* Additional 64 MB redist region (can contain up to 512 redistributors) */
@@ -XXX,XX +XXX,XX @@ static void virt_set_highmem(Object *obj, bool value, Error **errp)
     vms->highmem = value;
 }
 
+static bool virt_get_compact_highmem(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->highmem_compact;
+}
+
+static void virt_set_compact_highmem(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->highmem_compact = value;
+}
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
     VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -XXX,XX +XXX,XX @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
                                           "Set on/off to enable/disable using "
                                           "physical address space above 32 bits");
 
+    object_class_property_add_bool(oc, "compact-highmem",
+                                   virt_get_compact_highmem,
+                                   virt_set_compact_highmem);
+    object_class_property_set_description(oc, "compact-highmem",
+                                          "Set on/off to enable/disable compact "
+                                          "layout for high memory regions");
+
     object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
                                   virt_set_gic_version);
     object_class_property_set_description(oc, "gic-version",
@@ -XXX,XX +XXX,XX @@ static void virt_instance_init(Object *obj)
 
     /* High memory is enabled by default */
     vms->highmem = true;
+    vms->highmem_compact = !vmc->no_highmem_compact;
     vms->gic_version = VIRT_GIC_VERSION_NOSEL;
 
     vms->highmem_ecam = !vmc->no_highmem_ecam;
@@ -XXX,XX +XXX,XX @@ DEFINE_VIRT_MACHINE_AS_LATEST(7, 2)
 
 static void virt_machine_7_1_options(MachineClass *mc)
 {
+    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
     virt_machine_7_2_options(mc);
     compat_props_add(mc->compat_props, hw_compat_7_1, hw_compat_7_1_len);
+    /* Compact layout for high memory regions was introduced with 7.2 */
+    vmc->no_highmem_compact = true;
 }
 DEFINE_VIRT_MACHINE(7, 1)
 
--
2.25.1

Convert the "double-precision" register moves to decodetree:
this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP.

Note that the conversion process has tightened up a few of the
UNDEF encoding checks: we now correctly forbid:
 * VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10
 * VMOV-from-gpr with opc1:opc2 == 0x10
 * VDUP with B:E == 11
 * VDUP with Q == 1 and Vn<0> == 1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
The accesses of elements < 32 bits could be improved by doing
direct ld/st of the right size rather than 32-bit read-and-shift
or read-modify-write, but we leave this for later cleanup,
since this series is generally trying to stick to fixing
the decode.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 147 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  83 +------------------
 target/arm/vfp.decode          |  36 ++++++++
 3 files changed, 185 insertions(+), 81 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
 
     return true;
 }
+
+static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
+{
+    /* VMOV scalar to general purpose register */
+    TCGv_i32 tmp;
+    int pass;
+    uint32_t offset;
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
+        return false;
+    }
+
+    offset = a->index << a->size;
+    pass = extract32(offset, 2, 1);
+    offset = extract32(offset, 0, 2) * 8;
+
+    if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = neon_load_reg(a->vn, pass);
+    switch (a->size) {
+    case 0:
+        if (offset) {
+            tcg_gen_shri_i32(tmp, tmp, offset);
+        }
+        if (a->u) {
+            gen_uxtb(tmp);
+        } else {
+            gen_sxtb(tmp);
+        }
+        break;
+    case 1:
+        if (a->u) {
+            if (offset) {
+                tcg_gen_shri_i32(tmp, tmp, 16);
+            } else {
+                gen_uxth(tmp);
+            }
+        } else {
+            if (offset) {
+                tcg_gen_sari_i32(tmp, tmp, 16);
+            } else {
+                gen_sxth(tmp);
+            }
+        }
+        break;
+    case 2:
+        break;
+    }
+    store_reg(s, a->rt, tmp);
+
+    return true;
+}
+
+static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
+{
+    /* VMOV general purpose register to scalar */
+    TCGv_i32 tmp, tmp2;
+    int pass;
+    uint32_t offset;
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
+        return false;
+    }
+
+    offset = a->index << a->size;
+    pass = extract32(offset, 2, 1);
+    offset = extract32(offset, 0, 2) * 8;
+
+    if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = load_reg(s, a->rt);
+    switch (a->size) {
+    case 0:
+        tmp2 = neon_load_reg(a->vn, pass);
+        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
+        tcg_temp_free_i32(tmp2);
+        break;
+    case 1:
+        tmp2 = neon_load_reg(a->vn, pass);
+        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
+        tcg_temp_free_i32(tmp2);
+        break;
+    case 2:
+        break;
+    }
+    neon_store_reg(a->vn, pass, tmp);
+
+    return true;
+}
+
+static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
+{
+    /* VDUP (general purpose register) */
+    TCGv_i32 tmp;
+    int size, vec_size;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
+        return false;
+    }
+
+    if (a->b && a->e) {
+        return false;
+    }
+
+    if (a->q && (a->vn & 1)) {
+        return false;
+    }
+
+    vec_size = a->q ? 16 : 8;
+    if (a->b) {
+        size = 0;
+    } else if (a->e) {
+        size = 1;
+    } else {
+        size = 2;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = load_reg(s, a->rt);
+    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
+                         vec_size, vec_size, tmp);
+    tcg_temp_free_i32(tmp);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         /* single register transfer */
         rd = (insn >> 12) & 0xf;
         if (dp) {
-            int size;
-            int pass;
-
-            VFP_DREG_N(rn, insn);
-            if (insn & 0xf)
-                return 1;
-            if (insn & 0x00c00060
-                && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-                return 1;
-            }
-
-            pass = (insn >> 21) & 1;
-            if (insn & (1 << 22)) {
-                size = 0;
-                offset = ((insn >> 5) & 3) * 8;
-            } else if (insn & (1 << 5)) {
-                size = 1;
-                offset = (insn & (1 << 6)) ? 16 : 0;
-            } else {
-                size = 2;
-                offset = 0;
-            }
-            if (insn & ARM_CP_RW_BIT) {
-                /* vfp->arm */
-                tmp = neon_load_reg(rn, pass);
-                switch (size) {
-                case 0:
-                    if (offset)
-                        tcg_gen_shri_i32(tmp, tmp, offset);
-                    if (insn & (1 << 23))
-                        gen_uxtb(tmp);
-                    else
-                        gen_sxtb(tmp);
-                    break;
-                case 1:
-                    if (insn & (1 << 23)) {
-                        if (offset) {
-                            tcg_gen_shri_i32(tmp, tmp, 16);
-                        } else {
-                            gen_uxth(tmp);
-                        }
-                    } else {
-                        if (offset) {
-                            tcg_gen_sari_i32(tmp, tmp, 16);
-                        } else {
-                            gen_sxth(tmp);
-                        }
-                    }
-                    break;
-                case 2:
-                    break;
-                }
-                store_reg(s, rd, tmp);
-            } else {
-                /* arm->vfp */
-                tmp = load_reg(s, rd);
-                if (insn & (1 << 23)) {
-                    /* VDUP */
-                    int vec_size = pass ? 16 : 8;
-                    tcg_gen_gvec_dup_i32(size, neon_reg_offset(rn, 0),
-                                         vec_size, vec_size, tmp);
-                    tcg_temp_free_i32(tmp);
-                } else {
-                    /* VMOV */
-                    switch (size) {
-                    case 0:
-                        tmp2 = neon_load_reg(rn, pass);
-                        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
-                        tcg_temp_free_i32(tmp2);
-                        break;
-                    case 1:
-                        tmp2 = neon_load_reg(rn, pass);
-                        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
-                        tcg_temp_free_i32(tmp2);
-                        break;
-                    case 2:
-                        break;
-                    }
-                    neon_store_reg(rn, pass, tmp);
-                }
-            }
+            /* already handled by decodetree */
+            return 1;
         } else { /* !dp */
             bool is_sysreg;
 
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@
 # 1110 1110 .... .... .... 101. .... ....
 # (but those patterns might also cover some Neon instructions,
 # which do not live in this file.)
+
+# VFP registers have an odd encoding with a four-bit field
+# and a one-bit field which are assembled in different orders
+# depending on whether the register is double or single precision.
+# Each individual instruction function must do the checks for
+# "double register selected but CPU does not have double support"
+# and "double register number has bit 4 set but CPU does not
+# support D16-D31" (which should UNDEF).
+%vm_dp  5:1 0:4
+%vm_sp  0:4 5:1
+%vn_dp  7:1 16:4
+%vn_sp  16:4 7:1
+%vd_dp  22:1 12:4
+%vd_sp  12:4 22:1
+
+%vmov_idx_b    21:1 5:2
+%vmov_idx_h    21:1 6:1
+
+# VMOV scalar to general-purpose register; note that this does
+# include some Neon cases.
+VMOV_to_gp   ---- 1110 u:1 1.        1 .... rt:4 1011 ... 1 0000 \
+             vn=%vn_dp size=0 index=%vmov_idx_b
+VMOV_to_gp   ---- 1110 u:1 0.        1 .... rt:4 1011 ..1 1 0000 \
+             vn=%vn_dp size=1 index=%vmov_idx_h
+VMOV_to_gp   ---- 1110 0    0 index:1 1 .... rt:4 1011 .00 1 0000 \
+             vn=%vn_dp size=2 u=0
+
+VMOV_from_gp ---- 1110 0 1.        0 .... rt:4 1011 ... 1 0000 \
+             vn=%vn_dp size=0 index=%vmov_idx_b
+VMOV_from_gp ---- 1110 0 0.        0 .... rt:4 1011 ..1 1 0000 \
+             vn=%vn_dp size=1 index=%vmov_idx_h
+VMOV_from_gp ---- 1110 0 0 index:1 0 .... rt:4 1011 .00 1 0000 \
+             vn=%vn_dp size=2
+
+VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
+             vn=%vn_dp
--
2.20.1
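The %vd_dp/%vd_sp lines above concatenate their fields most-significant-first. A standalone C sketch of the resulting register-number assembly (illustrative only, not decodetree output):

  #include <stdio.h>
  #include <stdint.h>

  static unsigned vd_dp(uint32_t insn)  /* %vd_dp 22:1 12:4 */
  {
      return ((insn >> 22) & 1) << 4 | ((insn >> 12) & 0xf);
  }

  static unsigned vd_sp(uint32_t insn)  /* %vd_sp 12:4 22:1 */
  {
      return ((insn >> 12) & 0xf) << 1 | ((insn >> 22) & 1);
  }

  int main(void)
  {
      uint32_t insn = (1u << 22) | (0x3u << 12);  /* D=1, Vd=0b0011 */
      /* The same field pair names d19 or s7 depending on precision. */
      printf("dp reg d%u, sp reg s%u\n", vd_dp(insn), vd_sp(insn));
      return 0;
  }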
From: Gavin Shan <gshan@redhat.com>

The three high memory regions are usually enabled by default, but they
may not all be used; for example, VIRT_HIGH_GIC_REDIST2 isn't needed
by GICv2. This wastes PA space.

Add properties ("highmem-redists", "highmem-ecam", "highmem-mmio") to
allow users to selectively disable them if needed. Since the high
memory region for the GICv3 or GICv4 redistributors can now be
disabled by the user, the maximum number of supported CPUs must be
calculated based on 'vms->highmem_redists'. The error message for
exceeding that limit is also improved to indicate whether the high
memory region for the redistributors has been enabled.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20221029224307.138822-8-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/virt.rst | 13 +++++++
 hw/arm/virt.c            | 75 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -XXX,XX +XXX,XX @@ compact-highmem
   Set ``on``/``off`` to enable/disable the compact layout for high memory regions.
   The default is ``on`` for machine types later than ``virt-7.2``.
 
+highmem-redists
+  Set ``on``/``off`` to enable/disable the high memory region for GICv3 or
+  GICv4 redistributor. The default is ``on``. Setting this to ``off`` will
+  limit the maximum number of CPUs when GICv3 or GICv4 is used.
+
+highmem-ecam
+  Set ``on``/``off`` to enable/disable the high memory region for PCI ECAM.
+  The default is ``on`` for machine types later than ``virt-3.0``.
+
+highmem-mmio
+  Set ``on``/``off`` to enable/disable the high memory region for PCI MMIO.
+  The default is ``on``.
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
     if (vms->gic_version == VIRT_GIC_VERSION_2) {
         virt_max_cpus = GIC_NCPU;
     } else {
-        virt_max_cpus = virt_redist_capacity(vms, VIRT_GIC_REDIST) +
-                        virt_redist_capacity(vms, VIRT_HIGH_GIC_REDIST2);
+        virt_max_cpus = virt_redist_capacity(vms, VIRT_GIC_REDIST);
+        if (vms->highmem_redists) {
+            virt_max_cpus += virt_redist_capacity(vms, VIRT_HIGH_GIC_REDIST2);
+        }
     }
 
     if (max_cpus > virt_max_cpus) {
         error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
                      "supported by machine 'mach-virt' (%d)",
                      max_cpus, virt_max_cpus);
+        if (vms->gic_version != VIRT_GIC_VERSION_2 && !vms->highmem_redists) {
+            error_printf("Try 'highmem-redists=on' for more CPUs\n");
+        }
+
         exit(1);
     }
 
@@ -XXX,XX +XXX,XX @@ static void virt_set_compact_highmem(Object *obj, bool value, Error **errp)
     vms->highmem_compact = value;
 }
 
+static bool virt_get_highmem_redists(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->highmem_redists;
+}
+
+static void virt_set_highmem_redists(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->highmem_redists = value;
+}
+
+static bool virt_get_highmem_ecam(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->highmem_ecam;
+}
+
+static void virt_set_highmem_ecam(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->highmem_ecam = value;
+}
+
+static bool virt_get_highmem_mmio(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->highmem_mmio;
+}
+
+static void virt_set_highmem_mmio(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->highmem_mmio = value;
+}
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
     VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -XXX,XX +XXX,XX @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
                                           "Set on/off to enable/disable compact "
                                           "layout for high memory regions");
 
+    object_class_property_add_bool(oc, "highmem-redists",
+                                   virt_get_highmem_redists,
+                                   virt_set_highmem_redists);
+    object_class_property_set_description(oc, "highmem-redists",
+                                          "Set on/off to enable/disable high "
+                                          "memory region for GICv3 or GICv4 "
+                                          "redistributor");
+
+    object_class_property_add_bool(oc, "highmem-ecam",
+                                   virt_get_highmem_ecam,
+                                   virt_set_highmem_ecam);
+    object_class_property_set_description(oc, "highmem-ecam",
+                                          "Set on/off to enable/disable high "
+                                          "memory region for PCI ECAM");
+
+    object_class_property_add_bool(oc, "highmem-mmio",
+                                   virt_get_highmem_mmio,
+                                   virt_set_highmem_mmio);
+    object_class_property_set_description(oc, "highmem-mmio",
+                                          "Set on/off to enable/disable high "
+                                          "memory region for PCI MMIO");
+
     object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
                                   virt_set_gic_version);
     object_class_property_set_description(oc, "gic-version",
--
2.25.1

Convert the VFP VMLA instruction to decodetree.

This is the first of the VFP 3-operand data processing instructions,
so we include in this patch the code which loops over the elements
for an old-style VFP vector operation. The existing code to do this
looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
we are going to be converting instructions one at a time anyway
we can take the opportunity to make the new loop use TCG temporaries,
which means we can do that conversion one operation at a time
rather than needing to do it all in one go.

We include an UNDEF check which was missing in the old code:
short-vector operations (with stride or length non-zero) were
deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
field does not indicate that support for short vectors is present
we UNDEF the operations that would use them. (This is a change
of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
previously were all incorrectly allowing short-vector operations.)

Note that the conversion fixes a bug in the old code for the
case of VFP short-vector "mixed scalar/vector operations". These
happen where the destination register is in a vector bank but
the second operand is in a scalar bank. For example
    vmla.f64 d10, d1, d16 with length 2 stride 2
is equivalent to the pair of scalar operations
    vmla.f64 d10, d1, d16
    vmla.f64 d8, d3, d16
where the destination and first input register cycle through
their vector but the second input is scalar (d16). In the
old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
as a temporary output for the multiply, which trashes the
second input operand. For the fully-scalar case (where we
never do a second iteration) and the fully-vector case
(where the loop loads the new second input operand) this
doesn't matter, but for the mixed scalar/vector case we
will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug.
This bug is present for all the multiply-accumulate insns
that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.

Note 2: the expression used to calculate the next register
number in the vector bank is not in fact correct; we leave
this behaviour unchanged from the old decoder and will
fix this bug later in the series.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               |   5 +
 target/arm/translate-vfp.inc.c | 205 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  14 ++-
 target/arm/vfp.decode          |   6 +
 4 files changed, 224 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
     return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
 }
 
+static inline bool isar_feature_aa32_fpshvec(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->mvfr0, MVFR0, FPSHVEC) > 0;
+}
+
 /*
  * We always set the FP and SIMD FP16 fields to indicate identical
  * levels of support (assuming SIMD is implemented at all), so
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
 
     return true;
 }
+
+/*
+ * Types for callbacks for do_vfp_3op_sp() and do_vfp_3op_dp().
+ * The callback should emit code to write a value to vd. If
+ * do_vfp_3op_{sp,dp}() was passed reads_vd then the TCGv vd
+ * will contain the old value of the relevant VFP register;
+ * otherwise it must be written to only.
+ */
+typedef void VFPGen3OpSPFn(TCGv_i32 vd,
+                           TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst);
+typedef void VFPGen3OpDPFn(TCGv_i64 vd,
+                           TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
+
+/*
+ * Perform a 3-operand VFP data processing instruction. fn is the
+ * callback to do the actual operation; this function deals with the
+ * code to handle looping around for VFP vector processing.
+ */
+static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
+                          int vd, int vn, int vm, bool reads_vd)
+{
+    uint32_t delta_m = 0;
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i32 f0, f1, fd;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0x18;
+
+        /* Figure out what type of vector operation this is.  */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = s->vec_stride + 1;
+
+            if ((vm & bank_mask) == 0) {
+                /* mixed scalar/vector */
+                delta_m = 0;
+            } else {
+                /* vector */
+                delta_m = delta_d;
+            }
+        }
+    }
+
+    f0 = tcg_temp_new_i32();
+    f1 = tcg_temp_new_i32();
+    fd = tcg_temp_new_i32();
+    fpst = get_fpstatus_ptr(0);
+
+    neon_load_reg32(f0, vn);
+    neon_load_reg32(f1, vm);
+
+    for (;;) {
+        if (reads_vd) {
+            neon_load_reg32(fd, vd);
+        }
+        fn(fd, f0, f1, fpst);
+        neon_store_reg32(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        neon_load_reg32(f0, vn);
+        if (delta_m) {
+            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            neon_load_reg32(f1, vm);
+        }
+    }
+
+    tcg_temp_free_i32(f0);
+    tcg_temp_free_i32(f1);
+    tcg_temp_free_i32(fd);
+    tcg_temp_free_ptr(fpst);
+
+    return true;
+}
+
+static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
+                          int vd, int vn, int vm, bool reads_vd)
+{
+    uint32_t delta_m = 0;
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i64 f0, f1, fd;
+    TCGv_ptr fpst;
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vn | vm) & 0x10)) {
+        return false;
+    }
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0xc;
+
+        /* Figure out what type of vector operation this is.  */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = (s->vec_stride >> 1) + 1;
+
+            if ((vm & bank_mask) == 0) {
+                /* mixed scalar/vector */
+                delta_m = 0;
+            } else {
+                /* vector */
+                delta_m = delta_d;
+            }
+        }
+    }
+
+    f0 = tcg_temp_new_i64();
+    f1 = tcg_temp_new_i64();
+    fd = tcg_temp_new_i64();
+    fpst = get_fpstatus_ptr(0);
+
+    neon_load_reg64(f0, vn);
+    neon_load_reg64(f1, vm);
+
+    for (;;) {
+        if (reads_vd) {
+            neon_load_reg64(fd, vd);
+        }
+        fn(fd, f0, f1, fpst);
+        neon_store_reg64(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        neon_load_reg64(f0, vn);
+        if (delta_m) {
+            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            neon_load_reg64(f1, vm);
+        }
+    }
+
+    tcg_temp_free_i64(f0);
+    tcg_temp_free_i64(f1);
+    tcg_temp_free_i64(fd);
+    tcg_temp_free_ptr(fpst);
+
+    return true;
+}
+
+static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* Note that order of inputs to the add matters for NaNs */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VMLA_sp(DisasContext *s, arg_VMLA_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VMLA_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /* Note that order of inputs to the add matters for NaNs */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
     rn = VFP_SREG_N(insn);
 
+    switch (op) {
+    case 0:
+        /* Already handled by decodetree */
+        return 1;
+    default:
+        break;
+    }
+
     if (op == 15) {
         /* rn is opcode, encoded as per VFP_SREG_N. */
         switch (rn) {
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         for (;;) {
             /* Perform the calculation. */
             switch (op) {
-            case 0: /* VMLA: fd + (fn * fm) */
-                /* Note that order of inputs to the add matters for NaNs */
-                gen_vfp_F1_mul(dp);
-                gen_mov_F0_vreg(dp, rd);
-                gen_vfp_add(dp);
-                break;
             case 1: /* VMLS: fd + -(fn * fm) */
                 gen_vfp_mul(dp);
                 gen_vfp_F1_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VLDM_VSTM_sp ---- 1101 0.1 l:1 rn:4 .... 1010 imm:8 \
              vd=%vd_sp p=1 u=0 w=1
 VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
              vd=%vd_dp p=1 u=0 w=1
+
+# 3-register VFP data-processing; bits [23,21:20,6] identify the operation.
+VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1
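The register-stepping expression in do_vfp_3op_sp() keeps the bank bits fixed and wraps the register offset within an 8-register bank. A standalone sketch with made-up values (note the commit message above says this stepping is knowingly carried over from the old decoder and fixed later in the series):

  #include <stdio.h>

  int main(void)
  {
      unsigned bank_mask = 0x18;  /* single precision: banks of 8 regs */
      unsigned delta = 2;         /* vec_stride + 1 */
      unsigned vd = 25;           /* s25, in the bank starting at s24 */

      for (int i = 0; i < 4; i++) {
          printf("s%u\n", vd);    /* prints s25, s27, s29, s31 */
          /* advance within the bank; the bank bits never change */
          vd = ((vd + delta) & (bank_mask - 1)) | (vd & bank_mask);
      }
      /* a fifth step would wrap back around to s25 */
      return 0;
  }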
From: Mihai Carabas <mihai.carabas@oracle.com>

Use the base_memmap to build the SMBIOS 19 table which provides the address
mapping for a Physical Memory Array (from spec [1] chapter 7.20).

This was present on i386 from commit c97294ec1b9e36887e119589d456557d72ab37b5
("SMBIOS: Build aggregate smbios tables and entry point").

[1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.5.0.pdf

The absence of this table is a breach of the specs and is
detected by the FirmwareTestSuite (FWTS), but it doesn't
cause any known problems for guest OSes.

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Message-id: 1668789029-5432-1-git-send-email-mihai.carabas@oracle.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
 static void virt_build_smbios(VirtMachineState *vms)
 {
     MachineClass *mc = MACHINE_GET_CLASS(vms);
+    MachineState *ms = MACHINE(vms);
     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
     uint8_t *smbios_tables, *smbios_anchor;
     size_t smbios_tables_len, smbios_anchor_len;
+    struct smbios_phys_mem_area mem_array;
     const char *product = "QEMU Virtual Machine";
 
     if (kvm_enabled()) {
@@ -XXX,XX +XXX,XX @@ static void virt_build_smbios(VirtMachineState *vms)
                       vmc->smbios_old_sys_ver ? "1.0" : mc->name, false,
                       true, SMBIOS_ENTRY_POINT_TYPE_64);
 
-    smbios_get_tables(MACHINE(vms), NULL, 0,
+    /* build the array of physical mem area from base_memmap */
+    mem_array.address = vms->memmap[VIRT_MEM].base;
+    mem_array.length = ms->ram_size;
+
+    smbios_get_tables(ms, &mem_array, 1,
                       &smbios_tables, &smbios_tables_len,
                       &smbios_anchor, &smbios_anchor_len,
                       &error_fatal);
--
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The ARM pseudocode installs the error_code into the original
pointer, not the encrypted pointer. The difference applies
within the 7 bits of pac data; the result should be the sign
extension of bit 55.

Add a testcase to that effect.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/tcg/aarch64/Makefile.target |  2 +-
 target/arm/pauth_helper.c         |  4 +-
 tests/tcg/aarch64/pauth-2.c       | 61 +++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/aarch64/pauth-2.c

diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -XXX,XX +XXX,XX @@ run-fcvt: fcvt
 	$(call run-test,$<,$(QEMU) $<, "$< on $(TARGET_NAME)")
 	$(call diff-out,$<,$(AARCH64_SRC)/fcvt.ref)
 
-AARCH64_TESTS += pauth-1
+AARCH64_TESTS += pauth-1 pauth-2
 run-pauth-%: QEMU += -cpu max
 
 TESTS:=$(AARCH64_TESTS)
diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
     if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
         int error_code = (keynumber << 1) | (keynumber ^ 1);
         if (param.tbi) {
-            return deposit64(ptr, 53, 2, error_code);
+            return deposit64(orig_ptr, 53, 2, error_code);
         } else {
-            return deposit64(ptr, 61, 2, error_code);
+            return deposit64(orig_ptr, 61, 2, error_code);
         }
     }
     return orig_ptr;
diff --git a/tests/tcg/aarch64/pauth-2.c b/tests/tcg/aarch64/pauth-2.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/pauth-2.c
@@ -XXX,XX +XXX,XX @@
+#include <stdint.h>
+#include <assert.h>
+
+asm(".arch armv8.4-a");
+
+void do_test(uint64_t value)
+{
+    uint64_t salt1, salt2;
+    uint64_t encode, decode;
+
+    /*
+     * With TBI enabled and a 48-bit VA, there are 7 bits of auth,
+     * and so a 1/128 chance of encode = pac(value,key,salt) producing
+     * an auth which leaves the value unchanged.
+     * Iterate until we find a salt for which encode != value.
+     */
+    for (salt1 = 1; ; salt1++) {
+        asm volatile("pacda %0, %2" : "=r"(encode) : "0"(value), "r"(salt1));
+        if (encode != value) {
+            break;
+        }
+    }
+
+    /* A valid salt must produce a valid authorization.  */
+    asm volatile("autda %0, %2" : "=r"(decode) : "0"(encode), "r"(salt1));
+    assert(decode == value);
+
+    /*
+     * An invalid salt usually fails authorization, but again there
+     * is a chance of choosing another salt that works.
+     * Iterate until we find another salt which does fail.
+     */
+    for (salt2 = salt1 + 1; ; salt2++) {
+        asm volatile("autda %0, %2" : "=r"(decode) : "0"(encode), "r"(salt2));
+        if (decode != value) {
+            break;
+        }
+    }
+
+    /* The VA bits, bit 55, and the TBI bits, should be unchanged.  */
+    assert(((decode ^ value) & 0xff80ffffffffffffull) == 0);
+
+    /*
+     * Bits [54:53] are an error indicator based on the key used;
+     * the DA key above is keynumber 0, so error == 0b01. Otherwise
+     * bit 55 of the original is sign-extended into the rest of the auth.
+     */
+    if ((value >> 55) & 1) {
+        assert(((decode >> 48) & 0xff) == 0b10111111);
+    } else {
+        assert(((decode >> 48) & 0xff) == 0b00100000);
+    }
+}
+
+int main()
+{
+    do_test(0);
+    do_test(-1);
+    do_test(0xda004acedeadbeefull);
+    return 0;
+}
--
2.20.1
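The corrected behaviour is easy to check by hand: for the DA key (keynumber 0) the error code is 0b01, deposited into bits [54:53] of the original pointer when TBI is enabled. A standalone sketch using a local deposit64 helper (not QEMU's):

  #include <stdio.h>
  #include <stdint.h>

  static uint64_t deposit64(uint64_t x, int pos, int len, uint64_t val)
  {
      uint64_t mask = ((1ULL << len) - 1) << pos;
      return (x & ~mask) | ((val << pos) & mask);
  }

  int main(void)
  {
      int keynumber = 0;                                   /* DA key */
      int error_code = (keynumber << 1) | (keynumber ^ 1); /* == 0b01 */
      uint64_t orig_ptr = 0xda004acedeadbeefull;

      /* sets bit 53, clears bit 54: 0xda204acedeadbeef */
      printf("0x%016llx\n",
             (unsigned long long)deposit64(orig_ptr, 53, 2, error_code));
      return 0;
  }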
From: Timofey Kutergin <tkutergin@gmail.com>

The Cortex-A55 is one of the newer armv8.2+ CPUs; in particular
it supports the Privileged Access Never (PAN) feature. Add
a model of this CPU, so you can use a CPU type on the virt
board that models a specific real hardware CPU, rather than
having to use the QEMU-specific "max" CPU type.

Signed-off-by: Timofey Kutergin <tkutergin@gmail.com>
Message-id: 20221121150819.2782817-1-tkutergin@gmail.com
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/virt.rst |  1 +
 hw/arm/virt.c            |  1 +
 target/arm/cpu64.c       | 69 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -XXX,XX +XXX,XX @@ Supported guest CPU types:
 - ``cortex-a15`` (32-bit; the default)
 - ``cortex-a35`` (64-bit)
 - ``cortex-a53`` (64-bit)
+- ``cortex-a55`` (64-bit)
 - ``cortex-a57`` (64-bit)
 - ``cortex-a72`` (64-bit)
 - ``cortex-a76`` (64-bit)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static const char *valid_cpus[] = {
     ARM_CPU_TYPE_NAME("cortex-a15"),
     ARM_CPU_TYPE_NAME("cortex-a35"),
     ARM_CPU_TYPE_NAME("cortex-a53"),
+    ARM_CPU_TYPE_NAME("cortex-a55"),
     ARM_CPU_TYPE_NAME("cortex-a57"),
     ARM_CPU_TYPE_NAME("cortex-a72"),
     ARM_CPU_TYPE_NAME("cortex-a76"),
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj)
     define_cortex_a72_a57_a53_cp_reginfo(cpu);
 }
 
+static void aarch64_a55_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+
+    cpu->dtb_compatible = "arm,cortex-a55";
+    set_feature(&cpu->env, ARM_FEATURE_V8);
+    set_feature(&cpu->env, ARM_FEATURE_NEON);
+    set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
+    set_feature(&cpu->env, ARM_FEATURE_AARCH64);
+    set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
+    set_feature(&cpu->env, ARM_FEATURE_EL2);
+    set_feature(&cpu->env, ARM_FEATURE_EL3);
+    set_feature(&cpu->env, ARM_FEATURE_PMU);
+
+    /* Ordered by B2.4 AArch64 registers by functional group */
+    cpu->clidr = 0x82000023;
+    cpu->ctr = 0x84448004; /* L1Ip = VIPT */
+    cpu->dcz_blocksize = 4; /* 64 bytes */
+    cpu->isar.id_aa64dfr0  = 0x0000000010305408ull;
+    cpu->isar.id_aa64isar0 = 0x0000100010211120ull;
+    cpu->isar.id_aa64isar1 = 0x0000000000100001ull;
+    cpu->isar.id_aa64mmfr0 = 0x0000000000101122ull;
+    cpu->isar.id_aa64mmfr1 = 0x0000000010212122ull;
+    cpu->isar.id_aa64mmfr2 = 0x0000000000001011ull;
+    cpu->isar.id_aa64pfr0  = 0x0000000010112222ull;
+    cpu->isar.id_aa64pfr1  = 0x0000000000000010ull;
+    cpu->id_afr0       = 0x00000000;
+    cpu->isar.id_dfr0  = 0x04010088;
+    cpu->isar.id_isar0 = 0x02101110;
+    cpu->isar.id_isar1 = 0x13112111;
+    cpu->isar.id_isar2 = 0x21232042;
+    cpu->isar.id_isar3 = 0x01112131;
+    cpu->isar.id_isar4 = 0x00011142;
+    cpu->isar.id_isar5 = 0x01011121;
+    cpu->isar.id_isar6 = 0x00000010;
+    cpu->isar.id_mmfr0 = 0x10201105;
+    cpu->isar.id_mmfr1 = 0x40000000;
+    cpu->isar.id_mmfr2 = 0x01260000;
+    cpu->isar.id_mmfr3 = 0x02122211;
+    cpu->isar.id_mmfr4 = 0x00021110;
+    cpu->isar.id_pfr0  = 0x10010131;
+    cpu->isar.id_pfr1  = 0x00011011;
+    cpu->isar.id_pfr2  = 0x00000011;
+    cpu->midr = 0x412FD050; /* r2p0 */
+    cpu->revidr = 0;
+
+    /* From B2.23 CCSIDR_EL1 */
+    cpu->ccsidr[0] = 0x700fe01a; /* 32KB L1 dcache */
+    cpu->ccsidr[1] = 0x200fe01a; /* 32KB L1 icache */
+    cpu->ccsidr[2] = 0x703fe07a; /* 512KB L2 cache */
+
+    /* From B2.96 SCTLR_EL3 */
+    cpu->reset_sctlr = 0x30c50838;
+
+    /* From B4.45 ICH_VTR_EL2 */
+    cpu->gic_num_lrs = 4;
+    cpu->gic_vpribits = 5;
+    cpu->gic_vprebits = 5;
+    cpu->gic_pribits = 5;
+
+    cpu->isar.mvfr0 = 0x10110222;
+    cpu->isar.mvfr1 = 0x13211111;
+    cpu->isar.mvfr2 = 0x00000043;
+
+    /* From D5.4 AArch64 PMU register summary */
+    cpu->isar.reset_pmcr_el0 = 0x410b3000;
+}
+
 static void aarch64_a72_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo aarch64_cpus[] = {
     { .name = "cortex-a35",         .initfn = aarch64_a35_initfn },
     { .name = "cortex-a57",         .initfn = aarch64_a57_initfn },
     { .name = "cortex-a53",         .initfn = aarch64_a53_initfn },
+    { .name = "cortex-a55",         .initfn = aarch64_a55_initfn },
     { .name = "cortex-a72",         .initfn = aarch64_a72_initfn },
     { .name = "cortex-a76",         .initfn = aarch64_a76_initfn },
     { .name = "a64fx",              .initfn = aarch64_a64fx_initfn },

From: Richard Henderson <richard.henderson@linaro.org>

This replaces 3 target-specific implementations for BIT, BIF, and BSL.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190518191934.21887-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.h |  2 +
 target/arm/translate.h     |  3 --
 target/arm/translate-a64.c | 15 ++++++--
 target/arm/translate.c     | 78 +++-----------------------------------
 4 files changed, 20 insertions(+), 78 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -XXX,XX +XXX,XX @@ typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
                          uint32_t, uint32_t);
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
 
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline void gen_ss_advance(DisasContext *s)
 }
 
 /* Vector operations shared between ARM and AArch64.  */
-extern const GVecGen3 bsl_op;
-extern const GVecGen3 bit_op;
-extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_fn3(DisasContext *s, bool is_q, int rd, int rn, int rm,
             vec_full_reg_offset(s, rm), is_q ? 16 : 8, vec_full_reg_size(s));
 }
 
+/* Expand a 4-operand AdvSIMD vector operation using an expander function. */
+static void gen_gvec_fn4(DisasContext *s, bool is_q, int rd, int rn, int rm,
+                         int rx, GVecGen4Fn *gvec_fn, int vece)
+{
+    gvec_fn(vece, vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn),
+            vec_full_reg_offset(s, rm), vec_full_reg_offset(s, rx),
+            is_q ? 16 : 8, vec_full_reg_size(s));
+}
+
 /* Expand a 2-operand + immediate AdvSIMD vector operation using
  * an op descriptor.
  */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
         return;
 
     case 5: /* BSL bitwise select */
-        gen_gvec_op3(s, is_q, rd, rn, rm, &bsl_op);
+        gen_gvec_fn4(s, is_q, rd, rd, rn, rm, tcg_gen_gvec_bitsel, 0);
         return;
     case 6: /* BIT, bitwise insert if true */
-        gen_gvec_op3(s, is_q, rd, rn, rm, &bit_op);
+        gen_gvec_fn4(s, is_q, rd, rm, rn, rd, tcg_gen_gvec_bitsel, 0);
         return;
     case 7: /* BIF, bitwise insert if false */
-        gen_gvec_op3(s, is_q, rd, rn, rm, &bif_op);
+        gen_gvec_fn4(s, is_q, rd, rm, rd, rn, tcg_gen_gvec_bitsel, 0);
         return;
 
     default:
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn,
     return 1;
 }
 
-/*
- * Expanders for VBitOps_VBIF, VBIT, VBSL.
- */
-static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-    tcg_gen_xor_i64(rn, rn, rm);
-    tcg_gen_and_i64(rn, rn, rd);
-    tcg_gen_xor_i64(rd, rm, rn);
-}
-
-static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-    tcg_gen_xor_i64(rn, rn, rd);
-    tcg_gen_and_i64(rn, rn, rm);
-    tcg_gen_xor_i64(rd, rd, rn);
-}
-
-static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-    tcg_gen_xor_i64(rn, rn, rd);
-    tcg_gen_andc_i64(rn, rn, rm);
-    tcg_gen_xor_i64(rd, rd, rn);
-}
-
-static void gen_bsl_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-    tcg_gen_xor_vec(vece, rn, rn, rm);
-    tcg_gen_and_vec(vece, rn, rn, rd);
-    tcg_gen_xor_vec(vece, rd, rm, rn);
-}
-
-static void gen_bit_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-    tcg_gen_xor_vec(vece, rn, rn, rd);
-    tcg_gen_and_vec(vece, rn, rn, rm);
-    tcg_gen_xor_vec(vece, rd, rd, rn);
-}
-
-static void gen_bif_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-    tcg_gen_xor_vec(vece, rn, rn, rd);
-    tcg_gen_andc_vec(vece, rn, rn, rm);
-    tcg_gen_xor_vec(vece, rd, rd, rn);
-}
-
-const GVecGen3 bsl_op = {
-    .fni8 = gen_bsl_i64,
-    .fniv = gen_bsl_vec,
-    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    .load_dest = true
-};
-
-const GVecGen3 bit_op = {
-    .fni8 = gen_bit_i64,
-    .fniv = gen_bit_vec,
-    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    .load_dest = true
-};
-
-const GVecGen3 bif_op = {
-    .fni8 = gen_bif_i64,
-    .fniv = gen_bif_vec,
-    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    .load_dest = true
-};
-
 static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
 {
     tcg_gen_vec_sar8i_i64(a, a, shift);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                            vec_size, vec_size);
             break;
         case 5: /* VBSL */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                           vec_size, vec_size, &bsl_op);
+            tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
+                                vec_size, vec_size);
             break;
         case 6: /* VBIT */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                           vec_size, vec_size, &bit_op);
+            tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
+                                vec_size, vec_size);
             break;
         case 7: /* VBIF */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                           vec_size, vec_size, &bif_op);
+            tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
175
+ vec_size, vec_size);
176
break;
177
}
178
return 0;
179
--
131
--
180
2.20.1
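
For reference, a minimal standalone C model (not QEMU code) of the bitwise
select that all three instructions reduce to, and of the operand orders the
patch above passes to tcg_gen_gvec_bitsel for BSL, BIT and BIF:

#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* bitsel(sel, a, b): for each bit, take a where sel is 1, else b. */
static uint64_t bitsel(uint64_t sel, uint64_t a, uint64_t b)
{
    return (sel & a) | (~sel & b);
}

int main(void)
{
    uint64_t rd = 0xF0F0F0F0F0F0F0F0ull;
    uint64_t rn = 0xAAAAAAAAAAAAAAAAull;
    uint64_t rm = 0x0123456789ABCDEFull;

    /* BSL: the destination rd is the selector. */
    uint64_t bsl = bitsel(rd, rn, rm);
    /* BIT: insert rn bits into rd where rm is true. */
    uint64_t bit = bitsel(rm, rn, rd);
    /* BIF: insert rn bits into rd where rm is false. */
    uint64_t bif = bitsel(rm, rd, rn);

    /* Cross-check against the xor/and/xor form the removed expanders
     * used, e.g. BSL computed rd = rm ^ ((rn ^ rm) & rd). */
    assert(bsl == (rm ^ ((rn ^ rm) & rd)));

    printf("BSL %016llx BIT %016llx BIF %016llx\n",
           (unsigned long long)bsl, (unsigned long long)bit,
           (unsigned long long)bif);
    return 0;
}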
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+
+    cpu->dtb_compatible = "arm,cortex-a55";
+    set_feature(&cpu->env, ARM_FEATURE_V8);
+    set_feature(&cpu->env, ARM_FEATURE_NEON);
+    set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
+    set_feature(&cpu->env, ARM_FEATURE_AARCH64);
+    set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
+    set_feature(&cpu->env, ARM_FEATURE_EL2);
+    set_feature(&cpu->env, ARM_FEATURE_EL3);
+    set_feature(&cpu->env, ARM_FEATURE_PMU);
+
+    /* Ordered by B2.4 AArch64 registers by functional group */
+    cpu->clidr = 0x82000023;
+    cpu->ctr = 0x84448004; /* L1Ip = VIPT */
+    cpu->dcz_blocksize = 4; /* 64 bytes */
+    cpu->isar.id_aa64dfr0 = 0x0000000010305408ull;
+    cpu->isar.id_aa64isar0 = 0x0000100010211120ull;
+    cpu->isar.id_aa64isar1 = 0x0000000000100001ull;
+    cpu->isar.id_aa64mmfr0 = 0x0000000000101122ull;
+    cpu->isar.id_aa64mmfr1 = 0x0000000010212122ull;
+    cpu->isar.id_aa64mmfr2 = 0x0000000000001011ull;
+    cpu->isar.id_aa64pfr0 = 0x0000000010112222ull;
+    cpu->isar.id_aa64pfr1 = 0x0000000000000010ull;
+    cpu->id_afr0 = 0x00000000;
+    cpu->isar.id_dfr0 = 0x04010088;
+    cpu->isar.id_isar0 = 0x02101110;
+    cpu->isar.id_isar1 = 0x13112111;
+    cpu->isar.id_isar2 = 0x21232042;
+    cpu->isar.id_isar3 = 0x01112131;
+    cpu->isar.id_isar4 = 0x00011142;
+    cpu->isar.id_isar5 = 0x01011121;
+    cpu->isar.id_isar6 = 0x00000010;
+    cpu->isar.id_mmfr0 = 0x10201105;
+    cpu->isar.id_mmfr1 = 0x40000000;
+    cpu->isar.id_mmfr2 = 0x01260000;
+    cpu->isar.id_mmfr3 = 0x02122211;
+    cpu->isar.id_mmfr4 = 0x00021110;
+    cpu->isar.id_pfr0 = 0x10010131;
+    cpu->isar.id_pfr1 = 0x00011011;
+    cpu->isar.id_pfr2 = 0x00000011;
+    cpu->midr = 0x412FD050; /* r2p0 */
+    cpu->revidr = 0;
+
+    /* From B2.23 CCSIDR_EL1 */
+    cpu->ccsidr[0] = 0x700fe01a; /* 32KB L1 dcache */
+    cpu->ccsidr[1] = 0x200fe01a; /* 32KB L1 icache */
+    cpu->ccsidr[2] = 0x703fe07a; /* 512KB L2 cache */
+
+    /* From B2.96 SCTLR_EL3 */
+    cpu->reset_sctlr = 0x30c50838;
+
+    /* From B4.45 ICH_VTR_EL2 */
+    cpu->gic_num_lrs = 4;
+    cpu->gic_vpribits = 5;
+    cpu->gic_vprebits = 5;
+    cpu->gic_pribits = 5;
+
+    cpu->isar.mvfr0 = 0x10110222;
+    cpu->isar.mvfr1 = 0x13211111;
+    cpu->isar.mvfr2 = 0x00000043;
+
+    /* From D5.4 AArch64 PMU register summary */
+    cpu->isar.reset_pmcr_el0 = 0x410b3000;
+}
+
 static void aarch64_a72_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo aarch64_cpus[] = {
     { .name = "cortex-a35",         .initfn = aarch64_a35_initfn },
     { .name = "cortex-a57",         .initfn = aarch64_a57_initfn },
     { .name = "cortex-a53",         .initfn = aarch64_a53_initfn },
+    { .name = "cortex-a55",         .initfn = aarch64_a55_initfn },
     { .name = "cortex-a72",         .initfn = aarch64_a72_initfn },
     { .name = "cortex-a76",         .initfn = aarch64_a76_initfn },
     { .name = "a64fx",              .initfn = aarch64_a64fx_initfn },
--
2.25.1
From: Richard Henderson <richard.henderson@linaro.org>

These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift. This
requires several masks and selects in addition to the actual
shifts to form the complete answer.

That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190603232209.20704-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  11 +-
 target/arm/translate.h     |   6 +
 target/arm/neon_helper.c   |  33 ----
 target/arm/translate-a64.c |  18 +--
 target/arm/translate.c     | 300 +++++++++++++++++++++++++++++++++++--
 target/arm/vec_helper.c    |  88 +++++++++++
 6 files changed, 390 insertions(+), 66 deletions(-)
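
For reference, a minimal standalone C model of the per-element semantics the
vectorized expansion has to reproduce (this mirrors the byte-element helpers
added below; it is an illustration, not QEMU code):

#include <stdint.h>
#include <stdio.h>

/* USHL on one byte element: the low 8 bits of the shift operand are a
 * signed count; negative counts shift right, and a count of 8 or more
 * in either direction produces zero. */
static uint8_t ushl_b(uint8_t nn, int8_t mm)
{
    if (mm >= 0) {
        return mm < 8 ? (uint8_t)(nn << mm) : 0;
    }
    return mm > -8 ? nn >> -mm : 0;
}

/* SSHL is the same except right shifts are arithmetic, so a large
 * negative count collapses to a copy of the sign bit. */
static int8_t sshl_b(int8_t nn, int8_t mm)
{
    if (mm >= 0) {
        return mm < 8 ? (int8_t)((uint8_t)nn << mm) : 0;
    }
    return nn >> (mm > -8 ? -mm : 7);
}

int main(void)
{
    printf("%d %d\n", ushl_b(0x80, -3), sshl_b(-128, -3)); /* 16 -16 */
    printf("%d %d\n", ushl_b(0x80, -9), sshl_b(-128, -9)); /* 0 -1 */
    return 0;
}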
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32)
 DEF_HELPER_2(neon_abd_u32, i32, i32, i32)
 DEF_HELPER_2(neon_abd_s32, i32, i32, i32)
 
-DEF_HELPER_2(neon_shl_u8, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_shl_u16, i32, i32, i32)
 DEF_HELPER_2(neon_shl_s16, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u64, i64, i64, i64)
-DEF_HELPER_2(neon_shl_s64, i64, i64, i64)
 DEF_HELPER_2(neon_rshl_u8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_u16, i32, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 
+DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
+extern const GVecGen3 sshl_op[4];
+extern const GVecGen3 ushl_op[4];
 extern const GVecGen2i ssra_op[4];
 extern const GVecGen2i usra_op[4];
 extern const GVecGen2i sri_op[4];
@@ -XXX,XX +XXX,XX @@ extern const GVecGen4 sqadd_op[4];
 extern const GVecGen4 uqsub_op[4];
 extern const GVecGen4 sqsub_op[4];
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -XXX,XX +XXX,XX @@ NEON_VOP(abd_u32, neon_u32, 1)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_u8, neon_u8, 4)
 NEON_VOP(shl_u16, neon_u16, 2)
-NEON_VOP(shl_u32, neon_u32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    if (shift >= 64 || shift <= -64) {
-        val = 0;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_s8, neon_s8, 4)
 NEON_VOP(shl_s16, neon_s16, 2)
-NEON_VOP(shl_s32, neon_s32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    int64_t val = valop;
-    if (shift >= 64) {
-        val = 0;
-    } else if (shift <= -64) {
-        val >>= 63;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
         break;
     case 0x8: /* SSHL, USHL */
         if (u) {
-            gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm);
+            gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm);
         } else {
-            gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm);
+            gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm);
         }
         break;
     case 0x9: /* SQSHL, UQSHL */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                          is_q ? 16 : 8, vec_full_reg_size(s),
                          (u ? uqsub_op : sqsub_op) + size);
         return;
+    case 0x08: /* SSHL, USHL */
+        gen_gvec_op3(s, is_q, rd, rn, rm,
+                     u ? &ushl_op[size] : &sshl_op[size]);
+        return;
     case 0x0c: /* SMAX, UMAX */
         if (u) {
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             genfn = fns[size][u];
             break;
         }
-    case 0x8: /* SSHL, USHL */
-    {
-        static NeonGenTwoOpFn * const fns[3][2] = {
-            { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
-            { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
-            { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
-        };
-        genfn = fns[size][u];
-        break;
-    }
     case 0x9: /* SQSHL, UQSHL */
     {
         static NeonGenTwoOpEnvFn * const fns[3][2] = {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_shift_narrow(int size, TCGv_i32 var, TCGv_i32 shift,
         if (u) {
             switch (size) {
             case 1: gen_helper_neon_shl_u16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_u32(var, var, shift); break;
+            case 2: gen_ushl_i32(var, var, shift); break;
             default: abort();
             }
         } else {
             switch (size) {
             case 1: gen_helper_neon_shl_s16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_s32(var, var, shift); break;
+            case 2: gen_sshl_i32(var, var, shift); break;
             default: abort();
             }
         }
@@ -XXX,XX +XXX,XX @@ const GVecGen3 cmtst_op[4] = {
       .vece = MO_64 },
 };
 
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(32);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, b);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, a, lsh);
+    tcg_gen_shr_i32(rval, a, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LTU, d, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LTU, d, rsh, max, rval, d);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(64);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, b);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, a, lsh);
+    tcg_gen_shr_i64(rval, a, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LTU, d, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LTU, d, rsh, max, rval, d);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_ushl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(d);
+    TCGv_vec rval = tcg_temp_new_vec_matching(d);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec msk, max;
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_neg_vec(vece, rsh, b);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, b);
+    } else {
+        msk = tcg_temp_new_vec_matching(d);
+        tcg_gen_dupi_vec(vece, msk, 0xff);
+        tcg_gen_and_vec(vece, lsh, b, msk);
+        tcg_gen_and_vec(vece, rsh, rsh, msk);
+        tcg_temp_free_vec(msk);
+    }
+
+    /*
+     * Perform possibly out of range shifts, trusting that the operation
+     * does not trap. Discard unused results after the fact.
+     */
+    tcg_gen_shlv_vec(vece, lval, a, lsh);
+    tcg_gen_shrv_vec(vece, rval, a, rsh);
+
+    max = tcg_temp_new_vec_matching(d);
+    tcg_gen_dupi_vec(vece, max, 8 << vece);
+
+    /*
+     * The choice of LT (signed) and GEU (unsigned) are biased toward
+     * the instructions of the x86_64 host. For MO_8, the whole byte
+     * is significant so we must use an unsigned compare; otherwise we
+     * have already masked to a byte and so a signed compare works.
+     * Other tcg hosts have a full set of comparisons and do not care.
+     */
+    if (vece == MO_8) {
+        tcg_gen_cmp_vec(TCG_COND_GEU, vece, lsh, lsh, max);
+        tcg_gen_cmp_vec(TCG_COND_GEU, vece, rsh, rsh, max);
+        tcg_gen_andc_vec(vece, lval, lval, lsh);
+        tcg_gen_andc_vec(vece, rval, rval, rsh);
+    } else {
+        tcg_gen_cmp_vec(TCG_COND_LT, vece, lsh, lsh, max);
+        tcg_gen_cmp_vec(TCG_COND_LT, vece, rsh, rsh, max);
+        tcg_gen_and_vec(vece, lval, lval, lsh);
+        tcg_gen_and_vec(vece, rval, rval, rsh);
+    }
+    tcg_gen_or_vec(vece, d, lval, rval);
+
+    tcg_temp_free_vec(max);
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+}
+
+static const TCGOpcode ushl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_shlv_vec,
+    INDEX_op_shrv_vec, INDEX_op_cmp_vec, 0
+};
+
+const GVecGen3 ushl_op[4] = {
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_b,
+      .opt_opc = ushl_list,
+      .vece = MO_8 },
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_h,
+      .opt_opc = ushl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_ushl_i32,
+      .fniv = gen_ushl_vec,
+      .opt_opc = ushl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_ushl_i64,
+      .fniv = gen_ushl_vec,
+      .opt_opc = ushl_list,
+      .vece = MO_64 },
+};
+
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(31);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, b);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, a, lsh);
+    tcg_gen_umin_i32(rsh, rsh, max);
+    tcg_gen_sar_i32(rval, a, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LT, d, lsh, zero, rval, lval);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(63);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, b);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, a, lsh);
+    tcg_gen_umin_i64(rsh, rsh, max);
+    tcg_gen_sar_i64(rval, a, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LT, d, lsh, zero, rval, lval);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_sshl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(d);
+    TCGv_vec rval = tcg_temp_new_vec_matching(d);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec tmp = tcg_temp_new_vec_matching(d);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_neg_vec(vece, rsh, b);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, b);
+    } else {
+        tcg_gen_dupi_vec(vece, tmp, 0xff);
+        tcg_gen_and_vec(vece, lsh, b, tmp);
+        tcg_gen_and_vec(vece, rsh, rsh, tmp);
+    }
+
+    /* Bound rsh so out of bound right shift gets -1. */
+    tcg_gen_dupi_vec(vece, tmp, (8 << vece) - 1);
+    tcg_gen_umin_vec(vece, rsh, rsh, tmp);
+    tcg_gen_cmp_vec(TCG_COND_GT, vece, tmp, lsh, tmp);
+
+    tcg_gen_shlv_vec(vece, lval, a, lsh);
+    tcg_gen_sarv_vec(vece, rval, a, rsh);
+
+    /* Select in-bound left shift. */
+    tcg_gen_andc_vec(vece, lval, lval, tmp);
+
+    /* Select between left and right shift. */
+    if (vece == MO_8) {
+        tcg_gen_dupi_vec(vece, tmp, 0);
+        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, d, lsh, tmp, rval, lval);
+    } else {
+        tcg_gen_dupi_vec(vece, tmp, 0x80);
+        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, d, lsh, tmp, lval, rval);
+    }
+
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+    tcg_temp_free_vec(tmp);
+}
+
+static const TCGOpcode sshl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_umin_vec, INDEX_op_shlv_vec,
+    INDEX_op_sarv_vec, INDEX_op_cmp_vec, INDEX_op_cmpsel_vec, 0
+};
+
+const GVecGen3 sshl_op[4] = {
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_b,
+      .opt_opc = sshl_list,
+      .vece = MO_8 },
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_h,
+      .opt_opc = sshl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_sshl_i32,
+      .fniv = gen_sshl_vec,
+      .opt_opc = sshl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_sshl_i64,
+      .fniv = gen_sshl_vec,
+      .opt_opc = sshl_list,
+      .vece = MO_64 },
+};
+
 static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
                           TCGv_vec a, TCGv_vec b)
 {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              vec_size, vec_size);
         }
         return 0;
+
+    case NEON_3R_VSHL:
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
+                       u ? &ushl_op[size] : &sshl_op[size]);
+        return 0;
     }
 
     if (size == 3) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             neon_load_reg64(cpu_V0, rn + pass);
             neon_load_reg64(cpu_V1, rm + pass);
             switch (op) {
-            case NEON_3R_VSHL:
-                if (u) {
-                    gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0);
-                } else {
-                    gen_helper_neon_shl_s64(cpu_V0, cpu_V1, cpu_V0);
-                }
-                break;
             case NEON_3R_VQSHL:
                 if (u) {
                     gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     }
     pairwise = 0;
     switch (op) {
-    case NEON_3R_VSHL:
     case NEON_3R_VQSHL:
     case NEON_3R_VRSHL:
     case NEON_3R_VQRSHL:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VHSUB:
             GEN_NEON_INTEGER_OP(hsub);
             break;
-        case NEON_3R_VSHL:
-            GEN_NEON_INTEGER_OP(shl);
-            break;
         case NEON_3R_VQSHL:
             GEN_NEON_INTEGER_OP_ENV(qshl);
             break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
         } else {
             if (input_unsigned) {
-                gen_helper_neon_shl_u64(cpu_V0, in, tmp64);
+                gen_ushl_i64(cpu_V0, in, tmp64);
             } else {
-                gen_helper_neon_shl_s64(cpu_V0, in, tmp64);
+                gen_sshl_i64(cpu_V0, in, tmp64);
             }
         }
         tmp = tcg_temp_new_i32();
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
 }
+
+void HELPER(gvec_sshl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        int8_t nn = n[i];
+        int8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -8 ? -mm : 7);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sshl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        int16_t nn = n[i];
+        int16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -16 ? -mm : 15);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        uint8_t nn = n[i];
+        uint8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -8) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        uint16_t nn = n[i];
+        uint16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -16) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
--
2.20.1

From: Luke Starrett <lukes@xsightlabs.com>

The ARM GICv3 TRM describes that the ITLinesNumber field of the
GICD_TYPER register:

  "indicates the maximum SPI INTID that the GIC implementation supports"

As SPI #0 is absolute IRQ #32, the max SPI INTID should have accounted
for the internal 16x SGIs and 16x PPIs. However, the original GICv3
model subtracted off the SGI/PPI. Cosmetically this can be seen at OS
boot (Linux) showing 32 shy of what should be there, i.e.:

  [    0.000000] GICv3: 224 SPIs implemented

Though in hw/arm/virt.c, the machine is configured for 256 SPIs. The
ARM virt machine likely doesn't have a problem with this because the
upper 32 IRQs don't actually have anything meaningful wired. But this
does become a functional issue on a custom use case which wants to make
use of these IRQs. Additionally, boot code (i.e. TF-A) will only init
up to the number (blocks of 32) that it believes to actually be there.

Signed-off-by: Luke Starrett <lukes@xsightlabs.com>
Message-id: AM9P193MB168473D99B761E204E032095D40D9@AM9P193MB1684.EURP193.PROD.OUTLOOK.COM
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/intc/arm_gicv3_dist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -XXX,XX +XXX,XX @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
          * MBIS == 0 (message-based SPIs not supported)
          * SecurityExtn == 1 if security extns supported
          * CPUNumber == 0 since for us ARE is always 1
-         * ITLinesNumber == (num external irqs / 32) - 1
+         * ITLinesNumber == (((max SPI IntID + 1) / 32) - 1)
          */
-        int itlinesnumber = ((s->num_irq - GIC_INTERNAL) / 32) - 1;
+        int itlinesnumber = (s->num_irq / 32) - 1;
         /*
          * SecurityExtn must be RAZ if GICD_CTLR.DS == 1, and
          * "security extensions not supported" always implies DS == 1,
--
2.25.1
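
To see the off-by-32 concretely, here is a small standalone C sketch of the
old and new ITLinesNumber computations for the virt machine configuration
described in the GICD_TYPER fix above (illustrative only; the Linux SPI-count
formula is paraphrased from its GICv3 driver):

#include <stdio.h>

#define GIC_INTERNAL 32   /* 16 SGIs + 16 PPIs */

int main(void)
{
    /* virt in this scenario: 256 SPIs plus the 32 internal interrupts */
    int num_irq = 256 + GIC_INTERNAL;

    int old_val = ((num_irq - GIC_INTERNAL) / 32) - 1;  /* 7 */
    int new_val = (num_irq / 32) - 1;                   /* 8 */

    /* Linux derives the SPI count as 32 * (ITLinesNumber + 1) - 32,
     * so the old value reports 224 SPIs and the new one 256. */
    printf("old: ITLinesNumber=%d -> %d SPIs\n",
           old_val, 32 * (old_val + 1) - 32);
    printf("new: ITLinesNumber=%d -> %d SPIs\n",
           new_val, 32 * (new_val + 1) - 32);
    return 0;
}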
Convert the VSEL instructions to decodetree.
We leave trans_VSEL() in translate.c for now as this allows
the patch to show just the changes from the old handle_vsel().

In the old code the check for "do D16-D31 exist" was hidden in
the VFP_DREG macro, and assumed that VFPv3 always implied that
D16-D31 exist. In the new code we do the correct ID register test.
This gives identical behaviour for most of our CPUs, and fixes
previously incorrect handling for Cortex-R5F, Cortex-M4 and
Cortex-M33, which all implement VFPv3 or better with only 16
double-precision registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               |  6 ++++++
 target/arm/translate-vfp.inc.c |  9 +++++++++
 target/arm/translate.c         | 35 ++++++++++++++++++++++++----------
 target/arm/vfp-uncond.decode   | 19 ++++++++++++++++++
 4 files changed, 59 insertions(+), 10 deletions(-)
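
For reference, a standalone C sketch of the split register-number encoding
that the %vm_dp/%vm_sp field definitions in the decode file below describe
(the helper names here are invented for illustration; this is not the QEMU
implementation):

#include <stdio.h>

static int vfp_dreg(unsigned four, unsigned one)  /* %vm_dp 5:1 0:4 */
{
    return (one << 4) | four;   /* D0..D31: the single bit is the MSB */
}

static int vfp_sreg(unsigned four, unsigned one)  /* %vm_sp 0:4 5:1 */
{
    return (four << 1) | one;   /* S0..S31: the single bit is the LSB */
}

int main(void)
{
    /* insn bits [3:0] = 0x9, insn bit [5] = 1 */
    int d = vfp_dreg(0x9, 1);
    int s = vfp_sreg(0x9, 1);

    printf("D%d S%d\n", d, s);   /* D25 S19 */
    /* The "UNDEF if D16-D31 are absent" test is then simply: */
    printf("needs D16-D31: %s\n", (d & 0x10) ? "yes" : "no");
    return 0;
}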
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
 }
 
+static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
+{
+    /* Return true if D16-D31 are implemented */
+    return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
+}
+
 /*
  * We always set the FP and SIMD FP16 fields to indicate identical
  * levels of support (assuming SIMD is implemented at all), so
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
 
     return true;
 }
+
+/*
+ * The most usual kind of VFP access check, for everything except
+ * FMXR/FMRX to the always-available special registers.
+ */
+static bool vfp_access_check(DisasContext *s)
+{
+    return full_vfp_access_check(s, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
     tcg_temp_free_i32(tmp);
 }
 
-static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
-                       uint32_t dp)
+static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
 {
-    uint32_t cc = extract32(insn, 20, 2);
+    uint32_t rd, rn, rm;
+    bool dp = a->dp;
+
+    if (!dc_isar_feature(aa32_vsel, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+        ((a->vm | a->vn | a->vd) & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rn = a->vn;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
 
     if (dp) {
         TCGv_i64 frn, frm, dest;
@@ -XXX,XX +XXX,XX @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
 
         tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
         tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
-        switch (cc) {
+        switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
                                 frn, frm);
@@ -XXX,XX +XXX,XX @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
         dest = tcg_temp_new_i32();
         tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
         tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
-        switch (cc) {
+        switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
                                 frn, frm);
@@ -XXX,XX +XXX,XX @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
         tcg_temp_free_i32(zero);
     }
 
-    return 0;
+    return true;
 }
 
 static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
         rm = VFP_SREG_M(insn);
     }
 
-    if ((insn & 0x0f800e50) == 0x0e000a00 && dc_isar_feature(aa32_vsel, s)) {
-        return handle_vsel(insn, rd, rn, rm, dp);
-    } else if ((insn & 0x0fb00e10) == 0x0e800a00 &&
-               dc_isar_feature(aa32_vminmaxnm, s)) {
+    if ((insn & 0x0fb00e10) == 0x0e800a00 &&
+        dc_isar_feature(aa32_vminmaxnm, s)) {
         return handle_vminmaxnm(insn, rd, rn, rm, dp);
     } else if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
                dc_isar_feature(aa32_vrint, s)) {
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@
 # 1111 1110 .... .... .... 101. .... ....
 # (but those patterns might also cover some Neon instructions,
 # which do not live in this file.)
+
+# VFP registers have an odd encoding with a four-bit field
+# and a one-bit field which are assembled in different orders
+# depending on whether the register is double or single precision.
+# Each individual instruction function must do the checks for
+# "double register selected but CPU does not have double support"
+# and "double register number has bit 4 set but CPU does not
+# support D16-D31" (which should UNDEF).
+%vm_dp  5:1 0:4
+%vm_sp  0:4 5:1
+%vn_dp  7:1 16:4
+%vn_sp  16:4 7:1
+%vd_dp  22:1 12:4
+%vd_sp  12:4 22:1
+
+VSEL    1111 1110 0. cc:2 .... .... 1010 .0.0 ....    \
+        vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
+VSEL    1111 1110 0. cc:2 .... .... 1011 .0.0 ....    \
+        vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
--
2.20.1

FEAT_EVT adds five new bits to the HCR_EL2 register: TTLBIS, TTLBOS,
TICAB, TOCU and TID4. These allow the guest to enable trapping of
various EL1 instructions to EL2. In this commit, add the necessary
code to allow the guest to set these bits if the feature is present;
because the bit is always zero when the feature isn't present we
won't need to use explicit feature checks in the "trap on condition"
tests in the following commits.

Note that although full implementation of the feature (mandatory from
Armv8.5 onward) requires all five trap bits, the ID registers permit
a value indicating that only TICAB, TOCU and TID4 are implemented,
which might be the case for CPUs between Armv8.2 and Armv8.5.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h    | 30 ++++++++++++++++++++++++++++++
 target/arm/helper.c |  6 ++++++
 2 files changed, 36 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_tts2uxn(const ARMISARegisters *id)
     return FIELD_EX32(id->id_mmfr4, ID_MMFR4, XNX) != 0;
 }
 
+static inline bool isar_feature_aa32_half_evt(const ARMISARegisters *id)
+{
+    return FIELD_EX32(id->id_mmfr4, ID_MMFR4, EVT) >= 1;
+}
+
+static inline bool isar_feature_aa32_evt(const ARMISARegisters *id)
+{
+    return FIELD_EX32(id->id_mmfr4, ID_MMFR4, EVT) >= 2;
+}
+
 static inline bool isar_feature_aa32_dit(const ARMISARegisters *id)
 {
     return FIELD_EX32(id->id_pfr0, ID_PFR0, DIT) != 0;
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ids(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, IDS) != 0;
 }
 
+static inline bool isar_feature_aa64_half_evt(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, EVT) >= 1;
+}
+
+static inline bool isar_feature_aa64_evt(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, EVT) >= 2;
+}
+
 static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_ras(const ARMISARegisters *id)
     return isar_feature_aa64_ras(id) || isar_feature_aa32_ras(id);
 }
 
+static inline bool isar_feature_any_half_evt(const ARMISARegisters *id)
+{
+    return isar_feature_aa64_half_evt(id) || isar_feature_aa32_half_evt(id);
+}
+
+static inline bool isar_feature_any_evt(const ARMISARegisters *id)
+{
+    return isar_feature_aa64_evt(id) || isar_feature_aa32_evt(id);
+}
+
 /*
  * Forward to the above feature tests given an ARMCPU pointer.
  */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void do_hcr_write(CPUARMState *env, uint64_t value, uint64_t valid_mask)
         }
     }
 
+    if (cpu_isar_feature(any_evt, cpu)) {
+        valid_mask |= HCR_TTLBIS | HCR_TTLBOS | HCR_TICAB | HCR_TOCU | HCR_TID4;
+    } else if (cpu_isar_feature(any_half_evt, cpu)) {
+        valid_mask |= HCR_TICAB | HCR_TOCU | HCR_TID4;
+    }
+
     /* Clear RES0 bits. */
     value &= valid_mask;
 
--
2.25.1
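
As a standalone illustration of the valid_mask gating added in helper.c
above (a toy model, not QEMU code; only the FEAT_EVT bits are modelled,
with their architectural HCR_EL2 bit positions):

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define HCR_TID4   (1ULL << 49)
#define HCR_TICAB  (1ULL << 50)
#define HCR_TOCU   (1ULL << 52)
#define HCR_TTLBIS (1ULL << 54)
#define HCR_TTLBOS (1ULL << 55)

/* Bits outside valid_mask are RES0 and never stick. */
static uint64_t hcr_write(uint64_t value, bool evt, bool half_evt)
{
    uint64_t valid_mask = 0;   /* all other architectural bits elided */

    if (evt) {
        valid_mask |= HCR_TTLBIS | HCR_TTLBOS | HCR_TICAB | HCR_TOCU |
                      HCR_TID4;
    } else if (half_evt) {
        valid_mask |= HCR_TICAB | HCR_TOCU | HCR_TID4;
    }
    return value & valid_mask;
}

int main(void)
{
    uint64_t v = HCR_TTLBIS | HCR_TICAB;

    /* EVT == 2: both bits stick; EVT == 1: only TICAB survives;
     * EVT == 0: everything reads back as zero, which is why the later
     * "trap on condition" tests need no explicit feature check. */
    printf("%016llx %016llx %016llx\n",
           (unsigned long long)hcr_write(v, true, false),
           (unsigned long long)hcr_write(v, false, true),
           (unsigned long long)hcr_write(v, false, false));
    return 0;
}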
Convert the VCVT (between floating-point and fixed-point) instructions
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 124 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  57 +--------
 target/arm/vfp.decode          |  10 +++
 3 files changed, 136 insertions(+), 55 deletions(-)
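
For reference, a small standalone C sketch of the fraction-bits arithmetic
used by the trans_VCVT_fix functions below (illustrative only; the encoded
immediate has the same format as an Sreg number, and the sx bit selects a
16-bit or 32-bit fixed-point format):

#include <stdint.h>
#include <stdio.h>
#include <math.h>

static int frac_bits(int opc, int imm)
{
    /* low bit of the assembled opc field is the sx bit */
    return (opc & 1) ? (32 - imm) : (16 - imm);
}

int main(void)
{
    /* e.g. a 32-bit signed fixed-point input with imm 20 */
    int fb = frac_bits(1, 20);           /* 12 fraction bits */
    int32_t fixed = 0x12345;

    /* shtos/sltos-style conversion: value = fixed / 2^frac_bits */
    double val = ldexp((double)fixed, -fb);
    printf("frac_bits=%d value=%f\n", fb, val);
    return 0;
}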
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     tcg_temp_free_i32(vd);
     return true;
 }
+
+static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
+{
+    TCGv_i32 vd, shift;
+    TCGv_ptr fpst;
+    int frac_bits;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
+
+    vd = tcg_temp_new_i32();
+    neon_load_reg32(vd, a->vd);
+
+    fpst = get_fpstatus_ptr(false);
+    shift = tcg_const_i32(frac_bits);
+
+    /* Switch on op:U:sx bits */
+    switch (a->opc) {
+    case 0:
+        gen_helper_vfp_shtos(vd, vd, shift, fpst);
+        break;
+    case 1:
+        gen_helper_vfp_sltos(vd, vd, shift, fpst);
+        break;
+    case 2:
+        gen_helper_vfp_uhtos(vd, vd, shift, fpst);
+        break;
+    case 3:
+        gen_helper_vfp_ultos(vd, vd, shift, fpst);
+        break;
+    case 4:
+        gen_helper_vfp_toshs_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 5:
+        gen_helper_vfp_tosls_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 6:
+        gen_helper_vfp_touhs_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 7:
+        gen_helper_vfp_touls_round_to_zero(vd, vd, shift, fpst);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    neon_store_reg32(vd, a->vd);
+    tcg_temp_free_i32(vd);
+    tcg_temp_free_i32(shift);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
+static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
+{
+    TCGv_i64 vd;
+    TCGv_i32 shift;
+    TCGv_ptr fpst;
+    int frac_bits;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
+
+    vd = tcg_temp_new_i64();
+    neon_load_reg64(vd, a->vd);
+
+    fpst = get_fpstatus_ptr(false);
+    shift = tcg_const_i32(frac_bits);
+
+    /* Switch on op:U:sx bits */
+    switch (a->opc) {
+    case 0:
+        gen_helper_vfp_shtod(vd, vd, shift, fpst);
+        break;
+    case 1:
+        gen_helper_vfp_sltod(vd, vd, shift, fpst);
+        break;
+    case 2:
+        gen_helper_vfp_uhtod(vd, vd, shift, fpst);
+        break;
+    case 3:
+        gen_helper_vfp_ultod(vd, vd, shift, fpst);
+        break;
+    case 4:
+        gen_helper_vfp_toshd_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 5:
+        gen_helper_vfp_tosld_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 6:
+        gen_helper_vfp_touhd_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 7:
+        gen_helper_vfp_tould_round_to_zero(vd, vd, shift, fpst);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    neon_store_reg64(vd, a->vd);
+    tcg_temp_free_i64(vd);
+    tcg_temp_free_i32(shift);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp, int shift, int neon) \
     tcg_temp_free_i32(tmp_shift); \
     tcg_temp_free_ptr(statusptr); \
 }
-VFP_GEN_FIX(tosh, _round_to_zero)
 VFP_GEN_FIX(tosl, _round_to_zero)
-VFP_GEN_FIX(touh, _round_to_zero)
 VFP_GEN_FIX(toul, _round_to_zero)
-VFP_GEN_FIX(shto, )
 VFP_GEN_FIX(slto, )
-VFP_GEN_FIX(uhto, )
 VFP_GEN_FIX(ulto, )
 #undef VFP_GEN_FIX
 
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             return 1;
         case 15:
             switch (rn) {
-            case 0 ... 19:
+            case 0 ... 23:
+            case 28 ... 31:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 rd_is_dp = false;
                 break;
 
-            case 0x14: /* vcvt fp <-> fixed */
-            case 0x15:
-            case 0x16:
-            case 0x17:
-            case 0x1c:
-            case 0x1d:
-            case 0x1e:
-            case 0x1f:
-                if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                    return 1;
-                }
-                /* Immediate frac_bits has same format as SREG_M. */
-                rm_is_dp = false;
-                break;
-
             default:
                 return 1;
             }
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         /* Load the initial operands. */
         if (op == 15) {
             switch (rn) {
-            case 0x14: /* vcvt fp <-> fixed */
-            case 0x15:
-            case 0x16:
-            case 0x17:
-            case 0x1c:
-            case 0x1d:
-            case 0x1e:
-            case 0x1f:
-                /* Source and destination the same. */
-                gen_mov_F0_vreg(dp, rd);
-                break;
             default:
                 /* One source operand. */
                 gen_mov_F0_vreg(rm_is_dp, rm);
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         switch (op) {
         case 15: /* extension space */
             switch (rn) {
-            case 20: /* fshto */
-                gen_vfp_shto(dp, 16 - rm, 0);
-                break;
-            case 21: /* fslto */
-                gen_vfp_slto(dp, 32 - rm, 0);
-                break;
-            case 22: /* fuhto */
-                gen_vfp_uhto(dp, 16 - rm, 0);
-                break;
-            case 23: /* fulto */
-                gen_vfp_ulto(dp, 32 - rm, 0);
-                break;
             case 24: /* ftoui */
                 gen_vfp_toui(dp, 0);
                 break;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             case 27: /* ftosiz */
                 gen_vfp_tosiz(dp, 0);
                 break;
-            case 28: /* ftosh */
-                gen_vfp_tosh(dp, 16 - rm, 0);
-                break;
-            case 29: /* ftosl */
-                gen_vfp_tosl(dp, 32 - rm, 0);
-                break;
-            case 30: /* ftouh */
-                gen_vfp_touh(dp, 16 - rm, 0);
-                break;
-            case 31: /* ftoul */
-                gen_vfp_toul(dp, 32 - rm, 0);
-                break;
             default: /* undefined */
                 g_assert_not_reached();
             }
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_int_dp ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
 # VJCVT is always dp to sp
 VJCVT        ---- 1110 1.11 1001 .... 1011 11.0 .... \
              vd=%vd_sp vm=%vm_dp
+
+# VCVT between floating-point and fixed-point. The immediate value
+# is in the same format as a Vm single-precision register number.
+# We assemble bits 18 (op), 16 (u) and 7 (sx) into a single opc field
+# for the convenience of the trans_VCVT_fix functions.
+%vcvt_fix_op 18:1 16:1 7:1
+VCVT_fix_sp  ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
+             vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
+VCVT_fix_dp  ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
+             vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
--
2.20.1

For FEAT_EVT, the HCR_EL2.TTLBIS bit allows trapping on EL1 use of
TLB maintenance instructions that operate on the inner shareable
domain:

AArch64:
 TLBI VMALLE1IS, TLBI VAE1IS, TLBI ASIDE1IS, TLBI VAAE1IS,
 TLBI VALE1IS, TLBI VAALE1IS, TLBI RVAE1IS, TLBI RVAAE1IS,
 TLBI RVALE1IS, and TLBI RVAALE1IS.

AArch32:
 TLBIALLIS, TLBIMVAIS, TLBIASIDIS, TLBIMVAAIS, TLBIMVALIS,
 and TLBIMVAALIS.

Add the trapping support.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 43 +++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri,
     return CP_ACCESS_OK;
 }
 
+/* Check for traps from EL1 due to HCR_EL2.TTLB or TTLBIS. */
+static CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri,
+                                    bool isread)
+{
+    if (arm_current_el(env) == 1 &&
+        (arm_hcr_el2_eff(env) & (HCR_TTLB | HCR_TTLBIS))) {
+        return CP_ACCESS_TRAP_EL2;
+    }
+    return CP_ACCESS_OK;
+}
+
 static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 {
     ARMCPU *cpu = env_archcpu(env);
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
 static const ARMCPRegInfo v7mp_cp_reginfo[] = {
     /* 32 bit TLB invalidates, Inner Shareable */
     { .name = "TLBIALLIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis,
       .writefn = tlbiall_is_write },
     { .name = "TLBIMVAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis,
       .writefn = tlbimva_is_write },
     { .name = "TLBIASIDIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis,
       .writefn = tlbiasid_is_write },
     { .name = "TLBIMVAAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis,
       .writefn = tlbimvaa_is_write },
 };
 
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     /* TLBI operations */
     { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_ASIDE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VAALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VMALLE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
 #endif
     /* TLB invalidate last level of translation table walk */
     { .name = "TLBIMVALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis,
       .writefn = tlbimva_is_write },
     { .name = "TLBIMVAALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis,
       .writefn = tlbimvaa_is_write },
     { .name = "TLBIMVAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5,
       .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pauth_reginfo[] = {
 static const ARMCPRegInfo tlbirange_reginfo[] = {
     { .name = "TLBI_RVAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 1,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 3,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 5,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 7,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1,
--
2.25.1
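
The trap decision itself is a one-liner; as a standalone sketch (illustrative
flag values, the real HCR_TTLB/HCR_TTLBIS definitions live in
target/arm/cpu.h):

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define HCR_TTLB   (1ULL << 25)
#define HCR_TTLBIS (1ULL << 54)

/* Shape of the access_ttlbis() check above: an inner-shareable TLBI
 * issued at EL1 traps to EL2 if either the blanket TTLB bit or the
 * new IS-specific bit is set. */
static bool tlbi_is_traps(int current_el, uint64_t hcr)
{
    return current_el == 1 && (hcr & (HCR_TTLB | HCR_TTLBIS));
}

int main(void)
{
    printf("%d\n", tlbi_is_traps(1, HCR_TTLBIS));  /* 1: trap */
    printf("%d\n", tlbi_is_traps(1, HCR_TTLB));    /* 1: trap */
    printf("%d\n", tlbi_is_traps(2, HCR_TTLBIS));  /* 0: EL2 itself */
    return 0;
}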
Convert the VJCVT instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 28 ++++++++++++++++++++++++++++
 target/arm/translate.c         | 12 +-----------
 target/arm/vfp.decode          |  4 ++++
 3 files changed, 33 insertions(+), 11 deletions(-)
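
For reference, a rough standalone C model of what the gen_helper_vjcvt call
below computes (an approximation for illustration, not the QEMU helper):
convert a double to a signed 32-bit integer, rounding toward zero, with
out-of-range values wrapping modulo 2^32 rather than saturating, which is
the JavaScript ToInt32() behaviour; NaN and infinity become 0. The real
helper additionally sets the Z flag to report an exact in-range conversion,
which is omitted here.

#include <stdint.h>
#include <stdio.h>
#include <math.h>

static int32_t vjcvt(double d)
{
    if (isnan(d) || isinf(d)) {
        return 0;
    }
    double t = trunc(d);
    double wrapped = fmod(t, 4294967296.0);   /* wrap modulo 2^32 */
    if (wrapped >= 2147483648.0) {
        wrapped -= 4294967296.0;
    } else if (wrapped < -2147483648.0) {
        wrapped += 4294967296.0;
    }
    return (int32_t)wrapped;
}

int main(void)
{
    printf("%d\n", vjcvt(3.99));          /* 3 */
    printf("%d\n", vjcvt(-3.99));         /* -3 */
    printf("%d\n", vjcvt(4294967298.0));  /* 2: wraps, not saturates */
    return 0;
}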
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
+{
+    TCGv_i32 vd;
+    TCGv_i64 vm;
+
+    if (!dc_isar_feature(aa32_jscvt, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vm = tcg_temp_new_i64();
+    vd = tcg_temp_new_i32();
+    neon_load_reg64(vm, a->vm);
+    gen_helper_vjcvt(vd, vm, cpu_env);
+    neon_store_reg32(vd, a->vd);
+    tcg_temp_free_i64(vm);
+    tcg_temp_free_i32(vd);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             return 1;
         case 15:
             switch (rn) {
-            case 0 ... 17:
+            case 0 ... 19:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 rm_is_dp = false;
                 break;
 
-            case 0x13: /* vjcvt */
-                if (!dp || !dc_isar_feature(aa32_jscvt, s)) {
-                    return 1;
-                }
-                rd_is_dp = false;
-                break;
-
             default:
                 return 1;
             }
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         switch (op) {
         case 15: /* extension space */
             switch (rn) {
-            case 19: /* vjcvt */
-                gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
-                break;
             case 20: /* fshto */
                 gen_vfp_shto(dp, 16 - rm, 0);
                 break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_int_sp ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
     vd=%vd_sp vm=%vm_sp
 VCVT_int_dp ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
     vd=%vd_dp vm=%vm_sp
+
+# VJCVT is always dp to sp
+VJCVT        ---- 1110 1.11 1001 .... 1011 11.0 .... \
+             vd=%vd_sp vm=%vm_dp
--
2.20.1

For FEAT_EVT, the HCR_EL2.TTLBOS bit allows trapping on EL1
use of TLB maintenance instructions that operate on the
outer shareable domain:

TLBI VMALLE1OS, TLBI VAE1OS, TLBI ASIDE1OS, TLBI VAAE1OS,
TLBI VALE1OS, TLBI VAALE1OS, TLBI RVAE1OS, TLBI RVAAE1OS,
TLBI RVALE1OS, and TLBI RVAALE1OS.

(There are no AArch32 outer-shareable TLB maintenance ops.)

Implement the trapping.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri,
     return CP_ACCESS_OK;
 }
 
+#ifdef TARGET_AARCH64
+/* Check for traps from EL1 due to HCR_EL2.TTLB or TTLBOS. */
+static CPAccessResult access_ttlbos(CPUARMState *env, const ARMCPRegInfo *ri,
+                                    bool isread)
+{
+    if (arm_current_el(env) == 1 &&
+        (arm_hcr_el2_eff(env) & (HCR_TTLB | HCR_TTLBOS))) {
+        return CP_ACCESS_TRAP_EL2;
+    }
+    return CP_ACCESS_OK;
+}
+#endif
+
 static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 {
     ARMCPU *cpu = env_archcpu(env);
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbirange_reginfo[] = {
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAAE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 3,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVALE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 5,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAALE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 7,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbirange_reginfo[] = {
 static const ARMCPRegInfo tlbios_reginfo[] = {
     { .name = "TLBI_VMALLE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 0,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 1,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_ASIDE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 2,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAAE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 3,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VALE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 5,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VAALE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 7,
-      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_ALLE2OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 0,
--
2.25.1
diff view generated by jsdifflib
1
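A note on the vfp.decode syntax used in the patch above: field definitions
like %vd_sp and %vm_dp assemble a register index from the bits the pattern
leaves unnamed. The sketch below is illustrative standalone C, not QEMU
code; it assumes the standard VFP encoding layout (D at bit 22, Vd at bits
[15:12], M at bit 5, Vm at bits [3:0]).

    #include <stdint.h>
    #include <stdio.h>

    /* %vd_sp 12:4 22:1 -- single-precision index is Vd:D */
    static int vd_sp(uint32_t insn)
    {
        return (((insn >> 12) & 0xf) << 1) | ((insn >> 22) & 1);
    }

    /* %vm_dp 5:1 0:4 -- double-precision index is M:Vm */
    static int vm_dp(uint32_t insn)
    {
        return (((insn >> 5) & 1) << 4) | (insn & 0xf);
    }

    int main(void)
    {
        /* toy encoding with D=1, Vd=0b0010, M=0, Vm=0b0011 -> s5, d3 */
        uint32_t insn = (1u << 22) | (0x2u << 12) | 0x3u;
        printf("vd=s%d vm=d%d\n", vd_sp(insn), vm_dp(insn));
        return 0;
    }

This is why the VJCVT pattern only needs vd=%vd_sp vm=%vm_dp: the generated
decoder fills in the composed register numbers before calling trans_VJCVT().
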
Convert the VFP round-to-integer instructions VRINTR, VRINTZ and
VRINTX to decodetree.

These instructions were only introduced as part of the "VFP misc"
additions in v8A, so we check for that feature. The old decoder's
implementation was incorrectly providing them even for v7A CPUs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 163 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  45 +--------
 target/arm/vfp.decode          |  15 +++
 3 files changed, 179 insertions(+), 44 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     tcg_temp_free_i32(tmp);
     return true;
 }
+
+static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    neon_load_reg32(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    gen_helper_rints(tmp, tmp, fpst);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i64 tmp;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i64();
+    neon_load_reg64(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    gen_helper_rintd(tmp, tmp, fpst);
+    neon_store_reg64(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i64(tmp);
+    return true;
+}
+
+static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+    TCGv_i32 tcg_rmode;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    neon_load_reg32(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    tcg_rmode = tcg_const_i32(float_round_to_zero);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    gen_helper_rints(tmp, tmp, fpst);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_rmode);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i64 tmp;
+    TCGv_i32 tcg_rmode;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i64();
+    neon_load_reg64(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    tcg_rmode = tcg_const_i32(float_round_to_zero);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    gen_helper_rintd(tmp, tmp, fpst);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    neon_store_reg64(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i32(tcg_rmode);
+    return true;
+}
+
+static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    neon_load_reg32(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    gen_helper_rints_exact(tmp, tmp, fpst);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i64 tmp;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i64();
+    neon_load_reg64(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    gen_helper_rintd_exact(tmp, tmp, fpst);
+    neon_store_reg64(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i64(tmp);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             return 1;
         case 15:
             switch (rn) {
-            case 0 ... 11:
+            case 0 ... 14:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         if (op == 15) {
             /* rn is opcode, encoded as per VFP_SREG_N. */
             switch (rn) {
-            case 0x0c: /* vrintr */
-            case 0x0d: /* vrintz */
-            case 0x0e: /* vrintx */
-                break;
-
             case 0x0f: /* vcvt double<->single */
                 rd_is_dp = !dp;
                 break;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 switch (op) {
                 case 15: /* extension space */
                     switch (rn) {
-                    case 12: /* vrintr */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(0);
-                        if (dp) {
-                            gen_helper_rintd(cpu_F0d, cpu_F0d, fpst);
-                        } else {
-                            gen_helper_rints(cpu_F0s, cpu_F0s, fpst);
-                        }
-                        tcg_temp_free_ptr(fpst);
-                        break;
-                    }
-                    case 13: /* vrintz */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(0);
-                        TCGv_i32 tcg_rmode;
-                        tcg_rmode = tcg_const_i32(float_round_to_zero);
-                        gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-                        if (dp) {
-                            gen_helper_rintd(cpu_F0d, cpu_F0d, fpst);
-                        } else {
-                            gen_helper_rints(cpu_F0s, cpu_F0s, fpst);
-                        }
-                        gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-                        tcg_temp_free_i32(tcg_rmode);
-                        tcg_temp_free_ptr(fpst);
-                        break;
-                    }
-                    case 14: /* vrintx */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(0);
-                        if (dp) {
-                            gen_helper_rintd_exact(cpu_F0d, cpu_F0d, fpst);
-                        } else {
-                            gen_helper_rints_exact(cpu_F0s, cpu_F0s, fpst);
-                        }
-                        tcg_temp_free_ptr(fpst);
-                        break;
-                    }
                     case 15: /* single<->double conversion */
                         if (dp) {
                             gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
              vd=%vd_sp vm=%vm_dp
+
+VRINTR_sp    ---- 1110 1.11 0110 .... 1010 01.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VRINTR_dp    ---- 1110 1.11 0110 .... 1011 01.0 .... \
+             vd=%vd_dp vm=%vm_dp
+
+VRINTZ_sp    ---- 1110 1.11 0110 .... 1010 11.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VRINTZ_dp    ---- 1110 1.11 0110 .... 1011 11.0 .... \
+             vd=%vd_dp vm=%vm_dp
+
+VRINTX_sp    ---- 1110 1.11 0111 .... 1010 01.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VRINTX_dp    ---- 1110 1.11 0111 .... 1011 01.0 .... \
+             vd=%vd_dp vm=%vm_dp
--
2.20.1


For FEAT_EVT, the HCR_EL2.TICAB bit allows trapping of the ICIALLUIS
and IC IALLUIS cache maintenance instructions.

The HCR_EL2.TOCU bit traps all the other cache maintenance
instructions that operate to the point of unification:
  AArch64 IC IVAU, IC IALLU, DC CVAU
  AArch32 ICIMVAU, ICIALLU, DCCMVAU

The two trap bits between them cover all of the cache maintenance
instructions which must also check the HCR_TPU flag. Turn the old
aa64_cacheop_pou_access() function into a helper function which takes
the set of HCR_EL2 flags to check as an argument, and call it from
new access_ticab() and access_tocu() functions as appropriate for
each cache op.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
     return CP_ACCESS_OK;
 }

-static CPAccessResult aa64_cacheop_pou_access(CPUARMState *env,
-                                              const ARMCPRegInfo *ri,
-                                              bool isread)
+static CPAccessResult do_cacheop_pou_access(CPUARMState *env, uint64_t hcrflags)
 {
     /* Cache invalidate/clean to Point of Unification... */
     switch (arm_current_el(env)) {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult aa64_cacheop_pou_access(CPUARMState *env,
         }
         /* fall through */
     case 1:
-        /* ... EL1 must trap to EL2 if HCR_EL2.TPU is set.  */
-        if (arm_hcr_el2_eff(env) & HCR_TPU) {
+        /* ... EL1 must trap to EL2 if relevant HCR_EL2 flags are set.  */
+        if (arm_hcr_el2_eff(env) & hcrflags) {
            return CP_ACCESS_TRAP_EL2;
         }
         break;
@@ -XXX,XX +XXX,XX @@ static CPAccessResult aa64_cacheop_pou_access(CPUARMState *env,
     return CP_ACCESS_OK;
 }

+static CPAccessResult access_ticab(CPUARMState *env, const ARMCPRegInfo *ri,
+                                   bool isread)
+{
+    return do_cacheop_pou_access(env, HCR_TICAB | HCR_TPU);
+}
+
+static CPAccessResult access_tocu(CPUARMState *env, const ARMCPRegInfo *ri,
+                                  bool isread)
+{
+    return do_cacheop_pou_access(env, HCR_TOCU | HCR_TPU);
+}
+
 /* See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions
  * Page D4-1736 (DDI0487A.b)
  */
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "IC_IALLUIS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NOP,
-      .accessfn = aa64_cacheop_pou_access },
+      .accessfn = access_ticab },
     { .name = "IC_IALLU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NOP,
-      .accessfn = aa64_cacheop_pou_access },
+      .accessfn = access_tocu },
     { .name = "IC_IVAU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 5, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
-      .accessfn = aa64_cacheop_pou_access },
+      .accessfn = access_tocu },
     { .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
       .access = PL1_W, .accessfn = aa64_cacheop_poc_access,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "DC_CVAU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 11, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
-      .accessfn = aa64_cacheop_pou_access },
+      .accessfn = access_tocu },
     { .name = "DC_CIVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbiipas2is_hyp_write },
     /* 32 bit cache operations */
     { .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
-      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_ticab },
     { .name = "BPIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 6,
       .type = ARM_CP_NOP, .access = PL1_W },
     { .name = "ICIALLU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 0,
-      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tocu },
     { .name = "ICIMVAU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 1,
-      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tocu },
     { .name = "BPIALL", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 6,
       .type = ARM_CP_NOP, .access = PL1_W },
     { .name = "BPIMVA", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 7,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "DCCSW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     { .name = "DCCMVAU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 11, .opc2 = 1,
-      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tocu },
     { .name = "DCCIMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 1,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_poc_access },
     { .name = "DCCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
--
2.25.1

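The refactoring pattern in the helper.c change above -- several
near-identical access functions collapsed into one helper parameterised by
the trap bits to test -- can be sketched in isolation like this. This is a
toy model with stub types, not the real QEMU definitions, and the HCR bit
positions are assumptions to be checked against the ARM ARM.

    #include <stdint.h>

    /* Illustrative bit positions -- verify against the ARM ARM */
    #define HCR_TPU   (1ULL << 24)
    #define HCR_TICAB (1ULL << 50)
    #define HCR_TOCU  (1ULL << 54)

    typedef enum { CP_ACCESS_OK, CP_ACCESS_TRAP_EL2 } CPAccessResult;

    static uint64_t effective_hcr; /* stand-in for arm_hcr_el2_eff(env) */

    /* One helper, parameterised by the set of trap flags to test */
    static CPAccessResult do_cacheop_pou_access(uint64_t hcrflags)
    {
        return (effective_hcr & hcrflags) ? CP_ACCESS_TRAP_EL2 : CP_ACCESS_OK;
    }

    static CPAccessResult access_ticab(void)
    {
        return do_cacheop_pou_access(HCR_TICAB | HCR_TPU);
    }

    static CPAccessResult access_tocu(void)
    {
        return do_cacheop_pou_access(HCR_TOCU | HCR_TPU);
    }

    int main(void)
    {
        effective_hcr = HCR_TPU; /* legacy TPU still traps both groups */
        return (access_ticab() == CP_ACCESS_TRAP_EL2 &&
                access_tocu() == CP_ACCESS_TRAP_EL2) ? 0 : 1;
    }

The design point is that each register entry names a thin wrapper, so the
table stays declarative while the trap logic lives in exactly one place.
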
Factor out the VFP access checking code so that we can use it in the
leaf functions of the decodetree decoder.

We call the function full_vfp_access_check() so we can keep
the more natural vfp_access_check() for a version which doesn't
have the 'ignore_vfp_enabled' flag -- that way almost all VFP
insns will be able to use vfp_access_check(s) and only the
special-register access function will have to use
full_vfp_access_check(s, ignore_vfp_enabled).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 100 ++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 101 +++++----------------------------
 2 files changed, 113 insertions(+), 88 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@
 /* Include the generated VFP decoder */
 #include "decode-vfp.inc.c"
 #include "decode-vfp-uncond.inc.c"
+
+/*
+ * Check that VFP access is enabled. If it is, do the necessary
+ * M-profile lazy-FP handling and then return true.
+ * If not, emit code to generate an appropriate exception and
+ * return false.
+ * The ignore_vfp_enabled argument specifies that we should ignore
+ * whether VFP is enabled via FPEXC[EN]: this should be true for FMXR/FMRX
+ * accesses to FPSID, FPEXC, MVFR0, MVFR1, MVFR2, and false for all other insns.
+ */
+static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
+{
+    if (s->fp_excp_el) {
+        if (arm_dc_feature(s, ARM_FEATURE_M)) {
+            gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
+                               s->fp_excp_el);
+        } else {
+            gen_exception_insn(s, 4, EXCP_UDEF,
+                               syn_fp_access_trap(1, 0xe, false),
+                               s->fp_excp_el);
+        }
+        return false;
+    }
+
+    if (!s->vfp_enabled && !ignore_vfp_enabled) {
+        assert(!arm_dc_feature(s, ARM_FEATURE_M));
+        gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(),
+                           default_exception_el(s));
+        return false;
+    }
+
+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
+        /* Handle M-profile lazy FP state mechanics */
+
+        /* Trigger lazy-state preservation if necessary */
+        if (s->v7m_lspact) {
+            /*
+             * Lazy state saving affects external memory and also the NVIC,
+             * so we must mark it as an IO operation for icount.
+             */
+            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+                gen_io_start();
+            }
+            gen_helper_v7m_preserve_fp_state(cpu_env);
+            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+                gen_io_end();
+            }
+            /*
+             * If the preserve_fp_state helper doesn't throw an exception
+             * then it will clear LSPACT; we don't need to repeat this for
+             * any further FP insns in this TB.
+             */
+            s->v7m_lspact = false;
+        }
+
+        /* Update ownership of FP context: set FPCCR.S to match current state */
+        if (s->v8m_fpccr_s_wrong) {
+            TCGv_i32 tmp;
+
+            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
+            if (s->v8m_secure) {
+                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
+            } else {
+                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
+            }
+            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
+            /* Don't need to do this for any further FP insns in this TB */
+            s->v8m_fpccr_s_wrong = false;
+        }
+
+        if (s->v7m_new_fp_ctxt_needed) {
+            /*
+             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
+             * and the FPSCR.
+             */
+            TCGv_i32 control, fpscr;
+            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
+
+            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
+            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
+            tcg_temp_free_i32(fpscr);
+            /*
+             * We don't need to arrange to end the TB, because the only
+             * parts of FPSCR which we cache in the TB flags are the VECLEN
+             * and VECSTRIDE, and those don't exist for M-profile.
+             */
+
+            if (s->v8m_secure) {
+                bits |= R_V7M_CONTROL_SFPA_MASK;
+            }
+            control = load_cpu_field(v7m.control[M_REG_S]);
+            tcg_gen_ori_i32(control, control, bits);
+            store_cpu_field(control, v7m.control[M_REG_S]);
+            /* Don't need to do this for any further FP insns in this TB */
+            s->v7m_new_fp_ctxt_needed = false;
+        }
+    }
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
     return 1;
 }

-/* Disassemble a VFP instruction.  Returns nonzero if an error occurred
-   (ie. an undefined instruction).  */
+/*
+ * Disassemble a VFP instruction.  Returns nonzero if an error occurred
+ * (ie. an undefined instruction).
+ */
 static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
     uint32_t rd, rn, rm, op, i, n, offset, delta_d, delta_m, bank_mask;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     TCGv_i32 addr;
     TCGv_i32 tmp;
     TCGv_i32 tmp2;
+    bool ignore_vfp_enabled = false;

     if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         }
     }

-    /* FIXME: this access check should not take precedence over UNDEF
+    /*
+     * FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
      */
-    if (s->fp_excp_el) {
-        if (arm_dc_feature(s, ARM_FEATURE_M)) {
-            gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
-                               s->fp_excp_el);
-        } else {
-            gen_exception_insn(s, 4, EXCP_UDEF,
-                               syn_fp_access_trap(1, 0xe, false),
-                               s->fp_excp_el);
-        }
-        return 0;
-    }
-
-    if (!s->vfp_enabled) {
-        /* VFP disabled.  Only allow fmxr/fmrx to/from some control regs.  */
-        if ((insn & 0x0fe00fff) != 0x0ee00a10)
-            return 1;
+    if ((insn & 0x0fe00fff) == 0x0ee00a10) {
         rn = (insn >> 16) & 0xf;
-        if (rn != ARM_VFP_FPSID && rn != ARM_VFP_FPEXC && rn != ARM_VFP_MVFR2
-            && rn != ARM_VFP_MVFR1 && rn != ARM_VFP_MVFR0) {
-            return 1;
+        if (rn == ARM_VFP_FPSID || rn == ARM_VFP_FPEXC || rn == ARM_VFP_MVFR2
+            || rn == ARM_VFP_MVFR1 || rn == ARM_VFP_MVFR0) {
+            ignore_vfp_enabled = true;
         }
     }
-
-    if (arm_dc_feature(s, ARM_FEATURE_M)) {
-        /* Handle M-profile lazy FP state mechanics */
-
-        /* Trigger lazy-state preservation if necessary */
-        if (s->v7m_lspact) {
-            /*
-             * Lazy state saving affects external memory and also the NVIC,
-             * so we must mark it as an IO operation for icount.
-             */
-            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-                gen_io_start();
-            }
-            gen_helper_v7m_preserve_fp_state(cpu_env);
-            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-                gen_io_end();
-            }
-            /*
-             * If the preserve_fp_state helper doesn't throw an exception
-             * then it will clear LSPACT; we don't need to repeat this for
-             * any further FP insns in this TB.
-             */
-            s->v7m_lspact = false;
-        }
-
-        /* Update ownership of FP context: set FPCCR.S to match current state */
-        if (s->v8m_fpccr_s_wrong) {
-            TCGv_i32 tmp;
-
-            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
-            if (s->v8m_secure) {
-                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
-            } else {
-                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
-            }
-            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
-            /* Don't need to do this for any further FP insns in this TB */
-            s->v8m_fpccr_s_wrong = false;
-        }
-
-        if (s->v7m_new_fp_ctxt_needed) {
-            /*
-             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
-             * and the FPSCR.
-             */
-            TCGv_i32 control, fpscr;
-            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
-
-            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
-            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
-            tcg_temp_free_i32(fpscr);
-            /*
-             * We don't need to arrange to end the TB, because the only
-             * parts of FPSCR which we cache in the TB flags are the VECLEN
-             * and VECSTRIDE, and those don't exist for M-profile.
-             */
-
-            if (s->v8m_secure) {
-                bits |= R_V7M_CONTROL_SFPA_MASK;
-            }
-            control = load_cpu_field(v7m.control[M_REG_S]);
-            tcg_gen_ori_i32(control, control, bits);
-            store_cpu_field(control, v7m.control[M_REG_S]);
-            /* Don't need to do this for any further FP insns in this TB */
-            s->v7m_new_fp_ctxt_needed = false;
-        }
+    if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
+        return 0;
     }

     if (extract32(insn, 28, 4) == 0xf) {
--
2.20.1


For FEAT_EVT, the HCR_EL2.TID4 trap allows trapping of the cache ID
registers CCSIDR_EL1, CCSIDR2_EL1, CLIDR_EL1 and CSSELR_EL1 (and
their AArch32 equivalents). This is a subset of the registers
trapped by HCR_EL2.TID2, which includes all of these and also the
CTR_EL0 register.

Our implementation already uses a separate access function for
CTR_EL0 (ctr_el0_access()), so all of the registers currently using
access_aa64_tid2() should also be checking TID4. Make that function
check both TID2 and TID4, and rename it appropriately.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
     scr_write(env, ri, 0);
 }

-static CPAccessResult access_aa64_tid2(CPUARMState *env,
-                                       const ARMCPRegInfo *ri,
-                                       bool isread)
+static CPAccessResult access_tid4(CPUARMState *env,
+                                  const ARMCPRegInfo *ri,
+                                  bool isread)
 {
-    if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TID2)) {
+    if (arm_current_el(env) == 1 &&
+        (arm_hcr_el2_eff(env) & (HCR_TID2 | HCR_TID4))) {
         return CP_ACCESS_TRAP_EL2;
     }

@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
     { .name = "CCSIDR", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 0,
       .access = PL1_R,
-      .accessfn = access_aa64_tid2,
+      .accessfn = access_tid4,
       .readfn = ccsidr_read, .type = ARM_CP_NO_RAW },
     { .name = "CSSELR", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 2, .opc2 = 0,
       .access = PL1_RW,
-      .accessfn = access_aa64_tid2,
+      .accessfn = access_tid4,
       .writefn = csselr_write, .resetvalue = 0,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.csselr_s),
                              offsetof(CPUARMState, cp15.csselr_ns) } },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo ccsidr2_reginfo[] = {
     { .name = "CCSIDR2", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 1, .crn = 0, .crm = 0, .opc2 = 2,
       .access = PL1_R,
-      .accessfn = access_aa64_tid2,
+      .accessfn = access_tid4,
       .readfn = ccsidr2_read, .type = ARM_CP_NO_RAW },
 };

@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
             .name = "CLIDR", .state = ARM_CP_STATE_BOTH,
             .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 1,
             .access = PL1_R, .type = ARM_CP_CONST,
-            .accessfn = access_aa64_tid2,
+            .accessfn = access_tid4,
             .resetvalue = cpu->clidr
         };
         define_one_arm_cp_reg(cpu, &clidr);
--
2.25.1

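For readers new to the decodetree conversion work above, the trans_*
functions follow a two-value convention: returning false means "this
pattern does not decode here, treat the insn as UNDEF", while returning
true means "handled", even when the only action taken was emitting the
FP-access exception. A minimal sketch with stub types (hypothetical names,
not the real QEMU structures):

    #include <stdbool.h>
    #include <stdio.h>

    /* Stub context -- the real DisasContext carries much more state */
    typedef struct {
        bool have_feature;   /* stands in for dc_isar_feature(...) */
        bool fp_trap_active; /* stands in for s->fp_excp_el != 0 */
    } DisasContext;

    static bool vfp_access_check(DisasContext *s)
    {
        /* the real version emits the exception-raising code here */
        return !s->fp_trap_active;
    }

    static bool trans_template(DisasContext *s)
    {
        if (!s->have_feature) {
            return false;  /* false: decode fails, insn UNDEFs */
        }
        if (!vfp_access_check(s)) {
            return true;   /* true: handled -- exception already emitted */
        }
        /* ... emit the TCG ops for the instruction ... */
        return true;
    }

    int main(void)
    {
        DisasContext s = { .have_feature = true, .fp_trap_active = false };
        printf("handled=%d\n", trans_template(&s));
        return 0;
    }

This ordering (feature check before access check) is what fixes the
v7A-vs-v8A bug called out in the VRINT commit message.
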
Convert the VCVT integer-to-float instructions to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 58 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 12 +------
 target/arm/vfp.decode          |  6 ++++
 3 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     tcg_temp_free_i64(vm);
     return true;
 }
+
+static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
+{
+    TCGv_i32 vm;
+    TCGv_ptr fpst;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vm = tcg_temp_new_i32();
+    neon_load_reg32(vm, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    if (a->s) {
+        /* i32 -> f32 */
+        gen_helper_vfp_sitos(vm, vm, fpst);
+    } else {
+        /* u32 -> f32 */
+        gen_helper_vfp_uitos(vm, vm, fpst);
+    }
+    neon_store_reg32(vm, a->vd);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
+static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
+{
+    TCGv_i32 vm;
+    TCGv_i64 vd;
+    TCGv_ptr fpst;
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vm = tcg_temp_new_i32();
+    vd = tcg_temp_new_i64();
+    neon_load_reg32(vm, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    if (a->s) {
+        /* i32 -> f64 */
+        gen_helper_vfp_sitod(vd, vm, fpst);
+    } else {
+        /* u32 -> f64 */
+        gen_helper_vfp_uitod(vd, vm, fpst);
+    }
+    neon_store_reg64(vd, a->vd);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_i64(vd);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             return 1;
         case 15:
             switch (rn) {
-            case 0 ... 15:
+            case 0 ... 17:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         if (op == 15) {
             /* rn is opcode, encoded as per VFP_SREG_N. */
             switch (rn) {
-            case 0x10: /* vcvt.fxx.u32 */
-            case 0x11: /* vcvt.fxx.s32 */
-                rm_is_dp = false;
-                break;
             case 0x18: /* vcvtr.u32.fxx */
             case 0x19: /* vcvtz.u32.fxx */
             case 0x1a: /* vcvtr.s32.fxx */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 switch (op) {
                 case 15: /* extension space */
                     switch (rn) {
-                    case 16: /* fuito */
-                        gen_vfp_uito(dp, 0);
-                        break;
-                    case 17: /* fsito */
-                        gen_vfp_sito(dp, 0);
-                        break;
                     case 19: /* vjcvt */
                         gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
                         break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_sp ---- 1110 1.11 0111 .... 1010 11.0 .... \
              vd=%vd_dp vm=%vm_sp
 VCVT_dp      ---- 1110 1.11 0111 .... 1011 11.0 .... \
              vd=%vd_sp vm=%vm_dp
+
+# VCVT from integer to floating point: Vm always single; Vd depends on size
+VCVT_int_sp  ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VCVT_int_dp  ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
+             vd=%vd_dp vm=%vm_sp
--
2.20.1


Update the ID registers for TCG's '-cpu max' to report the
FEAT_EVT Enhanced Virtualization Traps support.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/cpu64.c            | 1 +
 target/arm/cpu_tcg.c          | 1 +
 3 files changed, 3 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_DoubleFault (Double Fault Extension)
 - FEAT_E0PD (Preventing EL0 access to halves of address maps)
 - FEAT_ETS (Enhanced Translation Synchronization)
+- FEAT_EVT (Enhanced Virtualization Traps)
 - FEAT_FCMA (Floating-point complex number instructions)
 - FEAT_FHM (Floating-point half-precision multiplication instructions)
 - FEAT_FP16 (Half-precision floating-point data processing)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64MMFR2, FWB, 1);      /* FEAT_S2FWB */
     t = FIELD_DP64(t, ID_AA64MMFR2, TTL, 1);      /* FEAT_TTL */
     t = FIELD_DP64(t, ID_AA64MMFR2, BBM, 2);      /* FEAT_BBM at level 2 */
+    t = FIELD_DP64(t, ID_AA64MMFR2, EVT, 2);      /* FEAT_EVT */
     t = FIELD_DP64(t, ID_AA64MMFR2, E0PD, 1);     /* FEAT_E0PD */
     cpu->isar.id_aa64mmfr2 = t;

diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu_tcg.c
+++ b/target/arm/cpu_tcg.c
@@ -XXX,XX +XXX,XX @@ void aa32_max_features(ARMCPU *cpu)
     t = FIELD_DP32(t, ID_MMFR4, AC2, 1);          /* ACTLR2, HACTLR2 */
     t = FIELD_DP32(t, ID_MMFR4, CNP, 1);          /* FEAT_TTCNP */
     t = FIELD_DP32(t, ID_MMFR4, XNX, 1);          /* FEAT_XNX */
+    t = FIELD_DP32(t, ID_MMFR4, EVT, 2);          /* FEAT_EVT */
     cpu->isar.id_mmfr4 = t;

     t = cpu->isar.id_mmfr5;
--
2.25.1

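The FIELD_DP64/FIELD_DP32 macros used above are deposit operations: they
overwrite one named bitfield of an ID register while leaving the rest
intact. A minimal standalone stand-in (the EVT field position is an
assumption here -- check the ARM ARM before relying on it):

    #include <stdint.h>
    #include <stdio.h>

    /* Minimal stand-in for the deposit underlying QEMU's FIELD_DP64 */
    static uint64_t deposit64(uint64_t value, int start, int length,
                              uint64_t field)
    {
        uint64_t mask = (~0ULL >> (64 - length)) << start;
        return (value & ~mask) | ((field << start) & mask);
    }

    int main(void)
    {
        uint64_t id_aa64mmfr2 = 0;
        /* assumed field position: ID_AA64MMFR2.EVT at bits [59:56] */
        id_aa64mmfr2 = deposit64(id_aa64mmfr2, 56, 4, 2); /* FEAT_EVT lvl 2 */
        printf("ID_AA64MMFR2 = 0x%016llx\n",
               (unsigned long long)id_aa64mmfr2);
        return 0;
    }

Writing the value 2 rather than 1 is how the patch advertises the
"level 2" variant of the feature in the usual ID-register scheme.
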
In commit 80376c3fc2c38fdd453 in 2010 we added a workaround for
some qbus buses not being connected to qdev devices -- if the
bus has no parent object then we register a reset function which
resets the bus on system reset (and unregister it when the
bus is unparented).

Nearly a decade later, we now have no buses in the tree (other
than the main system bus) which are created with a NULL parent,
so we can remove the workaround and instead just assert that if
the bus has a NULL parent then it is the main system bus.

(The absence of other parentless buses was confirmed by
code inspection of all the callsites of qbus_create() and
qbus_create_inplace() and cross-checked by 'make check'.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Damien Hedde <damien.hedde@greensocs.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190523150543.22676-1-peter.maydell@linaro.org
---
 hw/core/bus.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/hw/core/bus.c b/hw/core/bus.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/bus.c
+++ b/hw/core/bus.c
@@ -XXX,XX +XXX,XX @@ static void qbus_realize(BusState *bus, DeviceState *parent, const char *name)
         bus->parent->num_child_bus++;
         object_property_add_child(OBJECT(bus->parent), bus->name, OBJECT(bus), NULL);
         object_unref(OBJECT(bus));
-    } else if (bus != sysbus_get_default()) {
-        /* TODO: once all bus devices are qdevified,
-           only reset handler for main_system_bus should be registered here. */
-        qemu_register_reset(qbus_reset_all_fn, bus);
+    } else {
+        /* The only bus without a parent is the main system bus */
+        assert(bus == sysbus_get_default());
     }
 }

@@ -XXX,XX +XXX,XX @@ static void bus_unparent(Object *obj)
     BusState *bus = BUS(obj);
     BusChild *kid;

+    /* Only the main system bus has no parent, and that bus is never freed */
+    assert(bus->parent);
+
     while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
         DeviceState *dev = kid->child;
         object_unparent(OBJECT(dev));
     }
-    if (bus->parent) {
-        QLIST_REMOVE(bus, sibling);
-        bus->parent->num_child_bus--;
-        bus->parent = NULL;
-    } else {
-        assert(bus != sysbus_get_default()); /* main_system_bus is never freed */
-        qemu_unregister_reset(qbus_reset_all_fn, bus);
-    }
+    QLIST_REMOVE(bus, sibling);
+    bus->parent->num_child_bus--;
+    bus->parent = NULL;
 }

 void qbus_create_inplace(void *bus, size_t size, const char *typename,
--
2.20.1


Convert the TYPE_ARM_SMMU device to 3-phase reset. The legacy method
doesn't do anything that's invalid in the hold phase, so the
conversion is simple and not a behaviour change.

Note that we must convert this base class before we can convert the
TYPE_ARM_SMMUV3 subclass -- transitional support in Resettable
handles "chain to parent class reset" when the base class is 3-phase
and the subclass is still using legacy reset, but not the other way
around.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20221109161444.3397405-2-peter.maydell@linaro.org
---
 hw/arm/smmu-common.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -XXX,XX +XXX,XX @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
     }
 }

-static void smmu_base_reset(DeviceState *dev)
+static void smmu_base_reset_hold(Object *obj)
 {
-    SMMUState *s = ARM_SMMU(dev);
+    SMMUState *s = ARM_SMMU(obj);

     g_hash_table_remove_all(s->configs);
     g_hash_table_remove_all(s->iotlb);
@@ -XXX,XX +XXX,XX @@ static Property smmu_dev_properties[] = {
 static void smmu_base_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
     SMMUBaseClass *sbc = ARM_SMMU_CLASS(klass);

     device_class_set_props(dc, smmu_dev_properties);
     device_class_set_parent_realize(dc, smmu_base_realize,
                                     &sbc->parent_realize);
-    dc->reset = smmu_base_reset;
+    rc->phases.hold = smmu_base_reset_hold;
 }

 static const TypeInfo smmu_base_info = {
--
2.25.1

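The "save the parent's methods, override, then chain" shape used by these
3-phase reset conversions can be modelled in plain C, outside QOM. This is
a deliberately tiny toy, not the Resettable API:

    #include <stdio.h>

    typedef struct ResettablePhases {
        void (*enter)(void *obj);
        void (*hold)(void *obj);
        void (*exit)(void *obj);
    } ResettablePhases;

    typedef struct {
        ResettablePhases phases;        /* this class's methods */
        ResettablePhases parent_phases; /* saved copy of the parent's */
    } Class;

    static void parent_hold(void *obj) { puts("parent hold"); }

    static Class base = { .phases = { .hold = parent_hold } };
    static Class child;

    static void child_hold(void *obj)
    {
        if (child.parent_phases.hold) {
            child.parent_phases.hold(obj);  /* chain to parent first */
        }
        puts("child hold");
    }

    int main(void)
    {
        child = base;                       /* inherit */
        child.parent_phases = base.phases;  /* save parent's methods */
        child.phases.hold = child_hold;     /* override */
        child.phases.hold(NULL);
        return 0;
    }

The NULL-check before chaining is the same guard the converted
smmu_reset_hold() and kvm_arm_gic_reset_hold() use: a parent class that is
still on legacy reset has no hold method to call.
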
The SMMUv3 ID registers cover an area 0x30 bytes in size
(12 registers, 4 bytes each). We were incorrectly decoding
only the first 0x20 bytes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20190524124829.2589-1-peter.maydell@linaro.org
---
 hw/arm/smmuv3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@ static MemTxResult smmu_readl(SMMUv3State *s, hwaddr offset,
                               uint64_t *data, MemTxAttrs attrs)
 {
     switch (offset) {
-    case A_IDREGS ... A_IDREGS + 0x1f:
+    case A_IDREGS ... A_IDREGS + 0x2f:
         *data = smmuv3_idreg(offset - A_IDREGS);
         return MEMTX_OK;
     case A_IDR0 ... A_IDR5:
--
2.20.1


Convert the TYPE_ARM_SMMUV3 device to 3-phase reset. The legacy
reset method doesn't do anything that's invalid in the hold phase, so
the conversion only requires changing it to a hold phase method, and
using the 3-phase versions of the "save the parent reset method and
chain to it" code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-3-peter.maydell@linaro.org
---
 include/hw/arm/smmuv3.h |  2 +-
 hw/arm/smmuv3.c         | 12 ++++++++----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -XXX,XX +XXX,XX @@ struct SMMUv3Class {
     /*< public >*/

     DeviceRealize parent_realize;
-    DeviceReset   parent_reset;
+    ResettablePhases parent_phases;
 };

 #define TYPE_ARM_SMMUV3   "arm-smmuv3"
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@ static void smmu_init_irq(SMMUv3State *s, SysBusDevice *dev)
     }
 }

-static void smmu_reset(DeviceState *dev)
+static void smmu_reset_hold(Object *obj)
 {
-    SMMUv3State *s = ARM_SMMUV3(dev);
+    SMMUv3State *s = ARM_SMMUV3(obj);
     SMMUv3Class *c = ARM_SMMUV3_GET_CLASS(s);

-    c->parent_reset(dev);
+    if (c->parent_phases.hold) {
+        c->parent_phases.hold(obj);
+    }

     smmuv3_init_regs(s);
 }
@@ -XXX,XX +XXX,XX @@ static void smmuv3_instance_init(Object *obj)
 static void smmuv3_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
     SMMUv3Class *c = ARM_SMMUV3_CLASS(klass);

     dc->vmsd = &vmstate_smmuv3;
-    device_class_set_parent_reset(dc, smmu_reset, &c->parent_reset);
+    resettable_class_set_parent_phases(rc, NULL, smmu_reset_hold, NULL,
+                                       &c->parent_phases);
     c->parent_realize = dc->realize;
     dc->realize = smmu_realize;
 }
--
2.25.1

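The off-by-0x10 fixed in the SMMUv3 patch above is easy to check with
arithmetic: 12 registers of 4 bytes each is 0x30 bytes, so the last valid
byte offset is base + 0x2f. A standalone sketch (the A_IDREGS value below
is an assumption for illustration, not copied from the QEMU headers):

    #include <assert.h>
    #include <stdbool.h>

    #define A_IDREGS 0xfd0  /* assumed offset of the first ID register */

    /* 12 ID registers x 4 bytes each = 0x30 bytes of address space */
    static bool offset_is_idreg(unsigned offset)
    {
        return offset >= A_IDREGS && offset <= A_IDREGS + 0x2f;
    }

    int main(void)
    {
        assert(offset_is_idreg(A_IDREGS));
        assert(offset_is_idreg(A_IDREGS + 0x2c)); /* start of last reg */
        assert(!offset_is_idreg(A_IDREGS + 0x30)); /* one past the end */
        return 0;
    }

The GCC case-range `case A_IDREGS ... A_IDREGS + 0x2f:` in the patch is
exactly this inclusive-bounds check written as a switch label.
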
Convert the VCVT double/single precision conversion insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 48 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 13 +--------
 target/arm/vfp.decode          |  6 +++++
 3 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
     tcg_temp_free_i64(tmp);
     return true;
 }
+
+static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
+{
+    TCGv_i64 vd;
+    TCGv_i32 vm;
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vm = tcg_temp_new_i32();
+    vd = tcg_temp_new_i64();
+    neon_load_reg32(vm, a->vm);
+    gen_helper_vfp_fcvtds(vd, vm, cpu_env);
+    neon_store_reg64(vd, a->vd);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_i64(vd);
+    return true;
+}
+
+static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
+{
+    TCGv_i64 vm;
+    TCGv_i32 vd;
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vd = tcg_temp_new_i32();
+    vm = tcg_temp_new_i64();
+    neon_load_reg64(vm, a->vm);
+    gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
+    neon_store_reg32(vd, a->vd);
+    tcg_temp_free_i32(vd);
+    tcg_temp_free_i64(vm);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             return 1;
         case 15:
             switch (rn) {
-            case 0 ... 14:
+            case 0 ... 15:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         if (op == 15) {
             /* rn is opcode, encoded as per VFP_SREG_N. */
             switch (rn) {
-            case 0x0f: /* vcvt double<->single */
-                rd_is_dp = !dp;
-                break;
-
             case 0x10: /* vcvt.fxx.u32 */
             case 0x11: /* vcvt.fxx.s32 */
                 rm_is_dp = false;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 switch (op) {
                 case 15: /* extension space */
                     switch (rn) {
-                    case 15: /* single<->double conversion */
-                        if (dp) {
-                            gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
-                        } else {
-                            gen_helper_vfp_fcvtds(cpu_F0d, cpu_F0s, cpu_env);
-                        }
-                        break;
                     case 16: /* fuito */
                         gen_vfp_uito(dp, 0);
                         break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VRINTX_sp ---- 1110 1.11 0111 .... 1010 01.0 .... \
              vd=%vd_sp vm=%vm_sp
 VRINTX_dp    ---- 1110 1.11 0111 .... 1011 01.0 .... \
              vd=%vd_dp vm=%vm_dp
+
+# VCVT between single and double: Vm precision depends on size; Vd is its reverse
+VCVT_sp      ---- 1110 1.11 0111 .... 1010 11.0 .... \
+             vd=%vd_dp vm=%vm_sp
+VCVT_dp      ---- 1110 1.11 0111 .... 1011 11.0 .... \
+             vd=%vd_sp vm=%vm_dp
--
2.20.1


Convert the TYPE_ARM_GIC_COMMON device to 3-phase reset. This is a
simple no-behaviour-change conversion.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221109161444.3397405-4-peter.maydell@linaro.org
---
 hw/intc/arm_gic_common.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gic_common.c
+++ b/hw/intc/arm_gic_common.c
@@ -XXX,XX +XXX,XX @@ static inline void arm_gic_common_reset_irq_state(GICState *s, int first_cpu,
     }
 }

-static void arm_gic_common_reset(DeviceState *dev)
+static void arm_gic_common_reset_hold(Object *obj)
 {
-    GICState *s = ARM_GIC_COMMON(dev);
+    GICState *s = ARM_GIC_COMMON(obj);
     int i, j;
     int resetprio;

@@ -XXX,XX +XXX,XX @@ static Property arm_gic_common_properties[] = {
 static void arm_gic_common_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
     ARMLinuxBootIfClass *albifc = ARM_LINUX_BOOT_IF_CLASS(klass);

-    dc->reset = arm_gic_common_reset;
+    rc->phases.hold = arm_gic_common_reset_hold;
     dc->realize = arm_gic_common_realize;
     device_class_set_props(dc, arm_gic_common_properties);
     dc->vmsd = &vmstate_gic;
--
2.25.1

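The "Vd is its reverse" comment in the decode change above reflects the
asymmetry of the conversion itself: narrowing f64 to f32 can round, while
widening f32 to f64 is always exact. A quick illustrative demo using host C
arithmetic (IEEE semantics, not QEMU's softfloat):

    #include <stdio.h>

    int main(void)
    {
        double d = 0.1;           /* not exactly representable in binary */
        float f = (float)d;       /* narrowing, as VCVT.F32.F64 (fcvtsd) */
        double back = (double)f;  /* widening, as VCVT.F64.F32 (fcvtds) */

        printf("f64:  %.17g\n", d);
        printf("f32:  %.9g\n", (double)f);
        printf("back: %.17g\n", back);
        return 0;
    }

Running this shows the narrowed value differs from the original in the low
bits, while the widened value reproduces the f32 exactly.
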
Convert the float-to-integer VCVT instructions to decodetree.
1
Now we have converted TYPE_ARM_GIC_COMMON, we can convert the
2
Since these are the last unconverted instructions, we can
2
TYPE_ARM_GIC_KVM subclass to 3-phase reset.
3
delete the old decoder structure entirely now.
4
3
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
7
Message-id: 20221109161444.3397405-5-peter.maydell@linaro.org
7
---
8
---
8
target/arm/translate-vfp.inc.c | 72 ++++++++++
9
hw/intc/arm_gic_kvm.c | 14 +++++++++-----
9
target/arm/translate.c | 241 +--------------------------------
10
1 file changed, 9 insertions(+), 5 deletions(-)
10
target/arm/vfp.decode | 6 +
11
3 files changed, 80 insertions(+), 239 deletions(-)
12
11
13
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
12
diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
14
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/translate-vfp.inc.c
14
--- a/hw/intc/arm_gic_kvm.c
16
+++ b/target/arm/translate-vfp.inc.c
15
+++ b/hw/intc/arm_gic_kvm.c
17
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
16
@@ -XXX,XX +XXX,XX @@ DECLARE_OBJ_CHECKERS(GICState, KVMARMGICClass,
18
tcg_temp_free_ptr(fpst);
17
struct KVMARMGICClass {
19
return true;
18
ARMGICCommonClass parent_class;
19
DeviceRealize parent_realize;
20
- void (*parent_reset)(DeviceState *dev);
21
+ ResettablePhases parent_phases;
22
};
23
24
void kvm_arm_gic_set_irq(uint32_t num_irq, int irq, int level)
25
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gic_get(GICState *s)
26
}
20
}
27
}
21
+
28
22
+static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
29
-static void kvm_arm_gic_reset(DeviceState *dev)
23
+{
30
+static void kvm_arm_gic_reset_hold(Object *obj)
24
+ TCGv_i32 vm;
31
{
25
+ TCGv_ptr fpst;
32
- GICState *s = ARM_GIC_COMMON(dev);
26
+
33
+ GICState *s = ARM_GIC_COMMON(obj);
27
+ if (!vfp_access_check(s)) {
34
KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
28
+ return true;
35
36
- kgc->parent_reset(dev);
37
+ if (kgc->parent_phases.hold) {
38
+ kgc->parent_phases.hold(obj);
29
+ }
39
+ }
30
+
40
31
+ fpst = get_fpstatus_ptr(false);
41
if (kvm_arm_gic_can_save_restore(s)) {
32
+ vm = tcg_temp_new_i32();
42
kvm_arm_gic_put(s);
33
+ neon_load_reg32(vm, a->vm);
43
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gic_realize(DeviceState *dev, Error **errp)
34
+
44
static void kvm_arm_gic_class_init(ObjectClass *klass, void *data)
35
+ if (a->s) {
45
{
36
+ if (a->rz) {
46
DeviceClass *dc = DEVICE_CLASS(klass);
37
+ gen_helper_vfp_tosizs(vm, vm, fpst);
47
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
38
+ } else {
48
ARMGICCommonClass *agcc = ARM_GIC_COMMON_CLASS(klass);
39
+ gen_helper_vfp_tosis(vm, vm, fpst);
49
KVMARMGICClass *kgc = KVM_ARM_GIC_CLASS(klass);
40
+ }
50
41
+ } else {
51
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gic_class_init(ObjectClass *klass, void *data)
42
+ if (a->rz) {
52
agcc->post_load = kvm_arm_gic_put;
43
+ gen_helper_vfp_touizs(vm, vm, fpst);
53
device_class_set_parent_realize(dc, kvm_arm_gic_realize,
44
+ } else {
54
&kgc->parent_realize);
45
+ gen_helper_vfp_touis(vm, vm, fpst);
55
- device_class_set_parent_reset(dc, kvm_arm_gic_reset, &kgc->parent_reset);
46
+ }
56
+ resettable_class_set_parent_phases(rc, NULL, kvm_arm_gic_reset_hold, NULL,
47
+ }
57
+ &kgc->parent_phases);
48
+ neon_store_reg32(vm, a->vd);
49
+ tcg_temp_free_i32(vm);
50
+ tcg_temp_free_ptr(fpst);
51
+ return true;
52
+}
53
+
54
+static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
55
+{
56
+ TCGv_i32 vd;
57
+ TCGv_i64 vm;
58
+ TCGv_ptr fpst;
59
+
60
+ /* UNDEF accesses to D16-D31 if they don't exist. */
61
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
62
+ return false;
63
+ }
64
+
65
+ if (!vfp_access_check(s)) {
66
+ return true;
67
+ }
68
+
69
+ fpst = get_fpstatus_ptr(false);
70
+ vm = tcg_temp_new_i64();
71
+ vd = tcg_temp_new_i32();
72
+ neon_load_reg64(vm, a->vm);
73
+
74
+ if (a->s) {
75
+ if (a->rz) {
76
+ gen_helper_vfp_tosizd(vd, vm, fpst);
77
+ } else {
78
+ gen_helper_vfp_tosid(vd, vm, fpst);
79
+ }
80
+ } else {
81
+ if (a->rz) {
82
+ gen_helper_vfp_touizd(vd, vm, fpst);
83
+ } else {
84
+ gen_helper_vfp_touid(vd, vm, fpst);
85
+ }
86
+ }
87
+ neon_store_reg32(vd, a->vd);
88
+ tcg_temp_free_i32(vd);
89
+ tcg_temp_free_i64(vm);
90
+ tcg_temp_free_ptr(fpst);
91
+ return true;
92
+}
93
diff --git a/target/arm/translate.c b/target/arm/translate.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/target/arm/translate.c
96
+++ b/target/arm/translate.c
97
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp, int neon) \
98
tcg_temp_free_ptr(statusptr); \
99
}
58
}
100
59
101
-VFP_GEN_FTOI(toui)
60
static const TypeInfo kvm_arm_gic_info = {
102
VFP_GEN_FTOI(touiz)
103
-VFP_GEN_FTOI(tosi)
104
VFP_GEN_FTOI(tosiz)
105
#undef VFP_GEN_FTOI
106
107
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
108
}
109
110
#define tcg_gen_ld_f32 tcg_gen_ld_i32
111
-#define tcg_gen_ld_f64 tcg_gen_ld_i64
112
#define tcg_gen_st_f32 tcg_gen_st_i32
113
-#define tcg_gen_st_f64 tcg_gen_st_i64
114
-
115
-static inline void gen_mov_F0_vreg(int dp, int reg)
116
-{
117
- if (dp)
118
- tcg_gen_ld_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
119
- else
120
- tcg_gen_ld_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
121
-}
122
-
123
-static inline void gen_mov_F1_vreg(int dp, int reg)
124
-{
125
- if (dp)
126
- tcg_gen_ld_f64(cpu_F1d, cpu_env, vfp_reg_offset(dp, reg));
127
- else
128
- tcg_gen_ld_f32(cpu_F1s, cpu_env, vfp_reg_offset(dp, reg));
129
-}
130
-
131
-static inline void gen_mov_vreg_F0(int dp, int reg)
132
-{
133
- if (dp)
134
- tcg_gen_st_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
135
- else
136
- tcg_gen_st_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
137
-}
138
139
#define ARM_CP_RW_BIT (1 << 20)
140
141
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
142
*/
143
static int disas_vfp_insn(DisasContext *s, uint32_t insn)
144
{
145
- uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
146
- int dp, veclen;
147
-
148
if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
149
return 1;
150
}
151
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
152
return 0;
153
}
154
}
155
-
156
- if (extract32(insn, 28, 4) == 0xf) {
157
- /*
158
- * Encodings with T=1 (Thumb) or unconditional (ARM): these
159
- * were all handled by the decodetree decoder, so any insn
160
- * patterns which get here must be UNDEF.
161
- */
162
- return 1;
163
- }
164
-
165
- /*
166
- * FIXME: this access check should not take precedence over UNDEF
167
-     * for invalid encodings; we will generate incorrect syndrome information
-     * for attempts to execute invalid vfp/neon encodings with FP disabled.
-     */
-    if (!vfp_access_check(s)) {
-        return 0;
-    }
-
-    dp = ((insn & 0xf00) == 0xb00);
-    switch ((insn >> 24) & 0xf) {
-    case 0xe:
-        if (insn & (1 << 4)) {
-            /* already handled by decodetree */
-            return 1;
-        } else {
-            /* data processing */
-            bool rd_is_dp = dp;
-            bool rm_is_dp = dp;
-            bool no_output = false;
-
-            /* The opcode is in bits 23, 21, 20 and 6. */
-            op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
-            rn = VFP_SREG_N(insn);
-
-            switch (op) {
-            case 0 ... 14:
-                /* Already handled by decodetree */
-                return 1;
-            case 15:
-                switch (rn) {
-                case 0 ... 23:
-                case 28 ... 31:
-                    /* Already handled by decodetree */
-                    return 1;
-                default:
-                    break;
-                }
-            default:
-                break;
-            }
-
-            if (op == 15) {
-                /* rn is opcode, encoded as per VFP_SREG_N. */
-                switch (rn) {
-                case 0x18: /* vcvtr.u32.fxx */
-                case 0x19: /* vcvtz.u32.fxx */
-                case 0x1a: /* vcvtr.s32.fxx */
-                case 0x1b: /* vcvtz.s32.fxx */
-                    rd_is_dp = false;
-                    break;
-
-                default:
-                    return 1;
-                }
-            } else if (dp) {
-                /* rn is register number */
-                VFP_DREG_N(rn, insn);
-            }
-
-            if (rd_is_dp) {
-                VFP_DREG_D(rd, insn);
-            } else {
-                rd = VFP_SREG_D(insn);
-            }
-            if (rm_is_dp) {
-                VFP_DREG_M(rm, insn);
-            } else {
-                rm = VFP_SREG_M(insn);
-            }
-
-            veclen = s->vec_len;
-            if (op == 15 && rn > 3) {
-                veclen = 0;
-            }
-
-            /* Shut up compiler warnings. */
-            delta_m = 0;
-            delta_d = 0;
-            bank_mask = 0;
-
-            if (veclen > 0) {
-                if (dp)
-                    bank_mask = 0xc;
-                else
-                    bank_mask = 0x18;
-
-                /* Figure out what type of vector operation this is. */
-                if ((rd & bank_mask) == 0) {
-                    /* scalar */
-                    veclen = 0;
-                } else {
-                    if (dp)
-                        delta_d = (s->vec_stride >> 1) + 1;
-                    else
-                        delta_d = s->vec_stride + 1;
-
-                    if ((rm & bank_mask) == 0) {
-                        /* mixed scalar/vector */
-                        delta_m = 0;
-                    } else {
-                        /* vector */
-                        delta_m = delta_d;
-                    }
-                }
-            }
-
-            /* Load the initial operands. */
-            if (op == 15) {
-                switch (rn) {
-                default:
-                    /* One source operand. */
-                    gen_mov_F0_vreg(rm_is_dp, rm);
-                    break;
-                }
-            } else {
-                /* Two source operands. */
-                gen_mov_F0_vreg(dp, rn);
-                gen_mov_F1_vreg(dp, rm);
-            }
-
-            for (;;) {
-                /* Perform the calculation. */
-                switch (op) {
-                case 15: /* extension space */
-                    switch (rn) {
-                    case 24: /* ftoui */
-                        gen_vfp_toui(dp, 0);
-                        break;
-                    case 25: /* ftouiz */
-                        gen_vfp_touiz(dp, 0);
-                        break;
-                    case 26: /* ftosi */
-                        gen_vfp_tosi(dp, 0);
-                        break;
-                    case 27: /* ftosiz */
-                        gen_vfp_tosiz(dp, 0);
-                        break;
-                    default: /* undefined */
-                        g_assert_not_reached();
-                    }
-                    break;
-                default: /* undefined */
-                    return 1;
-                }
-
-                /* Write back the result, if any. */
-                if (!no_output) {
-                    gen_mov_vreg_F0(rd_is_dp, rd);
-                }
-
-                /* break out of the loop if we have finished */
-                if (veclen == 0) {
-                    break;
-                }
-
-                if (op == 15 && delta_m == 0) {
-                    /* single source one-many */
-                    while (veclen--) {
-                        rd = ((rd + delta_d) & (bank_mask - 1))
-                             | (rd & bank_mask);
-                        gen_mov_vreg_F0(dp, rd);
-                    }
-                    break;
-                }
-                /* Setup the next operands. */
-                veclen--;
-                rd = ((rd + delta_d) & (bank_mask - 1))
-                     | (rd & bank_mask);
-
-                if (op == 15) {
-                    /* One source operand. */
-                    rm = ((rm + delta_m) & (bank_mask - 1))
-                         | (rm & bank_mask);
-                    gen_mov_F0_vreg(dp, rm);
-                } else {
-                    /* Two source operands. */
-                    rn = ((rn + delta_d) & (bank_mask - 1))
-                         | (rn & bank_mask);
-                    gen_mov_F0_vreg(dp, rn);
-                    if (delta_m) {
-                        rm = ((rm + delta_m) & (bank_mask - 1))
-                             | (rm & bank_mask);
-                        gen_mov_F1_vreg(dp, rm);
-                    }
-                }
-            }
-        }
-        break;
-    case 0xc:
-    case 0xd:
-        /* Already handled by decodetree */
-        return 1;
-    default:
-        /* Should never happen. */
-        return 1;
-    }
-    return 0;
+    /* If the decodetree decoder didn't handle this insn, it must be UNDEF */
+    return 1;
 }

 static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_fix_sp ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
     vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
 VCVT_fix_dp ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
     vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
+
+# VCVT float to integer (VCVT and VCVTR): Vd always single; Vm depends on size
+VCVT_sp_int ---- 1110 1.11 110 s:1 .... 1010 rz:1 1.0 .... \
+    vd=%vd_sp vm=%vm_sp
+VCVT_dp_int ---- 1110 1.11 110 s:1 .... 1011 rz:1 1.0 .... \
+    vd=%vd_sp vm=%vm_dp
--
2.20.1

--
2.25.1
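A note on the conversion pattern in the preceding patch: each decodetree pattern now lands in a trans_<INSN> function, and the legacy decoder keeps only the final fall-through. A minimal sketch of the shape such a function takes; the guard helpers are the ones used throughout this series, while trans_VFOO/arg_VFOO are placeholder names, not a real insn:

/*
 * Sketch only. The return-value convention is the one this series
 * relies on: false means "pattern not valid here, raise UNDEF";
 * true means "handled", which includes the case where FP is
 * disabled and vfp_access_check() has already generated the
 * appropriate exception.
 */
static bool trans_VFOO(DisasContext *s, arg_VFOO *a)
{
    if (!dc_isar_feature(aa32_fpshvec, s)) {
        return false;   /* feature absent: decode falls through to UNDEF */
    }
    if (!vfp_access_check(s)) {
        return true;    /* FP disabled: exception already emitted */
    }
    /* ... emit TCG ops implementing the instruction ... */
    return true;
}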
Convert the VCVTT and VCVTB instructions which convert from
f32 and f64 to f16 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
store of the right half of the input single-precision register
rather than doing a load/modify/store sequence on the full
32 bits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 62 ++++++++++++++++++++++++++
 target/arm/translate.c         | 79 +---------------------------------
 target/arm/vfp.decode          |  6 +++
 3 files changed, 69 insertions(+), 78 deletions(-)

Convert the TYPE_ARM_GICV3_COMMON parent class to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-6-peter.maydell@linaro.org
---
 hw/intc/arm_gicv3_common.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
11
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
19
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/translate-vfp.inc.c
13
--- a/hw/intc/arm_gicv3_common.c
21
+++ b/target/arm/translate-vfp.inc.c
14
+++ b/hw/intc/arm_gicv3_common.c
22
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
15
@@ -XXX,XX +XXX,XX @@ static void arm_gicv3_finalize(Object *obj)
23
tcg_temp_free_i64(vd);
16
g_free(s->redist_region_count);
24
return true;
25
}
17
}
26
+
18
27
+static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
19
-static void arm_gicv3_common_reset(DeviceState *dev)
28
+{
20
+static void arm_gicv3_common_reset_hold(Object *obj)
29
+ TCGv_ptr fpst;
30
+ TCGv_i32 ahp_mode;
31
+ TCGv_i32 tmp;
32
+
33
+ if (!dc_isar_feature(aa32_fp16_spconv, s)) {
34
+ return false;
35
+ }
36
+
37
+ if (!vfp_access_check(s)) {
38
+ return true;
39
+ }
40
+
41
+ fpst = get_fpstatus_ptr(false);
42
+ ahp_mode = get_ahp_flag();
43
+ tmp = tcg_temp_new_i32();
44
+
45
+ neon_load_reg32(tmp, a->vm);
46
+ gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
47
+ tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
48
+ tcg_temp_free_i32(ahp_mode);
49
+ tcg_temp_free_ptr(fpst);
50
+ tcg_temp_free_i32(tmp);
51
+ return true;
52
+}
53
+
54
+static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
55
+{
56
+ TCGv_ptr fpst;
57
+ TCGv_i32 ahp_mode;
58
+ TCGv_i32 tmp;
59
+ TCGv_i64 vm;
60
+
61
+ if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
62
+ return false;
63
+ }
64
+
65
+ /* UNDEF accesses to D16-D31 if they don't exist. */
66
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
67
+ return false;
68
+ }
69
+
70
+ if (!vfp_access_check(s)) {
71
+ return true;
72
+ }
73
+
74
+ fpst = get_fpstatus_ptr(false);
75
+ ahp_mode = get_ahp_flag();
76
+ tmp = tcg_temp_new_i32();
77
+ vm = tcg_temp_new_i64();
78
+
79
+ neon_load_reg64(vm, a->vm);
80
+ gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
81
+ tcg_temp_free_i64(vm);
82
+ tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
83
+ tcg_temp_free_i32(ahp_mode);
84
+ tcg_temp_free_ptr(fpst);
85
+ tcg_temp_free_i32(tmp);
86
+ return true;
87
+}
88
diff --git a/target/arm/translate.c b/target/arm/translate.c
89
index XXXXXXX..XXXXXXX 100644
90
--- a/target/arm/translate.c
91
+++ b/target/arm/translate.c
92
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
93
#define VFP_SREG_M(insn) VFP_SREG(insn, 0, 5)
94
#define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn, 0, 5)
95
96
-/* Move between integer and VFP cores. */
97
-static TCGv_i32 gen_vfp_mrs(void)
98
-{
99
- TCGv_i32 tmp = tcg_temp_new_i32();
100
- tcg_gen_mov_i32(tmp, cpu_F0s);
101
- return tmp;
102
-}
103
-
104
-static void gen_vfp_msr(TCGv_i32 tmp)
105
-{
106
- tcg_gen_mov_i32(cpu_F0s, tmp);
107
- tcg_temp_free_i32(tmp);
108
-}
109
-
110
static void gen_neon_dup_low16(TCGv_i32 var)
111
{
21
{
112
TCGv_i32 tmp = tcg_temp_new_i32();
22
- GICv3State *s = ARM_GICV3_COMMON(dev);
113
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
23
+ GICv3State *s = ARM_GICV3_COMMON(obj);
24
int i;
25
26
for (i = 0; i < s->num_cpu; i++) {
27
@@ -XXX,XX +XXX,XX @@ static Property arm_gicv3_common_properties[] = {
28
static void arm_gicv3_common_class_init(ObjectClass *klass, void *data)
114
{
29
{
115
uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
30
DeviceClass *dc = DEVICE_CLASS(klass);
116
int dp, veclen;
31
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
117
- TCGv_i32 tmp;
32
ARMLinuxBootIfClass *albifc = ARM_LINUX_BOOT_IF_CLASS(klass);
118
- TCGv_i32 tmp2;
33
119
34
- dc->reset = arm_gicv3_common_reset;
120
if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
35
+ rc->phases.hold = arm_gicv3_common_reset_hold;
121
return 1;
36
dc->realize = arm_gicv3_common_realize;
122
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
37
device_class_set_props(dc, arm_gicv3_common_properties);
123
return 1;
38
dc->vmsd = &vmstate_gicv3;
124
case 15:
125
switch (rn) {
126
- case 0 ... 5:
127
- case 8 ... 11:
128
+ case 0 ... 11:
129
/* Already handled by decodetree */
130
return 1;
131
default:
132
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
133
if (op == 15) {
134
/* rn is opcode, encoded as per VFP_SREG_N. */
135
switch (rn) {
136
- case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
137
- case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
138
- if (dp) {
139
- if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
140
- return 1;
141
- }
142
- } else {
143
- if (!dc_isar_feature(aa32_fp16_spconv, s)) {
144
- return 1;
145
- }
146
- }
147
- rd_is_dp = false;
148
- break;
149
-
150
case 0x0c: /* vrintr */
151
case 0x0d: /* vrintz */
152
case 0x0e: /* vrintx */
153
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
154
switch (op) {
155
case 15: /* extension space */
156
switch (rn) {
157
- case 6: /* vcvtb.f16.f32, vcvtb.f16.f64 */
158
- {
159
- TCGv_ptr fpst = get_fpstatus_ptr(false);
160
- TCGv_i32 ahp = get_ahp_flag();
161
- tmp = tcg_temp_new_i32();
162
-
163
- if (dp) {
164
- gen_helper_vfp_fcvt_f64_to_f16(tmp, cpu_F0d,
165
- fpst, ahp);
166
- } else {
167
- gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s,
168
- fpst, ahp);
169
- }
170
- tcg_temp_free_i32(ahp);
171
- tcg_temp_free_ptr(fpst);
172
- gen_mov_F0_vreg(0, rd);
173
- tmp2 = gen_vfp_mrs();
174
- tcg_gen_andi_i32(tmp2, tmp2, 0xffff0000);
175
- tcg_gen_or_i32(tmp, tmp, tmp2);
176
- tcg_temp_free_i32(tmp2);
177
- gen_vfp_msr(tmp);
178
- break;
179
- }
180
- case 7: /* vcvtt.f16.f32, vcvtt.f16.f64 */
181
- {
182
- TCGv_ptr fpst = get_fpstatus_ptr(false);
183
- TCGv_i32 ahp = get_ahp_flag();
184
- tmp = tcg_temp_new_i32();
185
- if (dp) {
186
- gen_helper_vfp_fcvt_f64_to_f16(tmp, cpu_F0d,
187
- fpst, ahp);
188
- } else {
189
- gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s,
190
- fpst, ahp);
191
- }
192
- tcg_temp_free_i32(ahp);
193
- tcg_temp_free_ptr(fpst);
194
- tcg_gen_shli_i32(tmp, tmp, 16);
195
- gen_mov_F0_vreg(0, rd);
196
- tmp2 = gen_vfp_mrs();
197
- tcg_gen_ext16u_i32(tmp2, tmp2);
198
- tcg_gen_or_i32(tmp, tmp, tmp2);
199
- tcg_temp_free_i32(tmp2);
200
- gen_vfp_msr(tmp);
201
- break;
202
- }
203
case 12: /* vrintr */
204
{
205
TCGv_ptr fpst = get_fpstatus_ptr(0);
206
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
207
index XXXXXXX..XXXXXXX 100644
208
--- a/target/arm/vfp.decode
209
+++ b/target/arm/vfp.decode
210
@@ -XXX,XX +XXX,XX @@ VCVT_f32_f16 ---- 1110 1.11 0010 .... 1010 t:1 1.0 .... \
211
vd=%vd_sp vm=%vm_sp
212
VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
213
vd=%vd_dp vm=%vm_sp
214
+
215
+# VCVTB and VCVTT to f16: Vd format is always vd_sp; Vm format depends on size bit
216
+VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
217
+ vd=%vd_sp vm=%vm_sp
218
+VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
219
+ vd=%vd_sp vm=%vm_dp
220
--
2.20.1

--
2.25.1
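For the 3-phase reset conversions threaded through this queue (as in the GICv3 patch above), the mechanical recipe is small enough to show once. A sketch with placeholder names — MyDevState/MYDEV are illustrative, only ResettableClass and rc->phases.hold are the real hooks used above:

/* Before: a DeviceClass::reset handler.  After: the Resettable
 * "hold" phase.  This is the leaf-class case, with no parent
 * reset behaviour to preserve.
 */
static void mydev_reset_hold(Object *obj)
{
    MyDevState *s = MYDEV(obj);

    s->ctlr = 0;   /* ... return registers to their reset values ... */
}

static void mydev_class_init(ObjectClass *klass, void *data)
{
    ResettableClass *rc = RESETTABLE_CLASS(klass);

    rc->phases.hold = mydev_reset_hold;   /* was: dc->reset = ... */
}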
Convert the VCVTA/VCVTN/VCVTP/VCVTM instructions to decodetree.
trans_VCVT() is temporarily left in translate.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c       | 72 +++++++++++++++++-------------
 target/arm/vfp-uncond.decode |  6 +++
 2 files changed, 39 insertions(+), 39 deletions(-)

Convert the TYPE_KVM_ARM_GICV3 device to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-7-peter.maydell@linaro.org
---
 hw/intc/arm_gicv3_kvm.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
11
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
12
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate.c
13
--- a/hw/intc/arm_gicv3_kvm.c
14
+++ b/target/arm/translate.c
14
+++ b/hw/intc/arm_gicv3_kvm.c
15
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
15
@@ -XXX,XX +XXX,XX @@ DECLARE_OBJ_CHECKERS(GICv3State, KVMARMGICv3Class,
16
return true;
16
struct KVMARMGICv3Class {
17
ARMGICv3CommonClass parent_class;
18
DeviceRealize parent_realize;
19
- void (*parent_reset)(DeviceState *dev);
20
+ ResettablePhases parent_phases;
21
};
22
23
static void kvm_arm_gicv3_set_irq(void *opaque, int irq, int level)
24
@@ -XXX,XX +XXX,XX @@ static void arm_gicv3_icc_reset(CPUARMState *env, const ARMCPRegInfo *ri)
25
c->icc_ctlr_el1[GICV3_S] = c->icc_ctlr_el1[GICV3_NS];
17
}
26
}
18
27
19
-static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
28
-static void kvm_arm_gicv3_reset(DeviceState *dev)
20
- int rounding)
29
+static void kvm_arm_gicv3_reset_hold(Object *obj)
21
+static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
22
{
30
{
23
- bool is_signed = extract32(insn, 7, 1);
31
- GICv3State *s = ARM_GICV3_COMMON(dev);
24
- TCGv_ptr fpst = get_fpstatus_ptr(0);
32
+ GICv3State *s = ARM_GICV3_COMMON(obj);
25
+ uint32_t rd, rm;
33
KVMARMGICv3Class *kgc = KVM_ARM_GICV3_GET_CLASS(s);
26
+ bool dp = a->dp;
34
27
+ TCGv_ptr fpst;
35
DPRINTF("Reset\n");
28
TCGv_i32 tcg_rmode, tcg_shift;
36
29
+ int rounding = fp_decode_rm[a->rm];
37
- kgc->parent_reset(dev);
30
+ bool is_signed = a->op;
38
+ if (kgc->parent_phases.hold) {
31
+
39
+ kgc->parent_phases.hold(obj);
32
+ if (!dc_isar_feature(aa32_vcvt_dr, s)) {
33
+ return false;
34
+ }
40
+ }
35
+
41
36
+ /* UNDEF accesses to D16-D31 if they don't exist */
42
if (s->migration_blocker) {
37
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
43
DPRINTF("Cannot put kernel gic state, no kernel interface\n");
38
+ return false;
44
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
39
+ }
45
static void kvm_arm_gicv3_class_init(ObjectClass *klass, void *data)
40
+ rd = a->vd;
46
{
41
+ rm = a->vm;
47
DeviceClass *dc = DEVICE_CLASS(klass);
42
+
48
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
43
+ if (!vfp_access_check(s)) {
49
ARMGICv3CommonClass *agcc = ARM_GICV3_COMMON_CLASS(klass);
44
+ return true;
50
KVMARMGICv3Class *kgc = KVM_ARM_GICV3_CLASS(klass);
45
+ }
51
46
+
52
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_class_init(ObjectClass *klass, void *data)
47
+ fpst = get_fpstatus_ptr(0);
53
agcc->post_load = kvm_arm_gicv3_put;
48
54
device_class_set_parent_realize(dc, kvm_arm_gicv3_realize,
49
tcg_shift = tcg_const_i32(0);
55
&kgc->parent_realize);
50
56
- device_class_set_parent_reset(dc, kvm_arm_gicv3_reset, &kgc->parent_reset);
51
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
57
+ resettable_class_set_parent_phases(rc, NULL, kvm_arm_gicv3_reset_hold, NULL,
52
if (dp) {
58
+ &kgc->parent_phases);
53
TCGv_i64 tcg_double, tcg_res;
54
TCGv_i32 tcg_tmp;
55
- /* Rd is encoded as a single precision register even when the source
56
- * is double precision.
57
- */
58
- rd = ((rd << 1) & 0x1e) | ((rd >> 4) & 0x1);
59
tcg_double = tcg_temp_new_i64();
60
tcg_res = tcg_temp_new_i64();
61
tcg_tmp = tcg_temp_new_i32();
62
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
63
64
tcg_temp_free_ptr(fpst);
65
66
- return 0;
67
-}
68
-
69
-static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
70
-{
71
- uint32_t rd, rm, dp = extract32(insn, 8, 1);
72
-
73
- if (dp) {
74
- VFP_DREG_D(rd, insn);
75
- VFP_DREG_M(rm, insn);
76
- } else {
77
- rd = VFP_SREG_D(insn);
78
- rm = VFP_SREG_M(insn);
79
- }
80
-
81
- if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
82
- dc_isar_feature(aa32_vcvt_dr, s)) {
83
- /* VCVTA, VCVTN, VCVTP, VCVTM */
84
- int rounding = fp_decode_rm[extract32(insn, 16, 2)];
85
- return handle_vcvt(insn, rd, rm, dp, rounding);
86
- }
87
- return 1;
88
+ return true;
89
}
59
}
90
60
91
/*
61
static const TypeInfo kvm_arm_gicv3_info = {
92
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
93
}
94
}
95
96
+ if (extract32(insn, 28, 4) == 0xf) {
97
+ /*
98
+ * Encodings with T=1 (Thumb) or unconditional (ARM): these
99
+ * were all handled by the decodetree decoder, so any insn
100
+ * patterns which get here must be UNDEF.
101
+ */
102
+ return 1;
103
+ }
104
+
105
/*
106
* FIXME: this access check should not take precedence over UNDEF
107
* for invalid encodings; we will generate incorrect syndrome information
108
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
109
return 0;
110
}
111
112
- if (extract32(insn, 28, 4) == 0xf) {
113
- /*
114
- * Encodings with T=1 (Thumb) or unconditional (ARM):
115
- * only used for the "miscellaneous VFP features" added in v8A
116
- * and v7M (and gated on the MVFR2.FPMisc field).
117
- */
118
- return disas_vfp_misc_insn(s, insn);
119
- }
120
-
121
dp = ((insn & 0xf00) == 0xb00);
122
switch ((insn >> 24) & 0xf) {
123
case 0xe:
124
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
125
index XXXXXXX..XXXXXXX 100644
126
--- a/target/arm/vfp-uncond.decode
127
+++ b/target/arm/vfp-uncond.decode
128
@@ -XXX,XX +XXX,XX @@ VRINT 1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
129
vm=%vm_sp vd=%vd_sp dp=0
130
VRINT 1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
131
vm=%vm_dp vd=%vd_dp dp=1
132
+
133
+# VCVT float to int with specified rounding mode; Vd is always single-precision
134
+VCVT 1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
135
+ vm=%vm_sp vd=%vd_sp dp=0
136
+VCVT 1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
137
+ vm=%vm_dp vd=%vd_sp dp=1
138
--
2.20.1

--
2.25.1
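The KVM GICv3 patch above also shows the subclass side of the conversion: where device_class_set_parent_reset() used to save the parent's reset handler, the 3-phase equivalent saves the parent's whole phase struct and chains to it. The generic shape, with MyDev names as placeholders:

struct MyDevClass {
    SomeParentClass parent_class;
    ResettablePhases parent_phases;  /* was: void (*parent_reset)(DeviceState *) */
};

static void mydev_reset_hold(Object *obj)
{
    MyDevClass *c = MYDEV_GET_CLASS(obj);

    if (c->parent_phases.hold) {
        c->parent_phases.hold(obj);   /* run the parent's hold phase first */
    }
    /* ... subclass-specific reset work ... */
}

static void mydev_class_init(ObjectClass *klass, void *data)
{
    ResettableClass *rc = RESETTABLE_CLASS(klass);
    MyDevClass *c = MYDEV_CLASS(klass);

    resettable_class_set_parent_phases(rc, NULL, mydev_reset_hold, NULL,
                                       &c->parent_phases);
}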
Convert the VRINTA/VRINTN/VRINTP/VRINTM instructions to decodetree.
Again, trans_VRINT() is temporarily left in translate.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c       | 60 +++++++++++++++++++++++-------------
 target/arm/vfp-uncond.decode |  5 +++
 2 files changed, 43 insertions(+), 22 deletions(-)

Convert the TYPE_ARM_GICV3_ITS_COMMON parent class to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221109161444.3397405-8-peter.maydell@linaro.org
---
 hw/intc/arm_gicv3_its_common.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
11
diff --git a/hw/intc/arm_gicv3_its_common.c b/hw/intc/arm_gicv3_its_common.c
11
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
12
--- a/target/arm/translate.c
13
--- a/hw/intc/arm_gicv3_its_common.c
13
+++ b/target/arm/translate.c
14
+++ b/hw/intc/arm_gicv3_its_common.c
14
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
15
@@ -XXX,XX +XXX,XX @@ void gicv3_its_init_mmio(GICv3ITSState *s, const MemoryRegionOps *ops,
15
return true;
16
msi_nonbroken = true;
16
}
17
}
17
18
18
-static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
19
-static void gicv3_its_common_reset(DeviceState *dev)
19
- int rounding)
20
+static void gicv3_its_common_reset_hold(Object *obj)
20
+/*
21
+ * Table for converting the most common AArch32 encoding of
22
+ * rounding mode to arm_fprounding order (which matches the
23
+ * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
24
+ */
25
+static const uint8_t fp_decode_rm[] = {
26
+ FPROUNDING_TIEAWAY,
27
+ FPROUNDING_TIEEVEN,
28
+ FPROUNDING_POSINF,
29
+ FPROUNDING_NEGINF,
30
+};
31
+
32
+static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
33
{
21
{
34
- TCGv_ptr fpst = get_fpstatus_ptr(0);
22
- GICv3ITSState *s = ARM_GICV3_ITS_COMMON(dev);
35
+ uint32_t rd, rm;
23
+ GICv3ITSState *s = ARM_GICV3_ITS_COMMON(obj);
36
+ bool dp = a->dp;
24
37
+ TCGv_ptr fpst;
25
s->ctlr = 0;
38
TCGv_i32 tcg_rmode;
26
s->cbaser = 0;
39
+ int rounding = fp_decode_rm[a->rm];
27
@@ -XXX,XX +XXX,XX @@ static void gicv3_its_common_reset(DeviceState *dev)
40
+
28
static void gicv3_its_common_class_init(ObjectClass *klass, void *data)
41
+ if (!dc_isar_feature(aa32_vrint, s)) {
29
{
42
+ return false;
30
DeviceClass *dc = DEVICE_CLASS(klass);
43
+ }
31
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
44
+
32
45
+ /* UNDEF accesses to D16-D31 if they don't exist */
33
- dc->reset = gicv3_its_common_reset;
46
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
34
+ rc->phases.hold = gicv3_its_common_reset_hold;
47
+ ((a->vm | a->vd) & 0x10)) {
35
dc->vmsd = &vmstate_its;
48
+ return false;
49
+ }
50
+ rd = a->vd;
51
+ rm = a->vm;
52
+
53
+ if (!vfp_access_check(s)) {
54
+ return true;
55
+ }
56
+
57
+ fpst = get_fpstatus_ptr(0);
58
59
tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
60
gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
61
@@ -XXX,XX +XXX,XX @@ static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
62
tcg_temp_free_i32(tcg_rmode);
63
64
tcg_temp_free_ptr(fpst);
65
- return 0;
66
+ return true;
67
}
36
}
68
37
69
static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
70
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
71
return 0;
72
}
73
74
-/* Table for converting the most common AArch32 encoding of
75
- * rounding mode to arm_fprounding order (which matches the
76
- * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
77
- */
78
-static const uint8_t fp_decode_rm[] = {
79
- FPROUNDING_TIEAWAY,
80
- FPROUNDING_TIEEVEN,
81
- FPROUNDING_POSINF,
82
- FPROUNDING_NEGINF,
83
-};
84
-
85
static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
86
{
87
uint32_t rd, rm, dp = extract32(insn, 8, 1);
88
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
89
rm = VFP_SREG_M(insn);
90
}
91
92
- if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
93
- dc_isar_feature(aa32_vrint, s)) {
94
- /* VRINTA, VRINTN, VRINTP, VRINTM */
95
- int rounding = fp_decode_rm[extract32(insn, 16, 2)];
96
- return handle_vrint(insn, rd, rm, dp, rounding);
97
- } else if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
98
- dc_isar_feature(aa32_vcvt_dr, s)) {
99
+ if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
100
+ dc_isar_feature(aa32_vcvt_dr, s)) {
101
/* VCVTA, VCVTN, VCVTP, VCVTM */
102
int rounding = fp_decode_rm[extract32(insn, 16, 2)];
103
return handle_vcvt(insn, rd, rm, dp, rounding);
104
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
105
index XXXXXXX..XXXXXXX 100644
106
--- a/target/arm/vfp-uncond.decode
107
+++ b/target/arm/vfp-uncond.decode
108
@@ -XXX,XX +XXX,XX @@ VMINMAXNM 1111 1110 1.00 .... .... 1010 . op:1 .0 .... \
109
vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
110
VMINMAXNM 1111 1110 1.00 .... .... 1011 . op:1 .0 .... \
111
vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
112
+
113
+VRINT 1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
114
+ vm=%vm_sp vd=%vd_sp dp=0
115
+VRINT 1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
116
+ vm=%vm_dp vd=%vd_dp dp=1
117
--
2.20.1

--
2.25.1
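One detail worth pulling out of the VRINT patch above: the two-bit rm field captured by the decode pattern indexes the FPDecodeRM() mapping table, and the resulting arm_fprounding value is swapped into the fpstatus around the operation. Condensed from the hunks above — the surrounding trans_VRINT() body is elided, and the comments on the restore idiom are my gloss:

static const uint8_t fp_decode_rm[] = {
    FPROUNDING_TIEAWAY,  /* rm == 0b00: VRINTA / VCVTA */
    FPROUNDING_TIEEVEN,  /* rm == 0b01: VRINTN / VCVTN */
    FPROUNDING_POSINF,   /* rm == 0b10: VRINTP / VCVTP */
    FPROUNDING_NEGINF,   /* rm == 0b11: VRINTM / VCVTM */
};

int rounding = fp_decode_rm[a->rm];
TCGv_i32 tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);  /* install new mode */
/* ... emit the rounding/conversion operation ... */
gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);  /* helper returned the
                                                    * old mode, so the same
                                                    * call restores it */
tcg_temp_free_i32(tcg_rmode);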
Convert the VMINNM and VMAXNM instructions to decodetree.
As with VSEL, we leave the trans_VMINMAXNM() function
in translate.c for the moment.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c       | 41 ++++++++++++++++++++++++------------
 target/arm/vfp-uncond.decode |  5 +++++
 2 files changed, 33 insertions(+), 13 deletions(-)

Convert the TYPE_ARM_GICV3_ITS device to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-9-peter.maydell@linaro.org
---
 hw/intc/arm_gicv3_its.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
11
diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
13
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/translate.c
13
--- a/hw/intc/arm_gicv3_its.c
15
+++ b/target/arm/translate.c
14
+++ b/hw/intc/arm_gicv3_its.c
16
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
15
@@ -XXX,XX +XXX,XX @@ DECLARE_OBJ_CHECKERS(GICv3ITSState, GICv3ITSClass,
17
return true;
16
17
struct GICv3ITSClass {
18
GICv3ITSCommonClass parent_class;
19
- void (*parent_reset)(DeviceState *dev);
20
+ ResettablePhases parent_phases;
21
};
22
23
/*
24
@@ -XXX,XX +XXX,XX @@ static void gicv3_arm_its_realize(DeviceState *dev, Error **errp)
25
}
18
}
26
}
19
27
20
-static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
28
-static void gicv3_its_reset(DeviceState *dev)
21
- uint32_t rm, uint32_t dp)
29
+static void gicv3_its_reset_hold(Object *obj)
22
+static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
23
{
30
{
24
- uint32_t vmin = extract32(insn, 6, 1);
31
- GICv3ITSState *s = ARM_GICV3_ITS_COMMON(dev);
25
- TCGv_ptr fpst = get_fpstatus_ptr(0);
32
+ GICv3ITSState *s = ARM_GICV3_ITS_COMMON(obj);
26
+ uint32_t rd, rn, rm;
33
GICv3ITSClass *c = ARM_GICV3_ITS_GET_CLASS(s);
27
+ bool dp = a->dp;
34
28
+ bool vmin = a->op;
35
- c->parent_reset(dev);
29
+ TCGv_ptr fpst;
36
+ if (c->parent_phases.hold) {
30
+
37
+ c->parent_phases.hold(obj);
31
+ if (!dc_isar_feature(aa32_vminmaxnm, s)) {
32
+ return false;
33
+ }
38
+ }
34
+
39
35
+ /* UNDEF accesses to D16-D31 if they don't exist */
40
/* Quiescent bit reset to 1 */
36
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
41
s->ctlr = FIELD_DP32(s->ctlr, GITS_CTLR, QUIESCENT, 1);
37
+ ((a->vm | a->vn | a->vd) & 0x10)) {
42
@@ -XXX,XX +XXX,XX @@ static Property gicv3_its_props[] = {
38
+ return false;
43
static void gicv3_its_class_init(ObjectClass *klass, void *data)
39
+ }
44
{
40
+ rd = a->vd;
45
DeviceClass *dc = DEVICE_CLASS(klass);
41
+ rn = a->vn;
46
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
42
+ rm = a->vm;
47
GICv3ITSClass *ic = ARM_GICV3_ITS_CLASS(klass);
43
+
48
GICv3ITSCommonClass *icc = ARM_GICV3_ITS_COMMON_CLASS(klass);
44
+ if (!vfp_access_check(s)) {
49
45
+ return true;
50
dc->realize = gicv3_arm_its_realize;
46
+ }
51
device_class_set_props(dc, gicv3_its_props);
47
+
52
- device_class_set_parent_reset(dc, gicv3_its_reset, &ic->parent_reset);
48
+ fpst = get_fpstatus_ptr(0);
53
+ resettable_class_set_parent_phases(rc, NULL, gicv3_its_reset_hold, NULL,
49
54
+ &ic->parent_phases);
50
if (dp) {
55
icc->post_load = gicv3_its_post_load;
51
TCGv_i64 frn, frm, dest;
52
@@ -XXX,XX +XXX,XX @@ static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
53
}
54
55
tcg_temp_free_ptr(fpst);
56
- return 0;
57
+ return true;
58
}
56
}
59
57
60
static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
61
@@ -XXX,XX +XXX,XX @@ static const uint8_t fp_decode_rm[] = {
62
63
static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
64
{
65
- uint32_t rd, rn, rm, dp = extract32(insn, 8, 1);
66
+ uint32_t rd, rm, dp = extract32(insn, 8, 1);
67
68
if (dp) {
69
VFP_DREG_D(rd, insn);
70
- VFP_DREG_N(rn, insn);
71
VFP_DREG_M(rm, insn);
72
} else {
73
rd = VFP_SREG_D(insn);
74
- rn = VFP_SREG_N(insn);
75
rm = VFP_SREG_M(insn);
76
}
77
78
- if ((insn & 0x0fb00e10) == 0x0e800a00 &&
79
- dc_isar_feature(aa32_vminmaxnm, s)) {
80
- return handle_vminmaxnm(insn, rd, rn, rm, dp);
81
- } else if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
82
- dc_isar_feature(aa32_vrint, s)) {
83
+ if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
84
+ dc_isar_feature(aa32_vrint, s)) {
85
/* VRINTA, VRINTN, VRINTP, VRINTM */
86
int rounding = fp_decode_rm[extract32(insn, 16, 2)];
87
return handle_vrint(insn, rd, rm, dp, rounding);
88
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
89
index XXXXXXX..XXXXXXX 100644
90
--- a/target/arm/vfp-uncond.decode
91
+++ b/target/arm/vfp-uncond.decode
92
@@ -XXX,XX +XXX,XX @@ VSEL 1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
93
vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
94
VSEL 1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
95
vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
96
+
97
+VMINMAXNM 1111 1110 1.00 .... .... 1010 . op:1 .0 .... \
98
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
99
+VMINMAXNM 1111 1110 1.00 .... .... 1011 . op:1 .0 .... \
100
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
101
--
2.20.1

--
2.25.1
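A recurring guard in these trans_ functions (trans_VMINMAXNM() above, and the VRINT/VCVT patches before it) is the D16-D31 existence check: double-precision register numbers with bit 4 set only exist when the aa32_fp_d32 feature is present. The shape, copied from the patch above for reference:

/* UNDEF accesses to D16-D31 if they don't exist: bit 4 of a register
 * number selects the upper bank, so OR-ing all operands together and
 * testing 0x10 catches any use of D16..D31 in one comparison.
 */
if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
    ((a->vm | a->vn | a->vd) & 0x10)) {
    return false;
}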
Convert the VFP VABS instruction to decodetree.

Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or
VFPGen2OpDPFn because none of the operations which use this format
and support short vectors will need it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 167 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  12 ++-
 target/arm/vfp.decode          |   5 +
 3 files changed, 180 insertions(+), 4 deletions(-)

Convert the TYPE_KVM_ARM_ITS device to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-10-peter.maydell@linaro.org
---
 hw/intc/arm_gicv3_its_kvm.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
11
diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
16
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/translate-vfp.inc.c
13
--- a/hw/intc/arm_gicv3_its_kvm.c
18
+++ b/target/arm/translate-vfp.inc.c
14
+++ b/hw/intc/arm_gicv3_its_kvm.c
19
@@ -XXX,XX +XXX,XX @@ typedef void VFPGen3OpSPFn(TCGv_i32 vd,
15
@@ -XXX,XX +XXX,XX @@ DECLARE_OBJ_CHECKERS(GICv3ITSState, KVMARMITSClass,
20
typedef void VFPGen3OpDPFn(TCGv_i64 vd,
16
21
TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
17
struct KVMARMITSClass {
22
18
GICv3ITSCommonClass parent_class;
23
+/*
19
- void (*parent_reset)(DeviceState *dev);
24
+ * Types for callbacks for do_vfp_2op_sp() and do_vfp_2op_dp().
20
+ ResettablePhases parent_phases;
25
+ * The callback should emit code to write a value to vd (which
21
};
26
+ * should be written to only).
22
27
+ */
23
28
+typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
24
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_its_post_load(GICv3ITSState *s)
29
+typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
25
GITS_CTLR, &s->ctlr, true, &error_abort);
30
+
31
/*
32
* Perform a 3-operand VFP data processing instruction. fn is the
33
* callback to do the actual operation; this function deals with the
34
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
35
return true;
36
}
26
}
37
27
38
+static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
28
-static void kvm_arm_its_reset(DeviceState *dev)
39
+{
29
+static void kvm_arm_its_reset_hold(Object *obj)
40
+ uint32_t delta_m = 0;
30
{
41
+ uint32_t delta_d = 0;
31
- GICv3ITSState *s = ARM_GICV3_ITS_COMMON(dev);
42
+ uint32_t bank_mask = 0;
32
+ GICv3ITSState *s = ARM_GICV3_ITS_COMMON(obj);
43
+ int veclen = s->vec_len;
33
KVMARMITSClass *c = KVM_ARM_ITS_GET_CLASS(s);
44
+ TCGv_i32 f0, fd;
34
int i;
45
+
35
46
+ if (!dc_isar_feature(aa32_fpshvec, s) &&
36
- c->parent_reset(dev);
47
+ (veclen != 0 || s->vec_stride != 0)) {
37
+ if (c->parent_phases.hold) {
48
+ return false;
38
+ c->parent_phases.hold(obj);
49
+ }
39
+ }
50
+
40
51
+ if (!vfp_access_check(s)) {
41
if (kvm_device_check_attr(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
52
+ return true;
42
KVM_DEV_ARM_ITS_CTRL_RESET)) {
53
+ }
43
@@ -XXX,XX +XXX,XX @@ static Property kvm_arm_its_props[] = {
54
+
44
static void kvm_arm_its_class_init(ObjectClass *klass, void *data)
55
+ if (veclen > 0) {
56
+ bank_mask = 0x18;
57
+
58
+ /* Figure out what type of vector operation this is. */
59
+ if ((vd & bank_mask) == 0) {
60
+ /* scalar */
61
+ veclen = 0;
62
+ } else {
63
+ delta_d = s->vec_stride + 1;
64
+
65
+ if ((vm & bank_mask) == 0) {
66
+ /* mixed scalar/vector */
67
+ delta_m = 0;
68
+ } else {
69
+ /* vector */
70
+ delta_m = delta_d;
71
+ }
72
+ }
73
+ }
74
+
75
+ f0 = tcg_temp_new_i32();
76
+ fd = tcg_temp_new_i32();
77
+
78
+ neon_load_reg32(f0, vm);
79
+
80
+ for (;;) {
81
+ fn(fd, f0);
82
+ neon_store_reg32(fd, vd);
83
+
84
+ if (veclen == 0) {
85
+ break;
86
+ }
87
+
88
+ if (delta_m == 0) {
89
+ /* single source one-many */
90
+ while (veclen--) {
91
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
92
+ neon_store_reg32(fd, vd);
93
+ }
94
+ break;
95
+ }
96
+
97
+ /* Set up the operands for the next iteration */
98
+ veclen--;
99
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
100
+ vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
101
+ neon_load_reg32(f0, vm);
102
+ }
103
+
104
+ tcg_temp_free_i32(f0);
105
+ tcg_temp_free_i32(fd);
106
+
107
+ return true;
108
+}
109
+
110
+static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
111
+{
112
+ uint32_t delta_m = 0;
113
+ uint32_t delta_d = 0;
114
+ uint32_t bank_mask = 0;
115
+ int veclen = s->vec_len;
116
+ TCGv_i64 f0, fd;
117
+
118
+ /* UNDEF accesses to D16-D31 if they don't exist */
119
+ if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vm) & 0x10)) {
120
+ return false;
121
+ }
122
+
123
+ if (!dc_isar_feature(aa32_fpshvec, s) &&
124
+ (veclen != 0 || s->vec_stride != 0)) {
125
+ return false;
126
+ }
127
+
128
+ if (!vfp_access_check(s)) {
129
+ return true;
130
+ }
131
+
132
+ if (veclen > 0) {
133
+ bank_mask = 0xc;
134
+
135
+ /* Figure out what type of vector operation this is. */
136
+ if ((vd & bank_mask) == 0) {
137
+ /* scalar */
138
+ veclen = 0;
139
+ } else {
140
+ delta_d = (s->vec_stride >> 1) + 1;
141
+
142
+ if ((vm & bank_mask) == 0) {
143
+ /* mixed scalar/vector */
144
+ delta_m = 0;
145
+ } else {
146
+ /* vector */
147
+ delta_m = delta_d;
148
+ }
149
+ }
150
+ }
151
+
152
+ f0 = tcg_temp_new_i64();
153
+ fd = tcg_temp_new_i64();
154
+
155
+ neon_load_reg64(f0, vm);
156
+
157
+ for (;;) {
158
+ fn(fd, f0);
159
+ neon_store_reg64(fd, vd);
160
+
161
+ if (veclen == 0) {
162
+ break;
163
+ }
164
+
165
+ if (delta_m == 0) {
166
+ /* single source one-many */
167
+ while (veclen--) {
168
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
169
+ neon_store_reg64(fd, vd);
170
+ }
171
+ break;
172
+ }
173
+
174
+ /* Set up the operands for the next iteration */
175
+ veclen--;
176
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
177
+ vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
178
+ neon_load_reg64(f0, vm);
179
+ }
180
+
181
+ tcg_temp_free_i64(f0);
182
+ tcg_temp_free_i64(fd);
183
+
184
+ return true;
185
+}
186
+
187
static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
188
{
45
{
189
/* Note that order of inputs to the add matters for NaNs */
46
DeviceClass *dc = DEVICE_CLASS(klass);
190
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
47
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
191
tcg_temp_free_i64(fd);
48
GICv3ITSCommonClass *icc = ARM_GICV3_ITS_COMMON_CLASS(klass);
192
return true;
49
KVMARMITSClass *ic = KVM_ARM_ITS_CLASS(klass);
193
}
50
194
+
51
dc->realize = kvm_arm_its_realize;
195
+static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
52
device_class_set_props(dc, kvm_arm_its_props);
196
+{
53
- device_class_set_parent_reset(dc, kvm_arm_its_reset, &ic->parent_reset);
197
+ return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
54
+ resettable_class_set_parent_phases(rc, NULL, kvm_arm_its_reset_hold, NULL,
198
+}
55
+ &ic->parent_phases);
199
+
56
icc->send_msi = kvm_its_send_msi;
200
+static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
57
icc->pre_save = kvm_arm_its_pre_save;
201
+{
58
icc->post_load = kvm_arm_its_post_load;
202
+ return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
203
+}
204
diff --git a/target/arm/translate.c b/target/arm/translate.c
205
index XXXXXXX..XXXXXXX 100644
206
--- a/target/arm/translate.c
207
+++ b/target/arm/translate.c
208
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
209
case 0 ... 14:
210
/* Already handled by decodetree */
211
return 1;
212
+ case 15:
213
+ switch (rn) {
214
+ case 1:
215
+ /* Already handled by decodetree */
216
+ return 1;
217
+ default:
218
+ break;
219
+ }
220
default:
221
break;
222
}
223
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
224
/* rn is opcode, encoded as per VFP_SREG_N. */
225
switch (rn) {
226
case 0x00: /* vmov */
227
- case 0x01: /* vabs */
228
case 0x02: /* vneg */
229
case 0x03: /* vsqrt */
230
break;
231
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
232
case 0: /* cpy */
233
/* no-op */
234
break;
235
- case 1: /* abs */
236
- gen_vfp_abs(dp);
237
- break;
238
case 2: /* neg */
239
gen_vfp_neg(dp);
240
break;
241
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
242
index XXXXXXX..XXXXXXX 100644
243
--- a/target/arm/vfp.decode
244
+++ b/target/arm/vfp.decode
245
@@ -XXX,XX +XXX,XX @@ VMOV_imm_sp ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
246
vd=%vd_sp
247
VMOV_imm_dp ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
248
vd=%vd_dp
249
+
250
+VABS_sp ---- 1110 1.11 0000 .... 1010 11.0 .... \
251
+ vd=%vd_sp vm=%vm_sp
252
+VABS_dp ---- 1110 1.11 0000 .... 1011 11.0 .... \
253
+ vd=%vd_dp vm=%vm_dp
254
--
2.20.1

--
2.25.1
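The do_vfp_2op_sp() loop added in the VABS patch above keeps the old short-vector semantics: registers advance by vec_stride + 1 and wrap within their bank of eight single-precision registers. A standalone demo of the stepping formula (not QEMU code), which can be compiled and run to see a sequence:

#include <stdio.h>

/* Demo of the short-vector register stepping used above: vd advances
 * by delta_d and the masking keeps it inside its bank (here s16..s23).
 */
int main(void)
{
    const unsigned bank_mask = 0x18;   /* single precision: banks of 8 */
    const unsigned delta_d = 3;        /* vec_stride = 2 */
    unsigned vd = 20;                  /* s20, bank s16..s23 */

    for (int i = 0; i < 4; i++) {
        printf("iteration %d writes s%u\n", i, vd);
        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
    }
    return 0;   /* prints s20, s23, s18, s21: stays within the bank */
}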
Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 VFP encodings. At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.

We need to have one decoder for the unconditional insns and one for
the conditional insns, as otherwise the patterns for conditional
insns would incorrectly match against the unconditional ones too.

Since translate.c is over 14,000 lines long and we're going to be
touching pretty much every line of the VFP code as part of the
decodetree conversion, we create a new translate-vfp.inc.c to hold
the code which deals with VFP in the new scheme. It should be
possible to convert this into a standalone translation unit
eventually, but the conversion process will be much simpler if we
simply #include it midway through translate.c to start with.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/Makefile.objs       | 13 +++++++++++++
 target/arm/translate-vfp.inc.c | 31 +++++++++++++++++++++++++++++++
 target/arm/translate.c         | 19 +++++++++++++++++++
 target/arm/vfp-uncond.decode   | 28 ++++++++++++++++++++++++++++
 target/arm/vfp.decode          | 28 ++++++++++++++++++++++++++++
 5 files changed, 119 insertions(+)
 create mode 100644 target/arm/translate-vfp.inc.c
 create mode 100644 target/arm/vfp-uncond.decode
 create mode 100644 target/arm/vfp.decode

From: Schspa Shi <schspa@gmail.com>

We use a 32-bit value for linux,initrd-[start/end]; when loader_start
is above 4GB this passes a wrong initrd_start to the kernel, and the
kernel reports the following warning.

[    0.000000] ------------[ cut here ]------------
[    0.000000] initrd not fully accessible via the linear mapping -- please check your bootloader ...
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:355 arm64_memblock_init+0x158/0x244
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W 6.1.0-rc3-13250-g30a0b95b1335-dirty #28
[    0.000000] Hardware name: Horizon Sigi Virtual development board (DT)
[    0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : arm64_memblock_init+0x158/0x244
[    0.000000] lr : arm64_memblock_init+0x158/0x244
[    0.000000] sp : ffff800009273df0
[    0.000000] x29: ffff800009273df0 x28: 0000001000cc0010 x27: 0000800000000000
[    0.000000] x26: 000000000050a3e2 x25: ffff800008b46000 x24: ffff800008b46000
[    0.000000] x23: ffff800008a53000 x22: ffff800009420000 x21: ffff800008a53000
[    0.000000] x20: 0000000004000000 x19: 0000000004000000 x18: 00000000ffff1020
[    0.000000] x17: 6568632065736165 x16: 6c70202d2d20676e x15: 697070616d207261
[    0.000000] x14: 656e696c20656874 x13: 0a2e2e2e20726564 x12: 0000000000000000
[    0.000000] x11: 0000000000000000 x10: 00000000ffffffff x9 : 0000000000000000
[    0.000000] x8 : 0000000000000000 x7 : 796c6c756620746f x6 : 6e20647274696e69
[    0.000000] x5 : ffff8000093c7c47 x4 : ffff800008a2102f x3 : ffff800009273a88
[    0.000000] x2 : 80000000fffff038 x1 : 00000000000000c0 x0 : 0000000000000056
[    0.000000] Call trace:
[    0.000000]  arm64_memblock_init+0x158/0x244
[    0.000000]  setup_arch+0x164/0x1cc
[    0.000000]  start_kernel+0x94/0x4ac
[    0.000000]  __primary_switched+0xb4/0xbc
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000001000000000-0x0000001007ffffff]

This doesn't affect any machine types we currently support, because
for all of our machine types the RAM starts well below the 4GB
mark, but it does demonstrate that we're not currently writing
the device-tree properties quite as intended.

To fix it, we can change it to write these values to the dtb using a
type width matching #address-cells. This is the intended size for
these dtb properties, and is how u-boot, for instance, writes them,
although in practice the Linux kernel will cope with them being any
width as long as they're big enough to fit the value.

Signed-off-by: Schspa Shi <schspa@gmail.com>
Message-id: 20221129160724.75667-1-schspa@gmail.com
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/boot.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
56
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
31
index XXXXXXX..XXXXXXX 100644
57
index XXXXXXX..XXXXXXX 100644
32
--- a/target/arm/Makefile.objs
58
--- a/hw/arm/boot.c
33
+++ b/target/arm/Makefile.objs
59
+++ b/hw/arm/boot.c
34
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
60
@@ -XXX,XX +XXX,XX @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
35
     $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
36
     "GEN", $(TARGET_DIR)$@)
37
38
+target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
39
+    $(call quiet-command,\
40
+     $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
41
+     "GEN", $(TARGET_DIR)$@)
42
+
43
+target/arm/decode-vfp-uncond.inc.c: $(SRC_PATH)/target/arm/vfp-uncond.decode $(DECODETREE)
44
+    $(call quiet-command,\
45
+     $(PYTHON) $(DECODETREE) --static-decode disas_vfp_uncond -o $@ $<,\
46
+     "GEN", $(TARGET_DIR)$@)
47
+
48
target/arm/translate-sve.o: target/arm/decode-sve.inc.c
49
+target/arm/translate.o: target/arm/decode-vfp.inc.c
50
+target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
51
+
52
obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
53
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
54
new file mode 100644
55
index XXXXXXX..XXXXXXX
56
--- /dev/null
57
+++ b/target/arm/translate-vfp.inc.c
58
@@ -XXX,XX +XXX,XX @@
59
+/*
60
+ * ARM translation: AArch32 VFP instructions
61
+ *
62
+ * Copyright (c) 2003 Fabrice Bellard
63
+ * Copyright (c) 2005-2007 CodeSourcery
64
+ * Copyright (c) 2007 OpenedHand, Ltd.
65
+ * Copyright (c) 2019 Linaro, Ltd.
66
+ *
67
+ * This library is free software; you can redistribute it and/or
68
+ * modify it under the terms of the GNU Lesser General Public
69
+ * License as published by the Free Software Foundation; either
70
+ * version 2 of the License, or (at your option) any later version.
71
+ *
72
+ * This library is distributed in the hope that it will be useful,
73
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
74
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
75
+ * Lesser General Public License for more details.
76
+ *
77
+ * You should have received a copy of the GNU Lesser General Public
78
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
79
+ */
80
+
81
+/*
82
+ * This file is intended to be included from translate.c; it uses
83
+ * some macros and definitions provided by that file.
84
+ * It might be possible to convert it to a standalone .c file eventually.
85
+ */
86
+
87
+/* Include the generated VFP decoder */
88
+#include "decode-vfp.inc.c"
89
+#include "decode-vfp-uncond.inc.c"
90
diff --git a/target/arm/translate.c b/target/arm/translate.c
91
index XXXXXXX..XXXXXXX 100644
92
--- a/target/arm/translate.c
93
+++ b/target/arm/translate.c
94
@@ -XXX,XX +XXX,XX @@ static inline void gen_mov_vreg_F0(int dp, int reg)
95
96
#define ARM_CP_RW_BIT (1 << 20)
97
98
+/* Include the VFP decoder */
99
+#include "translate-vfp.inc.c"
100
+
101
static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
102
{
103
tcg_gen_ld_i64(var, cpu_env, offsetof(CPUARMState, iwmmxt.regs[reg]));
104
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
105
return 1;
106
}
61
}
107
62
108
+ /*
63
if (binfo->initrd_size) {
109
+ * If the decodetree decoder handles this insn it will always
64
- rc = qemu_fdt_setprop_cell(fdt, "/chosen", "linux,initrd-start",
110
+ * emit code to either execute the insn or generate an appropriate
65
- binfo->initrd_start);
111
+ * exception; so we don't need to ever return non-zero to tell
66
+ rc = qemu_fdt_setprop_sized_cells(fdt, "/chosen", "linux,initrd-start",
112
+ * the calling code to emit an UNDEF exception.
67
+ acells, binfo->initrd_start);
113
+ */
68
if (rc < 0) {
114
+ if (extract32(insn, 28, 4) == 0xf) {
69
fprintf(stderr, "couldn't set /chosen/linux,initrd-start\n");
115
+ if (disas_vfp_uncond(s, insn)) {
70
goto fail;
116
+ return 0;
71
}
117
+ }
72
118
+ } else {
73
- rc = qemu_fdt_setprop_cell(fdt, "/chosen", "linux,initrd-end",
119
+ if (disas_vfp(s, insn)) {
74
- binfo->initrd_start + binfo->initrd_size);
120
+ return 0;
75
+ rc = qemu_fdt_setprop_sized_cells(fdt, "/chosen", "linux,initrd-end",
121
+ }
76
+ acells,
122
+ }
77
+ binfo->initrd_start +
123
+
78
+ binfo->initrd_size);
124
/* FIXME: this access check should not take precedence over UNDEF
79
if (rc < 0) {
125
* for invalid encodings; we will generate incorrect syndrome information
80
fprintf(stderr, "couldn't set /chosen/linux,initrd-end\n");
126
* for attempts to execute invalid vfp/neon encodings with FP disabled.
81
goto fail;
127
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
128
new file mode 100644
129
index XXXXXXX..XXXXXXX
130
--- /dev/null
131
+++ b/target/arm/vfp-uncond.decode
132
@@ -XXX,XX +XXX,XX @@
133
+# AArch32 VFP instruction descriptions (unconditional insns)
134
+#
135
+# Copyright (c) 2019 Linaro, Ltd
136
+#
137
+# This library is free software; you can redistribute it and/or
138
+# modify it under the terms of the GNU Lesser General Public
139
+# License as published by the Free Software Foundation; either
140
+# version 2 of the License, or (at your option) any later version.
141
+#
142
+# This library is distributed in the hope that it will be useful,
143
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
144
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
145
+# Lesser General Public License for more details.
146
+#
147
+# You should have received a copy of the GNU Lesser General Public
148
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
149
+
150
+#
151
+# This file is processed by scripts/decodetree.py
152
+#
153
+# Encodings for the unconditional VFP instructions are here:
154
+# generally anything matching A32
155
+# 1111 1110 .... .... .... 101. ...0 ....
156
+# and T32
157
+# 1111 110. .... .... .... 101. .... ....
158
+# 1111 1110 .... .... .... 101. .... ....
159
+# (but those patterns might also cover some Neon instructions,
160
+# which do not live in this file.)
161
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
162
new file mode 100644
163
index XXXXXXX..XXXXXXX
164
--- /dev/null
165
+++ b/target/arm/vfp.decode
166
@@ -XXX,XX +XXX,XX @@
167
+# AArch32 VFP instruction descriptions (conditional insns)
168
+#
169
+# Copyright (c) 2019 Linaro, Ltd
170
+#
171
+# This library is free software; you can redistribute it and/or
172
+# modify it under the terms of the GNU Lesser General Public
173
+# License as published by the Free Software Foundation; either
174
+# version 2 of the License, or (at your option) any later version.
175
+#
176
+# This library is distributed in the hope that it will be useful,
177
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
178
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
179
+# Lesser General Public License for more details.
180
+#
181
+# You should have received a copy of the GNU Lesser General Public
182
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
183
+
184
+#
185
+# This file is processed by scripts/decodetree.py
186
+#
187
+# Encodings for the conditional VFP instructions are here:
188
+# generally anything matching A32
189
+# cccc 11.. .... .... .... 101. .... ....
190
+# and T32
191
+# 1110 110. .... .... .... 101. .... ....
192
+# 1110 1110 .... .... .... 101. .... ....
193
+# (but those patterns might also cover some Neon instructions,
194
+# which do not live in this file.)
195
--
2.20.1

--
2.25.1
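On the boot.c fix above: a dtb property sized by #address-cells is simply that many 32-bit big-endian cells, most significant first, which is why a 64-bit initrd address needs two cells when acells == 2. An illustrative encoder (not QEMU code; qemu_fdt_setprop_sized_cells() used in the patch produces the equivalent layout internally):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl(): dtb cells are big-endian */

/* Encode 'value' as acells consecutive 32-bit big-endian cells. */
static void write_cells(uint8_t *out, int acells, uint64_t value)
{
    for (int i = acells - 1; i >= 0; i--) {
        uint32_t cell = htonl((uint32_t)(value >> (32 * i)));
        memcpy(out, &cell, 4);
        out += 4;
    }
}

int main(void)
{
    uint8_t buf[8];

    write_cells(buf, 2, 0x1000000000ULL);   /* an initrd above 4GB */
    for (int i = 0; i < 8; i++) {
        printf("%02x ", buf[i]);            /* 00 00 00 10 00 00 00 00 */
    }
    printf("\n");
    return 0;
}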
The NSACR register allows secure code to configure the FPU
to be inaccessible to non-secure code. If the NSACR.CP10
bit is set then:
 * NS accesses to the FPU trap as UNDEF (ie to NS EL1 or EL2)
 * CPACR.{CP10,CP11} behave as if RAZ/WI
 * HCPTR.{TCP11,TCP10} behave as if RAO/WI

Note that we do not implement the NSACR.NSASEDIS bit which
gates only access to Advanced SIMD, in the same way that
we don't implement the equivalent CPACR.ASEDIS and HCPTR.TASE.

From: Zhuojia Shen <chaosdefinition@hotmail.com>

In CPUID registers exposed to userspace, some registers were missing
and some fields were not exposed. This patch aligns exposed ID
registers and their fields with what the upstream kernel currently
exposes.
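The RAZ/WI behaviour described in the NSACR paragraph above comes down to masking in the CPACR accessors; distilled from the cpacr_write()/cpacr_read() changes this patch makes (write side shown, using helpers that appear in the hunks):

/* When an AArch32 EL3 clears NSACR.CP10, non-secure writes to
 * CPACR.{CP10,CP11} (bits [23:20]) are ignored: strip them from the
 * incoming value and keep whatever the secure world last stored.
 */
if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
    !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
    value &= ~(0xf << 20);                        /* drop CP10/CP11    */
    value |= env->cp15.cpacr_el1 & (0xf << 20);   /* keep stored bits  */
}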
11
7
12
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Specifically, the following new ID registers/fields are exposed to
9
userspace:
10
11
ID_AA64PFR1_EL1.BT: bits 3-0
12
ID_AA64PFR1_EL1.MTE: bits 11-8
13
ID_AA64PFR1_EL1.SME: bits 27-24
14
15
ID_AA64ZFR0_EL1.SVEver: bits 3-0
16
ID_AA64ZFR0_EL1.AES: bits 7-4
17
ID_AA64ZFR0_EL1.BitPerm: bits 19-16
18
ID_AA64ZFR0_EL1.BF16: bits 23-20
19
ID_AA64ZFR0_EL1.SHA3: bits 35-32
20
ID_AA64ZFR0_EL1.SM4: bits 43-40
21
ID_AA64ZFR0_EL1.I8MM: bits 47-44
22
ID_AA64ZFR0_EL1.F32MM: bits 55-52
23
ID_AA64ZFR0_EL1.F64MM: bits 59-56
24
25
ID_AA64SMFR0_EL1.F32F32: bit 32
26
ID_AA64SMFR0_EL1.B16F32: bit 34
27
ID_AA64SMFR0_EL1.F16F32: bit 35
28
ID_AA64SMFR0_EL1.I8I32: bits 39-36
29
ID_AA64SMFR0_EL1.F64F64: bit 48
30
ID_AA64SMFR0_EL1.I16I64: bits 55-52
31
ID_AA64SMFR0_EL1.FA64: bit 63
32
33
ID_AA64MMFR0_EL1.ECV: bits 63-60
34
35
ID_AA64MMFR1_EL1.AFP: bits 47-44
36
37
ID_AA64MMFR2_EL1.AT: bits 35-32
38
39
ID_AA64ISAR0_EL1.RNDR: bits 63-60
40
41
ID_AA64ISAR1_EL1.FRINTTS: bits 35-32
42
ID_AA64ISAR1_EL1.BF16: bits 47-44
43
ID_AA64ISAR1_EL1.DGH: bits 51-48
44
ID_AA64ISAR1_EL1.I8MM: bits 55-52
45
46
ID_AA64ISAR2_EL1.WFxT: bits 3-0
47
ID_AA64ISAR2_EL1.RPRES: bits 7-4
48
ID_AA64ISAR2_EL1.GPA3: bits 11-8
49
ID_AA64ISAR2_EL1.APA3: bits 15-12
50
51
The code is also refactored to use symbolic names for ID register fields
52
for better readability and maintainability.
53
54
Signed-off-by: Zhuojia Shen <chaosdefinition@hotmail.com>
55
Message-id: DS7PR12MB6309BC9133877BCC6FC419FEAC0D9@DS7PR12MB6309.namprd12.prod.outlook.com
56
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
13
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
57
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
14
Message-id: 20190510110357.18825-1-peter.maydell@linaro.org
15
---
58
---
16
target/arm/helper.c | 75 +++++++++++++++++++++++++++++++++++++++++++--
59
target/arm/helper.c | 96 +++++++++++++++++++++++++++++++++++++--------
17
1 file changed, 73 insertions(+), 2 deletions(-)
60
1 file changed, 79 insertions(+), 17 deletions(-)
18
61
19
diff --git a/target/arm/helper.c b/target/arm/helper.c
62
diff --git a/target/arm/helper.c b/target/arm/helper.c
20
index XXXXXXX..XXXXXXX 100644
63
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper.c
64
--- a/target/arm/helper.c
22
+++ b/target/arm/helper.c
65
+++ b/target/arm/helper.c
23
@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
66
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
24
}
67
#ifdef CONFIG_USER_ONLY
25
value &= mask;
68
static const ARMCPRegUserSpaceInfo v8_user_idregs[] = {
26
}
69
{ .name = "ID_AA64PFR0_EL1",
27
+
70
- .exported_bits = 0x000f000f00ff0000,
28
+ /*
71
- .fixed_bits = 0x0000000000000011 },
29
+ * For A-profile AArch32 EL3 (but not M-profile secure mode), if NSACR.CP10
72
+ .exported_bits = R_ID_AA64PFR0_FP_MASK |
30
+ * is 0 then CPACR.{CP11,CP10} ignore writes and read as 0b00.
73
+ R_ID_AA64PFR0_ADVSIMD_MASK |
31
+ */
74
+ R_ID_AA64PFR0_SVE_MASK |
32
+ if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
75
+ R_ID_AA64PFR0_DIT_MASK,
33
+ !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
76
+        value &= ~(0xf << 20);
+        value |= env->cp15.cpacr_el1 & (0xf << 20);
+    }
+
     env->cp15.cpacr_el1 = value;
 }
 
+static uint64_t cpacr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    /*
+     * For A-profile AArch32 EL3 (but not M-profile secure mode), if NSACR.CP10
+     * is 0 then CPACR.{CP11,CP10} ignore writes and read as 0b00.
+     */
+    uint64_t value = env->cp15.cpacr_el1;
+
+    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+        value &= ~(0xf << 20);
+    }
+    return value;
+}
+
+
 static void cpacr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
 {
     /* Call cpacr_write() so that we reset with the correct RAO bits set
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
     { .name = "CPACR", .state = ARM_CP_STATE_BOTH, .opc0 = 3,
       .crn = 1, .crm = 0, .opc1 = 0, .opc2 = 2, .accessfn = cpacr_access,
       .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.cpacr_el1),
-      .resetfn = cpacr_reset, .writefn = cpacr_write },
+      .resetfn = cpacr_reset, .writefn = cpacr_write, .readfn = cpacr_read },
     REGINFO_SENTINEL
 };
 
@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
     return ret;
 }
 
+static void cptr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                           uint64_t value)
+{
+    /*
+     * For A-profile AArch32 EL3, if NSACR.CP10
+     * is 0 then HCPTR.{TCP11,TCP10} ignore writes and read as 1.
+     */
+    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+        value &= ~(0x3 << 10);
+        value |= env->cp15.cptr_el[2] & (0x3 << 10);
+    }
+    env->cp15.cptr_el[2] = value;
+}
+
+static uint64_t cptr_el2_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    /*
+     * For A-profile AArch32 EL3, if NSACR.CP10
+     * is 0 then HCPTR.{TCP11,TCP10} ignore writes and read as 1.
+     */
+    uint64_t value = env->cp15.cptr_el[2];
+
+    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+        value |= 0x3 << 10;
+    }
+    return value;
+}
+
 static const ARMCPRegInfo el2_cp_reginfo[] = {
     { .name = "HCR_EL2", .state = ARM_CP_STATE_AA64,
       .type = ARM_CP_IO,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
     { .name = "CPTR_EL2", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 2,
       .access = PL2_RW, .accessfn = cptr_access, .resetvalue = 0,
-      .fieldoffset = offsetof(CPUARMState, cp15.cptr_el[2]) },
+      .fieldoffset = offsetof(CPUARMState, cp15.cptr_el[2]),
+      .readfn = cptr_el2_read, .writefn = cptr_el2_write },
     { .name = "MAIR_EL2", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 4, .crn = 10, .crm = 2, .opc2 = 0,
       .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.mair_el[2]),
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
         break;
     }
 
+    /*
+     * The NSACR allows A-profile AArch32 EL3 and M-profile secure mode
+     * to control non-secure access to the FPU. It doesn't have any
+     * effect if EL3 is AArch64 or if EL3 doesn't exist at all.
+     */
+    if ((arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+         cur_el <= 2 && !arm_is_secure_below_el3(env))) {
+        if (!extract32(env->cp15.nsacr, 10, 1)) {
+            /* FP insns act as UNDEF */
+            return cur_el == 2 ? 2 : 1;
+        }
+    }
+
     /* For the CPTR registers we don't need to guard with an ARM_FEATURE
      * check because zero bits in the registers mean "don't trap".
      */
--
2.20.1

+      .fixed_bits = (0x1 << R_ID_AA64PFR0_EL0_SHIFT) |
+                    (0x1 << R_ID_AA64PFR0_EL1_SHIFT) },
     { .name = "ID_AA64PFR1_EL1",
-      .exported_bits = 0x00000000000000f0 },
+      .exported_bits = R_ID_AA64PFR1_BT_MASK |
+                       R_ID_AA64PFR1_SSBS_MASK |
+                       R_ID_AA64PFR1_MTE_MASK |
+                       R_ID_AA64PFR1_SME_MASK },
     { .name = "ID_AA64PFR*_EL1_RESERVED",
-      .is_glob = true },
-    { .name = "ID_AA64ZFR0_EL1" },
+      .is_glob = true },
+    { .name = "ID_AA64ZFR0_EL1",
+      .exported_bits = R_ID_AA64ZFR0_SVEVER_MASK |
+                       R_ID_AA64ZFR0_AES_MASK |
+                       R_ID_AA64ZFR0_BITPERM_MASK |
+                       R_ID_AA64ZFR0_BFLOAT16_MASK |
+                       R_ID_AA64ZFR0_SHA3_MASK |
+                       R_ID_AA64ZFR0_SM4_MASK |
+                       R_ID_AA64ZFR0_I8MM_MASK |
+                       R_ID_AA64ZFR0_F32MM_MASK |
+                       R_ID_AA64ZFR0_F64MM_MASK },
+    { .name = "ID_AA64SMFR0_EL1",
+      .exported_bits = R_ID_AA64SMFR0_F32F32_MASK |
+                       R_ID_AA64SMFR0_B16F32_MASK |
+                       R_ID_AA64SMFR0_F16F32_MASK |
+                       R_ID_AA64SMFR0_I8I32_MASK |
+                       R_ID_AA64SMFR0_F64F64_MASK |
+                       R_ID_AA64SMFR0_I16I64_MASK |
+                       R_ID_AA64SMFR0_FA64_MASK },
     { .name = "ID_AA64MMFR0_EL1",
-      .fixed_bits = 0x00000000ff000000 },
-    { .name = "ID_AA64MMFR1_EL1" },
+      .exported_bits = R_ID_AA64MMFR0_ECV_MASK,
+      .fixed_bits = (0xf << R_ID_AA64MMFR0_TGRAN64_SHIFT) |
+                    (0xf << R_ID_AA64MMFR0_TGRAN4_SHIFT) },
+    { .name = "ID_AA64MMFR1_EL1",
+      .exported_bits = R_ID_AA64MMFR1_AFP_MASK },
+    { .name = "ID_AA64MMFR2_EL1",
+      .exported_bits = R_ID_AA64MMFR2_AT_MASK },
     { .name = "ID_AA64MMFR*_EL1_RESERVED",
-      .is_glob = true },
+      .is_glob = true },
     { .name = "ID_AA64DFR0_EL1",
-      .fixed_bits = 0x0000000000000006 },
-    { .name = "ID_AA64DFR1_EL1" },
+      .fixed_bits = (0x6 << R_ID_AA64DFR0_DEBUGVER_SHIFT) },
+    { .name = "ID_AA64DFR1_EL1" },
     { .name = "ID_AA64DFR*_EL1_RESERVED",
-      .is_glob = true },
+      .is_glob = true },
     { .name = "ID_AA64AFR*",
-      .is_glob = true },
+      .is_glob = true },
     { .name = "ID_AA64ISAR0_EL1",
-      .exported_bits = 0x00fffffff0fffff0 },
+      .exported_bits = R_ID_AA64ISAR0_AES_MASK |
+                       R_ID_AA64ISAR0_SHA1_MASK |
+                       R_ID_AA64ISAR0_SHA2_MASK |
+                       R_ID_AA64ISAR0_CRC32_MASK |
+                       R_ID_AA64ISAR0_ATOMIC_MASK |
+                       R_ID_AA64ISAR0_RDM_MASK |
+                       R_ID_AA64ISAR0_SHA3_MASK |
+                       R_ID_AA64ISAR0_SM3_MASK |
+                       R_ID_AA64ISAR0_SM4_MASK |
+                       R_ID_AA64ISAR0_DP_MASK |
+                       R_ID_AA64ISAR0_FHM_MASK |
+                       R_ID_AA64ISAR0_TS_MASK |
+                       R_ID_AA64ISAR0_RNDR_MASK },
     { .name = "ID_AA64ISAR1_EL1",
-      .exported_bits = 0x000000f0ffffffff },
+      .exported_bits = R_ID_AA64ISAR1_DPB_MASK |
+                       R_ID_AA64ISAR1_APA_MASK |
+                       R_ID_AA64ISAR1_API_MASK |
+                       R_ID_AA64ISAR1_JSCVT_MASK |
+                       R_ID_AA64ISAR1_FCMA_MASK |
+                       R_ID_AA64ISAR1_LRCPC_MASK |
+                       R_ID_AA64ISAR1_GPA_MASK |
+                       R_ID_AA64ISAR1_GPI_MASK |
+                       R_ID_AA64ISAR1_FRINTTS_MASK |
+                       R_ID_AA64ISAR1_SB_MASK |
+                       R_ID_AA64ISAR1_BF16_MASK |
+                       R_ID_AA64ISAR1_DGH_MASK |
+                       R_ID_AA64ISAR1_I8MM_MASK },
+    { .name = "ID_AA64ISAR2_EL1",
+      .exported_bits = R_ID_AA64ISAR2_WFXT_MASK |
+                       R_ID_AA64ISAR2_RPRES_MASK |
+                       R_ID_AA64ISAR2_GPA3_MASK |
+                       R_ID_AA64ISAR2_APA3_MASK },
     { .name = "ID_AA64ISAR*_EL1_RESERVED",
-      .is_glob = true },
+      .is_glob = true },
 };
 modify_arm_cp_regs(v8_idregs, v8_user_idregs);
 #endif
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
 #ifdef CONFIG_USER_ONLY
     static const ARMCPRegUserSpaceInfo id_v8_user_midr_cp_reginfo[] = {
         { .name = "MIDR_EL1",
-          .exported_bits = 0x00000000ffffffff },
-        { .name = "REVIDR_EL1" },
+          .exported_bits = R_MIDR_EL1_REVISION_MASK |
+                           R_MIDR_EL1_PARTNUM_MASK |
+                           R_MIDR_EL1_ARCHITECTURE_MASK |
+                           R_MIDR_EL1_VARIANT_MASK |
+                           R_MIDR_EL1_IMPLEMENTER_MASK },
+        { .name = "REVIDR_EL1" },
     };
     modify_arm_cp_regs(id_v8_midr_cp_reginfo, id_v8_user_midr_cp_reginfo);
 #endif
--
2.25.1
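The switch above from opaque hex constants to R_*_MASK names relies on
QEMU's registerfields machinery. As a rough sketch of what those names
expand to and how the exported/fixed bits combine — the FIELD() line is
illustrative (the real definitions live in target/arm/cpu.h), and
filter_id_reg() is a hypothetical helper, not the actual
modify_arm_cp_regs() implementation:

    #include "hw/registerfields.h"

    /* generates R_ID_AA64PFR1_BT_SHIFT, _LENGTH and _MASK */
    FIELD(ID_AA64PFR1, BT, 0, 4)

    static uint64_t filter_id_reg(uint64_t value,
                                  uint64_t exported_bits, uint64_t fixed_bits)
    {
        /* bits outside exported_bits read as zero; fixed_bits always read as set */
        return (value & exported_bits) | fixed_bits;
    }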
From: Richard Henderson <richard.henderson@linaro.org>

Fix a typo: the comparison checked the sign of the field twice, instead
of comparing the sign and the mask of the field (which itself encodes
both position and length).

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190604154225.26992-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 scripts/decodetree.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index XXXXXXX..XXXXXXX 100755
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -XXX,XX +XXX,XX @@ class Field:
         return '{0}(insn, {1}, {2})'.format(extr, self.pos, self.len)
 
     def __eq__(self, other):
-        return self.sign == other.sign and self.sign == other.sign
+        return self.sign == other.sign and self.mask == other.mask
 
     def __ne__(self, other):
         return not self.__eq__(other)
--
2.20.1

From: Thomas Huth <thuth@redhat.com>

The header target/arm/kvm-consts.h checks CONFIG_KVM which is marked as
poisoned in common code, so the files that include this header have to
be added to specific_ss and recompiled for each target, qemu-system-arm
and qemu-system-aarch64. However, since the kvm headers are only
optionally used in kvm-consts.h for some sanity checks, we can
additionally check the NEED_CPU_H macro first to avoid the poisoned
CONFIG_KVM macro, so kvm-consts.h can also be used from "common" files
(without the sanity checks - which should be OK since they are still
done from other target-specific files instead). This way, and by
adjusting some other include statements in the related files here and
there, we can move some files from specific_ss into softmmu_ss, so that
they only need to be compiled once during the build process.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221202154023.293614-1-thuth@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/misc/xlnx-zynqmp-apu-ctrl.h |  2 +-
 target/arm/kvm-consts.h                |  8 ++++----
 hw/misc/imx6_src.c                     |  2 +-
 hw/misc/iotkit-sysctl.c                |  1 -
 hw/misc/meson.build                    | 11 +++++------
 5 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/include/hw/misc/xlnx-zynqmp-apu-ctrl.h b/include/hw/misc/xlnx-zynqmp-apu-ctrl.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/xlnx-zynqmp-apu-ctrl.h
+++ b/include/hw/misc/xlnx-zynqmp-apu-ctrl.h
@@ -XXX,XX +XXX,XX @@
 
 #include "hw/sysbus.h"
 #include "hw/register.h"
-#include "target/arm/cpu.h"
+#include "target/arm/cpu-qom.h"
 
 #define TYPE_XLNX_ZYNQMP_APU_CTRL "xlnx.apu-ctrl"
 OBJECT_DECLARE_SIMPLE_TYPE(XlnxZynqMPAPUCtrl, XLNX_ZYNQMP_APU_CTRL)
diff --git a/target/arm/kvm-consts.h b/target/arm/kvm-consts.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm-consts.h
+++ b/target/arm/kvm-consts.h
@@ -XXX,XX +XXX,XX @@
 #ifndef ARM_KVM_CONSTS_H
 #define ARM_KVM_CONSTS_H
 
+#ifdef NEED_CPU_H
 #ifdef CONFIG_KVM
 #include <linux/kvm.h>
 #include <linux/psci.h>
-
 #define MISMATCH_CHECK(X, Y) QEMU_BUILD_BUG_ON(X != Y)
+#endif
+#endif
 
-#else
-
+#ifndef MISMATCH_CHECK
 #define MISMATCH_CHECK(X, Y) QEMU_BUILD_BUG_ON(0)
-
 #endif
 
 #define CP_REG_SIZE_SHIFT 52
diff --git a/hw/misc/imx6_src.c b/hw/misc/imx6_src.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/imx6_src.c
+++ b/hw/misc/imx6_src.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
-#include "arm-powerctl.h"
+#include "target/arm/arm-powerctl.h"
 #include "hw/core/cpu.h"
 
 #ifndef DEBUG_IMX6_SRC
diff --git a/hw/misc/iotkit-sysctl.c b/hw/misc/iotkit-sysctl.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/iotkit-sysctl.c
+++ b/hw/misc/iotkit-sysctl.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/qdev-properties.h"
 #include "hw/arm/armsse-version.h"
 #include "target/arm/arm-powerctl.h"
-#include "target/arm/cpu.h"
 
 REG32(SECDBGSTAT, 0x0)
 REG32(SECDBGSET, 0x4)
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_IMX', if_true: files(
   'imx25_ccm.c',
   'imx31_ccm.c',
   'imx6_ccm.c',
+  'imx6_src.c',
   'imx6ul_ccm.c',
   'imx7_ccm.c',
   'imx7_gpr.c',
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files(
 ))
 softmmu_ss.add(when: 'CONFIG_SLAVIO', if_true: files('slavio_misc.c'))
 softmmu_ss.add(when: 'CONFIG_ZYNQ', if_true: files('zynq_slcr.c'))
-specific_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp-crf.c'))
-specific_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp-apu-ctrl.c'))
+softmmu_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp-crf.c'))
+softmmu_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp-apu-ctrl.c'))
 specific_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files('xlnx-versal-crl.c'))
 softmmu_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files(
   'xlnx-versal-xramc.c',
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_TZ_MPC', if_true: files('tz-mpc.c'))
 softmmu_ss.add(when: 'CONFIG_TZ_MSC', if_true: files('tz-msc.c'))
 softmmu_ss.add(when: 'CONFIG_TZ_PPC', if_true: files('tz-ppc.c'))
 softmmu_ss.add(when: 'CONFIG_IOTKIT_SECCTL', if_true: files('iotkit-secctl.c'))
+softmmu_ss.add(when: 'CONFIG_IOTKIT_SYSCTL', if_true: files('iotkit-sysctl.c'))
 softmmu_ss.add(when: 'CONFIG_IOTKIT_SYSINFO', if_true: files('iotkit-sysinfo.c'))
 softmmu_ss.add(when: 'CONFIG_ARMSSE_CPU_PWRCTRL', if_true: files('armsse-cpu-pwrctrl.c'))
 softmmu_ss.add(when: 'CONFIG_ARMSSE_CPUID', if_true: files('armsse-cpuid.c'))
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_GRLIB', if_true: files('grlib_ahb_apb_pnp.c'))
 
 specific_ss.add(when: 'CONFIG_AVR_POWER', if_true: files('avr_power.c'))
 
-specific_ss.add(when: 'CONFIG_IMX', if_true: files('imx6_src.c'))
-specific_ss.add(when: 'CONFIG_IOTKIT_SYSCTL', if_true: files('iotkit-sysctl.c'))
-
 specific_ss.add(when: 'CONFIG_MAC_VIA', if_true: files('mac_via.c'))
 
 specific_ss.add(when: 'CONFIG_MIPS_CPS', if_true: files('mips_cmgcr.c', 'mips_cpc.c'))
 specific_ss.add(when: 'CONFIG_MIPS_ITU', if_true: files('mips_itu.c'))
 
-specific_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa_ec.c'))
+softmmu_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa_ec.c'))
 
 # HPPA devices
 softmmu_ss.add(when: 'CONFIG_LASI', if_true: files('lasi.c'))
--
2.25.1
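To see why the one-character decodetree fix above matters, here is a
small C model of the buggy and fixed comparisons (a hypothetical struct
mirroring the Python Field class, not code from either patch): with the
old compare, two fields that differ only in position and length still
compare equal, because the mask is never examined.

    #include <stdbool.h>
    #include <stdint.h>

    struct Field {
        bool sign;
        uint32_t mask;   /* encodes both position and length */
    };

    static bool field_eq_buggy(struct Field a, struct Field b)
    {
        return a.sign == b.sign && a.sign == b.sign;   /* mask never checked */
    }

    static bool field_eq_fixed(struct Field a, struct Field b)
    {
        return a.sign == b.sign && a.mask == b.mask;
    }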
The Cortex-R5F initfn was not correctly setting up the MVFR
ID register values. Fill these in, since some subsequent patches
will use ID register checks rather than CPU feature bit checks.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void cortex_r5f_initfn(Object *obj)
 
     cortex_r5_initfn(obj);
     set_feature(&cpu->env, ARM_FEATURE_VFP3);
+    cpu->isar.mvfr0 = 0x10110221;
+    cpu->isar.mvfr1 = 0x00000011;
 }
 
 static const ARMCPRegInfo cortexa8_cp_reginfo[] = {
--
2.20.1

From: Philippe Mathieu-Daudé <philmd@linaro.org>

When building with --disable-tcg on Darwin we get:

  target/arm/cpu.c:725:16: error: incomplete definition of type 'struct TCGCPUOps'
      cc->tcg_ops->do_interrupt(cs);
      ~~~~~~~~~~~^

Commit 083afd18a9 ("target/arm: Restrict cpu_exec_interrupt()
handler to sysemu") limited this block to system emulation,
but neglected to also limit it to TCG.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-id: 20221209110823.59495-1-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
     arm_rebuild_hflags(env);
 }
 
-#ifndef CONFIG_USER_ONLY
+#if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
 
 static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx,
                                      unsigned int target_el,
@@ -XXX,XX +XXX,XX @@ static bool arm_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
     cc->tcg_ops->do_interrupt(cs);
     return true;
 }
-#endif /* !CONFIG_USER_ONLY */
+
+#endif /* CONFIG_TCG && !CONFIG_USER_ONLY */
 
 void arm_cpu_update_virq(ARMCPU *cpu)
 {
--
2.25.1
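For reference, a sketch of how the mvfr0 value the Cortex-R5F patch
sets decodes; the 4-bit field positions follow the Arm ARM MVFR0
layout, and both helpers here are hypothetical, written only to
illustrate the encoding:

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t extr(uint32_t v, int pos, int len)
    {
        return (v >> pos) & ((1u << len) - 1);
    }

    static void dump_mvfr0(uint32_t mvfr0)   /* 0x10110221 for Cortex-R5F */
    {
        printf("SIMDReg=%u FPSP=%u FPDP=%u FPShVec=%u FPRound=%u\n",
               extr(mvfr0, 0, 4),    /* 1: sixteen 64-bit FP registers */
               extr(mvfr0, 4, 4),    /* 2: single precision supported */
               extr(mvfr0, 8, 4),    /* 2: double precision supported */
               extr(mvfr0, 24, 4),   /* 0: no short-vector support */
               extr(mvfr0, 28, 4));  /* 1: rounding modes supported */
    }

The FPShVec field being 0 here is what the following short-vector
patch keys off: the R5F genuinely has no short-vector support.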
Deleted patch
At the moment our -cpu max for AArch32 supports VFP short-vectors
because we always implement them, even for CPUs which should
not have them. The following commits are going to switch to
using the correct ID-register-check to enable or disable short
vector support, so we need to turn it on explicitly for -cpu max,
because Cortex-A15 doesn't implement it.

We don't enable this for the AArch64 -cpu max, because the v8A
architecture never supports short-vectors.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
         kvm_arm_set_cpu_features_from_host(cpu);
     } else {
         cortex_a15_initfn(obj);
+
+        /* old-style VFP short-vector support */
+        cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSHVEC, 1);
+
 #ifdef CONFIG_USER_ONLY
         /* We don't set these in system emulation mode for the moment,
          * since we don't correctly set (all of) the ID registers to
--
2.20.1
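FIELD_DP32() in the hunk above deposits a value into a named register
field. A minimal stand-in without the registerfields machinery — dp32()
is a hypothetical helper, assuming FPSHVEC is the 4-bit field at
bit 24:

    #include <stdint.h>

    static uint32_t dp32(uint32_t reg, int shift, int len, uint32_t val)
    {
        uint32_t mask = ((1u << len) - 1) << shift;
        return (reg & ~mask) | ((val << shift) & mask);
    }

    /* FIELD_DP32(mvfr0, MVFR0, FPSHVEC, 1) is then roughly: */
    /* mvfr0 = dp32(mvfr0, 24, 4, 1); */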
Deleted patch
Convert the "single-precision" register moves to decodetree:
 * VMSR
 * VMRS
 * VMOV between general purpose register and single precision

Note that the VMSR/VMRS conversions make our handling of
the "should this UNDEF?" checks consistent between the two
instructions:
 * VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0
   (previously was a nop)
 * VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better
   (previously was a nop)
 * VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better
   (previously would write to the register, which had no
   guest-visible effect because we always UNDEF reads)

We also tighten up the decode: we were previously underdecoding
some SBZ or SBO bits.

The conversion of VMOV_single includes the expansion out of the
gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr()
sequences into the simpler direct load/store of the TCG temp via
neon_{load,store}_reg32(): we know in the new function that we're
always single-precision, we don't need to use the old-and-deprecated
cpu_F0* TCG globals, and we don't happen to have the declaration of
gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the
new function is.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 161 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 148 +-----------------------------
 target/arm/vfp.decode          |   4 +
 3 files changed, 168 insertions(+), 145 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
 
     return true;
 }
+
+static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
+{
+    TCGv_i32 tmp;
+    bool ignore_vfp_enabled = false;
+
+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
+        /*
+         * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
+         * Writes to R15 are UNPREDICTABLE; we choose to undef.
+         */
+        if (a->rt == 15 || a->reg != ARM_VFP_FPSCR) {
+            return false;
+        }
+    }
+
+    switch (a->reg) {
+    case ARM_VFP_FPSID:
+        /*
+         * VFPv2 allows access to FPSID from userspace; VFPv3 restricts
+         * all ID registers to privileged access only.
+         */
+        if (IS_USER(s) && arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+            return false;
+        }
+        ignore_vfp_enabled = true;
+        break;
+    case ARM_VFP_MVFR0:
+    case ARM_VFP_MVFR1:
+        if (IS_USER(s) || !arm_dc_feature(s, ARM_FEATURE_MVFR)) {
+            return false;
+        }
+        ignore_vfp_enabled = true;
+        break;
+    case ARM_VFP_MVFR2:
+        if (IS_USER(s) || !arm_dc_feature(s, ARM_FEATURE_V8)) {
+            return false;
+        }
+        ignore_vfp_enabled = true;
+        break;
+    case ARM_VFP_FPSCR:
+        break;
+    case ARM_VFP_FPEXC:
+        if (IS_USER(s)) {
+            return false;
+        }
+        ignore_vfp_enabled = true;
+        break;
+    case ARM_VFP_FPINST:
+    case ARM_VFP_FPINST2:
+        /* Not present in VFPv3 */
+        if (IS_USER(s) || arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+            return false;
+        }
+        break;
+    default:
+        return false;
+    }
+
+    if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
+        return true;
+    }
+
+    if (a->l) {
+        /* VMRS, move VFP special register to gp register */
+        switch (a->reg) {
+        case ARM_VFP_FPSID:
+        case ARM_VFP_FPEXC:
+        case ARM_VFP_FPINST:
+        case ARM_VFP_FPINST2:
+        case ARM_VFP_MVFR0:
+        case ARM_VFP_MVFR1:
+        case ARM_VFP_MVFR2:
+            tmp = load_cpu_field(vfp.xregs[a->reg]);
+            break;
+        case ARM_VFP_FPSCR:
+            if (a->rt == 15) {
+                tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
+                tcg_gen_andi_i32(tmp, tmp, 0xf0000000);
+            } else {
+                tmp = tcg_temp_new_i32();
+                gen_helper_vfp_get_fpscr(tmp, cpu_env);
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        if (a->rt == 15) {
+            /* Set the 4 flag bits in the CPSR. */
+            gen_set_nzcv(tmp);
+            tcg_temp_free_i32(tmp);
+        } else {
+            store_reg(s, a->rt, tmp);
+        }
+    } else {
+        /* VMSR, move gp register to VFP special register */
+        switch (a->reg) {
+        case ARM_VFP_FPSID:
+        case ARM_VFP_MVFR0:
+        case ARM_VFP_MVFR1:
+        case ARM_VFP_MVFR2:
+            /* Writes are ignored. */
+            break;
+        case ARM_VFP_FPSCR:
+            tmp = load_reg(s, a->rt);
+            gen_helper_vfp_set_fpscr(cpu_env, tmp);
+            tcg_temp_free_i32(tmp);
+            gen_lookup_tb(s);
+            break;
+        case ARM_VFP_FPEXC:
+            /*
+             * TODO: VFP subarchitecture support.
+             * For now, keep the EN bit only
+             */
+            tmp = load_reg(s, a->rt);
+            tcg_gen_andi_i32(tmp, tmp, 1 << 30);
+            store_cpu_field(tmp, vfp.xregs[a->reg]);
+            gen_lookup_tb(s);
+            break;
+        case ARM_VFP_FPINST:
+        case ARM_VFP_FPINST2:
+            tmp = load_reg(s, a->rt);
+            store_cpu_field(tmp, vfp.xregs[a->reg]);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    return true;
+}
+
+static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
+{
+    TCGv_i32 tmp;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (a->l) {
+        /* VFP to general purpose register */
+        tmp = tcg_temp_new_i32();
+        neon_load_reg32(tmp, a->vn);
+        if (a->rt == 15) {
+            /* Set the 4 flag bits in the CPSR. */
+            gen_set_nzcv(tmp);
+            tcg_temp_free_i32(tmp);
+        } else {
+            store_reg(s, a->rt, tmp);
+        }
+    } else {
+        /* general purpose register to VFP */
+        tmp = load_reg(s, a->rt);
+        neon_store_reg32(tmp, a->vn);
+        tcg_temp_free_i32(tmp);
+    }
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     TCGv_i32 addr;
     TCGv_i32 tmp;
     TCGv_i32 tmp2;
-    bool ignore_vfp_enabled = false;
 
     if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
      */
-    if ((insn & 0x0fe00fff) == 0x0ee00a10) {
-        rn = (insn >> 16) & 0xf;
-        if (rn == ARM_VFP_FPSID || rn == ARM_VFP_FPEXC || rn == ARM_VFP_MVFR2
-            || rn == ARM_VFP_MVFR1 || rn == ARM_VFP_MVFR0) {
-            ignore_vfp_enabled = true;
-        }
-    }
-    if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
+    if (!vfp_access_check(s)) {
         return 0;
     }
 
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     switch ((insn >> 24) & 0xf) {
     case 0xe:
         if (insn & (1 << 4)) {
-            /* single register transfer */
-            rd = (insn >> 12) & 0xf;
-            if (dp) {
-                /* already handled by decodetree */
-                return 1;
-            } else { /* !dp */
-                bool is_sysreg;
-
-                if ((insn & 0x6f) != 0x00)
-                    return 1;
-                rn = VFP_SREG_N(insn);
-
-                is_sysreg = extract32(insn, 21, 1);
-
-                if (arm_dc_feature(s, ARM_FEATURE_M)) {
-                    /*
-                     * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
-                     * Writes to R15 are UNPREDICTABLE; we choose to undef.
-                     */
-                    if (is_sysreg && (rd == 15 || (rn >> 1) != ARM_VFP_FPSCR)) {
-                        return 1;
-                    }
-                }
-
-                if (insn & ARM_CP_RW_BIT) {
-                    /* vfp->arm */
-                    if (is_sysreg) {
-                        /* system register */
-                        rn >>= 1;
-
-                        switch (rn) {
-                        case ARM_VFP_FPSID:
-                            /* VFP2 allows access to FSID from userspace.
-                               VFP3 restricts all id registers to privileged
-                               accesses.  */
-                            if (IS_USER(s)
-                                && arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                                return 1;
-                            }
-                            tmp = load_cpu_field(vfp.xregs[rn]);
-                            break;
-                        case ARM_VFP_FPEXC:
-                            if (IS_USER(s))
-                                return 1;
-                            tmp = load_cpu_field(vfp.xregs[rn]);
-                            break;
-                        case ARM_VFP_FPINST:
-                        case ARM_VFP_FPINST2:
-                            /* Not present in VFP3.  */
-                            if (IS_USER(s)
-                                || arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                                return 1;
-                            }
-                            tmp = load_cpu_field(vfp.xregs[rn]);
-                            break;
-                        case ARM_VFP_FPSCR:
-                            if (rd == 15) {
-                                tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
-                                tcg_gen_andi_i32(tmp, tmp, 0xf0000000);
-                            } else {
-                                tmp = tcg_temp_new_i32();
-                                gen_helper_vfp_get_fpscr(tmp, cpu_env);
-                            }
-                            break;
-                        case ARM_VFP_MVFR2:
-                            if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
-                                return 1;
-                            }
-                            /* fall through */
-                        case ARM_VFP_MVFR0:
-                        case ARM_VFP_MVFR1:
-                            if (IS_USER(s)
-                                || !arm_dc_feature(s, ARM_FEATURE_MVFR)) {
-                                return 1;
-                            }
-                            tmp = load_cpu_field(vfp.xregs[rn]);
-                            break;
-                        default:
-                            return 1;
-                        }
-                    } else {
-                        gen_mov_F0_vreg(0, rn);
-                        tmp = gen_vfp_mrs();
-                    }
-                    if (rd == 15) {
-                        /* Set the 4 flag bits in the CPSR.  */
-                        gen_set_nzcv(tmp);
-                        tcg_temp_free_i32(tmp);
-                    } else {
-                        store_reg(s, rd, tmp);
-                    }
-                } else {
-                    /* arm->vfp */
-                    if (is_sysreg) {
-                        rn >>= 1;
-                        /* system register */
-                        switch (rn) {
-                        case ARM_VFP_FPSID:
-                        case ARM_VFP_MVFR0:
-                        case ARM_VFP_MVFR1:
-                            /* Writes are ignored.  */
-                            break;
-                        case ARM_VFP_FPSCR:
-                            tmp = load_reg(s, rd);
-                            gen_helper_vfp_set_fpscr(cpu_env, tmp);
-                            tcg_temp_free_i32(tmp);
-                            gen_lookup_tb(s);
-                            break;
-                        case ARM_VFP_FPEXC:
-                            if (IS_USER(s))
-                                return 1;
-                            /* TODO: VFP subarchitecture support.
-                             * For now, keep the EN bit only */
-                            tmp = load_reg(s, rd);
-                            tcg_gen_andi_i32(tmp, tmp, 1 << 30);
-                            store_cpu_field(tmp, vfp.xregs[rn]);
-                            gen_lookup_tb(s);
-                            break;
-                        case ARM_VFP_FPINST:
-                        case ARM_VFP_FPINST2:
-                            if (IS_USER(s)) {
-                                return 1;
-                            }
-                            tmp = load_reg(s, rd);
-                            store_cpu_field(tmp, vfp.xregs[rn]);
-                            break;
-                        default:
-                            return 1;
-                        }
-                    } else {
-                        tmp = load_reg(s, rd);
-                        gen_vfp_msr(tmp);
-                        gen_mov_vreg_F0(0, rn);
-                    }
-                }
-            }
+            /* already handled by decodetree */
+            return 1;
         } else {
             /* data processing */
             bool rd_is_dp = dp;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_from_gp ---- 1110 0 0 index:1 0 .... rt:4 1011 .00 1 0000 \
 
 VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
              vn=%vn_dp
+
+VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
+VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000 \
+             vn=%vn_sp
--
2.20.1
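To make the decodetree pattern line concrete, here is roughly what the
generated decoder does for the VMSR_VMRS pattern added above — a
hand-written sketch, not the actual generated code. The fixed-bits test
matches the 0x0fe00fff/0x0ee00a10 check that the old hand decoder in
translate.c used for the same encoding:

    static bool decode_vmsr_vmrs(DisasContext *s, uint32_t insn)
    {
        arg_VMSR_VMRS a;

        if ((insn & 0x0fe00fff) != 0x0ee00a10) {
            return false;                 /* fixed bits don't match */
        }
        a.l   = extract32(insn, 20, 1);   /* l:1   */
        a.reg = extract32(insn, 16, 4);   /* reg:4 */
        a.rt  = extract32(insn, 12, 4);   /* rt:4  */
        return trans_VMSR_VMRS(s, &a);
    }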
Deleted patch
Convert the VFP two-register transfer instructions to decodetree
(in the v8 Arm ARM these are the "Advanced SIMD and floating-point
64-bit move" encoding group).

Again, we expand out the sequences involving gen_vfp_msr() and
gen_vfp_mrs().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 70 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 46 +---------------------
 target/arm/vfp.decode          |  5 +++
 3 files changed, 77 insertions(+), 44 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
 
     return true;
 }
+
+static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
+{
+    TCGv_i32 tmp;
+
+    /*
+     * VMOV between two general-purpose registers and two single precision
+     * floating point registers
+     */
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (a->op) {
+        /* fpreg to gpreg */
+        tmp = tcg_temp_new_i32();
+        neon_load_reg32(tmp, a->vm);
+        store_reg(s, a->rt, tmp);
+        tmp = tcg_temp_new_i32();
+        neon_load_reg32(tmp, a->vm + 1);
+        store_reg(s, a->rt2, tmp);
+    } else {
+        /* gpreg to fpreg */
+        tmp = load_reg(s, a->rt);
+        neon_store_reg32(tmp, a->vm);
+        tmp = load_reg(s, a->rt2);
+        neon_store_reg32(tmp, a->vm + 1);
+    }
+
+    return true;
+}
+
+static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
+{
+    TCGv_i32 tmp;
+
+    /*
+     * VMOV between two general-purpose registers and one double precision
+     * floating point register
+     */
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (a->op) {
+        /* fpreg to gpreg */
+        tmp = tcg_temp_new_i32();
+        neon_load_reg32(tmp, a->vm * 2);
+        store_reg(s, a->rt, tmp);
+        tmp = tcg_temp_new_i32();
+        neon_load_reg32(tmp, a->vm * 2 + 1);
+        store_reg(s, a->rt2, tmp);
+    } else {
+        /* gpreg to fpreg */
+        tmp = load_reg(s, a->rt);
+        neon_store_reg32(tmp, a->vm * 2);
+        tcg_temp_free_i32(tmp);
+        tmp = load_reg(s, a->rt2);
+        neon_store_reg32(tmp, a->vm * 2 + 1);
+        tcg_temp_free_i32(tmp);
+    }
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     case 0xc:
     case 0xd:
         if ((insn & 0x03e00000) == 0x00400000) {
-            /* two-register transfer */
-            rn = (insn >> 16) & 0xf;
-            rd = (insn >> 12) & 0xf;
-            if (dp) {
-                VFP_DREG_M(rm, insn);
-            } else {
-                rm = VFP_SREG_M(insn);
-            }
-
-            if (insn & ARM_CP_RW_BIT) {
-                /* vfp->arm */
-                if (dp) {
-                    gen_mov_F0_vreg(0, rm * 2);
-                    tmp = gen_vfp_mrs();
-                    store_reg(s, rd, tmp);
-                    gen_mov_F0_vreg(0, rm * 2 + 1);
-                    tmp = gen_vfp_mrs();
-                    store_reg(s, rn, tmp);
-                } else {
-                    gen_mov_F0_vreg(0, rm);
-                    tmp = gen_vfp_mrs();
-                    store_reg(s, rd, tmp);
-                    gen_mov_F0_vreg(0, rm + 1);
-                    tmp = gen_vfp_mrs();
-                    store_reg(s, rn, tmp);
-                }
-            } else {
-                /* arm->vfp */
-                if (dp) {
-                    tmp = load_reg(s, rd);
-                    gen_vfp_msr(tmp);
-                    gen_mov_vreg_F0(0, rm * 2);
-                    tmp = load_reg(s, rn);
-                    gen_vfp_msr(tmp);
-                    gen_mov_vreg_F0(0, rm * 2 + 1);
-                } else {
-                    tmp = load_reg(s, rd);
-                    gen_vfp_msr(tmp);
-                    gen_mov_vreg_F0(0, rm);
-                    tmp = load_reg(s, rn);
-                    gen_vfp_msr(tmp);
-                    gen_mov_vreg_F0(0, rm + 1);
-                }
-            }
+            /* Already handled by decodetree */
+            return 1;
         } else {
             /* Load/store */
             rn = (insn >> 16) & 0xf;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
 VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
 VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000 \
              vn=%vn_sp
+
+VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 .... \
+             vm=%vm_sp
+VMOV_64_dp   ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 .... \
+             vm=%vm_dp
--
2.20.1
Deleted patch
Convert the VFP single load/store insns VLDR and VSTR to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 73 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 22 +---------
 target/arm/vfp.decode          |  7 ++++
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
 
     return true;
 }
+
+static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+{
+    uint32_t offset;
+    TCGv_i32 addr;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    offset = a->imm << 2;
+    if (!a->u) {
+        offset = -offset;
+    }
+
+    if (s->thumb && a->rn == 15) {
+        /* This is actually UNPREDICTABLE */
+        addr = tcg_temp_new_i32();
+        tcg_gen_movi_i32(addr, s->pc & ~2);
+    } else {
+        addr = load_reg(s, a->rn);
+    }
+    tcg_gen_addi_i32(addr, addr, offset);
+    if (a->l) {
+        gen_vfp_ld(s, false, addr);
+        gen_mov_vreg_F0(false, a->vd);
+    } else {
+        gen_mov_F0_vreg(false, a->vd);
+        gen_vfp_st(s, false, addr);
+    }
+    tcg_temp_free_i32(addr);
+
+    return true;
+}
+
+static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+{
+    uint32_t offset;
+    TCGv_i32 addr;
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    offset = a->imm << 2;
+    if (!a->u) {
+        offset = -offset;
+    }
+
+    if (s->thumb && a->rn == 15) {
+        /* This is actually UNPREDICTABLE */
+        addr = tcg_temp_new_i32();
+        tcg_gen_movi_i32(addr, s->pc & ~2);
+    } else {
+        addr = load_reg(s, a->rn);
+    }
+    tcg_gen_addi_i32(addr, addr, offset);
+    if (a->l) {
+        gen_vfp_ld(s, true, addr);
+        gen_mov_vreg_F0(true, a->vd);
+    } else {
+        gen_mov_F0_vreg(true, a->vd);
+        gen_vfp_st(s, true, addr);
+    }
+    tcg_temp_free_i32(addr);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             else
                 rd = VFP_SREG_D(insn);
             if ((insn & 0x01200000) == 0x01000000) {
-                /* Single load/store */
-                offset = (insn & 0xff) << 2;
-                if ((insn & (1 << 23)) == 0)
-                    offset = -offset;
-                if (s->thumb && rn == 15) {
-                    /* This is actually UNPREDICTABLE */
-                    addr = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(addr, s->pc & ~2);
-                } else {
-                    addr = load_reg(s, rn);
-                }
-                tcg_gen_addi_i32(addr, addr, offset);
-                if (insn & (1 << 20)) {
-                    gen_vfp_ld(s, dp, addr);
-                    gen_mov_vreg_F0(dp, rd);
-                } else {
-                    gen_mov_F0_vreg(dp, rd);
-                    gen_vfp_st(s, dp, addr);
-                }
-                tcg_temp_free_i32(addr);
+                /* Already handled by decodetree */
+                return 1;
             } else {
                 /* load/store multiple */
                 int w = insn & (1 << 21);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 .... \
              vm=%vm_sp
 VMOV_64_dp   ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 .... \
              vm=%vm_dp
+
+# Note that the half-precision variants of VLDR and VSTR are
+# not part of this decodetree at all because they have bits [9:8] == 0b01
+VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 \
+             vd=%vd_sp
+VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 \
+             vd=%vd_dp
--
2.20.1
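The addressing arithmetic both trans functions above share, pulled out
as a sketch (a hypothetical helper, not code from the patch): the 8-bit
immediate is a word offset, and the U bit selects add versus subtract.

    static int vldr_vstr_offset(int imm8, bool u)
    {
        int offset = imm8 << 2;      /* imm:8 scaled to bytes */
        return u ? offset : -offset; /* u:1 == 0 means subtract */
    }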
Deleted patch
Convert the VFP load/store multiple insns to decodetree.
This includes tightening up the UNDEF checking for pre-VFPv3
CPUs which only have D0-D15: they now UNDEF for any access
to D16-D31, not merely when the smallest register in the
transfer list is in D16-D31.

This conversion does not try to share code between the single
precision and the double precision versions; this looks a bit
duplicative of code, but it leaves the door open for a future
refactoring which gets rid of the use of the "F0" registers
by inlining the various functions like gen_vfp_ld() and
gen_mov_F0_reg() which are hiding "if (dp) { ... } else { ... }"
conditionalisation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 162 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  97 +-------------------
 target/arm/vfp.decode          |  18 ++++
 3 files changed, 183 insertions(+), 94 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 
     return true;
 }
+
+static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
+{
+    uint32_t offset;
+    TCGv_i32 addr;
+    int i, n;
+
+    n = a->imm;
+
+    if (n == 0 || (a->vd + n) > 32) {
+        /*
+         * UNPREDICTABLE cases for bad immediates: we choose to
+         * UNDEF to avoid generating huge numbers of TCG ops
+         */
+        return false;
+    }
+    if (a->rn == 15 && a->w) {
+        /* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (s->thumb && a->rn == 15) {
+        /* This is actually UNPREDICTABLE */
+        addr = tcg_temp_new_i32();
+        tcg_gen_movi_i32(addr, s->pc & ~2);
+    } else {
+        addr = load_reg(s, a->rn);
+    }
+    if (a->p) {
+        /* pre-decrement */
+        tcg_gen_addi_i32(addr, addr, -(a->imm << 2));
+    }
+
+    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
+        /*
+         * Here 'addr' is the lowest address we will store to,
+         * and is either the old SP (if post-increment) or
+         * the new SP (if pre-decrement). For post-increment
+         * where the old value is below the limit and the new
+         * value is above, it is UNKNOWN whether the limit check
+         * triggers; we choose to trigger.
+         */
+        gen_helper_v8m_stackcheck(cpu_env, addr);
+    }
+
+    offset = 4;
+    for (i = 0; i < n; i++) {
+        if (a->l) {
+            /* load */
+            gen_vfp_ld(s, false, addr);
+            gen_mov_vreg_F0(false, a->vd + i);
+        } else {
+            /* store */
+            gen_mov_F0_vreg(false, a->vd + i);
+            gen_vfp_st(s, false, addr);
+        }
+        tcg_gen_addi_i32(addr, addr, offset);
+    }
+    if (a->w) {
+        /* writeback */
+        if (a->p) {
+            offset = -offset * n;
+            tcg_gen_addi_i32(addr, addr, offset);
+        }
+        store_reg(s, a->rn, addr);
+    } else {
+        tcg_temp_free_i32(addr);
+    }
+
+    return true;
+}
+
+static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
+{
+    uint32_t offset;
+    TCGv_i32 addr;
+    int i, n;
+
+    n = a->imm >> 1;
+
+    if (n == 0 || (a->vd + n) > 32 || n > 16) {
+        /*
+         * UNPREDICTABLE cases for bad immediates: we choose to
+         * UNDEF to avoid generating huge numbers of TCG ops
+         */
+        return false;
+    }
+    if (a->rn == 15 && a->w) {
+        /* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd + n) > 16) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (s->thumb && a->rn == 15) {
+        /* This is actually UNPREDICTABLE */
+        addr = tcg_temp_new_i32();
+        tcg_gen_movi_i32(addr, s->pc & ~2);
+    } else {
+        addr = load_reg(s, a->rn);
+    }
+    if (a->p) {
+        /* pre-decrement */
+        tcg_gen_addi_i32(addr, addr, -(a->imm << 2));
+    }
+
+    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
+        /*
+         * Here 'addr' is the lowest address we will store to,
+         * and is either the old SP (if post-increment) or
+         * the new SP (if pre-decrement). For post-increment
+         * where the old value is below the limit and the new
+         * value is above, it is UNKNOWN whether the limit check
+         * triggers; we choose to trigger.
+         */
+        gen_helper_v8m_stackcheck(cpu_env, addr);
+    }
+
+    offset = 8;
+    for (i = 0; i < n; i++) {
+        if (a->l) {
+            /* load */
+            gen_vfp_ld(s, true, addr);
+            gen_mov_vreg_F0(true, a->vd + i);
+        } else {
+            /* store */
+            gen_mov_F0_vreg(true, a->vd + i);
+            gen_vfp_st(s, true, addr);
+        }
+        tcg_gen_addi_i32(addr, addr, offset);
+    }
+    if (a->w) {
+        /* writeback */
+        if (a->p) {
+            offset = -offset * n;
+        } else if (a->imm & 1) {
+            offset = 4;
+        } else {
+            offset = 0;
+        }
+
+        if (offset != 0) {
+            tcg_gen_addi_i32(addr, addr, offset);
+        }
+        store_reg(s, a->rn, addr);
+    } else {
+        tcg_temp_free_i32(addr);
+    }
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  */
 static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
-    uint32_t rd, rn, rm, op, i, n, offset, delta_d, delta_m, bank_mask;
+    uint32_t rd, rn, rm, op, i, n, delta_d, delta_m, bank_mask;
     int dp, veclen;
-    TCGv_i32 addr;
     TCGv_i32 tmp;
     TCGv_i32 tmp2;
 
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         break;
     case 0xc:
     case 0xd:
-        if ((insn & 0x03e00000) == 0x00400000) {
-            /* Already handled by decodetree */
-            return 1;
-        } else {
-            /* Load/store */
-            rn = (insn >> 16) & 0xf;
-            if (dp)
-                VFP_DREG_D(rd, insn);
-            else
-                rd = VFP_SREG_D(insn);
-            if ((insn & 0x01200000) == 0x01000000) {
-                /* Already handled by decodetree */
-                return 1;
-            } else {
-                /* load/store multiple */
-                int w = insn & (1 << 21);
-                if (dp)
-                    n = (insn >> 1) & 0x7f;
-                else
-                    n = insn & 0xff;
-
-                if (w && !(((insn >> 23) ^ (insn >> 24)) & 1)) {
-                    /* P == U , W == 1  => UNDEF */
-                    return 1;
-                }
-                if (n == 0 || (rd + n) > 32 || (dp && n > 16)) {
-                    /* UNPREDICTABLE cases for bad immediates: we choose to
-                     * UNDEF to avoid generating huge numbers of TCG ops
-                     */
-                    return 1;
-                }
-                if (rn == 15 && w) {
-                    /* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
-                    return 1;
-                }
-
-                if (s->thumb && rn == 15) {
-                    /* This is actually UNPREDICTABLE */
-                    addr = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(addr, s->pc & ~2);
-                } else {
-                    addr = load_reg(s, rn);
-                }
-                if (insn & (1 << 24)) /* pre-decrement */
-                    tcg_gen_addi_i32(addr, addr, -((insn & 0xff) << 2));
-
-                if (s->v8m_stackcheck && rn == 13 && w) {
-                    /*
-                     * Here 'addr' is the lowest address we will store to,
-                     * and is either the old SP (if post-increment) or
-                     * the new SP (if pre-decrement). For post-increment
-                     * where the old value is below the limit and the new
-                     * value is above, it is UNKNOWN whether the limit check
-                     * triggers; we choose to trigger.
-                     */
-                    gen_helper_v8m_stackcheck(cpu_env, addr);
-                }
-
-                if (dp)
-                    offset = 8;
-                else
-                    offset = 4;
-                for (i = 0; i < n; i++) {
-                    if (insn & ARM_CP_RW_BIT) {
-                        /* load */
-                        gen_vfp_ld(s, dp, addr);
-                        gen_mov_vreg_F0(dp, rd + i);
-                    } else {
-                        /* store */
-                        gen_mov_F0_vreg(dp, rd + i);
-                        gen_vfp_st(s, dp, addr);
-                    }
-                    tcg_gen_addi_i32(addr, addr, offset);
-                }
-                if (w) {
-                    /* writeback */
-                    if (insn & (1 << 24))
-                        offset = -offset * n;
-                    else if (dp && (insn & 1))
-                        offset = 4;
-                    else
-                        offset = 0;
-
-                    if (offset != 0)
-                        tcg_gen_addi_i32(addr, addr, offset);
-                    store_reg(s, rn, addr);
-                } else {
-                    tcg_temp_free_i32(addr);
-                }
-            }
-        }
-        break;
+        /* Already handled by decodetree */
+        return 1;
     default:
         /* Should never happen.  */
         return 1;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 \
              vd=%vd_sp
 VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 \
              vd=%vd_dp
+
+# We split the load/store multiple up into two patterns to avoid
+# overlap with other insns in the "Advanced SIMD load/store and 64-bit move"
+# grouping:
+#   P=0 U=0 W=0 is 64-bit VMOV
+#   P=1 W=0 is VLDR/VSTR
+#   P=U W=1 is UNDEF
+# leaving P=0 U=1 W=x and P=1 U=0 W=1 for load/store multiple.
+# These include FSTM/FLDM.
+VLDM_VSTM_sp ---- 1100 1 . w:1 l:1 rn:4 .... 1010 imm:8 \
+             vd=%vd_sp p=0 u=1
+VLDM_VSTM_dp ---- 1100 1 . w:1 l:1 rn:4 .... 1011 imm:8 \
+             vd=%vd_dp p=0 u=1
+
+VLDM_VSTM_sp ---- 1101 0.1 l:1 rn:4 .... 1010 imm:8 \
+             vd=%vd_sp p=1 u=0 w=1
+VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
+             vd=%vd_dp p=1 u=0 w=1
--
2.20.1
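The double-precision writeback cases above in one place, as a sketch of
a hypothetical helper (not code from the patch): after the transfer
loop 'addr' has advanced by 8 * n, so pre-decrement winds it back, and
an odd imm (the FLDMX/FSTMX forms) accounts for one extra trailing
word.

    static int vldm_dp_writeback_adjust(bool p, int imm, int n)
    {
        if (p) {
            return -8 * n;   /* pre-decrement: writeback is base - (imm << 2) */
        } else if (imm & 1) {
            return 4;        /* FLDMX/FSTMX: one extra word */
        }
        return 0;            /* plain post-increment: addr is already right */
    }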
Deleted patch
Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans
functions which perform the memory accesses by going via the TCG
globals cpu_F0s and cpu_F0d, to use local TCG temps instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 46 +++++++++++++++++++++-------------
 target/arm/translate.c         | 18 -------------
 2 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
 static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 {
     uint32_t offset;
-    TCGv_i32 addr;
+    TCGv_i32 addr, tmp;
 
     if (!vfp_access_check(s)) {
         return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
         addr = load_reg(s, a->rn);
     }
     tcg_gen_addi_i32(addr, addr, offset);
+    tmp = tcg_temp_new_i32();
     if (a->l) {
-        gen_vfp_ld(s, false, addr);
-        gen_mov_vreg_F0(false, a->vd);
+        gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+        neon_store_reg32(tmp, a->vd);
     } else {
-        gen_mov_F0_vreg(false, a->vd);
-        gen_vfp_st(s, false, addr);
+        neon_load_reg32(tmp, a->vd);
+        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(addr);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 {
     uint32_t offset;
     TCGv_i32 addr;
+    TCGv_i64 tmp;
 
     /* UNDEF accesses to D16-D31 if they don't exist */
     if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
         addr = load_reg(s, a->rn);
     }
     tcg_gen_addi_i32(addr, addr, offset);
+    tmp = tcg_temp_new_i64();
    if (a->l) {
-        gen_vfp_ld(s, true, addr);
-        gen_mov_vreg_F0(true, a->vd);
+        gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
+        neon_store_reg64(tmp, a->vd);
     } else {
-        gen_mov_F0_vreg(true, a->vd);
-        gen_vfp_st(s, true, addr);
+        neon_load_reg64(tmp, a->vd);
+        gen_aa32_st64(s, tmp, addr, get_mem_index(s));
     }
+    tcg_temp_free_i64(tmp);
     tcg_temp_free_i32(addr);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
 {
     uint32_t offset;
-    TCGv_i32 addr;
+    TCGv_i32 addr, tmp;
     int i, n;
 
     n = a->imm;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
     }
 
     offset = 4;
+    tmp = tcg_temp_new_i32();
     for (i = 0; i < n; i++) {
         if (a->l) {
             /* load */
-            gen_vfp_ld(s, false, addr);
-            gen_mov_vreg_F0(false, a->vd + i);
+            gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+            neon_store_reg32(tmp, a->vd + i);
         } else {
             /* store */
-            gen_mov_F0_vreg(false, a->vd + i);
-            gen_vfp_st(s, false, addr);
+            neon_load_reg32(tmp, a->vd + i);
+            gen_aa32_st32(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
     }
+    tcg_temp_free_i32(tmp);
     if (a->w) {
         /* writeback */
         if (a->p) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
 {
     uint32_t offset;
     TCGv_i32 addr;
+    TCGv_i64 tmp;
     int i, n;
 
     n = a->imm >> 1;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
     }
 
     offset = 8;
+    tmp = tcg_temp_new_i64();
     for (i = 0; i < n; i++) {
         if (a->l) {
             /* load */
-            gen_vfp_ld(s, true, addr);
-            gen_mov_vreg_F0(true, a->vd + i);
+            gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
+            neon_store_reg64(tmp, a->vd + i);
         } else {
             /* store */
-            gen_mov_F0_vreg(true, a->vd + i);
-            gen_vfp_st(s, true, addr);
+            neon_load_reg64(tmp, a->vd + i);
+            gen_aa32_st64(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
     }
+    tcg_temp_free_i64(tmp);
     if (a->w) {
         /* writeback */
         if (a->p) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_GEN_FIX(uhto, )
 VFP_GEN_FIX(ulto, )
 #undef VFP_GEN_FIX
 
-static inline void gen_vfp_ld(DisasContext *s, int dp, TCGv_i32 addr)
-{
-    if (dp) {
-        gen_aa32_ld64(s, cpu_F0d, addr, get_mem_index(s));
-    } else {
-        gen_aa32_ld32u(s, cpu_F0s, addr, get_mem_index(s));
-    }
-}
-
-static inline void gen_vfp_st(DisasContext *s, int dp, TCGv_i32 addr)
-{
-    if (dp) {
-        gen_aa32_st64(s, cpu_F0d, addr, get_mem_index(s));
-    } else {
-        gen_aa32_st32(s, cpu_F0s, addr, get_mem_index(s));
-    }
-}
-
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
--
2.20.1
Deleted patch
Convert the VFP VMLS instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 38 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  8 +------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /*
+     * VMLS: vd = vd + -(vn * vm)
+     * Note that order of inputs to the add matters for NaNs.
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_negs(tmp, tmp);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VMLS_sp(DisasContext *s, arg_VMLS_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VMLS_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /*
+     * VMLS: vd = vd + -(vn * vm)
+     * Note that order of inputs to the add matters for NaNs.
+     */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_negd(tmp, tmp);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0:
+            case 0 ... 1:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         for (;;) {
             /* Perform the calculation. */
             switch (op) {
-            case 1: /* VMLS: fd + -(fn * fm) */
-                gen_vfp_mul(dp);
-                gen_vfp_F1_neg(dp);
-                gen_mov_F0_vreg(dp, rd);
-                gen_vfp_add(dp);
-                break;
             case 2: /* VNMLS: -fd + (fn * fm) */
                 /* Note that it isn't valid to replace (-A + B) with (B - A)
                  * or similar plausible looking simplifications
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1
Deleted patch
Convert the VFP VNMLS instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 42 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 24 +------------------
 target/arm/vfp.decode          |  5 ++++
 3 files changed, 48 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VNMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /*
+     * VNMLS: -fd + (fn * fm)
+     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+     * plausible looking simplifications because this will give wrong results
+     * for NaNs.
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_negs(vd, vd);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLS_sp(DisasContext *s, arg_VNMLS_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VNMLS_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VNMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /*
+     * VNMLS: -fd + (fn * fm)
+     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+     * plausible looking simplifications because this will give wrong results
+     * for NaNs.
+     */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_negd(vd, vd);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_OP2(div)
 
 #undef VFP_OP2
 
-static inline void gen_vfp_F1_mul(int dp)
-{
-    /* Like gen_vfp_mul() but put result in F1 */
-    TCGv_ptr fpst = get_fpstatus_ptr(0);
-    if (dp) {
-        gen_helper_vfp_muld(cpu_F1d, cpu_F0d, cpu_F1d, fpst);
-    } else {
-        gen_helper_vfp_muls(cpu_F1s, cpu_F0s, cpu_F1s, fpst);
-    }
-    tcg_temp_free_ptr(fpst);
-}
-
 static inline void gen_vfp_F1_neg(int dp)
 {
     /* Like gen_vfp_neg() but put result in F1 */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0 ... 1:
+            case 0 ... 2:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         for (;;) {
             /* Perform the calculation. */
             switch (op) {
-            case 2: /* VNMLS: -fd + (fn * fm) */
-                /* Note that it isn't valid to replace (-A + B) with (B - A)
-                 * or similar plausible looking simplifications
-                 * because this will give wrong results for NaNs.
-                 */
-                gen_vfp_F1_mul(dp);
-                gen_mov_F0_vreg(dp, rd);
-                gen_vfp_neg(dp);
-                gen_vfp_add(dp);
-                break;
             case 3: /* VNMLA: -fd + -(fn * fm) */
                 gen_vfp_mul(dp);
                 gen_vfp_F1_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1
diff view generated by jsdifflib
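The NaN caveat in the patch above deserves a concrete illustration. A
minimal host-side sketch in plain C (not QEMU code; it assumes an FPU
that propagates an input NaN with its sign bit intact, as the Arm
FPProcessNaN rules do -- common host FPUs behave this way, but it is
not guaranteed everywhere):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        float b = 1.0f;
        float a = nanf("");      /* quiet NaN with sign bit clear */

        float r1 = -a + b;       /* negation flips the NaN's sign bit */
        float r2 = b - a;        /* subtraction propagates the NaN as-is */

        /* On NaN-propagating FPUs the two results differ in their sign
         * bit, so rewriting (-A + B) as (B - A) is observable. */
        printf("-a + b: signbit=%d\n", signbit(r1) != 0);
        printf("b - a : signbit=%d\n", signbit(r2) != 0);
        return 0;
    }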
Convert the VFP VNMLA instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 34 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 19 +------------------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VNMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* VNMLA: -fd + -(fn * fm) */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_negs(tmp, tmp);
+    gen_helper_vfp_negs(vd, vd);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLA_sp(DisasContext *s, arg_VNMLA_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VNMLA_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VNMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /* VNMLA: -fd + -(fn * fm) */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_negd(tmp, tmp);
+    gen_helper_vfp_negd(vd, vd);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_OP2(div)
 
 #undef VFP_OP2
 
-static inline void gen_vfp_F1_neg(int dp)
-{
-    /* Like gen_vfp_neg() but put result in F1 */
-    if (dp) {
-        gen_helper_vfp_negd(cpu_F1d, cpu_F0d);
-    } else {
-        gen_helper_vfp_negs(cpu_F1s, cpu_F0s);
-    }
-}
-
 static inline void gen_vfp_abs(int dp)
 {
     if (dp)
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 2:
+    case 0 ... 3:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 3: /* VNMLA: -fd + -(fn * fm) */
-            gen_vfp_mul(dp);
-            gen_vfp_F1_neg(dp);
-            gen_mov_F0_vreg(dp, rd);
-            gen_vfp_neg(dp);
-            gen_vfp_add(dp);
-            break;
         case 4: /* mul: fn * fm */
             gen_vfp_mul(dp);
             break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1
Convert the VMUL instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  5 +----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
 }
+
+static bool trans_VMUL_sp(DisasContext *s, arg_VMUL_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_muls, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 3:
+    case 0 ... 4:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 4: /* mul: fn * fm */
-            gen_vfp_mul(dp);
-            break;
         case 5: /* nmul: -(fn * fm) */
             gen_vfp_mul(dp);
             gen_vfp_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1
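A note on the pattern here: when an existing helper already has the
(vd, vn, vm, fpst) shape, as gen_helper_vfp_muls does, it can be passed
straight to do_vfp_3op_sp() with no wrapper, whereas VNMLS and VNMLA
above needed local gen_* adapters. A sketch of the assumed callback
shape (the typedef itself is outside this excerpt, and the meaning of
the final flag is inferred from usage -- it appears to say whether the
operation also reads the old Vd):

    /* Assumed shape of the per-element callback that do_vfp_3op_sp()
     * invokes, inferred from the gen_* functions in the patches above. */
    typedef void VFPGen3OpSPFn(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
                               TCGv_ptr fpst);

VNMLS passes true for that flag because it reads fd; VMUL passes false
because it only writes it.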
Convert the VNMUL instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 24 ++++++++++++++++++++++++
 target/arm/translate.c         |  7 +------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
 }
+
+static void gen_VNMUL_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* VNMUL: -(fn * fm) */
+    gen_helper_vfp_muls(vd, vn, vm, fpst);
+    gen_helper_vfp_negs(vd, vd);
+}
+
+static bool trans_VNMUL_sp(DisasContext *s, arg_VNMUL_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VNMUL_sp, a->vd, a->vn, a->vm, false);
+}
+
+static void gen_VNMUL_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /* VNMUL: -(fn * fm) */
+    gen_helper_vfp_muld(vd, vn, vm, fpst);
+    gen_helper_vfp_negd(vd, vd);
+}
+
+static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp) \
 
 VFP_OP2(add)
 VFP_OP2(sub)
-VFP_OP2(mul)
 VFP_OP2(div)
 
 #undef VFP_OP2
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 4:
+    case 0 ... 5:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 5: /* nmul: -(fn * fm) */
-            gen_vfp_mul(dp);
-            gen_vfp_neg(dp);
-            break;
         case 6: /* add: fn + fm */
             gen_vfp_add(dp);
             break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1
Convert the VADD instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  6 +-----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VADD_sp(DisasContext *s, arg_VADD_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_adds, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp) \
     tcg_temp_free_ptr(fpst); \
 }
 
-VFP_OP2(add)
 VFP_OP2(sub)
 VFP_OP2(div)
 
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 5:
+    case 0 ... 6:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 6: /* add: fn + fm */
-            gen_vfp_add(dp);
-            break;
         case 7: /* sub: fn - fm */
             gen_vfp_sub(dp);
             break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1
Convert the VSUB instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  6 +-----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VSUB_sp(DisasContext *s, arg_VSUB_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_subs, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp) \
     tcg_temp_free_ptr(fpst); \
 }
 
-VFP_OP2(sub)
 VFP_OP2(div)
 
 #undef VFP_OP2
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 6:
+    case 0 ... 7:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 7: /* sub: fn - fm */
-            gen_vfp_sub(dp);
-            break;
         case 8: /* div: fn / fm */
             gen_vfp_div(dp);
             break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VSUB_sp      ---- 1110 0.11 .... .... 1010 .1.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VSUB_dp      ---- 1110 0.11 .... .... 1011 .1.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1
Convert the VDIV instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         | 21 +--------------------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VDIV_sp(DisasContext *s, arg_VDIV_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_divs, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_fpstatus_ptr(int neon)
     return statusptr;
 }
 
-#define VFP_OP2(name) \
-static inline void gen_vfp_##name(int dp) \
-{ \
-    TCGv_ptr fpst = get_fpstatus_ptr(0); \
-    if (dp) { \
-        gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, cpu_F1d, fpst); \
-    } else { \
-        gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, cpu_F1s, fpst); \
-    } \
-    tcg_temp_free_ptr(fpst); \
-}
-
-VFP_OP2(div)
-
-#undef VFP_OP2
-
 static inline void gen_vfp_abs(int dp)
 {
     if (dp)
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 7:
+    case 0 ... 8:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 8: /* div: fn / fm */
-            gen_vfp_div(dp);
-            break;
         case 10: /* VFNMA : fd = muladd(-fd, fn, fm) */
         case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
         case 12: /* VFMA  : fd = muladd( fd, fn, fm) */
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VSUB_sp      ---- 1110 0.11 .... .... 1010 .1.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VSUB_dp      ---- 1110 0.11 .... .... 1011 .1.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1
Convert the VFP fused multiply-add instructions (VFNMA, VFNMS,
VFMA, VFMS) to decodetree.

Note that in the old decode structure we were implementing
these to honour the VFP vector stride/length. These instructions
were introduced in VFPv4, and in the v7A architecture they
are UNPREDICTABLE if the vector stride or length are non-zero.
In v8A they must UNDEF if stride or length are non-zero, like
all VFP instructions; we choose to UNDEF always.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 121 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  53 +--------------
 target/arm/vfp.decode          |   9 +++
 3 files changed, 131 insertions(+), 52 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VFM_sp(DisasContext *s, arg_VFM_sp *a)
+{
+    /*
+     * VFNMA : fd = muladd(-fd,  fn, fm)
+     * VFNMS : fd = muladd(-fd, -fn, fm)
+     * VFMA  : fd = muladd( fd,  fn, fm)
+     * VFMS  : fd = muladd( fd, -fn, fm)
+     *
+     * These are fused multiply-add, and must be done as one floating
+     * point operation with no rounding between the multiplication and
+     * addition steps. NB that doing the negations here as separate
+     * steps is correct : an input NaN should come out with its sign
+     * bit flipped if it is a negated-input.
+     */
+    TCGv_ptr fpst;
+    TCGv_i32 vn, vm, vd;
+
+    /*
+     * Present in VFPv4 only.
+     * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
+     * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
+     */
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
+        (s->vec_len != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vn = tcg_temp_new_i32();
+    vm = tcg_temp_new_i32();
+    vd = tcg_temp_new_i32();
+
+    neon_load_reg32(vn, a->vn);
+    neon_load_reg32(vm, a->vm);
+    if (a->o2) {
+        /* VFNMS, VFMS */
+        gen_helper_vfp_negs(vn, vn);
+    }
+    neon_load_reg32(vd, a->vd);
+    if (a->o1 & 1) {
+        /* VFNMA, VFNMS */
+        gen_helper_vfp_negs(vd, vd);
+    }
+    fpst = get_fpstatus_ptr(0);
+    gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
+    neon_store_reg32(vd, a->vd);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(vn);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_i32(vd);
+
+    return true;
+}
+
+static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
+{
+    /*
+     * VFNMA : fd = muladd(-fd,  fn, fm)
+     * VFNMS : fd = muladd(-fd, -fn, fm)
+     * VFMA  : fd = muladd( fd,  fn, fm)
+     * VFMS  : fd = muladd( fd, -fn, fm)
+     *
+     * These are fused multiply-add, and must be done as one floating
+     * point operation with no rounding between the multiplication and
+     * addition steps. NB that doing the negations here as separate
+     * steps is correct : an input NaN should come out with its sign
+     * bit flipped if it is a negated-input.
+     */
+    TCGv_ptr fpst;
+    TCGv_i64 vn, vm, vd;
+
+    /*
+     * Present in VFPv4 only.
+     * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
+     * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
+     */
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
+        (s->vec_len != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vn = tcg_temp_new_i64();
+    vm = tcg_temp_new_i64();
+    vd = tcg_temp_new_i64();
+
+    neon_load_reg64(vn, a->vn);
+    neon_load_reg64(vm, a->vm);
+    if (a->o2) {
+        /* VFNMS, VFMS */
+        gen_helper_vfp_negd(vn, vn);
+    }
+    neon_load_reg64(vd, a->vd);
+    if (a->o1 & 1) {
+        /* VFNMA, VFNMS */
+        gen_helper_vfp_negd(vd, vd);
+    }
+    fpst = get_fpstatus_ptr(0);
+    gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
+    neon_store_reg64(vd, a->vd);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i64(vn);
+    tcg_temp_free_i64(vm);
+    tcg_temp_free_i64(vd);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 8:
+    case 0 ... 13:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 10: /* VFNMA : fd = muladd(-fd, fn, fm) */
-        case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
-        case 12: /* VFMA  : fd = muladd( fd, fn, fm) */
-        case 13: /* VFMS  : fd = muladd( fd, -fn, fm) */
-            /* These are fused multiply-add, and must be done as one
-             * floating point operation with no rounding between the
-             * multiplication and addition steps.
-             * NB that doing the negations here as separate steps is
-             * correct : an input NaN should come out with its sign bit
-             * flipped if it is a negated-input.
-             */
-            if (!arm_dc_feature(s, ARM_FEATURE_VFP4)) {
-                return 1;
-            }
-            if (dp) {
-                TCGv_ptr fpst;
-                TCGv_i64 frd;
-                if (op & 1) {
-                    /* VFNMS, VFMS */
-                    gen_helper_vfp_negd(cpu_F0d, cpu_F0d);
-                }
-                frd = tcg_temp_new_i64();
-                tcg_gen_ld_f64(frd, cpu_env, vfp_reg_offset(dp, rd));
-                if (op & 2) {
-                    /* VFNMA, VFNMS */
-                    gen_helper_vfp_negd(frd, frd);
-                }
-                fpst = get_fpstatus_ptr(0);
-                gen_helper_vfp_muladdd(cpu_F0d, cpu_F0d,
-                                       cpu_F1d, frd, fpst);
-                tcg_temp_free_ptr(fpst);
-                tcg_temp_free_i64(frd);
-            } else {
-                TCGv_ptr fpst;
-                TCGv_i32 frd;
-                if (op & 1) {
-                    /* VFNMS, VFMS */
-                    gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
-                }
-                frd = tcg_temp_new_i32();
-                tcg_gen_ld_f32(frd, cpu_env, vfp_reg_offset(dp, rd));
-                if (op & 2) {
-                    gen_helper_vfp_negs(frd, frd);
-                }
-                fpst = get_fpstatus_ptr(0);
-                gen_helper_vfp_muladds(cpu_F0s, cpu_F0s,
-                                       cpu_F1s, frd, fpst);
-                tcg_temp_free_ptr(fpst);
-                tcg_temp_free_i32(frd);
-            }
-            break;
         case 14: /* fconst */
             if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
                 return 1;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VFM_sp       ---- 1110 1.01 .... .... 1010 . o2:1 . 0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=1
+VFM_dp       ---- 1110 1.01 .... .... 1011 . o2:1 . 0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=1
+VFM_sp       ---- 1110 1.10 .... .... 1010 . o2:1 . 0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=2
+VFM_dp       ---- 1110 1.10 .... .... 1011 . o2:1 . 0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=2
--
2.20.1
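The "no rounding between the multiplication and addition steps" property
is easy to demonstrate on the host. A minimal sketch in plain C (not
QEMU code; compile with contraction disabled, e.g.
gcc -ffp-contract=off demo.c -lm, so the compiler does not itself fuse
the separate expression):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* a*a = 1 + 2^-12 + 2^-12 + 2^-24 = 1 + 2^-11 + 2^-24 needs 25
         * significand bits, so the separate multiply rounds it to
         * 1 + 2^-11 before the add ever happens. */
        float a = 1.0f + 0x1p-12f;
        float c = -(1.0f + 0x1p-11f);

        float separate = a * a + c;   /* rounds twice: prints 0x0p+0  */
        float fused = fmaf(a, a, c);  /* rounds once:  prints 0x1p-24 */

        printf("separate: %a\nfused:    %a\n", separate, fused);
        return 0;
    }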
Convert the VNEG instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  6 +-----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
 {
     return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
 }
+
+static bool trans_VNEG_sp(DisasContext *s, arg_VNEG_sp *a)
+{
+    return do_vfp_2op_sp(s, gen_helper_vfp_negs, a->vd, a->vm);
+}
+
+static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
+{
+    return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         return 1;
     case 15:
         switch (rn) {
-        case 1:
+        case 1 ... 2:
             /* Already handled by decodetree */
             return 1;
         default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         /* rn is opcode, encoded as per VFP_SREG_N. */
         switch (rn) {
         case 0x00: /* vmov */
-        case 0x02: /* vneg */
        case 0x03: /* vsqrt */
             break;
 
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         case 0: /* cpy */
             /* no-op */
             break;
-        case 2: /* neg */
-            gen_vfp_neg(dp);
-            break;
         case 3: /* sqrt */
             gen_vfp_sqrt(dp);
             break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 .... \
              vd=%vd_sp vm=%vm_sp
 VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 .... \
              vd=%vd_dp vm=%vm_dp
+
+VNEG_sp      ---- 1110 1.11 0001 .... 1010 01.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VNEG_dp      ---- 1110 1.11 0001 .... 1011 01.0 .... \
+             vd=%vd_dp vm=%vm_dp
--
2.20.1
diff view generated by jsdifflib
Deleted patch
1
Convert the VSQRT instruction to decodetree.
2
1
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
---
6
target/arm/translate-vfp.inc.c | 20 ++++++++++++++++++++
7
target/arm/translate.c | 14 +-------------
8
target/arm/vfp.decode | 5 +++++
9
3 files changed, 26 insertions(+), 13 deletions(-)
10
11
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
12
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate-vfp.inc.c
14
+++ b/target/arm/translate-vfp.inc.c
15
@@ -XXX,XX +XXX,XX @@ static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
16
{
17
return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
18
}
19
+
20
+static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
21
+{
22
+ gen_helper_vfp_sqrts(vd, vm, cpu_env);
23
+}
24
+
25
+static bool trans_VSQRT_sp(DisasContext *s, arg_VSQRT_sp *a)
26
+{
27
+ return do_vfp_2op_sp(s, gen_VSQRT_sp, a->vd, a->vm);
28
+}
29
+
30
+static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
31
+{
32
+ gen_helper_vfp_sqrtd(vd, vm, cpu_env);
33
+}
34
+
35
+static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
36
+{
37
+ return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
38
+}
39
diff --git a/target/arm/translate.c b/target/arm/translate.c
40
index XXXXXXX..XXXXXXX 100644
41
--- a/target/arm/translate.c
42
+++ b/target/arm/translate.c
43
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_neg(int dp)
44
gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
45
}
46
47
-static inline void gen_vfp_sqrt(int dp)
48
-{
49
- if (dp)
50
- gen_helper_vfp_sqrtd(cpu_F0d, cpu_F0d, cpu_env);
51
- else
52
- gen_helper_vfp_sqrts(cpu_F0s, cpu_F0s, cpu_env);
53
-}
54
-
55
static inline void gen_vfp_cmp(int dp)
56
{
57
if (dp)
58
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
59
return 1;
60
case 15:
61
switch (rn) {
62
- case 1 ... 2:
63
+ case 1 ... 3:
64
/* Already handled by decodetree */
65
return 1;
66
default:
67
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
68
/* rn is opcode, encoded as per VFP_SREG_N. */
69
switch (rn) {
70
case 0x00: /* vmov */
71
- case 0x03: /* vsqrt */
72
break;
73
74
case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
75
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
76
case 0: /* cpy */
77
/* no-op */
78
break;
79
- case 3: /* sqrt */
80
- gen_vfp_sqrt(dp);
81
- break;
82
case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
83
{
84
TCGv_ptr fpst = get_fpstatus_ptr(false);
85
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
86
index XXXXXXX..XXXXXXX 100644
87
--- a/target/arm/vfp.decode
88
+++ b/target/arm/vfp.decode
89
@@ -XXX,XX +XXX,XX @@ VNEG_sp ---- 1110 1.11 0001 .... 1010 01.0 .... \
90
vd=%vd_sp vm=%vm_sp
91
VNEG_dp ---- 1110 1.11 0001 .... 1011 01.0 .... \
92
vd=%vd_dp vm=%vm_dp
93
+
94
+VSQRT_sp ---- 1110 1.11 0001 .... 1010 11.0 .... \
95
+ vd=%vd_sp vm=%vm_sp
96
+VSQRT_dp ---- 1110 1.11 0001 .... 1011 11.0 .... \
97
+ vd=%vd_dp vm=%vm_dp
98
--
99
2.20.1
100
101
diff view generated by jsdifflib
Convert the VFP comparison instructions to decodetree.

Note that comparison instructions should not honour the VFP
short-vector length and stride information: they are scalar-only
operations. This applies to all the 2-operand instructions except
for VMOV, VABS, VNEG and VSQRT. (In the old decoder this is
implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 75 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 51 +----------------------
 target/arm/vfp.decode          |  5 +++
 3 files changed, 81 insertions(+), 50 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
 {
     return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
 }
+
+static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
+{
+    TCGv_i32 vd, vm;
+
+    /* Vm/M bits must be zero for the Z variant */
+    if (a->z && a->vm != 0) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vd = tcg_temp_new_i32();
+    vm = tcg_temp_new_i32();
+
+    neon_load_reg32(vd, a->vd);
+    if (a->z) {
+        tcg_gen_movi_i32(vm, 0);
+    } else {
+        neon_load_reg32(vm, a->vm);
+    }
+
+    if (a->e) {
+        gen_helper_vfp_cmpes(vd, vm, cpu_env);
+    } else {
+        gen_helper_vfp_cmps(vd, vm, cpu_env);
+    }
+
+    tcg_temp_free_i32(vd);
+    tcg_temp_free_i32(vm);
+
+    return true;
+}
+
+static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
+{
+    TCGv_i64 vd, vm;
+
+    /* Vm/M bits must be zero for the Z variant */
+    if (a->z && a->vm != 0) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vd = tcg_temp_new_i64();
+    vm = tcg_temp_new_i64();
+
+    neon_load_reg64(vd, a->vd);
+    if (a->z) {
+        tcg_gen_movi_i64(vm, 0);
+    } else {
+        neon_load_reg64(vm, a->vm);
+    }
+
+    if (a->e) {
+        gen_helper_vfp_cmped(vd, vm, cpu_env);
+    } else {
+        gen_helper_vfp_cmpd(vd, vm, cpu_env);
+    }
+
+    tcg_temp_free_i64(vd);
+    tcg_temp_free_i64(vm);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_neg(int dp)
     gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
 }
 
-static inline void gen_vfp_cmp(int dp)
-{
-    if (dp)
-        gen_helper_vfp_cmpd(cpu_F0d, cpu_F1d, cpu_env);
-    else
-        gen_helper_vfp_cmps(cpu_F0s, cpu_F1s, cpu_env);
-}
-
-static inline void gen_vfp_cmpe(int dp)
-{
-    if (dp)
-        gen_helper_vfp_cmped(cpu_F0d, cpu_F1d, cpu_env);
-    else
-        gen_helper_vfp_cmpes(cpu_F0s, cpu_F1s, cpu_env);
-}
-
-static inline void gen_vfp_F1_ld0(int dp)
-{
-    if (dp)
-        tcg_gen_movi_i64(cpu_F1d, 0);
-    else
-        tcg_gen_movi_i32(cpu_F1s, 0);
-}
-
 #define VFP_GEN_ITOF(name) \
 static inline void gen_vfp_##name(int dp, int neon) \
 { \
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     case 15:
         switch (rn) {
         case 0 ... 3:
+        case 8 ... 11:
             /* Already handled by decodetree */
             return 1;
         default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rd_is_dp = false;
             break;
 
-        case 0x08: case 0x0a: /* vcmp, vcmpz */
-        case 0x09: case 0x0b: /* vcmpe, vcmpez */
-            no_output = true;
-            break;
-
         case 0x0c: /* vrintr */
         case 0x0d: /* vrintz */
         case 0x0e: /* vrintx */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         /* Load the initial operands. */
         if (op == 15) {
             switch (rn) {
-            case 0x08: case 0x09: /* Compare */
-                gen_mov_F0_vreg(dp, rd);
-                gen_mov_F1_vreg(dp, rm);
-                break;
-            case 0x0a: case 0x0b: /* Compare with zero */
-                gen_mov_F0_vreg(dp, rd);
-                gen_vfp_F1_ld0(dp);
-                break;
             case 0x14: /* vcvt fp <-> fixed */
             case 0x15:
             case 0x16:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 gen_vfp_msr(tmp);
                 break;
             }
-            case 8: /* cmp */
-                gen_vfp_cmp(dp);
-                break;
-            case 9: /* cmpe */
-                gen_vfp_cmpe(dp);
-                break;
-            case 10: /* cmpz */
-                gen_vfp_cmp(dp);
-                break;
-            case 11: /* cmpez */
-                gen_vfp_F1_ld0(dp);
-                gen_vfp_cmpe(dp);
-                break;
             case 12: /* vrintr */
             {
                 TCGv_ptr fpst = get_fpstatus_ptr(0);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 .... \
              vd=%vd_sp vm=%vm_sp
 VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 .... \
              vd=%vd_dp vm=%vm_dp
+
+VCMP_sp      ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VCMP_dp      ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
+             vd=%vd_dp vm=%vm_dp
--
2.20.1
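For reference, the e bit here selects VCMPE (gen_helper_vfp_cmpes),
which raises Invalid Operation even for quiet NaN operands, whereas
plain VCMP only does so for signalling NaNs. Host C has the same
quiet-versus-signalling split, which makes for a rough analogy (a
sketch only: it needs FENV_ACCESS honoured and optimisations off to be
reliable, and nan() may need -lm):

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double qnan = nan("");

        feclearexcept(FE_INVALID);
        (void)isgreater(qnan, 0.0);   /* quiet predicate, VCMP-like */
        printf("quiet:      FE_INVALID=%d\n", !!fetestexcept(FE_INVALID));

        feclearexcept(FE_INVALID);
        (void)(qnan > 0.0);           /* signalling predicate, VCMPE-like */
        printf("signalling: FE_INVALID=%d\n", !!fetestexcept(FE_INVALID));
        return 0;
    }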
Convert the VCVTT and VCVTB instructions that convert from
half-precision floats to f32 or f64 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d, we can perform a direct 16-bit
load of the right half of the input single-precision register
rather than loading the full 32 bits and then doing a
separate shift or sign-extension.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 82 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 56 +----------------------
 target/arm/vfp.decode          |  6 +++
 3 files changed, 89 insertions(+), 55 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@
 #include "decode-vfp.inc.c"
 #include "decode-vfp-uncond.inc.c"
 
+/*
+ * Return the offset of a 16-bit half of the specified VFP single-precision
+ * register. If top is true, returns the top 16 bits; otherwise the bottom
+ * 16 bits.
+ */
+static inline long vfp_f16_offset(unsigned reg, bool top)
+{
+    long offs = vfp_reg_offset(false, reg);
+#ifdef HOST_WORDS_BIGENDIAN
+    if (!top) {
+        offs += 2;
+    }
+#else
+    if (top) {
+        offs += 2;
+    }
+#endif
+    return offs;
+}
+
 /*
  * Check that VFP access is enabled. If it is, do the necessary
  * M-profile lazy-FP handling and then return true.
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
 
     return true;
 }
+
+static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 ahp_mode;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_fp16_spconv, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    ahp_mode = get_ahp_flag();
+    tmp = tcg_temp_new_i32();
+    /* The T bit tells us if we want the low or high 16 bits of Vm */
+    tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+    gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_i32(ahp_mode);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 ahp_mode;
+    TCGv_i32 tmp;
+    TCGv_i64 vd;
+
+    if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    ahp_mode = get_ahp_flag();
+    tmp = tcg_temp_new_i32();
+    /* The T bit tells us if we want the low or high 16 bits of Vm */
+    tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+    vd = tcg_temp_new_i64();
+    gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
+    neon_store_reg64(vd, a->vd);
+    tcg_temp_free_i32(ahp_mode);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i64(vd);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         return 1;
     case 15:
         switch (rn) {
-        case 0 ... 3:
+        case 0 ... 5:
         case 8 ... 11:
             /* Already handled by decodetree */
             return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     if (op == 15) {
         /* rn is opcode, encoded as per VFP_SREG_N. */
         switch (rn) {
-        case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
-        case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
-            /*
-             * VCVTB, VCVTT: only present with the halfprec extension
-             * UNPREDICTABLE if bit 8 is set prior to ARMv8
-             * (we choose to UNDEF)
-             */
-            if (dp) {
-                if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
-                    return 1;
-                }
-            } else {
-                if (!dc_isar_feature(aa32_fp16_spconv, s)) {
-                    return 1;
-                }
-            }
-            rm_is_dp = false;
-            break;
         case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
         case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
             if (dp) {
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         switch (op) {
         case 15: /* extension space */
             switch (rn) {
-            case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
-            {
-                TCGv_ptr fpst = get_fpstatus_ptr(false);
-                TCGv_i32 ahp_mode = get_ahp_flag();
-                tmp = gen_vfp_mrs();
-                tcg_gen_ext16u_i32(tmp, tmp);
-                if (dp) {
-                    gen_helper_vfp_fcvt_f16_to_f64(cpu_F0d, tmp,
-                                                   fpst, ahp_mode);
-                } else {
-                    gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp,
-                                                   fpst, ahp_mode);
-                }
-                tcg_temp_free_i32(ahp_mode);
-                tcg_temp_free_ptr(fpst);
-                tcg_temp_free_i32(tmp);
-                break;
-            }
-            case 5: /* vcvtt.f32.f16, vcvtt.f64.f16 */
-            {
-                TCGv_ptr fpst = get_fpstatus_ptr(false);
-                TCGv_i32 ahp = get_ahp_flag();
-                tmp = gen_vfp_mrs();
-                tcg_gen_shri_i32(tmp, tmp, 16);
-                if (dp) {
-                    gen_helper_vfp_fcvt_f16_to_f64(cpu_F0d, tmp,
-                                                   fpst, ahp);
-                } else {
-                    gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp,
-                                                   fpst, ahp);
-                }
-                tcg_temp_free_i32(tmp);
-                tcg_temp_free_i32(ahp);
-                tcg_temp_free_ptr(fpst);
-                break;
-            }
             case 6: /* vcvtb.f16.f32, vcvtb.f16.f64 */
             {
                 TCGv_ptr fpst = get_fpstatus_ptr(false);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCMP_sp      ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCMP_dp      ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
              vd=%vd_dp vm=%vm_dp
+
+# VCVTT and VCVTB from f16: Vd format depends on size bit; Vm is always vm_sp
+VCVT_f32_f16 ---- 1110 1.11 0010 .... 1010 t:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
+             vd=%vd_dp vm=%vm_sp
--
2.20.1
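The endianness handling in vfp_f16_offset() above can be exercised
standalone. A host-side sketch in plain C (BIG_ENDIAN_HOST here is a
stand-in for QEMU's HOST_WORDS_BIGENDIAN, which is not available
outside the QEMU tree):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Byte offset of one 16-bit half of a 32-bit register in host
     * memory, mirroring the vfp_f16_offset() logic in the patch above. */
    static size_t f16_offset(int top)
    {
    #ifdef BIG_ENDIAN_HOST
        return top ? 0 : 2;
    #else
        return top ? 2 : 0;
    #endif
    }

    int main(void)
    {
        uint32_t sreg = 0xAAAA5555;  /* VCVTT reads 0xAAAA, VCVTB 0x5555 */
        uint16_t half;

        memcpy(&half, (char *)&sreg + f16_offset(1), sizeof(half));
        printf("top:    0x%04x\n", half);
        memcpy(&half, (char *)&sreg + f16_offset(0), sizeof(half));
        printf("bottom: 0x%04x\n", half);
        return 0;
    }

This is the same trick tcg_gen_ld16u_i32() relies on: pick the right
byte offset once, then do a plain 16-bit load instead of a 32-bit load
followed by a shift or mask.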