Series comparison

-[Qemu-devel] [PULL 00/48] target-arm queue
+[PULL 00/31] target-arm queue
-Arm queue; the bulk of this is the VFP decodetree conversion...
+First arm pullreq for 7.1. The bulk of this is the qemu_split_irq
 removal.
 I have enough stuff in my to-review queue that I expect to do another
 pullreq early next week, but 31 patches is enough to not hang on to.
 thanks
 -- PMM
-The following changes since commit 4747524f9f243ca5ff1f146d37e423c00e923ee1:
+The following changes since commit 9c125d17e9402c232c46610802e5931b3639d77b:
-  Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2019-06-12' into staging (2019-06-13 11:58:00 +0100)
+  Merge tag 'pull-tcg-20220420' of https://gitlab.com/rth7680/qemu into staging (2022-04-20 16:43:11 -0700)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190613
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20220421
-for you to fetch changes up to 07e4c7f769120c9a5bd6a26c2dc1421f2f838d80:
+for you to fetch changes up to 5b415dd61bdbf61fb4be0e9f1a7172b8bce682c6:
-  target/arm: Fix short-vector increment behaviour (2019-06-13 12:57:37 +0100)
+  hw/arm: Use bit fields for NPCM7XX PWRON STRAPs (2022-04-21 11:37:05 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * convert aarch32 VFP decoder to decodetree
+ * hw/arm/virt: Check for attempt to use TrustZone with KVM or HVF
-   (includes tightening up decode in a few places)
+ * versal: Add the Cortex-R5s in the Real-Time Processing Unit (RPU) subsystem
- * fix minor bugs in VFP short-vector handling
+ * versal: model enough of the Clock/Reset Low-power domain (CRL) to allow control of the Cortex-R5s
- * hw/core/bus.c: Only the main system bus can have no parent
+ * xlnx-zynqmp: Connect 4 TTC timers
- * smmuv3: Fix decoding of ID register range
+ * exynos4210: Refactor GIC/combiner code to stop using qemu_split_irq
- * Implement NSACR gating of floating point
+ * realview: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
- * Use tcg_gen_gvec_bitsel
+ * stellaris: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
- * Vectorize USHL and SSHL
+ * hw/core/irq: remove unused 'qemu_irq_split' function
  * npcm7xx: use symbolic constants for PWRON STRAP bit fields
  * virt: document impact of gic-version on max CPUs
 ----------------------------------------------------------------
-Peter Maydell (44):
+Edgar E. Iglesias (6):
-      target/arm: Implement NSACR gating of floating point
+      timer: cadence_ttc: Break out header file to allow embedding
-      hw/arm/smmuv3: Fix decoding of ID register range
+      hw/arm/xlnx-zynqmp: Connect 4 TTC timers
-      hw/core/bus.c: Only the main system bus can have no parent
+      hw/arm: versal: Create an APU CPU Cluster
-      target/arm: Add stubs for AArch32 VFP decodetree
+      hw/arm: versal: Add the Cortex-R5Fs
-      target/arm: Factor out VFP access checking code
+      hw/misc: Add a model of the Xilinx Versal CRL
-      target/arm: Fix Cortex-R5F MVFR values
+      hw/arm: versal: Connect the CRL
       target/arm: Explicitly enable VFP short-vectors for aarch32 -cpu max
       target/arm: Convert the VSEL instructions to decodetree
       target/arm: Convert VMINNM, VMAXNM to decodetree
       target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM to decodetree
       target/arm: Convert VCVTA/VCVTN/VCVTP/VCVTM to decodetree
       target/arm: Move the VFP trans_* functions to translate-vfp.inc.c
       target/arm: Add helpers for VFP register loads and stores
       target/arm: Convert "double-precision" register moves to decodetree
       target/arm: Convert "single-precision" register moves to decodetree
       target/arm: Convert VFP two-register transfer insns to decodetree
       target/arm: Convert VFP VLDR and VSTR to decodetree
       target/arm: Convert the VFP load/store multiple insns to decodetree
       target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d
       target/arm: Convert VFP VMLA to decodetree
       target/arm: Convert VFP VMLS to decodetree
       target/arm: Convert VFP VNMLS to decodetree
       target/arm: Convert VFP VNMLA to decodetree
       target/arm: Convert VMUL to decodetree
       target/arm: Convert VNMUL to decodetree
       target/arm: Convert VADD to decodetree
       target/arm: Convert VSUB to decodetree
       target/arm: Convert VDIV to decodetree
       target/arm: Convert VFP fused multiply-add insns to decodetree
       target/arm: Convert VMOV (imm) to decodetree
       target/arm: Convert VABS to decodetree
       target/arm: Convert VNEG to decodetree
       target/arm: Convert VSQRT to decodetree
       target/arm: Convert VMOV (register) to decodetree
       target/arm: Convert VFP comparison insns to decodetree
       target/arm: Convert the VCVT-from-f16 insns to decodetree
       target/arm: Convert the VCVT-to-f16 insns to decodetree
       target/arm: Convert VFP round insns to decodetree
       target/arm: Convert double-single precision conversion insns to decodetree
       target/arm: Convert integer-to-float insns to decodetree
       target/arm: Convert VJCVT to decodetree
       target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree
       target/arm: Convert float-to-integer VCVT insns to decodetree
       target/arm: Fix short-vector increment behaviour
-Richard Henderson (4):
+Hao Wu (2):
-      target/arm: Vectorize USHL and SSHL
+      hw/misc: Add PWRON STRAP bit fields in GCR module
-      target/arm: Use tcg_gen_gvec_bitsel
+      hw/arm: Use bit fields for NPCM7XX PWRON STRAPs
       target/arm: Fix output of PAuth Auth
       decodetree: Fix comparison of Field
- target/arm/Makefile.objs          |   13 +
+Heinrich Schuchardt (1):
- tests/tcg/aarch64/Makefile.target |    2 +-
+      hw/arm/virt: impact of gic-version on max CPUs
  target/arm/cpu.h                  |   11 +
  target/arm/helper.h               |   11 +-
  target/arm/translate-a64.h        |    2 +
  target/arm/translate.h            |    9 +-
  hw/arm/smmuv3.c                   |    2 +-
  hw/core/bus.c                     |   21 +-
  target/arm/cpu.c                  |    6 +
  target/arm/helper.c               |   75 +-
  target/arm/neon_helper.c          |   33 -
  target/arm/pauth_helper.c         |    4 +-
  target/arm/translate-a64.c        |   33 +-
  target/arm/translate-vfp.inc.c    | 2672 +++++++++++++++++++++++++++++++++++++
  target/arm/translate.c            | 1881 +++++---------------------
  target/arm/vec_helper.c           |   88 ++
  tests/tcg/aarch64/pauth-2.c       |   61 +
  scripts/decodetree.py             |    2 +-
  target/arm/vfp-uncond.decode      |   63 +
  target/arm/vfp.decode             |  242 ++++
 files changed, 3593 insertions(+), 1638 deletions(-)
  create mode 100644 target/arm/translate-vfp.inc.c
  create mode 100644 tests/tcg/aarch64/pauth-2.c
  create mode 100644 target/arm/vfp-uncond.decode
  create mode 100644 target/arm/vfp.decode
+Peter Maydell (19):
+      hw/arm/virt: Check for attempt to use TrustZone with KVM or HVF
+      hw/arm/exynos4210: Use TYPE_OR_IRQ instead of custom OR-gate device
+      hw/intc/exynos4210_gic: Remove unused TYPE_EXYNOS4210_IRQ_GATE
+      hw/arm/exynos4210: Put a9mpcore device into state struct
+      hw/arm/exynos4210: Drop int_gic_irq[] from Exynos4210Irq struct
+      hw/arm/exynos4210: Coalesce board_irqs and irq_table
+      hw/arm/exynos4210: Fix code style nit in combiner_grp_to_gic_id[]
+      hw/arm/exynos4210: Move exynos4210_init_board_irqs() into exynos4210.c
+      hw/arm/exynos4210: Put external GIC into state struct
+      hw/arm/exynos4210: Drop ext_gic_irq[] from Exynos4210Irq struct
+      hw/arm/exynos4210: Move exynos4210_combiner_get_gpioin() into exynos4210.c
+      hw/arm/exynos4210: Delete unused macro definitions
+      hw/arm/exynos4210: Use TYPE_SPLIT_IRQ in exynos4210_init_board_irqs()
+      hw/arm/exynos4210: Fill in irq_table[] for internal-combiner-only IRQ lines
+      hw/arm/exynos4210: Connect MCT_G0 and MCT_G1 to both combiners
+      hw/arm/exynos4210: Don't connect multiple lines to external GIC inputs
+      hw/arm/exynos4210: Fold combiner splits into exynos4210_init_board_irqs()
+      hw/arm/exynos4210: Put combiners into state struct
+      hw/arm/exynos4210: Drop Exynos4210Irq struct
+Zongyuan Li (3):
+      hw/arm/realview: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
+      hw/arm/stellaris: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
+      hw/core/irq: remove unused 'qemu_irq_split' function
+ docs/system/arm/virt.rst              |   4 +-
+ include/hw/arm/exynos4210.h           |  50 ++--
+ include/hw/arm/xlnx-versal.h          |  16 ++
+ include/hw/arm/xlnx-zynqmp.h          |   4 +
+ include/hw/intc/exynos4210_combiner.h |  57 +++++
+ include/hw/intc/exynos4210_gic.h      |  43 ++++
+ include/hw/irq.h                      |   5 -
+ include/hw/misc/npcm7xx_gcr.h         |  30 +++
+ include/hw/misc/xlnx-versal-crl.h     | 235 +++++++++++++++++++
+ include/hw/timer/cadence_ttc.h        |  54 +++++
+ hw/arm/exynos4210.c                   | 430 ++++++++++++++++++++++++++++++----
+ hw/arm/npcm7xx_boards.c               |  24 +-
+ hw/arm/realview.c                     |  33 ++-
+ hw/arm/stellaris.c                    |  15 +-
+ hw/arm/virt.c                         |   7 +
+ hw/arm/xlnx-versal-virt.c             |   6 +-
+ hw/arm/xlnx-versal.c                  |  99 +++++++-
+ hw/arm/xlnx-zynqmp.c                  |  22 ++
+ hw/core/irq.c                         |  15 --
+ hw/intc/exynos4210_combiner.c         | 108 +--------
+ hw/intc/exynos4210_gic.c              | 344 +--------------------------
+ hw/misc/xlnx-versal-crl.c             | 421 +++++++++++++++++++++++++++++++++
+ hw/timer/cadence_ttc.c                |  32 +--
+ MAINTAINERS                           |   2 +-
+ hw/misc/meson.build                   |   1 +
+files changed, 1457 insertions(+), 600 deletions(-)
+ create mode 100644 include/hw/intc/exynos4210_combiner.h
+ create mode 100644 include/hw/intc/exynos4210_gic.h
+ create mode 100644 include/hw/misc/xlnx-versal-crl.h
+ create mode 100644 include/hw/timer/cadence_ttc.h
+ create mode 100644 hw/misc/xlnx-versal-crl.c

-[Qemu-devel] [PULL 46/48] target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree
+[PULL 01/31] hw/arm/virt: Check for attempt to use TrustZone with KVM or HVF
-Convert the VCVT (between floating-point and fixed-point) instructions
+It's not possible to provide the guest with the Security extensions
-to decodetree.
+(TrustZone) when using KVM or HVF, because the hardware
 virtualization extensions don't permit running EL3 guest code.
 However, we weren't checking for this combination, with the result
 that QEMU would assert if you tried it:
+$ qemu-system-aarch64 -enable-kvm -machine virt,secure=on -cpu host -display none
+Unexpected error in object_property_find_err() at ../../qom/object.c:1304:
+qemu-system-aarch64: Property 'host-arm-cpu.secure-memory' not found
+Aborted
+Check for this combination of options and report an error, in the
+same way we already do for attempts to give a KVM or HVF guest the
+Virtualization or MTE extensions. Now we will report:
+qemu-system-aarch64: mach-virt: KVM does not support providing Security extensions (TrustZone) to the guest CPU
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/961
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404155301.566542-1-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c | 124 +++++++++++++++++++++++++++++++++
+ hw/arm/virt.c | 7 +++++++
- target/arm/translate.c         |  57 +--------------
+file changed, 7 insertions(+)
  target/arm/vfp.decode          |  10 +++
 files changed, 136 insertions(+), 55 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/hw/arm/virt.c
-+++ b/target/arm/translate-vfp.inc.c
++++ b/hw/arm/virt.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
+@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
-     tcg_temp_free_i32(vd);
+         exit(1);
-     return true;
+     }
- }
-+
++    if (vms->secure && (kvm_enabled() || hvf_enabled())) {
-+static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
++        error_report("mach-virt: %s does not support providing "
-+{
++                     "Security extensions (TrustZone) to the guest CPU",
-+    TCGv_i32 vd, shift;
++                     kvm_enabled() ? "KVM" : "HVF");
-+    TCGv_ptr fpst;
++        exit(1);
 +    int frac_bits;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 +        return false;
 +    }
 +
-+    if (!vfp_access_check(s)) {
+     if (vms->virt && (kvm_enabled() || hvf_enabled())) {
-+        return true;
+         error_report("mach-virt: %s does not support providing "
-+    }
+                      "Virtualization extensions to the guest CPU",
 +
 +    frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 +
 +    vd = tcg_temp_new_i32();
 +    neon_load_reg32(vd, a->vd);
 +
 +    fpst = get_fpstatus_ptr(false);
 +    shift = tcg_const_i32(frac_bits);
 +
 +    /* Switch on op:U:sx bits */
 +    switch (a->opc) {
 +    case 0:
 +        gen_helper_vfp_shtos(vd, vd, shift, fpst);
 +        break;
 +    case 1:
 +        gen_helper_vfp_sltos(vd, vd, shift, fpst);
 +        break;
 +    case 2:
 +        gen_helper_vfp_uhtos(vd, vd, shift, fpst);
 +        break;
 +    case 3:
 +        gen_helper_vfp_ultos(vd, vd, shift, fpst);
 +        break;
 +    case 4:
 +        gen_helper_vfp_toshs_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    case 5:
 +        gen_helper_vfp_tosls_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    case 6:
 +        gen_helper_vfp_touhs_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    case 7:
 +        gen_helper_vfp_touls_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    neon_store_reg32(vd, a->vd);
 +    tcg_temp_free_i32(vd);
 +    tcg_temp_free_i32(shift);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
 +static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
 +{
 +    TCGv_i64 vd;
 +    TCGv_i32 shift;
 +    TCGv_ptr fpst;
 +    int frac_bits;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 +
 +    vd = tcg_temp_new_i64();
 +    neon_load_reg64(vd, a->vd);
 +
 +    fpst = get_fpstatus_ptr(false);
 +    shift = tcg_const_i32(frac_bits);
 +
 +    /* Switch on op:U:sx bits */
 +    switch (a->opc) {
 +    case 0:
 +        gen_helper_vfp_shtod(vd, vd, shift, fpst);
 +        break;
 +    case 1:
 +        gen_helper_vfp_sltod(vd, vd, shift, fpst);
 +        break;
 +    case 2:
 +        gen_helper_vfp_uhtod(vd, vd, shift, fpst);
 +        break;
 +    case 3:
 +        gen_helper_vfp_ultod(vd, vd, shift, fpst);
 +        break;
 +    case 4:
 +        gen_helper_vfp_toshd_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    case 5:
 +        gen_helper_vfp_tosld_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    case 6:
 +        gen_helper_vfp_touhd_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    case 7:
 +        gen_helper_vfp_tould_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    neon_store_reg64(vd, a->vd);
 +    tcg_temp_free_i64(vd);
 +    tcg_temp_free_i32(shift);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp, int shift, int neon) \
      tcg_temp_free_i32(tmp_shift); \
      tcg_temp_free_ptr(statusptr); \
  }
 -VFP_GEN_FIX(tosh, _round_to_zero)
  VFP_GEN_FIX(tosl, _round_to_zero)
 -VFP_GEN_FIX(touh, _round_to_zero)
  VFP_GEN_FIX(toul, _round_to_zero)
 -VFP_GEN_FIX(shto, )
  VFP_GEN_FIX(slto, )
 -VFP_GEN_FIX(uhto, )
  VFP_GEN_FIX(ulto, )
  #undef VFP_GEN_FIX
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  return 1;
              case 15:
                  switch (rn) {
 -                case 0 ... 19:
 +                case 0 ... 23:
 +                case 28 ... 31:
                      /* Already handled by decodetree */
                      return 1;
                  default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                      rd_is_dp = false;
                      break;
 -                case 0x14: /* vcvt fp <-> fixed */
 -                case 0x15:
 -                case 0x16:
 -                case 0x17:
 -                case 0x1c:
 -                case 0x1d:
 -                case 0x1e:
 -                case 0x1f:
 -                    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                        return 1;
 -                    }
 -                    /* Immediate frac_bits has same format as SREG_M.  */
 -                    rm_is_dp = false;
 -                    break;
 -
                  default:
                      return 1;
                  }
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              /* Load the initial operands.  */
              if (op == 15) {
                  switch (rn) {
 -                case 0x14: /* vcvt fp <-> fixed */
 -                case 0x15:
 -                case 0x16:
 -                case 0x17:
 -                case 0x1c:
 -                case 0x1d:
 -                case 0x1e:
 -                case 0x1f:
 -                    /* Source and destination the same.  */
 -                    gen_mov_F0_vreg(dp, rd);
 -                    break;
                  default:
                      /* One source operand.  */
                      gen_mov_F0_vreg(rm_is_dp, rm);
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  switch (op) {
                  case 15: /* extension space */
                      switch (rn) {
 -                    case 20: /* fshto */
 -                        gen_vfp_shto(dp, 16 - rm, 0);
 -                        break;
 -                    case 21: /* fslto */
 -                        gen_vfp_slto(dp, 32 - rm, 0);
 -                        break;
 -                    case 22: /* fuhto */
 -                        gen_vfp_uhto(dp, 16 - rm, 0);
 -                        break;
 -                    case 23: /* fulto */
 -                        gen_vfp_ulto(dp, 32 - rm, 0);
 -                        break;
                      case 24: /* ftoui */
                          gen_vfp_toui(dp, 0);
                          break;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                      case 27: /* ftosiz */
                          gen_vfp_tosiz(dp, 0);
                          break;
 -                    case 28: /* ftosh */
 -                        gen_vfp_tosh(dp, 16 - rm, 0);
 -                        break;
 -                    case 29: /* ftosl */
 -                        gen_vfp_tosl(dp, 32 - rm, 0);
 -                        break;
 -                    case 30: /* ftouh */
 -                        gen_vfp_touh(dp, 16 - rm, 0);
 -                        break;
 -                    case 31: /* ftoul */
 -                        gen_vfp_toul(dp, 32 - rm, 0);
 -                        break;
                      default: /* undefined */
                          g_assert_not_reached();
                      }
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_int_dp  ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
  # VJCVT is always dp to sp
  VJCVT        ---- 1110 1.11 1001 .... 1011 11.0 .... \
               vd=%vd_sp vm=%vm_dp
 +
 +# VCVT between floating-point and fixed-point. The immediate value
 +# is in the same format as a Vm single-precision register number.
 +# We assemble bits 18 (op), 16 (u) and 7 (sx) into a single opc field
 +# for the convenience of the trans_VCVT_fix functions.
 +%vcvt_fix_op 18:1 16:1 7:1
 +VCVT_fix_sp  ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
 +             vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
 +VCVT_fix_dp  ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
 +             vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
 --
-.20.1
+.25.1
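
The check that patch 01/31 adds to machvirt_init() can be sketched in isolation as follows. This is a minimal illustration, not the QEMU code: VirtConfigSketch and secure_conflict() are hypothetical names, and the two accelerator flags stand in for QEMU's kvm_enabled() and hvf_enabled() queries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the machine options and accelerator state. */
typedef struct {
    bool secure;       /* models -machine virt,secure=on */
    bool kvm_enabled;  /* models the result of kvm_enabled() */
    bool hvf_enabled;  /* models the result of hvf_enabled() */
} VirtConfigSketch;

/*
 * Return NULL if the configuration is acceptable. Otherwise return the
 * name of the accelerator that cannot provide the Security extensions
 * (TrustZone) to the guest, mirroring the "%s does not support..." message.
 */
const char *secure_conflict(const VirtConfigSketch *cfg)
{
    assert(cfg != NULL);
    if (cfg->secure && (cfg->kvm_enabled || cfg->hvf_enabled)) {
        return cfg->kvm_enabled ? "KVM" : "HVF";
    }
    return NULL;
}
```

A caller would report the error and exit when the result is non-NULL, which is what the patch does with error_report() and exit(1).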

-[Qemu-devel] [PULL 14/48] target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM to decodetree
+[PULL 02/31] timer: cadence_ttc: Break out header file to allow embedding
-Convert the VRINTA/VRINTN/VRINTP/VRINTM instructions to decodetree.
+From: "Edgar E. Iglesias" <edgar.iglesias@amd.com>
 Again, trans_VRINT() is temporarily left in translate.c.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Break out header file to allow embedding of the TTC.
 Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Reviewed-by: Luc Michel <luc@lmichel.fr>
 Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
 Message-id: 20220331222017.2914409-2-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c       | 60 +++++++++++++++++++++++-------------
+ include/hw/timer/cadence_ttc.h | 54 ++++++++++++++++++++++++++++++++++
- target/arm/vfp-uncond.decode |  5 +++
+ hw/timer/cadence_ttc.c         | 32 ++------------------
-files changed, 43 insertions(+), 22 deletions(-)
+files changed, 56 insertions(+), 30 deletions(-)
  create mode 100644 include/hw/timer/cadence_ttc.h
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/include/hw/timer/cadence_ttc.h b/include/hw/timer/cadence_ttc.h
-index XXXXXXX..XXXXXXX 100644
+new file mode 100644
---- a/target/arm/translate.c
+index XXXXXXX..XXXXXXX
-+++ b/target/arm/translate.c
+--- /dev/null
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
++++ b/include/hw/timer/cadence_ttc.h
-     return true;
+@@ -XXX,XX +XXX,XX @@
  }
 -static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
 -                        int rounding)
 +/*
-+ * Table for converting the most common AArch32 encoding of
++ * Xilinx Zynq cadence TTC model
-+ * rounding mode to arm_fprounding order (which matches the
++ *
-+ * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
++ * Copyright (c) 2011 Xilinx Inc.
 + * Copyright (c) 2012 Peter A.G. Crosthwaite (peter.crosthwaite@petalogix.com)
 + * Copyright (c) 2012 PetaLogix Pty Ltd.
 + * Written By Haibing Ma
 + *            M. Habib
 + *
 + * This program is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU General Public License
 + * as published by the Free Software Foundation; either version
 + * 2 of the License, or (at your option) any later version.
 + *
 + * You should have received a copy of the GNU General Public License along
 + * with this program; if not, see <http://www.gnu.org/licenses/>.
 + */
-+static const uint8_t fp_decode_rm[] = {
++#ifndef HW_TIMER_CADENCE_TTC_H
-+    FPROUNDING_TIEAWAY,
++#define HW_TIMER_CADENCE_TTC_H
-+    FPROUNDING_TIEEVEN,
++
-+    FPROUNDING_POSINF,
++#include "hw/sysbus.h"
-+    FPROUNDING_NEGINF,
++#include "qemu/timer.h"
 +
 +typedef struct {
 +    QEMUTimer *timer;
 +    int freq;
 +
 +    uint32_t reg_clock;
 +    uint32_t reg_count;
 +    uint32_t reg_value;
 +    uint16_t reg_interval;
 +    uint16_t reg_match[3];
 +    uint32_t reg_intr;
 +    uint32_t reg_intr_en;
 +    uint32_t reg_event_ctrl;
 +    uint32_t reg_event;
 +
 +    uint64_t cpu_time;
 +    unsigned int cpu_time_valid;
 +
 +    qemu_irq irq;
 +} CadenceTimerState;
 +
 +#define TYPE_CADENCE_TTC "cadence_ttc"
 +OBJECT_DECLARE_SIMPLE_TYPE(CadenceTTCState, CADENCE_TTC)
 +
 +struct CadenceTTCState {
 +    SysBusDevice parent_obj;
 +
 +    MemoryRegion iomem;
 +    CadenceTimerState timer[3];
 +};
 +
-+static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
++#endif
- {
+diff --git a/hw/timer/cadence_ttc.c b/hw/timer/cadence_ttc.c
--    TCGv_ptr fpst = get_fpstatus_ptr(0);
+index XXXXXXX..XXXXXXX 100644
-+    uint32_t rd, rm;
+--- a/hw/timer/cadence_ttc.c
-+    bool dp = a->dp;
++++ b/hw/timer/cadence_ttc.c
-+    TCGv_ptr fpst;
+@@ -XXX,XX +XXX,XX @@
-     TCGv_i32 tcg_rmode;
+ #include "qemu/timer.h"
-+    int rounding = fp_decode_rm[a->rm];
+ #include "qom/object.h"
 +#include "hw/timer/cadence_ttc.h"
 +
-+    if (!dc_isar_feature(aa32_vrint, s)) {
+ #ifdef CADENCE_TTC_ERR_DEBUG
-+        return false;
+ #define DB_PRINT(...) do { \
-+    }
+     fprintf(stderr,  ": %s: ", __func__); \
-+
+@@ -XXX,XX +XXX,XX @@
-+    /* UNDEF accesses to D16-D31 if they don't exist */
+ #define CLOCK_CTRL_PS_EN    0x00000001
-+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+ #define CLOCK_CTRL_PS_V     0x0000001e
-+        ((a->vm | a->vd) & 0x10)) {
-+        return false;
+-typedef struct {
-+    }
+-    QEMUTimer *timer;
-+    rd = a->vd;
+-    int freq;
-+    rm = a->vm;
+-
-+
+-    uint32_t reg_clock;
-+    if (!vfp_access_check(s)) {
+-    uint32_t reg_count;
-+        return true;
+-    uint32_t reg_value;
-+    }
+-    uint16_t reg_interval;
-+
+-    uint16_t reg_match[3];
-+    fpst = get_fpstatus_ptr(0);
+-    uint32_t reg_intr;
+-    uint32_t reg_intr_en;
-     tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
+-    uint32_t reg_event_ctrl;
-     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+-    uint32_t reg_event;
-@@ -XXX,XX +XXX,XX @@ static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
+-
-     tcg_temp_free_i32(tcg_rmode);
+-    uint64_t cpu_time;
+-    unsigned int cpu_time_valid;
-     tcg_temp_free_ptr(fpst);
+-
--    return 0;
+-    qemu_irq irq;
-+    return true;
+-} CadenceTimerState;
- }
+-
+-#define TYPE_CADENCE_TTC "cadence_ttc"
- static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
+-OBJECT_DECLARE_SIMPLE_TYPE(CadenceTTCState, CADENCE_TTC)
-@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
+-
-     return 0;
+-struct CadenceTTCState {
- }
+-    SysBusDevice parent_obj;
+-
--/* Table for converting the most common AArch32 encoding of
+-    MemoryRegion iomem;
-- * rounding mode to arm_fprounding order (which matches the
+-    CadenceTimerState timer[3];
 - * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
 - */
 -static const uint8_t fp_decode_rm[] = {
 -    FPROUNDING_TIEAWAY,
 -    FPROUNDING_TIEEVEN,
 -    FPROUNDING_POSINF,
 -    FPROUNDING_NEGINF,
 -};
 -
- static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
+ static void cadence_timer_update(CadenceTimerState *s)
  {
-     uint32_t rd, rm, dp = extract32(insn, 8, 1);
+     qemu_set_irq(s->irq, !!(s->reg_intr & s->reg_intr_en));
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
          rm = VFP_SREG_M(insn);
      }
 -    if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
 -        dc_isar_feature(aa32_vrint, s)) {
 -        /* VRINTA, VRINTN, VRINTP, VRINTM */
 -        int rounding = fp_decode_rm[extract32(insn, 16, 2)];
 -        return handle_vrint(insn, rd, rm, dp, rounding);
 -    } else if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
 -               dc_isar_feature(aa32_vcvt_dr, s)) {
 +    if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
 +        dc_isar_feature(aa32_vcvt_dr, s)) {
          /* VCVTA, VCVTN, VCVTP, VCVTM */
          int rounding = fp_decode_rm[extract32(insn, 16, 2)];
          return handle_vcvt(insn, rd, rm, dp, rounding);
 diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp-uncond.decode
 +++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VMINMAXNM   1111 1110 1.00 .... .... 1010 . op:1 .0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
  VMINMAXNM   1111 1110 1.00 .... .... 1011 . op:1 .0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
 +
 +VRINT       1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
 +            vm=%vm_sp vd=%vd_sp dp=0
 +VRINT       1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
 +            vm=%vm_dp vd=%vd_dp dp=1
 --
-.20.1
+.25.1
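
The address and interrupt arithmetic that patch 03/31 uses in xlnx_zynqmp_create_ttc() can be worked through with a small standalone sketch. TTC0_ADDR, TTC0_IRQ and XLNX_ZYNQMP_NUM_TTC are taken from the patch; TTC_STRIDE, TIMERS_PER_TTC, ttc_base() and ttc_gic_irq() are hypothetical names introduced here for illustration (the patch uses the literals 0x10000 and 3 inline).

```c
#include <assert.h>
#include <stdint.h>

#define TTC0_ADDR            0xFF110000u
#define TTC0_IRQ             36
#define XLNX_ZYNQMP_NUM_TTC  4
#define TTC_STRIDE           0x10000u  /* hypothetical name for the 0x10000 literal */
#define TIMERS_PER_TTC       3         /* hypothetical name for the literal 3 */

/* MMIO base of TTC instance 'ttc': TTC0_ADDR + ttc * 0x10000. */
uint32_t ttc_base(unsigned ttc)
{
    assert(ttc < XLNX_ZYNQMP_NUM_TTC);
    return TTC0_ADDR + ttc * TTC_STRIDE;
}

/* GIC input for timer 'timer' of TTC 'ttc': TTC0_IRQ + ttc * 3 + timer. */
int ttc_gic_irq(unsigned ttc, unsigned timer)
{
    assert(ttc < XLNX_ZYNQMP_NUM_TTC);
    assert(timer < TIMERS_PER_TTC);
    return TTC0_IRQ + (int)(ttc * TIMERS_PER_TTC + timer);
}
```

Under these definitions the four TTC blocks occupy 0xFF110000 through 0xFF140000 in 64 KiB steps, and their twelve timer interrupts map to consecutive GIC inputs 36 through 47.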

-[Qemu-devel] [PULL 35/48] target/arm: Convert VABS to decodetree
+[PULL 03/31] hw/arm/xlnx-zynqmp: Connect 4 TTC timers
-Convert the VFP VABS instruction to decodetree.
+From: "Edgar E. Iglesias" <edgar.iglesias@amd.com>
-Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or
+Connect the 4 TTC timers on the ZynqMP.
 VFPGen2OpDPFn because none of the operations which use this format
 and support short vectors will need it.
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Reviewed-by: Luc Michel <luc@lmichel.fr>
+Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
+Message-id: 20220331222017.2914409-3-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/translate-vfp.inc.c | 167 +++++++++++++++++++++++++++++++++
+ include/hw/arm/xlnx-zynqmp.h |  4 ++++
- target/arm/translate.c         |  12 ++-
+ hw/arm/xlnx-zynqmp.c         | 22 ++++++++++++++++++++++
- target/arm/vfp.decode          |   5 +
+files changed, 26 insertions(+)
 files changed, 180 insertions(+), 4 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/xlnx-zynqmp.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/xlnx-zynqmp.h
-@@ -XXX,XX +XXX,XX @@ typedef void VFPGen3OpSPFn(TCGv_i32 vd,
+@@ -XXX,XX +XXX,XX @@
- typedef void VFPGen3OpDPFn(TCGv_i64 vd,
+ #include "hw/or-irq.h"
-                            TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
+ #include "hw/misc/xlnx-zynqmp-apu-ctrl.h"
+ #include "hw/misc/xlnx-zynqmp-crf.h"
-+/*
++#include "hw/timer/cadence_ttc.h"
-+ * Types for callbacks for do_vfp_2op_sp() and do_vfp_2op_dp().
-+ * The callback should emit code to write a value to vd (which
+ #define TYPE_XLNX_ZYNQMP "xlnx-zynqmp"
-+ * should be written to only).
+ OBJECT_DECLARE_SIMPLE_TYPE(XlnxZynqMPState, XLNX_ZYNQMP)
-+ */
+@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(XlnxZynqMPState, XLNX_ZYNQMP)
-+typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
+ #define XLNX_ZYNQMP_MAX_RAM_SIZE (XLNX_ZYNQMP_MAX_LOW_RAM_SIZE + \
-+typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
+                                   XLNX_ZYNQMP_MAX_HIGH_RAM_SIZE)
 +#define XLNX_ZYNQMP_NUM_TTC 4
 +
  /*
-  * Perform a 3-operand VFP data processing instruction. fn is the
+  * Unimplemented mmio regions needed to boot some images.
-  * callback to do the actual operation; this function deals with the
+  */
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
+@@ -XXX,XX +XXX,XX @@ struct XlnxZynqMPState {
-     return true;
+     qemu_or_irq qspi_irq_orgate;
      XlnxZynqMPAPUCtrl apu_ctrl;
      XlnxZynqMPCRF crf;
 +    CadenceTTCState ttc[XLNX_ZYNQMP_NUM_TTC];
      char *boot_cpu;
      ARMCPU *boot_cpu_ptr;
 diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xlnx-zynqmp.c
 +++ b/hw/arm/xlnx-zynqmp.c
@@ -XXX,XX +XXX,XX @@
  #define APU_ADDR            0xfd5c0000
  #define APU_IRQ             153
 +#define TTC0_ADDR           0xFF110000
 +#define TTC0_IRQ            36
 +
  #define IPI_ADDR            0xFF300000
  #define IPI_IRQ             64
@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_create_crf(XlnxZynqMPState *s, qemu_irq *gic)
      sysbus_connect_irq(sbd, 0, gic[CRF_IRQ]);
  }
-+static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
++static void xlnx_zynqmp_create_ttc(XlnxZynqMPState *s, qemu_irq *gic)
 +{
-+    uint32_t delta_m = 0;
++    SysBusDevice *sbd;
-+    uint32_t delta_d = 0;
++    int i, irq;
 +    uint32_t bank_mask = 0;
 +    int veclen = s->vec_len;
 +    TCGv_i32 f0, fd;
 +
-+    if (!dc_isar_feature(aa32_fpshvec, s) &&
++    for (i = 0; i < XLNX_ZYNQMP_NUM_TTC; i++) {
-+        (veclen != 0 || s->vec_stride != 0)) {
++        object_initialize_child(OBJECT(s), "ttc[*]", &s->ttc[i],
-+        return false;
++                                TYPE_CADENCE_TTC);
-+    }
++        sbd = SYS_BUS_DEVICE(&s->ttc[i]);
 +
-+    if (!vfp_access_check(s)) {
++        sysbus_realize(sbd, &error_fatal);
-+        return true;
++        sysbus_mmio_map(sbd, 0, TTC0_ADDR + i * 0x10000);
-+    }
++        for (irq = 0; irq < 3; irq++) {
-+
++            sysbus_connect_irq(sbd, irq, gic[TTC0_IRQ + i * 3 + irq]);
 +    if (veclen > 0) {
 +        bank_mask = 0x18;
 +
 +        /* Figure out what type of vector operation this is.  */
 +        if ((vd & bank_mask) == 0) {
 +            /* scalar */
 +            veclen = 0;
 +        } else {
 +            delta_d = s->vec_stride + 1;
 +
 +            if ((vm & bank_mask) == 0) {
 +                /* mixed scalar/vector */
 +                delta_m = 0;
 +            } else {
 +                /* vector */
 +                delta_m = delta_d;
 +            }
 +        }
 +    }
-+
-+    f0 = tcg_temp_new_i32();
-+    fd = tcg_temp_new_i32();
-+
-+    neon_load_reg32(f0, vm);
-+
-+    for (;;) {
-+        fn(fd, f0);
-+        neon_store_reg32(fd, vd);
-+
-+        if (veclen == 0) {
-+            break;
-+        }
-+
-+        if (delta_m == 0) {
-+            /* single source one-many */
-+            while (veclen--) {
-+                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-+                neon_store_reg32(fd, vd);
-+            }
-+            break;
-+        }
-+
-+        /* Set up the operands for the next iteration */
-+        veclen--;
-+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-+        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
-+        neon_load_reg32(f0, vm);
-+    }
-+
-+    tcg_temp_free_i32(f0);
-+    tcg_temp_free_i32(fd);
-+
-+    return true;
 +}
 +
-+static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
+ static void xlnx_zynqmp_create_unimp_mmio(XlnxZynqMPState *s)
 +{
 +    uint32_t delta_m = 0;
 +    uint32_t delta_d = 0;
 +    uint32_t bank_mask = 0;
 +    int veclen = s->vec_len;
 +    TCGv_i64 f0, fd;
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!dc_isar_feature(aa32_fpshvec, s) &&
 +        (veclen != 0 || s->vec_stride != 0)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    if (veclen > 0) {
 +        bank_mask = 0xc;
 +
 +        /* Figure out what type of vector operation this is.  */
 +        if ((vd & bank_mask) == 0) {
 +            /* scalar */
 +            veclen = 0;
 +        } else {
 +            delta_d = (s->vec_stride >> 1) + 1;
 +
 +            if ((vm & bank_mask) == 0) {
 +                /* mixed scalar/vector */
 +                delta_m = 0;
 +            } else {
 +                /* vector */
 +                delta_m = delta_d;
 +            }
 +        }
 +    }
 +
 +    f0 = tcg_temp_new_i64();
 +    fd = tcg_temp_new_i64();
 +
 +    neon_load_reg64(f0, vm);
 +
 +    for (;;) {
 +        fn(fd, f0);
 +        neon_store_reg64(fd, vd);
 +
 +        if (veclen == 0) {
 +            break;
 +        }
 +
 +        if (delta_m == 0) {
 +            /* single source one-many */
 +            while (veclen--) {
 +                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +                neon_store_reg64(fd, vd);
 +            }
 +            break;
 +        }
 +
 +        /* Set up the operands for the next iteration */
 +        veclen--;
 +        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
 +        neon_load_reg64(f0, vm);
 +    }
 +
 +    tcg_temp_free_i64(f0);
 +    tcg_temp_free_i64(fd);
 +
 +    return true;
 +}
 +
  static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
  {
-     /* Note that order of inputs to the add matters for NaNs */
+     static const struct UnimpInfo {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
-     tcg_temp_free_i64(fd);
+     xlnx_zynqmp_create_efuse(s, gic_spi);
-     return true;
+     xlnx_zynqmp_create_apu_ctrl(s, gic_spi);
- }
+     xlnx_zynqmp_create_crf(s, gic_spi);
-+
++    xlnx_zynqmp_create_ttc(s, gic_spi);
-+static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
+     xlnx_zynqmp_create_unimp_mmio(s);
-+{
-+    return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
+     for (i = 0; i < XLNX_ZYNQMP_NUM_GDMA_CH; i++) {
 +}
 +
 +static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
 +{
 +    return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              case 0 ... 14:
                  /* Already handled by decodetree */
                  return 1;
 +            case 15:
 +                switch (rn) {
 +                case 1:
 +                    /* Already handled by decodetree */
 +                    return 1;
 +                default:
 +                    break;
 +                }
              default:
                  break;
              }
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  /* rn is opcode, encoded as per VFP_SREG_N. */
                  switch (rn) {
                  case 0x00: /* vmov */
 -                case 0x01: /* vabs */
                  case 0x02: /* vneg */
                  case 0x03: /* vsqrt */
                      break;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                      case 0: /* cpy */
                          /* no-op */
                          break;
 -                    case 1: /* abs */
 -                        gen_vfp_abs(dp);
 -                        break;
                      case 2: /* neg */
                          gen_vfp_neg(dp);
                          break;
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_imm_sp  ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
               vd=%vd_sp
  VMOV_imm_dp  ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
               vd=%vd_dp
 +
 +VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 .... \
 +             vd=%vd_sp vm=%vm_sp
 +VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 .... \
 +             vd=%vd_dp vm=%vm_dp
 --
-.20.1
+.25.1
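
As a reading aid for the TTC hunk above: the new loop maps each timer block at TTC0_ADDR + i * 0x10000 and connects its three interrupt outputs to entries TTC0_IRQ + i * 3 + irq of the qemu_irq array passed in as gic. The sketch below only reproduces that arithmetic in standalone C. The helper names and the two stride macros are hypothetical (the patch writes them as literals) and nothing here calls into QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the patch. */
#define XLNX_ZYNQMP_NUM_TTC 4
#define TTC0_ADDR           0xFF110000u
#define TTC0_IRQ            36

/* Illustrative names only: the patch uses the literals 0x10000 and 3. */
#define TTC_MMIO_STRIDE     0x10000u
#define TTC_IRQS_PER_BLOCK  3u

/* Hypothetical helper: MMIO base address of TTC block i. */
static inline uint64_t ttc_mmio_base(unsigned int i)
{
    assert(i < XLNX_ZYNQMP_NUM_TTC);
    return (uint64_t)TTC0_ADDR + (uint64_t)i * TTC_MMIO_STRIDE;
}

/* Hypothetical helper: index into the gic[] array for output irq of block i. */
static inline unsigned int ttc_gic_irq(unsigned int i, unsigned int irq)
{
    assert(i < XLNX_ZYNQMP_NUM_TTC);
    assert(irq < TTC_IRQS_PER_BLOCK);
    return TTC0_IRQ + i * TTC_IRQS_PER_BLOCK + irq;
}
```

Under these assumptions the four blocks occupy 0xFF110000 through 0xFF140000 and use array entries 36 through 47.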

-[Qemu-devel] [PULL 03/48] target/arm: Implement NSACR gating of floating point
+[PULL 04/31] hw/arm: versal: Create an APU CPU Cluster
-The NSACR register allows secure code to configure the FPU
+From: "Edgar E. Iglesias" <edgar.iglesias@amd.com>
 to be inaccessible to non-secure code. If the NSACR.CP10
 bit is clear then:
  * NS accesses to the FPU trap as UNDEF (ie to NS EL1 or EL2)
  * CPACR.{CP10,CP11} behave as if RAZ/WI
  * HCPTR.{TCP11,TCP10} behave as if RAO/WI
-Note that we do not implement the NSACR.NSASEDIS bit which
+Create an APU CPU Cluster. This is in preparation to add the RPU.
 gates only access to Advanced SIMD, in the same way that
 we don't implement the equivalent CPACR.ASEDIS and HCPTR.TASE.
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
 Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
 Message-id: 20220406174303.2022038-2-edgar.iglesias@xilinx.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20190510110357.18825-1-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 75 +++++++++++++++++++++++++++++++++++++++++++--
+ include/hw/arm/xlnx-versal.h | 2 ++
-file changed, 73 insertions(+), 2 deletions(-)
+ hw/arm/xlnx-versal.c         | 9 ++++++++-
 files changed, 10 insertions(+), 1 deletion(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/include/hw/arm/xlnx-versal.h
-+++ b/target/arm/helper.c
++++ b/include/hw/arm/xlnx-versal.h
-@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@
-         }
-         value &= mask;
+ #include "hw/sysbus.h"
  #include "hw/arm/boot.h"
 +#include "hw/cpu/cluster.h"
  #include "hw/or-irq.h"
  #include "hw/sd/sdhci.h"
  #include "hw/intc/arm_gicv3.h"
@@ -XXX,XX +XXX,XX @@ struct Versal {
      struct {
          struct {
              MemoryRegion mr;
 +            CPUClusterState cluster;
              ARMCPU cpu[XLNX_VERSAL_NR_ACPUS];
              GICv3State gic;
          } apu;
 diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xlnx-versal.c
 +++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
  {
      int i;
 +    object_initialize_child(OBJECT(s), "apu-cluster", &s->fpd.apu.cluster,
 +                            TYPE_CPU_CLUSTER);
 +    qdev_prop_set_uint32(DEVICE(&s->fpd.apu.cluster), "cluster-id", 0);
 +
      for (i = 0; i < ARRAY_SIZE(s->fpd.apu.cpu); i++) {
          Object *obj;
 -        object_initialize_child(OBJECT(s), "apu-cpu[*]", &s->fpd.apu.cpu[i],
 +        object_initialize_child(OBJECT(&s->fpd.apu.cluster),
 +                                "apu-cpu[*]", &s->fpd.apu.cpu[i],
                                  XLNX_VERSAL_ACPU_TYPE);
          obj = OBJECT(&s->fpd.apu.cpu[i]);
          if (i) {
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
                                   &error_abort);
          qdev_realize(DEVICE(obj), NULL, &error_fatal);
      }
 +
-+    /*
++    qdev_realize(DEVICE(&s->fpd.apu.cluster), NULL, &error_fatal);
 +     * For A-profile AArch32 EL3 (but not M-profile secure mode), if NSACR.CP10
 +     * is 0 then CPACR.{CP11,CP10} ignore writes and read as 0b00.
 +     */
 +    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
 +        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
 +        value &= ~(0xf << 20);
 +        value |= env->cp15.cpacr_el1 & (0xf << 20);
 +    }
 +
      env->cp15.cpacr_el1 = value;
  }
-+static uint64_t cpacr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+ static void versal_create_apu_gic(Versal *s, qemu_irq *pic)
 +{
 +    /*
 +     * For A-profile AArch32 EL3 (but not M-profile secure mode), if NSACR.CP10
 +     * is 0 then CPACR.{CP11,CP10} ignore writes and read as 0b00.
 +     */
 +    uint64_t value = env->cp15.cpacr_el1;
 +
 +    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
 +        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
 +        value &= ~(0xf << 20);
 +    }
 +    return value;
 +}
 +
 +
  static void cpacr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
  {
      /* Call cpacr_write() so that we reset with the correct RAO bits set
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
      { .name = "CPACR", .state = ARM_CP_STATE_BOTH, .opc0 = 3,
        .crn = 1, .crm = 0, .opc1 = 0, .opc2 = 2, .accessfn = cpacr_access,
        .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.cpacr_el1),
 -      .resetfn = cpacr_reset, .writefn = cpacr_write },
 +      .resetfn = cpacr_reset, .writefn = cpacr_write, .readfn = cpacr_read },
      REGINFO_SENTINEL
  };
@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
      return ret;
  }
 +static void cptr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
 +                           uint64_t value)
 +{
 +    /*
 +     * For A-profile AArch32 EL3, if NSACR.CP10
 +     * is 0 then HCPTR.{TCP11,TCP10} ignore writes and read as 1.
 +     */
 +    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
 +        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
 +        value &= ~(0x3 << 10);
 +        value |= env->cp15.cptr_el[2] & (0x3 << 10);
 +    }
 +    env->cp15.cptr_el[2] = value;
 +}
 +
 +static uint64_t cptr_el2_read(CPUARMState *env, const ARMCPRegInfo *ri)
 +{
 +    /*
 +     * For A-profile AArch32 EL3, if NSACR.CP10
 +     * is 0 then HCPTR.{TCP11,TCP10} ignore writes and read as 1.
 +     */
 +    uint64_t value = env->cp15.cptr_el[2];
 +
 +    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
 +        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
 +        value |= 0x3 << 10;
 +    }
 +    return value;
 +}
 +
  static const ARMCPRegInfo el2_cp_reginfo[] = {
      { .name = "HCR_EL2", .state = ARM_CP_STATE_AA64,
        .type = ARM_CP_IO,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
      { .name = "CPTR_EL2", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 2,
        .access = PL2_RW, .accessfn = cptr_access, .resetvalue = 0,
 -      .fieldoffset = offsetof(CPUARMState, cp15.cptr_el[2]) },
 +      .fieldoffset = offsetof(CPUARMState, cp15.cptr_el[2]),
 +      .readfn = cptr_el2_read, .writefn = cptr_el2_write },
      { .name = "MAIR_EL2", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 4, .crn = 10, .crm = 2, .opc2 = 0,
        .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.mair_el[2]),
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
          break;
      }
 +    /*
 +     * The NSACR allows A-profile AArch32 EL3 and M-profile secure mode
 +     * to control non-secure access to the FPU. It doesn't have any
 +     * effect if EL3 is AArch64 or if EL3 doesn't exist at all.
 +     */
 +    if ((arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
 +         cur_el <= 2 && !arm_is_secure_below_el3(env))) {
 +        if (!extract32(env->cp15.nsacr, 10, 1)) {
 +            /* FP insns act as UNDEF */
 +            return cur_el == 2 ? 2 : 1;
 +        }
 +    }
 +
      /* For the CPTR registers we don't need to guard with an ARM_FEATURE
       * check because zero bits in the registers mean "don't trap".
       */
 --
-.20.1
+.25.1
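
As a reading aid for the NSACR hunks on the old-series side of this comparison: the added read functions hide CPACR.{CP11,CP10} (bits [23:20]) and force HCPTR.{TCP11,TCP10} (bits [11:10]) to 1 when EL3 is AArch32, the access is non-secure, and NSACR bit 10 is 0. The sketch below restates only that masking in standalone C. The function names are hypothetical, and the QEMU state checks (arm_feature(), arm_el_is_aa64(), arm_is_secure()) are collapsed into plain booleans as a simplifying assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSACR_CP10_SHIFT   10
#define CPACR_CP10_11_MASK ((uint64_t)0xf << 20)  /* CPACR.{CP11,CP10} */
#define HCPTR_TCP10_11     ((uint64_t)0x3 << 10)  /* HCPTR.{TCP11,TCP10} */

/*
 * Hypothetical predicate: true when non-secure FPU access is gated.
 * el3_is_aarch32 stands in for "EL3 exists and is AArch32".
 */
static inline bool fp_gated(bool el3_is_aarch32, bool secure, uint32_t nsacr)
{
    bool cp10 = (nsacr >> NSACR_CP10_SHIFT) & 1;
    return el3_is_aarch32 && !secure && !cp10;
}

/* CPACR as seen by a read: CP10/CP11 read as zero while gated. */
static inline uint64_t cpacr_read_view(uint64_t cpacr, bool gated)
{
    return gated ? (cpacr & ~CPACR_CP10_11_MASK) : cpacr;
}

/* HCPTR as seen by a read: TCP10/TCP11 read as one while gated. */
static inline uint64_t hcptr_read_view(uint64_t hcptr, bool gated)
{
    return gated ? (hcptr | HCPTR_TCP10_11) : hcptr;
}
```

The write side in the patch keeps the stored bits unchanged in the same condition; that part is not reproduced here.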

-[Qemu-devel] [PULL 39/48] target/arm: Convert VFP comparison insns to decodetree
+[PULL 05/31] hw/arm: versal: Add the Cortex-R5Fs
-Convert the VFP comparison instructions to decodetree.
+From: "Edgar E. Iglesias" <edgar.iglesias@amd.com>
-Note that comparison instructions should not honour the VFP
+Add the Cortex-R5Fs of the Versal RPU (Real-time Processing Unit)
-short-vector length and stride information: they are scalar-only
+subsystem.
 operations.  This applies to all the 2-operand instructions except
 for VMOV, VABS, VNEG and VSQRT.  (In the old decoder this is
 implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.)
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
+Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
+Message-id: 20220406174303.2022038-3-edgar.iglesias@xilinx.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/translate-vfp.inc.c | 75 ++++++++++++++++++++++++++++++++++
+ include/hw/arm/xlnx-versal.h | 10 ++++++++++
- target/arm/translate.c         | 51 +----------------------
+ hw/arm/xlnx-versal-virt.c    |  6 +++---
- target/arm/vfp.decode          |  5 +++
+ hw/arm/xlnx-versal.c         | 36 ++++++++++++++++++++++++++++++++++++
-files changed, 81 insertions(+), 50 deletions(-)
+files changed, 49 insertions(+), 3 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/xlnx-versal.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/xlnx-versal.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
+@@ -XXX,XX +XXX,XX @@
- {
+ OBJECT_DECLARE_SIMPLE_TYPE(Versal, XLNX_VERSAL)
-     return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
  #define XLNX_VERSAL_NR_ACPUS   2
 +#define XLNX_VERSAL_NR_RCPUS   2
  #define XLNX_VERSAL_NR_UARTS   2
  #define XLNX_VERSAL_NR_GEMS    2
  #define XLNX_VERSAL_NR_ADMAS   8
@@ -XXX,XX +XXX,XX @@ struct Versal {
              VersalUsb2 usb;
          } iou;
 +        /* Real-time Processing Unit.  */
 +        struct {
 +            MemoryRegion mr;
 +            MemoryRegion mr_ps_alias;
 +
 +            CPUClusterState cluster;
 +            ARMCPU cpu[XLNX_VERSAL_NR_RCPUS];
 +        } rpu;
 +
          struct {
              qemu_or_irq irq_orgate;
              XlnxXramCtrl ctrl[XLNX_VERSAL_NR_XRAM];
 diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xlnx-versal-virt.c
 +++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@ static void versal_virt_machine_class_init(ObjectClass *oc, void *data)
      mc->desc = "Xilinx Versal Virtual development board";
      mc->init = versal_virt_init;
 -    mc->min_cpus = XLNX_VERSAL_NR_ACPUS;
 -    mc->max_cpus = XLNX_VERSAL_NR_ACPUS;
 -    mc->default_cpus = XLNX_VERSAL_NR_ACPUS;
 +    mc->min_cpus = XLNX_VERSAL_NR_ACPUS + XLNX_VERSAL_NR_RCPUS;
 +    mc->max_cpus = XLNX_VERSAL_NR_ACPUS + XLNX_VERSAL_NR_RCPUS;
 +    mc->default_cpus = XLNX_VERSAL_NR_ACPUS + XLNX_VERSAL_NR_RCPUS;
      mc->no_cdrom = true;
      mc->default_ram_id = "ddr";
  }
+diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/xlnx-versal.c
++++ b/hw/arm/xlnx-versal.c
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/sysbus.h"
+ #define XLNX_VERSAL_ACPU_TYPE ARM_CPU_TYPE_NAME("cortex-a72")
++#define XLNX_VERSAL_RCPU_TYPE ARM_CPU_TYPE_NAME("cortex-r5f")
+ #define GEM_REVISION        0x40070106
+ #define VERSAL_NUM_PMC_APB_IRQS 3
+@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_gic(Versal *s, qemu_irq *pic)
+     }
+ }
++static void versal_create_rpu_cpus(Versal *s)
++{
++    int i;
 +
-+static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
++    object_initialize_child(OBJECT(s), "rpu-cluster", &s->lpd.rpu.cluster,
-+{
++                            TYPE_CPU_CLUSTER);
-+    TCGv_i32 vd, vm;
++    qdev_prop_set_uint32(DEVICE(&s->lpd.rpu.cluster), "cluster-id", 1);
 +
-+    /* Vm/M bits must be zero for the Z variant */
++    for (i = 0; i < ARRAY_SIZE(s->lpd.rpu.cpu); i++) {
-+    if (a->z && a->vm != 0) {
++        Object *obj;
-+        return false;
++
 +        object_initialize_child(OBJECT(&s->lpd.rpu.cluster),
 +                                "rpu-cpu[*]", &s->lpd.rpu.cpu[i],
 +                                XLNX_VERSAL_RCPU_TYPE);
 +        obj = OBJECT(&s->lpd.rpu.cpu[i]);
 +        object_property_set_bool(obj, "start-powered-off", true,
 +                                 &error_abort);
 +
 +        object_property_set_int(obj, "mp-affinity", 0x100 | i, &error_abort);
 +        object_property_set_int(obj, "core-count", ARRAY_SIZE(s->lpd.rpu.cpu),
 +                                &error_abort);
 +        object_property_set_link(obj, "memory", OBJECT(&s->lpd.rpu.mr),
 +                                 &error_abort);
 +        qdev_realize(DEVICE(obj), NULL, &error_fatal);
 +    }
 +
-+    if (!vfp_access_check(s)) {
++    qdev_realize(DEVICE(&s->lpd.rpu.cluster), NULL, &error_fatal);
 +        return true;
 +    }
 +
 +    vd = tcg_temp_new_i32();
 +    vm = tcg_temp_new_i32();
 +
 +    neon_load_reg32(vd, a->vd);
 +    if (a->z) {
 +        tcg_gen_movi_i32(vm, 0);
 +    } else {
 +        neon_load_reg32(vm, a->vm);
 +    }
 +
 +    if (a->e) {
 +        gen_helper_vfp_cmpes(vd, vm, cpu_env);
 +    } else {
 +        gen_helper_vfp_cmps(vd, vm, cpu_env);
 +    }
 +
 +    tcg_temp_free_i32(vd);
 +    tcg_temp_free_i32(vm);
 +
 +    return true;
 +}
 +
-+static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
+ static void versal_create_uarts(Versal *s, qemu_irq *pic)
-+{
+ {
-+    TCGv_i64 vd, vm;
+     int i;
-+
+@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
-+    /* Vm/M bits must be zero for the Z variant */
-+    if (a->z && a->vm != 0) {
+     versal_create_apu_cpus(s);
-+        return false;
+     versal_create_apu_gic(s, pic);
-+    }
++    versal_create_rpu_cpus(s);
-+
+     versal_create_uarts(s, pic);
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
+     versal_create_usbs(s, pic);
-+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+     versal_create_gems(s, pic);
-+        return false;
+@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
-+    }
-+
+     memory_region_add_subregion_overlap(&s->mr_ps, MM_OCM, &s->lpd.mr_ocm, 0);
-+    if (!vfp_access_check(s)) {
+     memory_region_add_subregion_overlap(&s->fpd.apu.mr, 0, &s->mr_ps, 0);
-+        return true;
++    memory_region_add_subregion_overlap(&s->lpd.rpu.mr, 0,
-+    }
++                                        &s->lpd.rpu.mr_ps_alias, 0);
 +
 +    vd = tcg_temp_new_i64();
 +    vm = tcg_temp_new_i64();
 +
 +    neon_load_reg64(vd, a->vd);
 +    if (a->z) {
 +        tcg_gen_movi_i64(vm, 0);
 +    } else {
 +        neon_load_reg64(vm, a->vm);
 +    }
 +
 +    if (a->e) {
 +        gen_helper_vfp_cmped(vd, vm, cpu_env);
 +    } else {
 +        gen_helper_vfp_cmpd(vd, vm, cpu_env);
 +    }
 +
 +    tcg_temp_free_i64(vd);
 +    tcg_temp_free_i64(vm);
 +
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_neg(int dp)
          gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
  }
--static inline void gen_vfp_cmp(int dp)
+ static void versal_init(Object *obj)
--{
+@@ -XXX,XX +XXX,XX @@ static void versal_init(Object *obj)
--    if (dp)
+     Versal *s = XLNX_VERSAL(obj);
--        gen_helper_vfp_cmpd(cpu_F0d, cpu_F1d, cpu_env);
--    else
+     memory_region_init(&s->fpd.apu.mr, obj, "mr-apu", UINT64_MAX);
--        gen_helper_vfp_cmps(cpu_F0s, cpu_F1s, cpu_env);
++    memory_region_init(&s->lpd.rpu.mr, obj, "mr-rpu", UINT64_MAX);
--}
+     memory_region_init(&s->mr_ps, obj, "mr-ps-switch", UINT64_MAX);
--
++    memory_region_init_alias(&s->lpd.rpu.mr_ps_alias, OBJECT(s),
--static inline void gen_vfp_cmpe(int dp)
++                             "mr-rpu-ps-alias", &s->mr_ps, 0, UINT64_MAX);
--{
+ }
--    if (dp)
--        gen_helper_vfp_cmped(cpu_F0d, cpu_F1d, cpu_env);
+ static Property versal_properties[] = {
 -    else
 -        gen_helper_vfp_cmpes(cpu_F0s, cpu_F1s, cpu_env);
 -}
 -
 -static inline void gen_vfp_F1_ld0(int dp)
 -{
 -    if (dp)
 -        tcg_gen_movi_i64(cpu_F1d, 0);
 -    else
 -        tcg_gen_movi_i32(cpu_F1s, 0);
 -}
 -
  #define VFP_GEN_ITOF(name) \
  static inline void gen_vfp_##name(int dp, int neon) \
  { \
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              case 15:
                  switch (rn) {
                  case 0 ... 3:
 +                case 8 ... 11:
                      /* Already handled by decodetree */
                      return 1;
                  default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                      rd_is_dp = false;
                      break;
 -                case 0x08: case 0x0a: /* vcmp, vcmpz */
 -                case 0x09: case 0x0b: /* vcmpe, vcmpez */
 -                    no_output = true;
 -                    break;
 -
                  case 0x0c: /* vrintr */
                  case 0x0d: /* vrintz */
                  case 0x0e: /* vrintx */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              /* Load the initial operands.  */
              if (op == 15) {
                  switch (rn) {
 -                case 0x08: case 0x09: /* Compare */
 -                    gen_mov_F0_vreg(dp, rd);
 -                    gen_mov_F1_vreg(dp, rm);
 -                    break;
 -                case 0x0a: case 0x0b: /* Compare with zero */
 -                    gen_mov_F0_vreg(dp, rd);
 -                    gen_vfp_F1_ld0(dp);
 -                    break;
                  case 0x14: /* vcvt fp <-> fixed */
                  case 0x15:
                  case 0x16:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                          gen_vfp_msr(tmp);
                          break;
                      }
 -                    case 8: /* cmp */
 -                        gen_vfp_cmp(dp);
 -                        break;
 -                    case 9: /* cmpe */
 -                        gen_vfp_cmpe(dp);
 -                        break;
 -                    case 10: /* cmpz */
 -                        gen_vfp_cmp(dp);
 -                        break;
 -                    case 11: /* cmpez */
 -                        gen_vfp_F1_ld0(dp);
 -                        gen_vfp_cmpe(dp);
 -                        break;
                      case 12: /* vrintr */
                      {
                          TCGv_ptr fpst = get_fpstatus_ptr(0);
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 .... \
               vd=%vd_sp vm=%vm_sp
  VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 .... \
               vd=%vd_dp vm=%vm_dp
 +
 +VCMP_sp      ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
 +             vd=%vd_sp vm=%vm_sp
 +VCMP_dp      ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
 +             vd=%vd_dp vm=%vm_dp
 --
-.20.1
+.25.1
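
As a reading aid for the VCMP_sp pattern on the old-series side of this comparison ("---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 ...."): the fixed bits give a mask/value pair, with z at bit 16 and e at bit 7. The sketch below is a hand-written matcher in standalone C, not the code that decodetree generates. The struct and function names are hypothetical, and the condition field is ignored, whereas the series uses a separate decoder for unconditional encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed bits of the VCMP_sp pattern: which bits matter, and their values. */
#define VCMP_SP_MASK  0x0fbe0f50u
#define VCMP_SP_VALUE 0x0eb40a40u

/* Hypothetical container for the two one-bit fields named in the pattern. */
typedef struct {
    bool z;  /* compare with zero */
    bool e;  /* "E" variant, as selected by a->e in trans_VCMP_sp */
} VcmpFields;

/* Returns true and fills *out if insn matches the single-precision pattern. */
static inline bool vcmp_sp_match(uint32_t insn, VcmpFields *out)
{
    if ((insn & VCMP_SP_MASK) != VCMP_SP_VALUE) {
        return false;
    }
    out->z = (insn >> 16) & 1;
    out->e = (insn >> 7) & 1;
    return true;
}
```

A double-precision encoding (bits [11:8] = 1011) fails this match, which corresponds to the separate VCMP_dp line in vfp.decode.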

-[Qemu-devel] [PULL 08/48] target/arm: Add stubs for AArch32 VFP decodetree
+[PULL 06/31] hw/misc: Add a model of the Xilinx Versal CRL
-Add the infrastructure for building and invoking a decodetree decoder
+From: "Edgar E. Iglesias" <edgar.iglesias@amd.com>
 for the AArch32 VFP encodings.  At the moment the new decoder covers
 nothing, so we always fall back to the existing hand-written decode.
-We need to have one decoder for the unconditional insns and one for
+Add a model of the Xilinx Versal CRL.
 the conditional insns, as otherwise the patterns for conditional
 insns would incorrectly match against the unconditional ones too.
-Since translate.c is over 14,000 lines long and we're going to be
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
-touching pretty much every line of the VFP code as part of the
+Reviewed-by: Frederic Konrad <fkonrad@amd.com>
-decodetree conversion, we create a new translate-vfp.inc.c to hold
+Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
-the code which deals with VFP in the new scheme.  It should be
+Message-id: 20220406174303.2022038-4-edgar.iglesias@xilinx.com
-possible to convert this into a standalone translation unit
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-eventually, but the conversion process will be much simpler if we
+---
-simply #include it midway through translate.c to start with.
+ include/hw/misc/xlnx-versal-crl.h | 235 +++++++++++++++++
  hw/misc/xlnx-versal-crl.c         | 421 ++++++++++++++++++++++++++++++
  hw/misc/meson.build               |   1 +
 files changed, 657 insertions(+)
  create mode 100644 include/hw/misc/xlnx-versal-crl.h
  create mode 100644 hw/misc/xlnx-versal-crl.c
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/include/hw/misc/xlnx-versal-crl.h b/include/hw/misc/xlnx-versal-crl.h
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/Makefile.objs       | 13 +++++++++++++
  target/arm/translate-vfp.inc.c | 31 +++++++++++++++++++++++++++++++
  target/arm/translate.c         | 19 +++++++++++++++++++
  target/arm/vfp-uncond.decode   | 28 ++++++++++++++++++++++++++++
  target/arm/vfp.decode          | 28 ++++++++++++++++++++++++++++
 files changed, 119 insertions(+)
  create mode 100644 target/arm/translate-vfp.inc.c
  create mode 100644 target/arm/vfp-uncond.decode
  create mode 100644 target/arm/vfp.decode
 diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/Makefile.objs
 +++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
        $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
        "GEN", $(TARGET_DIR)$@)
 +target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
 +    $(call quiet-command,\
 +      $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
 +      "GEN", $(TARGET_DIR)$@)
 +
 +target/arm/decode-vfp-uncond.inc.c: $(SRC_PATH)/target/arm/vfp-uncond.decode $(DECODETREE)
 +    $(call quiet-command,\
 +      $(PYTHON) $(DECODETREE) --static-decode disas_vfp_uncond -o $@ $<,\
 +      "GEN", $(TARGET_DIR)$@)
 +
  target/arm/translate-sve.o: target/arm/decode-sve.inc.c
 +target/arm/translate.o: target/arm/decode-vfp.inc.c
 +target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
 +
  obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
 diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/misc/xlnx-versal-crl.h
 @@ -XXX,XX +XXX,XX @@
 +/*
-+ *  ARM translation: AArch32 VFP instructions
++ * QEMU model of the Clock-Reset-LPD (CRL).
 + *
-+ *  Copyright (c) 2003 Fabrice Bellard
++ * Copyright (c) 2022 Xilinx Inc.
-+ *  Copyright (c) 2005-2007 CodeSourcery
++ * SPDX-License-Identifier: GPL-2.0-or-later
 + *  Copyright (c) 2007 OpenedHand, Ltd.
 + *  Copyright (c) 2019 Linaro, Ltd.
 + *
-+ * This library is free software; you can redistribute it and/or
++ * Written by Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
-+
++#ifndef HW_MISC_XLNX_VERSAL_CRL_H
-+/*
++#define HW_MISC_XLNX_VERSAL_CRL_H
-+ * This file is intended to be included from translate.c; it uses
++
-+ * some macros and definitions provided by that file.
++#include "hw/sysbus.h"
-+ * It might be possible to convert it to a standalone .c file eventually.
++#include "hw/register.h"
-+ */
++#include "target/arm/cpu.h"
 +
-+/* Include the generated VFP decoder */
++#define TYPE_XLNX_VERSAL_CRL "xlnx,versal-crl"
-+#include "decode-vfp.inc.c"
++OBJECT_DECLARE_SIMPLE_TYPE(XlnxVersalCRL, XLNX_VERSAL_CRL)
-+#include "decode-vfp-uncond.inc.c"
++
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++REG32(ERR_CTRL, 0x0)
-index XXXXXXX..XXXXXXX 100644
++    FIELD(ERR_CTRL, SLVERR_ENABLE, 0, 1)
---- a/target/arm/translate.c
++REG32(IR_STATUS, 0x4)
-+++ b/target/arm/translate.c
++    FIELD(IR_STATUS, ADDR_DECODE_ERR, 0, 1)
-@@ -XXX,XX +XXX,XX @@ static inline void gen_mov_vreg_F0(int dp, int reg)
++REG32(IR_MASK, 0x8)
++    FIELD(IR_MASK, ADDR_DECODE_ERR, 0, 1)
- #define ARM_CP_RW_BIT   (1 << 20)
++REG32(IR_ENABLE, 0xc)
++    FIELD(IR_ENABLE, ADDR_DECODE_ERR, 0, 1)
-+/* Include the VFP decoder */
++REG32(IR_DISABLE, 0x10)
-+#include "translate-vfp.inc.c"
++    FIELD(IR_DISABLE, ADDR_DECODE_ERR, 0, 1)
-+
++REG32(WPROT, 0x1c)
- static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
++    FIELD(WPROT, ACTIVE, 0, 1)
- {
++REG32(PLL_CLK_OTHER_DMN, 0x20)
-     tcg_gen_ld_i64(var, cpu_env, offsetof(CPUARMState, iwmmxt.regs[reg]));
++    FIELD(PLL_CLK_OTHER_DMN, APLL_BYPASS, 0, 1)
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
++REG32(RPLL_CTRL, 0x40)
-         return 1;
++    FIELD(RPLL_CTRL, POST_SRC, 24, 3)
-     }
++    FIELD(RPLL_CTRL, PRE_SRC, 20, 3)
++    FIELD(RPLL_CTRL, CLKOUTDIV, 16, 2)
-+    /*
++    FIELD(RPLL_CTRL, FBDIV, 8, 8)
-+     * If the decodetree decoder handles this insn it will always
++    FIELD(RPLL_CTRL, BYPASS, 3, 1)
-+     * emit code to either execute the insn or generate an appropriate
++    FIELD(RPLL_CTRL, RESET, 0, 1)
-+     * exception; so we don't need to ever return non-zero to tell
++REG32(RPLL_CFG, 0x44)
-+     * the calling code to emit an UNDEF exception.
++    FIELD(RPLL_CFG, LOCK_DLY, 25, 7)
-+     */
++    FIELD(RPLL_CFG, LOCK_CNT, 13, 10)
-+    if (extract32(insn, 28, 4) == 0xf) {
++    FIELD(RPLL_CFG, LFHF, 10, 2)
-+        if (disas_vfp_uncond(s, insn)) {
++    FIELD(RPLL_CFG, CP, 5, 4)
-+            return 0;
++    FIELD(RPLL_CFG, RES, 0, 4)
-+        }
++REG32(RPLL_FRAC_CFG, 0x48)
-+    } else {
++    FIELD(RPLL_FRAC_CFG, ENABLED, 31, 1)
-+        if (disas_vfp(s, insn)) {
++    FIELD(RPLL_FRAC_CFG, SEED, 22, 3)
-+            return 0;
++    FIELD(RPLL_FRAC_CFG, ALGRTHM, 19, 1)
-+        }
++    FIELD(RPLL_FRAC_CFG, ORDER, 18, 1)
-+    }
++    FIELD(RPLL_FRAC_CFG, DATA, 0, 16)
-+
++REG32(PLL_STATUS, 0x50)
-     /* FIXME: this access check should not take precedence over UNDEF
++    FIELD(PLL_STATUS, RPLL_STABLE, 2, 1)
-      * for invalid encodings; we will generate incorrect syndrome information
++    FIELD(PLL_STATUS, RPLL_LOCK, 0, 1)
-      * for attempts to execute invalid vfp/neon encodings with FP disabled.
++REG32(RPLL_TO_XPD_CTRL, 0x100)
-diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
++    FIELD(RPLL_TO_XPD_CTRL, CLKACT, 25, 1)
 +    FIELD(RPLL_TO_XPD_CTRL, DIVISOR0, 8, 10)
 +REG32(LPD_TOP_SWITCH_CTRL, 0x104)
 +    FIELD(LPD_TOP_SWITCH_CTRL, CLKACT_ADMA, 26, 1)
 +    FIELD(LPD_TOP_SWITCH_CTRL, CLKACT, 25, 1)
 +    FIELD(LPD_TOP_SWITCH_CTRL, DIVISOR0, 8, 10)
 +    FIELD(LPD_TOP_SWITCH_CTRL, SRCSEL, 0, 3)
 +REG32(LPD_LSBUS_CTRL, 0x108)
 +    FIELD(LPD_LSBUS_CTRL, CLKACT, 25, 1)
 +    FIELD(LPD_LSBUS_CTRL, DIVISOR0, 8, 10)
 +    FIELD(LPD_LSBUS_CTRL, SRCSEL, 0, 3)
 +REG32(CPU_R5_CTRL, 0x10c)
 +    FIELD(CPU_R5_CTRL, CLKACT_OCM2, 28, 1)
 +    FIELD(CPU_R5_CTRL, CLKACT_OCM, 27, 1)
 +    FIELD(CPU_R5_CTRL, CLKACT_CORE, 26, 1)
 +    FIELD(CPU_R5_CTRL, CLKACT, 25, 1)
 +    FIELD(CPU_R5_CTRL, DIVISOR0, 8, 10)
 +    FIELD(CPU_R5_CTRL, SRCSEL, 0, 3)
 +REG32(IOU_SWITCH_CTRL, 0x114)
 +    FIELD(IOU_SWITCH_CTRL, CLKACT, 25, 1)
 +    FIELD(IOU_SWITCH_CTRL, DIVISOR0, 8, 10)
 +    FIELD(IOU_SWITCH_CTRL, SRCSEL, 0, 3)
 +REG32(GEM0_REF_CTRL, 0x118)
 +    FIELD(GEM0_REF_CTRL, CLKACT_RX, 27, 1)
 +    FIELD(GEM0_REF_CTRL, CLKACT_TX, 26, 1)
 +    FIELD(GEM0_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(GEM0_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(GEM0_REF_CTRL, SRCSEL, 0, 3)
 +REG32(GEM1_REF_CTRL, 0x11c)
 +    FIELD(GEM1_REF_CTRL, CLKACT_RX, 27, 1)
 +    FIELD(GEM1_REF_CTRL, CLKACT_TX, 26, 1)
 +    FIELD(GEM1_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(GEM1_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(GEM1_REF_CTRL, SRCSEL, 0, 3)
 +REG32(GEM_TSU_REF_CTRL, 0x120)
 +    FIELD(GEM_TSU_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(GEM_TSU_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(GEM_TSU_REF_CTRL, SRCSEL, 0, 3)
 +REG32(USB0_BUS_REF_CTRL, 0x124)
 +    FIELD(USB0_BUS_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(USB0_BUS_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(USB0_BUS_REF_CTRL, SRCSEL, 0, 3)
 +REG32(UART0_REF_CTRL, 0x128)
 +    FIELD(UART0_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(UART0_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(UART0_REF_CTRL, SRCSEL, 0, 3)
 +REG32(UART1_REF_CTRL, 0x12c)
 +    FIELD(UART1_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(UART1_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(UART1_REF_CTRL, SRCSEL, 0, 3)
 +REG32(SPI0_REF_CTRL, 0x130)
 +    FIELD(SPI0_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(SPI0_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(SPI0_REF_CTRL, SRCSEL, 0, 3)
 +REG32(SPI1_REF_CTRL, 0x134)
 +    FIELD(SPI1_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(SPI1_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(SPI1_REF_CTRL, SRCSEL, 0, 3)
 +REG32(CAN0_REF_CTRL, 0x138)
 +    FIELD(CAN0_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(CAN0_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(CAN0_REF_CTRL, SRCSEL, 0, 3)
 +REG32(CAN1_REF_CTRL, 0x13c)
 +    FIELD(CAN1_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(CAN1_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(CAN1_REF_CTRL, SRCSEL, 0, 3)
 +REG32(I2C0_REF_CTRL, 0x140)
 +    FIELD(I2C0_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(I2C0_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(I2C0_REF_CTRL, SRCSEL, 0, 3)
 +REG32(I2C1_REF_CTRL, 0x144)
 +    FIELD(I2C1_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(I2C1_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(I2C1_REF_CTRL, SRCSEL, 0, 3)
 +REG32(DBG_LPD_CTRL, 0x148)
 +    FIELD(DBG_LPD_CTRL, CLKACT, 25, 1)
 +    FIELD(DBG_LPD_CTRL, DIVISOR0, 8, 10)
 +    FIELD(DBG_LPD_CTRL, SRCSEL, 0, 3)
 +REG32(TIMESTAMP_REF_CTRL, 0x14c)
 +    FIELD(TIMESTAMP_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(TIMESTAMP_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(TIMESTAMP_REF_CTRL, SRCSEL, 0, 3)
 +REG32(CRL_SAFETY_CHK, 0x150)
 +REG32(PSM_REF_CTRL, 0x154)
 +    FIELD(PSM_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(PSM_REF_CTRL, SRCSEL, 0, 3)
 +REG32(DBG_TSTMP_CTRL, 0x158)
 +    FIELD(DBG_TSTMP_CTRL, CLKACT, 25, 1)
 +    FIELD(DBG_TSTMP_CTRL, DIVISOR0, 8, 10)
 +    FIELD(DBG_TSTMP_CTRL, SRCSEL, 0, 3)
 +REG32(CPM_TOPSW_REF_CTRL, 0x15c)
 +    FIELD(CPM_TOPSW_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(CPM_TOPSW_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(CPM_TOPSW_REF_CTRL, SRCSEL, 0, 3)
 +REG32(USB3_DUAL_REF_CTRL, 0x160)
 +    FIELD(USB3_DUAL_REF_CTRL, CLKACT, 25, 1)
 +    FIELD(USB3_DUAL_REF_CTRL, DIVISOR0, 8, 10)
 +    FIELD(USB3_DUAL_REF_CTRL, SRCSEL, 0, 3)
 +REG32(RST_CPU_R5, 0x300)
 +    FIELD(RST_CPU_R5, RESET_PGE, 4, 1)
 +    FIELD(RST_CPU_R5, RESET_AMBA, 2, 1)
 +    FIELD(RST_CPU_R5, RESET_CPU1, 1, 1)
 +    FIELD(RST_CPU_R5, RESET_CPU0, 0, 1)
 +REG32(RST_ADMA, 0x304)
 +    FIELD(RST_ADMA, RESET, 0, 1)
 +REG32(RST_GEM0, 0x308)
 +    FIELD(RST_GEM0, RESET, 0, 1)
 +REG32(RST_GEM1, 0x30c)
 +    FIELD(RST_GEM1, RESET, 0, 1)
 +REG32(RST_SPARE, 0x310)
 +    FIELD(RST_SPARE, RESET, 0, 1)
 +REG32(RST_USB0, 0x314)
 +    FIELD(RST_USB0, RESET, 0, 1)
 +REG32(RST_UART0, 0x318)
 +    FIELD(RST_UART0, RESET, 0, 1)
 +REG32(RST_UART1, 0x31c)
 +    FIELD(RST_UART1, RESET, 0, 1)
 +REG32(RST_SPI0, 0x320)
 +    FIELD(RST_SPI0, RESET, 0, 1)
 +REG32(RST_SPI1, 0x324)
 +    FIELD(RST_SPI1, RESET, 0, 1)
 +REG32(RST_CAN0, 0x328)
 +    FIELD(RST_CAN0, RESET, 0, 1)
 +REG32(RST_CAN1, 0x32c)
 +    FIELD(RST_CAN1, RESET, 0, 1)
 +REG32(RST_I2C0, 0x330)
 +    FIELD(RST_I2C0, RESET, 0, 1)
 +REG32(RST_I2C1, 0x334)
 +    FIELD(RST_I2C1, RESET, 0, 1)
 +REG32(RST_DBG_LPD, 0x338)
 +    FIELD(RST_DBG_LPD, RPU_DBG1_RESET, 5, 1)
 +    FIELD(RST_DBG_LPD, RPU_DBG0_RESET, 4, 1)
 +    FIELD(RST_DBG_LPD, RESET_HSDP, 1, 1)
 +    FIELD(RST_DBG_LPD, RESET, 0, 1)
 +REG32(RST_GPIO, 0x33c)
 +    FIELD(RST_GPIO, RESET, 0, 1)
 +REG32(RST_TTC, 0x344)
 +    FIELD(RST_TTC, TTC3_RESET, 3, 1)
 +    FIELD(RST_TTC, TTC2_RESET, 2, 1)
 +    FIELD(RST_TTC, TTC1_RESET, 1, 1)
 +    FIELD(RST_TTC, TTC0_RESET, 0, 1)
 +REG32(RST_TIMESTAMP, 0x348)
 +    FIELD(RST_TIMESTAMP, RESET, 0, 1)
 +REG32(RST_SWDT, 0x34c)
 +    FIELD(RST_SWDT, RESET, 0, 1)
 +REG32(RST_OCM, 0x350)
 +    FIELD(RST_OCM, RESET, 0, 1)
 +REG32(RST_IPI, 0x354)
 +    FIELD(RST_IPI, RESET, 0, 1)
 +REG32(RST_SYSMON, 0x358)
 +    FIELD(RST_SYSMON, SEQ_RST, 1, 1)
 +    FIELD(RST_SYSMON, CFG_RST, 0, 1)
 +REG32(RST_FPD, 0x360)
 +    FIELD(RST_FPD, SRST, 1, 1)
 +    FIELD(RST_FPD, POR, 0, 1)
 +REG32(PSM_RST_MODE, 0x370)
 +    FIELD(PSM_RST_MODE, WAKEUP, 2, 1)
 +    FIELD(PSM_RST_MODE, RST_MODE, 0, 2)
 +
 +#define CRL_R_MAX (R_PSM_RST_MODE + 1)
 +
 +#define RPU_MAX_CPU 2
 +
 +struct XlnxVersalCRL {
 +    SysBusDevice parent_obj;
 +    qemu_irq irq;
 +
 +    struct {
 +        ARMCPU *cpu_r5[RPU_MAX_CPU];
 +        DeviceState *adma[8];
 +        DeviceState *uart[2];
 +        DeviceState *gem[2];
 +        DeviceState *usb;
 +    } cfg;
 +
 +    RegisterInfoArray *reg_array;
 +    uint32_t regs[CRL_R_MAX];
 +    RegisterInfo regs_info[CRL_R_MAX];
 +};
 +#endif
 diff --git a/hw/misc/xlnx-versal-crl.c b/hw/misc/xlnx-versal-crl.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/target/arm/vfp-uncond.decode
++++ b/hw/misc/xlnx-versal-crl.c
 @@ -XXX,XX +XXX,XX @@
-+# AArch32 VFP instruction descriptions (unconditional insns)
++/*
-+#
++ * QEMU model of the Clock-Reset-LPD (CRL).
-+#  Copyright (c) 2019 Linaro, Ltd
++ *
-+#
++ * Copyright (c) 2022 Advanced Micro Devices, Inc.
-+# This library is free software; you can redistribute it and/or
++ * SPDX-License-Identifier: GPL-2.0-or-later
-+# modify it under the terms of the GNU Lesser General Public
++ *
-+# License as published by the Free Software Foundation; either
++ * Written by Edgar E. Iglesias <edgar.iglesias@amd.com>
-+# version 2 of the License, or (at your option) any later version.
++ */
-+#
++
-+# This library is distributed in the hope that it will be useful,
++#include "qemu/osdep.h"
-+# but WITHOUT ANY WARRANTY; without even the implied warranty of
++#include "qapi/error.h"
-+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
++#include "qemu/log.h"
-+# Lesser General Public License for more details.
++#include "qemu/bitops.h"
-+#
++#include "migration/vmstate.h"
-+# You should have received a copy of the GNU Lesser General Public
++#include "hw/qdev-properties.h"
-+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
++#include "hw/sysbus.h"
-+
++#include "hw/irq.h"
-+#
++#include "hw/register.h"
-+# This file is processed by scripts/decodetree.py
++#include "hw/resettable.h"
-+#
++
-+# Encodings for the unconditional VFP instructions are here:
++#include "target/arm/arm-powerctl.h"
-+# generally anything matching A32
++#include "hw/misc/xlnx-versal-crl.h"
-+#  1111 1110 .... .... .... 101. ...0 ....
++
-+# and T32
++#ifndef XLNX_VERSAL_CRL_ERR_DEBUG
-+#  1111 110. .... .... .... 101. .... ....
++#define XLNX_VERSAL_CRL_ERR_DEBUG 0
-+#  1111 1110 .... .... .... 101. .... ....
++#endif
-+# (but those patterns might also cover some Neon instructions,
++
-+# which do not live in this file.)
++static void crl_update_irq(XlnxVersalCRL *s)
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
++{
-new file mode 100644
++    bool pending = s->regs[R_IR_STATUS] & ~s->regs[R_IR_MASK];
-index XXXXXXX..XXXXXXX
++    qemu_set_irq(s->irq, pending);
---- /dev/null
++}
-+++ b/target/arm/vfp.decode
++
-@@ -XXX,XX +XXX,XX @@
++static void crl_status_postw(RegisterInfo *reg, uint64_t val64)
-+# AArch32 VFP instruction descriptions (conditional insns)
++{
-+#
++    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
-+#  Copyright (c) 2019 Linaro, Ltd
++    crl_update_irq(s);
-+#
++}
-+# This library is free software; you can redistribute it and/or
++
-+# modify it under the terms of the GNU Lesser General Public
++static uint64_t crl_enable_prew(RegisterInfo *reg, uint64_t val64)
-+# License as published by the Free Software Foundation; either
++{
-+# version 2 of the License, or (at your option) any later version.
++    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
-+#
++    uint32_t val = val64;
-+# This library is distributed in the hope that it will be useful,
++
-+# but WITHOUT ANY WARRANTY; without even the implied warranty of
++    s->regs[R_IR_MASK] &= ~val;
-+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
++    crl_update_irq(s);
-+# Lesser General Public License for more details.
++    return 0;
-+#
++}
-+# You should have received a copy of the GNU Lesser General Public
++
-+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
++static uint64_t crl_disable_prew(RegisterInfo *reg, uint64_t val64)
-+
++{
-+#
++    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
-+# This file is processed by scripts/decodetree.py
++    uint32_t val = val64;
-+#
++
-+# Encodings for the conditional VFP instructions are here:
++    s->regs[R_IR_MASK] |= val;
-+# generally anything matching A32
++    crl_update_irq(s);
-+#  cccc 11.. .... .... .... 101. .... ....
++    return 0;
-+# and T32
++}
-+#  1110 110. .... .... .... 101. .... ....
++
-+#  1110 1110 .... .... .... 101. .... ....
++static void crl_reset_dev(XlnxVersalCRL *s, DeviceState *dev,
-+# (but those patterns might also cover some Neon instructions,
++                          bool rst_old, bool rst_new)
-+# which do not live in this file.)
++{
 +    device_cold_reset(dev);
 +}
 +
 +static void crl_reset_cpu(XlnxVersalCRL *s, ARMCPU *armcpu,
 +                          bool rst_old, bool rst_new)
 +{
 +    if (rst_new) {
 +        arm_set_cpu_off(armcpu->mp_affinity);
 +    } else {
 +        arm_set_cpu_on_and_reset(armcpu->mp_affinity);
 +    }
 +}
 +
 +#define REGFIELD_RESET(type, s, reg, f, new_val, dev) {     \
 +    bool old_f = ARRAY_FIELD_EX32((s)->regs, reg, f);       \
 +    bool new_f = FIELD_EX32(new_val, reg, f);               \
 +                                                            \
 +    /* Detect edges.  */                                    \
 +    if (dev && old_f != new_f) {                            \
 +        crl_reset_ ## type(s, dev, old_f, new_f);           \
 +    }                                                       \
 +}
 +
 +static uint64_t crl_rst_r5_prew(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
 +
 +    REGFIELD_RESET(cpu, s, RST_CPU_R5, RESET_CPU0, val64, s->cfg.cpu_r5[0]);
 +    REGFIELD_RESET(cpu, s, RST_CPU_R5, RESET_CPU1, val64, s->cfg.cpu_r5[1]);
 +    return val64;
 +}
 +
 +static uint64_t crl_rst_adma_prew(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
 +    int i;
 +
 +    /* A single register fans out to all ADMA reset inputs.  */
 +    for (i = 0; i < ARRAY_SIZE(s->cfg.adma); i++) {
 +        REGFIELD_RESET(dev, s, RST_ADMA, RESET, val64, s->cfg.adma[i]);
 +    }
 +    return val64;
 +}
 +
 +static uint64_t crl_rst_uart0_prew(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
 +
 +    REGFIELD_RESET(dev, s, RST_UART0, RESET, val64, s->cfg.uart[0]);
 +    return val64;
 +}
 +
 +static uint64_t crl_rst_uart1_prew(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
 +
 +    REGFIELD_RESET(dev, s, RST_UART1, RESET, val64, s->cfg.uart[1]);
 +    return val64;
 +}
 +
 +static uint64_t crl_rst_gem0_prew(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
 +
 +    REGFIELD_RESET(dev, s, RST_GEM0, RESET, val64, s->cfg.gem[0]);
 +    return val64;
 +}
 +
 +static uint64_t crl_rst_gem1_prew(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
 +
 +    REGFIELD_RESET(dev, s, RST_GEM1, RESET, val64, s->cfg.gem[1]);
 +    return val64;
 +}
 +
 +static uint64_t crl_rst_usb_prew(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(reg->opaque);
 +
 +    REGFIELD_RESET(dev, s, RST_USB0, RESET, val64, s->cfg.usb);
 +    return val64;
 +}
 +
 +static const RegisterAccessInfo crl_regs_info[] = {
 +    {   .name = "ERR_CTRL",  .addr = A_ERR_CTRL,
 +    },{ .name = "IR_STATUS",  .addr = A_IR_STATUS,
 +        .w1c = 0x1,
 +        .post_write = crl_status_postw,
 +    },{ .name = "IR_MASK",  .addr = A_IR_MASK,
 +        .reset = 0x1,
 +        .ro = 0x1,
 +    },{ .name = "IR_ENABLE",  .addr = A_IR_ENABLE,
 +        .pre_write = crl_enable_prew,
 +    },{ .name = "IR_DISABLE",  .addr = A_IR_DISABLE,
 +        .pre_write = crl_disable_prew,
 +    },{ .name = "WPROT",  .addr = A_WPROT,
 +    },{ .name = "PLL_CLK_OTHER_DMN",  .addr = A_PLL_CLK_OTHER_DMN,
 +        .reset = 0x1,
 +        .rsvd = 0xe,
 +    },{ .name = "RPLL_CTRL",  .addr = A_RPLL_CTRL,
 +        .reset = 0x24809,
 +        .rsvd = 0xf88c00f6,
 +    },{ .name = "RPLL_CFG",  .addr = A_RPLL_CFG,
 +        .reset = 0x2000000,
 +        .rsvd = 0x1801210,
 +    },{ .name = "RPLL_FRAC_CFG",  .addr = A_RPLL_FRAC_CFG,
 +        .rsvd = 0x7e330000,
 +    },{ .name = "PLL_STATUS",  .addr = A_PLL_STATUS,
 +        .reset = R_PLL_STATUS_RPLL_STABLE_MASK |
 +                 R_PLL_STATUS_RPLL_LOCK_MASK,
 +        .rsvd = 0xfa,
 +        .ro = 0x5,
 +    },{ .name = "RPLL_TO_XPD_CTRL",  .addr = A_RPLL_TO_XPD_CTRL,
 +        .reset = 0x2000100,
 +        .rsvd = 0xfdfc00ff,
 +    },{ .name = "LPD_TOP_SWITCH_CTRL",  .addr = A_LPD_TOP_SWITCH_CTRL,
 +        .reset = 0x6000300,
 +        .rsvd = 0xf9fc00f8,
 +    },{ .name = "LPD_LSBUS_CTRL",  .addr = A_LPD_LSBUS_CTRL,
 +        .reset = 0x2000800,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "CPU_R5_CTRL",  .addr = A_CPU_R5_CTRL,
 +        .reset = 0xe000300,
 +        .rsvd = 0xe1fc00f8,
 +    },{ .name = "IOU_SWITCH_CTRL",  .addr = A_IOU_SWITCH_CTRL,
 +        .reset = 0x2000500,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "GEM0_REF_CTRL",  .addr = A_GEM0_REF_CTRL,
 +        .reset = 0xe000a00,
 +        .rsvd = 0xf1fc00f8,
 +    },{ .name = "GEM1_REF_CTRL",  .addr = A_GEM1_REF_CTRL,
 +        .reset = 0xe000a00,
 +        .rsvd = 0xf1fc00f8,
 +    },{ .name = "GEM_TSU_REF_CTRL",  .addr = A_GEM_TSU_REF_CTRL,
 +        .reset = 0x300,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "USB0_BUS_REF_CTRL",  .addr = A_USB0_BUS_REF_CTRL,
 +        .reset = 0x2001900,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "UART0_REF_CTRL",  .addr = A_UART0_REF_CTRL,
 +        .reset = 0xc00,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "UART1_REF_CTRL",  .addr = A_UART1_REF_CTRL,
 +        .reset = 0xc00,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "SPI0_REF_CTRL",  .addr = A_SPI0_REF_CTRL,
 +        .reset = 0x600,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "SPI1_REF_CTRL",  .addr = A_SPI1_REF_CTRL,
 +        .reset = 0x600,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "CAN0_REF_CTRL",  .addr = A_CAN0_REF_CTRL,
 +        .reset = 0xc00,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "CAN1_REF_CTRL",  .addr = A_CAN1_REF_CTRL,
 +        .reset = 0xc00,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "I2C0_REF_CTRL",  .addr = A_I2C0_REF_CTRL,
 +        .reset = 0xc00,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "I2C1_REF_CTRL",  .addr = A_I2C1_REF_CTRL,
 +        .reset = 0xc00,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "DBG_LPD_CTRL",  .addr = A_DBG_LPD_CTRL,
 +        .reset = 0x300,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "TIMESTAMP_REF_CTRL",  .addr = A_TIMESTAMP_REF_CTRL,
 +        .reset = 0x2000c00,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "CRL_SAFETY_CHK",  .addr = A_CRL_SAFETY_CHK,
 +    },{ .name = "PSM_REF_CTRL",  .addr = A_PSM_REF_CTRL,
 +        .reset = 0xf04,
 +        .rsvd = 0xfffc00f8,
 +    },{ .name = "DBG_TSTMP_CTRL",  .addr = A_DBG_TSTMP_CTRL,
 +        .reset = 0x300,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "CPM_TOPSW_REF_CTRL",  .addr = A_CPM_TOPSW_REF_CTRL,
 +        .reset = 0x300,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "USB3_DUAL_REF_CTRL",  .addr = A_USB3_DUAL_REF_CTRL,
 +        .reset = 0x3c00,
 +        .rsvd = 0xfdfc00f8,
 +    },{ .name = "RST_CPU_R5",  .addr = A_RST_CPU_R5,
 +        .reset = 0x17,
 +        .rsvd = 0x8,
 +        .pre_write = crl_rst_r5_prew,
 +    },{ .name = "RST_ADMA",  .addr = A_RST_ADMA,
 +        .reset = 0x1,
 +        .pre_write = crl_rst_adma_prew,
 +    },{ .name = "RST_GEM0",  .addr = A_RST_GEM0,
 +        .reset = 0x1,
 +        .pre_write = crl_rst_gem0_prew,
 +    },{ .name = "RST_GEM1",  .addr = A_RST_GEM1,
 +        .reset = 0x1,
 +        .pre_write = crl_rst_gem1_prew,
 +    },{ .name = "RST_SPARE",  .addr = A_RST_SPARE,
 +        .reset = 0x1,
 +    },{ .name = "RST_USB0",  .addr = A_RST_USB0,
 +        .reset = 0x1,
 +        .pre_write = crl_rst_usb_prew,
 +    },{ .name = "RST_UART0",  .addr = A_RST_UART0,
 +        .reset = 0x1,
 +        .pre_write = crl_rst_uart0_prew,
 +    },{ .name = "RST_UART1",  .addr = A_RST_UART1,
 +        .reset = 0x1,
 +        .pre_write = crl_rst_uart1_prew,
 +    },{ .name = "RST_SPI0",  .addr = A_RST_SPI0,
 +        .reset = 0x1,
 +    },{ .name = "RST_SPI1",  .addr = A_RST_SPI1,
 +        .reset = 0x1,
 +    },{ .name = "RST_CAN0",  .addr = A_RST_CAN0,
 +        .reset = 0x1,
 +    },{ .name = "RST_CAN1",  .addr = A_RST_CAN1,
 +        .reset = 0x1,
 +    },{ .name = "RST_I2C0",  .addr = A_RST_I2C0,
 +        .reset = 0x1,
 +    },{ .name = "RST_I2C1",  .addr = A_RST_I2C1,
 +        .reset = 0x1,
 +    },{ .name = "RST_DBG_LPD",  .addr = A_RST_DBG_LPD,
 +        .reset = 0x33,
 +        .rsvd = 0xcc,
 +    },{ .name = "RST_GPIO",  .addr = A_RST_GPIO,
 +        .reset = 0x1,
 +    },{ .name = "RST_TTC",  .addr = A_RST_TTC,
 +        .reset = 0xf,
 +    },{ .name = "RST_TIMESTAMP",  .addr = A_RST_TIMESTAMP,
 +        .reset = 0x1,
 +    },{ .name = "RST_SWDT",  .addr = A_RST_SWDT,
 +        .reset = 0x1,
 +    },{ .name = "RST_OCM",  .addr = A_RST_OCM,
 +    },{ .name = "RST_IPI",  .addr = A_RST_IPI,
 +    },{ .name = "RST_FPD",  .addr = A_RST_FPD,
 +        .reset = 0x3,
 +    },{ .name = "PSM_RST_MODE",  .addr = A_PSM_RST_MODE,
 +        .reset = 0x1,
 +        .rsvd = 0xf8,
 +    }
 +};
 +
 +static void crl_reset_enter(Object *obj, ResetType type)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(obj);
 +    unsigned int i;
 +
 +    for (i = 0; i < ARRAY_SIZE(s->regs_info); ++i) {
 +        register_reset(&s->regs_info[i]);
 +    }
 +}
 +
 +static void crl_reset_hold(Object *obj)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(obj);
 +
 +    crl_update_irq(s);
 +}
 +
 +static const MemoryRegionOps crl_ops = {
 +    .read = register_read_memory,
 +    .write = register_write_memory,
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 +    .valid = {
 +        .min_access_size = 4,
 +        .max_access_size = 4,
 +    },
 +};
 +
 +static void crl_init(Object *obj)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(obj);
 +    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
 +    int i;
 +
 +    s->reg_array =
 +        register_init_block32(DEVICE(obj), crl_regs_info,
 +                              ARRAY_SIZE(crl_regs_info),
 +                              s->regs_info, s->regs,
 +                              &crl_ops,
 +                              XLNX_VERSAL_CRL_ERR_DEBUG,
 +                              CRL_R_MAX * 4);
 +    sysbus_init_mmio(sbd, &s->reg_array->mem);
 +    sysbus_init_irq(sbd, &s->irq);
 +
 +    for (i = 0; i < ARRAY_SIZE(s->cfg.cpu_r5); ++i) {
 +        object_property_add_link(obj, "cpu_r5[*]", TYPE_ARM_CPU,
 +                                 (Object **)&s->cfg.cpu_r5[i],
 +                                 qdev_prop_allow_set_link_before_realize,
 +                                 OBJ_PROP_LINK_STRONG);
 +    }
 +
 +    for (i = 0; i < ARRAY_SIZE(s->cfg.adma); ++i) {
 +        object_property_add_link(obj, "adma[*]", TYPE_DEVICE,
 +                                 (Object **)&s->cfg.adma[i],
 +                                 qdev_prop_allow_set_link_before_realize,
 +                                 OBJ_PROP_LINK_STRONG);
 +    }
 +
 +    for (i = 0; i < ARRAY_SIZE(s->cfg.uart); ++i) {
 +        object_property_add_link(obj, "uart[*]", TYPE_DEVICE,
 +                                 (Object **)&s->cfg.uart[i],
 +                                 qdev_prop_allow_set_link_before_realize,
 +                                 OBJ_PROP_LINK_STRONG);
 +    }
 +
 +    for (i = 0; i < ARRAY_SIZE(s->cfg.gem); ++i) {
 +        object_property_add_link(obj, "gem[*]", TYPE_DEVICE,
 +                                 (Object **)&s->cfg.gem[i],
 +                                 qdev_prop_allow_set_link_before_realize,
 +                                 OBJ_PROP_LINK_STRONG);
 +    }
 +
 +    object_property_add_link(obj, "usb", TYPE_DEVICE,
 +                             (Object **)&s->cfg.usb,
 +                             qdev_prop_allow_set_link_before_realize,
 +                             OBJ_PROP_LINK_STRONG);
 +}
 +
 +static void crl_finalize(Object *obj)
 +{
 +    XlnxVersalCRL *s = XLNX_VERSAL_CRL(obj);
 +    register_finalize_block(s->reg_array);
 +}
 +
 +static const VMStateDescription vmstate_crl = {
 +    .name = TYPE_XLNX_VERSAL_CRL,
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32_ARRAY(regs, XlnxVersalCRL, CRL_R_MAX),
 +        VMSTATE_END_OF_LIST(),
 +    }
 +};
 +
 +static void crl_class_init(ObjectClass *klass, void *data)
 +{
 +    ResettableClass *rc = RESETTABLE_CLASS(klass);
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
 +    dc->vmsd = &vmstate_crl;
 +
 +    rc->phases.enter = crl_reset_enter;
 +    rc->phases.hold = crl_reset_hold;
 +}
 +
 +static const TypeInfo crl_info = {
 +    .name          = TYPE_XLNX_VERSAL_CRL,
 +    .parent        = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(XlnxVersalCRL),
 +    .class_init    = crl_class_init,
 +    .instance_init = crl_init,
 +    .instance_finalize = crl_finalize,
 +};
 +
 +static void crl_register_types(void)
 +{
 +    type_register_static(&crl_info);
 +}
 +
 +type_init(crl_register_types)
 diff --git a/hw/misc/meson.build b/hw/misc/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/meson.build
 +++ b/hw/misc/meson.build
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_SLAVIO', if_true: files('slavio_misc.c'))
  softmmu_ss.add(when: 'CONFIG_ZYNQ', if_true: files('zynq_slcr.c'))
  specific_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp-crf.c'))
  specific_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp-apu-ctrl.c'))
 +specific_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files('xlnx-versal-crl.c'))
  softmmu_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files(
    'xlnx-versal-xramc.c',
    'xlnx-versal-pmc-iou-slcr.c',
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 34/48] target/arm: Convert VMOV (imm) to decodetree
+[PULL 07/31] hw/arm: versal: Connect the CRL
-Convert the VFP VMOV (immediate) instruction to decodetree.
+From: "Edgar E. Iglesias" <edgar.iglesias@amd.com>
+Connect the CRL (Clock Reset LPD) to the Versal SoC.
+Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
+Reviewed-by: Frederic Konrad <fkonrad@amd.com>
+Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
+Message-id: 20220406174303.2022038-5-edgar.iglesias@xilinx.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/translate-vfp.inc.c | 129 +++++++++++++++++++++++++++++++++
+ include/hw/arm/xlnx-versal.h |  4 +++
- target/arm/translate.c         |  27 +------
+ hw/arm/xlnx-versal.c         | 54 ++++++++++++++++++++++++++++++++++--
- target/arm/vfp.decode          |   5 ++
+files changed, 56 insertions(+), 2 deletions(-)
 files changed, 136 insertions(+), 25 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/xlnx-versal.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/xlnx-versal.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/nvram/xlnx-versal-efuse.h"
-     return true;
+ #include "hw/ssi/xlnx-versal-ospi.h"
  #include "hw/dma/xlnx_csu_dma.h"
 +#include "hw/misc/xlnx-versal-crl.h"
  #include "hw/misc/xlnx-versal-pmc-iou-slcr.h"
  #define TYPE_XLNX_VERSAL "xlnx-versal"
@@ -XXX,XX +XXX,XX @@ struct Versal {
              qemu_or_irq irq_orgate;
              XlnxXramCtrl ctrl[XLNX_VERSAL_NR_XRAM];
          } xram;
 +
 +        XlnxVersalCRL crl;
      } lpd;
      /* The Platform Management Controller subsystem.  */
@@ -XXX,XX +XXX,XX @@ struct Versal {
  #define VERSAL_TIMER_NS_EL1_IRQ     14
  #define VERSAL_TIMER_NS_EL2_IRQ     10
 +#define VERSAL_CRL_IRQ             10
  #define VERSAL_UART0_IRQ_0         18
  #define VERSAL_UART1_IRQ_0         19
  #define VERSAL_USB0_IRQ_0          22
 diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xlnx-versal.c
 +++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_ospi(Versal *s, qemu_irq *pic)
      qdev_connect_gpio_out(orgate, 0, pic[VERSAL_OSPI_IRQ]);
  }
++static void versal_create_crl(Versal *s, qemu_irq *pic)
++{
++    SysBusDevice *sbd;
++    int i;
 +
-+static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
++    object_initialize_child(OBJECT(s), "crl", &s->lpd.crl,
-+{
++                            TYPE_XLNX_VERSAL_CRL);
-+    uint32_t delta_d = 0;
++    sbd = SYS_BUS_DEVICE(&s->lpd.crl);
 +    uint32_t bank_mask = 0;
 +    int veclen = s->vec_len;
 +    TCGv_i32 fd;
 +    uint32_t n, i, vd;
 +
-+    vd = a->vd;
++    for (i = 0; i < ARRAY_SIZE(s->lpd.rpu.cpu); i++) {
 +        g_autofree gchar *name = g_strdup_printf("cpu_r5[%d]", i);
 +
-+    if (!dc_isar_feature(aa32_fpshvec, s) &&
++        object_property_set_link(OBJECT(&s->lpd.crl),
-+        (veclen != 0 || s->vec_stride != 0)) {
++                                 name, OBJECT(&s->lpd.rpu.cpu[i]),
-+        return false;
++                                 &error_abort);
 +    }
 +
-+    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
++    for (i = 0; i < ARRAY_SIZE(s->lpd.iou.gem); i++) {
-+        return false;
++        g_autofree gchar *name = g_strdup_printf("gem[%d]", i);
 +
 +        object_property_set_link(OBJECT(&s->lpd.crl),
 +                                 name, OBJECT(&s->lpd.iou.gem[i]),
 +                                 &error_abort);
 +    }
 +
-+    if (!vfp_access_check(s)) {
++    for (i = 0; i < ARRAY_SIZE(s->lpd.iou.adma); i++) {
-+        return true;
++        g_autofree gchar *name = g_strdup_printf("adma[%d]", i);
 +
 +        object_property_set_link(OBJECT(&s->lpd.crl),
 +                                 name, OBJECT(&s->lpd.iou.adma[i]),
 +                                 &error_abort);
 +    }
 +
-+    if (veclen > 0) {
++    for (i = 0; i < ARRAY_SIZE(s->lpd.iou.uart); i++) {
-+        bank_mask = 0x18;
++        g_autofree gchar *name = g_strdup_printf("uart[%d]", i);
-+        /* Figure out what type of vector operation this is.  */
++
-+        if ((vd & bank_mask) == 0) {
++        object_property_set_link(OBJECT(&s->lpd.crl),
-+            /* scalar */
++                                 name, OBJECT(&s->lpd.iou.uart[i]),
-+            veclen = 0;
++                                 &error_abort);
 +        } else {
 +            delta_d = s->vec_stride + 1;
 +        }
 +    }
 +
-+    n = (a->imm4h << 28) & 0x80000000;
++    object_property_set_link(OBJECT(&s->lpd.crl),
-+    i = ((a->imm4h << 4) & 0x70) | a->imm4l;
++                             "usb", OBJECT(&s->lpd.iou.usb),
-+    if (i & 0x40) {
++                             &error_abort);
 +        i |= 0x780;
 +    } else {
 +        i |= 0x800;
 +    }
 +    n |= i << 19;
 +
-+    fd = tcg_temp_new_i32();
++    sysbus_realize(sbd, &error_fatal);
-+    tcg_gen_movi_i32(fd, n);
++    memory_region_add_subregion(&s->mr_ps, MM_CRL,
-+
++                                sysbus_mmio_get_region(sbd, 0));
-+    for (;;) {
++    sysbus_connect_irq(sbd, 0, pic[VERSAL_CRL_IRQ]);
 +        neon_store_reg32(fd, vd);
 +
 +        if (veclen == 0) {
 +            break;
 +        }
 +
 +        /* Set up the operands for the next iteration */
 +        veclen--;
 +        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +    }
 +
 +    tcg_temp_free_i32(fd);
 +    return true;
 +}
 +
-+static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+ /* This takes the board allocated linear DDR memory and creates aliases
-+{
+  * for each split DDR range/aperture on the Versal address map.
 +    uint32_t delta_d = 0;
 +    uint32_t bank_mask = 0;
 +    int veclen = s->vec_len;
 +    TCGv_i64 fd;
 +    uint32_t n, i, vd;
 +
 +    vd = a->vd;
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (vd & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!dc_isar_feature(aa32_fpshvec, s) &&
 +        (veclen != 0 || s->vec_stride != 0)) {
 +        return false;
 +    }
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    if (veclen > 0) {
 +        bank_mask = 0xc;
 +        /* Figure out what type of vector operation this is.  */
 +        if ((vd & bank_mask) == 0) {
 +            /* scalar */
 +            veclen = 0;
 +        } else {
 +            delta_d = (s->vec_stride >> 1) + 1;
 +        }
 +    }
 +
 +    n = (a->imm4h << 28) & 0x80000000;
 +    i = ((a->imm4h << 4) & 0x70) | a->imm4l;
 +    if (i & 0x40) {
 +        i |= 0x3f80;
 +    } else {
 +        i |= 0x4000;
 +    }
 +    n |= i << 16;
 +
 +    fd = tcg_temp_new_i64();
 +    tcg_gen_movi_i64(fd, ((uint64_t)n) << 32);
 +
 +    for (;;) {
 +        neon_store_reg64(fd, vd);
 +
 +        if (veclen == 0) {
 +            break;
 +        }
 +
 +        /* Set up the operands for the next iteration */
 +        veclen--;
 +        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +    }
 +
 +    tcg_temp_free_i64(fd);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
   */
- static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void versal_unimp(Versal *s)
- {
--    uint32_t rd, rn, rm, op, i, n, delta_d, delta_m, bank_mask;
+     versal_unimp_area(s, "psm", &s->mr_ps,
-+    uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
+                         MM_PSM_START, MM_PSM_END - MM_PSM_START);
-     int dp, veclen;
+-    versal_unimp_area(s, "crl", &s->mr_ps,
-     TCGv_i32 tmp;
+-                        MM_CRL, MM_CRL_SIZE);
-     TCGv_i32 tmp2;
+     versal_unimp_area(s, "crf", &s->mr_ps,
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+                         MM_FPD_CRF, MM_FPD_CRF_SIZE);
-             rn = VFP_SREG_N(insn);
+     versal_unimp_area(s, "apu", &s->mr_ps,
+@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
-             switch (op) {
+     versal_create_efuse(s, pic);
--            case 0 ... 13:
+     versal_create_pmc_iou_slcr(s, pic);
-+            case 0 ... 14:
+     versal_create_ospi(s, pic);
-                 /* Already handled by decodetree */
++    versal_create_crl(s, pic);
-                 return 1;
+     versal_map_ddr(s);
-             default:
+     versal_unimp(s);
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              for (;;) {
                  /* Perform the calculation.  */
                  switch (op) {
 -                case 14: /* fconst */
 -                    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                        return 1;
 -                    }
 -
 -                    n = (insn << 12) & 0x80000000;
 -                    i = ((insn >> 12) & 0x70) | (insn & 0xf);
 -                    if (dp) {
 -                        if (i & 0x40)
 -                            i |= 0x3f80;
 -                        else
 -                            i |= 0x4000;
 -                        n |= i << 16;
 -                        tcg_gen_movi_i64(cpu_F0d, ((uint64_t)n) << 32);
 -                    } else {
 -                        if (i & 0x40)
 -                            i |= 0x780;
 -                        else
 -                            i |= 0x800;
 -                        n |= i << 19;
 -                        tcg_gen_movi_i32(cpu_F0s, n);
 -                    }
 -                    break;
                  case 15: /* extension space */
                      switch (rn) {
                      case 0: /* cpy */
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VFM_sp       ---- 1110 1.10 .... .... 1010 . o2:1 . 0 .... \
               vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=2
  VFM_dp       ---- 1110 1.10 .... .... 1011 . o2:1 . 0 .... \
               vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=2
 +
 +VMOV_imm_sp  ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
 +             vd=%vd_sp
 +VMOV_imm_dp  ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
 +             vd=%vd_dp
 --
-.20.1
+.25.1
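
The immediate expansion in trans_VMOV_imm_sp() above can be tried outside QEMU. The following is a minimal sketch, not QEMU code: the helper names vfp_expand_imm_sp() and bits_to_float() are hypothetical, and only the arithmetic is copied from the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU function): expand the split 8-bit
 * VMOV immediate (imm4h:imm4l) into IEEE single-precision bits, using
 * the same arithmetic as trans_VMOV_imm_sp() in the patch.
 */
static inline uint32_t vfp_expand_imm_sp(uint32_t imm4h, uint32_t imm4l)
{
    uint32_t n, i;

    /* Both fields are 4 bits wide in the decode pattern. */
    assert(imm4h < 16 && imm4l < 16);

    n = (imm4h << 28) & 0x80000000;        /* top bit of imm4h is the sign */
    i = ((imm4h << 4) & 0x70) | imm4l;     /* remaining 7 bits of the imm8 */
    if (i & 0x40) {
        i |= 0x780;
    } else {
        i |= 0x800;
    }
    return n | (i << 19);
}

/* Reinterpret a 32-bit pattern as a float without aliasing problems. */
static inline float bits_to_float(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}
```

For instance, imm4h=0x7 with imm4l=0x0 yields the bit pattern 0x3f800000, which is 1.0 in single precision.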

-[Qemu-devel] [PULL 13/48] target/arm: Convert VMINNM, VMAXNM to decodetree
+[PULL 08/31] hw/arm/exynos4210: Use TYPE_OR_IRQ instead of custom OR-gate device
-Convert the VMINNM and VMAXNM instructions to decodetree.
+The Exynos4210 SoC device currently uses a custom device
-As with VSEL, we leave the trans_VMINMAXNM() function
+"exynos4210.irq_gate" to model the OR gate that feeds each CPU's IRQ
-in translate.c for the moment.
+line.  We have a standard TYPE_OR_IRQ device for this now, so use
 that instead.
 (This is a migration compatibility break, but that is OK for this
 machine type.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-2-peter.maydell@linaro.org
 ---
- target/arm/translate.c       | 41 ++++++++++++++++++++++++------------
+ include/hw/arm/exynos4210.h |  1 +
- target/arm/vfp-uncond.decode |  5 +++++
+ hw/arm/exynos4210.c         | 31 ++++++++++++++++---------------
-files changed, 33 insertions(+), 13 deletions(-)
+files changed, 17 insertions(+), 15 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+@@ -XXX,XX +XXX,XX @@ struct Exynos4210State {
-     return true;
+     MemoryRegion bootreg_mem;
      I2CBus *i2c_if[EXYNOS4210_I2C_NUMBER];
      qemu_or_irq pl330_irq_orgate[EXYNOS4210_NUM_DMA];
 +    qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
  };
  #define TYPE_EXYNOS4210_SOC "exynos4210"
 diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/exynos4210.c
 +++ b/hw/arm/exynos4210.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
  {
      Exynos4210State *s = EXYNOS4210_SOC(socdev);
      MemoryRegion *system_mem = get_system_memory();
 -    qemu_irq gate_irq[EXYNOS4210_NCPUS][EXYNOS4210_IRQ_GATE_NINPUTS];
      SysBusDevice *busdev;
      DeviceState *dev, *uart[4], *pl330[3];
      int i, n;
@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
      /* IRQ Gate */
      for (i = 0; i < EXYNOS4210_NCPUS; i++) {
 -        dev = qdev_new("exynos4210.irq_gate");
 -        qdev_prop_set_uint32(dev, "n_in", EXYNOS4210_IRQ_GATE_NINPUTS);
 -        sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
 -        /* Get IRQ Gate input in gate_irq */
 -        for (n = 0; n < EXYNOS4210_IRQ_GATE_NINPUTS; n++) {
 -            gate_irq[i][n] = qdev_get_gpio_in(dev, n);
 -        }
 -        busdev = SYS_BUS_DEVICE(dev);
 -
 -        /* Connect IRQ Gate output to CPU's IRQ line */
 -        sysbus_connect_irq(busdev, 0,
 -                           qdev_get_gpio_in(DEVICE(s->cpu[i]), ARM_CPU_IRQ));
 +        DeviceState *orgate = DEVICE(&s->cpu_irq_orgate[i]);
 +        object_property_set_int(OBJECT(orgate), "num-lines",
 +                                EXYNOS4210_IRQ_GATE_NINPUTS,
 +                                &error_abort);
 +        qdev_realize(orgate, NULL, &error_abort);
 +        qdev_connect_gpio_out(orgate, 0,
 +                              qdev_get_gpio_in(DEVICE(s->cpu[i]), ARM_CPU_IRQ));
      }
      /* Private memory region and Internal GIC */
@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
      sysbus_realize_and_unref(busdev, &error_fatal);
      sysbus_mmio_map(busdev, 0, EXYNOS4210_SMP_PRIVATE_BASE_ADDR);
      for (n = 0; n < EXYNOS4210_NCPUS; n++) {
 -        sysbus_connect_irq(busdev, n, gate_irq[n][0]);
 +        sysbus_connect_irq(busdev, n,
 +                           qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 0));
      }
      for (n = 0; n < EXYNOS4210_INT_GIC_NIRQ; n++) {
          s->irqs.int_gic_irq[n] = qdev_get_gpio_in(dev, n);
@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
      /* Map Distributer interface */
      sysbus_mmio_map(busdev, 1, EXYNOS4210_EXT_GIC_DIST_BASE_ADDR);
      for (n = 0; n < EXYNOS4210_NCPUS; n++) {
 -        sysbus_connect_irq(busdev, n, gate_irq[n][1]);
 +        sysbus_connect_irq(busdev, n,
 +                           qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 1));
      }
      for (n = 0; n < EXYNOS4210_EXT_GIC_NIRQ; n++) {
          s->irqs.ext_gic_irq[n] = qdev_get_gpio_in(dev, n);
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init(Object *obj)
          object_initialize_child(obj, name, orgate, TYPE_OR_IRQ);
          g_free(name);
      }
 +
 +    for (i = 0; i < ARRAY_SIZE(s->cpu_irq_orgate); i++) {
 +        g_autofree char *name = g_strdup_printf("cpu-irq-orgate%d", i);
 +        object_initialize_child(obj, name, &s->cpu_irq_orgate[i], TYPE_OR_IRQ);
 +    }
  }
--static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
+ static void exynos4210_class_init(ObjectClass *klass, void *data)
 -                            uint32_t rm, uint32_t dp)
 +static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
  {
 -    uint32_t vmin = extract32(insn, 6, 1);
 -    TCGv_ptr fpst = get_fpstatus_ptr(0);
 +    uint32_t rd, rn, rm;
 +    bool dp = a->dp;
 +    bool vmin = a->op;
 +    TCGv_ptr fpst;
 +
 +    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
 +        ((a->vm | a->vn | a->vd) & 0x10)) {
 +        return false;
 +    }
 +    rd = a->vd;
 +    rn = a->vn;
 +    rm = a->vm;
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = get_fpstatus_ptr(0);
      if (dp) {
          TCGv_i64 frn, frm, dest;
@@ -XXX,XX +XXX,XX @@ static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
      }
      tcg_temp_free_ptr(fpst);
 -    return 0;
 +    return true;
  }
  static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
@@ -XXX,XX +XXX,XX @@ static const uint8_t fp_decode_rm[] = {
  static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
  {
 -    uint32_t rd, rn, rm, dp = extract32(insn, 8, 1);
 +    uint32_t rd, rm, dp = extract32(insn, 8, 1);
      if (dp) {
          VFP_DREG_D(rd, insn);
 -        VFP_DREG_N(rn, insn);
          VFP_DREG_M(rm, insn);
      } else {
          rd = VFP_SREG_D(insn);
 -        rn = VFP_SREG_N(insn);
          rm = VFP_SREG_M(insn);
      }
 -    if ((insn & 0x0fb00e10) == 0x0e800a00 &&
 -        dc_isar_feature(aa32_vminmaxnm, s)) {
 -        return handle_vminmaxnm(insn, rd, rn, rm, dp);
 -    } else if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
 -               dc_isar_feature(aa32_vrint, s)) {
 +    if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
 +        dc_isar_feature(aa32_vrint, s)) {
          /* VRINTA, VRINTN, VRINTP, VRINTM */
          int rounding = fp_decode_rm[extract32(insn, 16, 2)];
          return handle_vrint(insn, rd, rm, dp, rounding);
 diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp-uncond.decode
 +++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VSEL        1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
  VSEL        1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
 +
 +VMINMAXNM   1111 1110 1.00 .... .... 1010 . op:1 .0 .... \
 +            vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
 +VMINMAXNM   1111 1110 1.00 .... .... 1011 . op:1 .0 .... \
 +            vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
 --
-.20.1
+.25.1
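
The OR-gate behaviour that the 08/31 commit message relies on (several interrupt inputs feeding one CPU IRQ line) can be sketched in plain C. This is only an illustration of the logic, not the TYPE_OR_IRQ implementation; DemoOrGate, DEMO_MAX_LINES and the demo_or_gate_*() functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_LINES 16   /* hypothetical limit for this sketch */

/* Hypothetical stand-in for an OR gate with several input lines. */
typedef struct DemoOrGate {
    bool levels[DEMO_MAX_LINES];   /* last level seen on each input */
    size_t num_lines;              /* number of inputs in use */
} DemoOrGate;

static inline void demo_or_gate_init(DemoOrGate *g, size_t num_lines)
{
    size_t i;

    assert(num_lines <= DEMO_MAX_LINES);
    g->num_lines = num_lines;
    for (i = 0; i < DEMO_MAX_LINES; i++) {
        g->levels[i] = false;
    }
}

/* Record a new level on one input and return the resulting output level. */
static inline bool demo_or_gate_set(DemoOrGate *g, size_t line, bool level)
{
    bool out = false;
    size_t i;

    assert(line < g->num_lines);
    g->levels[line] = level;
    for (i = 0; i < g->num_lines; i++) {
        out = out || g->levels[i];
    }
    return out;
}
```

With two inputs, as for the internal and external GIC outputs in the patch, the output stays high until both inputs have been lowered.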

-[Qemu-devel] [PULL 02/48] target/arm: Use tcg_gen_gvec_bitsel
+[PULL 09/31] hw/intc/exynos4210_gic: Remove unused TYPE_EXYNOS4210_IRQ_GATE
-From: Richard Henderson <richard.henderson@linaro.org>
+Now we have removed the only use of TYPE_EXYNOS4210_IRQ_GATE we can
 delete the device entirely.
-This replaces 3 target-specific implementations for BIT, BIF, and BSL.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
 Message-id: 20220404154658.565020-3-peter.maydell@linaro.org
 ---
  hw/intc/exynos4210_gic.c | 107 ---------------------------------------
 file changed, 107 deletions(-)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20190518191934.21887-3-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/translate-a64.h |  2 +
  target/arm/translate.h     |  3 --
  target/arm/translate-a64.c | 15 ++++++--
  target/arm/translate.c     | 78 +++-----------------------------------
 files changed, 20 insertions(+), 78 deletions(-)
 diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.h
+--- a/hw/intc/exynos4210_gic.c
-+++ b/target/arm/translate-a64.h
++++ b/hw/intc/exynos4210_gic.c
-@@ -XXX,XX +XXX,XX @@ typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_gic_register_types(void)
                           uint32_t, uint32_t);
  typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                          uint32_t, uint32_t, uint32_t);
 +typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
 +                        uint32_t, uint32_t, uint32_t);
  #endif /* TARGET_ARM_TRANSLATE_A64_H */
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline void gen_ss_advance(DisasContext *s)
  }
- /* Vector operations shared between ARM and AArch64.  */
+ type_init(exynos4210_gic_register_types)
--extern const GVecGen3 bsl_op;
+-
--extern const GVecGen3 bit_op;
+-/* IRQ OR Gate struct.
--extern const GVecGen3 bif_op;
+- *
- extern const GVecGen3 mla_op[4];
+- * This device models an OR gate. There are n_in input qdev gpio lines and one
- extern const GVecGen3 mls_op[4];
+- * output sysbus IRQ line. The output IRQ level is formed as OR between all
- extern const GVecGen3 cmtst_op[4];
+- * gpio inputs.
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_fn3(DisasContext *s, bool is_q, int rd, int rn, int rm,
              vec_full_reg_offset(s, rm), is_q ? 16 : 8, vec_full_reg_size(s));
  }
 +/* Expand a 4-operand AdvSIMD vector operation using an expander function.  */
 +static void gen_gvec_fn4(DisasContext *s, bool is_q, int rd, int rn, int rm,
 +                         int rx, GVecGen4Fn *gvec_fn, int vece)
 +{
 +    gvec_fn(vece, vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn),
 +            vec_full_reg_offset(s, rm), vec_full_reg_offset(s, rx),
 +            is_q ? 16 : 8, vec_full_reg_size(s));
 +}
 +
  /* Expand a 2-operand + immediate AdvSIMD vector operation using
   * an op descriptor.
   */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
          return;
      case 5: /* BSL bitwise select */
 -        gen_gvec_op3(s, is_q, rd, rn, rm, &bsl_op);
 +        gen_gvec_fn4(s, is_q, rd, rd, rn, rm, tcg_gen_gvec_bitsel, 0);
          return;
      case 6: /* BIT, bitwise insert if true */
 -        gen_gvec_op3(s, is_q, rd, rn, rm, &bit_op);
 +        gen_gvec_fn4(s, is_q, rd, rm, rn, rd, tcg_gen_gvec_bitsel, 0);
          return;
      case 7: /* BIF, bitwise insert if false */
 -        gen_gvec_op3(s, is_q, rd, rn, rm, &bif_op);
 +        gen_gvec_fn4(s, is_q, rd, rm, rd, rn, tcg_gen_gvec_bitsel, 0);
          return;
      default:
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn,
      return 1;
  }
 -/*
 - * Expanders for VBitOps_VBIF, VBIT, VBSL.
 - */
--static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+-
 -#define TYPE_EXYNOS4210_IRQ_GATE "exynos4210.irq_gate"
 -OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210IRQGateState, EXYNOS4210_IRQ_GATE)
 -
 -struct Exynos4210IRQGateState {
 -    SysBusDevice parent_obj;
 -
 -    uint32_t n_in;      /* inputs amount */
 -    uint32_t *level;    /* input levels */
 -    qemu_irq out;       /* output IRQ */
 -};
 -
 -static Property exynos4210_irq_gate_properties[] = {
 -    DEFINE_PROP_UINT32("n_in", Exynos4210IRQGateState, n_in, 1),
 -    DEFINE_PROP_END_OF_LIST(),
 -};
 -
 -static const VMStateDescription vmstate_exynos4210_irq_gate = {
 -    .name = "exynos4210.irq_gate",
 -    .version_id = 2,
 -    .minimum_version_id = 2,
 -    .fields = (VMStateField[]) {
 -        VMSTATE_VBUFFER_UINT32(level, Exynos4210IRQGateState, 1, NULL, n_in),
 -        VMSTATE_END_OF_LIST()
 -    }
 -};
 -
 -/* Process a change in IRQ input. */
 -static void exynos4210_irq_gate_handler(void *opaque, int irq, int level)
 -{
--    tcg_gen_xor_i64(rn, rn, rm);
+-    Exynos4210IRQGateState *s = (Exynos4210IRQGateState *)opaque;
--    tcg_gen_and_i64(rn, rn, rd);
+-    uint32_t i;
--    tcg_gen_xor_i64(rd, rm, rn);
+-
 -    assert(irq < s->n_in);
 -
 -    s->level[irq] = level;
 -
 -    for (i = 0; i < s->n_in; i++) {
 -        if (s->level[i] >= 1) {
 -            qemu_irq_raise(s->out);
 -            return;
 -        }
 -    }
 -
 -    qemu_irq_lower(s->out);
 -}
 -
--static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+-static void exynos4210_irq_gate_reset(DeviceState *d)
 -{
--    tcg_gen_xor_i64(rn, rn, rd);
+-    Exynos4210IRQGateState *s = EXYNOS4210_IRQ_GATE(d);
--    tcg_gen_and_i64(rn, rn, rm);
+-
--    tcg_gen_xor_i64(rd, rd, rn);
+-    memset(s->level, 0, s->n_in * sizeof(*s->level));
 -}
 -
--static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+-/*
 - * IRQ Gate initialization.
 - */
 -static void exynos4210_irq_gate_init(Object *obj)
 -{
--    tcg_gen_xor_i64(rn, rn, rd);
+-    Exynos4210IRQGateState *s = EXYNOS4210_IRQ_GATE(obj);
--    tcg_gen_andc_i64(rn, rn, rm);
+-    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
--    tcg_gen_xor_i64(rd, rd, rn);
+-
 -    sysbus_init_irq(sbd, &s->out);
 -}
 -
--static void gen_bsl_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
+-static void exynos4210_irq_gate_realize(DeviceState *dev, Error **errp)
 -{
--    tcg_gen_xor_vec(vece, rn, rn, rm);
+-    Exynos4210IRQGateState *s = EXYNOS4210_IRQ_GATE(dev);
--    tcg_gen_and_vec(vece, rn, rn, rd);
+-
--    tcg_gen_xor_vec(vece, rd, rm, rn);
+-    /* Allocate general purpose input signals and connect a handler to each of
 -     * them */
 -    qdev_init_gpio_in(dev, exynos4210_irq_gate_handler, s->n_in);
 -
 -    s->level = g_malloc0(s->n_in * sizeof(*s->level));
 -}
 -
--static void gen_bit_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
+-static void exynos4210_irq_gate_class_init(ObjectClass *klass, void *data)
 -{
--    tcg_gen_xor_vec(vece, rn, rn, rd);
+-    DeviceClass *dc = DEVICE_CLASS(klass);
--    tcg_gen_and_vec(vece, rn, rn, rm);
+-
--    tcg_gen_xor_vec(vece, rd, rd, rn);
+-    dc->reset = exynos4210_irq_gate_reset;
 -    dc->vmsd = &vmstate_exynos4210_irq_gate;
 -    device_class_set_props(dc, exynos4210_irq_gate_properties);
 -    dc->realize = exynos4210_irq_gate_realize;
 -}
 -
--static void gen_bif_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
+-static const TypeInfo exynos4210_irq_gate_info = {
 -    .name          = TYPE_EXYNOS4210_IRQ_GATE,
 -    .parent        = TYPE_SYS_BUS_DEVICE,
 -    .instance_size = sizeof(Exynos4210IRQGateState),
 -    .instance_init = exynos4210_irq_gate_init,
 -    .class_init    = exynos4210_irq_gate_class_init,
 -};
 -
 -static void exynos4210_irq_gate_register_types(void)
 -{
--    tcg_gen_xor_vec(vece, rn, rn, rd);
+-    type_register_static(&exynos4210_irq_gate_info);
 -    tcg_gen_andc_vec(vece, rn, rn, rm);
 -    tcg_gen_xor_vec(vece, rd, rd, rn);
 -}
 -
--const GVecGen3 bsl_op = {
+-type_init(exynos4210_irq_gate_register_types)
 -    .fni8 = gen_bsl_i64,
 -    .fniv = gen_bsl_vec,
 -    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
 -    .load_dest = true
 -};
 -
 -const GVecGen3 bit_op = {
 -    .fni8 = gen_bit_i64,
 -    .fniv = gen_bit_vec,
 -    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
 -    .load_dest = true
 -};
 -
 -const GVecGen3 bif_op = {
 -    .fni8 = gen_bif_i64,
 -    .fniv = gen_bif_vec,
 -    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
 -    .load_dest = true
 -};
 -
  static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
  {
      tcg_gen_vec_sar8i_i64(a, a, shift);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                   vec_size, vec_size);
                  break;
              case 5: /* VBSL */
 -                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
 -                               vec_size, vec_size, &bsl_op);
 +                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
 +                                    vec_size, vec_size);
                  break;
              case 6: /* VBIT */
 -                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
 -                               vec_size, vec_size, &bit_op);
 +                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
 +                                    vec_size, vec_size);
                  break;
              case 7: /* VBIF */
 -                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
 -                               vec_size, vec_size, &bif_op);
 +                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
 +                                    vec_size, vec_size);
                  break;
              }
              return 0;
 --
-.20.1
+.25.1
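
The operand orderings passed to tcg_gen_gvec_bitsel for BSL, BIT and BIF can be checked against the removed xor/and/xor helpers with a small scalar model. This sketch assumes that bitsel computes (b & a) | (c & ~a); the demo_*() names are hypothetical and operate on plain uint64_t values rather than TCG operands.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select: take bits of b where a is 1, bits of c where a is 0. */
static inline uint64_t demo_bitsel(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & a) | (c & ~a);
}

/* Scalar models of the removed gen_bsl_i64/gen_bit_i64/gen_bif_i64. */
static inline uint64_t demo_old_bsl(uint64_t rd, uint64_t rn, uint64_t rm)
{
    rn ^= rm;
    rn &= rd;
    return rm ^ rn;
}

static inline uint64_t demo_old_bit(uint64_t rd, uint64_t rn, uint64_t rm)
{
    rn ^= rd;
    rn &= rm;
    return rd ^ rn;
}

static inline uint64_t demo_old_bif(uint64_t rd, uint64_t rn, uint64_t rm)
{
    rn ^= rd;
    rn &= ~rm;          /* andc: and with complement */
    return rd ^ rn;
}
```

Under that assumption, BSL corresponds to demo_bitsel(rd, rn, rm), BIT to demo_bitsel(rm, rn, rd) and BIF to demo_bitsel(rm, rd, rn), which mirrors the argument order used in the diff.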

-[Qemu-devel] [PULL 24/48] target/arm: Convert VFP VMLA to decodetree
+[PULL 10/31] hw/arm/exynos4210: Put a9mpcore device into state struct
-Convert the VFP VMLA instruction to decodetree.
+The exynos4210 SoC mostly creates its child devices as if it were
+board code.  This includes the a9mpcore object.  Switch that to a
-This is the first of the VFP 3-operand data processing instructions,
+new-style "embedded in the state struct" creation, because in the
-so we include in this patch the code which loops over the elements
+next commit we're going to want to refer to the object again further
-for an old-style VFP vector operation. The existing code to do this
+down in the exynos4210_realize() function.
 looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
 we are going to be converting instructions one at a time anyway
 we can take the opportunity to make the new loop use TCG temporaries,
 which means we can do that conversion one operation at a time
 rather than needing to do it all in one go.
 We include an UNDEF check which was missing in the old code:
 short-vector operations (with stride or length non-zero) were
 deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
 field does not indicate that support for short vectors is present
 we UNDEF the operations that would use them. (This is a change
 of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
 previously were all incorrectly allowing short-vector operations.)
 Note that the conversion fixes a bug in the old code for the
 case of VFP short-vector "mixed scalar/vector operations". These
 happen where the destination register is in a vector bank but
 the second operand is in a scalar bank. For example
   vmla.f64 d10, d1, d16   with length 2 stride 2
 is equivalent to the pair of scalar operations
   vmla.f64 d10, d1, d16
   vmla.f64 d8, d3, d16
 where the destination and first input register cycle through
 their vector but the second input is scalar (d16). In the
 old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
 as a temporary output for the multiply, which trashes the
 second input operand. For the fully-scalar case (where we
 never do a second iteration) and the fully-vector case
 (where the loop loads the new second input operand) this
 doesn't matter, but for the mixed scalar/vector case we
 will end up using the wrong value for later loop iterations.
 In the new code we use TCG temporaries and so avoid the bug.
 This bug is present for all the multiply-accumulate insns
 that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.
 Note 2: the expression used to calculate the next register
 number in the vector bank is not in fact correct; we leave
 this behaviour unchanged from the old decoder and will
 fix this bug later in the series.
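
The register stepping in the vmla.f64 example above can be reproduced with the expression used in the patch. This is a sketch only: demo_next_vec_reg() is a hypothetical helper, and it deliberately keeps the expression exactly as it appears in the diff, including the behaviour that Note 2 says will be fixed later in the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the next register number for a
 * short-vector iteration, using the expression from the patch
 * unchanged. bank_mask is 0x18 for single precision and 0xc for
 * double precision in the patch.
 */
static inline uint32_t demo_next_vec_reg(uint32_t reg, uint32_t delta,
                                         uint32_t bank_mask)
{
    assert(bank_mask != 0);
    return ((reg + delta) & (bank_mask - 1)) | (reg & bank_mask);
}
```

With bank_mask 0xc and a step of 2, d10 advances to d8 and d1 advances to d3, matching the pair of scalar operations listed in the commit message.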
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-4-peter.maydell@linaro.org
 ---
- target/arm/cpu.h               |   5 +
+ include/hw/arm/exynos4210.h |  2 ++
- target/arm/translate-vfp.inc.c | 205 +++++++++++++++++++++++++++++++++
+ hw/arm/exynos4210.c         | 11 ++++++-----
- target/arm/translate.c         |  14 ++-
+files changed, 8 insertions(+), 5 deletions(-)
  target/arm/vfp.decode          |   6 +
 files changed, 224 insertions(+), 6 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/cpu.h
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@
-     return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
  #include "hw/or-irq.h"
  #include "hw/sysbus.h"
 +#include "hw/cpu/a9mpcore.h"
  #include "target/arm/cpu-qom.h"
  #include "qom/object.h"
@@ -XXX,XX +XXX,XX @@ struct Exynos4210State {
      I2CBus *i2c_if[EXYNOS4210_I2C_NUMBER];
      qemu_or_irq pl330_irq_orgate[EXYNOS4210_NUM_DMA];
      qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
 +    A9MPPrivState a9mpcore;
  };
  #define TYPE_EXYNOS4210_SOC "exynos4210"
 diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/exynos4210.c
 +++ b/hw/arm/exynos4210.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
      }
      /* Private memory region and Internal GIC */
 -    dev = qdev_new(TYPE_A9MPCORE_PRIV);
 -    qdev_prop_set_uint32(dev, "num-cpu", EXYNOS4210_NCPUS);
 -    busdev = SYS_BUS_DEVICE(dev);
 -    sysbus_realize_and_unref(busdev, &error_fatal);
 +    qdev_prop_set_uint32(DEVICE(&s->a9mpcore), "num-cpu", EXYNOS4210_NCPUS);
 +    busdev = SYS_BUS_DEVICE(&s->a9mpcore);
 +    sysbus_realize(busdev, &error_fatal);
      sysbus_mmio_map(busdev, 0, EXYNOS4210_SMP_PRIVATE_BASE_ADDR);
      for (n = 0; n < EXYNOS4210_NCPUS; n++) {
          sysbus_connect_irq(busdev, n,
                             qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 0));
      }
      for (n = 0; n < EXYNOS4210_INT_GIC_NIRQ; n++) {
 -        s->irqs.int_gic_irq[n] = qdev_get_gpio_in(dev, n);
 +        s->irqs.int_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->a9mpcore), n);
      }
      /* Cache controller */
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init(Object *obj)
          g_autofree char *name = g_strdup_printf("cpu-irq-orgate%d", i);
          object_initialize_child(obj, name, &s->cpu_irq_orgate[i], TYPE_OR_IRQ);
      }
 +
 +    object_initialize_child(obj, "a9mpcore", &s->a9mpcore, TYPE_A9MPCORE_PRIV);
  }
-+static inline bool isar_feature_aa32_fpshvec(const ARMISARegisters *id)
+ static void exynos4210_class_init(ObjectClass *klass, void *data)
 +{
 +    return FIELD_EX64(id->mvfr0, MVFR0, FPSHVEC) > 0;
 +}
 +
  /*
   * We always set the FP and SIMD FP16 fields to indicate identical
   * levels of support (assuming SIMD is implemented at all), so
 diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.inc.c
 +++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
      return true;
  }
 +
 +/*
 + * Types for callbacks for do_vfp_3op_sp() and do_vfp_3op_dp().
 + * The callback should emit code to write a value to vd. If
 + * do_vfp_3op_{sp,dp}() was passed reads_vd then the TCGv vd
 + * will contain the old value of the relevant VFP register;
 + * otherwise it must be written to only.
 + */
 +typedef void VFPGen3OpSPFn(TCGv_i32 vd,
 +                           TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst);
 +typedef void VFPGen3OpDPFn(TCGv_i64 vd,
 +                           TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
 +
 +/*
 + * Perform a 3-operand VFP data processing instruction. fn is the
 + * callback to do the actual operation; this function deals with the
 + * code to handle looping around for VFP vector processing.
 + */
 +static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 +                          int vd, int vn, int vm, bool reads_vd)
 +{
 +    uint32_t delta_m = 0;
 +    uint32_t delta_d = 0;
 +    uint32_t bank_mask = 0;
 +    int veclen = s->vec_len;
 +    TCGv_i32 f0, f1, fd;
 +    TCGv_ptr fpst;
 +
 +    if (!dc_isar_feature(aa32_fpshvec, s) &&
 +        (veclen != 0 || s->vec_stride != 0)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    if (veclen > 0) {
 +        bank_mask = 0x18;
 +
 +        /* Figure out what type of vector operation this is.  */
 +        if ((vd & bank_mask) == 0) {
 +            /* scalar */
 +            veclen = 0;
 +        } else {
 +            delta_d = s->vec_stride + 1;
 +
 +            if ((vm & bank_mask) == 0) {
 +                /* mixed scalar/vector */
 +                delta_m = 0;
 +            } else {
 +                /* vector */
 +                delta_m = delta_d;
 +            }
 +        }
 +    }
 +
 +    f0 = tcg_temp_new_i32();
 +    f1 = tcg_temp_new_i32();
 +    fd = tcg_temp_new_i32();
 +    fpst = get_fpstatus_ptr(0);
 +
 +    neon_load_reg32(f0, vn);
 +    neon_load_reg32(f1, vm);
 +
 +    for (;;) {
 +        if (reads_vd) {
 +            neon_load_reg32(fd, vd);
 +        }
 +        fn(fd, f0, f1, fpst);
 +        neon_store_reg32(fd, vd);
 +
 +        if (veclen == 0) {
 +            break;
 +        }
 +
 +        /* Set up the operands for the next iteration */
 +        veclen--;
 +        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
 +        neon_load_reg32(f0, vn);
 +        if (delta_m) {
 +            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
 +            neon_load_reg32(f1, vm);
 +        }
 +    }
 +
 +    tcg_temp_free_i32(f0);
 +    tcg_temp_free_i32(f1);
 +    tcg_temp_free_i32(fd);
 +    tcg_temp_free_ptr(fpst);
 +
 +    return true;
 +}
 +
 +static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
 +                          int vd, int vn, int vm, bool reads_vd)
 +{
 +    uint32_t delta_m = 0;
 +    uint32_t delta_d = 0;
 +    uint32_t bank_mask = 0;
 +    int veclen = s->vec_len;
 +    TCGv_i64 f0, f1, fd;
 +    TCGv_ptr fpst;
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vn | vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!dc_isar_feature(aa32_fpshvec, s) &&
 +        (veclen != 0 || s->vec_stride != 0)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    if (veclen > 0) {
 +        bank_mask = 0xc;
 +
 +        /* Figure out what type of vector operation this is.  */
 +        if ((vd & bank_mask) == 0) {
 +            /* scalar */
 +            veclen = 0;
 +        } else {
 +            delta_d = (s->vec_stride >> 1) + 1;
 +
 +            if ((vm & bank_mask) == 0) {
 +                /* mixed scalar/vector */
 +                delta_m = 0;
 +            } else {
 +                /* vector */
 +                delta_m = delta_d;
 +            }
 +        }
 +    }
 +
 +    f0 = tcg_temp_new_i64();
 +    f1 = tcg_temp_new_i64();
 +    fd = tcg_temp_new_i64();
 +    fpst = get_fpstatus_ptr(0);
 +
 +    neon_load_reg64(f0, vn);
 +    neon_load_reg64(f1, vm);
 +
 +    for (;;) {
 +        if (reads_vd) {
 +            neon_load_reg64(fd, vd);
 +        }
 +        fn(fd, f0, f1, fpst);
 +        neon_store_reg64(fd, vd);
 +
 +        if (veclen == 0) {
 +            break;
 +        }
 +        /* Set up the operands for the next iteration */
 +        veclen--;
 +        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
 +        neon_load_reg64(f0, vn);
 +        if (delta_m) {
 +            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
 +            neon_load_reg64(f1, vm);
 +        }
 +    }
 +
 +    tcg_temp_free_i64(f0);
 +    tcg_temp_free_i64(f1);
 +    tcg_temp_free_i64(fd);
 +    tcg_temp_free_ptr(fpst);
 +
 +    return true;
 +}
 +
 +static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 +{
 +    /* Note that order of inputs to the add matters for NaNs */
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +
 +    gen_helper_vfp_muls(tmp, vn, vm, fpst);
 +    gen_helper_vfp_adds(vd, vd, tmp, fpst);
 +    tcg_temp_free_i32(tmp);
 +}
 +
 +static bool trans_VMLA_sp(DisasContext *s, arg_VMLA_sp *a)
 +{
 +    return do_vfp_3op_sp(s, gen_VMLA_sp, a->vd, a->vn, a->vm, true);
 +}
 +
 +static void gen_VMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
 +{
 +    /* Note that order of inputs to the add matters for NaNs */
 +    TCGv_i64 tmp = tcg_temp_new_i64();
 +
 +    gen_helper_vfp_muld(tmp, vn, vm, fpst);
 +    gen_helper_vfp_addd(vd, vd, tmp, fpst);
 +    tcg_temp_free_i64(tmp);
 +}
 +
 +static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_dp *a)
 +{
 +    return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
              rn = VFP_SREG_N(insn);
 +            switch (op) {
 +            case 0:
 +                /* Already handled by decodetree */
 +                return 1;
 +            default:
 +                break;
 +            }
 +
              if (op == 15) {
                  /* rn is opcode, encoded as per VFP_SREG_N. */
                  switch (rn) {
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              for (;;) {
                  /* Perform the calculation.  */
                  switch (op) {
 -                case 0: /* VMLA: fd + (fn * fm) */
 -                    /* Note that order of inputs to the add matters for NaNs */
 -                    gen_vfp_F1_mul(dp);
 -                    gen_mov_F0_vreg(dp, rd);
 -                    gen_vfp_add(dp);
 -                    break;
                  case 1: /* VMLS: fd + -(fn * fm) */
                      gen_vfp_mul(dp);
                      gen_vfp_F1_neg(dp);
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VLDM_VSTM_sp ---- 1101 0.1 l:1 rn:4 .... 1010 imm:8 \
               vd=%vd_sp p=1 u=0 w=1
  VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
               vd=%vd_dp p=1 u=0 w=1
 +
 +# 3-register VFP data-processing; bits [23,21:20,6] identify the operation.
 +VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 .... \
 +             vm=%vm_sp vn=%vn_sp vd=%vd_sp
 +VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 .... \
 +             vm=%vm_dp vn=%vn_dp vd=%vd_dp
 --
-.20.1
+.25.1
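
The short-vector loop in do_vfp_3op_sp() and do_vfp_3op_dp() above steps each register number with ((reg + delta) & (bank_mask - 1)) | (reg & bank_mask). The sketch below isolates that step so it can be tried by hand. The helper names are hypothetical, and the "wrap" variants assume banks of 8 single-precision (4 double-precision) registers with the index wrapping inside its bank; they are an illustration, not code from this series. Note that with bank_mask = 0x18, bank_mask - 1 is 0x17, which still includes bit 4, so the two forms disagree when the low bits carry out (for example s15 + 1). The cover letter for the older series lists a separate fix for short-vector handling.

```c
#include <stdint.h>

/*
 * The stepping expression exactly as written in the patch above.
 * bank_mask is 0x18 for single precision and 0xc for double precision.
 */
static inline int patch_step(int reg, uint32_t delta, uint32_t bank_mask)
{
    return (int)(((uint32_t)reg + delta) & (bank_mask - 1)) |
           (int)((uint32_t)reg & bank_mask);
}

/*
 * Hypothetical helper: advance a single-precision register number,
 * wrapping within its bank of 8 (assumed bank size).
 */
static inline int wrap_step_sreg(int reg, int delta)
{
    return ((reg + delta) & 0x7) | (reg & ~0x7);
}

/*
 * Hypothetical helper: advance a double-precision register number,
 * wrapping within its bank of 4 (assumed bank size).
 */
static inline int wrap_step_dreg(int reg, int delta)
{
    return ((reg + delta) & 0x3) | (reg & ~0x3);
}
```

For register numbers that do not cross the end of a bank (s9 + 1, say) both forms give the same answer; they only differ on the wrap-around case.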

-[Qemu-devel] [PULL 45/48] target/arm: Convert VJCVT to decodetree
+[PULL 11/31] hw/arm/exynos4210: Drop int_gic_irq[] from Exynos4210Irq struct
-Convert the VJCVT instruction to decodetree.
+The only time we use the int_gic_irq[] array in the Exynos4210Irq
 struct is in the exynos4210_realize() function: we initialize it with
 the GPIO inputs of the a9mpcore device, and then a bit later on we
 connect those to the outputs of the internal combiner.  Now that the
 a9mpcore object is easily accessible as s->a9mpcore we can make the
 connection directly from one device to the other without going via
 this array.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-5-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c | 28 ++++++++++++++++++++++++++++
+ include/hw/arm/exynos4210.h | 1 -
- target/arm/translate.c         | 12 +-----------
+ hw/arm/exynos4210.c         | 6 ++----
- target/arm/vfp.decode          |  4 ++++
+files changed, 2 insertions(+), 5 deletions(-)
 files changed, 33 insertions(+), 11 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
+@@ -XXX,XX +XXX,XX @@
-     tcg_temp_free_ptr(fpst);
+ typedef struct Exynos4210Irq {
-     return true;
+     qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
- }
+     qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
-+
+-    qemu_irq int_gic_irq[EXYNOS4210_INT_GIC_NIRQ];
-+static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
+     qemu_irq ext_gic_irq[EXYNOS4210_EXT_GIC_NIRQ];
-+{
+     qemu_irq board_irqs[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
-+    TCGv_i32 vd;
+ } Exynos4210Irq;
-+    TCGv_i64 vm;
+diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 +
 +    if (!dc_isar_feature(aa32_jscvt, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    vm = tcg_temp_new_i64();
 +    vd = tcg_temp_new_i32();
 +    neon_load_reg64(vm, a->vm);
 +    gen_helper_vjcvt(vd, vm, cpu_env);
 +    neon_store_reg32(vd, a->vd);
 +    tcg_temp_free_i64(vm);
 +    tcg_temp_free_i32(vd);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/arm/exynos4210.c
-+++ b/target/arm/translate.c
++++ b/hw/arm/exynos4210.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
-                 return 1;
+         sysbus_connect_irq(busdev, n,
-             case 15:
+                            qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 0));
-                 switch (rn) {
+     }
--                case 0 ... 17:
+-    for (n = 0; n < EXYNOS4210_INT_GIC_NIRQ; n++) {
-+                case 0 ... 19:
+-        s->irqs.int_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->a9mpcore), n);
-                     /* Already handled by decodetree */
+-    }
-                     return 1;
-                 default:
+     /* Cache controller */
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+     sysbus_create_simple("l2x0", EXYNOS4210_L2X0_BASE_ADDR, NULL);
-                     rm_is_dp = false;
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
-                     break;
+     busdev = SYS_BUS_DEVICE(dev);
+     sysbus_realize_and_unref(busdev, &error_fatal);
--                case 0x13: /* vjcvt */
+     for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
--                    if (!dp || !dc_isar_feature(aa32_jscvt, s)) {
+-        sysbus_connect_irq(busdev, n, s->irqs.int_gic_irq[n]);
--                        return 1;
++        sysbus_connect_irq(busdev, n,
--                    }
++                           qdev_get_gpio_in(DEVICE(&s->a9mpcore), n));
--                    rd_is_dp = false;
+     }
--                    break;
+     exynos4210_combiner_get_gpioin(&s->irqs, dev, 0);
--
+     sysbus_mmio_map(busdev, 0, EXYNOS4210_INT_COMBINER_BASE_ADDR);
                  default:
                      return 1;
                  }
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  switch (op) {
                  case 15: /* extension space */
                      switch (rn) {
 -                    case 19: /* vjcvt */
 -                        gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
 -                        break;
                      case 20: /* fshto */
                          gen_vfp_shto(dp, 16 - rm, 0);
                          break;
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_int_sp  ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
               vd=%vd_sp vm=%vm_sp
  VCVT_int_dp  ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
               vd=%vd_dp vm=%vm_sp
 +
 +# VJCVT is always dp to sp
 +VJCVT        ---- 1110 1.11 1001 .... 1011 11.0 .... \
 +             vd=%vd_sp vm=%vm_dp
 --
-.20.1
+.25.1
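
The VJCVT patch above replaces a "case 19" in the legacy decoder's extension space with a single vfp.decode pattern. The sketch below shows how the two views line up. It is a reading aid under stated assumptions: the function names are hypothetical, the mask/value pair is derived by hand from the pattern "---- 1110 1.11 1001 .... 1011 11.0 ....", and vfp_sreg_n() assumes that VFP_SREG_N packs bits [19:16] with bit 7 as the least significant bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy opcode field: bits 23, 21, 20 and 6, as in disas_vfp_insn(). */
static inline uint32_t vfp_op_field(uint32_t insn)
{
    return ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
}

/* Assumed expansion of VFP_SREG_N: bits [19:16] shifted up, bit 7 as LSB. */
static inline uint32_t vfp_sreg_n(uint32_t insn)
{
    return ((insn >> 15) & 0x1e) | ((insn >> 7) & 1);
}

/*
 * Hand-derived fixed bits of the VJCVT pattern: '0'/'1' positions go
 * into the mask, '.' and '-' positions are left out.
 */
#define VJCVT_MASK  0x0fbf0fd0u
#define VJCVT_VALUE 0x0eb90bc0u

static inline bool is_vjcvt(uint32_t insn)
{
    return (insn & VJCVT_MASK) == VJCVT_VALUE;
}
```

With a condition field of 0xe and all register fields zero, the word 0xeeb90bc0 matches the pattern, and the legacy fields come out as op 15 and rn 19, which is the case the patch removes from translate.c.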

-[Qemu-devel] [PULL 47/48] target/arm: Convert float-to-integer VCVT insns to decodetree
+[PULL 12/31] hw/arm/exynos4210: Coalesce board_irqs and irq_table
-Convert the float-to-integer VCVT instructions to decodetree.
+The exynos4210 code currently has two very similar arrays of IRQs:
-Since these are the last unconverted instructions, we can
-delete the old decoder structure entirely now.
+ * board_irqs is a field of the Exynos4210Irq struct which is filled
    in by exynos4210_init_board_irqs() with the appropriate qemu_irqs
    for each IRQ the board/SoC can assert
  * irq_table is a set of qemu_irqs pointed to from the
    Exynos4210State struct.  It's allocated in exynos4210_init_irq,
    and the only behaviour these irqs have is that they pass on the
    level to the equivalent board_irqs[] irq
 The extra indirection through irq_table is unnecessary, so coalesce
 these into a single irq_table[] array as a direct field in
 Exynos4210State which exynos4210_init_board_irqs() fills in.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-6-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c |  72 ++++++++++
+ include/hw/arm/exynos4210.h |  8 ++------
- target/arm/translate.c         | 241 +--------------------------------
+ hw/arm/exynos4210.c         |  6 +-----
- target/arm/vfp.decode          |   6 +
+ hw/intc/exynos4210_gic.c    | 32 ++++++++------------------------
-files changed, 80 insertions(+), 239 deletions(-)
+files changed, 11 insertions(+), 35 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
+@@ -XXX,XX +XXX,XX @@ typedef struct Exynos4210Irq {
-     tcg_temp_free_ptr(fpst);
+     qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
-     return true;
+     qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
- }
+     qemu_irq ext_gic_irq[EXYNOS4210_EXT_GIC_NIRQ];
-+
+-    qemu_irq board_irqs[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
-+static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
+ } Exynos4210Irq;
-+{
-+    TCGv_i32 vm;
+ struct Exynos4210State {
-+    TCGv_ptr fpst;
+@@ -XXX,XX +XXX,XX @@ struct Exynos4210State {
-+
+     /*< public >*/
-+    if (!vfp_access_check(s)) {
+     ARMCPU *cpu[EXYNOS4210_NCPUS];
-+        return true;
+     Exynos4210Irq irqs;
-+    }
+-    qemu_irq *irq_table;
-+
++    qemu_irq irq_table[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
-+    fpst = get_fpstatus_ptr(false);
-+    vm = tcg_temp_new_i32();
+     MemoryRegion chipid_mem;
-+    neon_load_reg32(vm, a->vm);
+     MemoryRegion iram_mem;
-+
+@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210State, EXYNOS4210_SOC)
-+    if (a->s) {
+ void exynos4210_write_secondary(ARMCPU *cpu,
-+        if (a->rz) {
+         const struct arm_boot_info *info);
-+            gen_helper_vfp_tosizs(vm, vm, fpst);
-+        } else {
+-/* Initialize exynos4210 IRQ subsystem stub */
-+            gen_helper_vfp_tosis(vm, vm, fpst);
+-qemu_irq *exynos4210_init_irq(Exynos4210Irq *env);
-+        }
+-
-+    } else {
+ /* Initialize board IRQs.
-+        if (a->rz) {
+  * These IRQs contain splitted Int/External Combiner and External Gic IRQs */
-+            gen_helper_vfp_touizs(vm, vm, fpst);
+-void exynos4210_init_board_irqs(Exynos4210Irq *s);
-+        } else {
++void exynos4210_init_board_irqs(Exynos4210State *s);
-+            gen_helper_vfp_touis(vm, vm, fpst);
-+        }
+ /* Get IRQ number from exynos4210 IRQ subsystem stub.
-+    }
+  * To identify IRQ source use internal combiner group and bit number
-+    neon_store_reg32(vm, a->vd);
+diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 +    tcg_temp_free_i32(vm);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
 +static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
 +{
 +    TCGv_i32 vd;
 +    TCGv_i64 vm;
 +    TCGv_ptr fpst;
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = get_fpstatus_ptr(false);
 +    vm = tcg_temp_new_i64();
 +    vd = tcg_temp_new_i32();
 +    neon_load_reg64(vm, a->vm);
 +
 +    if (a->s) {
 +        if (a->rz) {
 +            gen_helper_vfp_tosizd(vd, vm, fpst);
 +        } else {
 +            gen_helper_vfp_tosid(vd, vm, fpst);
 +        }
 +    } else {
 +        if (a->rz) {
 +            gen_helper_vfp_touizd(vd, vm, fpst);
 +        } else {
 +            gen_helper_vfp_touid(vd, vm, fpst);
 +        }
 +    }
 +    neon_store_reg32(vd, a->vd);
 +    tcg_temp_free_i32(vd);
 +    tcg_temp_free_i64(vm);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/arm/exynos4210.c
-+++ b/target/arm/translate.c
++++ b/hw/arm/exynos4210.c
-@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp, int neon) \
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
-     tcg_temp_free_ptr(statusptr); \
+         qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
- }
+     }
--VFP_GEN_FTOI(toui)
+-    /*** IRQs ***/
  VFP_GEN_FTOI(touiz)
 -VFP_GEN_FTOI(tosi)
  VFP_GEN_FTOI(tosiz)
  #undef VFP_GEN_FTOI
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
  }
  #define tcg_gen_ld_f32 tcg_gen_ld_i32
 -#define tcg_gen_ld_f64 tcg_gen_ld_i64
  #define tcg_gen_st_f32 tcg_gen_st_i32
 -#define tcg_gen_st_f64 tcg_gen_st_i64
 -
--static inline void gen_mov_F0_vreg(int dp, int reg)
+-    s->irq_table = exynos4210_init_irq(&s->irqs);
 -
      /* IRQ Gate */
      for (i = 0; i < EXYNOS4210_NCPUS; i++) {
          DeviceState *orgate = DEVICE(&s->cpu_irq_orgate[i]);
@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
      sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_COMBINER_BASE_ADDR);
      /* Initialize board IRQs. */
 -    exynos4210_init_board_irqs(&s->irqs);
 +    exynos4210_init_board_irqs(s);
      /*** Memory ***/
 diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/exynos4210_gic.c
 +++ b/hw/intc/exynos4210_gic.c
@@ -XXX,XX +XXX,XX @@ combiner_grp_to_gic_id[64-EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
  #define EXYNOS4210_GIC_CPU_REGION_SIZE  0x100
  #define EXYNOS4210_GIC_DIST_REGION_SIZE 0x1000
 -static void exynos4210_irq_handler(void *opaque, int irq, int level)
 -{
--    if (dp)
+-    Exynos4210Irq *s = (Exynos4210Irq *)opaque;
--        tcg_gen_ld_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
+-
--    else
+-    /* Bypass */
--        tcg_gen_ld_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
+-    qemu_set_irq(s->board_irqs[irq], level);
 -}
 -
--static inline void gen_mov_F1_vreg(int dp, int reg)
+-/*
 - * Initialize exynos4210 IRQ subsystem stub.
 - */
 -qemu_irq *exynos4210_init_irq(Exynos4210Irq *s)
 -{
--    if (dp)
+-    return qemu_allocate_irqs(exynos4210_irq_handler, s,
--        tcg_gen_ld_f64(cpu_F1d, cpu_env, vfp_reg_offset(dp, reg));
+-            EXYNOS4210_MAX_INT_COMBINER_IN_IRQ);
 -    else
 -        tcg_gen_ld_f32(cpu_F1s, cpu_env, vfp_reg_offset(dp, reg));
 -}
 -
--static inline void gen_mov_vreg_F0(int dp, int reg)
+ /*
--{
+  * Initialize board IRQs.
--    if (dp)
+  * These IRQs contain splitted Int/External Combiner and External Gic IRQs.
 -        tcg_gen_st_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
 -    else
 -        tcg_gen_st_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
 -}
  #define ARM_CP_RW_BIT   (1 << 20)
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
   */
- static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+-void exynos4210_init_board_irqs(Exynos4210Irq *s)
 +void exynos4210_init_board_irqs(Exynos4210State *s)
  {
--    uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
+     uint32_t grp, bit, irq_id, n;
--    int dp, veclen;
++    Exynos4210Irq *is = &s->irqs;
--
-     if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
+     for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
-         return 1;
+         irq_id = 0;
-     }
+@@ -XXX,XX +XXX,XX @@ void exynos4210_init_board_irqs(Exynos4210Irq *s)
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+             irq_id = EXT_GIC_ID_MCT_G1;
-             return 0;
+         }
          if (irq_id) {
 -            s->board_irqs[n] = qemu_irq_split(s->int_combiner_irq[n],
 -                    s->ext_gic_irq[irq_id-32]);
 +            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
 +                    is->ext_gic_irq[irq_id - 32]);
          } else {
 -            s->board_irqs[n] = qemu_irq_split(s->int_combiner_irq[n],
 -                    s->ext_combiner_irq[n]);
 +            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
 +                    is->ext_combiner_irq[n]);
          }
      }
--
+     for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
--    if (extract32(insn, 28, 4) == 0xf) {
+@@ -XXX,XX +XXX,XX @@ void exynos4210_init_board_irqs(Exynos4210Irq *s)
--        /*
+                      EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][bit];
--         * Encodings with T=1 (Thumb) or unconditional (ARM): these
--         * were all handled by the decodetree decoder, so any insn
+         if (irq_id) {
--         * patterns which get here must be UNDEF.
+-            s->board_irqs[n] = qemu_irq_split(s->int_combiner_irq[n],
--         */
+-                    s->ext_gic_irq[irq_id-32]);
--        return 1;
++            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
--    }
++                    is->ext_gic_irq[irq_id - 32]);
--
+         }
--    /*
+     }
 -     * FIXME: this access check should not take precedence over UNDEF
 -     * for invalid encodings; we will generate incorrect syndrome information
 -     * for attempts to execute invalid vfp/neon encodings with FP disabled.
 -     */
 -    if (!vfp_access_check(s)) {
 -        return 0;
 -    }
 -
 -    dp = ((insn & 0xf00) == 0xb00);
 -    switch ((insn >> 24) & 0xf) {
 -    case 0xe:
 -        if (insn & (1 << 4)) {
 -            /* already handled by decodetree */
 -            return 1;
 -        } else {
 -            /* data processing */
 -            bool rd_is_dp = dp;
 -            bool rm_is_dp = dp;
 -            bool no_output = false;
 -
 -            /* The opcode is in bits 23, 21, 20 and 6.  */
 -            op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
 -            rn = VFP_SREG_N(insn);
 -
 -            switch (op) {
 -            case 0 ... 14:
 -                /* Already handled by decodetree */
 -                return 1;
 -            case 15:
 -                switch (rn) {
 -                case 0 ... 23:
 -                case 28 ... 31:
 -                    /* Already handled by decodetree */
 -                    return 1;
 -                default:
 -                    break;
 -                }
 -            default:
 -                break;
 -            }
 -
 -            if (op == 15) {
 -                /* rn is opcode, encoded as per VFP_SREG_N. */
 -                switch (rn) {
 -                case 0x18: /* vcvtr.u32.fxx */
 -                case 0x19: /* vcvtz.u32.fxx */
 -                case 0x1a: /* vcvtr.s32.fxx */
 -                case 0x1b: /* vcvtz.s32.fxx */
 -                    rd_is_dp = false;
 -                    break;
 -
 -                default:
 -                    return 1;
 -                }
 -            } else if (dp) {
 -                /* rn is register number */
 -                VFP_DREG_N(rn, insn);
 -            }
 -
 -            if (rd_is_dp) {
 -                VFP_DREG_D(rd, insn);
 -            } else {
 -                rd = VFP_SREG_D(insn);
 -            }
 -            if (rm_is_dp) {
 -                VFP_DREG_M(rm, insn);
 -            } else {
 -                rm = VFP_SREG_M(insn);
 -            }
 -
 -            veclen = s->vec_len;
 -            if (op == 15 && rn > 3) {
 -                veclen = 0;
 -            }
 -
 -            /* Shut up compiler warnings.  */
 -            delta_m = 0;
 -            delta_d = 0;
 -            bank_mask = 0;
 -
 -            if (veclen > 0) {
 -                if (dp)
 -                    bank_mask = 0xc;
 -                else
 -                    bank_mask = 0x18;
 -
 -                /* Figure out what type of vector operation this is.  */
 -                if ((rd & bank_mask) == 0) {
 -                    /* scalar */
 -                    veclen = 0;
 -                } else {
 -                    if (dp)
 -                        delta_d = (s->vec_stride >> 1) + 1;
 -                    else
 -                        delta_d = s->vec_stride + 1;
 -
 -                    if ((rm & bank_mask) == 0) {
 -                        /* mixed scalar/vector */
 -                        delta_m = 0;
 -                    } else {
 -                        /* vector */
 -                        delta_m = delta_d;
 -                    }
 -                }
 -            }
 -
 -            /* Load the initial operands.  */
 -            if (op == 15) {
 -                switch (rn) {
 -                default:
 -                    /* One source operand.  */
 -                    gen_mov_F0_vreg(rm_is_dp, rm);
 -                    break;
 -                }
 -            } else {
 -                /* Two source operands.  */
 -                gen_mov_F0_vreg(dp, rn);
 -                gen_mov_F1_vreg(dp, rm);
 -            }
 -
 -            for (;;) {
 -                /* Perform the calculation.  */
 -                switch (op) {
 -                case 15: /* extension space */
 -                    switch (rn) {
 -                    case 24: /* ftoui */
 -                        gen_vfp_toui(dp, 0);
 -                        break;
 -                    case 25: /* ftouiz */
 -                        gen_vfp_touiz(dp, 0);
 -                        break;
 -                    case 26: /* ftosi */
 -                        gen_vfp_tosi(dp, 0);
 -                        break;
 -                    case 27: /* ftosiz */
 -                        gen_vfp_tosiz(dp, 0);
 -                        break;
 -                    default: /* undefined */
 -                        g_assert_not_reached();
 -                    }
 -                    break;
 -                default: /* undefined */
 -                    return 1;
 -                }
 -
 -                /* Write back the result, if any.  */
 -                if (!no_output) {
 -                    gen_mov_vreg_F0(rd_is_dp, rd);
 -                }
 -
 -                /* break out of the loop if we have finished  */
 -                if (veclen == 0) {
 -                    break;
 -                }
 -
 -                if (op == 15 && delta_m == 0) {
 -                    /* single source one-many */
 -                    while (veclen--) {
 -                        rd = ((rd + delta_d) & (bank_mask - 1))
 -                             | (rd & bank_mask);
 -                        gen_mov_vreg_F0(dp, rd);
 -                    }
 -                    break;
 -                }
 -                /* Setup the next operands.  */
 -                veclen--;
 -                rd = ((rd + delta_d) & (bank_mask - 1))
 -                     | (rd & bank_mask);
 -
 -                if (op == 15) {
 -                    /* One source operand.  */
 -                    rm = ((rm + delta_m) & (bank_mask - 1))
 -                         | (rm & bank_mask);
 -                    gen_mov_F0_vreg(dp, rm);
 -                } else {
 -                    /* Two source operands.  */
 -                    rn = ((rn + delta_d) & (bank_mask - 1))
 -                         | (rn & bank_mask);
 -                    gen_mov_F0_vreg(dp, rn);
 -                    if (delta_m) {
 -                        rm = ((rm + delta_m) & (bank_mask - 1))
 -                             | (rm & bank_mask);
 -                        gen_mov_F1_vreg(dp, rm);
 -                    }
 -                }
 -            }
 -        }
 -        break;
 -    case 0xc:
 -    case 0xd:
 -        /* Already handled by decodetree */
 -        return 1;
 -    default:
 -        /* Should never happen.  */
 -        return 1;
 -    }
 -    return 0;
 +    /* If the decodetree decoder didn't handle this insn, it must be UNDEF */
 +    return 1;
  }
- static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VCVT_fix_sp  ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
-              vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
- VCVT_fix_dp  ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
-              vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
-+
-+# VCVT float to integer (VCVT and VCVTR): Vd always single; Vm depends on size
-+VCVT_sp_int  ---- 1110 1.11 110 s:1 .... 1010 rz:1 1.0 .... \
-+             vd=%vd_sp vm=%vm_sp
-+VCVT_dp_int  ---- 1110 1.11 110 s:1 .... 1011 rz:1 1.0 .... \
-+             vd=%vd_sp vm=%vm_dp
 --
-.20.1
+.25.1
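
The irq_table coalescing described in the commit message above can be pictured with a toy model. Everything below is hypothetical scaffolding (ToyIrq, ToySink, TOY_NIRQ and the toy_* functions are not QEMU APIs); it only illustrates why a forwarding array whose handler passes the level straight through can be replaced by storing the real lines in irq_table[] directly.

```c
#define TOY_NIRQ 4   /* hypothetical line count, for illustration only */

/* Toy stand-in for an interrupt line; this is not QEMU's qemu_irq. */
typedef struct ToyIrq {
    void (*handler)(void *opaque, int n, int level);
    void *opaque;
    int n;
} ToyIrq;

static void toy_set_irq(const ToyIrq *irq, int level)
{
    if (irq->handler) {
        irq->handler(irq->opaque, irq->n, level);
    }
}

/* Final consumer: records the last level seen on each line. */
typedef struct ToySink {
    int level[TOY_NIRQ];
} ToySink;

static void toy_sink_handler(void *opaque, int n, int level)
{
    ToySink *sink = opaque;
    sink->level[n] = level;
}

/* Old shape: irq_table[] exists only to forward to board_irqs[]. */
typedef struct ToyOldState {
    ToyIrq board_irqs[TOY_NIRQ];
    ToyIrq irq_table[TOY_NIRQ];
} ToyOldState;

static void toy_bypass_handler(void *opaque, int n, int level)
{
    ToyOldState *s = opaque;
    toy_set_irq(&s->board_irqs[n], level);   /* pure pass-through */
}

static void toy_old_init(ToyOldState *s, ToySink *sink)
{
    for (int n = 0; n < TOY_NIRQ; n++) {
        s->board_irqs[n] = (ToyIrq){ toy_sink_handler, sink, n };
        s->irq_table[n] = (ToyIrq){ toy_bypass_handler, s, n };
    }
}

/* New shape: irq_table[] holds the real lines, no second array. */
typedef struct ToyNewState {
    ToyIrq irq_table[TOY_NIRQ];
} ToyNewState;

static void toy_new_init(ToyNewState *s, ToySink *sink)
{
    for (int n = 0; n < TOY_NIRQ; n++) {
        s->irq_table[n] = (ToyIrq){ toy_sink_handler, sink, n };
    }
}
```

Raising irq_table[n] reaches the sink with the same line number and level in both shapes; the new one simply skips a handler call and an allocation.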

-[Qemu-devel] [PULL 44/48] target/arm: Convert integer-to-float insns to decodetree
+[PULL 13/31] hw/arm/exynos4210: Fix code style nit in combiner_grp_to_gic_id[]
-Convert the VCVT integer-to-float instructions to decodetree.
+Fix a missing set of spaces around '-' in the definition of
 combiner_grp_to_gic_id[]. We're about to move this code, so
 fix the style issue first to keep checkpatch happy with the
 code-motion patch.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-7-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c | 58 ++++++++++++++++++++++++++++++++++
+ hw/intc/exynos4210_gic.c | 2 +-
- target/arm/translate.c         | 12 +------
+file changed, 1 insertion(+), 1 deletion(-)
  target/arm/vfp.decode          |  6 ++++
 files changed, 65 insertions(+), 11 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/hw/intc/exynos4210_gic.c
-+++ b/target/arm/translate-vfp.inc.c
++++ b/hw/intc/exynos4210_gic.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
+@@ -XXX,XX +XXX,XX @@ enum ExtInt {
-     tcg_temp_free_i64(vm);
+  */
-     return true;
- }
+ static const uint32_t
-+
+-combiner_grp_to_gic_id[64-EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
-+static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
++combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
-+{
+     /* int combiner groups 16-19 */
-+    TCGv_i32 vm;
+     { }, { }, { }, { },
-+    TCGv_ptr fpst;
+     /* int combiner group 20 */
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    vm = tcg_temp_new_i32();
 +    neon_load_reg32(vm, a->vm);
 +    fpst = get_fpstatus_ptr(false);
 +    if (a->s) {
 +        /* i32 -> f32 */
 +        gen_helper_vfp_sitos(vm, vm, fpst);
 +    } else {
 +        /* u32 -> f32 */
 +        gen_helper_vfp_uitos(vm, vm, fpst);
 +    }
 +    neon_store_reg32(vm, a->vd);
 +    tcg_temp_free_i32(vm);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
 +static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
 +{
 +    TCGv_i32 vm;
 +    TCGv_i64 vd;
 +    TCGv_ptr fpst;
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    vm = tcg_temp_new_i32();
 +    vd = tcg_temp_new_i64();
 +    neon_load_reg32(vm, a->vm);
 +    fpst = get_fpstatus_ptr(false);
 +    if (a->s) {
 +        /* i32 -> f64 */
 +        gen_helper_vfp_sitod(vd, vm, fpst);
 +    } else {
 +        /* u32 -> f64 */
 +        gen_helper_vfp_uitod(vd, vm, fpst);
 +    }
 +    neon_store_reg64(vd, a->vd);
 +    tcg_temp_free_i32(vm);
 +    tcg_temp_free_i64(vd);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  return 1;
              case 15:
                  switch (rn) {
 -                case 0 ... 15:
 +                case 0 ... 17:
                      /* Already handled by decodetree */
                      return 1;
                  default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              if (op == 15) {
                  /* rn is opcode, encoded as per VFP_SREG_N. */
                  switch (rn) {
 -                case 0x10: /* vcvt.fxx.u32 */
 -                case 0x11: /* vcvt.fxx.s32 */
 -                    rm_is_dp = false;
 -                    break;
                  case 0x18: /* vcvtr.u32.fxx */
                  case 0x19: /* vcvtz.u32.fxx */
                  case 0x1a: /* vcvtr.s32.fxx */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  switch (op) {
                  case 15: /* extension space */
                      switch (rn) {
 -                    case 16: /* fuito */
 -                        gen_vfp_uito(dp, 0);
 -                        break;
 -                    case 17: /* fsito */
 -                        gen_vfp_sito(dp, 0);
 -                        break;
                      case 19: /* vjcvt */
                          gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
                          break;
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_sp      ---- 1110 1.11 0111 .... 1010 11.0 .... \
               vd=%vd_dp vm=%vm_sp
  VCVT_dp      ---- 1110 1.11 0111 .... 1011 11.0 .... \
               vd=%vd_sp vm=%vm_dp
 +
 +# VCVT from integer to floating point: Vm always single; Vd depends on size
 +VCVT_int_sp  ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
 +             vd=%vd_sp vm=%vm_sp
 +VCVT_int_dp  ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
 +             vd=%vd_dp vm=%vm_sp
 --
-2.20.1
+2.25.1
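
For readers comparing against the older series: the VCVT_int_sp/VCVT_int_dp
patterns above carry a single 's' bit that selects a signed or unsigned
reading of the 32-bit source register. The sketch below is only an
illustration of that selection in plain C. The function names are
hypothetical, and it uses the host's int-to-float conversion rather than
the softfloat helpers and fpstatus pointer that the real translator passes
around, so rounding-mode handling is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the i32/u32 -> f32 choice: 'is_signed'
 * plays the role of the decoded 's' field.
 */
static float sketch_vcvt_int_sp(uint32_t vm_bits, bool is_signed)
{
    if (is_signed) {
        int32_t sval;
        /* Reinterpret the raw register bits as a signed value. */
        memcpy(&sval, &vm_bits, sizeof(sval));
        return (float)sval;          /* i32 -> f32 */
    }
    return (float)vm_bits;           /* u32 -> f32 */
}

/* Same selection for the double-precision destination. */
static double sketch_vcvt_int_dp(uint32_t vm_bits, bool is_signed)
{
    if (is_signed) {
        int32_t sval;
        memcpy(&sval, &vm_bits, sizeof(sval));
        return (double)sval;         /* i32 -> f64 */
    }
    return (double)vm_bits;          /* u32 -> f64 */
}
```

The same bit pattern 0x80000000 therefore converts to -2147483648.0 when
's' is set and to +2147483648.0 when it is clear.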

-[Qemu-devel] [PULL 19/48] target/arm: Convert "single-precision" register moves to decodetree
+[PULL 14/31] hw/arm/exynos4210: Move exynos4210_init_board_irqs() into exynos4210.c
-Convert the "single-precision" register moves to decodetree:
+The function exynos4210_init_board_irqs() currently lives in
- * VMSR
+exynos4210_gic.c, but it isn't really part of the exynos4210.gic
- * VMRS
+device -- it is a function that implements (some of) the wiring up of
- * VMOV between general purpose register and single precision
+interrupts between the SoC's GIC and combiner components.  This means
+it fits better in exynos4210.c, which is the SoC-level code.  Move it
-Note that the VMSR/VMRS conversions make our handling of
+there. Similarly, exynos4210_get_irq() is used almost only in the
-the "should this UNDEF?" checks consistent between the two
+SoC-level code, so move it too.
 instructions:
  * VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0
    (previously was a nop)
  * VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better
    (previously was a nop)
  * VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better
    (previously would write to the register, which had no
    guest-visible effect because we always UNDEF reads)
 We also tighten up the decode: we were previously underdecoding
 some SBZ or SBO bits.
 The conversion of VMOV_single includes the expansion out of the
 gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr()
 sequences into the simpler direct load/store of the TCG temp via
 neon_{load,store}_reg32(): we know in the new function that we're
 always single-precision, we don't need to use the old-and-deprecated
 cpu_F0* TCG globals, and we don't happen to have the declaration of
 gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the
 new function is.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-8-peter.maydell@linaro.org
 ---
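
Note for reviewers following the code motion: the second loop in
exynos4210_init_board_irqs() splits a flat internal-combiner input number
into a (group, bit) pair and then indexes combiner_grp_to_gic_id[] with the
group rebased so that group 16 is row 0. The sketch below shows that
arithmetic on a cut-down table. It is an illustration only: the SKETCH_*
macros are assumed to mirror the EXYNOS4210_COMBINER_GET_* macros
(8 inputs per group), which this patch does not show, and the table holds
just groups 16-22.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: each combiner group has 8 inputs. */
#define SKETCH_GET_IRQ_NUM(grp, bit)  ((grp) * 8 + (bit))
#define SKETCH_GET_GRP_NUM(irq)       ((irq) / 8)
#define SKETCH_GET_BIT_NUM(irq)       ((irq) % 8)

/* The table in the patch starts at internal combiner group 16. */
#define SKETCH_FIRST_GRP  16
#define SKETCH_NUM_GRPS   7

/* GIC IDs copied from the start of enum ExtGicId in the patch. */
enum {
    SKETCH_GIC_ID_MDMA_LCD0 = 66,
    SKETCH_GIC_ID_PDMA0,
    SKETCH_GIC_ID_PDMA1,
    SKETCH_GIC_ID_TIMER0,
};

/* Rows for groups 16..22 only; 0 means "no external GIC line". */
static const uint32_t sketch_grp_to_gic_id[SKETCH_NUM_GRPS][8] = {
    { 0 }, { 0 }, { 0 }, { 0 },                     /* groups 16-19 */
    { 0, SKETCH_GIC_ID_MDMA_LCD0 },                 /* group 20 */
    { SKETCH_GIC_ID_PDMA0, SKETCH_GIC_ID_PDMA1 },   /* group 21 */
    { SKETCH_GIC_ID_TIMER0 },                       /* group 22 */
};

/* Return the external GIC ID for flat input n, or 0 if there is none. */
static uint32_t sketch_lookup_gic_id(uint32_t n)
{
    uint32_t grp = SKETCH_GET_GRP_NUM(n);
    uint32_t bit = SKETCH_GET_BIT_NUM(n);

    assert(bit < 8);
    if (grp < SKETCH_FIRST_GRP || grp >= SKETCH_FIRST_GRP + SKETCH_NUM_GRPS) {
        return 0;   /* outside the rows this sketch models */
    }
    return sketch_grp_to_gic_id[grp - SKETCH_FIRST_GRP][bit];
}
```

For example, group 20 bit 1 is flat input 161 and maps to GIC ID 66
(MDMA_LCD0). The real code then uses irq_id - 32 as the index into
ext_gic_irq[], which would be 34 here.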
- target/arm/translate-vfp.inc.c | 161 +++++++++++++++++++++++++++++++++
+ include/hw/arm/exynos4210.h |   4 -
- target/arm/translate.c         | 148 +-----------------------------
+ hw/arm/exynos4210.c         | 202 +++++++++++++++++++++++++++++++++++
- target/arm/vfp.decode          |   4 +
+ hw/intc/exynos4210_gic.c    | 204 ------------------------------------
- 3 files changed, 168 insertions(+), 145 deletions(-)
+ 3 files changed, 202 insertions(+), 208 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
+@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210State, EXYNOS4210_SOC)
+ void exynos4210_write_secondary(ARMCPU *cpu,
-     return true;
+         const struct arm_boot_info *info);
- }
-+
+-/* Initialize board IRQs.
-+static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
+- * These IRQs contain splitted Int/External Combiner and External Gic IRQs */
 -void exynos4210_init_board_irqs(Exynos4210State *s);
 -
  /* Get IRQ number from exynos4210 IRQ subsystem stub.
   * To identify IRQ source use internal combiner group and bit number
   *  grp - group number
 diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/exynos4210.c
 +++ b/hw/arm/exynos4210.c
@@ -XXX,XX +XXX,XX @@
  #define EXYNOS4210_PL330_BASE1_ADDR         0x12690000
  #define EXYNOS4210_PL330_BASE2_ADDR         0x12850000
 +enum ExtGicId {
 +    EXT_GIC_ID_MDMA_LCD0 = 66,
 +    EXT_GIC_ID_PDMA0,
 +    EXT_GIC_ID_PDMA1,
 +    EXT_GIC_ID_TIMER0,
 +    EXT_GIC_ID_TIMER1,
 +    EXT_GIC_ID_TIMER2,
 +    EXT_GIC_ID_TIMER3,
 +    EXT_GIC_ID_TIMER4,
 +    EXT_GIC_ID_MCT_L0,
 +    EXT_GIC_ID_WDT,
 +    EXT_GIC_ID_RTC_ALARM,
 +    EXT_GIC_ID_RTC_TIC,
 +    EXT_GIC_ID_GPIO_XB,
 +    EXT_GIC_ID_GPIO_XA,
 +    EXT_GIC_ID_MCT_L1,
 +    EXT_GIC_ID_IEM_APC,
 +    EXT_GIC_ID_IEM_IEC,
 +    EXT_GIC_ID_NFC,
 +    EXT_GIC_ID_UART0,
 +    EXT_GIC_ID_UART1,
 +    EXT_GIC_ID_UART2,
 +    EXT_GIC_ID_UART3,
 +    EXT_GIC_ID_UART4,
 +    EXT_GIC_ID_MCT_G0,
 +    EXT_GIC_ID_I2C0,
 +    EXT_GIC_ID_I2C1,
 +    EXT_GIC_ID_I2C2,
 +    EXT_GIC_ID_I2C3,
 +    EXT_GIC_ID_I2C4,
 +    EXT_GIC_ID_I2C5,
 +    EXT_GIC_ID_I2C6,
 +    EXT_GIC_ID_I2C7,
 +    EXT_GIC_ID_SPI0,
 +    EXT_GIC_ID_SPI1,
 +    EXT_GIC_ID_SPI2,
 +    EXT_GIC_ID_MCT_G1,
 +    EXT_GIC_ID_USB_HOST,
 +    EXT_GIC_ID_USB_DEVICE,
 +    EXT_GIC_ID_MODEMIF,
 +    EXT_GIC_ID_HSMMC0,
 +    EXT_GIC_ID_HSMMC1,
 +    EXT_GIC_ID_HSMMC2,
 +    EXT_GIC_ID_HSMMC3,
 +    EXT_GIC_ID_SDMMC,
 +    EXT_GIC_ID_MIPI_CSI_4LANE,
 +    EXT_GIC_ID_MIPI_DSI_4LANE,
 +    EXT_GIC_ID_MIPI_CSI_2LANE,
 +    EXT_GIC_ID_MIPI_DSI_2LANE,
 +    EXT_GIC_ID_ONENAND_AUDI,
 +    EXT_GIC_ID_ROTATOR,
 +    EXT_GIC_ID_FIMC0,
 +    EXT_GIC_ID_FIMC1,
 +    EXT_GIC_ID_FIMC2,
 +    EXT_GIC_ID_FIMC3,
 +    EXT_GIC_ID_JPEG,
 +    EXT_GIC_ID_2D,
 +    EXT_GIC_ID_PCIe,
 +    EXT_GIC_ID_MIXER,
 +    EXT_GIC_ID_HDMI,
 +    EXT_GIC_ID_HDMI_I2C,
 +    EXT_GIC_ID_MFC,
 +    EXT_GIC_ID_TVENC,
 +};
 +
 +enum ExtInt {
 +    EXT_GIC_ID_EXTINT0 = 48,
 +    EXT_GIC_ID_EXTINT1,
 +    EXT_GIC_ID_EXTINT2,
 +    EXT_GIC_ID_EXTINT3,
 +    EXT_GIC_ID_EXTINT4,
 +    EXT_GIC_ID_EXTINT5,
 +    EXT_GIC_ID_EXTINT6,
 +    EXT_GIC_ID_EXTINT7,
 +    EXT_GIC_ID_EXTINT8,
 +    EXT_GIC_ID_EXTINT9,
 +    EXT_GIC_ID_EXTINT10,
 +    EXT_GIC_ID_EXTINT11,
 +    EXT_GIC_ID_EXTINT12,
 +    EXT_GIC_ID_EXTINT13,
 +    EXT_GIC_ID_EXTINT14,
 +    EXT_GIC_ID_EXTINT15
 +};
 +
 +/*
 + * External GIC sources which are not from External Interrupt Combiner or
 + * External Interrupts are starting from EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ,
 + * which is INTG16 in Internal Interrupt Combiner.
 + */
 +
 +static const uint32_t
 +combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 +    /* int combiner groups 16-19 */
 +    { }, { }, { }, { },
 +    /* int combiner group 20 */
 +    { 0, EXT_GIC_ID_MDMA_LCD0 },
 +    /* int combiner group 21 */
 +    { EXT_GIC_ID_PDMA0, EXT_GIC_ID_PDMA1 },
 +    /* int combiner group 22 */
 +    { EXT_GIC_ID_TIMER0, EXT_GIC_ID_TIMER1, EXT_GIC_ID_TIMER2,
 +            EXT_GIC_ID_TIMER3, EXT_GIC_ID_TIMER4 },
 +    /* int combiner group 23 */
 +    { EXT_GIC_ID_RTC_ALARM, EXT_GIC_ID_RTC_TIC },
 +    /* int combiner group 24 */
 +    { EXT_GIC_ID_GPIO_XB, EXT_GIC_ID_GPIO_XA },
 +    /* int combiner group 25 */
 +    { EXT_GIC_ID_IEM_APC, EXT_GIC_ID_IEM_IEC },
 +    /* int combiner group 26 */
 +    { EXT_GIC_ID_UART0, EXT_GIC_ID_UART1, EXT_GIC_ID_UART2, EXT_GIC_ID_UART3,
 +            EXT_GIC_ID_UART4 },
 +    /* int combiner group 27 */
 +    { EXT_GIC_ID_I2C0, EXT_GIC_ID_I2C1, EXT_GIC_ID_I2C2, EXT_GIC_ID_I2C3,
 +            EXT_GIC_ID_I2C4, EXT_GIC_ID_I2C5, EXT_GIC_ID_I2C6,
 +            EXT_GIC_ID_I2C7 },
 +    /* int combiner group 28 */
 +    { EXT_GIC_ID_SPI0, EXT_GIC_ID_SPI1, EXT_GIC_ID_SPI2 , EXT_GIC_ID_USB_HOST},
 +    /* int combiner group 29 */
 +    { EXT_GIC_ID_HSMMC0, EXT_GIC_ID_HSMMC1, EXT_GIC_ID_HSMMC2,
 +     EXT_GIC_ID_HSMMC3, EXT_GIC_ID_SDMMC },
 +    /* int combiner group 30 */
 +    { EXT_GIC_ID_MIPI_CSI_4LANE, EXT_GIC_ID_MIPI_CSI_2LANE },
 +    /* int combiner group 31 */
 +    { EXT_GIC_ID_MIPI_DSI_4LANE, EXT_GIC_ID_MIPI_DSI_2LANE },
 +    /* int combiner group 32 */
 +    { EXT_GIC_ID_FIMC0, EXT_GIC_ID_FIMC1 },
 +    /* int combiner group 33 */
 +    { EXT_GIC_ID_FIMC2, EXT_GIC_ID_FIMC3 },
 +    /* int combiner group 34 */
 +    { EXT_GIC_ID_ONENAND_AUDI, EXT_GIC_ID_NFC },
 +    /* int combiner group 35 */
 +    { 0, 0, 0, EXT_GIC_ID_MCT_L1, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
 +    /* int combiner group 36 */
 +    { EXT_GIC_ID_MIXER },
 +    /* int combiner group 37 */
 +    { EXT_GIC_ID_EXTINT4, EXT_GIC_ID_EXTINT5, EXT_GIC_ID_EXTINT6,
 +     EXT_GIC_ID_EXTINT7 },
 +    /* groups 38-50 */
 +    { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { },
 +    /* int combiner group 51 */
 +    { EXT_GIC_ID_MCT_L0, 0, 0, 0, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
 +    /* group 52 */
 +    { },
 +    /* int combiner group 53 */
 +    { EXT_GIC_ID_WDT, 0, 0, 0, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
 +    /* groups 54-63 */
 +    { }, { }, { }, { }, { }, { }, { }, { }, { }, { }
 +};
 +
 +/*
 + * Initialize board IRQs.
 + * These IRQs contain splitted Int/External Combiner and External Gic IRQs.
 + */
 +static void exynos4210_init_board_irqs(Exynos4210State *s)
 +{
-+    TCGv_i32 tmp;
++    uint32_t grp, bit, irq_id, n;
-+    bool ignore_vfp_enabled = false;
++    Exynos4210Irq *is = &s->irqs;
 +
-+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
++    for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
-+        /*
++        irq_id = 0;
-+         * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
++        if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 4) ||
-+         * Writes to R15 are UNPREDICTABLE; we choose to undef.
++                n == EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 4)) {
-+         */
++            /* MCT_G0 is passed to External GIC */
-+        if (a->rt == 15 || a->reg != ARM_VFP_FPSCR) {
++            irq_id = EXT_GIC_ID_MCT_G0;
-+            return false;
++        }
 +        if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 5) ||
 +                n == EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 5)) {
 +            /* MCT_G1 is passed to External and GIC */
 +            irq_id = EXT_GIC_ID_MCT_G1;
 +        }
 +        if (irq_id) {
 +            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
 +                    is->ext_gic_irq[irq_id - 32]);
 +        } else {
 +            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
 +                    is->ext_combiner_irq[n]);
 +        }
 +    }
-+
++    for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
-+    switch (a->reg) {
++        /* these IDs are passed to Internal Combiner and External GIC */
-+    case ARM_VFP_FPSID:
++        grp = EXYNOS4210_COMBINER_GET_GRP_NUM(n);
-+        /*
++        bit = EXYNOS4210_COMBINER_GET_BIT_NUM(n);
-+         * VFPv2 allows access to FPSID from userspace; VFPv3 restricts
++        irq_id = combiner_grp_to_gic_id[grp -
-+         * all ID registers to privileged access only.
++                     EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][bit];
-+         */
++
-+        if (IS_USER(s) && arm_dc_feature(s, ARM_FEATURE_VFP3)) {
++        if (irq_id) {
-+            return false;
++            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-+        }
++                    is->ext_gic_irq[irq_id - 32]);
 +        ignore_vfp_enabled = true;
 +        break;
 +    case ARM_VFP_MVFR0:
 +    case ARM_VFP_MVFR1:
 +        if (IS_USER(s) || !arm_dc_feature(s, ARM_FEATURE_MVFR)) {
 +            return false;
 +        }
 +        ignore_vfp_enabled = true;
 +        break;
 +    case ARM_VFP_MVFR2:
 +        if (IS_USER(s) || !arm_dc_feature(s, ARM_FEATURE_V8)) {
 +            return false;
 +        }
 +        ignore_vfp_enabled = true;
 +        break;
 +    case ARM_VFP_FPSCR:
 +        break;
 +    case ARM_VFP_FPEXC:
 +        if (IS_USER(s)) {
 +            return false;
 +        }
 +        ignore_vfp_enabled = true;
 +        break;
 +    case ARM_VFP_FPINST:
 +    case ARM_VFP_FPINST2:
 +        /* Not present in VFPv3 */
 +        if (IS_USER(s) || arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 +            return false;
 +        }
 +        break;
 +    default:
 +        return false;
 +    }
 +
 +    if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
 +        return true;
 +    }
 +
 +    if (a->l) {
 +        /* VMRS, move VFP special register to gp register */
 +        switch (a->reg) {
 +        case ARM_VFP_FPSID:
 +        case ARM_VFP_FPEXC:
 +        case ARM_VFP_FPINST:
 +        case ARM_VFP_FPINST2:
 +        case ARM_VFP_MVFR0:
 +        case ARM_VFP_MVFR1:
 +        case ARM_VFP_MVFR2:
 +            tmp = load_cpu_field(vfp.xregs[a->reg]);
 +            break;
 +        case ARM_VFP_FPSCR:
 +            if (a->rt == 15) {
 +                tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
 +                tcg_gen_andi_i32(tmp, tmp, 0xf0000000);
 +            } else {
 +                tmp = tcg_temp_new_i32();
 +                gen_helper_vfp_get_fpscr(tmp, cpu_env);
 +            }
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +
 +        if (a->rt == 15) {
 +            /* Set the 4 flag bits in the CPSR.  */
 +            gen_set_nzcv(tmp);
 +            tcg_temp_free_i32(tmp);
 +        } else {
 +            store_reg(s, a->rt, tmp);
 +        }
 +    } else {
 +        /* VMSR, move gp register to VFP special register */
 +        switch (a->reg) {
 +        case ARM_VFP_FPSID:
 +        case ARM_VFP_MVFR0:
 +        case ARM_VFP_MVFR1:
 +        case ARM_VFP_MVFR2:
 +            /* Writes are ignored.  */
 +            break;
 +        case ARM_VFP_FPSCR:
 +            tmp = load_reg(s, a->rt);
 +            gen_helper_vfp_set_fpscr(cpu_env, tmp);
 +            tcg_temp_free_i32(tmp);
 +            gen_lookup_tb(s);
 +            break;
 +        case ARM_VFP_FPEXC:
 +            /*
 +             * TODO: VFP subarchitecture support.
 +             * For now, keep the EN bit only
 +             */
 +            tmp = load_reg(s, a->rt);
 +            tcg_gen_andi_i32(tmp, tmp, 1 << 30);
 +            store_cpu_field(tmp, vfp.xregs[a->reg]);
 +            gen_lookup_tb(s);
 +            break;
 +        case ARM_VFP_FPINST:
 +        case ARM_VFP_FPINST2:
 +            tmp = load_reg(s, a->rt);
 +            store_cpu_field(tmp, vfp.xregs[a->reg]);
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +    }
-+
-+    return true;
 +}
 +
-+static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
++/*
 + * Get IRQ number from exynos4210 IRQ subsystem stub.
 + * To identify IRQ source use internal combiner group and bit number
 + *  grp - group number
 + *  bit - bit number inside group
 + */
 +uint32_t exynos4210_get_irq(uint32_t grp, uint32_t bit)
 +{
-+    TCGv_i32 tmp;
++    return EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit);
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    if (a->l) {
 +        /* VFP to general purpose register */
 +        tmp = tcg_temp_new_i32();
 +        neon_load_reg32(tmp, a->vn);
 +        if (a->rt == 15) {
 +            /* Set the 4 flag bits in the CPSR.  */
 +            gen_set_nzcv(tmp);
 +            tcg_temp_free_i32(tmp);
 +        } else {
 +            store_reg(s, a->rt, tmp);
 +        }
 +    } else {
 +        /* general purpose register to VFP */
 +        tmp = load_reg(s, a->rt);
 +        neon_store_reg32(tmp, a->vn);
 +        tcg_temp_free_i32(tmp);
 +    }
 +
 +    return true;
 +}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++
  static uint8_t chipid_and_omr[] = { 0x11, 0x02, 0x21, 0x43,
                                      0x09, 0x00, 0x00, 0x00 };
 diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/intc/exynos4210_gic.c
-+++ b/target/arm/translate.c
++++ b/hw/intc/exynos4210_gic.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@
-     TCGv_i32 addr;
+ #include "hw/arm/exynos4210.h"
-     TCGv_i32 tmp;
+ #include "qom/object.h"
-     TCGv_i32 tmp2;
--    bool ignore_vfp_enabled = false;
+-enum ExtGicId {
+-    EXT_GIC_ID_MDMA_LCD0 = 66,
-     if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
+-    EXT_GIC_ID_PDMA0,
-         return 1;
+-    EXT_GIC_ID_PDMA1,
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+-    EXT_GIC_ID_TIMER0,
-      * for invalid encodings; we will generate incorrect syndrome information
+-    EXT_GIC_ID_TIMER1,
-      * for attempts to execute invalid vfp/neon encodings with FP disabled.
+-    EXT_GIC_ID_TIMER2,
-      */
+-    EXT_GIC_ID_TIMER3,
--    if ((insn & 0x0fe00fff) == 0x0ee00a10) {
+-    EXT_GIC_ID_TIMER4,
--        rn = (insn >> 16) & 0xf;
+-    EXT_GIC_ID_MCT_L0,
--        if (rn == ARM_VFP_FPSID || rn == ARM_VFP_FPEXC || rn == ARM_VFP_MVFR2
+-    EXT_GIC_ID_WDT,
--            || rn == ARM_VFP_MVFR1 || rn == ARM_VFP_MVFR0) {
+-    EXT_GIC_ID_RTC_ALARM,
--            ignore_vfp_enabled = true;
+-    EXT_GIC_ID_RTC_TIC,
 -    EXT_GIC_ID_GPIO_XB,
 -    EXT_GIC_ID_GPIO_XA,
 -    EXT_GIC_ID_MCT_L1,
 -    EXT_GIC_ID_IEM_APC,
 -    EXT_GIC_ID_IEM_IEC,
 -    EXT_GIC_ID_NFC,
 -    EXT_GIC_ID_UART0,
 -    EXT_GIC_ID_UART1,
 -    EXT_GIC_ID_UART2,
 -    EXT_GIC_ID_UART3,
 -    EXT_GIC_ID_UART4,
 -    EXT_GIC_ID_MCT_G0,
 -    EXT_GIC_ID_I2C0,
 -    EXT_GIC_ID_I2C1,
 -    EXT_GIC_ID_I2C2,
 -    EXT_GIC_ID_I2C3,
 -    EXT_GIC_ID_I2C4,
 -    EXT_GIC_ID_I2C5,
 -    EXT_GIC_ID_I2C6,
 -    EXT_GIC_ID_I2C7,
 -    EXT_GIC_ID_SPI0,
 -    EXT_GIC_ID_SPI1,
 -    EXT_GIC_ID_SPI2,
 -    EXT_GIC_ID_MCT_G1,
 -    EXT_GIC_ID_USB_HOST,
 -    EXT_GIC_ID_USB_DEVICE,
 -    EXT_GIC_ID_MODEMIF,
 -    EXT_GIC_ID_HSMMC0,
 -    EXT_GIC_ID_HSMMC1,
 -    EXT_GIC_ID_HSMMC2,
 -    EXT_GIC_ID_HSMMC3,
 -    EXT_GIC_ID_SDMMC,
 -    EXT_GIC_ID_MIPI_CSI_4LANE,
 -    EXT_GIC_ID_MIPI_DSI_4LANE,
 -    EXT_GIC_ID_MIPI_CSI_2LANE,
 -    EXT_GIC_ID_MIPI_DSI_2LANE,
 -    EXT_GIC_ID_ONENAND_AUDI,
 -    EXT_GIC_ID_ROTATOR,
 -    EXT_GIC_ID_FIMC0,
 -    EXT_GIC_ID_FIMC1,
 -    EXT_GIC_ID_FIMC2,
 -    EXT_GIC_ID_FIMC3,
 -    EXT_GIC_ID_JPEG,
 -    EXT_GIC_ID_2D,
 -    EXT_GIC_ID_PCIe,
 -    EXT_GIC_ID_MIXER,
 -    EXT_GIC_ID_HDMI,
 -    EXT_GIC_ID_HDMI_I2C,
 -    EXT_GIC_ID_MFC,
 -    EXT_GIC_ID_TVENC,
 -};
 -
 -enum ExtInt {
 -    EXT_GIC_ID_EXTINT0 = 48,
 -    EXT_GIC_ID_EXTINT1,
 -    EXT_GIC_ID_EXTINT2,
 -    EXT_GIC_ID_EXTINT3,
 -    EXT_GIC_ID_EXTINT4,
 -    EXT_GIC_ID_EXTINT5,
 -    EXT_GIC_ID_EXTINT6,
 -    EXT_GIC_ID_EXTINT7,
 -    EXT_GIC_ID_EXTINT8,
 -    EXT_GIC_ID_EXTINT9,
 -    EXT_GIC_ID_EXTINT10,
 -    EXT_GIC_ID_EXTINT11,
 -    EXT_GIC_ID_EXTINT12,
 -    EXT_GIC_ID_EXTINT13,
 -    EXT_GIC_ID_EXTINT14,
 -    EXT_GIC_ID_EXTINT15
 -};
 -
 -/*
 - * External GIC sources which are not from External Interrupt Combiner or
 - * External Interrupts are starting from EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ,
 - * which is INTG16 in Internal Interrupt Combiner.
 - */
 -
 -static const uint32_t
 -combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 -    /* int combiner groups 16-19 */
 -    { }, { }, { }, { },
 -    /* int combiner group 20 */
 -    { 0, EXT_GIC_ID_MDMA_LCD0 },
 -    /* int combiner group 21 */
 -    { EXT_GIC_ID_PDMA0, EXT_GIC_ID_PDMA1 },
 -    /* int combiner group 22 */
 -    { EXT_GIC_ID_TIMER0, EXT_GIC_ID_TIMER1, EXT_GIC_ID_TIMER2,
 -            EXT_GIC_ID_TIMER3, EXT_GIC_ID_TIMER4 },
 -    /* int combiner group 23 */
 -    { EXT_GIC_ID_RTC_ALARM, EXT_GIC_ID_RTC_TIC },
 -    /* int combiner group 24 */
 -    { EXT_GIC_ID_GPIO_XB, EXT_GIC_ID_GPIO_XA },
 -    /* int combiner group 25 */
 -    { EXT_GIC_ID_IEM_APC, EXT_GIC_ID_IEM_IEC },
 -    /* int combiner group 26 */
 -    { EXT_GIC_ID_UART0, EXT_GIC_ID_UART1, EXT_GIC_ID_UART2, EXT_GIC_ID_UART3,
 -            EXT_GIC_ID_UART4 },
 -    /* int combiner group 27 */
 -    { EXT_GIC_ID_I2C0, EXT_GIC_ID_I2C1, EXT_GIC_ID_I2C2, EXT_GIC_ID_I2C3,
 -            EXT_GIC_ID_I2C4, EXT_GIC_ID_I2C5, EXT_GIC_ID_I2C6,
 -            EXT_GIC_ID_I2C7 },
 -    /* int combiner group 28 */
 -    { EXT_GIC_ID_SPI0, EXT_GIC_ID_SPI1, EXT_GIC_ID_SPI2 , EXT_GIC_ID_USB_HOST},
 -    /* int combiner group 29 */
 -    { EXT_GIC_ID_HSMMC0, EXT_GIC_ID_HSMMC1, EXT_GIC_ID_HSMMC2,
 -     EXT_GIC_ID_HSMMC3, EXT_GIC_ID_SDMMC },
 -    /* int combiner group 30 */
 -    { EXT_GIC_ID_MIPI_CSI_4LANE, EXT_GIC_ID_MIPI_CSI_2LANE },
 -    /* int combiner group 31 */
 -    { EXT_GIC_ID_MIPI_DSI_4LANE, EXT_GIC_ID_MIPI_DSI_2LANE },
 -    /* int combiner group 32 */
 -    { EXT_GIC_ID_FIMC0, EXT_GIC_ID_FIMC1 },
 -    /* int combiner group 33 */
 -    { EXT_GIC_ID_FIMC2, EXT_GIC_ID_FIMC3 },
 -    /* int combiner group 34 */
 -    { EXT_GIC_ID_ONENAND_AUDI, EXT_GIC_ID_NFC },
 -    /* int combiner group 35 */
 -    { 0, 0, 0, EXT_GIC_ID_MCT_L1, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
 -    /* int combiner group 36 */
 -    { EXT_GIC_ID_MIXER },
 -    /* int combiner group 37 */
 -    { EXT_GIC_ID_EXTINT4, EXT_GIC_ID_EXTINT5, EXT_GIC_ID_EXTINT6,
 -     EXT_GIC_ID_EXTINT7 },
 -    /* groups 38-50 */
 -    { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { },
 -    /* int combiner group 51 */
 -    { EXT_GIC_ID_MCT_L0, 0, 0, 0, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
 -    /* group 52 */
 -    { },
 -    /* int combiner group 53 */
 -    { EXT_GIC_ID_WDT, 0, 0, 0, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
 -    /* groups 54-63 */
 -    { }, { }, { }, { }, { }, { }, { }, { }, { }, { }
 -};
 -
  #define EXYNOS4210_GIC_NIRQ 160
  #define EXYNOS4210_EXT_GIC_CPU_REGION_SIZE     0x10000
@@ -XXX,XX +XXX,XX @@ combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
  #define EXYNOS4210_GIC_CPU_REGION_SIZE  0x100
  #define EXYNOS4210_GIC_DIST_REGION_SIZE 0x1000
 -/*
 - * Initialize board IRQs.
 - * These IRQs contain splitted Int/External Combiner and External Gic IRQs.
 - */
 -void exynos4210_init_board_irqs(Exynos4210State *s)
 -{
 -    uint32_t grp, bit, irq_id, n;
 -    Exynos4210Irq *is = &s->irqs;
 -
 -    for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
 -        irq_id = 0;
 -        if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 4) ||
 -                n == EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 4)) {
 -            /* MCT_G0 is passed to External GIC */
 -            irq_id = EXT_GIC_ID_MCT_G0;
 -        }
 -        if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 5) ||
 -                n == EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 5)) {
 -            /* MCT_G1 is passed to External and GIC */
 -            irq_id = EXT_GIC_ID_MCT_G1;
 -        }
 -        if (irq_id) {
 -            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
 -                    is->ext_gic_irq[irq_id - 32]);
 -        } else {
 -            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
 -                    is->ext_combiner_irq[n]);
 -        }
 -    }
--    if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
+-    for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
-+    if (!vfp_access_check(s)) {
+-        /* these IDs are passed to Internal Combiner and External GIC */
-         return 0;
+-        grp = EXYNOS4210_COMBINER_GET_GRP_NUM(n);
-     }
+-        bit = EXYNOS4210_COMBINER_GET_BIT_NUM(n);
+-        irq_id = combiner_grp_to_gic_id[grp -
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+-                     EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][bit];
-     switch ((insn >> 24) & 0xf) {
+-
-     case 0xe:
+-        if (irq_id) {
-         if (insn & (1 << 4)) {
+-            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
--            /* single register transfer */
+-                    is->ext_gic_irq[irq_id - 32]);
--            rd = (insn >> 12) & 0xf;
+-        }
--            if (dp) {
+-    }
--                /* already handled by decodetree */
+-}
--                return 1;
+-
--            } else { /* !dp */
+-/*
--                bool is_sysreg;
+- * Get IRQ number from exynos4210 IRQ subsystem stub.
--
+- * To identify IRQ source use internal combiner group and bit number
--                if ((insn & 0x6f) != 0x00)
+- *  grp - group number
--                    return 1;
+- *  bit - bit number inside group
--                rn = VFP_SREG_N(insn);
+- */
--
+-uint32_t exynos4210_get_irq(uint32_t grp, uint32_t bit)
--                is_sysreg = extract32(insn, 21, 1);
+-{
--
+-    return EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit);
--                if (arm_dc_feature(s, ARM_FEATURE_M)) {
+-}
--                    /*
+-
--                     * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
+-/********* GIC part *********/
--                     * Writes to R15 are UNPREDICTABLE; we choose to undef.
+-
--                     */
+ #define TYPE_EXYNOS4210_GIC "exynos4210.gic"
--                    if (is_sysreg && (rd == 15 || (rn >> 1) != ARM_VFP_FPSCR)) {
+ OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210GicState, EXYNOS4210_GIC)
--                        return 1;
 -                    }
 -                }
 -
 -                if (insn & ARM_CP_RW_BIT) {
 -                    /* vfp->arm */
 -                    if (is_sysreg) {
 -                        /* system register */
 -                        rn >>= 1;
 -
 -                        switch (rn) {
 -                        case ARM_VFP_FPSID:
 -                            /* VFP2 allows access to FSID from userspace.
 -                               VFP3 restricts all id registers to privileged
 -                               accesses.  */
 -                            if (IS_USER(s)
 -                                && arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                                return 1;
 -                            }
 -                            tmp = load_cpu_field(vfp.xregs[rn]);
 -                            break;
 -                        case ARM_VFP_FPEXC:
 -                            if (IS_USER(s))
 -                                return 1;
 -                            tmp = load_cpu_field(vfp.xregs[rn]);
 -                            break;
 -                        case ARM_VFP_FPINST:
 -                        case ARM_VFP_FPINST2:
 -                            /* Not present in VFP3.  */
 -                            if (IS_USER(s)
 -                                || arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                                return 1;
 -                            }
 -                            tmp = load_cpu_field(vfp.xregs[rn]);
 -                            break;
 -                        case ARM_VFP_FPSCR:
 -                            if (rd == 15) {
 -                                tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
 -                                tcg_gen_andi_i32(tmp, tmp, 0xf0000000);
 -                            } else {
 -                                tmp = tcg_temp_new_i32();
 -                                gen_helper_vfp_get_fpscr(tmp, cpu_env);
 -                            }
 -                            break;
 -                        case ARM_VFP_MVFR2:
 -                            if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
 -                                return 1;
 -                            }
 -                            /* fall through */
 -                        case ARM_VFP_MVFR0:
 -                        case ARM_VFP_MVFR1:
 -                            if (IS_USER(s)
 -                                || !arm_dc_feature(s, ARM_FEATURE_MVFR)) {
 -                                return 1;
 -                            }
 -                            tmp = load_cpu_field(vfp.xregs[rn]);
 -                            break;
 -                        default:
 -                            return 1;
 -                        }
 -                    } else {
 -                        gen_mov_F0_vreg(0, rn);
 -                        tmp = gen_vfp_mrs();
 -                    }
 -                    if (rd == 15) {
 -                        /* Set the 4 flag bits in the CPSR.  */
 -                        gen_set_nzcv(tmp);
 -                        tcg_temp_free_i32(tmp);
 -                    } else {
 -                        store_reg(s, rd, tmp);
 -                    }
 -                } else {
 -                    /* arm->vfp */
 -                    if (is_sysreg) {
 -                        rn >>= 1;
 -                        /* system register */
 -                        switch (rn) {
 -                        case ARM_VFP_FPSID:
 -                        case ARM_VFP_MVFR0:
 -                        case ARM_VFP_MVFR1:
 -                            /* Writes are ignored.  */
 -                            break;
 -                        case ARM_VFP_FPSCR:
 -                            tmp = load_reg(s, rd);
 -                            gen_helper_vfp_set_fpscr(cpu_env, tmp);
 -                            tcg_temp_free_i32(tmp);
 -                            gen_lookup_tb(s);
 -                            break;
 -                        case ARM_VFP_FPEXC:
 -                            if (IS_USER(s))
 -                                return 1;
 -                            /* TODO: VFP subarchitecture support.
 -                             * For now, keep the EN bit only */
 -                            tmp = load_reg(s, rd);
 -                            tcg_gen_andi_i32(tmp, tmp, 1 << 30);
 -                            store_cpu_field(tmp, vfp.xregs[rn]);
 -                            gen_lookup_tb(s);
 -                            break;
 -                        case ARM_VFP_FPINST:
 -                        case ARM_VFP_FPINST2:
 -                            if (IS_USER(s)) {
 -                                return 1;
 -                            }
 -                            tmp = load_reg(s, rd);
 -                            store_cpu_field(tmp, vfp.xregs[rn]);
 -                            break;
 -                        default:
 -                            return 1;
 -                        }
 -                    } else {
 -                        tmp = load_reg(s, rd);
 -                        gen_vfp_msr(tmp);
 -                        gen_mov_vreg_F0(0, rn);
 -                    }
 -                }
 -            }
 +            /* already handled by decodetree */
 +            return 1;
          } else {
              /* data processing */
              bool rd_is_dp = dp;
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_from_gp ---- 1110 0 0 index:1 0 .... rt:4 1011 .00 1 0000 \
  VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
               vn=%vn_dp
 +
 +VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
 +VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000 \
 +             vn=%vn_sp
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 16/48] target/arm: Move the VFP trans_* functions to translate-vfp.inc.c
+[PULL 15/31] hw/arm/exynos4210: Put external GIC into state struct
-Move the trans_*() functions we've just created from translate.c
+Switch the creation of the external GIC to the new-style "embedded in
-to translate-vfp.inc.c. This is pure code motion with no textual
+state struct" approach, so we can easily refer to the object
-changes (this can be checked with 'git show --color-moved').
+elsewhere during realize.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-9-peter.maydell@linaro.org
 ---
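Note (illustration only, not part of the patch): the "embedded in state struct" approach described above can be sketched in plain C. This is a toy model under stated assumptions, not QEMU code: ToySoc, ToyGic, TOY_GIC_NCPUS and the toy_* functions are hypothetical stand-ins for Exynos4210State, Exynos4210GicState, EXYNOS4210_GIC_NCPUS and the QOM init/realize calls.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_GIC_NCPUS 2          /* hypothetical limit, mirrors the idea only */

/* Hypothetical stand-in for the child device state. */
typedef struct ToyGic {
    uint32_t num_cpu;
    int realized;
} ToyGic;

/* The child's storage lives inside the parent, so any code holding
 * the parent can reach it as s->ext_gic without a saved pointer. */
typedef struct ToySoc {
    ToyGic ext_gic;
} ToySoc;

/* Instance-init time: set up the embedded child in place. */
static void toy_soc_init(ToySoc *s)
{
    s->ext_gic.num_cpu = 0;
    s->ext_gic.realized = 0;
}

/* Realize time: configure and realize the embedded child. */
static void toy_soc_realize(ToySoc *s, uint32_t ncpus)
{
    assert(ncpus <= TOY_GIC_NCPUS);
    s->ext_gic.num_cpu = ncpus;
    s->ext_gic.realized = 1;
}
```

The design point is that later realize code can refer to the child through the parent state, rather than through a local variable that only exists where the child was created.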
- target/arm/translate-vfp.inc.c | 337 +++++++++++++++++++++++++++++++++
+ include/hw/arm/exynos4210.h      |  2 ++
- target/arm/translate.c         | 337 ---------------------------------
+ include/hw/intc/exynos4210_gic.h | 43 ++++++++++++++++++++++++++++++++
-files changed, 337 insertions(+), 337 deletions(-)
+ hw/arm/exynos4210.c              | 10 ++++----
  hw/intc/exynos4210_gic.c         | 17 ++-----------
  MAINTAINERS                      |  2 +-
 files changed, 53 insertions(+), 21 deletions(-)
  create mode 100644 include/hw/intc/exynos4210_gic.h
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check(DisasContext *s)
+@@ -XXX,XX +XXX,XX @@
- {
+ #include "hw/or-irq.h"
-     return full_vfp_access_check(s, false);
+ #include "hw/sysbus.h"
- }
+ #include "hw/cpu/a9mpcore.h"
 +#include "hw/intc/exynos4210_gic.h"
  #include "target/arm/cpu-qom.h"
  #include "qom/object.h"
@@ -XXX,XX +XXX,XX @@ struct Exynos4210State {
      qemu_or_irq pl330_irq_orgate[EXYNOS4210_NUM_DMA];
      qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
      A9MPPrivState a9mpcore;
 +    Exynos4210GicState ext_gic;
  };
  #define TYPE_EXYNOS4210_SOC "exynos4210"
 diff --git a/include/hw/intc/exynos4210_gic.h b/include/hw/intc/exynos4210_gic.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/intc/exynos4210_gic.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Samsung exynos4210 GIC implementation. Based on hw/arm_gic.c
 + *
 + * Copyright (c) 2000 - 2011 Samsung Electronics Co., Ltd.
 + * All rights reserved.
 + *
 + * Evgeny Voevodin <e.voevodin@samsung.com>
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License as published by the
 + * Free Software Foundation; either version 2 of the License, or (at your
 + * option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 + * See the GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License along
 + * with this program; if not, see <http://www.gnu.org/licenses/>.
 + */
 +#ifndef HW_INTC_EXYNOS4210_GIC_H
 +#define HW_INTC_EXYNOS4210_GIC_H
 +
-+static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
++#include "hw/sysbus.h"
 +{
 +    uint32_t rd, rn, rm;
 +    bool dp = a->dp;
 +
-+    if (!dc_isar_feature(aa32_vsel, s)) {
++#define TYPE_EXYNOS4210_GIC "exynos4210.gic"
-+        return false;
++OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210GicState, EXYNOS4210_GIC)
 +    }
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist */
++#define EXYNOS4210_GIC_NCPUS 2
 +    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
 +        ((a->vm | a->vn | a->vd) & 0x10)) {
 +        return false;
 +    }
 +    rd = a->vd;
 +    rn = a->vn;
 +    rm = a->vm;
 +
-+    if (!vfp_access_check(s)) {
++struct Exynos4210GicState {
-+        return true;
++    SysBusDevice parent_obj;
 +    }
 +
-+    if (dp) {
++    MemoryRegion cpu_container;
-+        TCGv_i64 frn, frm, dest;
++    MemoryRegion dist_container;
-+        TCGv_i64 tmp, zero, zf, nf, vf;
++    MemoryRegion cpu_alias[EXYNOS4210_GIC_NCPUS];
-+
++    MemoryRegion dist_alias[EXYNOS4210_GIC_NCPUS];
-+        zero = tcg_const_i64(0);
++    uint32_t num_cpu;
-+
++    DeviceState *gic;
 +        frn = tcg_temp_new_i64();
 +        frm = tcg_temp_new_i64();
 +        dest = tcg_temp_new_i64();
 +
 +        zf = tcg_temp_new_i64();
 +        nf = tcg_temp_new_i64();
 +        vf = tcg_temp_new_i64();
 +
 +        tcg_gen_extu_i32_i64(zf, cpu_ZF);
 +        tcg_gen_ext_i32_i64(nf, cpu_NF);
 +        tcg_gen_ext_i32_i64(vf, cpu_VF);
 +
 +        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
 +        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
 +        switch (a->cc) {
 +        case 0: /* eq: Z */
 +            tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
 +                                frn, frm);
 +            break;
 +        case 1: /* vs: V */
 +            tcg_gen_movcond_i64(TCG_COND_LT, dest, vf, zero,
 +                                frn, frm);
 +            break;
 +        case 2: /* ge: N == V -> N ^ V == 0 */
 +            tmp = tcg_temp_new_i64();
 +            tcg_gen_xor_i64(tmp, vf, nf);
 +            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
 +                                frn, frm);
 +            tcg_temp_free_i64(tmp);
 +            break;
 +        case 3: /* gt: !Z && N == V */
 +            tcg_gen_movcond_i64(TCG_COND_NE, dest, zf, zero,
 +                                frn, frm);
 +            tmp = tcg_temp_new_i64();
 +            tcg_gen_xor_i64(tmp, vf, nf);
 +            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
 +                                dest, frm);
 +            tcg_temp_free_i64(tmp);
 +            break;
 +        }
 +        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
 +        tcg_temp_free_i64(frn);
 +        tcg_temp_free_i64(frm);
 +        tcg_temp_free_i64(dest);
 +
 +        tcg_temp_free_i64(zf);
 +        tcg_temp_free_i64(nf);
 +        tcg_temp_free_i64(vf);
 +
 +        tcg_temp_free_i64(zero);
 +    } else {
 +        TCGv_i32 frn, frm, dest;
 +        TCGv_i32 tmp, zero;
 +
 +        zero = tcg_const_i32(0);
 +
 +        frn = tcg_temp_new_i32();
 +        frm = tcg_temp_new_i32();
 +        dest = tcg_temp_new_i32();
 +        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
 +        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
 +        switch (a->cc) {
 +        case 0: /* eq: Z */
 +            tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
 +                                frn, frm);
 +            break;
 +        case 1: /* vs: V */
 +            tcg_gen_movcond_i32(TCG_COND_LT, dest, cpu_VF, zero,
 +                                frn, frm);
 +            break;
 +        case 2: /* ge: N == V -> N ^ V == 0 */
 +            tmp = tcg_temp_new_i32();
 +            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
 +            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
 +                                frn, frm);
 +            tcg_temp_free_i32(tmp);
 +            break;
 +        case 3: /* gt: !Z && N == V */
 +            tcg_gen_movcond_i32(TCG_COND_NE, dest, cpu_ZF, zero,
 +                                frn, frm);
 +            tmp = tcg_temp_new_i32();
 +            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
 +            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
 +                                dest, frm);
 +            tcg_temp_free_i32(tmp);
 +            break;
 +        }
 +        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
 +        tcg_temp_free_i32(frn);
 +        tcg_temp_free_i32(frm);
 +        tcg_temp_free_i32(dest);
 +
 +        tcg_temp_free_i32(zero);
 +    }
 +
 +    return true;
 +}
 +
 +static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
 +{
 +    uint32_t rd, rn, rm;
 +    bool dp = a->dp;
 +    bool vmin = a->op;
 +    TCGv_ptr fpst;
 +
 +    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
 +        ((a->vm | a->vn | a->vd) & 0x10)) {
 +        return false;
 +    }
 +    rd = a->vd;
 +    rn = a->vn;
 +    rm = a->vm;
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = get_fpstatus_ptr(0);
 +
 +    if (dp) {
 +        TCGv_i64 frn, frm, dest;
 +
 +        frn = tcg_temp_new_i64();
 +        frm = tcg_temp_new_i64();
 +        dest = tcg_temp_new_i64();
 +
 +        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
 +        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
 +        if (vmin) {
 +            gen_helper_vfp_minnumd(dest, frn, frm, fpst);
 +        } else {
 +            gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
 +        }
 +        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
 +        tcg_temp_free_i64(frn);
 +        tcg_temp_free_i64(frm);
 +        tcg_temp_free_i64(dest);
 +    } else {
 +        TCGv_i32 frn, frm, dest;
 +
 +        frn = tcg_temp_new_i32();
 +        frm = tcg_temp_new_i32();
 +        dest = tcg_temp_new_i32();
 +
 +        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
 +        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
 +        if (vmin) {
 +            gen_helper_vfp_minnums(dest, frn, frm, fpst);
 +        } else {
 +            gen_helper_vfp_maxnums(dest, frn, frm, fpst);
 +        }
 +        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
 +        tcg_temp_free_i32(frn);
 +        tcg_temp_free_i32(frm);
 +        tcg_temp_free_i32(dest);
 +    }
 +
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
 +/*
 + * Table for converting the most common AArch32 encoding of
 + * rounding mode to arm_fprounding order (which matches the
 + * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
 + */
 +static const uint8_t fp_decode_rm[] = {
 +    FPROUNDING_TIEAWAY,
 +    FPROUNDING_TIEEVEN,
 +    FPROUNDING_POSINF,
 +    FPROUNDING_NEGINF,
 +};
 +
-+static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
++#endif
-+{
+diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 +    uint32_t rd, rm;
 +    bool dp = a->dp;
 +    TCGv_ptr fpst;
 +    TCGv_i32 tcg_rmode;
 +    int rounding = fp_decode_rm[a->rm];
 +
 +    if (!dc_isar_feature(aa32_vrint, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
 +        ((a->vm | a->vd) & 0x10)) {
 +        return false;
 +    }
 +    rd = a->vd;
 +    rm = a->vm;
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = get_fpstatus_ptr(0);
 +
 +    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +
 +    if (dp) {
 +        TCGv_i64 tcg_op;
 +        TCGv_i64 tcg_res;
 +        tcg_op = tcg_temp_new_i64();
 +        tcg_res = tcg_temp_new_i64();
 +        tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
 +        gen_helper_rintd(tcg_res, tcg_op, fpst);
 +        tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
 +        tcg_temp_free_i64(tcg_op);
 +        tcg_temp_free_i64(tcg_res);
 +    } else {
 +        TCGv_i32 tcg_op;
 +        TCGv_i32 tcg_res;
 +        tcg_op = tcg_temp_new_i32();
 +        tcg_res = tcg_temp_new_i32();
 +        tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
 +        gen_helper_rints(tcg_res, tcg_op, fpst);
 +        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
 +        tcg_temp_free_i32(tcg_op);
 +        tcg_temp_free_i32(tcg_res);
 +    }
 +
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +    tcg_temp_free_i32(tcg_rmode);
 +
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
 +static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
 +{
 +    uint32_t rd, rm;
 +    bool dp = a->dp;
 +    TCGv_ptr fpst;
 +    TCGv_i32 tcg_rmode, tcg_shift;
 +    int rounding = fp_decode_rm[a->rm];
 +    bool is_signed = a->op;
 +
 +    if (!dc_isar_feature(aa32_vcvt_dr, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 +        return false;
 +    }
 +    rd = a->vd;
 +    rm = a->vm;
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = get_fpstatus_ptr(0);
 +
 +    tcg_shift = tcg_const_i32(0);
 +
 +    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +
 +    if (dp) {
 +        TCGv_i64 tcg_double, tcg_res;
 +        TCGv_i32 tcg_tmp;
 +        tcg_double = tcg_temp_new_i64();
 +        tcg_res = tcg_temp_new_i64();
 +        tcg_tmp = tcg_temp_new_i32();
 +        tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
 +        if (is_signed) {
 +            gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
 +        } else {
 +            gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
 +        }
 +        tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
 +        tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
 +        tcg_temp_free_i32(tcg_tmp);
 +        tcg_temp_free_i64(tcg_res);
 +        tcg_temp_free_i64(tcg_double);
 +    } else {
 +        TCGv_i32 tcg_single, tcg_res;
 +        tcg_single = tcg_temp_new_i32();
 +        tcg_res = tcg_temp_new_i32();
 +        tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
 +        if (is_signed) {
 +            gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
 +        } else {
 +            gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
 +        }
 +        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
 +        tcg_temp_free_i32(tcg_res);
 +        tcg_temp_free_i32(tcg_single);
 +    }
 +
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +    tcg_temp_free_i32(tcg_rmode);
 +
 +    tcg_temp_free_i32(tcg_shift);
 +
 +    tcg_temp_free_ptr(fpst);
 +
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/arm/exynos4210.c
-+++ b/target/arm/translate.c
++++ b/hw/arm/exynos4210.c
-@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
-     tcg_temp_free_i32(tmp);
+     sysbus_create_simple("l2x0", EXYNOS4210_L2X0_BASE_ADDR, NULL);
      /* External GIC */
 -    dev = qdev_new("exynos4210.gic");
 -    qdev_prop_set_uint32(dev, "num-cpu", EXYNOS4210_NCPUS);
 -    busdev = SYS_BUS_DEVICE(dev);
 -    sysbus_realize_and_unref(busdev, &error_fatal);
 +    qdev_prop_set_uint32(DEVICE(&s->ext_gic), "num-cpu", EXYNOS4210_NCPUS);
 +    busdev = SYS_BUS_DEVICE(&s->ext_gic);
 +    sysbus_realize(busdev, &error_fatal);
      /* Map CPU interface */
      sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_GIC_CPU_BASE_ADDR);
      /* Map Distributer interface */
@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
                             qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 1));
      }
      for (n = 0; n < EXYNOS4210_EXT_GIC_NIRQ; n++) {
 -        s->irqs.ext_gic_irq[n] = qdev_get_gpio_in(dev, n);
 +        s->irqs.ext_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->ext_gic), n);
      }
      /* Internal Interrupt Combiner */
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init(Object *obj)
      }
      object_initialize_child(obj, "a9mpcore", &s->a9mpcore, TYPE_A9MPCORE_PRIV);
 +    object_initialize_child(obj, "ext-gic", &s->ext_gic, TYPE_EXYNOS4210_GIC);
  }
--static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+ static void exynos4210_class_init(ObjectClass *klass, void *data)
--{
+diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
--    uint32_t rd, rn, rm;
+index XXXXXXX..XXXXXXX 100644
--    bool dp = a->dp;
+--- a/hw/intc/exynos4210_gic.c
 +++ b/hw/intc/exynos4210_gic.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/module.h"
  #include "hw/irq.h"
  #include "hw/qdev-properties.h"
 +#include "hw/intc/exynos4210_gic.h"
  #include "hw/arm/exynos4210.h"
  #include "qom/object.h"
@@ -XXX,XX +XXX,XX @@
  #define EXYNOS4210_GIC_CPU_REGION_SIZE  0x100
  #define EXYNOS4210_GIC_DIST_REGION_SIZE 0x1000
 -#define TYPE_EXYNOS4210_GIC "exynos4210.gic"
 -OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210GicState, EXYNOS4210_GIC)
 -
--    if (!dc_isar_feature(aa32_vsel, s)) {
+-struct Exynos4210GicState {
--        return false;
+-    SysBusDevice parent_obj;
 -    }
 -
--    /* UNDEF accesses to D16-D31 if they don't exist */
+-    MemoryRegion cpu_container;
--    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+-    MemoryRegion dist_container;
--        ((a->vm | a->vn | a->vd) & 0x10)) {
+-    MemoryRegion cpu_alias[EXYNOS4210_NCPUS];
--        return false;
+-    MemoryRegion dist_alias[EXYNOS4210_NCPUS];
--    }
+-    uint32_t num_cpu;
--    rd = a->vd;
+-    DeviceState *gic;
 -    rn = a->vn;
 -    rm = a->vm;
 -
 -    if (!vfp_access_check(s)) {
 -        return true;
 -    }
 -
 -    if (dp) {
 -        TCGv_i64 frn, frm, dest;
 -        TCGv_i64 tmp, zero, zf, nf, vf;
 -
 -        zero = tcg_const_i64(0);
 -
 -        frn = tcg_temp_new_i64();
 -        frm = tcg_temp_new_i64();
 -        dest = tcg_temp_new_i64();
 -
 -        zf = tcg_temp_new_i64();
 -        nf = tcg_temp_new_i64();
 -        vf = tcg_temp_new_i64();
 -
 -        tcg_gen_extu_i32_i64(zf, cpu_ZF);
 -        tcg_gen_ext_i32_i64(nf, cpu_NF);
 -        tcg_gen_ext_i32_i64(vf, cpu_VF);
 -
 -        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
 -        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
 -        switch (a->cc) {
 -        case 0: /* eq: Z */
 -            tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
 -                                frn, frm);
 -            break;
 -        case 1: /* vs: V */
 -            tcg_gen_movcond_i64(TCG_COND_LT, dest, vf, zero,
 -                                frn, frm);
 -            break;
 -        case 2: /* ge: N == V -> N ^ V == 0 */
 -            tmp = tcg_temp_new_i64();
 -            tcg_gen_xor_i64(tmp, vf, nf);
 -            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
 -                                frn, frm);
 -            tcg_temp_free_i64(tmp);
 -            break;
 -        case 3: /* gt: !Z && N == V */
 -            tcg_gen_movcond_i64(TCG_COND_NE, dest, zf, zero,
 -                                frn, frm);
 -            tmp = tcg_temp_new_i64();
 -            tcg_gen_xor_i64(tmp, vf, nf);
 -            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
 -                                dest, frm);
 -            tcg_temp_free_i64(tmp);
 -            break;
 -        }
 -        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
 -        tcg_temp_free_i64(frn);
 -        tcg_temp_free_i64(frm);
 -        tcg_temp_free_i64(dest);
 -
 -        tcg_temp_free_i64(zf);
 -        tcg_temp_free_i64(nf);
 -        tcg_temp_free_i64(vf);
 -
 -        tcg_temp_free_i64(zero);
 -    } else {
 -        TCGv_i32 frn, frm, dest;
 -        TCGv_i32 tmp, zero;
 -
 -        zero = tcg_const_i32(0);
 -
 -        frn = tcg_temp_new_i32();
 -        frm = tcg_temp_new_i32();
 -        dest = tcg_temp_new_i32();
 -        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
 -        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
 -        switch (a->cc) {
 -        case 0: /* eq: Z */
 -            tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
 -                                frn, frm);
 -            break;
 -        case 1: /* vs: V */
 -            tcg_gen_movcond_i32(TCG_COND_LT, dest, cpu_VF, zero,
 -                                frn, frm);
 -            break;
 -        case 2: /* ge: N == V -> N ^ V == 0 */
 -            tmp = tcg_temp_new_i32();
 -            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
 -            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
 -                                frn, frm);
 -            tcg_temp_free_i32(tmp);
 -            break;
 -        case 3: /* gt: !Z && N == V */
 -            tcg_gen_movcond_i32(TCG_COND_NE, dest, cpu_ZF, zero,
 -                                frn, frm);
 -            tmp = tcg_temp_new_i32();
 -            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
 -            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
 -                                dest, frm);
 -            tcg_temp_free_i32(tmp);
 -            break;
 -        }
 -        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
 -        tcg_temp_free_i32(frn);
 -        tcg_temp_free_i32(frm);
 -        tcg_temp_free_i32(dest);
 -
 -        tcg_temp_free_i32(zero);
 -    }
 -
 -    return true;
 -}
 -
 -static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
 -{
 -    uint32_t rd, rn, rm;
 -    bool dp = a->dp;
 -    bool vmin = a->op;
 -    TCGv_ptr fpst;
 -
 -    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
 -        return false;
 -    }
 -
 -    /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
 -        ((a->vm | a->vn | a->vd) & 0x10)) {
 -        return false;
 -    }
 -    rd = a->vd;
 -    rn = a->vn;
 -    rm = a->vm;
 -
 -    if (!vfp_access_check(s)) {
 -        return true;
 -    }
 -
 -    fpst = get_fpstatus_ptr(0);
 -
 -    if (dp) {
 -        TCGv_i64 frn, frm, dest;
 -
 -        frn = tcg_temp_new_i64();
 -        frm = tcg_temp_new_i64();
 -        dest = tcg_temp_new_i64();
 -
 -        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
 -        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
 -        if (vmin) {
 -            gen_helper_vfp_minnumd(dest, frn, frm, fpst);
 -        } else {
 -            gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
 -        }
 -        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
 -        tcg_temp_free_i64(frn);
 -        tcg_temp_free_i64(frm);
 -        tcg_temp_free_i64(dest);
 -    } else {
 -        TCGv_i32 frn, frm, dest;
 -
 -        frn = tcg_temp_new_i32();
 -        frm = tcg_temp_new_i32();
 -        dest = tcg_temp_new_i32();
 -
 -        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
 -        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
 -        if (vmin) {
 -            gen_helper_vfp_minnums(dest, frn, frm, fpst);
 -        } else {
 -            gen_helper_vfp_maxnums(dest, frn, frm, fpst);
 -        }
 -        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
 -        tcg_temp_free_i32(frn);
 -        tcg_temp_free_i32(frm);
 -        tcg_temp_free_i32(dest);
 -    }
 -
 -    tcg_temp_free_ptr(fpst);
 -    return true;
 -}
 -
 -/*
 - * Table for converting the most common AArch32 encoding of
 - * rounding mode to arm_fprounding order (which matches the
 - * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
 - */
 -static const uint8_t fp_decode_rm[] = {
 -    FPROUNDING_TIEAWAY,
 -    FPROUNDING_TIEEVEN,
 -    FPROUNDING_POSINF,
 -    FPROUNDING_NEGINF,
 -};
 -
--static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
+ static void exynos4210_gic_set_irq(void *opaque, int irq, int level)
--{
+ {
--    uint32_t rd, rm;
+     Exynos4210GicState *s = (Exynos4210GicState *)opaque;
--    bool dp = a->dp;
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_gic_realize(DeviceState *dev, Error **errp)
--    TCGv_ptr fpst;
+      * enough room for the cpu numbers.  gcc 9.2.1 on 32-bit x86
--    TCGv_i32 tcg_rmode;
+      * doesn't figure this out, otherwise and gives spurious warnings.
--    int rounding = fp_decode_rm[a->rm];
+      */
--
+-    assert(n <= EXYNOS4210_NCPUS);
--    if (!dc_isar_feature(aa32_vrint, s)) {
++    assert(n <= EXYNOS4210_GIC_NCPUS);
--        return false;
+     for (i = 0; i < n; i++) {
--    }
+         /* Map CPU interface per SMP Core */
--
+         sprintf(cpu_alias_name, "%s%x", cpu_prefix, i);
--    /* UNDEF accesses to D16-D31 if they don't exist */
+diff --git a/MAINTAINERS b/MAINTAINERS
--    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+index XXXXXXX..XXXXXXX 100644
--        ((a->vm | a->vd) & 0x10)) {
+--- a/MAINTAINERS
--        return false;
++++ b/MAINTAINERS
--    }
+@@ -XXX,XX +XXX,XX @@ M: Peter Maydell <peter.maydell@linaro.org>
--    rd = a->vd;
+ L: qemu-arm@nongnu.org
--    rm = a->vm;
+ S: Odd Fixes
--
+ F: hw/*/exynos*
--    if (!vfp_access_check(s)) {
+-F: include/hw/arm/exynos4210.h
--        return true;
++F: include/hw/*/exynos*
--    }
--
+ Calxeda Highbank
--    fpst = get_fpstatus_ptr(0);
+ M: Rob Herring <robh@kernel.org>
 -
 -    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
 -    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -
 -    if (dp) {
 -        TCGv_i64 tcg_op;
 -        TCGv_i64 tcg_res;
 -        tcg_op = tcg_temp_new_i64();
 -        tcg_res = tcg_temp_new_i64();
 -        tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
 -        gen_helper_rintd(tcg_res, tcg_op, fpst);
 -        tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
 -        tcg_temp_free_i64(tcg_op);
 -        tcg_temp_free_i64(tcg_res);
 -    } else {
 -        TCGv_i32 tcg_op;
 -        TCGv_i32 tcg_res;
 -        tcg_op = tcg_temp_new_i32();
 -        tcg_res = tcg_temp_new_i32();
 -        tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
 -        gen_helper_rints(tcg_res, tcg_op, fpst);
 -        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
 -        tcg_temp_free_i32(tcg_op);
 -        tcg_temp_free_i32(tcg_res);
 -    }
 -
 -    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    tcg_temp_free_i32(tcg_rmode);
 -
 -    tcg_temp_free_ptr(fpst);
 -    return true;
 -}
 -
 -static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
 -{
 -    uint32_t rd, rm;
 -    bool dp = a->dp;
 -    TCGv_ptr fpst;
 -    TCGv_i32 tcg_rmode, tcg_shift;
 -    int rounding = fp_decode_rm[a->rm];
 -    bool is_signed = a->op;
 -
 -    if (!dc_isar_feature(aa32_vcvt_dr, s)) {
 -        return false;
 -    }
 -
 -    /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 -        return false;
 -    }
 -    rd = a->vd;
 -    rm = a->vm;
 -
 -    if (!vfp_access_check(s)) {
 -        return true;
 -    }
 -
 -    fpst = get_fpstatus_ptr(0);
 -
 -    tcg_shift = tcg_const_i32(0);
 -
 -    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
 -    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -
 -    if (dp) {
 -        TCGv_i64 tcg_double, tcg_res;
 -        TCGv_i32 tcg_tmp;
 -        tcg_double = tcg_temp_new_i64();
 -        tcg_res = tcg_temp_new_i64();
 -        tcg_tmp = tcg_temp_new_i32();
 -        tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
 -        if (is_signed) {
 -            gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
 -        } else {
 -            gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
 -        }
 -        tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
 -        tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
 -        tcg_temp_free_i32(tcg_tmp);
 -        tcg_temp_free_i64(tcg_res);
 -        tcg_temp_free_i64(tcg_double);
 -    } else {
 -        TCGv_i32 tcg_single, tcg_res;
 -        tcg_single = tcg_temp_new_i32();
 -        tcg_res = tcg_temp_new_i32();
 -        tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
 -        if (is_signed) {
 -            gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
 -        } else {
 -            gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
 -        }
 -        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
 -        tcg_temp_free_i32(tcg_res);
 -        tcg_temp_free_i32(tcg_single);
 -    }
 -
 -    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    tcg_temp_free_i32(tcg_rmode);
 -
 -    tcg_temp_free_i32(tcg_shift);
 -
 -    tcg_temp_free_ptr(fpst);
 -
 -    return true;
 -}
 -
  /*
   * Disassemble a VFP instruction.  Returns nonzero if an error occurred
   * (ie. an undefined instruction).
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 15/48] target/arm: Convert VCVTA/VCVTN/VCVTP/VCVTM to decodetree
+[PULL 16/31] hw/arm/exynos4210: Drop ext_gic_irq[] from Exynos4210Irq struct
-Convert the VCVTA/VCVTN/VCVTP/VCVTM instructions to decodetree.
+The only time we use the ext_gic_irq[] array in the Exynos4210Irq
-trans_VCVT() is temporarily left in translate.c.
+struct is during realize of the SoC -- we initialize it with the
 input IRQs of the external GIC device, and then connect those to
 outputs of other devices further on in realize (including in the
 exynos4210_init_board_irqs() function).  Now that the ext_gic object
 is easily accessible as s->ext_gic we can make the connections
 directly from one device to the other without going via this array.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-10-peter.maydell@linaro.org
 ---
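Note (illustration only, not part of the patch): a minimal sketch of why the intermediate array can be dropped, assuming a toy device model. ToyGic, ToyLine, TOY_GIC_NIRQ and the toy_* functions are hypothetical; toy_get_gpio_in() only plays the role that qdev_get_gpio_in() plays in the patch.

```c
#include <assert.h>

#define TOY_GIC_NIRQ 8   /* hypothetical; not the real EXYNOS4210_EXT_GIC_NIRQ */

typedef struct ToyLine { int level; } ToyLine;
typedef struct ToyGic { ToyLine in[TOY_GIC_NIRQ]; } ToyGic;

/* Return input line n of the device. */
static ToyLine *toy_get_gpio_in(ToyGic *gic, int n)
{
    assert(n >= 0 && n < TOY_GIC_NIRQ);
    return &gic->in[n];
}

/* Old pattern: copy every input line into a side array up front. */
static void toy_cache_lines(ToyGic *gic, ToyLine *cache[TOY_GIC_NIRQ])
{
    for (int n = 0; n < TOY_GIC_NIRQ; n++) {
        cache[n] = toy_get_gpio_in(gic, n);
    }
}

/* New pattern: resolve the line at the point of use. The "- 32"
 * offset follows the irq_id - 32 indexing shown in the patch. */
static ToyLine *toy_line_for_irq(ToyGic *gic, int irq_id)
{
    return toy_get_gpio_in(gic, irq_id - 32);
}
```

Both patterns resolve to the same line object; the second only needs access to the device itself, which is what having s->ext_gic in the state struct provides.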
- target/arm/translate.c       | 72 +++++++++++++++++-------------------
+ include/hw/arm/exynos4210.h |  1 -
- target/arm/vfp-uncond.decode |  6 +++
+ hw/arm/exynos4210.c         | 12 ++++++------
-files changed, 39 insertions(+), 39 deletions(-)
+files changed, 6 insertions(+), 7 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
+@@ -XXX,XX +XXX,XX @@
-     return true;
+ typedef struct Exynos4210Irq {
- }
+     qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
+     qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
--static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
+-    qemu_irq ext_gic_irq[EXYNOS4210_EXT_GIC_NIRQ];
--                       int rounding)
+ } Exynos4210Irq;
-+static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
  struct Exynos4210State {
 diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/exynos4210.c
 +++ b/hw/arm/exynos4210.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
  {
--    bool is_signed = extract32(insn, 7, 1);
+     uint32_t grp, bit, irq_id, n;
--    TCGv_ptr fpst = get_fpstatus_ptr(0);
+     Exynos4210Irq *is = &s->irqs;
-+    uint32_t rd, rm;
++    DeviceState *extgicdev = DEVICE(&s->ext_gic);
-+    bool dp = a->dp;
-+    TCGv_ptr fpst;
+     for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
-     TCGv_i32 tcg_rmode, tcg_shift;
+         irq_id = 0;
-+    int rounding = fp_decode_rm[a->rm];
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
-+    bool is_signed = a->op;
+         }
-+
+         if (irq_id) {
-+    if (!dc_isar_feature(aa32_vcvt_dr, s)) {
+             s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-+        return false;
+-                    is->ext_gic_irq[irq_id - 32]);
-+    }
++                                             qdev_get_gpio_in(extgicdev,
-+
++                                                              irq_id - 32));
-+    /* UNDEF accesses to D16-D31 if they don't exist */
+         } else {
-+    if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+             s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-+        return false;
+                     is->ext_combiner_irq[n]);
-+    }
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
-+    rd = a->vd;
-+    rm = a->vm;
+         if (irq_id) {
-+
+             s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-+    if (!vfp_access_check(s)) {
+-                    is->ext_gic_irq[irq_id - 32]);
-+        return true;
++                                             qdev_get_gpio_in(extgicdev,
-+    }
++                                                              irq_id - 32));
 +
 +    fpst = get_fpstatus_ptr(0);
      tcg_shift = tcg_const_i32(0);
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
      if (dp) {
          TCGv_i64 tcg_double, tcg_res;
          TCGv_i32 tcg_tmp;
 -        /* Rd is encoded as a single precision register even when the source
 -         * is double precision.
 -         */
 -        rd = ((rd << 1) & 0x1e) | ((rd >> 4) & 0x1);
          tcg_double = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
          tcg_tmp = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
      tcg_temp_free_ptr(fpst);
 -    return 0;
 -}
 -
 -static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
 -{
 -    uint32_t rd, rm, dp = extract32(insn, 8, 1);
 -
 -    if (dp) {
 -        VFP_DREG_D(rd, insn);
 -        VFP_DREG_M(rm, insn);
 -    } else {
 -        rd = VFP_SREG_D(insn);
 -        rm = VFP_SREG_M(insn);
 -    }
 -
 -    if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
 -        dc_isar_feature(aa32_vcvt_dr, s)) {
 -        /* VCVTA, VCVTN, VCVTP, VCVTM */
 -        int rounding = fp_decode_rm[extract32(insn, 16, 2)];
 -        return handle_vcvt(insn, rd, rm, dp, rounding);
 -    }
 -    return 1;
 +    return true;
  }
  /*
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
          }
      }
+ }
-+    if (extract32(insn, 28, 4) == 0xf) {
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
-+        /*
+         sysbus_connect_irq(busdev, n,
-+         * Encodings with T=1 (Thumb) or unconditional (ARM): these
+                            qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 1));
 +         * were all handled by the decodetree decoder, so any insn
 +         * patterns which get here must be UNDEF.
 +         */
 +        return 1;
 +    }
 +
      /*
       * FIXME: this access check should not take precedence over UNDEF
       * for invalid encodings; we will generate incorrect syndrome information
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
          return 0;
      }
+-    for (n = 0; n < EXYNOS4210_EXT_GIC_NIRQ; n++) {
--    if (extract32(insn, 28, 4) == 0xf) {
+-        s->irqs.ext_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->ext_gic), n);
 -        /*
 -         * Encodings with T=1 (Thumb) or unconditional (ARM):
 -         * only used for the "miscellaneous VFP features" added in v8A
 -         * and v7M (and gated on the MVFR2.FPMisc field).
 -         */
 -        return disas_vfp_misc_insn(s, insn);
 -    }
--
-     dp = ((insn & 0xf00) == 0xb00);
+     /* Internal Interrupt Combiner */
-     switch ((insn >> 24) & 0xf) {
+     dev = qdev_new("exynos4210.combiner");
-     case 0xe:
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
-diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
+     busdev = SYS_BUS_DEVICE(dev);
-index XXXXXXX..XXXXXXX 100644
+     sysbus_realize_and_unref(busdev, &error_fatal);
---- a/target/arm/vfp-uncond.decode
+     for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
-+++ b/target/arm/vfp-uncond.decode
+-        sysbus_connect_irq(busdev, n, s->irqs.ext_gic_irq[n]);
-@@ -XXX,XX +XXX,XX @@ VRINT       1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
++        sysbus_connect_irq(busdev, n, qdev_get_gpio_in(DEVICE(&s->ext_gic), n));
-             vm=%vm_sp vd=%vd_sp dp=0
+     }
- VRINT       1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
+     exynos4210_combiner_get_gpioin(&s->irqs, dev, 1);
-             vm=%vm_dp vd=%vd_dp dp=1
+     sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_COMBINER_BASE_ADDR);
 +
 +# VCVT float to int with specified rounding mode; Vd is always single-precision
 +VCVT        1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
 +            vm=%vm_sp vd=%vd_sp dp=0
 +VCVT        1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
 +            vm=%vm_dp vd=%vd_sp dp=1
 --
-2.20.1
+2.25.1

-[Qemu-devel] [PULL 27/48] target/arm: Convert VFP VNMLA to decodetree
+[PULL 17/31] hw/arm/exynos4210: Move exynos4210_combiner_get_gpioin() into exynos4210.c
-Convert the VFP VNMLA instruction to decodetree.
+The function exynos4210_combiner_get_gpioin() currently lives in
 exynos4210_combiner.c, but it isn't really part of the combiner
 device itself -- it is a function that implements the wiring up of
 some interrupt sources to multiple combiner inputs.  Move it to live
 with the other SoC-level code in exynos4210.c, along with a few
 macros previously defined in exynos4210.h which are now used only
 in exynos4210.c.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-11-peter.maydell@linaro.org
 ---
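For reference, the three combiner macros moved by this patch convert between
a flat combiner input number and a (group, bit) pair, with eight inputs per
group. The following is a small standalone sketch that assumes nothing beyond
the macro definitions quoted in the diff; the function combiner_macro_check()
is hypothetical and exists only for this illustration.

```c
#include <assert.h>

/* Macro definitions copied from the diff (as added to hw/arm/exynos4210.c). */
#define EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit)  ((grp) * 8 + (bit))
#define EXYNOS4210_COMBINER_GET_GRP_NUM(irq)       ((irq) / 8)
#define EXYNOS4210_COMBINER_GET_BIT_NUM(irq) \
    ((irq) - 8 * EXYNOS4210_COMBINER_GET_GRP_NUM(irq))

/* Hypothetical helper: checks the (group, bit) <-> input number mapping. */
static void combiner_macro_check(void)
{
    /* Group 3, bit 4 is flat input number 3 * 8 + 4 = 28. */
    int irq = EXYNOS4210_COMBINER_GET_IRQ_NUM(3, 4);
    assert(irq == 28);

    /* The mapping round-trips for bit values 0..7. */
    assert(EXYNOS4210_COMBINER_GET_GRP_NUM(irq) == 3);
    assert(EXYNOS4210_COMBINER_GET_BIT_NUM(irq) == 4);

    /*
     * A bit argument of 8 is not range-checked: (12, 8) yields 104,
     * which decodes as group 13, bit 0.
     */
    int spill = EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 8);
    assert(spill == 104);
    assert(EXYNOS4210_COMBINER_GET_GRP_NUM(spill) == 13);
    assert(EXYNOS4210_COMBINER_GET_BIT_NUM(spill) == 0);
}
```

Calling combiner_macro_check() from a test main() should pass silently. The
last case is only an observation about the arithmetic, not a claim about how
the SoC wiring is meant to behave.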
- target/arm/translate-vfp.inc.c | 34 ++++++++++++++++++++++++++++++++++
+ include/hw/arm/exynos4210.h   | 11 -----
- target/arm/translate.c         | 19 +------------------
+ hw/arm/exynos4210.c           | 82 +++++++++++++++++++++++++++++++++++
- target/arm/vfp.decode          |  5 +++++
+ hw/intc/exynos4210_combiner.c | 77 --------------------------------
- 3 files changed, 40 insertions(+), 18 deletions(-)
+ 3 files changed, 82 insertions(+), 88 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
+@@ -XXX,XX +XXX,XX @@
- {
+ #define EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ   \
-     return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
+     (EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ * 8)
 -#define EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit)  ((grp)*8 + (bit))
 -#define EXYNOS4210_COMBINER_GET_GRP_NUM(irq)       ((irq) / 8)
 -#define EXYNOS4210_COMBINER_GET_BIT_NUM(irq) \
 -    ((irq) - 8 * EXYNOS4210_COMBINER_GET_GRP_NUM(irq))
 -
  /* IRQs number for external and internal GIC */
  #define EXYNOS4210_EXT_GIC_NIRQ     (160-32)
  #define EXYNOS4210_INT_GIC_NIRQ     64
@@ -XXX,XX +XXX,XX @@ void exynos4210_write_secondary(ARMCPU *cpu,
   *  bit - bit number inside group */
  uint32_t exynos4210_get_irq(uint32_t grp, uint32_t bit);
 -/*
 - * Get Combiner input GPIO into irqs structure
 - */
 -void exynos4210_combiner_get_gpioin(Exynos4210Irq *irqs, DeviceState *dev,
 -        int ext);
 -
  /*
   * exynos4210 UART
   */
 diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/exynos4210.c
 +++ b/hw/arm/exynos4210.c
@@ -XXX,XX +XXX,XX @@ combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
      { }, { }, { }, { }, { }, { }, { }, { }, { }, { }
  };
 +#define EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit)  ((grp) * 8 + (bit))
 +#define EXYNOS4210_COMBINER_GET_GRP_NUM(irq)       ((irq) / 8)
 +#define EXYNOS4210_COMBINER_GET_BIT_NUM(irq) \
 +    ((irq) - 8 * EXYNOS4210_COMBINER_GET_GRP_NUM(irq))
 +
  /*
   * Initialize board IRQs.
   * These IRQs contain splitted Int/External Combiner and External Gic IRQs.
@@ -XXX,XX +XXX,XX @@ uint32_t exynos4210_get_irq(uint32_t grp, uint32_t bit)
      return EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit);
  }
-+
-+static void gen_VNMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
++/*
 + * Get Combiner input GPIO into irqs structure
 + */
 +static void exynos4210_combiner_get_gpioin(Exynos4210Irq *irqs,
 +                                           DeviceState *dev, int ext)
 +{
-+    /* VNMLA: -fd + -(fn * fm) */
++    int n;
-+    TCGv_i32 tmp = tcg_temp_new_i32();
++    int bit;
-+
++    int max;
-+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
++    qemu_irq *irq;
-+    gen_helper_vfp_negs(tmp, tmp);
++
-+    gen_helper_vfp_negs(vd, vd);
++    max = ext ? EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ :
-+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
++        EXYNOS4210_MAX_INT_COMBINER_IN_IRQ;
-+    tcg_temp_free_i32(tmp);
++    irq = ext ? irqs->ext_combiner_irq : irqs->int_combiner_irq;
 +
 +    /*
 +     * Some IRQs of Int/External Combiner are going to two Combiners groups,
 +     * so let split them.
 +     */
 +    for (n = 0; n < max; n++) {
 +
 +        bit = EXYNOS4210_COMBINER_GET_BIT_NUM(n);
 +
 +        switch (n) {
 +        /* MDNIE_LCD1 INTG1 */
 +        case EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 0) ...
 +             EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 3):
 +            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 +                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(0, bit + 4)]);
 +            continue;
 +
 +        /* TMU INTG3 */
 +        case EXYNOS4210_COMBINER_GET_IRQ_NUM(3, 4):
 +            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 +                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(2, bit)]);
 +            continue;
 +
 +        /* LCD1 INTG12 */
 +        case EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 0) ...
 +             EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 3):
 +            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 +                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(11, bit + 4)]);
 +            continue;
 +
 +        /* Multi-Core Timer INTG12 */
 +        case EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 4) ...
 +             EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 8):
 +               irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 +                       irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 +            continue;
 +
 +        /* Multi-Core Timer INTG35 */
 +        case EXYNOS4210_COMBINER_GET_IRQ_NUM(35, 4) ...
 +             EXYNOS4210_COMBINER_GET_IRQ_NUM(35, 8):
 +            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 +                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 +            continue;
 +
 +        /* Multi-Core Timer INTG51 */
 +        case EXYNOS4210_COMBINER_GET_IRQ_NUM(51, 4) ...
 +             EXYNOS4210_COMBINER_GET_IRQ_NUM(51, 8):
 +            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 +                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 +            continue;
 +
 +        /* Multi-Core Timer INTG53 */
 +        case EXYNOS4210_COMBINER_GET_IRQ_NUM(53, 4) ...
 +             EXYNOS4210_COMBINER_GET_IRQ_NUM(53, 8):
 +            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 +                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 +            continue;
 +        }
 +
 +        irq[n] = qdev_get_gpio_in(dev, n);
 +    }
 +}
 +
-+static bool trans_VNMLA_sp(DisasContext *s, arg_VNMLA_sp *a)
+ static uint8_t chipid_and_omr[] = { 0x11, 0x02, 0x21, 0x43,
-+{
+                                     0x09, 0x00, 0x00, 0x00 };
-+    return do_vfp_3op_sp(s, gen_VNMLA_sp, a->vd, a->vn, a->vm, true);
-+}
+diff --git a/hw/intc/exynos4210_combiner.c b/hw/intc/exynos4210_combiner.c
 +
 +static void gen_VNMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
 +{
 +    /* VNMLA: -fd + -(fn * fm) */
 +    TCGv_i64 tmp = tcg_temp_new_i64();
 +
 +    gen_helper_vfp_muld(tmp, vn, vm, fpst);
 +    gen_helper_vfp_negd(tmp, tmp);
 +    gen_helper_vfp_negd(vd, vd);
 +    gen_helper_vfp_addd(vd, vd, tmp, fpst);
 +    tcg_temp_free_i64(tmp);
 +}
 +
 +static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
 +{
 +    return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/intc/exynos4210_combiner.c
-+++ b/target/arm/translate.c
++++ b/hw/intc/exynos4210_combiner.c
-@@ -XXX,XX +XXX,XX @@ VFP_OP2(div)
+@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_exynos4210_combiner = {
+     }
- #undef VFP_OP2
+ };
--static inline void gen_vfp_F1_neg(int dp)
+-/*
 - * Get Combiner input GPIO into irqs structure
 - */
 -void exynos4210_combiner_get_gpioin(Exynos4210Irq *irqs, DeviceState *dev,
 -        int ext)
 -{
--    /* Like gen_vfp_neg() but put result in F1 */
+-    int n;
--    if (dp) {
+-    int bit;
--        gen_helper_vfp_negd(cpu_F1d, cpu_F0d);
+-    int max;
--    } else {
+-    qemu_irq *irq;
--        gen_helper_vfp_negs(cpu_F1s, cpu_F0s);
+-
 -    max = ext ? EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ :
 -        EXYNOS4210_MAX_INT_COMBINER_IN_IRQ;
 -    irq = ext ? irqs->ext_combiner_irq : irqs->int_combiner_irq;
 -
 -    /*
 -     * Some IRQs of Int/External Combiner are going to two Combiners groups,
 -     * so let split them.
 -     */
 -    for (n = 0; n < max; n++) {
 -
 -        bit = EXYNOS4210_COMBINER_GET_BIT_NUM(n);
 -
 -        switch (n) {
 -        /* MDNIE_LCD1 INTG1 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 0) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 3):
 -            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(0, bit + 4)]);
 -            continue;
 -
 -        /* TMU INTG3 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(3, 4):
 -            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(2, bit)]);
 -            continue;
 -
 -        /* LCD1 INTG12 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 0) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 3):
 -            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(11, bit + 4)]);
 -            continue;
 -
 -        /* Multi-Core Timer INTG12 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 4) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 8):
 -               irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                       irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 -            continue;
 -
 -        /* Multi-Core Timer INTG35 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(35, 4) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(35, 8):
 -            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 -            continue;
 -
 -        /* Multi-Core Timer INTG51 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(51, 4) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(51, 8):
 -            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 -            continue;
 -
 -        /* Multi-Core Timer INTG53 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(53, 4) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(53, 8):
 -            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 -            continue;
 -        }
 -
 -        irq[n] = qdev_get_gpio_in(dev, n);
 -    }
 -}
 -
- static inline void gen_vfp_abs(int dp)
+ static uint64_t
  exynos4210_combiner_read(void *opaque, hwaddr offset, unsigned size)
  {
-     if (dp)
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             rn = VFP_SREG_N(insn);
-             switch (op) {
--            case 0 ... 2:
-+            case 0 ... 3:
-                 /* Already handled by decodetree */
-                 return 1;
-             default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             for (;;) {
-                 /* Perform the calculation.  */
-                 switch (op) {
--                case 3: /* VNMLA: -fd + -(fn * fm) */
--                    gen_vfp_mul(dp);
--                    gen_vfp_F1_neg(dp);
--                    gen_mov_F0_vreg(dp, rd);
--                    gen_vfp_neg(dp);
--                    gen_vfp_add(dp);
--                    break;
-                 case 4: /* mul: fn * fm */
-                     gen_vfp_mul(dp);
-                     break;
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 .... \
-              vm=%vm_sp vn=%vn_sp vd=%vd_sp
- VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 .... \
-              vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 .... \
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
-+VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 .... \
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
 --
-2.20.1
+2.25.1

-[Qemu-devel] [PULL 43/48] target/arm: Convert double-single precision conversion insns to decodetree
+[PULL 18/31] hw/arm/exynos4210: Delete unused macro definitions
-Convert the VCVT double/single precision conversion insns to decodetree.
+Delete a couple of #defines which are never used.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-12-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c | 48 ++++++++++++++++++++++++++++++++++
+ include/hw/arm/exynos4210.h | 4 ----
- target/arm/translate.c         | 13 +--------
+ 1 file changed, 4 deletions(-)
  target/arm/vfp.decode          |  6 +++++
  3 files changed, 55 insertions(+), 12 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
+@@ -XXX,XX +XXX,XX @@
-     tcg_temp_free_i64(tmp);
+ #define EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ   \
-     return true;
+     (EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ * 8)
- }
-+
+-/* IRQs number for external and internal GIC */
-+static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
+-#define EXYNOS4210_EXT_GIC_NIRQ     (160-32)
-+{
+-#define EXYNOS4210_INT_GIC_NIRQ     64
 +    TCGv_i64 vd;
 +    TCGv_i32 vm;
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    vm = tcg_temp_new_i32();
 +    vd = tcg_temp_new_i64();
 +    neon_load_reg32(vm, a->vm);
 +    gen_helper_vfp_fcvtds(vd, vm, cpu_env);
 +    neon_store_reg64(vd, a->vd);
 +    tcg_temp_free_i32(vm);
 +    tcg_temp_free_i64(vd);
 +    return true;
 +}
 +
 +static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
 +{
 +    TCGv_i64 vm;
 +    TCGv_i32 vd;
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    vd = tcg_temp_new_i32();
 +    vm = tcg_temp_new_i64();
 +    neon_load_reg64(vm, a->vm);
 +    gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
 +    neon_store_reg32(vd, a->vd);
 +    tcg_temp_free_i32(vd);
 +    tcg_temp_free_i64(vm);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  return 1;
              case 15:
                  switch (rn) {
 -                case 0 ... 14:
 +                case 0 ... 15:
                      /* Already handled by decodetree */
                      return 1;
                  default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              if (op == 15) {
                  /* rn is opcode, encoded as per VFP_SREG_N. */
                  switch (rn) {
 -                case 0x0f: /* vcvt double<->single */
 -                    rd_is_dp = !dp;
 -                    break;
 -
-                 case 0x10: /* vcvt.fxx.u32 */
+ #define EXYNOS4210_I2C_NUMBER               9
-                 case 0x11: /* vcvt.fxx.s32 */
-                     rm_is_dp = false;
+ #define EXYNOS4210_NUM_DMA      3
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  switch (op) {
                  case 15: /* extension space */
                      switch (rn) {
 -                    case 15: /* single<->double conversion */
 -                        if (dp) {
 -                            gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
 -                        } else {
 -                            gen_helper_vfp_fcvtds(cpu_F0d, cpu_F0s, cpu_env);
 -                        }
 -                        break;
                      case 16: /* fuito */
                          gen_vfp_uito(dp, 0);
                          break;
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VRINTX_sp    ---- 1110 1.11 0111 .... 1010 01.0 .... \
               vd=%vd_sp vm=%vm_sp
  VRINTX_dp    ---- 1110 1.11 0111 .... 1011 01.0 .... \
               vd=%vd_dp vm=%vm_dp
 +
 +# VCVT between single and double: Vm precision depends on size; Vd is its reverse
 +VCVT_sp      ---- 1110 1.11 0111 .... 1010 11.0 .... \
 +             vd=%vd_dp vm=%vm_sp
 +VCVT_dp      ---- 1110 1.11 0111 .... 1011 11.0 .... \
 +             vd=%vd_sp vm=%vm_dp
 --
-2.20.1
+2.25.1

-[Qemu-devel] [PULL 48/48] target/arm: Fix short-vector increment behaviour
+[PULL 19/31] hw/arm/exynos4210: Use TYPE_SPLIT_IRQ in exynos4210_init_board_irqs()
-For VFP short vectors, the VFP registers are divided into a
+In exynos4210_init_board_irqs(), use the TYPE_SPLIT_IRQ device
-series of banks: for single-precision these are s0-s7, s8-s15,
+instead of qemu_irq_split().
 s16-s23 and s24-s31; for double-precision they are d0-d3,
 d4-d7, ... d28-d31. Some banks are "scalar" meaning that
 use of a register within them triggers a pure-scalar or
 mixed vector-scalar operation rather than a full vector
 operation. The scalar banks are s0-s7, d0-d3 and d16-d19.
 When using a bank as part of a vector operation, we
 iterate through it, increasing the register number by
 the specified stride each time, and wrapping around to
 the beginning of the bank.
 Unfortunately our calculation of the "increment" part of this
 was incorrect:
  vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask)
 will only do the intended thing if bank_mask has exactly
 one set high bit. For instance for doubles (bank_mask = 0xc),
 if we start with vd = 6 and delta_d = 2 then vd is updated
 to 12 rather than the intended 4.
 This only causes problems in the unlikely case that the
 starting register is not the first in its bank: if the
 register number doesn't have to wrap around then the
 expression happens to give the right answer.
 Fix this bug by abstracting out the "check whether register
 is in a scalar bank" and "advance register within bank"
 operations to utility functions which use the right
 bit masking operations.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-13-peter.maydell@linaro.org
 ---
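As an illustration of the wrap-around bug described in the short-vector
commit message above, the following standalone C sketch compares the old
expression with the fixed per-bank helpers. The names advance_old() and
self_check() are hypothetical and exist only for this example;
vfp_advance_sreg() and vfp_advance_dreg() mirror the helpers added by the
patch, but without `inline` and without any QEMU headers.

```c
#include <assert.h>

/* Old expression from the commit message, parameterised on bank_mask. */
static int advance_old(int reg, int delta, int bank_mask)
{
    return ((reg + delta) & (bank_mask - 1)) | (reg & bank_mask);
}

/* S registers: banks of 8, so wrap the low 3 bits and keep the rest. */
static int vfp_advance_sreg(int reg, int delta)
{
    return ((reg + delta) & 0x7) | (reg & ~0x7);
}

/* D registers: banks of 4, so wrap the low 2 bits and keep the rest. */
static int vfp_advance_dreg(int reg, int delta)
{
    return ((reg + delta) & 0x3) | (reg & ~0x3);
}

/* Hypothetical self-check covering the cases named in the commit message. */
static void self_check(void)
{
    /* Doubles, bank_mask = 0xc: d6 + 2 should wrap to d4, old code gave d12. */
    assert(advance_old(6, 2, 0xc) == 12);
    assert(vfp_advance_dreg(6, 2) == 4);

    /* Singles, bank_mask = 0x18: s14 + 2 should wrap to s8, old code gave s24. */
    assert(advance_old(14, 2, 0x18) == 24);
    assert(vfp_advance_sreg(14, 2) == 8);

    /* Without wrap-around both versions agree: d4 + 2 is d6. */
    assert(advance_old(4, 2, 0xc) == 6);
    assert(vfp_advance_dreg(4, 2) == 6);
}
```

The old form masks with (bank_mask - 1), which for 0xc is 0xb rather than
the intended 0x3, so the carry out of the low bits leaks into the bank
number. The fixed helpers use an explicit low-bits mask for each register
width instead of deriving it from bank_mask.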
- target/arm/translate-vfp.inc.c | 100 ++++++++++++++++++++-------------
+ include/hw/arm/exynos4210.h |  9 ++++++++
- 1 file changed, 60 insertions(+), 40 deletions(-)
+ hw/arm/exynos4210.c         | 41 +++++++++++++++++++++++++++++--------
  2 files changed, 42 insertions(+), 8 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ typedef void VFPGen3OpDPFn(TCGv_i64 vd,
+@@ -XXX,XX +XXX,XX @@
- typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
+ #include "hw/sysbus.h"
- typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
+ #include "hw/cpu/a9mpcore.h"
  #include "hw/intc/exynos4210_gic.h"
 +#include "hw/core/split-irq.h"
  #include "target/arm/cpu-qom.h"
  #include "qom/object.h"
@@ -XXX,XX +XXX,XX @@
  #define EXYNOS4210_NUM_DMA      3
 +/*
-+ * Return true if the specified S reg is in a scalar bank
++ * We need one splitter for every external combiner input, plus
-+ * (ie if it is s0..s7)
++ * one for every non-zero entry in combiner_grp_to_gic_id[].
 + * We'll assert in exynos4210_init_board_irqs() if this is wrong.
 + */
-+static inline bool vfp_sreg_is_scalar(int reg)
++#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 60)
 +{
 +    return (reg & 0x18) == 0;
 +}
 +
-+/*
+ typedef struct Exynos4210Irq {
-+ * Return true if the specified D reg is in a scalar bank
+     qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
-+ * (ie if it is d0..d3 or d16..d19)
+     qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
-+ */
+@@ -XXX,XX +XXX,XX @@ struct Exynos4210State {
-+static inline bool vfp_dreg_is_scalar(int reg)
+     qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
-+{
+     A9MPPrivState a9mpcore;
-+    return (reg & 0xc) == 0;
+     Exynos4210GicState ext_gic;
-+}
++    SplitIRQ splitter[EXYNOS4210_NUM_SPLITTERS];
  };
  #define TYPE_EXYNOS4210_SOC "exynos4210"
 diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/exynos4210.c
 +++ b/hw/arm/exynos4210.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
      uint32_t grp, bit, irq_id, n;
      Exynos4210Irq *is = &s->irqs;
      DeviceState *extgicdev = DEVICE(&s->ext_gic);
 +    int splitcount = 0;
 +    DeviceState *splitter;
      for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
          irq_id = 0;
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
              /* MCT_G1 is passed to External and GIC */
              irq_id = EXT_GIC_ID_MCT_G1;
          }
 +
-+/*
++        assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
-+ * Advance the S reg number forwards by delta within its bank
++        splitter = DEVICE(&s->splitter[splitcount]);
-+ * (ie increment the low 3 bits but leave the rest the same)
++        qdev_prop_set_uint16(splitter, "num-lines", 2);
-+ */
++        qdev_realize(splitter, NULL, &error_abort);
-+static inline int vfp_advance_sreg(int reg, int delta)
++        splitcount++;
-+{
++        s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
-+    return ((reg + delta) & 0x7) | (reg & ~0x7);
++        qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
-+}
+         if (irq_id) {
-+
+-            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-+/*
+-                                             qdev_get_gpio_in(extgicdev,
-+ * Advance the D reg number forwards by delta within its bank
+-                                                              irq_id - 32));
-+ * (ie increment the low 2 bits but leave the rest the same)
++            qdev_connect_gpio_out(splitter, 1,
-+ */
++                                  qdev_get_gpio_in(extgicdev, irq_id - 32));
 +static inline int vfp_advance_dreg(int reg, int delta)
 +{
 +    return ((reg + delta) & 0x3) | (reg & ~0x3);
 +}
 +
  /*
   * Perform a 3-operand VFP data processing instruction. fn is the
   * callback to do the actual operation; this function deals with the
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
  {
      uint32_t delta_m = 0;
      uint32_t delta_d = 0;
 -    uint32_t bank_mask = 0;
      int veclen = s->vec_len;
      TCGv_i32 f0, f1, fd;
      TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
      }
      if (veclen > 0) {
 -        bank_mask = 0x18;
 -
          /* Figure out what type of vector operation this is.  */
 -        if ((vd & bank_mask) == 0) {
 +        if (vfp_sreg_is_scalar(vd)) {
              /* scalar */
              veclen = 0;
          } else {
-             delta_d = s->vec_stride + 1;
+-            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
+-                    is->ext_combiner_irq[n]);
--            if ((vm & bank_mask) == 0) {
++            qdev_connect_gpio_out(splitter, 1, is->ext_combiner_irq[n]);
 +            if (vfp_sreg_is_scalar(vm)) {
                  /* mixed scalar/vector */
                  delta_m = 0;
              } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
          /* Set up the operands for the next iteration */
          veclen--;
 -        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 -        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
 +        vd = vfp_advance_sreg(vd, delta_d);
 +        vn = vfp_advance_sreg(vn, delta_d);
          neon_load_reg32(f0, vn);
          if (delta_m) {
 -            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
 +            vm = vfp_advance_sreg(vm, delta_m);
              neon_load_reg32(f1, vm);
          }
      }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
+     for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
- {
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
-     uint32_t delta_m = 0;
+                      EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][bit];
-     uint32_t delta_d = 0;
--    uint32_t bank_mask = 0;
+         if (irq_id) {
-     int veclen = s->vec_len;
+-            s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-     TCGv_i64 f0, f1, fd;
+-                                             qdev_get_gpio_in(extgicdev,
-     TCGv_ptr fpst;
+-                                                              irq_id - 32));
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
++            assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
-     }
++            splitter = DEVICE(&s->splitter[splitcount]);
++            qdev_prop_set_uint16(splitter, "num-lines", 2);
-     if (veclen > 0) {
++            qdev_realize(splitter, NULL, &error_abort);
--        bank_mask = 0xc;
++            splitcount++;
--
++            s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
-         /* Figure out what type of vector operation this is.  */
++            qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
--        if ((vd & bank_mask) == 0) {
++            qdev_connect_gpio_out(splitter, 1,
-+        if (vfp_dreg_is_scalar(vd)) {
++                                  qdev_get_gpio_in(extgicdev, irq_id - 32));
              /* scalar */
              veclen = 0;
          } else {
              delta_d = (s->vec_stride >> 1) + 1;
 -            if ((vm & bank_mask) == 0) {
 +            if (vfp_dreg_is_scalar(vm)) {
                  /* mixed scalar/vector */
                  delta_m = 0;
              } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
          }
          /* Set up the operands for the next iteration */
          veclen--;
 -        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 -        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
 +        vd = vfp_advance_dreg(vd, delta_d);
 +        vn = vfp_advance_dreg(vn, delta_d);
          neon_load_reg64(f0, vn);
          if (delta_m) {
 -            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
 +            vm = vfp_advance_dreg(vm, delta_m);
              neon_load_reg64(f1, vm);
          }
      }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
++    /*
- {
++     * We check this here to avoid a more obscure assert later when
-     uint32_t delta_m = 0;
++     * qdev_assert_realized_properly() checks that we realized every
-     uint32_t delta_d = 0;
++     * child object we initialized.
--    uint32_t bank_mask = 0;
++     */
-     int veclen = s->vec_len;
++    assert(splitcount == EXYNOS4210_NUM_SPLITTERS);
-     TCGv_i32 f0, fd;
+ }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+ /*
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init(Object *obj)
          object_initialize_child(obj, name, &s->cpu_irq_orgate[i], TYPE_OR_IRQ);
      }
-     if (veclen > 0) {
++    for (i = 0; i < ARRAY_SIZE(s->splitter); i++) {
--        bank_mask = 0x18;
++        g_autofree char *name = g_strdup_printf("irq-splitter%d", i);
--
++        object_initialize_child(obj, name, &s->splitter[i], TYPE_SPLIT_IRQ);
-         /* Figure out what type of vector operation this is.  */
++    }
--        if ((vd & bank_mask) == 0) {
++
-+        if (vfp_sreg_is_scalar(vd)) {
+     object_initialize_child(obj, "a9mpcore", &s->a9mpcore, TYPE_A9MPCORE_PRIV);
-             /* scalar */
+     object_initialize_child(obj, "ext-gic", &s->ext_gic, TYPE_EXYNOS4210_GIC);
-             veclen = 0;
+ }
          } else {
              delta_d = s->vec_stride + 1;
 -            if ((vm & bank_mask) == 0) {
 +            if (vfp_sreg_is_scalar(vm)) {
                  /* mixed scalar/vector */
                  delta_m = 0;
              } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
          if (delta_m == 0) {
              /* single source one-many */
              while (veclen--) {
 -                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +                vd = vfp_advance_sreg(vd, delta_d);
                  neon_store_reg32(fd, vd);
              }
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
          /* Set up the operands for the next iteration */
          veclen--;
 -        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 -        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
 +        vd = vfp_advance_sreg(vd, delta_d);
 +        vm = vfp_advance_sreg(vm, delta_m);
          neon_load_reg32(f0, vm);
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
  {
      uint32_t delta_m = 0;
      uint32_t delta_d = 0;
 -    uint32_t bank_mask = 0;
      int veclen = s->vec_len;
      TCGv_i64 f0, fd;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
      }
      if (veclen > 0) {
 -        bank_mask = 0xc;
 -
          /* Figure out what type of vector operation this is.  */
 -        if ((vd & bank_mask) == 0) {
 +        if (vfp_dreg_is_scalar(vd)) {
              /* scalar */
              veclen = 0;
          } else {
              delta_d = (s->vec_stride >> 1) + 1;
 -            if ((vm & bank_mask) == 0) {
 +            if (vfp_dreg_is_scalar(vm)) {
                  /* mixed scalar/vector */
                  delta_m = 0;
              } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
          if (delta_m == 0) {
              /* single source one-many */
              while (veclen--) {
 -                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +                vd = vfp_advance_dreg(vd, delta_d);
                  neon_store_reg64(fd, vd);
              }
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
          /* Set up the operands for the next iteration */
          veclen--;
 -        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 -        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
 +        vd = vfp_advance_dreg(vd, delta_d);
 +        vm = vfp_advance_dreg(vm, delta_m);
          neon_load_reg64(f0, vm);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
  static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
  {
      uint32_t delta_d = 0;
 -    uint32_t bank_mask = 0;
      int veclen = s->vec_len;
      TCGv_i32 fd;
      uint32_t n, i, vd;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
      }
      if (veclen > 0) {
 -        bank_mask = 0x18;
          /* Figure out what type of vector operation this is.  */
 -        if ((vd & bank_mask) == 0) {
 +        if (vfp_sreg_is_scalar(vd)) {
              /* scalar */
              veclen = 0;
          } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
          /* Set up the operands for the next iteration */
          veclen--;
 -        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +        vd = vfp_advance_sreg(vd, delta_d);
      }
      tcg_temp_free_i32(fd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
  static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
  {
      uint32_t delta_d = 0;
 -    uint32_t bank_mask = 0;
      int veclen = s->vec_len;
      TCGv_i64 fd;
      uint32_t n, i, vd;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      }
      if (veclen > 0) {
 -        bank_mask = 0xc;
          /* Figure out what type of vector operation this is.  */
 -        if ((vd & bank_mask) == 0) {
 +        if (vfp_dreg_is_scalar(vd)) {
              /* scalar */
              veclen = 0;
          } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
          /* Set up the operands for the next iteration */
          veclen--;
 -        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
 +        vd = vfp_advance_dreg(vd, delta_d);
      }
      tcg_temp_free_i64(fd);
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 10/48] target/arm: Fix Cortex-R5F MVFR values
+[PULL 20/31] hw/arm/exynos4210: Fill in irq_table[] for internal-combiner-only IRQ lines
-The Cortex-R5F initfn was not correctly setting up the MVFR
+In exynos4210_init_board_irqs(), the loop that handles IRQ lines that
-ID register values. Fill these in, since some subsequent patches
+are in a range that applies to the internal combiner only creates a
-will use ID register checks rather than CPU feature bit checks.
+splitter for those interrupts which go to both the internal combiner
 and to the external GIC, but it does nothing at all for the
 interrupts which don't go to the external GIC, leaving the
 irq_table[] array element empty for those.  (This will result in
 those interrupts simply being lost, not in a QEMU crash.)
 I don't have a reliable datasheet for this SoC, but since we do wire
 up one interrupt line in this category (the HDMI I2C device on
 interrupt 16,1), this seems like it must be a bug in the existing
 QEMU code.  Fill in the irq_table[] entries where we're not splitting
 the IRQ to both the internal combiner and the external GIC with the
 IRQ line of the internal combiner.  (That is, these IRQ lines go to
 just one device, not multiple.)
 This bug didn't have any visible guest effects because the only
 implemented device that was affected was the HDMI I2C controller,
 and we never connect any I2C devices to that bus.
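
 The routing decision described above can be sketched in isolation. This is
 a minimal model, not the QEMU code: irq_handle, irq_route and route_line()
 are hypothetical names, and plain integers stand in for qemu_irq lines.
 The only things taken from the patch are the convention that an irq_id of 0
 means "not wired to the external GIC" and the "irq_id - 32" offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for qemu_irq: just an integer handle. */
typedef int irq_handle;

enum { ROUTE_DIRECT, ROUTE_SPLIT };

typedef struct {
    int kind;            /* ROUTE_DIRECT or ROUTE_SPLIT */
    irq_handle combiner; /* internal combiner input, always set */
    int gic_input;       /* external GIC input, -1 unless ROUTE_SPLIT */
} irq_route;

/*
 * Decide how one board IRQ line is routed.  gic_id == 0 means the line
 * does not go to the external GIC, so it is connected straight to the
 * internal combiner instead of being left unconnected.
 */
static irq_route route_line(irq_handle combiner_in, int gic_id)
{
    irq_route r;

    r.combiner = combiner_in;
    if (gic_id) {
        /* Two destinations: a splitter is needed. */
        r.kind = ROUTE_SPLIT;
        r.gic_input = gic_id - 32; /* same offset the patch applies */
    } else {
        /* One destination: no splitter, just the combiner line. */
        r.kind = ROUTE_DIRECT;
        r.gic_input = -1;
    }
    return r;
}
```

 In this model the pre-patch behaviour corresponds to having no else branch
 at all, which is why those table entries stayed empty.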
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-14-peter.maydell@linaro.org
 ---
- target/arm/cpu.c | 2 ++
+ hw/arm/exynos4210.c | 2 ++
 file changed, 2 insertions(+)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/hw/arm/exynos4210.c
-+++ b/target/arm/cpu.c
++++ b/hw/arm/exynos4210.c
-@@ -XXX,XX +XXX,XX @@ static void cortex_r5f_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
+             qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
-     cortex_r5_initfn(obj);
+             qdev_connect_gpio_out(splitter, 1,
-     set_feature(&cpu->env, ARM_FEATURE_VFP3);
+                                   qdev_get_gpio_in(extgicdev, irq_id - 32));
-+    cpu->isar.mvfr0 = 0x10110221;
++        } else {
-+    cpu->isar.mvfr1 = 0x00000011;
++            s->irq_table[n] = is->int_combiner_irq[n];
- }
+         }
+     }
- static const ARMCPRegInfo cortexa8_cp_reginfo[] = {
+     /*
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 09/48] target/arm: Factor out VFP access checking code
+[PULL 21/31] hw/arm/exynos4210: Connect MCT_G0 and MCT_G1 to both combiners
-Factor out the VFP access checking code so that we can use it in the
+Currently for the interrupts MCT_G0 and MCT_G1 which are
-leaf functions of the decodetree decoder.
+the only ones in the input range of the external combiner
 and which are also wired to the external GIC, we connect
 them only to the internal combiner and the external GIC.
 This seems likely to be a bug, as all other interrupts
 which are in the input range of both combiners are
 connected to both combiners. (The fact that the code in
 exynos4210_combiner_get_gpioin() is also trying to wire
 up these inputs on both combiners also suggests this.)
-We call the function full_vfp_access_check() so we can keep
+Wire these interrupts up to both combiners, like the rest.
 the more natural vfp_access_check() for a version which doesn't
 have the 'ignore_vfp_enabled' flag -- that way almost all VFP
 insns will be able to use vfp_access_check(s) and only the
 special-register access function will have to use
 full_vfp_access_check(s, ignore_vfp_enabled).
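
 The naming split between the two checkers can be illustrated with a reduced
 model. MiniCtx and its fields are hypothetical stand-ins for the relevant
 parts of DisasContext, and the assumption that the short-named function is a
 thin wrapper passing false is inferred from the description above rather than
 shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-reduced translation context. */
typedef struct {
    bool fp_trapped;  /* stands in for s->fp_excp_el != 0 */
    bool vfp_enabled; /* stands in for s->vfp_enabled */
    int exceptions;   /* counts exceptions that would have been generated */
} MiniCtx;

/* Full check: the caller states whether the enable bit may be ignored. */
static bool full_access_check(MiniCtx *s, bool ignore_vfp_enabled)
{
    if (s->fp_trapped) {
        s->exceptions++;
        return false;
    }
    if (!s->vfp_enabled && !ignore_vfp_enabled) {
        s->exceptions++;
        return false;
    }
    return true;
}

/* Common case: almost every instruction never ignores the enable bit. */
static bool access_check(MiniCtx *s)
{
    return full_access_check(s, false);
}
```

 Only the special-register access path would call full_access_check() with
 true; every other caller keeps the one-argument form.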
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-15-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c | 100 ++++++++++++++++++++++++++++++++
+ hw/arm/exynos4210.c | 7 +++----
- target/arm/translate.c         | 101 +++++----------------------------
+file changed, 3 insertions(+), 4 deletions(-)
 files changed, 113 insertions(+), 88 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/hw/arm/exynos4210.c
-+++ b/target/arm/translate-vfp.inc.c
++++ b/hw/arm/exynos4210.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
- /* Include the generated VFP decoder */
- #include "decode-vfp.inc.c"
+         assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
- #include "decode-vfp-uncond.inc.c"
+         splitter = DEVICE(&s->splitter[splitcount]);
-+
+-        qdev_prop_set_uint16(splitter, "num-lines", 2);
-+/*
++        qdev_prop_set_uint16(splitter, "num-lines", irq_id ? 3 : 2);
-+ * Check that VFP access is enabled. If it is, do the necessary
+         qdev_realize(splitter, NULL, &error_abort);
-+ * M-profile lazy-FP handling and then return true.
+         splitcount++;
-+ * If not, emit code to generate an appropriate exception and
+         s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
-+ * return false.
+         qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
-+ * The ignore_vfp_enabled argument specifies that we should ignore
++        qdev_connect_gpio_out(splitter, 1, is->ext_combiner_irq[n]);
-+ * whether VFP is enabled via FPEXC[EN]: this should be true for FMXR/FMRX
+         if (irq_id) {
-+ * accesses to FPSID, FPEXC, MVFR0, MVFR1, MVFR2, and false for all other insns.
+-            qdev_connect_gpio_out(splitter, 1,
-+ */
++            qdev_connect_gpio_out(splitter, 2,
-+static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
+                                   qdev_get_gpio_in(extgicdev, irq_id - 32));
-+{
+-        } else {
-+    if (s->fp_excp_el) {
+-            qdev_connect_gpio_out(splitter, 1, is->ext_combiner_irq[n]);
 +        if (arm_dc_feature(s, ARM_FEATURE_M)) {
 +            gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
 +                               s->fp_excp_el);
 +        } else {
 +            gen_exception_insn(s, 4, EXCP_UDEF,
 +                               syn_fp_access_trap(1, 0xe, false),
 +                               s->fp_excp_el);
 +        }
 +        return false;
 +    }
 +
 +    if (!s->vfp_enabled && !ignore_vfp_enabled) {
 +        assert(!arm_dc_feature(s, ARM_FEATURE_M));
 +        gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(),
 +                           default_exception_el(s));
 +        return false;
 +    }
 +
 +    if (arm_dc_feature(s, ARM_FEATURE_M)) {
 +        /* Handle M-profile lazy FP state mechanics */
 +
 +        /* Trigger lazy-state preservation if necessary */
 +        if (s->v7m_lspact) {
 +            /*
 +             * Lazy state saving affects external memory and also the NVIC,
 +             * so we must mark it as an IO operation for icount.
 +             */
 +            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 +                gen_io_start();
 +            }
 +            gen_helper_v7m_preserve_fp_state(cpu_env);
 +            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 +                gen_io_end();
 +            }
 +            /*
 +             * If the preserve_fp_state helper doesn't throw an exception
 +             * then it will clear LSPACT; we don't need to repeat this for
 +             * any further FP insns in this TB.
 +             */
 +            s->v7m_lspact = false;
 +        }
 +
 +        /* Update ownership of FP context: set FPCCR.S to match current state */
 +        if (s->v8m_fpccr_s_wrong) {
 +            TCGv_i32 tmp;
 +
 +            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
 +            if (s->v8m_secure) {
 +                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
 +            } else {
 +                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
 +            }
 +            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
 +            /* Don't need to do this for any further FP insns in this TB */
 +            s->v8m_fpccr_s_wrong = false;
 +        }
 +
 +        if (s->v7m_new_fp_ctxt_needed) {
 +            /*
 +             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
 +             * and the FPSCR.
 +             */
 +            TCGv_i32 control, fpscr;
 +            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
 +
 +            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
 +            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
 +            tcg_temp_free_i32(fpscr);
 +            /*
 +             * We don't need to arrange to end the TB, because the only
 +             * parts of FPSCR which we cache in the TB flags are the VECLEN
 +             * and VECSTRIDE, and those don't exist for M-profile.
 +             */
 +
 +            if (s->v8m_secure) {
 +                bits |= R_V7M_CONTROL_SFPA_MASK;
 +            }
 +            control = load_cpu_field(v7m.control[M_REG_S]);
 +            tcg_gen_ori_i32(control, control, bits);
 +            store_cpu_field(control, v7m.control[M_REG_S]);
 +            /* Don't need to do this for any further FP insns in this TB */
 +            s->v7m_new_fp_ctxt_needed = false;
 +        }
 +    }
 +
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
      return 1;
  }
 -/* Disassemble a VFP instruction.  Returns nonzero if an error occurred
 -   (ie. an undefined instruction).  */
 +/*
 + * Disassemble a VFP instruction.  Returns nonzero if an error occurred
 + * (ie. an undefined instruction).
 + */
  static int disas_vfp_insn(DisasContext *s, uint32_t insn)
  {
      uint32_t rd, rn, rm, op, i, n, offset, delta_d, delta_m, bank_mask;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
      TCGv_i32 addr;
      TCGv_i32 tmp;
      TCGv_i32 tmp2;
 +    bool ignore_vfp_enabled = false;
      if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
          return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
          }
      }
+     for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
 -    /* FIXME: this access check should not take precedence over UNDEF
 +    /*
 +     * FIXME: this access check should not take precedence over UNDEF
       * for invalid encodings; we will generate incorrect syndrome information
       * for attempts to execute invalid vfp/neon encodings with FP disabled.
       */
 -    if (s->fp_excp_el) {
 -        if (arm_dc_feature(s, ARM_FEATURE_M)) {
 -            gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
 -                               s->fp_excp_el);
 -        } else {
 -            gen_exception_insn(s, 4, EXCP_UDEF,
 -                               syn_fp_access_trap(1, 0xe, false),
 -                               s->fp_excp_el);
 -        }
 -        return 0;
 -    }
 -
 -    if (!s->vfp_enabled) {
 -        /* VFP disabled.  Only allow fmxr/fmrx to/from some control regs.  */
 -        if ((insn & 0x0fe00fff) != 0x0ee00a10)
 -            return 1;
 +    if ((insn & 0x0fe00fff) == 0x0ee00a10) {
          rn = (insn >> 16) & 0xf;
 -        if (rn != ARM_VFP_FPSID && rn != ARM_VFP_FPEXC && rn != ARM_VFP_MVFR2
 -            && rn != ARM_VFP_MVFR1 && rn != ARM_VFP_MVFR0) {
 -            return 1;
 +        if (rn == ARM_VFP_FPSID || rn == ARM_VFP_FPEXC || rn == ARM_VFP_MVFR2
 +            || rn == ARM_VFP_MVFR1 || rn == ARM_VFP_MVFR0) {
 +            ignore_vfp_enabled = true;
          }
      }
 -
 -    if (arm_dc_feature(s, ARM_FEATURE_M)) {
 -        /* Handle M-profile lazy FP state mechanics */
 -
 -        /* Trigger lazy-state preservation if necessary */
 -        if (s->v7m_lspact) {
 -            /*
 -             * Lazy state saving affects external memory and also the NVIC,
 -             * so we must mark it as an IO operation for icount.
 -             */
 -            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 -                gen_io_start();
 -            }
 -            gen_helper_v7m_preserve_fp_state(cpu_env);
 -            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 -                gen_io_end();
 -            }
 -            /*
 -             * If the preserve_fp_state helper doesn't throw an exception
 -             * then it will clear LSPACT; we don't need to repeat this for
 -             * any further FP insns in this TB.
 -             */
 -            s->v7m_lspact = false;
 -        }
 -
 -        /* Update ownership of FP context: set FPCCR.S to match current state */
 -        if (s->v8m_fpccr_s_wrong) {
 -            TCGv_i32 tmp;
 -
 -            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
 -            if (s->v8m_secure) {
 -                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
 -            } else {
 -                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
 -            }
 -            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
 -            /* Don't need to do this for any further FP insns in this TB */
 -            s->v8m_fpccr_s_wrong = false;
 -        }
 -
 -        if (s->v7m_new_fp_ctxt_needed) {
 -            /*
 -             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
 -             * and the FPSCR.
 -             */
 -            TCGv_i32 control, fpscr;
 -            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
 -
 -            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
 -            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
 -            tcg_temp_free_i32(fpscr);
 -            /*
 -             * We don't need to arrange to end the TB, because the only
 -             * parts of FPSCR which we cache in the TB flags are the VECLEN
 -             * and VECSTRIDE, and those don't exist for M-profile.
 -             */
 -
 -            if (s->v8m_secure) {
 -                bits |= R_V7M_CONTROL_SFPA_MASK;
 -            }
 -            control = load_cpu_field(v7m.control[M_REG_S]);
 -            tcg_gen_ori_i32(control, control, bits);
 -            store_cpu_field(control, v7m.control[M_REG_S]);
 -            /* Don't need to do this for any further FP insns in this TB */
 -            s->v7m_new_fp_ctxt_needed = false;
 -        }
 +    if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
 +        return 0;
      }
      if (extract32(insn, 28, 4) == 0xf) {
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 42/48] target/arm: Convert VFP round insns to decodetree
+[PULL 22/31] hw/arm/exynos4210: Don't connect multiple lines to external GIC inputs
-Convert the VFP round-to-integer instructions VRINTR, VRINTZ and
+The combiner_grp_to_gic_id[] array includes the EXT_GIC_ID_MCT_G0
-VRINTX to decodetree.
+and EXT_GIC_ID_MCT_G1 multiple times. This means that we will
 connect multiple IRQs up to the same external GIC input, which
 is not permitted. We do the same thing in the code in
 exynos4210_init_board_irqs() because the conditionals selecting
 an irq_id in the first loop match multiple interrupt IDs.
-These instructions were only introduced as part of the "VFP misc"
+Overall we do this for interrupt IDs
-additions in v8A, so we check this. The old decoder's implementation
+(1, 4), (12, 4), (35, 4), (51, 4), (53, 4) for EXT_GIC_ID_MCT_G0
-was incorrectly providing them even for v7A CPUs.
+and
 (1, 5), (12, 5), (35, 5), (51, 5), (53, 5) for EXT_GIC_ID_MCT_G1
 These correspond to the cases for the multi-core timer that we are
 wiring up to multiple inputs on the combiner in
 exynos4210_combiner_get_gpioin().  That code already deals with all
 these interrupt IDs being the same input source, so we don't need to
 connect the external GIC interrupt for any of them except the first
 (1, 4) and (1, 5). Remove the array entries and conditionals which
 were incorrectly causing us to wire up extra lines.
 This bug didn't cause any visible effects, because we only connect
 up a device to the "primary" ID values (1, 4) and (1, 5), so the
 extra lines would never be set to a level.
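
 The invariant this patch restores (each external GIC input appears at most
 once in the table, and the splitter count equals the number of non-zero
 entries) can be checked mechanically. The sketch below is a standalone
 illustration: count_nonzero() and has_duplicate_id() are hypothetical helpers
 that operate on a flattened array of IDs, not on the real
 combiner_grp_to_gic_id[] table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of non-zero entries; each one would need its own splitter. */
static size_t count_nonzero(const int *ids, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (ids[i] != 0) {
            count++;
        }
    }
    return count;
}

/*
 * True if any non-zero ID occurs more than once, that is, if more than
 * one line would be connected to the same GIC input.
 */
static bool has_duplicate_id(const int *ids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ids[i] == 0) {
            continue;
        }
        for (size_t j = i + 1; j < n; j++) {
            if (ids[j] == ids[i]) {
                return true;
            }
        }
    }
    return false;
}
```

 Removing six duplicated entries from such a table lowers the non-zero count
 by six, which matches the change of the splitter constant from + 60 to + 54.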
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-16-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c | 163 +++++++++++++++++++++++++++++++++
+ include/hw/arm/exynos4210.h |  2 +-
- target/arm/translate.c         |  45 +--------
+ hw/arm/exynos4210.c         | 12 +++++-------
- target/arm/vfp.decode          |  15 +++
+files changed, 6 insertions(+), 8 deletions(-)
 files changed, 179 insertions(+), 44 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
+@@ -XXX,XX +XXX,XX @@
-     tcg_temp_free_i32(tmp);
+  * one for every non-zero entry in combiner_grp_to_gic_id[].
-     return true;
+  * We'll assert in exynos4210_init_board_irqs() if this is wrong.
- }
+  */
-+
+-#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 60)
-+static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
++#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 54)
-+{
-+    TCGv_ptr fpst;
+ typedef struct Exynos4210Irq {
-+    TCGv_i32 tmp;
+     qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
-+
+diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 +    if (!dc_isar_feature(aa32_vrint, s)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i32();
 +    neon_load_reg32(tmp, a->vm);
 +    fpst = get_fpstatus_ptr(false);
 +    gen_helper_rints(tmp, tmp, fpst);
 +    neon_store_reg32(tmp, a->vd);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 +
 +static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_sp *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i64 tmp;
 +
 +    if (!dc_isar_feature(aa32_vrint, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i64();
 +    neon_load_reg64(tmp, a->vm);
 +    fpst = get_fpstatus_ptr(false);
 +    gen_helper_rintd(tmp, tmp, fpst);
 +    neon_store_reg64(tmp, a->vd);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i64(tmp);
 +    return true;
 +}
 +
 +static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i32 tmp;
 +    TCGv_i32 tcg_rmode;
 +
 +    if (!dc_isar_feature(aa32_vrint, s)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i32();
 +    neon_load_reg32(tmp, a->vm);
 +    fpst = get_fpstatus_ptr(false);
 +    tcg_rmode = tcg_const_i32(float_round_to_zero);
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +    gen_helper_rints(tmp, tmp, fpst);
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +    neon_store_reg32(tmp, a->vd);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tcg_rmode);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 +
 +static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_sp *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i64 tmp;
 +    TCGv_i32 tcg_rmode;
 +
 +    if (!dc_isar_feature(aa32_vrint, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i64();
 +    neon_load_reg64(tmp, a->vm);
 +    fpst = get_fpstatus_ptr(false);
 +    tcg_rmode = tcg_const_i32(float_round_to_zero);
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +    gen_helper_rintd(tmp, tmp, fpst);
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +    neon_store_reg64(tmp, a->vd);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i64(tmp);
 +    tcg_temp_free_i32(tcg_rmode);
 +    return true;
 +}
 +
 +static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i32 tmp;
 +
 +    if (!dc_isar_feature(aa32_vrint, s)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i32();
 +    neon_load_reg32(tmp, a->vm);
 +    fpst = get_fpstatus_ptr(false);
 +    gen_helper_rints_exact(tmp, tmp, fpst);
 +    neon_store_reg32(tmp, a->vd);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 +
 +static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i64 tmp;
 +
 +    if (!dc_isar_feature(aa32_vrint, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i64();
 +    neon_load_reg64(tmp, a->vm);
 +    fpst = get_fpstatus_ptr(false);
 +    gen_helper_rintd_exact(tmp, tmp, fpst);
 +    neon_store_reg64(tmp, a->vd);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i64(tmp);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/arm/exynos4210.c
-+++ b/target/arm/translate.c
++++ b/hw/arm/exynos4210.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
-                 return 1;
+     /* int combiner group 34 */
-             case 15:
+     { EXT_GIC_ID_ONENAND_AUDI, EXT_GIC_ID_NFC },
-                 switch (rn) {
+     /* int combiner group 35 */
--                case 0 ... 11:
+-    { 0, 0, 0, EXT_GIC_ID_MCT_L1, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
-+                case 0 ... 14:
++    { 0, 0, 0, EXT_GIC_ID_MCT_L1 },
-                     /* Already handled by decodetree */
+     /* int combiner group 36 */
-                     return 1;
+     { EXT_GIC_ID_MIXER },
-                 default:
+     /* int combiner group 37 */
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
-             if (op == 15) {
+     /* groups 38-50 */
-                 /* rn is opcode, encoded as per VFP_SREG_N. */
+     { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { },
-                 switch (rn) {
+     /* int combiner group 51 */
--                case 0x0c: /* vrintr */
+-    { EXT_GIC_ID_MCT_L0, 0, 0, 0, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
--                case 0x0d: /* vrintz */
++    { EXT_GIC_ID_MCT_L0 },
--                case 0x0e: /* vrintx */
+     /* group 52 */
--                    break;
+     { },
--
+     /* int combiner group 53 */
-                 case 0x0f: /* vcvt double<->single */
+-    { EXT_GIC_ID_WDT, 0, 0, 0, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
-                     rd_is_dp = !dp;
++    { EXT_GIC_ID_WDT },
-                     break;
+     /* groups 54-63 */
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+     { }, { }, { }, { }, { }, { }, { }, { }, { }, { }
-                 switch (op) {
+ };
-                 case 15: /* extension space */
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
-                     switch (rn) {
--                    case 12: /* vrintr */
+     for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
--                    {
+         irq_id = 0;
--                        TCGv_ptr fpst = get_fpstatus_ptr(0);
+-        if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 4) ||
--                        if (dp) {
+-                n == EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 4)) {
--                            gen_helper_rintd(cpu_F0d, cpu_F0d, fpst);
++        if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 4)) {
--                        } else {
+             /* MCT_G0 is passed to External GIC */
--                            gen_helper_rints(cpu_F0s, cpu_F0s, fpst);
+             irq_id = EXT_GIC_ID_MCT_G0;
--                        }
+         }
--                        tcg_temp_free_ptr(fpst);
+-        if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 5) ||
--                        break;
+-                n == EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 5)) {
--                    }
++        if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 5)) {
--                    case 13: /* vrintz */
+             /* MCT_G1 is passed to External and GIC */
--                    {
+             irq_id = EXT_GIC_ID_MCT_G1;
--                        TCGv_ptr fpst = get_fpstatus_ptr(0);
+         }
 -                        TCGv_i32 tcg_rmode;
 -                        tcg_rmode = tcg_const_i32(float_round_to_zero);
 -                        gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -                        if (dp) {
 -                            gen_helper_rintd(cpu_F0d, cpu_F0d, fpst);
 -                        } else {
 -                            gen_helper_rints(cpu_F0s, cpu_F0s, fpst);
 -                        }
 -                        gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -                        tcg_temp_free_i32(tcg_rmode);
 -                        tcg_temp_free_ptr(fpst);
 -                        break;
 -                    }
 -                    case 14: /* vrintx */
 -                    {
 -                        TCGv_ptr fpst = get_fpstatus_ptr(0);
 -                        if (dp) {
 -                            gen_helper_rintd_exact(cpu_F0d, cpu_F0d, fpst);
 -                        } else {
 -                            gen_helper_rints_exact(cpu_F0s, cpu_F0s, fpst);
 -                        }
 -                        tcg_temp_free_ptr(fpst);
 -                        break;
 -                    }
                      case 15: /* single<->double conversion */
                          if (dp) {
                              gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
               vd=%vd_sp vm=%vm_sp
  VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
               vd=%vd_sp vm=%vm_dp
 +
 +VRINTR_sp    ---- 1110 1.11 0110 .... 1010 01.0 .... \
 +             vd=%vd_sp vm=%vm_sp
 +VRINTR_dp    ---- 1110 1.11 0110 .... 1011 01.0 .... \
 +             vd=%vd_dp vm=%vm_dp
 +
 +VRINTZ_sp    ---- 1110 1.11 0110 .... 1010 11.0 .... \
 +             vd=%vd_sp vm=%vm_sp
 +VRINTZ_dp    ---- 1110 1.11 0110 .... 1011 11.0 .... \
 +             vd=%vd_dp vm=%vm_dp
 +
 +VRINTX_sp    ---- 1110 1.11 0111 .... 1010 01.0 .... \
 +             vd=%vd_sp vm=%vm_sp
 +VRINTX_dp    ---- 1110 1.11 0111 .... 1011 01.0 .... \
 +             vd=%vd_dp vm=%vm_dp
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 26/48] target/arm: Convert VFP VNMLS to decodetree
+[PULL 23/31] hw/arm/exynos4210: Fold combiner splits into exynos4210_init_board_irqs()
-Convert the VFP VNMLS instruction to decodetree.
+At this point, the function exynos4210_init_board_irqs() splits input
 IRQ lines to connect them to the input combiner, output combiner and
 external GIC.  The function exynos4210_combiner_get_gpioin() splits
 some of the combiner input lines further to connect them to multiple
 different inputs on the combiner.
 Because (unlike qemu_irq_split()) the TYPE_SPLIT_IRQ device has a
 configurable number of outputs, we can do all this in one place, by
 making exynos4210_init_board_irqs() add extra outputs to the splitter
 device when it must be connected to more than one input on each
 combiner.
 We do this with a new data structure, the combinermap, which is an
 array each of whose elements is a list of the interrupt IDs on the
 combiner which must be tied together.  As we loop through each
 interrupt ID, if we find that it is the first one in one of these
 lists, we configure the splitter device with enough extra outputs and
 wire them up to the other interrupt IDs in the list.
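The output-count rule described above can be sketched in standalone C. This is an illustration only, not the code from the patch: sample_map, sample_entry(), line_size() and splitter_outputs() are hypothetical names, only two combinermap lines are reproduced, and the IRQNO() arithmetic (group * 8 + bit) is an assumption inferred from the EXYNOS4210_COMBINER_GET_BIT_NUM() definition quoted in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: interrupt ID for (group, bit) is group * 8 + bit. */
#define IRQNO(G, B) ((G) * 8 + (B))
#define IRQNONE 0

#define SAMPLE_MAP_SIZE 2

/* Two lines copied from the combinermap in the patch; 0 terminates a line. */
static const int sample_map[SAMPLE_MAP_SIZE][6] = {
    /* MDNIE_LCD1 */
    { IRQNO(0, 4), IRQNO(1, 0), IRQNONE },
    /* Multi-core timer */
    { IRQNO(1, 4), IRQNO(12, 4), IRQNO(35, 4), IRQNO(51, 4), IRQNO(53, 4),
      IRQNONE },
};

/* Return the map line whose first entry is irq, or NULL if there is none. */
static const int *sample_entry(int irq)
{
    int i;

    for (i = 0; i < SAMPLE_MAP_SIZE; i++) {
        if (sample_map[i][0] == irq) {
            return sample_map[i];
        }
    }
    return NULL;
}

/* Number of combiner inputs tied together; 1 if the IRQ is not in the map. */
static int line_size(const int *line)
{
    int n = 0;

    if (!line) {
        return 1;
    }
    while (*line != IRQNONE) {
        line++;
        n++;
    }
    return n;
}

/*
 * Splitter outputs needed for one input IRQ: two per tied combiner input
 * (internal and external combiner), plus one if the IRQ also goes to the
 * external GIC.
 */
static int splitter_outputs(int irq, int goes_to_ext_gic)
{
    assert(irq >= 0);
    return 2 * line_size(sample_entry(irq)) + (goes_to_ext_gic ? 1 : 0);
}
```

Under these assumptions, an IRQ that is not in the map needs 2 outputs, IRQNO(0, 4) needs 4, and IRQNO(1, 4), which also feeds the external GIC as MCT_G0, needs 11.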
 Conveniently, for all the cases where this is necessary, the
 lowest-numbered interrupt ID in each group is in the range of the
 external combiner, so we only need to code for this in the first of
 the two loops in exynos4210_init_board_irqs().
 The old code in exynos4210_combiner_get_gpioin(), which is being
 deleted here, had several problems in its handling of the
 multi-core timer interrupts:
  (1) the case labels specified bits 4 ... 8, but bit '8' doesn't
      exist; these should have been 4 ... 7
  (2) it used the input irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]
      multiple times as the input of several different splitters,
      which isn't allowed
  (3) in an apparent cut-and-paste error, the cases for all the
      multi-core timer inputs used "bit + 4" even though the
      bit range for the case was (intended to be) 4 ... 7, which
      meant it was looking at non-existent bits 8 ... 11.
 None of these exist in the new code.
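Problem (3) can be checked with a little arithmetic. The sketch below is illustrative only: IRQ_NUM(), GRP_NUM() and BIT_NUM() are hypothetical stand-ins for the EXYNOS4210_COMBINER_GET_*() macros, assuming eight inputs per group (ID = group * 8 + bit), and old_mct_target() / new_mct_target() are made-up helper names that mirror the deleted "bit + 4" expression and the pairing given in the combinermap.

```c
/* Assumption: eight inputs per combiner group, ID = group * 8 + bit. */
#define IRQ_NUM(grp, bit) ((grp) * 8 + (bit))
#define GRP_NUM(irq)      ((irq) / 8)
#define BIT_NUM(irq)      ((irq) - 8 * GRP_NUM(irq))

/*
 * What the deleted code computed for a multi-core timer input n:
 * it added 4 to a bit number that was already in the range 4..7.
 */
static int old_mct_target(int n)
{
    return IRQ_NUM(1, BIT_NUM(n) + 4);
}

/* The pairing the combinermap encodes: same bit number, in group 1. */
static int new_mct_target(int n)
{
    return IRQ_NUM(1, BIT_NUM(n));
}
```

With these definitions, input (12, 4) was paired by the old expression with ID 16, which under the same numbering is bit 0 of group 2 rather than any bit of group 1; the combinermap pairs it with ID 12, that is (1, 4).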
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-17-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c | 42 ++++++++++++++++++++++++++++++++++
+ include/hw/arm/exynos4210.h |   6 +-
- target/arm/translate.c         | 24 +------------------
+ hw/arm/exynos4210.c         | 178 +++++++++++++++++++++++-------------
- target/arm/vfp.decode          |  5 ++++
+files changed, 119 insertions(+), 65 deletions(-)
-files changed, 48 insertions(+), 23 deletions(-)
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
+@@ -XXX,XX +XXX,XX @@
- {
-     return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
+ /*
- }
+  * We need one splitter for every external combiner input, plus
-+
+- * one for every non-zero entry in combiner_grp_to_gic_id[].
-+static void gen_VNMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
++ * one for every non-zero entry in combiner_grp_to_gic_id[],
 + * minus one for every external combiner ID in second or later
 + * places in a combinermap[] line.
   * We'll assert in exynos4210_init_board_irqs() if this is wrong.
   */
 -#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 54)
 +#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 38)
  typedef struct Exynos4210Irq {
      qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/exynos4210.c
 +++ b/hw/arm/exynos4210.c
@@ -XXX,XX +XXX,XX @@ combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
  #define EXYNOS4210_COMBINER_GET_BIT_NUM(irq) \
      ((irq) - 8 * EXYNOS4210_COMBINER_GET_GRP_NUM(irq))
 +/*
 + * Some interrupt lines go to multiple combiner inputs.
 + * This data structure defines those: each array element is
 + * a list of combiner inputs which are connected together;
 + * the one with the smallest interrupt ID value must be first.
 + * As with combiner_grp_to_gic_id[], we rely on (0, 0) not being
 + * wired to anything so we can use 0 as a terminator.
 + */
 +#define IRQNO(G, B) EXYNOS4210_COMBINER_GET_IRQ_NUM(G, B)
 +#define IRQNONE 0
 +
 +#define COMBINERMAP_SIZE 16
 +
 +static const int combinermap[COMBINERMAP_SIZE][6] = {
 +    /* MDNIE_LCD1 */
 +    { IRQNO(0, 4), IRQNO(1, 0), IRQNONE },
 +    { IRQNO(0, 5), IRQNO(1, 1), IRQNONE },
 +    { IRQNO(0, 6), IRQNO(1, 2), IRQNONE },
 +    { IRQNO(0, 7), IRQNO(1, 3), IRQNONE },
 +    /* TMU */
 +    { IRQNO(2, 4), IRQNO(3, 4), IRQNONE },
 +    { IRQNO(2, 5), IRQNO(3, 5), IRQNONE },
 +    { IRQNO(2, 6), IRQNO(3, 6), IRQNONE },
 +    { IRQNO(2, 7), IRQNO(3, 7), IRQNONE },
 +    /* LCD1 */
 +    { IRQNO(11, 4), IRQNO(12, 0), IRQNONE },
 +    { IRQNO(11, 5), IRQNO(12, 1), IRQNONE },
 +    { IRQNO(11, 6), IRQNO(12, 2), IRQNONE },
 +    { IRQNO(11, 7), IRQNO(12, 3), IRQNONE },
 +    /* Multi-core timer */
 +    { IRQNO(1, 4), IRQNO(12, 4), IRQNO(35, 4), IRQNO(51, 4), IRQNO(53, 4), IRQNONE },
 +    { IRQNO(1, 5), IRQNO(12, 5), IRQNO(35, 5), IRQNO(51, 5), IRQNO(53, 5), IRQNONE },
 +    { IRQNO(1, 6), IRQNO(12, 6), IRQNO(35, 6), IRQNO(51, 6), IRQNO(53, 6), IRQNONE },
 +    { IRQNO(1, 7), IRQNO(12, 7), IRQNO(35, 7), IRQNO(51, 7), IRQNO(53, 7), IRQNONE },
 +};
 +
 +#undef IRQNO
 +
 +static const int *combinermap_entry(int irq)
 +{
 +    /*
-+     * VNMLS: -fd + (fn * fm)
++     * If the interrupt number passed in is the first entry in some
-+     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
++     * line of the combinermap, return a pointer to that line;
-+     * plausible looking simplifications because this will give wrong results
++     * otherwise return NULL.
 +     * for NaNs.
 +     */
-+    TCGv_i32 tmp = tcg_temp_new_i32();
++    int i;
-+
++    for (i = 0; i < COMBINERMAP_SIZE; i++) {
-+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
++        if (combinermap[i][0] == irq) {
-+    gen_helper_vfp_negs(vd, vd);
++            return combinermap[i];
-+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
++        }
-+    tcg_temp_free_i32(tmp);
++    }
 +    return NULL;
 +}
 +
-+static bool trans_VNMLS_sp(DisasContext *s, arg_VNMLS_sp *a)
++static int mapline_size(const int *mapline)
 +{
-+    return do_vfp_3op_sp(s, gen_VNMLS_sp, a->vd, a->vn, a->vm, true);
++    /* Return number of entries in this mapline in total */
 +    int i = 0;
 +
 +    if (!mapline) {
 +        /* Not in the map? IRQ goes to exactly one combiner input */
 +        return 1;
 +    }
 +    while (*mapline != IRQNONE) {
 +        mapline++;
 +        i++;
 +    }
 +    return i;
 +}
 +
-+static void gen_VNMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+ /*
-+{
+  * Initialize board IRQs.
-+    /*
+  * These IRQs contain splitted Int/External Combiner and External Gic IRQs.
-+     * VNMLS: -fd + (fn * fm)
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
-+     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+     DeviceState *extgicdev = DEVICE(&s->ext_gic);
-+     * plausible looking simplifications because this will give wrong results
+     int splitcount = 0;
-+     * for NaNs.
+     DeviceState *splitter;
-+     */
++    const int *mapline;
-+    TCGv_i64 tmp = tcg_temp_new_i64();
++    int numlines, splitin, in;
-+
-+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+     for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
-+    gen_helper_vfp_negd(vd, vd);
+         irq_id = 0;
-+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
-+    tcg_temp_free_i64(tmp);
+             irq_id = EXT_GIC_ID_MCT_G1;
-+}
+         }
-+
-+static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
++        if (s->irq_table[n]) {
-+{
++            /*
-+    return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
++             * This must be some non-first entry in a combinermap line,
-+}
++             * and we've already filled it in.
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++             */
-index XXXXXXX..XXXXXXX 100644
++            continue;
---- a/target/arm/translate.c
++        }
-+++ b/target/arm/translate.c
++        mapline = combinermap_entry(n);
-@@ -XXX,XX +XXX,XX @@ VFP_OP2(div)
++        /*
++         * We need to connect the IRQ to multiple inputs on both combiners
- #undef VFP_OP2
++         * and possibly also to the external GIC.
++         */
--static inline void gen_vfp_F1_mul(int dp)
++        numlines = 2 * mapline_size(mapline);
--{
++        if (irq_id) {
--    /* Like gen_vfp_mul() but put result in F1 */
++            numlines++;
--    TCGv_ptr fpst = get_fpstatus_ptr(0);
++        }
--    if (dp) {
+         assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
--        gen_helper_vfp_muld(cpu_F1d, cpu_F0d, cpu_F1d, fpst);
+         splitter = DEVICE(&s->splitter[splitcount]);
--    } else {
+-        qdev_prop_set_uint16(splitter, "num-lines", irq_id ? 3 : 2);
--        gen_helper_vfp_muls(cpu_F1s, cpu_F0s, cpu_F1s, fpst);
++        qdev_prop_set_uint16(splitter, "num-lines", numlines);
--    }
+         qdev_realize(splitter, NULL, &error_abort);
--    tcg_temp_free_ptr(fpst);
+         splitcount++;
--}
+-        s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
--
+-        qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
- static inline void gen_vfp_F1_neg(int dp)
+-        qdev_connect_gpio_out(splitter, 1, is->ext_combiner_irq[n]);
 +
 +        in = n;
 +        splitin = 0;
 +        for (;;) {
 +            s->irq_table[in] = qdev_get_gpio_in(splitter, 0);
 +            qdev_connect_gpio_out(splitter, splitin, is->int_combiner_irq[in]);
 +            qdev_connect_gpio_out(splitter, splitin + 1, is->ext_combiner_irq[in]);
 +            splitin += 2;
 +            if (!mapline) {
 +                break;
 +            }
 +            mapline++;
 +            in = *mapline;
 +            if (in == IRQNONE) {
 +                break;
 +            }
 +        }
          if (irq_id) {
 -            qdev_connect_gpio_out(splitter, 2,
 +            qdev_connect_gpio_out(splitter, splitin,
                                    qdev_get_gpio_in(extgicdev, irq_id - 32));
          }
      }
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
          irq_id = combiner_grp_to_gic_id[grp -
                       EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][bit];
 +        if (s->irq_table[n]) {
 +            /*
 +             * This must be some non-first entry in a combinermap line,
 +             * and we've already filled it in.
 +             */
 +            continue;
 +        }
 +
          if (irq_id) {
              assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
              splitter = DEVICE(&s->splitter[splitcount]);
@@ -XXX,XX +XXX,XX @@ static void exynos4210_combiner_get_gpioin(Exynos4210Irq *irqs,
                                             DeviceState *dev, int ext)
  {
-     /* Like gen_vfp_neg() but put result in F1 */
+     int n;
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+-    int bit;
-             rn = VFP_SREG_N(insn);
+     int max;
+     qemu_irq *irq;
-             switch (op) {
--            case 0 ... 1:
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_combiner_get_gpioin(Exynos4210Irq *irqs,
-+            case 0 ... 2:
+         EXYNOS4210_MAX_INT_COMBINER_IN_IRQ;
-                 /* Already handled by decodetree */
+     irq = ext ? irqs->ext_combiner_irq : irqs->int_combiner_irq;
-                 return 1;
-             default:
+-    /*
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+-     * Some IRQs of Int/External Combiner are going to two Combiners groups,
-             for (;;) {
+-     * so let split them.
-                 /* Perform the calculation.  */
+-     */
-                 switch (op) {
+     for (n = 0; n < max; n++) {
--                case 2: /* VNMLS: -fd + (fn * fm) */
+-
--                    /* Note that it isn't valid to replace (-A + B) with (B - A)
+-        bit = EXYNOS4210_COMBINER_GET_BIT_NUM(n);
--                     * or similar plausible looking simplifications
+-
--                     * because this will give wrong results for NaNs.
+-        switch (n) {
--                     */
+-        /* MDNIE_LCD1 INTG1 */
--                    gen_vfp_F1_mul(dp);
+-        case EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 0) ...
--                    gen_mov_F0_vreg(dp, rd);
+-             EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 3):
--                    gen_vfp_neg(dp);
+-            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
--                    gen_vfp_add(dp);
+-                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(0, bit + 4)]);
--                    break;
+-            continue;
-                 case 3: /* VNMLA: -fd + -(fn * fm) */
+-
-                     gen_vfp_mul(dp);
+-        /* TMU INTG3 */
-                     gen_vfp_F1_neg(dp);
+-        case EXYNOS4210_COMBINER_GET_IRQ_NUM(3, 4):
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
+-            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
-index XXXXXXX..XXXXXXX 100644
+-                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(2, bit)]);
---- a/target/arm/vfp.decode
+-            continue;
-+++ b/target/arm/vfp.decode
+-
-@@ -XXX,XX +XXX,XX @@ VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 .... \
+-        /* LCD1 INTG12 */
-              vm=%vm_sp vn=%vn_sp vd=%vd_sp
+-        case EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 0) ...
- VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 .... \
+-             EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 3):
-              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+-            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
-+
+-                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(11, bit + 4)]);
-+VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 .... \
+-            continue;
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+-
-+VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 .... \
+-        /* Multi-Core Timer INTG12 */
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
+-        case EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 4) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 8):
 -               irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                       irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 -            continue;
 -
 -        /* Multi-Core Timer INTG35 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(35, 4) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(35, 8):
 -            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 -            continue;
 -
 -        /* Multi-Core Timer INTG51 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(51, 4) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(51, 8):
 -            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 -            continue;
 -
 -        /* Multi-Core Timer INTG53 */
 -        case EXYNOS4210_COMBINER_GET_IRQ_NUM(53, 4) ...
 -             EXYNOS4210_COMBINER_GET_IRQ_NUM(53, 8):
 -            irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
 -                    irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
 -            continue;
 -        }
 -
          irq[n] = qdev_get_gpio_in(dev, n);
      }
  }
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 12/48] target/arm: Convert the VSEL instructions to decodetree
+[PULL 24/31] hw/arm/exynos4210: Put combiners into state struct
-Convert the VSEL instructions to decodetree.
+Switch the creation of the combiner devices to the new-style
-We leave trans_VSEL() in translate.c for now as this allows
+"embedded in state struct" approach, so we can easily refer
-the patch to show just the changes from the old handle_vsel().
+to the object elsewhere during realize.
 In the old code the check for "do D16-D31 exist" was hidden in
 the VFP_DREG macro, and assumed that VFPv3 always implied that
 D16-D31 exist. In the new code we do the correct ID register test.
 This gives identical behaviour for most of our CPUs, and fixes
 previously incorrect handling for  Cortex-R5F, Cortex-M4 and
 Cortex-M33, which all implement VFPv3 or better with only 16
 double-precision registers.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-18-peter.maydell@linaro.org
 ---
- target/arm/cpu.h               |  6 ++++++
+ include/hw/arm/exynos4210.h           |  3 ++
- target/arm/translate-vfp.inc.c |  9 +++++++++
+ include/hw/intc/exynos4210_combiner.h | 57 +++++++++++++++++++++++++++
- target/arm/translate.c         | 35 ++++++++++++++++++++++++----------
+ hw/arm/exynos4210.c                   | 20 +++++-----
- target/arm/vfp-uncond.decode   | 19 ++++++++++++++++++
+ hw/intc/exynos4210_combiner.c         | 31 +--------------
-files changed, 59 insertions(+), 10 deletions(-)
+files changed, 72 insertions(+), 39 deletions(-)
  create mode 100644 include/hw/intc/exynos4210_combiner.h
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/cpu.h
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@
-     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
+ #include "hw/sysbus.h"
- }
+ #include "hw/cpu/a9mpcore.h"
+ #include "hw/intc/exynos4210_gic.h"
-+static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
++#include "hw/intc/exynos4210_combiner.h"
-+{
+ #include "hw/core/split-irq.h"
-+    /* Return true if D16-D31 are implemented */
+ #include "target/arm/cpu-qom.h"
-+    return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
+ #include "qom/object.h"
-+}
+@@ -XXX,XX +XXX,XX @@ struct Exynos4210State {
      qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
      A9MPPrivState a9mpcore;
      Exynos4210GicState ext_gic;
 +    Exynos4210CombinerState int_combiner;
 +    Exynos4210CombinerState ext_combiner;
      SplitIRQ splitter[EXYNOS4210_NUM_SPLITTERS];
  };
 diff --git a/include/hw/intc/exynos4210_combiner.h b/include/hw/intc/exynos4210_combiner.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/intc/exynos4210_combiner.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Samsung exynos4210 Interrupt Combiner
 + *
 + * Copyright (c) 2000 - 2011 Samsung Electronics Co., Ltd.
 + * All rights reserved.
 + *
 + * Evgeny Voevodin <e.voevodin@samsung.com>
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License as published by the
 + * Free Software Foundation; either version 2 of the License, or (at your
 + * option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 + * See the GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License along
 + * with this program; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
- /*
++#ifndef HW_INTC_EXYNOS4210_COMBINER
-  * We always set the FP and SIMD FP16 fields to indicate identical
++#define HW_INTC_EXYNOS4210_COMBINER
-  * levels of support (assuming SIMD is implemented at all), so
++
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
++#include "hw/sysbus.h"
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.inc.c
 +++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
      return true;
  }
 +
 +/*
-+ * The most usual kind of VFP access check, for everything except
++ * State for each output signal of internal combiner
 + * FMXR/FMRX to the always-available special registers.
 + */
-+static bool vfp_access_check(DisasContext *s)
++typedef struct CombinerGroupState {
-+{
++    uint8_t src_mask;            /* 1 - source enabled, 0 - disabled */
-+    return full_vfp_access_check(s, false);
++    uint8_t src_pending;        /* Pending source interrupts before masking */
-+}
++} CombinerGroupState;
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++
 +#define TYPE_EXYNOS4210_COMBINER "exynos4210.combiner"
 +OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210CombinerState, EXYNOS4210_COMBINER)
 +
 +/* Number of groups and total number of interrupts for the internal combiner */
 +#define IIC_NGRP 64
 +#define IIC_NIRQ (IIC_NGRP * 8)
 +#define IIC_REGSET_SIZE 0x41
 +
 +struct Exynos4210CombinerState {
 +    SysBusDevice parent_obj;
 +
 +    MemoryRegion iomem;
 +
 +    struct CombinerGroupState group[IIC_NGRP];
 +    uint32_t reg_set[IIC_REGSET_SIZE];
 +    uint32_t icipsr[2];
 +    uint32_t external;          /* 1 means that this combiner is external */
 +
 +    qemu_irq output_irq[IIC_NGRP];
 +};
 +
 +#endif
 diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/arm/exynos4210.c
-+++ b/target/arm/translate.c
++++ b/hw/arm/exynos4210.c
-@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
-     tcg_temp_free_i32(tmp);
+     }
      /* Internal Interrupt Combiner */
 -    dev = qdev_new("exynos4210.combiner");
 -    busdev = SYS_BUS_DEVICE(dev);
 -    sysbus_realize_and_unref(busdev, &error_fatal);
 +    busdev = SYS_BUS_DEVICE(&s->int_combiner);
 +    sysbus_realize(busdev, &error_fatal);
      for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
          sysbus_connect_irq(busdev, n,
                             qdev_get_gpio_in(DEVICE(&s->a9mpcore), n));
      }
 -    exynos4210_combiner_get_gpioin(&s->irqs, dev, 0);
 +    exynos4210_combiner_get_gpioin(&s->irqs, DEVICE(&s->int_combiner), 0);
      sysbus_mmio_map(busdev, 0, EXYNOS4210_INT_COMBINER_BASE_ADDR);
      /* External Interrupt Combiner */
 -    dev = qdev_new("exynos4210.combiner");
 -    qdev_prop_set_uint32(dev, "external", 1);
 -    busdev = SYS_BUS_DEVICE(dev);
 -    sysbus_realize_and_unref(busdev, &error_fatal);
 +    qdev_prop_set_uint32(DEVICE(&s->ext_combiner), "external", 1);
 +    busdev = SYS_BUS_DEVICE(&s->ext_combiner);
 +    sysbus_realize(busdev, &error_fatal);
      for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
          sysbus_connect_irq(busdev, n, qdev_get_gpio_in(DEVICE(&s->ext_gic), n));
      }
 -    exynos4210_combiner_get_gpioin(&s->irqs, dev, 1);
 +    exynos4210_combiner_get_gpioin(&s->irqs, DEVICE(&s->ext_combiner), 1);
      sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_COMBINER_BASE_ADDR);
      /* Initialize board IRQs. */
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init(Object *obj)
      object_initialize_child(obj, "a9mpcore", &s->a9mpcore, TYPE_A9MPCORE_PRIV);
      object_initialize_child(obj, "ext-gic", &s->ext_gic, TYPE_EXYNOS4210_GIC);
 +    object_initialize_child(obj, "int-combiner", &s->int_combiner,
 +                            TYPE_EXYNOS4210_COMBINER);
 +    object_initialize_child(obj, "ext-combiner", &s->ext_combiner,
 +                            TYPE_EXYNOS4210_COMBINER);
  }
--static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
+ static void exynos4210_class_init(ObjectClass *klass, void *data)
--                       uint32_t dp)
+diff --git a/hw/intc/exynos4210_combiner.c b/hw/intc/exynos4210_combiner.c
 +static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
  {
 -    uint32_t cc = extract32(insn, 20, 2);
 +    uint32_t rd, rn, rm;
 +    bool dp = a->dp;
 +
 +    if (!dc_isar_feature(aa32_vsel, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
 +        ((a->vm | a->vn | a->vd) & 0x10)) {
 +        return false;
 +    }
 +    rd = a->vd;
 +    rn = a->vn;
 +    rm = a->vm;
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
      if (dp) {
          TCGv_i64 frn, frm, dest;
@@ -XXX,XX +XXX,XX @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
          tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
          tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
 -        switch (cc) {
 +        switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
                                  frn, frm);
@@ -XXX,XX +XXX,XX @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
          dest = tcg_temp_new_i32();
          tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
          tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
 -        switch (cc) {
 +        switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
                                  frn, frm);
@@ -XXX,XX +XXX,XX @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
          tcg_temp_free_i32(zero);
      }
 -    return 0;
 +    return true;
  }
  static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
          rm = VFP_SREG_M(insn);
      }
 -    if ((insn & 0x0f800e50) == 0x0e000a00 && dc_isar_feature(aa32_vsel, s)) {
 -        return handle_vsel(insn, rd, rn, rm, dp);
 -    } else if ((insn & 0x0fb00e10) == 0x0e800a00 &&
 -               dc_isar_feature(aa32_vminmaxnm, s)) {
 +    if ((insn & 0x0fb00e10) == 0x0e800a00 &&
 +        dc_isar_feature(aa32_vminmaxnm, s)) {
          return handle_vminmaxnm(insn, rd, rn, rm, dp);
      } else if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
                 dc_isar_feature(aa32_vrint, s)) {
 diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp-uncond.decode
+--- a/hw/intc/exynos4210_combiner.c
-+++ b/target/arm/vfp-uncond.decode
++++ b/hw/intc/exynos4210_combiner.c
 @@ -XXX,XX +XXX,XX @@
- #  1111 1110 .... .... .... 101. .... ....
+ #include "hw/sysbus.h"
- # (but those patterns might also cover some Neon instructions,
+ #include "migration/vmstate.h"
- # which do not live in this file.)
+ #include "qemu/module.h"
-+
+-
-+# VFP registers have an odd encoding with a four-bit field
++#include "hw/intc/exynos4210_combiner.h"
-+# and a one-bit field which are assembled in different orders
+ #include "hw/arm/exynos4210.h"
-+# depending on whether the register is double or single precision.
+ #include "hw/hw.h"
-+# Each individual instruction function must do the checks for
+ #include "hw/irq.h"
-+# "double register selected but CPU does not have double support"
+@@ -XXX,XX +XXX,XX @@
-+# and "double register number has bit 4 set but CPU does not
+ #define DPRINTF(fmt, ...) do {} while (0)
-+# support D16-D31" (which should UNDEF).
+ #endif
-+%vm_dp  5:1 0:4
-+%vm_sp  0:4 5:1
+-#define    IIC_NGRP        64            /* Internal Interrupt Combiner
-+%vn_dp  7:1 16:4
+-                                            Groups number */
-+%vn_sp  16:4 7:1
+-#define    IIC_NIRQ        (IIC_NGRP * 8)/* Internal Interrupt Combiner
-+%vd_dp  22:1 12:4
+-                                            Interrupts number */
-+%vd_sp  12:4 22:1
+ #define IIC_REGION_SIZE    0x108         /* Size of memory mapped region */
-+
+-#define IIC_REGSET_SIZE    0x41
-+VSEL        1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
+-
-+            vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
+-/*
-+VSEL        1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
+- * State for each output signal of internal combiner
-+            vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
+- */
 -typedef struct CombinerGroupState {
 -    uint8_t src_mask;            /* 1 - source enabled, 0 - disabled */
 -    uint8_t src_pending;        /* Pending source interrupts before masking */
 -} CombinerGroupState;
 -
 -#define TYPE_EXYNOS4210_COMBINER "exynos4210.combiner"
 -OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210CombinerState, EXYNOS4210_COMBINER)
 -
 -struct Exynos4210CombinerState {
 -    SysBusDevice parent_obj;
 -
 -    MemoryRegion iomem;
 -
 -    struct CombinerGroupState group[IIC_NGRP];
 -    uint32_t reg_set[IIC_REGSET_SIZE];
 -    uint32_t icipsr[2];
 -    uint32_t external;          /* 1 means that this combiner is external */
 -
 -    qemu_irq output_irq[IIC_NGRP];
 -};
  static const VMStateDescription vmstate_exynos4210_combiner_group_state = {
      .name = "exynos4210.combiner.groupstate",
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 23/48] target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d
+[PULL 25/31] hw/arm/exynos4210: Drop Exynos4210Irq struct
-Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans
+The only time we use the int_combiner_irq[] and ext_combiner_irq[]
-functions which perform the memory accesses by going via the TCG
+arrays in the Exynos4210Irq struct is during realize of the SoC -- we
-globals cpu_F0s and cpu_F0d, to use local TCG temps instead.
+initialize them with the input IRQs of the combiner devices, and then
 connect those to outputs of other devices in
 exynos4210_init_board_irqs().  Now that the combiner objects are
 easily accessible as s->int_combiner and s->ext_combiner we can make
 the connections directly from one device to the other without going
 via these arrays.
 Since these are the only two remaining elements of Exynos4210Irq,
 we can remove that struct entirely.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220404154658.565020-19-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c | 46 +++++++++++++++++++++-------------
+ include/hw/arm/exynos4210.h |  6 ------
- target/arm/translate.c         | 18 -------------
+ hw/arm/exynos4210.c         | 34 ++++++++--------------------------
- 2 files changed, 28 insertions(+), 36 deletions(-)
+ 2 files changed, 8 insertions(+), 32 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/arm/exynos4210.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/arm/exynos4210.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
+@@ -XXX,XX +XXX,XX @@
- static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+  */
  #define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 38)
 -typedef struct Exynos4210Irq {
 -    qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 -    qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
 -} Exynos4210Irq;
 -
  struct Exynos4210State {
      /*< private >*/
      SysBusDevice parent_obj;
      /*< public >*/
      ARMCPU *cpu[EXYNOS4210_NCPUS];
 -    Exynos4210Irq irqs;
      qemu_irq irq_table[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
      MemoryRegion chipid_mem;
 diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/exynos4210.c
 +++ b/hw/arm/exynos4210.c
@@ -XXX,XX +XXX,XX @@ static int mapline_size(const int *mapline)
  static void exynos4210_init_board_irqs(Exynos4210State *s)
  {
-     uint32_t offset;
+     uint32_t grp, bit, irq_id, n;
--    TCGv_i32 addr;
+-    Exynos4210Irq *is = &s->irqs;
-+    TCGv_i32 addr, tmp;
+     DeviceState *extgicdev = DEVICE(&s->ext_gic);
++    DeviceState *intcdev = DEVICE(&s->int_combiner);
-     if (!vfp_access_check(s)) {
++    DeviceState *extcdev = DEVICE(&s->ext_combiner);
-         return true;
+     int splitcount = 0;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+     DeviceState *splitter;
-         addr = load_reg(s, a->rn);
+     const int *mapline;
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
          splitin = 0;
          for (;;) {
              s->irq_table[in] = qdev_get_gpio_in(splitter, 0);
 -            qdev_connect_gpio_out(splitter, splitin, is->int_combiner_irq[in]);
 -            qdev_connect_gpio_out(splitter, splitin + 1, is->ext_combiner_irq[in]);
 +            qdev_connect_gpio_out(splitter, splitin,
 +                                  qdev_get_gpio_in(intcdev, in));
 +            qdev_connect_gpio_out(splitter, splitin + 1,
 +                                  qdev_get_gpio_in(extcdev, in));
              splitin += 2;
              if (!mapline) {
                  break;
@@ -XXX,XX +XXX,XX @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
              qdev_realize(splitter, NULL, &error_abort);
              splitcount++;
              s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
 -            qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
 +            qdev_connect_gpio_out(splitter, 0, qdev_get_gpio_in(intcdev, n));
              qdev_connect_gpio_out(splitter, 1,
                                    qdev_get_gpio_in(extgicdev, irq_id - 32));
          } else {
 -            s->irq_table[n] = is->int_combiner_irq[n];
 +            s->irq_table[n] = qdev_get_gpio_in(intcdev, n);
          }
      }
-     tcg_gen_addi_i32(addr, addr, offset);
+     /*
-+    tmp = tcg_temp_new_i32();
+@@ -XXX,XX +XXX,XX @@ uint32_t exynos4210_get_irq(uint32_t grp, uint32_t bit)
-     if (a->l) {
+     return EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit);
--        gen_vfp_ld(s, false, addr);
+ }
--        gen_mov_vreg_F0(false, a->vd);
-+        gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+-/*
-+        neon_store_reg32(tmp, a->vd);
+- * Get Combiner input GPIO into irqs structure
-     } else {
+- */
--        gen_mov_F0_vreg(false, a->vd);
+-static void exynos4210_combiner_get_gpioin(Exynos4210Irq *irqs,
--        gen_vfp_st(s, false, addr);
+-                                           DeviceState *dev, int ext)
 +        neon_load_reg32(tmp, a->vd);
 +        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(addr);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
  {
      uint32_t offset;
      TCGv_i32 addr;
 +    TCGv_i64 tmp;
      /* UNDEF accesses to D16-D31 if they don't exist */
      if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
          addr = load_reg(s, a->rn);
      }
      tcg_gen_addi_i32(addr, addr, offset);
 +    tmp = tcg_temp_new_i64();
      if (a->l) {
 -        gen_vfp_ld(s, true, addr);
 -        gen_mov_vreg_F0(true, a->vd);
 +        gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 +        neon_store_reg64(tmp, a->vd);
      } else {
 -        gen_mov_F0_vreg(true, a->vd);
 -        gen_vfp_st(s, true, addr);
 +        neon_load_reg64(tmp, a->vd);
 +        gen_aa32_st64(s, tmp, addr, get_mem_index(s));
      }
 +    tcg_temp_free_i64(tmp);
      tcg_temp_free_i32(addr);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
  static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
  {
      uint32_t offset;
 -    TCGv_i32 addr;
 +    TCGv_i32 addr, tmp;
      int i, n;
      n = a->imm;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
      }
      offset = 4;
 +    tmp = tcg_temp_new_i32();
      for (i = 0; i < n; i++) {
          if (a->l) {
              /* load */
 -            gen_vfp_ld(s, false, addr);
 -            gen_mov_vreg_F0(false, a->vd + i);
 +            gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 +            neon_store_reg32(tmp, a->vd + i);
          } else {
              /* store */
 -            gen_mov_F0_vreg(false, a->vd + i);
 -            gen_vfp_st(s, false, addr);
 +            neon_load_reg32(tmp, a->vd + i);
 +            gen_aa32_st32(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
      }
 +    tcg_temp_free_i32(tmp);
      if (a->w) {
          /* writeback */
          if (a->p) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
  {
      uint32_t offset;
      TCGv_i32 addr;
 +    TCGv_i64 tmp;
      int i, n;
      n = a->imm >> 1;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
      }
      offset = 8;
 +    tmp = tcg_temp_new_i64();
      for (i = 0; i < n; i++) {
          if (a->l) {
              /* load */
 -            gen_vfp_ld(s, true, addr);
 -            gen_mov_vreg_F0(true, a->vd + i);
 +            gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 +            neon_store_reg64(tmp, a->vd + i);
          } else {
              /* store */
 -            gen_mov_F0_vreg(true, a->vd + i);
 -            gen_vfp_st(s, true, addr);
 +            neon_load_reg64(tmp, a->vd + i);
 +            gen_aa32_st64(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
      }
 +    tcg_temp_free_i64(tmp);
      if (a->w) {
          /* writeback */
          if (a->p) {
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_GEN_FIX(uhto, )
  VFP_GEN_FIX(ulto, )
  #undef VFP_GEN_FIX
 -static inline void gen_vfp_ld(DisasContext *s, int dp, TCGv_i32 addr)
 -{
--    if (dp) {
+-    int n;
--        gen_aa32_ld64(s, cpu_F0d, addr, get_mem_index(s));
+-    int max;
--    } else {
+-    qemu_irq *irq;
--        gen_aa32_ld32u(s, cpu_F0s, addr, get_mem_index(s));
+-
 -    max = ext ? EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ :
 -        EXYNOS4210_MAX_INT_COMBINER_IN_IRQ;
 -    irq = ext ? irqs->ext_combiner_irq : irqs->int_combiner_irq;
 -
 -    for (n = 0; n < max; n++) {
 -        irq[n] = qdev_get_gpio_in(dev, n);
 -    }
 -}
 -
--static inline void gen_vfp_st(DisasContext *s, int dp, TCGv_i32 addr)
+ static uint8_t chipid_and_omr[] = { 0x11, 0x02, 0x21, 0x43,
--{
+                                    0x09, 0x00, 0x00, 0x00 };
--    if (dp) {
--        gen_aa32_st64(s, cpu_F0d, addr, get_mem_index(s));
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
--    } else {
+         sysbus_connect_irq(busdev, n,
--        gen_aa32_st32(s, cpu_F0s, addr, get_mem_index(s));
+                            qdev_get_gpio_in(DEVICE(&s->a9mpcore), n));
--    }
+     }
--}
+-    exynos4210_combiner_get_gpioin(&s->irqs, DEVICE(&s->int_combiner), 0);
--
+     sysbus_mmio_map(busdev, 0, EXYNOS4210_INT_COMBINER_BASE_ADDR);
- static inline long vfp_reg_offset(bool dp, unsigned reg)
- {
+     /* External Interrupt Combiner */
-     if (dp) {
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
      for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
          sysbus_connect_irq(busdev, n, qdev_get_gpio_in(DEVICE(&s->ext_gic), n));
      }
 -    exynos4210_combiner_get_gpioin(&s->irqs, DEVICE(&s->ext_combiner), 1);
      sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_COMBINER_BASE_ADDR);
      /* Initialize board IRQs. */
 --
-.20.1
+.25.1
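
The exynos4210 patch above replaces a cached array of combiner input IRQs with on-demand lookups made while wiring each splitter. The following is a minimal, self-contained sketch of that wiring pattern. It is not QEMU code: the `Toy*` types and `toy_*` functions are hypothetical stand-ins for `qemu_irq`, `qdev_get_gpio_in()` and the split-IRQ device, chosen only to show one input line fanning out to two combiners without an intermediate array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a qemu_irq: a handler plus its target and line. */
typedef void (*toy_irq_handler)(void *opaque, int n, int level);

typedef struct ToyIrq {
    toy_irq_handler handler;
    void *opaque;
    int n;
} ToyIrq;

#define TOY_COMBINER_INPUTS 4

/* Toy "combiner": records the last level seen on each of its input lines. */
typedef struct ToyCombiner {
    ToyIrq in[TOY_COMBINER_INPUTS];
    int level[TOY_COMBINER_INPUTS];
} ToyCombiner;

static void toy_combiner_set(void *opaque, int n, int level)
{
    ToyCombiner *c = opaque;
    c->level[n] = level;
}

static void toy_combiner_init(ToyCombiner *c)
{
    for (int i = 0; i < TOY_COMBINER_INPUTS; i++) {
        c->in[i] = (ToyIrq){ toy_combiner_set, c, i };
        c->level[i] = 0;
    }
}

/* Look up input line n on demand, instead of caching all lines in an array. */
static ToyIrq *toy_get_gpio_in(ToyCombiner *c, int n)
{
    assert(n >= 0 && n < TOY_COMBINER_INPUTS);
    return &c->in[n];
}

/* Toy two-output splitter: one input level is forwarded to both outputs. */
typedef struct ToySplitter {
    ToyIrq *out[2];
} ToySplitter;

static void toy_splitter_connect(ToySplitter *s, int idx, ToyIrq *target)
{
    assert(idx == 0 || idx == 1);
    s->out[idx] = target;
}

static void toy_splitter_set(ToySplitter *s, int level)
{
    for (int i = 0; i < 2; i++) {
        if (s->out[i]) {
            s->out[i]->handler(s->out[i]->opaque, s->out[i]->n, level);
        }
    }
}
```

In this sketch the splitter is connected with `toy_splitter_connect(&sp, 0, toy_get_gpio_in(&intc, n))`, so no intermediate array has to outlive the wiring step. That is the property the patch relies on when it removes the struct.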

-[Qemu-devel] [PULL 18/48] target/arm: Convert "double-precision" register moves to decodetree
+[PULL 26/31] hw/arm/realview: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
-Convert the "double-precision" register moves to decodetree:
+From: Zongyuan Li <zongyuan.li@smartx.com>
 this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP.
-Note that the conversion process has tightened up a few of the
+Signed-off-by: Zongyuan Li <zongyuan.li@smartx.com>
-UNDEF encoding checks: we now correctly forbid:
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
- * VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10
+Message-id: 20220324181557.203805-2-zongyuan.li@smartx.com
  * VMOV-from-gpr with opc1:opc2 == 0x10
  * VDUP with B:E == 11
  * VDUP with Q == 1 and Vn<0> == 1
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
-The accesses of elements < 32 bits could be improved by doing
+ hw/arm/realview.c | 33 ++++++++++++++++++++++++---------
-direct ld/st of the right size rather than 32-bit read-and-shift
+ 1 file changed, 24 insertions(+), 9 deletions(-)
 or read-modify-write, but we leave this for later cleanup,
 since this series is generally trying to stick to fixing
 the decode.
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/translate-vfp.inc.c | 147 +++++++++++++++++++++++++++++++++
  target/arm/translate.c         |  83 +------------------
  target/arm/vfp.decode          |  36 ++++++++
  3 files changed, 185 insertions(+), 81 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/hw/arm/realview.c b/hw/arm/realview.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/hw/arm/realview.c
-+++ b/target/arm/translate-vfp.inc.c
++++ b/hw/arm/realview.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/sysbus.h"
-     return true;
+ #include "hw/arm/boot.h"
- }
+ #include "hw/arm/primecell.h"
 +#include "hw/core/split-irq.h"
  #include "hw/net/lan9118.h"
  #include "hw/net/smc91c111.h"
  #include "hw/pci/pci.h"
 +#include "hw/qdev-core.h"
  #include "net/net.h"
  #include "sysemu/sysemu.h"
  #include "hw/boards.h"
@@ -XXX,XX +XXX,XX @@ static const int realview_board_id[] = {
      0x76d
  };
 +static void split_irq_from_named(DeviceState *src, const char* outname,
 +                                 qemu_irq out1, qemu_irq out2) {
 +    DeviceState *splitter = qdev_new(TYPE_SPLIT_IRQ);
 +
-+static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
++    qdev_prop_set_uint32(splitter, "num-lines", 2);
 +{
 +    /* VMOV scalar to general purpose register */
 +    TCGv_i32 tmp;
 +    int pass;
 +    uint32_t offset;
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist */
++    qdev_realize_and_unref(splitter, NULL, &error_fatal);
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
 +        return false;
 +    }
 +
-+    offset = a->index << a->size;
++    qdev_connect_gpio_out(splitter, 0, out1);
-+    pass = extract32(offset, 2, 1);
++    qdev_connect_gpio_out(splitter, 1, out2);
-+    offset = extract32(offset, 0, 2) * 8;
++    qdev_connect_gpio_out_named(src, outname, 0,
-+
++                                qdev_get_gpio_in(splitter, 0));
 +    if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = neon_load_reg(a->vn, pass);
 +    switch (a->size) {
 +    case 0:
 +        if (offset) {
 +            tcg_gen_shri_i32(tmp, tmp, offset);
 +        }
 +        if (a->u) {
 +            gen_uxtb(tmp);
 +        } else {
 +            gen_sxtb(tmp);
 +        }
 +        break;
 +    case 1:
 +        if (a->u) {
 +            if (offset) {
 +                tcg_gen_shri_i32(tmp, tmp, 16);
 +            } else {
 +                gen_uxth(tmp);
 +            }
 +        } else {
 +            if (offset) {
 +                tcg_gen_sari_i32(tmp, tmp, 16);
 +            } else {
 +                gen_sxth(tmp);
 +            }
 +        }
 +        break;
 +    case 2:
 +        break;
 +    }
 +    store_reg(s, a->rt, tmp);
 +
 +    return true;
 +}
 +
-+static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
+ static void realview_init(MachineState *machine,
-+{
+                           enum realview_board_type board_type)
-+    /* VMOV general purpose register to scalar */
+ {
-+    TCGv_i32 tmp, tmp2;
+@@ -XXX,XX +XXX,XX @@ static void realview_init(MachineState *machine,
-+    int pass;
+     DeviceState *dev, *sysctl, *gpio2, *pl041;
-+    uint32_t offset;
+     SysBusDevice *busdev;
      qemu_irq pic[64];
 -    qemu_irq mmc_irq[2];
      PCIBus *pci_bus = NULL;
      NICInfo *nd;
      DriveInfo *dinfo;
@@ -XXX,XX +XXX,XX @@ static void realview_init(MachineState *machine,
       * and the PL061 has them the other way about. Also the card
       * detect line is inverted.
       */
 -    mmc_irq[0] = qemu_irq_split(
 -        qdev_get_gpio_in(sysctl, ARM_SYSCTL_GPIO_MMC_WPROT),
 -        qdev_get_gpio_in(gpio2, 1));
 -    mmc_irq[1] = qemu_irq_split(
 -        qdev_get_gpio_in(sysctl, ARM_SYSCTL_GPIO_MMC_CARDIN),
 -        qemu_irq_invert(qdev_get_gpio_in(gpio2, 0)));
 -    qdev_connect_gpio_out_named(dev, "card-read-only", 0, mmc_irq[0]);
 -    qdev_connect_gpio_out_named(dev, "card-inserted", 0, mmc_irq[1]);
 +    split_irq_from_named(dev, "card-read-only",
 +                   qdev_get_gpio_in(sysctl, ARM_SYSCTL_GPIO_MMC_WPROT),
 +                   qdev_get_gpio_in(gpio2, 1));
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist */
++    split_irq_from_named(dev, "card-inserted",
-+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
++                   qdev_get_gpio_in(sysctl, ARM_SYSCTL_GPIO_MMC_CARDIN),
-+        return false;
++                   qemu_irq_invert(qdev_get_gpio_in(gpio2, 0)));
 +    }
 +
-+    offset = a->index << a->size;
+     dinfo = drive_get(IF_SD, 0, 0);
-+    pass = extract32(offset, 2, 1);
+     if (dinfo) {
-+    offset = extract32(offset, 0, 2) * 8;
+         DeviceState *card;
 +
 +    if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = load_reg(s, a->rt);
 +    switch (a->size) {
 +    case 0:
 +        tmp2 = neon_load_reg(a->vn, pass);
 +        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
 +        tcg_temp_free_i32(tmp2);
 +        break;
 +    case 1:
 +        tmp2 = neon_load_reg(a->vn, pass);
 +        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
 +        tcg_temp_free_i32(tmp2);
 +        break;
 +    case 2:
 +        break;
 +    }
 +    neon_store_reg(a->vn, pass, tmp);
 +
 +    return true;
 +}
 +
 +static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
 +{
 +    /* VDUP (general purpose register) */
 +    TCGv_i32 tmp;
 +    int size, vec_size;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
 +        return false;
 +    }
 +
 +    if (a->b && a->e) {
 +        return false;
 +    }
 +
 +    if (a->q && (a->vn & 1)) {
 +        return false;
 +    }
 +
 +    vec_size = a->q ? 16 : 8;
 +    if (a->b) {
 +        size = 0;
 +    } else if (a->e) {
 +        size = 1;
 +    } else {
 +        size = 2;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = load_reg(s, a->rt);
 +    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
 +                         vec_size, vec_size, tmp);
 +    tcg_temp_free_i32(tmp);
 +
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              /* single register transfer */
              rd = (insn >> 12) & 0xf;
              if (dp) {
 -                int size;
 -                int pass;
 -
 -                VFP_DREG_N(rn, insn);
 -                if (insn & 0xf)
 -                    return 1;
 -                if (insn & 0x00c00060
 -                    && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -                    return 1;
 -                }
 -
 -                pass = (insn >> 21) & 1;
 -                if (insn & (1 << 22)) {
 -                    size = 0;
 -                    offset = ((insn >> 5) & 3) * 8;
 -                } else if (insn & (1 << 5)) {
 -                    size = 1;
 -                    offset = (insn & (1 << 6)) ? 16 : 0;
 -                } else {
 -                    size = 2;
 -                    offset = 0;
 -                }
 -                if (insn & ARM_CP_RW_BIT) {
 -                    /* vfp->arm */
 -                    tmp = neon_load_reg(rn, pass);
 -                    switch (size) {
 -                    case 0:
 -                        if (offset)
 -                            tcg_gen_shri_i32(tmp, tmp, offset);
 -                        if (insn & (1 << 23))
 -                            gen_uxtb(tmp);
 -                        else
 -                            gen_sxtb(tmp);
 -                        break;
 -                    case 1:
 -                        if (insn & (1 << 23)) {
 -                            if (offset) {
 -                                tcg_gen_shri_i32(tmp, tmp, 16);
 -                            } else {
 -                                gen_uxth(tmp);
 -                            }
 -                        } else {
 -                            if (offset) {
 -                                tcg_gen_sari_i32(tmp, tmp, 16);
 -                            } else {
 -                                gen_sxth(tmp);
 -                            }
 -                        }
 -                        break;
 -                    case 2:
 -                        break;
 -                    }
 -                    store_reg(s, rd, tmp);
 -                } else {
 -                    /* arm->vfp */
 -                    tmp = load_reg(s, rd);
 -                    if (insn & (1 << 23)) {
 -                        /* VDUP */
 -                        int vec_size = pass ? 16 : 8;
 -                        tcg_gen_gvec_dup_i32(size, neon_reg_offset(rn, 0),
 -                                             vec_size, vec_size, tmp);
 -                        tcg_temp_free_i32(tmp);
 -                    } else {
 -                        /* VMOV */
 -                        switch (size) {
 -                        case 0:
 -                            tmp2 = neon_load_reg(rn, pass);
 -                            tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
 -                            tcg_temp_free_i32(tmp2);
 -                            break;
 -                        case 1:
 -                            tmp2 = neon_load_reg(rn, pass);
 -                            tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
 -                            tcg_temp_free_i32(tmp2);
 -                            break;
 -                        case 2:
 -                            break;
 -                        }
 -                        neon_store_reg(rn, pass, tmp);
 -                    }
 -                }
 +                /* already handled by decodetree */
 +                return 1;
              } else { /* !dp */
                  bool is_sysreg;
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@
  #  1110 1110 .... .... .... 101. .... ....
  # (but those patterns might also cover some Neon instructions,
  # which do not live in this file.)
 +
 +# VFP registers have an odd encoding with a four-bit field
 +# and a one-bit field which are assembled in different orders
 +# depending on whether the register is double or single precision.
 +# Each individual instruction function must do the checks for
 +# "double register selected but CPU does not have double support"
 +# and "double register number has bit 4 set but CPU does not
 +# support D16-D31" (which should UNDEF).
 +%vm_dp  5:1 0:4
 +%vm_sp  0:4 5:1
 +%vn_dp  7:1 16:4
 +%vn_sp  16:4 7:1
 +%vd_dp  22:1 12:4
 +%vd_sp  12:4 22:1
 +
 +%vmov_idx_b     21:1 5:2
 +%vmov_idx_h     21:1 6:1
 +
 +# VMOV scalar to general-purpose register; note that this does
 +# include some Neon cases.
 +VMOV_to_gp   ---- 1110 u:1 1.        1 .... rt:4 1011 ... 1 0000 \
 +             vn=%vn_dp size=0 index=%vmov_idx_b
 +VMOV_to_gp   ---- 1110 u:1 0.        1 .... rt:4 1011 ..1 1 0000 \
 +             vn=%vn_dp size=1 index=%vmov_idx_h
 +VMOV_to_gp   ---- 1110 0   0 index:1 1 .... rt:4 1011 .00 1 0000 \
 +             vn=%vn_dp size=2 u=0
 +
 +VMOV_from_gp ---- 1110 0 1.        0 .... rt:4 1011 ... 1 0000 \
 +             vn=%vn_dp size=0 index=%vmov_idx_b
 +VMOV_from_gp ---- 1110 0 0.        0 .... rt:4 1011 ..1 1 0000 \
 +             vn=%vn_dp size=1 index=%vmov_idx_h
 +VMOV_from_gp ---- 1110 0 0 index:1 0 .... rt:4 1011 .00 1 0000 \
 +             vn=%vn_dp size=2
 +
 +VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
 +             vn=%vn_dp
 --
-.20.1
+.25.1
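
The vfp.decode comment above says that a VFP register number is built from a four-bit field and a one-bit field, joined in a different order for double and single precision. The sketch below works that out in plain C. It assumes the decodetree convention that the first sub-field listed in a `%field` definition is the most significant. The helper names (`bits`, `vm_dp`, `vm_sp` and so on) are illustrative and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'len' bits of 'insn' starting at bit position 'pos'. */
static uint32_t bits(uint32_t insn, unsigned pos, unsigned len)
{
    assert(len > 0 && len < 32 && pos + len <= 32);
    return (insn >> pos) & ((1u << len) - 1);
}

/* %vm_dp 5:1 0:4  -> one-bit field is the top bit of the D register number */
static uint32_t vm_dp(uint32_t insn)
{
    return (bits(insn, 5, 1) << 4) | bits(insn, 0, 4);
}

/* %vm_sp 0:4 5:1  -> one-bit field is the bottom bit of the S register number */
static uint32_t vm_sp(uint32_t insn)
{
    return (bits(insn, 0, 4) << 1) | bits(insn, 5, 1);
}

/* %vn_dp 7:1 16:4 and %vn_sp 16:4 7:1 */
static uint32_t vn_dp(uint32_t insn)
{
    return (bits(insn, 7, 1) << 4) | bits(insn, 16, 4);
}

static uint32_t vn_sp(uint32_t insn)
{
    return (bits(insn, 16, 4) << 1) | bits(insn, 7, 1);
}

/* %vd_dp 22:1 12:4 and %vd_sp 12:4 22:1 */
static uint32_t vd_dp(uint32_t insn)
{
    return (bits(insn, 22, 1) << 4) | bits(insn, 12, 4);
}

static uint32_t vd_sp(uint32_t insn)
{
    return (bits(insn, 12, 4) << 1) | bits(insn, 22, 1);
}

/* A D register number with bit 4 set names one of D16-D31. */
static int dreg_is_upper_bank(uint32_t dreg)
{
    return (dreg & 0x10) != 0;
}
```

With the four-bit field equal to 3 and the one-bit field equal to 1, the double-precision form gives register 19 and the single-precision form gives register 7. In the patch, this is why each `trans_*` function tests `a->vn & 0x10` (or `a->vm & 0x10`) against the `aa32_fp_d32` feature.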

-[Qemu-devel] [PULL 07/48] decodetree: Fix comparison of Field
+[PULL 27/31] hw/arm/stellaris: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Zongyuan Li <zongyuan.li@smartx.com>
-Typo comparing the sign of the field, twice, instead of also comparing
+Signed-off-by: Zongyuan Li <zongyuan.li@smartx.com>
 the mask of the field (which itself encodes both position and length).
 Reported-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20190604154225.26992-1-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20220324181557.203805-3-zongyuan.li@smartx.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- scripts/decodetree.py | 2 +-
+ hw/arm/stellaris.c | 15 +++++++++++++--
- 1 file changed, 1 insertion(+), 1 deletion(-)
+ 1 file changed, 13 insertions(+), 2 deletions(-)
-diff --git a/scripts/decodetree.py b/scripts/decodetree.py
+diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/scripts/decodetree.py
+--- a/hw/arm/stellaris.c
-+++ b/scripts/decodetree.py
++++ b/hw/arm/stellaris.c
-@@ -XXX,XX +XXX,XX @@ class Field:
+@@ -XXX,XX +XXX,XX @@
-         return '{0}(insn, {1}, {2})'.format(extr, self.pos, self.len)
+ #include "qemu/osdep.h"
-     def __eq__(self, other):
+ #include "qapi/error.h"
--        return self.sign == other.sign and self.sign == other.sign
++#include "hw/core/split-irq.h"
-+        return self.sign == other.sign and self.mask == other.mask
+ #include "hw/sysbus.h"
+ #include "hw/sd/sd.h"
-     def __ne__(self, other):
+ #include "hw/ssi/ssi.h"
-         return not self.__eq__(other)
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
              DeviceState *ssddev;
              DriveInfo *dinfo;
              DeviceState *carddev;
 +            DeviceState *gpio_d_splitter;
              BlockBackend *blk;
              /*
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
                                     &error_fatal);
              ssddev = ssi_create_peripheral(bus, "ssd0323");
 -            gpio_out[GPIO_D][0] = qemu_irq_split(
 -                    qdev_get_gpio_in_named(sddev, SSI_GPIO_CS, 0),
 +
 +            gpio_d_splitter = qdev_new(TYPE_SPLIT_IRQ);
 +            qdev_prop_set_uint32(gpio_d_splitter, "num-lines", 2);
 +            qdev_realize_and_unref(gpio_d_splitter, NULL, &error_fatal);
 +            qdev_connect_gpio_out(
 +                    gpio_d_splitter, 0,
 +                    qdev_get_gpio_in_named(sddev, SSI_GPIO_CS, 0));
 +            qdev_connect_gpio_out(
 +                    gpio_d_splitter, 1,
                      qdev_get_gpio_in_named(ssddev, SSI_GPIO_CS, 0));
 +            gpio_out[GPIO_D][0] = qdev_get_gpio_in(gpio_d_splitter, 0);
 +
              gpio_out[GPIO_C][7] = qdev_get_gpio_in(ssddev, 0);
              /* Make sure the select pin is high.  */
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 41/48] target/arm: Convert the VCVT-to-f16 insns to decodetree
+[PULL 28/31] hw/core/irq: remove unused 'qemu_irq_split' function
-Convert the VCVTT and VCVTB instructions which convert from
+From: Zongyuan Li <zongyuan.li@smartx.com>
 f32 and f64 to f16 to decodetree.
-Since we're no longer constrained to the old decoder's style
+Signed-off-by: Zongyuan Li <zongyuan.li@smartx.com>
-using cpu_F0s and cpu_F0d we can perform a direct 16 bit
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-store of the right half of the input single-precision register
+Message-id: 20220324181557.203805-5-zongyuan.li@smartx.com
-rather than doing a load/modify/store sequence on the full
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/811
-32 bits.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  include/hw/irq.h |  5 -----
  hw/core/irq.c    | 15 ---------------
  2 files changed, 20 deletions(-)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/include/hw/irq.h b/include/hw/irq.h
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/translate-vfp.inc.c | 62 ++++++++++++++++++++++++++
  target/arm/translate.c         | 79 +---------------------------------
  target/arm/vfp.decode          |  6 +++
  3 files changed, 69 insertions(+), 78 deletions(-)
 diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/irq.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/irq.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
+@@ -XXX,XX +XXX,XX @@ void qemu_free_irq(qemu_irq irq);
-     tcg_temp_free_i64(vd);
+ /* Returns a new IRQ with opposite polarity.  */
-     return true;
+ qemu_irq qemu_irq_invert(qemu_irq irq);
 -/* Returns a new IRQ which feeds into both the passed IRQs.
 - * It's probably better to use the TYPE_SPLIT_IRQ device instead.
 - */
 -qemu_irq qemu_irq_split(qemu_irq irq1, qemu_irq irq2);
 -
  /* For internal use in qtest.  Similar to qemu_irq_split, but operating
     on an existing vector of qemu_irq.  */
  void qemu_irq_intercept_in(qemu_irq *gpio_in, qemu_irq_handler handler, int n);
 diff --git a/hw/core/irq.c b/hw/core/irq.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/core/irq.c
 +++ b/hw/core/irq.c
@@ -XXX,XX +XXX,XX @@ qemu_irq qemu_irq_invert(qemu_irq irq)
      return qemu_allocate_irq(qemu_notirq, irq, 0);
  }
-+
-+static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
+-static void qemu_splitirq(void *opaque, int line, int level)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i32 ahp_mode;
 +    TCGv_i32 tmp;
 +
 +    if (!dc_isar_feature(aa32_fp16_spconv, s)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = get_fpstatus_ptr(false);
 +    ahp_mode = get_ahp_flag();
 +    tmp = tcg_temp_new_i32();
 +
 +    neon_load_reg32(tmp, a->vm);
 +    gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
 +    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
 +    tcg_temp_free_i32(ahp_mode);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 +
 +static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i32 ahp_mode;
 +    TCGv_i32 tmp;
 +    TCGv_i64 vm;
 +
 +    if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm  & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = get_fpstatus_ptr(false);
 +    ahp_mode = get_ahp_flag();
 +    tmp = tcg_temp_new_i32();
 +    vm = tcg_temp_new_i64();
 +
 +    neon_load_reg64(vm, a->vm);
 +    gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
 +    tcg_temp_free_i64(vm);
 +    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
 +    tcg_temp_free_i32(ahp_mode);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
  #define VFP_SREG_M(insn) VFP_SREG(insn,  0,  5)
  #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
 -/* Move between integer and VFP cores.  */
 -static TCGv_i32 gen_vfp_mrs(void)
 -{
--    TCGv_i32 tmp = tcg_temp_new_i32();
+-    struct IRQState **irq = opaque;
--    tcg_gen_mov_i32(tmp, cpu_F0s);
+-    irq[0]->handler(irq[0]->opaque, irq[0]->n, level);
--    return tmp;
+-    irq[1]->handler(irq[1]->opaque, irq[1]->n, level);
 -}
 -
--static void gen_vfp_msr(TCGv_i32 tmp)
+-qemu_irq qemu_irq_split(qemu_irq irq1, qemu_irq irq2)
 -{
--    tcg_gen_mov_i32(cpu_F0s, tmp);
+-    qemu_irq *s = g_new0(qemu_irq, 2);
--    tcg_temp_free_i32(tmp);
+-    s[0] = irq1;
 -    s[1] = irq2;
 -    return qemu_allocate_irq(qemu_splitirq, s, 0);
 -}
 -
- static void gen_neon_dup_low16(TCGv_i32 var)
+ void qemu_irq_intercept_in(qemu_irq *gpio_in, qemu_irq_handler handler, int n)
  {
-     TCGv_i32 tmp = tcg_temp_new_i32();
+     int i;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
  {
      uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
      int dp, veclen;
 -    TCGv_i32 tmp;
 -    TCGv_i32 tmp2;
      if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
          return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  return 1;
              case 15:
                  switch (rn) {
 -                case 0 ... 5:
 -                case 8 ... 11:
 +                case 0 ... 11:
                      /* Already handled by decodetree */
                      return 1;
                  default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              if (op == 15) {
                  /* rn is opcode, encoded as per VFP_SREG_N. */
                  switch (rn) {
 -                case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
 -                case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
 -                    if (dp) {
 -                        if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
 -                            return 1;
 -                        }
 -                    } else {
 -                        if (!dc_isar_feature(aa32_fp16_spconv, s)) {
 -                            return 1;
 -                        }
 -                    }
 -                    rd_is_dp = false;
 -                    break;
 -
                  case 0x0c: /* vrintr */
                  case 0x0d: /* vrintz */
                  case 0x0e: /* vrintx */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  switch (op) {
                  case 15: /* extension space */
                      switch (rn) {
 -                    case 6: /* vcvtb.f16.f32, vcvtb.f16.f64 */
 -                    {
 -                        TCGv_ptr fpst = get_fpstatus_ptr(false);
 -                        TCGv_i32 ahp = get_ahp_flag();
 -                        tmp = tcg_temp_new_i32();
 -
 -                        if (dp) {
 -                            gen_helper_vfp_fcvt_f64_to_f16(tmp, cpu_F0d,
 -                                                           fpst, ahp);
 -                        } else {
 -                            gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s,
 -                                                           fpst, ahp);
 -                        }
 -                        tcg_temp_free_i32(ahp);
 -                        tcg_temp_free_ptr(fpst);
 -                        gen_mov_F0_vreg(0, rd);
 -                        tmp2 = gen_vfp_mrs();
 -                        tcg_gen_andi_i32(tmp2, tmp2, 0xffff0000);
 -                        tcg_gen_or_i32(tmp, tmp, tmp2);
 -                        tcg_temp_free_i32(tmp2);
 -                        gen_vfp_msr(tmp);
 -                        break;
 -                    }
 -                    case 7: /* vcvtt.f16.f32, vcvtt.f16.f64 */
 -                    {
 -                        TCGv_ptr fpst = get_fpstatus_ptr(false);
 -                        TCGv_i32 ahp = get_ahp_flag();
 -                        tmp = tcg_temp_new_i32();
 -                        if (dp) {
 -                            gen_helper_vfp_fcvt_f64_to_f16(tmp, cpu_F0d,
 -                                                           fpst, ahp);
 -                        } else {
 -                            gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s,
 -                                                           fpst, ahp);
 -                        }
 -                        tcg_temp_free_i32(ahp);
 -                        tcg_temp_free_ptr(fpst);
 -                        tcg_gen_shli_i32(tmp, tmp, 16);
 -                        gen_mov_F0_vreg(0, rd);
 -                        tmp2 = gen_vfp_mrs();
 -                        tcg_gen_ext16u_i32(tmp2, tmp2);
 -                        tcg_gen_or_i32(tmp, tmp, tmp2);
 -                        tcg_temp_free_i32(tmp2);
 -                        gen_vfp_msr(tmp);
 -                        break;
 -                    }
                      case 12: /* vrintr */
                      {
                          TCGv_ptr fpst = get_fpstatus_ptr(0);
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f32_f16 ---- 1110 1.11 0010 .... 1010 t:1 1.0 .... \
               vd=%vd_sp vm=%vm_sp
  VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
               vd=%vd_dp vm=%vm_sp
 +
 +# VCVTB and VCVTT to f16: Vd format is always vd_sp; Vm format depends on size bit
 +VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
 +             vd=%vd_sp vm=%vm_sp
 +VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
 +             vd=%vd_sp vm=%vm_dp
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 06/48] target/arm: Fix output of PAuth Auth
+[PULL 29/31] hw/arm/virt: impact of gic-version on max CPUs
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
-The ARM pseudocode installs the error_code into the original
+Describe that the gic-version influences the maximum number of CPUs.
 pointer, not the encrypted pointer.  The difference applies
 within the 7 bits of pac data; the result should be the sign
 extension of bit 55.
-Add a testcase to that effect.
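
As a rough illustration of the PAuth fix described above, the failed-authentication result can be sketched in plain C. The helper names deposit_bits() and auth_fail_result() are hypothetical stand-ins, not QEMU functions. The sketch assumes the caller already holds the original pointer, that is, the pointer with the PAC field replaced by the sign extension of bit 55.

```c
#include <stdbool.h>
#include <stdint.h>

/* Replace `len` bits of `value`, starting at bit `start`, with `field`. */
static uint64_t deposit_bits(uint64_t value, int start, int len, uint64_t field)
{
    uint64_t mask = ((UINT64_C(1) << len) - 1) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/*
 * Result of a failed authentication: the two-bit error code is written
 * into the original (PAC-stripped) pointer, at bits [54:53] when TBI is
 * enabled and at bits [62:61] otherwise.
 */
static uint64_t auth_fail_result(uint64_t orig_ptr, int keynumber, bool tbi)
{
    uint64_t error_code = ((uint64_t)keynumber << 1) | (uint64_t)(keynumber ^ 1);
    return deposit_bits(orig_ptr, tbi ? 53 : 61, 2, error_code);
}
```

With keynumber 0 and TBI enabled, bits [55:48] of the result come out as 0b00100000 for a pointer with bit 55 clear and 0b10111111 for one with bit 55 set, which are the values the pauth-2 test asserts.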
+Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
+Message-id: 20220413231456.35811-1-heinrich.schuchardt@canonical.com
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+[PMM: minor punctuation tweaks]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- tests/tcg/aarch64/Makefile.target |  2 +-
+ docs/system/arm/virt.rst | 4 ++--
- target/arm/pauth_helper.c         |  4 +-
+file changed, 2 insertions(+), 2 deletions(-)
  tests/tcg/aarch64/pauth-2.c       | 61 +++++++++++++++++++++++++++++++
 files changed, 64 insertions(+), 3 deletions(-)
  create mode 100644 tests/tcg/aarch64/pauth-2.c
-diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
+diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
 index XXXXXXX..XXXXXXX 100644
---- a/tests/tcg/aarch64/Makefile.target
+--- a/docs/system/arm/virt.rst
-+++ b/tests/tcg/aarch64/Makefile.target
++++ b/docs/system/arm/virt.rst
-@@ -XXX,XX +XXX,XX @@ run-fcvt: fcvt
+@@ -XXX,XX +XXX,XX @@ gic-version
-     $(call run-test,$<,$(QEMU) $<, "$< on $(TARGET_NAME)")
+   Valid values are:
-     $(call diff-out,$<,$(AARCH64_SRC)/fcvt.ref)
+   ``2``
--AARCH64_TESTS += pauth-1
+-    GICv2
-+AARCH64_TESTS += pauth-1 pauth-2
++    GICv2. Note that this limits the number of CPUs to 8.
- run-pauth-%: QEMU += -cpu max
+   ``3``
+-    GICv3
- TESTS:=$(AARCH64_TESTS)
++    GICv3. This allows up to 512 CPUs.
-diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
+   ``host``
-index XXXXXXX..XXXXXXX 100644
+     Use the same GIC version the host provides, when using KVM
---- a/target/arm/pauth_helper.c
+   ``max``
 +++ b/target/arm/pauth_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
      if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
          int error_code = (keynumber << 1) | (keynumber ^ 1);
          if (param.tbi) {
 -            return deposit64(ptr, 53, 2, error_code);
 +            return deposit64(orig_ptr, 53, 2, error_code);
          } else {
 -            return deposit64(ptr, 61, 2, error_code);
 +            return deposit64(orig_ptr, 61, 2, error_code);
          }
      }
      return orig_ptr;
 diff --git a/tests/tcg/aarch64/pauth-2.c b/tests/tcg/aarch64/pauth-2.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/tests/tcg/aarch64/pauth-2.c
@@ -XXX,XX +XXX,XX @@
 +#include <stdint.h>
 +#include <assert.h>
 +
 +asm(".arch armv8.4-a");
 +
 +void do_test(uint64_t value)
 +{
 +    uint64_t salt1, salt2;
 +    uint64_t encode, decode;
 +
 +    /*
 +     * With TBI enabled and a 48-bit VA, there are 7 bits of auth,
 +     * and so a 1/128 chance of encode = pac(value,key,salt) producing
 +     * an auth which leaves value unchanged.
 +     * Iterate until we find a salt for which encode != value.
 +     */
 +    for (salt1 = 1; ; salt1++) {
 +        asm volatile("pacda %0, %2" : "=r"(encode) : "0"(value), "r"(salt1));
 +        if (encode != value) {
 +            break;
 +        }
 +    }
 +
 +    /* A valid salt must produce a valid authorization.  */
 +    asm volatile("autda %0, %2" : "=r"(decode) : "0"(encode), "r"(salt1));
 +    assert(decode == value);
 +
 +    /*
 +     * An invalid salt usually fails authorization, but again there
 +     * is a chance of choosing another salt that works.
 +     * Iterate until we find another salt which does fail.
 +     */
 +    for (salt2 = salt1 + 1; ; salt2++) {
 +        asm volatile("autda %0, %2" : "=r"(decode) : "0"(encode), "r"(salt2));
 +        if (decode != value) {
 +            break;
 +        }
 +    }
 +
 +    /* The VA bits, bit 55, and the TBI bits, should be unchanged.  */
 +    assert(((decode ^ value) & 0xff80ffffffffffffull) == 0);
 +
 +    /*
 +     * Bits [54:53] are an error indicator based on the key used;
 +     * the DA key above is keynumber 0, so error == 0b01.  Otherwise
 +     * bit 55 of the original is sign-extended into the rest of the auth.
 +     */
 +    if ((value >> 55) & 1) {
 +        assert(((decode >> 48) & 0xff) == 0b10111111);
 +    } else {
 +        assert(((decode >> 48) & 0xff) == 0b00100000);
 +    }
 +}
 +
 +int main()
 +{
 +    do_test(0);
 +    do_test(-1);
 +    do_test(0xda004acedeadbeefull);
 +    return 0;
 +}
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 40/48] target/arm: Convert the VCVT-from-f16 insns to decodetree
+[PULL 30/31] hw/misc: Add PWRON STRAP bit fields in GCR module
-Convert the VCVTT, VCVTB instructions that deal with conversion
+From: Hao Wu <wuhaotsh@google.com>
 from half-precision floats to f32 or f64 to decodetree.
-Since we're no longer constrained to the old decoder's style
+Similar to the Aspeed code in include/hw/misc/aspeed_scu.h, we define
-using cpu_F0s and cpu_F0d we can perform a direct 16 bit
+the PWRON STRAP fields in their corresponding module for NPCM7XX.
 load of the right half of the input single-precision register
 rather than loading the full 32 bits and then doing a
 separate shift or sign-extension.
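
The direct 16-bit load described above depends on knowing where each half of a 32-bit register sits in host memory. A minimal sketch of that offset calculation follows. The function f16_half_offset() is a hypothetical stand-in for vfp_f16_offset(), and it takes host endianness as a runtime parameter instead of a build-time macro so that both cases can be checked in one build.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Byte offset of one 16-bit half of a 32-bit register stored at
 * reg_offset. On a little-endian host the top half lives at +2;
 * on a big-endian host the bottom half does.
 */
static size_t f16_half_offset(size_t reg_offset, bool top, bool host_big_endian)
{
    bool at_high_address = host_big_endian ? !top : top;
    return reg_offset + (at_high_address ? 2 : 0);
}
```

For a register at byte offset 8, a little-endian host would load the top half from offset 10 and the bottom half from offset 8; a big-endian host swaps the two.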
+Signed-off-by: Hao Wu <wuhaotsh@google.com>
+Reviewed-by: Patrick Venture <venture@google.com>
+Message-id: 20220411165842.3912945-2-wuhaotsh@google.com
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/translate-vfp.inc.c | 82 ++++++++++++++++++++++++++++++++++
+ include/hw/misc/npcm7xx_gcr.h | 30 ++++++++++++++++++++++++++++++
- target/arm/translate.c         | 56 +----------------------
+file changed, 30 insertions(+)
  target/arm/vfp.decode          |  6 +++
 files changed, 89 insertions(+), 55 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/include/hw/misc/npcm7xx_gcr.h b/include/hw/misc/npcm7xx_gcr.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/include/hw/misc/npcm7xx_gcr.h
-+++ b/target/arm/translate-vfp.inc.c
++++ b/include/hw/misc/npcm7xx_gcr.h
 @@ -XXX,XX +XXX,XX @@
- #include "decode-vfp.inc.c"
+ #include "exec/memory.h"
- #include "decode-vfp-uncond.inc.c"
+ #include "hw/sysbus.h"
 +/*
-+ * Return the offset of a 16-bit half of the specified VFP single-precision
++ * NPCM7XX PWRON STRAP bit fields
-+ * register. If top is true, returns the top 16 bits; otherwise the bottom
++ * 12: SPI0 powered by VSBV3 at 1.8V
-+ * 16 bits.
++ * 11: System flash attached to BMC
 + * 10: BSP alternative pins.
 + * 9:8: Flash UART command route enabled.
 + * 7: Security enabled.
 + * 6: HI-Z state control.
 + * 5: ECC disabled.
 + * 4: Reserved
 + * 3: JTAG2 enabled.
 + * 2:0: CPU and DRAM clock frequency.
 + */
-+static inline long vfp_f16_offset(unsigned reg, bool top)
++#define NPCM7XX_PWRON_STRAP_SPI0F18                 BIT(12)
-+{
++#define NPCM7XX_PWRON_STRAP_SFAB                    BIT(11)
-+    long offs = vfp_reg_offset(false, reg);
++#define NPCM7XX_PWRON_STRAP_BSPA                    BIT(10)
-+#ifdef HOST_WORDS_BIGENDIAN
++#define NPCM7XX_PWRON_STRAP_FUP(x)                  ((x) << 8)
-+    if (!top) {
++#define     FUP_NORM_UART2      3
-+        offs += 2;
++#define     FUP_PROG_UART3      2
-+    }
++#define     FUP_PROG_UART2      1
-+#else
++#define     FUP_NORM_UART3      0
-+    if (top) {
++#define NPCM7XX_PWRON_STRAP_SECEN                   BIT(7)
-+        offs += 2;
++#define NPCM7XX_PWRON_STRAP_HIZ                     BIT(6)
-+    }
++#define NPCM7XX_PWRON_STRAP_ECC                     BIT(5)
-+#endif
++#define NPCM7XX_PWRON_STRAP_RESERVE1                BIT(4)
-+    return offs;
++#define NPCM7XX_PWRON_STRAP_J2EN                    BIT(3)
-+}
++#define NPCM7XX_PWRON_STRAP_CKFRQ(x)                (x)
 +#define     CKFRQ_SKIPINIT      0x000
 +#define     CKFRQ_DEFAULT       0x7
 +
  /*
-  * Check that VFP access is enabled. If it is, do the necessary
+  * Number of registers in our device state structure. Don't change this without
-  * M-profile lazy-FP handling and then return true.
+  * incrementing the version_id in the vmstate.
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
      return true;
  }
 +
 +static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i32 ahp_mode;
 +    TCGv_i32 tmp;
 +
 +    if (!dc_isar_feature(aa32_fp16_spconv, s)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = get_fpstatus_ptr(false);
 +    ahp_mode = get_ahp_flag();
 +    tmp = tcg_temp_new_i32();
 +    /* The T bit tells us if we want the low or high 16 bits of Vm */
 +    tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
 +    gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
 +    neon_store_reg32(tmp, a->vd);
 +    tcg_temp_free_i32(ahp_mode);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 +
 +static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i32 ahp_mode;
 +    TCGv_i32 tmp;
 +    TCGv_i64 vd;
 +
 +    if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd  & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = get_fpstatus_ptr(false);
 +    ahp_mode = get_ahp_flag();
 +    tmp = tcg_temp_new_i32();
 +    /* The T bit tells us if we want the low or high 16 bits of Vm */
 +    tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
 +    vd = tcg_temp_new_i64();
 +    gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
 +    neon_store_reg64(vd, a->vd);
 +    tcg_temp_free_i32(ahp_mode);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i64(vd);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  return 1;
              case 15:
                  switch (rn) {
 -                case 0 ... 3:
 +                case 0 ... 5:
                  case 8 ... 11:
                      /* Already handled by decodetree */
                      return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              if (op == 15) {
                  /* rn is opcode, encoded as per VFP_SREG_N. */
                  switch (rn) {
 -                case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
 -                case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
 -                    /*
 -                     * VCVTB, VCVTT: only present with the halfprec extension
 -                     * UNPREDICTABLE if bit 8 is set prior to ARMv8
 -                     * (we choose to UNDEF)
 -                     */
 -                    if (dp) {
 -                        if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
 -                            return 1;
 -                        }
 -                    } else {
 -                        if (!dc_isar_feature(aa32_fp16_spconv, s)) {
 -                            return 1;
 -                        }
 -                    }
 -                    rm_is_dp = false;
 -                    break;
                  case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
                  case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
                      if (dp) {
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                  switch (op) {
                  case 15: /* extension space */
                      switch (rn) {
 -                    case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
 -                    {
 -                        TCGv_ptr fpst = get_fpstatus_ptr(false);
 -                        TCGv_i32 ahp_mode = get_ahp_flag();
 -                        tmp = gen_vfp_mrs();
 -                        tcg_gen_ext16u_i32(tmp, tmp);
 -                        if (dp) {
 -                            gen_helper_vfp_fcvt_f16_to_f64(cpu_F0d, tmp,
 -                                                           fpst, ahp_mode);
 -                        } else {
 -                            gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp,
 -                                                           fpst, ahp_mode);
 -                        }
 -                        tcg_temp_free_i32(ahp_mode);
 -                        tcg_temp_free_ptr(fpst);
 -                        tcg_temp_free_i32(tmp);
 -                        break;
 -                    }
 -                    case 5: /* vcvtt.f32.f16, vcvtt.f64.f16 */
 -                    {
 -                        TCGv_ptr fpst = get_fpstatus_ptr(false);
 -                        TCGv_i32 ahp = get_ahp_flag();
 -                        tmp = gen_vfp_mrs();
 -                        tcg_gen_shri_i32(tmp, tmp, 16);
 -                        if (dp) {
 -                            gen_helper_vfp_fcvt_f16_to_f64(cpu_F0d, tmp,
 -                                                           fpst, ahp);
 -                        } else {
 -                            gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp,
 -                                                           fpst, ahp);
 -                        }
 -                        tcg_temp_free_i32(tmp);
 -                        tcg_temp_free_i32(ahp);
 -                        tcg_temp_free_ptr(fpst);
 -                        break;
 -                    }
                      case 6: /* vcvtb.f16.f32, vcvtb.f16.f64 */
                      {
                          TCGv_ptr fpst = get_fpstatus_ptr(false);
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCMP_sp      ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
               vd=%vd_sp vm=%vm_sp
  VCMP_dp      ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
               vd=%vd_dp vm=%vm_dp
 +
 +# VCVTT and VCVTB from f16: Vd format depends on size bit; Vm is always vm_sp
 +VCVT_f32_f16 ---- 1110 1.11 0010 .... 1010 t:1 1.0 .... \
 +             vd=%vd_sp vm=%vm_sp
 +VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
 +             vd=%vd_dp vm=%vm_sp
 --
-.20.1
+.25.1

-[Qemu-devel] [PULL 01/48] target/arm: Vectorize USHL and SSHL
+[PULL 31/31] hw/arm: Use bit fields for NPCM7XX PWRON STRAPs
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Hao Wu <wuhaotsh@google.com>
-These instructions shift left or right depending on the sign
+This patch uses the defined fields to describe PWRON STRAPs for
-of the input, and 7 bits are significant to the shift.  This
+better readability.
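
To illustrate how named fields can reproduce the legacy strap constants, here is a small self-contained sketch. The shortened STRAP_* names and the helper functions are hypothetical. The clock field is taken as 0x7 (all three bits set), an assumption inferred from the legacy constant 0x00001fff.

```c
#include <stdint.h>

#define BIT(n)          (UINT32_C(1) << (n))

/* Field layout as listed in the NPCM7XX PWRON STRAP comment. */
#define STRAP_SPI0F18   BIT(12)
#define STRAP_SFAB      BIT(11)
#define STRAP_BSPA      BIT(10)
#define STRAP_FUP(x)    ((uint32_t)(x) << 8)   /* bits 9:8 */
#define STRAP_SECEN     BIT(7)
#define STRAP_HIZ       BIT(6)
#define STRAP_ECC       BIT(5)
#define STRAP_RESERVE1  BIT(4)
#define STRAP_J2EN      BIT(3)
#define STRAP_CKFRQ(x)  ((uint32_t)(x))        /* bits 2:0 */

#define FUP_NORM_UART2  3
#define CKFRQ_ALL_SET   0x7   /* assumption: all three clock bits set */

/* Every strap field set, matching the legacy value 0x00001fff. */
static uint32_t straps_default(void)
{
    return STRAP_SPI0F18 | STRAP_SFAB | STRAP_BSPA |
           STRAP_FUP(FUP_NORM_UART2) | STRAP_SECEN | STRAP_HIZ |
           STRAP_ECC | STRAP_RESERVE1 | STRAP_J2EN |
           STRAP_CKFRQ(CKFRQ_ALL_SET);
}

/* Board variants clear a single field from the default. */
static uint32_t straps_without(uint32_t straps, uint32_t field)
{
    return straps & ~field;
}
```

Under these assumptions, clearing J2EN from the default gives 0x00001ff7 and clearing SFAB gives 0x000017ff, the two legacy values that differ from 0x00001fff.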
 requires several masks and selects in addition to the actual
 shifts to form the complete answer.
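
For reference, the scalar behaviour that the vector expansion has to reproduce can be written as a few lines of C. This is a sketch of the unsigned 32-bit case only, modelled on the neon_shl_u64 helper that the patch removes; the name ushl32_ref() is hypothetical.

```c
#include <stdint.h>

/*
 * Unsigned shift by a signed count: positive counts shift left,
 * negative counts shift right, and counts whose magnitude reaches
 * the element width produce zero.
 */
static uint32_t ushl32_ref(uint32_t val, uint32_t shiftop)
{
    int8_t shift = (int8_t)shiftop;   /* the count is the signed low byte */

    if (shift >= 32 || shift <= -32) {
        return 0;
    } else if (shift < 0) {
        return val >> -shift;
    }
    return val << shift;
}
```

The branch-free TCG version computes both the left and right shifts unconditionally and then selects between them, which is where the extra masks and selects come from.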
-That said, the operation is still a small improvement even for
+Signed-off-by: Hao Wu <wuhaotsh@google.com>
-two 64-bit elements -- 13 vector operations instead of 2 * 7
+Reviewed-by: Patrick Venture <venture@google.com>
-integer operations.
+Message-id: 20220411165842.3912945-3-wuhaotsh@google.com
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20190603232209.20704-1-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.h        |  11 +-
+ hw/arm/npcm7xx_boards.c | 24 +++++++++++++++++++-----
- target/arm/translate.h     |   6 +
+file changed, 19 insertions(+), 5 deletions(-)
  target/arm/neon_helper.c   |  33 ----
  target/arm/translate-a64.c |  18 +--
  target/arm/translate.c     | 300 +++++++++++++++++++++++++++++++++++--
  target/arm/vec_helper.c    |  88 +++++++++++
 files changed, 390 insertions(+), 66 deletions(-)
-diff --git a/target/arm/helper.h b/target/arm/helper.h
+diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/hw/arm/npcm7xx_boards.c
-+++ b/target/arm/helper.h
++++ b/hw/arm/npcm7xx_boards.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32)
+@@ -XXX,XX +XXX,XX @@
- DEF_HELPER_2(neon_abd_u32, i32, i32, i32)
+ #include "sysemu/sysemu.h"
- DEF_HELPER_2(neon_abd_s32, i32, i32, i32)
+ #include "sysemu/block-backend.h"
--DEF_HELPER_2(neon_shl_u8, i32, i32, i32)
+-#define NPCM750_EVB_POWER_ON_STRAPS 0x00001ff7
--DEF_HELPER_2(neon_shl_s8, i32, i32, i32)
+-#define QUANTA_GSJ_POWER_ON_STRAPS 0x00001fff
- DEF_HELPER_2(neon_shl_u16, i32, i32, i32)
+-#define QUANTA_GBS_POWER_ON_STRAPS 0x000017ff
- DEF_HELPER_2(neon_shl_s16, i32, i32, i32)
+-#define KUDO_BMC_POWER_ON_STRAPS 0x00001fff
--DEF_HELPER_2(neon_shl_u32, i32, i32, i32)
+-#define MORI_BMC_POWER_ON_STRAPS 0x00001fff
--DEF_HELPER_2(neon_shl_s32, i32, i32, i32)
++#define NPCM7XX_POWER_ON_STRAPS_DEFAULT (           \
--DEF_HELPER_2(neon_shl_u64, i64, i64, i64)
++        NPCM7XX_PWRON_STRAP_SPI0F18 |               \
--DEF_HELPER_2(neon_shl_s64, i64, i64, i64)
++        NPCM7XX_PWRON_STRAP_SFAB |                  \
- DEF_HELPER_2(neon_rshl_u8, i32, i32, i32)
++        NPCM7XX_PWRON_STRAP_BSPA |                  \
- DEF_HELPER_2(neon_rshl_s8, i32, i32, i32)
++        NPCM7XX_PWRON_STRAP_FUP(FUP_NORM_UART2) |   \
- DEF_HELPER_2(neon_rshl_u16, i32, i32, i32)
++        NPCM7XX_PWRON_STRAP_SECEN |                 \
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr)
++        NPCM7XX_PWRON_STRAP_HIZ |                   \
- DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr)
++        NPCM7XX_PWRON_STRAP_ECC |                   \
- DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr)
++        NPCM7XX_PWRON_STRAP_RESERVE1 |              \
++        NPCM7XX_PWRON_STRAP_J2EN |                  \
-+DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++        NPCM7XX_PWRON_STRAP_CKFRQ(CKFRQ_DEFAULT))
 +DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +
- #ifdef TARGET_AARCH64
++#define NPCM750_EVB_POWER_ON_STRAPS ( \
- #include "helper-a64.h"
++        NPCM7XX_POWER_ON_STRAPS_DEFAULT & ~NPCM7XX_PWRON_STRAP_J2EN)
- #include "helper-sve.h"
++#define QUANTA_GSJ_POWER_ON_STRAPS NPCM7XX_POWER_ON_STRAPS_DEFAULT
-diff --git a/target/arm/translate.h b/target/arm/translate.h
++#define QUANTA_GBS_POWER_ON_STRAPS ( \
-index XXXXXXX..XXXXXXX 100644
++        NPCM7XX_POWER_ON_STRAPS_DEFAULT & ~NPCM7XX_PWRON_STRAP_SFAB)
---- a/target/arm/translate.h
++#define KUDO_BMC_POWER_ON_STRAPS NPCM7XX_POWER_ON_STRAPS_DEFAULT
-+++ b/target/arm/translate.h
++#define MORI_BMC_POWER_ON_STRAPS NPCM7XX_POWER_ON_STRAPS_DEFAULT
-@@ -XXX,XX +XXX,XX @@ extern const GVecGen3 bif_op;
- extern const GVecGen3 mla_op[4];
+ static const char npcm7xx_default_bootrom[] = "npcm7xx_bootrom.bin";
- extern const GVecGen3 mls_op[4];
  extern const GVecGen3 cmtst_op[4];
 +extern const GVecGen3 sshl_op[4];
 +extern const GVecGen3 ushl_op[4];
  extern const GVecGen2i ssra_op[4];
  extern const GVecGen2i usra_op[4];
  extern const GVecGen2i sri_op[4];
@@ -XXX,XX +XXX,XX @@ extern const GVecGen4 sqadd_op[4];
  extern const GVecGen4 uqsub_op[4];
  extern const GVecGen4 sqsub_op[4];
  void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 +void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 +void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 +void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 +void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
  /*
   * Forward to the isar_feature_* tests given a DisasContext pointer.
 diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon_helper.c
 +++ b/target/arm/neon_helper.c
@@ -XXX,XX +XXX,XX @@ NEON_VOP(abd_u32, neon_u32, 1)
      } else { \
          dest = src1 << tmp; \
      }} while (0)
 -NEON_VOP(shl_u8, neon_u8, 4)
  NEON_VOP(shl_u16, neon_u16, 2)
 -NEON_VOP(shl_u32, neon_u32, 1)
  #undef NEON_FN
 -uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
 -{
 -    int8_t shift = (int8_t)shiftop;
 -    if (shift >= 64 || shift <= -64) {
 -        val = 0;
 -    } else if (shift < 0) {
 -        val >>= -shift;
 -    } else {
 -        val <<= shift;
 -    }
 -    return val;
 -}
 -
  #define NEON_FN(dest, src1, src2) do { \
      int8_t tmp; \
      tmp = (int8_t)src2; \
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
      } else { \
          dest = src1 << tmp; \
      }} while (0)
 -NEON_VOP(shl_s8, neon_s8, 4)
  NEON_VOP(shl_s16, neon_s16, 2)
 -NEON_VOP(shl_s32, neon_s32, 1)
  #undef NEON_FN
 -uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
 -{
 -    int8_t shift = (int8_t)shiftop;
 -    int64_t val = valop;
 -    if (shift >= 64) {
 -        val = 0;
 -    } else if (shift <= -64) {
 -        val >>= 63;
 -    } else if (shift < 0) {
 -        val >>= -shift;
 -    } else {
 -        val <<= shift;
 -    }
 -    return val;
 -}
 -
  #define NEON_FN(dest, src1, src2) do { \
      int8_t tmp; \
      tmp = (int8_t)src2; \
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
          break;
      case 0x8: /* SSHL, USHL */
          if (u) {
 -            gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm);
 +            gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm);
          } else {
 -            gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm);
 +            gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm);
          }
          break;
      case 0x9: /* SQSHL, UQSHL */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                         is_q ? 16 : 8, vec_full_reg_size(s),
                         (u ? uqsub_op : sqsub_op) + size);
          return;
 +    case 0x08: /* SSHL, USHL */
 +        gen_gvec_op3(s, is_q, rd, rn, rm,
 +                     u ? &ushl_op[size] : &sshl_op[size]);
 +        return;
      case 0x0c: /* SMAX, UMAX */
          if (u) {
              gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                  genfn = fns[size][u];
                  break;
              }
 -            case 0x8: /* SSHL, USHL */
 -            {
 -                static NeonGenTwoOpFn * const fns[3][2] = {
 -                    { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
 -                    { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
 -                    { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
 -                };
 -                genfn = fns[size][u];
 -                break;
 -            }
              case 0x9: /* SQSHL, UQSHL */
              {
                  static NeonGenTwoOpEnvFn * const fns[3][2] = {
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_shift_narrow(int size, TCGv_i32 var, TCGv_i32 shift,
          if (u) {
              switch (size) {
              case 1: gen_helper_neon_shl_u16(var, var, shift); break;
 -            case 2: gen_helper_neon_shl_u32(var, var, shift); break;
 +            case 2: gen_ushl_i32(var, var, shift); break;
              default: abort();
              }
          } else {
              switch (size) {
              case 1: gen_helper_neon_shl_s16(var, var, shift); break;
 -            case 2: gen_helper_neon_shl_s32(var, var, shift); break;
 +            case 2: gen_sshl_i32(var, var, shift); break;
              default: abort();
              }
          }
@@ -XXX,XX +XXX,XX @@ const GVecGen3 cmtst_op[4] = {
        .vece = MO_64 },
  };
 +void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 +{
 +    TCGv_i32 lval = tcg_temp_new_i32();
 +    TCGv_i32 rval = tcg_temp_new_i32();
 +    TCGv_i32 lsh = tcg_temp_new_i32();
 +    TCGv_i32 rsh = tcg_temp_new_i32();
 +    TCGv_i32 zero = tcg_const_i32(0);
 +    TCGv_i32 max = tcg_const_i32(32);
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_ext8s_i32(lsh, b);
 +    tcg_gen_neg_i32(rsh, lsh);
 +    tcg_gen_shl_i32(lval, a, lsh);
 +    tcg_gen_shr_i32(rval, a, rsh);
 +    tcg_gen_movcond_i32(TCG_COND_LTU, d, lsh, max, lval, zero);
 +    tcg_gen_movcond_i32(TCG_COND_LTU, d, rsh, max, rval, d);
 +
 +    tcg_temp_free_i32(lval);
 +    tcg_temp_free_i32(rval);
 +    tcg_temp_free_i32(lsh);
 +    tcg_temp_free_i32(rsh);
 +    tcg_temp_free_i32(zero);
 +    tcg_temp_free_i32(max);
 +}
 +
 +void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 +{
 +    TCGv_i64 lval = tcg_temp_new_i64();
 +    TCGv_i64 rval = tcg_temp_new_i64();
 +    TCGv_i64 lsh = tcg_temp_new_i64();
 +    TCGv_i64 rsh = tcg_temp_new_i64();
 +    TCGv_i64 zero = tcg_const_i64(0);
 +    TCGv_i64 max = tcg_const_i64(64);
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_ext8s_i64(lsh, b);
 +    tcg_gen_neg_i64(rsh, lsh);
 +    tcg_gen_shl_i64(lval, a, lsh);
 +    tcg_gen_shr_i64(rval, a, rsh);
 +    tcg_gen_movcond_i64(TCG_COND_LTU, d, lsh, max, lval, zero);
 +    tcg_gen_movcond_i64(TCG_COND_LTU, d, rsh, max, rval, d);
 +
 +    tcg_temp_free_i64(lval);
 +    tcg_temp_free_i64(rval);
 +    tcg_temp_free_i64(lsh);
 +    tcg_temp_free_i64(rsh);
 +    tcg_temp_free_i64(zero);
 +    tcg_temp_free_i64(max);
 +}
 +
 +static void gen_ushl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
 +{
 +    TCGv_vec lval = tcg_temp_new_vec_matching(d);
 +    TCGv_vec rval = tcg_temp_new_vec_matching(d);
 +    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
 +    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
 +    TCGv_vec msk, max;
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_neg_vec(vece, rsh, b);
 +    if (vece == MO_8) {
 +        tcg_gen_mov_vec(lsh, b);
 +    } else {
 +        msk = tcg_temp_new_vec_matching(d);
 +        tcg_gen_dupi_vec(vece, msk, 0xff);
 +        tcg_gen_and_vec(vece, lsh, b, msk);
 +        tcg_gen_and_vec(vece, rsh, rsh, msk);
 +        tcg_temp_free_vec(msk);
 +    }
 +
 +    /*
 +     * Perform possibly out of range shifts, trusting that the operation
 +     * does not trap.  Discard unused results after the fact.
 +     */
 +    tcg_gen_shlv_vec(vece, lval, a, lsh);
 +    tcg_gen_shrv_vec(vece, rval, a, rsh);
 +
 +    max = tcg_temp_new_vec_matching(d);
 +    tcg_gen_dupi_vec(vece, max, 8 << vece);
 +
 +    /*
 +     * The choice of LT (signed) and GEU (unsigned) is biased toward
 +     * the instructions of the x86_64 host.  For MO_8, the whole byte
 +     * is significant so we must use an unsigned compare; otherwise we
 +     * have already masked to a byte and so a signed compare works.
 +     * Other tcg hosts have a full set of comparisons and do not care.
 +     */
 +    if (vece == MO_8) {
 +        tcg_gen_cmp_vec(TCG_COND_GEU, vece, lsh, lsh, max);
 +        tcg_gen_cmp_vec(TCG_COND_GEU, vece, rsh, rsh, max);
 +        tcg_gen_andc_vec(vece, lval, lval, lsh);
 +        tcg_gen_andc_vec(vece, rval, rval, rsh);
 +    } else {
 +        tcg_gen_cmp_vec(TCG_COND_LT, vece, lsh, lsh, max);
 +        tcg_gen_cmp_vec(TCG_COND_LT, vece, rsh, rsh, max);
 +        tcg_gen_and_vec(vece, lval, lval, lsh);
 +        tcg_gen_and_vec(vece, rval, rval, rsh);
 +    }
 +    tcg_gen_or_vec(vece, d, lval, rval);
 +
 +    tcg_temp_free_vec(max);
 +    tcg_temp_free_vec(lval);
 +    tcg_temp_free_vec(rval);
 +    tcg_temp_free_vec(lsh);
 +    tcg_temp_free_vec(rsh);
 +}
 +
 +static const TCGOpcode ushl_list[] = {
 +    INDEX_op_neg_vec, INDEX_op_shlv_vec,
 +    INDEX_op_shrv_vec, INDEX_op_cmp_vec, 0
 +};
 +
 +const GVecGen3 ushl_op[4] = {
 +    { .fniv = gen_ushl_vec,
 +      .fno = gen_helper_gvec_ushl_b,
 +      .opt_opc = ushl_list,
 +      .vece = MO_8 },
 +    { .fniv = gen_ushl_vec,
 +      .fno = gen_helper_gvec_ushl_h,
 +      .opt_opc = ushl_list,
 +      .vece = MO_16 },
 +    { .fni4 = gen_ushl_i32,
 +      .fniv = gen_ushl_vec,
 +      .opt_opc = ushl_list,
 +      .vece = MO_32 },
 +    { .fni8 = gen_ushl_i64,
 +      .fniv = gen_ushl_vec,
 +      .opt_opc = ushl_list,
 +      .vece = MO_64 },
 +};
 +
 +void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 +{
 +    TCGv_i32 lval = tcg_temp_new_i32();
 +    TCGv_i32 rval = tcg_temp_new_i32();
 +    TCGv_i32 lsh = tcg_temp_new_i32();
 +    TCGv_i32 rsh = tcg_temp_new_i32();
 +    TCGv_i32 zero = tcg_const_i32(0);
 +    TCGv_i32 max = tcg_const_i32(31);
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_ext8s_i32(lsh, b);
 +    tcg_gen_neg_i32(rsh, lsh);
 +    tcg_gen_shl_i32(lval, a, lsh);
 +    tcg_gen_umin_i32(rsh, rsh, max);
 +    tcg_gen_sar_i32(rval, a, rsh);
 +    tcg_gen_movcond_i32(TCG_COND_LEU, lval, lsh, max, lval, zero);
 +    tcg_gen_movcond_i32(TCG_COND_LT, d, lsh, zero, rval, lval);
 +
 +    tcg_temp_free_i32(lval);
 +    tcg_temp_free_i32(rval);
 +    tcg_temp_free_i32(lsh);
 +    tcg_temp_free_i32(rsh);
 +    tcg_temp_free_i32(zero);
 +    tcg_temp_free_i32(max);
 +}
 +
 +void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 +{
 +    TCGv_i64 lval = tcg_temp_new_i64();
 +    TCGv_i64 rval = tcg_temp_new_i64();
 +    TCGv_i64 lsh = tcg_temp_new_i64();
 +    TCGv_i64 rsh = tcg_temp_new_i64();
 +    TCGv_i64 zero = tcg_const_i64(0);
 +    TCGv_i64 max = tcg_const_i64(63);
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_ext8s_i64(lsh, b);
 +    tcg_gen_neg_i64(rsh, lsh);
 +    tcg_gen_shl_i64(lval, a, lsh);
 +    tcg_gen_umin_i64(rsh, rsh, max);
 +    tcg_gen_sar_i64(rval, a, rsh);
 +    tcg_gen_movcond_i64(TCG_COND_LEU, lval, lsh, max, lval, zero);
 +    tcg_gen_movcond_i64(TCG_COND_LT, d, lsh, zero, rval, lval);
 +
 +    tcg_temp_free_i64(lval);
 +    tcg_temp_free_i64(rval);
 +    tcg_temp_free_i64(lsh);
 +    tcg_temp_free_i64(rsh);
 +    tcg_temp_free_i64(zero);
 +    tcg_temp_free_i64(max);
 +}
 +
 +static void gen_sshl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
 +{
 +    TCGv_vec lval = tcg_temp_new_vec_matching(d);
 +    TCGv_vec rval = tcg_temp_new_vec_matching(d);
 +    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
 +    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
 +    TCGv_vec tmp = tcg_temp_new_vec_matching(d);
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_neg_vec(vece, rsh, b);
 +    if (vece == MO_8) {
 +        tcg_gen_mov_vec(lsh, b);
 +    } else {
 +        tcg_gen_dupi_vec(vece, tmp, 0xff);
 +        tcg_gen_and_vec(vece, lsh, b, tmp);
 +        tcg_gen_and_vec(vece, rsh, rsh, tmp);
 +    }
 +
 +    /* Bound rsh so out of bound right shift gets -1.  */
 +    tcg_gen_dupi_vec(vece, tmp, (8 << vece) - 1);
 +    tcg_gen_umin_vec(vece, rsh, rsh, tmp);
 +    tcg_gen_cmp_vec(TCG_COND_GT, vece, tmp, lsh, tmp);
 +
 +    tcg_gen_shlv_vec(vece, lval, a, lsh);
 +    tcg_gen_sarv_vec(vece, rval, a, rsh);
 +
 +    /* Select in-bound left shift.  */
 +    tcg_gen_andc_vec(vece, lval, lval, tmp);
 +
 +    /* Select between left and right shift.  */
 +    if (vece == MO_8) {
 +        tcg_gen_dupi_vec(vece, tmp, 0);
 +        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, d, lsh, tmp, rval, lval);
 +    } else {
 +        tcg_gen_dupi_vec(vece, tmp, 0x80);
 +        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, d, lsh, tmp, lval, rval);
 +    }
 +
 +    tcg_temp_free_vec(lval);
 +    tcg_temp_free_vec(rval);
 +    tcg_temp_free_vec(lsh);
 +    tcg_temp_free_vec(rsh);
 +    tcg_temp_free_vec(tmp);
 +}
 +
 +static const TCGOpcode sshl_list[] = {
 +    INDEX_op_neg_vec, INDEX_op_umin_vec, INDEX_op_shlv_vec,
 +    INDEX_op_sarv_vec, INDEX_op_cmp_vec, INDEX_op_cmpsel_vec, 0
 +};
 +
 +const GVecGen3 sshl_op[4] = {
 +    { .fniv = gen_sshl_vec,
 +      .fno = gen_helper_gvec_sshl_b,
 +      .opt_opc = sshl_list,
 +      .vece = MO_8 },
 +    { .fniv = gen_sshl_vec,
 +      .fno = gen_helper_gvec_sshl_h,
 +      .opt_opc = sshl_list,
 +      .vece = MO_16 },
 +    { .fni4 = gen_sshl_i32,
 +      .fniv = gen_sshl_vec,
 +      .opt_opc = sshl_list,
 +      .vece = MO_32 },
 +    { .fni8 = gen_sshl_i64,
 +      .fniv = gen_sshl_vec,
 +      .opt_opc = sshl_list,
 +      .vece = MO_64 },
 +};
 +
  static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
                            TCGv_vec a, TCGv_vec b)
  {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                    vec_size, vec_size);
              }
              return 0;
 +
 +        case NEON_3R_VSHL:
 +            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
 +                           u ? &ushl_op[size] : &sshl_op[size]);
 +            return 0;
          }
          if (size == 3) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  neon_load_reg64(cpu_V0, rn + pass);
                  neon_load_reg64(cpu_V1, rm + pass);
                  switch (op) {
 -                case NEON_3R_VSHL:
 -                    if (u) {
 -                        gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0);
 -                    } else {
 -                        gen_helper_neon_shl_s64(cpu_V0, cpu_V1, cpu_V0);
 -                    }
 -                    break;
                  case NEON_3R_VQSHL:
                      if (u) {
                          gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          }
          pairwise = 0;
          switch (op) {
 -        case NEON_3R_VSHL:
          case NEON_3R_VQSHL:
          case NEON_3R_VRSHL:
          case NEON_3R_VQRSHL:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          case NEON_3R_VHSUB:
              GEN_NEON_INTEGER_OP(hsub);
              break;
 -        case NEON_3R_VSHL:
 -            GEN_NEON_INTEGER_OP(shl);
 -            break;
          case NEON_3R_VQSHL:
              GEN_NEON_INTEGER_OP_ENV(qshl);
              break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              }
                          } else {
                              if (input_unsigned) {
 -                                gen_helper_neon_shl_u64(cpu_V0, in, tmp64);
 +                                gen_ushl_i64(cpu_V0, in, tmp64);
                              } else {
 -                                gen_helper_neon_shl_s64(cpu_V0, in, tmp64);
 +                                gen_sshl_i64(cpu_V0, in, tmp64);
                              }
                          }
                          tmp = tcg_temp_new_i32();
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
      do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status, desc,
                   get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
  }
 +
 +void HELPER(gvec_sshl_b)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
 +    intptr_t i, opr_sz = simd_oprsz(desc);
 +    int8_t *d = vd, *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz; ++i) {
 +        int8_t mm = m[i];
 +        int8_t nn = n[i];
 +        int8_t res = 0;
 +        if (mm >= 0) {
 +            if (mm < 8) {
 +                res = nn << mm;
 +            }
 +        } else {
 +            res = nn >> (mm > -8 ? -mm : 7);
 +        }
 +        d[i] = res;
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
 +void HELPER(gvec_sshl_h)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
 +    intptr_t i, opr_sz = simd_oprsz(desc);
 +    int16_t *d = vd, *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz / 2; ++i) {
 +        int8_t mm = m[i];   /* only 8 bits of shift are significant */
 +        int16_t nn = n[i];
 +        int16_t res = 0;
 +        if (mm >= 0) {
 +            if (mm < 16) {
 +                res = nn << mm;
 +            }
 +        } else {
 +            res = nn >> (mm > -16 ? -mm : 15);
 +        }
 +        d[i] = res;
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
 +void HELPER(gvec_ushl_b)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
 +    intptr_t i, opr_sz = simd_oprsz(desc);
 +    uint8_t *d = vd, *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz; ++i) {
 +        int8_t mm = m[i];
 +        uint8_t nn = n[i];
 +        uint8_t res = 0;
 +        if (mm >= 0) {
 +            if (mm < 8) {
 +                res = nn << mm;
 +            }
 +        } else {
 +            if (mm > -8) {
 +                res = nn >> -mm;
 +            }
 +        }
 +        d[i] = res;
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
 +void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
 +    intptr_t i, opr_sz = simd_oprsz(desc);
 +    uint16_t *d = vd, *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz / 2; ++i) {
 +        int8_t mm = m[i];   /* only 8 bits of shift are significant */
 +        uint16_t nn = n[i];
 +        uint16_t res = 0;
 +        if (mm >= 0) {
 +            if (mm < 16) {
 +                res = nn << mm;
 +            }
 +        } else {
 +            if (mm > -16) {
 +                res = nn >> -mm;
 +            }
 +        }
 +        d[i] = res;
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
-.20.1
+.25.1
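
For readers checking the TCG expansions above against the architectural behaviour, the following is a minimal scalar reference model of one 32-bit lane, not code from the patch. The names ref_ushl32() and ref_sshl32() are hypothetical. It assumes a two's-complement host where right-shifting a negative signed value is arithmetic, the same assumption the out-of-line helpers in vec_helper.c make.

```c
#include <stdint.h>

/*
 * Hypothetical reference model of one 32-bit USHL lane.
 * Only the low byte of the shift operand is significant and it is
 * interpreted as signed: positive shifts left, negative shifts right.
 */
static uint32_t ref_ushl32(uint32_t value, uint32_t shift)
{
    int8_t sh = (int8_t)(shift & 0xff);

    if (sh >= 32 || sh <= -32) {
        return 0;                       /* every bit is shifted out */
    }
    return sh >= 0 ? value << sh : value >> -sh;
}

/* Hypothetical reference model of one 32-bit SSHL lane. */
static int32_t ref_sshl32(int32_t value, uint32_t shift)
{
    int8_t sh = (int8_t)(shift & 0xff);

    if (sh >= 32) {
        return 0;                       /* left shift out of range */
    }
    if (sh >= 0) {
        /* Shift as unsigned to avoid signed-overflow undefined behaviour. */
        return (int32_t)((uint32_t)value << sh);
    }
    /* Clamp the right shift to 31 so the result is pure sign fill. */
    return value >> (sh > -32 ? -sh : 31);
}
```

The clamp to 31 in ref_sshl32() plays the same role as the tcg_gen_umin_i32() call in gen_sshl_i32(): an over-large right shift yields 0 or -1 depending on the sign of the input.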

-[Qemu-devel] [PULL 04/48] hw/arm/smmuv3: Fix decoding of ID register range
+Deleted patch
-The SMMUv3 ID registers cover an area 0x30 bytes in size
-(12 registers, 4 bytes each). We were incorrectly decoding
-only the first 0x20 bytes.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
-Message-id: 20190524124829.2589-1-peter.maydell@linaro.org
----
- hw/arm/smmuv3.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmuv3.c
-+++ b/hw/arm/smmuv3.c
-@@ -XXX,XX +XXX,XX @@ static MemTxResult smmu_readl(SMMUv3State *s, hwaddr offset,
-                               uint64_t *data, MemTxAttrs attrs)
- {
-     switch (offset) {
--    case A_IDREGS ... A_IDREGS + 0x1f:
-+    case A_IDREGS ... A_IDREGS + 0x2f:
-         *data = smmuv3_idreg(offset - A_IDREGS);
-         return MEMTX_OK;
-     case A_IDR0 ... A_IDR5:
---
-.20.1
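
The arithmetic behind the corrected case range can be sketched as follows. This is an illustration only: EX_IDREGS_BASE is an assumed placeholder offset and the ex_* names are hypothetical, not identifiers from hw/arm/smmuv3.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed base offset, for illustration only. */
#define EX_IDREGS_BASE  0xfd0u
#define EX_IDREG_COUNT  12u     /* 12 ID registers ... */
#define EX_IDREG_SIZE   4u      /* ... of 4 bytes each */

/* Range check equivalent to "case A_IDREGS ... A_IDREGS + 0x2f". */
static bool ex_is_idreg(uint64_t offset)
{
    uint64_t span = EX_IDREG_COUNT * EX_IDREG_SIZE;   /* 0x30 bytes */

    return offset >= EX_IDREGS_BASE && offset < EX_IDREGS_BASE + span;
}

/* The old check, which stopped after the first 0x20 bytes. */
static bool ex_is_idreg_old(uint64_t offset)
{
    return offset >= EX_IDREGS_BASE && offset <= EX_IDREGS_BASE + 0x1fu;
}
```

Deriving the upper bound from the register count and size, rather than writing 0x2f by hand, makes this kind of off-by-a-block error harder to reintroduce.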

-[Qemu-devel] [PULL 05/48] hw/core/bus.c: Only the main system bus can have no parent
+Deleted patch
-In commit 80376c3fc2c38fdd453 in 2010 we added a workaround for
-some qbus buses not being connected to qdev devices -- if the
-bus has no parent object then we register a reset function which
-resets the bus on system reset (and unregister it when the
-bus is unparented).
-Nearly a decade later, we now have no buses in the tree which
-are created with NULL parents (apart from the main system bus),
-so we can remove the
-workaround and instead just assert that if the bus has a NULL
-parent then it is the main system bus.
-(The absence of other parentless buses was confirmed by
-code inspection of all the callsites of qbus_create() and
-qbus_create_inplace() and cross-checked by 'make check'.)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Damien Hedde <damien.hedde@greensocs.com>
-Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20190523150543.22676-1-peter.maydell@linaro.org
----
- hw/core/bus.c | 21 +++++++++------------
- 1 file changed, 9 insertions(+), 12 deletions(-)
-diff --git a/hw/core/bus.c b/hw/core/bus.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/core/bus.c
-+++ b/hw/core/bus.c
-@@ -XXX,XX +XXX,XX @@ static void qbus_realize(BusState *bus, DeviceState *parent, const char *name)
-         bus->parent->num_child_bus++;
-         object_property_add_child(OBJECT(bus->parent), bus->name, OBJECT(bus), NULL);
-         object_unref(OBJECT(bus));
--    } else if (bus != sysbus_get_default()) {
--        /* TODO: once all bus devices are qdevified,
--           only reset handler for main_system_bus should be registered here. */
--        qemu_register_reset(qbus_reset_all_fn, bus);
-+    } else {
-+        /* The only bus without a parent is the main system bus */
-+        assert(bus == sysbus_get_default());
-     }
- }
-@@ -XXX,XX +XXX,XX @@ static void bus_unparent(Object *obj)
-     BusState *bus = BUS(obj);
-     BusChild *kid;
-+    /* Only the main system bus has no parent, and that bus is never freed */
-+    assert(bus->parent);
-+
-     while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
-         DeviceState *dev = kid->child;
-         object_unparent(OBJECT(dev));
-     }
--    if (bus->parent) {
--        QLIST_REMOVE(bus, sibling);
--        bus->parent->num_child_bus--;
--        bus->parent = NULL;
--    } else {
--        assert(bus != sysbus_get_default()); /* main_system_bus is never freed */
--        qemu_unregister_reset(qbus_reset_all_fn, bus);
--    }
-+    QLIST_REMOVE(bus, sibling);
-+    bus->parent->num_child_bus--;
-+    bus->parent = NULL;
- }
- void qbus_create_inplace(void *bus, size_t size, const char *typename,
---
-.20.1

-[Qemu-devel] [PULL 11/48] target/arm: Explicitly enable VFP short-vectors for aarch32 -cpu max
+Deleted patch
-At the moment our -cpu max for AArch32 supports VFP short-vectors
-because we always implement them, even for CPUs which should
-not have them. The following commits are going to switch to
-using the correct ID-register-check to enable or disable short
-vector support, so we need to turn it on explicitly for -cpu max,
-because Cortex-A15 doesn't implement it.
-We don't enable this for the AArch64 -cpu max, because the v8A
-architecture never supports short-vectors.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/cpu.c | 4 ++++
- 1 file changed, 4 insertions(+)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
-+++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
-         kvm_arm_set_cpu_features_from_host(cpu);
-     } else {
-         cortex_a15_initfn(obj);
-+
-+        /* old-style VFP short-vector support */
-+        cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSHVEC, 1);
-+
- #ifdef CONFIG_USER_ONLY
-         /* We don't set these in system emulation mode for the moment,
-          * since we don't correctly set (all of) the ID registers to
---
-.20.1

-[Qemu-devel] [PULL 17/48] target/arm: Add helpers for VFP register loads and stores
+Deleted patch
-The current VFP code has two different idioms for
-loading and storing from the VFP register file:
- 1 using the gen_mov_F0_vreg() and similar functions,
-   which load and store to a fixed set of TCG globals
-   cpu_F0s, cpu_F0d, etc
- 2 by direct calls to tcg_gen_ld_f64() and friends
-We want to phase out idiom 1 (because the use of the
-fixed globals is a relic of a much older version of TCG),
-but idiom 2 is quite longwinded:
- tcg_gen_ld_f64(tmp, cpu_env, vfp_reg_offset(true, reg))
-requires us to specify the 64-bitness twice, once in
-the function name and once by passing 'true' to
-vfp_reg_offset(). There's no guard against accidentally
-passing the wrong flag.
-Instead, let's move to a convention of accessing 64-bit
-registers via the existing neon_load_reg64() and
-neon_store_reg64(), and provide new neon_load_reg32()
-and neon_store_reg32() for the 32-bit equivalents.
-Implement the new functions and use them in the code in
-translate-vfp.inc.c. We will convert the rest of the VFP
-code as we do the decodetree conversion in subsequent
-commits.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 40 +++++++++++++++++-----------------
- target/arm/translate.c         | 10 +++++++++
- 2 files changed, 30 insertions(+), 20 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
-         tcg_gen_ext_i32_i64(nf, cpu_NF);
-         tcg_gen_ext_i32_i64(vf, cpu_VF);
--        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
--        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
-+        neon_load_reg64(frn, rn);
-+        neon_load_reg64(frm, rm);
-         switch (a->cc) {
-         case 0: /* eq: Z */
-             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
-             tcg_temp_free_i64(tmp);
-             break;
-         }
--        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
-+        neon_store_reg64(dest, rd);
-         tcg_temp_free_i64(frn);
-         tcg_temp_free_i64(frm);
-         tcg_temp_free_i64(dest);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
-         frn = tcg_temp_new_i32();
-         frm = tcg_temp_new_i32();
-         dest = tcg_temp_new_i32();
--        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
--        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
-+        neon_load_reg32(frn, rn);
-+        neon_load_reg32(frm, rm);
-         switch (a->cc) {
-         case 0: /* eq: Z */
-             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
-             tcg_temp_free_i32(tmp);
-             break;
-         }
--        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
-+        neon_store_reg32(dest, rd);
-         tcg_temp_free_i32(frn);
-         tcg_temp_free_i32(frm);
-         tcg_temp_free_i32(dest);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
-         frm = tcg_temp_new_i64();
-         dest = tcg_temp_new_i64();
--        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
--        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
-+        neon_load_reg64(frn, rn);
-+        neon_load_reg64(frm, rm);
-         if (vmin) {
-             gen_helper_vfp_minnumd(dest, frn, frm, fpst);
-         } else {
-             gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
-         }
--        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
-+        neon_store_reg64(dest, rd);
-         tcg_temp_free_i64(frn);
-         tcg_temp_free_i64(frm);
-         tcg_temp_free_i64(dest);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
-         frm = tcg_temp_new_i32();
-         dest = tcg_temp_new_i32();
--        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
--        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
-+        neon_load_reg32(frn, rn);
-+        neon_load_reg32(frm, rm);
-         if (vmin) {
-             gen_helper_vfp_minnums(dest, frn, frm, fpst);
-         } else {
-             gen_helper_vfp_maxnums(dest, frn, frm, fpst);
-         }
--        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
-+        neon_store_reg32(dest, rd);
-         tcg_temp_free_i32(frn);
-         tcg_temp_free_i32(frm);
-         tcg_temp_free_i32(dest);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
-         TCGv_i64 tcg_res;
-         tcg_op = tcg_temp_new_i64();
-         tcg_res = tcg_temp_new_i64();
--        tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
-+        neon_load_reg64(tcg_op, rm);
-         gen_helper_rintd(tcg_res, tcg_op, fpst);
--        tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
-+        neon_store_reg64(tcg_res, rd);
-         tcg_temp_free_i64(tcg_op);
-         tcg_temp_free_i64(tcg_res);
-     } else {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
-         TCGv_i32 tcg_res;
-         tcg_op = tcg_temp_new_i32();
-         tcg_res = tcg_temp_new_i32();
--        tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
-+        neon_load_reg32(tcg_op, rm);
-         gen_helper_rints(tcg_res, tcg_op, fpst);
--        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
-+        neon_store_reg32(tcg_res, rd);
-         tcg_temp_free_i32(tcg_op);
-         tcg_temp_free_i32(tcg_res);
-     }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-         tcg_double = tcg_temp_new_i64();
-         tcg_res = tcg_temp_new_i64();
-         tcg_tmp = tcg_temp_new_i32();
--        tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
-+        neon_load_reg64(tcg_double, rm);
-         if (is_signed) {
-             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
-         } else {
-             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
-         }
-         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
--        tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
-+        neon_store_reg32(tcg_tmp, rd);
-         tcg_temp_free_i32(tcg_tmp);
-         tcg_temp_free_i64(tcg_res);
-         tcg_temp_free_i64(tcg_double);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-         TCGv_i32 tcg_single, tcg_res;
-         tcg_single = tcg_temp_new_i32();
-         tcg_res = tcg_temp_new_i32();
--        tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
-+        neon_load_reg32(tcg_single, rm);
-         if (is_signed) {
-             gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
-         } else {
-             gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
-         }
--        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
-+        neon_store_reg32(tcg_res, rd);
-         tcg_temp_free_i32(tcg_res);
-         tcg_temp_free_i32(tcg_single);
-     }
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
-     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
- }
-+static inline void neon_load_reg32(TCGv_i32 var, int reg)
-+{
-+    tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
-+}
-+
-+static inline void neon_store_reg32(TCGv_i32 var, int reg)
-+{
-+    tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
-+}
-+
- static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
- {
-     TCGv_ptr ret = tcg_temp_new_ptr();
---
-.20.1

-[Qemu-devel] [PULL 20/48] target/arm: Convert VFP two-register transfer insns to decodetree
+Deleted patch
-Convert the VFP two-register transfer instructions to decodetree
-(in the v8 Arm ARM these are the "Advanced SIMD and floating-point
-64-bit move" encoding group).
-Again, we expand out the sequences involving gen_vfp_msr() and
-gen_msr_vfp().
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 70 ++++++++++++++++++++++++++++++++++
- target/arm/translate.c         | 46 +---------------------
- target/arm/vfp.decode          |  5 +++
- 3 files changed, 77 insertions(+), 44 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
-     return true;
- }
-+
-+static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
-+{
-+    TCGv_i32 tmp;
-+
-+    /*
-+     * VMOV between two general-purpose registers and two single precision
-+     * floating point registers
-+     */
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    if (a->op) {
-+        /* fpreg to gpreg */
-+        tmp = tcg_temp_new_i32();
-+        neon_load_reg32(tmp, a->vm);
-+        store_reg(s, a->rt, tmp);
-+        tmp = tcg_temp_new_i32();
-+        neon_load_reg32(tmp, a->vm + 1);
-+        store_reg(s, a->rt2, tmp);
-+    } else {
-+        /* gpreg to fpreg */
-+        tmp = load_reg(s, a->rt);
-+        neon_store_reg32(tmp, a->vm);
-+        tcg_temp_free_i32(tmp);
-+        tmp = load_reg(s, a->rt2);
-+        neon_store_reg32(tmp, a->vm + 1);
-+        tcg_temp_free_i32(tmp);
-+    }
-+
-+    return true;
-+}
-+
-+static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
-+{
-+    TCGv_i32 tmp;
-+
-+    /*
-+     * VMOV between two general-purpose registers and one double precision
-+     * floating point register
-+     */
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist */
-+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    if (a->op) {
-+        /* fpreg to gpreg */
-+        tmp = tcg_temp_new_i32();
-+        neon_load_reg32(tmp, a->vm * 2);
-+        store_reg(s, a->rt, tmp);
-+        tmp = tcg_temp_new_i32();
-+        neon_load_reg32(tmp, a->vm * 2 + 1);
-+        store_reg(s, a->rt2, tmp);
-+    } else {
-+        /* gpreg to fpreg */
-+        tmp = load_reg(s, a->rt);
-+        neon_store_reg32(tmp, a->vm * 2);
-+        tcg_temp_free_i32(tmp);
-+        tmp = load_reg(s, a->rt2);
-+        neon_store_reg32(tmp, a->vm * 2 + 1);
-+        tcg_temp_free_i32(tmp);
-+    }
-+
-+    return true;
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-     case 0xc:
-     case 0xd:
-         if ((insn & 0x03e00000) == 0x00400000) {
--            /* two-register transfer */
--            rn = (insn >> 16) & 0xf;
--            rd = (insn >> 12) & 0xf;
--            if (dp) {
--                VFP_DREG_M(rm, insn);
--            } else {
--                rm = VFP_SREG_M(insn);
--            }
--
--            if (insn & ARM_CP_RW_BIT) {
--                /* vfp->arm */
--                if (dp) {
--                    gen_mov_F0_vreg(0, rm * 2);
--                    tmp = gen_vfp_mrs();
--                    store_reg(s, rd, tmp);
--                    gen_mov_F0_vreg(0, rm * 2 + 1);
--                    tmp = gen_vfp_mrs();
--                    store_reg(s, rn, tmp);
--                } else {
--                    gen_mov_F0_vreg(0, rm);
--                    tmp = gen_vfp_mrs();
--                    store_reg(s, rd, tmp);
--                    gen_mov_F0_vreg(0, rm + 1);
--                    tmp = gen_vfp_mrs();
--                    store_reg(s, rn, tmp);
--                }
--            } else {
--                /* arm->vfp */
--                if (dp) {
--                    tmp = load_reg(s, rd);
--                    gen_vfp_msr(tmp);
--                    gen_mov_vreg_F0(0, rm * 2);
--                    tmp = load_reg(s, rn);
--                    gen_vfp_msr(tmp);
--                    gen_mov_vreg_F0(0, rm * 2 + 1);
--                } else {
--                    tmp = load_reg(s, rd);
--                    gen_vfp_msr(tmp);
--                    gen_mov_vreg_F0(0, rm);
--                    tmp = load_reg(s, rn);
--                    gen_vfp_msr(tmp);
--                    gen_mov_vreg_F0(0, rm + 1);
--                }
--            }
-+            /* Already handled by decodetree */
-+            return 1;
-         } else {
-             /* Load/store */
-             rn = (insn >> 16) & 0xf;
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
- VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
- VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000 \
-              vn=%vn_sp
-+
-+VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 .... \
-+             vm=%vm_sp
-+VMOV_64_dp   ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 .... \
-+             vm=%vm_dp
---
-.20.1
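
For readers skimming the deleted VMOV_64 patch above: the double-precision path moves one D register as two 32-bit halves, at indices vm * 2 and vm * 2 + 1. The following is only a minimal plain-C sketch of that indexing, assuming the register file can be modelled as a flat array of 32-bit words. The names `regfile32`, `vmov_64_dp_to_gp` and `vmov_64_dp_from_gp` are hypothetical and do not exist in QEMU; TCG code generation, vfp_access_check() and the D16-D31 UNDEF check are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 32 D registers viewed as 64 32-bit words. */
static uint32_t regfile32[64];

/* fpreg to gpreg: rt receives word vm * 2, rt2 receives word vm * 2 + 1. */
static void vmov_64_dp_to_gp(unsigned vm, uint32_t *rt, uint32_t *rt2)
{
    assert(vm < 32);
    *rt = regfile32[vm * 2];
    *rt2 = regfile32[vm * 2 + 1];
}

/* gpreg to fpreg: the same two word slots, written instead of read. */
static void vmov_64_dp_from_gp(unsigned vm, uint32_t rt, uint32_t rt2)
{
    assert(vm < 32);
    regfile32[vm * 2] = rt;
    regfile32[vm * 2 + 1] = rt2;
}
```

The single-precision variant in the patch uses the same shape but indexes two consecutive S registers directly rather than the two halves of one D register.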

-[Qemu-devel] [PULL 21/48] target/arm: Convert VFP VLDR and VSTR to decodetree
+Deleted patch
-Convert the VFP single load/store insns VLDR and VSTR to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 73 ++++++++++++++++++++++++++++++++++
- target/arm/translate.c         | 22 +---------
- target/arm/vfp.decode          |  7 ++++
-files changed, 82 insertions(+), 20 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
-     return true;
- }
-+
-+static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
-+{
-+    uint32_t offset;
-+    TCGv_i32 addr;
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    offset = a->imm << 2;
-+    if (!a->u) {
-+        offset = -offset;
-+    }
-+
-+    if (s->thumb && a->rn == 15) {
-+        /* This is actually UNPREDICTABLE */
-+        addr = tcg_temp_new_i32();
-+        tcg_gen_movi_i32(addr, s->pc & ~2);
-+    } else {
-+        addr = load_reg(s, a->rn);
-+    }
-+    tcg_gen_addi_i32(addr, addr, offset);
-+    if (a->l) {
-+        gen_vfp_ld(s, false, addr);
-+        gen_mov_vreg_F0(false, a->vd);
-+    } else {
-+        gen_mov_F0_vreg(false, a->vd);
-+        gen_vfp_st(s, false, addr);
-+    }
-+    tcg_temp_free_i32(addr);
-+
-+    return true;
-+}
-+
-+static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
-+{
-+    uint32_t offset;
-+    TCGv_i32 addr;
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist */
-+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    offset = a->imm << 2;
-+    if (!a->u) {
-+        offset = -offset;
-+    }
-+
-+    if (s->thumb && a->rn == 15) {
-+        /* This is actually UNPREDICTABLE */
-+        addr = tcg_temp_new_i32();
-+        tcg_gen_movi_i32(addr, s->pc & ~2);
-+    } else {
-+        addr = load_reg(s, a->rn);
-+    }
-+    tcg_gen_addi_i32(addr, addr, offset);
-+    if (a->l) {
-+        gen_vfp_ld(s, true, addr);
-+        gen_mov_vreg_F0(true, a->vd);
-+    } else {
-+        gen_mov_F0_vreg(true, a->vd);
-+        gen_vfp_st(s, true, addr);
-+    }
-+    tcg_temp_free_i32(addr);
-+
-+    return true;
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             else
-                 rd = VFP_SREG_D(insn);
-             if ((insn & 0x01200000) == 0x01000000) {
--                /* Single load/store */
--                offset = (insn & 0xff) << 2;
--                if ((insn & (1 << 23)) == 0)
--                    offset = -offset;
--                if (s->thumb && rn == 15) {
--                    /* This is actually UNPREDICTABLE */
--                    addr = tcg_temp_new_i32();
--                    tcg_gen_movi_i32(addr, s->pc & ~2);
--                } else {
--                    addr = load_reg(s, rn);
--                }
--                tcg_gen_addi_i32(addr, addr, offset);
--                if (insn & (1 << 20)) {
--                    gen_vfp_ld(s, dp, addr);
--                    gen_mov_vreg_F0(dp, rd);
--                } else {
--                    gen_mov_F0_vreg(dp, rd);
--                    gen_vfp_st(s, dp, addr);
--                }
--                tcg_temp_free_i32(addr);
-+                /* Already handled by decodetree */
-+                return 1;
-             } else {
-                 /* load/store multiple */
-                 int w = insn & (1 << 21);
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 .... \
-              vm=%vm_sp
- VMOV_64_dp   ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 .... \
-              vm=%vm_dp
-+
-+# Note that the half-precision variants of VLDR and VSTR are
-+# not part of this decodetree at all because they have bits [9:8] == 0b01
-+VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 \
-+             vd=%vd_sp
-+VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 \
-+             vd=%vd_dp
---
-.20.1
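
The address arithmetic shared by trans_VLDR_VSTR_sp and trans_VLDR_VSTR_dp in the patch above can be sketched in isolation. This is a hedged illustration, not QEMU code: `vldr_vstr_addr` is a hypothetical helper, and `base` stands for whatever the translator loaded for Rn (for Thumb with Rn == 15 the patch uses `s->pc & ~2`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: effective address for VLDR/VSTR.
 * base: value read for Rn
 * imm8: 8-bit immediate, counted in 32-bit words
 * u:    true to add the offset, false to subtract it
 */
static uint32_t vldr_vstr_addr(uint32_t base, uint32_t imm8, bool u)
{
    uint32_t offset;

    assert(imm8 <= 0xff);
    offset = imm8 << 2;        /* scale words to bytes */
    if (!u) {
        offset = -offset;      /* unsigned negation wraps modulo 2^32 */
    }
    return base + offset;
}
```

Keeping `offset` unsigned mirrors the patch: the subtraction is expressed as adding the two's-complement negation, so a single add covers both directions.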

-[Qemu-devel] [PULL 22/48] target/arm: Convert the VFP load/store multiple insns to decodetree
+Deleted patch
-Convert the VFP load/store multiple insns to decodetree.
-This includes tightening up the UNDEF checking for pre-VFPv3
-CPUs which only have D0-D15: they now UNDEF for any access
-to D16-D31, not merely when the smallest register in the
-transfer list is in D16-D31.
-This conversion does not try to share code between the single
-precision and the double precision versions; this duplicates
-some code, but it leaves the door open for a future
-refactoring which gets rid of the use of the "F0" registers
-by inlining the various functions like gen_vfp_ld() and
-gen_mov_F0_vreg() which are hiding "if (dp) { ... } else { ... }"
-conditionalisation.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 162 +++++++++++++++++++++++++++++++++
- target/arm/translate.c         |  97 +-------------------
- target/arm/vfp.decode          |  18 ++++
-files changed, 183 insertions(+), 94 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
-     return true;
- }
-+
-+static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
-+{
-+    uint32_t offset;
-+    TCGv_i32 addr;
-+    int i, n;
-+
-+    n = a->imm;
-+
-+    if (n == 0 || (a->vd + n) > 32) {
-+        /*
-+         * UNPREDICTABLE cases for bad immediates: we choose to
-+         * UNDEF to avoid generating huge numbers of TCG ops
-+         */
-+        return false;
-+    }
-+    if (a->rn == 15 && a->w) {
-+        /* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    if (s->thumb && a->rn == 15) {
-+        /* This is actually UNPREDICTABLE */
-+        addr = tcg_temp_new_i32();
-+        tcg_gen_movi_i32(addr, s->pc & ~2);
-+    } else {
-+        addr = load_reg(s, a->rn);
-+    }
-+    if (a->p) {
-+        /* pre-decrement */
-+        tcg_gen_addi_i32(addr, addr, -(a->imm << 2));
-+    }
-+
-+    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
-+        /*
-+         * Here 'addr' is the lowest address we will store to,
-+         * and is either the old SP (if post-increment) or
-+         * the new SP (if pre-decrement). For post-increment
-+         * where the old value is below the limit and the new
-+         * value is above, it is UNKNOWN whether the limit check
-+         * triggers; we choose to trigger.
-+         */
-+        gen_helper_v8m_stackcheck(cpu_env, addr);
-+    }
-+
-+    offset = 4;
-+    for (i = 0; i < n; i++) {
-+        if (a->l) {
-+            /* load */
-+            gen_vfp_ld(s, false, addr);
-+            gen_mov_vreg_F0(false, a->vd + i);
-+        } else {
-+            /* store */
-+            gen_mov_F0_vreg(false, a->vd + i);
-+            gen_vfp_st(s, false, addr);
-+        }
-+        tcg_gen_addi_i32(addr, addr, offset);
-+    }
-+    if (a->w) {
-+        /* writeback */
-+        if (a->p) {
-+            offset = -offset * n;
-+            tcg_gen_addi_i32(addr, addr, offset);
-+        }
-+        store_reg(s, a->rn, addr);
-+    } else {
-+        tcg_temp_free_i32(addr);
-+    }
-+
-+    return true;
-+}
-+
-+static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
-+{
-+    uint32_t offset;
-+    TCGv_i32 addr;
-+    int i, n;
-+
-+    n = a->imm >> 1;
-+
-+    if (n == 0 || (a->vd + n) > 32 || n > 16) {
-+        /*
-+         * UNPREDICTABLE cases for bad immediates: we choose to
-+         * UNDEF to avoid generating huge numbers of TCG ops
-+         */
-+        return false;
-+    }
-+    if (a->rn == 15 && a->w) {
-+        /* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist */
-+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd + n) > 16) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    if (s->thumb && a->rn == 15) {
-+        /* This is actually UNPREDICTABLE */
-+        addr = tcg_temp_new_i32();
-+        tcg_gen_movi_i32(addr, s->pc & ~2);
-+    } else {
-+        addr = load_reg(s, a->rn);
-+    }
-+    if (a->p) {
-+        /* pre-decrement */
-+        tcg_gen_addi_i32(addr, addr, -(a->imm << 2));
-+    }
-+
-+    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
-+        /*
-+         * Here 'addr' is the lowest address we will store to,
-+         * and is either the old SP (if post-increment) or
-+         * the new SP (if pre-decrement). For post-increment
-+         * where the old value is below the limit and the new
-+         * value is above, it is UNKNOWN whether the limit check
-+         * triggers; we choose to trigger.
-+         */
-+        gen_helper_v8m_stackcheck(cpu_env, addr);
-+    }
-+
-+    offset = 8;
-+    for (i = 0; i < n; i++) {
-+        if (a->l) {
-+            /* load */
-+            gen_vfp_ld(s, true, addr);
-+            gen_mov_vreg_F0(true, a->vd + i);
-+        } else {
-+            /* store */
-+            gen_mov_F0_vreg(true, a->vd + i);
-+            gen_vfp_st(s, true, addr);
-+        }
-+        tcg_gen_addi_i32(addr, addr, offset);
-+    }
-+    if (a->w) {
-+        /* writeback */
-+        if (a->p) {
-+            offset = -offset * n;
-+        } else if (a->imm & 1) {
-+            offset = 4;
-+        } else {
-+            offset = 0;
-+        }
-+
-+        if (offset != 0) {
-+            tcg_gen_addi_i32(addr, addr, offset);
-+        }
-+        store_reg(s, a->rn, addr);
-+    } else {
-+        tcg_temp_free_i32(addr);
-+    }
-+
-+    return true;
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
-  */
- static int disas_vfp_insn(DisasContext *s, uint32_t insn)
- {
--    uint32_t rd, rn, rm, op, i, n, offset, delta_d, delta_m, bank_mask;
-+    uint32_t rd, rn, rm, op, i, n, delta_d, delta_m, bank_mask;
-     int dp, veclen;
--    TCGv_i32 addr;
-     TCGv_i32 tmp;
-     TCGv_i32 tmp2;
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-         break;
-     case 0xc:
-     case 0xd:
--        if ((insn & 0x03e00000) == 0x00400000) {
--            /* Already handled by decodetree */
--            return 1;
--        } else {
--            /* Load/store */
--            rn = (insn >> 16) & 0xf;
--            if (dp)
--                VFP_DREG_D(rd, insn);
--            else
--                rd = VFP_SREG_D(insn);
--            if ((insn & 0x01200000) == 0x01000000) {
--                /* Already handled by decodetree */
--                return 1;
--            } else {
--                /* load/store multiple */
--                int w = insn & (1 << 21);
--                if (dp)
--                    n = (insn >> 1) & 0x7f;
--                else
--                    n = insn & 0xff;
--
--                if (w && !(((insn >> 23) ^ (insn >> 24)) & 1)) {
--                    /* P == U , W == 1  => UNDEF */
--                    return 1;
--                }
--                if (n == 0 || (rd + n) > 32 || (dp && n > 16)) {
--                    /* UNPREDICTABLE cases for bad immediates: we choose to
--                     * UNDEF to avoid generating huge numbers of TCG ops
--                     */
--                    return 1;
--                }
--                if (rn == 15 && w) {
--                    /* writeback to PC is UNPREDICTABLE, we choose to UNDEF */
--                    return 1;
--                }
--
--                if (s->thumb && rn == 15) {
--                    /* This is actually UNPREDICTABLE */
--                    addr = tcg_temp_new_i32();
--                    tcg_gen_movi_i32(addr, s->pc & ~2);
--                } else {
--                    addr = load_reg(s, rn);
--                }
--                if (insn & (1 << 24)) /* pre-decrement */
--                    tcg_gen_addi_i32(addr, addr, -((insn & 0xff) << 2));
--
--                if (s->v8m_stackcheck && rn == 13 && w) {
--                    /*
--                     * Here 'addr' is the lowest address we will store to,
--                     * and is either the old SP (if post-increment) or
--                     * the new SP (if pre-decrement). For post-increment
--                     * where the old value is below the limit and the new
--                     * value is above, it is UNKNOWN whether the limit check
--                     * triggers; we choose to trigger.
--                     */
--                    gen_helper_v8m_stackcheck(cpu_env, addr);
--                }
--
--                if (dp)
--                    offset = 8;
--                else
--                    offset = 4;
--                for (i = 0; i < n; i++) {
--                    if (insn & ARM_CP_RW_BIT) {
--                        /* load */
--                        gen_vfp_ld(s, dp, addr);
--                        gen_mov_vreg_F0(dp, rd + i);
--                    } else {
--                        /* store */
--                        gen_mov_F0_vreg(dp, rd + i);
--                        gen_vfp_st(s, dp, addr);
--                    }
--                    tcg_gen_addi_i32(addr, addr, offset);
--                }
--                if (w) {
--                    /* writeback */
--                    if (insn & (1 << 24))
--                        offset = -offset * n;
--                    else if (dp && (insn & 1))
--                        offset = 4;
--                    else
--                        offset = 0;
--
--                    if (offset != 0)
--                        tcg_gen_addi_i32(addr, addr, offset);
--                    store_reg(s, rn, addr);
--                } else {
--                    tcg_temp_free_i32(addr);
--                }
--            }
--        }
--        break;
-+        /* Already handled by decodetree */
-+        return 1;
-     default:
-         /* Should never happen.  */
-         return 1;
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 \
-              vd=%vd_sp
- VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 \
-              vd=%vd_dp
-+
-+# We split the load/store multiple up into two patterns to avoid
-+# overlap with other insns in the "Advanced SIMD load/store and 64-bit move"
-+# grouping:
-+#   P=0 U=0 W=0 is 64-bit VMOV
-+#   P=1 W=0 is VLDR/VSTR
-+#   P=U W=1 is UNDEF
-+# leaving P=0 U=1 W=x and P=1 U=0 W=1 for load/store multiple.
-+# These include FSTM/FLDM.
-+VLDM_VSTM_sp ---- 1100 1 . w:1 l:1 rn:4 .... 1010 imm:8 \
-+             vd=%vd_sp p=0 u=1
-+VLDM_VSTM_dp ---- 1100 1 . w:1 l:1 rn:4 .... 1011 imm:8 \
-+             vd=%vd_dp p=0 u=1
-+
-+VLDM_VSTM_sp ---- 1101 0.1 l:1 rn:4 .... 1010 imm:8 \
-+             vd=%vd_sp p=1 u=0 w=1
-+VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
-+             vd=%vd_dp p=1 u=0 w=1
---
-.20.1
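
Two points from the load/store multiple patch above lend themselves to a small standalone sketch: the P/U/W split described in the vfp.decode comment, and the tightened D16-D31 check described in the commit message. The code below is an illustration under stated assumptions, not the decodetree output: the enum and all three function names are hypothetical, and the "old" check is written from the commit message's description (only the first register of the list was tested).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the groups named in the vfp.decode comment. */
enum vfp_ls_class {
    VFP_LS_VMOV_64,     /* P=0 U=0 W=0 */
    VFP_LS_VLDR_VSTR,   /* P=1 W=0 */
    VFP_LS_UNDEF,       /* P=U W=1 */
    VFP_LS_VLDM_VSTM,   /* P=0 U=1 W=x, or P=1 U=0 W=1 */
};

static enum vfp_ls_class classify_puw(bool p, bool u, bool w)
{
    if (w) {
        return p == u ? VFP_LS_UNDEF : VFP_LS_VLDM_VSTM;
    }
    if (p) {
        return VFP_LS_VLDR_VSTR;
    }
    return u ? VFP_LS_VLDM_VSTM : VFP_LS_VMOV_64;
}

/* New check: does any register of an n-register list fall in D16-D31? */
static bool dp_list_touches_d16_d31(unsigned vd, unsigned n)
{
    assert(n > 0);
    return vd + n > 16;
}

/* Old check, per the commit message: only the first register is tested. */
static bool dp_first_reg_in_d16_d31(unsigned vd)
{
    return (vd & 0x10) != 0;
}
```

For example, a four-register list starting at D14 covers D14-D17: the old check does not flag it because D14 is below D16, while the new check does.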

-[Qemu-devel] [PULL 25/48] target/arm: Convert VFP VMLS to decodetree
+Deleted patch
-Convert the VFP VMLS instruction to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 38 ++++++++++++++++++++++++++++++++++
- target/arm/translate.c         |  8 +------
- target/arm/vfp.decode          |  5 +++++
-files changed, 44 insertions(+), 7 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_sp *a)
- {
-     return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
- }
-+
-+static void gen_VMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
-+{
-+    /*
-+     * VMLS: vd = vd + -(vn * vm)
-+     * Note that order of inputs to the add matters for NaNs.
-+     */
-+    TCGv_i32 tmp = tcg_temp_new_i32();
-+
-+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
-+    gen_helper_vfp_negs(tmp, tmp);
-+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
-+    tcg_temp_free_i32(tmp);
-+}
-+
-+static bool trans_VMLS_sp(DisasContext *s, arg_VMLS_sp *a)
-+{
-+    return do_vfp_3op_sp(s, gen_VMLS_sp, a->vd, a->vn, a->vm, true);
-+}
-+
-+static void gen_VMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
-+{
-+    /*
-+     * VMLS: vd = vd + -(vn * vm)
-+     * Note that order of inputs to the add matters for NaNs.
-+     */
-+    TCGv_i64 tmp = tcg_temp_new_i64();
-+
-+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
-+    gen_helper_vfp_negd(tmp, tmp);
-+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
-+    tcg_temp_free_i64(tmp);
-+}
-+
-+static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
-+{
-+    return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             rn = VFP_SREG_N(insn);
-             switch (op) {
--            case 0:
-+            case 0 ... 1:
-                 /* Already handled by decodetree */
-                 return 1;
-             default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             for (;;) {
-                 /* Perform the calculation.  */
-                 switch (op) {
--                case 1: /* VMLS: fd + -(fn * fm) */
--                    gen_vfp_mul(dp);
--                    gen_vfp_F1_neg(dp);
--                    gen_mov_F0_vreg(dp, rd);
--                    gen_vfp_add(dp);
--                    break;
-                 case 2: /* VNMLS: -fd + (fn * fm) */
-                     /* Note that it isn't valid to replace (-A + B) with (B - A)
-                      * or similar plausible looking simplifications
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 .... \
-              vm=%vm_sp vn=%vn_sp vd=%vd_sp
- VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 .... \
-              vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 .... \
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
-+VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 .... \
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
---
-.20.1

-[Qemu-devel] [PULL 28/48] target/arm: Convert VMUL to decodetree
+Deleted patch
-Convert the VMUL instruction to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 10 ++++++++++
- target/arm/translate.c         |  5 +----
- target/arm/vfp.decode          |  5 +++++
-files changed, 16 insertions(+), 4 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
- {
-     return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
- }
-+
-+static bool trans_VMUL_sp(DisasContext *s, arg_VMUL_sp *a)
-+{
-+    return do_vfp_3op_sp(s, gen_helper_vfp_muls, a->vd, a->vn, a->vm, false);
-+}
-+
-+static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
-+{
-+    return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             rn = VFP_SREG_N(insn);
-             switch (op) {
--            case 0 ... 3:
-+            case 0 ... 4:
-                 /* Already handled by decodetree */
-                 return 1;
-             default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             for (;;) {
-                 /* Perform the calculation.  */
-                 switch (op) {
--                case 4: /* mul: fn * fm */
--                    gen_vfp_mul(dp);
--                    break;
-                 case 5: /* nmul: -(fn * fm) */
-                     gen_vfp_mul(dp);
-                     gen_vfp_neg(dp);
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 .... \
-              vm=%vm_sp vn=%vn_sp vd=%vd_sp
- VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 .... \
-              vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
-+VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
---
-.20.1

-[Qemu-devel] [PULL 29/48] target/arm: Convert VNMUL to decodetree
+Deleted patch
-Convert the VNMUL instruction to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 24 ++++++++++++++++++++++++
- target/arm/translate.c         |  7 +------
- target/arm/vfp.decode          |  5 +++++
-files changed, 30 insertions(+), 6 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
- {
-     return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
- }
-+
-+static void gen_VNMUL_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
-+{
-+    /* VNMUL: -(fn * fm) */
-+    gen_helper_vfp_muls(vd, vn, vm, fpst);
-+    gen_helper_vfp_negs(vd, vd);
-+}
-+
-+static bool trans_VNMUL_sp(DisasContext *s, arg_VNMUL_sp *a)
-+{
-+    return do_vfp_3op_sp(s, gen_VNMUL_sp, a->vd, a->vn, a->vm, false);
-+}
-+
-+static void gen_VNMUL_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
-+{
-+    /* VNMUL: -(fn * fm) */
-+    gen_helper_vfp_muld(vd, vn, vm, fpst);
-+    gen_helper_vfp_negd(vd, vd);
-+}
-+
-+static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
-+{
-+    return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp)                             \
- VFP_OP2(add)
- VFP_OP2(sub)
--VFP_OP2(mul)
- VFP_OP2(div)
- #undef VFP_OP2
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             rn = VFP_SREG_N(insn);
-             switch (op) {
--            case 0 ... 4:
-+            case 0 ... 5:
-                 /* Already handled by decodetree */
-                 return 1;
-             default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             for (;;) {
-                 /* Perform the calculation.  */
-                 switch (op) {
--                case 5: /* nmul: -(fn * fm) */
--                    gen_vfp_mul(dp);
--                    gen_vfp_neg(dp);
--                    break;
-                 case 6: /* add: fn + fm */
-                     gen_vfp_add(dp);
-                     break;
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
-              vm=%vm_sp vn=%vn_sp vd=%vd_sp
- VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
-              vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 .... \
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
-+VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 .... \
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
---
-.20.1

-[Qemu-devel] [PULL 30/48] target/arm: Convert VADD to decodetree
+Deleted patch
-Convert the VADD instruction to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 10 ++++++++++
- target/arm/translate.c         |  6 +-----
- target/arm/vfp.decode          |  5 +++++
-files changed, 16 insertions(+), 5 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
- {
-     return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
- }
-+
-+static bool trans_VADD_sp(DisasContext *s, arg_VADD_sp *a)
-+{
-+    return do_vfp_3op_sp(s, gen_helper_vfp_adds, a->vd, a->vn, a->vm, false);
-+}
-+
-+static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
-+{
-+    return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp)                             \
-     tcg_temp_free_ptr(fpst);                                          \
- }
--VFP_OP2(add)
- VFP_OP2(sub)
- VFP_OP2(div)
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             rn = VFP_SREG_N(insn);
-             switch (op) {
--            case 0 ... 5:
-+            case 0 ... 6:
-                 /* Already handled by decodetree */
-                 return 1;
-             default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             for (;;) {
-                 /* Perform the calculation.  */
-                 switch (op) {
--                case 6: /* add: fn + fm */
--                    gen_vfp_add(dp);
--                    break;
-                 case 7: /* sub: fn - fm */
-                     gen_vfp_sub(dp);
-                     break;
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 .... \
-              vm=%vm_sp vn=%vn_sp vd=%vd_sp
- VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 .... \
-              vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 .... \
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
-+VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 .... \
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
---
-.20.1

-[Qemu-devel] [PULL 31/48] target/arm: Convert VSUB to decodetree
+Deleted patch
-Convert the VSUB instruction to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 10 ++++++++++
- target/arm/translate.c         |  6 +-----
- target/arm/vfp.decode          |  5 +++++
-files changed, 16 insertions(+), 5 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
- {
-     return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
- }
-+
-+static bool trans_VSUB_sp(DisasContext *s, arg_VSUB_sp *a)
-+{
-+    return do_vfp_3op_sp(s, gen_helper_vfp_subs, a->vd, a->vn, a->vm, false);
-+}
-+
-+static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
-+{
-+    return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp)                             \
-     tcg_temp_free_ptr(fpst);                                          \
- }
--VFP_OP2(sub)
- VFP_OP2(div)
- #undef VFP_OP2
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             rn = VFP_SREG_N(insn);
-             switch (op) {
--            case 0 ... 6:
-+            case 0 ... 7:
-                 /* Already handled by decodetree */
-                 return 1;
-             default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             for (;;) {
-                 /* Perform the calculation.  */
-                 switch (op) {
--                case 7: /* sub: fn - fm */
--                    gen_vfp_sub(dp);
--                    break;
-                 case 8: /* div: fn / fm */
-                     gen_vfp_div(dp);
-                     break;
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 .... \
-              vm=%vm_sp vn=%vn_sp vd=%vd_sp
- VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 .... \
-              vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+VSUB_sp      ---- 1110 0.11 .... .... 1010 .1.0 .... \
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
-+VSUB_dp      ---- 1110 0.11 .... .... 1011 .1.0 .... \
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
---
-.20.1
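
Editorial note on the vfp.decode lines above: each pattern names fixed bits with 0 and 1, and leaves the remaining bits open with '.' or '-'. The following is a minimal sketch of how such a line can be reduced to a mask/value pair and tested against an instruction word. The names FixedBits, parse_pattern and pattern_matches are hypothetical and are not part of QEMU or of scripts/decodetree.py; field extraction (vm, vn, vd) is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: which bits are fixed, and what they must equal. */
typedef struct {
    uint32_t mask;
    uint32_t value;
} FixedBits;

/*
 * Parse a 32-bit pattern written most-significant bit first.
 * '0' and '1' are fixed bits; '.' and '-' are treated as open bits;
 * spaces are only separators and are skipped.
 */
static FixedBits parse_pattern(const char *pat)
{
    FixedBits fb = { 0, 0 };

    for (; *pat; pat++) {
        if (*pat == ' ') {
            continue;
        }
        fb.mask <<= 1;
        fb.value <<= 1;
        if (*pat == '0' || *pat == '1') {
            fb.mask |= 1;
            fb.value |= (uint32_t)(*pat - '0');
        }
    }
    return fb;
}

/* An instruction word matches when all of its fixed bits agree. */
static bool pattern_matches(FixedBits fb, uint32_t insn)
{
    return (insn & fb.mask) == fb.value;
}
```

For the VSUB_sp line, parse_pattern("---- 1110 0.11 .... .... 1010 .1.0 ....") gives mask 0x0FB00F50 and value 0x0E300A40; the VADD_sp line differs only in bit 6, so the two patterns cannot both match the same word.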

-[Qemu-devel] [PULL 32/48] target/arm: Convert VDIV to decodetree
+Deleted patch
-Convert the VDIV instruction to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 10 ++++++++++
- target/arm/translate.c         | 21 +--------------------
- target/arm/vfp.decode          |  5 +++++
-files changed, 16 insertions(+), 20 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
- {
-     return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
- }
-+
-+static bool trans_VDIV_sp(DisasContext *s, arg_VDIV_sp *a)
-+{
-+    return do_vfp_3op_sp(s, gen_helper_vfp_divs, a->vd, a->vn, a->vm, false);
-+}
-+
-+static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
-+{
-+    return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_fpstatus_ptr(int neon)
-     return statusptr;
- }
--#define VFP_OP2(name)                                                 \
--static inline void gen_vfp_##name(int dp)                             \
--{                                                                     \
--    TCGv_ptr fpst = get_fpstatus_ptr(0);                              \
--    if (dp) {                                                         \
--        gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, cpu_F1d, fpst);    \
--    } else {                                                          \
--        gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, cpu_F1s, fpst);    \
--    }                                                                 \
--    tcg_temp_free_ptr(fpst);                                          \
--}
--
--VFP_OP2(div)
--
--#undef VFP_OP2
--
- static inline void gen_vfp_abs(int dp)
- {
-     if (dp)
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             rn = VFP_SREG_N(insn);
-             switch (op) {
--            case 0 ... 7:
-+            case 0 ... 8:
-                 /* Already handled by decodetree */
-                 return 1;
-             default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             for (;;) {
-                 /* Perform the calculation.  */
-                 switch (op) {
--                case 8: /* div: fn / fm */
--                    gen_vfp_div(dp);
--                    break;
-                 case 10: /* VFNMA : fd = muladd(-fd,  fn, fm) */
-                 case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
-                 case 12: /* VFMA  : fd = muladd( fd,  fn, fm) */
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VSUB_sp      ---- 1110 0.11 .... .... 1010 .1.0 .... \
-              vm=%vm_sp vn=%vn_sp vd=%vd_sp
- VSUB_dp      ---- 1110 0.11 .... .... 1011 .1.0 .... \
-              vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 .... \
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
-+VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 .... \
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
---
-.20.1

-[Qemu-devel] [PULL 33/48] target/arm: Convert VFP fused multiply-add insns to decodetree
+Deleted patch
-Convert the VFP fused multiply-add instructions (VFNMA, VFNMS,
-VFMA, VFMS) to decodetree.
-Note that in the old decode structure we were implementing
-these to honour the VFP vector stride/length. These instructions
-were introduced in VFPv4, and in the v7A architecture they
-are UNPREDICTABLE if the vector stride or length are non-zero.
-In v8A they must UNDEF if stride or length are non-zero, like
-all VFP instructions; we choose to UNDEF always.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 121 +++++++++++++++++++++++++++++++++
- target/arm/translate.c         |  53 +--------------
- target/arm/vfp.decode          |   9 +++
-files changed, 131 insertions(+), 52 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
- {
-     return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
- }
-+
-+static bool trans_VFM_sp(DisasContext *s, arg_VFM_sp *a)
-+{
-+    /*
-+     * VFNMA : fd = muladd(-fd,  fn, fm)
-+     * VFNMS : fd = muladd(-fd, -fn, fm)
-+     * VFMA  : fd = muladd( fd,  fn, fm)
-+     * VFMS  : fd = muladd( fd, -fn, fm)
-+     *
-+     * These are fused multiply-add, and must be done as one floating
-+     * point operation with no rounding between the multiplication and
-+     * addition steps.  NB that doing the negations here as separate
-+     * steps is correct : an input NaN should come out with its sign
-+     * bit flipped if it is a negated-input.
-+     */
-+    TCGv_ptr fpst;
-+    TCGv_i32 vn, vm, vd;
-+
-+    /*
-+     * Present in VFPv4 only.
-+     * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
-+     * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
-+     */
-+    if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
-+        (s->vec_len != 0 || s->vec_stride != 0)) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    vn = tcg_temp_new_i32();
-+    vm = tcg_temp_new_i32();
-+    vd = tcg_temp_new_i32();
-+
-+    neon_load_reg32(vn, a->vn);
-+    neon_load_reg32(vm, a->vm);
-+    if (a->o2) {
-+        /* VFNMS, VFMS */
-+        gen_helper_vfp_negs(vn, vn);
-+    }
-+    neon_load_reg32(vd, a->vd);
-+    if (a->o1 & 1) {
-+        /* VFNMA, VFNMS */
-+        gen_helper_vfp_negs(vd, vd);
-+    }
-+    fpst = get_fpstatus_ptr(0);
-+    gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
-+    neon_store_reg32(vd, a->vd);
-+
-+    tcg_temp_free_ptr(fpst);
-+    tcg_temp_free_i32(vn);
-+    tcg_temp_free_i32(vm);
-+    tcg_temp_free_i32(vd);
-+
-+    return true;
-+}
-+
-+static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
-+{
-+    /*
-+     * VFNMA : fd = muladd(-fd,  fn, fm)
-+     * VFNMS : fd = muladd(-fd, -fn, fm)
-+     * VFMA  : fd = muladd( fd,  fn, fm)
-+     * VFMS  : fd = muladd( fd, -fn, fm)
-+     *
-+     * These are fused multiply-add, and must be done as one floating
-+     * point operation with no rounding between the multiplication and
-+     * addition steps.  NB that doing the negations here as separate
-+     * steps is correct : an input NaN should come out with its sign
-+     * bit flipped if it is a negated-input.
-+     */
-+    TCGv_ptr fpst;
-+    TCGv_i64 vn, vm, vd;
-+
-+    /*
-+     * Present in VFPv4 only.
-+     * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
-+     * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
-+     */
-+    if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
-+        (s->vec_len != 0 || s->vec_stride != 0)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    vn = tcg_temp_new_i64();
-+    vm = tcg_temp_new_i64();
-+    vd = tcg_temp_new_i64();
-+
-+    neon_load_reg64(vn, a->vn);
-+    neon_load_reg64(vm, a->vm);
-+    if (a->o2) {
-+        /* VFNMS, VFMS */
-+        gen_helper_vfp_negd(vn, vn);
-+    }
-+    neon_load_reg64(vd, a->vd);
-+    if (a->o1 & 1) {
-+        /* VFNMA, VFNMS */
-+        gen_helper_vfp_negd(vd, vd);
-+    }
-+    fpst = get_fpstatus_ptr(0);
-+    gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
-+    neon_store_reg64(vd, a->vd);
-+
-+    tcg_temp_free_ptr(fpst);
-+    tcg_temp_free_i64(vn);
-+    tcg_temp_free_i64(vm);
-+    tcg_temp_free_i64(vd);
-+
-+    return true;
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             rn = VFP_SREG_N(insn);
-             switch (op) {
--            case 0 ... 8:
-+            case 0 ... 13:
-                 /* Already handled by decodetree */
-                 return 1;
-             default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             for (;;) {
-                 /* Perform the calculation.  */
-                 switch (op) {
--                case 10: /* VFNMA : fd = muladd(-fd,  fn, fm) */
--                case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
--                case 12: /* VFMA  : fd = muladd( fd,  fn, fm) */
--                case 13: /* VFMS  : fd = muladd( fd, -fn, fm) */
--                    /* These are fused multiply-add, and must be done as one
--                     * floating point operation with no rounding between the
--                     * multiplication and addition steps.
--                     * NB that doing the negations here as separate steps is
--                     * correct : an input NaN should come out with its sign bit
--                     * flipped if it is a negated-input.
--                     */
--                    if (!arm_dc_feature(s, ARM_FEATURE_VFP4)) {
--                        return 1;
--                    }
--                    if (dp) {
--                        TCGv_ptr fpst;
--                        TCGv_i64 frd;
--                        if (op & 1) {
--                            /* VFNMS, VFMS */
--                            gen_helper_vfp_negd(cpu_F0d, cpu_F0d);
--                        }
--                        frd = tcg_temp_new_i64();
--                        tcg_gen_ld_f64(frd, cpu_env, vfp_reg_offset(dp, rd));
--                        if (op & 2) {
--                            /* VFNMA, VFNMS */
--                            gen_helper_vfp_negd(frd, frd);
--                        }
--                        fpst = get_fpstatus_ptr(0);
--                        gen_helper_vfp_muladdd(cpu_F0d, cpu_F0d,
--                                               cpu_F1d, frd, fpst);
--                        tcg_temp_free_ptr(fpst);
--                        tcg_temp_free_i64(frd);
--                    } else {
--                        TCGv_ptr fpst;
--                        TCGv_i32 frd;
--                        if (op & 1) {
--                            /* VFNMS, VFMS */
--                            gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
--                        }
--                        frd = tcg_temp_new_i32();
--                        tcg_gen_ld_f32(frd, cpu_env, vfp_reg_offset(dp, rd));
--                        if (op & 2) {
--                            gen_helper_vfp_negs(frd, frd);
--                        }
--                        fpst = get_fpstatus_ptr(0);
--                        gen_helper_vfp_muladds(cpu_F0s, cpu_F0s,
--                                               cpu_F1s, frd, fpst);
--                        tcg_temp_free_ptr(fpst);
--                        tcg_temp_free_i32(frd);
--                    }
--                    break;
-                 case 14: /* fconst */
-                     if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                         return 1;
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 .... \
-              vm=%vm_sp vn=%vn_sp vd=%vd_sp
- VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 .... \
-              vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+VFM_sp       ---- 1110 1.01 .... .... 1010 . o2:1 . 0 .... \
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=1
-+VFM_dp       ---- 1110 1.01 .... .... 1011 . o2:1 . 0 .... \
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=1
-+VFM_sp       ---- 1110 1.10 .... .... 1010 . o2:1 . 0 .... \
-+             vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=2
-+VFM_dp       ---- 1110 1.10 .... .... 1011 . o2:1 . 0 .... \
-+             vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=2
---
-.20.1
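
Editorial note on the operand handling in trans_VFM_sp above: the o2 bit negates the multiplicand vn, bit 0 of o1 negates the accumulator vd, and the result is vn * vm + vd. The following is a rough scalar sketch of that selection only. The function name vfm_sp_operands_model is hypothetical, and plain C arithmetic stands in for gen_helper_vfp_muladds, so the single-rounding (fused) behaviour and the NaN handling described in the comment are not modelled here.

```c
/*
 * Scalar sketch of the operand selection for the fused multiply-add group.
 *   o2       : when non-zero, negate the multiplicand vn
 *   o1 & 1   : when non-zero, negate the accumulator vd
 * The final expression is an unfused stand-in for the real helper: it is
 * exact for small integer inputs but may round twice in general.
 */
static float vfm_sp_operands_model(float vd, float vn, float vm,
                                   int o1, int o2)
{
    if (o2) {
        vn = -vn;
    }
    if (o1 & 1) {
        vd = -vd;
    }
    return vn * vm + vd;
}
```

With vd = 1, vn = 2, vm = 3, the four (o1, o2) combinations (2,0), (2,1), (1,0) and (1,1) give 7, -5, 5 and -7, which matches the four muladd() forms listed in the comment.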

-[Qemu-devel] [PULL 36/48] target/arm: Convert VNEG to decodetree
+Deleted patch
-Convert the VNEG instruction to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 10 ++++++++++
- target/arm/translate.c         |  6 +-----
- target/arm/vfp.decode          |  5 +++++
-files changed, 16 insertions(+), 5 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
- {
-     return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
- }
-+
-+static bool trans_VNEG_sp(DisasContext *s, arg_VNEG_sp *a)
-+{
-+    return do_vfp_2op_sp(s, gen_helper_vfp_negs, a->vd, a->vm);
-+}
-+
-+static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
-+{
-+    return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                 return 1;
-             case 15:
-                 switch (rn) {
--                case 1:
-+                case 1 ... 2:
-                     /* Already handled by decodetree */
-                     return 1;
-                 default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                 /* rn is opcode, encoded as per VFP_SREG_N. */
-                 switch (rn) {
-                 case 0x00: /* vmov */
--                case 0x02: /* vneg */
-                 case 0x03: /* vsqrt */
-                     break;
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                     case 0: /* cpy */
-                         /* no-op */
-                         break;
--                    case 2: /* neg */
--                        gen_vfp_neg(dp);
--                        break;
-                     case 3: /* sqrt */
-                         gen_vfp_sqrt(dp);
-                         break;
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 .... \
-              vd=%vd_sp vm=%vm_sp
- VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 .... \
-              vd=%vd_dp vm=%vm_dp
-+
-+VNEG_sp      ---- 1110 1.11 0001 .... 1010 01.0 .... \
-+             vd=%vd_sp vm=%vm_sp
-+VNEG_dp      ---- 1110 1.11 0001 .... 1011 01.0 .... \
-+             vd=%vd_dp vm=%vm_dp
---
-.20.1

-[Qemu-devel] [PULL 37/48] target/arm: Convert VSQRT to decodetree
+Deleted patch
-Convert the VSQRT instruction to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 20 ++++++++++++++++++++
- target/arm/translate.c         | 14 +-------------
- target/arm/vfp.decode          |  5 +++++
-files changed, 26 insertions(+), 13 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
- {
-     return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
- }
-+
-+static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
-+{
-+    gen_helper_vfp_sqrts(vd, vm, cpu_env);
-+}
-+
-+static bool trans_VSQRT_sp(DisasContext *s, arg_VSQRT_sp *a)
-+{
-+    return do_vfp_2op_sp(s, gen_VSQRT_sp, a->vd, a->vm);
-+}
-+
-+static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
-+{
-+    gen_helper_vfp_sqrtd(vd, vm, cpu_env);
-+}
-+
-+static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
-+{
-+    return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_neg(int dp)
-         gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
- }
--static inline void gen_vfp_sqrt(int dp)
--{
--    if (dp)
--        gen_helper_vfp_sqrtd(cpu_F0d, cpu_F0d, cpu_env);
--    else
--        gen_helper_vfp_sqrts(cpu_F0s, cpu_F0s, cpu_env);
--}
--
- static inline void gen_vfp_cmp(int dp)
- {
-     if (dp)
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                 return 1;
-             case 15:
-                 switch (rn) {
--                case 1 ... 2:
-+                case 1 ... 3:
-                     /* Already handled by decodetree */
-                     return 1;
-                 default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                 /* rn is opcode, encoded as per VFP_SREG_N. */
-                 switch (rn) {
-                 case 0x00: /* vmov */
--                case 0x03: /* vsqrt */
-                     break;
-                 case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                     case 0: /* cpy */
-                         /* no-op */
-                         break;
--                    case 3: /* sqrt */
--                        gen_vfp_sqrt(dp);
--                        break;
-                     case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
-                     {
-                         TCGv_ptr fpst = get_fpstatus_ptr(false);
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VNEG_sp      ---- 1110 1.11 0001 .... 1010 01.0 .... \
-              vd=%vd_sp vm=%vm_sp
- VNEG_dp      ---- 1110 1.11 0001 .... 1011 01.0 .... \
-              vd=%vd_dp vm=%vm_dp
-+
-+VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 .... \
-+             vd=%vd_sp vm=%vm_sp
-+VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 .... \
-+             vd=%vd_dp vm=%vm_dp
---
-.20.1

-[Qemu-devel] [PULL 38/48] target/arm: Convert VMOV (register) to decodetree
+Deleted patch
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-vfp.inc.c | 10 ++++++++++
- target/arm/translate.c         |  8 +-------
- target/arm/vfp.decode          |  5 +++++
-files changed, 16 insertions(+), 7 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
-+++ b/target/arm/translate-vfp.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
-     return true;
- }
-+static bool trans_VMOV_reg_sp(DisasContext *s, arg_VMOV_reg_sp *a)
-+{
-+    return do_vfp_2op_sp(s, tcg_gen_mov_i32, a->vd, a->vm);
-+}
-+
-+static bool trans_VMOV_reg_dp(DisasContext *s, arg_VMOV_reg_dp *a)
-+{
-+    return do_vfp_2op_dp(s, tcg_gen_mov_i64, a->vd, a->vm);
-+}
-+
- static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
- {
-     return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                 return 1;
-             case 15:
-                 switch (rn) {
--                case 1 ... 3:
-+                case 0 ... 3:
-                     /* Already handled by decodetree */
-                     return 1;
-                 default:
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-             if (op == 15) {
-                 /* rn is opcode, encoded as per VFP_SREG_N. */
-                 switch (rn) {
--                case 0x00: /* vmov */
--                    break;
--
-                 case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
-                 case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
-                     /*
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                 switch (op) {
-                 case 15: /* extension space */
-                     switch (rn) {
--                    case 0: /* cpy */
--                        /* no-op */
--                        break;
-                     case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
-                     {
-                         TCGv_ptr fpst = get_fpstatus_ptr(false);
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp.decode
-+++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ VMOV_imm_sp  ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
- VMOV_imm_dp  ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
-              vd=%vd_dp
-+VMOV_reg_sp  ---- 1110 1.11 0000 .... 1010 01.0 .... \
-+             vd=%vd_sp vm=%vm_sp
-+VMOV_reg_dp  ---- 1110 1.11 0000 .... 1011 01.0 .... \
-+             vd=%vd_dp vm=%vm_dp
-+
- VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 .... \
-              vd=%vd_sp vm=%vm_sp
- VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 .... \
---
-.20.1

Arm queue; the bulk of this is the VFP decodetree conversion...

thanks
-- PMM

The following changes since commit 4747524f9f243ca5ff1f146d37e423c00e923ee1:

  Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2019-06-12' into staging (2019-06-13 11:58:00 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190613

for you to fetch changes up to 07e4c7f769120c9a5bd6a26c2dc1421f2f838d80:

  target/arm: Fix short-vector increment behaviour (2019-06-13 12:57:37 +0100)

----------------------------------------------------------------
target-arm queue:
 * convert aarch32 VFP decoder to decodetree
   (includes tightening up decode in a few places)
 * fix minor bugs in VFP short-vector handling
 * hw/core/bus.c: Only the main system bus can have no parent
 * smmuv3: Fix decoding of ID register range
 * Implement NSACR gating of floating point
 * Use tcg_gen_gvec_bitsel
 * Vectorize USHL and SSHL

----------------------------------------------------------------
Peter Maydell (44):
      target/arm: Implement NSACR gating of floating point
      hw/arm/smmuv3: Fix decoding of ID register range
      hw/core/bus.c: Only the main system bus can have no parent
      target/arm: Add stubs for AArch32 VFP decodetree
      target/arm: Factor out VFP access checking code
      target/arm: Fix Cortex-R5F MVFR values
      target/arm: Explicitly enable VFP short-vectors for aarch32 -cpu max
      target/arm: Convert the VSEL instructions to decodetree
      target/arm: Convert VMINNM, VMAXNM to decodetree
      target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM to decodetree
      target/arm: Convert VCVTA/VCVTN/VCVTP/VCVTM to decodetree
      target/arm: Move the VFP trans_* functions to translate-vfp.inc.c
      target/arm: Add helpers for VFP register loads and stores
      target/arm: Convert "double-precision" register moves to decodetree
      target/arm: Convert "single-precision" register moves to decodetree
      target/arm: Convert VFP two-register transfer insns to decodetree
      target/arm: Convert VFP VLDR and VSTR to decodetree
      target/arm: Convert the VFP load/store multiple insns to decodetree
      target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d
      target/arm: Convert VFP VMLA to decodetree
      target/arm: Convert VFP VMLS to decodetree
      target/arm: Convert VFP VNMLS to decodetree
      target/arm: Convert VFP VNMLA to decodetree
      target/arm: Convert VMUL to decodetree
      target/arm: Convert VNMUL to decodetree
      target/arm: Convert VADD to decodetree
      target/arm: Convert VSUB to decodetree
      target/arm: Convert VDIV to decodetree
      target/arm: Convert VFP fused multiply-add insns to decodetree
      target/arm: Convert VMOV (imm) to decodetree
      target/arm: Convert VABS to decodetree
      target/arm: Convert VNEG to decodetree
      target/arm: Convert VSQRT to decodetree
      target/arm: Convert VMOV (register) to decodetree
      target/arm: Convert VFP comparison insns to decodetree
      target/arm: Convert the VCVT-from-f16 insns to decodetree
      target/arm: Convert the VCVT-to-f16 insns to decodetree
      target/arm: Convert VFP round insns to decodetree
      target/arm: Convert double-single precision conversion insns to decodetree
      target/arm: Convert integer-to-float insns to decodetree
      target/arm: Convert VJCVT to decodetree
      target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree
      target/arm: Convert float-to-integer VCVT insns to decodetree
      target/arm: Fix short-vector increment behaviour

Richard Henderson (4):
      target/arm: Vectorize USHL and SSHL
      target/arm: Use tcg_gen_gvec_bitsel
      target/arm: Fix output of PAuth Auth
      decodetree: Fix comparison of Field

 target/arm/Makefile.objs          |   13 +
 tests/tcg/aarch64/Makefile.target |    2 +-
 target/arm/cpu.h                  |   11 +
 target/arm/helper.h               |   11 +-
 target/arm/translate-a64.h        |    2 +
 target/arm/translate.h            |    9 +-
 hw/arm/smmuv3.c                   |    2 +-
 hw/core/bus.c                     |   21 +-
 target/arm/cpu.c                  |    6 +
 target/arm/helper.c               |   75 +-
 target/arm/neon_helper.c          |   33 -
 target/arm/pauth_helper.c         |    4 +-
 target/arm/translate-a64.c        |   33 +-
 target/arm/translate-vfp.inc.c    | 2672 +++++++++++++++++++++++++++++++++++++
 target/arm/translate.c            | 1881 +++++---------------------
 target/arm/vec_helper.c           |   88 ++
 tests/tcg/aarch64/pauth-2.c       |   61 +
 scripts/decodetree.py             |    2 +-
 target/arm/vfp-uncond.decode      |   63 +
 target/arm/vfp.decode             |  242 ++++
 20 files changed, 3593 insertions(+), 1638 deletions(-)
 create mode 100644 target/arm/translate-vfp.inc.c
 create mode 100644 tests/tcg/aarch64/pauth-2.c
 create mode 100644 target/arm/vfp-uncond.decode
 create mode 100644 target/arm/vfp.decode

From: Richard Henderson <richard.henderson@linaro.org>

These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift.  This
requires several masks and selects in addition to the actual
shifts to form the complete answer.

That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190603232209.20704-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
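Editorial note: the per-element behaviour described in the commit message can be sketched in scalar C, following the neon_shl_u64 and neon_shl_s64 helpers that this patch removes from neon_helper.c. The names ushl64_model and sshl64_model are hypothetical and exist only for illustration; this is not the vectorized code the patch adds. The sketch assumes the usual wrap-around behaviour when converting out-of-range values to int8_t and int64_t.

```c
#include <stdint.h>

/*
 * Unsigned variable shift on one 64-bit element.
 * Only the low byte of shiftop is used, read as a signed count:
 * positive counts shift left, negative counts shift right.
 */
static uint64_t ushl64_model(uint64_t val, uint64_t shiftop)
{
    int8_t shift = (int8_t)(uint8_t)shiftop;

    if (shift >= 64 || shift <= -64) {
        return 0;                   /* every bit is shifted out */
    }
    if (shift < 0) {
        return val >> -shift;
    }
    return val << shift;
}

/*
 * Signed variant: right shifts are arithmetic, so a count of -64 or
 * below leaves only copies of the sign bit.
 */
static int64_t sshl64_model(int64_t val, uint64_t shiftop)
{
    int8_t shift = (int8_t)(uint8_t)shiftop;
    uint64_t bits = (uint64_t)val;

    if (shift >= 64) {
        return 0;
    }
    if (shift < 0) {
        unsigned n = shift <= -64 ? 63 : (unsigned)-shift;
        /* Arithmetic right shift without relying on signed >>. */
        bits = val < 0 ? ~(~bits >> n) : bits >> n;
    } else {
        bits <<= shift;             /* done unsigned to avoid signed overflow */
    }
    return (int64_t)bits;
}
```

For example, ushl64_model(0x100, 0xFC) reads the count as -4 and returns 0x10, while sshl64_model(-1, 0xC0) reads the count as -64 and returns -1.
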
 target/arm/helper.h        |  11 +-
 target/arm/translate.h     |   6 +
 target/arm/neon_helper.c   |  33 ----
 target/arm/translate-a64.c |  18 +--
 target/arm/translate.c     | 300 +++++++++++++++++++++++++++++++++++--
 target/arm/vec_helper.c    |  88 +++++++++++
 6 files changed, 390 insertions(+), 66 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32)
 DEF_HELPER_2(neon_abd_u32, i32, i32, i32)
 DEF_HELPER_2(neon_abd_s32, i32, i32, i32)
 
-DEF_HELPER_2(neon_shl_u8, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_shl_u16, i32, i32, i32)
 DEF_HELPER_2(neon_shl_s16, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u64, i64, i64, i64)
-DEF_HELPER_2(neon_shl_s64, i64, i64, i64)
 DEF_HELPER_2(neon_rshl_u8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_u16, i32, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 
+DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
+extern const GVecGen3 sshl_op[4];
+extern const GVecGen3 ushl_op[4];
 extern const GVecGen2i ssra_op[4];
 extern const GVecGen2i usra_op[4];
 extern const GVecGen2i sri_op[4];
@@ -XXX,XX +XXX,XX @@ extern const GVecGen4 sqadd_op[4];
 extern const GVecGen4 uqsub_op[4];
 extern const GVecGen4 sqsub_op[4];
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -XXX,XX +XXX,XX @@ NEON_VOP(abd_u32, neon_u32, 1)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_u8, neon_u8, 4)
 NEON_VOP(shl_u16, neon_u16, 2)
-NEON_VOP(shl_u32, neon_u32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    if (shift >= 64 || shift <= -64) {
-        val = 0;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_s8, neon_s8, 4)
 NEON_VOP(shl_s16, neon_s16, 2)
-NEON_VOP(shl_s32, neon_s32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    int64_t val = valop;
-    if (shift >= 64) {
-        val = 0;
-    } else if (shift <= -64) {
-        val >>= 63;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
         break;
     case 0x8: /* SSHL, USHL */
         if (u) {
-            gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm);
+            gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm);
         } else {
-            gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm);
+            gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm);
         }
         break;
     case 0x9: /* SQSHL, UQSHL */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                        is_q ? 16 : 8, vec_full_reg_size(s),
                        (u ? uqsub_op : sqsub_op) + size);
         return;
+    case 0x08: /* SSHL, USHL */
+        gen_gvec_op3(s, is_q, rd, rn, rm,
+                     u ? &ushl_op[size] : &sshl_op[size]);
+        return;
     case 0x0c: /* SMAX, UMAX */
         if (u) {
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
-            case 0x8: /* SSHL, USHL */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
-                    { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
-                    { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
             case 0x9: /* SQSHL, UQSHL */
             {
                 static NeonGenTwoOpEnvFn * const fns[3][2] = {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_shift_narrow(int size, TCGv_i32 var, TCGv_i32 shift,
         if (u) {
             switch (size) {
             case 1: gen_helper_neon_shl_u16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_u32(var, var, shift); break;
+            case 2: gen_ushl_i32(var, var, shift); break;
             default: abort();
             }
         } else {
             switch (size) {
             case 1: gen_helper_neon_shl_s16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_s32(var, var, shift); break;
+            case 2: gen_sshl_i32(var, var, shift); break;
             default: abort();
             }
         }
@@ -XXX,XX +XXX,XX @@ const GVecGen3 cmtst_op[4] = {
       .vece = MO_64 },
 };
 
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(32);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, b);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, a, lsh);
+    tcg_gen_shr_i32(rval, a, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LTU, d, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LTU, d, rsh, max, rval, d);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(64);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, b);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, a, lsh);
+    tcg_gen_shr_i64(rval, a, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LTU, d, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LTU, d, rsh, max, rval, d);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_ushl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(d);
+    TCGv_vec rval = tcg_temp_new_vec_matching(d);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec msk, max;
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_neg_vec(vece, rsh, b);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, b);
+    } else {
+        msk = tcg_temp_new_vec_matching(d);
+        tcg_gen_dupi_vec(vece, msk, 0xff);
+        tcg_gen_and_vec(vece, lsh, b, msk);
+        tcg_gen_and_vec(vece, rsh, rsh, msk);
+        tcg_temp_free_vec(msk);
+    }
+
+    /*
+     * Perform possibly out of range shifts, trusting that the operation
+     * does not trap.  Discard unused results after the fact.
+     */
+    tcg_gen_shlv_vec(vece, lval, a, lsh);
+    tcg_gen_shrv_vec(vece, rval, a, rsh);
+
+    max = tcg_temp_new_vec_matching(d);
+    tcg_gen_dupi_vec(vece, max, 8 << vece);
+
+    /*
+     * The choice of LT (signed) and GEU (unsigned) are biased toward
+     * the instructions of the x86_64 host.  For MO_8, the whole byte
+     * is significant so we must use an unsigned compare; otherwise we
+     * have already masked to a byte and so a signed compare works.
+     * Other tcg hosts have a full set of comparisons and do not care.
+     */
+    if (vece == MO_8) {
+        tcg_gen_cmp_vec(TCG_COND_GEU, vece, lsh, lsh, max);
+        tcg_gen_cmp_vec(TCG_COND_GEU, vece, rsh, rsh, max);
+        tcg_gen_andc_vec(vece, lval, lval, lsh);
+        tcg_gen_andc_vec(vece, rval, rval, rsh);
+    } else {
+        tcg_gen_cmp_vec(TCG_COND_LT, vece, lsh, lsh, max);
+        tcg_gen_cmp_vec(TCG_COND_LT, vece, rsh, rsh, max);
+        tcg_gen_and_vec(vece, lval, lval, lsh);
+        tcg_gen_and_vec(vece, rval, rval, rsh);
+    }
+    tcg_gen_or_vec(vece, d, lval, rval);
+
+    tcg_temp_free_vec(max);
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+}
+
+static const TCGOpcode ushl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_shlv_vec,
+    INDEX_op_shrv_vec, INDEX_op_cmp_vec, 0
+};
+
+const GVecGen3 ushl_op[4] = {
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_b,
+      .opt_opc = ushl_list,
+      .vece = MO_8 },
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_h,
+      .opt_opc = ushl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_ushl_i32,
+      .fniv = gen_ushl_vec,
+      .opt_opc = ushl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_ushl_i64,
+      .fniv = gen_ushl_vec,
+      .opt_opc = ushl_list,
+      .vece = MO_64 },
+};
+
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(31);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, b);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, a, lsh);
+    tcg_gen_umin_i32(rsh, rsh, max);
+    tcg_gen_sar_i32(rval, a, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LT, d, lsh, zero, rval, lval);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(63);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, b);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, a, lsh);
+    tcg_gen_umin_i64(rsh, rsh, max);
+    tcg_gen_sar_i64(rval, a, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LT, d, lsh, zero, rval, lval);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_sshl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(d);
+    TCGv_vec rval = tcg_temp_new_vec_matching(d);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec tmp = tcg_temp_new_vec_matching(d);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_neg_vec(vece, rsh, b);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, b);
+    } else {
+        tcg_gen_dupi_vec(vece, tmp, 0xff);
+        tcg_gen_and_vec(vece, lsh, b, tmp);
+        tcg_gen_and_vec(vece, rsh, rsh, tmp);
+    }
+
+    /* Bound rsh so out of bound right shift gets -1.  */
+    tcg_gen_dupi_vec(vece, tmp, (8 << vece) - 1);
+    tcg_gen_umin_vec(vece, rsh, rsh, tmp);
+    tcg_gen_cmp_vec(TCG_COND_GT, vece, tmp, lsh, tmp);
+
+    tcg_gen_shlv_vec(vece, lval, a, lsh);
+    tcg_gen_sarv_vec(vece, rval, a, rsh);
+
+    /* Select in-bound left shift.  */
+    tcg_gen_andc_vec(vece, lval, lval, tmp);
+
+    /* Select between left and right shift.  */
+    if (vece == MO_8) {
+        tcg_gen_dupi_vec(vece, tmp, 0);
+        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, d, lsh, tmp, rval, lval);
+    } else {
+        tcg_gen_dupi_vec(vece, tmp, 0x80);
+        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, d, lsh, tmp, lval, rval);
+    }
+
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+    tcg_temp_free_vec(tmp);
+}
+
+static const TCGOpcode sshl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_umin_vec, INDEX_op_shlv_vec,
+    INDEX_op_sarv_vec, INDEX_op_cmp_vec, INDEX_op_cmpsel_vec, 0
+};
+
+const GVecGen3 sshl_op[4] = {
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_b,
+      .opt_opc = sshl_list,
+      .vece = MO_8 },
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_h,
+      .opt_opc = sshl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_sshl_i32,
+      .fniv = gen_sshl_vec,
+      .opt_opc = sshl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_sshl_i64,
+      .fniv = gen_sshl_vec,
+      .opt_opc = sshl_list,
+      .vece = MO_64 },
+};
+
 static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
                           TCGv_vec a, TCGv_vec b)
 {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                   vec_size, vec_size);
             }
             return 0;
+
+        case NEON_3R_VSHL:
+            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
+                           u ? &ushl_op[size] : &sshl_op[size]);
+            return 0;
         }
 
         if (size == 3) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 neon_load_reg64(cpu_V0, rn + pass);
                 neon_load_reg64(cpu_V1, rm + pass);
                 switch (op) {
-                case NEON_3R_VSHL:
-                    if (u) {
-                        gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0);
-                    } else {
-                        gen_helper_neon_shl_s64(cpu_V0, cpu_V1, cpu_V0);
-                    }
-                    break;
                 case NEON_3R_VQSHL:
                     if (u) {
                         gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         pairwise = 0;
         switch (op) {
-        case NEON_3R_VSHL:
         case NEON_3R_VQSHL:
         case NEON_3R_VRSHL:
         case NEON_3R_VQRSHL:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VHSUB:
             GEN_NEON_INTEGER_OP(hsub);
             break;
-        case NEON_3R_VSHL:
-            GEN_NEON_INTEGER_OP(shl);
-            break;
         case NEON_3R_VQSHL:
             GEN_NEON_INTEGER_OP_ENV(qshl);
             break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             }
                         } else {
                             if (input_unsigned) {
-                                gen_helper_neon_shl_u64(cpu_V0, in, tmp64);
+                                gen_ushl_i64(cpu_V0, in, tmp64);
                             } else {
-                                gen_helper_neon_shl_s64(cpu_V0, in, tmp64);
+                                gen_sshl_i64(cpu_V0, in, tmp64);
                             }
                         }
                         tmp = tcg_temp_new_i32();
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
 }
+
+void HELPER(gvec_sshl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        int8_t nn = n[i];
+        int8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -8 ? -mm : 7);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sshl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        int16_t nn = n[i];
+        int16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -16 ? -mm : 15);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        uint8_t nn = n[i];
+        uint8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -8) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        uint16_t nn = n[i];
+        uint16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -16) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This replaces 3 target-specific implementations for BIT, BIF, and BSL.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190518191934.21887-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
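Note for readers: the operand permutations passed to tcg_gen_gvec_bitsel
in this patch can be checked against the removed xor-based expanders with
a small standalone model. This is only a sketch: the toy_* names are
hypothetical and are not part of QEMU, and it assumes that bitsel computes
d = (b & a) | (c & ~a) on each element, operating here on a single 64-bit
value rather than on vector registers.

```c
#include <stdint.h>

/* Assumed bitsel semantics: take bits of b where a is 1, bits of c where a is 0. */
uint64_t toy_bitsel(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & a) | (c & ~a);
}

/* Operand orders mirror the bitsel calls in the patch: (rd, rn, rm), (rm, rn, rd), (rm, rd, rn). */
uint64_t toy_bsl(uint64_t rd, uint64_t rn, uint64_t rm)
{
    return toy_bitsel(rd, rn, rm);   /* rd selects between rn and rm */
}

uint64_t toy_bit(uint64_t rd, uint64_t rn, uint64_t rm)
{
    return toy_bitsel(rm, rn, rd);   /* insert rn where rm is 1 */
}

uint64_t toy_bif(uint64_t rd, uint64_t rn, uint64_t rm)
{
    return toy_bitsel(rm, rd, rn);   /* insert rn where rm is 0 */
}

/* The xor/and sequences of the removed gen_bsl/bit/bif expanders, folded into one expression each. */
uint64_t toy_bsl_xor(uint64_t rd, uint64_t rn, uint64_t rm)
{
    return rm ^ ((rn ^ rm) & rd);
}

uint64_t toy_bit_xor(uint64_t rd, uint64_t rn, uint64_t rm)
{
    return rd ^ ((rn ^ rd) & rm);
}

uint64_t toy_bif_xor(uint64_t rd, uint64_t rn, uint64_t rm)
{
    return rd ^ ((rn ^ rd) & ~rm);
}
```

For any inputs the toy_bsl/toy_bit/toy_bif results should equal their
*_xor counterparts, which is the equivalence the conversion relies on.
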
 target/arm/translate-a64.h |  2 +
 target/arm/translate.h     |  3 --
 target/arm/translate-a64.c | 15 ++++++--
 target/arm/translate.c     | 78 +++-----------------------------------
 4 files changed, 20 insertions(+), 78 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -XXX,XX +XXX,XX @@ typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
                          uint32_t, uint32_t);
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
 
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline void gen_ss_advance(DisasContext *s)
 }
 
 /* Vector operations shared between ARM and AArch64.  */
-extern const GVecGen3 bsl_op;
-extern const GVecGen3 bit_op;
-extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_fn3(DisasContext *s, bool is_q, int rd, int rn, int rm,
             vec_full_reg_offset(s, rm), is_q ? 16 : 8, vec_full_reg_size(s));
 }
 
+/* Expand a 4-operand AdvSIMD vector operation using an expander function.  */
+static void gen_gvec_fn4(DisasContext *s, bool is_q, int rd, int rn, int rm,
+                         int rx, GVecGen4Fn *gvec_fn, int vece)
+{
+    gvec_fn(vece, vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn),
+            vec_full_reg_offset(s, rm), vec_full_reg_offset(s, rx),
+            is_q ? 16 : 8, vec_full_reg_size(s));
+}
+
 /* Expand a 2-operand + immediate AdvSIMD vector operation using
  * an op descriptor.
  */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
         return;
 
     case 5: /* BSL bitwise select */
-        gen_gvec_op3(s, is_q, rd, rn, rm, &bsl_op);
+        gen_gvec_fn4(s, is_q, rd, rd, rn, rm, tcg_gen_gvec_bitsel, 0);
         return;
     case 6: /* BIT, bitwise insert if true */
-        gen_gvec_op3(s, is_q, rd, rn, rm, &bit_op);
+        gen_gvec_fn4(s, is_q, rd, rm, rn, rd, tcg_gen_gvec_bitsel, 0);
         return;
     case 7: /* BIF, bitwise insert if false */
-        gen_gvec_op3(s, is_q, rd, rn, rm, &bif_op);
+        gen_gvec_fn4(s, is_q, rd, rm, rd, rn, tcg_gen_gvec_bitsel, 0);
         return;
 
     default:
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn,
     return 1;
 }
 
-/*
- * Expanders for VBitOps_VBIF, VBIT, VBSL.
- */
-static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-    tcg_gen_xor_i64(rn, rn, rm);
-    tcg_gen_and_i64(rn, rn, rd);
-    tcg_gen_xor_i64(rd, rm, rn);
-}
-
-static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-    tcg_gen_xor_i64(rn, rn, rd);
-    tcg_gen_and_i64(rn, rn, rm);
-    tcg_gen_xor_i64(rd, rd, rn);
-}
-
-static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-    tcg_gen_xor_i64(rn, rn, rd);
-    tcg_gen_andc_i64(rn, rn, rm);
-    tcg_gen_xor_i64(rd, rd, rn);
-}
-
-static void gen_bsl_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-    tcg_gen_xor_vec(vece, rn, rn, rm);
-    tcg_gen_and_vec(vece, rn, rn, rd);
-    tcg_gen_xor_vec(vece, rd, rm, rn);
-}
-
-static void gen_bit_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-    tcg_gen_xor_vec(vece, rn, rn, rd);
-    tcg_gen_and_vec(vece, rn, rn, rm);
-    tcg_gen_xor_vec(vece, rd, rd, rn);
-}
-
-static void gen_bif_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-    tcg_gen_xor_vec(vece, rn, rn, rd);
-    tcg_gen_andc_vec(vece, rn, rn, rm);
-    tcg_gen_xor_vec(vece, rd, rd, rn);
-}
-
-const GVecGen3 bsl_op = {
-    .fni8 = gen_bsl_i64,
-    .fniv = gen_bsl_vec,
-    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    .load_dest = true
-};
-
-const GVecGen3 bit_op = {
-    .fni8 = gen_bit_i64,
-    .fniv = gen_bit_vec,
-    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    .load_dest = true
-};
-
-const GVecGen3 bif_op = {
-    .fni8 = gen_bif_i64,
-    .fniv = gen_bif_vec,
-    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    .load_dest = true
-};
-
 static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
 {
     tcg_gen_vec_sar8i_i64(a, a, shift);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                  vec_size, vec_size);
                 break;
             case 5: /* VBSL */
-                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                               vec_size, vec_size, &bsl_op);
+                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
+                                    vec_size, vec_size);
                 break;
             case 6: /* VBIT */
-                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                               vec_size, vec_size, &bit_op);
+                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
+                                    vec_size, vec_size);
                 break;
             case 7: /* VBIF */
-                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                               vec_size, vec_size, &bif_op);
+                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
+                                    vec_size, vec_size);
                 break;
             }
             return 0;
-- 
2.20.1

The NSACR register allows secure code to configure the FPU
to be inaccessible to non-secure code. If the NSACR.CP10
bit is clear then:
 * NS accesses to the FPU trap as UNDEF (ie to NS EL1 or EL2)
 * CPACR.{CP10,CP11} behave as if RAZ/WI
 * HCPTR.{TCP11,TCP10} behave as if RAO/WI

Note that we do not implement the NSACR.NSASEDIS bit which
gates only access to Advanced SIMD, in the same way that
we don't implement the equivalent CPACR.ASEDIS and HCPTR.TASE.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190510110357.18825-1-peter.maydell@linaro.org
---
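Note for readers: the register behaviour listed in the commit message
can be modelled in isolation as below. This is a simplified sketch, not
the QEMU code: ToyFpState and the toy_* functions are hypothetical, and
the gating condition follows the code comments in this patch (EL3 is
AArch32, the CPU is non-secure, and NSACR.CP10, bit 10, is 0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down state; QEMU keeps these fields in CPUARMState. */
typedef struct {
    uint32_t nsacr;          /* bit 10 is NSACR.CP10 */
    uint32_t cpacr;          /* bits [23:20] are CPACR.{CP11,CP10} */
    uint32_t hcptr;          /* bits [11:10] are HCPTR.{TCP11,TCP10} */
    bool el3_is_aarch32;     /* NSACR only has an effect when EL3 is AArch32 */
    bool secure;             /* current security state */
} ToyFpState;

/* True when NSACR denies the FPU to the current (non-secure) state. */
bool toy_ns_fpu_blocked(const ToyFpState *s)
{
    return s->el3_is_aarch32 && !s->secure && !((s->nsacr >> 10) & 1);
}

/* CPACR.{CP11,CP10} read as zero while blocked. */
uint32_t toy_cpacr_read(const ToyFpState *s)
{
    uint32_t value = s->cpacr;

    if (toy_ns_fpu_blocked(s)) {
        value &= ~(0xfu << 20);
    }
    return value;
}

/* Writes to CPACR.{CP11,CP10} are ignored while blocked; other bits update. */
void toy_cpacr_write(ToyFpState *s, uint32_t value)
{
    if (toy_ns_fpu_blocked(s)) {
        value &= ~(0xfu << 20);
        value |= s->cpacr & (0xfu << 20);
    }
    s->cpacr = value;
}

/* HCPTR.{TCP11,TCP10} read as one while blocked. */
uint32_t toy_hcptr_read(const ToyFpState *s)
{
    uint32_t value = s->hcptr;

    if (toy_ns_fpu_blocked(s)) {
        value |= 0x3u << 10;
    }
    return value;
}
```

The stored field values are left untouched while access is blocked, so
the previous contents become visible again once secure code sets
NSACR.CP10.
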
 target/arm/helper.c | 75 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
         }
         value &= mask;
     }
+
+    /*
+     * For A-profile AArch32 EL3 (but not M-profile secure mode), if NSACR.CP10
+     * is 0 then CPACR.{CP11,CP10} ignore writes and read as 0b00.
+     */
+    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+        value &= ~(0xf << 20);
+        value |= env->cp15.cpacr_el1 & (0xf << 20);
+    }
+
     env->cp15.cpacr_el1 = value;
 }
 
+static uint64_t cpacr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    /*
+     * For A-profile AArch32 EL3 (but not M-profile secure mode), if NSACR.CP10
+     * is 0 then CPACR.{CP11,CP10} ignore writes and read as 0b00.
+     */
+    uint64_t value = env->cp15.cpacr_el1;
+
+    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+        value &= ~(0xf << 20);
+    }
+    return value;
+}
+
+
 static void cpacr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
 {
     /* Call cpacr_write() so that we reset with the correct RAO bits set
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
     { .name = "CPACR", .state = ARM_CP_STATE_BOTH, .opc0 = 3,
       .crn = 1, .crm = 0, .opc1 = 0, .opc2 = 2, .accessfn = cpacr_access,
       .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.cpacr_el1),
-      .resetfn = cpacr_reset, .writefn = cpacr_write },
+      .resetfn = cpacr_reset, .writefn = cpacr_write, .readfn = cpacr_read },
     REGINFO_SENTINEL
 };
 
@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
     return ret;
 }
 
+static void cptr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                           uint64_t value)
+{
+    /*
+     * For A-profile AArch32 EL3, if NSACR.CP10
+     * is 0 then HCPTR.{TCP11,TCP10} ignore writes and read as 1.
+     */
+    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+        value &= ~(0x3 << 10);
+        value |= env->cp15.cptr_el[2] & (0x3 << 10);
+    }
+    env->cp15.cptr_el[2] = value;
+}
+
+static uint64_t cptr_el2_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    /*
+     * For A-profile AArch32 EL3, if NSACR.CP10
+     * is 0 then HCPTR.{TCP11,TCP10} ignore writes and read as 1.
+     */
+    uint64_t value = env->cp15.cptr_el[2];
+
+    if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+        !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+        value |= 0x3 << 10;
+    }
+    return value;
+}
+
 static const ARMCPRegInfo el2_cp_reginfo[] = {
     { .name = "HCR_EL2", .state = ARM_CP_STATE_AA64,
       .type = ARM_CP_IO,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
     { .name = "CPTR_EL2", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 2,
       .access = PL2_RW, .accessfn = cptr_access, .resetvalue = 0,
-      .fieldoffset = offsetof(CPUARMState, cp15.cptr_el[2]) },
+      .fieldoffset = offsetof(CPUARMState, cp15.cptr_el[2]),
+      .readfn = cptr_el2_read, .writefn = cptr_el2_write },
     { .name = "MAIR_EL2", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 4, .crn = 10, .crm = 2, .opc2 = 0,
       .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.mair_el[2]),
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
         break;
     }
 
+    /*
+     * The NSACR allows A-profile AArch32 EL3 and M-profile secure mode
+     * to control non-secure access to the FPU. It doesn't have any
+     * effect if EL3 is AArch64 or if EL3 doesn't exist at all.
+     */
+    if ((arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+         cur_el <= 2 && !arm_is_secure_below_el3(env))) {
+        if (!extract32(env->cp15.nsacr, 10, 1)) {
+            /* FP insns act as UNDEF */
+            return cur_el == 2 ? 2 : 1;
+        }
+    }
+
     /* For the CPTR registers we don't need to guard with an ARM_FEATURE
      * check because zero bits in the registers mean "don't trap".
      */
-- 
2.20.1

In commit 80376c3fc2c38fdd453 in 2010 we added a workaround for
some qbus buses not being connected to qdev devices -- if the
bus has no parent object then we register a reset function which
resets the bus on system reset (and unregister it when the
bus is unparented).

Nearly a decade later, we now have no buses in the tree, other
than the main system bus, which are created with NULL parents,
so we can remove the workaround and instead just assert that if
the bus has a NULL parent then it is the main system bus.

(The absence of other parentless buses was confirmed by
code inspection of all the callsites of qbus_create() and
qbus_create_inplace() and cross-checked by 'make check'.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Damien Hedde <damien.hedde@greensocs.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190523150543.22676-1-peter.maydell@linaro.org
---
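Note for readers: the invariant this patch starts asserting can be
stated as a one-line predicate. The sketch below uses a hypothetical
ToyBus type rather than QEMU's BusState, and only illustrates the
condition; it is not the code in hw/core/bus.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bus; 'parent' is the owning device, if any. */
typedef struct ToyBus {
    void *parent;
} ToyBus;

/*
 * A bus is acceptable at realize time if it has a parent, or if it is
 * the single main system bus (the only bus allowed to be parentless).
 */
bool toy_bus_parent_ok(const ToyBus *bus, const ToyBus *main_system_bus)
{
    return bus->parent != NULL || bus == main_system_bus;
}
```

With the workaround removed, a parentless bus other than the main system
bus trips the assertion instead of silently getting its own reset
handler.
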
 hw/core/bus.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/hw/core/bus.c b/hw/core/bus.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/bus.c
+++ b/hw/core/bus.c
@@ -XXX,XX +XXX,XX @@ static void qbus_realize(BusState *bus, DeviceState *parent, const char *name)
         bus->parent->num_child_bus++;
         object_property_add_child(OBJECT(bus->parent), bus->name, OBJECT(bus), NULL);
         object_unref(OBJECT(bus));
-    } else if (bus != sysbus_get_default()) {
-        /* TODO: once all bus devices are qdevified,
-           only reset handler for main_system_bus should be registered here. */
-        qemu_register_reset(qbus_reset_all_fn, bus);
+    } else {
+        /* The only bus without a parent is the main system bus */
+        assert(bus == sysbus_get_default());
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void bus_unparent(Object *obj)
     BusState *bus = BUS(obj);
     BusChild *kid;
 
+    /* Only the main system bus has no parent, and that bus is never freed */
+    assert(bus->parent);
+
     while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
         DeviceState *dev = kid->child;
         object_unparent(OBJECT(dev));
     }
-    if (bus->parent) {
-        QLIST_REMOVE(bus, sibling);
-        bus->parent->num_child_bus--;
-        bus->parent = NULL;
-    } else {
-        assert(bus != sysbus_get_default()); /* main_system_bus is never freed */
-        qemu_unregister_reset(qbus_reset_all_fn, bus);
-    }
+    QLIST_REMOVE(bus, sibling);
+    bus->parent->num_child_bus--;
+    bus->parent = NULL;
 }
 
 void qbus_create_inplace(void *bus, size_t size, const char *typename,
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The ARM pseudocode installs the error_code into the original
pointer, not the encrypted pointer.  The difference applies
within the 7 bits of pac data; the result should be the sign
extension of bit 55.

Add a testcase to that effect.
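
As a rough standalone illustration (not the QEMU code itself), the
failure path can be modelled as below. deposit_bits() is a local
stand-in for QEMU's deposit64(), and auth_failure_result() is a
hypothetical name; the bit positions 53 and 61 and the error_code
formula are taken from the pauth_auth() hunk in this patch. The
orig_ptr argument is assumed to already hold the sign extension of
bit 55 in its pac field.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-in for QEMU's deposit64(): overwrite 'length' bits of
 * 'value', starting at bit 'start', with the low bits of 'field'.
 */
static uint64_t deposit_bits(uint64_t value, int start, int length,
                             uint64_t field)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    uint64_t mask = (~UINT64_C(0) >> (64 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Hypothetical helper: the value returned when authentication fails. */
static uint64_t auth_failure_result(uint64_t orig_ptr, int keynumber,
                                    int tbi)
{
    int error_code = (keynumber << 1) | (keynumber ^ 1);

    /* The error code goes into the original pointer, not the signed one. */
    return deposit_bits(orig_ptr, tbi ? 53 : 61, 2, (uint64_t)error_code);
}
```

With TBI enabled and keynumber 0, bits [55:48] of the result come out
as 0b00100000 for an all-zeros pointer and 0b10111111 for an all-ones
pointer, which are the two patterns the new testcase asserts.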

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/tcg/aarch64/Makefile.target |  2 +-
 target/arm/pauth_helper.c         |  4 +-
 tests/tcg/aarch64/pauth-2.c       | 61 +++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/aarch64/pauth-2.c

diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -XXX,XX +XXX,XX @@ run-fcvt: fcvt
 	$(call run-test,$<,$(QEMU) $<, "$< on $(TARGET_NAME)")
 	$(call diff-out,$<,$(AARCH64_SRC)/fcvt.ref)
 
-AARCH64_TESTS += pauth-1
+AARCH64_TESTS += pauth-1 pauth-2
 run-pauth-%: QEMU += -cpu max
 
 TESTS:=$(AARCH64_TESTS)
diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
     if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
         int error_code = (keynumber << 1) | (keynumber ^ 1);
         if (param.tbi) {
-            return deposit64(ptr, 53, 2, error_code);
+            return deposit64(orig_ptr, 53, 2, error_code);
         } else {
-            return deposit64(ptr, 61, 2, error_code);
+            return deposit64(orig_ptr, 61, 2, error_code);
         }
     }
     return orig_ptr;
diff --git a/tests/tcg/aarch64/pauth-2.c b/tests/tcg/aarch64/pauth-2.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/pauth-2.c
@@ -XXX,XX +XXX,XX @@
+#include <stdint.h>
+#include <assert.h>
+
+asm(".arch armv8.4-a");
+
+void do_test(uint64_t value)
+{
+    uint64_t salt1, salt2;
+    uint64_t encode, decode;
+
+    /*
+     * With TBI enabled and a 48-bit VA, there are 7 bits of auth,
+     * and so a 1/128 chance of encode = pac(value,key,salt) producing
+     * an auth which leaves value unchanged.
+     * Iterate until we find a salt for which encode != value.
+     */
+    for (salt1 = 1; ; salt1++) {
+        asm volatile("pacda %0, %2" : "=r"(encode) : "0"(value), "r"(salt1));
+        if (encode != value) {
+            break;
+        }
+    }
+
+    /* A valid salt must produce a valid authorization.  */
+    asm volatile("autda %0, %2" : "=r"(decode) : "0"(encode), "r"(salt1));
+    assert(decode == value);
+
+    /*
+     * An invalid salt usually fails authorization, but again there
+     * is a chance of choosing another salt that works.
+     * Iterate until we find another salt which does fail.
+     */
+    for (salt2 = salt1 + 1; ; salt2++) {
+        asm volatile("autda %0, %2" : "=r"(decode) : "0"(encode), "r"(salt2));
+        if (decode != value) {
+            break;
+        }
+    }
+
+    /* The VA bits, bit 55, and the TBI bits, should be unchanged.  */
+    assert(((decode ^ value) & 0xff80ffffffffffffull) == 0);
+
+    /*
+     * Bits [54:53] are an error indicator based on the key used;
+     * the DA key above is keynumber 0, so error == 0b01.  Otherwise
+     * bit 55 of the original is sign-extended into the rest of the auth.
+     */
+    if ((value >> 55) & 1) {
+        assert(((decode >> 48) & 0xff) == 0b10111111);
+    } else {
+        assert(((decode >> 48) & 0xff) == 0b00100000);
+    }
+}
+
+int main()
+{
+    do_test(0);
+    do_test(-1);
+    do_test(0xda004acedeadbeefull);
+    return 0;
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Typo comparing the sign of the field, twice, instead of also comparing
the mask of the field (which itself encodes both position and length).
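
To show the effect of the typo, here is a C analogue of the Python
comparison. This is only a sketch: struct field, make_field() and the
two comparison functions are hypothetical names, and the mask is
assumed to be built from position and length as
((1 << len) - 1) << pos.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical C analogue of a decodetree field: 'mask' covers bits
 * [pos + len - 1 : pos], so it encodes both position and length.
 */
struct field {
    bool sign;
    uint32_t mask;
};

static struct field make_field(bool sign, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    struct field f = { sign, (~UINT32_C(0) >> (32 - len)) << pos };
    return f;
}

/* The buggy comparison: the sign is checked twice, the mask never. */
static bool field_eq_buggy(struct field a, struct field b)
{
    return a.sign == b.sign && a.sign == b.sign;
}

/* The fixed comparison: sign and mask must both match. */
static bool field_eq(struct field a, struct field b)
{
    return a.sign == b.sign && a.mask == b.mask;
}
```

Two unsigned 4-bit fields at bit 0 and bit 16 compare equal under the
buggy version and unequal under the fixed one.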

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190604154225.26992-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 scripts/decodetree.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index XXXXXXX..XXXXXXX 100755
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -XXX,XX +XXX,XX @@ class Field:
         return '{0}(insn, {1}, {2})'.format(extr, self.pos, self.len)
 
     def __eq__(self, other):
-        return self.sign == other.sign and self.sign == other.sign
+        return self.sign == other.sign and self.mask == other.mask
 
     def __ne__(self, other):
         return not self.__eq__(other)
-- 
2.20.1

Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 VFP encodings.  At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.

We need to have one decoder for the unconditional insns and one for
the conditional insns, as otherwise the patterns for conditional
insns would incorrectly match against the unconditional ones too.
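
The choice between the two decoders keys off the A32 condition field
in bits [31:28], with 0xf selecting the unconditional decoder. A
minimal standalone sketch of that test follows; extract_bits32() is a
local stand-in for QEMU's extract32() and insn_is_unconditional() is
a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for QEMU's extract32(). */
static uint32_t extract_bits32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~UINT32_C(0) >> (32 - length));
}

/* True if the condition field marks the insn as unconditional. */
static bool insn_is_unconditional(uint32_t insn)
{
    return extract_bits32(insn, 28, 4) == 0xf;
}
```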

Since translate.c is over 14,000 lines long and we're going to be
touching pretty much every line of the VFP code as part of the
decodetree conversion, we create a new translate-vfp.inc.c to hold
the code which deals with VFP in the new scheme.  It should be
possible to convert this into a standalone translation unit
eventually, but the conversion process will be much simpler if we
simply #include it midway through translate.c to start with.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/Makefile.objs       | 13 +++++++++++++
 target/arm/translate-vfp.inc.c | 31 +++++++++++++++++++++++++++++++
 target/arm/translate.c         | 19 +++++++++++++++++++
 target/arm/vfp-uncond.decode   | 28 ++++++++++++++++++++++++++++
 target/arm/vfp.decode          | 28 ++++++++++++++++++++++++++++
 5 files changed, 119 insertions(+)
 create mode 100644 target/arm/translate-vfp.inc.c
 create mode 100644 target/arm/vfp-uncond.decode
 create mode 100644 target/arm/vfp.decode

diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
 	  $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
 	  "GEN", $(TARGET_DIR)$@)
 
+target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-vfp-uncond.inc.c: $(SRC_PATH)/target/arm/vfp-uncond.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_vfp_uncond -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
 target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+target/arm/translate.o: target/arm/decode-vfp.inc.c
+target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
+
 obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ *  ARM translation: AArch32 VFP instructions
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *  Copyright (c) 2005-2007 CodeSourcery
+ *  Copyright (c) 2007 OpenedHand, Ltd.
+ *  Copyright (c) 2019 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This file is intended to be included from translate.c; it uses
+ * some macros and definitions provided by that file.
+ * It might be possible to convert it to a standalone .c file eventually.
+ */
+
+/* Include the generated VFP decoder */
+#include "decode-vfp.inc.c"
+#include "decode-vfp-uncond.inc.c"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_mov_vreg_F0(int dp, int reg)
 
 #define ARM_CP_RW_BIT   (1 << 20)
 
+/* Include the VFP decoder */
+#include "translate-vfp.inc.c"
+
 static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
 {
     tcg_gen_ld_i64(var, cpu_env, offsetof(CPUARMState, iwmmxt.regs[reg]));
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         return 1;
     }
 
+    /*
+     * If the decodetree decoder handles this insn it will always
+     * emit code to either execute the insn or generate an appropriate
+     * exception; so we don't need to ever return non-zero to tell
+     * the calling code to emit an UNDEF exception.
+     */
+    if (extract32(insn, 28, 4) == 0xf) {
+        if (disas_vfp_uncond(s, insn)) {
+            return 0;
+        }
+    } else {
+        if (disas_vfp(s, insn)) {
+            return 0;
+        }
+    }
+
     /* FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 VFP instruction descriptions (unconditional insns)
+#
+#  Copyright (c) 2019 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+# Encodings for the unconditional VFP instructions are here:
+# generally anything matching A32
+#  1111 1110 .... .... .... 101. ...0 ....
+# and T32
+#  1111 110. .... .... .... 101. .... ....
+#  1111 1110 .... .... .... 101. .... ....
+# (but those patterns might also cover some Neon instructions,
+# which do not live in this file.)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 VFP instruction descriptions (conditional insns)
+#
+#  Copyright (c) 2019 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+# Encodings for the conditional VFP instructions are here:
+# generally anything matching A32
+#  cccc 11.. .... .... .... 101. .... ....
+# and T32
+#  1110 110. .... .... .... 101. .... ....
+#  1110 1110 .... .... .... 101. .... ....
+# (but those patterns might also cover some Neon instructions,
+# which do not live in this file.)
-- 
2.20.1

Factor out the VFP access checking code so that we can use it in the
leaf functions of the decodetree decoder.

We call the function full_vfp_access_check() so we can keep
the more natural vfp_access_check() for a version which doesn't
have the 'ignore_vfp_enabled' flag -- that way almost all VFP
insns will be able to use vfp_access_check(s) and only the
special-register access function will have to use
full_vfp_access_check(s, ignore_vfp_enabled).
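
The naming split might look roughly like the sketch below. struct
toy_ctx is a toy stand-in for DisasContext holding only two of the
fields the real check consults; exception generation and the
M-profile lazy-FP handling are omitted. The body of
vfp_access_check() is an assumption, since this patch adds only the
full version.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the parts of DisasContext used here. */
struct toy_ctx {
    int fp_excp_el;     /* nonzero: FP accesses trap to this EL */
    bool vfp_enabled;   /* FP enabled via FPEXC[EN] */
};

static bool full_vfp_access_check(const struct toy_ctx *s,
                                  bool ignore_vfp_enabled)
{
    assert(s);
    if (s->fp_excp_el) {
        return false;   /* the real code emits an exception here */
    }
    if (!s->vfp_enabled && !ignore_vfp_enabled) {
        return false;   /* the real code emits an UNDEF here */
    }
    return true;
}

/* Assumed convenience wrapper for the common case. */
static bool vfp_access_check(const struct toy_ctx *s)
{
    return full_vfp_access_check(s, false);
}
```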

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 100 ++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 101 +++++----------------------------
 2 files changed, 113 insertions(+), 88 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@
 /* Include the generated VFP decoder */
 #include "decode-vfp.inc.c"
 #include "decode-vfp-uncond.inc.c"
+
+/*
+ * Check that VFP access is enabled. If it is, do the necessary
+ * M-profile lazy-FP handling and then return true.
+ * If not, emit code to generate an appropriate exception and
+ * return false.
+ * The ignore_vfp_enabled argument specifies that we should ignore
+ * whether VFP is enabled via FPEXC[EN]: this should be true for FMXR/FMRX
+ * accesses to FPSID, FPEXC, MVFR0, MVFR1, MVFR2, and false for all other insns.
+ */
+static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
+{
+    if (s->fp_excp_el) {
+        if (arm_dc_feature(s, ARM_FEATURE_M)) {
+            gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
+                               s->fp_excp_el);
+        } else {
+            gen_exception_insn(s, 4, EXCP_UDEF,
+                               syn_fp_access_trap(1, 0xe, false),
+                               s->fp_excp_el);
+        }
+        return false;
+    }
+
+    if (!s->vfp_enabled && !ignore_vfp_enabled) {
+        assert(!arm_dc_feature(s, ARM_FEATURE_M));
+        gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(),
+                           default_exception_el(s));
+        return false;
+    }
+
+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
+        /* Handle M-profile lazy FP state mechanics */
+
+        /* Trigger lazy-state preservation if necessary */
+        if (s->v7m_lspact) {
+            /*
+             * Lazy state saving affects external memory and also the NVIC,
+             * so we must mark it as an IO operation for icount.
+             */
+            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+                gen_io_start();
+            }
+            gen_helper_v7m_preserve_fp_state(cpu_env);
+            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+                gen_io_end();
+            }
+            /*
+             * If the preserve_fp_state helper doesn't throw an exception
+             * then it will clear LSPACT; we don't need to repeat this for
+             * any further FP insns in this TB.
+             */
+            s->v7m_lspact = false;
+        }
+
+        /* Update ownership of FP context: set FPCCR.S to match current state */
+        if (s->v8m_fpccr_s_wrong) {
+            TCGv_i32 tmp;
+
+            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
+            if (s->v8m_secure) {
+                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
+            } else {
+                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
+            }
+            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
+            /* Don't need to do this for any further FP insns in this TB */
+            s->v8m_fpccr_s_wrong = false;
+        }
+
+        if (s->v7m_new_fp_ctxt_needed) {
+            /*
+             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
+             * and the FPSCR.
+             */
+            TCGv_i32 control, fpscr;
+            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
+
+            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
+            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
+            tcg_temp_free_i32(fpscr);
+            /*
+             * We don't need to arrange to end the TB, because the only
+             * parts of FPSCR which we cache in the TB flags are the VECLEN
+             * and VECSTRIDE, and those don't exist for M-profile.
+             */
+
+            if (s->v8m_secure) {
+                bits |= R_V7M_CONTROL_SFPA_MASK;
+            }
+            control = load_cpu_field(v7m.control[M_REG_S]);
+            tcg_gen_ori_i32(control, control, bits);
+            store_cpu_field(control, v7m.control[M_REG_S]);
+            /* Don't need to do this for any further FP insns in this TB */
+            s->v7m_new_fp_ctxt_needed = false;
+        }
+    }
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
     return 1;
 }
 
-/* Disassemble a VFP instruction.  Returns nonzero if an error occurred
-   (ie. an undefined instruction).  */
+/*
+ * Disassemble a VFP instruction.  Returns nonzero if an error occurred
+ * (ie. an undefined instruction).
+ */
 static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
     uint32_t rd, rn, rm, op, i, n, offset, delta_d, delta_m, bank_mask;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     TCGv_i32 addr;
     TCGv_i32 tmp;
     TCGv_i32 tmp2;
+    bool ignore_vfp_enabled = false;
 
     if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         }
     }
 
-    /* FIXME: this access check should not take precedence over UNDEF
+    /*
+     * FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
      */
-    if (s->fp_excp_el) {
-        if (arm_dc_feature(s, ARM_FEATURE_M)) {
-            gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
-                               s->fp_excp_el);
-        } else {
-            gen_exception_insn(s, 4, EXCP_UDEF,
-                               syn_fp_access_trap(1, 0xe, false),
-                               s->fp_excp_el);
-        }
-        return 0;
-    }
-
-    if (!s->vfp_enabled) {
-        /* VFP disabled.  Only allow fmxr/fmrx to/from some control regs.  */
-        if ((insn & 0x0fe00fff) != 0x0ee00a10)
-            return 1;
+    if ((insn & 0x0fe00fff) == 0x0ee00a10) {
         rn = (insn >> 16) & 0xf;
-        if (rn != ARM_VFP_FPSID && rn != ARM_VFP_FPEXC && rn != ARM_VFP_MVFR2
-            && rn != ARM_VFP_MVFR1 && rn != ARM_VFP_MVFR0) {
-            return 1;
+        if (rn == ARM_VFP_FPSID || rn == ARM_VFP_FPEXC || rn == ARM_VFP_MVFR2
+            || rn == ARM_VFP_MVFR1 || rn == ARM_VFP_MVFR0) {
+            ignore_vfp_enabled = true;
         }
     }
-
-    if (arm_dc_feature(s, ARM_FEATURE_M)) {
-        /* Handle M-profile lazy FP state mechanics */
-
-        /* Trigger lazy-state preservation if necessary */
-        if (s->v7m_lspact) {
-            /*
-             * Lazy state saving affects external memory and also the NVIC,
-             * so we must mark it as an IO operation for icount.
-             */
-            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-                gen_io_start();
-            }
-            gen_helper_v7m_preserve_fp_state(cpu_env);
-            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-                gen_io_end();
-            }
-            /*
-             * If the preserve_fp_state helper doesn't throw an exception
-             * then it will clear LSPACT; we don't need to repeat this for
-             * any further FP insns in this TB.
-             */
-            s->v7m_lspact = false;
-        }
-
-        /* Update ownership of FP context: set FPCCR.S to match current state */
-        if (s->v8m_fpccr_s_wrong) {
-            TCGv_i32 tmp;
-
-            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
-            if (s->v8m_secure) {
-                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
-            } else {
-                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
-            }
-            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
-            /* Don't need to do this for any further FP insns in this TB */
-            s->v8m_fpccr_s_wrong = false;
-        }
-
-        if (s->v7m_new_fp_ctxt_needed) {
-            /*
-             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
-             * and the FPSCR.
-             */
-            TCGv_i32 control, fpscr;
-            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
-
-            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
-            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
-            tcg_temp_free_i32(fpscr);
-            /*
-             * We don't need to arrange to end the TB, because the only
-             * parts of FPSCR which we cache in the TB flags are the VECLEN
-             * and VECSTRIDE, and those don't exist for M-profile.
-             */
-
-            if (s->v8m_secure) {
-                bits |= R_V7M_CONTROL_SFPA_MASK;
-            }
-            control = load_cpu_field(v7m.control[M_REG_S]);
-            tcg_gen_ori_i32(control, control, bits);
-            store_cpu_field(control, v7m.control[M_REG_S]);
-            /* Don't need to do this for any further FP insns in this TB */
-            s->v7m_new_fp_ctxt_needed = false;
-        }
+    if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
+        return 0;
     }
 
     if (extract32(insn, 28, 4) == 0xf) {
-- 
2.20.1

At the moment our -cpu max for AArch32 supports VFP short-vectors
because we always implement them, even for CPUs which should
not have them. The following commits are going to switch to
using the correct ID-register-check to enable or disable short
vector support, so we need to turn it on explicitly for -cpu max,
because Cortex-A15 doesn't implement it.

We don't enable this for the AArch64 -cpu max, because the v8A
architecture never supports short-vectors.
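
For reference, a standalone sketch of what setting the field amounts
to. This assumes MVFR0.FPShVec occupies bits [27:24], as described in
the Arm ARM; the macro and function names are hypothetical and the
code does not use QEMU's FIELD_DP32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: MVFR0.FPShVec is bits [27:24]. */
#define MVFR0_FPSHVEC_SHIFT 24
#define MVFR0_FPSHVEC_MASK  (UINT32_C(0xf) << MVFR0_FPSHVEC_SHIFT)

/* Return mvfr0 with the FPShVec field replaced by 'val'. */
static uint32_t mvfr0_set_fpshvec(uint32_t mvfr0, uint32_t val)
{
    assert(val <= 0xf);
    return (mvfr0 & ~MVFR0_FPSHVEC_MASK) | (val << MVFR0_FPSHVEC_SHIFT);
}

/* Short vectors are treated as present if the field is nonzero. */
static bool mvfr0_has_short_vectors(uint32_t mvfr0)
{
    return (mvfr0 & MVFR0_FPSHVEC_MASK) != 0;
}
```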

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
         kvm_arm_set_cpu_features_from_host(cpu);
     } else {
         cortex_a15_initfn(obj);
+
+        /* old-style VFP short-vector support */
+        cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSHVEC, 1);
+
 #ifdef CONFIG_USER_ONLY
         /* We don't set these in system emulation mode for the moment,
          * since we don't correctly set (all of) the ID registers to
-- 
2.20.1

Convert the VSEL instructions to decodetree.
We leave trans_VSEL() in translate.c for now as this allows
the patch to show just the changes from the old handle_vsel().

In the old code the check for "do D16-D31 exist" was hidden in
the VFP_DREG macro, and assumed that VFPv3 always implied that
D16-D31 exist. In the new code we do the correct ID register test.
This gives identical behaviour for most of our CPUs, and fixes
previously incorrect handling for Cortex-R5F, Cortex-M4 and
Cortex-M33, which all implement VFPv3 or better with only 16
double-precision registers.
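
The D16-D31 test can be sketched in isolation as below. The helper
name dreg_access_undef() is hypothetical, and has_d32 stands in for
the result of the ID register check; register numbers are the 5-bit
values produced by the decoder, so bit 4 is set exactly for D16-D31.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should a double-precision access UNDEF
 * because it names a register in D16-D31 that does not exist?
 */
static bool dreg_access_undef(bool dp, bool has_d32,
                              unsigned vd, unsigned vn, unsigned vm)
{
    assert(vd < 32 && vn < 32 && vm < 32);
    return dp && !has_d32 && ((vd | vn | vm) & 0x10);
}
```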

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               |  6 ++++++
 target/arm/translate-vfp.inc.c |  9 +++++++++
 target/arm/translate.c         | 35 ++++++++++++++++++++++++----------
 target/arm/vfp-uncond.decode   | 19 ++++++++++++++++++
 4 files changed, 59 insertions(+), 10 deletions(-)

Convert the VMINNM and VMAXNM instructions to decodetree.
As with VSEL, we leave the trans_VMINMAXNM() function
in translate.c for the moment.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c       | 41 ++++++++++++++++++++++++------------
 target/arm/vfp-uncond.decode |  5 +++++
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
     return true;
 }
 
-static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
-                            uint32_t rm, uint32_t dp)
+static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
 {
-    uint32_t vmin = extract32(insn, 6, 1);
-    TCGv_ptr fpst = get_fpstatus_ptr(0);
+    uint32_t rd, rn, rm;
+    bool dp = a->dp;
+    bool vmin = a->op;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+        ((a->vm | a->vn | a->vd) & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rn = a->vn;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(0);
 
     if (dp) {
         TCGv_i64 frn, frm, dest;
@@ -XXX,XX +XXX,XX @@ static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
     }
 
     tcg_temp_free_ptr(fpst);
-    return 0;
+    return true;
 }
 
 static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
@@ -XXX,XX +XXX,XX @@ static const uint8_t fp_decode_rm[] = {
 
 static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
 {
-    uint32_t rd, rn, rm, dp = extract32(insn, 8, 1);
+    uint32_t rd, rm, dp = extract32(insn, 8, 1);
 
     if (dp) {
         VFP_DREG_D(rd, insn);
-        VFP_DREG_N(rn, insn);
         VFP_DREG_M(rm, insn);
     } else {
         rd = VFP_SREG_D(insn);
-        rn = VFP_SREG_N(insn);
         rm = VFP_SREG_M(insn);
     }
 
-    if ((insn & 0x0fb00e10) == 0x0e800a00 &&
-        dc_isar_feature(aa32_vminmaxnm, s)) {
-        return handle_vminmaxnm(insn, rd, rn, rm, dp);
-    } else if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
-               dc_isar_feature(aa32_vrint, s)) {
+    if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
+        dc_isar_feature(aa32_vrint, s)) {
         /* VRINTA, VRINTN, VRINTP, VRINTM */
         int rounding = fp_decode_rm[extract32(insn, 16, 2)];
         return handle_vrint(insn, rd, rm, dp, rounding);
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VSEL        1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
 VSEL        1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
+
+VMINMAXNM   1111 1110 1.00 .... .... 1010 . op:1 .0 .... \
+            vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
+VMINMAXNM   1111 1110 1.00 .... .... 1011 . op:1 .0 .... \
+            vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
-- 
2.20.1

Convert the VRINTA/VRINTN/VRINTP/VRINTM instructions to decodetree.
Again, trans_VRINT() is temporarily left in translate.c.
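
As an illustration of the rounding-mode decode, the 2-bit rm field in
bits [17:16] selects the variant in the order A, N, P, M, matching
the fp_decode_rm[] table (ties-away, ties-even, towards plus
infinity, towards minus infinity). vrint_suffix() below is a
hypothetical standalone helper, not part of the patch; the mask test
is the one used by the old hand-written decode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mnemonic suffix for a VRINT{A,N,P,M} encoding. */
static char vrint_suffix(uint32_t insn)
{
    static const char suffix[] = { 'A', 'N', 'P', 'M' };

    /* Same pattern test as the old hand-written decoder. */
    assert((insn & 0x0fbc0ed0) == 0x0eb80a40);
    return suffix[(insn >> 16) & 0x3];
}
```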

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c       | 60 +++++++++++++++++++++++-------------
 target/arm/vfp-uncond.decode |  5 +++
 2 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
     return true;
 }
 
-static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
-                        int rounding)
+/*
+ * Table for converting the most common AArch32 encoding of
+ * rounding mode to arm_fprounding order (which matches the
+ * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
+ */
+static const uint8_t fp_decode_rm[] = {
+    FPROUNDING_TIEAWAY,
+    FPROUNDING_TIEEVEN,
+    FPROUNDING_POSINF,
+    FPROUNDING_NEGINF,
+};
+
+static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
 {
-    TCGv_ptr fpst = get_fpstatus_ptr(0);
+    uint32_t rd, rm;
+    bool dp = a->dp;
+    TCGv_ptr fpst;
     TCGv_i32 tcg_rmode;
+    int rounding = fp_decode_rm[a->rm];
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+        ((a->vm | a->vd) & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(0);
 
     tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
@@ -XXX,XX +XXX,XX @@ static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
     tcg_temp_free_i32(tcg_rmode);
 
     tcg_temp_free_ptr(fpst);
-    return 0;
+    return true;
 }
 
 static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
     return 0;
 }
 
-/* Table for converting the most common AArch32 encoding of
- * rounding mode to arm_fprounding order (which matches the
- * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
- */
-static const uint8_t fp_decode_rm[] = {
-    FPROUNDING_TIEAWAY,
-    FPROUNDING_TIEEVEN,
-    FPROUNDING_POSINF,
-    FPROUNDING_NEGINF,
-};
-
 static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
 {
     uint32_t rd, rm, dp = extract32(insn, 8, 1);
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
         rm = VFP_SREG_M(insn);
     }
 
-    if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
-        dc_isar_feature(aa32_vrint, s)) {
-        /* VRINTA, VRINTN, VRINTP, VRINTM */
-        int rounding = fp_decode_rm[extract32(insn, 16, 2)];
-        return handle_vrint(insn, rd, rm, dp, rounding);
-    } else if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
-               dc_isar_feature(aa32_vcvt_dr, s)) {
+    if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
+        dc_isar_feature(aa32_vcvt_dr, s)) {
         /* VCVTA, VCVTN, VCVTP, VCVTM */
         int rounding = fp_decode_rm[extract32(insn, 16, 2)];
         return handle_vcvt(insn, rd, rm, dp, rounding);
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VMINMAXNM   1111 1110 1.00 .... .... 1010 . op:1 .0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
 VMINMAXNM   1111 1110 1.00 .... .... 1011 . op:1 .0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
+
+VRINT       1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
+            vm=%vm_sp vd=%vd_sp dp=0
+VRINT       1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
+            vm=%vm_dp vd=%vd_dp dp=1
-- 
2.20.1

Convert the VCVTA/VCVTN/VCVTP/VCVTM instructions to decodetree.
trans_VCVT() is temporarily left in translate.c.
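
For reference, a minimal sketch of why the old fixup of rd in
handle_vcvt() is no longer needed once Vd is decoded as a
single-precision register. It assumes the usual AArch32 VFP
encoding (Vd in insn bits [15:12], D in bit 22); the helper names
are hypothetical and are not QEMU functions.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not QEMU code): build a register number from
 * the 4-bit Vd field (insn bits [15:12]) and the D bit (insn bit 22).
 */
static inline uint32_t sreg_d(uint32_t insn)
{
    /* Single precision: Vd:D, so D is the least significant bit. */
    return (((insn >> 12) & 0xf) << 1) | ((insn >> 22) & 1);
}

static inline uint32_t dreg_d(uint32_t insn)
{
    /* Double precision: D:Vd, so D is the most significant bit. */
    return (((insn >> 22) & 1) << 4) | ((insn >> 12) & 0xf);
}

/* The fixup the old code applied to a double-precision style number. */
static inline uint32_t dreg_to_sreg(uint32_t rd)
{
    return ((rd << 1) & 0x1e) | ((rd >> 4) & 0x1);
}
```

Under these assumptions dreg_to_sreg(dreg_d(insn)) equals
sreg_d(insn) for every insn, so decoding Vd directly in the
single-precision form gives the same register number.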

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c       | 72 +++++++++++++++++-------------------
 target/arm/vfp-uncond.decode |  6 +++
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
     return true;
 }
 
-static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
-                       int rounding)
+static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
 {
-    bool is_signed = extract32(insn, 7, 1);
-    TCGv_ptr fpst = get_fpstatus_ptr(0);
+    uint32_t rd, rm;
+    bool dp = a->dp;
+    TCGv_ptr fpst;
     TCGv_i32 tcg_rmode, tcg_shift;
+    int rounding = fp_decode_rm[a->rm];
+    bool is_signed = a->op;
+
+    if (!dc_isar_feature(aa32_vcvt_dr, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(0);
 
     tcg_shift = tcg_const_i32(0);
 
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
     if (dp) {
         TCGv_i64 tcg_double, tcg_res;
         TCGv_i32 tcg_tmp;
-        /* Rd is encoded as a single precision register even when the source
-         * is double precision.
-         */
-        rd = ((rd << 1) & 0x1e) | ((rd >> 4) & 0x1);
         tcg_double = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
         tcg_tmp = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
 
     tcg_temp_free_ptr(fpst);
 
-    return 0;
-}
-
-static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
-{
-    uint32_t rd, rm, dp = extract32(insn, 8, 1);
-
-    if (dp) {
-        VFP_DREG_D(rd, insn);
-        VFP_DREG_M(rm, insn);
-    } else {
-        rd = VFP_SREG_D(insn);
-        rm = VFP_SREG_M(insn);
-    }
-
-    if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
-        dc_isar_feature(aa32_vcvt_dr, s)) {
-        /* VCVTA, VCVTN, VCVTP, VCVTM */
-        int rounding = fp_decode_rm[extract32(insn, 16, 2)];
-        return handle_vcvt(insn, rd, rm, dp, rounding);
-    }
-    return 1;
+    return true;
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         }
     }
 
+    if (extract32(insn, 28, 4) == 0xf) {
+        /*
+         * Encodings with T=1 (Thumb) or unconditional (ARM): these
+         * were all handled by the decodetree decoder, so any insn
+         * patterns which get here must be UNDEF.
+         */
+        return 1;
+    }
+
     /*
      * FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         return 0;
     }
 
-    if (extract32(insn, 28, 4) == 0xf) {
-        /*
-         * Encodings with T=1 (Thumb) or unconditional (ARM):
-         * only used for the "miscellaneous VFP features" added in v8A
-         * and v7M (and gated on the MVFR2.FPMisc field).
-         */
-        return disas_vfp_misc_insn(s, insn);
-    }
-
     dp = ((insn & 0xf00) == 0xb00);
     switch ((insn >> 24) & 0xf) {
     case 0xe:
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VRINT       1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
             vm=%vm_sp vd=%vd_sp dp=0
 VRINT       1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
             vm=%vm_dp vd=%vd_dp dp=1
+
+# VCVT float to int with specified rounding mode; Vd is always single-precision
+VCVT        1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
+            vm=%vm_sp vd=%vd_sp dp=0
+VCVT        1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
+            vm=%vm_dp vd=%vd_sp dp=1
-- 
2.20.1

Move the trans_*() functions we've just created from translate.c
to translate-vfp.inc.c. This is pure code motion with no textual
changes (this can be checked with 'git show --color-moved').

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 337 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 337 ---------------------------------
 2 files changed, 337 insertions(+), 337 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check(DisasContext *s)
 {
     return full_vfp_access_check(s, false);
 }
+
+static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+{
+    uint32_t rd, rn, rm;
+    bool dp = a->dp;
+
+    if (!dc_isar_feature(aa32_vsel, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+        ((a->vm | a->vn | a->vd) & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rn = a->vn;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (dp) {
+        TCGv_i64 frn, frm, dest;
+        TCGv_i64 tmp, zero, zf, nf, vf;
+
+        zero = tcg_const_i64(0);
+
+        frn = tcg_temp_new_i64();
+        frm = tcg_temp_new_i64();
+        dest = tcg_temp_new_i64();
+
+        zf = tcg_temp_new_i64();
+        nf = tcg_temp_new_i64();
+        vf = tcg_temp_new_i64();
+
+        tcg_gen_extu_i32_i64(zf, cpu_ZF);
+        tcg_gen_ext_i32_i64(nf, cpu_NF);
+        tcg_gen_ext_i32_i64(vf, cpu_VF);
+
+        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
+        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+        switch (a->cc) {
+        case 0: /* eq: Z */
+            tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
+                                frn, frm);
+            break;
+        case 1: /* vs: V */
+            tcg_gen_movcond_i64(TCG_COND_LT, dest, vf, zero,
+                                frn, frm);
+            break;
+        case 2: /* ge: N == V -> N ^ V == 0 */
+            tmp = tcg_temp_new_i64();
+            tcg_gen_xor_i64(tmp, vf, nf);
+            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
+                                frn, frm);
+            tcg_temp_free_i64(tmp);
+            break;
+        case 3: /* gt: !Z && N == V */
+            tcg_gen_movcond_i64(TCG_COND_NE, dest, zf, zero,
+                                frn, frm);
+            tmp = tcg_temp_new_i64();
+            tcg_gen_xor_i64(tmp, vf, nf);
+            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
+                                dest, frm);
+            tcg_temp_free_i64(tmp);
+            break;
+        }
+        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i64(frn);
+        tcg_temp_free_i64(frm);
+        tcg_temp_free_i64(dest);
+
+        tcg_temp_free_i64(zf);
+        tcg_temp_free_i64(nf);
+        tcg_temp_free_i64(vf);
+
+        tcg_temp_free_i64(zero);
+    } else {
+        TCGv_i32 frn, frm, dest;
+        TCGv_i32 tmp, zero;
+
+        zero = tcg_const_i32(0);
+
+        frn = tcg_temp_new_i32();
+        frm = tcg_temp_new_i32();
+        dest = tcg_temp_new_i32();
+        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
+        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+        switch (a->cc) {
+        case 0: /* eq: Z */
+            tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
+                                frn, frm);
+            break;
+        case 1: /* vs: V */
+            tcg_gen_movcond_i32(TCG_COND_LT, dest, cpu_VF, zero,
+                                frn, frm);
+            break;
+        case 2: /* ge: N == V -> N ^ V == 0 */
+            tmp = tcg_temp_new_i32();
+            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
+            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
+                                frn, frm);
+            tcg_temp_free_i32(tmp);
+            break;
+        case 3: /* gt: !Z && N == V */
+            tcg_gen_movcond_i32(TCG_COND_NE, dest, cpu_ZF, zero,
+                                frn, frm);
+            tmp = tcg_temp_new_i32();
+            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
+            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
+                                dest, frm);
+            tcg_temp_free_i32(tmp);
+            break;
+        }
+        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i32(frn);
+        tcg_temp_free_i32(frm);
+        tcg_temp_free_i32(dest);
+
+        tcg_temp_free_i32(zero);
+    }
+
+    return true;
+}
+
+static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
+{
+    uint32_t rd, rn, rm;
+    bool dp = a->dp;
+    bool vmin = a->op;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+        ((a->vm | a->vn | a->vd) & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rn = a->vn;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(0);
+
+    if (dp) {
+        TCGv_i64 frn, frm, dest;
+
+        frn = tcg_temp_new_i64();
+        frm = tcg_temp_new_i64();
+        dest = tcg_temp_new_i64();
+
+        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
+        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+        if (vmin) {
+            gen_helper_vfp_minnumd(dest, frn, frm, fpst);
+        } else {
+            gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
+        }
+        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i64(frn);
+        tcg_temp_free_i64(frm);
+        tcg_temp_free_i64(dest);
+    } else {
+        TCGv_i32 frn, frm, dest;
+
+        frn = tcg_temp_new_i32();
+        frm = tcg_temp_new_i32();
+        dest = tcg_temp_new_i32();
+
+        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
+        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+        if (vmin) {
+            gen_helper_vfp_minnums(dest, frn, frm, fpst);
+        } else {
+            gen_helper_vfp_maxnums(dest, frn, frm, fpst);
+        }
+        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i32(frn);
+        tcg_temp_free_i32(frm);
+        tcg_temp_free_i32(dest);
+    }
+
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
+/*
+ * Table for converting the most common AArch32 encoding of
+ * rounding mode to arm_fprounding order (which matches the
+ * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
+ */
+static const uint8_t fp_decode_rm[] = {
+    FPROUNDING_TIEAWAY,
+    FPROUNDING_TIEEVEN,
+    FPROUNDING_POSINF,
+    FPROUNDING_NEGINF,
+};
+
+static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
+{
+    uint32_t rd, rm;
+    bool dp = a->dp;
+    TCGv_ptr fpst;
+    TCGv_i32 tcg_rmode;
+    int rounding = fp_decode_rm[a->rm];
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+        ((a->vm | a->vd) & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(0);
+
+    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+
+    if (dp) {
+        TCGv_i64 tcg_op;
+        TCGv_i64 tcg_res;
+        tcg_op = tcg_temp_new_i64();
+        tcg_res = tcg_temp_new_i64();
+        tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
+        gen_helper_rintd(tcg_res, tcg_op, fpst);
+        tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i64(tcg_op);
+        tcg_temp_free_i64(tcg_res);
+    } else {
+        TCGv_i32 tcg_op;
+        TCGv_i32 tcg_res;
+        tcg_op = tcg_temp_new_i32();
+        tcg_res = tcg_temp_new_i32();
+        tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
+        gen_helper_rints(tcg_res, tcg_op, fpst);
+        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
+        tcg_temp_free_i32(tcg_op);
+        tcg_temp_free_i32(tcg_res);
+    }
+
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    tcg_temp_free_i32(tcg_rmode);
+
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
+static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
+{
+    uint32_t rd, rm;
+    bool dp = a->dp;
+    TCGv_ptr fpst;
+    TCGv_i32 tcg_rmode, tcg_shift;
+    int rounding = fp_decode_rm[a->rm];
+    bool is_signed = a->op;
+
+    if (!dc_isar_feature(aa32_vcvt_dr, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+        return false;
+    }
+    rd = a->vd;
+    rm = a->vm;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(0);
+
+    tcg_shift = tcg_const_i32(0);
+
+    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+
+    if (dp) {
+        TCGv_i64 tcg_double, tcg_res;
+        TCGv_i32 tcg_tmp;
+        tcg_double = tcg_temp_new_i64();
+        tcg_res = tcg_temp_new_i64();
+        tcg_tmp = tcg_temp_new_i32();
+        tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
+        if (is_signed) {
+            gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
+        } else {
+            gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
+        }
+        tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
+        tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
+        tcg_temp_free_i32(tcg_tmp);
+        tcg_temp_free_i64(tcg_res);
+        tcg_temp_free_i64(tcg_double);
+    } else {
+        TCGv_i32 tcg_single, tcg_res;
+        tcg_single = tcg_temp_new_i32();
+        tcg_res = tcg_temp_new_i32();
+        tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
+        if (is_signed) {
+            gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
+        } else {
+            gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
+        }
+        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
+        tcg_temp_free_i32(tcg_res);
+        tcg_temp_free_i32(tcg_single);
+    }
+
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    tcg_temp_free_i32(tcg_rmode);
+
+    tcg_temp_free_i32(tcg_shift);
+
+    tcg_temp_free_ptr(fpst);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
     tcg_temp_free_i32(tmp);
 }
 
-static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
-{
-    uint32_t rd, rn, rm;
-    bool dp = a->dp;
-
-    if (!dc_isar_feature(aa32_vsel, s)) {
-        return false;
-    }
-
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
-        ((a->vm | a->vn | a->vd) & 0x10)) {
-        return false;
-    }
-    rd = a->vd;
-    rn = a->vn;
-    rm = a->vm;
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    if (dp) {
-        TCGv_i64 frn, frm, dest;
-        TCGv_i64 tmp, zero, zf, nf, vf;
-
-        zero = tcg_const_i64(0);
-
-        frn = tcg_temp_new_i64();
-        frm = tcg_temp_new_i64();
-        dest = tcg_temp_new_i64();
-
-        zf = tcg_temp_new_i64();
-        nf = tcg_temp_new_i64();
-        vf = tcg_temp_new_i64();
-
-        tcg_gen_extu_i32_i64(zf, cpu_ZF);
-        tcg_gen_ext_i32_i64(nf, cpu_NF);
-        tcg_gen_ext_i32_i64(vf, cpu_VF);
-
-        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
-        switch (a->cc) {
-        case 0: /* eq: Z */
-            tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
-                                frn, frm);
-            break;
-        case 1: /* vs: V */
-            tcg_gen_movcond_i64(TCG_COND_LT, dest, vf, zero,
-                                frn, frm);
-            break;
-        case 2: /* ge: N == V -> N ^ V == 0 */
-            tmp = tcg_temp_new_i64();
-            tcg_gen_xor_i64(tmp, vf, nf);
-            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
-                                frn, frm);
-            tcg_temp_free_i64(tmp);
-            break;
-        case 3: /* gt: !Z && N == V */
-            tcg_gen_movcond_i64(TCG_COND_NE, dest, zf, zero,
-                                frn, frm);
-            tmp = tcg_temp_new_i64();
-            tcg_gen_xor_i64(tmp, vf, nf);
-            tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
-                                dest, frm);
-            tcg_temp_free_i64(tmp);
-            break;
-        }
-        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i64(frn);
-        tcg_temp_free_i64(frm);
-        tcg_temp_free_i64(dest);
-
-        tcg_temp_free_i64(zf);
-        tcg_temp_free_i64(nf);
-        tcg_temp_free_i64(vf);
-
-        tcg_temp_free_i64(zero);
-    } else {
-        TCGv_i32 frn, frm, dest;
-        TCGv_i32 tmp, zero;
-
-        zero = tcg_const_i32(0);
-
-        frn = tcg_temp_new_i32();
-        frm = tcg_temp_new_i32();
-        dest = tcg_temp_new_i32();
-        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
-        switch (a->cc) {
-        case 0: /* eq: Z */
-            tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
-                                frn, frm);
-            break;
-        case 1: /* vs: V */
-            tcg_gen_movcond_i32(TCG_COND_LT, dest, cpu_VF, zero,
-                                frn, frm);
-            break;
-        case 2: /* ge: N == V -> N ^ V == 0 */
-            tmp = tcg_temp_new_i32();
-            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
-            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
-                                frn, frm);
-            tcg_temp_free_i32(tmp);
-            break;
-        case 3: /* gt: !Z && N == V */
-            tcg_gen_movcond_i32(TCG_COND_NE, dest, cpu_ZF, zero,
-                                frn, frm);
-            tmp = tcg_temp_new_i32();
-            tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
-            tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
-                                dest, frm);
-            tcg_temp_free_i32(tmp);
-            break;
-        }
-        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i32(frn);
-        tcg_temp_free_i32(frm);
-        tcg_temp_free_i32(dest);
-
-        tcg_temp_free_i32(zero);
-    }
-
-    return true;
-}
-
-static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
-{
-    uint32_t rd, rn, rm;
-    bool dp = a->dp;
-    bool vmin = a->op;
-    TCGv_ptr fpst;
-
-    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
-        return false;
-    }
-
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
-        ((a->vm | a->vn | a->vd) & 0x10)) {
-        return false;
-    }
-    rd = a->vd;
-    rn = a->vn;
-    rm = a->vm;
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    fpst = get_fpstatus_ptr(0);
-
-    if (dp) {
-        TCGv_i64 frn, frm, dest;
-
-        frn = tcg_temp_new_i64();
-        frm = tcg_temp_new_i64();
-        dest = tcg_temp_new_i64();
-
-        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
-        if (vmin) {
-            gen_helper_vfp_minnumd(dest, frn, frm, fpst);
-        } else {
-            gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
-        }
-        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i64(frn);
-        tcg_temp_free_i64(frm);
-        tcg_temp_free_i64(dest);
-    } else {
-        TCGv_i32 frn, frm, dest;
-
-        frn = tcg_temp_new_i32();
-        frm = tcg_temp_new_i32();
-        dest = tcg_temp_new_i32();
-
-        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
-        if (vmin) {
-            gen_helper_vfp_minnums(dest, frn, frm, fpst);
-        } else {
-            gen_helper_vfp_maxnums(dest, frn, frm, fpst);
-        }
-        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i32(frn);
-        tcg_temp_free_i32(frm);
-        tcg_temp_free_i32(dest);
-    }
-
-    tcg_temp_free_ptr(fpst);
-    return true;
-}
-
-/*
- * Table for converting the most common AArch32 encoding of
- * rounding mode to arm_fprounding order (which matches the
- * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
- */
-static const uint8_t fp_decode_rm[] = {
-    FPROUNDING_TIEAWAY,
-    FPROUNDING_TIEEVEN,
-    FPROUNDING_POSINF,
-    FPROUNDING_NEGINF,
-};
-
-static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
-{
-    uint32_t rd, rm;
-    bool dp = a->dp;
-    TCGv_ptr fpst;
-    TCGv_i32 tcg_rmode;
-    int rounding = fp_decode_rm[a->rm];
-
-    if (!dc_isar_feature(aa32_vrint, s)) {
-        return false;
-    }
-
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
-        ((a->vm | a->vd) & 0x10)) {
-        return false;
-    }
-    rd = a->vd;
-    rm = a->vm;
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    fpst = get_fpstatus_ptr(0);
-
-    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
-    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-
-    if (dp) {
-        TCGv_i64 tcg_op;
-        TCGv_i64 tcg_res;
-        tcg_op = tcg_temp_new_i64();
-        tcg_res = tcg_temp_new_i64();
-        tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
-        gen_helper_rintd(tcg_res, tcg_op, fpst);
-        tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i64(tcg_op);
-        tcg_temp_free_i64(tcg_res);
-    } else {
-        TCGv_i32 tcg_op;
-        TCGv_i32 tcg_res;
-        tcg_op = tcg_temp_new_i32();
-        tcg_res = tcg_temp_new_i32();
-        tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
-        gen_helper_rints(tcg_res, tcg_op, fpst);
-        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
-        tcg_temp_free_i32(tcg_op);
-        tcg_temp_free_i32(tcg_res);
-    }
-
-    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    tcg_temp_free_i32(tcg_rmode);
-
-    tcg_temp_free_ptr(fpst);
-    return true;
-}
-
-static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-{
-    uint32_t rd, rm;
-    bool dp = a->dp;
-    TCGv_ptr fpst;
-    TCGv_i32 tcg_rmode, tcg_shift;
-    int rounding = fp_decode_rm[a->rm];
-    bool is_signed = a->op;
-
-    if (!dc_isar_feature(aa32_vcvt_dr, s)) {
-        return false;
-    }
-
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
-        return false;
-    }
-    rd = a->vd;
-    rm = a->vm;
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    fpst = get_fpstatus_ptr(0);
-
-    tcg_shift = tcg_const_i32(0);
-
-    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
-    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-
-    if (dp) {
-        TCGv_i64 tcg_double, tcg_res;
-        TCGv_i32 tcg_tmp;
-        tcg_double = tcg_temp_new_i64();
-        tcg_res = tcg_temp_new_i64();
-        tcg_tmp = tcg_temp_new_i32();
-        tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
-        if (is_signed) {
-            gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
-        } else {
-            gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
-        }
-        tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
-        tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
-        tcg_temp_free_i32(tcg_tmp);
-        tcg_temp_free_i64(tcg_res);
-        tcg_temp_free_i64(tcg_double);
-    } else {
-        TCGv_i32 tcg_single, tcg_res;
-        tcg_single = tcg_temp_new_i32();
-        tcg_res = tcg_temp_new_i32();
-        tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
-        if (is_signed) {
-            gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
-        } else {
-            gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
-        }
-        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
-        tcg_temp_free_i32(tcg_res);
-        tcg_temp_free_i32(tcg_single);
-    }
-
-    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    tcg_temp_free_i32(tcg_rmode);
-
-    tcg_temp_free_i32(tcg_shift);
-
-    tcg_temp_free_ptr(fpst);
-
-    return true;
-}
-
 /*
  * Disassemble a VFP instruction.  Returns nonzero if an error occurred
  * (ie. an undefined instruction).
-- 
2.20.1

The current VFP code has two different idioms for
loading and storing from the VFP register file:
 1 using the gen_mov_F0_vreg() and similar functions,
   which load and store to a fixed set of TCG globals
   cpu_F0s, cpu_F0d, etc.
 2 by direct calls to tcg_gen_ld_f64() and friends

We want to phase out idiom 1 (because the use of the
fixed globals is a relic of a much older version of TCG),
but idiom 2 is quite long-winded:
 tcg_gen_ld_f64(tmp, cpu_env, vfp_reg_offset(true, reg))
requires us to specify the 64-bitness twice, once in
the function name and once by passing 'true' to
vfp_reg_offset(). There's no guard against accidentally
passing the wrong flag.

Instead, let's move to a convention of accessing 64-bit
registers via the existing neon_load_reg64() and
neon_store_reg64(), and provide new neon_load_reg32()
and neon_store_reg32() for the 32-bit equivalents.

Implement the new functions and use them in the code in
translate-vfp.inc.c. We will convert the rest of the VFP
code as we do the decodetree conversion in subsequent
commits.
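
As an illustration only, here is a toy model of the convention.
It is not the QEMU implementation: the real accessors take TCG
values and emit TCG load/store ops, whereas this sketch works on
a plain byte array. Every toy_* name and the register file layout
are assumptions made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy register file: 32 D registers of 8 bytes each. */
static uint8_t toy_regs[32 * 8];

/* Toy stand-in for vfp_reg_offset(): byte offset of a D or S register. */
static size_t toy_reg_offset(bool dp, unsigned reg)
{
    return dp ? (size_t)reg * 8 : (size_t)reg * 4;
}

/*
 * Width-specific accessors: the width appears once, in the function
 * name, and the matching dp flag is fixed inside the wrapper, so a
 * caller cannot pass a flag that disagrees with the access size.
 */
static uint64_t toy_load_reg64(unsigned reg)
{
    uint64_t v;
    memcpy(&v, toy_regs + toy_reg_offset(true, reg), sizeof(v));
    return v;
}

static void toy_store_reg64(uint64_t v, unsigned reg)
{
    memcpy(toy_regs + toy_reg_offset(true, reg), &v, sizeof(v));
}

static uint32_t toy_load_reg32(unsigned reg)
{
    uint32_t v;
    memcpy(&v, toy_regs + toy_reg_offset(false, reg), sizeof(v));
    return v;
}

static void toy_store_reg32(uint32_t v, unsigned reg)
{
    memcpy(toy_regs + toy_reg_offset(false, reg), &v, sizeof(v));
}
```

With wrappers of this shape a call site names only the value and
the register number, which is the property the new
neon_load_reg32() and neon_store_reg32() are meant to provide.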

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 40 +++++++++++++++++-----------------
 target/arm/translate.c         | 10 +++++++++
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         tcg_gen_ext_i32_i64(nf, cpu_NF);
         tcg_gen_ext_i32_i64(vf, cpu_VF);
 
-        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg64(frn, rn);
+        neon_load_reg64(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i64(tmp);
             break;
         }
-        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         frn = tcg_temp_new_i32();
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
-        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg32(frn, rn);
+        neon_load_reg32(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i32(tmp);
             break;
         }
-        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
         frm = tcg_temp_new_i64();
         dest = tcg_temp_new_i64();
 
-        tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg64(frn, rn);
+        neon_load_reg64(frm, rm);
         if (vmin) {
             gen_helper_vfp_minnumd(dest, frn, frm, fpst);
         } else {
             gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
         }
-        tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
 
-        tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
-        tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg32(frn, rn);
+        neon_load_reg32(frm, rm);
         if (vmin) {
             gen_helper_vfp_minnums(dest, frn, frm, fpst);
         } else {
             gen_helper_vfp_maxnums(dest, frn, frm, fpst);
         }
-        tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i64 tcg_res;
         tcg_op = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
-        tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg64(tcg_op, rm);
         gen_helper_rintd(tcg_res, tcg_op, fpst);
-        tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg64(tcg_res, rd);
         tcg_temp_free_i64(tcg_op);
         tcg_temp_free_i64(tcg_res);
     } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i32 tcg_res;
         tcg_op = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
+        neon_load_reg32(tcg_op, rm);
         gen_helper_rints(tcg_res, tcg_op, fpst);
-        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
+        neon_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_op);
         tcg_temp_free_i32(tcg_res);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         tcg_double = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
         tcg_tmp = tcg_temp_new_i32();
-        tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
+        neon_load_reg64(tcg_double, rm);
         if (is_signed) {
             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
         } else {
             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
         }
         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
-        tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
+        neon_store_reg32(tcg_tmp, rd);
         tcg_temp_free_i32(tcg_tmp);
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         TCGv_i32 tcg_single, tcg_res;
         tcg_single = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
+        neon_load_reg32(tcg_single, rm);
         if (is_signed) {
             gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
         } else {
             gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
         }
-        tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
+        neon_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_single);
     }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 }
 
+static inline void neon_load_reg32(TCGv_i32 var, int reg)
+{
+    tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
+}
+
+static inline void neon_store_reg32(TCGv_i32 var, int reg)
+{
+    tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
-- 
2.20.1

Convert the "double-precision" register moves to decodetree:
this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP.

Note that the conversion process has tightened up a few of the
UNDEF encoding checks: we now correctly forbid:
 * VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10
 * VMOV-from-gpr with opc1:opc2 == 0x10
 * VDUP with B:E == 11
 * VDUP with Q == 1 and Vn<0> == 1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
The accesses of elements < 32 bits could be improved by doing
direct ld/st of the right size rather than 32-bit read-and-shift
or read-modify-write, but we leave this for later cleanup,
since this series is generally trying to stick to fixing
the decode.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 147 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  83 +------------------
 target/arm/vfp.decode          |  36 ++++++++
 3 files changed, 185 insertions(+), 81 deletions(-)

Convert the "single-precision" register moves to decodetree:
 * VMSR
 * VMRS
 * VMOV between general purpose register and single precision

Note that the VMSR/VMRS conversions make our handling of
the "should this UNDEF?" checks consistent between the two
instructions:
 * VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0
   (previously was a nop)
 * VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better
   (previously was a nop)
 * VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better
   (previously would write to the register, which had no
   guest-visible effect because we always UNDEF reads)

We also tighten up the decode: we were previously underdecoding
some SBZ or SBO bits.

The conversion of VMOV_single includes the expansion out of the
gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr()
sequences into the simpler direct load/store of the TCG temp via
neon_{load,store}_reg32(): we know in the new function that we're
always single-precision, we don't need to use the old-and-deprecated
cpu_F0* TCG globals, and we don't happen to have the declaration of
gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the
new function is.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 161 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 148 +-----------------------------
 target/arm/vfp.decode          |   4 +
 3 files changed, 168 insertions(+), 145 deletions(-)

Convert the VFP two-register transfer instructions to decodetree
(in the v8 Arm ARM these are the "Advanced SIMD and floating-point
64-bit move" encoding group).

Again, we expand out the sequences involving gen_vfp_msr() and
gen_vfp_mrs().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 70 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 46 +---------------------
 target/arm/vfp.decode          |  5 +++
 3 files changed, 77 insertions(+), 44 deletions(-)

Convert the VFP single load/store insns VLDR and VSTR to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 73 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 22 +---------
 target/arm/vfp.decode          |  7 ++++
 3 files changed, 82 insertions(+), 20 deletions(-)

Convert the VFP load/store multiple insns to decodetree.
This includes tightening up the UNDEF checking for pre-VFPv3
CPUs which only have D0-D15: they now UNDEF for any access
to D16-D31, not merely when the smallest register in the
transfer list is in D16-D31.
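
As a rough illustration of the difference between the two checks, here
is a small standalone model. It is a sketch only, not the QEMU code:
the function names and the first_dreg/count parameters are hypothetical,
and the "first register only" predicate is an assumption about what a
check on the lowest register alone would look like.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a CPU without D16-D31 only has D0-D15. */
enum { NUM_LOW_DREGS = 16 };

/* Looser check: only looks at the lowest register in the transfer list. */
static bool undef_if_first_reg_high(int first_dreg, int count)
{
    assert(first_dreg >= 0 && first_dreg < 32 && count > 0);
    (void)count; /* deliberately ignored by this check */
    return first_dreg >= NUM_LOW_DREGS;
}

/* Tighter check: UNDEF if any register in the list is D16 or above. */
static bool undef_if_any_reg_high(int first_dreg, int count)
{
    assert(first_dreg >= 0 && first_dreg < 32 && count > 0);
    return first_dreg + count > NUM_LOW_DREGS;
}
```

For a transfer of four registers starting at D14, the looser check lets
the access through while the tighter one rejects it, since D16 and D17
would be touched.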

This conversion does not try to share code between the single
precision and the double precision versions; this duplicates some
code, but it leaves the door open for a future
refactoring which gets rid of the use of the "F0" registers
by inlining the various functions like gen_vfp_ld() and
gen_mov_F0_reg() which are hiding "if (dp) { ... } else { ... }"
conditionalisation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 162 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  97 +-------------------
 target/arm/vfp.decode          |  18 ++++
 3 files changed, 183 insertions(+), 94 deletions(-)

Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans
functions which perform the memory accesses by going via the TCG
globals cpu_F0s and cpu_F0d, to use local TCG temps instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 46 +++++++++++++++++++++-------------
 target/arm/translate.c         | 18 -------------
 2 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
 static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 {
     uint32_t offset;
-    TCGv_i32 addr;
+    TCGv_i32 addr, tmp;
 
     if (!vfp_access_check(s)) {
         return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
         addr = load_reg(s, a->rn);
     }
     tcg_gen_addi_i32(addr, addr, offset);
+    tmp = tcg_temp_new_i32();
     if (a->l) {
-        gen_vfp_ld(s, false, addr);
-        gen_mov_vreg_F0(false, a->vd);
+        gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+        neon_store_reg32(tmp, a->vd);
     } else {
-        gen_mov_F0_vreg(false, a->vd);
-        gen_vfp_st(s, false, addr);
+        neon_load_reg32(tmp, a->vd);
+        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(addr);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 {
     uint32_t offset;
     TCGv_i32 addr;
+    TCGv_i64 tmp;
 
     /* UNDEF accesses to D16-D31 if they don't exist */
     if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
         addr = load_reg(s, a->rn);
     }
     tcg_gen_addi_i32(addr, addr, offset);
+    tmp = tcg_temp_new_i64();
     if (a->l) {
-        gen_vfp_ld(s, true, addr);
-        gen_mov_vreg_F0(true, a->vd);
+        gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
+        neon_store_reg64(tmp, a->vd);
     } else {
-        gen_mov_F0_vreg(true, a->vd);
-        gen_vfp_st(s, true, addr);
+        neon_load_reg64(tmp, a->vd);
+        gen_aa32_st64(s, tmp, addr, get_mem_index(s));
     }
+    tcg_temp_free_i64(tmp);
     tcg_temp_free_i32(addr);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
 {
     uint32_t offset;
-    TCGv_i32 addr;
+    TCGv_i32 addr, tmp;
     int i, n;
 
     n = a->imm;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
     }
 
     offset = 4;
+    tmp = tcg_temp_new_i32();
     for (i = 0; i < n; i++) {
         if (a->l) {
             /* load */
-            gen_vfp_ld(s, false, addr);
-            gen_mov_vreg_F0(false, a->vd + i);
+            gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+            neon_store_reg32(tmp, a->vd + i);
         } else {
             /* store */
-            gen_mov_F0_vreg(false, a->vd + i);
-            gen_vfp_st(s, false, addr);
+            neon_load_reg32(tmp, a->vd + i);
+            gen_aa32_st32(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
     }
+    tcg_temp_free_i32(tmp);
     if (a->w) {
         /* writeback */
         if (a->p) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
 {
     uint32_t offset;
     TCGv_i32 addr;
+    TCGv_i64 tmp;
     int i, n;
 
     n = a->imm >> 1;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
     }
 
     offset = 8;
+    tmp = tcg_temp_new_i64();
     for (i = 0; i < n; i++) {
         if (a->l) {
             /* load */
-            gen_vfp_ld(s, true, addr);
-            gen_mov_vreg_F0(true, a->vd + i);
+            gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
+            neon_store_reg64(tmp, a->vd + i);
         } else {
             /* store */
-            gen_mov_F0_vreg(true, a->vd + i);
-            gen_vfp_st(s, true, addr);
+            neon_load_reg64(tmp, a->vd + i);
+            gen_aa32_st64(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
     }
+    tcg_temp_free_i64(tmp);
     if (a->w) {
         /* writeback */
         if (a->p) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_GEN_FIX(uhto, )
 VFP_GEN_FIX(ulto, )
 #undef VFP_GEN_FIX
 
-static inline void gen_vfp_ld(DisasContext *s, int dp, TCGv_i32 addr)
-{
-    if (dp) {
-        gen_aa32_ld64(s, cpu_F0d, addr, get_mem_index(s));
-    } else {
-        gen_aa32_ld32u(s, cpu_F0s, addr, get_mem_index(s));
-    }
-}
-
-static inline void gen_vfp_st(DisasContext *s, int dp, TCGv_i32 addr)
-{
-    if (dp) {
-        gen_aa32_st64(s, cpu_F0d, addr, get_mem_index(s));
-    } else {
-        gen_aa32_st32(s, cpu_F0s, addr, get_mem_index(s));
-    }
-}
-
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
-- 
2.20.1

Convert the VFP VMLA instruction to decodetree.

This is the first of the VFP 3-operand data processing instructions,
so we include in this patch the code which loops over the elements
for an old-style VFP vector operation. The existing code to do this
looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
we are going to be converting instructions one at a time anyway
we can take the opportunity to make the new loop use TCG temporaries,
which means we can do that conversion one operation at a time
rather than needing to do it all in one go.

We include an UNDEF check which was missing in the old code:
short-vector operations (with stride or length non-zero) were
deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
field does not indicate that support for short vectors is present
we UNDEF the operations that would use them. (This is a change
of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
previously were all incorrectly allowing short-vector operations.)
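
The shape of the new check can be sketched outside of QEMU as follows.
This is a minimal model under stated assumptions: it takes a raw 32-bit
MVFR0 value and reads FPShVec from bits [27:24], and the function names
are hypothetical; the patch itself uses FIELD_EX64() and
dc_isar_feature() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the MVFR0 FPShVec field, assumed here to be bits [27:24]. */
static unsigned mvfr0_fpshvec(uint32_t mvfr0)
{
    return (mvfr0 >> 24) & 0xf;
}

/*
 * Return true if a VFP data-processing insn should UNDEF because it
 * asks for short-vector behaviour on a CPU that does not provide it.
 */
static bool short_vector_undef(uint32_t mvfr0, int vec_len, int vec_stride)
{
    bool have_shvec = mvfr0_fpshvec(mvfr0) > 0;

    assert(vec_len >= 0 && vec_stride >= 0);
    return !have_shvec && (vec_len != 0 || vec_stride != 0);
}
```

With length and stride both zero the instruction is a plain scalar
operation and is never rejected by this check, whatever FPShVec says.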

Note that the conversion fixes a bug in the old code for the
case of VFP short-vector "mixed scalar/vector operations". These
happen where the destination register is in a vector bank but
the second operand is in a scalar bank. For example
  vmla.f64 d10, d1, d16   with length 2 stride 2
is equivalent to the pair of scalar operations
  vmla.f64 d10, d1, d16
  vmla.f64 d8, d3, d16
where the destination and first input register cycle through
their vector but the second input is scalar (d16). In the
old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
as a temporary output for the multiply, which trashes the
second input operand. For the fully-scalar case (where we
never do a second iteration) and the fully-vector case
(where the loop loads the new second input operand) this
doesn't matter, but for the mixed scalar/vector case we
will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug.
This bug is present for all the multiply-accumulate insns
that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.
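
The effect of reusing F1 as the multiply output can be reproduced with
a small standalone model. This is only a sketch: plain C doubles stand
in for the VFP helper calls, the register lists are passed in as arrays
rather than computed from the bank/stride logic, and both function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Old-style flow: f1 holds the second input and is also used as the
 * multiply output, so a scalar second operand is lost after one pass.
 */
static void vmla_mixed_shared_f1(double *regs, const int *vd, const int *vn,
                                 int vm, size_t n)
{
    double f0, f1;

    assert(n > 0);
    f0 = regs[vn[0]];
    f1 = regs[vm];
    for (size_t i = 0; i < n; i++) {
        f1 = f0 * f1;               /* product overwrites the operand */
        f0 = regs[vd[i]];
        f0 = f0 + f1;
        regs[vd[i]] = f0;
        if (i + 1 < n) {
            f0 = regs[vn[i + 1]];   /* vn is reloaded; f1 is not */
        }
    }
}

/* New-style flow: the product goes into a separate temporary. */
static void vmla_mixed_local_temp(double *regs, const int *vd, const int *vn,
                                  int vm, size_t n)
{
    double f1 = regs[vm];

    for (size_t i = 0; i < n; i++) {
        double tmp = regs[vn[i]] * f1;

        regs[vd[i]] = regs[vd[i]] + tmp;
    }
}
```

Running both on the d10/d8, d1/d3, d16 example above gives the same
result for the first element, but the shared-f1 version multiplies the
second element by the previous product instead of by d16.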

Note 2: the expression used to calculate the next register
number in the vector bank is not in fact correct; we leave
this behaviour unchanged from the old decoder and will
fix this bug later in the series.
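
For reference, the expression as it stands in this patch can be lifted
into a standalone helper to see how it steps through the example above.
This is a sketch: the helper names are hypothetical, and it assumes
that "stride 2" corresponds to s->vec_stride holding 3, so that
delta_d comes out as 2 in the double-precision case.

```c
#include <assert.h>

/*
 * Next register number, using the expression exactly as written in
 * do_vfp_3op_sp()/do_vfp_3op_dp(). bank_mask is 0x18 for single
 * precision and 0xc for double precision.
 */
static int next_vec_reg(int reg, int delta, int bank_mask)
{
    assert(reg >= 0 && reg < 32);
    return ((reg + delta) & (bank_mask - 1)) | (reg & bank_mask);
}

/* Double-precision step, as computed in do_vfp_3op_dp(). */
static int dp_delta(int vec_stride)
{
    return (vec_stride >> 1) + 1;
}
```

Under those assumptions d10 advances to d8 and d1 advances to d3,
matching the pair of scalar operations given in the example.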

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               |   5 +
 target/arm/translate-vfp.inc.c | 205 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  14 ++-
 target/arm/vfp.decode          |   6 +
 4 files changed, 224 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
     return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
 }
 
+static inline bool isar_feature_aa32_fpshvec(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->mvfr0, MVFR0, FPSHVEC) > 0;
+}
+
 /*
  * We always set the FP and SIMD FP16 fields to indicate identical
  * levels of support (assuming SIMD is implemented at all), so
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
 
     return true;
 }
+
+/*
+ * Types for callbacks for do_vfp_3op_sp() and do_vfp_3op_dp().
+ * The callback should emit code to write a value to vd. If
+ * do_vfp_3op_{sp,dp}() was passed reads_vd then the TCGv vd
+ * will contain the old value of the relevant VFP register;
+ * otherwise it must be written to only.
+ */
+typedef void VFPGen3OpSPFn(TCGv_i32 vd,
+                           TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst);
+typedef void VFPGen3OpDPFn(TCGv_i64 vd,
+                           TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
+
+/*
+ * Perform a 3-operand VFP data processing instruction. fn is the
+ * callback to do the actual operation; this function deals with the
+ * code to handle looping around for VFP vector processing.
+ */
+static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
+                          int vd, int vn, int vm, bool reads_vd)
+{
+    uint32_t delta_m = 0;
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i32 f0, f1, fd;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0x18;
+
+        /* Figure out what type of vector operation this is.  */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = s->vec_stride + 1;
+
+            if ((vm & bank_mask) == 0) {
+                /* mixed scalar/vector */
+                delta_m = 0;
+            } else {
+                /* vector */
+                delta_m = delta_d;
+            }
+        }
+    }
+
+    f0 = tcg_temp_new_i32();
+    f1 = tcg_temp_new_i32();
+    fd = tcg_temp_new_i32();
+    fpst = get_fpstatus_ptr(0);
+
+    neon_load_reg32(f0, vn);
+    neon_load_reg32(f1, vm);
+
+    for (;;) {
+        if (reads_vd) {
+            neon_load_reg32(fd, vd);
+        }
+        fn(fd, f0, f1, fpst);
+        neon_store_reg32(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        neon_load_reg32(f0, vn);
+        if (delta_m) {
+            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            neon_load_reg32(f1, vm);
+        }
+    }
+
+    tcg_temp_free_i32(f0);
+    tcg_temp_free_i32(f1);
+    tcg_temp_free_i32(fd);
+    tcg_temp_free_ptr(fpst);
+
+    return true;
+}
+
+static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
+                          int vd, int vn, int vm, bool reads_vd)
+{
+    uint32_t delta_m = 0;
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i64 f0, f1, fd;
+    TCGv_ptr fpst;
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vn | vm) & 0x10)) {
+        return false;
+    }
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0xc;
+
+        /* Figure out what type of vector operation this is.  */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = (s->vec_stride >> 1) + 1;
+
+            if ((vm & bank_mask) == 0) {
+                /* mixed scalar/vector */
+                delta_m = 0;
+            } else {
+                /* vector */
+                delta_m = delta_d;
+            }
+        }
+    }
+
+    f0 = tcg_temp_new_i64();
+    f1 = tcg_temp_new_i64();
+    fd = tcg_temp_new_i64();
+    fpst = get_fpstatus_ptr(0);
+
+    neon_load_reg64(f0, vn);
+    neon_load_reg64(f1, vm);
+
+    for (;;) {
+        if (reads_vd) {
+            neon_load_reg64(fd, vd);
+        }
+        fn(fd, f0, f1, fpst);
+        neon_store_reg64(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        neon_load_reg64(f0, vn);
+        if (delta_m) {
+            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            neon_load_reg64(f1, vm);
+        }
+    }
+
+    tcg_temp_free_i64(f0);
+    tcg_temp_free_i64(f1);
+    tcg_temp_free_i64(fd);
+    tcg_temp_free_ptr(fpst);
+
+    return true;
+}
+
+static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* Note that order of inputs to the add matters for NaNs */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VMLA_sp(DisasContext *s, arg_VMLA_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VMLA_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /* Note that order of inputs to the add matters for NaNs */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
             rn = VFP_SREG_N(insn);
 
+            switch (op) {
+            case 0:
+                /* Already handled by decodetree */
+                return 1;
+            default:
+                break;
+            }
+
             if (op == 15) {
                 /* rn is opcode, encoded as per VFP_SREG_N. */
                 switch (rn) {
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 0: /* VMLA: fd + (fn * fm) */
-                    /* Note that order of inputs to the add matters for NaNs */
-                    gen_vfp_F1_mul(dp);
-                    gen_mov_F0_vreg(dp, rd);
-                    gen_vfp_add(dp);
-                    break;
                 case 1: /* VMLS: fd + -(fn * fm) */
                     gen_vfp_mul(dp);
                     gen_vfp_F1_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VLDM_VSTM_sp ---- 1101 0.1 l:1 rn:4 .... 1010 imm:8 \
              vd=%vd_sp p=1 u=0 w=1
 VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
              vd=%vd_dp p=1 u=0 w=1
+
+# 3-register VFP data-processing; bits [23,21:20,6] identify the operation.
+VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1

Convert the VFP VMLS instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 38 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  8 +------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /*
+     * VMLS: vd = vd + -(vn * vm)
+     * Note that order of inputs to the add matters for NaNs.
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_negs(tmp, tmp);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VMLS_sp(DisasContext *s, arg_VMLS_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VMLS_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /*
+     * VMLS: vd = vd + -(vn * vm)
+     * Note that order of inputs to the add matters for NaNs.
+     */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_negd(tmp, tmp);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0:
+            case 0 ... 1:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 1: /* VMLS: fd + -(fn * fm) */
-                    gen_vfp_mul(dp);
-                    gen_vfp_F1_neg(dp);
-                    gen_mov_F0_vreg(dp, rd);
-                    gen_vfp_add(dp);
-                    break;
                 case 2: /* VNMLS: -fd + (fn * fm) */
                     /* Note that it isn't valid to replace (-A + B) with (B - A)
                      * or similar plausible looking simplifications
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1

Convert the VFP VNMLS instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 42 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 24 +------------------
 target/arm/vfp.decode          |  5 ++++
 3 files changed, 48 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VNMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /*
+     * VNMLS: -fd + (fn * fm)
+     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+     * plausible looking simplifications because this will give wrong results
+     * for NaNs.
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_negs(vd, vd);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLS_sp(DisasContext *s, arg_VNMLS_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VNMLS_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VNMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /*
+     * VNMLS: -fd + (fn * fm)
+     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+     * plausible looking simplifications because this will give wrong results
+     * for NaNs.
+     */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_negd(vd, vd);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_OP2(div)
 
 #undef VFP_OP2
 
-static inline void gen_vfp_F1_mul(int dp)
-{
-    /* Like gen_vfp_mul() but put result in F1 */
-    TCGv_ptr fpst = get_fpstatus_ptr(0);
-    if (dp) {
-        gen_helper_vfp_muld(cpu_F1d, cpu_F0d, cpu_F1d, fpst);
-    } else {
-        gen_helper_vfp_muls(cpu_F1s, cpu_F0s, cpu_F1s, fpst);
-    }
-    tcg_temp_free_ptr(fpst);
-}
-
 static inline void gen_vfp_F1_neg(int dp)
 {
     /* Like gen_vfp_neg() but put result in F1 */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0 ... 1:
+            case 0 ... 2:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 2: /* VNMLS: -fd + (fn * fm) */
-                    /* Note that it isn't valid to replace (-A + B) with (B - A)
-                     * or similar plausible looking simplifications
-                     * because this will give wrong results for NaNs.
-                     */
-                    gen_vfp_F1_mul(dp);
-                    gen_mov_F0_vreg(dp, rd);
-                    gen_vfp_neg(dp);
-                    gen_vfp_add(dp);
-                    break;
                 case 3: /* VNMLA: -fd + -(fn * fm) */
                     gen_vfp_mul(dp);
                     gen_vfp_F1_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1

Convert the VFP VNMLA instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 34 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 19 +------------------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VNMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* VNMLA: -fd + -(fn * fm) */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_negs(tmp, tmp);
+    gen_helper_vfp_negs(vd, vd);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLA_sp(DisasContext *s, arg_VNMLA_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VNMLA_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VNMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /* VNMLA: -fd + -(fn * fm) */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_negd(tmp, tmp);
+    gen_helper_vfp_negd(vd, vd);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_OP2(div)
 
 #undef VFP_OP2
 
-static inline void gen_vfp_F1_neg(int dp)
-{
-    /* Like gen_vfp_neg() but put result in F1 */
-    if (dp) {
-        gen_helper_vfp_negd(cpu_F1d, cpu_F0d);
-    } else {
-        gen_helper_vfp_negs(cpu_F1s, cpu_F0s);
-    }
-}
-
 static inline void gen_vfp_abs(int dp)
 {
     if (dp)
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0 ... 2:
+            case 0 ... 3:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 3: /* VNMLA: -fd + -(fn * fm) */
-                    gen_vfp_mul(dp);
-                    gen_vfp_F1_neg(dp);
-                    gen_mov_F0_vreg(dp, rd);
-                    gen_vfp_neg(dp);
-                    gen_vfp_add(dp);
-                    break;
                 case 4: /* mul: fn * fm */
                     gen_vfp_mul(dp);
                     break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1

Convert the VMUL instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  5 +----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
 }
+
+static bool trans_VMUL_sp(DisasContext *s, arg_VMUL_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_muls, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0 ... 3:
+            case 0 ... 4:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 4: /* mul: fn * fm */
-                    gen_vfp_mul(dp);
-                    break;
                 case 5: /* nmul: -(fn * fm) */
                     gen_vfp_mul(dp);
                     gen_vfp_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1

Convert the VNMUL instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 24 ++++++++++++++++++++++++
 target/arm/translate.c         |  7 +------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
 }
+
+static void gen_VNMUL_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* VNMUL: -(fn * fm) */
+    gen_helper_vfp_muls(vd, vn, vm, fpst);
+    gen_helper_vfp_negs(vd, vd);
+}
+
+static bool trans_VNMUL_sp(DisasContext *s, arg_VNMUL_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VNMUL_sp, a->vd, a->vn, a->vm, false);
+}
+
+static void gen_VNMUL_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /* VNMUL: -(fn * fm) */
+    gen_helper_vfp_muld(vd, vn, vm, fpst);
+    gen_helper_vfp_negd(vd, vd);
+}
+
+static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp)                             \
 
 VFP_OP2(add)
 VFP_OP2(sub)
-VFP_OP2(mul)
 VFP_OP2(div)
 
 #undef VFP_OP2
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0 ... 4:
+            case 0 ... 5:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 5: /* nmul: -(fn * fm) */
-                    gen_vfp_mul(dp);
-                    gen_vfp_neg(dp);
-                    break;
                 case 6: /* add: fn + fm */
                     gen_vfp_add(dp);
                     break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1

Convert the VADD instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  6 +-----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VADD_sp(DisasContext *s, arg_VADD_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_adds, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp)                             \
     tcg_temp_free_ptr(fpst);                                          \
 }
 
-VFP_OP2(add)
 VFP_OP2(sub)
 VFP_OP2(div)
 
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0 ... 5:
+            case 0 ... 6:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 6: /* add: fn + fm */
-                    gen_vfp_add(dp);
-                    break;
                 case 7: /* sub: fn - fm */
                     gen_vfp_sub(dp);
                     break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1

Convert the VSUB instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  6 +-----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VSUB_sp(DisasContext *s, arg_VSUB_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_subs, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp)                             \
     tcg_temp_free_ptr(fpst);                                          \
 }
 
-VFP_OP2(sub)
 VFP_OP2(div)
 
 #undef VFP_OP2
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0 ... 6:
+            case 0 ... 7:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 7: /* sub: fn - fm */
-                    gen_vfp_sub(dp);
-                    break;
                 case 8: /* div: fn / fm */
                     gen_vfp_div(dp);
                     break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VSUB_sp      ---- 1110 0.11 .... .... 1010 .1.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VSUB_dp      ---- 1110 0.11 .... .... 1011 .1.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1

Convert the VDIV instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         | 21 +--------------------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VDIV_sp(DisasContext *s, arg_VDIV_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_divs, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_fpstatus_ptr(int neon)
     return statusptr;
 }
 
-#define VFP_OP2(name)                                                 \
-static inline void gen_vfp_##name(int dp)                             \
-{                                                                     \
-    TCGv_ptr fpst = get_fpstatus_ptr(0);                              \
-    if (dp) {                                                         \
-        gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, cpu_F1d, fpst);    \
-    } else {                                                          \
-        gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, cpu_F1s, fpst);    \
-    }                                                                 \
-    tcg_temp_free_ptr(fpst);                                          \
-}
-
-VFP_OP2(div)
-
-#undef VFP_OP2
-
 static inline void gen_vfp_abs(int dp)
 {
     if (dp)
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0 ... 7:
+            case 0 ... 8:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 8: /* div: fn / fm */
-                    gen_vfp_div(dp);
-                    break;
                 case 10: /* VFNMA : fd = muladd(-fd,  fn, fm) */
                 case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
                 case 12: /* VFMA  : fd = muladd( fd,  fn, fm) */
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VSUB_sp      ---- 1110 0.11 .... .... 1010 .1.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VSUB_dp      ---- 1110 0.11 .... .... 1011 .1.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1

Convert the VFP fused multiply-add instructions (VFNMA, VFNMS,
VFMA, VFMS) to decodetree.

Note that in the old decode structure we were implementing
these to honour the VFP vector stride/length. These instructions
were introduced in VFPv4, and in the v7A architecture they
are UNPREDICTABLE if the vector stride or length are non-zero.
In v8A they must UNDEF if stride or length are non-zero, like
all VFP instructions; we choose to UNDEF always.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 121 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  53 +--------------
 target/arm/vfp.decode          |   9 +++
 3 files changed, 131 insertions(+), 52 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VFM_sp(DisasContext *s, arg_VFM_sp *a)
+{
+    /*
+     * VFNMA : fd = muladd(-fd,  fn, fm)
+     * VFNMS : fd = muladd(-fd, -fn, fm)
+     * VFMA  : fd = muladd( fd,  fn, fm)
+     * VFMS  : fd = muladd( fd, -fn, fm)
+     *
+     * These are fused multiply-add, and must be done as one floating
+     * point operation with no rounding between the multiplication and
+     * addition steps.  NB that doing the negations here as separate
+     * steps is correct : an input NaN should come out with its sign
+     * bit flipped if it is a negated-input.
+     */
+    TCGv_ptr fpst;
+    TCGv_i32 vn, vm, vd;
+
+    /*
+     * Present in VFPv4 only.
+     * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
+     * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
+     */
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
+        (s->vec_len != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vn = tcg_temp_new_i32();
+    vm = tcg_temp_new_i32();
+    vd = tcg_temp_new_i32();
+
+    neon_load_reg32(vn, a->vn);
+    neon_load_reg32(vm, a->vm);
+    if (a->o2) {
+        /* VFNMS, VFMS */
+        gen_helper_vfp_negs(vn, vn);
+    }
+    neon_load_reg32(vd, a->vd);
+    if (a->o1 & 1) {
+        /* VFNMA, VFNMS */
+        gen_helper_vfp_negs(vd, vd);
+    }
+    fpst = get_fpstatus_ptr(0);
+    gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
+    neon_store_reg32(vd, a->vd);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(vn);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_i32(vd);
+
+    return true;
+}
+
+static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
+{
+    /*
+     * VFNMA : fd = muladd(-fd,  fn, fm)
+     * VFNMS : fd = muladd(-fd, -fn, fm)
+     * VFMA  : fd = muladd( fd,  fn, fm)
+     * VFMS  : fd = muladd( fd, -fn, fm)
+     *
+     * These are fused multiply-add, and must be done as one floating
+     * point operation with no rounding between the multiplication and
+     * addition steps.  NB that doing the negations here as separate
+     * steps is correct : an input NaN should come out with its sign
+     * bit flipped if it is a negated-input.
+     */
+    TCGv_ptr fpst;
+    TCGv_i64 vn, vm, vd;
+
+    /*
+     * Present in VFPv4 only.
+     * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
+     * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
+     */
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
+        (s->vec_len != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vn = tcg_temp_new_i64();
+    vm = tcg_temp_new_i64();
+    vd = tcg_temp_new_i64();
+
+    neon_load_reg64(vn, a->vn);
+    neon_load_reg64(vm, a->vm);
+    if (a->o2) {
+        /* VFNMS, VFMS */
+        gen_helper_vfp_negd(vn, vn);
+    }
+    neon_load_reg64(vd, a->vd);
+    if (a->o1 & 1) {
+        /* VFNMA, VFNMS */
+        gen_helper_vfp_negd(vd, vd);
+    }
+    fpst = get_fpstatus_ptr(0);
+    gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
+    neon_store_reg64(vd, a->vd);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i64(vn);
+    tcg_temp_free_i64(vm);
+    tcg_temp_free_i64(vd);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rn = VFP_SREG_N(insn);
 
             switch (op) {
-            case 0 ... 8:
+            case 0 ... 13:
                 /* Already handled by decodetree */
                 return 1;
             default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             for (;;) {
                 /* Perform the calculation.  */
                 switch (op) {
-                case 10: /* VFNMA : fd = muladd(-fd,  fn, fm) */
-                case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
-                case 12: /* VFMA  : fd = muladd( fd,  fn, fm) */
-                case 13: /* VFMS  : fd = muladd( fd, -fn, fm) */
-                    /* These are fused multiply-add, and must be done as one
-                     * floating point operation with no rounding between the
-                     * multiplication and addition steps.
-                     * NB that doing the negations here as separate steps is
-                     * correct : an input NaN should come out with its sign bit
-                     * flipped if it is a negated-input.
-                     */
-                    if (!arm_dc_feature(s, ARM_FEATURE_VFP4)) {
-                        return 1;
-                    }
-                    if (dp) {
-                        TCGv_ptr fpst;
-                        TCGv_i64 frd;
-                        if (op & 1) {
-                            /* VFNMS, VFMS */
-                            gen_helper_vfp_negd(cpu_F0d, cpu_F0d);
-                        }
-                        frd = tcg_temp_new_i64();
-                        tcg_gen_ld_f64(frd, cpu_env, vfp_reg_offset(dp, rd));
-                        if (op & 2) {
-                            /* VFNMA, VFNMS */
-                            gen_helper_vfp_negd(frd, frd);
-                        }
-                        fpst = get_fpstatus_ptr(0);
-                        gen_helper_vfp_muladdd(cpu_F0d, cpu_F0d,
-                                               cpu_F1d, frd, fpst);
-                        tcg_temp_free_ptr(fpst);
-                        tcg_temp_free_i64(frd);
-                    } else {
-                        TCGv_ptr fpst;
-                        TCGv_i32 frd;
-                        if (op & 1) {
-                            /* VFNMS, VFMS */
-                            gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
-                        }
-                        frd = tcg_temp_new_i32();
-                        tcg_gen_ld_f32(frd, cpu_env, vfp_reg_offset(dp, rd));
-                        if (op & 2) {
-                            gen_helper_vfp_negs(frd, frd);
-                        }
-                        fpst = get_fpstatus_ptr(0);
-                        gen_helper_vfp_muladds(cpu_F0s, cpu_F0s,
-                                               cpu_F1s, frd, fpst);
-                        tcg_temp_free_ptr(fpst);
-                        tcg_temp_free_i32(frd);
-                    }
-                    break;
                 case 14: /* fconst */
                     if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
                         return 1;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 .... \
              vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VFM_sp       ---- 1110 1.01 .... .... 1010 . o2:1 . 0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=1
+VFM_dp       ---- 1110 1.01 .... .... 1011 . o2:1 . 0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=1
+VFM_sp       ---- 1110 1.10 .... .... 1010 . o2:1 . 0 .... \
+             vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=2
+VFM_dp       ---- 1110 1.10 .... .... 1011 . o2:1 . 0 .... \
+             vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=2
-- 
2.20.1

Convert the VFP VMOV (immediate) instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 129 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  27 +------
 target/arm/vfp.decode          |   5 ++
 3 files changed, 136 insertions(+), 25 deletions(-)

Convert the VFP VABS instruction to decodetree.

Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or
VFPGen2OpDPFn because none of the operations which use this format
and support short vectors will need it.
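
For readers unfamiliar with short vectors, the bank checks that
do_vfp_2op_sp() and do_vfp_2op_dp() perform before entering the loop
can be sketched as a standalone classifier. This is a simplified
illustration, not code from this patch: the enum and the function name
classify_2op are hypothetical, and the caller is assumed to pass
bank_mask as 0x18 for single precision or 0xc for double precision,
the values used in the helpers.

```c
#include <stdint.h>

enum vfp_op_kind { VFP_SCALAR, VFP_MIXED, VFP_VECTOR };

/*
 * Hypothetical helper mirroring the bank tests in the 2-op helpers:
 * a destination in bank 0 makes the whole operation scalar; otherwise
 * a source in bank 0 makes it mixed scalar/vector (one source value
 * stored to many destinations); otherwise it is a full vector op.
 */
static enum vfp_op_kind classify_2op(int veclen, int vd, int vm,
                                     uint32_t bank_mask)
{
    if (veclen == 0 || (vd & bank_mask) == 0) {
        return VFP_SCALAR;
    }
    if ((vm & bank_mask) == 0) {
        return VFP_MIXED;
    }
    return VFP_VECTOR;
}
```

In the mixed case the helpers compute the result once and store it to
each destination register in turn, which is why delta_m is left at
zero there.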

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 167 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  12 ++-
 target/arm/vfp.decode          |   5 +
 3 files changed, 180 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ typedef void VFPGen3OpSPFn(TCGv_i32 vd,
 typedef void VFPGen3OpDPFn(TCGv_i64 vd,
                            TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
 
+/*
+ * Types for callbacks for do_vfp_2op_sp() and do_vfp_2op_dp().
+ * The callback should emit code to write a value to vd (which
+ * should be written to only).
+ */
+typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
+typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
+
 /*
  * Perform a 3-operand VFP data processing instruction. fn is the
  * callback to do the actual operation; this function deals with the
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     return true;
 }
 
+static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+{
+    uint32_t delta_m = 0;
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i32 f0, fd;
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0x18;
+
+        /* Figure out what type of vector operation this is.  */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = s->vec_stride + 1;
+
+            if ((vm & bank_mask) == 0) {
+                /* mixed scalar/vector */
+                delta_m = 0;
+            } else {
+                /* vector */
+                delta_m = delta_d;
+            }
+        }
+    }
+
+    f0 = tcg_temp_new_i32();
+    fd = tcg_temp_new_i32();
+
+    neon_load_reg32(f0, vm);
+
+    for (;;) {
+        fn(fd, f0);
+        neon_store_reg32(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+
+        if (delta_m == 0) {
+            /* single source one-many */
+            while (veclen--) {
+                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                neon_store_reg32(fd, vd);
+            }
+            break;
+        }
+
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        neon_load_reg32(f0, vm);
+    }
+
+    tcg_temp_free_i32(f0);
+    tcg_temp_free_i32(fd);
+
+    return true;
+}
+
+static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
+{
+    uint32_t delta_m = 0;
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i64 f0, fd;
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vm) & 0x10)) {
+        return false;
+    }
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0xc;
+
+        /* Figure out what type of vector operation this is.  */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = (s->vec_stride >> 1) + 1;
+
+            if ((vm & bank_mask) == 0) {
+                /* mixed scalar/vector */
+                delta_m = 0;
+            } else {
+                /* vector */
+                delta_m = delta_d;
+            }
+        }
+    }
+
+    f0 = tcg_temp_new_i64();
+    fd = tcg_temp_new_i64();
+
+    neon_load_reg64(f0, vm);
+
+    for (;;) {
+        fn(fd, f0);
+        neon_store_reg64(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+
+        if (delta_m == 0) {
+            /* single source one-many */
+            while (veclen--) {
+                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                neon_store_reg64(fd, vd);
+            }
+            break;
+        }
+
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        neon_load_reg64(f0, vm);
+    }
+
+    tcg_temp_free_i64(f0);
+    tcg_temp_free_i64(fd);
+
+    return true;
+}
+
 static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 {
     /* Note that order of inputs to the add matters for NaNs */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     tcg_temp_free_i64(fd);
     return true;
 }
+
+static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
+{
+    return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
+}
+
+static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
+{
+    return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             case 0 ... 14:
                 /* Already handled by decodetree */
                 return 1;
+            case 15:
+                switch (rn) {
+                case 1:
+                    /* Already handled by decodetree */
+                    return 1;
+                default:
+                    break;
+                }
             default:
                 break;
             }
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 /* rn is opcode, encoded as per VFP_SREG_N. */
                 switch (rn) {
                 case 0x00: /* vmov */
-                case 0x01: /* vabs */
                 case 0x02: /* vneg */
                 case 0x03: /* vsqrt */
                     break;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                     case 0: /* cpy */
                         /* no-op */
                         break;
-                    case 1: /* abs */
-                        gen_vfp_abs(dp);
-                        break;
                     case 2: /* neg */
                         gen_vfp_neg(dp);
                         break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_imm_sp  ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
              vd=%vd_sp
 VMOV_imm_dp  ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
              vd=%vd_dp
+
+VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 .... \
+             vd=%vd_dp vm=%vm_dp
-- 
2.20.1

Convert the VNEG instruction to decodetree.
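
As a rough illustration of what a decodetree pattern line expresses, the
sketch below checks an instruction word against the fixed bits of the
VNEG_sp pattern added by this patch. It is not how decodetree works
internally (the real tool generates C at build time); pattern_matches()
and VNEG_SP_PATTERN are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed-bit part of the VNEG_sp line from vfp.decode in this patch. */
static const char VNEG_SP_PATTERN[] =
    "---- 1110 1.11 0001 .... 1010 01.0 ....";

/*
 * Hypothetical helper, not part of QEMU: test a 32-bit instruction word
 * against a pattern string of exactly 32 bit characters, most significant
 * bit first. '0' and '1' are fixed bits; any other non-space character
 * ('.' or '-') is treated as "not checked". Spaces are ignored.
 */
static bool pattern_matches(const char *pattern, uint32_t insn)
{
    uint32_t mask = 0, value = 0;
    int nbits = 0;

    for (const char *p = pattern; *p; p++) {
        if (*p == ' ') {
            continue;
        }
        mask <<= 1;
        value <<= 1;
        if (*p == '0' || *p == '1') {
            mask |= 1;
            value |= (uint32_t)(*p == '1');
        }
        nbits++;
    }
    assert(nbits == 32);
    return (insn & mask) == value;
}
```

With this helper, a word such as 0xEEB10A40 matches the pattern, while a
word that differs only in a fixed bit (for example bit 7 set, as in the
VSQRT_sp pattern) does not.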

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  6 +-----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
 {
     return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
 }
+
+static bool trans_VNEG_sp(DisasContext *s, arg_VNEG_sp *a)
+{
+    return do_vfp_2op_sp(s, gen_helper_vfp_negs, a->vd, a->vm);
+}
+
+static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
+{
+    return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 return 1;
             case 15:
                 switch (rn) {
-                case 1:
+                case 1 ... 2:
                     /* Already handled by decodetree */
                     return 1;
                 default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 /* rn is opcode, encoded as per VFP_SREG_N. */
                 switch (rn) {
                 case 0x00: /* vmov */
-                case 0x02: /* vneg */
                 case 0x03: /* vsqrt */
                     break;
 
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                     case 0: /* cpy */
                         /* no-op */
                         break;
-                    case 2: /* neg */
-                        gen_vfp_neg(dp);
-                        break;
                     case 3: /* sqrt */
                         gen_vfp_sqrt(dp);
                         break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 .... \
              vd=%vd_sp vm=%vm_sp
 VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 .... \
              vd=%vd_dp vm=%vm_dp
+
+VNEG_sp      ---- 1110 1.11 0001 .... 1010 01.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VNEG_dp      ---- 1110 1.11 0001 .... 1011 01.0 .... \
+             vd=%vd_dp vm=%vm_dp
-- 
2.20.1

Convert the VSQRT instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 20 ++++++++++++++++++++
 target/arm/translate.c         | 14 +-------------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
 {
     return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
 }
+
+static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
+{
+    gen_helper_vfp_sqrts(vd, vm, cpu_env);
+}
+
+static bool trans_VSQRT_sp(DisasContext *s, arg_VSQRT_sp *a)
+{
+    return do_vfp_2op_sp(s, gen_VSQRT_sp, a->vd, a->vm);
+}
+
+static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
+{
+    gen_helper_vfp_sqrtd(vd, vm, cpu_env);
+}
+
+static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
+{
+    return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_neg(int dp)
         gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
 }
 
-static inline void gen_vfp_sqrt(int dp)
-{
-    if (dp)
-        gen_helper_vfp_sqrtd(cpu_F0d, cpu_F0d, cpu_env);
-    else
-        gen_helper_vfp_sqrts(cpu_F0s, cpu_F0s, cpu_env);
-}
-
 static inline void gen_vfp_cmp(int dp)
 {
     if (dp)
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 return 1;
             case 15:
                 switch (rn) {
-                case 1 ... 2:
+                case 1 ... 3:
                     /* Already handled by decodetree */
                     return 1;
                 default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 /* rn is opcode, encoded as per VFP_SREG_N. */
                 switch (rn) {
                 case 0x00: /* vmov */
-                case 0x03: /* vsqrt */
                     break;
 
                 case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                     case 0: /* cpy */
                         /* no-op */
                         break;
-                    case 3: /* sqrt */
-                        gen_vfp_sqrt(dp);
-                        break;
                     case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
                     {
                         TCGv_ptr fpst = get_fpstatus_ptr(false);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNEG_sp      ---- 1110 1.11 0001 .... 1010 01.0 .... \
              vd=%vd_sp vm=%vm_sp
 VNEG_dp      ---- 1110 1.11 0001 .... 1011 01.0 .... \
              vd=%vd_dp vm=%vm_dp
+
+VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 .... \
+             vd=%vd_dp vm=%vm_dp
-- 
2.20.1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  8 +-------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 7 deletions(-)

Convert the VFP comparison instructions to decodetree.

Note that comparison instructions should not honour the VFP
short-vector length and stride information: they are scalar-only
operations.  This applies to all the 2-operand instructions except
for VMOV, VABS, VNEG and VSQRT.  (In the old decoder this is
implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.)
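
The rule in that old-decoder check can be sketched as a standalone
function. This is only an illustration of the condition quoted above;
effective_veclen() is a hypothetical name and does not exist in QEMU.

```c
#include <assert.h>

/*
 * Sketch of the old decoder's rule: in the op == 15 "extension space",
 * only opcodes rn 0..3 (VMOV, VABS, VNEG, VSQRT) honour the short-vector
 * length. Every other 2-operand instruction, including the comparisons
 * (rn 8..11), is forced to scalar by clearing veclen.
 */
static int effective_veclen(int op, int rn, int veclen)
{
    assert(veclen >= 0);
    if (op == 15 && rn > 3) {
        return 0;
    }
    return veclen;
}
```

In the decodetree version no such check is needed for the comparisons:
trans_VCMP_sp() and trans_VCMP_dp() simply never look at the vector
length.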

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 75 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 51 +----------------------
 target/arm/vfp.decode          |  5 +++
 3 files changed, 81 insertions(+), 50 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
 {
     return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
 }
+
+static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
+{
+    TCGv_i32 vd, vm;
+
+    /* Vm/M bits must be zero for the Z variant */
+    if (a->z && a->vm != 0) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vd = tcg_temp_new_i32();
+    vm = tcg_temp_new_i32();
+
+    neon_load_reg32(vd, a->vd);
+    if (a->z) {
+        tcg_gen_movi_i32(vm, 0);
+    } else {
+        neon_load_reg32(vm, a->vm);
+    }
+
+    if (a->e) {
+        gen_helper_vfp_cmpes(vd, vm, cpu_env);
+    } else {
+        gen_helper_vfp_cmps(vd, vm, cpu_env);
+    }
+
+    tcg_temp_free_i32(vd);
+    tcg_temp_free_i32(vm);
+
+    return true;
+}
+
+static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
+{
+    TCGv_i64 vd, vm;
+
+    /* Vm/M bits must be zero for the Z variant */
+    if (a->z && a->vm != 0) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vd = tcg_temp_new_i64();
+    vm = tcg_temp_new_i64();
+
+    neon_load_reg64(vd, a->vd);
+    if (a->z) {
+        tcg_gen_movi_i64(vm, 0);
+    } else {
+        neon_load_reg64(vm, a->vm);
+    }
+
+    if (a->e) {
+        gen_helper_vfp_cmped(vd, vm, cpu_env);
+    } else {
+        gen_helper_vfp_cmpd(vd, vm, cpu_env);
+    }
+
+    tcg_temp_free_i64(vd);
+    tcg_temp_free_i64(vm);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_neg(int dp)
         gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
 }
 
-static inline void gen_vfp_cmp(int dp)
-{
-    if (dp)
-        gen_helper_vfp_cmpd(cpu_F0d, cpu_F1d, cpu_env);
-    else
-        gen_helper_vfp_cmps(cpu_F0s, cpu_F1s, cpu_env);
-}
-
-static inline void gen_vfp_cmpe(int dp)
-{
-    if (dp)
-        gen_helper_vfp_cmped(cpu_F0d, cpu_F1d, cpu_env);
-    else
-        gen_helper_vfp_cmpes(cpu_F0s, cpu_F1s, cpu_env);
-}
-
-static inline void gen_vfp_F1_ld0(int dp)
-{
-    if (dp)
-        tcg_gen_movi_i64(cpu_F1d, 0);
-    else
-        tcg_gen_movi_i32(cpu_F1s, 0);
-}
-
 #define VFP_GEN_ITOF(name) \
 static inline void gen_vfp_##name(int dp, int neon) \
 { \
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             case 15:
                 switch (rn) {
                 case 0 ... 3:
+                case 8 ... 11:
                     /* Already handled by decodetree */
                     return 1;
                 default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                     rd_is_dp = false;
                     break;
 
-                case 0x08: case 0x0a: /* vcmp, vcmpz */
-                case 0x09: case 0x0b: /* vcmpe, vcmpez */
-                    no_output = true;
-                    break;
-
                 case 0x0c: /* vrintr */
                 case 0x0d: /* vrintz */
                 case 0x0e: /* vrintx */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             /* Load the initial operands.  */
             if (op == 15) {
                 switch (rn) {
-                case 0x08: case 0x09: /* Compare */
-                    gen_mov_F0_vreg(dp, rd);
-                    gen_mov_F1_vreg(dp, rm);
-                    break;
-                case 0x0a: case 0x0b: /* Compare with zero */
-                    gen_mov_F0_vreg(dp, rd);
-                    gen_vfp_F1_ld0(dp);
-                    break;
                 case 0x14: /* vcvt fp <-> fixed */
                 case 0x15:
                 case 0x16:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                         gen_vfp_msr(tmp);
                         break;
                     }
-                    case 8: /* cmp */
-                        gen_vfp_cmp(dp);
-                        break;
-                    case 9: /* cmpe */
-                        gen_vfp_cmpe(dp);
-                        break;
-                    case 10: /* cmpz */
-                        gen_vfp_cmp(dp);
-                        break;
-                    case 11: /* cmpez */
-                        gen_vfp_F1_ld0(dp);
-                        gen_vfp_cmpe(dp);
-                        break;
                     case 12: /* vrintr */
                     {
                         TCGv_ptr fpst = get_fpstatus_ptr(0);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 .... \
              vd=%vd_sp vm=%vm_sp
 VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 .... \
              vd=%vd_dp vm=%vm_dp
+
+VCMP_sp      ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VCMP_dp      ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
+             vd=%vd_dp vm=%vm_dp
-- 
2.20.1

Convert the VCVTT, VCVTB instructions that deal with conversion
from half-precision floats to f32 or f64 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
load of the right half of the input single-precision register
rather than loading the full 32 bits and then doing a
separate shift or sign-extension.
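
The equivalence this relies on can be sketched in plain C, outside of
TCG. The helpers below are hypothetical and only model a register as a
host-order uint32_t; they show that reading two bytes at an
endian-dependent offset yields the same value as a full 32-bit load
followed by a shift or mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of the top or bottom 16 bits within a host-order uint32_t. */
static size_t half_offset(bool top)
{
    const uint32_t probe = 1;
    uint8_t first_byte;

    memcpy(&first_byte, &probe, 1);
    /* first_byte == 1 means the low half is stored first (little-endian). */
    return (top == (first_byte == 1)) ? 2 : 0;
}

/* New style: load only the two bytes that hold the requested half. */
static uint16_t load_half_direct(const uint32_t *reg, bool top)
{
    uint16_t half;

    assert(reg != NULL);
    memcpy(&half, (const uint8_t *)reg + half_offset(top), sizeof(half));
    return half;
}

/* Old style: load all 32 bits, then shift or mask to isolate the half. */
static uint16_t load_half_via_shift(const uint32_t *reg, bool top)
{
    assert(reg != NULL);
    return top ? (uint16_t)(*reg >> 16) : (uint16_t)(*reg & 0xffff);
}
```

Detecting the host byte order at run time keeps the sketch
self-contained; the patch itself selects the offset at compile time.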

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 82 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 56 +----------------------
 target/arm/vfp.decode          |  6 +++
 3 files changed, 89 insertions(+), 55 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@
 #include "decode-vfp.inc.c"
 #include "decode-vfp-uncond.inc.c"
 
+/*
+ * Return the offset of a 16-bit half of the specified VFP single-precision
+ * register. If top is true, returns the offset of the top 16 bits;
+ * otherwise the offset of the bottom 16 bits.
+ */
+static inline long vfp_f16_offset(unsigned reg, bool top)
+{
+    long offs = vfp_reg_offset(false, reg);
+#ifdef HOST_WORDS_BIGENDIAN
+    if (!top) {
+        offs += 2;
+    }
+#else
+    if (top) {
+        offs += 2;
+    }
+#endif
+    return offs;
+}
+
 /*
  * Check that VFP access is enabled. If it is, do the necessary
  * M-profile lazy-FP handling and then return true.
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
 
     return true;
 }
+
+static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 ahp_mode;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_fp16_spconv, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    ahp_mode = get_ahp_flag();
+    tmp = tcg_temp_new_i32();
+    /* The T bit tells us if we want the low or high 16 bits of Vm */
+    tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+    gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_i32(ahp_mode);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 ahp_mode;
+    TCGv_i32 tmp;
+    TCGv_i64 vd;
+
+    if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    ahp_mode = get_ahp_flag();
+    tmp = tcg_temp_new_i32();
+    /* The T bit tells us if we want the low or high 16 bits of Vm */
+    tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+    vd = tcg_temp_new_i64();
+    gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
+    neon_store_reg64(vd, a->vd);
+    tcg_temp_free_i32(ahp_mode);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i64(vd);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 return 1;
             case 15:
                 switch (rn) {
-                case 0 ... 3:
+                case 0 ... 5:
                 case 8 ... 11:
                     /* Already handled by decodetree */
                     return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             if (op == 15) {
                 /* rn is opcode, encoded as per VFP_SREG_N. */
                 switch (rn) {
-                case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
-                case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
-                    /*
-                     * VCVTB, VCVTT: only present with the halfprec extension
-                     * UNPREDICTABLE if bit 8 is set prior to ARMv8
-                     * (we choose to UNDEF)
-                     */
-                    if (dp) {
-                        if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
-                            return 1;
-                        }
-                    } else {
-                        if (!dc_isar_feature(aa32_fp16_spconv, s)) {
-                            return 1;
-                        }
-                    }
-                    rm_is_dp = false;
-                    break;
                 case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
                 case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
                     if (dp) {
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 switch (op) {
                 case 15: /* extension space */
                     switch (rn) {
-                    case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(false);
-                        TCGv_i32 ahp_mode = get_ahp_flag();
-                        tmp = gen_vfp_mrs();
-                        tcg_gen_ext16u_i32(tmp, tmp);
-                        if (dp) {
-                            gen_helper_vfp_fcvt_f16_to_f64(cpu_F0d, tmp,
-                                                           fpst, ahp_mode);
-                        } else {
-                            gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp,
-                                                           fpst, ahp_mode);
-                        }
-                        tcg_temp_free_i32(ahp_mode);
-                        tcg_temp_free_ptr(fpst);
-                        tcg_temp_free_i32(tmp);
-                        break;
-                    }
-                    case 5: /* vcvtt.f32.f16, vcvtt.f64.f16 */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(false);
-                        TCGv_i32 ahp = get_ahp_flag();
-                        tmp = gen_vfp_mrs();
-                        tcg_gen_shri_i32(tmp, tmp, 16);
-                        if (dp) {
-                            gen_helper_vfp_fcvt_f16_to_f64(cpu_F0d, tmp,
-                                                           fpst, ahp);
-                        } else {
-                            gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp,
-                                                           fpst, ahp);
-                        }
-                        tcg_temp_free_i32(tmp);
-                        tcg_temp_free_i32(ahp);
-                        tcg_temp_free_ptr(fpst);
-                        break;
-                    }
                     case 6: /* vcvtb.f16.f32, vcvtb.f16.f64 */
                     {
                         TCGv_ptr fpst = get_fpstatus_ptr(false);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCMP_sp      ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCMP_dp      ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
              vd=%vd_dp vm=%vm_dp
+
+# VCVTT and VCVTB from f16: Vd format depends on size bit; Vm is always vm_sp
+VCVT_f32_f16 ---- 1110 1.11 0010 .... 1010 t:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
+             vd=%vd_dp vm=%vm_sp
-- 
2.20.1

Convert the VCVTT and VCVTB instructions which convert from
f32 and f64 to f16 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
store to the appropriate half of the destination single-precision register
rather than doing a load/modify/store sequence on the full
32 bits.
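
As a plain-C sketch of why the two approaches are interchangeable, the
hypothetical helpers below model the register as a host-order uint32_t
and return its new value. One mirrors the old mask-and-OR sequence, the
other writes only the two bytes of the chosen half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; none of these are QEMU functions. */

/* Byte offset of the top or bottom 16 bits within a host-order uint32_t. */
static size_t half_offset(bool top)
{
    const uint32_t probe = 1;
    uint8_t first_byte;

    memcpy(&first_byte, &probe, 1);
    /* first_byte == 1 means the low half is stored first (little-endian). */
    return (top == (first_byte == 1)) ? 2 : 0;
}

/* Old style: read the whole register, mask out one half, OR in the result. */
static uint32_t store_half_rmw(uint32_t reg, bool top, uint16_t half)
{
    if (top) {
        return (reg & 0x0000ffffu) | ((uint32_t)half << 16);
    }
    return (reg & 0xffff0000u) | half;
}

/* New style: overwrite only the two bytes that hold the chosen half. */
static uint32_t store_half_direct(uint32_t reg, bool top, uint16_t half)
{
    assert(half_offset(top) + sizeof(half) <= sizeof(reg));
    memcpy((uint8_t *)&reg + half_offset(top), &half, sizeof(half));
    return reg;
}
```

Both functions leave the other 16 bits of the register untouched, which
is the property the VCVTB and VCVTT to-f16 conversions depend on.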

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 62 ++++++++++++++++++++++++++
 target/arm/translate.c         | 79 +---------------------------------
 target/arm/vfp.decode          |  6 +++
 3 files changed, 69 insertions(+), 78 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     tcg_temp_free_i64(vd);
     return true;
 }
+
+static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 ahp_mode;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_fp16_spconv, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    ahp_mode = get_ahp_flag();
+    tmp = tcg_temp_new_i32();
+
+    neon_load_reg32(tmp, a->vm);
+    gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
+    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
+    tcg_temp_free_i32(ahp_mode);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 ahp_mode;
+    TCGv_i32 tmp;
+    TCGv_i64 vm;
+
+    if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    ahp_mode = get_ahp_flag();
+    tmp = tcg_temp_new_i32();
+    vm = tcg_temp_new_i64();
+
+    neon_load_reg64(vm, a->vm);
+    gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
+    tcg_temp_free_i64(vm);
+    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
+    tcg_temp_free_i32(ahp_mode);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
 #define VFP_SREG_M(insn) VFP_SREG(insn,  0,  5)
 #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
 
-/* Move between integer and VFP cores.  */
-static TCGv_i32 gen_vfp_mrs(void)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_mov_i32(tmp, cpu_F0s);
-    return tmp;
-}
-
-static void gen_vfp_msr(TCGv_i32 tmp)
-{
-    tcg_gen_mov_i32(cpu_F0s, tmp);
-    tcg_temp_free_i32(tmp);
-}
-
 static void gen_neon_dup_low16(TCGv_i32 var)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
     uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
     int dp, veclen;
-    TCGv_i32 tmp;
-    TCGv_i32 tmp2;
 
     if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 return 1;
             case 15:
                 switch (rn) {
-                case 0 ... 5:
-                case 8 ... 11:
+                case 0 ... 11:
                     /* Already handled by decodetree */
                     return 1;
                 default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             if (op == 15) {
                 /* rn is opcode, encoded as per VFP_SREG_N. */
                 switch (rn) {
-                case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
-                case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
-                    if (dp) {
-                        if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
-                            return 1;
-                        }
-                    } else {
-                        if (!dc_isar_feature(aa32_fp16_spconv, s)) {
-                            return 1;
-                        }
-                    }
-                    rd_is_dp = false;
-                    break;
-
                 case 0x0c: /* vrintr */
                 case 0x0d: /* vrintz */
                 case 0x0e: /* vrintx */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 switch (op) {
                 case 15: /* extension space */
                     switch (rn) {
-                    case 6: /* vcvtb.f16.f32, vcvtb.f16.f64 */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(false);
-                        TCGv_i32 ahp = get_ahp_flag();
-                        tmp = tcg_temp_new_i32();
-
-                        if (dp) {
-                            gen_helper_vfp_fcvt_f64_to_f16(tmp, cpu_F0d,
-                                                           fpst, ahp);
-                        } else {
-                            gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s,
-                                                           fpst, ahp);
-                        }
-                        tcg_temp_free_i32(ahp);
-                        tcg_temp_free_ptr(fpst);
-                        gen_mov_F0_vreg(0, rd);
-                        tmp2 = gen_vfp_mrs();
-                        tcg_gen_andi_i32(tmp2, tmp2, 0xffff0000);
-                        tcg_gen_or_i32(tmp, tmp, tmp2);
-                        tcg_temp_free_i32(tmp2);
-                        gen_vfp_msr(tmp);
-                        break;
-                    }
-                    case 7: /* vcvtt.f16.f32, vcvtt.f16.f64 */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(false);
-                        TCGv_i32 ahp = get_ahp_flag();
-                        tmp = tcg_temp_new_i32();
-                        if (dp) {
-                            gen_helper_vfp_fcvt_f64_to_f16(tmp, cpu_F0d,
-                                                           fpst, ahp);
-                        } else {
-                            gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s,
-                                                           fpst, ahp);
-                        }
-                        tcg_temp_free_i32(ahp);
-                        tcg_temp_free_ptr(fpst);
-                        tcg_gen_shli_i32(tmp, tmp, 16);
-                        gen_mov_F0_vreg(0, rd);
-                        tmp2 = gen_vfp_mrs();
-                        tcg_gen_ext16u_i32(tmp2, tmp2);
-                        tcg_gen_or_i32(tmp, tmp, tmp2);
-                        tcg_temp_free_i32(tmp2);
-                        gen_vfp_msr(tmp);
-                        break;
-                    }
                     case 12: /* vrintr */
                     {
                         TCGv_ptr fpst = get_fpstatus_ptr(0);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f32_f16 ---- 1110 1.11 0010 .... 1010 t:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
              vd=%vd_dp vm=%vm_sp
+
+# VCVTB and VCVTT to f16: Vd format is always vd_sp; Vm format depends on size bit
+VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
+             vd=%vd_sp vm=%vm_dp
-- 
2.20.1

Convert the VFP round-to-integer instructions VRINTR, VRINTZ and
VRINTX to decodetree.

These instructions were only introduced as part of the "VFP misc"
additions in v8A, so we check this. The old decoder's implementation
was incorrectly providing them even for v7A CPUs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 163 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  45 +--------
 target/arm/vfp.decode          |  15 +++
 3 files changed, 179 insertions(+), 44 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     tcg_temp_free_i32(tmp);
     return true;
 }
+
+static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    neon_load_reg32(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    gen_helper_rints(tmp, tmp, fpst);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i64 tmp;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i64();
+    neon_load_reg64(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    gen_helper_rintd(tmp, tmp, fpst);
+    neon_store_reg64(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i64(tmp);
+    return true;
+}
+
+static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+    TCGv_i32 tcg_rmode;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    neon_load_reg32(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    tcg_rmode = tcg_const_i32(float_round_to_zero);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    gen_helper_rints(tmp, tmp, fpst);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_rmode);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i64 tmp;
+    TCGv_i32 tcg_rmode;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i64();
+    neon_load_reg64(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    tcg_rmode = tcg_const_i32(float_round_to_zero);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    gen_helper_rintd(tmp, tmp, fpst);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    neon_store_reg64(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i32(tcg_rmode);
+    return true;
+}
+
+static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    neon_load_reg32(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    gen_helper_rints_exact(tmp, tmp, fpst);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i64 tmp;
+
+    if (!dc_isar_feature(aa32_vrint, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i64();
+    neon_load_reg64(tmp, a->vm);
+    fpst = get_fpstatus_ptr(false);
+    gen_helper_rintd_exact(tmp, tmp, fpst);
+    neon_store_reg64(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i64(tmp);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 return 1;
             case 15:
                 switch (rn) {
-                case 0 ... 11:
+                case 0 ... 14:
                     /* Already handled by decodetree */
                     return 1;
                 default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             if (op == 15) {
                 /* rn is opcode, encoded as per VFP_SREG_N. */
                 switch (rn) {
-                case 0x0c: /* vrintr */
-                case 0x0d: /* vrintz */
-                case 0x0e: /* vrintx */
-                    break;
-
                 case 0x0f: /* vcvt double<->single */
                     rd_is_dp = !dp;
                     break;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 switch (op) {
                 case 15: /* extension space */
                     switch (rn) {
-                    case 12: /* vrintr */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(0);
-                        if (dp) {
-                            gen_helper_rintd(cpu_F0d, cpu_F0d, fpst);
-                        } else {
-                            gen_helper_rints(cpu_F0s, cpu_F0s, fpst);
-                        }
-                        tcg_temp_free_ptr(fpst);
-                        break;
-                    }
-                    case 13: /* vrintz */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(0);
-                        TCGv_i32 tcg_rmode;
-                        tcg_rmode = tcg_const_i32(float_round_to_zero);
-                        gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-                        if (dp) {
-                            gen_helper_rintd(cpu_F0d, cpu_F0d, fpst);
-                        } else {
-                            gen_helper_rints(cpu_F0s, cpu_F0s, fpst);
-                        }
-                        gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-                        tcg_temp_free_i32(tcg_rmode);
-                        tcg_temp_free_ptr(fpst);
-                        break;
-                    }
-                    case 14: /* vrintx */
-                    {
-                        TCGv_ptr fpst = get_fpstatus_ptr(0);
-                        if (dp) {
-                            gen_helper_rintd_exact(cpu_F0d, cpu_F0d, fpst);
-                        } else {
-                            gen_helper_rints_exact(cpu_F0s, cpu_F0s, fpst);
-                        }
-                        tcg_temp_free_ptr(fpst);
-                        break;
-                    }
                     case 15: /* single<->double conversion */
                         if (dp) {
                             gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
              vd=%vd_sp vm=%vm_dp
+
+VRINTR_sp    ---- 1110 1.11 0110 .... 1010 01.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VRINTR_dp    ---- 1110 1.11 0110 .... 1011 01.0 .... \
+             vd=%vd_dp vm=%vm_dp
+
+VRINTZ_sp    ---- 1110 1.11 0110 .... 1010 11.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VRINTZ_dp    ---- 1110 1.11 0110 .... 1011 11.0 .... \
+             vd=%vd_dp vm=%vm_dp
+
+VRINTX_sp    ---- 1110 1.11 0111 .... 1010 01.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VRINTX_dp    ---- 1110 1.11 0111 .... 1011 01.0 .... \
+             vd=%vd_dp vm=%vm_dp
-- 
2.20.1

Convert the VCVT double/single precision conversion insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 48 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 13 +--------
 target/arm/vfp.decode          |  6 +++++
 3 files changed, 55 insertions(+), 12 deletions(-)

Convert the VCVT integer-to-float instructions to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 58 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 12 +------
 target/arm/vfp.decode          |  6 ++++
 3 files changed, 65 insertions(+), 11 deletions(-)

Convert the VJCVT instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 28 ++++++++++++++++++++++++++++
 target/arm/translate.c         | 12 +-----------
 target/arm/vfp.decode          |  4 ++++
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
+{
+    TCGv_i32 vd;
+    TCGv_i64 vm;
+
+    if (!dc_isar_feature(aa32_jscvt, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vm = tcg_temp_new_i64();
+    vd = tcg_temp_new_i32();
+    neon_load_reg64(vm, a->vm);
+    gen_helper_vjcvt(vd, vm, cpu_env);
+    neon_store_reg32(vd, a->vd);
+    tcg_temp_free_i64(vm);
+    tcg_temp_free_i32(vd);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 return 1;
             case 15:
                 switch (rn) {
-                case 0 ... 17:
+                case 0 ... 19:
                     /* Already handled by decodetree */
                     return 1;
                 default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                     rm_is_dp = false;
                     break;
 
-                case 0x13: /* vjcvt */
-                    if (!dp || !dc_isar_feature(aa32_jscvt, s)) {
-                        return 1;
-                    }
-                    rd_is_dp = false;
-                    break;
-
                 default:
                     return 1;
                 }
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                 switch (op) {
                 case 15: /* extension space */
                     switch (rn) {
-                    case 19: /* vjcvt */
-                        gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
-                        break;
                     case 20: /* fshto */
                         gen_vfp_shto(dp, 16 - rm, 0);
                         break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_int_sp  ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_int_dp  ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
              vd=%vd_dp vm=%vm_sp
+
+# VJCVT is always dp to sp
+VJCVT        ---- 1110 1.11 1001 .... 1011 11.0 .... \
+             vd=%vd_sp vm=%vm_dp
-- 
2.20.1

Convert the VCVT (between floating-point and fixed-point) instructions
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 124 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  57 +--------------
 target/arm/vfp.decode          |  10 +++
 3 files changed, 136 insertions(+), 55 deletions(-)

Convert the float-to-integer VCVT instructions to decodetree.
Since these are the last unconverted instructions, we can
delete the old decoder structure entirely now.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
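Reviewer note, not part of the commit: a small sketch relating the old
decoder's "rn" opcode values (0x18..0x1b) to the s and rz fields of the new
VCVT_sp_int/VCVT_dp_int pattern lines. old_rn() is a hypothetical
reconstruction that assumes the opcode is bits [19:16] followed by bit 7,
which is what the pattern lines and the old case labels jointly imply. The
stem strings mirror the helper names picked by the nested if/else in the new
trans functions.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the old "rn" opcode: bits [19:16], bit 7. */
static inline unsigned old_rn(uint32_t insn)
{
    return ((insn >> 15) & 0x1e) | ((insn >> 7) & 1);
}

/* The two one-bit fields named in the new pattern lines. */
static inline unsigned field_s(uint32_t insn)
{
    return (insn >> 16) & 1;
}

static inline unsigned field_rz(uint32_t insn)
{
    return (insn >> 7) & 1;
}

/* Helper-name stem selected by (s, rz), as in the new trans functions. */
static inline const char *ftoi_stem(unsigned s, unsigned rz)
{
    if (s) {
        return rz ? "tosiz" : "tosi";
    }
    return rz ? "touiz" : "toui";
}
```

With this layout, old_rn(insn) equals 0x18 | (s << 1) | rz, so the four old
case labels 0x18, 0x19, 0x1a and 0x1b map onto toui, touiz, tosi and tosiz
respectively.
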
 target/arm/translate-vfp.inc.c |  72 ++++++++++
 target/arm/translate.c         | 241 +--------------------------------
 target/arm/vfp.decode          |   6 +
 3 files changed, 80 insertions(+), 239 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
+{
+    TCGv_i32 vm;
+    TCGv_ptr fpst;
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    vm = tcg_temp_new_i32();
+    neon_load_reg32(vm, a->vm);
+
+    if (a->s) {
+        if (a->rz) {
+            gen_helper_vfp_tosizs(vm, vm, fpst);
+        } else {
+            gen_helper_vfp_tosis(vm, vm, fpst);
+        }
+    } else {
+        if (a->rz) {
+            gen_helper_vfp_touizs(vm, vm, fpst);
+        } else {
+            gen_helper_vfp_touis(vm, vm, fpst);
+        }
+    }
+    neon_store_reg32(vm, a->vd);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
+static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
+{
+    TCGv_i32 vd;
+    TCGv_i64 vm;
+    TCGv_ptr fpst;
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    vm = tcg_temp_new_i64();
+    vd = tcg_temp_new_i32();
+    neon_load_reg64(vm, a->vm);
+
+    if (a->s) {
+        if (a->rz) {
+            gen_helper_vfp_tosizd(vd, vm, fpst);
+        } else {
+            gen_helper_vfp_tosid(vd, vm, fpst);
+        }
+    } else {
+        if (a->rz) {
+            gen_helper_vfp_touizd(vd, vm, fpst);
+        } else {
+            gen_helper_vfp_touid(vd, vm, fpst);
+        }
+    }
+    neon_store_reg32(vd, a->vd);
+    tcg_temp_free_i32(vd);
+    tcg_temp_free_i64(vm);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp, int neon) \
     tcg_temp_free_ptr(statusptr); \
 }
 
-VFP_GEN_FTOI(toui)
 VFP_GEN_FTOI(touiz)
-VFP_GEN_FTOI(tosi)
 VFP_GEN_FTOI(tosiz)
 #undef VFP_GEN_FTOI
 
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 }
 
 #define tcg_gen_ld_f32 tcg_gen_ld_i32
-#define tcg_gen_ld_f64 tcg_gen_ld_i64
 #define tcg_gen_st_f32 tcg_gen_st_i32
-#define tcg_gen_st_f64 tcg_gen_st_i64
-
-static inline void gen_mov_F0_vreg(int dp, int reg)
-{
-    if (dp)
-        tcg_gen_ld_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
-    else
-        tcg_gen_ld_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
-}
-
-static inline void gen_mov_F1_vreg(int dp, int reg)
-{
-    if (dp)
-        tcg_gen_ld_f64(cpu_F1d, cpu_env, vfp_reg_offset(dp, reg));
-    else
-        tcg_gen_ld_f32(cpu_F1s, cpu_env, vfp_reg_offset(dp, reg));
-}
-
-static inline void gen_mov_vreg_F0(int dp, int reg)
-{
-    if (dp)
-        tcg_gen_st_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
-    else
-        tcg_gen_st_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
-}
 
 #define ARM_CP_RW_BIT   (1 << 20)
 
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  */
 static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
-    uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
-    int dp, veclen;
-
     if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
         return 1;
     }
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             return 0;
         }
     }
-
-    if (extract32(insn, 28, 4) == 0xf) {
-        /*
-         * Encodings with T=1 (Thumb) or unconditional (ARM): these
-         * were all handled by the decodetree decoder, so any insn
-         * patterns which get here must be UNDEF.
-         */
-        return 1;
-    }
-
-    /*
-     * FIXME: this access check should not take precedence over UNDEF
-     * for invalid encodings; we will generate incorrect syndrome information
-     * for attempts to execute invalid vfp/neon encodings with FP disabled.
-     */
-    if (!vfp_access_check(s)) {
-        return 0;
-    }
-
-    dp = ((insn & 0xf00) == 0xb00);
-    switch ((insn >> 24) & 0xf) {
-    case 0xe:
-        if (insn & (1 << 4)) {
-            /* already handled by decodetree */
-            return 1;
-        } else {
-            /* data processing */
-            bool rd_is_dp = dp;
-            bool rm_is_dp = dp;
-            bool no_output = false;
-
-            /* The opcode is in bits 23, 21, 20 and 6.  */
-            op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
-            rn = VFP_SREG_N(insn);
-
-            switch (op) {
-            case 0 ... 14:
-                /* Already handled by decodetree */
-                return 1;
-            case 15:
-                switch (rn) {
-                case 0 ... 23:
-                case 28 ... 31:
-                    /* Already handled by decodetree */
-                    return 1;
-                default:
-                    break;
-                }
-            default:
-                break;
-            }
-
-            if (op == 15) {
-                /* rn is opcode, encoded as per VFP_SREG_N. */
-                switch (rn) {
-                case 0x18: /* vcvtr.u32.fxx */
-                case 0x19: /* vcvtz.u32.fxx */
-                case 0x1a: /* vcvtr.s32.fxx */
-                case 0x1b: /* vcvtz.s32.fxx */
-                    rd_is_dp = false;
-                    break;
-
-                default:
-                    return 1;
-                }
-            } else if (dp) {
-                /* rn is register number */
-                VFP_DREG_N(rn, insn);
-            }
-
-            if (rd_is_dp) {
-                VFP_DREG_D(rd, insn);
-            } else {
-                rd = VFP_SREG_D(insn);
-            }
-            if (rm_is_dp) {
-                VFP_DREG_M(rm, insn);
-            } else {
-                rm = VFP_SREG_M(insn);
-            }
-
-            veclen = s->vec_len;
-            if (op == 15 && rn > 3) {
-                veclen = 0;
-            }
-
-            /* Shut up compiler warnings.  */
-            delta_m = 0;
-            delta_d = 0;
-            bank_mask = 0;
-
-            if (veclen > 0) {
-                if (dp)
-                    bank_mask = 0xc;
-                else
-                    bank_mask = 0x18;
-
-                /* Figure out what type of vector operation this is.  */
-                if ((rd & bank_mask) == 0) {
-                    /* scalar */
-                    veclen = 0;
-                } else {
-                    if (dp)
-                        delta_d = (s->vec_stride >> 1) + 1;
-                    else
-                        delta_d = s->vec_stride + 1;
-
-                    if ((rm & bank_mask) == 0) {
-                        /* mixed scalar/vector */
-                        delta_m = 0;
-                    } else {
-                        /* vector */
-                        delta_m = delta_d;
-                    }
-                }
-            }
-
-            /* Load the initial operands.  */
-            if (op == 15) {
-                switch (rn) {
-                default:
-                    /* One source operand.  */
-                    gen_mov_F0_vreg(rm_is_dp, rm);
-                    break;
-                }
-            } else {
-                /* Two source operands.  */
-                gen_mov_F0_vreg(dp, rn);
-                gen_mov_F1_vreg(dp, rm);
-            }
-
-            for (;;) {
-                /* Perform the calculation.  */
-                switch (op) {
-                case 15: /* extension space */
-                    switch (rn) {
-                    case 24: /* ftoui */
-                        gen_vfp_toui(dp, 0);
-                        break;
-                    case 25: /* ftouiz */
-                        gen_vfp_touiz(dp, 0);
-                        break;
-                    case 26: /* ftosi */
-                        gen_vfp_tosi(dp, 0);
-                        break;
-                    case 27: /* ftosiz */
-                        gen_vfp_tosiz(dp, 0);
-                        break;
-                    default: /* undefined */
-                        g_assert_not_reached();
-                    }
-                    break;
-                default: /* undefined */
-                    return 1;
-                }
-
-                /* Write back the result, if any.  */
-                if (!no_output) {
-                    gen_mov_vreg_F0(rd_is_dp, rd);
-                }
-
-                /* break out of the loop if we have finished  */
-                if (veclen == 0) {
-                    break;
-                }
-
-                if (op == 15 && delta_m == 0) {
-                    /* single source one-many */
-                    while (veclen--) {
-                        rd = ((rd + delta_d) & (bank_mask - 1))
-                             | (rd & bank_mask);
-                        gen_mov_vreg_F0(dp, rd);
-                    }
-                    break;
-                }
-                /* Setup the next operands.  */
-                veclen--;
-                rd = ((rd + delta_d) & (bank_mask - 1))
-                     | (rd & bank_mask);
-
-                if (op == 15) {
-                    /* One source operand.  */
-                    rm = ((rm + delta_m) & (bank_mask - 1))
-                         | (rm & bank_mask);
-                    gen_mov_F0_vreg(dp, rm);
-                } else {
-                    /* Two source operands.  */
-                    rn = ((rn + delta_d) & (bank_mask - 1))
-                         | (rn & bank_mask);
-                    gen_mov_F0_vreg(dp, rn);
-                    if (delta_m) {
-                        rm = ((rm + delta_m) & (bank_mask - 1))
-                             | (rm & bank_mask);
-                        gen_mov_F1_vreg(dp, rm);
-                    }
-                }
-            }
-        }
-        break;
-    case 0xc:
-    case 0xd:
-        /* Already handled by decodetree */
-        return 1;
-    default:
-        /* Should never happen.  */
-        return 1;
-    }
-    return 0;
+    /* If the decodetree decoder didn't handle this insn, it must be UNDEF */
+    return 1;
 }
 
 static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_fix_sp  ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
              vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
 VCVT_fix_dp  ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
              vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
+
+# VCVT float to integer (VCVT and VCVTR): Vd always single; Vm depends on size
+VCVT_sp_int  ---- 1110 1.11 110 s:1 .... 1010 rz:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
+VCVT_dp_int  ---- 1110 1.11 110 s:1 .... 1011 rz:1 1.0 .... \
+             vd=%vd_sp vm=%vm_dp
-- 
2.20.1

For VFP short vectors, the VFP registers are divided into a
series of banks: for single-precision these are s0-s7, s8-s15,
s16-s23 and s24-s31; for double-precision they are d0-d3,
d4-d7, ... d28-d31. Some banks are "scalar" meaning that
use of a register within them triggers a pure-scalar or
mixed vector-scalar operation rather than a full vector
operation. The scalar banks are s0-s7, d0-d3 and d16-d19.
When using a bank as part of a vector operation, we
iterate through it, increasing the register number by
the specified stride each time, and wrapping around to
the beginning of the bank.

Unfortunately our calculation of the "increment" part of this
was incorrect:
 vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask)
will only do the intended thing if bank_mask has exactly one
bit set, which is true of neither mask in use (0x18 for singles,
0xc for doubles). For instance for doubles (bank_mask = 0xc),
if we start with vd = 6 and delta_d = 2 then vd is updated
to 12 rather than the intended 4.

This only causes problems in the unlikely case that the
starting register is not the first in its bank: if the
register number doesn't have to wrap around then the
expression happens to give the right answer.

Fix this bug by abstracting out the "check whether register
is in a scalar bank" and "advance register within bank"
operations to utility functions which use the right
bit masking operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
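Reviewer note, not part of the commit: the arithmetic in the commit message
can be checked with a few lines of standalone C. old_advance() is a
hypothetical wrapper around the expression being removed, with bank_mask
passed in explicitly. The two vfp_advance_* bodies are copied from the
helpers this patch adds.

```c
/* The expression the patch removes, with bank_mask made a parameter. */
static inline int old_advance(int reg, int delta, int bank_mask)
{
    return ((reg + delta) & (bank_mask - 1)) | (reg & bank_mask);
}

/* Advance within an 8-register S bank: only the low 3 bits change. */
static inline int vfp_advance_sreg(int reg, int delta)
{
    return ((reg + delta) & 0x7) | (reg & ~0x7);
}

/* Advance within a 4-register D bank: only the low 2 bits change. */
static inline int vfp_advance_dreg(int reg, int delta)
{
    return ((reg + delta) & 0x3) | (reg & ~0x3);
}
```

For the example in the commit message, old_advance(6, 2, 0xc) gives 12 while
vfp_advance_dreg(6, 2) gives 4. When no wrap is needed the two agree, for
instance both map d4 with delta 1 to d5. The single-precision expression has
the same flaw: old_advance(15, 1, 0x18) gives 24 where
vfp_advance_sreg(15, 1) gives 8.
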
 target/arm/translate-vfp.inc.c | 100 ++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 40 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ typedef void VFPGen3OpDPFn(TCGv_i64 vd,
 typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
 typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
 
+/*
+ * Return true if the specified S reg is in a scalar bank
+ * (ie if it is s0..s7)
+ */
+static inline bool vfp_sreg_is_scalar(int reg)
+{
+    return (reg & 0x18) == 0;
+}
+
+/*
+ * Return true if the specified D reg is in a scalar bank
+ * (ie if it is d0..d3 or d16..d19)
+ */
+static inline bool vfp_dreg_is_scalar(int reg)
+{
+    return (reg & 0xc) == 0;
+}
+
+/*
+ * Advance the S reg number forwards by delta within its bank
+ * (ie increment the low 3 bits but leave the rest the same)
+ */
+static inline int vfp_advance_sreg(int reg, int delta)
+{
+    return ((reg + delta) & 0x7) | (reg & ~0x7);
+}
+
+/*
+ * Advance the D reg number forwards by delta within its bank
+ * (ie increment the low 2 bits but leave the rest the same)
+ */
+static inline int vfp_advance_dreg(int reg, int delta)
+{
+    return ((reg + delta) & 0x3) | (reg & ~0x3);
+}
+
 /*
  * Perform a 3-operand VFP data processing instruction. fn is the
  * callback to do the actual operation; this function deals with the
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 f0, f1, fd;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = s->vec_stride + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_sreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
+        vn = vfp_advance_sreg(vn, delta_d);
         neon_load_reg32(f0, vn);
         if (delta_m) {
-            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            vm = vfp_advance_sreg(vm, delta_m);
             neon_load_reg32(f1, vm);
         }
     }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 f0, f1, fd;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = (s->vec_stride >> 1) + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_dreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
         }
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
+        vn = vfp_advance_dreg(vn, delta_d);
         neon_load_reg64(f0, vn);
         if (delta_m) {
-            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            vm = vfp_advance_dreg(vm, delta_m);
             neon_load_reg64(f1, vm);
         }
     }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 f0, fd;
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = s->vec_stride + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_sreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         if (delta_m == 0) {
             /* single source one-many */
             while (veclen--) {
-                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                vd = vfp_advance_sreg(vd, delta_d);
                 neon_store_reg32(fd, vd);
             }
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
+        vm = vfp_advance_sreg(vm, delta_m);
         neon_load_reg32(f0, vm);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 f0, fd;
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = (s->vec_stride >> 1) + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_dreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         if (delta_m == 0) {
             /* single source one-many */
             while (veclen--) {
-                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                vd = vfp_advance_dreg(vd, delta_d);
                 neon_store_reg64(fd, vd);
             }
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
+        vm = vfp_advance_dreg(vm, delta_m);
         neon_load_reg64(f0, vm);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
 static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 {
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 fd;
     uint32_t n, i, vd;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
     }
 
     tcg_temp_free_i32(fd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 {
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 fd;
     uint32_t n, i, vd;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
     }
 
     tcg_temp_free_i64(fd);
-- 
2.20.1
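
For readers following the hunks above: the removed lines kept the bank
bits of a register number and added the stride to the remaining bits.
The bodies of the new helpers are not shown in these hunks, so the
following is only a sketch of wrap-within-bank arithmetic, assuming
banks of 8 single-precision or 4 double-precision registers. All
sketch_* names are hypothetical and are not the QEMU helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Scalar tests taken from the removed lines: bank 0 means "scalar". */
static inline bool sketch_sreg_is_scalar(int reg)
{
    return (reg & 0x18) == 0;   /* old bank_mask for single precision */
}

static inline bool sketch_dreg_is_scalar(int reg)
{
    return (reg & 0xc) == 0;    /* old bank_mask for double precision */
}

/* The removed open-coded advance, parameterised on bank_mask. */
static inline int sketch_old_advance(int reg, int delta, int bank_mask)
{
    return ((reg + delta) & (bank_mask - 1)) | (reg & bank_mask);
}

/*
 * ASSUMPTION: the advance is meant to wrap inside the current bank.
 * bank_size must be a power of two (8 for S regs, 4 for D regs here).
 */
static inline int sketch_advance_in_bank(int reg, int delta, int bank_size)
{
    int index_mask = bank_size - 1;

    return ((reg + delta) & index_mask) | (reg & ~index_mask);
}

/* Small self-check comparing the two expressions on one input. */
static inline void sketch_demo(void)
{
    assert(!sketch_sreg_is_scalar(15));
    assert(sketch_advance_in_bank(15, 1, 8) == 8);
    assert(sketch_old_advance(15, 1, 0x18) == 24);
}
```

Under the in-bank assumption, register 15 (last slot of its bank)
advanced by 1 wraps to 8. The open-coded expression with bank_mask
0x18 yields 24 for the same input, because bank_mask - 1 is not a
contiguous low-bit mask.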

First arm pullreq for 7.1. The bulk of this is the qemu_split_irq
removal.

I have enough stuff in my to-review queue that I expect to do another
pullreq early next week, but 31 patches is enough to not hang on to.

thanks
-- PMM

The following changes since commit 9c125d17e9402c232c46610802e5931b3639d77b:

  Merge tag 'pull-tcg-20220420' of https://gitlab.com/rth7680/qemu into staging (2022-04-20 16:43:11 -0700)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20220421

for you to fetch changes up to 5b415dd61bdbf61fb4be0e9f1a7172b8bce682c6:

  hw/arm: Use bit fields for NPCM7XX PWRON STRAPs (2022-04-21 11:37:05 +0100)

----------------------------------------------------------------
target-arm queue:
 * hw/arm/virt: Check for attempt to use TrustZone with KVM or HVF
 * versal: Add the Cortex-R5s in the Real-Time Processing Unit (RPU) subsystem
 * versal: model enough of the Clock/Reset Low-power domain (CRL) to allow control of the Cortex-R5s
 * xlnx-zynqmp: Connect 4 TTC timers
 * exynos4210: Refactor GIC/combiner code to stop using qemu_split_irq
 * realview: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
 * stellaris: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
 * hw/core/irq: remove unused 'qemu_irq_split' function
 * npcm7xx: use symbolic constants for PWRON STRAP bit fields
 * virt: document impact of gic-version on max CPUs

----------------------------------------------------------------
Edgar E. Iglesias (6):
      timer: cadence_ttc: Break out header file to allow embedding
      hw/arm/xlnx-zynqmp: Connect 4 TTC timers
      hw/arm: versal: Create an APU CPU Cluster
      hw/arm: versal: Add the Cortex-R5Fs
      hw/misc: Add a model of the Xilinx Versal CRL
      hw/arm: versal: Connect the CRL

Hao Wu (2):
      hw/misc: Add PWRON STRAP bit fields in GCR module
      hw/arm: Use bit fields for NPCM7XX PWRON STRAPs

Heinrich Schuchardt (1):
      hw/arm/virt: impact of gic-version on max CPUs

Peter Maydell (19):
      hw/arm/virt: Check for attempt to use TrustZone with KVM or HVF
      hw/arm/exynos4210: Use TYPE_OR_IRQ instead of custom OR-gate device
      hw/intc/exynos4210_gic: Remove unused TYPE_EXYNOS4210_IRQ_GATE
      hw/arm/exynos4210: Put a9mpcore device into state struct
      hw/arm/exynos4210: Drop int_gic_irq[] from Exynos4210Irq struct
      hw/arm/exynos4210: Coalesce board_irqs and irq_table
      hw/arm/exynos4210: Fix code style nit in combiner_grp_to_gic_id[]
      hw/arm/exynos4210: Move exynos4210_init_board_irqs() into exynos4210.c
      hw/arm/exynos4210: Put external GIC into state struct
      hw/arm/exynos4210: Drop ext_gic_irq[] from Exynos4210Irq struct
      hw/arm/exynos4210: Move exynos4210_combiner_get_gpioin() into exynos4210.c
      hw/arm/exynos4210: Delete unused macro definitions
      hw/arm/exynos4210: Use TYPE_SPLIT_IRQ in exynos4210_init_board_irqs()
      hw/arm/exynos4210: Fill in irq_table[] for internal-combiner-only IRQ lines
      hw/arm/exynos4210: Connect MCT_G0 and MCT_G1 to both combiners
      hw/arm/exynos4210: Don't connect multiple lines to external GIC inputs
      hw/arm/exynos4210: Fold combiner splits into exynos4210_init_board_irqs()
      hw/arm/exynos4210: Put combiners into state struct
      hw/arm/exynos4210: Drop Exynos4210Irq struct

Zongyuan Li (3):
      hw/arm/realview: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
      hw/arm/stellaris: replace 'qemu_split_irq' with 'TYPE_SPLIT_IRQ'
      hw/core/irq: remove unused 'qemu_irq_split' function

 docs/system/arm/virt.rst              |   4 +-
 include/hw/arm/exynos4210.h           |  50 ++--
 include/hw/arm/xlnx-versal.h          |  16 ++
 include/hw/arm/xlnx-zynqmp.h          |   4 +
 include/hw/intc/exynos4210_combiner.h |  57 +++++
 include/hw/intc/exynos4210_gic.h      |  43 ++++
 include/hw/irq.h                      |   5 -
 include/hw/misc/npcm7xx_gcr.h         |  30 +++
 include/hw/misc/xlnx-versal-crl.h     | 235 +++++++++++++++++++
 include/hw/timer/cadence_ttc.h        |  54 +++++
 hw/arm/exynos4210.c                   | 430 ++++++++++++++++++++++++++++++----
 hw/arm/npcm7xx_boards.c               |  24 +-
 hw/arm/realview.c                     |  33 ++-
 hw/arm/stellaris.c                    |  15 +-
 hw/arm/virt.c                         |   7 +
 hw/arm/xlnx-versal-virt.c             |   6 +-
 hw/arm/xlnx-versal.c                  |  99 +++++++-
 hw/arm/xlnx-zynqmp.c                  |  22 ++
 hw/core/irq.c                         |  15 --
 hw/intc/exynos4210_combiner.c         | 108 +--------
 hw/intc/exynos4210_gic.c              | 344 +--------------------------
 hw/misc/xlnx-versal-crl.c             | 421 +++++++++++++++++++++++++++++++++
 hw/timer/cadence_ttc.c                |  32 +--
 MAINTAINERS                           |   2 +-
 hw/misc/meson.build                   |   1 +
 25 files changed, 1457 insertions(+), 600 deletions(-)
 create mode 100644 include/hw/intc/exynos4210_combiner.h
 create mode 100644 include/hw/intc/exynos4210_gic.h
 create mode 100644 include/hw/misc/xlnx-versal-crl.h
 create mode 100644 include/hw/timer/cadence_ttc.h
 create mode 100644 hw/misc/xlnx-versal-crl.c

It's not possible to provide the guest with the Security extensions
(TrustZone) when using KVM or HVF, because the hardware
virtualization extensions don't permit running EL3 guest code.
However, we weren't checking for this combination, with the result
that QEMU would assert if you tried it:

$ qemu-system-aarch64 -enable-kvm -machine virt,secure=on -cpu host -display none
Unexpected error in object_property_find_err() at ../../qom/object.c:1304:
qemu-system-aarch64: Property 'host-arm-cpu.secure-memory' not found
Aborted

Check for this combination of options and report an error, in the
same way we already do for attempts to give a KVM or HVF guest the
Virtualization or MTE extensions. Now we will report:

qemu-system-aarch64: mach-virt: KVM does not support providing Security extensions (TrustZone) to the guest CPU

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/961
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220404155301.566542-1-peter.maydell@linaro.org
---
 hw/arm/virt.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
         exit(1);
     }
 
+    if (vms->secure && (kvm_enabled() || hvf_enabled())) {
+        error_report("mach-virt: %s does not support providing "
+                     "Security extensions (TrustZone) to the guest CPU",
+                     kvm_enabled() ? "KVM" : "HVF");
+        exit(1);
+    }
+
     if (vms->virt && (kvm_enabled() || hvf_enabled())) {
         error_report("mach-virt: %s does not support providing "
                      "Virtualization extensions to the guest CPU",
-- 
2.25.1