Series comparison

-[Qemu-devel] [PULL 00/39] target-arm queue
+[PULL 00/47] target-arm queue
-Second pull request of the week; mostly RTH's support for some
+Just my fp16 work, plus some small stuff for the sbsa-ref board;
-new-in-v8.1/v8.3 instructions, and my v8M board model.
+but my rule of thumb is to send a pullreq once I get over about
 patches...
-thanks
 -- PMM
-The following changes since commit 427cbc7e4136a061628cb4315cc8182ea36d772f:
+The following changes since commit 2f4c51c0f384d7888a04b4815861e6d5fd244d75:
-  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2018-03-01 18:46:41 +0000)
+  Merge remote-tracking branch 'remotes/kraxel/tags/usb-20200831-pull-request' into staging (2020-08-31 19:39:13 +0100)
 are available in the Git repository at:
-  git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180302
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200901
-for you to fetch changes up to e66a67bf28e1b4fce2e3d72a2610dbd48d9d3078:
+for you to fetch changes up to 3f462bf0f6ea6382dd1502d4eb1fcd33c8e774f5:
-  target/arm: Enable ARM_FEATURE_V8_FCMA (2018-03-02 11:03:45 +0000)
+  hw/arm/sbsa-ref : Add embedded controller in secure memory (2020-09-01 14:01:34 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * implement FCMA and RDM v8.1 and v8.3 instructions
+ * Implement fp16 support for AArch32 VFP and Neon
- * enable Cortex-M33 v8M core, and provide new mps2-an505 board model
+ * hw/arm/sbsa-ref: add "reg" property to DT cpu nodes
-   that uses it
+ * hw/arm/sbsa-ref : Add embedded controller in secure memory
  * decodetree: Propagate return value from translate subroutines
  * xlnx-zynqmp: Implement the RTC device
 ----------------------------------------------------------------
-Alistair Francis (3):
+Graeme Gregory (2):
-      xlnx-zynqmp-rtc: Initial commit
+      hw/misc/sbsa_ec : Add an embedded controller for sbsa-ref
-      xlnx-zynqmp-rtc: Add basic time support
+      hw/arm/sbsa-ref : Add embedded controller in secure memory
       xlnx-zynqmp: Connect the RTC device
-Peter Maydell (19):
+Leif Lindholm (1):
-      loader: Add new load_ramdisk_as()
+      hw/arm/sbsa-ref: add "reg" property to DT cpu nodes
       hw/arm/boot: Honour CPU's address space for image loads
       hw/arm/armv7m: Honour CPU's address space for image loads
       target/arm: Define an IDAU interface
       armv7m: Forward idau property to CPU object
       target/arm: Define init-svtor property for the reset secure VTOR value
       armv7m: Forward init-svtor property to CPU object
       target/arm: Add Cortex-M33
       hw/misc/unimp: Move struct to header file
       include/hw/or-irq.h: Add missing include guard
       qdev: Add new qdev_init_gpio_in_named_with_opaque()
       hw/core/split-irq: Device that splits IRQ lines
       hw/misc/mps2-fpgaio: FPGA control block for MPS2 AN505
       hw/misc/tz-ppc: Model TrustZone peripheral protection controller
       hw/misc/iotkit-secctl: Arm IoT Kit security controller initial skeleton
       hw/misc/iotkit-secctl: Add handling for PPCs
       hw/misc/iotkit-secctl: Add remaining simple registers
       hw/arm/iotkit: Model Arm IOT Kit
       mps2-an505: New board model: MPS2 with AN505 Cortex-M33 FPGA image
-Richard Henderson (17):
+Peter Maydell (44):
-      decodetree: Propagate return value from translate subroutines
+      target/arm: Remove local definitions of float constants
-      target/arm: Add ARM_FEATURE_V8_RDM
+      target/arm: Use correct ID register check for aa32_fp16_arith
-      target/arm: Refactor disas_simd_indexed decode
+      target/arm: Implement VFP fp16 for VFP_BINOP operations
-      target/arm: Refactor disas_simd_indexed size checks
+      target/arm: Implement VFP fp16 VMLA, VMLS, VNMLS, VNMLA, VNMUL
-      target/arm: Decode aa64 armv8.1 scalar three same extra
+      target/arm: Macroify trans functions for VFMA, VFMS, VFNMA, VFNMS
-      target/arm: Decode aa64 armv8.1 three same extra
+      target/arm: Implement VFP fp16 for fused-multiply-add
-      target/arm: Decode aa64 armv8.1 scalar/vector x indexed element
+      target/arm: Macroify uses of do_vfp_2op_sp() and do_vfp_2op_dp()
-      target/arm: Decode aa32 armv8.1 three same
+      target/arm: Implement VFP fp16 for VABS, VNEG, VSQRT
-      target/arm: Decode aa32 armv8.1 two reg and a scalar
+      target/arm: Implement VFP fp16 for VMOV immediate
-      target/arm: Enable ARM_FEATURE_V8_RDM
+      target/arm: Implement VFP fp16 VCMP
-      target/arm: Add ARM_FEATURE_V8_FCMA
+      target/arm: Implement VFP fp16 VLDR and VSTR
-      target/arm: Decode aa64 armv8.3 fcadd
+      target/arm: Implement VFP fp16 VCVT between float and integer
-      target/arm: Decode aa64 armv8.3 fcmla
+      target/arm: Make VFP_CONV_FIX macros take separate float type and float size
-      target/arm: Decode aa32 armv8.3 3-same
+      target/arm: Use macros instead of open-coding fp16 conversion helpers
-      target/arm: Decode aa32 armv8.3 2-reg-index
+      target/arm: Implement VFP fp16 VCVT between float and fixed-point
-      target/arm: Decode t32 simd 3reg and 2reg_scalar extension
+      target/arm: Implement VFP fp16 VCVT-with-specified-rounding-mode
-      target/arm: Enable ARM_FEATURE_V8_FCMA
+      target/arm: Implement VFP fp16 VSEL
       target/arm: Implement VFP fp16 VRINT*
       target/arm: Implement new VFP fp16 insn VINS
       target/arm: Implement new VFP fp16 insn VMOVX
       target/arm: Implement VFP fp16 VMOV between gp and halfprec registers
       target/arm: Implement FP16 for Neon VADD, VSUB, VABD, VMUL
       target/arm: Implement fp16 for Neon VRECPE, VRSQRTE using gvec
       target/arm: Implement fp16 for Neon VABS, VNEG of floats
       target/arm: Implement fp16 for VCEQ, VCGE, VCGT comparisons
       target/arm: Implement fp16 for VACGE, VACGT
       target/arm: Implement fp16 for Neon VMAX, VMIN
       target/arm: Implement fp16 for Neon VMAXNM, VMINNM
       target/arm: Implement fp16 for Neon VMLA, VMLS operations
       target/arm: Implement fp16 for Neon VFMA, VFMS
       target/arm: Implement fp16 for Neon fp compare-vs-0
       target/arm: Implement fp16 for Neon VRECPS
       target/arm: Implement fp16 for Neon VRSQRTS
       target/arm: Implement fp16 for Neon pairwise fp ops
       target/arm: Implement fp16 for Neon float-integer VCVT
       target/arm: Convert Neon VCVT fixed-point to gvec
       target/arm: Implement fp16 for Neon VCVT fixed-point
       target/arm: Implement fp16 for Neon VCVT with rounding modes
       target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode
       target/arm: Implement fp16 for Neon VRINTX
       target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations
       target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations
       target/arm: Implement fp16 for Neon VMUL, VMLA, VMLS
       target/arm: Enable FP16 in '-cpu max'
- hw/arm/Makefile.objs               |   2 +
+ target/arm/cpu.h                |   7 +-
- hw/core/Makefile.objs              |   1 +
+ target/arm/helper.h             | 133 ++++++-
- hw/misc/Makefile.objs              |   4 +
+ target/arm/neon-dp.decode       |   8 +-
- hw/timer/Makefile.objs             |   1 +
+ target/arm/vfp-uncond.decode    |  27 +-
- target/arm/Makefile.objs           |   2 +-
+ target/arm/vfp.decode           |  34 +-
- include/hw/arm/armv7m.h            |   5 +
+ hw/arm/sbsa-ref.c               |  43 ++-
- include/hw/arm/iotkit.h            | 109 ++++++
+ hw/misc/sbsa_ec.c               |  98 +++++
- include/hw/arm/xlnx-zynqmp.h       |   2 +
+ target/arm/cpu.c                |   3 +-
- include/hw/core/split-irq.h        |  57 +++
+ target/arm/cpu64.c              |  10 +-
- include/hw/irq.h                   |   4 +-
+ target/arm/helper-a64.c         |  11 -
- include/hw/loader.h                |  12 +-
+ target/arm/translate-sve.c      |   4 -
- include/hw/misc/iotkit-secctl.h    | 103 ++++++
+ target/arm/vec_helper.c         | 431 ++++++++++++++++++++-
- include/hw/misc/mps2-fpgaio.h      |  43 +++
+ target/arm/vfp_helper.c         | 244 +++++-------
- include/hw/misc/tz-ppc.h           | 101 ++++++
+ hw/misc/meson.build             |   2 +
- include/hw/misc/unimp.h            |  10 +
+ target/arm/translate-neon.c.inc | 755 +++++++++++++------------------------
- include/hw/or-irq.h                |   5 +
+ target/arm/translate-vfp.c.inc  | 810 ++++++++++++++++++++++++++++++++++++----
- include/hw/qdev-core.h             |  30 +-
+files changed, 1819 insertions(+), 801 deletions(-)
- include/hw/timer/xlnx-zynqmp-rtc.h |  86 +++++
+ create mode 100644 hw/misc/sbsa_ec.c
  target/arm/cpu.h                   |   8 +
  target/arm/helper.h                |  31 ++
  target/arm/idau.h                  |  61 ++++
  hw/arm/armv7m.c                    |  35 +-
  hw/arm/boot.c                      | 119 ++++---
  hw/arm/iotkit.c                    | 598 +++++++++++++++++++++++++++++++
  hw/arm/mps2-tz.c                   | 503 ++++++++++++++++++++++++++
  hw/arm/xlnx-zynqmp.c               |  14 +
  hw/core/loader.c                   |   8 +-
  hw/core/qdev.c                     |   8 +-
  hw/core/split-irq.c                |  89 +++++
  hw/misc/iotkit-secctl.c            | 704 +++++++++++++++++++++++++++++++++++++
  hw/misc/mps2-fpgaio.c              | 176 ++++++++++
  hw/misc/tz-ppc.c                   | 302 ++++++++++++++++
  hw/misc/unimp.c                    |  10 -
  hw/timer/xlnx-zynqmp-rtc.c         | 272 ++++++++++++++
  linux-user/elfload.c               |   2 +
  target/arm/cpu.c                   |  66 +++-
  target/arm/cpu64.c                 |   2 +
  target/arm/helper.c                |  28 +-
  target/arm/translate-a64.c         | 514 +++++++++++++++++++++------
  target/arm/translate.c             | 275 +++++++++++++--
  target/arm/vec_helper.c            | 429 ++++++++++++++++++++++
  default-configs/arm-softmmu.mak    |   5 +
  hw/misc/trace-events               |  24 ++
  hw/timer/trace-events              |   3 +
  scripts/decodetree.py              |   5 +-
 files changed, 4668 insertions(+), 200 deletions(-)
  create mode 100644 include/hw/arm/iotkit.h
  create mode 100644 include/hw/core/split-irq.h
  create mode 100644 include/hw/misc/iotkit-secctl.h
  create mode 100644 include/hw/misc/mps2-fpgaio.h
  create mode 100644 include/hw/misc/tz-ppc.h
  create mode 100644 include/hw/timer/xlnx-zynqmp-rtc.h
  create mode 100644 target/arm/idau.h
  create mode 100644 hw/arm/iotkit.c
  create mode 100644 hw/arm/mps2-tz.c
  create mode 100644 hw/core/split-irq.c
  create mode 100644 hw/misc/iotkit-secctl.c
  create mode 100644 hw/misc/mps2-fpgaio.c
  create mode 100644 hw/misc/tz-ppc.c
  create mode 100644 hw/timer/xlnx-zynqmp-rtc.c
  create mode 100644 target/arm/vec_helper.c

-[Qemu-devel] [PULL 39/39] target/arm: Enable ARM_FEATURE_V8_FCMA
+[PULL 01/47] target/arm: Remove local definitions of float constants
-From: Richard Henderson <richard.henderson@linaro.org>
+In several places the target/arm code defines local float constants
 for 2, 3 and 1.5, which are also provided by include/fpu/softfloat.h.
 Remove the unnecessary local duplicate versions.
-Enable it for the "any" CPU used by *-linux-user.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200828183354.27913-2-peter.maydell@linaro.org
 ---
  target/arm/helper-a64.c    | 11 -----------
  target/arm/translate-sve.c |  4 ----
  target/arm/vfp_helper.c    |  4 ----
 files changed, 19 deletions(-)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20180228193125.20577-17-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu.c   | 1 +
  target/arm/cpu64.c | 1 +
 files changed, 2 insertions(+)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/target/arm/helper-a64.c
-+++ b/target/arm/cpu.c
++++ b/target/arm/helper-a64.c
-@@ -XXX,XX +XXX,XX @@ static void arm_any_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, void *fpstp)
-     set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
+  * versions, these do a fully fused multiply-add or
-     set_feature(&cpu->env, ARM_FEATURE_CRC);
+  * multiply-add-and-halve.
-     set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+  */
-+    set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
+-#define float16_two make_float16(0x4000)
-     cpu->midr = 0xffffffff;
+-#define float16_three make_float16(0x4200)
 -#define float16_one_point_five make_float16(0x3e00)
 -
 -#define float32_two make_float32(0x40000000)
 -#define float32_three make_float32(0x40400000)
 -#define float32_one_point_five make_float32(0x3fc00000)
 -
 -#define float64_two make_float64(0x4000000000000000ULL)
 -#define float64_three make_float64(0x4008000000000000ULL)
 -#define float64_one_point_five make_float64(0x3FF8000000000000ULL)
  uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, void *fpstp)
  {
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_##NAME##_zpzi(DisasContext *s, arg_rpri_esz *a)         \
      return true;                                                          \
  }
- #endif
-diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+-#define float16_two  make_float16(0x4000)
 -#define float32_two  make_float32(0x40000000)
 -#define float64_two  make_float64(0x4000000000000000ULL)
 -
  DO_FP_IMM(FADD, fadds, half, one)
  DO_FP_IMM(FSUB, fsubs, half, one)
  DO_FP_IMM(FMUL, fmuls, half, two)
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu64.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/cpu64.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static void aarch64_any_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
-     set_feature(&cpu->env, ARM_FEATURE_CRC);
+     return r;
      set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
      set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
 +    set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
      cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
      cpu->dcz_blocksize = 7; /*  512 bytes */
  }
+-#define float32_two make_float32(0x40000000)
+-#define float32_three make_float32(0x40400000)
+-#define float32_one_point_five make_float32(0x3fc00000)
+-
+ float32 HELPER(recps_f32)(CPUARMState *env, float32 a, float32 b)
+ {
+     float_status *s = &env->vfp.standard_fp_status;
 --
-.16.2
+.20.1
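The removed macros are raw IEEE 754 bit patterns for 2, 3 and 1.5. The standalone check below decodes each pattern and should reproduce those values. It is plain C rather than QEMU's softfloat types, and the function names are hypothetical. It assumes the host stores float and double as IEEE 754 binary32/binary64 with the same byte order as its integers, which holds on mainstream hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 bit pattern (normal numbers only). */
static double half_bits_to_double(uint16_t h)
{
    int sign = (h >> 15) & 1;
    int exp = (h >> 10) & 0x1f;   /* 5-bit biased exponent, bias 15 */
    int frac = h & 0x3ff;         /* 10-bit fraction */

    /* Sketch only: zeros, subnormals, infinities and NaNs are not handled. */
    assert(exp != 0 && exp != 0x1f);

    double v = 1.0 + frac / 1024.0;   /* implicit leading 1 */
    int e = exp - 15;
    while (e > 0) { v *= 2.0; e--; }  /* exact scaling by powers of two */
    while (e < 0) { v /= 2.0; e++; }
    return sign ? -v : v;
}

/* Reinterpret binary32 / binary64 bit patterns (assumes IEEE 754 host floats). */
static float float_bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static double double_bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, half_bits_to_double(0x4200) yields 3.0: the biased exponent is 16 (unbiased 1) and the fraction is 0x200/1024 = 0.5, so the value is 1.5 * 2.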

-[Qemu-devel] [PULL 33/39] target/arm: Add ARM_FEATURE_V8_FCMA
+[PULL 02/47] target/arm: Use correct ID register check for aa32_fp16_arith
-From: Richard Henderson <richard.henderson@linaro.org>
+The aa32_fp16_arith feature check function currently looks at the
 AArch64 ID_AA64PFR0 register. This is (as the comment notes) not
 correct. The bogus check was put in mostly to allow testing of the
 fp16 variants of the VCMLA instructions and it was something of
 a mistake that we allowed them to exist in master.
-Not enabled anywhere yet.
+Switch the feature check function to testing MVFR1.FPHP, which is
 what it ought to be.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+This will remove emulation of the VCMLA and VCADD insns from
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+AArch32 code running on an AArch64 '-cpu max' using system emulation.
-Message-id: 20180228193125.20577-11-richard.henderson@linaro.org
+(They were never enabled for aarch32 linux-user and system-emulation.)
 Since we weren't advertising their existence via the AArch32 ID
 register, well-behaved guests wouldn't have been using them anyway.
 Once we have implemented all the AArch32 support for the FP16 extension
 we will advertise it in the MVFR1 ID register field, which will reenable
 these insns along with all the others.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-3-peter.maydell@linaro.org
 ---
- target/arm/cpu.h     | 1 +
+ target/arm/cpu.h | 7 +------
- linux-user/elfload.c | 1 +
+file changed, 1 insertion(+), 6 deletions(-)
 files changed, 2 insertions(+)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ enum arm_features {
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id)
-     ARM_FEATURE_V8_SM4, /* implements SM4 part of v8 Crypto Extensions */
-     ARM_FEATURE_V8_RDM, /* implements v8.1 simd round multiply */
+ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
-     ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
+ {
-+    ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions.  */
+-    /*
- };
+-     * This is a placeholder for use by VCMA until the rest of
+-     * the ARMv8.2-FP16 extension is implemented for aa32 mode.
- static inline int arm_feature(CPUARMState *env, int feature)
+-     * At which point we can properly set and check MVFR1.FPHP.
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
+-     */
-index XXXXXXX..XXXXXXX 100644
+-    return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
---- a/linux-user/elfload.c
++    return FIELD_EX32(id->mvfr1, MVFR1, FPHP) >= 3;
-+++ b/linux-user/elfload.c
+ }
-@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void)
-     GET_FEATURE(ARM_FEATURE_V8_FP16,
+ static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id)
                  ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP);
      GET_FEATURE(ARM_FEATURE_V8_RDM, ARM_HWCAP_A64_ASIMDRDM);
 +    GET_FEATURE(ARM_FEATURE_V8_FCMA, ARM_HWCAP_A64_FCMA);
  #undef GET_FEATURE
      return hwcaps;
 --
-.16.2
+.20.1
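The new check, FIELD_EX32(id->mvfr1, MVFR1, FPHP) >= 3, is a bitfield extract followed by a comparison. The sketch below does the same thing with plain shifts instead of QEMU's registerfields macros, and the helper names are hypothetical. It assumes two things from the Arm Architecture Reference Manual: FPHP sits at MVFR1 bits [27:24], and 3 is the first value that adds half-precision arithmetic, with lower nonzero values indicating conversion support only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the unsigned bitfield [shift + len - 1 : shift] from a 32-bit value. */
static uint32_t extract_field32(uint32_t reg, unsigned shift, unsigned len)
{
    assert(len > 0 && len < 32 && shift + len <= 32);
    return (reg >> shift) & ((1u << len) - 1);
}

/* Assumed layout: MVFR1.FPHP occupies bits [27:24]. */
#define MVFR1_FPHP_SHIFT 24
#define MVFR1_FPHP_LEN   4

/* True if the FPHP field advertises half-precision arithmetic (value >= 3). */
static bool mvfr1_has_fp16_arith(uint32_t mvfr1)
{
    return extract_field32(mvfr1, MVFR1_FPHP_SHIFT, MVFR1_FPHP_LEN) >= 3;
}
```

Because the old check looked at ID_AA64PFR0 instead, an AArch32 guest could not see the feature it was gating. Testing the AArch32 ID register keeps the emulated insns consistent with what the guest can discover.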

-New patch
+[PULL 03/47] target/arm: Implement VFP fp16 for VFP_BINOP operations
+Implement VFP fp16 support for simple binary-operator VFP insns VADD,
 VSUB, VMUL, VDIV, VMINNM and VMAXNM:
  * make the VFP_BINOP() macro generate float16 helpers as well as
    float32 and float64
  * implement a do_vfp_3op_hp() function similar to the existing
    do_vfp_3op_sp()
  * add decode for the half-precision insn patterns
 Note that the VFP_BINOP macro use creates a couple of unused helper
 functions vfp_maxh and vfp_minh, but they're small so it's not worth
 splitting the BINOP operations into "needs halfprec" and "no
 halfprec" groups.
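The VFP_BINOP change relies on token pasting: one macro invocation generates a helper per precision, which is also why vfp_maxh and vfp_minh come along even though nothing uses them. The sketch below shows the same pattern in standalone C. The "softfloat" primitives in it are hypothetical stand-ins written with native arithmetic. It generates only single- and double-precision variants, because portable C has no standard half-precision type.

```c
#include <assert.h>

/* Hypothetical stand-ins for float32_add, float64_add, float32_mul, ... */
static float  sketch_f32_add(float a, float b)   { return a + b; }
static double sketch_f64_add(double a, double b) { return a + b; }
static float  sketch_f32_mul(float a, float b)   { return a * b; }
static double sketch_f64_mul(double a, double b) { return a * b; }

/* One invocation defines helper_<name>_s and helper_<name>_d. */
#define SKETCH_BINOP(name)                                   \
    static float helper_##name##_s(float a, float b)         \
    {                                                        \
        return sketch_f32_##name(a, b);                      \
    }                                                        \
    static double helper_##name##_d(double a, double b)      \
    {                                                        \
        return sketch_f64_##name(a, b);                      \
    }

SKETCH_BINOP(add)
SKETCH_BINOP(mul)
#undef SKETCH_BINOP
```

Adding a third precision means adding one more function to the macro body, and every operation then gets that variant. The patch accepts the small cost of a few unused helpers in exchange for not splitting the operations into separate groups.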
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200828183354.27913-4-peter.maydell@linaro.org
 ---
  target/arm/helper.h            |  8 ++++
  target/arm/vfp-uncond.decode   |  3 ++
  target/arm/vfp.decode          |  4 ++
  target/arm/vfp_helper.c        |  5 ++
  target/arm/translate-vfp.c.inc | 86 ++++++++++++++++++++++++++++++++++
 files changed, 106 insertions(+)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(probe_access, TCG_CALL_NO_WG, void, env, tl, i32, i32, i32)
  DEF_HELPER_1(vfp_get_fpscr, i32, env)
  DEF_HELPER_2(vfp_set_fpscr, void, env, i32)
 +DEF_HELPER_3(vfp_addh, f16, f16, f16, ptr)
  DEF_HELPER_3(vfp_adds, f32, f32, f32, ptr)
  DEF_HELPER_3(vfp_addd, f64, f64, f64, ptr)
 +DEF_HELPER_3(vfp_subh, f16, f16, f16, ptr)
  DEF_HELPER_3(vfp_subs, f32, f32, f32, ptr)
  DEF_HELPER_3(vfp_subd, f64, f64, f64, ptr)
 +DEF_HELPER_3(vfp_mulh, f16, f16, f16, ptr)
  DEF_HELPER_3(vfp_muls, f32, f32, f32, ptr)
  DEF_HELPER_3(vfp_muld, f64, f64, f64, ptr)
 +DEF_HELPER_3(vfp_divh, f16, f16, f16, ptr)
  DEF_HELPER_3(vfp_divs, f32, f32, f32, ptr)
  DEF_HELPER_3(vfp_divd, f64, f64, f64, ptr)
 +DEF_HELPER_3(vfp_maxh, f16, f16, f16, ptr)
  DEF_HELPER_3(vfp_maxs, f32, f32, f32, ptr)
  DEF_HELPER_3(vfp_maxd, f64, f64, f64, ptr)
 +DEF_HELPER_3(vfp_minh, f16, f16, f16, ptr)
  DEF_HELPER_3(vfp_mins, f32, f32, f32, ptr)
  DEF_HELPER_3(vfp_mind, f64, f64, f64, ptr)
 +DEF_HELPER_3(vfp_maxnumh, f16, f16, f16, ptr)
  DEF_HELPER_3(vfp_maxnums, f32, f32, f32, ptr)
  DEF_HELPER_3(vfp_maxnumd, f64, f64, f64, ptr)
 +DEF_HELPER_3(vfp_minnumh, f16, f16, f16, ptr)
  DEF_HELPER_3(vfp_minnums, f32, f32, f32, ptr)
  DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
  DEF_HELPER_1(vfp_negs, f32, f32)
 diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp-uncond.decode
 +++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VSEL        1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
  VSEL        1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
              vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
 +VMAXNM_hp   1111 1110 1.00 .... .... 1001 .0.0 ....         @vfp_dnm_s
 +VMINNM_hp   1111 1110 1.00 .... .... 1001 .1.0 ....         @vfp_dnm_s
 +
  VMAXNM_sp   1111 1110 1.00 .... .... 1010 .0.0 ....         @vfp_dnm_s
  VMINNM_sp   1111 1110 1.00 .... .... 1010 .1.0 ....         @vfp_dnm_s
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 ....        @vfp_dnm_d
  VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 ....        @vfp_dnm_s
  VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 ....        @vfp_dnm_d
 +VMUL_hp      ---- 1110 0.10 .... .... 1001 .0.0 ....        @vfp_dnm_s
  VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 ....        @vfp_dnm_s
  VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 ....        @vfp_dnm_d
  VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 ....        @vfp_dnm_s
  VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 ....        @vfp_dnm_d
 +VADD_hp      ---- 1110 0.11 .... .... 1001 .0.0 ....        @vfp_dnm_s
  VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 ....        @vfp_dnm_s
  VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 ....        @vfp_dnm_d
 +VSUB_hp      ---- 1110 0.11 .... .... 1001 .1.0 ....        @vfp_dnm_s
  VSUB_sp      ---- 1110 0.11 .... .... 1010 .1.0 ....        @vfp_dnm_s
  VSUB_dp      ---- 1110 0.11 .... .... 1011 .1.0 ....        @vfp_dnm_d
 +VDIV_hp      ---- 1110 1.00 .... .... 1001 .0.0 ....        @vfp_dnm_s
  VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 ....        @vfp_dnm_s
  VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 ....        @vfp_dnm_d
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val)
  #define VFP_HELPER(name, p) HELPER(glue(glue(vfp_,name),p))
  #define VFP_BINOP(name) \
 +dh_ctype_f16 VFP_HELPER(name, h)(dh_ctype_f16 a, dh_ctype_f16 b, void *fpstp) \
 +{ \
 +    float_status *fpst = fpstp; \
 +    return float16_ ## name(a, b, fpst); \
 +} \
  float32 VFP_HELPER(name, s)(float32 a, float32 b, void *fpstp) \
  { \
      float_status *fpst = fpstp; \
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
      return true;
  }
 +static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
 +                          int vd, int vn, int vm, bool reads_vd)
 +{
 +    /*
 +     * Do a half-precision operation. Functionally this is
 +     * the same as do_vfp_3op_sp(), except:
 +     *  - it uses the FPST_FPCR_F16
 +     *  - it doesn't need the VFP vector handling (fp16 is a
 +     *    v8 feature, and in v8 VFP vectors don't exist)
 +     *  - it does the aa32_fp16_arith feature test
 +     */
 +    TCGv_i32 f0, f1, fd;
 +    TCGv_ptr fpst;
 +
 +    if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +        return false;
 +    }
 +
 +    if (s->vec_len != 0 || s->vec_stride != 0) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    f0 = tcg_temp_new_i32();
 +    f1 = tcg_temp_new_i32();
 +    fd = tcg_temp_new_i32();
 +    fpst = fpstatus_ptr(FPST_FPCR_F16);
 +
 +    neon_load_reg32(f0, vn);
 +    neon_load_reg32(f1, vm);
 +
 +    if (reads_vd) {
 +        neon_load_reg32(fd, vd);
 +    }
 +    fn(fd, f0, f1, fpst);
 +    neon_store_reg32(fd, vd);
 +
 +    tcg_temp_free_i32(f0);
 +    tcg_temp_free_i32(f1);
 +    tcg_temp_free_i32(fd);
 +    tcg_temp_free_ptr(fpst);
 +
 +    return true;
 +}
 +
  static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
                            int vd, int vn, int vm, bool reads_vd)
  {
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_dp *a)
      return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
  }
 +static bool trans_VMUL_hp(DisasContext *s, arg_VMUL_sp *a)
 +{
 +    return do_vfp_3op_hp(s, gen_helper_vfp_mulh, a->vd, a->vn, a->vm, false);
 +}
 +
  static bool trans_VMUL_sp(DisasContext *s, arg_VMUL_sp *a)
  {
      return do_vfp_3op_sp(s, gen_helper_vfp_muls, a->vd, a->vn, a->vm, false);
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_dp *a)
      return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
  }
 +static bool trans_VADD_hp(DisasContext *s, arg_VADD_sp *a)
 +{
 +    return do_vfp_3op_hp(s, gen_helper_vfp_addh, a->vd, a->vn, a->vm, false);
 +}
 +
  static bool trans_VADD_sp(DisasContext *s, arg_VADD_sp *a)
  {
      return do_vfp_3op_sp(s, gen_helper_vfp_adds, a->vd, a->vn, a->vm, false);
@@ -XXX,XX +XXX,XX @@ static bool trans_VADD_dp(DisasContext *s, arg_VADD_dp *a)
      return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
  }
 +static bool trans_VSUB_hp(DisasContext *s, arg_VSUB_sp *a)
 +{
 +    return do_vfp_3op_hp(s, gen_helper_vfp_subh, a->vd, a->vn, a->vm, false);
 +}
 +
  static bool trans_VSUB_sp(DisasContext *s, arg_VSUB_sp *a)
  {
      return do_vfp_3op_sp(s, gen_helper_vfp_subs, a->vd, a->vn, a->vm, false);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_dp *a)
      return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
  }
 +static bool trans_VDIV_hp(DisasContext *s, arg_VDIV_sp *a)
 +{
 +    return do_vfp_3op_hp(s, gen_helper_vfp_divh, a->vd, a->vn, a->vm, false);
 +}
 +
  static bool trans_VDIV_sp(DisasContext *s, arg_VDIV_sp *a)
  {
      return do_vfp_3op_sp(s, gen_helper_vfp_divs, a->vd, a->vn, a->vm, false);
@@ -XXX,XX +XXX,XX @@ static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_dp *a)
      return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
  }
 +static bool trans_VMINNM_hp(DisasContext *s, arg_VMINNM_sp *a)
 +{
 +    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
 +        return false;
 +    }
 +    return do_vfp_3op_hp(s, gen_helper_vfp_minnumh,
 +                         a->vd, a->vn, a->vm, false);
 +}
 +
 +static bool trans_VMAXNM_hp(DisasContext *s, arg_VMAXNM_sp *a)
 +{
 +    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
 +        return false;
 +    }
 +    return do_vfp_3op_hp(s, gen_helper_vfp_maxnumh,
 +                         a->vd, a->vn, a->vm, false);
 +}
 +
  static bool trans_VMINNM_sp(DisasContext *s, arg_VMINNM_sp *a)
  {
      if (!dc_isar_feature(aa32_vminmaxnm, s)) {
 --
 .20.1
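In the decode patterns above, bits [11:8] select the precision (0b1001 half, 0b1010 single, 0b1011 double). Per the vfp.decode comment, bits [23], [21:20] and [6] select the operation. The sketch below performs that classification by hand. It is not the decodetree-generated decoder, the enum and function names are hypothetical, and it handles only the conditional 3-register group: the unconditional space (cond == 0b1111), where VMAXNM/VMINNM live, is excluded. Working the fields through by hand, vadd.f16 s0, s0, s1 should encode as 0xEE300920. That value is an assumption worth checking against an assembler.

```c
#include <assert.h>
#include <stdint.h>

enum vfp_size { VFP_SIZE_INVALID, VFP_SIZE_HP, VFP_SIZE_SP, VFP_SIZE_DP };

/* Key = bit[23]:bits[21:20]:bit[6], matching the patterns in vfp.decode. */
enum vfp_3op {
    OP_VMLA = 0, OP_VMLS = 1, OP_VNMLS = 2, OP_VNMLA = 3,
    OP_VMUL = 4, OP_VNMUL = 5, OP_VADD = 6, OP_VSUB = 7,
    OP_VDIV = 8, OP_OTHER = -1
};

/* Bits [11:8]: 0b1001 = half, 0b1010 = single, 0b1011 = double. */
static enum vfp_size vfp_size_of(uint32_t insn)
{
    switch ((insn >> 8) & 0xf) {
    case 0x9: return VFP_SIZE_HP;
    case 0xa: return VFP_SIZE_SP;
    case 0xb: return VFP_SIZE_DP;
    default:  return VFP_SIZE_INVALID;
    }
}

static enum vfp_3op vfp_3op_of(uint32_t insn)
{
    /* Conditional encodings only, bits [27:24] == 0b1110, bit [4] == 0. */
    if ((insn >> 28) == 0xf || ((insn >> 24) & 0xf) != 0xe || (insn & (1u << 4))) {
        return OP_OTHER;
    }
    uint32_t key = (((insn >> 23) & 1) << 3)   /* bit 23 */
                 | (((insn >> 20) & 3) << 1)   /* bits 21:20 */
                 | ((insn >> 6) & 1);          /* bit 6 */
    /* Keys above 8 belong to VFMA/VFMS/VFNMA/VFNMS and other groups. */
    return key <= 8 ? (enum vfp_3op)key : OP_OTHER;
}
```

The half-precision patterns reuse the single-precision register fields (@vfp_dnm_s), so only the [11:8] value distinguishes VADD_hp from VADD_sp.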

-New patch
+[PULL 04/47] target/arm: Implement VFP fp16 VMLA, VMLS, VNMLS, VNMLA, VNMUL
+Implement fp16 versions of the VFP VMLA, VMLS, VNMLS, VNMLA, VNMUL
 instructions. (These are all the remaining ones which we implement
 via do_vfp_3op_[hsd]p().)
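The gen_*_hp functions build each multiply-accumulate from separate multiply, negate and add steps, so the product is rounded before the add. The sketch below spells out those dataflows in C with hypothetical function names. It uses float as a stand-in for half precision, an assumption made because portable C has no standard fp16 type. It keeps the (-A + B) operand order that the source insists on, but it does not model Arm's rules for which NaN propagates.

```c
#include <assert.h>

/*
 * Non-fused semantics: the product is computed (and rounded) first.
 * Build with -ffp-contract=off (GCC/Clang) so the compiler does not
 * fuse the multiply and add into a single FMA.
 */
static float vmla(float d, float n, float m)  { float p = n * m; return d + p; }   /* d + n*m     */
static float vmls(float d, float n, float m)  { float p = n * m; return d + -p; }  /* d + -(n*m)  */
static float vnmls(float d, float n, float m) { float p = n * m; return -d + p; }  /* -d + n*m    */
static float vnmla(float d, float n, float m) { float p = n * m; return -d + -p; } /* -d + -(n*m) */
static float vnmul(float n, float m)          { return -(n * m); }                 /* -(n*m)      */
```

For example, with d = 1, n = 2 and m = 3, these return 7, -5, 5 and -7 for VMLA, VMLS, VNMLS and VNMLA, and VNMUL(2, 3) returns -6.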
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200828183354.27913-5-peter.maydell@linaro.org
 ---
  target/arm/helper.h            |  1 +
  target/arm/vfp.decode          |  5 ++
  target/arm/vfp_helper.c        |  5 ++
  target/arm/translate-vfp.c.inc | 84 ++++++++++++++++++++++++++++++++++
 files changed, 95 insertions(+)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_maxnumd, f64, f64, f64, ptr)
  DEF_HELPER_3(vfp_minnumh, f16, f16, f16, ptr)
  DEF_HELPER_3(vfp_minnums, f32, f32, f32, ptr)
  DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
 +DEF_HELPER_1(vfp_negh, f16, f16)
  DEF_HELPER_1(vfp_negs, f32, f32)
  DEF_HELPER_1(vfp_negd, f64, f64)
  DEF_HELPER_1(vfp_abss, f32, f32)
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
               vd=%vd_dp p=1 u=0 w=1
  # 3-register VFP data-processing; bits [23,21:20,6] identify the operation.
 +VMLA_hp      ---- 1110 0.00 .... .... 1001 .0.0 ....        @vfp_dnm_s
  VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 ....        @vfp_dnm_s
  VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 ....        @vfp_dnm_d
 +VMLS_hp      ---- 1110 0.00 .... .... 1001 .1.0 ....        @vfp_dnm_s
  VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 ....        @vfp_dnm_s
  VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 ....        @vfp_dnm_d
 +VNMLS_hp     ---- 1110 0.01 .... .... 1001 .0.0 ....        @vfp_dnm_s
  VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 ....        @vfp_dnm_s
  VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 ....        @vfp_dnm_d
 +VNMLA_hp     ---- 1110 0.01 .... .... 1001 .1.0 ....        @vfp_dnm_s
  VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 ....        @vfp_dnm_s
  VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 ....        @vfp_dnm_d
@@ -XXX,XX +XXX,XX @@ VMUL_hp      ---- 1110 0.10 .... .... 1001 .0.0 ....        @vfp_dnm_s
  VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 ....        @vfp_dnm_s
  VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 ....        @vfp_dnm_d
 +VNMUL_hp     ---- 1110 0.10 .... .... 1001 .1.0 ....        @vfp_dnm_s
  VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 ....        @vfp_dnm_s
  VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 ....        @vfp_dnm_d
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ VFP_BINOP(minnum)
  VFP_BINOP(maxnum)
  #undef VFP_BINOP
 +dh_ctype_f16 VFP_HELPER(neg, h)(dh_ctype_f16 a)
 +{
 +    return float16_chs(a);
 +}
 +
  float32 VFP_HELPER(neg, s)(float32 a)
  {
      return float32_chs(a);
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
      return true;
  }
 +static void gen_VMLA_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 +{
 +    /* Note that order of inputs to the add matters for NaNs */
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +
 +    gen_helper_vfp_mulh(tmp, vn, vm, fpst);
 +    gen_helper_vfp_addh(vd, vd, tmp, fpst);
 +    tcg_temp_free_i32(tmp);
 +}
 +
 +static bool trans_VMLA_hp(DisasContext *s, arg_VMLA_sp *a)
 +{
 +    return do_vfp_3op_hp(s, gen_VMLA_hp, a->vd, a->vn, a->vm, true);
 +}
 +
  static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
  {
      /* Note that order of inputs to the add matters for NaNs */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_dp *a)
      return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
  }
 +static void gen_VMLS_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 +{
 +    /*
 +     * VMLS: vd = vd + -(vn * vm)
 +     * Note that order of inputs to the add matters for NaNs.
 +     */
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +
 +    gen_helper_vfp_mulh(tmp, vn, vm, fpst);
 +    gen_helper_vfp_negh(tmp, tmp);
 +    gen_helper_vfp_addh(vd, vd, tmp, fpst);
 +    tcg_temp_free_i32(tmp);
 +}
 +
 +static bool trans_VMLS_hp(DisasContext *s, arg_VMLS_sp *a)
 +{
 +    return do_vfp_3op_hp(s, gen_VMLS_hp, a->vd, a->vn, a->vm, true);
 +}
 +
  static void gen_VMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
  {
      /*
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_dp *a)
      return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
  }
 +static void gen_VNMLS_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 +{
 +    /*
 +     * VNMLS: -fd + (fn * fm)
 +     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
 +     * plausible looking simplifications because this will give wrong results
 +     * for NaNs.
 +     */
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +
 +    gen_helper_vfp_mulh(tmp, vn, vm, fpst);
 +    gen_helper_vfp_negh(vd, vd);
 +    gen_helper_vfp_addh(vd, vd, tmp, fpst);
 +    tcg_temp_free_i32(tmp);
 +}
 +
 +static bool trans_VNMLS_hp(DisasContext *s, arg_VNMLS_sp *a)
 +{
 +    return do_vfp_3op_hp(s, gen_VNMLS_hp, a->vd, a->vn, a->vm, true);
 +}
 +
  static void gen_VNMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
  {
      /*
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_dp *a)
      return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
  }
 +static void gen_VNMLA_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 +{
 +    /* VNMLA: -fd + -(fn * fm) */
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +
 +    gen_helper_vfp_mulh(tmp, vn, vm, fpst);
 +    gen_helper_vfp_negh(tmp, tmp);
 +    gen_helper_vfp_negh(vd, vd);
 +    gen_helper_vfp_addh(vd, vd, tmp, fpst);
 +    tcg_temp_free_i32(tmp);
 +}
 +
 +static bool trans_VNMLA_hp(DisasContext *s, arg_VNMLA_sp *a)
 +{
 +    return do_vfp_3op_hp(s, gen_VNMLA_hp, a->vd, a->vn, a->vm, true);
 +}
 +
  static void gen_VNMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
  {
      /* VNMLA: -fd + -(fn * fm) */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_dp *a)
      return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
  }
 +static void gen_VNMUL_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 +{
 +    /* VNMUL: -(fn * fm) */
 +    gen_helper_vfp_mulh(vd, vn, vm, fpst);
 +    gen_helper_vfp_negh(vd, vd);
 +}
 +
 +static bool trans_VNMUL_hp(DisasContext *s, arg_VNMUL_sp *a)
 +{
 +    return do_vfp_3op_hp(s, gen_VNMUL_hp, a->vd, a->vn, a->vm, false);
 +}
 +
  static void gen_VNMUL_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
  {
      /* VNMUL: -(fn * fm) */
 --
 .20.1

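The new vfp_negh helper is a thin wrapper around softfloat's float16_chs(). That function flips the sign bit without looking at the value, so a NaN input keeps its payload, stays signalling if it was signalling, and only changes sign. This is one reason the VNMLS comment above rules out rewriting (-A + B) as (B - A): the two forms can produce a NaN with a different sign. Below is a minimal sketch on raw IEEE 754 binary16 bit patterns. The names f16_chs, f16_is_nan and f16_is_quiet are hypothetical and are not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of float16_chs(): binary16 keeps the sign in
 * bit 15, so negation is a single XOR that never inspects the value. */
static uint16_t f16_chs(uint16_t a)
{
    return a ^ 0x8000;
}

/* A binary16 NaN has an all-ones exponent (bits 14..10) and a
 * non-zero fraction (bits 9..0). */
static bool f16_is_nan(uint16_t a)
{
    return (a & 0x7c00) == 0x7c00 && (a & 0x03ff) != 0;
}

/* The top fraction bit (bit 9) separates quiet NaNs (set) from
 * signalling NaNs (clear). */
static bool f16_is_quiet(uint16_t a)
{
    return f16_is_nan(a) && (a & 0x0200) != 0;
}
```

For example, f16_chs(0x7e00) gives 0xfe00: the quiet NaN keeps its payload and only its sign changes.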
-New patch
+[PULL 05/47] target/arm: Macroify trans functions for VFMA, VFMS, VFNMA, VFNMS
+Macroify creation of the trans functions for single and double
+precision VFMA, VFMS, VFNMA, VFNMS. The repetition was OK for
+two sizes, but we're about to add halfprec and it will get a bit
+more than seems reasonable.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-6-peter.maydell@linaro.org
+---
+ target/arm/translate-vfp.c.inc | 50 +++++++++-------------------------
+ 1 file changed, 13 insertions(+), 37 deletions(-)
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.c.inc
++++ b/target/arm/translate-vfp.c.inc
+@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
+     return true;
+ }
+-static bool trans_VFMA_sp(DisasContext *s, arg_VFMA_sp *a)
+-{
+-    return do_vfm_sp(s, a, false, false);
+-}
+-
+-static bool trans_VFMS_sp(DisasContext *s, arg_VFMS_sp *a)
+-{
+-    return do_vfm_sp(s, a, true, false);
+-}
+-
+-static bool trans_VFNMA_sp(DisasContext *s, arg_VFNMA_sp *a)
+-{
+-    return do_vfm_sp(s, a, false, true);
+-}
+-
+-static bool trans_VFNMS_sp(DisasContext *s, arg_VFNMS_sp *a)
+-{
+-    return do_vfm_sp(s, a, true, true);
+-}
+-
+ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
+ {
+     /*
+@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
+     return true;
+ }
+-static bool trans_VFMA_dp(DisasContext *s, arg_VFMA_dp *a)
+-{
+-    return do_vfm_dp(s, a, false, false);
+-}
++#define MAKE_ONE_VFM_TRANS_FN(INSN, PREC, NEGN, NEGD)                   \
++    static bool trans_##INSN##_##PREC(DisasContext *s,                  \
++                                      arg_##INSN##_##PREC *a)           \
++    {                                                                   \
++        return do_vfm_##PREC(s, a, NEGN, NEGD);                         \
++    }
+-static bool trans_VFMS_dp(DisasContext *s, arg_VFMS_dp *a)
+-{
+-    return do_vfm_dp(s, a, true, false);
+-}
++#define MAKE_VFM_TRANS_FNS(PREC) \
++    MAKE_ONE_VFM_TRANS_FN(VFMA, PREC, false, false) \
++    MAKE_ONE_VFM_TRANS_FN(VFMS, PREC, true, false) \
++    MAKE_ONE_VFM_TRANS_FN(VFNMA, PREC, false, true) \
++    MAKE_ONE_VFM_TRANS_FN(VFNMS, PREC, true, true)
+-static bool trans_VFNMA_dp(DisasContext *s, arg_VFNMA_dp *a)
+-{
+-    return do_vfm_dp(s, a, false, true);
+-}
+-
+-static bool trans_VFNMS_dp(DisasContext *s, arg_VFNMS_dp *a)
+-{
+-    return do_vfm_dp(s, a, true, true);
+-}
++MAKE_VFM_TRANS_FNS(sp)
++MAKE_VFM_TRANS_FNS(dp)
+ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
+ {
+--
+.20.1

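MAKE_ONE_VFM_TRANS_FN uses the preprocessor's ## operator to paste the instruction and precision names into both the function name and the decodetree argument type. The sketch below expands the same two macros against stand-in types so that it compiles outside QEMU. DisasContext, the arg_* typedefs and the flag-recording body of do_vfm_sp() are hypothetical stubs; the real function emits TCG ops instead.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: the arg_* names are aliases of one struct,
 * mirroring how the generated trans_ functions can all forward their
 * argument pointer to a single do_vfm_sp() without a cast. */
typedef struct { int vd, vn, vm; } arg_VFMA_sp;
typedef arg_VFMA_sp arg_VFMS_sp, arg_VFNMA_sp, arg_VFNMS_sp;

/* Hypothetical context that just records the last flags it saw. */
typedef struct { bool last_neg_n, last_neg_d; } DisasContext;

static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
{
    (void)a;                 /* operands unused in this stub */
    s->last_neg_n = neg_n;   /* record flags instead of emitting TCG ops */
    s->last_neg_d = neg_d;
    return true;
}

/* Same shape as the macros in the patch above. */
#define MAKE_ONE_VFM_TRANS_FN(INSN, PREC, NEGN, NEGD)           \
    static bool trans_##INSN##_##PREC(DisasContext *s,          \
                                      arg_##INSN##_##PREC *a)   \
    {                                                           \
        return do_vfm_##PREC(s, a, NEGN, NEGD);                 \
    }

#define MAKE_VFM_TRANS_FNS(PREC) \
    MAKE_ONE_VFM_TRANS_FN(VFMA, PREC, false, false) \
    MAKE_ONE_VFM_TRANS_FN(VFMS, PREC, true, false) \
    MAKE_ONE_VFM_TRANS_FN(VFNMA, PREC, false, true) \
    MAKE_ONE_VFM_TRANS_FN(VFNMS, PREC, true, true)

/* Expands to trans_VFMA_sp, trans_VFMS_sp, trans_VFNMA_sp, trans_VFNMS_sp. */
MAKE_VFM_TRANS_FNS(sp)
```

Adding another precision then needs one more MAKE_VFM_TRANS_FNS line plus a matching do_vfm_<prec>() helper, which is what the following fp16 patch does with MAKE_VFM_TRANS_FNS(hp).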
-[Qemu-devel] [PULL 11/39] armv7m: Forward init-svtor property to CPU object
+[PULL 06/47] target/arm: Implement VFP fp16 for fused-multiply-add
-Create an "init-svtor" property on the armv7m container
+Implement VFP fp16 support for fused multiply-add insns
-object which we can forward to the CPU object.
+VFNMA, VFNMS, VFMA, VFMS.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-8-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-7-peter.maydell@linaro.org
 ---
- include/hw/arm/armv7m.h | 2 ++
+ target/arm/helper.h            |  1 +
- hw/arm/armv7m.c         | 9 +++++++++
+ target/arm/vfp.decode          |  5 +++
- 2 files changed, 11 insertions(+)
+ target/arm/vfp_helper.c        |  7 ++++
  target/arm/translate-vfp.c.inc | 64 ++++++++++++++++++++++++++++++++++
  4 files changed, 77 insertions(+)
-diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/armv7m.h
+--- a/target/arm/helper.h
-+++ b/include/hw/arm/armv7m.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ typedef struct {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(vfp_fcvt_f64_to_f16, TCG_CALL_NO_RWG, f16, f64, ptr, i32)
-  *   that CPU accesses see. (The NVIC, bitbanding and other CPU-internal
-  *   devices will be automatically layered on top of this view.)
+ DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
-  * + Property "idau": IDAU interface (forwarded to CPU object)
+ DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
-+ * + Property "init-svtor": secure VTOR reset value (forwarded to CPU object)
++DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, ptr)
-  */
- typedef struct ARMv7MState {
+ DEF_HELPER_3(recps_f32, f32, env, f32, f32)
-     /*< private >*/
+ DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
-@@ -XXX,XX +XXX,XX @@ typedef struct ARMv7MState {
+diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
      /* MemoryRegion the board provides to us (with its devices, RAM, etc) */
      MemoryRegion *board_memory;
      Object *idau;
 +    uint32_t init_svtor;
  } ARMv7MState;
  #endif
 diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/armv7m.c
+--- a/target/arm/vfp.decode
-+++ b/hw/arm/armv7m.c
++++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ VDIV_hp      ---- 1110 1.00 .... .... 1001 .0.0 ....        @vfp_dnm_s
-             return;
+ VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 ....        @vfp_dnm_s
-         }
+ VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 ....        @vfp_dnm_d
-     }
-+    if (object_property_find(OBJECT(s->cpu), "init-svtor", NULL)) {
++VFMA_hp      ---- 1110 1.10 .... .... 1001 .0. 0 ....       @vfp_dnm_s
-+        object_property_set_uint(OBJECT(s->cpu), s->init_svtor,
++VFMS_hp      ---- 1110 1.10 .... .... 1001 .1. 0 ....       @vfp_dnm_s
-+                                 "init-svtor", &err);
++VFNMA_hp     ---- 1110 1.01 .... .... 1001 .0. 0 ....       @vfp_dnm_s
-+        if (err != NULL) {
++VFNMS_hp     ---- 1110 1.01 .... .... 1001 .1. 0 ....       @vfp_dnm_s
-+            error_propagate(errp, err);
++
-+            return;
+ VFMA_sp      ---- 1110 1.10 .... .... 1010 .0. 0 ....       @vfp_dnm_s
-+        }
+ VFMS_sp      ---- 1110 1.10 .... .... 1010 .1. 0 ....       @vfp_dnm_s
  VFNMA_sp     ---- 1110 1.01 .... .... 1010 .0. 0 ....       @vfp_dnm_s
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_u32)(uint32_t a)
  }
  /* VFPv4 fused multiply-accumulate */
 +dh_ctype_f16 VFP_HELPER(muladd, h)(dh_ctype_f16 a, dh_ctype_f16 b,
 +                                   dh_ctype_f16 c, void *fpstp)
 +{
 +    float_status *fpst = fpstp;
 +    return float16_muladd(a, b, c, 0, fpst);
 +}
 +
  float32 VFP_HELPER(muladd, s)(float32 a, float32 b, float32 c, void *fpstp)
  {
      float_status *fpst = fpstp;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMAXNM_dp(DisasContext *s, arg_VMAXNM_dp *a)
                           a->vd, a->vn, a->vm, false);
  }
 +static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
 +{
 +    /*
 +     * VFNMA : fd = muladd(-fd,  fn, fm)
 +     * VFNMS : fd = muladd(-fd, -fn, fm)
 +     * VFMA  : fd = muladd( fd,  fn, fm)
 +     * VFMS  : fd = muladd( fd, -fn, fm)
 +     *
 +     * These are fused multiply-add, and must be done as one floating
 +     * point operation with no rounding between the multiplication and
 +     * addition steps.  NB that doing the negations here as separate
 +     * steps is correct : an input NaN should come out with its sign
 +     * bit flipped if it is a negated-input.
 +     */
 +    TCGv_ptr fpst;
 +    TCGv_i32 vn, vm, vd;
 +
 +    /*
 +     * Present in VFPv4 only, and only with the FP16 extension.
 +     * Note that we can't rely on the SIMDFMAC check alone, because
 +     * in a Neon-no-VFP core that ID register field will be non-zero.
 +     */
 +    if (!dc_isar_feature(aa32_fp16_arith, s) ||
 +        !dc_isar_feature(aa32_simdfmac, s) ||
 +        !dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
-     object_property_set_bool(OBJECT(s->cpu), true, "realized", &err);
++
-     if (err != NULL) {
++    if (s->vec_len != 0 || s->vec_stride != 0) {
-         error_propagate(errp, err);
++        return false;
-@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
++    }
-     DEFINE_PROP_LINK("memory", ARMv7MState, board_memory, TYPE_MEMORY_REGION,
++
-                      MemoryRegion *),
++    if (!vfp_access_check(s)) {
-     DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
++        return true;
-+    DEFINE_PROP_UINT32("init-svtor", ARMv7MState, init_svtor, 0),
++    }
-     DEFINE_PROP_END_OF_LIST(),
++
- };
++    vn = tcg_temp_new_i32();
 +    vm = tcg_temp_new_i32();
 +    vd = tcg_temp_new_i32();
 +
 +    neon_load_reg32(vn, a->vn);
 +    neon_load_reg32(vm, a->vm);
 +    if (neg_n) {
 +        /* VFNMS, VFMS */
 +        gen_helper_vfp_negh(vn, vn);
 +    }
 +    neon_load_reg32(vd, a->vd);
 +    if (neg_d) {
 +        /* VFNMA, VFNMS */
 +        gen_helper_vfp_negh(vd, vd);
 +    }
 +    fpst = fpstatus_ptr(FPST_FPCR_F16);
 +    gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
 +    neon_store_reg32(vd, a->vd);
 +
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(vn);
 +    tcg_temp_free_i32(vm);
 +    tcg_temp_free_i32(vd);
 +
 +    return true;
 +}
 +
  static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
  {
      /*
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
      MAKE_ONE_VFM_TRANS_FN(VFNMA, PREC, false, true) \
      MAKE_ONE_VFM_TRANS_FN(VFNMS, PREC, true, true)
 +MAKE_VFM_TRANS_FNS(hp)
  MAKE_VFM_TRANS_FNS(sp)
  MAKE_VFM_TRANS_FNS(dp)
 --
-.16.2
+.20.1

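do_vfm_hp() applies the optional negations and then calls the helper as muladdh(vd, vn, vm, vd), so once the operands have been adjusted it computes vn * vm + vd. With the neg_n/neg_d flags that MAKE_VFM_TRANS_FNS passes, this yields the four results modelled below. The function vfm_model() is hypothetical and uses plain double arithmetic on small integers. It captures only the sign handling, not the single rounding step of float16_muladd().

```c
#include <stdbool.h>

/* Hypothetical model of the operand negations in do_vfm_hp().
 * The result is fn * fm + fd after the optional negations; with the
 * small integer inputs used here every step is exact, so fused and
 * unfused evaluation give the same answer. */
static double vfm_model(double fd, double fn, double fm,
                        bool neg_n, bool neg_d)
{
    if (neg_n) {
        fn = -fn;       /* VFNMS, VFMS */
    }
    if (neg_d) {
        fd = -fd;       /* VFNMA, VFNMS */
    }
    return fn * fm + fd;
}
```

With fd = 1, fn = 2 and fm = 3, the four flag combinations give 7 (VFMA), -5 (VFMS), 5 (VFNMA) and -7 (VFNMS).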
-New patch
+[PULL 07/47] target/arm: Macroify uses of do_vfp_2op_sp() and do_vfp_2op_dp()
+Macroify the uses of do_vfp_2op_sp() and do_vfp_2op_dp(); this will
+make it easier to add the halfprec support.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-8-peter.maydell@linaro.org
+---
+ target/arm/translate-vfp.c.inc | 49 ++++++++++------------------------
+ 1 file changed, 14 insertions(+), 35 deletions(-)
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.c.inc
++++ b/target/arm/translate-vfp.c.inc
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+     return true;
+ }
+-static bool trans_VMOV_reg_sp(DisasContext *s, arg_VMOV_reg_sp *a)
+-{
+-    return do_vfp_2op_sp(s, tcg_gen_mov_i32, a->vd, a->vm);
+-}
++#define DO_VFP_2OP(INSN, PREC, FN)                              \
++    static bool trans_##INSN##_##PREC(DisasContext *s,          \
++                                      arg_##INSN##_##PREC *a)   \
++    {                                                           \
++        return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
++    }
+-static bool trans_VMOV_reg_dp(DisasContext *s, arg_VMOV_reg_dp *a)
+-{
+-    return do_vfp_2op_dp(s, tcg_gen_mov_i64, a->vd, a->vm);
+-}
++DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
++DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
+-static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
+-{
+-    return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
+-}
++DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
++DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
+-static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
+-{
+-    return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
+-}
+-
+-static bool trans_VNEG_sp(DisasContext *s, arg_VNEG_sp *a)
+-{
+-    return do_vfp_2op_sp(s, gen_helper_vfp_negs, a->vd, a->vm);
+-}
+-
+-static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
+-{
+-    return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
+-}
++DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
++DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
+ static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
+ {
+     gen_helper_vfp_sqrts(vd, vm, cpu_env);
+ }
+-static bool trans_VSQRT_sp(DisasContext *s, arg_VSQRT_sp *a)
+-{
+-    return do_vfp_2op_sp(s, gen_VSQRT_sp, a->vd, a->vm);
+-}
+-
+ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
+ {
+     gen_helper_vfp_sqrtd(vd, vm, cpu_env);
+ }
+-static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
+-{
+-    return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
+-}
++DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
++DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
+ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
+ {
+--
+.20.1

-[Qemu-devel] [PULL 20/39] hw/misc/iotkit-secctl: Add handling for PPCs
+[PULL 08/47] target/arm: Implement VFP fp16 for VABS, VNEG, VSQRT
-The IoTKit Security Controller includes various registers
+Implement VFP fp16 for VABS, VNEG and VSQRT. This is all
-that expose to software the controls for the Peripheral
+the fp16 insns that use the DO_VFP_2OP macro, because there
-Protection Controllers in the system. Implement these.
+is no fp16 version of VMOV_reg.
 Notes:
  * the gen_helper_vfp_negh already exists as we needed to create
    it for the fp16 multiply-add insns
  * as usual we need to use the f16 version of the fp_status;
    this is only relevant for VSQRT
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-17-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-9-peter.maydell@linaro.org
 ---
- include/hw/misc/iotkit-secctl.h |  64 +++++++++-
+ target/arm/helper.h            |  2 ++
- hw/misc/iotkit-secctl.c         | 270 +++++++++++++++++++++++++++++++++++++---
+ target/arm/vfp.decode          |  3 +++
- 2 files changed, 315 insertions(+), 19 deletions(-)
+ target/arm/vfp_helper.c        | 10 +++++++++
  target/arm/translate-vfp.c.inc | 40 ++++++++++++++++++++++++++++++++++
  4 files changed, 55 insertions(+)
-diff --git a/include/hw/misc/iotkit-secctl.h b/include/hw/misc/iotkit-secctl.h
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/iotkit-secctl.h
+--- a/target/arm/helper.h
-+++ b/include/hw/misc/iotkit-secctl.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
-  * QEMU interface:
+ DEF_HELPER_1(vfp_negh, f16, f16)
-  *  + sysbus MMIO region 0 is the "secure privilege control block" registers
+ DEF_HELPER_1(vfp_negs, f32, f32)
-  *  + sysbus MMIO region 1 is the "non-secure privilege control block" registers
+ DEF_HELPER_1(vfp_negd, f64, f64)
-+ *  + named GPIO output "sec_resp_cfg" indicating whether blocked accesses
++DEF_HELPER_1(vfp_absh, f16, f16)
-+ *    should RAZ/WI or bus error
+ DEF_HELPER_1(vfp_abss, f32, f32)
-+ * Controlling the 2 APB PPCs in the IoTKit:
+ DEF_HELPER_1(vfp_absd, f64, f64)
-+ *  + named GPIO outputs apb_ppc0_nonsec[0..2] and apb_ppc1_nonsec
++DEF_HELPER_2(vfp_sqrth, f16, f16, env)
-+ *  + named GPIO outputs apb_ppc0_ap[0..2] and apb_ppc1_ap
+ DEF_HELPER_2(vfp_sqrts, f32, f32, env)
-+ *  + named GPIO outputs apb_ppc{0,1}_irq_enable
+ DEF_HELPER_2(vfp_sqrtd, f64, f64, env)
-+ *  + named GPIO outputs apb_ppc{0,1}_irq_clear
+ DEF_HELPER_3(vfp_cmps, void, f32, f32, env)
-+ *  + named GPIO inputs apb_ppc{0,1}_irq_status
+diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 + * Controlling each of the 4 expansion APB PPCs which a system using the IoTKit
 + * might provide:
 + *  + named GPIO outputs apb_ppcexp{0,1,2,3}_nonsec[0..15]
 + *  + named GPIO outputs apb_ppcexp{0,1,2,3}_ap[0..15]
 + *  + named GPIO outputs apb_ppcexp{0,1,2,3}_irq_enable
 + *  + named GPIO outputs apb_ppcexp{0,1,2,3}_irq_clear
 + *  + named GPIO inputs apb_ppcexp{0,1,2,3}_irq_status
 + * Controlling each of the 4 expansion AHB PPCs which a system using the IoTKit
 + * might provide:
 + *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_nonsec[0..15]
 + *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_ap[0..15]
 + *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_enable
 + *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_clear
 + *  + named GPIO inputs ahb_ppcexp{0,1,2,3}_irq_status
   */
  #ifndef IOTKIT_SECCTL_H
@@ -XXX,XX +XXX,XX @@
  #define TYPE_IOTKIT_SECCTL "iotkit-secctl"
  #define IOTKIT_SECCTL(obj) OBJECT_CHECK(IoTKitSecCtl, (obj), TYPE_IOTKIT_SECCTL)
 -typedef struct IoTKitSecCtl {
 +#define IOTS_APB_PPC0_NUM_PORTS 3
 +#define IOTS_APB_PPC1_NUM_PORTS 1
 +#define IOTS_PPC_NUM_PORTS 16
 +#define IOTS_NUM_APB_PPC 2
 +#define IOTS_NUM_APB_EXP_PPC 4
 +#define IOTS_NUM_AHB_EXP_PPC 4
 +
 +typedef struct IoTKitSecCtl IoTKitSecCtl;
 +
 +/* State and IRQ lines relating to a PPC. For the
 + * PPCs in the IoTKit not all the IRQ lines are used.
 + */
 +typedef struct IoTKitSecCtlPPC {
 +    qemu_irq nonsec[IOTS_PPC_NUM_PORTS];
 +    qemu_irq ap[IOTS_PPC_NUM_PORTS];
 +    qemu_irq irq_enable;
 +    qemu_irq irq_clear;
 +
 +    uint32_t ns;
 +    uint32_t sp;
 +    uint32_t nsp;
 +
 +    /* Number of ports actually present */
 +    int numports;
 +    /* Offset of this PPC's interrupt bits in SECPPCINTSTAT */
 +    int irq_bit_offset;
 +    IoTKitSecCtl *parent;
 +} IoTKitSecCtlPPC;
 +
 +struct IoTKitSecCtl {
      /*< private >*/
      SysBusDevice parent_obj;
      /*< public >*/
 +    qemu_irq sec_resp_cfg;
      MemoryRegion s_regs;
      MemoryRegion ns_regs;
 -} IoTKitSecCtl;
 +
 +    uint32_t secppcintstat;
 +    uint32_t secppcinten;
 +    uint32_t secrespcfg;
 +
 +    IoTKitSecCtlPPC apb[IOTS_NUM_APB_PPC];
 +    IoTKitSecCtlPPC apbexp[IOTS_NUM_APB_EXP_PPC];
 +    IoTKitSecCtlPPC ahbexp[IOTS_NUM_AHB_EXP_PPC];
 +};
  #endif
 diff --git a/hw/misc/iotkit-secctl.c b/hw/misc/iotkit-secctl.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/iotkit-secctl.c
+--- a/target/arm/vfp.decode
-+++ b/hw/misc/iotkit-secctl.c
++++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ static const uint8_t iotkit_secctl_ns_idregs[] = {
+@@ -XXX,XX +XXX,XX @@ VMOV_imm_dp  ---- 1110 1.11 .... .... 1011 0000 .... \
-     0x0d, 0xf0, 0x05, 0xb1,
+ VMOV_reg_sp  ---- 1110 1.11 0000 .... 1010 01.0 ....        @vfp_dm_ss
- };
+ VMOV_reg_dp  ---- 1110 1.11 0000 .... 1011 01.0 ....        @vfp_dm_dd
-+/* The register sets for the various PPCs (AHB internal, APB internal,
++VABS_hp      ---- 1110 1.11 0000 .... 1001 11.0 ....        @vfp_dm_ss
-+ * AHB expansion, APB expansion) are all set up so that they are
+ VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 ....        @vfp_dm_ss
-+ * in 16-aligned blocks so offsets 0xN0, 0xN4, 0xN8, 0xNC are PPCs
+ VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 ....        @vfp_dm_dd
-+ * 0, 1, 2, 3 of that type, so we can convert a register address offset
-+ * into an index into a PPC array easily.
++VNEG_hp      ---- 1110 1.11 0001 .... 1001 01.0 ....        @vfp_dm_ss
-+ */
+ VNEG_sp      ---- 1110 1.11 0001 .... 1010 01.0 ....        @vfp_dm_ss
-+static inline int offset_to_ppc_idx(uint32_t offset)
+ VNEG_dp      ---- 1110 1.11 0001 .... 1011 01.0 ....        @vfp_dm_dd
 +VSQRT_hp     ---- 1110 1.11 0001 .... 1001 11.0 ....        @vfp_dm_ss
  VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 ....        @vfp_dm_ss
  VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 ....        @vfp_dm_dd
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float64 VFP_HELPER(neg, d)(float64 a)
      return float64_chs(a);
  }
 +dh_ctype_f16 VFP_HELPER(abs, h)(dh_ctype_f16 a)
 +{
-+    return extract32(offset, 2, 2);
++    return float16_abs(a);
 +}
 +
-+typedef void PerPPCFunction(IoTKitSecCtlPPC *ppc);
+ float32 VFP_HELPER(abs, s)(float32 a)
-+
+ {
-+static void foreach_ppc(IoTKitSecCtl *s, PerPPCFunction *fn)
+     return float32_abs(a);
@@ -XXX,XX +XXX,XX @@ float64 VFP_HELPER(abs, d)(float64 a)
      return float64_abs(a);
  }
 +dh_ctype_f16 VFP_HELPER(sqrt, h)(dh_ctype_f16 a, CPUARMState *env)
 +{
-+    int i;
++    return float16_sqrt(a, &env->vfp.fp_status_f16);
 +
 +    for (i = 0; i < IOTS_NUM_APB_PPC; i++) {
 +        fn(&s->apb[i]);
 +    }
 +    for (i = 0; i < IOTS_NUM_APB_EXP_PPC; i++) {
 +        fn(&s->apbexp[i]);
 +    }
 +    for (i = 0; i < IOTS_NUM_AHB_EXP_PPC; i++) {
 +        fn(&s->ahbexp[i]);
 +    }
 +}
 +
- static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
+ float32 VFP_HELPER(sqrt, s)(float32 a, CPUARMState *env)
                                          uint64_t *pdata,
                                          unsigned size, MemTxAttrs attrs)
  {
-     uint64_t r;
+     return float32_sqrt(a, &env->vfp.fp_status);
-     uint32_t offset = addr & ~0x3;
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
-+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.c.inc
-     switch (offset) {
++++ b/target/arm/translate-vfp.c.inc
-     case A_AHBNSPPC0:
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
-@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
+     return true;
          r = 0;
          break;
      case A_SECRESPCFG:
 -    case A_NSCCFG:
 -    case A_SECMPCINTSTATUS:
 +        r = s->secrespcfg;
 +        break;
      case A_SECPPCINTSTAT:
 +        r = s->secppcintstat;
 +        break;
      case A_SECPPCINTEN:
 -    case A_SECMSCINTSTAT:
 -    case A_SECMSCINTEN:
 -    case A_BRGINTSTAT:
 -    case A_BRGINTEN:
 +        r = s->secppcinten;
 +        break;
      case A_AHBNSPPCEXP0:
      case A_AHBNSPPCEXP1:
      case A_AHBNSPPCEXP2:
      case A_AHBNSPPCEXP3:
 +        r = s->ahbexp[offset_to_ppc_idx(offset)].ns;
 +        break;
      case A_APBNSPPC0:
      case A_APBNSPPC1:
 +        r = s->apb[offset_to_ppc_idx(offset)].ns;
 +        break;
      case A_APBNSPPCEXP0:
      case A_APBNSPPCEXP1:
      case A_APBNSPPCEXP2:
      case A_APBNSPPCEXP3:
 +        r = s->apbexp[offset_to_ppc_idx(offset)].ns;
 +        break;
      case A_AHBSPPPCEXP0:
      case A_AHBSPPPCEXP1:
      case A_AHBSPPPCEXP2:
      case A_AHBSPPPCEXP3:
 +        r = s->ahbexp[offset_to_ppc_idx(offset)].sp;
 +        break;
      case A_APBSPPPC0:
      case A_APBSPPPC1:
 +        r = s->apb[offset_to_ppc_idx(offset)].sp;
 +        break;
      case A_APBSPPPCEXP0:
      case A_APBSPPPCEXP1:
      case A_APBSPPPCEXP2:
      case A_APBSPPPCEXP3:
 +        r = s->apbexp[offset_to_ppc_idx(offset)].sp;
 +        break;
 +    case A_NSCCFG:
 +    case A_SECMPCINTSTATUS:
 +    case A_SECMSCINTSTAT:
 +    case A_SECMSCINTEN:
 +    case A_BRGINTSTAT:
 +    case A_BRGINTEN:
      case A_NSMSCEXP:
          qemu_log_mask(LOG_UNIMP,
                        "IoTKit SecCtl S block read: "
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
      return MEMTX_OK;
  }
-+static void iotkit_secctl_update_ppc_ap(IoTKitSecCtlPPC *ppc)
++static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 +{
-+    int i;
++    /*
 +     * Do a half-precision operation. Functionally this is
 +     * the same as do_vfp_2op_sp(), except:
 +     *  - it doesn't need the VFP vector handling (fp16 is a
 +     *    v8 feature, and in v8 VFP vectors don't exist)
 +     *  - it does the aa32_fp16_arith feature test
 +     */
 +    TCGv_i32 f0;
 +
-+    for (i = 0; i < ppc->numports; i++) {
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
-+        bool v;
++        return false;
 +    }
 +
-+        if (extract32(ppc->ns, i, 1)) {
++    if (s->vec_len != 0 || s->vec_stride != 0) {
-+            v = extract32(ppc->nsp, i, 1);
++        return false;
 +        } else {
 +            v = extract32(ppc->sp, i, 1);
 +        }
 +        qemu_set_irq(ppc->ap[i], v);
 +    }
++
++    if (!vfp_access_check(s)) {
++        return true;
++    }
++
++    f0 = tcg_temp_new_i32();
++    neon_load_reg32(f0, vm);
++    fn(f0, f0);
++    neon_store_reg32(f0, vd);
++    tcg_temp_free_i32(f0);
++
++    return true;
 +}
 +
-+static void iotkit_secctl_ppc_ns_write(IoTKitSecCtlPPC *ppc, uint32_t value)
+ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
  {
      uint32_t delta_m = 0;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
  DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
  DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
 +DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh)
  DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
  DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
 +DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh)
  DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
  DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
 +static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
 +{
-+    int i;
++    gen_helper_vfp_sqrth(vd, vm, cpu_env);
 +
 +    ppc->ns = value & MAKE_64BIT_MASK(0, ppc->numports);
 +    for (i = 0; i < ppc->numports; i++) {
 +        qemu_set_irq(ppc->nonsec[i], extract32(ppc->ns, i, 1));
 +    }
 +    iotkit_secctl_update_ppc_ap(ppc);
 +}
 +
-+static void iotkit_secctl_ppc_sp_write(IoTKitSecCtlPPC *ppc, uint32_t value)
+ static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
 +{
 +    ppc->sp = value & MAKE_64BIT_MASK(0, ppc->numports);
 +    iotkit_secctl_update_ppc_ap(ppc);
 +}
 +
 +static void iotkit_secctl_ppc_nsp_write(IoTKitSecCtlPPC *ppc, uint32_t value)
 +{
 +    ppc->nsp = value & MAKE_64BIT_MASK(0, ppc->numports);
 +    iotkit_secctl_update_ppc_ap(ppc);
 +}
 +
 +static void iotkit_secctl_ppc_update_irq_clear(IoTKitSecCtlPPC *ppc)
 +{
 +    uint32_t value = ppc->parent->secppcintstat;
 +
 +    qemu_set_irq(ppc->irq_clear, extract32(value, ppc->irq_bit_offset, 1));
 +}
 +
 +static void iotkit_secctl_ppc_update_irq_enable(IoTKitSecCtlPPC *ppc)
 +{
 +    uint32_t value = ppc->parent->secppcinten;
 +
 +    qemu_set_irq(ppc->irq_enable, extract32(value, ppc->irq_bit_offset, 1));
 +}
 +
  static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
                                           uint64_t value,
                                           unsigned size, MemTxAttrs attrs)
  {
-+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
+     gen_helper_vfp_sqrts(vd, vm, cpu_env);
-     uint32_t offset = addr;
+@@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
-+    IoTKitSecCtlPPC *ppc;
+     gen_helper_vfp_sqrtd(vd, vm, cpu_env);
      trace_iotkit_secctl_s_write(offset, value, size);
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
      switch (offset) {
      case A_SECRESPCFG:
 -    case A_NSCCFG:
 +        value &= 1;
 +        s->secrespcfg = value;
 +        qemu_set_irq(s->sec_resp_cfg, s->secrespcfg);
 +        break;
      case A_SECPPCINTCLR:
 +        s->secppcintstat &= ~(value & 0x00f000f3);
 +        foreach_ppc(s, iotkit_secctl_ppc_update_irq_clear);
 +        break;
      case A_SECPPCINTEN:
 -    case A_SECMSCINTCLR:
 -    case A_SECMSCINTEN:
 -    case A_BRGINTCLR:
 -    case A_BRGINTEN:
 +        s->secppcinten = value & 0x00f000f3;
 +        foreach_ppc(s, iotkit_secctl_ppc_update_irq_enable);
 +        break;
      case A_AHBNSPPCEXP0:
      case A_AHBNSPPCEXP1:
      case A_AHBNSPPCEXP2:
      case A_AHBNSPPCEXP3:
 +        ppc = &s->ahbexp[offset_to_ppc_idx(offset)];
 +        iotkit_secctl_ppc_ns_write(ppc, value);
 +        break;
      case A_APBNSPPC0:
      case A_APBNSPPC1:
 +        ppc = &s->apb[offset_to_ppc_idx(offset)];
 +        iotkit_secctl_ppc_ns_write(ppc, value);
 +        break;
      case A_APBNSPPCEXP0:
      case A_APBNSPPCEXP1:
      case A_APBNSPPCEXP2:
      case A_APBNSPPCEXP3:
 +        ppc = &s->apbexp[offset_to_ppc_idx(offset)];
 +        iotkit_secctl_ppc_ns_write(ppc, value);
 +        break;
      case A_AHBSPPPCEXP0:
      case A_AHBSPPPCEXP1:
      case A_AHBSPPPCEXP2:
      case A_AHBSPPPCEXP3:
 +        ppc = &s->ahbexp[offset_to_ppc_idx(offset)];
 +        iotkit_secctl_ppc_sp_write(ppc, value);
 +        break;
      case A_APBSPPPC0:
      case A_APBSPPPC1:
 +        ppc = &s->apb[offset_to_ppc_idx(offset)];
 +        iotkit_secctl_ppc_sp_write(ppc, value);
 +        break;
      case A_APBSPPPCEXP0:
      case A_APBSPPPCEXP1:
      case A_APBSPPPCEXP2:
      case A_APBSPPPCEXP3:
 +        ppc = &s->apbexp[offset_to_ppc_idx(offset)];
 +        iotkit_secctl_ppc_sp_write(ppc, value);
 +        break;
 +    case A_NSCCFG:
 +    case A_SECMSCINTCLR:
 +    case A_SECMSCINTEN:
 +    case A_BRGINTCLR:
 +    case A_BRGINTEN:
          qemu_log_mask(LOG_UNIMP,
                        "IoTKit SecCtl S block write: "
                        "unimplemented offset 0x%x\n", offset);
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_ns_read(void *opaque, hwaddr addr,
                                           uint64_t *pdata,
                                           unsigned size, MemTxAttrs attrs)
  {
 +    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
      uint64_t r;
      uint32_t offset = addr & ~0x3;
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_ns_read(void *opaque, hwaddr addr,
      case A_AHBNSPPPCEXP1:
      case A_AHBNSPPPCEXP2:
      case A_AHBNSPPPCEXP3:
 +        r = s->ahbexp[offset_to_ppc_idx(offset)].nsp;
 +        break;
      case A_APBNSPPPC0:
      case A_APBNSPPPC1:
 +        r = s->apb[offset_to_ppc_idx(offset)].nsp;
 +        break;
      case A_APBNSPPPCEXP0:
      case A_APBNSPPPCEXP1:
      case A_APBNSPPPCEXP2:
      case A_APBNSPPPCEXP3:
 -        qemu_log_mask(LOG_UNIMP,
 -                      "IoTKit SecCtl NS block read: "
 -                      "unimplemented offset 0x%x\n", offset);
 +        r = s->apbexp[offset_to_ppc_idx(offset)].nsp;
          break;
      case A_PID4:
      case A_PID5:
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_ns_write(void *opaque, hwaddr addr,
                                            uint64_t value,
                                            unsigned size, MemTxAttrs attrs)
  {
 +    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
      uint32_t offset = addr;
 +    IoTKitSecCtlPPC *ppc;
      trace_iotkit_secctl_ns_write(offset, value, size);
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_ns_write(void *opaque, hwaddr addr,
      case A_AHBNSPPPCEXP1:
      case A_AHBNSPPPCEXP2:
      case A_AHBNSPPPCEXP3:
 +        ppc = &s->ahbexp[offset_to_ppc_idx(offset)];
 +        iotkit_secctl_ppc_nsp_write(ppc, value);
 +        break;
      case A_APBNSPPPC0:
      case A_APBNSPPPC1:
 +        ppc = &s->apb[offset_to_ppc_idx(offset)];
 +        iotkit_secctl_ppc_nsp_write(ppc, value);
 +        break;
      case A_APBNSPPPCEXP0:
      case A_APBNSPPPCEXP1:
      case A_APBNSPPPCEXP2:
      case A_APBNSPPPCEXP3:
 -        qemu_log_mask(LOG_UNIMP,
 -                      "IoTKit SecCtl NS block write: "
 -                      "unimplemented offset 0x%x\n", offset);
 +        ppc = &s->apbexp[offset_to_ppc_idx(offset)];
 +        iotkit_secctl_ppc_nsp_write(ppc, value);
          break;
      case A_AHBNSPPPC0:
      case A_PID4:
@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps iotkit_secctl_ns_ops = {
      .impl.max_access_size = 4,
  };
 +static void iotkit_secctl_reset_ppc(IoTKitSecCtlPPC *ppc)
 +{
 +    ppc->ns = 0;
 +    ppc->sp = 0;
 +    ppc->nsp = 0;
 +}
 +
  static void iotkit_secctl_reset(DeviceState *dev)
  {
 +    IoTKitSecCtl *s = IOTKIT_SECCTL(dev);
 +    s->secppcintstat = 0;
 +    s->secppcinten = 0;
 +    s->secrespcfg = 0;
 +
 +    foreach_ppc(s, iotkit_secctl_reset_ppc);
 +}
 +
 +static void iotkit_secctl_ppc_irqstatus(void *opaque, int n, int level)
 +{
 +    IoTKitSecCtlPPC *ppc = opaque;
 +    IoTKitSecCtl *s = IOTKIT_SECCTL(ppc->parent);
 +    int irqbit = ppc->irq_bit_offset + n;
 +
 +    s->secppcintstat = deposit32(s->secppcintstat, irqbit, 1, level);
 +}
 +
 +static void iotkit_secctl_init_ppc(IoTKitSecCtl *s,
 +                                   IoTKitSecCtlPPC *ppc,
 +                                   const char *name,
 +                                   int numports,
 +                                   int irq_bit_offset)
 +{
 +    char *gpioname;
 +    DeviceState *dev = DEVICE(s);
 +
 +    ppc->numports = numports;
 +    ppc->irq_bit_offset = irq_bit_offset;
 +    ppc->parent = s;
 +
 +    gpioname = g_strdup_printf("%s_nonsec", name);
 +    qdev_init_gpio_out_named(dev, ppc->nonsec, gpioname, numports);
 +    g_free(gpioname);
 +    gpioname = g_strdup_printf("%s_ap", name);
 +    qdev_init_gpio_out_named(dev, ppc->ap, gpioname, numports);
 +    g_free(gpioname);
 +    gpioname = g_strdup_printf("%s_irq_enable", name);
 +    qdev_init_gpio_out_named(dev, &ppc->irq_enable, gpioname, 1);
 +    g_free(gpioname);
 +    gpioname = g_strdup_printf("%s_irq_clear", name);
 +    qdev_init_gpio_out_named(dev, &ppc->irq_clear, gpioname, 1);
 +    g_free(gpioname);
 +    gpioname = g_strdup_printf("%s_irq_status", name);
 +    qdev_init_gpio_in_named_with_opaque(dev, iotkit_secctl_ppc_irqstatus,
 +                                        ppc, gpioname, 1);
 +    g_free(gpioname);
  }
- static void iotkit_secctl_init(Object *obj)
++DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
- {
+ DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
-     IoTKitSecCtl *s = IOTKIT_SECCTL(obj);
+ DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
-     SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
 +    DeviceState *dev = DEVICE(obj);
 +    int i;
 +
 +    iotkit_secctl_init_ppc(s, &s->apb[0], "apb_ppc0",
 +                           IOTS_APB_PPC0_NUM_PORTS, 0);
 +    iotkit_secctl_init_ppc(s, &s->apb[1], "apb_ppc1",
 +                           IOTS_APB_PPC1_NUM_PORTS, 1);
 +
 +    for (i = 0; i < IOTS_NUM_APB_EXP_PPC; i++) {
 +        IoTKitSecCtlPPC *ppc = &s->apbexp[i];
 +        char *ppcname = g_strdup_printf("apb_ppcexp%d", i);
 +        iotkit_secctl_init_ppc(s, ppc, ppcname, IOTS_PPC_NUM_PORTS, 4 + i);
 +        g_free(ppcname);
 +    }
 +    for (i = 0; i < IOTS_NUM_AHB_EXP_PPC; i++) {
 +        IoTKitSecCtlPPC *ppc = &s->ahbexp[i];
 +        char *ppcname = g_strdup_printf("ahb_ppcexp%d", i);
 +        iotkit_secctl_init_ppc(s, ppc, ppcname, IOTS_PPC_NUM_PORTS, 20 + i);
 +        g_free(ppcname);
 +    }
 +
 +    qdev_init_gpio_out_named(dev, &s->sec_resp_cfg, "sec_resp_cfg", 1);
      memory_region_init_io(&s->s_regs, obj, &iotkit_secctl_s_ops,
                            s, "iotkit-secctl-s-regs", 0x1000);
@@ -XXX,XX +XXX,XX @@ static void iotkit_secctl_init(Object *obj)
      sysbus_init_mmio(sbd, &s->ns_regs);
  }
 +static const VMStateDescription iotkit_secctl_ppc_vmstate = {
 +    .name = "iotkit-secctl-ppc",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32(ns, IoTKitSecCtlPPC),
 +        VMSTATE_UINT32(sp, IoTKitSecCtlPPC),
 +        VMSTATE_UINT32(nsp, IoTKitSecCtlPPC),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
  static const VMStateDescription iotkit_secctl_vmstate = {
      .name = "iotkit-secctl",
      .version_id = 1,
      .minimum_version_id = 1,
      .fields = (VMStateField[]) {
 +        VMSTATE_UINT32(secppcintstat, IoTKitSecCtl),
 +        VMSTATE_UINT32(secppcinten, IoTKitSecCtl),
 +        VMSTATE_UINT32(secrespcfg, IoTKitSecCtl),
 +        VMSTATE_STRUCT_ARRAY(apb, IoTKitSecCtl, IOTS_NUM_APB_PPC, 1,
 +                             iotkit_secctl_ppc_vmstate, IoTKitSecCtlPPC),
 +        VMSTATE_STRUCT_ARRAY(apbexp, IoTKitSecCtl, IOTS_NUM_APB_EXP_PPC, 1,
 +                             iotkit_secctl_ppc_vmstate, IoTKitSecCtlPPC),
 +        VMSTATE_STRUCT_ARRAY(ahbexp, IoTKitSecCtl, IOTS_NUM_AHB_EXP_PPC, 1,
 +                             iotkit_secctl_ppc_vmstate, IoTKitSecCtlPPC),
          VMSTATE_END_OF_LIST()
      }
  };
 --
-.16.2
+.20.1

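For illustration only: the irq_bit_offset arguments passed to iotkit_secctl_init_ppc() above (0 and 1 for the two APB PPCs, 4 + i for the APB expansion PPCs, 20 + i for the AHB expansion PPCs) are what make the 0x00f000f3 mask applied to SECPPCINTCLR and SECPPCINTEN line up with the bits that iotkit_secctl_ppc_irqstatus() deposits into SECPPCINTSTAT. The standalone sketch below re-derives that mask. It assumes four expansion PPCs of each kind (matching the EXP0..EXP3 register cases), and deposit32_sketch() is a hypothetical local re-implementation of the semantics of QEMU's deposit32(), not the QEMU header itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local re-implementation of the semantics of QEMU's
 * deposit32(): return VALUE with LENGTH bits starting at bit START
 * replaced by the low bits of FIELDVAL. */
static uint32_t deposit32_sketch(uint32_t value, int start, int length,
                                 uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Assumed PPC counts; the bit offsets below mirror the irq_bit_offset
 * arguments passed to iotkit_secctl_init_ppc(). */
#define SKETCH_NUM_APB_EXP_PPC 4
#define SKETCH_NUM_AHB_EXP_PPC 4

/* Collect every SECPPCINTSTAT bit that some PPC can drive by
 * depositing a 1 at each PPC's interrupt bit. */
static uint32_t secppc_valid_mask(void)
{
    uint32_t mask = 0;
    int i;

    mask = deposit32_sketch(mask, 0, 1, 1);   /* apb_ppc0 */
    mask = deposit32_sketch(mask, 1, 1, 1);   /* apb_ppc1 */
    for (i = 0; i < SKETCH_NUM_APB_EXP_PPC; i++) {
        mask = deposit32_sketch(mask, 4 + i, 1, 1);   /* apb_ppcexpN */
    }
    for (i = 0; i < SKETCH_NUM_AHB_EXP_PPC; i++) {
        mask = deposit32_sketch(mask, 20 + i, 1, 1);  /* ahb_ppcexpN */
    }
    return mask;
}

/* Mirror of the status update done in iotkit_secctl_ppc_irqstatus():
 * port N of a PPC sets or clears bit (irq_bit_offset + n). */
static uint32_t secppcintstat_update(uint32_t stat, int irq_bit_offset,
                                     int n, int level)
{
    return deposit32_sketch(stat, irq_bit_offset + n, 1, (uint32_t)level);
}
```

With these offsets the derived mask is exactly 0x00f000f3, so masking guest writes with that constant discards only bits that no PPC can ever raise.
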
-New patch
+[PULL 09/47] target/arm: Implement VFP fp16 for VMOV immediate
+Implement VFP fp16 support for the VMOV immediate insn.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-10-peter.maydell@linaro.org
+---
+ target/arm/vfp.decode          |  2 ++
+ target/arm/translate-vfp.c.inc | 22 ++++++++++++++++++++++
+files changed, 24 insertions(+)
+diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp.decode
++++ b/target/arm/vfp.decode
+@@ -XXX,XX +XXX,XX @@ VFMS_dp      ---- 1110 1.10 .... .... 1011 .1.0 ....        @vfp_dnm_d
+ VFNMA_dp     ---- 1110 1.01 .... .... 1011 .0.0 ....        @vfp_dnm_d
+ VFNMS_dp     ---- 1110 1.01 .... .... 1011 .1.0 ....        @vfp_dnm_d
++VMOV_imm_hp  ---- 1110 1.11 .... .... 1001 0000 .... \
++             vd=%vd_sp imm=%vmov_imm
+ VMOV_imm_sp  ---- 1110 1.11 .... .... 1010 0000 .... \
+              vd=%vd_sp imm=%vmov_imm
+ VMOV_imm_dp  ---- 1110 1.11 .... .... 1011 0000 .... \
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.c.inc
++++ b/target/arm/translate-vfp.c.inc
+@@ -XXX,XX +XXX,XX @@ MAKE_VFM_TRANS_FNS(hp)
+ MAKE_VFM_TRANS_FNS(sp)
+ MAKE_VFM_TRANS_FNS(dp)
++static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
++{
++    TCGv_i32 fd;
++
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
++        return false;
++    }
++
++    if (s->vec_len != 0 || s->vec_stride != 0) {
++        return false;
++    }
++
++    if (!vfp_access_check(s)) {
++        return true;
++    }
++
++    fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
++    neon_store_reg32(fd, a->vd);
++    tcg_temp_free_i32(fd);
++    return true;
++}
++
+ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
+ {
+     uint32_t delta_d = 0;
+--
+.20.1

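A sketch of what vfp_expand_imm(MO_16, a->imm) has to produce for the new VMOV_imm_hp pattern, written from the Arm ARM VFPExpandImm() pseudocode for a half-precision result (5 exponent bits, 10 fraction bits). The function name is hypothetical and this is not QEMU's implementation; it only illustrates the bit layout of the expanded value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: expand the 8-bit VFP modified immediate
 * imm8 = a:b:cdefgh to an IEEE half-precision bit pattern, per the
 * Arm ARM VFPExpandImm() pseudocode with E = 5 and F = 10:
 *   sign     = a
 *   exponent = NOT(b) : b : b : c : d
 *   fraction = e:f:g:h : 000000
 */
static uint16_t vfp_expand_imm_f16_sketch(uint32_t imm8)
{
    uint32_t sign, exp_top, low;

    assert(imm8 <= 0xff);                        /* decoder field is 8 bits */
    sign = (imm8 & 0x80) ? 0x8000 : 0;           /* bit 15: a */
    exp_top = (imm8 & 0x40) ? 0x3000 : 0x4000;   /* bits 14..12: NOT(b):b:b */
    low = (imm8 & 0x3f) << 6;                    /* bits 11..6: c:d:e:f:g:h */
    return (uint16_t)(sign | exp_top | low);
}
```

For example, imm8 0x70 expands to 0x3c00 (1.0) and 0x00 to 0x4000 (2.0). The low six fraction bits are always zero, so only 256 distinct fp16 values are reachable through this encoding.
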
-[Qemu-devel] [PULL 17/39] hw/misc/mps2-fpgaio: FPGA control block for MPS2 AN505
+[PULL 10/47] target/arm: Implement VFP fp16 VCMP
-The MPS2 AN505 FPGA image includes a "FPGA control block"
+Implement fp16 version of VCMP.
 which is a small set of registers handling LEDs, buttons
 and some counters.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-14-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-11-peter.maydell@linaro.org
 ---
- hw/misc/Makefile.objs           |   1 +
+ target/arm/helper.h            |  2 ++
- include/hw/misc/mps2-fpgaio.h   |  43 ++++++++++
+ target/arm/vfp.decode          |  2 ++
- hw/misc/mps2-fpgaio.c           | 176 ++++++++++++++++++++++++++++++++++++++++
+ target/arm/vfp_helper.c        | 15 +++++++------
- default-configs/arm-softmmu.mak |   1 +
+ target/arm/translate-vfp.c.inc | 39 ++++++++++++++++++++++++++++++++++
- hw/misc/trace-events            |   6 ++
+files changed, 51 insertions(+), 7 deletions(-)
 files changed, 227 insertions(+)
  create mode 100644 include/hw/misc/mps2-fpgaio.h
  create mode 100644 hw/misc/mps2-fpgaio.c
-diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/Makefile.objs
+--- a/target/arm/helper.h
-+++ b/hw/misc/Makefile.objs
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_1(vfp_absd, f64, f64)
- obj-$(CONFIG_MIPS_CPS) += mips_cmgcr.o
+ DEF_HELPER_2(vfp_sqrth, f16, f16, env)
- obj-$(CONFIG_MIPS_CPS) += mips_cpc.o
+ DEF_HELPER_2(vfp_sqrts, f32, f32, env)
- obj-$(CONFIG_MIPS_ITU) += mips_itu.o
+ DEF_HELPER_2(vfp_sqrtd, f64, f64, env)
-+obj-$(CONFIG_MPS2_FPGAIO) += mps2-fpgaio.o
++DEF_HELPER_3(vfp_cmph, void, f16, f16, env)
- obj-$(CONFIG_MPS2_SCC) += mps2-scc.o
+ DEF_HELPER_3(vfp_cmps, void, f32, f32, env)
+ DEF_HELPER_3(vfp_cmpd, void, f64, f64, env)
- obj-$(CONFIG_PVPANIC) += pvpanic.o
++DEF_HELPER_3(vfp_cmpeh, void, f16, f16, env)
-diff --git a/include/hw/misc/mps2-fpgaio.h b/include/hw/misc/mps2-fpgaio.h
+ DEF_HELPER_3(vfp_cmpes, void, f32, f32, env)
-new file mode 100644
+ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
-index XXXXXXX..XXXXXXX
---- /dev/null
+diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
-+++ b/include/hw/misc/mps2-fpgaio.h
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@
+--- a/target/arm/vfp.decode
-+/*
++++ b/target/arm/vfp.decode
-+ * ARM MPS2 FPGAIO emulation
+@@ -XXX,XX +XXX,XX @@ VSQRT_hp     ---- 1110 1.11 0001 .... 1001 11.0 ....        @vfp_dm_ss
-+ *
+ VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 ....        @vfp_dm_ss
-+ * Copyright (c) 2018 Linaro Limited
+ VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 ....        @vfp_dm_dd
-+ * Written by Peter Maydell
-+ *
++VCMP_hp      ---- 1110 1.11 010 z:1 .... 1001 e:1 1.0 .... \
-+ *  This program is free software; you can redistribute it and/or modify
++             vd=%vd_sp vm=%vm_sp
-+ *  it under the terms of the GNU General Public License version 2 or
+ VCMP_sp      ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
-+ *  (at your option) any later version.
+              vd=%vd_sp vm=%vm_sp
-+ */
+ VCMP_dp      ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
  }
  /* XXX: check quiet/signaling case */
 -#define DO_VFP_cmp(p, type) \
 -void VFP_HELPER(cmp, p)(type a, type b, CPUARMState *env)  \
 +#define DO_VFP_cmp(P, FLOATTYPE, ARGTYPE, FPST) \
 +void VFP_HELPER(cmp, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env)  \
  { \
      softfloat_to_vfp_compare(env, \
 -        type ## _compare_quiet(a, b, &env->vfp.fp_status)); \
 +        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.FPST)); \
  } \
 -void VFP_HELPER(cmpe, p)(type a, type b, CPUARMState *env) \
 +void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
  { \
      softfloat_to_vfp_compare(env, \
 -        type ## _compare(a, b, &env->vfp.fp_status)); \
 +        FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
  }
 -DO_VFP_cmp(s, float32)
 -DO_VFP_cmp(d, float64)
 +DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16)
 +DO_VFP_cmp(s, float32, float32, fp_status)
 +DO_VFP_cmp(d, float64, float64, fp_status)
  #undef DO_VFP_cmp
  /* Integer to float and float to integer conversions */
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
  DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
  DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
 +static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
 +{
 +    TCGv_i32 vd, vm;
 +
-+/* This is a model of the FPGAIO register block in the AN505
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
-+ * FPGA image for the MPS2 dev board; it is documented in the
++        return false;
 + * application note:
 + * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
 + *
 + * QEMU interface:
 + *  + sysbus MMIO region 0: the register bank
 + */
 +
 +#ifndef MPS2_FPGAIO_H
 +#define MPS2_FPGAIO_H
 +
 +#include "hw/sysbus.h"
 +
 +#define TYPE_MPS2_FPGAIO "mps2-fpgaio"
 +#define MPS2_FPGAIO(obj) OBJECT_CHECK(MPS2FPGAIO, (obj), TYPE_MPS2_FPGAIO)
 +
 +typedef struct {
 +    /*< private >*/
 +    SysBusDevice parent_obj;
 +
 +    /*< public >*/
 +    MemoryRegion iomem;
 +
 +    uint32_t led0;
 +    uint32_t prescale;
 +    uint32_t misc;
 +
 +    uint32_t prescale_clk;
 +} MPS2FPGAIO;
 +
 +#endif
 diff --git a/hw/misc/mps2-fpgaio.c b/hw/misc/mps2-fpgaio.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/misc/mps2-fpgaio.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM MPS2 AN505 FPGAIO emulation
 + *
 + * Copyright (c) 2018 Linaro Limited
 + * Written by Peter Maydell
 + *
 + *  This program is free software; you can redistribute it and/or modify
 + *  it under the terms of the GNU General Public License version 2 or
 + *  (at your option) any later version.
 + */
 +
 +/* This is a model of the "FPGA system control and I/O" block found
 + * in the AN505 FPGA image for the MPS2 devboard.
 + * It is documented in AN505:
 + * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "qapi/error.h"
 +#include "trace.h"
 +#include "hw/sysbus.h"
 +#include "hw/registerfields.h"
 +#include "hw/misc/mps2-fpgaio.h"
 +
 +REG32(LED0, 0)
 +REG32(BUTTON, 8)
 +REG32(CLK1HZ, 0x10)
 +REG32(CLK100HZ, 0x14)
 +REG32(COUNTER, 0x18)
 +REG32(PRESCALE, 0x1c)
 +REG32(PSCNTR, 0x20)
 +REG32(MISC, 0x4c)
 +
 +static uint64_t mps2_fpgaio_read(void *opaque, hwaddr offset, unsigned size)
 +{
 +    MPS2FPGAIO *s = MPS2_FPGAIO(opaque);
 +    uint64_t r;
 +
 +    switch (offset) {
 +    case A_LED0:
 +        r = s->led0;
 +        break;
 +    case A_BUTTON:
 +        /* User-pressable board buttons. We don't model that, so just return
 +         * zeroes.
 +         */
 +        r = 0;
 +        break;
 +    case A_PRESCALE:
 +        r = s->prescale;
 +        break;
 +    case A_MISC:
 +        r = s->misc;
 +        break;
 +    case A_CLK1HZ:
 +    case A_CLK100HZ:
 +    case A_COUNTER:
 +    case A_PSCNTR:
 +        /* These are all upcounters of various frequencies. */
 +        qemu_log_mask(LOG_UNIMP, "MPS2 FPGAIO: counters unimplemented\n");
 +        r = 0;
 +        break;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "MPS2 FPGAIO read: bad offset %x\n", (int) offset);
 +        r = 0;
 +        break;
 +    }
 +
-+    trace_mps2_fpgaio_read(offset, r, size);
++    /* Vm/M bits must be zero for the Z variant */
-+    return r;
++    if (a->z && a->vm != 0) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    vd = tcg_temp_new_i32();
 +    vm = tcg_temp_new_i32();
 +
 +    neon_load_reg32(vd, a->vd);
 +    if (a->z) {
 +        tcg_gen_movi_i32(vm, 0);
 +    } else {
 +        neon_load_reg32(vm, a->vm);
 +    }
 +
 +    if (a->e) {
 +        gen_helper_vfp_cmpeh(vd, vm, cpu_env);
 +    } else {
 +        gen_helper_vfp_cmph(vd, vm, cpu_env);
 +    }
 +
 +    tcg_temp_free_i32(vd);
 +    tcg_temp_free_i32(vm);
 +
 +    return true;
 +}
 +
-+static void mps2_fpgaio_write(void *opaque, hwaddr offset, uint64_t value,
+ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
-+                              unsigned size)
+ {
-+{
+     TCGv_i32 vd, vm;
 +    MPS2FPGAIO *s = MPS2_FPGAIO(opaque);
 +
 +    trace_mps2_fpgaio_write(offset, value, size);
 +
 +    switch (offset) {
 +    case A_LED0:
 +        /* LED bits [1:0] control board LEDs. We don't currently have
 +         * a mechanism for displaying this graphically, so use a trace event.
 +         */
 +        trace_mps2_fpgaio_leds(value & 0x02 ? '*' : '.',
 +                               value & 0x01 ? '*' : '.');
 +        s->led0 = value & 0x3;
 +        break;
 +    case A_PRESCALE:
 +        s->prescale = value;
 +        break;
 +    case A_MISC:
 +        /* These are control bits for some of the other devices on the
 +         * board (SPI, CLCD, etc). We don't implement that yet, so just
 +         * make the bits read as written.
 +         */
 +        qemu_log_mask(LOG_UNIMP,
 +                      "MPS2 FPGAIO: MISC control bits unimplemented\n");
 +        s->misc = value;
 +        break;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "MPS2 FPGAIO write: bad offset 0x%x\n", (int) offset);
 +        break;
 +    }
 +}
 +
 +static const MemoryRegionOps mps2_fpgaio_ops = {
 +    .read = mps2_fpgaio_read,
 +    .write = mps2_fpgaio_write,
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 +};
 +
 +static void mps2_fpgaio_reset(DeviceState *dev)
 +{
 +    MPS2FPGAIO *s = MPS2_FPGAIO(dev);
 +
 +    trace_mps2_fpgaio_reset();
 +    s->led0 = 0;
 +    s->prescale = 0;
 +    s->misc = 0;
 +}
 +
 +static void mps2_fpgaio_init(Object *obj)
 +{
 +    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
 +    MPS2FPGAIO *s = MPS2_FPGAIO(obj);
 +
 +    memory_region_init_io(&s->iomem, obj, &mps2_fpgaio_ops, s,
 +                          "mps2-fpgaio", 0x1000);
 +    sysbus_init_mmio(sbd, &s->iomem);
 +}
 +
 +static const VMStateDescription mps2_fpgaio_vmstate = {
 +    .name = "mps2-fpgaio",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32(led0, MPS2FPGAIO),
 +        VMSTATE_UINT32(prescale, MPS2FPGAIO),
 +        VMSTATE_UINT32(misc, MPS2FPGAIO),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
 +static Property mps2_fpgaio_properties[] = {
 +    /* Frequency of the prescale counter */
 +    DEFINE_PROP_UINT32("prescale-clk", MPS2FPGAIO, prescale_clk, 20000000),
 +    DEFINE_PROP_END_OF_LIST(),
 +};
 +
 +static void mps2_fpgaio_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
 +    dc->vmsd = &mps2_fpgaio_vmstate;
 +    dc->reset = mps2_fpgaio_reset;
 +    dc->props = mps2_fpgaio_properties;
 +}
 +
 +static const TypeInfo mps2_fpgaio_info = {
 +    .name = TYPE_MPS2_FPGAIO,
 +    .parent = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(MPS2FPGAIO),
 +    .instance_init = mps2_fpgaio_init,
 +    .class_init = mps2_fpgaio_class_init,
 +};
 +
 +static void mps2_fpgaio_register_types(void)
 +{
 +    type_register_static(&mps2_fpgaio_info);
 +}
 +
 +type_init(mps2_fpgaio_register_types);
 diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
 index XXXXXXX..XXXXXXX 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_STM32F205_SOC=y
  CONFIG_CMSDK_APB_TIMER=y
  CONFIG_CMSDK_APB_UART=y
 +CONFIG_MPS2_FPGAIO=y
  CONFIG_MPS2_SCC=y
  CONFIG_VERSATILE_PCI=y
 diff --git a/hw/misc/trace-events b/hw/misc/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/trace-events
 +++ b/hw/misc/trace-events
@@ -XXX,XX +XXX,XX @@ mps2_scc_leds(char led7, char led6, char led5, char led4, char led3, char led2,
  mps2_scc_cfg_write(unsigned function, unsigned device, uint32_t value) "MPS2 SCC config write: function %d device %d data 0x%" PRIx32
  mps2_scc_cfg_read(unsigned function, unsigned device, uint32_t value) "MPS2 SCC config read: function %d device %d data 0x%" PRIx32
 +# hw/misc/mps2-fpgaio.c
 +mps2_fpgaio_read(uint64_t offset, uint64_t data, unsigned size) "MPS2 FPGAIO read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 +mps2_fpgaio_write(uint64_t offset, uint64_t data, unsigned size) "MPS2 FPGAIO write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 +mps2_fpgaio_reset(void) "MPS2 FPGAIO: reset"
 +mps2_fpgaio_leds(char led1, char led0) "MPS2 FPGAIO LEDs: %c%c"
 +
  # hw/misc/msf2-sysreg.c
  msf2_sysreg_write(uint64_t offset, uint32_t val, uint32_t prev) "msf2-sysreg write: addr 0x%08" HWADDR_PRIx " data 0x%" PRIx32 " prev 0x%" PRIx32
  msf2_sysreg_read(uint64_t offset, uint32_t val) "msf2-sysreg read: addr 0x%08" HWADDR_PRIx " data 0x%08" PRIx32
 --
-.16.2
+.20.1

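VCMP_hp ends up in softfloat_to_vfp_compare(), which turns the comparison result into the FPSCR NZCV flags. The sketch below models that path for raw fp16 bit patterns: a quiet compare (a NaN operand gives "unordered") followed by the Arm flag mapping (less 0b1000, equal 0b0110, greater 0b0010, unordered 0b0011). The Relation enum and all function names are hypothetical stand-ins for softfloat's FloatRelation and float16_compare_quiet(). The sketch does not model FP exception flags, so it does not distinguish VCMP from VCMPE (which also signals Invalid Operation for quiet NaNs).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's FloatRelation. */
typedef enum {
    REL_LESS = -1,
    REL_EQUAL = 0,
    REL_GREATER = 1,
    REL_UNORDERED = 2,
} Relation;

static bool f16_is_nan(uint16_t h)
{
    return (h & 0x7c00) == 0x7c00 && (h & 0x03ff) != 0;
}

/* Quiet comparison of two IEEE half-precision bit patterns. For
 * non-NaN values the sign/magnitude ordering of the encodings matches
 * numeric ordering, and +0 and -0 both map to key 0, so they compare
 * equal. */
static Relation f16_compare_sketch(uint16_t a, uint16_t b)
{
    int32_t ka, kb;

    if (f16_is_nan(a) || f16_is_nan(b)) {
        return REL_UNORDERED;
    }
    ka = (a & 0x8000) ? -(int32_t)(a & 0x7fff) : (int32_t)(a & 0x7fff);
    kb = (b & 0x8000) ? -(int32_t)(b & 0x7fff) : (int32_t)(b & 0x7fff);
    if (ka < kb) {
        return REL_LESS;
    }
    return ka == kb ? REL_EQUAL : REL_GREATER;
}

/* NZCV nibble written by an Arm VFP compare. */
static uint32_t relation_to_nzcv(Relation r)
{
    switch (r) {
    case REL_LESS:
        return 0x8;   /* N */
    case REL_EQUAL:
        return 0x6;   /* Z, C */
    case REL_GREATER:
        return 0x2;   /* C */
    default:
        assert(r == REL_UNORDERED);
        return 0x3;   /* C, V */
    }
}

/* Place the flags in FPSCR[31:28], leaving the other bits alone. */
static uint32_t fpscr_set_nzcv(uint32_t fpscr, uint32_t nzcv)
{
    return (fpscr & 0x0fffffffu) | (nzcv << 28);
}
```

The Z form of the instruction, for which the decoder requires the Vm field to be 0, corresponds to passing 0x0000 (+0.0) as the second operand, as trans_VCMP_hp() does with tcg_gen_movi_i32(vm, 0).
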
-[Qemu-devel] [PULL 37/39] target/arm: Decode aa32 armv8.3 2-reg-index
+[PULL 11/47] target/arm: Implement VFP fp16 VLDR and VSTR
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the fp16 versions of the VFP VLDR/VSTR (immediate).
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180228193125.20577-15-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-12-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
+ target/arm/vfp.decode          |  3 +--
-file changed, 61 insertions(+)
+ target/arm/translate-vfp.c.inc | 35 ++++++++++++++++++++++++++++++++++
 files changed, 36 insertions(+), 2 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/vfp.decode
-+++ b/target/arm/translate.c
++++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000    vn=%vn_sp
-     return 0;
+ VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 ....   vm=%vm_sp
  VMOV_64_dp   ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 ....   vm=%vm_dp
 -# Note that the half-precision variants of VLDR and VSTR are
 -# not part of this decodetree at all because they have bits [9:8] == 0b01
 +VLDR_VSTR_hp ---- 1101 u:1 .0 l:1 rn:4 .... 1001 imm:8      vd=%vd_sp
  VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8      vd=%vd_sp
  VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8      vd=%vd_dp
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      return true;
  }
-+/* Advanced SIMD two registers and a scalar extension.
++static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
-+ *  31             24   23  22   20   16   12  11   10   9    8        3     0
++{
-+ * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
++    uint32_t offset;
-+ * | 1 1 1 1 1 1 1 0 | o1 | D | o2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
++    TCGv_i32 addr, tmp;
 + * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
 + *
 + */
 +
-+static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
-+{
++        return false;
 +    int rd, rn, rm, rot, size, opr_sz;
 +    TCGv_ptr fpst;
 +    bool q;
 +
 +    q = extract32(insn, 6, 1);
 +    VFP_DREG_D(rd, insn);
 +    VFP_DREG_N(rn, insn);
 +    VFP_DREG_M(rm, insn);
 +    if ((rd | rn) & q) {
 +        return 1;
 +    }
 +
-+    if ((insn & 0xff000f10) == 0xfe000800) {
++    if (!vfp_access_check(s)) {
-+        /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... */
++        return true;
 +        rot = extract32(insn, 20, 2);
 +        size = extract32(insn, 23, 1);
 +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)
 +            || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
 +            return 1;
 +        }
 +    } else {
 +        return 1;
 +    }
 +
-+    if (s->fp_excp_el) {
++    /* imm8 field is offset/2 for fp16, unlike fp32 and fp64 */
-+        gen_exception_insn(s, 4, EXCP_UDEF,
++    offset = a->imm << 1;
-+                           syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
++    if (!a->u) {
-+        return 0;
++        offset = -offset;
 +    }
 +    if (!s->vfp_enabled) {
 +        return 1;
 +    }
 +
-+    opr_sz = (1 + q) * 8;
++    /* For thumb, use of PC is UNPREDICTABLE.  */
-+    fpst = get_fpstatus_ptr(1);
++    addr = add_reg_for_lit(s, a->rn, offset);
-+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
++    tmp = tcg_temp_new_i32();
-+                       vfp_reg_offset(1, rn),
++    if (a->l) {
-+                       vfp_reg_offset(1, rm), fpst,
++        gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
-+                       opr_sz, opr_sz, rot,
++        neon_store_reg32(tmp, a->vd);
-+                       size ? gen_helper_gvec_fcmlas_idx
++    } else {
-+                       : gen_helper_gvec_fcmlah_idx);
++        neon_load_reg32(tmp, a->vd);
-+    tcg_temp_free_ptr(fpst);
++        gen_aa32_st16(s, tmp, addr, get_mem_index(s));
-+    return 0;
++    }
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(addr);
 +
 +    return true;
 +}
 +
- static int disas_coproc_insn(DisasContext *s, uint32_t insn)
+ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
  {
-     int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
+     uint32_t offset;
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                  goto illegal_op;
              }
              return;
 +        } else if ((insn & 0x0f000a00) == 0x0e000800
 +                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
 +            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
 +                goto illegal_op;
 +            }
 +            return;
          } else if ((insn & 0x0fe00000) == 0x0c400000) {
              /* Coprocessor double register transfer.  */
              ARCH(5TE);
 --
-.16.2
+.20.1

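The only fp16-specific twist in trans_VLDR_VSTR_hp() is the immediate scaling: imm8 counts halfwords, so the byte offset is imm8 << 1 rather than the imm8 << 2 used by the single and double precision forms, and a clear U bit negates it. A minimal sketch of that arithmetic for a non-PC base register follows; the PC-relative case additionally goes through add_reg_for_lit(), which is not modelled here, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte offset encoded by a VLDR/VSTR (immediate) imm8 field.
 * esize_log2 is 1 for fp16 (offset = imm8 * 2) and 2 for fp32/fp64
 * (offset = imm8 * 4); U set means add, U clear means subtract. */
static int32_t vldr_vstr_offset(uint32_t imm8, bool u, unsigned esize_log2)
{
    int32_t offset;

    assert(imm8 <= 0xff && (esize_log2 == 1 || esize_log2 == 2));
    offset = (int32_t)(imm8 << esize_log2);
    return u ? offset : -offset;
}

/* Effective address for a non-PC base register; the addition wraps
 * modulo 2^32 like a 32-bit guest address. */
static uint32_t vldr_vstr_address(uint32_t base, uint32_t imm8, bool u,
                                  unsigned esize_log2)
{
    return base + (uint32_t)vldr_vstr_offset(imm8, u, esize_log2);
}
```

So the fp16 form reaches at most 510 bytes either side of the base register, half the 1020-byte range of the 32-bit and 64-bit forms.
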
-[Qemu-devel] [PULL 19/39] hw/misc/iotkit-secctl: Arm IoT Kit security controller initial skeleton
+[PULL 12/47] target/arm: Implement VFP fp16 VCVT between float and integer
-The Arm IoT Kit includes a "security controller" which is largely a
+Implement the fp16 versions of the VFP VCVT instruction forms which
-collection of registers for controlling the PPCs and other bits of
+convert between floating point and integer.
 glue in the system.  This commit provides the initial skeleton of the
 device, implementing just the ID registers, and a couple of read-only
 read-as-zero registers.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-16-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-13-peter.maydell@linaro.org
 ---
- hw/misc/Makefile.objs           |   1 +
+ target/arm/vfp.decode          |  4 +++
- include/hw/misc/iotkit-secctl.h |  39 ++++
+ target/arm/translate-vfp.c.inc | 65 ++++++++++++++++++++++++++++++++++
- hw/misc/iotkit-secctl.c         | 448 ++++++++++++++++++++++++++++++++++++++++
+files changed, 69 insertions(+)
  default-configs/arm-softmmu.mak |   1 +
  hw/misc/trace-events            |   7 +
 files changed, 496 insertions(+)
  create mode 100644 include/hw/misc/iotkit-secctl.h
  create mode 100644 hw/misc/iotkit-secctl.c
-diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
+diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/Makefile.objs
+--- a/target/arm/vfp.decode
-+++ b/hw/misc/Makefile.objs
++++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_MPS2_FPGAIO) += mps2-fpgaio.o
+@@ -XXX,XX +XXX,XX @@ VCVT_sp      ---- 1110 1.11 0111 .... 1010 11.0 ....        @vfp_dm_ds
- obj-$(CONFIG_MPS2_SCC) += mps2-scc.o
+ VCVT_dp      ---- 1110 1.11 0111 .... 1011 11.0 ....        @vfp_dm_sd
- obj-$(CONFIG_TZ_PPC) += tz-ppc.o
+ # VCVT from integer to floating point: Vm always single; Vd depends on size
-+obj-$(CONFIG_IOTKIT_SECCTL) += iotkit-secctl.o
++VCVT_int_hp  ---- 1110 1.11 1000 .... 1001 s:1 1.0 .... \
++             vd=%vd_sp vm=%vm_sp
- obj-$(CONFIG_PVPANIC) += pvpanic.o
+ VCVT_int_sp  ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
- obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
+              vd=%vd_sp vm=%vm_sp
-diff --git a/include/hw/misc/iotkit-secctl.h b/include/hw/misc/iotkit-secctl.h
+ VCVT_int_dp  ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
-new file mode 100644
+@@ -XXX,XX +XXX,XX @@ VCVT_fix_dp  ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
-index XXXXXXX..XXXXXXX
+              vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
---- /dev/null
-+++ b/include/hw/misc/iotkit-secctl.h
+ # VCVT float to integer (VCVT and VCVTR): Vd always single; Vd depends on size
-@@ -XXX,XX +XXX,XX @@
++VCVT_hp_int  ---- 1110 1.11 110 s:1 .... 1001 rz:1 1.0 .... \
-+/*
++             vd=%vd_sp vm=%vm_sp
-+ * ARM IoT Kit security controller
+ VCVT_sp_int  ---- 1110 1.11 110 s:1 .... 1010 rz:1 1.0 .... \
-+ *
+              vd=%vd_sp vm=%vm_sp
-+ * Copyright (c) 2018 Linaro Limited
+ VCVT_dp_int  ---- 1110 1.11 110 s:1 .... 1011 rz:1 1.0 .... \
-+ * Written by Peter Maydell
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
-+ *
+index XXXXXXX..XXXXXXX 100644
-+ * This program is free software; you can redistribute it and/or modify
+--- a/target/arm/translate-vfp.c.inc
-+ * it under the terms of the GNU General Public License version 2 or
++++ b/target/arm/translate-vfp.c.inc
-+ * (at your option) any later version.
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
-+ */
+     return true;
  }
 +static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
 +{
 +    TCGv_i32 vm;
 +    TCGv_ptr fpst;
 +
-+/* This is a model of the security controller which is part of the
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
-+ * Arm IoT Kit and documented in
++        return false;
 + * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
 + *
 + * QEMU interface:
 + *  + sysbus MMIO region 0 is the "secure privilege control block" registers
 + *  + sysbus MMIO region 1 is the "non-secure privilege control block" registers
 + */
 +
 +#ifndef IOTKIT_SECCTL_H
 +#define IOTKIT_SECCTL_H
 +
 +#include "hw/sysbus.h"
 +
 +#define TYPE_IOTKIT_SECCTL "iotkit-secctl"
 +#define IOTKIT_SECCTL(obj) OBJECT_CHECK(IoTKitSecCtl, (obj), TYPE_IOTKIT_SECCTL)
 +
 +typedef struct IoTKitSecCtl {
 +    /*< private >*/
 +    SysBusDevice parent_obj;
 +
 +    /*< public >*/
 +
 +    MemoryRegion s_regs;
 +    MemoryRegion ns_regs;
 +} IoTKitSecCtl;
 +
 +#endif
 diff --git a/hw/misc/iotkit-secctl.c b/hw/misc/iotkit-secctl.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/misc/iotkit-secctl.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Arm IoT Kit security controller
 + *
 + * Copyright (c) 2018 Linaro Limited
 + * Written by Peter Maydell
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 or
 + * (at your option) any later version.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "qapi/error.h"
 +#include "trace.h"
 +#include "hw/sysbus.h"
 +#include "hw/registerfields.h"
 +#include "hw/misc/iotkit-secctl.h"
 +
 +/* Registers in the secure privilege control block */
 +REG32(SECRESPCFG, 0x10)
 +REG32(NSCCFG, 0x14)
 +REG32(SECMPCINTSTATUS, 0x1c)
 +REG32(SECPPCINTSTAT, 0x20)
 +REG32(SECPPCINTCLR, 0x24)
 +REG32(SECPPCINTEN, 0x28)
 +REG32(SECMSCINTSTAT, 0x30)
 +REG32(SECMSCINTCLR, 0x34)
 +REG32(SECMSCINTEN, 0x38)
 +REG32(BRGINTSTAT, 0x40)
 +REG32(BRGINTCLR, 0x44)
 +REG32(BRGINTEN, 0x48)
 +REG32(AHBNSPPC0, 0x50)
 +REG32(AHBNSPPCEXP0, 0x60)
 +REG32(AHBNSPPCEXP1, 0x64)
 +REG32(AHBNSPPCEXP2, 0x68)
 +REG32(AHBNSPPCEXP3, 0x6c)
 +REG32(APBNSPPC0, 0x70)
 +REG32(APBNSPPC1, 0x74)
 +REG32(APBNSPPCEXP0, 0x80)
 +REG32(APBNSPPCEXP1, 0x84)
 +REG32(APBNSPPCEXP2, 0x88)
 +REG32(APBNSPPCEXP3, 0x8c)
 +REG32(AHBSPPPC0, 0x90)
 +REG32(AHBSPPPCEXP0, 0xa0)
 +REG32(AHBSPPPCEXP1, 0xa4)
 +REG32(AHBSPPPCEXP2, 0xa8)
 +REG32(AHBSPPPCEXP3, 0xac)
 +REG32(APBSPPPC0, 0xb0)
 +REG32(APBSPPPC1, 0xb4)
 +REG32(APBSPPPCEXP0, 0xc0)
 +REG32(APBSPPPCEXP1, 0xc4)
 +REG32(APBSPPPCEXP2, 0xc8)
 +REG32(APBSPPPCEXP3, 0xcc)
 +REG32(NSMSCEXP, 0xd0)
 +REG32(PID4, 0xfd0)
 +REG32(PID5, 0xfd4)
 +REG32(PID6, 0xfd8)
 +REG32(PID7, 0xfdc)
 +REG32(PID0, 0xfe0)
 +REG32(PID1, 0xfe4)
 +REG32(PID2, 0xfe8)
 +REG32(PID3, 0xfec)
 +REG32(CID0, 0xff0)
 +REG32(CID1, 0xff4)
 +REG32(CID2, 0xff8)
 +REG32(CID3, 0xffc)
 +
 +/* Registers in the non-secure privilege control block */
 +REG32(AHBNSPPPC0, 0x90)
 +REG32(AHBNSPPPCEXP0, 0xa0)
 +REG32(AHBNSPPPCEXP1, 0xa4)
 +REG32(AHBNSPPPCEXP2, 0xa8)
 +REG32(AHBNSPPPCEXP3, 0xac)
 +REG32(APBNSPPPC0, 0xb0)
 +REG32(APBNSPPPC1, 0xb4)
 +REG32(APBNSPPPCEXP0, 0xc0)
 +REG32(APBNSPPPCEXP1, 0xc4)
 +REG32(APBNSPPPCEXP2, 0xc8)
 +REG32(APBNSPPPCEXP3, 0xcc)
 +/* PID and CID registers are also present in the NS block */
 +
 +static const uint8_t iotkit_secctl_s_idregs[] = {
 +    0x04, 0x00, 0x00, 0x00,
 +    0x52, 0xb8, 0x0b, 0x00,
 +    0x0d, 0xf0, 0x05, 0xb1,
 +};
 +
 +static const uint8_t iotkit_secctl_ns_idregs[] = {
 +    0x04, 0x00, 0x00, 0x00,
 +    0x53, 0xb8, 0x0b, 0x00,
 +    0x0d, 0xf0, 0x05, 0xb1,
 +};
 +
 +static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
 +                                        uint64_t *pdata,
 +                                        unsigned size, MemTxAttrs attrs)
 +{
 +    uint64_t r;
 +    uint32_t offset = addr & ~0x3;
 +
 +    switch (offset) {
 +    case A_AHBNSPPC0:
 +    case A_AHBSPPPC0:
 +        r = 0;
 +        break;
 +    case A_SECRESPCFG:
 +    case A_NSCCFG:
 +    case A_SECMPCINTSTATUS:
 +    case A_SECPPCINTSTAT:
 +    case A_SECPPCINTEN:
 +    case A_SECMSCINTSTAT:
 +    case A_SECMSCINTEN:
 +    case A_BRGINTSTAT:
 +    case A_BRGINTEN:
 +    case A_AHBNSPPCEXP0:
 +    case A_AHBNSPPCEXP1:
 +    case A_AHBNSPPCEXP2:
 +    case A_AHBNSPPCEXP3:
 +    case A_APBNSPPC0:
 +    case A_APBNSPPC1:
 +    case A_APBNSPPCEXP0:
 +    case A_APBNSPPCEXP1:
 +    case A_APBNSPPCEXP2:
 +    case A_APBNSPPCEXP3:
 +    case A_AHBSPPPCEXP0:
 +    case A_AHBSPPPCEXP1:
 +    case A_AHBSPPPCEXP2:
 +    case A_AHBSPPPCEXP3:
 +    case A_APBSPPPC0:
 +    case A_APBSPPPC1:
 +    case A_APBSPPPCEXP0:
 +    case A_APBSPPPCEXP1:
 +    case A_APBSPPPCEXP2:
 +    case A_APBSPPPCEXP3:
 +    case A_NSMSCEXP:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "IoTKit SecCtl S block read: "
 +                      "unimplemented offset 0x%x\n", offset);
 +        r = 0;
 +        break;
 +    case A_PID4:
 +    case A_PID5:
 +    case A_PID6:
 +    case A_PID7:
 +    case A_PID0:
 +    case A_PID1:
 +    case A_PID2:
 +    case A_PID3:
 +    case A_CID0:
 +    case A_CID1:
 +    case A_CID2:
 +    case A_CID3:
 +        r = iotkit_secctl_s_idregs[(offset - A_PID4) / 4];
 +        break;
 +    case A_SECPPCINTCLR:
 +    case A_SECMSCINTCLR:
 +    case A_BRGINTCLR:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "IoTKit SecCtl S block read: write-only offset 0x%x\n",
 +                      offset);
 +        r = 0;
 +        break;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "IoTKit SecCtl S block read: bad offset 0x%x\n", offset);
 +        r = 0;
 +        break;
 +    }
 +
-+    if (size != 4) {
++    if (!vfp_access_check(s)) {
-+        /* None of our registers are access-sensitive, so just pull the right
++        return true;
 +         * byte out of the word read result.
 +         */
 +        r = extract32(r, (addr & 3) * 8, size * 8);
 +    }
 +
-+    trace_iotkit_secctl_s_read(offset, r, size);
++    vm = tcg_temp_new_i32();
-+    *pdata = r;
++    neon_load_reg32(vm, a->vm);
-+    return MEMTX_OK;
++    fpst = fpstatus_ptr(FPST_FPCR_F16);
 +    if (a->s) {
 +        /* i32 -> f16 */
 +        gen_helper_vfp_sitoh(vm, vm, fpst);
 +    } else {
 +        /* u32 -> f16 */
 +        gen_helper_vfp_uitoh(vm, vm, fpst);
 +    }
 +    neon_store_reg32(vm, a->vd);
 +    tcg_temp_free_i32(vm);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
-+static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
+ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
-+                                         uint64_t value,
+ {
-+                                         unsigned size, MemTxAttrs attrs)
+     TCGv_i32 vm;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      return true;
  }
 +static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
 +{
-+    uint32_t offset = addr;
++    TCGv_i32 vm;
 +    TCGv_ptr fpst;
 +
-+    trace_iotkit_secctl_s_write(offset, value, size);
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
-+
++        return false;
 +    if (size != 4) {
 +        /* Byte and halfword writes are ignored */
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "IoTKit SecCtl S block write: bad size, ignored\n");
 +        return MEMTX_OK;
 +    }
 +
-+    switch (offset) {
++    if (!vfp_access_check(s)) {
-+    case A_SECRESPCFG:
++        return true;
 +    case A_NSCCFG:
 +    case A_SECPPCINTCLR:
 +    case A_SECPPCINTEN:
 +    case A_SECMSCINTCLR:
 +    case A_SECMSCINTEN:
 +    case A_BRGINTCLR:
 +    case A_BRGINTEN:
 +    case A_AHBNSPPCEXP0:
 +    case A_AHBNSPPCEXP1:
 +    case A_AHBNSPPCEXP2:
 +    case A_AHBNSPPCEXP3:
 +    case A_APBNSPPC0:
 +    case A_APBNSPPC1:
 +    case A_APBNSPPCEXP0:
 +    case A_APBNSPPCEXP1:
 +    case A_APBNSPPCEXP2:
 +    case A_APBNSPPCEXP3:
 +    case A_AHBSPPPCEXP0:
 +    case A_AHBSPPPCEXP1:
 +    case A_AHBSPPPCEXP2:
 +    case A_AHBSPPPCEXP3:
 +    case A_APBSPPPC0:
 +    case A_APBSPPPC1:
 +    case A_APBSPPPCEXP0:
 +    case A_APBSPPPCEXP1:
 +    case A_APBSPPPCEXP2:
 +    case A_APBSPPPCEXP3:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "IoTKit SecCtl S block write: "
 +                      "unimplemented offset 0x%x\n", offset);
 +        break;
 +    case A_SECMPCINTSTATUS:
 +    case A_SECPPCINTSTAT:
 +    case A_SECMSCINTSTAT:
 +    case A_BRGINTSTAT:
 +    case A_AHBNSPPC0:
 +    case A_AHBSPPPC0:
 +    case A_NSMSCEXP:
 +    case A_PID4:
 +    case A_PID5:
 +    case A_PID6:
 +    case A_PID7:
 +    case A_PID0:
 +    case A_PID1:
 +    case A_PID2:
 +    case A_PID3:
 +    case A_CID0:
 +    case A_CID1:
 +    case A_CID2:
 +    case A_CID3:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "IoTKit SecCtl S block write: "
 +                      "read-only offset 0x%x\n", offset);
 +        break;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "IoTKit SecCtl S block write: bad offset 0x%x\n",
 +                      offset);
 +        break;
 +    }
 +
-+    return MEMTX_OK;
++    fpst = fpstatus_ptr(FPST_FPCR_F16);
 +    vm = tcg_temp_new_i32();
 +    neon_load_reg32(vm, a->vm);
 +
 +    if (a->s) {
 +        if (a->rz) {
 +            gen_helper_vfp_tosizh(vm, vm, fpst);
 +        } else {
 +            gen_helper_vfp_tosih(vm, vm, fpst);
 +        }
 +    } else {
 +        if (a->rz) {
 +            gen_helper_vfp_touizh(vm, vm, fpst);
 +        } else {
 +            gen_helper_vfp_touih(vm, vm, fpst);
 +        }
 +    }
 +    neon_store_reg32(vm, a->vd);
 +    tcg_temp_free_i32(vm);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
-+static MemTxResult iotkit_secctl_ns_read(void *opaque, hwaddr addr,
+ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
-+                                         uint64_t *pdata,
+ {
-+                                         unsigned size, MemTxAttrs attrs)
+     TCGv_i32 vm;
 +{
 +    uint64_t r;
 +    uint32_t offset = addr & ~0x3;
 +
 +    switch (offset) {
 +    case A_AHBNSPPPC0:
 +        r = 0;
 +        break;
 +    case A_AHBNSPPPCEXP0:
 +    case A_AHBNSPPPCEXP1:
 +    case A_AHBNSPPPCEXP2:
 +    case A_AHBNSPPPCEXP3:
 +    case A_APBNSPPPC0:
 +    case A_APBNSPPPC1:
 +    case A_APBNSPPPCEXP0:
 +    case A_APBNSPPPCEXP1:
 +    case A_APBNSPPPCEXP2:
 +    case A_APBNSPPPCEXP3:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "IoTKit SecCtl NS block read: "
 +                      "unimplemented offset 0x%x\n", offset);
 +        r = 0;
 +        break;
 +    case A_PID4:
 +    case A_PID5:
 +    case A_PID6:
 +    case A_PID7:
 +    case A_PID0:
 +    case A_PID1:
 +    case A_PID2:
 +    case A_PID3:
 +    case A_CID0:
 +    case A_CID1:
 +    case A_CID2:
 +    case A_CID3:
 +        r = iotkit_secctl_ns_idregs[(offset - A_PID4) / 4];
 +        break;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "IoTKit SecCtl NS block read: bad offset 0x%x\n",
 +                      offset);
 +        r = 0;
 +        break;
 +    }
 +
 +    if (size != 4) {
 +        /* None of our registers are access-sensitive, so just pull the right
 +         * byte out of the word read result.
 +         */
 +        r = extract32(r, (addr & 3) * 8, size * 8);
 +    }
 +
 +    trace_iotkit_secctl_ns_read(offset, r, size);
 +    *pdata = r;
 +    return MEMTX_OK;
 +}
 +
 +static MemTxResult iotkit_secctl_ns_write(void *opaque, hwaddr addr,
 +                                          uint64_t value,
 +                                          unsigned size, MemTxAttrs attrs)
 +{
 +    uint32_t offset = addr;
 +
 +    trace_iotkit_secctl_ns_write(offset, value, size);
 +
 +    if (size != 4) {
 +        /* Byte and halfword writes are ignored */
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "IoTKit SecCtl NS block write: bad size, ignored\n");
 +        return MEMTX_OK;
 +    }
 +
 +    switch (offset) {
 +    case A_AHBNSPPPCEXP0:
 +    case A_AHBNSPPPCEXP1:
 +    case A_AHBNSPPPCEXP2:
 +    case A_AHBNSPPPCEXP3:
 +    case A_APBNSPPPC0:
 +    case A_APBNSPPPC1:
 +    case A_APBNSPPPCEXP0:
 +    case A_APBNSPPPCEXP1:
 +    case A_APBNSPPPCEXP2:
 +    case A_APBNSPPPCEXP3:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "IoTKit SecCtl NS block write: "
 +                      "unimplemented offset 0x%x\n", offset);
 +        break;
 +    case A_AHBNSPPPC0:
 +    case A_PID4:
 +    case A_PID5:
 +    case A_PID6:
 +    case A_PID7:
 +    case A_PID0:
 +    case A_PID1:
 +    case A_PID2:
 +    case A_PID3:
 +    case A_CID0:
 +    case A_CID1:
 +    case A_CID2:
 +    case A_CID3:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "IoTKit SecCtl NS block write: "
 +                      "read-only offset 0x%x\n", offset);
 +        break;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "IoTKit SecCtl NS block write: bad offset 0x%x\n",
 +                      offset);
 +        break;
 +    }
 +
 +    return MEMTX_OK;
 +}
 +
 +static const MemoryRegionOps iotkit_secctl_s_ops = {
 +    .read_with_attrs = iotkit_secctl_s_read,
 +    .write_with_attrs = iotkit_secctl_s_write,
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 +    .valid.min_access_size = 1,
 +    .valid.max_access_size = 4,
 +    .impl.min_access_size = 1,
 +    .impl.max_access_size = 4,
 +};
 +
 +static const MemoryRegionOps iotkit_secctl_ns_ops = {
 +    .read_with_attrs = iotkit_secctl_ns_read,
 +    .write_with_attrs = iotkit_secctl_ns_write,
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 +    .valid.min_access_size = 1,
 +    .valid.max_access_size = 4,
 +    .impl.min_access_size = 1,
 +    .impl.max_access_size = 4,
 +};
 +
 +static void iotkit_secctl_reset(DeviceState *dev)
 +{
 +
 +}
 +
 +static void iotkit_secctl_init(Object *obj)
 +{
 +    IoTKitSecCtl *s = IOTKIT_SECCTL(obj);
 +    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
 +
 +    memory_region_init_io(&s->s_regs, obj, &iotkit_secctl_s_ops,
 +                          s, "iotkit-secctl-s-regs", 0x1000);
 +    memory_region_init_io(&s->ns_regs, obj, &iotkit_secctl_ns_ops,
 +                          s, "iotkit-secctl-ns-regs", 0x1000);
 +    sysbus_init_mmio(sbd, &s->s_regs);
 +    sysbus_init_mmio(sbd, &s->ns_regs);
 +}
 +
 +static const VMStateDescription iotkit_secctl_vmstate = {
 +    .name = "iotkit-secctl",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
 +static void iotkit_secctl_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
 +    dc->vmsd = &iotkit_secctl_vmstate;
 +    dc->reset = iotkit_secctl_reset;
 +}
 +
 +static const TypeInfo iotkit_secctl_info = {
 +    .name = TYPE_IOTKIT_SECCTL,
 +    .parent = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(IoTKitSecCtl),
 +    .instance_init = iotkit_secctl_init,
 +    .class_init = iotkit_secctl_class_init,
 +};
 +
 +static void iotkit_secctl_register_types(void)
 +{
 +    type_register_static(&iotkit_secctl_info);
 +}
 +
 +type_init(iotkit_secctl_register_types);
 diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
 index XXXXXXX..XXXXXXX 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_MPS2_FPGAIO=y
  CONFIG_MPS2_SCC=y
  CONFIG_TZ_PPC=y
 +CONFIG_IOTKIT_SECCTL=y
  CONFIG_VERSATILE_PCI=y
  CONFIG_VERSATILE_I2C=y
 diff --git a/hw/misc/trace-events b/hw/misc/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/trace-events
 +++ b/hw/misc/trace-events
@@ -XXX,XX +XXX,XX @@ tz_ppc_irq_clear(int level) "TZ PPC: int_clear = %d"
  tz_ppc_update_irq(int level) "TZ PPC: setting irq line to %d"
  tz_ppc_read_blocked(int n, hwaddr offset, bool secure, bool user) "TZ PPC: port %d offset 0x%" HWADDR_PRIx " read (secure %d user %d) blocked"
  tz_ppc_write_blocked(int n, hwaddr offset, bool secure, bool user) "TZ PPC: port %d offset 0x%" HWADDR_PRIx " write (secure %d user %d) blocked"
 +
 +# hw/misc/iotkit-secctl.c
 +iotkit_secctl_s_read(uint32_t offset, uint64_t data, unsigned size) "IoTKit SecCtl S regs read: offset 0x%x data 0x%" PRIx64 " size %u"
 +iotkit_secctl_s_write(uint32_t offset, uint64_t data, unsigned size) "IoTKit SecCtl S regs write: offset 0x%x data 0x%" PRIx64 " size %u"
 +iotkit_secctl_ns_read(uint32_t offset, uint64_t data, unsigned size) "IoTKit SecCtl NS regs read: offset 0x%x data 0x%" PRIx64 " size %u"
 +iotkit_secctl_ns_write(uint32_t offset, uint64_t data, unsigned size) "IoTKit SecCtl NS regs write: offset 0x%x data 0x%" PRIx64 " size %u"
 +iotkit_secctl_reset(void) "IoTKit SecCtl: reset"
 --
-.16.2
+.20.1
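The iotkit_secctl_s_read() and iotkit_secctl_ns_read() handlers above decode the word-aligned offset first and only then narrow the 32-bit value for byte and halfword accesses. The ID registers PID4..CID3 sit contiguously at 0xfd0..0xffc, so (offset - A_PID4) / 4 indexes the idregs arrays directly. The following is a minimal stand-alone sketch of that read path, not QEMU code. demo_extract32() is a local copy assumed to match the semantics of QEMU's extract32() from include/qemu/bitops.h, and demo_read() is a hypothetical reduction that models only the ID registers:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local copy of the behaviour of QEMU's extract32(): return the
 * `length`-bit field of `value` that starts at bit `start`.
 */
static inline uint32_t demo_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Same byte values as iotkit_secctl_s_idregs: PID4-PID7, PID0-PID3, CID0-CID3. */
static const uint8_t demo_s_idregs[] = {
    0x04, 0x00, 0x00, 0x00,
    0x52, 0xb8, 0x0b, 0x00,
    0x0d, 0xf0, 0x05, 0xb1,
};

#define DEMO_A_PID4 0xfd0u   /* first ID register */
#define DEMO_A_CID3 0xffcu   /* last ID register */

/* Hypothetical reduced read handler: only the ID registers are modelled. */
static uint32_t demo_read(uint32_t addr, unsigned size)
{
    uint32_t offset = addr & ~0x3u;   /* word-aligned register offset */
    uint32_t r = 0;                   /* other offsets read as zero here */

    if (offset >= DEMO_A_PID4 && offset <= DEMO_A_CID3) {
        r = demo_s_idregs[(offset - DEMO_A_PID4) / 4];
    }
    if (size != 4) {
        /* Pick the addressed byte or halfword out of the 32-bit value. */
        r = demo_extract32(r, (addr & 3) * 8, size * 8);
    }
    return r;
}
```

Narrowing after the full-word decode is only safe because none of these registers has read side effects. A device with read-to-clear registers would need to take the access size into account before performing the read.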

-[Qemu-devel] [PULL 03/39] xlnx-zynqmp: Connect the RTC device
+[PULL 13/47] target/arm: Make VFP_CONV_FIX macros take separate float type and float size
-From: Alistair Francis <alistair.francis@xilinx.com>
+Currently the VFP_CONV_FIX macros take a single fsz argument for the
 size of the float type, which is used both to select the name of
 the functions to call (eg float32_is_any_nan()) and also for the
 type to use for the float inputs and outputs (eg float32).
-Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
+Separate these into fsz and ftype arguments, so that we can use them
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+for fp16, which uses 'float16' in the function names but is still
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+passing inputs and outputs in a 32-bit sized type.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-14-peter.maydell@linaro.org
 ---
- include/hw/arm/xlnx-zynqmp.h |  2 ++
+ target/arm/vfp_helper.c | 46 ++++++++++++++++++++---------------------
- hw/arm/xlnx-zynqmp.c         | 14 ++++++++++++++
+file changed, 23 insertions(+), 23 deletions(-)
 files changed, 16 insertions(+)
-diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/xlnx-zynqmp.h
+--- a/target/arm/vfp_helper.c
-+++ b/include/hw/arm/xlnx-zynqmp.h
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
  #include "hw/dma/xlnx_dpdma.h"
  #include "hw/display/xlnx_dp.h"
  #include "hw/intc/xlnx-zynqmp-ipi.h"
 +#include "hw/timer/xlnx-zynqmp-rtc.h"
  #define TYPE_XLNX_ZYNQMP "xlnx,zynqmp"
  #define XLNX_ZYNQMP(obj) OBJECT_CHECK(XlnxZynqMPState, (obj), \
@@ -XXX,XX +XXX,XX @@ typedef struct XlnxZynqMPState {
      XlnxDPState dp;
      XlnxDPDMAState dpdma;
      XlnxZynqMPIPI ipi;
 +    XlnxZynqMPRTC rtc;
      char *boot_cpu;
      ARMCPU *boot_cpu_ptr;
 diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xlnx-zynqmp.c
 +++ b/hw/arm/xlnx-zynqmp.c
@@ -XXX,XX +XXX,XX @@
  #define IPI_ADDR            0xFF300000
  #define IPI_IRQ             64
 +#define RTC_ADDR            0xffa60000
 +#define RTC_IRQ             26
 +
  #define SDHCI_CAPABILITIES  0x280737ec6481 /* Datasheet: UG1085 (v1.7) */
  static const uint64_t gem_addr[XLNX_ZYNQMP_NUM_GEMS] = {
@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_init(Object *obj)
      object_initialize(&s->ipi, sizeof(s->ipi), TYPE_XLNX_ZYNQMP_IPI);
      qdev_set_parent_bus(DEVICE(&s->ipi), sysbus_get_default());
 +
 +    object_initialize(&s->rtc, sizeof(s->rtc), TYPE_XLNX_ZYNQMP_RTC);
 +    qdev_set_parent_bus(DEVICE(&s->rtc), sysbus_get_default());
  }
- static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
+ /* VFP3 fixed point conversion.  */
-@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
+-#define VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype) \
-     }
+-float##fsz HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift, \
-     sysbus_mmio_map(SYS_BUS_DEVICE(&s->ipi), 0, IPI_ADDR);
++#define VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)            \
-     sysbus_connect_irq(SYS_BUS_DEVICE(&s->ipi), 0, gic_spi[IPI_IRQ]);
++ftype HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift,      \
-+
+                                      void *fpstp) \
-+    object_property_set_bool(OBJECT(&s->rtc), true, "realized", &err);
+ { return itype##_to_##float##fsz##_scalbn(x, -shift, fpstp); }
-+    if (err) {
-+        error_propagate(errp, err);
+-#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, ROUND, suff)   \
-+        return;
+-uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
-+    }
++#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype, ROUND, suff) \
-+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->rtc), 0, RTC_ADDR);
++uint##isz##_t HELPER(vfp_to##name##p##suff)(ftype x, uint32_t shift,      \
-+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->rtc), 0, gic_spi[RTC_IRQ]);
+                                             void *fpst)                   \
  {                                                                         \
      if (unlikely(float##fsz##_is_any_nan(x))) {                           \
@@ -XXX,XX +XXX,XX @@ uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
      return float##fsz##_to_##itype##_scalbn(x, ROUND, shift, fpst);       \
  }
- static Property xlnx_zynqmp_props[] = {
+-#define VFP_CONV_FIX(name, p, fsz, isz, itype)                   \
 -VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
 -VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
 +#define VFP_CONV_FIX(name, p, fsz, ftype, isz, itype)            \
 +VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)              \
 +VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
                           float_round_to_zero, _round_to_zero)    \
 -VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
 +VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
                           get_float_rounding_mode(fpst), )
 -#define VFP_CONV_FIX_A64(name, p, fsz, isz, itype)               \
 -VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
 -VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
 +#define VFP_CONV_FIX_A64(name, p, fsz, ftype, isz, itype)        \
 +VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)              \
 +VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
                           get_float_rounding_mode(fpst), )
 -VFP_CONV_FIX(sh, d, 64, 64, int16)
 -VFP_CONV_FIX(sl, d, 64, 64, int32)
 -VFP_CONV_FIX_A64(sq, d, 64, 64, int64)
 -VFP_CONV_FIX(uh, d, 64, 64, uint16)
 -VFP_CONV_FIX(ul, d, 64, 64, uint32)
 -VFP_CONV_FIX_A64(uq, d, 64, 64, uint64)
 -VFP_CONV_FIX(sh, s, 32, 32, int16)
 -VFP_CONV_FIX(sl, s, 32, 32, int32)
 -VFP_CONV_FIX_A64(sq, s, 32, 64, int64)
 -VFP_CONV_FIX(uh, s, 32, 32, uint16)
 -VFP_CONV_FIX(ul, s, 32, 32, uint32)
 -VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
 +VFP_CONV_FIX(sh, d, 64, float64, 64, int16)
 +VFP_CONV_FIX(sl, d, 64, float64, 64, int32)
 +VFP_CONV_FIX_A64(sq, d, 64, float64, 64, int64)
 +VFP_CONV_FIX(uh, d, 64, float64, 64, uint16)
 +VFP_CONV_FIX(ul, d, 64, float64, 64, uint32)
 +VFP_CONV_FIX_A64(uq, d, 64, float64, 64, uint64)
 +VFP_CONV_FIX(sh, s, 32, float32, 32, int16)
 +VFP_CONV_FIX(sl, s, 32, float32, 32, int32)
 +VFP_CONV_FIX_A64(sq, s, 32, float32, 64, int64)
 +VFP_CONV_FIX(uh, s, 32, float32, 32, uint16)
 +VFP_CONV_FIX(ul, s, 32, float32, 32, uint32)
 +VFP_CONV_FIX_A64(uq, s, 32, float32, 64, uint64)
  #undef VFP_CONV_FIX
  #undef VFP_CONV_FIX_FLOAT
 --
-.16.2
+.20.1
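The reason for separating fsz from ftype can be shown with a toy version of the same macro pattern. This is a sketch, not the vfp_helper.c code. demo_float16_sign() and demo_float32_sign() are hypothetical stand-ins for softfloat primitives, and uint32_t plays the role that dh_ctype_f16 plays in the patch, carrying an fp16 value in a 32-bit type. fsz only selects which primitive is called, while ftype fixes the parameter type of the generated helper:

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical per-size primitives, named in softfloat style. */
static unsigned demo_float16_sign(uint32_t x)
{
    return (x >> 15) & 1;             /* fp16 sign bit, held in a 32-bit carrier */
}

static unsigned demo_float32_sign(float x)
{
    return signbit(x) != 0;
}

/*
 * fsz is only pasted into the callee's name; ftype is the parameter
 * type. With a single fsz argument the parameter would have had to be
 * float##fsz, whereas the fp16 helpers pass operands in a 32-bit type.
 */
#define DEMO_SIGN_HELPER(fsz, ftype)                  \
static unsigned demo_sign_f##fsz(ftype x)             \
{                                                     \
    return demo_float##fsz##_sign(x);                 \
}

DEMO_SIGN_HELPER(16, uint32_t)
DEMO_SIGN_HELPER(32, float)
```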

-New patch
+[PULL 14/47] target/arm: Use macros instead of open-coding fp16 conversion helpers
+Now the VFP_CONV_FIX macros can handle fp16's distinction between the
+width of the operation and the width of the type used to pass operands,
+use the macros rather than the open-coded functions.
+This creates an extra six helper functions, all of which we are going
+to need for the AArch32 VFP fp16 instructions.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-15-peter.maydell@linaro.org
+---
+ target/arm/helper.h     |  6 +++
+ target/arm/vfp_helper.c | 86 +++--------------------------------------
+files changed, 12 insertions(+), 80 deletions(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(vfp_tosizh, s32, f16, ptr)
+ DEF_HELPER_2(vfp_tosizs, s32, f32, ptr)
+ DEF_HELPER_2(vfp_tosizd, s32, f64, ptr)
++DEF_HELPER_3(vfp_toshh_round_to_zero, i32, f16, i32, ptr)
++DEF_HELPER_3(vfp_toslh_round_to_zero, i32, f16, i32, ptr)
++DEF_HELPER_3(vfp_touhh_round_to_zero, i32, f16, i32, ptr)
++DEF_HELPER_3(vfp_toulh_round_to_zero, i32, f16, i32, ptr)
+ DEF_HELPER_3(vfp_toshs_round_to_zero, i32, f32, i32, ptr)
+ DEF_HELPER_3(vfp_tosls_round_to_zero, i32, f32, i32, ptr)
+ DEF_HELPER_3(vfp_touhs_round_to_zero, i32, f32, i32, ptr)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_sqtod, f64, i64, i32, ptr)
+ DEF_HELPER_3(vfp_uhtod, f64, i64, i32, ptr)
+ DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
+ DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
++DEF_HELPER_3(vfp_shtoh, f16, i32, i32, ptr)
++DEF_HELPER_3(vfp_uhtoh, f16, i32, i32, ptr)
+ DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
+ DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
+ DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ VFP_CONV_FIX_A64(sq, s, 32, float32, 64, int64)
+ VFP_CONV_FIX(uh, s, 32, float32, 32, uint16)
+ VFP_CONV_FIX(ul, s, 32, float32, 32, uint32)
+ VFP_CONV_FIX_A64(uq, s, 32, float32, 64, uint64)
++VFP_CONV_FIX(sh, h, 16, dh_ctype_f16, 32, int16)
++VFP_CONV_FIX(sl, h, 16, dh_ctype_f16, 32, int32)
++VFP_CONV_FIX_A64(sq, h, 16, dh_ctype_f16, 64, int64)
++VFP_CONV_FIX(uh, h, 16, dh_ctype_f16, 32, uint16)
++VFP_CONV_FIX(ul, h, 16, dh_ctype_f16, 32, uint32)
++VFP_CONV_FIX_A64(uq, h, 16, dh_ctype_f16, 64, uint64)
+ #undef VFP_CONV_FIX
+ #undef VFP_CONV_FIX_FLOAT
+ #undef VFP_CONV_FLOAT_FIX_ROUND
+ #undef VFP_CONV_FIX_A64
+-uint32_t HELPER(vfp_sltoh)(uint32_t x, uint32_t shift, void *fpst)
+-{
+-    return int32_to_float16_scalbn(x, -shift, fpst);
+-}
+-
+-uint32_t HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
+-{
+-    return uint32_to_float16_scalbn(x, -shift, fpst);
+-}
+-
+-uint32_t HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
+-{
+-    return int64_to_float16_scalbn(x, -shift, fpst);
+-}
+-
+-uint32_t HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
+-{
+-    return uint64_to_float16_scalbn(x, -shift, fpst);
+-}
+-
+-uint32_t HELPER(vfp_toshh)(uint32_t x, uint32_t shift, void *fpst)
+-{
+-    if (unlikely(float16_is_any_nan(x))) {
+-        float_raise(float_flag_invalid, fpst);
+-        return 0;
+-    }
+-    return float16_to_int16_scalbn(x, get_float_rounding_mode(fpst),
+-                                   shift, fpst);
+-}
+-
+-uint32_t HELPER(vfp_touhh)(uint32_t x, uint32_t shift, void *fpst)
+-{
+-    if (unlikely(float16_is_any_nan(x))) {
+-        float_raise(float_flag_invalid, fpst);
+-        return 0;
+-    }
+-    return float16_to_uint16_scalbn(x, get_float_rounding_mode(fpst),
+-                                    shift, fpst);
+-}
+-
+-uint32_t HELPER(vfp_toslh)(uint32_t x, uint32_t shift, void *fpst)
+-{
+-    if (unlikely(float16_is_any_nan(x))) {
+-        float_raise(float_flag_invalid, fpst);
+-        return 0;
+-    }
+-    return float16_to_int32_scalbn(x, get_float_rounding_mode(fpst),
+-                                   shift, fpst);
+-}
+-
+-uint32_t HELPER(vfp_toulh)(uint32_t x, uint32_t shift, void *fpst)
+-{
+-    if (unlikely(float16_is_any_nan(x))) {
+-        float_raise(float_flag_invalid, fpst);
+-        return 0;
+-    }
+-    return float16_to_uint32_scalbn(x, get_float_rounding_mode(fpst),
+-                                    shift, fpst);
+-}
+-
+-uint64_t HELPER(vfp_tosqh)(uint32_t x, uint32_t shift, void *fpst)
+-{
+-    if (unlikely(float16_is_any_nan(x))) {
+-        float_raise(float_flag_invalid, fpst);
+-        return 0;
+-    }
+-    return float16_to_int64_scalbn(x, get_float_rounding_mode(fpst),
+-                                   shift, fpst);
+-}
+-
+-uint64_t HELPER(vfp_touqh)(uint32_t x, uint32_t shift, void *fpst)
+-{
+-    if (unlikely(float16_is_any_nan(x))) {
+-        float_raise(float_flag_invalid, fpst);
+-        return 0;
+-    }
+-    return float16_to_uint64_scalbn(x, get_float_rounding_mode(fpst),
+-                                    shift, fpst);
+-}
+-
+ /* Set the current fp rounding mode and return the old one.
+  * The argument is a softfloat float_round_ value.
+  */
+--
+.20.1
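The open-coded fp16 helpers removed above all share one shape. Each raises the invalid flag and returns 0 for a NaN input; otherwise it scales by 2^shift and converts to the integer type. The sketch below shows how a single macro can generate such a family. It is only an illustration with these assumptions: it works on host float values rather than softfloat float16/float32, demo_status is a hypothetical stand-in for float_status, and it always rounds toward zero instead of using the FPSCR rounding mode:

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_status exception flags. */
typedef struct {
    bool invalid;
} demo_status;

/*
 * One macro generates a float-to-fixed helper per (name, ftype, itype)
 * combination, so the NaN check and saturation are written only once.
 */
#define DEMO_FLOAT_TO_FIX(name, ftype, itype, IMIN, IMAX)               \
static itype demo_to##name(ftype x, unsigned shift, demo_status *st)    \
{                                                                       \
    double v;                                                           \
    if (isnan(x)) {                                                     \
        st->invalid = true;   /* like float_raise(float_flag_invalid) */\
        return 0;                                                       \
    }                                                                   \
    v = (double)x * (double)((uint64_t)1 << shift);                     \
    if (v >= (double)(IMAX) + 1.0) {                                    \
        st->invalid = true;   /* saturate on overflow */                \
        return IMAX;                                                    \
    }                                                                   \
    if (v <= (double)(IMIN) - 1.0) {                                    \
        st->invalid = true;                                             \
        return IMIN;                                                    \
    }                                                                   \
    return (itype)v;          /* C conversion truncates toward zero */  \
}

DEMO_FLOAT_TO_FIX(shs, float, int16_t, INT16_MIN, INT16_MAX)
DEMO_FLOAT_TO_FIX(sls, float, int32_t, INT32_MIN, INT32_MAX)
DEMO_FLOAT_TO_FIX(uhs, float, uint16_t, 0, UINT16_MAX)
```

Adding another width is then a one-line instantiation, which is what the patch does with the six new VFP_CONV_FIX lines for the h variants.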

-[Qemu-devel] [PULL 18/39] hw/misc/tz-ppc: Model TrustZone peripheral protection controller
+[PULL 15/47] target/arm: Implement VFP fp16 VCVT between float and fixed-point
-Add a model of the TrustZone peripheral protection controller (PPC),
+Implement the fp16 versions of the VFP VCVT instruction forms which
-which is used to gate transactions to non-TZ-aware peripherals so
+convert between floating point and fixed-point.
 that secure software can configure them to not be accessible to
 non-secure software.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-15-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-16-peter.maydell@linaro.org
 ---
- hw/misc/Makefile.objs           |   2 +
+ target/arm/vfp.decode          |  2 ++
- include/hw/misc/tz-ppc.h        | 101 ++++++++++++++
+ target/arm/translate-vfp.c.inc | 59 ++++++++++++++++++++++++++++++++++
- hw/misc/tz-ppc.c                | 302 ++++++++++++++++++++++++++++++++++++++++
+files changed, 61 insertions(+)
  default-configs/arm-softmmu.mak |   2 +
  hw/misc/trace-events            |  11 ++
 files changed, 418 insertions(+)
  create mode 100644 include/hw/misc/tz-ppc.h
  create mode 100644 hw/misc/tz-ppc.c
-diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
+diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/Makefile.objs
+--- a/target/arm/vfp.decode
-+++ b/hw/misc/Makefile.objs
++++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_MIPS_ITU) += mips_itu.o
+@@ -XXX,XX +XXX,XX @@ VJCVT        ---- 1110 1.11 1001 .... 1011 11.0 ....        @vfp_dm_sd
- obj-$(CONFIG_MPS2_FPGAIO) += mps2-fpgaio.o
+ # We assemble bits 18 (op), 16 (u) and 7 (sx) into a single opc field
- obj-$(CONFIG_MPS2_SCC) += mps2-scc.o
+ # for the convenience of the trans_VCVT_fix functions.
+ %vcvt_fix_op 18:1 16:1 7:1
-+obj-$(CONFIG_TZ_PPC) += tz-ppc.o
++VCVT_fix_hp  ---- 1110 1.11 1.1. .... 1001 .1.0 .... \
 +             vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
  VCVT_fix_sp  ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
               vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
  VCVT_fix_dp  ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
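The vfp.decode hunk above packs bits 18 (op), 16 (u) and 7 (sx) into a single opc field through %vcvt_fix_op, and the fixed-point immediate arrives through %vm_sp. The following stand-alone sketch mimics that packing by hand. It assumes that decodetree concatenates subfields with the first one listed as most significant, and that %vm_sp is 0:4 5:1, that is, Vm:M. It then computes the fraction-bit count using the Arm ARM formula frac_bits = size - imm, where size is 16 when sx is 0 and 32 otherwise. The demo_* names are hypothetical; this is not the decodetree-generated code:

```c
#include <stdint.h>

/* Return bit `pos` of the instruction word. */
static unsigned demo_bit(uint32_t insn, int pos)
{
    return (insn >> pos) & 1;
}

/* %vcvt_fix_op 18:1 16:1 7:1 -> opc = op:u:sx (first subfield most significant). */
static unsigned demo_vcvt_fix_opc(uint32_t insn)
{
    return (demo_bit(insn, 18) << 2) | (demo_bit(insn, 16) << 1) | demo_bit(insn, 7);
}

/* Assumed %vm_sp 0:4 5:1 -> imm = Vm:M, the Arm ARM's imm4:i. */
static unsigned demo_vcvt_fix_imm(uint32_t insn)
{
    return ((insn & 0xf) << 1) | demo_bit(insn, 5);
}

/* Arm ARM: size = 16 if sx == 0 else 32; frac_bits = size - UInt(imm4:i). */
static int demo_frac_bits(unsigned opc, unsigned imm)
{
    int size = (opc & 1) ? 32 : 16;
    return size - (int)imm;
}
```

For example, with sx = 0 and imm = 4, the 16-bit fixed-point operand has 12 fraction bits.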
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      return true;
  }
 +static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
 +{
 +    TCGv_i32 vd, shift;
 +    TCGv_ptr fpst;
 +    int frac_bits;
 +
- obj-$(CONFIG_PVPANIC) += pvpanic.o
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
  obj-$(CONFIG_AUX) += auxbus.o
 diff --git a/include/hw/misc/tz-ppc.h b/include/hw/misc/tz-ppc.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/misc/tz-ppc.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM TrustZone peripheral protection controller emulation
 + *
 + * Copyright (c) 2018 Linaro Limited
 + * Written by Peter Maydell
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 or
 + * (at your option) any later version.
 + */
 +
 +/* This is a model of the TrustZone peripheral protection controller (PPC).
 + * It is documented in the ARM CoreLink SIE-200 System IP for Embedded TRM
 + * (DDI 0571G):
 + * https://developer.arm.com/products/architecture/m-profile/docs/ddi0571/g
 + *
 + * The PPC sits in front of peripherals and allows secure software to
 + * configure it to either pass through or reject transactions.
 + * Rejected transactions may be configured to either be aborted, or to
 + * behave as RAZ/WI. An interrupt can be signalled for a rejected transaction.
 + *
 + * The PPC has no register interface -- it is configured purely by a
 + * collection of input signals from other hardware in the system. Typically
 + * they are either hardwired or exposed in an ad-hoc register interface by
 + * the SoC that uses the PPC.
 + *
 + * This QEMU model can be used to model either the AHB5 or APB4 TZ PPC,
 + * since the only difference between them is that the AHB version has a
 + * "default" port which has no security checks applied. In QEMU the default
 + * port can be emulated simply by wiring its downstream devices directly
 + * into the parent address space, since the PPC does not need to intercept
 + * transactions there.
 + *
 + * In the hardware, selection of which downstream port to use is done by
 + * the user's decode logic asserting one of the hsel[] signals. In QEMU,
 + * we provide 16 MMIO regions, one per port, and the user maps these into
 + * the desired addresses to implement the address decode.
 + *
 + * QEMU interface:
 + * + sysbus MMIO regions 0..15: MemoryRegions defining the upstream end
 + *   of each of the 16 ports of the PPC
 + * + Property "port[0..15]": MemoryRegion defining the downstream device(s)
 + *   for each of the 16 ports of the PPC
 + * + Named GPIO inputs "cfg_nonsec[0..15]": set to 1 if the port should be
 + *   accessible to NonSecure transactions
 + * + Named GPIO inputs "cfg_ap[0..15]": set to 1 if the port should be
 + *   accessible to non-privileged transactions
 + * + Named GPIO input "cfg_sec_resp": set to 1 if a rejected transaction should
 + *   result in a transaction error, or 0 for the transaction to RAZ/WI
 + * + Named GPIO input "irq_enable": set to 1 to enable interrupts
 + * + Named GPIO input "irq_clear": set to 1 to clear a pending interrupt
 + * + Named GPIO output "irq": set for a transaction-failed interrupt
 + * + Property "NONSEC_MASK": if a bit is set in this mask then accesses to
 + *   the associated port do not have the TZ security check performed. (This
 + *   corresponds to the hardware allowing this to be set as a Verilog
 + *   parameter.)
 + */
 +
 +#ifndef TZ_PPC_H
 +#define TZ_PPC_H
 +
 +#include "hw/sysbus.h"
 +
 +#define TYPE_TZ_PPC "tz-ppc"
 +#define TZ_PPC(obj) OBJECT_CHECK(TZPPC, (obj), TYPE_TZ_PPC)
 +
 +#define TZ_NUM_PORTS 16
 +
 +typedef struct TZPPC TZPPC;
 +
 +typedef struct TZPPCPort {
 +    TZPPC *ppc;
 +    MemoryRegion upstream;
 +    AddressSpace downstream_as;
 +    MemoryRegion *downstream;
 +} TZPPCPort;
 +
 +struct TZPPC {
 +    /*< private >*/
 +    SysBusDevice parent_obj;
 +
 +    /*< public >*/
 +
 +    /* State: these just track the values of our input signals */
 +    bool cfg_nonsec[TZ_NUM_PORTS];
 +    bool cfg_ap[TZ_NUM_PORTS];
 +    bool cfg_sec_resp;
 +    bool irq_enable;
 +    bool irq_clear;
 +    /* State: are we asserting irq ? */
 +    bool irq_status;
 +
 +    qemu_irq irq;
 +
 +    /* Properties */
 +    uint32_t nonsec_mask;
 +
 +    TZPPCPort port[TZ_NUM_PORTS];
 +};
 +
 +#endif
 diff --git a/hw/misc/tz-ppc.c b/hw/misc/tz-ppc.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/misc/tz-ppc.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM TrustZone peripheral protection controller emulation
 + *
 + * Copyright (c) 2018 Linaro Limited
 + * Written by Peter Maydell
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 or
 + * (at your option) any later version.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "qapi/error.h"
 +#include "trace.h"
 +#include "hw/sysbus.h"
 +#include "hw/registerfields.h"
 +#include "hw/misc/tz-ppc.h"
 +
 +static void tz_ppc_update_irq(TZPPC *s)
 +{
 +    bool level = s->irq_status && s->irq_enable;
 +
 +    trace_tz_ppc_update_irq(level);
 +    qemu_set_irq(s->irq, level);
 +}
 +
 +static void tz_ppc_cfg_nonsec(void *opaque, int n, int level)
 +{
 +    TZPPC *s = TZ_PPC(opaque);
 +
 +    assert(n < TZ_NUM_PORTS);
 +    trace_tz_ppc_cfg_nonsec(n, level);
 +    s->cfg_nonsec[n] = level;
 +}
 +
 +static void tz_ppc_cfg_ap(void *opaque, int n, int level)
 +{
 +    TZPPC *s = TZ_PPC(opaque);
 +
 +    assert(n < TZ_NUM_PORTS);
 +    trace_tz_ppc_cfg_ap(n, level);
 +    s->cfg_ap[n] = level;
 +}
 +
 +static void tz_ppc_cfg_sec_resp(void *opaque, int n, int level)
 +{
 +    TZPPC *s = TZ_PPC(opaque);
 +
 +    trace_tz_ppc_cfg_sec_resp(level);
 +    s->cfg_sec_resp = level;
 +}
 +
 +static void tz_ppc_irq_enable(void *opaque, int n, int level)
 +{
 +    TZPPC *s = TZ_PPC(opaque);
 +
 +    trace_tz_ppc_irq_enable(level);
 +    s->irq_enable = level;
 +    tz_ppc_update_irq(s);
 +}
 +
 +static void tz_ppc_irq_clear(void *opaque, int n, int level)
 +{
 +    TZPPC *s = TZ_PPC(opaque);
 +
 +    trace_tz_ppc_irq_clear(level);
 +
 +    s->irq_clear = level;
 +    if (level) {
 +        s->irq_status = false;
 +        tz_ppc_update_irq(s);
 +    }
 +}
 +
 +static bool tz_ppc_check(TZPPC *s, int n, MemTxAttrs attrs)
 +{
 +    /* Check whether to allow an access to port n; return true if
 +     * the check passes, and false if the transaction must be blocked.
 +     * If the latter, the caller must check cfg_sec_resp to determine
 +     * whether to abort or RAZ/WI the transaction.
 +     * The checks are:
 +     *  + nonsec_mask suppresses any check of the secure attribute
 +     *  + otherwise, block if cfg_nonsec is 1 and transaction is secure,
 +     *    or if cfg_nonsec is 0 and transaction is non-secure
 +     *  + block if transaction is usermode and cfg_ap is 0
 +     */
 +    if ((attrs.secure == s->cfg_nonsec[n] && !(s->nonsec_mask & (1 << n))) ||
 +        (attrs.user && !s->cfg_ap[n])) {
 +        /* Block the transaction. */
 +        if (!s->irq_clear) {
 +            /* Note that holding irq_clear high suppresses interrupts */
 +            s->irq_status = true;
 +            tz_ppc_update_irq(s);
 +        }
 +        return false;
 +    }
-+    return true;
-+}
 +
-+static MemTxResult tz_ppc_read(void *opaque, hwaddr addr, uint64_t *pdata,
++    if (!vfp_access_check(s)) {
-+                               unsigned size, MemTxAttrs attrs)
++        return true;
 +{
 +    TZPPCPort *p = opaque;
 +    TZPPC *s = p->ppc;
 +    int n = p - s->port;
 +    AddressSpace *as = &p->downstream_as;
 +    uint64_t data;
 +    MemTxResult res;
 +
 +    if (!tz_ppc_check(s, n, attrs)) {
 +        trace_tz_ppc_read_blocked(n, addr, attrs.secure, attrs.user);
 +        if (s->cfg_sec_resp) {
 +            return MEMTX_ERROR;
 +        } else {
 +            *pdata = 0;
 +            return MEMTX_OK;
 +        }
 +    }
 +
-+    switch (size) {
++    frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 +
 +    vd = tcg_temp_new_i32();
 +    neon_load_reg32(vd, a->vd);
 +
 +    fpst = fpstatus_ptr(FPST_FPCR_F16);
 +    shift = tcg_const_i32(frac_bits);
 +
 +    /* Switch on op:U:sx bits */
 +    switch (a->opc) {
 +    case 0:
 +        gen_helper_vfp_shtoh(vd, vd, shift, fpst);
 +        break;
 +    case 1:
-+        data = address_space_ldub(as, addr, attrs, &res);
++        gen_helper_vfp_sltoh(vd, vd, shift, fpst);
 +        break;
 +    case 2:
-+        data = address_space_lduw_le(as, addr, attrs, &res);
++        gen_helper_vfp_uhtoh(vd, vd, shift, fpst);
 +        break;
 +    case 3:
 +        gen_helper_vfp_ultoh(vd, vd, shift, fpst);
 +        break;
 +    case 4:
-+        data = address_space_ldl_le(as, addr, attrs, &res);
++        gen_helper_vfp_toshh_round_to_zero(vd, vd, shift, fpst);
 +        break;
-+    case 8:
++    case 5:
-+        data = address_space_ldq_le(as, addr, attrs, &res);
++        gen_helper_vfp_toslh_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    case 6:
 +        gen_helper_vfp_touhh_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    case 7:
 +        gen_helper_vfp_toulh_round_to_zero(vd, vd, shift, fpst);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
-+    *pdata = data;
++
-+    return res;
++    neon_store_reg32(vd, a->vd);
 +    tcg_temp_free_i32(vd);
 +    tcg_temp_free_i32(shift);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
-+static MemTxResult tz_ppc_write(void *opaque, hwaddr addr, uint64_t val,
+ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
-+                                unsigned size, MemTxAttrs attrs)
+ {
-+{
+     TCGv_i32 vd, shift;
 +    TZPPCPort *p = opaque;
 +    TZPPC *s = p->ppc;
 +    AddressSpace *as = &p->downstream_as;
 +    int n = p - s->port;
 +    MemTxResult res;
 +
 +    if (!tz_ppc_check(s, n, attrs)) {
 +        trace_tz_ppc_write_blocked(n, addr, attrs.secure, attrs.user);
 +        if (s->cfg_sec_resp) {
 +            return MEMTX_ERROR;
 +        } else {
 +            return MEMTX_OK;
 +        }
 +    }
 +
 +    switch (size) {
 +    case 1:
 +        address_space_stb(as, addr, val, attrs, &res);
 +        break;
 +    case 2:
 +        address_space_stw_le(as, addr, val, attrs, &res);
 +        break;
 +    case 4:
 +        address_space_stl_le(as, addr, val, attrs, &res);
 +        break;
 +    case 8:
 +        address_space_stq_le(as, addr, val, attrs, &res);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +    return res;
 +}
 +
 +static const MemoryRegionOps tz_ppc_ops = {
 +    .read_with_attrs = tz_ppc_read,
 +    .write_with_attrs = tz_ppc_write,
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 +};
 +
 +static void tz_ppc_reset(DeviceState *dev)
 +{
 +    TZPPC *s = TZ_PPC(dev);
 +
 +    trace_tz_ppc_reset();
 +    s->cfg_sec_resp = false;
 +    memset(s->cfg_nonsec, 0, sizeof(s->cfg_nonsec));
 +    memset(s->cfg_ap, 0, sizeof(s->cfg_ap));
 +}
 +
 +static void tz_ppc_init(Object *obj)
 +{
 +    DeviceState *dev = DEVICE(obj);
 +    TZPPC *s = TZ_PPC(obj);
 +
 +    qdev_init_gpio_in_named(dev, tz_ppc_cfg_nonsec, "cfg_nonsec", TZ_NUM_PORTS);
 +    qdev_init_gpio_in_named(dev, tz_ppc_cfg_ap, "cfg_ap", TZ_NUM_PORTS);
 +    qdev_init_gpio_in_named(dev, tz_ppc_cfg_sec_resp, "cfg_sec_resp", 1);
 +    qdev_init_gpio_in_named(dev, tz_ppc_irq_enable, "irq_enable", 1);
 +    qdev_init_gpio_in_named(dev, tz_ppc_irq_clear, "irq_clear", 1);
 +    qdev_init_gpio_out_named(dev, &s->irq, "irq", 1);
 +}
 +
 +static void tz_ppc_realize(DeviceState *dev, Error **errp)
 +{
 +    Object *obj = OBJECT(dev);
 +    SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 +    TZPPC *s = TZ_PPC(dev);
 +    int i;
 +
 +    /* We can't create the upstream end of the port until realize,
 +     * as we don't know the size of the MR used as the downstream until then.
 +     */
 +    for (i = 0; i < TZ_NUM_PORTS; i++) {
 +        TZPPCPort *port = &s->port[i];
 +        char *name;
 +        uint64_t size;
 +
 +        if (!port->downstream) {
 +            continue;
 +        }
 +
 +        name = g_strdup_printf("tz-ppc-port[%d]", i);
 +
 +        port->ppc = s;
 +        address_space_init(&port->downstream_as, port->downstream, name);
 +
 +        size = memory_region_size(port->downstream);
 +        memory_region_init_io(&port->upstream, obj, &tz_ppc_ops,
 +                              port, name, size);
 +        sysbus_init_mmio(sbd, &port->upstream);
 +        g_free(name);
 +    }
 +}
 +
 +static const VMStateDescription tz_ppc_vmstate = {
 +    .name = "tz-ppc",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_BOOL_ARRAY(cfg_nonsec, TZPPC, 16),
 +        VMSTATE_BOOL_ARRAY(cfg_ap, TZPPC, 16),
 +        VMSTATE_BOOL(cfg_sec_resp, TZPPC),
 +        VMSTATE_BOOL(irq_enable, TZPPC),
 +        VMSTATE_BOOL(irq_clear, TZPPC),
 +        VMSTATE_BOOL(irq_status, TZPPC),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
 +#define DEFINE_PORT(N)                                          \
 +    DEFINE_PROP_LINK("port[" #N "]", TZPPC, port[N].downstream, \
 +                     TYPE_MEMORY_REGION, MemoryRegion *)
 +
 +static Property tz_ppc_properties[] = {
 +    DEFINE_PROP_UINT32("NONSEC_MASK", TZPPC, nonsec_mask, 0),
 +    DEFINE_PORT(0),
 +    DEFINE_PORT(1),
 +    DEFINE_PORT(2),
 +    DEFINE_PORT(3),
 +    DEFINE_PORT(4),
 +    DEFINE_PORT(5),
 +    DEFINE_PORT(6),
 +    DEFINE_PORT(7),
 +    DEFINE_PORT(8),
 +    DEFINE_PORT(9),
 +    DEFINE_PORT(10),
 +    DEFINE_PORT(11),
 +    DEFINE_PORT(12),
 +    DEFINE_PORT(13),
 +    DEFINE_PORT(14),
 +    DEFINE_PORT(15),
 +    DEFINE_PROP_END_OF_LIST(),
 +};
 +
 +static void tz_ppc_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
 +    dc->realize = tz_ppc_realize;
 +    dc->vmsd = &tz_ppc_vmstate;
 +    dc->reset = tz_ppc_reset;
 +    dc->props = tz_ppc_properties;
 +}
 +
 +static const TypeInfo tz_ppc_info = {
 +    .name = TYPE_TZ_PPC,
 +    .parent = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(TZPPC),
 +    .instance_init = tz_ppc_init,
 +    .class_init = tz_ppc_class_init,
 +};
 +
 +static void tz_ppc_register_types(void)
 +{
 +    type_register_static(&tz_ppc_info);
 +}
 +
 +type_init(tz_ppc_register_types);
 diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
 index XXXXXXX..XXXXXXX 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_CMSDK_APB_UART=y
  CONFIG_MPS2_FPGAIO=y
  CONFIG_MPS2_SCC=y
 +CONFIG_TZ_PPC=y
 +
  CONFIG_VERSATILE_PCI=y
  CONFIG_VERSATILE_I2C=y
 diff --git a/hw/misc/trace-events b/hw/misc/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/trace-events
 +++ b/hw/misc/trace-events
@@ -XXX,XX +XXX,XX @@ mos6522_get_next_irq_time(uint16_t latch, int64_t d, int64_t delta) "latch=%d co
  mos6522_set_sr_int(void) "set sr_int"
  mos6522_write(uint64_t addr, uint64_t val) "reg=0x%"PRIx64 " val=0x%"PRIx64
  mos6522_read(uint64_t addr, unsigned val) "reg=0x%"PRIx64 " val=0x%x"
 +
 +# hw/misc/tz-ppc.c
 +tz_ppc_reset(void) "TZ PPC: reset"
 +tz_ppc_cfg_nonsec(int n, int level) "TZ PPC: cfg_nonsec[%d] = %d"
 +tz_ppc_cfg_ap(int n, int level) "TZ PPC: cfg_ap[%d] = %d"
 +tz_ppc_cfg_sec_resp(int level) "TZ PPC: cfg_sec_resp = %d"
 +tz_ppc_irq_enable(int level) "TZ PPC: int_enable = %d"
 +tz_ppc_irq_clear(int level) "TZ PPC: int_clear = %d"
 +tz_ppc_update_irq(int level) "TZ PPC: setting irq line to %d"
 +tz_ppc_read_blocked(int n, hwaddr offset, bool secure, bool user) "TZ PPC: port %d offset 0x%" HWADDR_PRIx " read (secure %d user %d) blocked"
 +tz_ppc_write_blocked(int n, hwaddr offset, bool secure, bool user) "TZ PPC: port %d offset 0x%" HWADDR_PRIx " write (secure %d user %d) blocked"
 --
-.16.2
+.20.1

-[Qemu-devel] [PULL 21/39] hw/misc/iotkit-secctl: Add remaining simple registers
+[PULL 16/47] target/arm: Implement VFP fp16 VCVT-with-specified-rounding-mode
-Add remaining easy registers to iotkit-secctl:
+Implement the fp16 versions of the VFP VCVT instruction forms
- * NSCCFG just routes its two bits out to external GPIO lines
+which convert between floating point and integer with a specified
- * BRGINSTAT/BRGINTCLR/BRGINTEN can be dummies, because QEMU's
+rounding mode.
    bus fabric can never report errors
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20180220180325.29818-18-peter.maydell@linaro.org
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200828183354.27913-17-peter.maydell@linaro.org
 ---
- include/hw/misc/iotkit-secctl.h |  4 ++++
+ target/arm/vfp-uncond.decode   |  6 ++++--
- hw/misc/iotkit-secctl.c         | 32 ++++++++++++++++++++++++++------
+ target/arm/translate-vfp.c.inc | 32 ++++++++++++++++++++++++--------
-files changed, 30 insertions(+), 6 deletions(-)
+files changed, 28 insertions(+), 10 deletions(-)
-diff --git a/include/hw/misc/iotkit-secctl.h b/include/hw/misc/iotkit-secctl.h
+diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/iotkit-secctl.h
+--- a/target/arm/vfp-uncond.decode
-+++ b/include/hw/misc/iotkit-secctl.h
++++ b/target/arm/vfp-uncond.decode
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ VRINT       1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
-  *  + sysbus MMIO region 1 is the "non-secure privilege control block" registers
+             vm=%vm_dp vd=%vd_dp dp=1
-  *  + named GPIO output "sec_resp_cfg" indicating whether blocked accesses
-  *    should RAZ/WI or bus error
+ # VCVT float to int with specified rounding mode; Vd is always single-precision
-+ *  + named GPIO output "nsc_cfg" whose value tracks the NSCCFG register value
++VCVT        1111 1110 1.11 11 rm:2 .... 1001 op:1 1.0 .... \
-  * Controlling the 2 APB PPCs in the IoTKit:
++            vm=%vm_sp vd=%vd_sp sz=1
-  *  + named GPIO outputs apb_ppc0_nonsec[0..2] and apb_ppc1_nonsec
+ VCVT        1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
-  *  + named GPIO outputs apb_ppc0_ap[0..2] and apb_ppc1_ap
+-            vm=%vm_sp vd=%vd_sp dp=0
-@@ -XXX,XX +XXX,XX @@ struct IoTKitSecCtl {
++            vm=%vm_sp vd=%vd_sp sz=2
+ VCVT        1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
-     /*< public >*/
+-            vm=%vm_dp vd=%vd_sp dp=1
-     qemu_irq sec_resp_cfg;
++            vm=%vm_dp vd=%vd_sp sz=3
-+    qemu_irq nsc_cfg_irq;
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
      MemoryRegion s_regs;
      MemoryRegion ns_regs;
@@ -XXX,XX +XXX,XX @@ struct IoTKitSecCtl {
      uint32_t secppcintstat;
      uint32_t secppcinten;
      uint32_t secrespcfg;
 +    uint32_t nsccfg;
 +    uint32_t brginten;
      IoTKitSecCtlPPC apb[IOTS_NUM_APB_PPC];
      IoTKitSecCtlPPC apbexp[IOTS_NUM_APB_EXP_PPC];
 diff --git a/hw/misc/iotkit-secctl.c b/hw/misc/iotkit-secctl.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/iotkit-secctl.c
+--- a/target/arm/translate-vfp.c.inc
-+++ b/hw/misc/iotkit-secctl.c
++++ b/target/arm/translate-vfp.c.inc
-@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
-     case A_SECRESPCFG:
+ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-         r = s->secrespcfg;
+ {
-         break;
+     uint32_t rd, rm;
-+    case A_NSCCFG:
+-    bool dp = a->dp;
-+        r = s->nsccfg;
++    int sz = a->sz;
-+        break;
+     TCGv_ptr fpst;
-     case A_SECPPCINTSTAT:
+     TCGv_i32 tcg_rmode, tcg_shift;
-         r = s->secppcintstat;
+     int rounding = fp_decode_rm[a->rm];
-         break;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-     case A_SECPPCINTEN:
+         return false;
          r = s->secppcinten;
          break;
 +    case A_BRGINTSTAT:
 +        /* QEMU's bus fabric can never report errors as it doesn't buffer
 +         * writes, so we never report bridge interrupts.
 +         */
 +        r = 0;
 +        break;
 +    case A_BRGINTEN:
 +        r = s->brginten;
 +        break;
      case A_AHBNSPPCEXP0:
      case A_AHBNSPPCEXP1:
      case A_AHBNSPPCEXP2:
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
      case A_APBSPPPCEXP3:
          r = s->apbexp[offset_to_ppc_idx(offset)].sp;
          break;
 -    case A_NSCCFG:
      case A_SECMPCINTSTATUS:
      case A_SECMSCINTSTAT:
      case A_SECMSCINTEN:
 -    case A_BRGINTSTAT:
 -    case A_BRGINTEN:
      case A_NSMSCEXP:
          qemu_log_mask(LOG_UNIMP,
                        "IoTKit SecCtl S block read: "
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
      }
-     switch (offset) {
+-    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
-+    case A_NSCCFG:
++    if (sz == 3 && !dc_isar_feature(aa32_fpdp_v2, s)) {
-+        s->nsccfg = value & 3;
++        return false;
-+        qemu_set_irq(s->nsc_cfg_irq, s->nsccfg);
++    }
-+        break;
++
-     case A_SECRESPCFG:
++    if (sz == 1 && !dc_isar_feature(aa32_fp16_arith, s)) {
-         value &= 1;
+         return false;
          s->secrespcfg = value;
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
          s->secppcinten = value & 0x00f000f3;
          foreach_ppc(s, iotkit_secctl_ppc_update_irq_enable);
          break;
 +    case A_BRGINTCLR:
 +        break;
 +    case A_BRGINTEN:
 +        s->brginten = value & 0xffff0000;
 +        break;
      case A_AHBNSPPCEXP0:
      case A_AHBNSPPCEXP1:
      case A_AHBNSPPCEXP2:
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
          ppc = &s->apbexp[offset_to_ppc_idx(offset)];
          iotkit_secctl_ppc_sp_write(ppc, value);
          break;
 -    case A_NSCCFG:
      case A_SECMSCINTCLR:
      case A_SECMSCINTEN:
 -    case A_BRGINTCLR:
 -    case A_BRGINTEN:
          qemu_log_mask(LOG_UNIMP,
                        "IoTKit SecCtl S block write: "
                        "unimplemented offset 0x%x\n", offset);
@@ -XXX,XX +XXX,XX @@ static void iotkit_secctl_reset(DeviceState *dev)
      s->secppcintstat = 0;
      s->secppcinten = 0;
      s->secrespcfg = 0;
 +    s->nsccfg = 0;
 +    s->brginten = 0;
      foreach_ppc(s, iotkit_secctl_reset_ppc);
  }
@@ -XXX,XX +XXX,XX @@ static void iotkit_secctl_init(Object *obj)
      }
-     qdev_init_gpio_out_named(dev, &s->sec_resp_cfg, "sec_resp_cfg", 1);
+     /* UNDEF accesses to D16-D31 if they don't exist */
-+    qdev_init_gpio_out_named(dev, &s->nsc_cfg_irq, "nsc_cfg", 1);
+-    if (dp && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
++    if (sz == 3 && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
-     memory_region_init_io(&s->s_regs, obj, &iotkit_secctl_s_ops,
+         return false;
-                           s, "iotkit-secctl-s-regs", 0x1000);
+     }
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription iotkit_secctl_vmstate = {
-         VMSTATE_UINT32(secppcintstat, IoTKitSecCtl),
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-         VMSTATE_UINT32(secppcinten, IoTKitSecCtl),
+         return true;
-         VMSTATE_UINT32(secrespcfg, IoTKitSecCtl),
+     }
-+        VMSTATE_UINT32(nsccfg, IoTKitSecCtl),
-+        VMSTATE_UINT32(brginten, IoTKitSecCtl),
+-    fpst = fpstatus_ptr(FPST_FPCR);
-         VMSTATE_STRUCT_ARRAY(apb, IoTKitSecCtl, IOTS_NUM_APB_PPC, 1,
++    if (sz == 1) {
-                              iotkit_secctl_ppc_vmstate, IoTKitSecCtlPPC),
++        fpst = fpstatus_ptr(FPST_FPCR_F16);
-         VMSTATE_STRUCT_ARRAY(apbexp, IoTKitSecCtl, IOTS_NUM_APB_EXP_PPC, 1,
++    } else {
 +        fpst = fpstatus_ptr(FPST_FPCR);
 +    }
      tcg_shift = tcg_const_i32(0);
      tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    if (dp) {
 +    if (sz == 3) {
          TCGv_i64 tcg_double, tcg_res;
          TCGv_i32 tcg_tmp;
          tcg_double = tcg_temp_new_i64();
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          tcg_single = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
          neon_load_reg32(tcg_single, rm);
 -        if (is_signed) {
 -            gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
 +        if (sz == 1) {
 +            if (is_signed) {
 +                gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
 +            } else {
 +                gen_helper_vfp_toulh(tcg_res, tcg_single, tcg_shift, fpst);
 +            }
          } else {
 -            gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
 +            if (is_signed) {
 +                gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
 +            } else {
 +                gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
 +            }
          }
          neon_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_res);
 --
-.16.2
+.20.1

-[Qemu-devel] [PULL 38/39] target/arm: Decode t32 simd 3reg and 2reg_scalar extension
+[PULL 17/47] target/arm: Implement VFP fp16 VSEL
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the fp16 versions of the VFP VSEL instruction.
-Happily, the bits are in the same places compared to a32.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200828183354.27913-18-peter.maydell@linaro.org
 ---
  target/arm/vfp-uncond.decode   |  6 ++++--
  target/arm/translate-vfp.c.inc | 16 ++++++++++++----
 files changed, 16 insertions(+), 6 deletions(-)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
 Message-id: 20180228193125.20577-16-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/translate.c | 14 +++++++++++++-
 file changed, 13 insertions(+), 1 deletion(-)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/vfp-uncond.decode
-+++ b/target/arm/translate.c
++++ b/target/arm/vfp-uncond.decode
-@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@
-                                default_exception_el(s));
+ @vfp_dnm_s   ................................ vm=%vm_sp vn=%vn_sp vd=%vd_sp
  @vfp_dnm_d   ................................ vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +VSEL        1111 1110 0. cc:2 .... .... 1001 .0.0 .... \
 +            vm=%vm_sp vn=%vn_sp vd=%vd_sp sz=1
  VSEL        1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
 -            vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
 +            vm=%vm_sp vn=%vn_sp vd=%vd_sp sz=2
  VSEL        1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
 -            vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
 +            vm=%vm_dp vn=%vn_dp vd=%vd_dp sz=3
  VMAXNM_hp   1111 1110 1.00 .... .... 1001 .0.0 ....         @vfp_dnm_s
  VMINNM_hp   1111 1110 1.00 .... .... 1001 .1.0 ....         @vfp_dnm_s
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check(DisasContext *s)
  static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
  {
      uint32_t rd, rn, rm;
 -    bool dp = a->dp;
 +    int sz = a->sz;
      if (!dc_isar_feature(aa32_vsel, s)) {
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
 +    if (sz == 3 && !dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
 +    if (sz == 1 && !dc_isar_feature(aa32_fp16_arith, s)) {
          return false;
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
 +    if (sz == 3 && !dc_isar_feature(aa32_simd_r32, s) &&
          ((a->vm | a->vn | a->vd) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          return true;
      }
 -    if (dp) {
 +    if (sz == 3) {
          TCGv_i64 frn, frm, dest;
          TCGv_i64 tmp, zero, zf, nf, vf;
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
              tcg_temp_free_i32(tmp);
              break;
          }
--        if (((insn >> 24) & 3) == 3) {
++        /* For fp16 the top half is always zeroes */
-+        if ((insn & 0xfe000a00) == 0xfc000800
++        if (sz == 1) {
-+            && arm_dc_feature(s, ARM_FEATURE_V8)) {
++            tcg_gen_andi_i32(dest, dest, 0xffff);
-+            /* The Thumb2 and ARM encodings are identical.  */
++        }
-+            if (disas_neon_insn_3same_ext(s, insn)) {
+         neon_store_reg32(dest, rd);
-+                goto illegal_op;
+         tcg_temp_free_i32(frn);
-+            }
+         tcg_temp_free_i32(frm);
 +        } else if ((insn & 0xff000a00) == 0xfe000800
 +                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
 +            /* The Thumb2 and ARM encodings are identical.  */
 +            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
 +                goto illegal_op;
 +            }
 +        } else if (((insn >> 24) & 3) == 3) {
              /* Translate into the equivalent ARM encoding.  */
              insn = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
              if (disas_neon_data_insn(s, insn)) {
 --
-.16.2
+.20.1

-New patch
+[PULL 18/47] target/arm: Implement VFP fp16 VRINT*
+Implement the fp16 version of the VFP VRINT* insns.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200828183354.27913-19-peter.maydell@linaro.org
 ---
  target/arm/helper.h            |  2 +
  target/arm/vfp-uncond.decode   |  6 ++-
  target/arm/vfp.decode          |  3 ++
  target/arm/vfp_helper.c        | 21 ++++++++
  target/arm/translate-vfp.c.inc | 98 +++++++++++++++++++++++++++++++---
 files changed, 122 insertions(+), 8 deletions(-)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(shr_cc, i32, env, i32, i32)
  DEF_HELPER_3(sar_cc, i32, env, i32, i32)
  DEF_HELPER_3(ror_cc, i32, env, i32, i32)
 +DEF_HELPER_FLAGS_2(rinth_exact, TCG_CALL_NO_RWG, f16, f16, ptr)
  DEF_HELPER_FLAGS_2(rints_exact, TCG_CALL_NO_RWG, f32, f32, ptr)
  DEF_HELPER_FLAGS_2(rintd_exact, TCG_CALL_NO_RWG, f64, f64, ptr)
 +DEF_HELPER_FLAGS_2(rinth, TCG_CALL_NO_RWG, f16, f16, ptr)
  DEF_HELPER_FLAGS_2(rints, TCG_CALL_NO_RWG, f32, f32, ptr)
  DEF_HELPER_FLAGS_2(rintd, TCG_CALL_NO_RWG, f64, f64, ptr)
 diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp-uncond.decode
 +++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VMINNM_sp   1111 1110 1.00 .... .... 1010 .1.0 ....         @vfp_dnm_s
  VMAXNM_dp   1111 1110 1.00 .... .... 1011 .0.0 ....         @vfp_dnm_d
  VMINNM_dp   1111 1110 1.00 .... .... 1011 .1.0 ....         @vfp_dnm_d
 +VRINT       1111 1110 1.11 10 rm:2 .... 1001 01.0 .... \
 +            vm=%vm_sp vd=%vd_sp sz=1
  VRINT       1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
 -            vm=%vm_sp vd=%vd_sp dp=0
 +            vm=%vm_sp vd=%vd_sp sz=2
  VRINT       1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
 -            vm=%vm_dp vd=%vd_dp dp=1
 +            vm=%vm_dp vd=%vd_dp sz=3
  # VCVT float to int with specified rounding mode; Vd is always single-precision
  VCVT        1111 1110 1.11 11 rm:2 .... 1001 op:1 1.0 .... \
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
  VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
               vd=%vd_sp vm=%vm_dp
 +VRINTR_hp    ---- 1110 1.11 0110 .... 1001 01.0 ....        @vfp_dm_ss
  VRINTR_sp    ---- 1110 1.11 0110 .... 1010 01.0 ....        @vfp_dm_ss
  VRINTR_dp    ---- 1110 1.11 0110 .... 1011 01.0 ....        @vfp_dm_dd
 +VRINTZ_hp    ---- 1110 1.11 0110 .... 1001 11.0 ....        @vfp_dm_ss
  VRINTZ_sp    ---- 1110 1.11 0110 .... 1010 11.0 ....        @vfp_dm_ss
  VRINTZ_dp    ---- 1110 1.11 0110 .... 1011 11.0 ....        @vfp_dm_dd
 +VRINTX_hp    ---- 1110 1.11 0111 .... 1001 01.0 ....        @vfp_dm_ss
  VRINTX_sp    ---- 1110 1.11 0111 .... 1010 01.0 ....        @vfp_dm_ss
  VRINTX_dp    ---- 1110 1.11 0111 .... 1011 01.0 ....        @vfp_dm_dd
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float64 VFP_HELPER(muladd, d)(float64 a, float64 b, float64 c, void *fpstp)
  }
  /* ARMv8 round to integral */
 +dh_ctype_f16 HELPER(rinth_exact)(dh_ctype_f16 x, void *fp_status)
 +{
 +    return float16_round_to_int(x, fp_status);
 +}
 +
  float32 HELPER(rints_exact)(float32 x, void *fp_status)
  {
      return float32_round_to_int(x, fp_status);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rintd_exact)(float64 x, void *fp_status)
      return float64_round_to_int(x, fp_status);
  }
 +dh_ctype_f16 HELPER(rinth)(dh_ctype_f16 x, void *fp_status)
 +{
 +    int old_flags = get_float_exception_flags(fp_status), new_flags;
 +    float16 ret;
 +
 +    ret = float16_round_to_int(x, fp_status);
 +
 +    /* Suppress any inexact exceptions the conversion produced */
 +    if (!(old_flags & float_flag_inexact)) {
 +        new_flags = get_float_exception_flags(fp_status);
 +        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
 +    }
 +
 +    return ret;
 +}
 +
  float32 HELPER(rints)(float32 x, void *fp_status)
  {
      int old_flags = get_float_exception_flags(fp_status), new_flags;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static const uint8_t fp_decode_rm[] = {
  static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
  {
      uint32_t rd, rm;
 -    bool dp = a->dp;
 +    int sz = a->sz;
      TCGv_ptr fpst;
      TCGv_i32 tcg_rmode;
      int rounding = fp_decode_rm[a->rm];
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
 +    if (sz == 3 && !dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
 +    if (sz == 1 && !dc_isar_feature(aa32_fp16_arith, s)) {
          return false;
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
 +    if (sz == 3 && !dc_isar_feature(aa32_simd_r32, s) &&
          ((a->vm | a->vd) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          return true;
      }
 -    fpst = fpstatus_ptr(FPST_FPCR);
 +    if (sz == 1) {
 +        fpst = fpstatus_ptr(FPST_FPCR_F16);
 +    } else {
 +        fpst = fpstatus_ptr(FPST_FPCR);
 +    }
      tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    if (dp) {
 +    if (sz == 3) {
          TCGv_i64 tcg_op;
          TCGv_i64 tcg_res;
          tcg_op = tcg_temp_new_i64();
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          tcg_op = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
          neon_load_reg32(tcg_op, rm);
 -        gen_helper_rints(tcg_res, tcg_op, fpst);
 +        if (sz == 1) {
 +            gen_helper_rinth(tcg_res, tcg_op, fpst);
 +        } else {
 +            gen_helper_rints(tcg_res, tcg_op, fpst);
 +        }
          neon_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_op);
          tcg_temp_free_i32(tcg_res);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
      return true;
  }
 +static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i32 tmp;
 +
 +    if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i32();
 +    neon_load_reg32(tmp, a->vm);
 +    fpst = fpstatus_ptr(FPST_FPCR_F16);
 +    gen_helper_rinth(tmp, tmp, fpst);
 +    neon_store_reg32(tmp, a->vd);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 +
  static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
  {
      TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
      return true;
  }
 +static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i32 tmp;
 +    TCGv_i32 tcg_rmode;
 +
 +    if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i32();
 +    neon_load_reg32(tmp, a->vm);
 +    fpst = fpstatus_ptr(FPST_FPCR_F16);
 +    tcg_rmode = tcg_const_i32(float_round_to_zero);
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +    gen_helper_rinth(tmp, tmp, fpst);
 +    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 +    neon_store_reg32(tmp, a->vd);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tcg_rmode);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 +
  static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
  {
      TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
      return true;
  }
 +static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i32 tmp;
 +
 +    if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tmp = tcg_temp_new_i32();
 +    neon_load_reg32(tmp, a->vm);
 +    fpst = fpstatus_ptr(FPST_FPCR_F16);
 +    gen_helper_rinth_exact(tmp, tmp, fpst);
 +    neon_store_reg32(tmp, a->vd);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 +
  static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
  {
      TCGv_ptr fpst;
 --
 2.20.1

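In the VRINT patch above, HELPER(rinth) (used by VRINT, VRINTR and VRINTZ) differs from HELPER(rinth_exact) (used by VRINTX) only in how it treats the inexact flag. It samples the exception flags before rounding and, if inexact was not already set, clears any inexact flag that the rounding raised. The C sketch below models only that save-and-clear pattern, under stated assumptions. toy_fp_status, toy_round_to_int, rint_exact and rint_no_inexact are hypothetical names. The value type is double rather than float16, and rounding is toward zero only. The real helpers call softfloat's float16_round_to_int() under whatever rounding mode the float_status holds.

```c
#include <stdint.h>

/* Hypothetical stand-ins for softfloat exception flags (not QEMU's API). */
enum { FLAG_INEXACT = 1u << 0, FLAG_INVALID = 1u << 1 };

typedef struct {
    unsigned flags;   /* accumulated ("sticky") exception flags */
} toy_fp_status;

/* Toy round-to-integral: round toward zero only, for finite |x| < 2^63.
 * Like a softfloat round_to_int, it raises INEXACT when the value changes. */
static double toy_round_to_int(double x, toy_fp_status *st)
{
    double r = (double)(int64_t)x;   /* C conversion truncates toward zero */
    if (r != x) {
        st->flags |= FLAG_INEXACT;
    }
    return r;
}

/* VRINTX-style: any INEXACT produced by the rounding is kept. */
static double rint_exact(double x, toy_fp_status *st)
{
    return toy_round_to_int(x, st);
}

/* VRINTR/VRINTZ-style, same shape as HELPER(rinth): if INEXACT was clear
 * before the call, clear any INEXACT the rounding produced.  Other flags
 * are left untouched. */
static double rint_no_inexact(double x, toy_fp_status *st)
{
    unsigned old_flags = st->flags;
    double r = toy_round_to_int(x, st);

    if (!(old_flags & FLAG_INEXACT)) {
        st->flags &= ~(unsigned)FLAG_INEXACT;
    }
    return r;
}
```

The wrapper undoes only what this one operation added. If inexact was already set by an earlier instruction, it stays set, because the cumulative exception bits are meant to accumulate.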
-[Qemu-devel] [PULL 25/39] target/arm: Refactor disas_simd_indexed decode
+[PULL 19/47] target/arm: Implement new VFP fp16 insn VINS
-From: Richard Henderson <richard.henderson@linaro.org>
+The fp16 extension includes a new instruction VINS, which copies the
 lower 16 bits of a 32-bit source VFP register into the upper 16 bits
 of the destination.  Implement it.
-Include the U bit in the switches rather than testing separately.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200828183354.27913-20-peter.maydell@linaro.org
 ---
  target/arm/vfp-uncond.decode   |  3 +++
  target/arm/translate-vfp.c.inc | 28 ++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20180228193125.20577-3-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/translate-a64.c | 129 +++++++++++++++++++++------------------------
 1 file changed, 61 insertions(+), 68 deletions(-)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/vfp-uncond.decode
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/vfp-uncond.decode
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ VCVT        1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
-     int index;
+             vm=%vm_sp vd=%vd_sp sz=2
-     TCGv_ptr fpst;
+ VCVT        1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
+             vm=%vm_dp vd=%vd_sp sz=3
--    switch (opcode) {
++
--    case 0x0: /* MLA */
++VINS        1111 1110 1.11 0000 .... 1010 11 . 0 .... \
--    case 0x4: /* MLS */
++            vd=%vd_sp vm=%vm_sp
--        if (!u || is_scalar) {
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
-+    switch (16 * u + opcode) {
+index XXXXXXX..XXXXXXX 100644
-+    case 0x08: /* MUL */
+--- a/target/arm/translate-vfp.c.inc
-+    case 0x10: /* MLA */
++++ b/target/arm/translate-vfp.c.inc
-+    case 0x14: /* MLS */
+@@ -XXX,XX +XXX,XX @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
-+        if (is_scalar) {
-             unallocated_encoding(s);
+     return false;
-             return;
+ }
-         }
++
-         break;
++static bool trans_VINS(DisasContext *s, arg_VINS *a)
--    case 0x2: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
++{
--    case 0x6: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
++    TCGv_i32 rd, rm;
--    case 0xa: /* SMULL, SMULL2, UMULL, UMULL2 */
++
-+    case 0x02: /* SMLAL, SMLAL2 */
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
-+    case 0x12: /* UMLAL, UMLAL2 */
++        return false;
-+    case 0x06: /* SMLSL, SMLSL2 */
++    }
-+    case 0x16: /* UMLSL, UMLSL2 */
++
-+    case 0x0a: /* SMULL, SMULL2 */
++    if (s->vec_len != 0 || s->vec_stride != 0) {
-+    case 0x1a: /* UMULL, UMULL2 */
++        return false;
-         if (is_scalar) {
++    }
-             unallocated_encoding(s);
++
-             return;
++    if (!vfp_access_check(s)) {
-         }
++        return true;
-         is_long = true;
++    }
-         break;
++
--    case 0x3: /* SQDMLAL, SQDMLAL2 */
++    /* Insert low half of Vm into high half of Vd */
--    case 0x7: /* SQDMLSL, SQDMLSL2 */
++    rm = tcg_temp_new_i32();
--    case 0xb: /* SQDMULL, SQDMULL2 */
++    rd = tcg_temp_new_i32();
-+    case 0x03: /* SQDMLAL, SQDMLAL2 */
++    neon_load_reg32(rm, a->vm);
-+    case 0x07: /* SQDMLSL, SQDMLSL2 */
++    neon_load_reg32(rd, a->vd);
-+    case 0x0b: /* SQDMULL, SQDMULL2 */
++    tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
-         is_long = true;
++    neon_store_reg32(rd, a->vd);
--        /* fall through */
++    tcg_temp_free_i32(rm);
--    case 0xc: /* SQDMULH */
++    tcg_temp_free_i32(rd);
--    case 0xd: /* SQRDMULH */
++    return true;
--        if (u) {
++}
 -            unallocated_encoding(s);
 -            return;
 -        }
          break;
 -    case 0x8: /* MUL */
 -        if (u || is_scalar) {
 -            unallocated_encoding(s);
 -            return;
 -        }
 +    case 0x0c: /* SQDMULH */
 +    case 0x0d: /* SQRDMULH */
          break;
 -    case 0x1: /* FMLA */
 -    case 0x5: /* FMLS */
 -        if (u) {
 -            unallocated_encoding(s);
 -            return;
 -        }
 -        /* fall through */
 -    case 0x9: /* FMUL, FMULX */
 +    case 0x01: /* FMLA */
 +    case 0x05: /* FMLS */
 +    case 0x09: /* FMUL */
 +    case 0x19: /* FMULX */
          if (size == 1) {
              unallocated_encoding(s);
              return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
              read_vec_element(s, tcg_op, rn, pass, MO_64);
 -            switch (opcode) {
 -            case 0x5: /* FMLS */
 +            switch (16 * u + opcode) {
 +            case 0x05: /* FMLS */
                  /* As usual for ARM, separate negation for fused multiply-add */
                  gen_helper_vfp_negd(tcg_op, tcg_op);
                  /* fall through */
 -            case 0x1: /* FMLA */
 +            case 0x01: /* FMLA */
                  read_vec_element(s, tcg_res, rd, pass, MO_64);
                  gen_helper_vfp_muladdd(tcg_res, tcg_op, tcg_idx, tcg_res, fpst);
                  break;
 -            case 0x9: /* FMUL, FMULX */
 -                if (u) {
 -                    gen_helper_vfp_mulxd(tcg_res, tcg_op, tcg_idx, fpst);
 -                } else {
 -                    gen_helper_vfp_muld(tcg_res, tcg_op, tcg_idx, fpst);
 -                }
 +            case 0x09: /* FMUL */
 +                gen_helper_vfp_muld(tcg_res, tcg_op, tcg_idx, fpst);
 +                break;
 +            case 0x19: /* FMULX */
 +                gen_helper_vfp_mulxd(tcg_res, tcg_op, tcg_idx, fpst);
                  break;
              default:
                  g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
              read_vec_element_i32(s, tcg_op, rn, pass, is_scalar ? size : MO_32);
 -            switch (opcode) {
 -            case 0x0: /* MLA */
 -            case 0x4: /* MLS */
 -            case 0x8: /* MUL */
 +            switch (16 * u + opcode) {
 +            case 0x08: /* MUL */
 +            case 0x10: /* MLA */
 +            case 0x14: /* MLS */
              {
                  static NeonGenTwoOpFn * const fns[2][2] = {
                      { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                  genfn(tcg_res, tcg_op, tcg_res);
                  break;
              }
 -            case 0x5: /* FMLS */
 -            case 0x1: /* FMLA */
 +            case 0x05: /* FMLS */
 +            case 0x01: /* FMLA */
                  read_vec_element_i32(s, tcg_res, rd, pass,
                                       is_scalar ? size : MO_32);
                  switch (size) {
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                      g_assert_not_reached();
                  }
                  break;
 -            case 0x9: /* FMUL, FMULX */
 +            case 0x09: /* FMUL */
                  switch (size) {
                  case 1:
 -                    if (u) {
 -                        if (is_scalar) {
 -                            gen_helper_advsimd_mulxh(tcg_res, tcg_op,
 -                                                     tcg_idx, fpst);
 -                        } else {
 -                            gen_helper_advsimd_mulx2h(tcg_res, tcg_op,
 -                                                      tcg_idx, fpst);
 -                        }
 +                    if (is_scalar) {
 +                        gen_helper_advsimd_mulh(tcg_res, tcg_op,
 +                                                tcg_idx, fpst);
                      } else {
 -                        if (is_scalar) {
 -                            gen_helper_advsimd_mulh(tcg_res, tcg_op,
 -                                                    tcg_idx, fpst);
 -                        } else {
 -                            gen_helper_advsimd_mul2h(tcg_res, tcg_op,
 -                                                     tcg_idx, fpst);
 -                        }
 +                        gen_helper_advsimd_mul2h(tcg_res, tcg_op,
 +                                                 tcg_idx, fpst);
                      }
                      break;
                  case 2:
 -                    if (u) {
 -                        gen_helper_vfp_mulxs(tcg_res, tcg_op, tcg_idx, fpst);
 -                    } else {
 -                        gen_helper_vfp_muls(tcg_res, tcg_op, tcg_idx, fpst);
 -                    }
 +                    gen_helper_vfp_muls(tcg_res, tcg_op, tcg_idx, fpst);
                      break;
                  default:
                      g_assert_not_reached();
                  }
                  break;
 -            case 0xc: /* SQDMULH */
 +            case 0x19: /* FMULX */
 +                switch (size) {
 +                case 1:
 +                    if (is_scalar) {
 +                        gen_helper_advsimd_mulxh(tcg_res, tcg_op,
 +                                                 tcg_idx, fpst);
 +                    } else {
 +                        gen_helper_advsimd_mulx2h(tcg_res, tcg_op,
 +                                                  tcg_idx, fpst);
 +                    }
 +                    break;
 +                case 2:
 +                    gen_helper_vfp_mulxs(tcg_res, tcg_op, tcg_idx, fpst);
 +                    break;
 +                default:
 +                    g_assert_not_reached();
 +                }
 +                break;
 +            case 0x0c: /* SQDMULH */
                  if (size == 1) {
                      gen_helper_neon_qdmulh_s16(tcg_res, cpu_env,
                                                 tcg_op, tcg_idx);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                                                 tcg_op, tcg_idx);
                  }
                  break;
 -            case 0xd: /* SQRDMULH */
 +            case 0x0d: /* SQRDMULH */
                  if (size == 1) {
                      gen_helper_neon_qrdmulh_s16(tcg_res, cpu_env,
                                                  tcg_op, tcg_idx);
 --
-2.16.2
+2.20.1

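trans_VINS in the patch above reads Sd and Sm and then calls tcg_gen_deposit_i32(rd, rd, rm, 16, 16). That call replaces bits [31:16] of rd with bits [15:0] of rm and keeps bits [15:0] of rd. The sketch below is a plain-C model of the value this computes. deposit32_sketch() and vins_model() are hypothetical helpers written for illustration, not QEMU functions. QEMU's own deposit32() lives in include/qemu/bitops.h and is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical bit-field deposit: return @value with bits
 * [start, start+length) replaced by the low @length bits of @fieldval.
 * Assumes 0 < length and start + length <= 32. */
static uint32_t deposit32_sketch(uint32_t value, unsigned start,
                                 unsigned length, uint32_t fieldval)
{
    uint32_t field_mask = (length == 32) ? 0xffffffffu
                                         : ((1u << length) - 1u);
    uint32_t mask = field_mask << start;

    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Value-level model of VINS Sd, Sm: low half of Sm goes into the
 * high half of Sd, and the low half of Sd is preserved. */
static uint32_t vins_model(uint32_t sd, uint32_t sm)
{
    return deposit32_sketch(sd, 16, 16, sm);
}
```

For example, with Sd = 0x1234abcd and Sm = 0x5678ef01, the model yields 0xef01abcd. The UNDEF checks that precede the TCG ops in trans_VINS (the aa32_fp16_arith feature test, and a non-zero vec_len or vec_stride) have no counterpart in this value-level model.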
-[Qemu-devel] [PULL 22/39] hw/arm/iotkit: Model Arm IOT Kit
+[PULL 20/47] target/arm: Implement new VFP fp16 insn VMOVX
-Model the Arm IoT Kit documented in
+The fp16 extension includes a new instruction VMOVX, which copies the
-http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+upper 16 bits of a 32-bit source VFP register into the lower 16
+bits of the destination and zeroes the high half of the destination.
-The Arm IoT Kit is a subsystem which includes a CPU and some devices,
+Implement it.
 and is intended to be extended by adding extra devices to form a
 complete system.  It is used in the MPS2 board's AN505 image for the
 Cortex-M33.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-19-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-21-peter.maydell@linaro.org
 ---
- hw/arm/Makefile.objs            |   1 +
+ target/arm/vfp-uncond.decode   |  3 +++
- include/hw/arm/iotkit.h         | 109 ++++++++
+ target/arm/translate-vfp.c.inc | 25 +++++++++++++++++++++++++
- hw/arm/iotkit.c                 | 598 ++++++++++++++++++++++++++++++++++++++++
+2 files changed, 28 insertions(+)
  default-configs/arm-softmmu.mak |   1 +
 4 files changed, 709 insertions(+)
  create mode 100644 include/hw/arm/iotkit.h
  create mode 100644 hw/arm/iotkit.c
-diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
+diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/Makefile.objs
+--- a/target/arm/vfp-uncond.decode
-+++ b/hw/arm/Makefile.objs
++++ b/target/arm/vfp-uncond.decode
-@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o sabrelite.o
+@@ -XXX,XX +XXX,XX @@ VCVT        1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
- obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
+ VCVT        1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
- obj-$(CONFIG_MPS2) += mps2.o
+             vm=%vm_dp vd=%vd_sp sz=3
- obj-$(CONFIG_MSF2) += msf2-soc.o msf2-som.o
-+obj-$(CONFIG_IOTKIT) += iotkit.o
++VMOVX       1111 1110 1.11 0000 .... 1010 01 . 0 .... \
-diff --git a/include/hw/arm/iotkit.h b/include/hw/arm/iotkit.h
++            vd=%vd_sp vm=%vm_sp
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/arm/iotkit.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM IoT Kit
 + *
 + * Copyright (c) 2018 Linaro Limited
 + * Written by Peter Maydell
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 or
 + * (at your option) any later version.
 + */
 +
-+/* This is a model of the Arm IoT Kit which is documented in
+ VINS        1111 1110 1.11 0000 .... 1010 11 . 0 .... \
-+ * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+             vd=%vd_sp vm=%vm_sp
-+ * It contains:
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
-+ *  a Cortex-M33
+index XXXXXXX..XXXXXXX 100644
-+ *  the IDAU
+--- a/target/arm/translate-vfp.c.inc
-+ *  some timers and watchdogs
++++ b/target/arm/translate-vfp.c.inc
-+ *  two peripheral protection controllers
+@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
-+ *  a memory protection controller
+     tcg_temp_free_i32(rd);
-+ *  a security controller
+     return true;
-+ *  a bus fabric which arranges that some parts of the address
+ }
 + *  space are secure and non-secure aliases of each other
 + *
 + * QEMU interface:
 + *  + QOM property "memory" is a MemoryRegion containing the devices provided
 + *    by the board model.
 + *  + QOM property "MAINCLK" is the frequency of the main system clock
 + *  + QOM property "EXP_NUMIRQ" sets the number of expansion interrupts
 + *  + Named GPIO inputs "EXP_IRQ" 0..n are the expansion interrupts, which
 + *    are wired to the NVIC lines 32 .. n+32
 + * Controlling up to 4 APB expansion PPCs which a system using the IoTKit
 + * might provide:
 + *  + named GPIO outputs apb_ppcexp{0,1,2,3}_nonsec[0..15]
 + *  + named GPIO outputs apb_ppcexp{0,1,2,3}_ap[0..15]
 + *  + named GPIO outputs apb_ppcexp{0,1,2,3}_irq_enable
 + *  + named GPIO outputs apb_ppcexp{0,1,2,3}_irq_clear
 + *  + named GPIO inputs apb_ppcexp{0,1,2,3}_irq_status
 + * Controlling each of the 4 expansion AHB PPCs which a system using the IoTKit
 + * might provide:
 + *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_nonsec[0..15]
 + *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_ap[0..15]
 + *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_enable
 + *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_clear
 + *  + named GPIO inputs ahb_ppcexp{0,1,2,3}_irq_status
 + */
 +
-+#ifndef IOTKIT_H
++static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
-+#define IOTKIT_H
++{
 +    TCGv_i32 rm;
 +
-+#include "hw/sysbus.h"
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
-+#include "hw/arm/armv7m.h"
++        return false;
 +#include "hw/misc/iotkit-secctl.h"
 +#include "hw/misc/tz-ppc.h"
 +#include "hw/timer/cmsdk-apb-timer.h"
 +#include "hw/misc/unimp.h"
 +#include "hw/or-irq.h"
 +#include "hw/core/split-irq.h"
 +
 +#define TYPE_IOTKIT "iotkit"
 +#define IOTKIT(obj) OBJECT_CHECK(IoTKit, (obj), TYPE_IOTKIT)
 +
 +/* We have an IRQ splitter and an OR gate input for each external PPC
 + * and the 2 internal PPCs
 + */
 +#define NUM_EXTERNAL_PPCS (IOTS_NUM_AHB_EXP_PPC + IOTS_NUM_APB_EXP_PPC)
 +#define NUM_PPCS (NUM_EXTERNAL_PPCS + 2)
 +
 +typedef struct IoTKit {
 +    /*< private >*/
 +    SysBusDevice parent_obj;
 +
 +    /*< public >*/
 +    ARMv7MState armv7m;
 +    IoTKitSecCtl secctl;
 +    TZPPC apb_ppc0;
 +    TZPPC apb_ppc1;
 +    CMSDKAPBTIMER timer0;
 +    CMSDKAPBTIMER timer1;
 +    qemu_or_irq ppc_irq_orgate;
 +    SplitIRQ sec_resp_splitter;
 +    SplitIRQ ppc_irq_splitter[NUM_PPCS];
 +
 +    UnimplementedDeviceState dualtimer;
 +    UnimplementedDeviceState s32ktimer;
 +
 +    MemoryRegion container;
 +    MemoryRegion alias1;
 +    MemoryRegion alias2;
 +    MemoryRegion alias3;
 +    MemoryRegion sram0;
 +
 +    qemu_irq *exp_irqs;
 +    qemu_irq ppc0_irq;
 +    qemu_irq ppc1_irq;
 +    qemu_irq sec_resp_cfg;
 +    qemu_irq sec_resp_cfg_in;
 +    qemu_irq nsc_cfg_in;
 +
 +    qemu_irq irq_status_in[NUM_EXTERNAL_PPCS];
 +
 +    uint32_t nsccfg;
 +
 +    /* Properties */
 +    MemoryRegion *board_memory;
 +    uint32_t exp_numirq;
 +    uint32_t mainclk_frq;
 +} IoTKit;
 +
 +#endif
 diff --git a/hw/arm/iotkit.c b/hw/arm/iotkit.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/arm/iotkit.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Arm IoT Kit
 + *
 + * Copyright (c) 2018 Linaro Limited
 + * Written by Peter Maydell
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 or
 + * (at your option) any later version.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "qapi/error.h"
 +#include "trace.h"
 +#include "hw/sysbus.h"
 +#include "hw/registerfields.h"
 +#include "hw/arm/iotkit.h"
 +#include "hw/misc/unimp.h"
 +#include "hw/arm/arm.h"
 +
 +/* Create an alias region of @size bytes starting at @base
 + * which mirrors the memory starting at @orig.
 + */
 +static void make_alias(IoTKit *s, MemoryRegion *mr, const char *name,
 +                       hwaddr base, hwaddr size, hwaddr orig)
 +{
 +    memory_region_init_alias(mr, NULL, name, &s->container, orig, size);
 +    /* The alias is even lower priority than unimplemented_device regions */
 +    memory_region_add_subregion_overlap(&s->container, base, mr, -1500);
 +}
 +
 +static void init_sysbus_child(Object *parent, const char *childname,
 +                              void *child, size_t childsize,
 +                              const char *childtype)
 +{
 +    object_initialize(child, childsize, childtype);
 +    object_property_add_child(parent, childname, OBJECT(child), &error_abort);
 +    qdev_set_parent_bus(DEVICE(child), sysbus_get_default());
 +}
 +
 +static void irq_status_forwarder(void *opaque, int n, int level)
 +{
 +    qemu_irq destirq = opaque;
 +
 +    qemu_set_irq(destirq, level);
 +}
 +
 +static void nsccfg_handler(void *opaque, int n, int level)
 +{
 +    IoTKit *s = IOTKIT(opaque);
 +
 +    s->nsccfg = level;
 +}
 +
 +static void iotkit_forward_ppc(IoTKit *s, const char *ppcname, int ppcnum)
 +{
 +    /* Each of the 4 AHB and 4 APB PPCs that might be present in a
 +     * system using the IoTKit has a collection of control lines which
 +     * are provided by the security controller and which we want to
 +     * expose as control lines on the IoTKit device itself, so the
 +     * code using the IoTKit can wire them up to the PPCs.
 +     */
 +    SplitIRQ *splitter = &s->ppc_irq_splitter[ppcnum];
 +    DeviceState *iotkitdev = DEVICE(s);
 +    DeviceState *dev_secctl = DEVICE(&s->secctl);
 +    DeviceState *dev_splitter = DEVICE(splitter);
 +    char *name;
 +
 +    name = g_strdup_printf("%s_nonsec", ppcname);
 +    qdev_pass_gpios(dev_secctl, iotkitdev, name);
 +    g_free(name);
 +    name = g_strdup_printf("%s_ap", ppcname);
 +    qdev_pass_gpios(dev_secctl, iotkitdev, name);
 +    g_free(name);
 +    name = g_strdup_printf("%s_irq_enable", ppcname);
 +    qdev_pass_gpios(dev_secctl, iotkitdev, name);
 +    g_free(name);
 +    name = g_strdup_printf("%s_irq_clear", ppcname);
 +    qdev_pass_gpios(dev_secctl, iotkitdev, name);
 +    g_free(name);
 +
 +    /* irq_status is a little more tricky, because we need to
 +     * split it so we can send it both to the security controller
 +     * and to our OR gate for the NVIC interrupt line.
 +     * Connect up the splitter's outputs, and create a GPIO input
 +     * which will pass the line state to the input splitter.
 +     */
 +    name = g_strdup_printf("%s_irq_status", ppcname);
 +    qdev_connect_gpio_out(dev_splitter, 0,
 +                          qdev_get_gpio_in_named(dev_secctl,
 +                                                 name, 0));
 +    qdev_connect_gpio_out(dev_splitter, 1,
 +                          qdev_get_gpio_in(DEVICE(&s->ppc_irq_orgate), ppcnum));
 +    s->irq_status_in[ppcnum] = qdev_get_gpio_in(dev_splitter, 0);
 +    qdev_init_gpio_in_named_with_opaque(iotkitdev, irq_status_forwarder,
 +                                        s->irq_status_in[ppcnum], name, 1);
 +    g_free(name);
 +}
 +
 +static void iotkit_forward_sec_resp_cfg(IoTKit *s)
 +{
 +    /* Forward the 3rd output from the splitter device as a
 +     * named GPIO output of the iotkit object.
 +     */
 +    DeviceState *dev = DEVICE(s);
 +    DeviceState *dev_splitter = DEVICE(&s->sec_resp_splitter);
 +
 +    qdev_init_gpio_out_named(dev, &s->sec_resp_cfg, "sec_resp_cfg", 1);
 +    s->sec_resp_cfg_in = qemu_allocate_irq(irq_status_forwarder,
 +                                           s->sec_resp_cfg, 1);
 +    qdev_connect_gpio_out(dev_splitter, 2, s->sec_resp_cfg_in);
 +}
 +
 +static void iotkit_init(Object *obj)
 +{
 +    IoTKit *s = IOTKIT(obj);
 +    int i;
 +
 +    memory_region_init(&s->container, obj, "iotkit-container", UINT64_MAX);
 +
 +    init_sysbus_child(obj, "armv7m", &s->armv7m, sizeof(s->armv7m),
 +                      TYPE_ARMV7M);
 +    qdev_prop_set_string(DEVICE(&s->armv7m), "cpu-type",
 +                         ARM_CPU_TYPE_NAME("cortex-m33"));
 +
 +    init_sysbus_child(obj, "secctl", &s->secctl, sizeof(s->secctl),
 +                      TYPE_IOTKIT_SECCTL);
 +    init_sysbus_child(obj, "apb-ppc0", &s->apb_ppc0, sizeof(s->apb_ppc0),
 +                      TYPE_TZ_PPC);
 +    init_sysbus_child(obj, "apb-ppc1", &s->apb_ppc1, sizeof(s->apb_ppc1),
 +                      TYPE_TZ_PPC);
 +    init_sysbus_child(obj, "timer0", &s->timer0, sizeof(s->timer0),
 +                      TYPE_CMSDK_APB_TIMER);
 +    init_sysbus_child(obj, "timer1", &s->timer1, sizeof(s->timer1),
 +                      TYPE_CMSDK_APB_TIMER);
 +    init_sysbus_child(obj, "dualtimer", &s->dualtimer, sizeof(s->dualtimer),
 +                      TYPE_UNIMPLEMENTED_DEVICE);
 +    object_initialize(&s->ppc_irq_orgate, sizeof(s->ppc_irq_orgate),
 +                      TYPE_OR_IRQ);
 +    object_property_add_child(obj, "ppc-irq-orgate",
 +                              OBJECT(&s->ppc_irq_orgate), &error_abort);
 +    object_initialize(&s->sec_resp_splitter, sizeof(s->sec_resp_splitter),
 +                      TYPE_SPLIT_IRQ);
 +    object_property_add_child(obj, "sec-resp-splitter",
 +                              OBJECT(&s->sec_resp_splitter), &error_abort);
 +    for (i = 0; i < ARRAY_SIZE(s->ppc_irq_splitter); i++) {
 +        char *name = g_strdup_printf("ppc-irq-splitter-%d", i);
 +        SplitIRQ *splitter = &s->ppc_irq_splitter[i];
 +
 +        object_initialize(splitter, sizeof(*splitter), TYPE_SPLIT_IRQ);
 +        object_property_add_child(obj, name, OBJECT(splitter), &error_abort);
 +    }
 +    init_sysbus_child(obj, "s32ktimer", &s->s32ktimer, sizeof(s->s32ktimer),
 +                      TYPE_UNIMPLEMENTED_DEVICE);
 +}
 +
 +static void iotkit_exp_irq(void *opaque, int n, int level)
 +{
 +    IoTKit *s = IOTKIT(opaque);
 +
 +    qemu_set_irq(s->exp_irqs[n], level);
 +}
 +
 +static void iotkit_realize(DeviceState *dev, Error **errp)
 +{
 +    IoTKit *s = IOTKIT(dev);
 +    int i;
 +    MemoryRegion *mr;
 +    Error *err = NULL;
 +    SysBusDevice *sbd_apb_ppc0;
 +    SysBusDevice *sbd_secctl;
 +    DeviceState *dev_apb_ppc0;
 +    DeviceState *dev_apb_ppc1;
 +    DeviceState *dev_secctl;
 +    DeviceState *dev_splitter;
 +
 +    if (!s->board_memory) {
 +        error_setg(errp, "memory property was not set");
 +        return;
 +    }
 +
-+    if (!s->mainclk_frq) {
++    if (s->vec_len != 0 || s->vec_stride != 0) {
-+        error_setg(errp, "MAINCLK property was not set");
++        return false;
 +        return;
 +    }
 +
-+    /* Handling of which devices should be available only to secure
++    if (!vfp_access_check(s)) {
-+     * code is usually done differently for M profile than for A profile.
++        return true;
 +     * Instead of putting some devices only into the secure address space,
 +     * devices exist in both address spaces but with hard-wired security
 +     * permissions that will cause the CPU to fault for non-secure accesses.
 +     *
 +     * The IoTKit has an IDAU (Implementation Defined Access Unit),
 +     * which specifies hard-wired security permissions for different
 +     * areas of the physical address space. For the IoTKit IDAU, the
 +     * top 4 bits of the physical address are the IDAU region ID, and
 +     * if bit 28 (ie the lowest bit of the ID) is 0 then this is an NS
 +     * region, otherwise it is an S region.
 +     *
 +     * The various devices and RAMs are generally all mapped twice,
 +     * once into a region that the IDAU defines as secure and once
 +     * into a non-secure region. They sit behind either a Memory
 +     * Protection Controller (for RAM) or a Peripheral Protection
 +     * Controller (for devices), which allow a more fine grained
 +     * configuration of whether non-secure accesses are permitted.
 +     *
 +     * (The other place that guest software can configure security
 +     * permissions is in the architected SAU (Security Attribution
 +     * Unit), which is entirely inside the CPU. The IDAU can upgrade
 +     * the security attributes for a region to more restrictive than
 +     * the SAU specifies, but cannot downgrade them.)
 +     *
 +     * 0x10000000..0x1fffffff  alias of 0x00000000..0x0fffffff
 +     * 0x20000000..0x2007ffff  32KB FPGA block RAM
 +     * 0x30000000..0x3fffffff  alias of 0x20000000..0x2fffffff
 +     * 0x40000000..0x4000ffff  base peripheral region 1
 +     * 0x40010000..0x4001ffff  CPU peripherals (none for IoTKit)
 +     * 0x40020000..0x4002ffff  system control element peripherals
 +     * 0x40080000..0x400fffff  base peripheral region 2
 +     * 0x50000000..0x5fffffff  alias of 0x40000000..0x4fffffff
 +     */
 +
 +    memory_region_add_subregion_overlap(&s->container, 0, s->board_memory, -1);
 +
 +    qdev_prop_set_uint32(DEVICE(&s->armv7m), "num-irq", s->exp_numirq + 32);
 +    /* In real hardware the initial Secure VTOR is set from the INITSVTOR0
 +     * register in the IoT Kit System Control Register block, and the
 +     * initial value of that is in turn specifiable by the FPGA that
 +     * instantiates the IoT Kit. In QEMU we don't implement this wrinkle,
 +     * and simply set the CPU's init-svtor to the IoT Kit default value.
 +     */
 +    qdev_prop_set_uint32(DEVICE(&s->armv7m), "init-svtor", 0x10000000);
 +    object_property_set_link(OBJECT(&s->armv7m), OBJECT(&s->container),
 +                             "memory", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    object_property_set_link(OBJECT(&s->armv7m), OBJECT(s), "idau", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    object_property_set_bool(OBJECT(&s->armv7m), true, "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +
-+    /* Connect our EXP_IRQ GPIOs to the NVIC's lines 32 and up. */
++    /* Set Vd to high half of Vm */
-+    s->exp_irqs = g_new(qemu_irq, s->exp_numirq);
++    rm = tcg_temp_new_i32();
-+    for (i = 0; i < s->exp_numirq; i++) {
++    neon_load_reg32(rm, a->vm);
-+        s->exp_irqs[i] = qdev_get_gpio_in(DEVICE(&s->armv7m), i + 32);
++    tcg_gen_shri_i32(rm, rm, 16);
-+    }
++    neon_store_reg32(rm, a->vd);
-+    qdev_init_gpio_in_named(dev, iotkit_exp_irq, "EXP_IRQ", s->exp_numirq);
++    tcg_temp_free_i32(rm);
-+
++    return true;
 +    /* Set up the big aliases first */
 +    make_alias(s, &s->alias1, "alias 1", 0x10000000, 0x10000000, 0x00000000);
 +    make_alias(s, &s->alias2, "alias 2", 0x30000000, 0x10000000, 0x20000000);
 +    /* The 0x50000000..0x5fffffff region is not a pure alias: it has
 +     * a few extra devices that only appear there (generally the
 +     * control interfaces for the protection controllers).
 +     * We implement this by mapping those devices over the top of this
 +     * alias MR at a higher priority.
 +     */
 +    make_alias(s, &s->alias3, "alias 3", 0x50000000, 0x10000000, 0x40000000);
 +
 +    /* This RAM should be behind a Memory Protection Controller, but we
 +     * don't implement that yet.
 +     */
 +    memory_region_init_ram(&s->sram0, NULL, "iotkit.sram0", 0x00008000, &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    memory_region_add_subregion(&s->container, 0x20000000, &s->sram0);
 +
 +    /* Security controller */
 +    object_property_set_bool(OBJECT(&s->secctl), true, "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    sbd_secctl = SYS_BUS_DEVICE(&s->secctl);
 +    dev_secctl = DEVICE(&s->secctl);
 +    sysbus_mmio_map(sbd_secctl, 0, 0x50080000);
 +    sysbus_mmio_map(sbd_secctl, 1, 0x40080000);
 +
 +    s->nsc_cfg_in = qemu_allocate_irq(nsccfg_handler, s, 1);
 +    qdev_connect_gpio_out_named(dev_secctl, "nsc_cfg", 0, s->nsc_cfg_in);
 +
 +    /* The sec_resp_cfg output from the security controller must be split into
 +     * multiple lines, one for each of the PPCs within the IoTKit and one
 +     * that will be an output from the IoTKit to the system.
 +     */
 +    object_property_set_int(OBJECT(&s->sec_resp_splitter), 3,
 +                            "num-lines", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    object_property_set_bool(OBJECT(&s->sec_resp_splitter), true,
 +                             "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    dev_splitter = DEVICE(&s->sec_resp_splitter);
 +    qdev_connect_gpio_out_named(dev_secctl, "sec_resp_cfg", 0,
 +                                qdev_get_gpio_in(dev_splitter, 0));
 +
 +    /* Devices behind APB PPC0:
 +     *   0x40000000: timer0
 +     *   0x40001000: timer1
 +     *   0x40002000: dual timer
 +     * We must configure and realize each downstream device and connect
 +     * it to the appropriate PPC port; then we can realize the PPC and
 +     * map its upstream ends to the right place in the container.
 +     */
 +    qdev_prop_set_uint32(DEVICE(&s->timer0), "pclk-frq", s->mainclk_frq);
 +    object_property_set_bool(OBJECT(&s->timer0), true, "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&s->timer0), 0,
 +                       qdev_get_gpio_in(DEVICE(&s->armv7m), 3));
 +    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->timer0), 0);
 +    object_property_set_link(OBJECT(&s->apb_ppc0), OBJECT(mr), "port[0]", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +
 +    qdev_prop_set_uint32(DEVICE(&s->timer1), "pclk-frq", s->mainclk_frq);
 +    object_property_set_bool(OBJECT(&s->timer1), true, "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&s->timer1), 0,
 +                       qdev_get_gpio_in(DEVICE(&s->armv7m), 3));
 +    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->timer1), 0);
 +    object_property_set_link(OBJECT(&s->apb_ppc0), OBJECT(mr), "port[1]", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +
 +    qdev_prop_set_string(DEVICE(&s->dualtimer), "name", "Dual timer");
 +    qdev_prop_set_uint64(DEVICE(&s->dualtimer), "size", 0x1000);
 +    object_property_set_bool(OBJECT(&s->dualtimer), true, "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->dualtimer), 0);
 +    object_property_set_link(OBJECT(&s->apb_ppc0), OBJECT(mr), "port[2]", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +
 +    object_property_set_bool(OBJECT(&s->apb_ppc0), true, "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +
 +    sbd_apb_ppc0 = SYS_BUS_DEVICE(&s->apb_ppc0);
 +    dev_apb_ppc0 = DEVICE(&s->apb_ppc0);
 +
 +    mr = sysbus_mmio_get_region(sbd_apb_ppc0, 0);
 +    memory_region_add_subregion(&s->container, 0x40000000, mr);
 +    mr = sysbus_mmio_get_region(sbd_apb_ppc0, 1);
 +    memory_region_add_subregion(&s->container, 0x40001000, mr);
 +    mr = sysbus_mmio_get_region(sbd_apb_ppc0, 2);
 +    memory_region_add_subregion(&s->container, 0x40002000, mr);
 +    for (i = 0; i < IOTS_APB_PPC0_NUM_PORTS; i++) {
 +        qdev_connect_gpio_out_named(dev_secctl, "apb_ppc0_nonsec", i,
 +                                    qdev_get_gpio_in_named(dev_apb_ppc0,
 +                                                           "cfg_nonsec", i));
 +        qdev_connect_gpio_out_named(dev_secctl, "apb_ppc0_ap", i,
 +                                    qdev_get_gpio_in_named(dev_apb_ppc0,
 +                                                           "cfg_ap", i));
 +    }
 +    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc0_irq_enable", 0,
 +                                qdev_get_gpio_in_named(dev_apb_ppc0,
 +                                                       "irq_enable", 0));
 +    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc0_irq_clear", 0,
 +                                qdev_get_gpio_in_named(dev_apb_ppc0,
 +                                                       "irq_clear", 0));
 +    qdev_connect_gpio_out(dev_splitter, 0,
 +                          qdev_get_gpio_in_named(dev_apb_ppc0,
 +                                                 "cfg_sec_resp", 0));
 +
 +    /* All the PPC irq lines (from the 2 internal PPCs and the 8 external
 +     * ones) are sent individually to the security controller, and also
 +     * ORed together to give a single combined PPC interrupt to the NVIC.
 +     */
 +    object_property_set_int(OBJECT(&s->ppc_irq_orgate),
 +                            NUM_PPCS, "num-lines", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    object_property_set_bool(OBJECT(&s->ppc_irq_orgate), true,
 +                             "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    qdev_connect_gpio_out(DEVICE(&s->ppc_irq_orgate), 0,
 +                          qdev_get_gpio_in(DEVICE(&s->armv7m), 10));
 +
 +    /* 0x40010000 .. 0x4001ffff: private CPU region: unused in IoTKit */
 +
 +    /* 0x40020000 .. 0x4002ffff : IoTKit system control peripheral region */
 +    /* Devices behind APB PPC1:
 +     *   0x4002f000: S32K timer
 +     */
 +    qdev_prop_set_string(DEVICE(&s->s32ktimer), "name", "S32KTIMER");
 +    qdev_prop_set_uint64(DEVICE(&s->s32ktimer), "size", 0x1000);
 +    object_property_set_bool(OBJECT(&s->s32ktimer), true, "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->s32ktimer), 0);
 +    object_property_set_link(OBJECT(&s->apb_ppc1), OBJECT(mr), "port[0]", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +
 +    object_property_set_bool(OBJECT(&s->apb_ppc1), true, "realized", &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->apb_ppc1), 0);
 +    memory_region_add_subregion(&s->container, 0x4002f000, mr);
 +
 +    dev_apb_ppc1 = DEVICE(&s->apb_ppc1);
 +    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc1_nonsec", 0,
 +                                qdev_get_gpio_in_named(dev_apb_ppc1,
 +                                                       "cfg_nonsec", 0));
 +    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc1_ap", 0,
 +                                qdev_get_gpio_in_named(dev_apb_ppc1,
 +                                                       "cfg_ap", 0));
 +    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc1_irq_enable", 0,
 +                                qdev_get_gpio_in_named(dev_apb_ppc1,
 +                                                       "irq_enable", 0));
 +    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc1_irq_clear", 0,
 +                                qdev_get_gpio_in_named(dev_apb_ppc1,
 +                                                       "irq_clear", 0));
 +    qdev_connect_gpio_out(dev_splitter, 1,
 +                          qdev_get_gpio_in_named(dev_apb_ppc1,
 +                                                 "cfg_sec_resp", 0));
 +
 +    /* Using create_unimplemented_device() maps the stub into the
 +     * system address space rather than into our container, but the
 +     * overall effect to the guest is the same.
 +     */
 +    create_unimplemented_device("SYSINFO", 0x40020000, 0x1000);
 +
 +    create_unimplemented_device("SYSCONTROL", 0x50021000, 0x1000);
 +    create_unimplemented_device("S32KWATCHDOG", 0x5002e000, 0x1000);
 +
 +    /* 0x40080000 .. 0x4008ffff : IoTKit second Base peripheral region */
 +
 +    create_unimplemented_device("NS watchdog", 0x40081000, 0x1000);
 +    create_unimplemented_device("S watchdog", 0x50081000, 0x1000);
 +
 +    create_unimplemented_device("SRAM0 MPC", 0x50083000, 0x1000);
 +
 +    for (i = 0; i < ARRAY_SIZE(s->ppc_irq_splitter); i++) {
 +        Object *splitter = OBJECT(&s->ppc_irq_splitter[i]);
 +
 +        object_property_set_int(splitter, 2, "num-lines", &err);
 +        if (err) {
 +            error_propagate(errp, err);
 +            return;
 +        }
 +        object_property_set_bool(splitter, true, "realized", &err);
 +        if (err) {
 +            error_propagate(errp, err);
 +            return;
 +        }
 +    }
 +
 +    for (i = 0; i < IOTS_NUM_AHB_EXP_PPC; i++) {
 +        char *ppcname = g_strdup_printf("ahb_ppcexp%d", i);
 +
 +        iotkit_forward_ppc(s, ppcname, i);
 +        g_free(ppcname);
 +    }
 +
 +    for (i = 0; i < IOTS_NUM_APB_EXP_PPC; i++) {
 +        char *ppcname = g_strdup_printf("apb_ppcexp%d", i);
 +
 +        iotkit_forward_ppc(s, ppcname, i + IOTS_NUM_AHB_EXP_PPC);
 +        g_free(ppcname);
 +    }
 +
 +    for (i = NUM_EXTERNAL_PPCS; i < NUM_PPCS; i++) {
 +        /* Wire up IRQ splitter for internal PPCs */
 +        DeviceState *devs = DEVICE(&s->ppc_irq_splitter[i]);
 +        char *gpioname = g_strdup_printf("apb_ppc%d_irq_status",
 +                                         i - NUM_EXTERNAL_PPCS);
 +        TZPPC *ppc = (i == NUM_EXTERNAL_PPCS) ? &s->apb_ppc0 : &s->apb_ppc1;
 +
 +        qdev_connect_gpio_out(devs, 0,
 +                              qdev_get_gpio_in_named(dev_secctl, gpioname, 0));
 +        qdev_connect_gpio_out(devs, 1,
 +                              qdev_get_gpio_in(DEVICE(&s->ppc_irq_orgate), i));
 +        qdev_connect_gpio_out_named(DEVICE(ppc), "irq", 0,
 +                                    qdev_get_gpio_in(devs, 0));
 +    }
 +
 +    iotkit_forward_sec_resp_cfg(s);
 +
 +    system_clock_scale = NANOSECONDS_PER_SECOND / s->mainclk_frq;
 +}
-+
-+static void iotkit_idau_check(IDAUInterface *ii, uint32_t address,
-+                              int *iregion, bool *exempt, bool *ns, bool *nsc)
-+{
-+    /* For IoTKit systems the IDAU responses are simple logical functions
-+     * of the address bits. The NSC attribute is guest-adjustable via the
-+     * NSCCFG register in the security controller.
-+     */
-+    IoTKit *s = IOTKIT(ii);
-+    int region = extract32(address, 28, 4);
-+
-+    *ns = !(region & 1);
-+    *nsc = (region == 1 && (s->nsccfg & 1)) || (region == 3 && (s->nsccfg & 2));
-+    /* 0xe0000000..0xe00fffff and 0xf0000000..0xf00fffff are exempt */
-+    *exempt = (address & 0xeff00000) == 0xe0000000;
-+    *iregion = region;
-+}
-+
-+static const VMStateDescription iotkit_vmstate = {
-+    .name = "iotkit",
-+    .version_id = 1,
-+    .minimum_version_id = 1,
-+    .fields = (VMStateField[]) {
-+        VMSTATE_UINT32(nsccfg, IoTKit),
-+        VMSTATE_END_OF_LIST()
-+    }
-+};
-+
-+static Property iotkit_properties[] = {
-+    DEFINE_PROP_LINK("memory", IoTKit, board_memory, TYPE_MEMORY_REGION,
-+                     MemoryRegion *),
-+    DEFINE_PROP_UINT32("EXP_NUMIRQ", IoTKit, exp_numirq, 64),
-+    DEFINE_PROP_UINT32("MAINCLK", IoTKit, mainclk_frq, 0),
-+    DEFINE_PROP_END_OF_LIST()
-+};
-+
-+static void iotkit_reset(DeviceState *dev)
-+{
-+    IoTKit *s = IOTKIT(dev);
-+
-+    s->nsccfg = 0;
-+}
-+
-+static void iotkit_class_init(ObjectClass *klass, void *data)
-+{
-+    DeviceClass *dc = DEVICE_CLASS(klass);
-+    IDAUInterfaceClass *iic = IDAU_INTERFACE_CLASS(klass);
-+
-+    dc->realize = iotkit_realize;
-+    dc->vmsd = &iotkit_vmstate;
-+    dc->props = iotkit_properties;
-+    dc->reset = iotkit_reset;
-+    iic->check = iotkit_idau_check;
-+}
-+
-+static const TypeInfo iotkit_info = {
-+    .name = TYPE_IOTKIT,
-+    .parent = TYPE_SYS_BUS_DEVICE,
-+    .instance_size = sizeof(IoTKit),
-+    .instance_init = iotkit_init,
-+    .class_init = iotkit_class_init,
-+    .interfaces = (InterfaceInfo[]) {
-+        { TYPE_IDAU_INTERFACE },
-+        { }
-+    }
-+};
-+
-+static void iotkit_register_types(void)
-+{
-+    type_register_static(&iotkit_info);
-+}
-+
-+type_init(iotkit_register_types);
-diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
-index XXXXXXX..XXXXXXX 100644
---- a/default-configs/arm-softmmu.mak
-+++ b/default-configs/arm-softmmu.mak
-@@ -XXX,XX +XXX,XX @@ CONFIG_MPS2_FPGAIO=y
- CONFIG_MPS2_SCC=y
- CONFIG_TZ_PPC=y
-+CONFIG_IOTKIT=y
- CONFIG_IOTKIT_SECCTL=y
- CONFIG_VERSATILE_PCI=y
 --
-.16.2
+.20.1
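The IoTKit realize code wires each PPC interrupt through a two-output splitter. One copy goes to a per-PPC status input on the security controller, and the other goes to an OR gate whose single output drives NVIC input 10. Below is a minimal behavioural sketch of that fan-out and OR-reduction using plain booleans. It assumes 10 PPCs (2 internal plus 8 external, as the code comment says). The names demo_ppc_irqs and demo_set_ppc_irq are hypothetical, and the sketch does not use QEMU's qemu_irq machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 2 internal + 8 external PPCs, per the comment in iotkit_realize(). */
#define DEMO_NUM_PPCS 10

/* Hypothetical model of the IRQ fabric (not QEMU code): each PPC irq is
 * split into an individual status line for the security controller and
 * one input of an OR gate feeding a single NVIC line. */
typedef struct {
    bool ppc_irq[DEMO_NUM_PPCS];       /* current level of each PPC irq */
    bool secctl_status[DEMO_NUM_PPCS]; /* splitter output 0 per PPC */
    bool nvic_ppc_line;                /* OR gate output (NVIC line 10) */
} demo_ppc_irqs;

static void demo_set_ppc_irq(demo_ppc_irqs *s, size_t ppc, bool level)
{
    assert(s != NULL && ppc < DEMO_NUM_PPCS);
    s->ppc_irq[ppc] = level;

    /* Splitter output 0: the security controller sees this PPC alone. */
    s->secctl_status[ppc] = level;

    /* Splitter output 1: the OR gate recomputes the combined level. */
    bool any = false;
    for (size_t i = 0; i < DEMO_NUM_PPCS; i++) {
        any = any || s->ppc_irq[i];
    }
    s->nvic_ppc_line = any;
}
```

The combined NVIC line stays asserted until every PPC has dropped its interrupt. Guest code can then use the individual status bits in the security controller to find which PPC raised it.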

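The IDAU rules in the IoTKit comments can be checked outside QEMU. The region ID is the top four address bits, and odd regions are Secure. NSC is controlled by the NSCCFG register, and two 1MB windows are exempt. The following is a minimal standalone sketch of the same logic as iotkit_idau_check(). It assumes a plain result struct in place of QEMU's IDAUInterface out-parameters and a local replacement for extract32(). The names idau_result, extract_bits32 and iotkit_idau_decode are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the out-parameters of the IDAU check hook. */
typedef struct {
    int iregion;   /* IDAU region number: address bits [31:28] */
    bool exempt;   /* address is exempt from security attribution */
    bool ns;       /* region is Non-secure */
    bool nsc;      /* region is Non-secure-callable */
} idau_result;

/* Local replacement for QEMU's extract32(): 'length' bits from 'start'. */
static uint32_t extract_bits32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Same decisions as iotkit_idau_check(); nsccfg models NSCCFG. */
static idau_result iotkit_idau_decode(uint32_t address, uint32_t nsccfg)
{
    idau_result r;
    int region = (int)extract_bits32(address, 28, 4);

    /* Bit 28 (lowest bit of the region ID) clear means Non-secure. */
    r.ns = !(region & 1);
    /* NSCCFG bit 0 makes region 1 NSC; bit 1 does the same for region 3. */
    r.nsc = (region == 1 && (nsccfg & 1)) || (region == 3 && (nsccfg & 2));
    /* 0xe0000000..0xe00fffff and 0xf0000000..0xf00fffff are exempt. */
    r.exempt = (address & 0xeff00000) == 0xe0000000;
    r.iregion = region;
    return r;
}
```

For example, 0x00001000 decodes as Non-secure and its alias 0x10001000 as Secure. This is why the board maps each RAM and device into both halves of the address map.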
-[Qemu-devel] [PULL 36/39] target/arm: Decode aa32 armv8.3 3-same
+[PULL 21/47] target/arm: Implement VFP fp16 VMOV between gp and halfprec registers
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the VFP fp16 variant of VMOV that transfers a 16-bit
 value between a general purpose register and a VFP register.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Note that Rt == 15 is UNPREDICTABLE; since this insn is v8 and later
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+only, we have no need to replicate the old "updates CPSR.NZCV"
-Message-id: 20180228193125.20577-14-richard.henderson@linaro.org
+behaviour that the singleprec version of this insn does.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-22-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
+ target/arm/vfp.decode          |  1 +
-file changed, 68 insertions(+)
+ target/arm/translate-vfp.c.inc | 34 ++++++++++++++++++++++++++++++++++
 files changed, 35 insertions(+)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/vfp.decode
-+++ b/target/arm/translate.c
++++ b/target/arm/vfp.decode
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
-     return 0;
+              vn=%vn_dp
  VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
 +VMOV_half    ---- 1110 000 l:1 .... rt:4 1001 . 001 0000    vn=%vn_sp
  VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000    vn=%vn_sp
  VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 ....   vm=%vm_sp
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
      return true;
  }
-+/* Advanced SIMD three registers of the same length extension.
++static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
 + *  31           25    23  22    20   16   12  11   10   9    8        3     0
 + * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
 + * | 1 1 1 1 1 1 0 | op1 | D | op2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
 + * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
 + */
 +static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
 +{
-+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
++    TCGv_i32 tmp;
 +    int rd, rn, rm, rot, size, opr_sz;
 +    TCGv_ptr fpst;
 +    bool q;
 +
-+    q = extract32(insn, 6, 1);
++    if (!dc_isar_feature(aa32_fp16_arith, s)) {
-+    VFP_DREG_D(rd, insn);
++        return false;
 +    VFP_DREG_N(rn, insn);
 +    VFP_DREG_M(rm, insn);
 +    if ((rd | rn | rm) & q) {
 +        return 1;
 +    }
 +
-+    if ((insn & 0xfe200f10) == 0xfc200800) {
++    if (a->rt == 15) {
-+        /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
++        /* UNPREDICTABLE; we choose to UNDEF */
-+        size = extract32(insn, 20, 1);
++        return false;
 +        rot = extract32(insn, 23, 2);
 +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)
 +            || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
 +            return 1;
 +        }
 +        fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
 +    } else if ((insn & 0xfea00f10) == 0xfc800800) {
 +        /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
 +        size = extract32(insn, 20, 1);
 +        rot = extract32(insn, 24, 1);
 +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)
 +            || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
 +            return 1;
 +        }
 +        fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
 +    } else {
 +        return 1;
 +    }
 +
-+    if (s->fp_excp_el) {
++    if (!vfp_access_check(s)) {
-+        gen_exception_insn(s, 4, EXCP_UDEF,
++        return true;
 +                           syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
 +        return 0;
 +    }
 +    if (!s->vfp_enabled) {
 +        return 1;
 +    }
 +
-+    opr_sz = (1 + q) * 8;
++    if (a->l) {
-+    fpst = get_fpstatus_ptr(1);
++        /* VFP to general purpose register */
-+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
++        tmp = tcg_temp_new_i32();
-+                       vfp_reg_offset(1, rn),
++        neon_load_reg32(tmp, a->vn);
-+                       vfp_reg_offset(1, rm), fpst,
++        tcg_gen_andi_i32(tmp, tmp, 0xffff);
-+                       opr_sz, opr_sz, rot, fn_gvec_ptr);
++        store_reg(s, a->rt, tmp);
-+    tcg_temp_free_ptr(fpst);
++    } else {
-+    return 0;
++        /* general purpose register to VFP */
 +        tmp = load_reg(s, a->rt);
 +        tcg_gen_andi_i32(tmp, tmp, 0xffff);
 +        neon_store_reg32(tmp, a->vn);
 +        tcg_temp_free_i32(tmp);
 +    }
 +
 +    return true;
 +}
 +
- static int disas_coproc_insn(DisasContext *s, uint32_t insn)
+ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
  {
-     int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
+     TCGv_i32 tmp;
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                      }
                  }
              }
 +        } else if ((insn & 0x0e000a00) == 0x0c000800
 +                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
 +            if (disas_neon_insn_3same_ext(s, insn)) {
 +                goto illegal_op;
 +            }
 +            return;
          } else if ((insn & 0x0fe00000) == 0x0c400000) {
              /* Coprocessor double register transfer.  */
              ARCH(5TE);
 --
-.16.2
+.20.1
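The old-series decoder for the v8.3 "three registers of the same length extension" group recognises VCMLA and VCADD with mask/match tests on the 32-bit encoding. It then extracts the size bit (1 for single precision, 0 for half precision) and the rotation field. The following sketch isolates just that classification step, reusing the masks from the patch. It omits the feature, VFP-access and register-alignment checks. The fcma_insn enum, field32 and classify_3same_ext are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result; QEMU's decoder picks a gvec helper directly. */
typedef enum { FCMA_OTHER, FCMA_VCMLA, FCMA_VCADD } fcma_insn;

/* Local replacement for QEMU's extract32(). */
static uint32_t field32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Classify with the masks from disas_neon_insn_3same_ext(); *size and
 * *rot are written only when an instruction is recognised. */
static fcma_insn classify_3same_ext(uint32_t insn, int *size, int *rot)
{
    if ((insn & 0xfe200f10) == 0xfc200800) {
        /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
        *size = (int)field32(insn, 20, 1);
        *rot = (int)field32(insn, 23, 2);   /* bits [24:23] */
        return FCMA_VCMLA;
    }
    if ((insn & 0xfea00f10) == 0xfc800800) {
        /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
        *size = (int)field32(insn, 20, 1);
        *rot = (int)field32(insn, 24, 1);   /* bit 24 only */
        return FCMA_VCADD;
    }
    return FCMA_OTHER;
}
```

Bit 21 is 1 in the VCMLA match value and 0 in the VCADD one, so the two patterns cannot both match the same encoding. The order of the tests therefore does not matter.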

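The new trans_VMOV_half() transfers only the low 16 bits of a register. Moving to a core register zero-extends the low half of the S register. Moving to the VFP register writes the low 16 bits of Rt and clears the top half, and Rt == 15 is treated as UNDEF. The following sketch models those data movements on a toy register file using plain integers rather than TCG ops. It omits the aa32_fp16_arith feature check and vfp_access_check(). The toy_cpu type and toy_vmov_half function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy register file; not QEMU's CPUARMState. */
typedef struct {
    uint32_t r[16];  /* core registers R0..R15 */
    uint32_t s[32];  /* VFP single-precision registers S0..S31 */
} toy_cpu;

/* Returns false where the decoder would UNDEF.
 * to_core == true models VMOV Rt, Sn (L bit set); false models VMOV Sn, Rt. */
static bool toy_vmov_half(toy_cpu *cpu, bool to_core, unsigned rt, unsigned vn)
{
    assert(cpu != NULL);
    /* Rt == 15 is UNPREDICTABLE and the patch chooses to UNDEF; larger
     * indices cannot be encoded and are rejected defensively here. */
    if (rt >= 15 || vn > 31) {
        return false;
    }
    if (to_core) {
        /* Zero-extend the low 16 bits of Sn into Rt. */
        cpu->r[rt] = cpu->s[vn] & 0xffff;
    } else {
        /* Write the low 16 bits of Rt to Sn with the top half cleared. */
        cpu->s[vn] = cpu->r[rt] & 0xffff;
    }
    return true;
}
```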
-[Qemu-devel] [PULL 12/39] target/arm: Add Cortex-M33
+[PULL 22/47] target/arm: Implement FP16 for Neon VADD, VSUB, VABD, VMUL
-Add a Cortex-M33 definition. The M33 is an M profile CPU
+Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
-which implements the ARM v8M architecture, including the
+macro: VADD, VSUB, VABD, VMUL.
-M profile Security Extension.
 For VABD this requires us to implement a new gvec_fabd_h helper
 using the machinery we have already for the other helpers.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-9-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-24-peter.maydell@linaro.org
 ---
- target/arm/cpu.c | 31 +++++++++++++++++++++++++++++++
+ target/arm/helper.h             |  1 +
-file changed, 31 insertions(+)
+ target/arm/vec_helper.c         |  6 ++++++
  target/arm/translate-neon.c.inc | 36 +++++++++++++++++----------------
 files changed, 26 insertions(+), 17 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/target/arm/helper.h
-+++ b/target/arm/cpu.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-     cpu->id_isar5 = 0x00000000;
+ DEF_HELPER_FLAGS_5(gvec_fmul_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_fmul_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static float64 float64_ftsmul(float64 op1, uint64_t op2, float_status *stat)
      return result;
  }
-+static void cortex_m33_initfn(Object *obj)
++static float16 float16_abd(float16 op1, float16 op2, float_status *stat)
 +{
-+    ARMCPU *cpu = ARM_CPU(obj);
++    return float16_abs(float16_sub(op1, op2, stat));
 +
 +    set_feature(&cpu->env, ARM_FEATURE_V8);
 +    set_feature(&cpu->env, ARM_FEATURE_M);
 +    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
 +    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 +    cpu->midr = 0x410fd213; /* r0p3 */
 +    cpu->pmsav7_dregion = 16;
 +    cpu->sau_sregion = 8;
 +    cpu->id_pfr0 = 0x00000030;
 +    cpu->id_pfr1 = 0x00000210;
 +    cpu->id_dfr0 = 0x00200000;
 +    cpu->id_afr0 = 0x00000000;
 +    cpu->id_mmfr0 = 0x00101F40;
 +    cpu->id_mmfr1 = 0x00000000;
 +    cpu->id_mmfr2 = 0x01000000;
 +    cpu->id_mmfr3 = 0x00000000;
 +    cpu->id_isar0 = 0x01101110;
 +    cpu->id_isar1 = 0x02212000;
 +    cpu->id_isar2 = 0x20232232;
 +    cpu->id_isar3 = 0x01111131;
 +    cpu->id_isar4 = 0x01310132;
 +    cpu->id_isar5 = 0x00000000;
 +    cpu->clidr = 0x00000000;
 +    cpu->ctr = 0x8000c000;
 +}
 +
- static void arm_v7m_class_init(ObjectClass *oc, void *data)
+ static float32 float32_abd(float32 op1, float32 op2, float_status *stat)
  {
-     CPUClass *cc = CPU_CLASS(oc);
+     return float32_abs(float32_sub(op1, op2, stat));
-@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_ftsmul_h, float16_ftsmul, float16)
-                              .class_init = arm_v7m_class_init },
+ DO_3OP(gvec_ftsmul_s, float32_ftsmul, float32)
-     { .name = "cortex-m4",   .initfn = cortex_m4_initfn,
+ DO_3OP(gvec_ftsmul_d, float64_ftsmul, float64)
-                              .class_init = arm_v7m_class_init },
-+    { .name = "cortex-m33",  .initfn = cortex_m33_initfn,
++DO_3OP(gvec_fabd_h, float16_abd, float16)
-+                             .class_init = arm_v7m_class_init },
+ DO_3OP(gvec_fabd_s, float32_abd, float32)
-     { .name = "cortex-r5",   .initfn = cortex_r5_initfn },
-     { .name = "cortex-a7",   .initfn = cortex_a7_initfn },
+ #ifdef TARGET_AARCH64
-     { .name = "cortex-a8",   .initfn = cortex_a8_initfn },
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn,
      return true;
  }
 -/*
 - * For all the functions using this macro, size == 1 means fp16,
 - * which is an architecture extension we don't implement yet.
 - */
 -#define DO_3S_FP_GVEC(INSN,FUNC)                                        \
 -    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
 -                                uint32_t rn_ofs, uint32_t rm_ofs,       \
 -                                uint32_t oprsz, uint32_t maxsz)         \
 +#define WRAP_FP_GVEC(WRAPNAME, FPST, FUNC)                              \
 +    static void WRAPNAME(unsigned vece, uint32_t rd_ofs,                \
 +                         uint32_t rn_ofs, uint32_t rm_ofs,              \
 +                         uint32_t oprsz, uint32_t maxsz)                \
      {                                                                   \
 -        TCGv_ptr fpst = fpstatus_ptr(FPST_STD);                         \
 +        TCGv_ptr fpst = fpstatus_ptr(FPST);                             \
          tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpst,                \
                             oprsz, maxsz, 0, FUNC);                      \
          tcg_temp_free_ptr(fpst);                                        \
 -    }                                                                   \
 +    }
 +
 +#define DO_3S_FP_GVEC(INSN,SFUNC,HFUNC)                                 \
 +    WRAP_FP_GVEC(gen_##INSN##_fp32_3s, FPST_STD, SFUNC)                 \
 +    WRAP_FP_GVEC(gen_##INSN##_fp16_3s, FPST_STD_F16, HFUNC)             \
      static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a)     \
      {                                                                   \
          if (a->size != 0) {                                             \
 -            /* TODO fp16 support */                                     \
 -            return false;                                               \
 +            if (!dc_isar_feature(aa32_fp16_arith, s)) {                 \
 +                return false;                                           \
 +            }                                                           \
 +            return do_3same(s, a, gen_##INSN##_fp16_3s);                \
          }                                                               \
 -        return do_3same(s, a, gen_##INSN##_3s);                         \
 +        return do_3same(s, a, gen_##INSN##_fp32_3s);                    \
      }
 -DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s)
 -DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s)
 -DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s)
 -DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s)
 +DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s, gen_helper_gvec_fadd_h)
 +DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s, gen_helper_gvec_fsub_h)
 +DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s, gen_helper_gvec_fabd_h)
 +DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s, gen_helper_gvec_fmul_h)
  /*
   * For all the functions using this macro, size == 1 means fp16,
 --
-.16.2
+.20.1

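Patch 22 builds the new gvec_fabd_h helper from two pieces: a scalar float16_abd(), which returns the absolute value of the difference, and the existing DO_3OP macro, which expands a scalar operation into a whole-vector helper. Portable C has no half-precision type, so this minimal sketch shows the same pattern with float and double only. DEMO_3OP, abd_f32, abd_f64 and the demo_fabd_* names are hypothetical. Unlike QEMU, the sketch takes an element count instead of a simd_desc and uses the host FPU instead of a float_status.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Scalar absolute difference, same shape as float32_abd(): |op1 - op2|. */
static float abd_f32(float op1, float op2) { return fabsf(op1 - op2); }
static double abd_f64(double op1, double op2) { return fabs(op1 - op2); }

/* Hypothetical simplified analogue of DO_3OP: generate an element-wise
 * vector function d[i] = FUNC(n[i], m[i]) from a scalar function. */
#define DEMO_3OP(NAME, FUNC, TYPE)                                  \
    static void NAME(TYPE *d, const TYPE *n, const TYPE *m,         \
                     size_t elements)                               \
    {                                                               \
        assert(elements == 0 || (d && n && m));                     \
        for (size_t i = 0; i < elements; i++) {                     \
            d[i] = FUNC(n[i], m[i]);                                \
        }                                                           \
    }

DEMO_3OP(demo_fabd_s, abd_f32, float)
DEMO_3OP(demo_fabd_d, abd_f64, double)
```

Adding a new element size with this pattern takes only a scalar function and one more macro line. The patch relies on the same property, with one DO_3OP line for gvec_fabd_h.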
-[Qemu-devel] [PULL 31/39] target/arm: Decode aa32 armv8.1 two reg and a scalar
+[PULL 23/47] target/arm: Implement fp16 for Neon VRECPE, VRSQRTE using gvec
-From: Richard Henderson <richard.henderson@linaro.org>
+We already have gvec helpers for floating point VRECPE and
 VRSQRTE, so convert the Neon decoder to use them and
 add the fp16 support.
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180228193125.20577-9-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-25-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
+ target/arm/translate-neon.c.inc | 31 +++++++++++++++++++++++++++++--
-file changed, 42 insertions(+), 4 deletions(-)
+file changed, 29 insertions(+), 2 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/translate-neon.c.inc
-+++ b/target/arm/translate.c
++++ b/target/arm/translate-neon.c.inc
-@@ -XXX,XX +XXX,XX @@ static const char *regnames[] =
+@@ -XXX,XX +XXX,XX @@ static bool do_2misc_fp(DisasContext *s, arg_2misc *a,
-     { "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
+         return do_2misc_fp(s, a, FUNC);                         \
-       "r8", "r9", "r10", "r11", "r12", "r13", "r14", "pc" };
+     }
-+/* Function prototypes for gen_ functions calling Neon helpers.  */
+-DO_2MISC_FP(VRECPE_F, gen_helper_recpe_f32)
-+typedef void NeonGenThreeOpEnvFn(TCGv_i32, TCGv_env, TCGv_i32,
+-DO_2MISC_FP(VRSQRTE_F, gen_helper_rsqrte_f32)
-+                                 TCGv_i32, TCGv_i32);
+ DO_2MISC_FP(VCVT_FS, gen_helper_vfp_sitos)
  DO_2MISC_FP(VCVT_FU, gen_helper_vfp_uitos)
  DO_2MISC_FP(VCVT_SF, gen_helper_vfp_tosizs)
  DO_2MISC_FP(VCVT_UF, gen_helper_vfp_touizs)
 +#define DO_2MISC_FP_VEC(INSN, HFUNC, SFUNC)                             \
 +    static void gen_##INSN(unsigned vece, uint32_t rd_ofs,              \
 +                           uint32_t rm_ofs,                             \
 +                           uint32_t oprsz, uint32_t maxsz)              \
 +    {                                                                   \
 +        static gen_helper_gvec_2_ptr * const fns[4] = {                 \
 +            NULL, HFUNC, SFUNC, NULL,                                   \
 +        };                                                              \
 +        TCGv_ptr fpst;                                                  \
 +        fpst = fpstatus_ptr(vece == MO_16 ? FPST_STD_F16 : FPST_STD);   \
 +        tcg_gen_gvec_2_ptr(rd_ofs, rm_ofs, fpst, oprsz, maxsz, 0,       \
 +                           fns[vece]);                                  \
 +        tcg_temp_free_ptr(fpst);                                        \
 +    }                                                                   \
 +    static bool trans_##INSN(DisasContext *s, arg_2misc *a)             \
 +    {                                                                   \
 +        if (a->size == MO_16) {                                         \
 +            if (!dc_isar_feature(aa32_fp16_arith, s)) {                 \
 +                return false;                                           \
 +            }                                                           \
 +        } else if (a->size != MO_32) {                                  \
 +            return false;                                               \
 +        }                                                               \
 +        return do_2misc_vec(s, a, gen_##INSN);                          \
 +    }
 +
- /* initialize TCG globals.  */
++DO_2MISC_FP_VEC(VRECPE_F, gen_helper_gvec_frecpe_h, gen_helper_gvec_frecpe_s)
- void arm_translate_init(void)
++DO_2MISC_FP_VEC(VRSQRTE_F, gen_helper_gvec_frsqrte_h, gen_helper_gvec_frsqrte_s)
 +
  static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
  {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
                          }
                          neon_store_reg64(cpu_V0, rd + pass);
                      }
 -
 -
                      break;
 -                default: /* 14 and 15 are RESERVED */
 -                    return 1;
 +                case 14: /* VQRDMLAH scalar */
 +                case 15: /* VQRDMLSH scalar */
 +                    {
 +                        NeonGenThreeOpEnvFn *fn;
 +
 +                        if (!arm_dc_feature(s, ARM_FEATURE_V8_RDM)) {
 +                            return 1;
 +                        }
 +                        if (u && ((rd | rn) & 1)) {
 +                            return 1;
 +                        }
 +                        if (op == 14) {
 +                            if (size == 1) {
 +                                fn = gen_helper_neon_qrdmlah_s16;
 +                            } else {
 +                                fn = gen_helper_neon_qrdmlah_s32;
 +                            }
 +                        } else {
 +                            if (size == 1) {
 +                                fn = gen_helper_neon_qrdmlsh_s16;
 +                            } else {
 +                                fn = gen_helper_neon_qrdmlsh_s32;
 +                            }
 +                        }
 +
 +                        tmp2 = neon_get_scalar(size, rm);
 +                        for (pass = 0; pass < (u ? 4 : 2); pass++) {
 +                            tmp = neon_load_reg(rn, pass);
 +                            tmp3 = neon_load_reg(rd, pass);
 +                            fn(tmp, cpu_env, tmp, tmp2, tmp3);
 +                            tcg_temp_free_i32(tmp3);
 +                            neon_store_reg(rd, pass, tmp);
 +                        }
 +                        tcg_temp_free_i32(tmp2);
 +                    }
 +                    break;
 +                default:
 +                    g_assert_not_reached();
                  }
              }
          } else { /* size == 3 */
 --
-.16.2
+.20.1

-[Qemu-devel] [PULL 05/39] loader: Add new load_ramdisk_as()
+[PULL 24/47] target/arm: Implement fp16 for Neon VABS, VNEG of floats
-Add a function load_ramdisk_as() which behaves like the existing
+Rewrite Neon VABS/VNEG of floats to use gvec logical AND and XOR, so
-load_ramdisk() but allows the caller to specify the AddressSpace
+that we can implement the fp16 version of the insns.
 to use. This matches the pattern we have already for various
 other loader functions.
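Floating-point absolute value and negation change only the sign bit. For that reason an integer AND with 0x7fff (fp16) or 0x7fffffff (fp32) gives VABS, and an XOR with 0x8000 or 0x80000000 gives VNEG, for every input including NaNs. The sketch below shows the bit-level idea on raw lane values. The function names are hypothetical illustrations, not QEMU's gvec API. vabs_f16 is a plain C loop analogue of the vector AND-immediate the patch emits with vece == MO_16.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FP abs/neg only touch the sign bit, so integer masks implement them. */
static uint16_t f16_abs_bits(uint16_t x) { return x & 0x7fff; }
static uint16_t f16_neg_bits(uint16_t x) { return x ^ 0x8000; }
static uint32_t f32_abs_bits(uint32_t x) { return x & 0x7fffffffu; }
static uint32_t f32_neg_bits(uint32_t x) { return x ^ 0x80000000u; }

/* Lane-by-lane AND over fp16 lanes (scalar analogue of a gvec AND-immediate). */
static void vabs_f16(uint16_t *d, const uint16_t *m, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        d[i] = f16_abs_bits(m[i]);
    }
}

/* Reinterpret a float's storage as its raw IEEE single-precision bits. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For example, fp16 -1.0 (0xbc00) becomes 1.0 (0x3c00) under the AND mask. A NaN such as 0xfe00 keeps its payload and becomes 0x7e00.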
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-2-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-26-peter.maydell@linaro.org
 ---
- include/hw/loader.h | 12 +++++++++++-
+ target/arm/translate-neon.c.inc | 34 +++++++++++++++++++++++++++------
- hw/core/loader.c    |  8 +++++++-
+file changed, 28 insertions(+), 6 deletions(-)
 files changed, 18 insertions(+), 2 deletions(-)
-diff --git a/include/hw/loader.h b/include/hw/loader.h
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/loader.h
+--- a/target/arm/translate-neon.c.inc
-+++ b/include/hw/loader.h
++++ b/target/arm/translate-neon.c.inc
-@@ -XXX,XX +XXX,XX @@ int load_uimage(const char *filename, hwaddr *ep,
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCNT(DisasContext *s, arg_2misc *a)
-                 void *translate_opaque);
+     return do_2misc(s, a, gen_helper_neon_cnt_u8);
+ }
- /**
-- * load_ramdisk:
++static void gen_VABS_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
-+ * load_ramdisk_as:
++                       uint32_t oprsz, uint32_t maxsz)
   * @filename: Path to the ramdisk image
   * @addr: Memory address to load the ramdisk to
   * @max_sz: Maximum allowed ramdisk size (for non-u-boot ramdisks)
 + * @as: The AddressSpace to load the ELF to. The value of address_space_memory
 + *      is used if nothing is supplied here.
   *
   * Load a ramdisk image with U-Boot header to the specified memory
   * address.
   *
   * Returns the size of the loaded image on success, -1 otherwise.
   */
 +int load_ramdisk_as(const char *filename, hwaddr addr, uint64_t max_sz,
 +                    AddressSpace *as);
 +
 +/**
 + * load_ramdisk:
 + * Same as load_ramdisk_as(), but doesn't allow the caller to specify
 + * an AddressSpace.
 + */
  int load_ramdisk(const char *filename, hwaddr addr, uint64_t max_sz);
  ssize_t gunzip(void *dst, size_t dstlen, uint8_t *src, size_t srclen);
 diff --git a/hw/core/loader.c b/hw/core/loader.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/core/loader.c
 +++ b/hw/core/loader.c
@@ -XXX,XX +XXX,XX @@ int load_uimage_as(const char *filename, hwaddr *ep, hwaddr *loadaddr,
  /* Load a ramdisk.  */
  int load_ramdisk(const char *filename, hwaddr addr, uint64_t max_sz)
 +{
-+    return load_ramdisk_as(filename, addr, max_sz, NULL);
++    tcg_gen_gvec_andi(vece, rd_ofs, rm_ofs,
 +                      vece == MO_16 ? 0x7fff : 0x7fffffff,
 +                      oprsz, maxsz);
 +}
 +
-+int load_ramdisk_as(const char *filename, hwaddr addr, uint64_t max_sz,
+ static bool trans_VABS_F(DisasContext *s, arg_2misc *a)
 +                    AddressSpace *as)
  {
-     return load_uboot_image(filename, NULL, &addr, NULL, IH_TYPE_RAMDISK,
+-    if (a->size != 2) {
--                            NULL, NULL, NULL);
++    if (a->size == MO_16) {
-+                            NULL, NULL, as);
++        if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +            return false;
 +        }
 +    } else if (a->size != MO_32) {
          return false;
      }
 -    /* TODO: FP16 : size == 1 */
 -    return do_2misc(s, a, gen_helper_vfp_abss);
 +    return do_2misc_vec(s, a, gen_VABS_F);
 +}
 +
 +static void gen_VNEG_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
 +                       uint32_t oprsz, uint32_t maxsz)
 +{
 +    tcg_gen_gvec_xori(vece, rd_ofs, rm_ofs,
 +                      vece == MO_16 ? 0x8000 : 0x80000000,
 +                      oprsz, maxsz);
  }
- /* Load a gzip-compressed kernel to a dynamically allocated buffer. */
+ static bool trans_VNEG_F(DisasContext *s, arg_2misc *a)
  {
 -    if (a->size != 2) {
 +    if (a->size == MO_16) {
 +        if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +            return false;
 +        }
 +    } else if (a->size != MO_32) {
          return false;
      }
 -    /* TODO: FP16 : size == 1 */
 -    return do_2misc(s, a, gen_helper_vfp_negs);
 +    return do_2misc_vec(s, a, gen_VNEG_F);
  }
  static bool trans_VRECPE(DisasContext *s, arg_2misc *a)
 --
-.16.2
+.20.1

-[Qemu-devel] [PULL 27/39] target/arm: Decode aa64 armv8.1 scalar three same extra
+[PULL 25/47] target/arm: Implement fp16 for VCEQ, VCGE, VCGT comparisons
-From: Richard Henderson <richard.henderson@linaro.org>
+Convert the Neon floating-point vector comparison ops VCEQ,
 VCGE and VCGT over to using a gvec helper and use this to
 implement the fp16 case.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+(We put the float16_ceq() etc. functions above the DO_2OP()
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+macro definition because later when we convert the
-Message-id: 20180228193125.20577-5-richard.henderson@linaro.org
+compare-against-zero instructions we'll want their
 definitions to be visible at that point in the source file.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-27-peter.maydell@linaro.org
 ---
- target/arm/Makefile.objs   |   2 +-
+ target/arm/helper.h             |  9 +++++++
- target/arm/helper.h        |   4 ++
+ target/arm/vec_helper.c         | 44 +++++++++++++++++++++++++++++++++
- target/arm/translate-a64.c |  84 ++++++++++++++++++++++++++++++++++
+ target/arm/translate-neon.c.inc |  6 ++---
- target/arm/vec_helper.c    | 109 +++++++++++++++++++++++++++++++++++++++++++++
+files changed, 56 insertions(+), 3 deletions(-)
 files changed, 198 insertions(+), 1 deletion(-)
  create mode 100644 target/arm/vec_helper.c
-diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/Makefile.objs
-+++ b/target/arm/Makefile.objs
-@@ -XXX,XX +XXX,XX @@ obj-$(call land,$(CONFIG_KVM),$(call lnot,$(TARGET_AARCH64))) += kvm32.o
- obj-$(call land,$(CONFIG_KVM),$(TARGET_AARCH64)) += kvm64.o
- obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
- obj-y += translate.o op_helper.o helper.o cpu.o
--obj-y += neon_helper.o iwmmxt_helper.o
-+obj-y += neon_helper.o iwmmxt_helper.o vec_helper.o
- obj-y += gdbstub.o
- obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
- obj-y += crypto_helper.o
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_1(neon_rbit_u8, TCG_CALL_NO_RWG_SE, i32, i32)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmul_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_5(gvec_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- DEF_HELPER_3(neon_qdmulh_s16, i32, env, i32, i32)
+ DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- DEF_HELPER_3(neon_qrdmulh_s16, i32, env, i32, i32)
-+DEF_HELPER_4(neon_qrdmlah_s16, i32, env, i32, i32, i32)
++DEF_HELPER_FLAGS_5(gvec_fceq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+DEF_HELPER_4(neon_qrdmlsh_s16, i32, env, i32, i32, i32)
++DEF_HELPER_FLAGS_5(gvec_fceq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- DEF_HELPER_3(neon_qdmulh_s32, i32, env, i32, i32)
++
- DEF_HELPER_3(neon_qrdmulh_s32, i32, env, i32, i32)
++DEF_HELPER_FLAGS_5(gvec_fcge_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+DEF_HELPER_4(neon_qrdmlah_s32, i32, env, s32, s32, s32)
++DEF_HELPER_FLAGS_5(gvec_fcge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+DEF_HELPER_4(neon_qrdmlsh_s32, i32, env, s32, s32, s32)
++
++DEF_HELPER_FLAGS_5(gvec_fcgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- DEF_HELPER_1(neon_narrow_u8, i32, i64)
++DEF_HELPER_FLAGS_5(gvec_fcgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- DEF_HELPER_1(neon_narrow_u16, i32, i64)
++
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s,
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm,
-     tcg_temp_free_ptr(fpst);
+     clear_tail(d, opr_sz, simd_maxsz(desc));
  }
-+/* AdvSIMD scalar three same extra
++/*
-+ *  31 30  29 28       24 23  22  21 20  16  15 14    11  10 9  5 4  0
++ * Floating point comparisons producing an integer result (all 1s or all 0s).
-+ * +-----+---+-----------+------+---+------+---+--------+---+----+----+
++ * Note that EQ doesn't signal InvalidOp for QNaNs but GE and GT do.
-+ * | 0 1 | U | 1 1 1 1 0 | size | 0 |  Rm  | 1 | opcode | 1 | Rn | Rd |
++ * Softfloat routines return 0/1, which we convert to the 0/-1 Neon requires.
 + * +-----+---+-----------+------+---+------+---+--------+---+----+----+
 + */
-+static void disas_simd_scalar_three_reg_same_extra(DisasContext *s,
++static uint16_t float16_ceq(float16 op1, float16 op2, float_status *stat)
 +                                                   uint32_t insn)
 +{
-+    int rd = extract32(insn, 0, 5);
++    return -float16_eq_quiet(op1, op2, stat);
 +    int rn = extract32(insn, 5, 5);
 +    int opcode = extract32(insn, 11, 4);
 +    int rm = extract32(insn, 16, 5);
 +    int size = extract32(insn, 22, 2);
 +    bool u = extract32(insn, 29, 1);
 +    TCGv_i32 ele1, ele2, ele3;
 +    TCGv_i64 res;
 +    int feature;
 +
 +    switch (u * 16 + opcode) {
 +    case 0x10: /* SQRDMLAH (vector) */
 +    case 0x11: /* SQRDMLSH (vector) */
 +        if (size != 1 && size != 2) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        feature = ARM_FEATURE_V8_RDM;
 +        break;
 +    default:
 +        unallocated_encoding(s);
 +        return;
 +    }
 +    if (!arm_dc_feature(s, feature)) {
 +        unallocated_encoding(s);
 +        return;
 +    }
 +    if (!fp_access_check(s)) {
 +        return;
 +    }
 +
 +    /* Do a single operation on the lowest element in the vector.
 +     * We use the standard Neon helpers and rely on 0 OP 0 == 0
 +     * with no side effects for all these operations.
 +     * OPTME: special-purpose helpers would avoid doing some
 +     * unnecessary work in the helper for the 16 bit cases.
 +     */
 +    ele1 = tcg_temp_new_i32();
 +    ele2 = tcg_temp_new_i32();
 +    ele3 = tcg_temp_new_i32();
 +
 +    read_vec_element_i32(s, ele1, rn, 0, size);
 +    read_vec_element_i32(s, ele2, rm, 0, size);
 +    read_vec_element_i32(s, ele3, rd, 0, size);
 +
 +    switch (opcode) {
 +    case 0x0: /* SQRDMLAH */
 +        if (size == 1) {
 +            gen_helper_neon_qrdmlah_s16(ele3, cpu_env, ele1, ele2, ele3);
 +        } else {
 +            gen_helper_neon_qrdmlah_s32(ele3, cpu_env, ele1, ele2, ele3);
 +        }
 +        break;
 +    case 0x1: /* SQRDMLSH */
 +        if (size == 1) {
 +            gen_helper_neon_qrdmlsh_s16(ele3, cpu_env, ele1, ele2, ele3);
 +        } else {
 +            gen_helper_neon_qrdmlsh_s32(ele3, cpu_env, ele1, ele2, ele3);
 +        }
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +    tcg_temp_free_i32(ele1);
 +    tcg_temp_free_i32(ele2);
 +
 +    res = tcg_temp_new_i64();
 +    tcg_gen_extu_i32_i64(res, ele3);
 +    tcg_temp_free_i32(ele3);
 +
 +    write_fp_dreg(s, rd, res);
 +    tcg_temp_free_i64(res);
 +}
 +
- static void handle_2misc_64(DisasContext *s, int opcode, bool u,
++static uint32_t float32_ceq(float32 op1, float32 op2, float_status *stat)
                              TCGv_i64 tcg_rd, TCGv_i64 tcg_rn,
                              TCGv_i32 tcg_rmode, TCGv_ptr tcg_fpstatus)
@@ -XXX,XX +XXX,XX @@ static const AArch64DecodeTable data_proc_simd[] = {
      { 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
      { 0x2e000000, 0xbf208400, disas_simd_ext },
      { 0x5e200400, 0xdf200400, disas_simd_scalar_three_reg_same },
 +    { 0x5e008400, 0xdf208400, disas_simd_scalar_three_reg_same_extra },
      { 0x5e200000, 0xdf200c00, disas_simd_scalar_three_reg_diff },
      { 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
      { 0x5e300800, 0xdf3e0c00, disas_simd_scalar_pairwise },
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM AdvSIMD / SVE Vector Operations
 + *
 + * Copyright (c) 2018 Linaro
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "cpu.h"
 +#include "exec/exec-all.h"
 +#include "exec/helper-proto.h"
 +#include "tcg/tcg-gvec-desc.h"
 +
 +
 +#define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
 +
 +/* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */
 +static uint16_t inl_qrdmlah_s16(CPUARMState *env, int16_t src1,
 +                                int16_t src2, int16_t src3)
 +{
-+    /* Simplify:
++    return -float32_eq_quiet(op1, op2, stat);
 +     * = ((a3 << 16) + ((e1 * e2) << 1) + (1 << 15)) >> 16
 +     * = ((a3 << 15) + (e1 * e2) + (1 << 14)) >> 15
 +     */
 +    int32_t ret = (int32_t)src1 * src2;
 +    ret = ((int32_t)src3 << 15) + ret + (1 << 14);
 +    ret >>= 15;
 +    if (ret != (int16_t)ret) {
 +        SET_QC();
 +        ret = (ret < 0 ? -0x8000 : 0x7fff);
 +    }
 +    return ret;
 +}
 +
-+uint32_t HELPER(neon_qrdmlah_s16)(CPUARMState *env, uint32_t src1,
++static uint16_t float16_cge(float16 op1, float16 op2, float_status *stat)
 +                                  uint32_t src2, uint32_t src3)
 +{
-+    uint16_t e1 = inl_qrdmlah_s16(env, src1, src2, src3);
++    return -float16_le(op2, op1, stat);
 +    uint16_t e2 = inl_qrdmlah_s16(env, src1 >> 16, src2 >> 16, src3 >> 16);
 +    return deposit32(e1, 16, 16, e2);
 +}
 +
-+/* Signed saturating rounding doubling multiply-subtract high half, 16-bit */
++static uint32_t float32_cge(float32 op1, float32 op2, float_status *stat)
 +static uint16_t inl_qrdmlsh_s16(CPUARMState *env, int16_t src1,
 +                                int16_t src2, int16_t src3)
 +{
-+    /* Similarly, using subtraction:
++    return -float32_le(op2, op1, stat);
 +     * = ((a3 << 16) - ((e1 * e2) << 1) + (1 << 15)) >> 16
 +     * = ((a3 << 15) - (e1 * e2) + (1 << 14)) >> 15
 +     */
 +    int32_t ret = (int32_t)src1 * src2;
 +    ret = ((int32_t)src3 << 15) - ret + (1 << 14);
 +    ret >>= 15;
 +    if (ret != (int16_t)ret) {
 +        SET_QC();
 +        ret = (ret < 0 ? -0x8000 : 0x7fff);
 +    }
 +    return ret;
 +}
 +
-+uint32_t HELPER(neon_qrdmlsh_s16)(CPUARMState *env, uint32_t src1,
++static uint16_t float16_cgt(float16 op1, float16 op2, float_status *stat)
 +                                  uint32_t src2, uint32_t src3)
 +{
-+    uint16_t e1 = inl_qrdmlsh_s16(env, src1, src2, src3);
++    return -float16_lt(op2, op1, stat);
 +    uint16_t e2 = inl_qrdmlsh_s16(env, src1 >> 16, src2 >> 16, src3 >> 16);
 +    return deposit32(e1, 16, 16, e2);
 +}
 +
-+/* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */
++static uint32_t float32_cgt(float32 op1, float32 op2, float_status *stat)
 +uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
 +                                  int32_t src2, int32_t src3)
 +{
-+    /* Simplify similarly to int_qrdmlah_s16 above.  */
++    return -float32_lt(op2, op1, stat);
 +    int64_t ret = (int64_t)src1 * src2;
 +    ret = ((int64_t)src3 << 31) + ret + (1 << 30);
 +    ret >>= 31;
 +    if (ret != (int32_t)ret) {
 +        SET_QC();
 +        ret = (ret < 0 ? INT32_MIN : INT32_MAX);
 +    }
 +    return ret;
 +}
 +
-+/* Signed saturating rounding doubling multiply-subtract high half, 32-bit */
+ #define DO_2OP(NAME, FUNC, TYPE) \
-+uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
+ void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)  \
-+                                  int32_t src2, int32_t src3)
+ {                                                                 \
-+{
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_ftsmul_d, float64_ftsmul, float64)
-+    /* Simplify similarly to int_qrdmlsh_s16 above.  */
+ DO_3OP(gvec_fabd_h, float16_abd, float16)
-+    int64_t ret = (int64_t)src1 * src2;
+ DO_3OP(gvec_fabd_s, float32_abd, float32)
-+    ret = ((int64_t)src3 << 31) - ret + (1 << 30);
-+    ret >>= 31;
++DO_3OP(gvec_fceq_h, float16_ceq, float16)
-+    if (ret != (int32_t)ret) {
++DO_3OP(gvec_fceq_s, float32_ceq, float32)
-+        SET_QC();
++
-+        ret = (ret < 0 ? INT32_MIN : INT32_MAX);
++DO_3OP(gvec_fcge_h, float16_cge, float16)
-+    }
++DO_3OP(gvec_fcge_s, float32_cge, float32)
-+    return ret;
++
-+}
++DO_3OP(gvec_fcgt_h, float16_cgt, float16)
 +DO_3OP(gvec_fcgt_s, float32_cgt, float32)
 +
  #ifdef TARGET_AARCH64
  DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s, gen_helper_gvec_fadd_h)
  DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s, gen_helper_gvec_fsub_h)
  DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s, gen_helper_gvec_fabd_h)
  DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s, gen_helper_gvec_fmul_h)
 +DO_3S_FP_GVEC(VCEQ, gen_helper_gvec_fceq_s, gen_helper_gvec_fceq_h)
 +DO_3S_FP_GVEC(VCGE, gen_helper_gvec_fcge_s, gen_helper_gvec_fcge_h)
 +DO_3S_FP_GVEC(VCGT, gen_helper_gvec_fcgt_s, gen_helper_gvec_fcgt_h)
  /*
   * For all the functions using this macro, size == 1 means fp16,
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s, gen_helper_gvec_fmul_h)
          return do_3same_fp(s, a, FUNC, READS_VD);                   \
      }
 -DO_3S_FP(VCEQ, gen_helper_neon_ceq_f32, false)
 -DO_3S_FP(VCGE, gen_helper_neon_cge_f32, false)
 -DO_3S_FP(VCGT, gen_helper_neon_cgt_f32, false)
  DO_3S_FP(VACGE, gen_helper_neon_acge_f32, false)
  DO_3S_FP(VACGT, gen_helper_neon_acgt_f32, false)
  DO_3S_FP(VMAX, gen_helper_vfp_maxs, false)
 --
-.16.2
+.20.1

-[Qemu-devel] [PULL 35/39] target/arm: Decode aa64 armv8.3 fcmla
+[PULL 26/47] target/arm: Implement fp16 for VACGE, VACGT
-From: Richard Henderson <richard.henderson@linaro.org>
+Convert the Neon floating-point vector absolute comparison ops
 VACGE and VACGT over to using a gvec helper and use this to
 implement the fp16 case.
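VACGE and VACGT compare magnitudes. In the patch this is done by applying float16_abs to both operands and then using the same swapped-operand le/lt form as VCGE/VCGT. The sketch below shows that idea with host fabsf and C comparisons standing in for softfloat. The function names are hypothetical, and as in any host-float model, flags and flush-to-zero are not represented.

```c
#include <math.h>
#include <stdint.h>

/* Absolute compares: compare magnitudes, then widen 0/1 to a 0/all-ones mask. */
static uint32_t f32_acge_mask(float op1, float op2)
{
    return -(uint32_t)(fabsf(op2) <= fabsf(op1));
}

static uint32_t f32_acgt_mask(float op1, float op2)
{
    return -(uint32_t)(fabsf(op2) < fabsf(op1));
}
```

So |-3| >= |2| sets the lane to all ones, while |2| > |-2| clears it. Because fabsf of a NaN is still a NaN, a NaN operand gives 0.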
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180228193125.20577-13-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-[PMM: renamed e1/e2/e3/e4 to use the same naming as the version
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
- of the pseudocode in the Arm ARM]
+Message-id: 20200828183354.27913-28-peter.maydell@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.h        |  11 ++++
+ target/arm/helper.h             |  6 ++++++
- target/arm/translate-a64.c |  94 +++++++++++++++++++++++++---
+ target/arm/vec_helper.c         | 26 ++++++++++++++++++++++++++
- target/arm/vec_helper.c    | 149 +++++++++++++++++++++++++++++++++++++++++++++
+ target/arm/translate-neon.c.inc |  4 ++--
-files changed, 246 insertions(+), 8 deletions(-)
+files changed, 34 insertions(+), 2 deletions(-)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fcge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- DEF_HELPER_FLAGS_5(gvec_fcaddd, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_fcgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_fcgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_facge_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_facge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_5(gvec_facgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_facgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
 +DEF_HELPER_FLAGS_5(gvec_fcmlah, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_fcmlah_idx, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_fcmlas, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_fcmlas_idx, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
 +
  #ifdef TARGET_AARCH64
  #include "helper-a64.h"
  #endif
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          feature = ARM_FEATURE_V8_RDM;
          break;
 +    case 0x8: /* FCMLA, #0 */
 +    case 0x9: /* FCMLA, #90 */
 +    case 0xa: /* FCMLA, #180 */
 +    case 0xb: /* FCMLA, #270 */
      case 0xc: /* FCADD, #90 */
      case 0xe: /* FCADD, #270 */
          if (size == 0
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          return;
 +    case 0x8: /* FCMLA, #0 */
 +    case 0x9: /* FCMLA, #90 */
 +    case 0xa: /* FCMLA, #180 */
 +    case 0xb: /* FCMLA, #270 */
 +        rot = extract32(opcode, 0, 2);
 +        switch (size) {
 +        case 1:
 +            gen_gvec_op3_fpst(s, is_q, rd, rn, rm, true, rot,
 +                              gen_helper_gvec_fcmlah);
 +            break;
 +        case 2:
 +            gen_gvec_op3_fpst(s, is_q, rd, rn, rm, false, rot,
 +                              gen_helper_gvec_fcmlas);
 +            break;
 +        case 3:
 +            gen_gvec_op3_fpst(s, is_q, rd, rn, rm, false, rot,
 +                              gen_helper_gvec_fcmlad);
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +        return;
 +
      case 0xc: /* FCADD, #90 */
      case 0xe: /* FCADD, #270 */
          rot = extract32(opcode, 1, 1);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
      int rn = extract32(insn, 5, 5);
      int rd = extract32(insn, 0, 5);
      bool is_long = false;
 -    bool is_fp = false;
 +    int is_fp = 0;
      bool is_fp16 = false;
      int index;
      TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
      case 0x05: /* FMLS */
      case 0x09: /* FMUL */
      case 0x19: /* FMULX */
 -        is_fp = true;
 +        is_fp = 1;
          break;
      case 0x1d: /* SQRDMLAH */
      case 0x1f: /* SQRDMLSH */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
              return;
          }
          break;
 +    case 0x11: /* FCMLA #0 */
 +    case 0x13: /* FCMLA #90 */
 +    case 0x15: /* FCMLA #180 */
 +    case 0x17: /* FCMLA #270 */
 +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        is_fp = 2;
 +        break;
      default:
          unallocated_encoding(s);
          return;
      }
 -    if (is_fp) {
 +    switch (is_fp) {
 +    case 1: /* normal fp */
          /* convert insn encoded size to TCGMemOp size */
          switch (size) {
          case 0: /* half-precision */
 -            if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 -                unallocated_encoding(s);
 -                return;
 -            }
              size = MO_16;
 +            is_fp16 = true;
              break;
          case MO_32: /* single precision */
          case MO_64: /* double precision */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
              unallocated_encoding(s);
              return;
          }
 -    } else {
 +        break;
 +
 +    case 2: /* complex fp */
 +        /* Each indexable element is a complex pair.  */
 +        size <<= 1;
 +        switch (size) {
 +        case MO_32:
 +            if (h && !is_q) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            is_fp16 = true;
 +            break;
 +        case MO_64:
 +            break;
 +        default:
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        break;
 +
 +    default: /* integer */
          switch (size) {
          case MO_8:
          case MO_64:
              unallocated_encoding(s);
              return;
          }
 +        break;
 +    }
 +    if (is_fp16 && !arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 +        unallocated_encoding(s);
 +        return;
      }
      /* Given TCGMemOp size, adjust register and indexing.  */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
          fpst = NULL;
      }
 +    switch (16 * u + opcode) {
 +    case 0x11: /* FCMLA #0 */
 +    case 0x13: /* FCMLA #90 */
 +    case 0x15: /* FCMLA #180 */
 +    case 0x17: /* FCMLA #270 */
 +        tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
 +                           vec_full_reg_offset(s, rn),
 +                           vec_reg_offset(s, rm, index, size), fpst,
 +                           is_q ? 16 : 8, vec_full_reg_size(s),
 +                           extract32(insn, 13, 2), /* rot */
 +                           size == MO_64
 +                           ? gen_helper_gvec_fcmlas_idx
 +                           : gen_helper_gvec_fcmlah_idx);
 +        tcg_temp_free_ptr(fpst);
 +        return;
 +    }
 +
      if (size == 3) {
          TCGv_i64 tcg_idx = tcg_temp_new_i64();
          int pass;
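The FCMLA hunks above pass the 2-bit rotation field (insn bits [14:13]) to the helpers as their data word. The helpers split it into `flip` (bit 0) and `neg_imag` (bit 1), then derive `neg_real = flip ^ neg_imag`. Below is a minimal sketch of that element loop. It uses plain `double` rather than QEMU's softfloat types, assumes interleaved (real, imaginary) pairs, and the name `fcmla_sketch()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the FCMLA element loop.
 * d, n and m hold interleaved complex numbers: real part at even
 * index, imaginary part at odd index. nelem counts scalars (even).
 * rot is the 2-bit rotation: 0 = #0, 1 = #90, 2 = #180, 3 = #270.
 */
static void fcmla_sketch(double *d, const double *n, const double *m,
                         size_t nelem, unsigned rot)
{
    size_t flip = rot & 1;               /* take n's imaginary part? */
    int neg_imag = (rot >> 1) & 1;       /* negate the imaginary term */
    int neg_real = (int)flip ^ neg_imag; /* negate the real term */
    size_t i;

    for (i = 0; i < nelem; i += 2) {
        double e2 = n[i + flip];
        double e1 = neg_real ? -m[i + flip] : m[i + flip];
        double e3 = neg_imag ? -m[i + 1 - flip] : m[i + 1 - flip];

        /* The QEMU helpers use a fused multiply-add here. */
        d[i] += e2 * e1;
        d[i + 1] += e2 * e3;
    }
}
```

Issuing rotation #0 followed by #90 into the same accumulator yields the full complex product (a+bi)(c+di). Issuing #180 followed by #270 yields its negation.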
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
+@@ -XXX,XX +XXX,XX @@ static uint32_t float32_cgt(float32 op1, float32 op2, float_status *stat)
-     }
+     return -float32_lt(op2, op1, stat);
      clear_tail(d, opr_sz, simd_maxsz(desc));
  }
-+
-+void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm,
++static uint16_t float16_acge(float16 op1, float16 op2, float_status *stat)
 +                         void *vfpst, uint32_t desc)
 +{
-+    uintptr_t opr_sz = simd_oprsz(desc);
++    return -float16_le(float16_abs(op2), float16_abs(op1), stat);
 +    float16 *d = vd;
 +    float16 *n = vn;
 +    float16 *m = vm;
 +    float_status *fpst = vfpst;
 +    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t neg_real = flip ^ neg_imag;
 +    uintptr_t i;
 +
 +    /* Shift boolean to the sign bit so we can xor to negate.  */
 +    neg_real <<= 15;
 +    neg_imag <<= 15;
 +
 +    for (i = 0; i < opr_sz / 2; i += 2) {
 +        float16 e2 = n[H2(i + flip)];
 +        float16 e1 = m[H2(i + flip)] ^ neg_real;
 +        float16 e4 = e2;
 +        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
 +
 +        d[H2(i)] = float16_muladd(e2, e1, d[H2(i)], 0, fpst);
 +        d[H2(i + 1)] = float16_muladd(e4, e3, d[H2(i + 1)], 0, fpst);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
-+void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm,
++static uint32_t float32_acge(float32 op1, float32 op2, float_status *stat)
 +                             void *vfpst, uint32_t desc)
 +{
-+    uintptr_t opr_sz = simd_oprsz(desc);
++    return -float32_le(float32_abs(op2), float32_abs(op1), stat);
 +    float16 *d = vd;
 +    float16 *n = vn;
 +    float16 *m = vm;
 +    float_status *fpst = vfpst;
 +    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t neg_real = flip ^ neg_imag;
 +    uintptr_t i;
 +    float16 e1 = m[H2(flip)];
 +    float16 e3 = m[H2(1 - flip)];
 +
 +    /* Shift boolean to the sign bit so we can xor to negate.  */
 +    neg_real <<= 15;
 +    neg_imag <<= 15;
 +    e1 ^= neg_real;
 +    e3 ^= neg_imag;
 +
 +    for (i = 0; i < opr_sz / 2; i += 2) {
 +        float16 e2 = n[H2(i + flip)];
 +        float16 e4 = e2;
 +
 +        d[H2(i)] = float16_muladd(e2, e1, d[H2(i)], 0, fpst);
 +        d[H2(i + 1)] = float16_muladd(e4, e3, d[H2(i + 1)], 0, fpst);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
-+void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm,
++static uint16_t float16_acgt(float16 op1, float16 op2, float_status *stat)
 +                         void *vfpst, uint32_t desc)
 +{
-+    uintptr_t opr_sz = simd_oprsz(desc);
++    return -float16_lt(float16_abs(op2), float16_abs(op1), stat);
 +    float32 *d = vd;
 +    float32 *n = vn;
 +    float32 *m = vm;
 +    float_status *fpst = vfpst;
 +    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t neg_real = flip ^ neg_imag;
 +    uintptr_t i;
 +
 +    /* Shift boolean to the sign bit so we can xor to negate.  */
 +    neg_real <<= 31;
 +    neg_imag <<= 31;
 +
 +    for (i = 0; i < opr_sz / 4; i += 2) {
 +        float32 e2 = n[H4(i + flip)];
 +        float32 e1 = m[H4(i + flip)] ^ neg_real;
 +        float32 e4 = e2;
 +        float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
 +
 +        d[H4(i)] = float32_muladd(e2, e1, d[H4(i)], 0, fpst);
 +        d[H4(i + 1)] = float32_muladd(e4, e3, d[H4(i + 1)], 0, fpst);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
-+void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm,
++static uint32_t float32_acgt(float32 op1, float32 op2, float_status *stat)
 +                             void *vfpst, uint32_t desc)
 +{
-+    uintptr_t opr_sz = simd_oprsz(desc);
++    return -float32_lt(float32_abs(op2), float32_abs(op1), stat);
 +    float32 *d = vd;
 +    float32 *n = vn;
 +    float32 *m = vm;
 +    float_status *fpst = vfpst;
 +    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t neg_real = flip ^ neg_imag;
 +    uintptr_t i;
 +    float32 e1 = m[H4(flip)];
 +    float32 e3 = m[H4(1 - flip)];
 +
 +    /* Shift boolean to the sign bit so we can xor to negate.  */
 +    neg_real <<= 31;
 +    neg_imag <<= 31;
 +    e1 ^= neg_real;
 +    e3 ^= neg_imag;
 +
 +    for (i = 0; i < opr_sz / 4; i += 2) {
 +        float32 e2 = n[H4(i + flip)];
 +        float32 e4 = e2;
 +
 +        d[H4(i)] = float32_muladd(e2, e1, d[H4(i)], 0, fpst);
 +        d[H4(i + 1)] = float32_muladd(e4, e3, d[H4(i + 1)], 0, fpst);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
-+void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm,
+ #define DO_2OP(NAME, FUNC, TYPE) \
-+                         void *vfpst, uint32_t desc)
+ void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)  \
-+{
+ {                                                                 \
-+    uintptr_t opr_sz = simd_oprsz(desc);
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fcge_s, float32_cge, float32)
-+    float64 *d = vd;
+ DO_3OP(gvec_fcgt_h, float16_cgt, float16)
-+    float64 *n = vn;
+ DO_3OP(gvec_fcgt_s, float32_cgt, float32)
-+    float64 *m = vm;
-+    float_status *fpst = vfpst;
++DO_3OP(gvec_facge_h, float16_acge, float16)
-+    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
++DO_3OP(gvec_facge_s, float32_acge, float32)
 +    uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint64_t neg_real = flip ^ neg_imag;
 +    uintptr_t i;
 +
-+    /* Shift boolean to the sign bit so we can xor to negate.  */
++DO_3OP(gvec_facgt_h, float16_acgt, float16)
-+    neg_real <<= 63;
++DO_3OP(gvec_facgt_s, float32_acgt, float32)
 +    neg_imag <<= 63;
 +
-+    for (i = 0; i < opr_sz / 8; i += 2) {
+ #ifdef TARGET_AARCH64
-+        float64 e2 = n[i + flip];
-+        float64 e1 = m[i + flip] ^ neg_real;
+ DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
-+        float64 e4 = e2;
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
-+        float64 e3 = m[i + 1 - flip] ^ neg_imag;
+index XXXXXXX..XXXXXXX 100644
-+
+--- a/target/arm/translate-neon.c.inc
-+        d[i] = float64_muladd(e2, e1, d[i], 0, fpst);
++++ b/target/arm/translate-neon.c.inc
-+        d[i + 1] = float64_muladd(e4, e3, d[i + 1], 0, fpst);
+@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s, gen_helper_gvec_fmul_h)
-+    }
+ DO_3S_FP_GVEC(VCEQ, gen_helper_gvec_fceq_s, gen_helper_gvec_fceq_h)
-+    clear_tail(d, opr_sz, simd_maxsz(desc));
+ DO_3S_FP_GVEC(VCGE, gen_helper_gvec_fcge_s, gen_helper_gvec_fcge_h)
-+}
+ DO_3S_FP_GVEC(VCGT, gen_helper_gvec_fcgt_s, gen_helper_gvec_fcgt_h)
 +DO_3S_FP_GVEC(VACGE, gen_helper_gvec_facge_s, gen_helper_gvec_facge_h)
 +DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h)
  /*
   * For all the functions using this macro, size == 1 means fp16,
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VCGT, gen_helper_gvec_fcgt_s, gen_helper_gvec_fcgt_h)
          return do_3same_fp(s, a, FUNC, READS_VD);                   \
      }
 -DO_3S_FP(VACGE, gen_helper_neon_acge_f32, false)
 -DO_3S_FP(VACGT, gen_helper_neon_acgt_f32, false)
  DO_3S_FP(VMAX, gen_helper_vfp_maxs, false)
  DO_3S_FP(VMIN, gen_helper_vfp_mins, false)
 --
-.16.2
+.20.1
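The new `float16_acge`/`float32_acge` style helpers compare magnitudes and then negate the 0/1 result, so each lane becomes all-zeros or all-ones, which is the mask format Neon compares produce. Below is a sketch of the same idea for single precision. It uses host `float` arithmetic in place of softfloat, simplifies NaN handling and exception flags, and its function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical sketch: |op1| >= |op2| as a 32-bit lane mask. */
static uint32_t acge_mask(float op1, float op2)
{
    /* Comparison gives 0 or 1; negating gives 0 or 0xffffffff. */
    return -(uint32_t)(fabsf(op2) <= fabsf(op1));
}

/* Hypothetical sketch: |op1| > |op2| as a 32-bit lane mask. */
static uint32_t acgt_mask(float op1, float op2)
{
    return -(uint32_t)(fabsf(op2) < fabsf(op1));
}
```

Any NaN operand makes the comparison false, so the lane mask is zero.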

-[Qemu-devel] [PULL 29/39] target/arm: Decode aa64 armv8.1 scalar/vector x indexed element
+[PULL 27/47] target/arm: Implement fp16 for Neon VMAX, VMIN
-From: Richard Henderson <richard.henderson@linaro.org>
+Convert the Neon floating-point VMAX and VMIN insns over to using
 a gvec helper, and use this to implement the fp16 case.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180228193125.20577-7-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-29-peter.maydell@linaro.org
 ---
- target/arm/translate-a64.c | 29 +++++++++++++++++++++++++++++
+ target/arm/helper.h             | 6 ++++++
-file changed, 29 insertions(+)
+ target/arm/vec_helper.c         | 6 ++++++
  target/arm/translate-neon.c.inc | 5 ++---
 files changed, 14 insertions(+), 3 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/helper.h
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_facge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-     case 0x19: /* FMULX */
+ DEF_HELPER_FLAGS_5(gvec_facgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-         is_fp = true;
+ DEF_HELPER_FLAGS_5(gvec_facgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-         break;
-+    case 0x1d: /* SQRDMLAH */
++DEF_HELPER_FLAGS_5(gvec_fmax_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+    case 0x1f: /* SQRDMLSH */
++DEF_HELPER_FLAGS_5(gvec_fmax_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+        if (!arm_dc_feature(s, ARM_FEATURE_V8_RDM)) {
++
-+            unallocated_encoding(s);
++DEF_HELPER_FLAGS_5(gvec_fmin_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+            return;
++DEF_HELPER_FLAGS_5(gvec_fmin_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+        }
++
-+        break;
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
-     default:
+                    void, ptr, ptr, ptr, ptr, i32)
-         unallocated_encoding(s);
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
-         return;
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
+index XXXXXXX..XXXXXXX 100644
-                                                 tcg_op, tcg_idx);
+--- a/target/arm/vec_helper.c
-                 }
++++ b/target/arm/vec_helper.c
-                 break;
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_facge_s, float32_acge, float32)
-+            case 0x1d: /* SQRDMLAH */
+ DO_3OP(gvec_facgt_h, float16_acgt, float16)
-+                read_vec_element_i32(s, tcg_res, rd, pass,
+ DO_3OP(gvec_facgt_s, float32_acgt, float32)
-+                                     is_scalar ? size : MO_32);
-+                if (size == 1) {
++DO_3OP(gvec_fmax_h, float16_max, float16)
-+                    gen_helper_neon_qrdmlah_s16(tcg_res, cpu_env,
++DO_3OP(gvec_fmax_s, float32_max, float32)
-+                                                tcg_op, tcg_idx, tcg_res);
++
-+                } else {
++DO_3OP(gvec_fmin_h, float16_min, float16)
-+                    gen_helper_neon_qrdmlah_s32(tcg_res, cpu_env,
++DO_3OP(gvec_fmin_s, float32_min, float32)
-+                                                tcg_op, tcg_idx, tcg_res);
++
-+                }
+ #ifdef TARGET_AARCH64
-+                break;
-+            case 0x1f: /* SQRDMLSH */
+ DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
-+                read_vec_element_i32(s, tcg_res, rd, pass,
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
-+                                     is_scalar ? size : MO_32);
+index XXXXXXX..XXXXXXX 100644
-+                if (size == 1) {
+--- a/target/arm/translate-neon.c.inc
-+                    gen_helper_neon_qrdmlsh_s16(tcg_res, cpu_env,
++++ b/target/arm/translate-neon.c.inc
-+                                                tcg_op, tcg_idx, tcg_res);
+@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VCGE, gen_helper_gvec_fcge_s, gen_helper_gvec_fcge_h)
-+                } else {
+ DO_3S_FP_GVEC(VCGT, gen_helper_gvec_fcgt_s, gen_helper_gvec_fcgt_h)
-+                    gen_helper_neon_qrdmlsh_s32(tcg_res, cpu_env,
+ DO_3S_FP_GVEC(VACGE, gen_helper_gvec_facge_s, gen_helper_gvec_facge_h)
-+                                                tcg_op, tcg_idx, tcg_res);
+ DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h)
-+                }
++DO_3S_FP_GVEC(VMAX, gen_helper_gvec_fmax_s, gen_helper_gvec_fmax_h)
-+                break;
++DO_3S_FP_GVEC(VMIN, gen_helper_gvec_fmin_s, gen_helper_gvec_fmin_h)
-             default:
-                 g_assert_not_reached();
+ /*
-             }
+  * For all the functions using this macro, size == 1 means fp16,
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h)
          return do_3same_fp(s, a, FUNC, READS_VD);                   \
      }
 -DO_3S_FP(VMAX, gen_helper_vfp_maxs, false)
 -DO_3S_FP(VMIN, gen_helper_vfp_mins, false)
 -
  static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
                              TCGv_ptr fpstatus)
  {
 --
-.16.2
+.20.1
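`float32_max`/`float32_min` follow the Arm FMAX/FMIN rules. A NaN operand propagates to the result, and -0 is treated as less than +0. Below is a rough sketch of the max case on host floats. The name is hypothetical, the choice of which NaN is returned is simplified, and min is symmetric.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch of a NaN-propagating max (FMAX/VMAX style). */
static float fmax_propagate(float a, float b)
{
    if (isnan(a) || isnan(b)) {
        return NAN;            /* simplified: hardware picks the NaN */
    }
    if (a == b) {
        /* +0 and -0 compare equal; prefer +0 for max. */
        return signbit(a) ? b : a;
    }
    return a > b ? a : b;
}
```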

-[Qemu-devel] [PULL 06/39] hw/arm/boot: Honour CPU's address space for image loads
+[PULL 28/47] target/arm: Implement fp16 for Neon VMAXNM, VMINNM
-Instead of loading kernels, device trees, and the like to
+Convert the Neon floating point VMAXNM and VMINNM insns to
-the system address space, use the CPU's address space. This
+using a gvec helper and use this to implement the fp16 case.
 is important if we're trying to load the file to memory or
 via an alias memory region that is provided by an SoC
 object and thus not mapped into the system address space.
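The address-space choice described above reduces to one test. Use the Secure address space only when the CPU implements EL3 and the board asked for a secure boot; otherwise use Non-secure. Below is a sketch of that decision. The stand-in types are hypothetical; QEMU's own code uses `ARMASIdx_S`/`ARMASIdx_NS` and `cpu_get_address_space()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's ARMASIdx_NS / ARMASIdx_S. */
typedef enum { AS_IDX_NS = 0, AS_IDX_S = 1 } AsIdx;

typedef struct {
    bool has_el3;     /* CPU implements EL3 (TrustZone) */
    bool secure_boot; /* board wants the guest booted in Secure state */
} BootCfg;

/* Pick the address space index the boot loader should write through. */
static AsIdx boot_as_index(const BootCfg *cfg)
{
    return (cfg->has_el3 && cfg->secure_boot) ? AS_IDX_S : AS_IDX_NS;
}
```

The resulting address space is then threaded through every loader call (`rom_add_blob_fixed_as()`, `load_elf_as()`, `load_ramdisk_as()` and so on), as the boot.c hunks show.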
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-3-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-30-peter.maydell@linaro.org
 ---
- hw/arm/boot.c | 119 +++++++++++++++++++++++++++++++++++++---------------------
+ target/arm/helper.h             |  6 ++++++
-file changed, 76 insertions(+), 43 deletions(-)
+ target/arm/vec_helper.c         |  6 ++++++
  target/arm/translate-neon.c.inc | 23 +++++++++++++++--------
 files changed, 27 insertions(+), 8 deletions(-)
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/target/arm/helper.h
-+++ b/hw/arm/boot.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmax_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- #define ARM64_TEXT_OFFSET_OFFSET    8
+ DEF_HELPER_FLAGS_5(gvec_fmin_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- #define ARM64_MAGIC_OFFSET          56
+ DEF_HELPER_FLAGS_5(gvec_fmin_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+static AddressSpace *arm_boot_address_space(ARMCPU *cpu,
++DEF_HELPER_FLAGS_5(gvec_fmaxnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+                                            const struct arm_boot_info *info)
++DEF_HELPER_FLAGS_5(gvec_fmaxnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +{
 +    /* Return the address space to use for bootloader reads and writes.
 +     * We prefer the secure address space if the CPU has it and we're
 +     * going to boot the guest into it.
 +     */
 +    int asidx;
 +    CPUState *cs = CPU(cpu);
 +
-+    if (arm_feature(&cpu->env, ARM_FEATURE_EL3) && info->secure_boot) {
++DEF_HELPER_FLAGS_5(gvec_fminnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+        asidx = ARMASIdx_S;
++DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +    } else {
 +        asidx = ARMASIdx_NS;
 +    }
 +
-+    return cpu_get_address_space(cs, asidx);
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
-+}
+                    void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fmax_s, float32_max, float32)
  DO_3OP(gvec_fmin_h, float16_min, float16)
  DO_3OP(gvec_fmin_s, float32_min, float32)
 +DO_3OP(gvec_fmaxnum_h, float16_maxnum, float16)
 +DO_3OP(gvec_fmaxnum_s, float32_maxnum, float32)
 +
- typedef enum {
++DO_3OP(gvec_fminnum_h, float16_minnum, float16)
-     FIXUP_NONE = 0,     /* do nothing */
++DO_3OP(gvec_fminnum_s, float32_minnum, float32)
-     FIXUP_TERMINATOR,   /* end of insns */
++
-@@ -XXX,XX +XXX,XX @@ static const ARMInsnFixup smpboot[] = {
+ #ifdef TARGET_AARCH64
- };
+ DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
- static void write_bootloader(const char *name, hwaddr addr,
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
--                             const ARMInsnFixup *insns, uint32_t *fixupcontext)
+index XXXXXXX..XXXXXXX 100644
-+                             const ARMInsnFixup *insns, uint32_t *fixupcontext,
+--- a/target/arm/translate-neon.c.inc
-+                             AddressSpace *as)
++++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static void gen_VMLS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
  DO_3S_FP(VMLA, gen_VMLA_fp_3s, true)
  DO_3S_FP(VMLS, gen_VMLS_fp_3s, true)
 +WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
 +WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
 +WRAP_FP_GVEC(gen_VMINNM_fp32_3s, FPST_STD, gen_helper_gvec_fminnum_s)
 +WRAP_FP_GVEC(gen_VMINNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fminnum_h)
 +
  static bool trans_VMAXNM_fp_3s(DisasContext *s, arg_3same *a)
  {
-     /* Fix up the specified bootloader fragment and write it into
+     if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
-      * guest memory using rom_add_blob_fixed(). fixupcontext is
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMAXNM_fp_3s(DisasContext *s, arg_3same *a)
@@ -XXX,XX +XXX,XX @@ static void write_bootloader(const char *name, hwaddr addr,
          code[i] = tswap32(insn);
      }
--    rom_add_blob_fixed(name, code, len * sizeof(uint32_t), addr);
+     if (a->size != 0) {
-+    rom_add_blob_fixed_as(name, code, len * sizeof(uint32_t), addr, as);
+-        /* TODO fp16 support */
+-        return false;
-     g_free(code);
++        if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +            return false;
 +        }
 +        return do_3same(s, a, gen_VMAXNM_fp16_3s);
      }
 -
 -    return do_3same_fp(s, a, gen_helper_vfp_maxnums, false);
 +    return do_3same(s, a, gen_VMAXNM_fp32_3s);
  }
-@@ -XXX,XX +XXX,XX @@ static void default_write_secondary(ARMCPU *cpu,
-                                     const struct arm_boot_info *info)
+ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
- {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
      uint32_t fixupcontext[FIXUP_MAX];
 +    AddressSpace *as = arm_boot_address_space(cpu, info);
      fixupcontext[FIXUP_GIC_CPU_IF] = info->gic_cpu_if_addr;
      fixupcontext[FIXUP_BOOTREG] = info->smp_bootreg_addr;
@@ -XXX,XX +XXX,XX @@ static void default_write_secondary(ARMCPU *cpu,
      }
-     write_bootloader("smpboot", info->smp_loader_start,
+     if (a->size != 0) {
--                     smpboot, fixupcontext);
+-        /* TODO fp16 support */
-+                     smpboot, fixupcontext, as);
+-        return false;
 +        if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +            return false;
 +        }
 +        return do_3same(s, a, gen_VMINNM_fp16_3s);
      }
 -
 -    return do_3same_fp(s, a, gen_helper_vfp_minnums, false);
 +    return do_3same(s, a, gen_VMINNM_fp32_3s);
  }
- void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
+ WRAP_ENV_FN(gen_VRECPS_tramp, gen_helper_recps_f32)
                                              const struct arm_boot_info *info,
                                              hwaddr mvbar_addr)
  {
 +    AddressSpace *as = arm_boot_address_space(cpu, info);
      int n;
      uint32_t mvbar_blob[] = {
          /* mvbar_addr: secure monitor vectors
@@ -XXX,XX +XXX,XX @@ void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
      for (n = 0; n < ARRAY_SIZE(mvbar_blob); n++) {
          mvbar_blob[n] = tswap32(mvbar_blob[n]);
      }
 -    rom_add_blob_fixed("board-setup-mvbar", mvbar_blob, sizeof(mvbar_blob),
 -                       mvbar_addr);
 +    rom_add_blob_fixed_as("board-setup-mvbar", mvbar_blob, sizeof(mvbar_blob),
 +                          mvbar_addr, as);
      for (n = 0; n < ARRAY_SIZE(board_setup_blob); n++) {
          board_setup_blob[n] = tswap32(board_setup_blob[n]);
      }
 -    rom_add_blob_fixed("board-setup", board_setup_blob,
 -                       sizeof(board_setup_blob), info->board_setup_addr);
 +    rom_add_blob_fixed_as("board-setup", board_setup_blob,
 +                          sizeof(board_setup_blob), info->board_setup_addr, as);
  }
  static void default_reset_secondary(ARMCPU *cpu,
                                      const struct arm_boot_info *info)
  {
 +    AddressSpace *as = arm_boot_address_space(cpu, info);
      CPUState *cs = CPU(cpu);
 -    address_space_stl_notdirty(&address_space_memory, info->smp_bootreg_addr,
 +    address_space_stl_notdirty(as, info->smp_bootreg_addr,
                                 0, MEMTXATTRS_UNSPECIFIED, NULL);
      cpu_set_pc(cs, info->smp_loader_start);
  }
@@ -XXX,XX +XXX,XX @@ static inline bool have_dtb(const struct arm_boot_info *info)
  }
  #define WRITE_WORD(p, value) do { \
 -    address_space_stl_notdirty(&address_space_memory, p, value, \
 +    address_space_stl_notdirty(as, p, value, \
                                 MEMTXATTRS_UNSPECIFIED, NULL);  \
      p += 4;                       \
  } while (0)
 -static void set_kernel_args(const struct arm_boot_info *info)
 +static void set_kernel_args(const struct arm_boot_info *info, AddressSpace *as)
  {
      int initrd_size = info->initrd_size;
      hwaddr base = info->loader_start;
@@ -XXX,XX +XXX,XX @@ static void set_kernel_args(const struct arm_boot_info *info)
          int cmdline_size;
          cmdline_size = strlen(info->kernel_cmdline);
 -        cpu_physical_memory_write(p + 8, info->kernel_cmdline,
 -                                  cmdline_size + 1);
 +        address_space_write(as, p + 8, MEMTXATTRS_UNSPECIFIED,
 +                            (const uint8_t *)info->kernel_cmdline,
 +                            cmdline_size + 1);
          cmdline_size = (cmdline_size >> 2) + 1;
          WRITE_WORD(p, cmdline_size + 2);
          WRITE_WORD(p, 0x54410009);
@@ -XXX,XX +XXX,XX @@ static void set_kernel_args(const struct arm_boot_info *info)
          atag_board_len = (info->atag_board(info, atag_board_buf) + 3) & ~3;
          WRITE_WORD(p, (atag_board_len + 8) >> 2);
          WRITE_WORD(p, 0x414f4d50);
 -        cpu_physical_memory_write(p, atag_board_buf, atag_board_len);
 +        address_space_write(as, p, MEMTXATTRS_UNSPECIFIED,
 +                            atag_board_buf, atag_board_len);
          p += atag_board_len;
      }
      /* ATAG_END */
@@ -XXX,XX +XXX,XX @@ static void set_kernel_args(const struct arm_boot_info *info)
      WRITE_WORD(p, 0);
  }
 -static void set_kernel_args_old(const struct arm_boot_info *info)
 +static void set_kernel_args_old(const struct arm_boot_info *info,
 +                                AddressSpace *as)
  {
      hwaddr p;
      const char *s;
@@ -XXX,XX +XXX,XX @@ static void set_kernel_args_old(const struct arm_boot_info *info)
      }
      s = info->kernel_cmdline;
      if (s) {
 -        cpu_physical_memory_write(p, s, strlen(s) + 1);
 +        address_space_write(as, p, MEMTXATTRS_UNSPECIFIED,
 +                            (const uint8_t *)s, strlen(s) + 1);
      } else {
          WRITE_WORD(p, 0);
      }
@@ -XXX,XX +XXX,XX @@ static void fdt_add_psci_node(void *fdt)
   * @addr:       the address to load the image at
   * @binfo:      struct describing the boot environment
   * @addr_limit: upper limit of the available memory area at @addr
 + * @as:         address space to load image to
   *
   * Load a device tree supplied by the machine or by the user  with the
   * '-dtb' command line option, and put it at offset @addr in target
@@ -XXX,XX +XXX,XX @@ static void fdt_add_psci_node(void *fdt)
   * Note: Must not be called unless have_dtb(binfo) is true.
   */
  static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
 -                    hwaddr addr_limit)
 +                    hwaddr addr_limit, AddressSpace *as)
  {
      void *fdt = NULL;
      int size, rc;
@@ -XXX,XX +XXX,XX @@ static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
      /* Put the DTB into the memory map as a ROM image: this will ensure
       * the DTB is copied again upon reset, even if addr points into RAM.
       */
 -    rom_add_blob_fixed("dtb", fdt, size, addr);
 +    rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
      g_free(fdt);
@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
              }
              if (cs == first_cpu) {
 +                AddressSpace *as = arm_boot_address_space(cpu, info);
 +
                  cpu_set_pc(cs, info->loader_start);
                  if (!have_dtb(info)) {
                      if (old_param) {
 -                        set_kernel_args_old(info);
 +                        set_kernel_args_old(info, as);
                      } else {
 -                        set_kernel_args(info);
 +                        set_kernel_args(info, as);
                      }
                  }
              } else {
@@ -XXX,XX +XXX,XX @@ static int do_arm_linux_init(Object *obj, void *opaque)
  static uint64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry,
                               uint64_t *lowaddr, uint64_t *highaddr,
 -                             int elf_machine)
 +                             int elf_machine, AddressSpace *as)
  {
      bool elf_is64;
      union {
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry,
          }
      }
 -    ret = load_elf(info->kernel_filename, NULL, NULL,
 -                   pentry, lowaddr, highaddr, big_endian, elf_machine,
 -                   1, data_swab);
 +    ret = load_elf_as(info->kernel_filename, NULL, NULL,
 +                      pentry, lowaddr, highaddr, big_endian, elf_machine,
 +                      1, data_swab, as);
      if (ret <= 0) {
          /* The header loaded but the image didn't */
          exit(1);
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry,
  }
  static uint64_t load_aarch64_image(const char *filename, hwaddr mem_base,
 -                                   hwaddr *entry)
 +                                   hwaddr *entry, AddressSpace *as)
  {
      hwaddr kernel_load_offset = KERNEL64_LOAD_ADDR;
      uint8_t *buffer;
@@ -XXX,XX +XXX,XX @@ static uint64_t load_aarch64_image(const char *filename, hwaddr mem_base,
      }
      *entry = mem_base + kernel_load_offset;
 -    rom_add_blob_fixed(filename, buffer, size, *entry);
 +    rom_add_blob_fixed_as(filename, buffer, size, *entry, as);
      g_free(buffer);
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
      ARMCPU *cpu = n->cpu;
      struct arm_boot_info *info =
          container_of(n, struct arm_boot_info, load_kernel_notifier);
 +    AddressSpace *as = arm_boot_address_space(cpu, info);
      /* The board code is not supposed to set secure_board_setup unless
       * running its code in secure mode is actually possible, and KVM
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
               * the kernel is supposed to be loaded by the bootloader), copy the
               * DTB to the base of RAM for the bootloader to pick up.
               */
 -            if (load_dtb(info->loader_start, info, 0) < 0) {
 +            if (load_dtb(info->loader_start, info, 0, as) < 0) {
                  exit(1);
              }
          }
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
      /* Assume that raw images are linux kernels, and ELF images are not.  */
      kernel_size = arm_load_elf(info, &elf_entry, &elf_low_addr,
 -                               &elf_high_addr, elf_machine);
 +                               &elf_high_addr, elf_machine, as);
      if (kernel_size > 0 && have_dtb(info)) {
          /* If there is still some room left at the base of RAM, try and put
           * the DTB there like we do for images loaded with -bios or -pflash.
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
              if (elf_low_addr < info->loader_start) {
                  elf_low_addr = 0;
              }
 -            if (load_dtb(info->loader_start, info, elf_low_addr) < 0) {
 +            if (load_dtb(info->loader_start, info, elf_low_addr, as) < 0) {
                  exit(1);
              }
          }
      }
      entry = elf_entry;
      if (kernel_size < 0) {
 -        kernel_size = load_uimage(info->kernel_filename, &entry, NULL,
 -                                  &is_linux, NULL, NULL);
 +        kernel_size = load_uimage_as(info->kernel_filename, &entry, NULL,
 +                                     &is_linux, NULL, NULL, as);
      }
      if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64) && kernel_size < 0) {
          kernel_size = load_aarch64_image(info->kernel_filename,
 -                                         info->loader_start, &entry);
 +                                         info->loader_start, &entry, as);
          is_linux = 1;
      } else if (kernel_size < 0) {
          /* 32-bit ARM */
          entry = info->loader_start + KERNEL_LOAD_ADDR;
 -        kernel_size = load_image_targphys(info->kernel_filename, entry,
 -                                          info->ram_size - KERNEL_LOAD_ADDR);
 +        kernel_size = load_image_targphys_as(info->kernel_filename, entry,
 +                                             info->ram_size - KERNEL_LOAD_ADDR,
 +                                             as);
          is_linux = 1;
      }
      if (kernel_size < 0) {
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
          uint32_t fixupcontext[FIXUP_MAX];
          if (info->initrd_filename) {
 -            initrd_size = load_ramdisk(info->initrd_filename,
 -                                       info->initrd_start,
 -                                       info->ram_size -
 -                                       info->initrd_start);
 +            initrd_size = load_ramdisk_as(info->initrd_filename,
 +                                          info->initrd_start,
 +                                          info->ram_size - info->initrd_start,
 +                                          as);
              if (initrd_size < 0) {
 -                initrd_size = load_image_targphys(info->initrd_filename,
 -                                                  info->initrd_start,
 -                                                  info->ram_size -
 -                                                  info->initrd_start);
 +                initrd_size = load_image_targphys_as(info->initrd_filename,
 +                                                     info->initrd_start,
 +                                                     info->ram_size -
 +                                                     info->initrd_start,
 +                                                     as);
              }
              if (initrd_size < 0) {
                  error_report("could not load initrd '%s'",
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
              /* Place the DTB after the initrd in memory with alignment. */
              dtb_start = QEMU_ALIGN_UP(info->initrd_start + initrd_size, align);
 -            if (load_dtb(dtb_start, info, 0) < 0) {
 +            if (load_dtb(dtb_start, info, 0, as) < 0) {
                  exit(1);
              }
              fixupcontext[FIXUP_ARGPTR] = dtb_start;
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
          fixupcontext[FIXUP_ENTRYPOINT] = entry;
          write_bootloader("bootloader", info->loader_start,
 -                         primary_loader, fixupcontext);
 +                         primary_loader, fixupcontext, as);
          if (info->nb_cpus > 1) {
              info->write_secondary_boot(cpu, info);
 --
-.16.2
+.20.1

-[Qemu-devel] [PULL 16/39] hw/core/split-irq: Device that splits IRQ lines
+[PULL 29/47] target/arm: Implement fp16 for Neon VMLA, VMLS operations
-In some board or SoC models it is necessary to split a qemu_irq line
+Convert the Neon floating-point VMLA and VMLS insns over to using a
-so that one input can feed multiple outputs.  We currently have
+gvec helper, and use this to implement the fp16 case.
 qemu_irq_split() for this, but that has several deficiencies:
  * it can only handle splitting a line into two
  * it unavoidably leaks memory, so it can't be used
    in a device that can be deleted
 Implement a qdev device that encapsulates splitting of IRQs, with a
 configurable number of outputs.  (This is in some ways the inverse of
 the TYPE_OR_IRQ device.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-13-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-31-peter.maydell@linaro.org
 ---
- hw/core/Makefile.objs       |  1 +
+ target/arm/helper.h             |  6 +++++
- include/hw/core/split-irq.h | 57 +++++++++++++++++++++++++++++
+ target/arm/vec_helper.c         | 42 +++++++++++++++++++++++++++++++++
- include/hw/irq.h            |  4 +-
+ target/arm/translate-neon.c.inc | 33 ++------------------------
- hw/core/split-irq.c         | 89 +++++++++++++++++++++++++++++++++++++++++++++
+files changed, 50 insertions(+), 31 deletions(-)
 files changed, 150 insertions(+), 1 deletion(-)
  create mode 100644 include/hw/core/split-irq.h
  create mode 100644 hw/core/split-irq.c
-diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/core/Makefile.objs
+--- a/target/arm/helper.h
-+++ b/hw/core/Makefile.objs
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_FITLOADER) += loader-fit.o
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmaxnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i3
- common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
+ DEF_HELPER_FLAGS_5(gvec_fminnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- common-obj-$(CONFIG_SOFTMMU) += register.o
+ DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- common-obj-$(CONFIG_SOFTMMU) += or-irq.o
-+common-obj-$(CONFIG_SOFTMMU) += split-irq.o
++DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
++DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
  obj-$(CONFIG_SOFTMMU) += generic-loader.o
 diff --git a/include/hw/core/split-irq.h b/include/hw/core/split-irq.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/core/split-irq.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * IRQ splitter device.
 + *
 + * Copyright (c) 2018 Linaro Limited.
 + * Written by Peter Maydell
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a copy
 + * of this software and associated documentation files (the "Software"), to deal
 + * in the Software without restriction, including without limitation the rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
-+/* This is a simple device which has one GPIO input line and multiple
++DEF_HELPER_FLAGS_5(gvec_fmls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+ * GPIO output lines. Any change on the input line is forwarded to all
++DEF_HELPER_FLAGS_5(gvec_fmls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 + * of the outputs.
 + *
 + * QEMU interface:
 + *  + one unnamed GPIO input: the input line
 + *  + N unnamed GPIO outputs: the output lines
 + *  + QOM property "num-lines": sets the number of output lines
 + */
 +#ifndef HW_SPLIT_IRQ_H
 +#define HW_SPLIT_IRQ_H
 +
-+#include "hw/irq.h"
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
-+#include "hw/sysbus.h"
+                    void, ptr, ptr, ptr, ptr, i32)
-+#include "qom/object.h"
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
-+
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 +#define TYPE_SPLIT_IRQ "split-irq"
 +
 +#define MAX_SPLIT_LINES 16
 +
 +typedef struct SplitIRQ SplitIRQ;
 +
 +#define SPLIT_IRQ(obj) OBJECT_CHECK(SplitIRQ, (obj), TYPE_SPLIT_IRQ)
 +
 +struct SplitIRQ {
 +    DeviceState parent_obj;
 +
 +    qemu_irq out_irq[MAX_SPLIT_LINES];
 +    uint16_t num_lines;
 +};
 +
 +#endif
 diff --git a/include/hw/irq.h b/include/hw/irq.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/irq.h
+--- a/target/arm/vec_helper.c
-+++ b/include/hw/irq.h
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ void qemu_free_irq(qemu_irq irq);
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
- /* Returns a new IRQ with opposite polarity.  */
+ #endif
- qemu_irq qemu_irq_invert(qemu_irq irq);
+ #undef DO_3OP
--/* Returns a new IRQ which feeds into both the passed IRQs */
++/* Non-fused multiply-add (unlike float16_muladd etc, which are fused) */
-+/* Returns a new IRQ which feeds into both the passed IRQs.
++static float16 float16_muladd_nf(float16 dest, float16 op1, float16 op2,
-+ * It's probably better to use the TYPE_SPLIT_IRQ device instead.
++                                 float_status *stat)
 + */
  qemu_irq qemu_irq_split(qemu_irq irq1, qemu_irq irq2);
  /* Returns a new IRQ set which connects 1:1 to another IRQ set, which
 diff --git a/hw/core/split-irq.c b/hw/core/split-irq.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/core/split-irq.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * IRQ splitter device.
 + *
 + * Copyright (c) 2018 Linaro Limited.
 + * Written by Peter Maydell
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a copy
 + * of this software and associated documentation files (the "Software"), to deal
 + * in the Software without restriction, including without limitation the rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "hw/core/split-irq.h"
 +#include "qapi/error.h"
 +
 +static void split_irq_handler(void *opaque, int n, int level)
 +{
-+    SplitIRQ *s = SPLIT_IRQ(opaque);
++    return float16_add(dest, float16_mul(op1, op2, stat), stat);
 +    int i;
 +
 +    for (i = 0; i < s->num_lines; i++) {
 +        qemu_set_irq(s->out_irq[i], level);
 +    }
 +}
 +
-+static void split_irq_init(Object *obj)
++static float32 float32_muladd_nf(float32 dest, float32 op1, float32 op2,
 +                                 float_status *stat)
 +{
-+    qdev_init_gpio_in(DEVICE(obj), split_irq_handler, 1);
++    return float32_add(dest, float32_mul(op1, op2, stat), stat);
 +}
 +
-+static void split_irq_realize(DeviceState *dev, Error **errp)
++static float16 float16_mulsub_nf(float16 dest, float16 op1, float16 op2,
 +                                 float_status *stat)
 +{
-+    SplitIRQ *s = SPLIT_IRQ(dev);
++    return float16_sub(dest, float16_mul(op1, op2, stat), stat);
 +
 +    if (s->num_lines < 1 || s->num_lines >= MAX_SPLIT_LINES) {
 +        error_setg(errp,
 +                   "IRQ splitter number of lines %d is not between 1 and %d",
 +                   s->num_lines, MAX_SPLIT_LINES);
 +        return;
 +    }
 +
 +    qdev_init_gpio_out(dev, s->out_irq, s->num_lines);
 +}
 +
-+static Property split_irq_properties[] = {
++static float32 float32_mulsub_nf(float32 dest, float32 op1, float32 op2,
-+    DEFINE_PROP_UINT16("num-lines", SplitIRQ, num_lines, 1),
++                                 float_status *stat)
 +    DEFINE_PROP_END_OF_LIST(),
 +};
 +
 +static void split_irq_class_init(ObjectClass *klass, void *data)
 +{
-+    DeviceClass *dc = DEVICE_CLASS(klass);
++    return float32_sub(dest, float32_mul(op1, op2, stat), stat);
 +
 +    /* No state to reset or migrate */
 +    dc->props = split_irq_properties;
 +    dc->realize = split_irq_realize;
 +
 +    /* Reason: Needs to be wired up to work */
 +    dc->user_creatable = false;
 +}
 +
-+static const TypeInfo split_irq_type_info = {
++#define DO_MULADD(NAME, FUNC, TYPE) \
-+   .name = TYPE_SPLIT_IRQ,
++void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
-+   .parent = TYPE_DEVICE,
++{                                                                          \
-+   .instance_size = sizeof(SplitIRQ),
++    intptr_t i, oprsz = simd_oprsz(desc);                                  \
-+   .instance_init = split_irq_init,
++    TYPE *d = vd, *n = vn, *m = vm;                                        \
-+   .class_init = split_irq_class_init,
++    for (i = 0; i < oprsz / sizeof(TYPE); i++) {                           \
-+};
++        d[i] = FUNC(d[i], n[i], m[i], stat);                               \
-+
++    }                                                                      \
-+static void split_irq_register_types(void)
++    clear_tail(d, oprsz, simd_maxsz(desc));                                \
 +{
 +    type_register_static(&split_irq_type_info);
 +}
 +
-+type_init(split_irq_register_types)
++DO_MULADD(gvec_fmla_h, float16_muladd_nf, float16)
 +DO_MULADD(gvec_fmla_s, float32_muladd_nf, float32)
 +
 +DO_MULADD(gvec_fmls_h, float16_mulsub_nf, float16)
 +DO_MULADD(gvec_fmls_s, float32_mulsub_nf, float32)
 +
  /* For the indexed ops, SVE applies the index per 128-bit vector segment.
   * For AdvSIMD, there is of course only one such vector segment.
   */
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VACGE, gen_helper_gvec_facge_s, gen_helper_gvec_facge_h)
  DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h)
  DO_3S_FP_GVEC(VMAX, gen_helper_gvec_fmax_s, gen_helper_gvec_fmax_h)
  DO_3S_FP_GVEC(VMIN, gen_helper_gvec_fmin_s, gen_helper_gvec_fmin_h)
 -
 -/*
 - * For all the functions using this macro, size == 1 means fp16,
 - * which is an architecture extension we don't implement yet.
 - */
 -#define DO_3S_FP(INSN,FUNC,READS_VD)                                \
 -    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
 -    {                                                               \
 -        if (a->size != 0) {                                         \
 -            /* TODO fp16 support */                                 \
 -            return false;                                           \
 -        }                                                           \
 -        return do_3same_fp(s, a, FUNC, READS_VD);                   \
 -    }
 -
 -static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
 -                            TCGv_ptr fpstatus)
 -{
 -    gen_helper_vfp_muls(vn, vn, vm, fpstatus);
 -    gen_helper_vfp_adds(vd, vd, vn, fpstatus);
 -}
 -
 -static void gen_VMLS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
 -                            TCGv_ptr fpstatus)
 -{
 -    gen_helper_vfp_muls(vn, vn, vm, fpstatus);
 -    gen_helper_vfp_subs(vd, vd, vn, fpstatus);
 -}
 -
 -DO_3S_FP(VMLA, gen_VMLA_fp_3s, true)
 -DO_3S_FP(VMLS, gen_VMLS_fp_3s, true)
 +DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h)
 +DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
  WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
  WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
 --
-.16.2
+.20.1

-New patch
+[PULL 30/47] target/arm: Implement fp16 for Neon VFMA, VFMS
+Convert the neon floating-point vector operations VFMA and VFMS
 to use a gvec helper, and use this to implement the fp16 case.
 This is the last use of do_3same_fp() so we can now delete
 that function.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200828183354.27913-32-peter.maydell@linaro.org
 ---
  target/arm/helper.h             |  6 +++
  target/arm/vec_helper.c         | 33 +++++++++++-
  target/arm/translate-neon.c.inc | 92 +--------------------------------
 files changed, 40 insertions(+), 91 deletions(-)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_fmls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_fmls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_vfma_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_vfma_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static float32 float32_mulsub_nf(float32 dest, float32 op1, float32 op2,
      return float32_sub(dest, float32_mul(op1, op2, stat), stat);
  }
 -#define DO_MULADD(NAME, FUNC, TYPE) \
 +/* Fused versions; these have the semantics Neon VFMA/VFMS want */
 +static float16 float16_muladd_f(float16 dest, float16 op1, float16 op2,
 +                                float_status *stat)
 +{
 +    return float16_muladd(op1, op2, dest, 0, stat);
 +}
 +
 +static float32 float32_muladd_f(float32 dest, float32 op1, float32 op2,
 +                                 float_status *stat)
 +{
 +    return float32_muladd(op1, op2, dest, 0, stat);
 +}
 +
 +static float16 float16_mulsub_f(float16 dest, float16 op1, float16 op2,
 +                                 float_status *stat)
 +{
 +    return float16_muladd(float16_chs(op1), op2, dest, 0, stat);
 +}
 +
 +static float32 float32_mulsub_f(float32 dest, float32 op1, float32 op2,
 +                                 float_status *stat)
 +{
 +    return float32_muladd(float32_chs(op1), op2, dest, 0, stat);
 +}
 +
 +#define DO_MULADD(NAME, FUNC, TYPE)                                     \
  void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
  {                                                                          \
      intptr_t i, oprsz = simd_oprsz(desc);                                  \
@@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_fmla_s, float32_muladd_nf, float32)
  DO_MULADD(gvec_fmls_h, float16_mulsub_nf, float16)
  DO_MULADD(gvec_fmls_s, float32_mulsub_nf, float32)
 +DO_MULADD(gvec_vfma_h, float16_muladd_f, float16)
 +DO_MULADD(gvec_vfma_s, float32_muladd_f, float32)
 +
 +DO_MULADD(gvec_vfms_h, float16_mulsub_f, float16)
 +DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
 +
  /* For the indexed ops, SVE applies the index per 128-bit vector segment.
   * For AdvSIMD, there is of course only one such vector segment.
   */
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ DO_3SAME_PAIR(VPADD, padd_u)
  DO_3SAME_VQDMULH(VQDMULH, qdmulh)
  DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
 -static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn,
 -                        bool reads_vd)
 -{
 -    /*
 -     * FP operations handled elementwise 32 bits at a time.
 -     * If reads_vd is true then the old value of Vd will be
 -     * loaded before calling the callback function. This is
 -     * used for multiply-accumulate type operations.
 -     */
 -    TCGv_i32 tmp, tmp2;
 -    int pass;
 -
 -    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -        return false;
 -    }
 -
 -    /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_simd_r32, s) &&
 -        ((a->vd | a->vn | a->vm) & 0x10)) {
 -        return false;
 -    }
 -
 -    if ((a->vn | a->vm | a->vd) & a->q) {
 -        return false;
 -    }
 -
 -    if (!vfp_access_check(s)) {
 -        return true;
 -    }
 -
 -    TCGv_ptr fpstatus = fpstatus_ptr(FPST_STD);
 -    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        tmp = neon_load_reg(a->vn, pass);
 -        tmp2 = neon_load_reg(a->vm, pass);
 -        if (reads_vd) {
 -            TCGv_i32 tmp_rd = neon_load_reg(a->vd, pass);
 -            fn(tmp_rd, tmp, tmp2, fpstatus);
 -            neon_store_reg(a->vd, pass, tmp_rd);
 -            tcg_temp_free_i32(tmp);
 -        } else {
 -            fn(tmp, tmp, tmp2, fpstatus);
 -            neon_store_reg(a->vd, pass, tmp);
 -        }
 -        tcg_temp_free_i32(tmp2);
 -    }
 -    tcg_temp_free_ptr(fpstatus);
 -    return true;
 -}
 -
  #define WRAP_FP_GVEC(WRAPNAME, FPST, FUNC)                              \
      static void WRAPNAME(unsigned vece, uint32_t rd_ofs,                \
                           uint32_t rn_ofs, uint32_t rm_ofs,              \
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMAX, gen_helper_gvec_fmax_s, gen_helper_gvec_fmax_h)
  DO_3S_FP_GVEC(VMIN, gen_helper_gvec_fmin_s, gen_helper_gvec_fmin_h)
  DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h)
  DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
 +DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h)
 +DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h)
  WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
  WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
@@ -XXX,XX +XXX,XX @@ static bool trans_VRSQRTS_fp_3s(DisasContext *s, arg_3same *a)
      return do_3same(s, a, gen_VRSQRTS_fp_3s);
  }
 -static void gen_VFMA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
 -                            TCGv_ptr fpstatus)
 -{
 -    gen_helper_vfp_muladds(vd, vn, vm, vd, fpstatus);
 -}
 -
 -static bool trans_VFMA_fp_3s(DisasContext *s, arg_3same *a)
 -{
 -    if (!dc_isar_feature(aa32_simdfmac, s)) {
 -        return false;
 -    }
 -
 -    if (a->size != 0) {
 -        /* TODO fp16 support */
 -        return false;
 -    }
 -
 -    return do_3same_fp(s, a, gen_VFMA_fp_3s, true);
 -}
 -
 -static void gen_VFMS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
 -                            TCGv_ptr fpstatus)
 -{
 -    gen_helper_vfp_negs(vn, vn);
 -    gen_helper_vfp_muladds(vd, vn, vm, vd, fpstatus);
 -}
 -
 -static bool trans_VFMS_fp_3s(DisasContext *s, arg_3same *a)
 -{
 -    if (!dc_isar_feature(aa32_simdfmac, s)) {
 -        return false;
 -    }
 -
 -    if (a->size != 0) {
 -        /* TODO fp16 support */
 -        return false;
 -    }
 -
 -    return do_3same_fp(s, a, gen_VFMS_fp_3s, true);
 -}
 -
  static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
  {
      /* FP operations handled pairwise 32 bits at a time */
 --
 .20.1

-[Qemu-devel] [PULL 23/39] mps2-an505: New board model: MPS2 with AN505 Cortex-M33 FPGA image
+[PULL 31/47] target/arm: Implement fp16 for Neon fp compare-vs-0
-Define a new board model for the MPS2 with an AN505 FPGA image
+Convert the neon floating-point vector compare-vs-0 insns VCEQ0,
-containing a Cortex-M33. Since the FPGA images for TrustZone
+VCGT0, VCLE0, VCGE0 and VCLT0 to use a gvec helper, and use this to
-cores (AN505, and the similar AN519 for Cortex-M23) have a
+implement the fp16 case.
 significantly different layout of devices to the non-TrustZone
 images, we use a new source file rather than shoehorning them
 into the existing mps2.c.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-20-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-33-peter.maydell@linaro.org
 ---
- hw/arm/Makefile.objs |   1 +
+ target/arm/helper.h             | 15 +++++++++++++++
- hw/arm/mps2-tz.c     | 503 +++++++++++++++++++++++++++++++++++++++++++++++++++
+ target/arm/vec_helper.c         | 25 +++++++++++++++++++++++++
-files changed, 504 insertions(+)
+ target/arm/translate-neon.c.inc | 33 +++++----------------------------
- create mode 100644 hw/arm/mps2-tz.c
+files changed, 45 insertions(+), 28 deletions(-)
-diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/Makefile.objs
+--- a/target/arm/helper.h
-+++ b/hw/arm/Makefile.objs
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_FSL_IMX31) += fsl-imx31.o kzm.o
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o sabrelite.o
+ DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
+ DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- obj-$(CONFIG_MPS2) += mps2.o
-+obj-$(CONFIG_MPS2) += mps2-tz.o
++DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- obj-$(CONFIG_MSF2) += msf2-soc.o msf2-som.o
++DEF_HELPER_FLAGS_4(gvec_fcgt0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
  obj-$(CONFIG_IOTKIT) += iotkit.o
 diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM V2M MPS2 board emulation, trustzone aware FPGA images
 + *
 + * Copyright (c) 2017 Linaro Limited
 + * Written by Peter Maydell
 + *
 + *  This program is free software; you can redistribute it and/or modify
 + *  it under the terms of the GNU General Public License version 2 or
 + *  (at your option) any later version.
 + */
 +
-+/* The MPS2 and MPS2+ dev boards are FPGA based (the 2+ has a bigger
++DEF_HELPER_FLAGS_4(gvec_fcge0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+ * FPGA but is otherwise the same as the 2). Since the CPU itself
++DEF_HELPER_FLAGS_4(gvec_fcge0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 + * and most of the devices are in the FPGA, the details of the board
 + * as seen by the guest depend significantly on the FPGA image.
 + * This source file covers the following FPGA images, for TrustZone cores:
 + *  "mps2-an505" -- Cortex-M33 as documented in ARM Application Note AN505
 + *
 + * Links to the TRM for the board itself and to the various Application
 + * Notes which document the FPGA images can be found here:
 + * https://developer.arm.com/products/system-design/development-boards/fpga-prototyping-boards/mps2
 + *
 + * Board TRM:
 + * http://infocenter.arm.com/help/topic/com.arm.doc.100112_0200_06_en/versatile_express_cortex_m_prototyping_systems_v2m_mps2_and_v2m_mps2plus_technical_reference_100112_0200_06_en.pdf
 + * Application Note AN505:
 + * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
 + *
 + * The AN505 defers to the Cortex-M33 processor ARMv8M IoT Kit FVP User Guide
 + * (ARM ECM0601256) for the details of some of the device layout:
 + *   http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
 + */
 +
-+#include "qemu/osdep.h"
++DEF_HELPER_FLAGS_4(gvec_fceq0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+#include "qapi/error.h"
++DEF_HELPER_FLAGS_4(gvec_fceq0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +#include "qemu/error-report.h"
 +#include "hw/arm/arm.h"
 +#include "hw/arm/armv7m.h"
 +#include "hw/or-irq.h"
 +#include "hw/boards.h"
 +#include "exec/address-spaces.h"
 +#include "sysemu/sysemu.h"
 +#include "hw/misc/unimp.h"
 +#include "hw/char/cmsdk-apb-uart.h"
 +#include "hw/timer/cmsdk-apb-timer.h"
 +#include "hw/misc/mps2-scc.h"
 +#include "hw/misc/mps2-fpgaio.h"
 +#include "hw/arm/iotkit.h"
 +#include "hw/devices.h"
 +#include "net/net.h"
 +#include "hw/core/split-irq.h"
 +
-+typedef enum MPS2TZFPGAType {
++DEF_HELPER_FLAGS_4(gvec_fcle0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+    FPGA_AN505,
++DEF_HELPER_FLAGS_4(gvec_fcle0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +} MPS2TZFPGAType;
 +
-+typedef struct {
++DEF_HELPER_FLAGS_4(gvec_fclt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+    MachineClass parent;
++DEF_HELPER_FLAGS_4(gvec_fclt0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +    MPS2TZFPGAType fpga_type;
 +    uint32_t scc_id;
 +} MPS2TZMachineClass;
 +
-+typedef struct {
+ DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+    MachineState parent;
+ DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+
+ DEF_HELPER_FLAGS_5(gvec_fadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+    IoTKit iotkit;
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
-+    MemoryRegion psram;
+index XXXXXXX..XXXXXXX 100644
-+    MemoryRegion ssram1;
+--- a/target/arm/vec_helper.c
-+    MemoryRegion ssram1_m;
++++ b/target/arm/vec_helper.c
-+    MemoryRegion ssram23;
+@@ -XXX,XX +XXX,XX @@ DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
-+    MPS2SCC scc;
+ DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
-+    MPS2FPGAIO fpgaio;
+ DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
-+    TZPPC ppc[5];
-+    UnimplementedDeviceState ssram_mpc[3];
++#define WRAP_CMP0_FWD(FN, CMPOP, TYPE)                          \
-+    UnimplementedDeviceState spi[5];
++    static TYPE TYPE##_##FN##0(TYPE op, float_status *stat)     \
-+    UnimplementedDeviceState i2c[4];
++    {                                                           \
-+    UnimplementedDeviceState i2s_audio;
++        return TYPE##_##CMPOP(op, TYPE##_zero, stat);           \
 +    UnimplementedDeviceState gpio[5];
 +    UnimplementedDeviceState dma[4];
 +    UnimplementedDeviceState gfx;
 +    CMSDKAPBUART uart[5];
 +    SplitIRQ sec_resp_splitter;
 +    qemu_or_irq uart_irq_orgate;
 +} MPS2TZMachineState;
 +
 +#define TYPE_MPS2TZ_MACHINE "mps2tz"
 +#define TYPE_MPS2TZ_AN505_MACHINE MACHINE_TYPE_NAME("mps2-an505")
 +
 +#define MPS2TZ_MACHINE(obj) \
 +    OBJECT_CHECK(MPS2TZMachineState, obj, TYPE_MPS2TZ_MACHINE)
 +#define MPS2TZ_MACHINE_GET_CLASS(obj) \
 +    OBJECT_GET_CLASS(MPS2TZMachineClass, obj, TYPE_MPS2TZ_MACHINE)
 +#define MPS2TZ_MACHINE_CLASS(klass) \
 +    OBJECT_CLASS_CHECK(MPS2TZMachineClass, klass, TYPE_MPS2TZ_MACHINE)
 +
 +/* Main SYSCLK frequency in Hz */
 +#define SYSCLK_FRQ 20000000
 +
 +/* Initialize the auxiliary RAM region @mr and map it into
 + * the memory map at @base.
 + */
 +static void make_ram(MemoryRegion *mr, const char *name,
 +                     hwaddr base, hwaddr size)
 +{
 +    memory_region_init_ram(mr, NULL, name, size, &error_fatal);
 +    memory_region_add_subregion(get_system_memory(), base, mr);
 +}
 +
 +/* Create an alias of an entire original MemoryRegion @orig
 + * located at @base in the memory map.
 + */
 +static void make_ram_alias(MemoryRegion *mr, const char *name,
 +                           MemoryRegion *orig, hwaddr base)
 +{
 +    memory_region_init_alias(mr, NULL, name, orig, 0,
 +                             memory_region_size(orig));
 +    memory_region_add_subregion(get_system_memory(), base, mr);
 +}
 +
 +static void init_sysbus_child(Object *parent, const char *childname,
 +                              void *child, size_t childsize,
 +                              const char *childtype)
 +{
 +    object_initialize(child, childsize, childtype);
 +    object_property_add_child(parent, childname, OBJECT(child), &error_abort);
 +    qdev_set_parent_bus(DEVICE(child), sysbus_get_default());
 +
 +}
 +
 +/* Most of the devices in the AN505 FPGA image sit behind
 + * Peripheral Protection Controllers. These data structures
 + * define the layout of which devices sit behind which PPCs.
 + * The devfn for each port is a function which creates, configures
 + * and initializes the device, returning the MemoryRegion which
 + * needs to be plugged into the downstream end of the PPC port.
 + */
 +typedef MemoryRegion *MakeDevFn(MPS2TZMachineState *mms, void *opaque,
 +                                const char *name, hwaddr size);
 +
 +typedef struct PPCPortInfo {
 +    const char *name;
 +    MakeDevFn *devfn;
 +    void *opaque;
 +    hwaddr addr;
 +    hwaddr size;
 +} PPCPortInfo;
 +
 +typedef struct PPCInfo {
 +    const char *name;
 +    PPCPortInfo ports[TZ_NUM_PORTS];
 +} PPCInfo;
 +
 +static MemoryRegion *make_unimp_dev(MPS2TZMachineState *mms,
 +                                       void *opaque,
 +                                       const char *name, hwaddr size)
 +{
 +    /* Initialize, configure and realize a TYPE_UNIMPLEMENTED_DEVICE,
 +     * and return a pointer to its MemoryRegion.
 +     */
 +    UnimplementedDeviceState *uds = opaque;
 +
 +    init_sysbus_child(OBJECT(mms), name, uds,
 +                      sizeof(UnimplementedDeviceState),
 +                      TYPE_UNIMPLEMENTED_DEVICE);
 +    qdev_prop_set_string(DEVICE(uds), "name", name);
 +    qdev_prop_set_uint64(DEVICE(uds), "size", size);
 +    object_property_set_bool(OBJECT(uds), true, "realized", &error_fatal);
 +    return sysbus_mmio_get_region(SYS_BUS_DEVICE(uds), 0);
 +}
 +
 +static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
 +                               const char *name, hwaddr size)
 +{
 +    CMSDKAPBUART *uart = opaque;
 +    int i = uart - &mms->uart[0];
 +    Chardev *uartchr = i < MAX_SERIAL_PORTS ? serial_hds[i] : NULL;
 +    int rxirqno = i * 2;
 +    int txirqno = i * 2 + 1;
 +    int combirqno = i + 10;
 +    SysBusDevice *s;
 +    DeviceState *iotkitdev = DEVICE(&mms->iotkit);
 +    DeviceState *orgate_dev = DEVICE(&mms->uart_irq_orgate);
 +
 +    init_sysbus_child(OBJECT(mms), name, uart,
 +                      sizeof(mms->uart[0]), TYPE_CMSDK_APB_UART);
 +    qdev_prop_set_chr(DEVICE(uart), "chardev", uartchr);
 +    qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", SYSCLK_FRQ);
 +    object_property_set_bool(OBJECT(uart), true, "realized", &error_fatal);
 +    s = SYS_BUS_DEVICE(uart);
 +    sysbus_connect_irq(s, 0, qdev_get_gpio_in_named(iotkitdev,
 +                                                    "EXP_IRQ", txirqno));
 +    sysbus_connect_irq(s, 1, qdev_get_gpio_in_named(iotkitdev,
 +                                                    "EXP_IRQ", rxirqno));
 +    sysbus_connect_irq(s, 2, qdev_get_gpio_in(orgate_dev, i * 2));
 +    sysbus_connect_irq(s, 3, qdev_get_gpio_in(orgate_dev, i * 2 + 1));
 +    sysbus_connect_irq(s, 4, qdev_get_gpio_in_named(iotkitdev,
 +                                                    "EXP_IRQ", combirqno));
 +    return sysbus_mmio_get_region(SYS_BUS_DEVICE(uart), 0);
 +}
 +
 +static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
 +                              const char *name, hwaddr size)
 +{
 +    MPS2SCC *scc = opaque;
 +    DeviceState *sccdev;
 +    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
 +
 +    object_initialize(scc, sizeof(mms->scc), TYPE_MPS2_SCC);
 +    sccdev = DEVICE(scc);
 +    qdev_set_parent_bus(sccdev, sysbus_get_default());
 +    qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
 +    qdev_prop_set_uint32(sccdev, "scc-aid", 0x02000008);
 +    qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
 +    object_property_set_bool(OBJECT(scc), true, "realized", &error_fatal);
 +    return sysbus_mmio_get_region(SYS_BUS_DEVICE(sccdev), 0);
 +}
 +
 +static MemoryRegion *make_fpgaio(MPS2TZMachineState *mms, void *opaque,
 +                                 const char *name, hwaddr size)
 +{
 +    MPS2FPGAIO *fpgaio = opaque;
 +
 +    object_initialize(fpgaio, sizeof(mms->fpgaio), TYPE_MPS2_FPGAIO);
 +    qdev_set_parent_bus(DEVICE(fpgaio), sysbus_get_default());
 +    object_property_set_bool(OBJECT(fpgaio), true, "realized", &error_fatal);
 +    return sysbus_mmio_get_region(SYS_BUS_DEVICE(fpgaio), 0);
 +}
 +
 +static void mps2tz_common_init(MachineState *machine)
 +{
 +    MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
 +    MachineClass *mc = MACHINE_GET_CLASS(machine);
 +    MemoryRegion *system_memory = get_system_memory();
 +    DeviceState *iotkitdev;
 +    DeviceState *dev_splitter;
 +    int i;
 +
 +    if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
 +        error_report("This board can only be used with CPU %s",
 +                     mc->default_cpu_type);
 +        exit(1);
 +    }
 +
-+    init_sysbus_child(OBJECT(machine), "iotkit", &mms->iotkit,
++#define WRAP_CMP0_REV(FN, CMPOP, TYPE)                          \
-+                      sizeof(mms->iotkit), TYPE_IOTKIT);
++    static TYPE TYPE##_##FN##0(TYPE op, float_status *stat)    \
-+    iotkitdev = DEVICE(&mms->iotkit);
++    {                                                           \
-+    object_property_set_link(OBJECT(&mms->iotkit), OBJECT(system_memory),
++        return TYPE##_##CMPOP(TYPE##_zero, op, stat);           \
 +                             "memory", &error_abort);
 +    qdev_prop_set_uint32(iotkitdev, "EXP_NUMIRQ", 92);
 +    qdev_prop_set_uint32(iotkitdev, "MAINCLK", SYSCLK_FRQ);
 +    object_property_set_bool(OBJECT(&mms->iotkit), true, "realized",
 +                             &error_fatal);
 +
 +    /* The sec_resp_cfg output from the IoTKit must be split into multiple
 +     * lines, one for each of the PPCs we create here.
 +     */
 +    object_initialize(&mms->sec_resp_splitter, sizeof(mms->sec_resp_splitter),
 +                      TYPE_SPLIT_IRQ);
 +    object_property_add_child(OBJECT(machine), "sec-resp-splitter",
 +                              OBJECT(&mms->sec_resp_splitter), &error_abort);
 +    object_property_set_int(OBJECT(&mms->sec_resp_splitter), 5,
 +                            "num-lines", &error_fatal);
 +    object_property_set_bool(OBJECT(&mms->sec_resp_splitter), true,
 +                             "realized", &error_fatal);
 +    dev_splitter = DEVICE(&mms->sec_resp_splitter);
 +    qdev_connect_gpio_out_named(iotkitdev, "sec_resp_cfg", 0,
 +                                qdev_get_gpio_in(dev_splitter, 0));
 +
 +    /* The IoTKit sets up much of the memory layout, including
 +     * the aliases between secure and non-secure regions in the
 +     * address space. The FPGA itself contains:
 +     *
 +     * 0x00000000..0x003fffff  SSRAM1
 +     * 0x00400000..0x007fffff  alias of SSRAM1
 +     * 0x28000000..0x283fffff  4MB SSRAM2 + SSRAM3
 +     * 0x40100000..0x4fffffff  AHB Master Expansion 1 interface devices
 +     * 0x80000000..0x80ffffff  16MB PSRAM
 +     */
 +
 +    /* The FPGA images have an odd combination of different RAMs,
 +     * because in hardware they are different implementations and
 +     * connected to different buses, giving varying performance/size
 +     * tradeoffs. For QEMU they're all just RAM, though. We arbitrarily
 +     * call the 16MB our "system memory", as it's the largest lump.
 +     */
 +    memory_region_allocate_system_memory(&mms->psram,
 +                                         NULL, "mps.ram", 0x01000000);
 +    memory_region_add_subregion(system_memory, 0x80000000, &mms->psram);
 +
 +    /* The SSRAM memories should all be behind Memory Protection Controllers,
 +     * but we don't implement that yet.
 +     */
 +    make_ram(&mms->ssram1, "mps.ssram1", 0x00000000, 0x00400000);
 +    make_ram_alias(&mms->ssram1_m, "mps.ssram1_m", &mms->ssram1, 0x00400000);
 +
 +    make_ram(&mms->ssram23, "mps.ssram23", 0x28000000, 0x00400000);
 +
 +    /* The overflow IRQs for all UARTs are ORed together.
 +     * Tx, Rx and "combined" IRQs are sent to the NVIC separately.
 +     * Create the OR gate for this.
 +     */
 +    object_initialize(&mms->uart_irq_orgate, sizeof(mms->uart_irq_orgate),
 +                      TYPE_OR_IRQ);
 +    object_property_add_child(OBJECT(mms), "uart-irq-orgate",
 +                              OBJECT(&mms->uart_irq_orgate), &error_abort);
 +    object_property_set_int(OBJECT(&mms->uart_irq_orgate), 10, "num-lines",
 +                            &error_fatal);
 +    object_property_set_bool(OBJECT(&mms->uart_irq_orgate), true,
 +                             "realized", &error_fatal);
 +    qdev_connect_gpio_out(DEVICE(&mms->uart_irq_orgate), 0,
 +                          qdev_get_gpio_in_named(iotkitdev, "EXP_IRQ", 15));
 +
 +    /* Most of the devices in the FPGA are behind Peripheral Protection
 +     * Controllers. The required order for initializing things is:
 +     *  + initialize the PPC
 +     *  + initialize, configure and realize downstream devices
 +     *  + connect downstream device MemoryRegions to the PPC
 +     *  + realize the PPC
 +     *  + map the PPC's MemoryRegions to the places in the address map
 +     *    where the downstream devices should appear
 +     *  + wire up the PPC's control lines to the IoTKit object
 +     */
 +
 +    const PPCInfo ppcs[] = { {
 +            .name = "apb_ppcexp0",
 +            .ports = {
 +                { "ssram-mpc0", make_unimp_dev, &mms->ssram_mpc[0],
 +                  0x58007000, 0x1000 },
 +                { "ssram-mpc1", make_unimp_dev, &mms->ssram_mpc[1],
 +                  0x58008000, 0x1000 },
 +                { "ssram-mpc2", make_unimp_dev, &mms->ssram_mpc[2],
 +                  0x58009000, 0x1000 },
 +            },
 +        }, {
 +            .name = "apb_ppcexp1",
 +            .ports = {
 +                { "spi0", make_unimp_dev, &mms->spi[0], 0x40205000, 0x1000 },
 +                { "spi1", make_unimp_dev, &mms->spi[1], 0x40206000, 0x1000 },
 +                { "spi2", make_unimp_dev, &mms->spi[2], 0x40209000, 0x1000 },
 +                { "spi3", make_unimp_dev, &mms->spi[3], 0x4020a000, 0x1000 },
 +                { "spi4", make_unimp_dev, &mms->spi[4], 0x4020b000, 0x1000 },
 +                { "uart0", make_uart, &mms->uart[0], 0x40200000, 0x1000 },
 +                { "uart1", make_uart, &mms->uart[1], 0x40201000, 0x1000 },
 +                { "uart2", make_uart, &mms->uart[2], 0x40202000, 0x1000 },
 +                { "uart3", make_uart, &mms->uart[3], 0x40203000, 0x1000 },
 +                { "uart4", make_uart, &mms->uart[4], 0x40204000, 0x1000 },
 +                { "i2c0", make_unimp_dev, &mms->i2c[0], 0x40207000, 0x1000 },
 +                { "i2c1", make_unimp_dev, &mms->i2c[1], 0x40208000, 0x1000 },
 +                { "i2c2", make_unimp_dev, &mms->i2c[2], 0x4020c000, 0x1000 },
 +                { "i2c3", make_unimp_dev, &mms->i2c[3], 0x4020d000, 0x1000 },
 +            },
 +        }, {
 +            .name = "apb_ppcexp2",
 +            .ports = {
 +                { "scc", make_scc, &mms->scc, 0x40300000, 0x1000 },
 +                { "i2s-audio", make_unimp_dev, &mms->i2s_audio,
 +                  0x40301000, 0x1000 },
 +                { "fpgaio", make_fpgaio, &mms->fpgaio, 0x40302000, 0x1000 },
 +            },
 +        }, {
 +            .name = "ahb_ppcexp0",
 +            .ports = {
 +                { "gfx", make_unimp_dev, &mms->gfx, 0x41000000, 0x140000 },
 +                { "gpio0", make_unimp_dev, &mms->gpio[0], 0x40100000, 0x1000 },
 +                { "gpio1", make_unimp_dev, &mms->gpio[1], 0x40101000, 0x1000 },
 +                { "gpio2", make_unimp_dev, &mms->gpio[2], 0x40102000, 0x1000 },
 +                { "gpio3", make_unimp_dev, &mms->gpio[3], 0x40103000, 0x1000 },
 +                { "gpio4", make_unimp_dev, &mms->gpio[4], 0x40104000, 0x1000 },
 +            },
 +        }, {
 +            .name = "ahb_ppcexp1",
 +            .ports = {
 +                { "dma0", make_unimp_dev, &mms->dma[0], 0x40110000, 0x1000 },
 +                { "dma1", make_unimp_dev, &mms->dma[1], 0x40111000, 0x1000 },
 +                { "dma2", make_unimp_dev, &mms->dma[2], 0x40112000, 0x1000 },
 +                { "dma3", make_unimp_dev, &mms->dma[3], 0x40113000, 0x1000 },
 +            },
 +        },
 +    };
 +
 +    for (i = 0; i < ARRAY_SIZE(ppcs); i++) {
 +        const PPCInfo *ppcinfo = &ppcs[i];
 +        TZPPC *ppc = &mms->ppc[i];
 +        DeviceState *ppcdev;
 +        int port;
 +        char *gpioname;
 +
 +        init_sysbus_child(OBJECT(machine), ppcinfo->name, ppc,
 +                          sizeof(TZPPC), TYPE_TZ_PPC);
 +        ppcdev = DEVICE(ppc);
 +
 +        for (port = 0; port < TZ_NUM_PORTS; port++) {
 +            const PPCPortInfo *pinfo = &ppcinfo->ports[port];
 +            MemoryRegion *mr;
 +            char *portname;
 +
 +            if (!pinfo->devfn) {
 +                continue;
 +            }
 +
 +            mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size);
 +            portname = g_strdup_printf("port[%d]", port);
 +            object_property_set_link(OBJECT(ppc), OBJECT(mr),
 +                                     portname, &error_fatal);
 +            g_free(portname);
 +        }
 +
 +        object_property_set_bool(OBJECT(ppc), true, "realized", &error_fatal);
 +
 +        for (port = 0; port < TZ_NUM_PORTS; port++) {
 +            const PPCPortInfo *pinfo = &ppcinfo->ports[port];
 +
 +            if (!pinfo->devfn) {
 +                continue;
 +            }
 +            sysbus_mmio_map(SYS_BUS_DEVICE(ppc), port, pinfo->addr);
 +
 +            gpioname = g_strdup_printf("%s_nonsec", ppcinfo->name);
 +            qdev_connect_gpio_out_named(iotkitdev, gpioname, port,
 +                                        qdev_get_gpio_in_named(ppcdev,
 +                                                               "cfg_nonsec",
 +                                                               port));
 +            g_free(gpioname);
 +            gpioname = g_strdup_printf("%s_ap", ppcinfo->name);
 +            qdev_connect_gpio_out_named(iotkitdev, gpioname, port,
 +                                        qdev_get_gpio_in_named(ppcdev,
 +                                                               "cfg_ap", port));
 +            g_free(gpioname);
 +        }
 +
 +        gpioname = g_strdup_printf("%s_irq_enable", ppcinfo->name);
 +        qdev_connect_gpio_out_named(iotkitdev, gpioname, 0,
 +                                    qdev_get_gpio_in_named(ppcdev,
 +                                                           "irq_enable", 0));
 +        g_free(gpioname);
 +        gpioname = g_strdup_printf("%s_irq_clear", ppcinfo->name);
 +        qdev_connect_gpio_out_named(iotkitdev, gpioname, 0,
 +                                    qdev_get_gpio_in_named(ppcdev,
 +                                                           "irq_clear", 0));
 +        g_free(gpioname);
 +        gpioname = g_strdup_printf("%s_irq_status", ppcinfo->name);
 +        qdev_connect_gpio_out_named(ppcdev, "irq", 0,
 +                                    qdev_get_gpio_in_named(iotkitdev,
 +                                                           gpioname, 0));
 +        g_free(gpioname);
 +
 +        qdev_connect_gpio_out(dev_splitter, i,
 +                              qdev_get_gpio_in_named(ppcdev,
 +                                                     "cfg_sec_resp", 0));
 +    }
 +
-+    /* In hardware this is a LAN9220; the LAN9118 is software compatible
++#define DO_2OP_CMP0(FN, CMPOP, DIRN)                    \
-+     * except that it doesn't support the checksum-offload feature.
++    WRAP_CMP0_##DIRN(FN, CMPOP, float16)                \
-+     * The ethernet controller is not behind a PPC.
++    WRAP_CMP0_##DIRN(FN, CMPOP, float32)                \
-+     */
++    DO_2OP(gvec_f##FN##0_h, float16_##FN##0, float16)   \
-+    lan9118_init(&nd_table[0], 0x42000000,
++    DO_2OP(gvec_f##FN##0_s, float32_##FN##0, float32)
 +                 qdev_get_gpio_in_named(iotkitdev, "EXP_IRQ", 16));
 +
-+    create_unimplemented_device("FPGA NS PC", 0x48007000, 0x1000);
++DO_2OP_CMP0(cgt, cgt, FWD)
 +DO_2OP_CMP0(cge, cge, FWD)
 +DO_2OP_CMP0(ceq, ceq, FWD)
 +DO_2OP_CMP0(clt, cgt, REV)
 +DO_2OP_CMP0(cle, cge, REV)
 +
-+    armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x400000);
+ #undef DO_2OP
-+}
++#undef DO_2OP_CMP0
-+
-+static void mps2tz_class_init(ObjectClass *oc, void *data)
+ /* Floating-point trigonometric starting value.
-+{
+  * See the ARM ARM pseudocode function FPTrigSMul.
-+    MachineClass *mc = MACHINE_CLASS(oc);
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
-+
+index XXXXXXX..XXXXXXX 100644
-+    mc->init = mps2tz_common_init;
+--- a/target/arm/translate-neon.c.inc
-+    mc->max_cpus = 1;
++++ b/target/arm/translate-neon.c.inc
-+}
+@@ -XXX,XX +XXX,XX @@ DO_2MISC_FP(VCVT_UF, gen_helper_vfp_touizs)
-+
-+static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
+ DO_2MISC_FP_VEC(VRECPE_F, gen_helper_gvec_frecpe_h, gen_helper_gvec_frecpe_s)
-+{
+ DO_2MISC_FP_VEC(VRSQRTE_F, gen_helper_gvec_frsqrte_h, gen_helper_gvec_frsqrte_s)
-+    MachineClass *mc = MACHINE_CLASS(oc);
++DO_2MISC_FP_VEC(VCGT0_F, gen_helper_gvec_fcgt0_h, gen_helper_gvec_fcgt0_s)
-+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_CLASS(oc);
++DO_2MISC_FP_VEC(VCGE0_F, gen_helper_gvec_fcge0_h, gen_helper_gvec_fcge0_s)
-+
++DO_2MISC_FP_VEC(VCEQ0_F, gen_helper_gvec_fceq0_h, gen_helper_gvec_fceq0_s)
-+    mc->desc = "ARM MPS2 with AN505 FPGA image for Cortex-M33";
++DO_2MISC_FP_VEC(VCLT0_F, gen_helper_gvec_fclt0_h, gen_helper_gvec_fclt0_s)
-+    mmc->fpga_type = FPGA_AN505;
++DO_2MISC_FP_VEC(VCLE0_F, gen_helper_gvec_fcle0_h, gen_helper_gvec_fcle0_s)
-+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
-+    mmc->scc_id = 0x41040000 | (505 << 4);
+ static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
-+}
+ {
-+
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
-+static const TypeInfo mps2tz_info = {
+     return do_2misc_fp(s, a, gen_helper_rints_exact);
-+    .name = TYPE_MPS2TZ_MACHINE,
+ }
-+    .parent = TYPE_MACHINE,
-+    .abstract = true,
+-#define WRAP_FP_CMP0_FWD(WRAPNAME, FUNC)                        \
-+    .instance_size = sizeof(MPS2TZMachineState),
+-    static void WRAPNAME(TCGv_i32 d, TCGv_i32 m, TCGv_ptr fpst) \
-+    .class_size = sizeof(MPS2TZMachineClass),
+-    {                                                           \
-+    .class_init = mps2tz_class_init,
+-        TCGv_i32 zero = tcg_const_i32(0);                       \
-+};
+-        FUNC(d, m, zero, fpst);                                 \
-+
+-        tcg_temp_free_i32(zero);                                \
-+static const TypeInfo mps2tz_an505_info = {
+-    }
-+    .name = TYPE_MPS2TZ_AN505_MACHINE,
+-#define WRAP_FP_CMP0_REV(WRAPNAME, FUNC)                        \
-+    .parent = TYPE_MPS2TZ_MACHINE,
+-    static void WRAPNAME(TCGv_i32 d, TCGv_i32 m, TCGv_ptr fpst) \
-+    .class_init = mps2tz_an505_class_init,
+-    {                                                           \
-+};
+-        TCGv_i32 zero = tcg_const_i32(0);                       \
-+
+-        FUNC(d, zero, m, fpst);                                 \
-+static void mps2tz_machine_init(void)
+-        tcg_temp_free_i32(zero);                                \
-+{
+-    }
-+    type_register_static(&mps2tz_info);
+-
-+    type_register_static(&mps2tz_an505_info);
+-#define DO_FP_CMP0(INSN, FUNC, REV)                             \
-+}
+-    WRAP_FP_CMP0_##REV(gen_##INSN, FUNC)                        \
-+
+-    static bool trans_##INSN(DisasContext *s, arg_2misc *a)     \
-+type_init(mps2tz_machine_init);
+-    {                                                           \
 -        return do_2misc_fp(s, a, gen_##INSN);                   \
 -    }
 -
 -DO_FP_CMP0(VCGT0_F, gen_helper_neon_cgt_f32, FWD)
 -DO_FP_CMP0(VCGE0_F, gen_helper_neon_cge_f32, FWD)
 -DO_FP_CMP0(VCEQ0_F, gen_helper_neon_ceq_f32, FWD)
 -DO_FP_CMP0(VCLE0_F, gen_helper_neon_cge_f32, REV)
 -DO_FP_CMP0(VCLT0_F, gen_helper_neon_cgt_f32, REV)
 -
  static bool do_vrint(DisasContext *s, arg_2misc *a, int rmode)
  {
      /*
 --
-.16.2
+.20.1
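In the new series, `DO_2OP_CMP0` builds all five Neon compare-against-zero helpers from three comparison primitives. `WRAP_CMP0_REV` passes `TYPE##_zero` as the *first* argument, so `clt` is `cgt` with swapped operands and `cle` is `cge` with swapped operands. The scalar C sketch below illustrates that operand-swap trick under stated assumptions. It uses native `float` comparisons instead of softfloat and ignores `float_status` and exception flags. The `f32_*` names are hypothetical, not QEMU functions. The body of the `FWD` wrapper does not appear in this excerpt, so its `CMPOP(op, 0)` form here is also an assumption.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins for softfloat's float32_cgt/cge/ceq.
 * Like Neon compares, they return an all-ones lane for true and zero
 * for false; any comparison involving a NaN is false. */
static uint32_t f32_cgt(float a, float b) { return a > b ? UINT32_MAX : 0; }
static uint32_t f32_cge(float a, float b) { return a >= b ? UINT32_MAX : 0; }
static uint32_t f32_ceq(float a, float b) { return a == b ? UINT32_MAX : 0; }

/* FWD (assumed form): compare the operand against zero, CMPOP(op, 0). */
#define WRAP_CMP0_FWD(FN, CMPOP) \
    static uint32_t f32_##FN##0(float op) { return f32_##CMPOP(op, 0.0f); }

/* REV: swap the operands, CMPOP(0, op). "0 > op" means "op < 0" and
 * "0 >= op" means "op <= 0", so no separate lt/le primitives are needed. */
#define WRAP_CMP0_REV(FN, CMPOP) \
    static uint32_t f32_##FN##0(float op) { return f32_##CMPOP(0.0f, op); }

WRAP_CMP0_FWD(cgt, cgt)   /* defines f32_cgt0 */
WRAP_CMP0_FWD(cge, cge)   /* defines f32_cge0 */
WRAP_CMP0_FWD(ceq, ceq)   /* defines f32_ceq0 */
WRAP_CMP0_REV(clt, cgt)   /* defines f32_clt0 */
WRAP_CMP0_REV(cle, cge)   /* defines f32_cle0 */
```

For example, `f32_clt0(-1.0f)` returns `UINT32_MAX` and `f32_clt0(0.0f)` returns 0. Deriving the less-than forms by swapping operands leaves only the greater-than and equality primitives to maintain per element type.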

-[Qemu-devel] [PULL 02/39] xlnx-zynqmp-rtc: Add basic time support
+[PULL 32/47] target/arm: Implement fp16 for Neon VRECPS
-From: Alistair Francis <alistair.francis@xilinx.com>
+Convert the Neon VRECPS insn to using a gvec helper, and
 use this to implement the fp16 case.
-Allow the guest to determine the time set from the QEMU command line.
+The phrasing of the new float32_recps_nf() is slightly different from
 the old recps_f32() so that it parallels the f16 version; for f16 we
 can't assume that flush-to-zero is always enabled.
-This includes adding a trace event to debug the new time.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200828183354.27913-34-peter.maydell@linaro.org
 ---
  target/arm/helper.h             |  4 +++-
  target/arm/vec_helper.c         | 31 +++++++++++++++++++++++++++++++
  target/arm/vfp_helper.c         | 13 -------------
  target/arm/translate-neon.c.inc | 21 +--------------------
 files changed, 35 insertions(+), 34 deletions(-)
-Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  include/hw/timer/xlnx-zynqmp-rtc.h |  2 ++
  hw/timer/xlnx-zynqmp-rtc.c         | 58 ++++++++++++++++++++++++++++++++++++++
  hw/timer/trace-events              |  3 ++
 files changed, 63 insertions(+)
 diff --git a/include/hw/timer/xlnx-zynqmp-rtc.h b/include/hw/timer/xlnx-zynqmp-rtc.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/timer/xlnx-zynqmp-rtc.h
+--- a/target/arm/helper.h
-+++ b/include/hw/timer/xlnx-zynqmp-rtc.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ typedef struct XlnxZynqMPRTC {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
-     qemu_irq irq_rtc_int;
+ DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
-     qemu_irq irq_addr_error_int;
+ DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, ptr)
-+    uint32_t tick_offset;
+-DEF_HELPER_3(recps_f32, f32, env, f32, f32)
  DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
  DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
  DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmaxnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i3
  DEF_HELPER_FLAGS_5(gvec_fminnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_recps_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +
-     uint32_t regs[XLNX_ZYNQMP_RTC_R_MAX];
+ DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-     RegisterInfo regs_info[XLNX_ZYNQMP_RTC_R_MAX];
+ DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- } XlnxZynqMPRTC;
-diff --git a/hw/timer/xlnx-zynqmp-rtc.c b/hw/timer/xlnx-zynqmp-rtc.c
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/xlnx-zynqmp-rtc.c
+--- a/target/arm/vec_helper.c
-+++ b/hw/timer/xlnx-zynqmp-rtc.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static float32 float32_abd(float32 op1, float32 op2, float_status *stat)
- #include "hw/register.h"
+     return float32_abs(float32_sub(op1, op2, stat));
  #include "qemu/bitops.h"
  #include "qemu/log.h"
 +#include "hw/ptimer.h"
 +#include "qemu/cutils.h"
 +#include "sysemu/sysemu.h"
 +#include "trace.h"
  #include "hw/timer/xlnx-zynqmp-rtc.h"
  #ifndef XLNX_ZYNQMP_RTC_ERR_DEBUG
@@ -XXX,XX +XXX,XX @@ static void addr_error_int_update_irq(XlnxZynqMPRTC *s)
      qemu_set_irq(s->irq_addr_error_int, pending);
  }
-+static uint32_t rtc_get_count(XlnxZynqMPRTC *s)
++/*
 + * Reciprocal step. These are the AArch32 version which uses a
 + * non-fused multiply-and-subtract.
 + */
 +static float16 float16_recps_nf(float16 op1, float16 op2, float_status *stat)
 +{
-+    int64_t now = qemu_clock_get_ns(rtc_clock);
++    op1 = float16_squash_input_denormal(op1, stat);
-+    return s->tick_offset + now / NANOSECONDS_PER_SECOND;
++    op2 = float16_squash_input_denormal(op2, stat);
 +
 +    if ((float16_is_infinity(op1) && float16_is_zero(op2)) ||
 +        (float16_is_infinity(op2) && float16_is_zero(op1))) {
 +        return float16_two;
 +    }
 +    return float16_sub(float16_two, float16_mul(op1, op2, stat), stat);
 +}
 +
-+static uint64_t current_time_postr(RegisterInfo *reg, uint64_t val64)
++static float32 float32_recps_nf(float32 op1, float32 op2, float_status *stat)
 +{
-+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
++    op1 = float32_squash_input_denormal(op1, stat);
 +    op2 = float32_squash_input_denormal(op2, stat);
 +
-+    return rtc_get_count(s);
++    if ((float32_is_infinity(op1) && float32_is_zero(op2)) ||
 +        (float32_is_infinity(op2) && float32_is_zero(op1))) {
 +        return float32_two;
 +    }
 +    return float32_sub(float32_two, float32_mul(op1, op2, stat), stat);
 +}
 +
- static void rtc_int_status_postw(RegisterInfo *reg, uint64_t val64)
+ #define DO_3OP(NAME, FUNC, TYPE) \
  void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
  {                                                                          \
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fmaxnum_s, float32_maxnum, float32)
  DO_3OP(gvec_fminnum_h, float16_minnum, float16)
  DO_3OP(gvec_fminnum_s, float32_minnum, float32)
 +DO_3OP(gvec_recps_nf_h, float16_recps_nf, float16)
 +DO_3OP(gvec_recps_nf_s, float32_recps_nf, float32)
 +
  #ifdef TARGET_AARCH64
  DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
      return r;
  }
 -float32 HELPER(recps_f32)(CPUARMState *env, float32 a, float32 b)
 -{
 -    float_status *s = &env->vfp.standard_fp_status;
 -    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
 -        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
 -        if (!(float32_is_zero(a) || float32_is_zero(b))) {
 -            float_raise(float_flag_input_denormal, s);
 -        }
 -        return float32_two;
 -    }
 -    return float32_sub(float32_two, float32_mul(a, b, s), s);
 -}
 -
  float32 HELPER(rsqrts_f32)(CPUARMState *env, float32 a, float32 b)
  {
-     XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
+     float_status *s = &env->vfp.standard_fp_status;
-@@ -XXX,XX +XXX,XX @@ static uint64_t addr_error_int_dis_prew(RegisterInfo *reg, uint64_t val64)
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+index XXXXXXX..XXXXXXX 100644
- static const RegisterAccessInfo rtc_regs_info[] = {
+--- a/target/arm/translate-neon.c.inc
-     {   .name = "SET_TIME_WRITE",  .addr = A_SET_TIME_WRITE,
++++ b/target/arm/translate-neon.c.inc
-+        .unimp = MAKE_64BIT_MASK(0, 32),
+@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h)
-     },{ .name = "SET_TIME_READ",  .addr = A_SET_TIME_READ,
+ DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
-         .ro = 0xffffffff,
+ DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h)
-+        .post_read = current_time_postr,
+ DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h)
-     },{ .name = "CALIB_WRITE",  .addr = A_CALIB_WRITE,
++DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h)
-+        .unimp = MAKE_64BIT_MASK(0, 32),
-     },{ .name = "CALIB_READ",  .addr = A_CALIB_READ,
+ WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
-         .ro = 0x1fffff,
+ WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
-     },{ .name = "CURRENT_TIME",  .addr = A_CURRENT_TIME,
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
-         .ro = 0xffffffff,
+     return do_3same(s, a, gen_VMINNM_fp32_3s);
 +        .post_read = current_time_postr,
      },{ .name = "CURRENT_TICK",  .addr = A_CURRENT_TICK,
          .ro = 0xffff,
      },{ .name = "ALARM",  .addr = A_ALARM,
@@ -XXX,XX +XXX,XX @@ static void rtc_init(Object *obj)
      XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(obj);
      SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
      RegisterInfoArray *reg_array;
 +    struct tm current_tm;
      memory_region_init(&s->iomem, obj, TYPE_XLNX_ZYNQMP_RTC,
                         XLNX_ZYNQMP_RTC_R_MAX * 4);
@@ -XXX,XX +XXX,XX @@ static void rtc_init(Object *obj)
      sysbus_init_mmio(sbd, &s->iomem);
      sysbus_init_irq(sbd, &s->irq_rtc_int);
      sysbus_init_irq(sbd, &s->irq_addr_error_int);
 +
 +    qemu_get_timedate(&current_tm, 0);
 +    s->tick_offset = mktimegm(&current_tm) -
 +        qemu_clock_get_ns(rtc_clock) / NANOSECONDS_PER_SECOND;
 +
 +    trace_xlnx_zynqmp_rtc_gettime(current_tm.tm_year, current_tm.tm_mon,
 +                                  current_tm.tm_mday, current_tm.tm_hour,
 +                                  current_tm.tm_min, current_tm.tm_sec);
 +}
 +
 +static int rtc_pre_save(void *opaque)
 +{
 +    XlnxZynqMPRTC *s = opaque;
 +    int64_t now = qemu_clock_get_ns(rtc_clock) / NANOSECONDS_PER_SECOND;
 +
 +    /* Add the time at migration */
 +    s->tick_offset = s->tick_offset + now;
 +
 +    return 0;
 +}
 +
 +static int rtc_post_load(void *opaque, int version_id)
 +{
 +    XlnxZynqMPRTC *s = opaque;
 +    int64_t now = qemu_clock_get_ns(rtc_clock) / NANOSECONDS_PER_SECOND;
 +
 +    /* Subtract the time after migration. This, combined with the pre_save
 +     * action, results in us having subtracted the time that the guest was
 +     * stopped from the offset.
 +     */
 +    s->tick_offset = s->tick_offset - now;
 +
 +    return 0;
  }
- static const VMStateDescription vmstate_rtc = {
+-WRAP_ENV_FN(gen_VRECPS_tramp, gen_helper_recps_f32)
-     .name = TYPE_XLNX_ZYNQMP_RTC,
+-
-     .version_id = 1,
+-static void gen_VRECPS_fp_3s(unsigned vece, uint32_t rd_ofs,
-     .minimum_version_id = 1,
+-                             uint32_t rn_ofs, uint32_t rm_ofs,
-+    .pre_save = rtc_pre_save,
+-                             uint32_t oprsz, uint32_t maxsz)
-+    .post_load = rtc_post_load,
+-{
-     .fields = (VMStateField[]) {
+-    static const GVecGen3 ops = { .fni4 = gen_VRECPS_tramp };
-         VMSTATE_UINT32_ARRAY(regs, XlnxZynqMPRTC, XLNX_ZYNQMP_RTC_R_MAX),
+-    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops);
-+        VMSTATE_UINT32(tick_offset, XlnxZynqMPRTC),
+-}
-         VMSTATE_END_OF_LIST(),
+-
-     }
+-static bool trans_VRECPS_fp_3s(DisasContext *s, arg_3same *a)
- };
+-{
-diff --git a/hw/timer/trace-events b/hw/timer/trace-events
+-    if (a->size != 0) {
-index XXXXXXX..XXXXXXX 100644
+-        /* TODO fp16 support */
---- a/hw/timer/trace-events
+-        return false;
-+++ b/hw/timer/trace-events
+-    }
-@@ -XXX,XX +XXX,XX @@ systick_write(uint64_t addr, uint32_t value, unsigned size) "systick write addr
+-
- cmsdk_apb_timer_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB timer read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
+-    return do_3same(s, a, gen_VRECPS_fp_3s);
- cmsdk_apb_timer_write(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB timer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
+-}
- cmsdk_apb_timer_reset(void) "CMSDK APB timer: reset"
+-
-+
+ WRAP_ENV_FN(gen_VRSQRTS_tramp, gen_helper_rsqrts_f32)
-+# hw/timer/xlnx-zynqmp-rtc.c
-+xlnx_zynqmp_rtc_gettime(int year, int month, int day, int hour, int min, int sec) "Get time from host: %d-%d-%d %2d:%02d:%02d"
+ static void gen_VRSQRTS_fp_3s(unsigned vece, uint32_t rd_ofs,
 --
-.16.2
+.20.1
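The VRECPS patch replaces the old `recps_f32` helper with `float16_recps_nf`/`float32_recps_nf`. These compute the non-fused reciprocal step 2 - op1*op2 and return exactly 2 when one operand is infinite and the other is zero. The following is a minimal scalar sketch. It assumes native `float` arithmetic in place of softfloat, so input-denormal squashing and `float_status` flags are not modelled. The names `recps_nf` and `refine_recip` are hypothetical. The sketch also shows what the step is for: multiplying an estimate `x` of 1/d by `recps_nf(d, x)` performs one Newton-Raphson iteration.

```c
#include <math.h>

/* Non-fused reciprocal step: 2 - a*b. Infinity times zero (in either
 * order) would produce NaN, so that case is defined to return exactly 2.
 * Denormal flushing and FP exception flags are not modelled here.
 * Build with -ffp-contract=off if the multiply and subtract must not be
 * fused into a single FMA by the compiler. */
static float recps_nf(float a, float b)
{
    if ((isinf(a) && b == 0.0f) || (isinf(b) && a == 0.0f)) {
        return 2.0f;
    }
    return 2.0f - a * b;
}

/* One Newton-Raphson iteration towards 1/d: x' = x * (2 - d*x).
 * If d*x = 1 - e, then d*x' = 1 - e*e, so the error is squared. */
static float refine_recip(float d, float x)
{
    return x * recps_nf(d, x);
}
```

Starting from `x = 0.3f` with `d = 3.0f`, four calls to `refine_recip` bring `x` to within about 1e-6 of 1/3, because each step squares the relative error.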

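On the old-series side of this comparison, the xlnx-zynqmp-rtc patch stores the guest-visible time as `tick_offset` plus the RTC clock in seconds, and adjusts only `tick_offset` around migration. `rtc_pre_save` adds the source's current seconds, and `rtc_post_load` subtracts the destination's. The sketch below models that arithmetic, with the clock passed in as a plain argument instead of `qemu_clock_get_ns(rtc_clock)`. `RtcModel` and the `rtc_model_*` functions are hypothetical names. Because `tick_offset` is a `uint32_t`, the adjustments are modulo 2^32, which still yields the right count after wraparound.

```c
#include <stdint.h>

/* Hypothetical model of the RTC state: only the offset is migrated. */
typedef struct {
    uint32_t tick_offset;
} RtcModel;

/* Guest-visible count: offset plus the host clock in seconds
 * (mirrors rtc_get_count, truncated to 32 bits). */
static uint32_t rtc_count(const RtcModel *s, int64_t now_s)
{
    return s->tick_offset + (uint32_t)now_s;
}

/* Source side, just before saving: fold the current clock into the
 * offset so the saved value is the absolute count at save time. */
static void rtc_model_pre_save(RtcModel *s, int64_t now_s)
{
    s->tick_offset += (uint32_t)now_s;
}

/* Destination side, after loading: subtract the destination clock so
 * counting resumes from the saved value (unsigned wrap is intended). */
static void rtc_model_post_load(RtcModel *s, int64_t now_s)
{
    s->tick_offset -= (uint32_t)now_s;
}
```

For example, with an offset of 1000 and a source clock of 50 s, the guest reads 1050. After saving at 50 s and loading at 7 s on the destination, it still reads 1050 at 7 s and reads 1053 at 10 s. Counting therefore resumes from the saved value regardless of the destination's clock.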
-[Qemu-devel] [PULL 28/39] target/arm: Decode aa64 armv8.1 three same extra
+[PULL 33/47] target/arm: Implement fp16 for Neon VRSQRTS
-From: Richard Henderson <richard.henderson@linaro.org>
+Convert the Neon VRSQRTS insn to using a gvec helper,
 and use this to implement the fp16 case.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+As with VRECPS, we adjust the phrasing of the new implementation
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+slightly so that the fp32 version parallels the fp16 one.
-Message-id: 20180228193125.20577-6-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-35-peter.maydell@linaro.org
 ---
- target/arm/helper.h        |  9 +++++
+ target/arm/helper.h             |  4 +++-
- target/arm/translate-a64.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++
+ target/arm/vec_helper.c         | 30 ++++++++++++++++++++++++++++++
- target/arm/vec_helper.c    | 74 +++++++++++++++++++++++++++++++++++++++++
+ target/arm/vfp_helper.c         | 15 ---------------
-files changed, 166 insertions(+)
+ target/arm/translate-neon.c.inc | 21 +--------------------
 files changed, 34 insertions(+), 36 deletions(-)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(dc_zva, void, env, i64)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
- DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+ DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
- DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+ DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, ptr)
-+DEF_HELPER_FLAGS_5(gvec_qrdmlah_s16, TCG_CALL_NO_RWG,
+-DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
-+                   void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
-+DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s16, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
-+                   void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, ptr)
-+DEF_HELPER_FLAGS_5(gvec_qrdmlah_s32, TCG_CALL_NO_RWG,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i3
-+                   void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_5(gvec_recps_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s32, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+                   void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +
- #ifdef TARGET_AARCH64
+ DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- #include "helper-a64.h"
+ DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- #endif
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op3(DisasContext *s, bool is_q, int rd,
                     vec_full_reg_size(s), gvec_op);
  }
 +/* Expand a 3-operand + env pointer operation using
 + * an out-of-line helper.
 + */
 +static void gen_gvec_op3_env(DisasContext *s, bool is_q, int rd,
 +                             int rn, int rm, gen_helper_gvec_3_ptr *fn)
 +{
 +    tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
 +                       vec_full_reg_offset(s, rn),
 +                       vec_full_reg_offset(s, rm), cpu_env,
 +                       is_q ? 16 : 8, vec_full_reg_size(s), 0, fn);
 +}
 +
  /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
   * than the 32 bit equivalent.
   */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
      clear_vec_high(s, is_q, rd);
  }
 +/* AdvSIMD three same extra
 + *  31   30  29 28       24 23  22  21 20  16  15 14    11  10 9  5 4  0
 + * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
 + * | 0 | Q | U | 0 1 1 1 0 | size | 0 |  Rm  | 1 | opcode | 1 | Rn | Rd |
 + * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
 + */
 +static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
 +{
 +    int rd = extract32(insn, 0, 5);
 +    int rn = extract32(insn, 5, 5);
 +    int opcode = extract32(insn, 11, 4);
 +    int rm = extract32(insn, 16, 5);
 +    int size = extract32(insn, 22, 2);
 +    bool u = extract32(insn, 29, 1);
 +    bool is_q = extract32(insn, 30, 1);
 +    int feature;
 +
 +    switch (u * 16 + opcode) {
 +    case 0x10: /* SQRDMLAH (vector) */
 +    case 0x11: /* SQRDMLSH (vector) */
 +        if (size != 1 && size != 2) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        feature = ARM_FEATURE_V8_RDM;
 +        break;
 +    default:
 +        unallocated_encoding(s);
 +        return;
 +    }
 +    if (!arm_dc_feature(s, feature)) {
 +        unallocated_encoding(s);
 +        return;
 +    }
 +    if (!fp_access_check(s)) {
 +        return;
 +    }
 +
 +    switch (opcode) {
 +    case 0x0: /* SQRDMLAH (vector) */
 +        switch (size) {
 +        case 1:
 +            gen_gvec_op3_env(s, is_q, rd, rn, rm, gen_helper_gvec_qrdmlah_s16);
 +            break;
 +        case 2:
 +            gen_gvec_op3_env(s, is_q, rd, rn, rm, gen_helper_gvec_qrdmlah_s32);
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +        return;
 +
 +    case 0x1: /* SQRDMLSH (vector) */
 +        switch (size) {
 +        case 1:
 +            gen_gvec_op3_env(s, is_q, rd, rn, rm, gen_helper_gvec_qrdmlsh_s16);
 +            break;
 +        case 2:
 +            gen_gvec_op3_env(s, is_q, rd, rn, rm, gen_helper_gvec_qrdmlsh_s32);
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +        return;
 +
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
  static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q,
                                    int size, int rn, int rd)
  {
@@ -XXX,XX +XXX,XX @@ static void disas_crypto_three_reg_imm2(DisasContext *s, uint32_t insn)
  static const AArch64DecodeTable data_proc_simd[] = {
      /* pattern  ,  mask     ,  fn                        */
      { 0x0e200400, 0x9f200400, disas_simd_three_reg_same },
 +    { 0x0e008400, 0x9f208400, disas_simd_three_reg_same_extra },
      { 0x0e200000, 0x9f200c00, disas_simd_three_reg_diff },
      { 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc },
      { 0x0e300800, 0x9f3e0c00, disas_simd_across_lanes },
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static float32 float32_recps_nf(float32 op1, float32 op2, float_status *stat)
+     return float32_sub(float32_two, float32_mul(op1, op2, stat), stat);
- #define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
+ }
-+static void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
++/* Reciprocal square-root step. AArch32 non-fused semantics. */
 +static float16 float16_rsqrts_nf(float16 op1, float16 op2, float_status *stat)
 +{
-+    uint64_t *d = vd + opr_sz;
++    op1 = float16_squash_input_denormal(op1, stat);
-+    uintptr_t i;
++    op2 = float16_squash_input_denormal(op2, stat);
 +
-+    for (i = opr_sz; i < max_sz; i += 8) {
++    if ((float16_is_infinity(op1) && float16_is_zero(op2)) ||
-+        *d++ = 0;
++        (float16_is_infinity(op2) && float16_is_zero(op1))) {
 +        return float16_one_point_five;
 +    }
++    op1 = float16_sub(float16_three, float16_mul(op1, op2, stat), stat);
++    return float16_div(op1, float16_two, stat);
 +}
 +
- /* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */
++static float32 float32_rsqrts_nf(float32 op1, float32 op2, float_status *stat)
  static uint16_t inl_qrdmlah_s16(CPUARMState *env, int16_t src1,
                                  int16_t src2, int16_t src3)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(neon_qrdmlah_s16)(CPUARMState *env, uint32_t src1,
      return deposit32(e1, 16, 16, e2);
  }
 +void HELPER(gvec_qrdmlah_s16)(void *vd, void *vn, void *vm,
 +                              void *ve, uint32_t desc)
 +{
-+    uintptr_t opr_sz = simd_oprsz(desc);
++    op1 = float32_squash_input_denormal(op1, stat);
-+    int16_t *d = vd;
++    op2 = float32_squash_input_denormal(op2, stat);
 +    int16_t *n = vn;
 +    int16_t *m = vm;
 +    CPUARMState *env = ve;
 +    uintptr_t i;
 +
-+    for (i = 0; i < opr_sz / 2; ++i) {
++    if ((float32_is_infinity(op1) && float32_is_zero(op2)) ||
-+        d[i] = inl_qrdmlah_s16(env, n[i], m[i], d[i]);
++        (float32_is_infinity(op2) && float32_is_zero(op1))) {
 +        return float32_one_point_five;
 +    }
-+    clear_tail(d, opr_sz, simd_maxsz(desc));
++    op1 = float32_sub(float32_three, float32_mul(op1, op2, stat), stat);
 +    return float32_div(op1, float32_two, stat);
 +}
 +
- /* Signed saturating rounding doubling multiply-subtract high half, 16-bit */
+ #define DO_3OP(NAME, FUNC, TYPE) \
- static uint16_t inl_qrdmlsh_s16(CPUARMState *env, int16_t src1,
+ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
-                                 int16_t src2, int16_t src3)
+ {                                                                          \
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(neon_qrdmlsh_s16)(CPUARMState *env, uint32_t src1,
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fminnum_s, float32_minnum, float32)
-     return deposit32(e1, 16, 16, e2);
+ DO_3OP(gvec_recps_nf_h, float16_recps_nf, float16)
  DO_3OP(gvec_recps_nf_s, float32_recps_nf, float32)
 +DO_3OP(gvec_rsqrts_nf_h, float16_rsqrts_nf, float16)
 +DO_3OP(gvec_rsqrts_nf_s, float32_rsqrts_nf, float32)
 +
  #ifdef TARGET_AARCH64
  DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
      return r;
  }
-+void HELPER(gvec_qrdmlsh_s16)(void *vd, void *vn, void *vm,
+-float32 HELPER(rsqrts_f32)(CPUARMState *env, float32 a, float32 b)
-+                              void *ve, uint32_t desc)
+-{
-+{
+-    float_status *s = &env->vfp.standard_fp_status;
-+    uintptr_t opr_sz = simd_oprsz(desc);
+-    float32 product;
-+    int16_t *d = vd;
+-    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
-+    int16_t *n = vn;
+-        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
-+    int16_t *m = vm;
+-        if (!(float32_is_zero(a) || float32_is_zero(b))) {
-+    CPUARMState *env = ve;
+-            float_raise(float_flag_input_denormal, s);
-+    uintptr_t i;
+-        }
-+
+-        return float32_one_point_five;
-+    for (i = 0; i < opr_sz / 2; ++i) {
+-    }
-+        d[i] = inl_qrdmlsh_s16(env, n[i], m[i], d[i]);
+-    product = float32_mul(a, b, s);
-+    }
+-    return float32_div(float32_sub(float32_three, product, s), float32_two, s);
-+    clear_tail(d, opr_sz, simd_maxsz(desc));
+-}
-+}
+-
-+
+ /* NEON helpers.  */
- /* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */
- uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
+ /* Constants 256 and 512 are used in some helpers; we avoid relying on
-                                   int32_t src2, int32_t src3)
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
+index XXXXXXX..XXXXXXX 100644
-     return ret;
+--- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
  DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h)
  DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h)
  DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h)
 +DO_3S_FP_GVEC(VRSQRTS, gen_helper_gvec_rsqrts_nf_s, gen_helper_gvec_rsqrts_nf_h)
  WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
  WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
      return do_3same(s, a, gen_VMINNM_fp32_3s);
  }
-+void HELPER(gvec_qrdmlah_s32)(void *vd, void *vn, void *vm,
+-WRAP_ENV_FN(gen_VRSQRTS_tramp, gen_helper_rsqrts_f32)
-+                              void *ve, uint32_t desc)
+-
-+{
+-static void gen_VRSQRTS_fp_3s(unsigned vece, uint32_t rd_ofs,
-+    uintptr_t opr_sz = simd_oprsz(desc);
+-                              uint32_t rn_ofs, uint32_t rm_ofs,
-+    int32_t *d = vd;
+-                              uint32_t oprsz, uint32_t maxsz)
-+    int32_t *n = vn;
+-{
-+    int32_t *m = vm;
+-    static const GVecGen3 ops = { .fni4 = gen_VRSQRTS_tramp };
-+    CPUARMState *env = ve;
+-    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops);
-+    uintptr_t i;
+-}
-+
+-
-+    for (i = 0; i < opr_sz / 4; ++i) {
+-static bool trans_VRSQRTS_fp_3s(DisasContext *s, arg_3same *a)
-+        d[i] = helper_neon_qrdmlah_s32(env, n[i], m[i], d[i]);
+-{
-+    }
+-    if (a->size != 0) {
-+    clear_tail(d, opr_sz, simd_maxsz(desc));
+-        /* TODO fp16 support */
-+}
+-        return false;
-+
+-    }
- /* Signed saturating rounding doubling multiply-subtract high half, 32-bit */
+-
- uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
+-    return do_3same(s, a, gen_VRSQRTS_fp_3s);
-                                   int32_t src2, int32_t src3)
+-}
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
+-
-     }
+ static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
-     return ret;
+ {
- }
+     /* FP operations handled pairwise 32 bits at a time */
 +
 +void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
 +                              void *ve, uint32_t desc)
 +{
 +    uintptr_t opr_sz = simd_oprsz(desc);
 +    int32_t *d = vd;
 +    int32_t *n = vn;
 +    int32_t *m = vm;
 +    CPUARMState *env = ve;
 +    uintptr_t i;
 +
 +    for (i = 0; i < opr_sz / 4; ++i) {
 +        d[i] = helper_neon_qrdmlsh_s32(env, n[i], m[i], d[i]);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
-.16.2
+.20.1
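The VRSQRTS patch above computes the AArch32 non-fused reciprocal square-root step, (3 - op1*op2) / 2, and returns 1.5 when one operand is infinite and the other is zero. The sketch below models that step in plain C and shows how software typically uses it as a Newton-Raphson refinement of 1/sqrt(d). It is an illustration only, under stated assumptions: `rsqrts_step` and `refine_rsqrt` are hypothetical names, and the sketch uses native `float` instead of QEMU's softfloat types, so it ignores `float_status`, exception flags and input-denormal flushing.

```c
#include <assert.h>
#include <math.h>

/*
 * Hypothetical plain-C model of the non-fused AArch32 VRSQRTS step.
 * QEMU's float16/float32_rsqrts_nf helpers use softfloat with an
 * explicit float_status; this sketch uses native float and does not
 * model exception flags or denormal flushing.
 */
static float rsqrts_step(float a, float b)
{
    /* inf * 0 would give NaN; the architecture defines +1.5 instead */
    if ((isinf(a) && b == 0.0f) || (isinf(b) && a == 0.0f)) {
        return 1.5f;
    }
    return (3.0f - a * b) / 2.0f;
}

/*
 * One Newton-Raphson step for x ~= 1/sqrt(d):
 *   x' = x * (3 - d*x*x) / 2  ==  x * rsqrts_step(d*x, x)
 */
static float refine_rsqrt(float d, float x)
{
    return x * rsqrts_step(d * x, x);
}
```

Starting from a rough estimate such as `x = 0.7f` for `d = 2.0f`, a handful of `refine_rsqrt` calls converge to about 0.7071068, because each step roughly squares the relative error. On real hardware the starting estimate would normally come from VRSQRTE.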

-[Qemu-devel] [PULL 10/39] target/arm: Define init-svtor property for the reset secure VTOR value
+[PULL 34/47] target/arm: Implement fp16 for Neon pairwise fp ops
-The Cortex-M33 allows the system to specify the reset value of the
+Convert the Neon pairwise fp ops to use a single gvec-style
-secure Vector Table Offset Register (VTOR) by asserting config
+helper to do the full operation instead of one helper call
-signals. In particular, guest images for the MPS2 AN505 board rely
+for each 32-bit part. This allows us to use the same
-on the MPS2's initial VTOR being correct for that board.
+framework to implement the fp16 variants.
 Implement a QEMU property so board and SoC code can set the reset
 value to the correct value.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-7-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-36-peter.maydell@linaro.org
 ---
- target/arm/cpu.h |  3 +++
+ target/arm/helper.h             |  7 +++++
- target/arm/cpu.c | 18 ++++++++++++++----
+ target/arm/vec_helper.c         | 45 +++++++++++++++++++++++++++++++++
-files changed, 17 insertions(+), 4 deletions(-)
+ target/arm/translate-neon.c.inc | 42 ++++++++++++------------------
 files changed, 68 insertions(+), 26 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fcmlas_idx, TCG_CALL_NO_RWG,
-      */
+ DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG,
-     uint32_t psci_conduit;
+                    void, ptr, ptr, ptr, ptr, i32)
-+    /* For v8M, initial value of the Secure VTOR */
++DEF_HELPER_FLAGS_5(neon_paddh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-+    uint32_t init_svtor;
++DEF_HELPER_FLAGS_5(neon_pmaxh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(neon_pminh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(neon_padds, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(neon_pmaxs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(neon_pmins, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 +
-     /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
+ DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-      * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
+ DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-      */
+ DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/cpu.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
+@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_s, uint32_t)
-         uint32_t initial_msp; /* Loaded from 0x0 */
+ DO_ABA(gvec_uaba_d, uint64_t)
-         uint32_t initial_pc; /* Loaded from 0x4 */
-         uint8_t *rom;
+ #undef DO_ABA
 +        uint32_t vecbase;
          if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
              env->v7m.secure = true;
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
          /* Unlike A/R profile, M profile defines the reset LR value */
          env->regs[14] = 0xffffffff;
 -        /* Load the initial SP and PC from the vector table at address 0 */
 -        rom = rom_ptr(0);
 +        env->v7m.vecbase[M_REG_S] = cpu->init_svtor & 0xffffff80;
 +
-+        /* Load the initial SP and PC from offset 0 and 4 in the vector table */
++#define DO_NEON_PAIRWISE(NAME, OP)                                      \
-+        vecbase = env->v7m.vecbase[env->v7m.secure];
++    void HELPER(NAME##s)(void *vd, void *vn, void *vm,                  \
-+        rom = rom_ptr(vecbase);
++                         void *stat, uint32_t oprsz)                    \
-         if (rom) {
++    {                                                                   \
-             /* Address zero is covered by ROM which hasn't yet been
++        float_status *fpst = stat;                                      \
-              * copied into physical memory.
++        float32 *d = vd;                                                \
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
++        float32 *n = vn;                                                \
-              * it got copied into memory. In the latter case, rom_ptr
++        float32 *m = vm;                                                \
-              * will return a NULL pointer and we should use ldl_phys instead.
++        float32 r0, r1;                                                 \
-              */
++                                                                        \
--            initial_msp = ldl_phys(s->as, 0);
++        /* Read all inputs before writing outputs in case vm == vd */   \
--            initial_pc = ldl_phys(s->as, 4);
++        r0 = float32_##OP(n[H4(0)], n[H4(1)], fpst);                    \
-+            initial_msp = ldl_phys(s->as, vecbase);
++        r1 = float32_##OP(m[H4(0)], m[H4(1)], fpst);                    \
-+            initial_pc = ldl_phys(s->as, vecbase + 4);
++                                                                        \
-         }
++        d[H4(0)] = r0;                                                  \
++        d[H4(1)] = r1;                                                  \
-         env->regs[13] = initial_msp & 0xFFFFFFFC;
++    }                                                                   \
-@@ -XXX,XX +XXX,XX @@ static Property arm_cpu_pmsav7_dregion_property =
++                                                                        \
-                                            pmsav7_dregion,
++    void HELPER(NAME##h)(void *vd, void *vn, void *vm,                  \
-                                            qdev_prop_uint32, uint32_t);
++                         void *stat, uint32_t oprsz)                    \
++    {                                                                   \
-+/* M profile: initial value of the Secure VTOR */
++        float_status *fpst = stat;                                      \
-+static Property arm_cpu_initsvtor_property =
++        float16 *d = vd;                                                \
-+            DEFINE_PROP_UINT32("init-svtor", ARMCPU, init_svtor, 0);
++        float16 *n = vn;                                                \
 +        float16 *m = vm;                                                \
 +        float16 r0, r1, r2, r3;                                         \
 +                                                                        \
 +        /* Read all inputs before writing outputs in case vm == vd */   \
 +        r0 = float16_##OP(n[H2(0)], n[H2(1)], fpst);                    \
 +        r1 = float16_##OP(n[H2(2)], n[H2(3)], fpst);                    \
 +        r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
 +        r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
 +                                                                        \
 +        d[H4(0)] = r0;                                                  \
 +        d[H4(1)] = r1;                                                  \
 +        d[H4(2)] = r2;                                                  \
 +        d[H4(3)] = r3;                                                  \
 +    }
 +
- static void arm_cpu_post_init(Object *obj)
++DO_NEON_PAIRWISE(neon_padd, add)
 +DO_NEON_PAIRWISE(neon_pmax, max)
 +DO_NEON_PAIRWISE(neon_pmin, min)
 +
 +#undef DO_NEON_PAIRWISE
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
      return do_3same(s, a, gen_VMINNM_fp32_3s);
  }
 -static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 +static bool do_3same_fp_pair(DisasContext *s, arg_3same *a,
 +                             gen_helper_gvec_3_ptr *fn)
  {
-     ARMCPU *cpu = ARM_CPU(obj);
+-    /* FP operations handled pairwise 32 bits at a time */
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_post_init(Object *obj)
+-    TCGv_i32 tmp, tmp2, tmp3;
-                                  qdev_prop_allow_set_link_before_realize,
++    /* FP pairwise operations */
-                                  OBJ_PROP_LINK_UNREF_ON_RELEASE,
+     TCGv_ptr fpstatus;
-                                  &error_abort);
-+        qdev_property_add_static(DEVICE(obj), &arm_cpu_initsvtor_property,
+     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+                                 &error_abort);
+@@ -XXX,XX +XXX,XX @@ static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
      assert(a->q == 0); /* enforced by decode patterns */
 -    /*
 -     * Note that we have to be careful not to clobber the source operands
 -     * in the "vm == vd" case by storing the result of the first pass too
 -     * early. Since Q is 0 there are always just two passes, so instead
 -     * of a complicated loop over each pass we just unroll.
 -     */
 -    fpstatus = fpstatus_ptr(FPST_STD);
 -    tmp = neon_load_reg(a->vn, 0);
 -    tmp2 = neon_load_reg(a->vn, 1);
 -    fn(tmp, tmp, tmp2, fpstatus);
 -    tcg_temp_free_i32(tmp2);
 -    tmp3 = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 -    fn(tmp3, tmp3, tmp2, fpstatus);
 -    tcg_temp_free_i32(tmp2);
 +    fpstatus = fpstatus_ptr(a->size != 0 ? FPST_STD_F16 : FPST_STD);
 +    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
 +                       vfp_reg_offset(1, a->vn),
 +                       vfp_reg_offset(1, a->vm),
 +                       fpstatus, 8, 8, 0, fn);
      tcg_temp_free_ptr(fpstatus);
 -    neon_store_reg(a->vd, 0, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
      static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
      {                                                               \
          if (a->size != 0) {                                         \
 -            /* TODO fp16 support */                                 \
 -            return false;                                           \
 +            if (!dc_isar_feature(aa32_fp16_arith, s)) {             \
 +                return false;                                       \
 +            }                                                       \
 +            return do_3same_fp_pair(s, a, FUNC##h);                 \
          }                                                           \
 -        return do_3same_fp_pair(s, a, FUNC);                        \
 +        return do_3same_fp_pair(s, a, FUNC##s);                     \
      }
-     qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property,
+-DO_3S_FP_PAIR(VPADD, gen_helper_vfp_adds)
 -DO_3S_FP_PAIR(VPMAX, gen_helper_vfp_maxs)
 -DO_3S_FP_PAIR(VPMIN, gen_helper_vfp_mins)
 +DO_3S_FP_PAIR(VPADD, gen_helper_neon_padd)
 +DO_3S_FP_PAIR(VPMAX, gen_helper_neon_pmax)
 +DO_3S_FP_PAIR(VPMIN, gen_helper_neon_pmin)
  static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
  {
 --
-.16.2
+.20.1
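The pairwise patch above stresses that the new helpers "read all inputs before writing outputs in case vm == vd". The sketch below shows why, for the two-lane float32 case: d[0] = n[0] + n[1] and d[1] = m[0] + m[1]. (The fp16 helper applies the same idea across four lanes.) The function names are hypothetical, and the sketch uses native `float` rather than QEMU's softfloat and `float_status`. A deliberately naive variant is included to show the wrong result it produces when the destination aliases Vm.

```c
#include <assert.h>

/*
 * Hypothetical plain-C model of VPADD.F32 Dd, Dn, Dm on two float32
 * lanes:  d[0] = n[0] + n[1],  d[1] = m[0] + m[1].
 */
static void pairwise_add_f32(float *d, const float *n, const float *m)
{
    /* Compute both results first: d may alias n or m (vd == vm). */
    float r0 = n[0] + n[1];
    float r1 = m[0] + m[1];

    d[0] = r0;
    d[1] = r1;
}

/*
 * Deliberately buggy variant for comparison: writing d[0] before
 * reading m[0] corrupts the second result when d aliases m.
 */
static void pairwise_add_f32_naive(float *d, const float *n, const float *m)
{
    d[0] = n[0] + n[1];
    d[1] = m[0] + m[1];
}
```

With n = {1, 2}, m = {10, 20} and the destination equal to m, the safe version yields {3, 30}, while the naive version yields {3, 23}. This is the hazard that the removed translator code sidestepped by unrolling two passes by hand.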

-[Qemu-devel] [PULL 30/39] target/arm: Decode aa32 armv8.1 three same
+[PULL 35/47] target/arm: Implement fp16 for Neon float-integer VCVT
-From: Richard Henderson <richard.henderson@linaro.org>
+Convert the Neon float-integer VCVT insns to gvec, and use this
 to implement fp16 support for them.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Note that unlike the VFP int<->fp16 VCVT insns we converted
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+earlier and which convert to/from a 32-bit integer, these
-Message-id: 20180228193125.20577-8-richard.henderson@linaro.org
+Neon insns convert to/from 16-bit integers. So we can use
 the existing vfp conversion helpers for the f32<->u32/i32
 case but need to provide our own for f16<->u16/i16.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-37-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 86 +++++++++++++++++++++++++++++++++++++++-----------
+ target/arm/helper.h             |  9 +++++++++
-file changed, 67 insertions(+), 19 deletions(-)
+ target/arm/vec_helper.c         | 29 +++++++++++++++++++++++++++++
  target/arm/translate-neon.c.inc | 15 ++++-----------
 files changed, 42 insertions(+), 11 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/helper.h
-+++ b/target/arm/translate.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(neon_padds, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- #include "disas/disas.h"
+ DEF_HELPER_FLAGS_5(neon_pmaxs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- #include "exec/exec-all.h"
+ DEF_HELPER_FLAGS_5(neon_pmins, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
- #include "tcg-op.h"
-+#include "tcg-op-gvec.h"
++DEF_HELPER_FLAGS_4(gvec_sstoh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- #include "qemu/log.h"
++DEF_HELPER_FLAGS_4(gvec_sitos, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- #include "qemu/bitops.h"
++DEF_HELPER_FLAGS_4(gvec_ustoh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- #include "arm_ldst.h"
++DEF_HELPER_FLAGS_4(gvec_uitos, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ static void gen_neon_narrow_op(int op, int u, int size,
++DEF_HELPER_FLAGS_4(gvec_tosszh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- #define NEON_3R_VPMAX 20
++DEF_HELPER_FLAGS_4(gvec_tosizs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- #define NEON_3R_VPMIN 21
++DEF_HELPER_FLAGS_4(gvec_touszh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- #define NEON_3R_VQDMULH_VQRDMULH 22
++DEF_HELPER_FLAGS_4(gvec_touizs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 -#define NEON_3R_VPADD 23
 +#define NEON_3R_VPADD_VQRDMLAH 23
  #define NEON_3R_SHA 24 /* SHA1C,SHA1P,SHA1M,SHA1SU0,SHA256H{2},SHA256SU1 */
 -#define NEON_3R_VFM 25 /* VFMA, VFMS : float fused multiply-add */
 +#define NEON_3R_VFM_VQRDMLSH 25 /* VFMA, VFMS, VQRDMLSH */
  #define NEON_3R_FLOAT_ARITH 26 /* float VADD, VSUB, VPADD, VABD */
  #define NEON_3R_FLOAT_MULTIPLY 27 /* float VMLA, VMLS, VMUL */
  #define NEON_3R_FLOAT_CMP 28 /* float VCEQ, VCGE, VCGT */
@@ -XXX,XX +XXX,XX @@ static const uint8_t neon_3r_sizes[] = {
      [NEON_3R_VPMAX] = 0x7,
      [NEON_3R_VPMIN] = 0x7,
      [NEON_3R_VQDMULH_VQRDMULH] = 0x6,
 -    [NEON_3R_VPADD] = 0x7,
 +    [NEON_3R_VPADD_VQRDMLAH] = 0x7,
      [NEON_3R_SHA] = 0xf, /* size field encodes op type */
 -    [NEON_3R_VFM] = 0x5, /* size bit 1 encodes op */
 +    [NEON_3R_VFM_VQRDMLSH] = 0x7, /* For VFM, size bit 1 encodes op */
      [NEON_3R_FLOAT_ARITH] = 0x5, /* size bit 1 encodes op */
      [NEON_3R_FLOAT_MULTIPLY] = 0x5, /* size bit 1 encodes op */
      [NEON_3R_FLOAT_CMP] = 0x5, /* size bit 1 encodes op */
@@ -XXX,XX +XXX,XX @@ static const uint8_t neon_2rm_sizes[] = {
      [NEON_2RM_VCVT_UF] = 0x4,
  };
 +
-+/* Expand v8.1 simd helper.  */
+ DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn,
+ DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+                         int q, int rd, int rn, int rm)
+ DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t float32_acgt(float32 op1, float32 op2, float_status *stat)
      return -float32_lt(float32_abs(op2), float32_abs(op1), stat);
  }
 +static int16_t vfp_tosszh(float16 x, void *fpstp)
 +{
-+    if (arm_dc_feature(s, ARM_FEATURE_V8_RDM)) {
++    float_status *fpst = fpstp;
-+        int opr_sz = (1 + q) * 8;
++    if (float16_is_any_nan(x)) {
-+        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
++        float_raise(float_flag_invalid, fpst);
 +                           vfp_reg_offset(1, rn),
 +                           vfp_reg_offset(1, rm), cpu_env,
 +                           opr_sz, opr_sz, 0, fn);
 +        return 0;
 +    }
-+    return 1;
++    return float16_to_int16_round_to_zero(x, fpst);
 +}
 +
- /* Translate a NEON data processing instruction.  Return nonzero if the
++static uint16_t vfp_touszh(float16 x, void *fpstp)
-    instruction is invalid.
++{
-    We process data in a mixture of 32-bit and 64-bit chunks.
++    float_status *fpst = fpstp;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++    if (float16_is_any_nan(x)) {
-         if (q && ((rd | rn | rm) & 1)) {
++        float_raise(float_flag_invalid, fpst);
-             return 1;
++        return 0;
-         }
++    }
--        /*
++    return float16_to_uint16_round_to_zero(x, fpst);
--         * The SHA-1/SHA-256 3-register instructions require special treatment
++}
 -         * here, as their size field is overloaded as an op type selector, and
 -         * they all consume their input in a single pass.
 -         */
 -        if (op == NEON_3R_SHA) {
 +        switch (op) {
 +        case NEON_3R_SHA:
 +            /* The SHA-1/SHA-256 3-register instructions require special
 +             * treatment here, as their size field is overloaded as an
 +             * op type selector, and they all consume their input in a
 +             * single pass.
 +             */
              if (!q) {
                  return 1;
              }
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              tcg_temp_free_ptr(ptr2);
              tcg_temp_free_ptr(ptr3);
              return 0;
 +
-+        case NEON_3R_VPADD_VQRDMLAH:
+ #define DO_2OP(NAME, FUNC, TYPE) \
-+            if (!u) {
+ void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)  \
-+                break;  /* VPADD */
+ {                                                                 \
-+            }
+@@ -XXX,XX +XXX,XX @@ DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
-+            /* VQRDMLAH */
+ DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
-+            switch (size) {
+ DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
-+            case 1:
-+                return do_v81_helper(s, gen_helper_gvec_qrdmlah_s16,
++DO_2OP(gvec_sitos, helper_vfp_sitos, int32_t)
-+                                     q, rd, rn, rm);
++DO_2OP(gvec_uitos, helper_vfp_uitos, uint32_t)
-+            case 2:
++DO_2OP(gvec_tosizs, helper_vfp_tosizs, float32)
-+                return do_v81_helper(s, gen_helper_gvec_qrdmlah_s32,
++DO_2OP(gvec_touizs, helper_vfp_touizs, float32)
-+                                     q, rd, rn, rm);
++DO_2OP(gvec_sstoh, int16_to_float16, int16_t)
-+            }
++DO_2OP(gvec_ustoh, uint16_to_float16, uint16_t)
-+            return 1;
++DO_2OP(gvec_tosszh, vfp_tosszh, float16)
 +DO_2OP(gvec_touszh, vfp_touszh, float16)
 +
-+        case NEON_3R_VFM_VQRDMLSH:
+ #define WRAP_CMP0_FWD(FN, CMPOP, TYPE)                          \
-+            if (!u) {
+     static TYPE TYPE##_##FN##0(TYPE op, float_status *stat)     \
-+                /* VFM, VFMS */
+     {                                                           \
-+                if (size == 1) {
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
-+                    return 1;
+index XXXXXXX..XXXXXXX 100644
-+                }
+--- a/target/arm/translate-neon.c.inc
-+                break;
++++ b/target/arm/translate-neon.c.inc
-+            }
+@@ -XXX,XX +XXX,XX @@ static bool do_2misc_fp(DisasContext *s, arg_2misc *a,
-+            /* VQRDMLSH */
+     return true;
-+            switch (size) {
+ }
-+            case 1:
-+                return do_v81_helper(s, gen_helper_gvec_qrdmlsh_s16,
+-#define DO_2MISC_FP(INSN, FUNC)                                 \
-+                                     q, rd, rn, rm);
+-    static bool trans_##INSN(DisasContext *s, arg_2misc *a)     \
-+            case 2:
+-    {                                                           \
-+                return do_v81_helper(s, gen_helper_gvec_qrdmlsh_s32,
+-        return do_2misc_fp(s, a, FUNC);                         \
-+                                     q, rd, rn, rm);
+-    }
-+            }
+-
-+            return 1;
+-DO_2MISC_FP(VCVT_FS, gen_helper_vfp_sitos)
-         }
+-DO_2MISC_FP(VCVT_FU, gen_helper_vfp_uitos)
-         if (size == 3 && op != NEON_3R_LOGIC) {
+-DO_2MISC_FP(VCVT_SF, gen_helper_vfp_tosizs)
-             /* 64-bit element instructions. */
+-DO_2MISC_FP(VCVT_UF, gen_helper_vfp_touizs)
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+-
-                 rm = rtmp;
+ #define DO_2MISC_FP_VEC(INSN, HFUNC, SFUNC)                             \
-             }
+     static void gen_##INSN(unsigned vece, uint32_t rd_ofs,              \
-             break;
+                            uint32_t rm_ofs,                             \
--        case NEON_3R_VPADD:
+@@ -XXX,XX +XXX,XX @@ DO_2MISC_FP_VEC(VCGE0_F, gen_helper_gvec_fcge0_h, gen_helper_gvec_fcge0_s)
--            if (u) {
+ DO_2MISC_FP_VEC(VCEQ0_F, gen_helper_gvec_fceq0_h, gen_helper_gvec_fceq0_s)
--                return 1;
+ DO_2MISC_FP_VEC(VCLT0_F, gen_helper_gvec_fclt0_h, gen_helper_gvec_fclt0_s)
--            }
+ DO_2MISC_FP_VEC(VCLE0_F, gen_helper_gvec_fcle0_h, gen_helper_gvec_fcle0_s)
--            /* Fall through */
++DO_2MISC_FP_VEC(VCVT_FS, gen_helper_gvec_sstoh, gen_helper_gvec_sitos)
-+        case NEON_3R_VPADD_VQRDMLAH:
++DO_2MISC_FP_VEC(VCVT_FU, gen_helper_gvec_ustoh, gen_helper_gvec_uitos)
-         case NEON_3R_VPMAX:
++DO_2MISC_FP_VEC(VCVT_SF, gen_helper_gvec_tosszh, gen_helper_gvec_tosizs)
-         case NEON_3R_VPMIN:
++DO_2MISC_FP_VEC(VCVT_UF, gen_helper_gvec_touszh, gen_helper_gvec_touizs)
-             pairwise = 1;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+ static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
-                 return 1;
+ {
              }
              break;
 -        case NEON_3R_VFM:
 -            if (!arm_dc_feature(s, ARM_FEATURE_VFP4) || u) {
 +        case NEON_3R_VFM_VQRDMLSH:
 +            if (!arm_dc_feature(s, ARM_FEATURE_VFP4)) {
                  return 1;
              }
              break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  }
              }
              break;
 -        case NEON_3R_VPADD:
 +        case NEON_3R_VPADD_VQRDMLAH:
              switch (size) {
              case 0: gen_helper_neon_padd_u8(tmp, tmp, tmp2); break;
              case 1: gen_helper_neon_padd_u16(tmp, tmp, tmp2); break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                }
              }
              break;
 -        case NEON_3R_VFM:
 +        case NEON_3R_VFM_VQRDMLSH:
          {
              /* VFMA, VFMS: fused multiply-add */
              TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 --
-.16.2
+.20.1

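The vfp_tosszh and vfp_touszh helpers in the patch above handle a NaN input by raising the invalid flag and returning 0. Any other input is passed to softfloat's round-to-zero conversion. The minimal C sketch below models that behaviour. It assumes, as in Arm's FP-to-integer conversions, that out-of-range inputs saturate and also raise invalid. It uses `float` as a stand-in for `float16` and does not model the inexact flag. `to_s16_rz`, `to_u16_rz` and the `invalid` out-parameter are hypothetical names, not QEMU APIs.

```c
#include <math.h>     /* isnan, NAN (macros only; no libm call) */
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of vfp_tosszh: convert to int16_t, rounding toward
 * zero. NaN gives 0 and sets *invalid; out-of-range values saturate
 * and set *invalid. A C cast from float to integer truncates toward
 * zero, so only the range checks need to be written out.
 */
static int16_t to_s16_rz(float x, bool *invalid)
{
    if (isnan(x)) {
        *invalid = true;
        return 0;
    }
    if (x >= 32768.0f) {            /* truncates to > INT16_MAX */
        *invalid = true;
        return INT16_MAX;
    }
    if (x <= -32769.0f) {           /* truncates to < INT16_MIN */
        *invalid = true;
        return INT16_MIN;
    }
    return (int16_t)x;
}

/* Hypothetical model of vfp_touszh: the unsigned 16-bit variant. */
static uint16_t to_u16_rz(float x, bool *invalid)
{
    if (isnan(x)) {
        *invalid = true;
        return 0;
    }
    if (x >= 65536.0f) {
        *invalid = true;
        return UINT16_MAX;
    }
    if (x <= -1.0f) {               /* values in (-1, 0) truncate to 0 */
        *invalid = true;
        return 0;
    }
    return (uint16_t)(int32_t)x;
}
```

With these definitions, `to_s16_rz(-1.9f, &inv)` yields -1 because the value is rounded toward zero, not toward negative infinity. `to_u16_rz(-0.5f, &inv)` yields 0 without setting `inv`, since the truncated value is representable.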
-[Qemu-devel] [PULL 26/39] target/arm: Refactor disas_simd_indexed size checks
+[PULL 36/47] target/arm: Convert Neon VCVT fixed-point to gvec
-From: Richard Henderson <richard.henderson@linaro.org>
+Convert the Neon VCVT float<->fixed-point insns to a
 gvec style, in preparation for adding fp16 support.
-The integer size check was already outside of the opcode switch;
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-move the floating-point size check outside as well.  Unify the
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-size vs index adjustment between fp and integer paths.
+Message-id: 20200828183354.27913-38-peter.maydell@linaro.org
 ---
  target/arm/helper.h             |  5 +++++
  target/arm/vec_helper.c         | 20 +++++++++++++++++++
  target/arm/translate-neon.c.inc | 35 +++++++++++++++++----------------
 files changed, 43 insertions(+), 17 deletions(-)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20180228193125.20577-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/translate-a64.c | 65 +++++++++++++++++++++++-----------------------
 file changed, 32 insertions(+), 33 deletions(-)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/helper.h
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_tosizs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-     case 0x05: /* FMLS */
+ DEF_HELPER_FLAGS_4(gvec_touszh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-     case 0x09: /* FMUL */
+ DEF_HELPER_FLAGS_4(gvec_touizs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-     case 0x19: /* FMULX */
--        if (size == 1) {
++DEF_HELPER_FLAGS_4(gvec_vcvt_sf, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
--            unallocated_encoding(s);
++DEF_HELPER_FLAGS_4(gvec_vcvt_uf, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
--            return;
++DEF_HELPER_FLAGS_4(gvec_vcvt_fs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
--        }
++DEF_HELPER_FLAGS_4(gvec_vcvt_fu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-         is_fp = true;
++
-         break;
+ DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-     default:
+ DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
+ DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-     if (is_fp) {
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
-         /* convert insn encoded size to TCGMemOp size */
+index XXXXXXX..XXXXXXX 100644
-         switch (size) {
+--- a/target/arm/vec_helper.c
--        case 2: /* single precision */
++++ b/target/arm/vec_helper.c
--            size = MO_32;
+@@ -XXX,XX +XXX,XX @@ DO_NEON_PAIRWISE(neon_pmax, max)
--            index = h << 1 | l;
+ DO_NEON_PAIRWISE(neon_pmin, min)
--            rm |= (m << 4);
--            break;
+ #undef DO_NEON_PAIRWISE
--        case 3: /* double precision */
++
--            size = MO_64;
++#define DO_VCVT_FIXED(NAME, FUNC, TYPE)                                 \
--            if (l || !is_q) {
++    void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)    \
-+        case 0: /* half-precision */
++    {                                                                   \
-+            if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
++        intptr_t i, oprsz = simd_oprsz(desc);                           \
-                 unallocated_encoding(s);
++        int shift = simd_data(desc);                                    \
-                 return;
++        TYPE *d = vd, *n = vn;                                          \
-             }
++        float_status *fpst = stat;                                      \
--            index = h;
++        for (i = 0; i < oprsz / sizeof(TYPE); i++) {                    \
--            rm |= (m << 4);
++            d[i] = FUNC(n[i], shift, fpst);                             \
--            break;
++        }                                                               \
--        case 0: /* half precision */
++        clear_tail(d, oprsz, simd_maxsz(desc));                         \
-             size = MO_16;
++    }
--            index = h << 2 | l << 1 | m;
++
--            is_fp16 = true;
++DO_VCVT_FIXED(gvec_vcvt_sf, helper_vfp_sltos, uint32_t)
--            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
++DO_VCVT_FIXED(gvec_vcvt_uf, helper_vfp_ultos, uint32_t)
--                break;
++DO_VCVT_FIXED(gvec_vcvt_fs, helper_vfp_tosls_round_to_zero, uint32_t)
--            }
++DO_VCVT_FIXED(gvec_vcvt_fu, helper_vfp_touls_round_to_zero, uint32_t)
--            /* fallthru */
++
--        default: /* unallocated */
++#undef DO_VCVT_FIXED
--            unallocated_encoding(s);
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
--            return;
+index XXXXXXX..XXXXXXX 100644
--        }
+--- a/target/arm/translate-neon.c.inc
--    } else {
++++ b/target/arm/translate-neon.c.inc
--        switch (size) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
--        case 1:
+ }
--            index = h << 2 | l << 1 | m;
-             break;
+ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
--        case 2:
+-                      NeonGenTwoSingleOpFn *fn)
--            index = h << 1 | l;
++                      gen_helper_gvec_2_ptr *fn)
--            rm |= (m << 4);
+ {
-+        case MO_32: /* single precision */
+     /* FP operations in 2-reg-and-shift group */
-+        case MO_64: /* double precision */
+-    TCGv_i32 tmp, shiftv;
-             break;
+-    TCGv_ptr fpstatus;
-         default:
+-    int pass;
-             unallocated_encoding(s);
++    int vec_size = a->q ? 16 : 8;
-             return;
++    int rd_ofs = neon_reg_offset(a->vd, 0);
-         }
++    int rm_ofs = neon_reg_offset(a->vm, 0);
-+    } else {
++    TCGv_ptr fpst;
-+        switch (size) {
-+        case MO_8:
+     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        case MO_64:
+         return false;
-+            unallocated_encoding(s);
+     }
-+            return;
 +    if (a->size != 0) {
 +        if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +            return false;
 +        }
 +    }
 +
-+    /* Given TCGMemOp size, adjust register and indexing.  */
+     /* UNDEF accesses to D16-D31 if they don't exist. */
-+    switch (size) {
+     if (!dc_isar_feature(aa32_simd_r32, s) &&
-+    case MO_16:
+         ((a->vd | a->vm) & 0x10)) {
-+        index = h << 2 | l << 1 | m;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
-+        break;
+         return true;
 +    case MO_32:
 +        index = h << 1 | l;
 +        rm |= m << 4;
 +        break;
 +    case MO_64:
 +        if (l || !is_q) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        index = h;
 +        rm |= m << 4;
 +        break;
 +    default:
 +        g_assert_not_reached();
      }
-     if (!fp_access_check(s)) {
+-    fpstatus = fpstatus_ptr(FPST_STD);
 -    shiftv = tcg_const_i32(a->shift);
 -    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        tmp = neon_load_reg(a->vm, pass);
 -        fn(tmp, tmp, shiftv, fpstatus);
 -        neon_store_reg(a->vd, pass, tmp);
 -    }
 -    tcg_temp_free_ptr(fpstatus);
 -    tcg_temp_free_i32(shiftv);
 +    fpst = fpstatus_ptr(a->size ? FPST_STD_F16 : FPST_STD);
 +    tcg_gen_gvec_2_ptr(rd_ofs, rm_ofs, fpst, vec_size, vec_size, a->shift, fn);
 +    tcg_temp_free_ptr(fpst);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
          return do_fp_2sh(s, a, FUNC);                                   \
      }
 -DO_FP_2SH(VCVT_SF, gen_helper_vfp_sltos)
 -DO_FP_2SH(VCVT_UF, gen_helper_vfp_ultos)
 -DO_FP_2SH(VCVT_FS, gen_helper_vfp_tosls_round_to_zero)
 -DO_FP_2SH(VCVT_FU, gen_helper_vfp_touls_round_to_zero)
 +DO_FP_2SH(VCVT_SF, gen_helper_gvec_vcvt_sf)
 +DO_FP_2SH(VCVT_UF, gen_helper_gvec_vcvt_uf)
 +DO_FP_2SH(VCVT_FS, gen_helper_gvec_vcvt_fs)
 +DO_FP_2SH(VCVT_FU, gen_helper_gvec_vcvt_fu)
  static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
  {
 --
-.16.2
+.20.1

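In patch 36 above, do_fp_2sh no longer loops over 32-bit passes with an inline helper call per pass. Instead it emits one tcg_gen_gvec_2_ptr call, and DO_VCVT_FIXED reads the fraction-bit count from simd_data(desc) and applies FUNC to every element. The sketch below models only the arithmetic of that per-element loop for the signed 32-bit case. It assumes that a fixed-point value with `shift` fraction bits represents n / 2^shift. `vcvt_sf_sketch` and `vcvt_fs_sketch` are hypothetical names, and the float_status exception flags are not modelled.

```c
#include <math.h>     /* isnan (macro only) */
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: signed fixed-point -> float for a whole vector,
 * with one shift for all elements (cf. gvec_vcvt_sf). Assumes
 * 0 <= shift <= 32. Dividing by an exact power of two adds no rounding
 * beyond the int -> float conversion.
 */
static void vcvt_sf_sketch(float *d, const int32_t *n, size_t elems, int shift)
{
    float scale = (float)(1ull << shift);
    for (size_t i = 0; i < elems; i++) {
        d[i] = (float)n[i] / scale;
    }
}

/*
 * Hypothetical sketch: float -> signed fixed-point, rounding toward zero
 * and saturating (cf. gvec_vcvt_fs). NaN converts to 0.
 */
static void vcvt_fs_sketch(int32_t *d, const float *n, size_t elems, int shift)
{
    float scale = (float)(1ull << shift);
    for (size_t i = 0; i < elems; i++) {
        float scaled = n[i] * scale;
        if (isnan(scaled)) {
            d[i] = 0;
        } else if (scaled >= 2147483648.0f) {
            d[i] = INT32_MAX;
        } else if (scaled < -2147483648.0f) {
            d[i] = INT32_MIN;
        } else {
            d[i] = (int32_t)scaled;   /* C cast truncates toward zero */
        }
    }
}
```

For example, with `shift == 8` the fixed-point value 256 converts to 1.0f, and -128 converts to -0.5f. Converting those results back with the same shift recovers the original integers.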
-[Qemu-devel] [PULL 14/39] include/hw/or-irq.h: Add missing include guard
+[PULL 37/47] target/arm: Implement fp16 for Neon VCVT fixed-point
-The or-irq.h header file is missing the customary guard against
+Implement fp16 for the Neon VCVT insns which convert between
-multiple inclusion, which means compilation fails if it gets
+float and fixed-point.
 included twice. Fix the omission.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-11-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-39-peter.maydell@linaro.org
 ---
- include/hw/or-irq.h | 5 +++++
+ target/arm/helper.h             | 5 +++++
-file changed, 5 insertions(+)
+ target/arm/neon-dp.decode       | 8 +++++++-
  target/arm/vec_helper.c         | 4 ++++
  target/arm/translate-neon.c.inc | 5 +++++
 files changed, 21 insertions(+), 1 deletion(-)
-diff --git a/include/hw/or-irq.h b/include/hw/or-irq.h
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/or-irq.h
+--- a/target/arm/helper.h
-+++ b/include/hw/or-irq.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_uf, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-  * THE SOFTWARE.
+ DEF_HELPER_FLAGS_4(gvec_vcvt_fs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-  */
+ DEF_HELPER_FLAGS_4(gvec_vcvt_fu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+#ifndef HW_OR_IRQ_H
++DEF_HELPER_FLAGS_4(gvec_vcvt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+#define HW_OR_IRQ_H
++DEF_HELPER_FLAGS_4(gvec_vcvt_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(gvec_vcvt_hs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(gvec_vcvt_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +
- #include "hw/irq.h"
+ DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- #include "hw/sysbus.h"
+ DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- #include "qom/object.h"
+ DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ struct OrIRQState {
+diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-     bool levels[MAX_OR_LINES];
+index XXXXXXX..XXXXXXX 100644
-     uint16_t num_lines;
+--- a/target/arm/neon-dp.decode
- };
++++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VMINNM_fp_3s     1111 001 1 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
  # We use size=0 for fp32 and size=1 for fp16 to match the 3-same encodings.
  @2reg_vcvt       .... ... . . . 1 ..... .... .... . q:1 . . .... \
                   &2reg_shift vm=%vm_dp vd=%vd_dp size=0 shift=%neon_rshift_i5
 +@2reg_vcvt_f16   .... ... . . . 11 .... .... .... . q:1 . . .... \
 +                 &2reg_shift vm=%vm_dp vd=%vd_dp size=1 shift=%neon_rshift_i4
  VSHR_S_2sh       1111 001 0 1 . ...... .... 0000 . . . 1 .... @2reg_shr_d
  VSHR_S_2sh       1111 001 0 1 . ...... .... 0000 . . . 1 .... @2reg_shr_s
@@ -XXX,XX +XXX,XX @@ VSHLL_U_2sh      1111 001 1 1 . ...... .... 1010 . 0 . 1 .... @2reg_shll_h
  VSHLL_U_2sh      1111 001 1 1 . ...... .... 1010 . 0 . 1 .... @2reg_shll_b
  # VCVT fixed<->float conversions
 -# TODO: FP16 fixed<->float conversions are opc==0b1100 and 0b1101
 +VCVT_SH_2sh      1111 001 0 1 . ...... .... 1100 0 . . 1 .... @2reg_vcvt_f16
 +VCVT_UH_2sh      1111 001 1 1 . ...... .... 1100 0 . . 1 .... @2reg_vcvt_f16
 +VCVT_HS_2sh      1111 001 0 1 . ...... .... 1101 0 . . 1 .... @2reg_vcvt_f16
 +VCVT_HU_2sh      1111 001 1 1 . ...... .... 1101 0 . . 1 .... @2reg_vcvt_f16
 +
-+#endif
+ VCVT_SF_2sh      1111 001 0 1 . ...... .... 1110 0 . . 1 .... @2reg_vcvt
  VCVT_UF_2sh      1111 001 1 1 . ...... .... 1110 0 . . 1 .... @2reg_vcvt
  VCVT_FS_2sh      1111 001 0 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(gvec_vcvt_sf, helper_vfp_sltos, uint32_t)
  DO_VCVT_FIXED(gvec_vcvt_uf, helper_vfp_ultos, uint32_t)
  DO_VCVT_FIXED(gvec_vcvt_fs, helper_vfp_tosls_round_to_zero, uint32_t)
  DO_VCVT_FIXED(gvec_vcvt_fu, helper_vfp_touls_round_to_zero, uint32_t)
 +DO_VCVT_FIXED(gvec_vcvt_sh, helper_vfp_shtoh, uint16_t)
 +DO_VCVT_FIXED(gvec_vcvt_uh, helper_vfp_uhtoh, uint16_t)
 +DO_VCVT_FIXED(gvec_vcvt_hs, helper_vfp_toshh_round_to_zero, uint16_t)
 +DO_VCVT_FIXED(gvec_vcvt_hu, helper_vfp_touhh_round_to_zero, uint16_t)
  #undef DO_VCVT_FIXED
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ DO_FP_2SH(VCVT_UF, gen_helper_gvec_vcvt_uf)
  DO_FP_2SH(VCVT_FS, gen_helper_gvec_vcvt_fs)
  DO_FP_2SH(VCVT_FU, gen_helper_gvec_vcvt_fu)
 +DO_FP_2SH(VCVT_SH, gen_helper_gvec_vcvt_sh)
 +DO_FP_2SH(VCVT_UH, gen_helper_gvec_vcvt_uh)
 +DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_hs)
 +DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_hu)
 +
  static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
  {
      /*
 --
-.16.2
+.20.1

-[Qemu-devel] [PULL 07/39] hw/arm/armv7m: Honour CPU's address space for image loads
+[PULL 38/47] target/arm: Implement fp16 for Neon VCVT with rounding modes
-Instead of loading guest images to the system address space, use the
+Convert the Neon VCVT with-specified-rounding-mode instructions
-CPU's address space.  This is important if we're trying to load the
+to gvec, and use this to implement fp16 support for them.
 file to memory or via an alias memory region that is provided by an
 SoC object and thus not mapped into the system address space.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-4-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-40-peter.maydell@linaro.org
 ---
- hw/arm/armv7m.c | 17 ++++++++++++++---
+ target/arm/helper.h             |   5 ++
-file changed, 14 insertions(+), 3 deletions(-)
+ target/arm/vec_helper.c         |  23 +++++++
  target/arm/translate-neon.c.inc | 105 ++++++++++++--------------------
 files changed, 66 insertions(+), 67 deletions(-)
-diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/armv7m.c
+--- a/target/arm/helper.h
-+++ b/hw/arm/armv7m.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-     uint64_t entry;
+ DEF_HELPER_FLAGS_4(gvec_vcvt_hs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-     uint64_t lowaddr;
+ DEF_HELPER_FLAGS_4(gvec_vcvt_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-     int big_endian;
-+    AddressSpace *as;
++DEF_HELPER_FLAGS_4(gvec_vcvt_rm_ss, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+    int asidx;
++DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+    CPUState *cs = CPU(cpu);
++DEF_HELPER_FLAGS_4(gvec_vcvt_rm_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(gvec_vcvt_rm_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- #ifdef TARGET_WORDS_BIGENDIAN
++
-     big_endian = 1;
+ DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size)
+ DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-         exit(1);
+ DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(gvec_vcvt_hs, helper_vfp_toshh_round_to_zero, uint16_t)
  DO_VCVT_FIXED(gvec_vcvt_hu, helper_vfp_touhh_round_to_zero, uint16_t)
  #undef DO_VCVT_FIXED
 +
 +#define DO_VCVT_RMODE(NAME, FUNC, TYPE)                                 \
 +    void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)    \
 +    {                                                                   \
 +        float_status *fpst = stat;                                      \
 +        intptr_t i, oprsz = simd_oprsz(desc);                           \
 +        uint32_t rmode = simd_data(desc);                               \
 +        uint32_t prev_rmode = get_float_rounding_mode(fpst);            \
 +        TYPE *d = vd, *n = vn;                                          \
 +        set_float_rounding_mode(rmode, fpst);                           \
 +        for (i = 0; i < oprsz / sizeof(TYPE); i++) {                    \
 +            d[i] = FUNC(n[i], 0, fpst);                                 \
 +        }                                                               \
 +        set_float_rounding_mode(prev_rmode, fpst);                      \
 +        clear_tail(d, oprsz, simd_maxsz(desc));                         \
 +    }
 +
 +DO_VCVT_RMODE(gvec_vcvt_rm_ss, helper_vfp_tosls, uint32_t)
 +DO_VCVT_RMODE(gvec_vcvt_rm_us, helper_vfp_touls, uint32_t)
 +DO_VCVT_RMODE(gvec_vcvt_rm_sh, helper_vfp_toshh, uint16_t)
 +DO_VCVT_RMODE(gvec_vcvt_rm_uh, helper_vfp_touhh, uint16_t)
 +
 +#undef DO_VCVT_RMODE
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ DO_VRINT(VRINTZ, FPROUNDING_ZERO)
  DO_VRINT(VRINTM, FPROUNDING_NEGINF)
  DO_VRINT(VRINTP, FPROUNDING_POSINF)
 -static bool do_vcvt(DisasContext *s, arg_2misc *a, int rmode, bool is_signed)
 -{
 -    /*
 -     * Handle a VCVT* operation by iterating 32 bits at a time,
 -     * with a specified rounding mode in operation.
 -     */
 -    int pass;
 -    TCGv_ptr fpst;
 -    TCGv_i32 tcg_rmode, tcg_shift;
 -
 -    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
 -        !arm_dc_feature(s, ARM_FEATURE_V8)) {
 -        return false;
 +#define DO_VEC_RMODE(INSN, RMODE, OP)                                   \
 +    static void gen_##INSN(unsigned vece, uint32_t rd_ofs,              \
 +                           uint32_t rm_ofs,                             \
 +                           uint32_t oprsz, uint32_t maxsz)              \
 +    {                                                                   \
 +        static gen_helper_gvec_2_ptr * const fns[4] = {                 \
 +            NULL,                                                       \
 +            gen_helper_gvec_##OP##h,                                    \
 +            gen_helper_gvec_##OP##s,                                    \
 +            NULL,                                                       \
 +        };                                                              \
 +        TCGv_ptr fpst;                                                  \
 +        fpst = fpstatus_ptr(vece == 1 ? FPST_STD_F16 : FPST_STD);       \
 +        tcg_gen_gvec_2_ptr(rd_ofs, rm_ofs, fpst, oprsz, maxsz,          \
 +                           arm_rmode_to_sf(RMODE), fns[vece]);          \
 +        tcg_temp_free_ptr(fpst);                                        \
 +    }                                                                   \
 +    static bool trans_##INSN(DisasContext *s, arg_2misc *a)             \
 +    {                                                                   \
 +        if (!arm_dc_feature(s, ARM_FEATURE_V8)) {                       \
 +            return false;                                               \
 +        }                                                               \
 +        if (a->size == MO_16) {                                         \
 +            if (!dc_isar_feature(aa32_fp16_arith, s)) {                 \
 +                return false;                                           \
 +            }                                                           \
 +        } else if (a->size != MO_32) {                                  \
 +            return false;                                               \
 +        }                                                               \
 +        return do_2misc_vec(s, a, gen_##INSN);                          \
      }
-+    if (arm_feature(&cpu->env, ARM_FEATURE_EL3)) {
+-    /* UNDEF accesses to D16-D31 if they don't exist. */
-+        asidx = ARMASIdx_S;
+-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+    } else {
+-        ((a->vd | a->vm) & 0x10)) {
-+        asidx = ARMASIdx_NS;
+-        return false;
-+    }
+-    }
-+    as = cpu_get_address_space(cs, asidx);
+-
-+
+-    if (a->size != 2) {
-     if (kernel_filename) {
+-        /* TODO: FP16 will be the size == 1 case */
--        image_size = load_elf(kernel_filename, NULL, NULL, &entry, &lowaddr,
+-        return false;
--                              NULL, big_endian, EM_ARM, 1, 0);
+-    }
-+        image_size = load_elf_as(kernel_filename, NULL, NULL, &entry, &lowaddr,
+-
-+                                 NULL, big_endian, EM_ARM, 1, 0, as);
+-    if ((a->vd | a->vm) & a->q) {
-         if (image_size < 0) {
+-        return false;
--            image_size = load_image_targphys(kernel_filename, 0, mem_size);
+-    }
-+            image_size = load_image_targphys_as(kernel_filename, 0,
+-
-+                                                mem_size, as);
+-    if (!vfp_access_check(s)) {
-             lowaddr = 0;
+-        return true;
-         }
+-    }
-         if (image_size < 0) {
+-
 -    fpst = fpstatus_ptr(FPST_STD);
 -    tcg_shift = tcg_const_i32(0);
 -    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rmode));
 -    gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
 -    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 -        if (is_signed) {
 -            gen_helper_vfp_tosls(tmp, tmp, tcg_shift, fpst);
 -        } else {
 -            gen_helper_vfp_touls(tmp, tmp, tcg_shift, fpst);
 -        }
 -        neon_store_reg(a->vd, pass, tmp);
 -    }
 -    gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
 -    tcg_temp_free_i32(tcg_rmode);
 -    tcg_temp_free_i32(tcg_shift);
 -    tcg_temp_free_ptr(fpst);
 -
 -    return true;
 -}
 -
 -#define DO_VCVT(INSN, RMODE, SIGNED)                            \
 -    static bool trans_##INSN(DisasContext *s, arg_2misc *a)     \
 -    {                                                           \
 -        return do_vcvt(s, a, RMODE, SIGNED);                    \
 -    }
 -
 -DO_VCVT(VCVTAU, FPROUNDING_TIEAWAY, false)
 -DO_VCVT(VCVTAS, FPROUNDING_TIEAWAY, true)
 -DO_VCVT(VCVTNU, FPROUNDING_TIEEVEN, false)
 -DO_VCVT(VCVTNS, FPROUNDING_TIEEVEN, true)
 -DO_VCVT(VCVTPU, FPROUNDING_POSINF, false)
 -DO_VCVT(VCVTPS, FPROUNDING_POSINF, true)
 -DO_VCVT(VCVTMU, FPROUNDING_NEGINF, false)
 -DO_VCVT(VCVTMS, FPROUNDING_NEGINF, true)
 +DO_VEC_RMODE(VCVTAU, FPROUNDING_TIEAWAY, vcvt_rm_u)
 +DO_VEC_RMODE(VCVTAS, FPROUNDING_TIEAWAY, vcvt_rm_s)
 +DO_VEC_RMODE(VCVTNU, FPROUNDING_TIEEVEN, vcvt_rm_u)
 +DO_VEC_RMODE(VCVTNS, FPROUNDING_TIEEVEN, vcvt_rm_s)
 +DO_VEC_RMODE(VCVTPU, FPROUNDING_POSINF, vcvt_rm_u)
 +DO_VEC_RMODE(VCVTPS, FPROUNDING_POSINF, vcvt_rm_s)
 +DO_VEC_RMODE(VCVTMU, FPROUNDING_NEGINF, vcvt_rm_u)
 +DO_VEC_RMODE(VCVTMS, FPROUNDING_NEGINF, vcvt_rm_s)
  static bool trans_VSWP(DisasContext *s, arg_2misc *a)
  {
 --
-.16.2
+.20.1

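In patch 38 above, DO_VCVT_RMODE performs four steps inside the helper. It saves the rounding mode held in float_status, installs the mode passed in simd_data(desc), converts every element with a shift of 0, and then restores the saved mode. This replaces the pair of gen_helper_set_neon_rmode calls that the removed do_vcvt emitted around its loop. The sketch below mirrors that save/install/restore pattern in plain C. The `rmode` enum and `fp_status` struct are hypothetical stand-ins for softfloat's rounding modes and float_status. Inputs are assumed to be finite with magnitude below 2^31, so NaN handling and saturation are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for softfloat rounding modes and float_status. */
enum rmode { RM_TIEEVEN, RM_TIEAWAY, RM_POSINF, RM_NEGINF, RM_ZERO };
struct fp_status { enum rmode rounding_mode; };

/* Round x to an integer using st->rounding_mode. Assumes |x| < 2^31. */
static int32_t round_to_s32(float x, const struct fp_status *st)
{
    int32_t t = (int32_t)x;          /* C cast truncates toward zero */
    float frac = x - (float)t;       /* exact; in (-1, 1), same sign as x */

    switch (st->rounding_mode) {
    case RM_ZERO:
        return t;
    case RM_POSINF:
        return frac > 0.0f ? t + 1 : t;
    case RM_NEGINF:
        return frac < 0.0f ? t - 1 : t;
    case RM_TIEAWAY:
        if (frac >= 0.5f) {
            return t + 1;
        }
        return frac <= -0.5f ? t - 1 : t;
    case RM_TIEEVEN:
    default:
        /* On an exact tie, move away from t only if t is odd. */
        if (frac > 0.5f || (frac == 0.5f && (t & 1))) {
            return t + 1;
        }
        if (frac < -0.5f || (frac == -0.5f && (t & 1))) {
            return t - 1;
        }
        return t;
    }
}

/* Mirrors DO_VCVT_RMODE: save mode, install requested mode, convert, restore. */
static void vcvt_rm_s_sketch(int32_t *d, const float *n, size_t elems,
                             enum rmode rmode, struct fp_status *st)
{
    enum rmode prev_rmode = st->rounding_mode;

    st->rounding_mode = rmode;
    for (size_t i = 0; i < elems; i++) {
        d[i] = round_to_s32(n[i], st);
    }
    st->rounding_mode = prev_rmode;
}
```

For example, converting {2.5f, -2.5f} with RM_TIEAWAY gives {3, -3}, while RM_TIEEVEN gives {2, -2}. Restoring `prev_rmode` presumably matters because the real helper receives the shared FPST_STD or FPST_STD_F16 status, which other Neon operations expect to find in its standard rounding mode.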
-[Qemu-devel] [PULL 13/39] hw/misc/unimp: Move struct to header file
+[PULL 39/47] target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode
-Move the definition of the struct for the unimplemented-device
+Convert the Neon VRINT-with-specified-rounding-mode insns to gvec,
-from unimp.c to unimp.h, so that users can embed the struct
+and use this to implement the fp16 versions.
 in their own device structs if they prefer.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-10-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-41-peter.maydell@linaro.org
 ---
- include/hw/misc/unimp.h | 10 ++++++++++
+ target/arm/helper.h             |  4 +-
- hw/misc/unimp.c         | 10 ----------
+ target/arm/vec_helper.c         | 21 +++++++++++
-files changed, 10 insertions(+), 10 deletions(-)
+ target/arm/vfp_helper.c         | 17 ---------
  target/arm/translate-neon.c.inc | 67 +++------------------------------
 files changed, 30 insertions(+), 79 deletions(-)
-diff --git a/include/hw/misc/unimp.h b/include/hw/misc/unimp.h
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/unimp.h
+--- a/target/arm/helper.h
-+++ b/include/hw/misc/unimp.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
+ DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
- #define TYPE_UNIMPLEMENTED_DEVICE "unimplemented-device"
+ DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
-+#define UNIMPLEMENTED_DEVICE(obj) \
+-DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
-+    OBJECT_CHECK(UnimplementedDeviceState, (obj), TYPE_UNIMPLEMENTED_DEVICE)
  DEF_HELPER_FLAGS_3(vfp_fcvt_f16_to_f32, TCG_CALL_NO_RWG, f32, f16, ptr, i32)
  DEF_HELPER_FLAGS_3(vfp_fcvt_f32_to_f16, TCG_CALL_NO_RWG, f16, f32, ptr, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(gvec_vcvt_rm_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(gvec_vcvt_rm_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(gvec_vrint_rm_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(gvec_vrint_rm_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +
-+typedef struct {
+ DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+    SysBusDevice parent_obj;
+ DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+    MemoryRegion iomem;
+ DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+    char *name;
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
-+    uint64_t size;
+index XXXXXXX..XXXXXXX 100644
-+} UnimplementedDeviceState;
+--- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VCVT_RMODE(gvec_vcvt_rm_sh, helper_vfp_toshh, uint16_t)
  DO_VCVT_RMODE(gvec_vcvt_rm_uh, helper_vfp_touhh, uint16_t)
  #undef DO_VCVT_RMODE
 +
- /**
++#define DO_VRINT_RMODE(NAME, FUNC, TYPE)                                \
-  * create_unimplemented_device: create and map a dummy device
++    void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)    \
-  * @name: name of the device for debug logging
++    {                                                                   \
-diff --git a/hw/misc/unimp.c b/hw/misc/unimp.c
++        float_status *fpst = stat;                                      \
 +        intptr_t i, oprsz = simd_oprsz(desc);                           \
 +        uint32_t rmode = simd_data(desc);                               \
 +        uint32_t prev_rmode = get_float_rounding_mode(fpst);            \
 +        TYPE *d = vd, *n = vn;                                          \
 +        set_float_rounding_mode(rmode, fpst);                           \
 +        for (i = 0; i < oprsz / sizeof(TYPE); i++) {                    \
 +            d[i] = FUNC(n[i], fpst);                                    \
 +        }                                                               \
 +        set_float_rounding_mode(prev_rmode, fpst);                      \
 +        clear_tail(d, oprsz, simd_maxsz(desc));                         \
 +    }
 +
 +DO_VRINT_RMODE(gvec_vrint_rm_h, helper_rinth, uint16_t)
 +DO_VRINT_RMODE(gvec_vrint_rm_s, helper_rints, uint32_t)
 +
 +#undef DO_VRINT_RMODE
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/unimp.c
+--- a/target/arm/vfp_helper.c
-+++ b/hw/misc/unimp.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(set_rmode)(uint32_t rmode, void *fpstp)
- #include "qemu/log.h"
+     return prev_rmode;
- #include "qapi/error.h"
+ }
--#define UNIMPLEMENTED_DEVICE(obj) \
+-/* Set the current fp rounding mode in the standard fp status and return
--    OBJECT_CHECK(UnimplementedDeviceState, (obj), TYPE_UNIMPLEMENTED_DEVICE)
+- * the old one. This is for NEON instructions that need to change the
 - * rounding mode but wish to use the standard FPSCR values for everything
 - * else. Always set the rounding mode back to the correct value after
 - * modifying it.
 - * The argument is a softfloat float_round_ value.
 - */
 -uint32_t HELPER(set_neon_rmode)(uint32_t rmode, CPUARMState *env)
 -{
 -    float_status *fp_status = &env->vfp.standard_fp_status;
 -
--typedef struct {
+-    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
--    SysBusDevice parent_obj;
+-    set_float_rounding_mode(rmode, fp_status);
 -    MemoryRegion iomem;
 -    char *name;
 -    uint64_t size;
 -} UnimplementedDeviceState;
 -
- static uint64_t unimp_read(void *opaque, hwaddr offset, unsigned size)
+-    return prev_rmode;
 -}
 -
  /* Half precision conversions.  */
  float32 HELPER(vfp_fcvt_f16_to_f32)(uint32_t a, void *fpstp, uint32_t ahp_mode)
  {
-     UnimplementedDeviceState *s = UNIMPLEMENTED_DEVICE(opaque);
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
      return do_2misc_fp(s, a, gen_helper_rints_exact);
  }
 -static bool do_vrint(DisasContext *s, arg_2misc *a, int rmode)
 -{
 -    /*
 -     * Handle a VRINT* operation by iterating 32 bits at a time,
 -     * with a specified rounding mode in operation.
 -     */
 -    int pass;
 -    TCGv_ptr fpst;
 -    TCGv_i32 tcg_rmode;
 -
 -    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
 -        !arm_dc_feature(s, ARM_FEATURE_V8)) {
 -        return false;
 -    }
 -
 -    /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_simd_r32, s) &&
 -        ((a->vd | a->vm) & 0x10)) {
 -        return false;
 -    }
 -
 -    if (a->size != 2) {
 -        /* TODO: FP16 will be the size == 1 case */
 -        return false;
 -    }
 -
 -    if ((a->vd | a->vm) & a->q) {
 -        return false;
 -    }
 -
 -    if (!vfp_access_check(s)) {
 -        return true;
 -    }
 -
 -    fpst = fpstatus_ptr(FPST_STD);
 -    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rmode));
 -    gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
 -    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 -        gen_helper_rints(tmp, tmp, fpst);
 -        neon_store_reg(a->vd, pass, tmp);
 -    }
 -    gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
 -    tcg_temp_free_i32(tcg_rmode);
 -    tcg_temp_free_ptr(fpst);
 -
 -    return true;
 -}
 -
 -#define DO_VRINT(INSN, RMODE)                                   \
 -    static bool trans_##INSN(DisasContext *s, arg_2misc *a)     \
 -    {                                                           \
 -        return do_vrint(s, a, RMODE);                           \
 -    }
 -
 -DO_VRINT(VRINTN, FPROUNDING_TIEEVEN)
 -DO_VRINT(VRINTA, FPROUNDING_TIEAWAY)
 -DO_VRINT(VRINTZ, FPROUNDING_ZERO)
 -DO_VRINT(VRINTM, FPROUNDING_NEGINF)
 -DO_VRINT(VRINTP, FPROUNDING_POSINF)
 -
  #define DO_VEC_RMODE(INSN, RMODE, OP)                                   \
      static void gen_##INSN(unsigned vece, uint32_t rd_ofs,              \
                             uint32_t rm_ofs,                             \
@@ -XXX,XX +XXX,XX @@ DO_VEC_RMODE(VCVTPS, FPROUNDING_POSINF, vcvt_rm_s)
  DO_VEC_RMODE(VCVTMU, FPROUNDING_NEGINF, vcvt_rm_u)
  DO_VEC_RMODE(VCVTMS, FPROUNDING_NEGINF, vcvt_rm_s)
 +DO_VEC_RMODE(VRINTN, FPROUNDING_TIEEVEN, vrint_rm_)
 +DO_VEC_RMODE(VRINTA, FPROUNDING_TIEAWAY, vrint_rm_)
 +DO_VEC_RMODE(VRINTZ, FPROUNDING_ZERO, vrint_rm_)
 +DO_VEC_RMODE(VRINTM, FPROUNDING_NEGINF, vrint_rm_)
 +DO_VEC_RMODE(VRINTP, FPROUNDING_POSINF, vrint_rm_)
 +
  static bool trans_VSWP(DisasContext *s, arg_2misc *a)
  {
      TCGv_i64 rm, rd;
 --
-.16.2
+.20.1
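The per-lane loop in DO_VRINT_RMODE saves, sets and then restores the rounding mode. It installs the mode carried in simd_data(desc), rounds every element, and puts the previous mode back so the shared float_status is unchanged for later instructions. The C sketch below illustrates only that pattern. The fp_status struct, round_mode enum, round_to_int() and vrint_rm() are hypothetical stand-ins, not QEMU's softfloat API. The sketch works on doubles rather than float16/float32 and omits clear_tail().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for softfloat rounding modes and float_status. */
typedef enum {
    RM_NEAREST_EVEN, RM_DOWN, RM_UP, RM_TO_ZERO, RM_TIES_AWAY
} round_mode;

typedef struct {
    round_mode rounding_mode;
} fp_status;

/*
 * Round x to an integral value using st->rounding_mode.
 * Assumes |x| fits in a long long and ignores the sign of zero.
 */
static double round_to_int(double x, const fp_status *st)
{
    long long t = (long long)x;          /* C casts truncate toward zero */
    double frac = x - (double)t;         /* same sign as x, |frac| < 1 */
    double mag = frac < 0 ? -frac : frac;
    long long away = frac < 0 ? t - 1 : t + 1;

    if (frac == 0.0) {
        return (double)t;
    }
    switch (st->rounding_mode) {
    case RM_TO_ZERO:
        return (double)t;
    case RM_DOWN:
        return (double)(frac < 0 ? away : t);
    case RM_UP:
        return (double)(frac > 0 ? away : t);
    case RM_TIES_AWAY:
        return (double)(mag >= 0.5 ? away : t);
    case RM_NEAREST_EVEN:
    default:
        if (mag != 0.5) {
            return (double)(mag > 0.5 ? away : t);
        }
        return (double)(t % 2 == 0 ? t : away);   /* exact tie: pick even */
    }
}

/*
 * Save / set / restore: install the requested mode, round every lane,
 * then put the caller's mode back so the shared status is unchanged.
 */
static void vrint_rm(double *d, const double *n, size_t lanes,
                     round_mode rmode, fp_status *st)
{
    round_mode prev = st->rounding_mode;

    st->rounding_mode = rmode;
    for (size_t i = 0; i < lanes; i++) {
        d[i] = round_to_int(n[i], st);
    }
    st->rounding_mode = prev;
}
```

Restoring the previous mode inside the helper, rather than emitting separate set_rmode calls around it in the translator, keeps the mode change local to one helper call. That is why the patch can drop set_neon_rmode.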

-[Qemu-devel] [PULL 15/39] qdev: Add new qdev_init_gpio_in_named_with_opaque()
+[PULL 40/47] target/arm: Implement fp16 for Neon VRINTX
-The function qdev_init_gpio_in_named() passes the DeviceState pointer
+Convert the Neon VRINTX insn to use gvec, and use this to implement
-as the opaque data pointer for the irq handler function.  Usually
+fp16 support for it.
 this is what you want, but in some cases it would be helpful to use
 some other data pointer.
 Add a new function qdev_init_gpio_in_named_with_opaque() which allows
 the caller to specify the data pointer they want.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-12-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-42-peter.maydell@linaro.org
 ---
- include/hw/qdev-core.h | 30 ++++++++++++++++++++++++++++--
+ target/arm/helper.h             |  3 +++
- hw/core/qdev.c         |  8 +++++---
+ target/arm/vec_helper.c         |  3 +++
-files changed, 33 insertions(+), 5 deletions(-)
+ target/arm/translate-neon.c.inc | 45 +++------------------------------
 files changed, 9 insertions(+), 42 deletions(-)
-diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/qdev-core.h
+--- a/target/arm/helper.h
-+++ b/include/hw/qdev-core.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ BusState *qdev_get_child_bus(DeviceState *dev, const char *name);
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_rm_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- /* GPIO inputs also double as IRQ sinks.  */
+ DEF_HELPER_FLAGS_4(gvec_vrint_rm_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- void qdev_init_gpio_in(DeviceState *dev, qemu_irq_handler handler, int n);
+ DEF_HELPER_FLAGS_4(gvec_vrint_rm_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
- void qdev_init_gpio_out(DeviceState *dev, qemu_irq *pins, int n);
--void qdev_init_gpio_in_named(DeviceState *dev, qemu_irq_handler handler,
++DEF_HELPER_FLAGS_4(gvec_vrintx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
--                             const char *name, int n);
++DEF_HELPER_FLAGS_4(gvec_vrintx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
  void qdev_init_gpio_out_named(DeviceState *dev, qemu_irq *pins,
                                const char *name, int n);
 +/**
 + * qdev_init_gpio_in_named_with_opaque: create an array of input GPIO lines
 + *   for the specified device
 + *
 + * @dev: Device to create input GPIOs for
 + * @handler: Function to call when GPIO line value is set
 + * @opaque: Opaque data pointer to pass to @handler
 + * @name: Name of the GPIO input (must be unique for this device)
 + * @n: Number of GPIO lines in this input set
 + */
 +void qdev_init_gpio_in_named_with_opaque(DeviceState *dev,
 +                                         qemu_irq_handler handler,
 +                                         void *opaque,
 +                                         const char *name, int n);
 +
-+/**
+ DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+ * qdev_init_gpio_in_named: create an array of input GPIO lines
+ DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+ *   for the specified device
+ DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+ *
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 + * Like qdev_init_gpio_in_named_with_opaque(), but the opaque pointer
 + * passed to the handler is @dev (which is the most commonly desired behaviour).
 + */
 +static inline void qdev_init_gpio_in_named(DeviceState *dev,
 +                                           qemu_irq_handler handler,
 +                                           const char *name, int n)
 +{
 +    qdev_init_gpio_in_named_with_opaque(dev, handler, dev, name, n);
 +}
  void qdev_pass_gpios(DeviceState *dev, DeviceState *container,
                       const char *name);
 diff --git a/hw/core/qdev.c b/hw/core/qdev.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/core/qdev.c
+--- a/target/arm/vec_helper.c
-+++ b/hw/core/qdev.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static NamedGPIOList *qdev_get_named_gpio_list(DeviceState *dev,
+@@ -XXX,XX +XXX,XX @@ DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
-     return ngl;
+ DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
  DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
 +DO_2OP(gvec_vrintx_h, float16_round_to_int, float16)
 +DO_2OP(gvec_vrintx_s, float32_round_to_int, float32)
 +
  DO_2OP(gvec_sitos, helper_vfp_sitos, int32_t)
  DO_2OP(gvec_uitos, helper_vfp_uitos, uint32_t)
  DO_2OP(gvec_tosizs, helper_vfp_tosizs, float32)
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VQNEG(DisasContext *s, arg_2misc *a)
      return do_2misc(s, a, fn[a->size]);
  }
--void qdev_init_gpio_in_named(DeviceState *dev, qemu_irq_handler handler,
+-static bool do_2misc_fp(DisasContext *s, arg_2misc *a,
--                             const char *name, int n)
+-                        NeonGenOneSingleOpFn *fn)
-+void qdev_init_gpio_in_named_with_opaque(DeviceState *dev,
+-{
-+                                         qemu_irq_handler handler,
+-    int pass;
-+                                         void *opaque,
+-    TCGv_ptr fpst;
-+                                         const char *name, int n)
+-
 -    /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
 -    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -        return false;
 -    }
 -
 -    /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_simd_r32, s) &&
 -        ((a->vd | a->vm) & 0x10)) {
 -        return false;
 -    }
 -
 -    if (a->size != 2) {
 -        /* TODO: FP16 will be the size == 1 case */
 -        return false;
 -    }
 -
 -    if ((a->vd | a->vm) & a->q) {
 -        return false;
 -    }
 -
 -    if (!vfp_access_check(s)) {
 -        return true;
 -    }
 -
 -    fpst = fpstatus_ptr(FPST_STD);
 -    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 -        fn(tmp, tmp, fpst);
 -        neon_store_reg(a->vd, pass, tmp);
 -    }
 -    tcg_temp_free_ptr(fpst);
 -
 -    return true;
 -}
 -
  #define DO_2MISC_FP_VEC(INSN, HFUNC, SFUNC)                             \
      static void gen_##INSN(unsigned vece, uint32_t rd_ofs,              \
                             uint32_t rm_ofs,                             \
@@ -XXX,XX +XXX,XX @@ DO_2MISC_FP_VEC(VCVT_FU, gen_helper_gvec_ustoh, gen_helper_gvec_uitos)
  DO_2MISC_FP_VEC(VCVT_SF, gen_helper_gvec_tosszh, gen_helper_gvec_tosizs)
  DO_2MISC_FP_VEC(VCVT_UF, gen_helper_gvec_touszh, gen_helper_gvec_touizs)
 +DO_2MISC_FP_VEC(VRINTX_impl, gen_helper_gvec_vrintx_h, gen_helper_gvec_vrintx_s)
 +
  static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
  {
-     int i;
+     if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
-     NamedGPIOList *gpio_list = qdev_get_named_gpio_list(dev, name);
+         return false;
+     }
-     assert(gpio_list->num_out == 0 || !name);
+-    return do_2misc_fp(s, a, gen_helper_rints_exact);
-     gpio_list->in = qemu_extend_irqs(gpio_list->in, gpio_list->num_in, handler,
++    return trans_VRINTX_impl(s, a);
--                                     dev, n);
+ }
-+                                     opaque, n);
+ #define DO_VEC_RMODE(INSN, RMODE, OP)                                   \
      if (!name) {
          name = "unnamed-gpio-in";
 --
-.16.2
+.20.1
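The helpers show two ways VRINTX differs from the fixed-mode VRINT variants. First, it rounds with whichever mode is already current. Second, the new gvec_vrintx_* helpers call float16_round_to_int()/float32_round_to_int() directly, so a lane whose value changes raises Inexact. The replaced do_2misc_fp() path used gen_helper_rints_exact for the same reason. The sketch below contrasts an Inexact-raising loop with a quiet loop that clears a newly raised flag afterwards. The quiet variant is an assumption about how the non-X helpers treat Inexact, not a copy of helper_rints(). The sketch_status struct, FLAG_INEXACT bit and all three functions are hypothetical. For brevity, the sketch treats round-to-nearest-even as the current mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_INEXACT 0x1u       /* hypothetical sticky Inexact bit */

typedef struct {
    unsigned flags;             /* sticky exception flags */
} sketch_status;

/*
 * Round to nearest, ties to even (taken as the "current" mode here) and
 * raise FLAG_INEXACT when the result differs from the input.
 * Assumes |x| fits in a long long.
 */
static double round_to_int_exact(double x, sketch_status *st)
{
    long long t = (long long)x;                  /* truncates toward zero */
    double frac = x - (double)t;
    double mag = frac < 0 ? -frac : frac;
    long long away = frac < 0 ? t - 1 : t + 1;
    long long r = (mag > 0.5 || (mag == 0.5 && t % 2 != 0)) ? away : t;

    if ((double)r != x) {
        st->flags |= FLAG_INEXACT;
    }
    return (double)r;
}

/* VRINTX-like: any Inexact raised by a lane stays set. */
static void vrintx_sketch(double *d, const double *n, size_t lanes,
                          sketch_status *st)
{
    for (size_t i = 0; i < lanes; i++) {
        d[i] = round_to_int_exact(n[i], st);
    }
}

/* Quiet variant: same rounding, but an Inexact flag that was not set
 * before the operation is cleared again afterwards. */
static void vrint_quiet_sketch(double *d, const double *n, size_t lanes,
                               sketch_status *st)
{
    bool had_inexact = (st->flags & FLAG_INEXACT) != 0;

    vrintx_sketch(d, n, lanes, st);
    if (!had_inexact) {
        st->flags &= ~FLAG_INEXACT;
    }
}
```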

-[Qemu-devel] [PULL 09/39] armv7m: Forward idau property to CPU object
+[PULL 41/47] target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations
-Create an "idau" property on the armv7m container object which
+In the gvec helper functions for indexed operations, for AArch32
-we can forward to the CPU object. Annoyingly, we can't use
+Neon the oprsz (total size of the vector) can be less than 16 bytes
-object_property_add_alias() because the CPU object we want to
+if the operation is on a D reg. Since the inner loop in these
-forward to doesn't exist until the armv7m container is realized.
+helpers always goes from 0 to segment, we must clamp it based
 on oprsz to avoid processing a full 16 byte segment when asked to
 handle an 8 byte wide vector.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-6-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-43-peter.maydell@linaro.org
 ---
- include/hw/arm/armv7m.h | 3 +++
+ target/arm/vec_helper.c | 12 ++++++++----
- hw/arm/armv7m.c         | 9 +++++++++
+file changed, 8 insertions(+), 4 deletions(-)
 files changed, 12 insertions(+)
-diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/armv7m.h
+--- a/target/arm/vec_helper.c
-+++ b/include/hw/arm/armv7m.h
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
+ #define DO_MUL_IDX(NAME, TYPE, H) \
- #include "hw/sysbus.h"
+ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
- #include "hw/intc/armv7m_nvic.h"
+ {                                                                          \
-+#include "target/arm/idau.h"
+-    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
++    intptr_t i, j, oprsz = simd_oprsz(desc);                               \
- #define TYPE_BITBAND "ARM,bitband-memory"
++    intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
- #define BITBAND(obj) OBJECT_CHECK(BitBandState, (obj), TYPE_BITBAND)
+     intptr_t idx = simd_data(desc);                                        \
-@@ -XXX,XX +XXX,XX @@ typedef struct {
+     TYPE *d = vd, *n = vn, *m = vm;                                        \
-  * + Property "memory": MemoryRegion defining the physical address space
+     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
-  *   that CPU accesses see. (The NVIC, bitbanding and other CPU-internal
+@@ -XXX,XX +XXX,XX @@ DO_MUL_IDX(gvec_mul_idx_d, uint64_t, )
-  *   devices will be automatically layered on top of this view.)
+ #define DO_MLA_IDX(NAME, TYPE, OP, H) \
-+ * + Property "idau": IDAU interface (forwarded to CPU object)
+ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)   \
-  */
+ {                                                                          \
- typedef struct ARMv7MState {
+-    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
-     /*< private >*/
++    intptr_t i, j, oprsz = simd_oprsz(desc);                               \
-@@ -XXX,XX +XXX,XX @@ typedef struct ARMv7MState {
++    intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
-     char *cpu_type;
+     intptr_t idx = simd_data(desc);                                        \
-     /* MemoryRegion the board provides to us (with its devices, RAM, etc) */
+     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
-     MemoryRegion *board_memory;
+     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
-+    Object *idau;
+@@ -XXX,XX +XXX,XX @@ DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -,   )
- } ARMv7MState;
+ #define DO_FMUL_IDX(NAME, TYPE, H) \
+ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
- #endif
+ {                                                                          \
-diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
+-    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
-index XXXXXXX..XXXXXXX 100644
++    intptr_t i, j, oprsz = simd_oprsz(desc);                               \
---- a/hw/arm/armv7m.c
++    intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
-+++ b/hw/arm/armv7m.c
+     intptr_t idx = simd_data(desc);                                        \
-@@ -XXX,XX +XXX,XX @@
+     TYPE *d = vd, *n = vn, *m = vm;                                        \
- #include "sysemu/qtest.h"
+     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
- #include "qemu/error-report.h"
+@@ -XXX,XX +XXX,XX @@ DO_FMUL_IDX(gvec_fmul_idx_d, float64, )
- #include "exec/address-spaces.h"
+ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
-+#include "target/arm/idau.h"
+                   void *stat, uint32_t desc)                               \
+ {                                                                          \
- /* Bitbanded IO.  Each word corresponds to a single bit.  */
+-    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
++    intptr_t i, j, oprsz = simd_oprsz(desc);                               \
-@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
++    intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
+     TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
-     object_property_set_link(OBJECT(s->cpu), OBJECT(&s->container), "memory",
+     intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
-                              &error_abort);
+     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
 +    if (object_property_find(OBJECT(s->cpu), "idau", NULL)) {
 +        object_property_set_link(OBJECT(s->cpu), s->idau, "idau", &err);
 +        if (err != NULL) {
 +            error_propagate(errp, err);
 +            return;
 +        }
 +    }
      object_property_set_bool(OBJECT(s->cpu), true, "realized", &err);
      if (err != NULL) {
          error_propagate(errp, err);
@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
      DEFINE_PROP_STRING("cpu-type", ARMv7MState, cpu_type),
      DEFINE_PROP_LINK("memory", ARMv7MState, board_memory, TYPE_MEMORY_REGION,
                       MemoryRegion *),
 +    DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
      DEFINE_PROP_END_OF_LIST(),
  };
 --
-.16.2
+.20.1
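A small standalone model shows the effect of the clamp. With oprsz = 8 and 4-byte elements, the old expression 16 / sizeof(TYPE) gives a segment of 4 lanes, so the single outer iteration would read and write lanes 2 and 3, past the end of the D register. MIN(16, oprsz) / sizeof(TYPE) gives 2 instead. The sketch below is modelled on DO_MUL_IDX with uint32_t lanes. It omits the H() host-endian fixup and clear_tail(), and mul_idx_u32() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/*
 * Indexed multiply over a vector of oprsz bytes, processed in segments
 * of at most 16 bytes: every lane of n in a segment is multiplied by
 * element idx of the same segment of m.  Clamping the segment to oprsz
 * keeps an 8-byte (D register) operation inside its 2 lanes.
 */
static void mul_idx_u32(uint32_t *d, const uint32_t *n, const uint32_t *m,
                        size_t oprsz, size_t idx)
{
    size_t elems = oprsz / sizeof(uint32_t);
    size_t segment = MIN((size_t)16, oprsz) / sizeof(uint32_t);

    assert(idx < segment);
    for (size_t i = 0; i < elems; i += segment) {
        uint32_t mm = m[i + idx];
        for (size_t j = 0; j < segment; j++) {
            d[i + j] = n[i + j] * mm;
        }
    }
}
```

For Q registers (oprsz = 16) and for AArch64 SVE-sized vectors that are multiples of 16 bytes, the clamp leaves the segment at 16 bytes, so only the short D-register case changes.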

-[Qemu-devel] [PULL 34/39] target/arm: Decode aa64 armv8.3 fcadd
+[PULL 42/47] target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations
-From: Richard Henderson <richard.henderson@linaro.org>
+Add gvec helpers for doing Neon-style indexed non-fused fp
 multiply-and-accumulate operations.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180228193125.20577-12-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200828183354.27913-44-peter.maydell@linaro.org
 ---
- target/arm/helper.h        |  7 ++++
+ target/arm/helper.h     | 10 ++++++++++
- target/arm/translate-a64.c | 48 ++++++++++++++++++++++-
+ target/arm/vec_helper.c | 27 ++++++++++++++++++++++-----
- target/arm/vec_helper.c    | 97 ++++++++++++++++++++++++++++++++++++++++++++++
+files changed, 32 insertions(+), 5 deletions(-)
 files changed, 151 insertions(+), 1 deletion(-)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_qrdmlah_s32, TCG_CALL_NO_RWG,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmul_idx_s, TCG_CALL_NO_RWG,
- DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s32, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_fmul_idx_d, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
-+DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG,
++DEF_HELPER_FLAGS_5(gvec_fmla_nf_idx_h, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
-+DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
++DEF_HELPER_FLAGS_5(gvec_fmla_nf_idx_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_fcaddd, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
 +
- #ifdef TARGET_AARCH64
++DEF_HELPER_FLAGS_5(gvec_fmls_nf_idx_h, TCG_CALL_NO_RWG,
- #include "helper-a64.h"
++                   void, ptr, ptr, ptr, ptr, i32)
- #endif
++DEF_HELPER_FLAGS_5(gvec_fmls_nf_idx_s, TCG_CALL_NO_RWG,
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
++                   void, ptr, ptr, ptr, ptr, i32)
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op3_env(DisasContext *s, bool is_q, int rd,
                         is_q ? 16 : 8, vec_full_reg_size(s), 0, fn);
  }
 +/* Expand a 3-operand + fpstatus pointer + simd data value operation using
 + * an out-of-line helper.
 + */
 +static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
 +                              int rm, bool is_fp16, int data,
 +                              gen_helper_gvec_3_ptr *fn)
 +{
 +    TCGv_ptr fpst = get_fpstatus_ptr(is_fp16);
 +    tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
 +                       vec_full_reg_offset(s, rn),
 +                       vec_full_reg_offset(s, rm), fpst,
 +                       is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
 +    tcg_temp_free_ptr(fpst);
 +}
 +
- /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
+ DEF_HELPER_FLAGS_6(gvec_fmla_idx_h, TCG_CALL_NO_RWG,
-  * than the 32 bit equivalent.
+                    void, ptr, ptr, ptr, ptr, ptr, i32)
-  */
+ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
      int size = extract32(insn, 22, 2);
      bool u = extract32(insn, 29, 1);
      bool is_q = extract32(insn, 30, 1);
 -    int feature;
 +    int feature, rot;
      switch (u * 16 + opcode) {
      case 0x10: /* SQRDMLAH (vector) */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          feature = ARM_FEATURE_V8_RDM;
          break;
 +    case 0xc: /* FCADD, #90 */
 +    case 0xe: /* FCADD, #270 */
 +        if (size == 0
 +            || (size == 1 && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))
 +            || (size == 3 && !is_q)) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        feature = ARM_FEATURE_V8_FCMA;
 +        break;
      default:
          unallocated_encoding(s);
          return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          return;
 +    case 0xc: /* FCADD, #90 */
 +    case 0xe: /* FCADD, #270 */
 +        rot = extract32(opcode, 1, 1);
 +        switch (size) {
 +        case 1:
 +            gen_gvec_op3_fpst(s, is_q, rd, rn, rm, size == 1, rot,
 +                              gen_helper_gvec_fcaddh);
 +            break;
 +        case 2:
 +            gen_gvec_op3_fpst(s, is_q, rd, rn, rm, size == 1, rot,
 +                              gen_helper_gvec_fcadds);
 +            break;
 +        case 3:
 +            gen_gvec_op3_fpst(s, is_q, rd, rn, rm, size == 1, rot,
 +                              gen_helper_gvec_fcaddd);
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +        return;
 +
      default:
          g_assert_not_reached();
      }
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -,   )
- #include "exec/exec-all.h"
- #include "exec/helper-proto.h"
+ #undef DO_MLA_IDX
- #include "tcg/tcg-gvec-desc.h"
-+#include "fpu/softfloat.h"
+-#define DO_FMUL_IDX(NAME, TYPE, H) \
++#define DO_FMUL_IDX(NAME, ADD, TYPE, H)                                    \
+ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
-+/* Note that vector data is stored in host-endian 64-bit chunks,
+ {                                                                          \
-+   so addressing units smaller than that needs a host-endian fixup.  */
+     intptr_t i, j, oprsz = simd_oprsz(desc);                               \
-+#ifdef HOST_WORDS_BIGENDIAN
+@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
-+#define H1(x)  ((x) ^ 7)
+     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
-+#define H2(x)  ((x) ^ 3)
+         TYPE mm = m[H(i + idx)];                                           \
-+#define H4(x)  ((x) ^ 1)
+         for (j = 0; j < segment; j++) {                                    \
-+#else
+-            d[i + j] = TYPE##_mul(n[i + j], mm, stat);                     \
-+#define H1(x)  (x)
++            d[i + j] = TYPE##_##ADD(d[i + j],                              \
-+#define H2(x)  (x)
++                                    TYPE##_mul(n[i + j], mm, stat), stat); \
-+#define H4(x)  (x)
+         }                                                                  \
-+#endif
+     }                                                                      \
      clear_tail(d, oprsz, simd_maxsz(desc));                                \
  }
 -DO_FMUL_IDX(gvec_fmul_idx_h, float16, H2)
 -DO_FMUL_IDX(gvec_fmul_idx_s, float32, H4)
 -DO_FMUL_IDX(gvec_fmul_idx_d, float64, )
 +#define float16_nop(N, M, S) (M)
 +#define float32_nop(N, M, S) (M)
 +#define float64_nop(N, M, S) (M)
 +DO_FMUL_IDX(gvec_fmul_idx_h, nop, float16, H2)
 +DO_FMUL_IDX(gvec_fmul_idx_s, nop, float32, H4)
 +DO_FMUL_IDX(gvec_fmul_idx_d, nop, float64, )
 +
- #define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
++/*
++ * Non-fused multiply-accumulate operations, for Neon. NB that unlike
- static void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
++ * the fused ops below they accumulate both from and into Vd.
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
++ */
-     }
++DO_FMUL_IDX(gvec_fmla_nf_idx_h, add, float16, H2)
-     clear_tail(d, opr_sz, simd_maxsz(desc));
++DO_FMUL_IDX(gvec_fmla_nf_idx_s, add, float32, H4)
- }
++DO_FMUL_IDX(gvec_fmls_nf_idx_h, sub, float16, H2)
 +DO_FMUL_IDX(gvec_fmls_nf_idx_s, sub, float32, H4)
 +
-+void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
++#undef float16_nop
-+                         void *vfpst, uint32_t desc)
++#undef float32_nop
-+{
++#undef float64_nop
-+    uintptr_t opr_sz = simd_oprsz(desc);
+ #undef DO_FMUL_IDX
-+    float16 *d = vd;
-+    float16 *n = vn;
+ #define DO_FMLA_IDX(NAME, TYPE, H)                                         \
 +    float16 *m = vm;
 +    float_status *fpst = vfpst;
 +    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t neg_imag = neg_real ^ 1;
 +    uintptr_t i;
 +
 +    /* Shift boolean to the sign bit so we can xor to negate.  */
 +    neg_real <<= 15;
 +    neg_imag <<= 15;
 +
 +    for (i = 0; i < opr_sz / 2; i += 2) {
 +        float16 e0 = n[H2(i)];
 +        float16 e1 = m[H2(i + 1)] ^ neg_imag;
 +        float16 e2 = n[H2(i + 1)];
 +        float16 e3 = m[H2(i)] ^ neg_real;
 +
 +        d[H2(i)] = float16_add(e0, e1, fpst);
 +        d[H2(i + 1)] = float16_add(e2, e3, fpst);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
 +void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
 +                         void *vfpst, uint32_t desc)
 +{
 +    uintptr_t opr_sz = simd_oprsz(desc);
 +    float32 *d = vd;
 +    float32 *n = vn;
 +    float32 *m = vm;
 +    float_status *fpst = vfpst;
 +    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t neg_imag = neg_real ^ 1;
 +    uintptr_t i;
 +
 +    /* Shift boolean to the sign bit so we can xor to negate.  */
 +    neg_real <<= 31;
 +    neg_imag <<= 31;
 +
 +    for (i = 0; i < opr_sz / 4; i += 2) {
 +        float32 e0 = n[H4(i)];
 +        float32 e1 = m[H4(i + 1)] ^ neg_imag;
 +        float32 e2 = n[H4(i + 1)];
 +        float32 e3 = m[H4(i)] ^ neg_real;
 +
 +        d[H4(i)] = float32_add(e0, e1, fpst);
 +        d[H4(i + 1)] = float32_add(e2, e3, fpst);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
 +void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
 +                         void *vfpst, uint32_t desc)
 +{
 +    uintptr_t opr_sz = simd_oprsz(desc);
 +    float64 *d = vd;
 +    float64 *n = vn;
 +    float64 *m = vm;
 +    float_status *fpst = vfpst;
 +    uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t neg_imag = neg_real ^ 1;
 +    uintptr_t i;
 +
 +    /* Shift boolean to the sign bit so we can xor to negate.  */
 +    neg_real <<= 63;
 +    neg_imag <<= 63;
 +
 +    for (i = 0; i < opr_sz / 8; i += 2) {
 +        float64 e0 = n[i];
 +        float64 e1 = m[i + 1] ^ neg_imag;
 +        float64 e2 = n[i + 1];
 +        float64 e3 = m[i] ^ neg_real;
 +
 +        d[i] = float64_add(e0, e1, fpst);
 +        d[i + 1] = float64_add(e2, e3, fpst);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
-.16.2
+.20.1

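The `gvec_fcadd*` helpers above implement FCADD's 90/270-degree rotation without branching. They shift the descriptor bit into the sign-bit position and XOR it into one input lane of each complex pair, which flips that operand's sign. Below is a minimal sketch of the same idea on host `float` values. It uses plain C addition rather than QEMU's softfloat `float32_add()`, so rounding modes and exception flags are not modelled. The names `xor_sign` and `fcadd_sketch` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flip the IEEE single-precision sign bit when neg (0 or 1) is set. */
static float xor_sign(float x, uint32_t neg)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof(bits));
    bits ^= neg << 31;          /* same trick as "neg_real <<= 31" */
    memcpy(&x, &bits, sizeof(bits));
    return x;
}

/*
 * Hypothetical host-float model of gvec_fcadds: elements are stored as
 * interleaved (real, imag) pairs, and neg_real mirrors the descriptor bit.
 *   neg_real == 0: d = n + i*m   (m rotated by 90 degrees)
 *   neg_real == 1: d = n - i*m   (m rotated by 270 degrees)
 */
static void fcadd_sketch(float *d, const float *n, const float *m,
                         size_t elements, uint32_t neg_real)
{
    uint32_t neg_imag = neg_real ^ 1;

    assert(elements % 2 == 0);  /* whole complex pairs only */
    for (size_t i = 0; i < elements; i += 2) {
        d[i]     = n[i]     + xor_sign(m[i + 1], neg_imag);
        d[i + 1] = n[i + 1] + xor_sign(m[i], neg_real);
    }
}
```

Because the negation is a single XOR on the sign bit, both rotations share one loop body. Only the constant mask differs.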
-[Qemu-devel] [PULL 08/39] target/arm: Define an IDAU interface
+[PULL 43/47] target/arm: Implement fp16 for Neon VMUL, VMLA, VMLS
-In v8M, the Implementation Defined Attribution Unit (IDAU) is
+Convert the Neon floating-point VMUL, VMLA and VMLS to use gvec,
-a small piece of hardware typically implemented in the SoC
+and use this to implement fp16 support.
 which provides board or SoC specific security attribution
 information for each address that the CPU performs MPU/SAU
 checks on. For QEMU, we model this with a QOM interface which
 is implemented by the board or SoC object and connected to
 the CPU using a link property.
 This commit defines the new interface class, adds the link
 property to the CPU object, and makes the SAU checking
 code call the IDAU interface if one is present.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180220180325.29818-5-peter.maydell@linaro.org
+Message-id: 20200828183354.27913-45-peter.maydell@linaro.org
 ---
- target/arm/cpu.h    |  3 +++
+ target/arm/translate-neon.c.inc | 114 ++++++++++++++++----------------
- target/arm/idau.h   | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 57 insertions(+), 57 deletions(-)
  target/arm/cpu.c    | 15 +++++++++++++
  target/arm/helper.c | 28 +++++++++++++++++++++---
  4 files changed, 104 insertions(+), 3 deletions(-)
  create mode 100644 target/arm/idau.h
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/translate-neon.c.inc
-+++ b/target/arm/cpu.h
++++ b/target/arm/translate-neon.c.inc
-@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
-     /* MemoryRegion to use for secure physical accesses */
+     return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
-     MemoryRegion *secure_memory;
+ }
-+    /* For v8M, pointer to the IDAU interface provided by board/SoC */
+-/*
-+    Object *idau;
+- * Rather than have a float-specific version of do_2scalar just for
 - * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
 - * a NeonGenTwoOpFn.
 - */
 -#define WRAP_FP_FN(WRAPNAME, FUNC)                              \
 -    static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
 -    {                                                           \
 -        TCGv_ptr fpstatus = fpstatus_ptr(FPST_STD);             \
 -        FUNC(rd, rn, rm, fpstatus);                             \
 -        tcg_temp_free_ptr(fpstatus);                            \
 +static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
 +                              gen_helper_gvec_3_ptr *fn)
 +{
 +    /* Two registers and a scalar, using gvec */
 +    int vec_size = a->q ? 16 : 8;
 +    int rd_ofs = neon_reg_offset(a->vd, 0);
 +    int rn_ofs = neon_reg_offset(a->vn, 0);
 +    int rm_ofs;
 +    int idx;
 +    TCGv_ptr fpstatus;
 +
-     /* 'compatible' string for this CPU for Linux device trees */
++    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-     const char *dtb_compatible;
++        return false;
 diff --git a/target/arm/idau.h b/target/arm/idau.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/idau.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * QEMU ARM CPU -- interface for the Arm v8M IDAU
 + *
 + * Copyright (c) 2018 Linaro Ltd
 + *
 + * This program is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU General Public License
 + * as published by the Free Software Foundation; either version 2
 + * of the License, or (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, see
 + * <http://www.gnu.org/licenses/gpl-2.0.html>
 + *
 + * In the v8M architecture, the IDAU is a small piece of hardware
 + * typically implemented in the SoC which provides board or SoC
 + * specific security attribution information for each address that
 + * the CPU performs MPU/SAU checks on. For QEMU, we model this with a
 + * QOM interface which is implemented by the board or SoC object and
 + * connected to the CPU using a link property.
 + */
 +
 +#ifndef TARGET_ARM_IDAU_H
 +#define TARGET_ARM_IDAU_H
 +
 +#include "qom/object.h"
 +
 +#define TYPE_IDAU_INTERFACE "idau-interface"
 +#define IDAU_INTERFACE(obj) \
 +    INTERFACE_CHECK(IDAUInterface, (obj), TYPE_IDAU_INTERFACE)
 +#define IDAU_INTERFACE_CLASS(class) \
 +    OBJECT_CLASS_CHECK(IDAUInterfaceClass, (class), TYPE_IDAU_INTERFACE)
 +#define IDAU_INTERFACE_GET_CLASS(obj) \
 +    OBJECT_GET_CLASS(IDAUInterfaceClass, (obj), TYPE_IDAU_INTERFACE)
 +
 +typedef struct IDAUInterface {
 +    Object parent;
 +} IDAUInterface;
 +
 +#define IREGION_NOTVALID -1
 +
 +typedef struct IDAUInterfaceClass {
 +    InterfaceClass parent;
 +
 +    /* Check the specified address and return the IDAU security information
 +     * for it by filling in iregion, exempt, ns and nsc:
 +     *  iregion: IDAU region number, or IREGION_NOTVALID if not valid
 +     *  exempt: true if address is exempt from security attribution
 +     *  ns: true if the address is NonSecure
 +     *  nsc: true if the address is NonSecure-callable
 +     */
 +    void (*check)(IDAUInterface *ii, uint32_t address, int *iregion,
 +                  bool *exempt, bool *ns, bool *nsc);
 +} IDAUInterfaceClass;
 +
 +#endif
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@
   */
  #include "qemu/osdep.h"
 +#include "target/arm/idau.h"
  #include "qemu/error-report.h"
  #include "qapi/error.h"
  #include "cpu.h"
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_post_init(Object *obj)
          }
      }
-+    if (arm_feature(&cpu->env, ARM_FEATURE_M_SECURITY)) {
+-WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
-+        object_property_add_link(obj, "idau", TYPE_IDAU_INTERFACE, &cpu->idau,
+-WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
-+                                 qdev_prop_allow_set_link_before_realize,
+-WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
-+                                 OBJ_PROP_LINK_UNREF_ON_RELEASE,
++    /* UNDEF accesses to D16-D31 if they don't exist. */
-+                                 &error_abort);
++    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 -static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
 -{
 -    static NeonGenTwoOpFn * const opfn[] = {
 -        NULL,
 -        NULL, /* TODO: fp16 support */
 -        gen_VMUL_F_mul,
 -        NULL,
 -    };
 +    if (!fn) {
 +        /* Bad size (including size == 3, which is a different insn group) */
 +        return false;
 +    }
 -    return do_2scalar(s, a, opfn[a->size], NULL);
 +    if (a->q && ((a->vd | a->vn) & 1)) {
 +        return false;
 +    }
 +
-     qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property,
++    if (!vfp_access_check(s)) {
-                              &error_abort);
++        return true;
  }
@@ -XXX,XX +XXX,XX @@ static const TypeInfo arm_cpu_type_info = {
      .class_init = arm_cpu_class_init,
  };
 +static const TypeInfo idau_interface_type_info = {
 +    .name = TYPE_IDAU_INTERFACE,
 +    .parent = TYPE_INTERFACE,
 +    .class_size = sizeof(IDAUInterfaceClass),
 +};
 +
  static void arm_cpu_register_types(void)
  {
      const ARMCPUInfo *info = arm_cpus;
      type_register_static(&arm_cpu_type_info);
 +    type_register_static(&idau_interface_type_info);
      while (info->name) {
          cpu_register(info);
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
 +#include "target/arm/idau.h"
  #include "trace.h"
  #include "cpu.h"
  #include "internals.h"
@@ -XXX,XX +XXX,XX @@ static void v8m_security_lookup(CPUARMState *env, uint32_t address,
       */
      ARMCPU *cpu = arm_env_get_cpu(env);
      int r;
 +    bool idau_exempt = false, idau_ns = true, idau_nsc = true;
 +    int idau_region = IREGION_NOTVALID;
 -    /* TODO: implement IDAU */
 +    if (cpu->idau) {
 +        IDAUInterfaceClass *iic = IDAU_INTERFACE_GET_CLASS(cpu->idau);
 +        IDAUInterface *ii = IDAU_INTERFACE(cpu->idau);
 +
 +        iic->check(ii, address, &idau_region, &idau_exempt, &idau_ns,
 +                   &idau_nsc);
 +    }
      if (access_type == MMU_INST_FETCH && extract32(address, 28, 4) == 0xf) {
          /* 0xf0000000..0xffffffff is always S for insn fetches */
          return;
      }
 -    if (v8m_is_sau_exempt(env, address, access_type)) {
 +    if (idau_exempt || v8m_is_sau_exempt(env, address, access_type)) {
          sattrs->ns = !regime_is_secure(env, mmu_idx);
          return;
      }
 +    if (idau_region != IREGION_NOTVALID) {
 +        sattrs->irvalid = true;
 +        sattrs->iregion = idau_region;
 +    }
 +
-     switch (env->sau.ctrl & 3) {
++    /* a->vm is M:Vm, which encodes both register and index */
-     case 0: /* SAU.ENABLE == 0, SAU.ALLNS == 0 */
++    idx = extract32(a->vm, a->size + 2, 2);
-         break;
++    a->vm = extract32(a->vm, 0, a->size + 2);
-@@ -XXX,XX +XXX,XX @@ static void v8m_security_lookup(CPUARMState *env, uint32_t address,
++    rm_ofs = neon_reg_offset(a->vm, 0);
-             }
++
-         }
++    fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
++    tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
--        /* TODO when we support the IDAU then it may override the result here */
++                       vec_size, vec_size, idx, fn);
-+        /* The IDAU will override the SAU lookup results if it specifies
++    tcg_temp_free_ptr(fpstatus);
-+         * higher security than the SAU does.
++    return true;
 +         */
 +        if (!idau_ns) {
 +            if (sattrs->ns || (!idau_nsc && sattrs->nsc)) {
 +                sattrs->ns = false;
 +                sattrs->nsc = idau_nsc;
 +            }
 +        }
          break;
      }
  }
+-static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
+-{
+-    static NeonGenTwoOpFn * const opfn[] = {
+-        NULL,
+-        NULL, /* TODO: fp16 support */
+-        gen_VMUL_F_mul,
+-        NULL,
+-    };
+-    static NeonGenTwoOpFn * const accfn[] = {
+-        NULL,
+-        NULL, /* TODO: fp16 support */
+-        gen_VMUL_F_add,
+-        NULL,
+-    };
++#define DO_VMUL_F_2sc(NAME, FUNC)                                       \
++    static bool trans_##NAME##_F_2sc(DisasContext *s, arg_2scalar *a)   \
++    {                                                                   \
++        static gen_helper_gvec_3_ptr * const opfn[] = {                 \
++            NULL,                                                       \
++            gen_helper_##FUNC##_h,                                      \
++            gen_helper_##FUNC##_s,                                      \
++            NULL,                                                       \
++        };                                                              \
++        if (a->size == MO_16 && !dc_isar_feature(aa32_fp16_arith, s)) { \
++            return false;                                               \
++        }                                                               \
++        return do_2scalar_fp_vec(s, a, opfn[a->size]);                  \
++    }
+-    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+-}
+-
+-static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
+-{
+-    static NeonGenTwoOpFn * const opfn[] = {
+-        NULL,
+-        NULL, /* TODO: fp16 support */
+-        gen_VMUL_F_mul,
+-        NULL,
+-    };
+-    static NeonGenTwoOpFn * const accfn[] = {
+-        NULL,
+-        NULL, /* TODO: fp16 support */
+-        gen_VMUL_F_sub,
+-        NULL,
+-    };
+-
+-    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+-}
++DO_VMUL_F_2sc(VMUL, gvec_fmul_idx)
++DO_VMUL_F_2sc(VMLA, gvec_fmla_nf_idx)
++DO_VMUL_F_2sc(VMLS, gvec_fmls_nf_idx)
+ WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
+ WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
 --
-.16.2
+.20.1
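The IDAU patch above merges the IDAU's answer into the SAU lookup. It follows the rule stated in the helper.c comment: the IDAU overrides the SAU only if it specifies higher security (NonSecure, then NonSecure-callable, then Secure). The following is a minimal standalone sketch of just that merge step. It assumes a cut-down attribute struct that keeps only the two flags the merge touches. `SecAttrs` and `merge_idau` are hypothetical names and are not QEMU's own security-attributes type.

```c
#include <stdbool.h>

/* Hypothetical cut-down version of the per-address security attributes. */
typedef struct {
    bool ns;    /* address is NonSecure */
    bool nsc;   /* address is Secure and NonSecure-callable */
} SecAttrs;

/*
 * Fold an IDAU answer into the result of the SAU lookup.  The IDAU can
 * only raise the security level (NS -> NSC -> S), never lower it.
 */
static void merge_idau(SecAttrs *sattrs, bool idau_ns, bool idau_nsc)
{
    if (!idau_ns) {
        /* IDAU says Secure: override if the SAU result was less secure. */
        if (sattrs->ns || (!idau_nsc && sattrs->nsc)) {
            sattrs->ns = false;
            sattrs->nsc = idau_nsc;
        }
    }
}
```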

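In `do_2scalar_fp_vec` the 5-bit `a->vm` field is M:Vm, and how it splits depends on the element size. For fp16 (size 1), only D0-D7 can hold the scalar, and the top two bits select one of four elements. For fp32 (size 2), D0-D15 are usable, and M selects one of two elements. Below is a standalone sketch of that split. It includes a local re-implementation of QEMU's `extract32()`, assumed to return `length` bits starting at bit `start`. `decode_scalar` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for QEMU's extract32(): 'length' bits from bit 'start'. */
static uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/*
 * Split the M:Vm field of a Neon "two registers and a scalar" insn.
 * size is log2 of the element size in bytes: 1 for fp16, 2 for fp32.
 */
static void decode_scalar(uint32_t vm, int size, int *reg, int *idx)
{
    *idx = extract32(vm, size + 2, 2);   /* element index */
    *reg = extract32(vm, 0, size + 2);   /* D register number */
}
```

For fp32, the field is only 5 bits wide, so the second index bit read by `extract32(vm, 4, 2)` is always zero, and the index is just M.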
-[Qemu-devel] [PULL 32/39] target/arm: Enable ARM_FEATURE_V8_RDM
+[PULL 44/47] target/arm: Enable FP16 in '-cpu max'
-From: Richard Henderson <richard.henderson@linaro.org>
+Set the MVFR1 ID register FPHP and SIMDHP fields to indicate
 that our "-cpu max" has v8.2-FP16.
-Enable it for the "any" CPU used by *-linux-user.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20180228193125.20577-10-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200828183354.27913-46-peter.maydell@linaro.org
 ---
- target/arm/cpu.c   | 1 +
+ target/arm/cpu.c   |  3 ++-
- target/arm/cpu64.c | 1 +
+ target/arm/cpu64.c | 10 ++++------
- 2 files changed, 2 insertions(+)
+ 2 files changed, 6 insertions(+), 7 deletions(-)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void arm_any_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
-     set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
+             cpu->isar.id_isar6 = t;
-     set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
-     set_feature(&cpu->env, ARM_FEATURE_CRC);
+             t = cpu->isar.mvfr1;
-+    set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+-            t = FIELD_DP32(t, MVFR1, FPHP, 2);     /* v8.0 FP support */
-     cpu->midr = 0xffffffff;
++            t = FIELD_DP32(t, MVFR1, FPHP, 3);     /* v8.2-FP16 */
- }
++            t = FIELD_DP32(t, MVFR1, SIMDHP, 2);   /* v8.2-FP16 */
- #endif
+             cpu->isar.mvfr1 = t;
              t = cpu->isar.mvfr2;
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static void aarch64_any_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-     set_feature(&cpu->env, ARM_FEATURE_V8_SM4);
+         u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
-     set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
+         cpu->isar.id_dfr0 = u;
-     set_feature(&cpu->env, ARM_FEATURE_CRC);
-+    set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+-        /*
-     set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
+-         * FIXME: We do not yet support ARMv8.2-fp16 for AArch32 yet,
-     cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
+-         * so do not set MVFR1.FPHP.  Strictly speaking this is not legal,
-     cpu->dcz_blocksize = 7; /*  512 bytes */
+-         * but it is also not legal to enable SVE without support for FP16,
 -         * and enabling SVE in system mode is more useful in the short term.
 -         */
 +        u = cpu->isar.mvfr1;
 +        u = FIELD_DP32(u, MVFR1, FPHP, 3);      /* v8.2-FP16 */
 +        u = FIELD_DP32(u, MVFR1, SIMDHP, 2);    /* v8.2-FP16 */
 +        cpu->isar.mvfr1 = u;
  #ifdef CONFIG_USER_ONLY
          /* For usermode -cpu max we can use a larger and more efficient DCZ
 --
-.16.2
+.20.1

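The `-cpu max` change advertises v8.2-FP16 to AArch32 code through two 4-bit fields of MVFR1: FPHP (bits [27:24], set to 3) and SIMDHP (bits [23:20], set to 2). Below is a minimal sketch of what the two `FIELD_DP32()` calls compute. It uses a local stand-in for QEMU's `deposit32()`. The field positions come from the Arm MVFR1 layout and should be checked against the `FIELD(MVFR1, ...)` definitions in `target/arm/cpu.h`. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for QEMU's deposit32(): replace 'length' bits at 'start'. */
static uint32_t deposit32(uint32_t value, int start, int length,
                          uint32_t fieldval)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

#define MVFR1_SIMDHP_SHIFT 20   /* MVFR1.SIMDHP, bits [23:20] */
#define MVFR1_FPHP_SHIFT   24   /* MVFR1.FPHP, bits [27:24] */

/* Mirror of the two FIELD_DP32() updates in the -cpu max init code. */
static uint32_t mvfr1_enable_fp16(uint32_t mvfr1)
{
    mvfr1 = deposit32(mvfr1, MVFR1_FPHP_SHIFT, 4, 3);   /* v8.2-FP16 */
    mvfr1 = deposit32(mvfr1, MVFR1_SIMDHP_SHIFT, 4, 2); /* v8.2-FP16 */
    return mvfr1;
}
```

All other MVFR1 fields pass through unchanged. That is why the patch can overwrite the earlier `FPHP = 2` (v8.0) value in place.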
-[Qemu-devel] [PULL 24/39] target/arm: Add ARM_FEATURE_V8_RDM
+[PULL 45/47] hw/arm/sbsa-ref: add "reg" property to DT cpu nodes
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Leif Lindholm <leif@nuviainc.com>
-Not enabled anywhere yet.
+The sbsa-ref platform uses a minimal device tree to pass the amount of memory
 as well as the number of CPUs to the firmware. However, when dumping that
 minimal dtb (with -M sbsa-ref,dumpdtb=<file>), the resulting blob
 generates a warning when decompiled by dtc due to the lack of a reg property.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Add a simple reg property per cpu, representing a 64-bit MPIDR_EL1.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+This also ends up being cleaner than having the firmware calculate its
-Message-id: 20180228193125.20577-2-richard.henderson@linaro.org
+own IDs when generating the ACPI tables.
 Signed-off-by: Leif Lindholm <leif@nuviainc.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20200827124335.30586-1-leif@nuviainc.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h     | 1 +
+ hw/arm/sbsa-ref.c | 29 +++++++++++++++++++++++------
- linux-user/elfload.c | 1 +
+ 1 file changed, 23 insertions(+), 6 deletions(-)
  2 files changed, 2 insertions(+)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/hw/arm/sbsa-ref.c
-+++ b/target/arm/cpu.h
++++ b/hw/arm/sbsa-ref.c
-@@ -XXX,XX +XXX,XX @@ enum arm_features {
+@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
-     ARM_FEATURE_V8_SHA3, /* implements SHA3 part of v8 Crypto Extensions */
+     [SBSA_EHCI] = 11,
      ARM_FEATURE_V8_SM3, /* implements SM3 part of v8 Crypto Extensions */
      ARM_FEATURE_V8_SM4, /* implements SM4 part of v8 Crypto Extensions */
 +    ARM_FEATURE_V8_RDM, /* implements v8.1 simd round multiply */
      ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
  };
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
++static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
-index XXXXXXX..XXXXXXX 100644
++{
---- a/linux-user/elfload.c
++    uint8_t clustersz = ARM_DEFAULT_CPUS_PER_CLUSTER;
-+++ b/linux-user/elfload.c
++    return arm_cpu_mp_affinity(idx, clustersz);
-@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void)
++}
-     GET_FEATURE(ARM_FEATURE_V8_SHA512, ARM_HWCAP_A64_SHA512);
++
-     GET_FEATURE(ARM_FEATURE_V8_FP16,
+ /*
-                 ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP);
+  * Firmware on this machine only uses ACPI table to load OS, these limited
-+    GET_FEATURE(ARM_FEATURE_V8_RDM, ARM_HWCAP_A64_ASIMDRDM);
+  * device tree nodes are just to let firmware know the info which varies from
- #undef GET_FEATURE
+@@ -XXX,XX +XXX,XX @@ static void create_fdt(SBSAMachineState *sms)
+         g_free(matrix);
-     return hwcaps;
+     }
 +    /*
 +     * From Documentation/devicetree/bindings/arm/cpus.yaml
 +     *  On ARM v8 64-bit systems this property is required
 +     *    and matches the MPIDR_EL1 register affinity bits.
 +     *
 +     *    * If cpus node's #address-cells property is set to 2
 +     *
 +     *      The first reg cell bits [7:0] must be set to
 +     *      bits [39:32] of MPIDR_EL1.
 +     *
 +     *      The second reg cell bits [23:0] must be set to
 +     *      bits [23:0] of MPIDR_EL1.
 +     */
      qemu_fdt_add_subnode(sms->fdt, "/cpus");
 +    qemu_fdt_setprop_cell(sms->fdt, "/cpus", "#address-cells", 2);
 +    qemu_fdt_setprop_cell(sms->fdt, "/cpus", "#size-cells", 0x0);
      for (cpu = sms->smp_cpus - 1; cpu >= 0; cpu--) {
          char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
          ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
          CPUState *cs = CPU(armcpu);
 +        uint64_t mpidr = sbsa_ref_cpu_mp_affinity(sms, cpu);
          qemu_fdt_add_subnode(sms->fdt, nodename);
 +        qemu_fdt_setprop_u64(sms->fdt, nodename, "reg", mpidr);
          if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
              qemu_fdt_setprop_cell(sms->fdt, nodename, "numa-node-id",
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
      arm_load_kernel(ARM_CPU(first_cpu), machine, &sms->bootinfo);
  }
 -static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
 -{
 -    uint8_t clustersz = ARM_DEFAULT_CPUS_PER_CLUSTER;
 -    return arm_cpu_mp_affinity(idx, clustersz);
 -}
 -
  static const CPUArchIdList *sbsa_ref_possible_cpu_arch_ids(MachineState *ms)
  {
      unsigned int max_cpus = ms->smp.max_cpus;
 --
-.16.2
+.20.1

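The comment added to `create_fdt()` describes how the 64-bit `reg` value maps onto MPIDR_EL1 when `#address-cells` is 2. Below is a sketch of the affinity value and the resulting two cells. `cpu_mp_affinity()` re-implements what we understand `arm_cpu_mp_affinity()` to compute: Aff1 = index / cluster size and Aff0 = index % cluster size, with Aff1 at bit 8. The cluster size of 8 is assumed to match `ARM_DEFAULT_CPUS_PER_CLUSTER`. All names here are hypothetical.

```c
#include <stdint.h>

#define AFF1_SHIFT 8          /* assumed: MPIDR.Aff1 lives in bits [15:8] */
#define CPUS_PER_CLUSTER 8    /* assumed default cluster size */

/* Linear CPU index -> MPIDR affinity value (Aff1.Aff0). */
static uint64_t cpu_mp_affinity(int idx, uint8_t clustersz)
{
    uint32_t aff1 = idx / clustersz;
    uint32_t aff0 = idx % clustersz;
    return ((uint64_t)aff1 << AFF1_SHIFT) | aff0;
}

/*
 * Split MPIDR into the two "reg" cells required by the cpus binding when
 * #address-cells = 2: cell 0 [7:0] = MPIDR[39:32], cell 1 [23:0] = MPIDR[23:0].
 */
static void mpidr_to_reg_cells(uint64_t mpidr, uint32_t cells[2])
{
    cells[0] = (uint32_t)(mpidr >> 32) & 0xff;
    cells[1] = (uint32_t)mpidr & 0xffffff;
}
```

The patch itself writes the whole 64-bit value with `qemu_fdt_setprop_u64()`. The masks here only make the binding's bit ranges explicit, and they do not change the result for affinity values of this shape.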
-[Qemu-devel] [PULL 01/39] xlnx-zynqmp-rtc: Initial commit
+[PULL 46/47] hw/misc/sbsa_ec : Add an embedded controller for sbsa-ref
-From: Alistair Francis <alistair.francis@xilinx.com>
+From: Graeme Gregory <graeme@nuviainc.com>
-Initial commit of the ZynqMP RTC device.
+A difference between the sbsa platform and the virt platform is that PSCI is
 handled by ARM-TF in the sbsa platform. This means that the PSCI code
 there needs to communicate some of the platform power changes down
 to the qemu code for things like shutdown/reset control.
-Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
+Space has been left to extend the EC if we find other use cases in
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+the future where ARM-TF and qemu need to communicate.
 Signed-off-by: Graeme Gregory <graeme@nuviainc.com>
 Reviewed-by: Leif Lindholm <leif@nuviainc.com>
 Tested-by: Leif Lindholm <leif@nuviainc.com>
 Message-id: 20200826141952.136164-2-graeme@nuviainc.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/timer/Makefile.objs             |   1 +
+ hw/misc/sbsa_ec.c   | 98 +++++++++++++++++++++++++++++++++++++++++++++
- include/hw/timer/xlnx-zynqmp-rtc.h |  84 +++++++++++++++
+ hw/misc/meson.build |  2 +
- hw/timer/xlnx-zynqmp-rtc.c         | 214 +++++++++++++++++++++++++++++++++++++
+ 2 files changed, 100 insertions(+)
- 3 files changed, 299 insertions(+)
+ create mode 100644 hw/misc/sbsa_ec.c
  create mode 100644 include/hw/timer/xlnx-zynqmp-rtc.h
  create mode 100644 hw/timer/xlnx-zynqmp-rtc.c
-diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
+diff --git a/hw/misc/sbsa_ec.c b/hw/misc/sbsa_ec.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/timer/Makefile.objs
 +++ b/hw/timer/Makefile.objs
@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_IMX) += imx_epit.o
  common-obj-$(CONFIG_IMX) += imx_gpt.o
  common-obj-$(CONFIG_LM32) += lm32_timer.o
  common-obj-$(CONFIG_MILKYMIST) += milkymist-sysctl.o
 +common-obj-$(CONFIG_XLNX_ZYNQMP) += xlnx-zynqmp-rtc.o
  obj-$(CONFIG_ALTERA_TIMER) += altera_timer.o
  obj-$(CONFIG_EXYNOS4) += exynos4210_mct.o
 diff --git a/include/hw/timer/xlnx-zynqmp-rtc.h b/include/hw/timer/xlnx-zynqmp-rtc.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/include/hw/timer/xlnx-zynqmp-rtc.h
++++ b/hw/misc/sbsa_ec.c
 @@ -XXX,XX +XXX,XX @@
 +/*
-+ * QEMU model of the Xilinx ZynqMP Real Time Clock (RTC).
++ * ARM SBSA Reference Platform Embedded Controller
 + *
-+ * Copyright (c) 2017 Xilinx Inc.
++ * A device to allow PSCI running in the secure side of sbsa-ref machine
 + * to communicate platform power states to qemu.
 + *
-+ * Written-by: Alistair Francis <alistair.francis@xilinx.com>
++ * Copyright (c) 2020 Nuvia Inc
 + * Written by Graeme Gregory <graeme@nuviainc.com>
 + *
-+ * Permission is hereby granted, free of charge, to any person obtaining a copy
++ * SPDX-License-Identifier: GPL-2.0-or-later
 + * of this software and associated documentation files (the "Software"), to deal
 + * in the Software without restriction, including without limitation the rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#include "hw/register.h"
 +
 +#define TYPE_XLNX_ZYNQMP_RTC "xlnx-zynmp.rtc"
 +
 +#define XLNX_ZYNQMP_RTC(obj) \
 +     OBJECT_CHECK(XlnxZynqMPRTC, (obj), TYPE_XLNX_ZYNQMP_RTC)
 +
 +REG32(SET_TIME_WRITE, 0x0)
 +REG32(SET_TIME_READ, 0x4)
 +REG32(CALIB_WRITE, 0x8)
 +    FIELD(CALIB_WRITE, FRACTION_EN, 20, 1)
 +    FIELD(CALIB_WRITE, FRACTION_DATA, 16, 4)
 +    FIELD(CALIB_WRITE, MAX_TICK, 0, 16)
 +REG32(CALIB_READ, 0xc)
 +    FIELD(CALIB_READ, FRACTION_EN, 20, 1)
 +    FIELD(CALIB_READ, FRACTION_DATA, 16, 4)
 +    FIELD(CALIB_READ, MAX_TICK, 0, 16)
 +REG32(CURRENT_TIME, 0x10)
 +REG32(CURRENT_TICK, 0x14)
 +    FIELD(CURRENT_TICK, VALUE, 0, 16)
 +REG32(ALARM, 0x18)
 +REG32(RTC_INT_STATUS, 0x20)
 +    FIELD(RTC_INT_STATUS, ALARM, 1, 1)
 +    FIELD(RTC_INT_STATUS, SECONDS, 0, 1)
 +REG32(RTC_INT_MASK, 0x24)
 +    FIELD(RTC_INT_MASK, ALARM, 1, 1)
 +    FIELD(RTC_INT_MASK, SECONDS, 0, 1)
 +REG32(RTC_INT_EN, 0x28)
 +    FIELD(RTC_INT_EN, ALARM, 1, 1)
 +    FIELD(RTC_INT_EN, SECONDS, 0, 1)
 +REG32(RTC_INT_DIS, 0x2c)
 +    FIELD(RTC_INT_DIS, ALARM, 1, 1)
 +    FIELD(RTC_INT_DIS, SECONDS, 0, 1)
 +REG32(ADDR_ERROR, 0x30)
 +    FIELD(ADDR_ERROR, STATUS, 0, 1)
 +REG32(ADDR_ERROR_INT_MASK, 0x34)
 +    FIELD(ADDR_ERROR_INT_MASK, MASK, 0, 1)
 +REG32(ADDR_ERROR_INT_EN, 0x38)
 +    FIELD(ADDR_ERROR_INT_EN, MASK, 0, 1)
 +REG32(ADDR_ERROR_INT_DIS, 0x3c)
 +    FIELD(ADDR_ERROR_INT_DIS, MASK, 0, 1)
 +REG32(CONTROL, 0x40)
 +    FIELD(CONTROL, BATTERY_DISABLE, 31, 1)
 +    FIELD(CONTROL, OSC_CNTRL, 24, 4)
 +    FIELD(CONTROL, SLVERR_ENABLE, 0, 1)
 +REG32(SAFETY_CHK, 0x50)
 +
 +#define XLNX_ZYNQMP_RTC_R_MAX (R_SAFETY_CHK + 1)
 +
 +typedef struct XlnxZynqMPRTC {
 +    SysBusDevice parent_obj;
 +    MemoryRegion iomem;
 +    qemu_irq irq_rtc_int;
 +    qemu_irq irq_addr_error_int;
 +
 +    uint32_t regs[XLNX_ZYNQMP_RTC_R_MAX];
 +    RegisterInfo regs_info[XLNX_ZYNQMP_RTC_R_MAX];
 +} XlnxZynqMPRTC;
 diff --git a/hw/timer/xlnx-zynqmp-rtc.c b/hw/timer/xlnx-zynqmp-rtc.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/timer/xlnx-zynqmp-rtc.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * QEMU model of the Xilinx ZynqMP Real Time Clock (RTC).
 + *
 + * Copyright (c) 2017 Xilinx Inc.
 + *
 + * Written-by: Alistair Francis <alistair.francis@xilinx.com>
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a copy
 + * of this software and associated documentation files (the "Software"), to deal
 + * in the Software without restriction, including without limitation the rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#include "qemu/osdep.h"
++#include "qemu-common.h"
++#include "qemu/log.h"
 +#include "hw/sysbus.h"
-+#include "hw/register.h"
++#include "sysemu/runstate.h"
 +#include "qemu/bitops.h"
 +#include "qemu/log.h"
 +#include "hw/timer/xlnx-zynqmp-rtc.h"
 +
-+#ifndef XLNX_ZYNQMP_RTC_ERR_DEBUG
++typedef struct {
-+#define XLNX_ZYNQMP_RTC_ERR_DEBUG 0
++    SysBusDevice parent_obj;
-+#endif
++    MemoryRegion iomem;
 +} SECUREECState;
 +
-+static void rtc_int_update_irq(XlnxZynqMPRTC *s)
++#define TYPE_SBSA_EC      "sbsa-ec"
 +#define SECURE_EC(obj) OBJECT_CHECK(SECUREECState, (obj), TYPE_SBSA_EC)
 +
 +enum sbsa_ec_powerstates {
 +    SBSA_EC_CMD_POWEROFF = 0x01,
 +    SBSA_EC_CMD_REBOOT = 0x02,
 +};
 +
 +static uint64_t sbsa_ec_read(void *opaque, hwaddr offset, unsigned size)
 +{
-+    bool pending = s->regs[R_RTC_INT_STATUS] & ~s->regs[R_RTC_INT_MASK];
++    /* No use for this currently */
-+    qemu_set_irq(s->irq_rtc_int, pending);
++    qemu_log_mask(LOG_GUEST_ERROR, "sbsa-ec: no readable registers");
++    return 0;
 +}
 +
 +static void addr_error_int_update_irq(XlnxZynqMPRTC *s)
 +{
 +    bool pending = s->regs[R_ADDR_ERROR] & ~s->regs[R_ADDR_ERROR_INT_MASK];
 +    qemu_set_irq(s->irq_addr_error_int, pending);
 +}
 +
 +static void rtc_int_status_postw(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
 +    rtc_int_update_irq(s);
 +}
 +
 +static uint64_t rtc_int_en_prew(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
 +
 +    s->regs[R_RTC_INT_MASK] &= (uint32_t) ~val64;
 +    rtc_int_update_irq(s);
 +    return 0;
 +}
 +
-+static uint64_t rtc_int_dis_prew(RegisterInfo *reg, uint64_t val64)
++static void sbsa_ec_write(void *opaque, hwaddr offset,
 +                     uint64_t value, unsigned size)
 +{
-+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
++    if (offset == 0) { /* PSCI machine power command register */
-+
++        switch (value) {
-+    s->regs[R_RTC_INT_MASK] |= (uint32_t) val64;
++        case SBSA_EC_CMD_POWEROFF:
-+    rtc_int_update_irq(s);
++            qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
-+    return 0;
++            break;
 +        case SBSA_EC_CMD_REBOOT:
 +            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
 +            break;
 +        default:
 +            qemu_log_mask(LOG_GUEST_ERROR,
 +                          "sbsa-ec: unknown power command");
 +        }
 +    } else {
 +        qemu_log_mask(LOG_GUEST_ERROR, "sbsa-ec: unknown EC register");
 +    }
 +}
 +
-+static void addr_error_postw(RegisterInfo *reg, uint64_t val64)
++static const MemoryRegionOps sbsa_ec_ops = {
 +    .read = sbsa_ec_read,
 +    .write = sbsa_ec_write,
 +    .endianness = DEVICE_NATIVE_ENDIAN,
 +    .valid.min_access_size = 4,
 +    .valid.max_access_size = 4,
 +};
 +
 +static void sbsa_ec_init(Object *obj)
 +{
-+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
++    SECUREECState *s = SECURE_EC(obj);
-+    addr_error_int_update_irq(s);
++    SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 +
 +    memory_region_init_io(&s->iomem, obj, &sbsa_ec_ops, s, "sbsa-ec",
 +                          0x1000);
 +    sysbus_init_mmio(dev, &s->iomem);
 +}
 +
-+static uint64_t addr_error_int_en_prew(RegisterInfo *reg, uint64_t val64)
++static void sbsa_ec_class_init(ObjectClass *klass, void *data)
 +{
 +    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
 +
 +    s->regs[R_ADDR_ERROR_INT_MASK] &= (uint32_t) ~val64;
 +    addr_error_int_update_irq(s);
 +    return 0;
 +}
 +
 +static uint64_t addr_error_int_dis_prew(RegisterInfo *reg, uint64_t val64)
 +{
 +    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
 +
 +    s->regs[R_ADDR_ERROR_INT_MASK] |= (uint32_t) val64;
 +    addr_error_int_update_irq(s);
 +    return 0;
 +}
 +
 +static const RegisterAccessInfo rtc_regs_info[] = {
 +    {   .name = "SET_TIME_WRITE",  .addr = A_SET_TIME_WRITE,
 +    },{ .name = "SET_TIME_READ",  .addr = A_SET_TIME_READ,
 +        .ro = 0xffffffff,
 +    },{ .name = "CALIB_WRITE",  .addr = A_CALIB_WRITE,
 +    },{ .name = "CALIB_READ",  .addr = A_CALIB_READ,
 +        .ro = 0x1fffff,
 +    },{ .name = "CURRENT_TIME",  .addr = A_CURRENT_TIME,
 +        .ro = 0xffffffff,
 +    },{ .name = "CURRENT_TICK",  .addr = A_CURRENT_TICK,
 +        .ro = 0xffff,
 +    },{ .name = "ALARM",  .addr = A_ALARM,
 +    },{ .name = "RTC_INT_STATUS",  .addr = A_RTC_INT_STATUS,
 +        .w1c = 0x3,
 +        .post_write = rtc_int_status_postw,
 +    },{ .name = "RTC_INT_MASK",  .addr = A_RTC_INT_MASK,
 +        .reset = 0x3,
 +        .ro = 0x3,
 +    },{ .name = "RTC_INT_EN",  .addr = A_RTC_INT_EN,
 +        .pre_write = rtc_int_en_prew,
 +    },{ .name = "RTC_INT_DIS",  .addr = A_RTC_INT_DIS,
 +        .pre_write = rtc_int_dis_prew,
 +    },{ .name = "ADDR_ERROR",  .addr = A_ADDR_ERROR,
 +        .w1c = 0x1,
 +        .post_write = addr_error_postw,
 +    },{ .name = "ADDR_ERROR_INT_MASK",  .addr = A_ADDR_ERROR_INT_MASK,
 +        .reset = 0x1,
 +        .ro = 0x1,
 +    },{ .name = "ADDR_ERROR_INT_EN",  .addr = A_ADDR_ERROR_INT_EN,
 +        .pre_write = addr_error_int_en_prew,
 +    },{ .name = "ADDR_ERROR_INT_DIS",  .addr = A_ADDR_ERROR_INT_DIS,
 +        .pre_write = addr_error_int_dis_prew,
 +    },{ .name = "CONTROL",  .addr = A_CONTROL,
 +        .reset = 0x1000000,
 +        .rsvd = 0x70fffffe,
 +    },{ .name = "SAFETY_CHK",  .addr = A_SAFETY_CHK,
 +    }
 +};
 +
 +static void rtc_reset(DeviceState *dev)
 +{
 +    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(dev);
 +    unsigned int i;
 +
 +    for (i = 0; i < ARRAY_SIZE(s->regs_info); ++i) {
 +        register_reset(&s->regs_info[i]);
 +    }
 +
 +    rtc_int_update_irq(s);
 +    addr_error_int_update_irq(s);
 +}
 +
 +static const MemoryRegionOps rtc_ops = {
 +    .read = register_read_memory,
 +    .write = register_write_memory,
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 +    .valid = {
 +        .min_access_size = 4,
 +        .max_access_size = 4,
 +    },
 +};
 +
 +static void rtc_init(Object *obj)
 +{
 +    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(obj);
 +    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
 +    RegisterInfoArray *reg_array;
 +
 +    memory_region_init(&s->iomem, obj, TYPE_XLNX_ZYNQMP_RTC,
 +                       XLNX_ZYNQMP_RTC_R_MAX * 4);
 +    reg_array =
 +        register_init_block32(DEVICE(obj), rtc_regs_info,
 +                              ARRAY_SIZE(rtc_regs_info),
 +                              s->regs_info, s->regs,
 +                              &rtc_ops,
 +                              XLNX_ZYNQMP_RTC_ERR_DEBUG,
 +                              XLNX_ZYNQMP_RTC_R_MAX * 4);
 +    memory_region_add_subregion(&s->iomem,
 +                                0x0,
 +                                &reg_array->mem);
 +    sysbus_init_mmio(sbd, &s->iomem);
 +    sysbus_init_irq(sbd, &s->irq_rtc_int);
 +    sysbus_init_irq(sbd, &s->irq_addr_error_int);
 +}
 +
 +static const VMStateDescription vmstate_rtc = {
 +    .name = TYPE_XLNX_ZYNQMP_RTC,
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32_ARRAY(regs, XlnxZynqMPRTC, XLNX_ZYNQMP_RTC_R_MAX),
 +        VMSTATE_END_OF_LIST(),
 +    }
 +};
 +
 +static void rtc_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
-+    dc->reset = rtc_reset;
++    /* No vmstate or reset required: device has no internal state */
-+    dc->vmsd = &vmstate_rtc;
++    dc->user_creatable = false;
 +}
 +
-+static const TypeInfo rtc_info = {
++static const TypeInfo sbsa_ec_info = {
-+    .name          = TYPE_XLNX_ZYNQMP_RTC,
++    .name          = TYPE_SBSA_EC,
 +    .parent        = TYPE_SYS_BUS_DEVICE,
-+    .instance_size = sizeof(XlnxZynqMPRTC),
++    .instance_size = sizeof(SECUREECState),
-+    .class_init    = rtc_class_init,
++    .instance_init = sbsa_ec_init,
-+    .instance_init = rtc_init,
++    .class_init    = sbsa_ec_class_init,
 +};
 +
-+static void rtc_register_types(void)
++static void sbsa_ec_register_type(void)
 +{
-+    type_register_static(&rtc_info);
++    type_register_static(&sbsa_ec_info);
 +}
 +
-+type_init(rtc_register_types)
++type_init(sbsa_ec_register_type);
 diff --git a/hw/misc/meson.build b/hw/misc/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/meson.build
 +++ b/hw/misc/meson.build
@@ -XXX,XX +XXX,XX @@ specific_ss.add(when: 'CONFIG_MAC_VIA', if_true: files('mac_via.c'))
  specific_ss.add(when: 'CONFIG_MIPS_CPS', if_true: files('mips_cmgcr.c', 'mips_cpc.c'))
  specific_ss.add(when: 'CONFIG_MIPS_ITU', if_true: files('mips_itu.c'))
 +
 +specific_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa_ec.c'))
 --
-.16.2
+.20.1

-[Qemu-devel] [PULL 04/39] decodetree: Propagate return value from translate subroutines
+[PULL 47/47] hw/arm/sbsa-ref : Add embedded controller in secure memory
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Graeme Gregory <graeme@nuviainc.com>
-Allow the translate subroutines to return false for invalid insns.
+Add the previously created sbsa-ec device to the sbsa-ref machine in
 secure memory so the PSCI implementation in ARM-TF can access it, but
 not expose it to non-secure firmware or OS except via ARM-TF.
-At present we can of course invoke an invalid insn exception from within
+Signed-off-by: Graeme Gregory <graeme@nuviainc.com>
-the translate subroutine, but in the short term this consolidates code.
+Reviewed-by: Leif Lindholm <leif@nuviainc.com>
-In the long term it would allow the decodetree language to support
+Tested-by: Leif Lindholm <leif@nuviainc.com>
-overlapping patterns for ISA extensions.
+Message-id: 20200826141952.136164-3-graeme@nuviainc.com
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20180227232618.2908-1-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- scripts/decodetree.py | 5 ++---
+ hw/arm/sbsa-ref.c | 14 ++++++++++++++
-file changed, 2 insertions(+), 3 deletions(-)
+file changed, 14 insertions(+)
-diff --git a/scripts/decodetree.py b/scripts/decodetree.py
+diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/scripts/decodetree.py
+--- a/hw/arm/sbsa-ref.c
-+++ b/scripts/decodetree.py
++++ b/hw/arm/sbsa-ref.c
-@@ -XXX,XX +XXX,XX @@ class Pattern(General):
+@@ -XXX,XX +XXX,XX @@ enum {
-         global translate_prefix
+     SBSA_CPUPERIPHS,
-         output('typedef ', self.base.base.struct_name(),
+     SBSA_GIC_DIST,
-                ' arg_', self.name, ';\n')
+     SBSA_GIC_REDIST,
--        output(translate_scope, 'void ', translate_prefix, '_', self.name,
++    SBSA_SECURE_EC,
-+        output(translate_scope, 'bool ', translate_prefix, '_', self.name,
+     SBSA_SMMU,
-                '(DisasContext *ctx, arg_', self.name,
+     SBSA_UART,
-                ' *a, ', insntype, ' insn);\n')
+     SBSA_RTC,
+@@ -XXX,XX +XXX,XX @@ static const MemMapEntry sbsa_ref_memmap[] = {
-@@ -XXX,XX +XXX,XX @@ class Pattern(General):
+     [SBSA_CPUPERIPHS] =         { 0x40000000, 0x00040000 },
-             output(ind, self.base.extract_name(), '(&u.f_', arg, ', insn);\n')
+     [SBSA_GIC_DIST] =           { 0x40060000, 0x00010000 },
-         for n, f in self.fields.items():
+     [SBSA_GIC_REDIST] =         { 0x40080000, 0x04000000 },
-             output(ind, 'u.f_', arg, '.', n, ' = ', f.str_extract(), ';\n')
++    [SBSA_SECURE_EC] =          { 0x50000000, 0x00001000 },
--        output(ind, translate_prefix, '_', self.name,
+     [SBSA_UART] =               { 0x60000000, 0x00001000 },
-+        output(ind, 'return ', translate_prefix, '_', self.name,
+     [SBSA_RTC] =                { 0x60010000, 0x00001000 },
-                '(ctx, &u.f_', arg, ', insn);\n')
+     [SBSA_GPIO] =               { 0x60020000, 0x00001000 },
--        output(ind, 'return true;\n')
+@@ -XXX,XX +XXX,XX @@ static void *sbsa_ref_dtb(const struct arm_boot_info *binfo, int *fdt_size)
- # end Pattern
+     return board->fdt;
+ }
 +static void create_secure_ec(MemoryRegion *mem)
 +{
 +    hwaddr base = sbsa_ref_memmap[SBSA_SECURE_EC].base;
 +    DeviceState *dev = qdev_new("sbsa-ec");
 +    SysBusDevice *s = SYS_BUS_DEVICE(dev);
 +
 +    memory_region_add_subregion(mem, base,
 +                                sysbus_mmio_get_region(s, 0));
 +}
 +
  static void sbsa_ref_init(MachineState *machine)
  {
      unsigned int smp_cpus = machine->smp.cpus;
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
      create_pcie(sms);
 +    create_secure_ec(secure_sysmem);
 +
      sms->bootinfo.ram_size = machine->ram_size;
      sms->bootinfo.nb_cpus = smp_cpus;
      sms->bootinfo.board_id = -1;
 --
-.16.2
+.20.1
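Here is a minimal standalone sketch of the decode done by sbsa_ec_write() in the sbsa-ec patch. Offset 0 is the power command register, and any other offset is logged as an unknown register. To make it runnable outside QEMU, the sketch returns an action instead of calling qemu_system_shutdown_request() or qemu_system_reset_request(). The command values 0x01 and 0x02, the helper ec_decode_write() and the EC_ACTION_* names are assumptions for illustration. They are not the definitions in hw/misc/sbsa_ec.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed command values; the real SBSA_EC_CMD_* constants live in
 * hw/misc/sbsa_ec.c and are not shown in this series. */
enum ec_cmd {
    EC_CMD_POWEROFF = 0x01,
    EC_CMD_REBOOT   = 0x02,
};

/* What the model would ask the emulator to do after a guest write. */
enum ec_action {
    EC_ACTION_NONE,      /* unknown register or command: logged, ignored */
    EC_ACTION_SHUTDOWN,  /* stands in for qemu_system_shutdown_request() */
    EC_ACTION_RESET,     /* stands in for qemu_system_reset_request() */
};

/* Decode a 32-bit write at 'offset' inside the EC's 4 KiB MMIO window. */
static enum ec_action ec_decode_write(uint64_t offset, uint64_t value)
{
    if (offset != 0) {
        fprintf(stderr, "sbsa-ec: unknown EC register\n");
        return EC_ACTION_NONE;
    }
    switch (value) {
    case EC_CMD_POWEROFF:
        return EC_ACTION_SHUTDOWN;
    case EC_CMD_REBOOT:
        return EC_ACTION_RESET;
    default:
        fprintf(stderr, "sbsa-ec: unknown power command\n");
        return EC_ACTION_NONE;
    }
}
```

create_secure_ec() maps the device only into secure_sysmem, at 0x50000000 in sbsa_ref_memmap. A write like this can therefore only come from secure-world code, such as the PSCI implementation in ARM-TF.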

Second pull request of the week; mostly RTH's support for some
new-in-v8.1/v8.3 instructions, and my v8M board model.

thanks
-- PMM

The following changes since commit 427cbc7e4136a061628cb4315cc8182ea36d772f:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2018-03-01 18:46:41 +0000)

are available in the Git repository at:

  git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180302

for you to fetch changes up to e66a67bf28e1b4fce2e3d72a2610dbd48d9d3078:

  target/arm: Enable ARM_FEATURE_V8_FCMA (2018-03-02 11:03:45 +0000)

----------------------------------------------------------------
target-arm queue:
 * implement FCMA and RDM v8.1 and v8.3 instructions
 * enable Cortex-M33 v8M core, and provide new mps2-an505 board model
   that uses it
 * decodetree: Propagate return value from translate subroutines
 * xlnx-zynqmp: Implement the RTC device
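This illustrates the decodetree item in the list above. After that patch, scripts/decodetree.py declares translate functions as returning bool. It also emits `return <prefix>_<name>(ctx, &u.f_<arg>, insn);` instead of calling the translator and then returning true unconditionally. A translator can therefore reject an insn, and the caller raises the UNDEF in one place. The sketch below hand-writes the shape of such generated code. The DisasContext contents, arg_example, trans_example(), decode_example() and the bit pattern are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the types the generated decoder uses. */
typedef struct DisasContext {
    bool have_ext;       /* e.g. whether an optional ISA extension is on */
} DisasContext;

typedef struct {
    int rd, rn, rm;
} arg_example;

/* Hand-written translator: returning false means "not valid here". */
static bool trans_example(DisasContext *ctx, arg_example *a, uint32_t insn)
{
    (void)insn;
    (void)a;
    if (!ctx->have_ext) {
        return false;    /* let the caller raise UNDEF */
    }
    /* ... emit TCG ops using a->rd, a->rn, a->rm ... */
    return true;
}

/* Shape of the generated decoder after the change: the translator's
 * result is propagated rather than replaced by 'return true'. */
static bool decode_example(DisasContext *ctx, uint32_t insn)
{
    arg_example a;

    if ((insn & 0xff000000u) == 0x4e000000u) {   /* hypothetical pattern */
        a.rd = insn & 0x1f;
        a.rn = (insn >> 5) & 0x1f;
        a.rm = (insn >> 16) & 0x1f;
        return trans_example(ctx, &a, insn);
    }
    return false;        /* no pattern matched */
}
```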

----------------------------------------------------------------
Alistair Francis (3):
      xlnx-zynqmp-rtc: Initial commit
      xlnx-zynqmp-rtc: Add basic time support
      xlnx-zynqmp: Connect the RTC device

Peter Maydell (19):
      loader: Add new load_ramdisk_as()
      hw/arm/boot: Honour CPU's address space for image loads
      hw/arm/armv7m: Honour CPU's address space for image loads
      target/arm: Define an IDAU interface
      armv7m: Forward idau property to CPU object
      target/arm: Define init-svtor property for the reset secure VTOR value
      armv7m: Forward init-svtor property to CPU object
      target/arm: Add Cortex-M33
      hw/misc/unimp: Move struct to header file
      include/hw/or-irq.h: Add missing include guard
      qdev: Add new qdev_init_gpio_in_named_with_opaque()
      hw/core/split-irq: Device that splits IRQ lines
      hw/misc/mps2-fpgaio: FPGA control block for MPS2 AN505
      hw/misc/tz-ppc: Model TrustZone peripheral protection controller
      hw/misc/iotkit-secctl: Arm IoT Kit security controller initial skeleton
      hw/misc/iotkit-secctl: Add handling for PPCs
      hw/misc/iotkit-secctl: Add remaining simple registers
      hw/arm/iotkit: Model Arm IOT Kit
      mps2-an505: New board model: MPS2 with AN505 Cortex-M33 FPGA image

Richard Henderson (17):
      decodetree: Propagate return value from translate subroutines
      target/arm: Add ARM_FEATURE_V8_RDM
      target/arm: Refactor disas_simd_indexed decode
      target/arm: Refactor disas_simd_indexed size checks
      target/arm: Decode aa64 armv8.1 scalar three same extra
      target/arm: Decode aa64 armv8.1 three same extra
      target/arm: Decode aa64 armv8.1 scalar/vector x indexed element
      target/arm: Decode aa32 armv8.1 three same
      target/arm: Decode aa32 armv8.1 two reg and a scalar
      target/arm: Enable ARM_FEATURE_V8_RDM
      target/arm: Add ARM_FEATURE_V8_FCMA
      target/arm: Decode aa64 armv8.3 fcadd
      target/arm: Decode aa64 armv8.3 fcmla
      target/arm: Decode aa32 armv8.3 3-same
      target/arm: Decode aa32 armv8.3 2-reg-index
      target/arm: Decode t32 simd 3reg and 2reg_scalar extension
      target/arm: Enable ARM_FEATURE_V8_FCMA

 hw/arm/Makefile.objs               |   2 +
 hw/core/Makefile.objs              |   1 +
 hw/misc/Makefile.objs              |   4 +
 hw/timer/Makefile.objs             |   1 +
 target/arm/Makefile.objs           |   2 +-
 include/hw/arm/armv7m.h            |   5 +
 include/hw/arm/iotkit.h            | 109 ++++++
 include/hw/arm/xlnx-zynqmp.h       |   2 +
 include/hw/core/split-irq.h        |  57 +++
 include/hw/irq.h                   |   4 +-
 include/hw/loader.h                |  12 +-
 include/hw/misc/iotkit-secctl.h    | 103 ++++++
 include/hw/misc/mps2-fpgaio.h      |  43 +++
 include/hw/misc/tz-ppc.h           | 101 ++++++
 include/hw/misc/unimp.h            |  10 +
 include/hw/or-irq.h                |   5 +
 include/hw/qdev-core.h             |  30 +-
 include/hw/timer/xlnx-zynqmp-rtc.h |  86 +++++
 target/arm/cpu.h                   |   8 +
 target/arm/helper.h                |  31 ++
 target/arm/idau.h                  |  61 ++++
 hw/arm/armv7m.c                    |  35 +-
 hw/arm/boot.c                      | 119 ++++---
 hw/arm/iotkit.c                    | 598 +++++++++++++++++++++++++++++++
 hw/arm/mps2-tz.c                   | 503 ++++++++++++++++++++++++++
 hw/arm/xlnx-zynqmp.c               |  14 +
 hw/core/loader.c                   |   8 +-
 hw/core/qdev.c                     |   8 +-
 hw/core/split-irq.c                |  89 +++++
 hw/misc/iotkit-secctl.c            | 704 +++++++++++++++++++++++++++++++++++++
 hw/misc/mps2-fpgaio.c              | 176 ++++++++++
 hw/misc/tz-ppc.c                   | 302 ++++++++++++++++
 hw/misc/unimp.c                    |  10 -
 hw/timer/xlnx-zynqmp-rtc.c         | 272 ++++++++++++++
 linux-user/elfload.c               |   2 +
 target/arm/cpu.c                   |  66 +++-
 target/arm/cpu64.c                 |   2 +
 target/arm/helper.c                |  28 +-
 target/arm/translate-a64.c         | 514 +++++++++++++++++++++------
 target/arm/translate.c             | 275 +++++++++++++--
 target/arm/vec_helper.c            | 429 ++++++++++++++++++++++
 default-configs/arm-softmmu.mak    |   5 +
 hw/misc/trace-events               |  24 ++
 hw/timer/trace-events              |   3 +
 scripts/decodetree.py              |   5 +-
 45 files changed, 4668 insertions(+), 200 deletions(-)
 create mode 100644 include/hw/arm/iotkit.h
 create mode 100644 include/hw/core/split-irq.h
 create mode 100644 include/hw/misc/iotkit-secctl.h
 create mode 100644 include/hw/misc/mps2-fpgaio.h
 create mode 100644 include/hw/misc/tz-ppc.h
 create mode 100644 include/hw/timer/xlnx-zynqmp-rtc.h
 create mode 100644 target/arm/idau.h
 create mode 100644 hw/arm/iotkit.c
 create mode 100644 hw/arm/mps2-tz.c
 create mode 100644 hw/core/split-irq.c
 create mode 100644 hw/misc/iotkit-secctl.c
 create mode 100644 hw/misc/mps2-fpgaio.c
 create mode 100644 hw/misc/tz-ppc.c
 create mode 100644 hw/timer/xlnx-zynqmp-rtc.c
 create mode 100644 target/arm/vec_helper.c

From: Alistair Francis <alistair.francis@xilinx.com>

Initial commit of the ZynqMP RTC device.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/timer/Makefile.objs             |   1 +
 include/hw/timer/xlnx-zynqmp-rtc.h |  84 +++++++++++++++
 hw/timer/xlnx-zynqmp-rtc.c         | 214 +++++++++++++++++++++++++++++++++++++
 3 files changed, 299 insertions(+)
 create mode 100644 include/hw/timer/xlnx-zynqmp-rtc.h
 create mode 100644 hw/timer/xlnx-zynqmp-rtc.c

diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_IMX) += imx_epit.o
 common-obj-$(CONFIG_IMX) += imx_gpt.o
 common-obj-$(CONFIG_LM32) += lm32_timer.o
 common-obj-$(CONFIG_MILKYMIST) += milkymist-sysctl.o
+common-obj-$(CONFIG_XLNX_ZYNQMP) += xlnx-zynqmp-rtc.o
 
 obj-$(CONFIG_ALTERA_TIMER) += altera_timer.o
 obj-$(CONFIG_EXYNOS4) += exynos4210_mct.o
diff --git a/include/hw/timer/xlnx-zynqmp-rtc.h b/include/hw/timer/xlnx-zynqmp-rtc.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/timer/xlnx-zynqmp-rtc.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU model of the Xilinx ZynqMP Real Time Clock (RTC).
+ *
+ * Copyright (c) 2017 Xilinx Inc.
+ *
+ * Written-by: Alistair Francis <alistair.francis@xilinx.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "hw/register.h"
+
+#define TYPE_XLNX_ZYNQMP_RTC "xlnx-zynmp.rtc"
+
+#define XLNX_ZYNQMP_RTC(obj) \
+     OBJECT_CHECK(XlnxZynqMPRTC, (obj), TYPE_XLNX_ZYNQMP_RTC)
+
+REG32(SET_TIME_WRITE, 0x0)
+REG32(SET_TIME_READ, 0x4)
+REG32(CALIB_WRITE, 0x8)
+    FIELD(CALIB_WRITE, FRACTION_EN, 20, 1)
+    FIELD(CALIB_WRITE, FRACTION_DATA, 16, 4)
+    FIELD(CALIB_WRITE, MAX_TICK, 0, 16)
+REG32(CALIB_READ, 0xc)
+    FIELD(CALIB_READ, FRACTION_EN, 20, 1)
+    FIELD(CALIB_READ, FRACTION_DATA, 16, 4)
+    FIELD(CALIB_READ, MAX_TICK, 0, 16)
+REG32(CURRENT_TIME, 0x10)
+REG32(CURRENT_TICK, 0x14)
+    FIELD(CURRENT_TICK, VALUE, 0, 16)
+REG32(ALARM, 0x18)
+REG32(RTC_INT_STATUS, 0x20)
+    FIELD(RTC_INT_STATUS, ALARM, 1, 1)
+    FIELD(RTC_INT_STATUS, SECONDS, 0, 1)
+REG32(RTC_INT_MASK, 0x24)
+    FIELD(RTC_INT_MASK, ALARM, 1, 1)
+    FIELD(RTC_INT_MASK, SECONDS, 0, 1)
+REG32(RTC_INT_EN, 0x28)
+    FIELD(RTC_INT_EN, ALARM, 1, 1)
+    FIELD(RTC_INT_EN, SECONDS, 0, 1)
+REG32(RTC_INT_DIS, 0x2c)
+    FIELD(RTC_INT_DIS, ALARM, 1, 1)
+    FIELD(RTC_INT_DIS, SECONDS, 0, 1)
+REG32(ADDR_ERROR, 0x30)
+    FIELD(ADDR_ERROR, STATUS, 0, 1)
+REG32(ADDR_ERROR_INT_MASK, 0x34)
+    FIELD(ADDR_ERROR_INT_MASK, MASK, 0, 1)
+REG32(ADDR_ERROR_INT_EN, 0x38)
+    FIELD(ADDR_ERROR_INT_EN, MASK, 0, 1)
+REG32(ADDR_ERROR_INT_DIS, 0x3c)
+    FIELD(ADDR_ERROR_INT_DIS, MASK, 0, 1)
+REG32(CONTROL, 0x40)
+    FIELD(CONTROL, BATTERY_DISABLE, 31, 1)
+    FIELD(CONTROL, OSC_CNTRL, 24, 4)
+    FIELD(CONTROL, SLVERR_ENABLE, 0, 1)
+REG32(SAFETY_CHK, 0x50)
+
+#define XLNX_ZYNQMP_RTC_R_MAX (R_SAFETY_CHK + 1)
+
+typedef struct XlnxZynqMPRTC {
+    SysBusDevice parent_obj;
+    MemoryRegion iomem;
+    qemu_irq irq_rtc_int;
+    qemu_irq irq_addr_error_int;
+
+    uint32_t regs[XLNX_ZYNQMP_RTC_R_MAX];
+    RegisterInfo regs_info[XLNX_ZYNQMP_RTC_R_MAX];
+} XlnxZynqMPRTC;
diff --git a/hw/timer/xlnx-zynqmp-rtc.c b/hw/timer/xlnx-zynqmp-rtc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/timer/xlnx-zynqmp-rtc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU model of the Xilinx ZynqMP Real Time Clock (RTC).
+ *
+ * Copyright (c) 2017 Xilinx Inc.
+ *
+ * Written-by: Alistair Francis <alistair.francis@xilinx.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/register.h"
+#include "qemu/bitops.h"
+#include "qemu/log.h"
+#include "hw/timer/xlnx-zynqmp-rtc.h"
+
+#ifndef XLNX_ZYNQMP_RTC_ERR_DEBUG
+#define XLNX_ZYNQMP_RTC_ERR_DEBUG 0
+#endif
+
+static void rtc_int_update_irq(XlnxZynqMPRTC *s)
+{
+    bool pending = s->regs[R_RTC_INT_STATUS] & ~s->regs[R_RTC_INT_MASK];
+    qemu_set_irq(s->irq_rtc_int, pending);
+}
+
+static void addr_error_int_update_irq(XlnxZynqMPRTC *s)
+{
+    bool pending = s->regs[R_ADDR_ERROR] & ~s->regs[R_ADDR_ERROR_INT_MASK];
+    qemu_set_irq(s->irq_addr_error_int, pending);
+}
+
+static void rtc_int_status_postw(RegisterInfo *reg, uint64_t val64)
+{
+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
+    rtc_int_update_irq(s);
+}
+
+static uint64_t rtc_int_en_prew(RegisterInfo *reg, uint64_t val64)
+{
+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
+
+    s->regs[R_RTC_INT_MASK] &= (uint32_t) ~val64;
+    rtc_int_update_irq(s);
+    return 0;
+}
+
+static uint64_t rtc_int_dis_prew(RegisterInfo *reg, uint64_t val64)
+{
+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
+
+    s->regs[R_RTC_INT_MASK] |= (uint32_t) val64;
+    rtc_int_update_irq(s);
+    return 0;
+}
+
+static void addr_error_postw(RegisterInfo *reg, uint64_t val64)
+{
+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
+    addr_error_int_update_irq(s);
+}
+
+static uint64_t addr_error_int_en_prew(RegisterInfo *reg, uint64_t val64)
+{
+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
+
+    s->regs[R_ADDR_ERROR_INT_MASK] &= (uint32_t) ~val64;
+    addr_error_int_update_irq(s);
+    return 0;
+}
+
+static uint64_t addr_error_int_dis_prew(RegisterInfo *reg, uint64_t val64)
+{
+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
+
+    s->regs[R_ADDR_ERROR_INT_MASK] |= (uint32_t) val64;
+    addr_error_int_update_irq(s);
+    return 0;
+}
+
+static const RegisterAccessInfo rtc_regs_info[] = {
+    {   .name = "SET_TIME_WRITE",  .addr = A_SET_TIME_WRITE,
+    },{ .name = "SET_TIME_READ",  .addr = A_SET_TIME_READ,
+        .ro = 0xffffffff,
+    },{ .name = "CALIB_WRITE",  .addr = A_CALIB_WRITE,
+    },{ .name = "CALIB_READ",  .addr = A_CALIB_READ,
+        .ro = 0x1fffff,
+    },{ .name = "CURRENT_TIME",  .addr = A_CURRENT_TIME,
+        .ro = 0xffffffff,
+    },{ .name = "CURRENT_TICK",  .addr = A_CURRENT_TICK,
+        .ro = 0xffff,
+    },{ .name = "ALARM",  .addr = A_ALARM,
+    },{ .name = "RTC_INT_STATUS",  .addr = A_RTC_INT_STATUS,
+        .w1c = 0x3,
+        .post_write = rtc_int_status_postw,
+    },{ .name = "RTC_INT_MASK",  .addr = A_RTC_INT_MASK,
+        .reset = 0x3,
+        .ro = 0x3,
+    },{ .name = "RTC_INT_EN",  .addr = A_RTC_INT_EN,
+        .pre_write = rtc_int_en_prew,
+    },{ .name = "RTC_INT_DIS",  .addr = A_RTC_INT_DIS,
+        .pre_write = rtc_int_dis_prew,
+    },{ .name = "ADDR_ERROR",  .addr = A_ADDR_ERROR,
+        .w1c = 0x1,
+        .post_write = addr_error_postw,
+    },{ .name = "ADDR_ERROR_INT_MASK",  .addr = A_ADDR_ERROR_INT_MASK,
+        .reset = 0x1,
+        .ro = 0x1,
+    },{ .name = "ADDR_ERROR_INT_EN",  .addr = A_ADDR_ERROR_INT_EN,
+        .pre_write = addr_error_int_en_prew,
+    },{ .name = "ADDR_ERROR_INT_DIS",  .addr = A_ADDR_ERROR_INT_DIS,
+        .pre_write = addr_error_int_dis_prew,
+    },{ .name = "CONTROL",  .addr = A_CONTROL,
+        .reset = 0x1000000,
+        .rsvd = 0x70fffffe,
+    },{ .name = "SAFETY_CHK",  .addr = A_SAFETY_CHK,
+    }
+};
+
+static void rtc_reset(DeviceState *dev)
+{
+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(dev);
+    unsigned int i;
+
+    for (i = 0; i < ARRAY_SIZE(s->regs_info); ++i) {
+        register_reset(&s->regs_info[i]);
+    }
+
+    rtc_int_update_irq(s);
+    addr_error_int_update_irq(s);
+}
+
+static const MemoryRegionOps rtc_ops = {
+    .read = register_read_memory,
+    .write = register_write_memory,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static void rtc_init(Object *obj)
+{
+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(obj);
+    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+    RegisterInfoArray *reg_array;
+
+    memory_region_init(&s->iomem, obj, TYPE_XLNX_ZYNQMP_RTC,
+                       XLNX_ZYNQMP_RTC_R_MAX * 4);
+    reg_array =
+        register_init_block32(DEVICE(obj), rtc_regs_info,
+                              ARRAY_SIZE(rtc_regs_info),
+                              s->regs_info, s->regs,
+                              &rtc_ops,
+                              XLNX_ZYNQMP_RTC_ERR_DEBUG,
+                              XLNX_ZYNQMP_RTC_R_MAX * 4);
+    memory_region_add_subregion(&s->iomem,
+                                0x0,
+                                &reg_array->mem);
+    sysbus_init_mmio(sbd, &s->iomem);
+    sysbus_init_irq(sbd, &s->irq_rtc_int);
+    sysbus_init_irq(sbd, &s->irq_addr_error_int);
+}
+
+static const VMStateDescription vmstate_rtc = {
+    .name = TYPE_XLNX_ZYNQMP_RTC,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32_ARRAY(regs, XlnxZynqMPRTC, XLNX_ZYNQMP_RTC_R_MAX),
+        VMSTATE_END_OF_LIST(),
+    }
+};
+
+static void rtc_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset = rtc_reset;
+    dc->vmsd = &vmstate_rtc;
+}
+
+static const TypeInfo rtc_info = {
+    .name          = TYPE_XLNX_ZYNQMP_RTC,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(XlnxZynqMPRTC),
+    .class_init    = rtc_class_init,
+    .instance_init = rtc_init,
+};
+
+static void rtc_register_types(void)
+{
+    type_register_static(&rtc_info);
+}
+
+type_init(rtc_register_types)
-- 
2.16.2
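The initial RTC commit wires up its interrupts as follows:

- RTC_INT_MASK is read-only to the guest (.ro = 0x3) and resets to all-masked.
- Writing 1s to RTC_INT_EN clears mask bits.
- Writing 1s to RTC_INT_DIS sets mask bits.
- RTC_INT_STATUS is write-1-to-clear.
- The IRQ line is driven by status & ~mask, as in rtc_int_update_irq().

Below is a plain-C sketch of just those semantics. The RtcIntRegs type and the rtc_int_* helper names are made up for illustration. The real handlers go through the hw/register API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of the RTC_INT_* register group. */
typedef struct {
    uint32_t status;  /* RTC_INT_STATUS: bit 1 ALARM, bit 0 SECONDS */
    uint32_t mask;    /* RTC_INT_MASK: 1 = source masked */
} RtcIntRegs;

/* Reset values from rtc_regs_info: nothing pending, both sources masked. */
static void rtc_int_reset(RtcIntRegs *r)
{
    r->status = 0;
    r->mask = 0x3;
}

/* The IRQ is asserted when any status bit is set and not masked. */
static bool rtc_int_pending(const RtcIntRegs *r)
{
    return (r->status & ~r->mask) != 0;
}

/* Writing 1s to RTC_INT_EN unmasks those sources. */
static void rtc_int_en_write(RtcIntRegs *r, uint32_t val)
{
    r->mask &= ~val;
}

/* Writing 1s to RTC_INT_DIS masks those sources again. */
static void rtc_int_dis_write(RtcIntRegs *r, uint32_t val)
{
    r->mask |= val;
}

/* RTC_INT_STATUS bits 0..1 are write-1-to-clear (.w1c = 0x3). */
static void rtc_int_status_write(RtcIntRegs *r, uint32_t val)
{
    r->status &= ~(val & 0x3);
}
```

rtc_int_en_prew() and rtc_int_dis_prew() return 0, so the EN and DIS registers themselves always hold 0. Only their effect on the mask is stored.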

From: Alistair Francis <alistair.francis@xilinx.com>

Allow the guest to determine the time set from the QEMU command line.

This includes adding a trace event to debug the new time.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/timer/xlnx-zynqmp-rtc.h |  2 ++
 hw/timer/xlnx-zynqmp-rtc.c         | 58 ++++++++++++++++++++++++++++++++++++++
 hw/timer/trace-events              |  3 ++
 3 files changed, 63 insertions(+)

diff --git a/include/hw/timer/xlnx-zynqmp-rtc.h b/include/hw/timer/xlnx-zynqmp-rtc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/timer/xlnx-zynqmp-rtc.h
+++ b/include/hw/timer/xlnx-zynqmp-rtc.h
@@ -XXX,XX +XXX,XX @@ typedef struct XlnxZynqMPRTC {
     qemu_irq irq_rtc_int;
     qemu_irq irq_addr_error_int;
 
+    uint32_t tick_offset;
+
     uint32_t regs[XLNX_ZYNQMP_RTC_R_MAX];
     RegisterInfo regs_info[XLNX_ZYNQMP_RTC_R_MAX];
 } XlnxZynqMPRTC;
diff --git a/hw/timer/xlnx-zynqmp-rtc.c b/hw/timer/xlnx-zynqmp-rtc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/xlnx-zynqmp-rtc.c
+++ b/hw/timer/xlnx-zynqmp-rtc.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/register.h"
 #include "qemu/bitops.h"
 #include "qemu/log.h"
+#include "hw/ptimer.h"
+#include "qemu/cutils.h"
+#include "sysemu/sysemu.h"
+#include "trace.h"
 #include "hw/timer/xlnx-zynqmp-rtc.h"
 
 #ifndef XLNX_ZYNQMP_RTC_ERR_DEBUG
@@ -XXX,XX +XXX,XX @@ static void addr_error_int_update_irq(XlnxZynqMPRTC *s)
     qemu_set_irq(s->irq_addr_error_int, pending);
 }
 
+static uint32_t rtc_get_count(XlnxZynqMPRTC *s)
+{
+    int64_t now = qemu_clock_get_ns(rtc_clock);
+    return s->tick_offset + now / NANOSECONDS_PER_SECOND;
+}
+
+static uint64_t current_time_postr(RegisterInfo *reg, uint64_t val64)
+{
+    XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
+
+    return rtc_get_count(s);
+}
+
 static void rtc_int_status_postw(RegisterInfo *reg, uint64_t val64)
 {
     XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(reg->opaque);
@@ -XXX,XX +XXX,XX @@ static uint64_t addr_error_int_dis_prew(RegisterInfo *reg, uint64_t val64)
 
 static const RegisterAccessInfo rtc_regs_info[] = {
     {   .name = "SET_TIME_WRITE",  .addr = A_SET_TIME_WRITE,
+        .unimp = MAKE_64BIT_MASK(0, 32),
     },{ .name = "SET_TIME_READ",  .addr = A_SET_TIME_READ,
         .ro = 0xffffffff,
+        .post_read = current_time_postr,
     },{ .name = "CALIB_WRITE",  .addr = A_CALIB_WRITE,
+        .unimp = MAKE_64BIT_MASK(0, 32),
     },{ .name = "CALIB_READ",  .addr = A_CALIB_READ,
         .ro = 0x1fffff,
     },{ .name = "CURRENT_TIME",  .addr = A_CURRENT_TIME,
         .ro = 0xffffffff,
+        .post_read = current_time_postr,
     },{ .name = "CURRENT_TICK",  .addr = A_CURRENT_TICK,
         .ro = 0xffff,
     },{ .name = "ALARM",  .addr = A_ALARM,
@@ -XXX,XX +XXX,XX @@ static void rtc_init(Object *obj)
     XlnxZynqMPRTC *s = XLNX_ZYNQMP_RTC(obj);
     SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
     RegisterInfoArray *reg_array;
+    struct tm current_tm;
 
     memory_region_init(&s->iomem, obj, TYPE_XLNX_ZYNQMP_RTC,
                        XLNX_ZYNQMP_RTC_R_MAX * 4);
@@ -XXX,XX +XXX,XX @@ static void rtc_init(Object *obj)
     sysbus_init_mmio(sbd, &s->iomem);
     sysbus_init_irq(sbd, &s->irq_rtc_int);
     sysbus_init_irq(sbd, &s->irq_addr_error_int);
+
+    qemu_get_timedate(&current_tm, 0);
+    s->tick_offset = mktimegm(&current_tm) -
+        qemu_clock_get_ns(rtc_clock) / NANOSECONDS_PER_SECOND;
+
+    trace_xlnx_zynqmp_rtc_gettime(current_tm.tm_year, current_tm.tm_mon,
+                                  current_tm.tm_mday, current_tm.tm_hour,
+                                  current_tm.tm_min, current_tm.tm_sec);
+}
+
+static int rtc_pre_save(void *opaque)
+{
+    XlnxZynqMPRTC *s = opaque;
+    int64_t now = qemu_clock_get_ns(rtc_clock) / NANOSECONDS_PER_SECOND;
+
+    /* Add the time at migration */
+    s->tick_offset = s->tick_offset + now;
+
+    return 0;
+}
+
+static int rtc_post_load(void *opaque, int version_id)
+{
+    XlnxZynqMPRTC *s = opaque;
+    int64_t now = qemu_clock_get_ns(rtc_clock) / NANOSECONDS_PER_SECOND;
+
+    /* Subtract the time after migration. This combined with the pre_save
+     * action results in us having subtracted the time that the guest was
+     * stopped from the offset.
+     */
+    s->tick_offset = s->tick_offset - now;
+
+    return 0;
 }
 
 static const VMStateDescription vmstate_rtc = {
     .name = TYPE_XLNX_ZYNQMP_RTC,
     .version_id = 1,
     .minimum_version_id = 1,
+    .pre_save = rtc_pre_save,
+    .post_load = rtc_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32_ARRAY(regs, XlnxZynqMPRTC, XLNX_ZYNQMP_RTC_R_MAX),
+        VMSTATE_UINT32(tick_offset, XlnxZynqMPRTC),
         VMSTATE_END_OF_LIST(),
     }
 };
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -XXX,XX +XXX,XX @@ systick_write(uint64_t addr, uint32_t value, unsigned size) "systick write addr
 cmsdk_apb_timer_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB timer read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 cmsdk_apb_timer_write(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB timer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 cmsdk_apb_timer_reset(void) "CMSDK APB timer: reset"
+
+# hw/timer/xlnx-zynqmp-rtc.c
+xlnx_zynqmp_rtc_gettime(int year, int month, int day, int hour, int min, int sec) "Get time from host: %d-%d-%d %2d:%02d:%02d"
-- 
2.16.2

From: Alistair Francis <alistair.francis@xilinx.com>

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-zynqmp.h |  2 ++
 hw/arm/xlnx-zynqmp.c         | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-zynqmp.h
+++ b/include/hw/arm/xlnx-zynqmp.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/dma/xlnx_dpdma.h"
 #include "hw/display/xlnx_dp.h"
 #include "hw/intc/xlnx-zynqmp-ipi.h"
+#include "hw/timer/xlnx-zynqmp-rtc.h"
 
 #define TYPE_XLNX_ZYNQMP "xlnx,zynqmp"
 #define XLNX_ZYNQMP(obj) OBJECT_CHECK(XlnxZynqMPState, (obj), \
@@ -XXX,XX +XXX,XX @@ typedef struct XlnxZynqMPState {
     XlnxDPState dp;
     XlnxDPDMAState dpdma;
     XlnxZynqMPIPI ipi;
+    XlnxZynqMPRTC rtc;
 
     char *boot_cpu;
     ARMCPU *boot_cpu_ptr;
diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -XXX,XX +XXX,XX @@
 #define IPI_ADDR            0xFF300000
 #define IPI_IRQ             64
 
+#define RTC_ADDR            0xffa60000
+#define RTC_IRQ             26
+
 #define SDHCI_CAPABILITIES  0x280737ec6481 /* Datasheet: UG1085 (v1.7) */
 
 static const uint64_t gem_addr[XLNX_ZYNQMP_NUM_GEMS] = {
@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_init(Object *obj)
 
     object_initialize(&s->ipi, sizeof(s->ipi), TYPE_XLNX_ZYNQMP_IPI);
     qdev_set_parent_bus(DEVICE(&s->ipi), sysbus_get_default());
+
+    object_initialize(&s->rtc, sizeof(s->rtc), TYPE_XLNX_ZYNQMP_RTC);
+    qdev_set_parent_bus(DEVICE(&s->rtc), sysbus_get_default());
 }
 
 static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
     }
     sysbus_mmio_map(SYS_BUS_DEVICE(&s->ipi), 0, IPI_ADDR);
     sysbus_connect_irq(SYS_BUS_DEVICE(&s->ipi), 0, gic_spi[IPI_IRQ]);
+
+    object_property_set_bool(OBJECT(&s->rtc), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->rtc), 0, RTC_ADDR);
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->rtc), 0, gic_spi[RTC_IRQ]);
 }
 
 static Property xlnx_zynqmp_props[] = {
-- 
2.16.2

From: Richard Henderson <richard.henderson@linaro.org>

Allow the translate subroutines to return false for invalid insns.

At present a translate subroutine can of course raise the invalid insn
exception itself, but in the short term returning false instead lets
that code be consolidated in the decoder's caller.
In the long term it would allow the decodetree language to support
overlapping patterns for ISA extensions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180227232618.2908-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 scripts/decodetree.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index XXXXXXX..XXXXXXX 100755
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -XXX,XX +XXX,XX @@ class Pattern(General):
         global translate_prefix
         output('typedef ', self.base.base.struct_name(),
                ' arg_', self.name, ';\n')
-        output(translate_scope, 'void ', translate_prefix, '_', self.name,
+        output(translate_scope, 'bool ', translate_prefix, '_', self.name,
                '(DisasContext *ctx, arg_', self.name,
                ' *a, ', insntype, ' insn);\n')
 
@@ -XXX,XX +XXX,XX @@ class Pattern(General):
             output(ind, self.base.extract_name(), '(&u.f_', arg, ', insn);\n')
         for n, f in self.fields.items():
             output(ind, 'u.f_', arg, '.', n, ' = ', f.str_extract(), ';\n')
-        output(ind, translate_prefix, '_', self.name,
+        output(ind, 'return ', translate_prefix, '_', self.name,
                '(ctx, &u.f_', arg, ', insn);\n')
-        output(ind, 'return true;\n')
 # end Pattern
 
 
-- 
2.16.2

Add a function load_ramdisk_as() which behaves like the existing
load_ramdisk() but allows the caller to specify the AddressSpace
to use. This matches the pattern we have already for various
other loader functions.
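
The `_as` pattern described above amounts to a thin wrapper that forwards NULL. A minimal sketch, assuming a stripped-down `AddressSpace` type and a hypothetical `load_image_to()` helper standing in for `load_uboot_image()`; following the header comment in the patch, a NULL address space selects `address_space_memory`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: QEMU's real AddressSpace is far richer. */
typedef struct AddressSpace {
    const char *name;
} AddressSpace;

static AddressSpace address_space_memory = { "memory" };

/* Records where the most recent image was loaded (for illustration). */
static AddressSpace *last_target;

/* Hypothetical low-level loader; NULL means "system address space". */
static int load_image_to(const char *filename, uint64_t addr,
                         AddressSpace *as)
{
    assert(filename != NULL);
    (void)addr;
    last_target = as ? as : &address_space_memory;
    return 0;
}

int load_ramdisk_as(const char *filename, uint64_t addr, uint64_t max_sz,
                    AddressSpace *as)
{
    (void)max_sz;
    return load_image_to(filename, addr, as);
}

/* Legacy entry point keeps its signature and forwards NULL. */
int load_ramdisk(const char *filename, uint64_t addr, uint64_t max_sz)
{
    return load_ramdisk_as(filename, addr, max_sz, NULL);
}
```

Keeping the old function as a wrapper means existing callers need no changes, while boards that need a different view of memory call the `_as` variant.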

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-2-peter.maydell@linaro.org
---
 include/hw/loader.h | 12 +++++++++++-
 hw/core/loader.c    |  8 +++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/hw/loader.h b/include/hw/loader.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/loader.h
+++ b/include/hw/loader.h
@@ -XXX,XX +XXX,XX @@ int load_uimage(const char *filename, hwaddr *ep,
                 void *translate_opaque);
 
 /**
- * load_ramdisk:
+ * load_ramdisk_as:
  * @filename: Path to the ramdisk image
  * @addr: Memory address to load the ramdisk to
  * @max_sz: Maximum allowed ramdisk size (for non-u-boot ramdisks)
+ * @as: The AddressSpace to load the ramdisk to. If NULL,
+ *      address_space_memory is used.
  *
  * Load a ramdisk image with U-Boot header to the specified memory
  * address.
  *
  * Returns the size of the loaded image on success, -1 otherwise.
  */
+int load_ramdisk_as(const char *filename, hwaddr addr, uint64_t max_sz,
+                    AddressSpace *as);
+
+/**
+ * load_ramdisk:
+ * Same as load_ramdisk_as(), but doesn't allow the caller to specify
+ * an AddressSpace.
+ */
 int load_ramdisk(const char *filename, hwaddr addr, uint64_t max_sz);
 
 ssize_t gunzip(void *dst, size_t dstlen, uint8_t *src, size_t srclen);
diff --git a/hw/core/loader.c b/hw/core/loader.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -XXX,XX +XXX,XX @@ int load_uimage_as(const char *filename, hwaddr *ep, hwaddr *loadaddr,
 
 /* Load a ramdisk.  */
 int load_ramdisk(const char *filename, hwaddr addr, uint64_t max_sz)
+{
+    return load_ramdisk_as(filename, addr, max_sz, NULL);
+}
+
+int load_ramdisk_as(const char *filename, hwaddr addr, uint64_t max_sz,
+                    AddressSpace *as)
 {
     return load_uboot_image(filename, NULL, &addr, NULL, IH_TYPE_RAMDISK,
-                            NULL, NULL, NULL);
+                            NULL, NULL, as);
 }
 
 /* Load a gzip-compressed kernel to a dynamically allocated buffer. */
-- 
2.16.2

Instead of loading kernels, device trees, and the like to
the system address space, use the CPU's address space. This
is important if we're trying to load the file to memory or
via an alias memory region that is provided by an SoC
object and thus not mapped into the system address space.
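
The address-space choice made by the new `arm_boot_address_space()` helper reduces to a two-input decision. A sketch with hypothetical stand-in types (`CpuModel`, `BootInfo`, `ASIdx`) in place of `ARMCPU`, `struct arm_boot_info` and `ARMASIdx`; it returns an index rather than calling `cpu_get_address_space()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the NonSecure/Secure address-space indices. */
typedef enum { AS_IDX_NS, AS_IDX_S } ASIdx;

typedef struct {
    bool has_el3;       /* stands in for arm_feature(env, ARM_FEATURE_EL3) */
} CpuModel;

typedef struct {
    bool secure_boot;   /* stands in for arm_boot_info::secure_boot */
} BootInfo;

/* Prefer the Secure view only if the CPU has one and we boot into it. */
static ASIdx boot_address_space_idx(const CpuModel *cpu,
                                    const BootInfo *info)
{
    assert(cpu && info);
    return (cpu->has_el3 && info->secure_boot) ? AS_IDX_S : AS_IDX_NS;
}
```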

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-3-peter.maydell@linaro.org
---
 hw/arm/boot.c | 119 +++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 76 insertions(+), 43 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@
 #define ARM64_TEXT_OFFSET_OFFSET    8
 #define ARM64_MAGIC_OFFSET          56
 
+static AddressSpace *arm_boot_address_space(ARMCPU *cpu,
+                                            const struct arm_boot_info *info)
+{
+    /* Return the address space to use for bootloader reads and writes.
+     * We prefer the secure address space if the CPU has it and we're
+     * going to boot the guest into it.
+     */
+    int asidx;
+    CPUState *cs = CPU(cpu);
+
+    if (arm_feature(&cpu->env, ARM_FEATURE_EL3) && info->secure_boot) {
+        asidx = ARMASIdx_S;
+    } else {
+        asidx = ARMASIdx_NS;
+    }
+
+    return cpu_get_address_space(cs, asidx);
+}
+
 typedef enum {
     FIXUP_NONE = 0,     /* do nothing */
     FIXUP_TERMINATOR,   /* end of insns */
@@ -XXX,XX +XXX,XX @@ static const ARMInsnFixup smpboot[] = {
 };
 
 static void write_bootloader(const char *name, hwaddr addr,
-                             const ARMInsnFixup *insns, uint32_t *fixupcontext)
+                             const ARMInsnFixup *insns, uint32_t *fixupcontext,
+                             AddressSpace *as)
 {
     /* Fix up the specified bootloader fragment and write it into
      * guest memory using rom_add_blob_fixed(). fixupcontext is
@@ -XXX,XX +XXX,XX @@ static void write_bootloader(const char *name, hwaddr addr,
         code[i] = tswap32(insn);
     }
 
-    rom_add_blob_fixed(name, code, len * sizeof(uint32_t), addr);
+    rom_add_blob_fixed_as(name, code, len * sizeof(uint32_t), addr, as);
 
     g_free(code);
 }
@@ -XXX,XX +XXX,XX @@ static void default_write_secondary(ARMCPU *cpu,
                                     const struct arm_boot_info *info)
 {
     uint32_t fixupcontext[FIXUP_MAX];
+    AddressSpace *as = arm_boot_address_space(cpu, info);
 
     fixupcontext[FIXUP_GIC_CPU_IF] = info->gic_cpu_if_addr;
     fixupcontext[FIXUP_BOOTREG] = info->smp_bootreg_addr;
@@ -XXX,XX +XXX,XX @@ static void default_write_secondary(ARMCPU *cpu,
     }
 
     write_bootloader("smpboot", info->smp_loader_start,
-                     smpboot, fixupcontext);
+                     smpboot, fixupcontext, as);
 }
 
 void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
                                             const struct arm_boot_info *info,
                                             hwaddr mvbar_addr)
 {
+    AddressSpace *as = arm_boot_address_space(cpu, info);
     int n;
     uint32_t mvbar_blob[] = {
         /* mvbar_addr: secure monitor vectors
@@ -XXX,XX +XXX,XX @@ void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
     for (n = 0; n < ARRAY_SIZE(mvbar_blob); n++) {
         mvbar_blob[n] = tswap32(mvbar_blob[n]);
     }
-    rom_add_blob_fixed("board-setup-mvbar", mvbar_blob, sizeof(mvbar_blob),
-                       mvbar_addr);
+    rom_add_blob_fixed_as("board-setup-mvbar", mvbar_blob, sizeof(mvbar_blob),
+                          mvbar_addr, as);
 
     for (n = 0; n < ARRAY_SIZE(board_setup_blob); n++) {
         board_setup_blob[n] = tswap32(board_setup_blob[n]);
     }
-    rom_add_blob_fixed("board-setup", board_setup_blob,
-                       sizeof(board_setup_blob), info->board_setup_addr);
+    rom_add_blob_fixed_as("board-setup", board_setup_blob,
+                          sizeof(board_setup_blob), info->board_setup_addr, as);
 }
 
 static void default_reset_secondary(ARMCPU *cpu,
                                     const struct arm_boot_info *info)
 {
+    AddressSpace *as = arm_boot_address_space(cpu, info);
     CPUState *cs = CPU(cpu);
 
-    address_space_stl_notdirty(&address_space_memory, info->smp_bootreg_addr,
+    address_space_stl_notdirty(as, info->smp_bootreg_addr,
                                0, MEMTXATTRS_UNSPECIFIED, NULL);
     cpu_set_pc(cs, info->smp_loader_start);
 }
@@ -XXX,XX +XXX,XX @@ static inline bool have_dtb(const struct arm_boot_info *info)
 }
 
 #define WRITE_WORD(p, value) do { \
-    address_space_stl_notdirty(&address_space_memory, p, value, \
+    address_space_stl_notdirty(as, p, value, \
                                MEMTXATTRS_UNSPECIFIED, NULL);  \
     p += 4;                       \
 } while (0)
 
-static void set_kernel_args(const struct arm_boot_info *info)
+static void set_kernel_args(const struct arm_boot_info *info, AddressSpace *as)
 {
     int initrd_size = info->initrd_size;
     hwaddr base = info->loader_start;
@@ -XXX,XX +XXX,XX @@ static void set_kernel_args(const struct arm_boot_info *info)
         int cmdline_size;
 
         cmdline_size = strlen(info->kernel_cmdline);
-        cpu_physical_memory_write(p + 8, info->kernel_cmdline,
-                                  cmdline_size + 1);
+        address_space_write(as, p + 8, MEMTXATTRS_UNSPECIFIED,
+                            (const uint8_t *)info->kernel_cmdline,
+                            cmdline_size + 1);
         cmdline_size = (cmdline_size >> 2) + 1;
         WRITE_WORD(p, cmdline_size + 2);
         WRITE_WORD(p, 0x54410009);
@@ -XXX,XX +XXX,XX @@ static void set_kernel_args(const struct arm_boot_info *info)
         atag_board_len = (info->atag_board(info, atag_board_buf) + 3) & ~3;
         WRITE_WORD(p, (atag_board_len + 8) >> 2);
         WRITE_WORD(p, 0x414f4d50);
-        cpu_physical_memory_write(p, atag_board_buf, atag_board_len);
+        address_space_write(as, p, MEMTXATTRS_UNSPECIFIED,
+                            atag_board_buf, atag_board_len);
         p += atag_board_len;
     }
     /* ATAG_END */
@@ -XXX,XX +XXX,XX @@ static void set_kernel_args(const struct arm_boot_info *info)
     WRITE_WORD(p, 0);
 }
 
-static void set_kernel_args_old(const struct arm_boot_info *info)
+static void set_kernel_args_old(const struct arm_boot_info *info,
+                                AddressSpace *as)
 {
     hwaddr p;
     const char *s;
@@ -XXX,XX +XXX,XX @@ static void set_kernel_args_old(const struct arm_boot_info *info)
     }
     s = info->kernel_cmdline;
     if (s) {
-        cpu_physical_memory_write(p, s, strlen(s) + 1);
+        address_space_write(as, p, MEMTXATTRS_UNSPECIFIED,
+                            (const uint8_t *)s, strlen(s) + 1);
     } else {
         WRITE_WORD(p, 0);
     }
@@ -XXX,XX +XXX,XX @@ static void fdt_add_psci_node(void *fdt)
  * @addr:       the address to load the image at
  * @binfo:      struct describing the boot environment
  * @addr_limit: upper limit of the available memory area at @addr
+ * @as:         address space to load image to
  *
  * Load a device tree supplied by the machine or by the user  with the
  * '-dtb' command line option, and put it at offset @addr in target
@@ -XXX,XX +XXX,XX @@ static void fdt_add_psci_node(void *fdt)
  * Note: Must not be called unless have_dtb(binfo) is true.
  */
 static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
-                    hwaddr addr_limit)
+                    hwaddr addr_limit, AddressSpace *as)
 {
     void *fdt = NULL;
     int size, rc;
@@ -XXX,XX +XXX,XX @@ static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     /* Put the DTB into the memory map as a ROM image: this will ensure
      * the DTB is copied again upon reset, even if addr points into RAM.
      */
-    rom_add_blob_fixed("dtb", fdt, size, addr);
+    rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
 
     g_free(fdt);
 
@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
             }
 
             if (cs == first_cpu) {
+                AddressSpace *as = arm_boot_address_space(cpu, info);
+
                 cpu_set_pc(cs, info->loader_start);
 
                 if (!have_dtb(info)) {
                     if (old_param) {
-                        set_kernel_args_old(info);
+                        set_kernel_args_old(info, as);
                     } else {
-                        set_kernel_args(info);
+                        set_kernel_args(info, as);
                     }
                 }
             } else {
@@ -XXX,XX +XXX,XX @@ static int do_arm_linux_init(Object *obj, void *opaque)
 
 static uint64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry,
                              uint64_t *lowaddr, uint64_t *highaddr,
-                             int elf_machine)
+                             int elf_machine, AddressSpace *as)
 {
     bool elf_is64;
     union {
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry,
         }
     }
 
-    ret = load_elf(info->kernel_filename, NULL, NULL,
-                   pentry, lowaddr, highaddr, big_endian, elf_machine,
-                   1, data_swab);
+    ret = load_elf_as(info->kernel_filename, NULL, NULL,
+                      pentry, lowaddr, highaddr, big_endian, elf_machine,
+                      1, data_swab, as);
     if (ret <= 0) {
         /* The header loaded but the image didn't */
         exit(1);
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry,
 }
 
 static uint64_t load_aarch64_image(const char *filename, hwaddr mem_base,
-                                   hwaddr *entry)
+                                   hwaddr *entry, AddressSpace *as)
 {
     hwaddr kernel_load_offset = KERNEL64_LOAD_ADDR;
     uint8_t *buffer;
@@ -XXX,XX +XXX,XX @@ static uint64_t load_aarch64_image(const char *filename, hwaddr mem_base,
     }
 
     *entry = mem_base + kernel_load_offset;
-    rom_add_blob_fixed(filename, buffer, size, *entry);
+    rom_add_blob_fixed_as(filename, buffer, size, *entry, as);
 
     g_free(buffer);
 
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
     ARMCPU *cpu = n->cpu;
     struct arm_boot_info *info =
         container_of(n, struct arm_boot_info, load_kernel_notifier);
+    AddressSpace *as = arm_boot_address_space(cpu, info);
 
     /* The board code is not supposed to set secure_board_setup unless
      * running its code in secure mode is actually possible, and KVM
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
              * the kernel is supposed to be loaded by the bootloader), copy the
              * DTB to the base of RAM for the bootloader to pick up.
              */
-            if (load_dtb(info->loader_start, info, 0) < 0) {
+            if (load_dtb(info->loader_start, info, 0, as) < 0) {
                 exit(1);
             }
         }
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
 
     /* Assume that raw images are linux kernels, and ELF images are not.  */
     kernel_size = arm_load_elf(info, &elf_entry, &elf_low_addr,
-                               &elf_high_addr, elf_machine);
+                               &elf_high_addr, elf_machine, as);
     if (kernel_size > 0 && have_dtb(info)) {
         /* If there is still some room left at the base of RAM, try and put
          * the DTB there like we do for images loaded with -bios or -pflash.
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
             if (elf_low_addr < info->loader_start) {
                 elf_low_addr = 0;
             }
-            if (load_dtb(info->loader_start, info, elf_low_addr) < 0) {
+            if (load_dtb(info->loader_start, info, elf_low_addr, as) < 0) {
                 exit(1);
             }
         }
     }
     entry = elf_entry;
     if (kernel_size < 0) {
-        kernel_size = load_uimage(info->kernel_filename, &entry, NULL,
-                                  &is_linux, NULL, NULL);
+        kernel_size = load_uimage_as(info->kernel_filename, &entry, NULL,
+                                     &is_linux, NULL, NULL, as);
     }
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64) && kernel_size < 0) {
         kernel_size = load_aarch64_image(info->kernel_filename,
-                                         info->loader_start, &entry);
+                                         info->loader_start, &entry, as);
         is_linux = 1;
     } else if (kernel_size < 0) {
         /* 32-bit ARM */
         entry = info->loader_start + KERNEL_LOAD_ADDR;
-        kernel_size = load_image_targphys(info->kernel_filename, entry,
-                                          info->ram_size - KERNEL_LOAD_ADDR);
+        kernel_size = load_image_targphys_as(info->kernel_filename, entry,
+                                             info->ram_size - KERNEL_LOAD_ADDR,
+                                             as);
         is_linux = 1;
     }
     if (kernel_size < 0) {
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
         uint32_t fixupcontext[FIXUP_MAX];
 
         if (info->initrd_filename) {
-            initrd_size = load_ramdisk(info->initrd_filename,
-                                       info->initrd_start,
-                                       info->ram_size -
-                                       info->initrd_start);
+            initrd_size = load_ramdisk_as(info->initrd_filename,
+                                          info->initrd_start,
+                                          info->ram_size - info->initrd_start,
+                                          as);
             if (initrd_size < 0) {
-                initrd_size = load_image_targphys(info->initrd_filename,
-                                                  info->initrd_start,
-                                                  info->ram_size -
-                                                  info->initrd_start);
+                initrd_size = load_image_targphys_as(info->initrd_filename,
+                                                     info->initrd_start,
+                                                     info->ram_size -
+                                                     info->initrd_start,
+                                                     as);
             }
             if (initrd_size < 0) {
                 error_report("could not load initrd '%s'",
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
 
             /* Place the DTB after the initrd in memory with alignment. */
             dtb_start = QEMU_ALIGN_UP(info->initrd_start + initrd_size, align);
-            if (load_dtb(dtb_start, info, 0) < 0) {
+            if (load_dtb(dtb_start, info, 0, as) < 0) {
                 exit(1);
             }
             fixupcontext[FIXUP_ARGPTR] = dtb_start;
@@ -XXX,XX +XXX,XX @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
         fixupcontext[FIXUP_ENTRYPOINT] = entry;
 
         write_bootloader("bootloader", info->loader_start,
-                         primary_loader, fixupcontext);
+                         primary_loader, fixupcontext, as);
 
         if (info->nb_cpus > 1) {
             info->write_secondary_boot(cpu, info);
-- 
2.16.2

Instead of loading guest images to the system address space, use the
CPU's address space.  This is important if we're trying to load the
file to memory or via an alias memory region that is provided by an
SoC object and thus not mapped into the system address space.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-4-peter.maydell@linaro.org
---
 hw/arm/armv7m.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -XXX,XX +XXX,XX @@ void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size)
     uint64_t entry;
     uint64_t lowaddr;
     int big_endian;
+    AddressSpace *as;
+    int asidx;
+    CPUState *cs = CPU(cpu);
 
 #ifdef TARGET_WORDS_BIGENDIAN
     big_endian = 1;
@@ -XXX,XX +XXX,XX @@ void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size)
         exit(1);
     }
 
+    if (arm_feature(&cpu->env, ARM_FEATURE_EL3)) {
+        asidx = ARMASIdx_S;
+    } else {
+        asidx = ARMASIdx_NS;
+    }
+    as = cpu_get_address_space(cs, asidx);
+
     if (kernel_filename) {
-        image_size = load_elf(kernel_filename, NULL, NULL, &entry, &lowaddr,
-                              NULL, big_endian, EM_ARM, 1, 0);
+        image_size = load_elf_as(kernel_filename, NULL, NULL, &entry, &lowaddr,
+                                 NULL, big_endian, EM_ARM, 1, 0, as);
         if (image_size < 0) {
-            image_size = load_image_targphys(kernel_filename, 0, mem_size);
+            image_size = load_image_targphys_as(kernel_filename, 0,
+                                                mem_size, as);
             lowaddr = 0;
         }
         if (image_size < 0) {
-- 
2.16.2

In v8M, the Implementation Defined Attribution Unit (IDAU) is
a small piece of hardware typically implemented in the SoC
which provides board or SoC specific security attribution
information for each address that the CPU performs MPU/SAU
checks on. For QEMU, we model this with a QOM interface which
is implemented by the board or SoC object and connected to
the CPU using a link property.

This commit defines the new interface class, adds the link
property to the CPU object, and makes the SAU checking
code call the IDAU interface if one is present.
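
The interface and its override rule can be illustrated without QOM. In this sketch a plain function pointer (`idau_check_fn`) stands in for `IDAUInterfaceClass::check`, `example_board_check()` is an invented board policy, and `SecAttrs` keeps only the `ns`/`nsc` fields. The merge step follows the logic in the helper.c hunk: the IDAU can only raise the security level (NonSecure < NonSecure-callable < Secure), never lower it. Exemption handling and region reporting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IREGION_NOTVALID -1

/* Hypothetical, QOM-free stand-in for IDAUInterfaceClass::check. */
typedef void (*idau_check_fn)(uint32_t address, int *iregion,
                              bool *exempt, bool *ns, bool *nsc);

typedef struct {
    bool ns;    /* NonSecure */
    bool nsc;   /* NonSecure-callable (only meaningful when !ns) */
} SecAttrs;

/* Invented board policy: odd values of bits [31:28] are NonSecure,
 * even values are Secure; the region number is bits [31:28]. */
static void example_board_check(uint32_t address, int *iregion,
                                bool *exempt, bool *ns, bool *nsc)
{
    *iregion = (int)(address >> 28);
    *exempt = false;
    *ns = (address >> 28) & 1;
    *nsc = false;
}

/* Apply an optional IDAU result on top of an SAU result. */
static void apply_idau(SecAttrs *sattrs, idau_check_fn check,
                       uint32_t address)
{
    /* Defaults used when no IDAU is connected: no effect on the result. */
    int iregion = IREGION_NOTVALID;
    bool exempt = false, idau_ns = true, idau_nsc = true;

    assert(sattrs != NULL);
    if (check) {
        /* exempt and iregion are collected but unused in this sketch */
        check(address, &iregion, &exempt, &idau_ns, &idau_nsc);
    }
    if (!idau_ns) {
        if (sattrs->ns || (!idau_nsc && sattrs->nsc)) {
            sattrs->ns = false;
            sattrs->nsc = idau_nsc;
        }
    }
}
```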

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-5-peter.maydell@linaro.org
---
 target/arm/cpu.h    |  3 +++
 target/arm/idau.h   | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/cpu.c    | 15 +++++++++++++
 target/arm/helper.c | 28 +++++++++++++++++++++---
 4 files changed, 104 insertions(+), 3 deletions(-)
 create mode 100644 target/arm/idau.h

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
     /* MemoryRegion to use for secure physical accesses */
     MemoryRegion *secure_memory;
 
+    /* For v8M, pointer to the IDAU interface provided by board/SoC */
+    Object *idau;
+
     /* 'compatible' string for this CPU for Linux device trees */
     const char *dtb_compatible;
 
diff --git a/target/arm/idau.h b/target/arm/idau.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/idau.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU ARM CPU -- interface for the Arm v8M IDAU
+ *
+ * Copyright (c) 2018 Linaro Ltd
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see
+ * <http://www.gnu.org/licenses/gpl-2.0.html>
+ *
+ * In the v8M architecture, the IDAU is a small piece of hardware
+ * typically implemented in the SoC which provides board or SoC
+ * specific security attribution information for each address that
+ * the CPU performs MPU/SAU checks on. For QEMU, we model this with a
+ * QOM interface which is implemented by the board or SoC object and
+ * connected to the CPU using a link property.
+ */
+
+#ifndef TARGET_ARM_IDAU_H
+#define TARGET_ARM_IDAU_H
+
+#include "qom/object.h"
+
+#define TYPE_IDAU_INTERFACE "idau-interface"
+#define IDAU_INTERFACE(obj) \
+    INTERFACE_CHECK(IDAUInterface, (obj), TYPE_IDAU_INTERFACE)
+#define IDAU_INTERFACE_CLASS(class) \
+    OBJECT_CLASS_CHECK(IDAUInterfaceClass, (class), TYPE_IDAU_INTERFACE)
+#define IDAU_INTERFACE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(IDAUInterfaceClass, (obj), TYPE_IDAU_INTERFACE)
+
+typedef struct IDAUInterface {
+    Object parent;
+} IDAUInterface;
+
+#define IREGION_NOTVALID -1
+
+typedef struct IDAUInterfaceClass {
+    InterfaceClass parent;
+
+    /* Check the specified address and return the IDAU security information
+     * for it by filling in iregion, exempt, ns and nsc:
+     *  iregion: IDAU region number, or IREGION_NOTVALID if not valid
+     *  exempt: true if address is exempt from security attribution
+     *  ns: true if the address is NonSecure
+     *  nsc: true if the address is NonSecure-callable
+     */
+    void (*check)(IDAUInterface *ii, uint32_t address, int *iregion,
+                  bool *exempt, bool *ns, bool *nsc);
+} IDAUInterfaceClass;
+
+#endif
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "target/arm/idau.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "cpu.h"
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_post_init(Object *obj)
         }
     }
 
+    if (arm_feature(&cpu->env, ARM_FEATURE_M_SECURITY)) {
+        object_property_add_link(obj, "idau", TYPE_IDAU_INTERFACE, &cpu->idau,
+                                 qdev_prop_allow_set_link_before_realize,
+                                 OBJ_PROP_LINK_UNREF_ON_RELEASE,
+                                 &error_abort);
+    }
+
     qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property,
                              &error_abort);
 }
@@ -XXX,XX +XXX,XX @@ static const TypeInfo arm_cpu_type_info = {
     .class_init = arm_cpu_class_init,
 };
 
+static const TypeInfo idau_interface_type_info = {
+    .name = TYPE_IDAU_INTERFACE,
+    .parent = TYPE_INTERFACE,
+    .class_size = sizeof(IDAUInterfaceClass),
+};
+
 static void arm_cpu_register_types(void)
 {
     const ARMCPUInfo *info = arm_cpus;
 
     type_register_static(&arm_cpu_type_info);
+    type_register_static(&idau_interface_type_info);
 
     while (info->name) {
         cpu_register(info);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/osdep.h"
+#include "target/arm/idau.h"
 #include "trace.h"
 #include "cpu.h"
 #include "internals.h"
@@ -XXX,XX +XXX,XX @@ static void v8m_security_lookup(CPUARMState *env, uint32_t address,
      */
     ARMCPU *cpu = arm_env_get_cpu(env);
     int r;
+    bool idau_exempt = false, idau_ns = true, idau_nsc = true;
+    int idau_region = IREGION_NOTVALID;
 
-    /* TODO: implement IDAU */
+    if (cpu->idau) {
+        IDAUInterfaceClass *iic = IDAU_INTERFACE_GET_CLASS(cpu->idau);
+        IDAUInterface *ii = IDAU_INTERFACE(cpu->idau);
+
+        iic->check(ii, address, &idau_region, &idau_exempt, &idau_ns,
+                   &idau_nsc);
+    }
 
     if (access_type == MMU_INST_FETCH && extract32(address, 28, 4) == 0xf) {
         /* 0xf0000000..0xffffffff is always S for insn fetches */
         return;
     }
 
-    if (v8m_is_sau_exempt(env, address, access_type)) {
+    if (idau_exempt || v8m_is_sau_exempt(env, address, access_type)) {
         sattrs->ns = !regime_is_secure(env, mmu_idx);
         return;
     }
 
+    if (idau_region != IREGION_NOTVALID) {
+        sattrs->irvalid = true;
+        sattrs->iregion = idau_region;
+    }
+
     switch (env->sau.ctrl & 3) {
     case 0: /* SAU.ENABLE == 0, SAU.ALLNS == 0 */
         break;
@@ -XXX,XX +XXX,XX @@ static void v8m_security_lookup(CPUARMState *env, uint32_t address,
             }
         }
 
-        /* TODO when we support the IDAU then it may override the result here */
+        /* The IDAU will override the SAU lookup results if it specifies
+         * higher security than the SAU does.
+         */
+        if (!idau_ns) {
+            if (sattrs->ns || (!idau_nsc && sattrs->nsc)) {
+                sattrs->ns = false;
+                sattrs->nsc = idau_nsc;
+            }
+        }
         break;
     }
 }
-- 
2.16.2

Create an "idau" property on the armv7m container object which
we can forward to the CPU object. Annoyingly, we can't use
object_property_add_alias() because the CPU object we want to
forward to doesn't exist until the armv7m container is realized.
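Here is a minimal standalone sketch of why the value has to be stored on the container and copied across during realize. The CPU object is only created inside realize, so before that point there is nothing to alias. ModelCPU, ModelContainer and model_container_realize() are hypothetical names used for illustration; they are not QEMU APIs. The has_idau_prop flag stands in for the object_property_find() test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the CPU and the armv7m container; these are
 * not QEMU types, only enough to show the ordering problem. */
typedef struct {
    bool has_idau_prop;   /* does this CPU type expose an "idau" property? */
    void *idau;           /* value of that property once it has been set */
} ModelCPU;

typedef struct {
    void *idau;           /* set by the board before realize */
    ModelCPU *cpu;        /* does not exist until realize, so no alias target */
} ModelContainer;

/* Same shape as armv7m_realize(): create the CPU first, then copy the
 * stored property value across, but only if the CPU type supports it. */
static void model_container_realize(ModelContainer *s, bool cpu_has_idau)
{
    s->cpu = calloc(1, sizeof(*s->cpu));
    assert(s->cpu != NULL);
    s->cpu->has_idau_prop = cpu_has_idau;
    if (s->cpu->has_idau_prop) {
        s->cpu->idau = s->idau;
    }
}
```

Silently skipping CPUs without the property lets the same container serve v7M cores, which have no IDAU, as well as v8M cores.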

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-6-peter.maydell@linaro.org
---
 include/hw/arm/armv7m.h | 3 +++
 hw/arm/armv7m.c         | 9 +++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/armv7m.h
+++ b/include/hw/arm/armv7m.h
@@ -XXX,XX +XXX,XX @@
 
 #include "hw/sysbus.h"
 #include "hw/intc/armv7m_nvic.h"
+#include "target/arm/idau.h"
 
 #define TYPE_BITBAND "ARM,bitband-memory"
 #define BITBAND(obj) OBJECT_CHECK(BitBandState, (obj), TYPE_BITBAND)
@@ -XXX,XX +XXX,XX @@ typedef struct {
  * + Property "memory": MemoryRegion defining the physical address space
  *   that CPU accesses see. (The NVIC, bitbanding and other CPU-internal
  *   devices will be automatically layered on top of this view.)
+ * + Property "idau": IDAU interface (forwarded to CPU object)
  */
 typedef struct ARMv7MState {
     /*< private >*/
@@ -XXX,XX +XXX,XX @@ typedef struct ARMv7MState {
     char *cpu_type;
     /* MemoryRegion the board provides to us (with its devices, RAM, etc) */
     MemoryRegion *board_memory;
+    Object *idau;
 } ARMv7MState;
 
 #endif
diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/qtest.h"
 #include "qemu/error-report.h"
 #include "exec/address-spaces.h"
+#include "target/arm/idau.h"
 
 /* Bitbanded IO.  Each word corresponds to a single bit.  */
 
@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
 
     object_property_set_link(OBJECT(s->cpu), OBJECT(&s->container), "memory",
                              &error_abort);
+    if (object_property_find(OBJECT(s->cpu), "idau", NULL)) {
+        object_property_set_link(OBJECT(s->cpu), s->idau, "idau", &err);
+        if (err != NULL) {
+            error_propagate(errp, err);
+            return;
+        }
+    }
     object_property_set_bool(OBJECT(s->cpu), true, "realized", &err);
     if (err != NULL) {
         error_propagate(errp, err);
@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
     DEFINE_PROP_STRING("cpu-type", ARMv7MState, cpu_type),
     DEFINE_PROP_LINK("memory", ARMv7MState, board_memory, TYPE_MEMORY_REGION,
                      MemoryRegion *),
+    DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.16.2

The Cortex-M33 allows the system to specify the reset value of the
secure Vector Table Offset Register (VTOR) by asserting config
signals. In particular, guest images for the MPS2 AN505 board rely
on the MPS2's initial VTOR being correct for that board.
Implement a QEMU property so that board and SoC code can set the
reset value correctly.
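The following is a rough model of the reset sequence this enables. It assumes the CPU resets into Secure state, which it does when ARM_FEATURE_M_SECURITY is set. The Secure VTOR takes init-svtor with its low 7 bits cleared, and the initial SP and PC are then read from offsets 0 and 4 of that table. The flat memory array and the model_* helpers are hypothetical. The SP mask matches the patch. Clearing bit 0 of the PC is an assumption of this sketch, because on M profile that bit selects Thumb state rather than forming part of the address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat memory image standing in for guest physical memory. */
#define MODEL_MEM_SIZE 0x1000u

/* Read a little-endian 32-bit word, as ldl_phys() does for this target. */
static uint32_t model_ldl(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8) |
           ((uint32_t)mem[addr + 2] << 16) | ((uint32_t)mem[addr + 3] << 24);
}

/* VTOR_S = init_svtor with bits [6:0] cleared; SP and PC are then
 * fetched from offsets 0 and 4 of the vector table at that address. */
static void model_m_reset(const uint8_t *mem, uint32_t init_svtor,
                          uint32_t *vtor_s, uint32_t *sp, uint32_t *pc)
{
    uint32_t vecbase = init_svtor & 0xffffff80u;

    assert(vecbase + 8 <= MODEL_MEM_SIZE);
    *vtor_s = vecbase;
    *sp = model_ldl(mem, vecbase) & 0xfffffffcu;   /* word-aligned SP */
    *pc = model_ldl(mem, vecbase + 4) & ~1u;       /* drop the Thumb bit */
}
```

For example, an init-svtor of 0x27f yields a VTOR_S of 0x200, so SP and PC come from 0x200 and 0x204 instead of from address zero.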

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-7-peter.maydell@linaro.org
---
 target/arm/cpu.h |  3 +++
 target/arm/cpu.c | 18 ++++++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
      */
     uint32_t psci_conduit;
 
+    /* For v8M, initial value of the Secure VTOR */
+    uint32_t init_svtor;
+
     /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
      * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
      */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
         uint32_t initial_msp; /* Loaded from 0x0 */
         uint32_t initial_pc; /* Loaded from 0x4 */
         uint8_t *rom;
+        uint32_t vecbase;
 
         if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
             env->v7m.secure = true;
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
         /* Unlike A/R profile, M profile defines the reset LR value */
         env->regs[14] = 0xffffffff;
 
-        /* Load the initial SP and PC from the vector table at address 0 */
-        rom = rom_ptr(0);
+        env->v7m.vecbase[M_REG_S] = cpu->init_svtor & 0xffffff80;
+
+        /* Load the initial SP and PC from offset 0 and 4 in the vector table */
+        vecbase = env->v7m.vecbase[env->v7m.secure];
+        rom = rom_ptr(vecbase);
         if (rom) {
             /* Address zero is covered by ROM which hasn't yet been
              * copied into physical memory.
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
              * it got copied into memory. In the latter case, rom_ptr
              * will return a NULL pointer and we should use ldl_phys instead.
              */
-            initial_msp = ldl_phys(s->as, 0);
-            initial_pc = ldl_phys(s->as, 4);
+            initial_msp = ldl_phys(s->as, vecbase);
+            initial_pc = ldl_phys(s->as, vecbase + 4);
         }
 
         env->regs[13] = initial_msp & 0xFFFFFFFC;
@@ -XXX,XX +XXX,XX @@ static Property arm_cpu_pmsav7_dregion_property =
                                            pmsav7_dregion,
                                            qdev_prop_uint32, uint32_t);
 
+/* M profile: initial value of the Secure VTOR */
+static Property arm_cpu_initsvtor_property =
+            DEFINE_PROP_UINT32("init-svtor", ARMCPU, init_svtor, 0);
+
 static void arm_cpu_post_init(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_post_init(Object *obj)
                                  qdev_prop_allow_set_link_before_realize,
                                  OBJ_PROP_LINK_UNREF_ON_RELEASE,
                                  &error_abort);
+        qdev_property_add_static(DEVICE(obj), &arm_cpu_initsvtor_property,
+                                 &error_abort);
     }
 
     qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property,
-- 
2.16.2

Create an "init-svtor" property on the armv7m container
object which we can forward to the CPU object.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-8-peter.maydell@linaro.org
---
 include/hw/arm/armv7m.h | 2 ++
 hw/arm/armv7m.c         | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/armv7m.h
+++ b/include/hw/arm/armv7m.h
@@ -XXX,XX +XXX,XX @@ typedef struct {
  *   that CPU accesses see. (The NVIC, bitbanding and other CPU-internal
  *   devices will be automatically layered on top of this view.)
  * + Property "idau": IDAU interface (forwarded to CPU object)
+ * + Property "init-svtor": secure VTOR reset value (forwarded to CPU object)
  */
 typedef struct ARMv7MState {
     /*< private >*/
@@ -XXX,XX +XXX,XX @@ typedef struct ARMv7MState {
     /* MemoryRegion the board provides to us (with its devices, RAM, etc) */
     MemoryRegion *board_memory;
     Object *idau;
+    uint32_t init_svtor;
 } ARMv7MState;
 
 #endif
diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
             return;
         }
     }
+    if (object_property_find(OBJECT(s->cpu), "init-svtor", NULL)) {
+        object_property_set_uint(OBJECT(s->cpu), s->init_svtor,
+                                 "init-svtor", &err);
+        if (err != NULL) {
+            error_propagate(errp, err);
+            return;
+        }
+    }
     object_property_set_bool(OBJECT(s->cpu), true, "realized", &err);
     if (err != NULL) {
         error_propagate(errp, err);
@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
     DEFINE_PROP_LINK("memory", ARMv7MState, board_memory, TYPE_MEMORY_REGION,
                      MemoryRegion *),
     DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
+    DEFINE_PROP_UINT32("init-svtor", ARMv7MState, init_svtor, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.16.2

Add a Cortex-M33 definition. The M33 is an M profile CPU
which implements the ARM v8M architecture, including the
M profile Security Extension.
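As a worked check of the MIDR value used below (0x410fd213), the register can be split into its architectural fields. The decode_midr() helper is for illustration only. Its field boundaries follow the Arm MIDR/CPUID layout.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of the Main ID Register (CPUID on M profile). */
typedef struct {
    uint32_t implementer;   /* bits [31:24]; 0x41 is Arm */
    uint32_t variant;       /* bits [23:20]; the "rN" in rNpM */
    uint32_t architecture;  /* bits [19:16] */
    uint32_t partnum;       /* bits [15:4]; identifies the core */
    uint32_t revision;      /* bits [3:0]; the "pM" in rNpM */
} MidrFields;

static MidrFields decode_midr(uint32_t midr)
{
    MidrFields f;

    f.implementer = (midr >> 24) & 0xff;
    f.variant = (midr >> 20) & 0xf;
    f.architecture = (midr >> 16) & 0xf;
    f.partnum = (midr >> 4) & 0xfff;
    f.revision = midr & 0xf;
    return f;
}
```

For 0x410fd213 this gives implementer 0x41, variant 0 and revision 3, which is the r0p3 noted in the code comment, with part number 0xd21.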

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-9-peter.maydell@linaro.org
---
 target/arm/cpu.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
     cpu->id_isar5 = 0x00000000;
 }
 
+static void cortex_m33_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+
+    set_feature(&cpu->env, ARM_FEATURE_V8);
+    set_feature(&cpu->env, ARM_FEATURE_M);
+    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
+    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+    cpu->midr = 0x410fd213; /* r0p3 */
+    cpu->pmsav7_dregion = 16;
+    cpu->sau_sregion = 8;
+    cpu->id_pfr0 = 0x00000030;
+    cpu->id_pfr1 = 0x00000210;
+    cpu->id_dfr0 = 0x00200000;
+    cpu->id_afr0 = 0x00000000;
+    cpu->id_mmfr0 = 0x00101F40;
+    cpu->id_mmfr1 = 0x00000000;
+    cpu->id_mmfr2 = 0x01000000;
+    cpu->id_mmfr3 = 0x00000000;
+    cpu->id_isar0 = 0x01101110;
+    cpu->id_isar1 = 0x02212000;
+    cpu->id_isar2 = 0x20232232;
+    cpu->id_isar3 = 0x01111131;
+    cpu->id_isar4 = 0x01310132;
+    cpu->id_isar5 = 0x00000000;
+    cpu->clidr = 0x00000000;
+    cpu->ctr = 0x8000c000;
+}
+
 static void arm_v7m_class_init(ObjectClass *oc, void *data)
 {
     CPUClass *cc = CPU_CLASS(oc);
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
                              .class_init = arm_v7m_class_init },
     { .name = "cortex-m4",   .initfn = cortex_m4_initfn,
                              .class_init = arm_v7m_class_init },
+    { .name = "cortex-m33",  .initfn = cortex_m33_initfn,
+                             .class_init = arm_v7m_class_init },
     { .name = "cortex-r5",   .initfn = cortex_r5_initfn },
     { .name = "cortex-a7",   .initfn = cortex_a7_initfn },
     { .name = "cortex-a8",   .initfn = cortex_a8_initfn },
-- 
2.16.2

Move the definition of the struct for the unimplemented-device
from unimp.c to unimp.h, so that users can embed the struct
in their own device structs if they prefer.
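The following short sketch shows what embedding buys a user. All type names are hypothetical stand-ins, not the real QEMU structs. A struct member of struct type only compiles when the complete definition is visible, which moving the definition into unimp.h provides. Once the device is embedded, it is allocated along with its parent, and the parent can be recovered from a member pointer in the same way as with QEMU's container_of() macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UnimplementedDeviceState. */
typedef struct {
    const char *name;
    uint64_t size;
} ModelUnimpState;

/* Hypothetical SoC state embedding the dummy device by value, so there
 * is no separate allocation and no matching free to get wrong. */
typedef struct {
    uint32_t ctrl_reg;
    ModelUnimpState unimp;
} ModelSoCState;

/* Same idea as container_of(): step back from the embedded member to
 * the start of the enclosing struct. */
#define MODEL_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
```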

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-10-peter.maydell@linaro.org
---
 include/hw/misc/unimp.h | 10 ++++++++++
 hw/misc/unimp.c         | 10 ----------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/hw/misc/unimp.h b/include/hw/misc/unimp.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/unimp.h
+++ b/include/hw/misc/unimp.h
@@ -XXX,XX +XXX,XX @@
 
 #define TYPE_UNIMPLEMENTED_DEVICE "unimplemented-device"
 
+#define UNIMPLEMENTED_DEVICE(obj) \
+    OBJECT_CHECK(UnimplementedDeviceState, (obj), TYPE_UNIMPLEMENTED_DEVICE)
+
+typedef struct {
+    SysBusDevice parent_obj;
+    MemoryRegion iomem;
+    char *name;
+    uint64_t size;
+} UnimplementedDeviceState;
+
 /**
  * create_unimplemented_device: create and map a dummy device
  * @name: name of the device for debug logging
diff --git a/hw/misc/unimp.c b/hw/misc/unimp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/unimp.c
+++ b/hw/misc/unimp.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/log.h"
 #include "qapi/error.h"
 
-#define UNIMPLEMENTED_DEVICE(obj) \
-    OBJECT_CHECK(UnimplementedDeviceState, (obj), TYPE_UNIMPLEMENTED_DEVICE)
-
-typedef struct {
-    SysBusDevice parent_obj;
-    MemoryRegion iomem;
-    char *name;
-    uint64_t size;
-} UnimplementedDeviceState;
-
 static uint64_t unimp_read(void *opaque, hwaddr offset, unsigned size)
 {
     UnimplementedDeviceState *s = UNIMPLEMENTED_DEVICE(opaque);
-- 
2.16.2

The function qdev_init_gpio_in_named() passes the DeviceState pointer
as the opaque data pointer for the irq handler function.  Usually
this is what you want, but in some cases it would be helpful to use
some other data pointer.

Add a new function qdev_init_gpio_in_named_with_opaque() which allows
the caller to specify the data pointer they want.
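Here is a minimal standalone model of the difference, assuming (as in the patch) that every line in a named set shares one opaque pointer and is distinguished by its index n. The handler can receive a sub-block of the device directly, without first recovering it from the parent. ModelIrq, model_init_gpio_in_with_opaque() and the other names are hypothetical and do not correspond to QEMU's internal representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an input line: a handler plus the opaque
 * pointer and line index it will be called with. */
typedef void (*ModelIrqHandler)(void *opaque, int n, int level);

typedef struct {
    ModelIrqHandler handler;
    void *opaque;
    int n;
} ModelIrq;

typedef struct {
    int levels[4];
} ModelSubBlock;

typedef struct {
    int other_state;
    ModelSubBlock sub;
    ModelIrq in[4];
} ModelDevice;

static void model_set_irq(ModelIrq *irq, int level)
{
    irq->handler(irq->opaque, irq->n, level);
}

/* Model of the new helper: all lines in the set share one opaque
 * pointer chosen by the caller and are told apart by n. */
static void model_init_gpio_in_with_opaque(ModelIrq *lines, int count,
                                           ModelIrqHandler handler,
                                           void *opaque)
{
    for (int i = 0; i < count; i++) {
        lines[i].handler = handler;
        lines[i].opaque = opaque;
        lines[i].n = i;
    }
}

static void sub_block_handler(void *opaque, int n, int level)
{
    ModelSubBlock *sb = opaque;  /* opaque is the sub-block, not the device */
    sb->levels[n] = level;
}
```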

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-12-peter.maydell@linaro.org
---
 include/hw/qdev-core.h | 30 ++++++++++++++++++++++++++++--
 hw/core/qdev.c         |  8 +++++---
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -XXX,XX +XXX,XX @@ BusState *qdev_get_child_bus(DeviceState *dev, const char *name);
 /* GPIO inputs also double as IRQ sinks.  */
 void qdev_init_gpio_in(DeviceState *dev, qemu_irq_handler handler, int n);
 void qdev_init_gpio_out(DeviceState *dev, qemu_irq *pins, int n);
-void qdev_init_gpio_in_named(DeviceState *dev, qemu_irq_handler handler,
-                             const char *name, int n);
 void qdev_init_gpio_out_named(DeviceState *dev, qemu_irq *pins,
                               const char *name, int n);
+/**
+ * qdev_init_gpio_in_named_with_opaque: create an array of input GPIO lines
+ *   for the specified device
+ *
+ * @dev: Device to create input GPIOs for
+ * @handler: Function to call when GPIO line value is set
+ * @opaque: Opaque data pointer to pass to @handler
+ * @name: Name of the GPIO input (must be unique for this device)
+ * @n: Number of GPIO lines in this input set
+ */
+void qdev_init_gpio_in_named_with_opaque(DeviceState *dev,
+                                         qemu_irq_handler handler,
+                                         void *opaque,
+                                         const char *name, int n);
+
+/**
+ * qdev_init_gpio_in_named: create an array of input GPIO lines
+ *   for the specified device
+ *
+ * Like qdev_init_gpio_in_named_with_opaque(), but the opaque pointer
+ * passed to the handler is @dev (which is the most commonly desired behaviour).
+ */
+static inline void qdev_init_gpio_in_named(DeviceState *dev,
+                                           qemu_irq_handler handler,
+                                           const char *name, int n)
+{
+    qdev_init_gpio_in_named_with_opaque(dev, handler, dev, name, n);
+}
 
 void qdev_pass_gpios(DeviceState *dev, DeviceState *container,
                      const char *name);
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -XXX,XX +XXX,XX @@ static NamedGPIOList *qdev_get_named_gpio_list(DeviceState *dev,
     return ngl;
 }
 
-void qdev_init_gpio_in_named(DeviceState *dev, qemu_irq_handler handler,
-                             const char *name, int n)
+void qdev_init_gpio_in_named_with_opaque(DeviceState *dev,
+                                         qemu_irq_handler handler,
+                                         void *opaque,
+                                         const char *name, int n)
 {
     int i;
     NamedGPIOList *gpio_list = qdev_get_named_gpio_list(dev, name);
 
     assert(gpio_list->num_out == 0 || !name);
     gpio_list->in = qemu_extend_irqs(gpio_list->in, gpio_list->num_in, handler,
-                                     dev, n);
+                                     opaque, n);
 
     if (!name) {
         name = "unnamed-gpio-in";
-- 
2.16.2

In some board or SoC models it is necessary to split a qemu_irq line
so that one input can feed multiple outputs.  We currently have
qemu_irq_split() for this, but that has two deficiencies:
 * it can only handle splitting a line into two
 * it unavoidably leaks memory, so it can't be used
   in a device that can be deleted

Implement a qdev device that encapsulates splitting of IRQs, with a
configurable number of outputs.  (This is in some ways the inverse of
the TYPE_OR_IRQ device.)
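The forwarding behaviour can be modelled in a few lines of plain C, independent of QOM. ModelSplitter and model_splitter_set() are hypothetical names. Skipping NULL outputs mirrors qemu_set_irq(), which does nothing when passed a NULL line, so an output that the board leaves unconnected is harmless.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_SPLIT_LINES 16

typedef struct {
    int level;   /* last level driven onto this output */
} ModelOut;

typedef struct {
    ModelOut *out[MODEL_MAX_SPLIT_LINES];
    int num_lines;
} ModelSplitter;

/* Input handler: forward the new level to every configured output. */
static void model_splitter_set(ModelSplitter *s, int level)
{
    assert(s->num_lines >= 1 && s->num_lines <= MODEL_MAX_SPLIT_LINES);
    for (int i = 0; i < s->num_lines; i++) {
        if (s->out[i]) {     /* unconnected outputs are simply skipped */
            s->out[i]->level = level;
        }
    }
}
```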

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-13-peter.maydell@linaro.org
---
 hw/core/Makefile.objs       |  1 +
 include/hw/core/split-irq.h | 57 +++++++++++++++++++++++++++++
 include/hw/irq.h            |  4 +-
 hw/core/split-irq.c         | 89 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 150 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/core/split-irq.h
 create mode 100644 hw/core/split-irq.c

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_FITLOADER) += loader-fit.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
 common-obj-$(CONFIG_SOFTMMU) += register.o
 common-obj-$(CONFIG_SOFTMMU) += or-irq.o
+common-obj-$(CONFIG_SOFTMMU) += split-irq.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
 
 obj-$(CONFIG_SOFTMMU) += generic-loader.o
diff --git a/include/hw/core/split-irq.h b/include/hw/core/split-irq.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/core/split-irq.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * IRQ splitter device.
+ *
+ * Copyright (c) 2018 Linaro Limited.
+ * Written by Peter Maydell
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/* This is a simple device which has one GPIO input line and multiple
+ * GPIO output lines. Any change on the input line is forwarded to all
+ * of the outputs.
+ *
+ * QEMU interface:
+ *  + one unnamed GPIO input: the input line
+ *  + N unnamed GPIO outputs: the output lines
+ *  + QOM property "num-lines": sets the number of output lines
+ */
+#ifndef HW_SPLIT_IRQ_H
+#define HW_SPLIT_IRQ_H
+
+#include "hw/irq.h"
+#include "hw/sysbus.h"
+#include "qom/object.h"
+
+#define TYPE_SPLIT_IRQ "split-irq"
+
+#define MAX_SPLIT_LINES 16
+
+typedef struct SplitIRQ SplitIRQ;
+
+#define SPLIT_IRQ(obj) OBJECT_CHECK(SplitIRQ, (obj), TYPE_SPLIT_IRQ)
+
+struct SplitIRQ {
+    DeviceState parent_obj;
+
+    qemu_irq out_irq[MAX_SPLIT_LINES];
+    uint16_t num_lines;
+};
+
+#endif
diff --git a/include/hw/irq.h b/include/hw/irq.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/irq.h
+++ b/include/hw/irq.h
@@ -XXX,XX +XXX,XX @@ void qemu_free_irq(qemu_irq irq);
 /* Returns a new IRQ with opposite polarity.  */
 qemu_irq qemu_irq_invert(qemu_irq irq);
 
-/* Returns a new IRQ which feeds into both the passed IRQs */
+/* Returns a new IRQ which feeds into both the passed IRQs.
+ * It's probably better to use the TYPE_SPLIT_IRQ device instead.
+ */
 qemu_irq qemu_irq_split(qemu_irq irq1, qemu_irq irq2);
 
 /* Returns a new IRQ set which connects 1:1 to another IRQ set, which
diff --git a/hw/core/split-irq.c b/hw/core/split-irq.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/core/split-irq.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * IRQ splitter device.
+ *
+ * Copyright (c) 2018 Linaro Limited.
+ * Written by Peter Maydell
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/core/split-irq.h"
+#include "qapi/error.h"
+
+static void split_irq_handler(void *opaque, int n, int level)
+{
+    SplitIRQ *s = SPLIT_IRQ(opaque);
+    int i;
+
+    for (i = 0; i < s->num_lines; i++) {
+        qemu_set_irq(s->out_irq[i], level);
+    }
+}
+
+static void split_irq_init(Object *obj)
+{
+    qdev_init_gpio_in(DEVICE(obj), split_irq_handler, 1);
+}
+
+static void split_irq_realize(DeviceState *dev, Error **errp)
+{
+    SplitIRQ *s = SPLIT_IRQ(dev);
+
+    if (s->num_lines < 1 || s->num_lines > MAX_SPLIT_LINES) {
+        error_setg(errp,
+                   "IRQ splitter number of lines %d is not between 1 and %d",
+                   s->num_lines, MAX_SPLIT_LINES);
+        return;
+    }
+
+    qdev_init_gpio_out(dev, s->out_irq, s->num_lines);
+}
+
+static Property split_irq_properties[] = {
+    DEFINE_PROP_UINT16("num-lines", SplitIRQ, num_lines, 1),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void split_irq_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    /* No state to reset or migrate */
+    dc->props = split_irq_properties;
+    dc->realize = split_irq_realize;
+
+    /* Reason: Needs to be wired up to work */
+    dc->user_creatable = false;
+}
+
+static const TypeInfo split_irq_type_info = {
+    .name = TYPE_SPLIT_IRQ,
+    .parent = TYPE_DEVICE,
+    .instance_size = sizeof(SplitIRQ),
+    .instance_init = split_irq_init,
+    .class_init = split_irq_class_init,
+};
+
+static void split_irq_register_types(void)
+{
+    type_register_static(&split_irq_type_info);
+}
+
+type_init(split_irq_register_types)
-- 
2.16.2

The MPS2 AN505 FPGA image includes an "FPGA control block"
which is a small set of registers handling LEDs, buttons
and some counters.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-14-peter.maydell@linaro.org
---
 hw/misc/Makefile.objs           |   1 +
 include/hw/misc/mps2-fpgaio.h   |  43 ++++++++++
 hw/misc/mps2-fpgaio.c           | 176 ++++++++++++++++++++++++++++++++++++++++
 default-configs/arm-softmmu.mak |   1 +
 hw/misc/trace-events            |   6 ++
 5 files changed, 227 insertions(+)
 create mode 100644 include/hw/misc/mps2-fpgaio.h
 create mode 100644 hw/misc/mps2-fpgaio.c

Add a model of the TrustZone peripheral protection controller (PPC),
which is used to gate transactions to non-TZ-aware peripherals so
that secure software can configure them to not be accessible to
non-secure software.
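The per-port decision the PPC makes can be sketched as a pure function of its input signals. This sketch assumes the rules given in the tz_ppc_check() comment in this patch. The interrupt side effect is deliberately left out. PpcPortCfg, PpcOutcome and ppc_decide() are hypothetical names, and nonsec_exempt stands for "this port's bit is set in NONSEC_MASK".

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PPC_PASS, PPC_ERROR, PPC_RAZ_WI } PpcOutcome;

/* Hypothetical flattened view of the PPC inputs for one port. */
typedef struct {
    bool cfg_nonsec;    /* port is for NonSecure accesses */
    bool cfg_ap;        /* port allows unprivileged accesses */
    bool cfg_sec_resp;  /* blocked access: bus error (true) or RAZ/WI */
    bool nonsec_exempt; /* port's bit set in NONSEC_MASK: skip security check */
} PpcPortCfg;

static PpcOutcome ppc_decide(const PpcPortCfg *c, bool secure, bool user)
{
    /* A Secure access to a NonSecure port, or a NonSecure access to a
     * Secure port, is blocked unless the port is exempt. */
    bool sec_mismatch = (secure == c->cfg_nonsec) && !c->nonsec_exempt;
    /* Unprivileged accesses need cfg_ap regardless of security state. */
    bool priv_fail = user && !c->cfg_ap;

    if (!sec_mismatch && !priv_fail) {
        return PPC_PASS;
    }
    return c->cfg_sec_resp ? PPC_ERROR : PPC_RAZ_WI;
}
```

Keeping the decision separate from the response choice matches the model below, where tz_ppc_check() only decides, and the read and write paths consult cfg_sec_resp.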

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-15-peter.maydell@linaro.org
---
 hw/misc/Makefile.objs           |   2 +
 include/hw/misc/tz-ppc.h        | 101 ++++++++++++++
 hw/misc/tz-ppc.c                | 302 ++++++++++++++++++++++++++++++++++++++++
 default-configs/arm-softmmu.mak |   2 +
 hw/misc/trace-events            |  11 ++
 5 files changed, 418 insertions(+)
 create mode 100644 include/hw/misc/tz-ppc.h
 create mode 100644 hw/misc/tz-ppc.c

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_MIPS_ITU) += mips_itu.o
 obj-$(CONFIG_MPS2_FPGAIO) += mps2-fpgaio.o
 obj-$(CONFIG_MPS2_SCC) += mps2-scc.o
 
+obj-$(CONFIG_TZ_PPC) += tz-ppc.o
+
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
 obj-$(CONFIG_AUX) += auxbus.o
diff --git a/include/hw/misc/tz-ppc.h b/include/hw/misc/tz-ppc.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/misc/tz-ppc.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM TrustZone peripheral protection controller emulation
+ *
+ * Copyright (c) 2018 Linaro Limited
+ * Written by Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 or
+ * (at your option) any later version.
+ */
+
+/* This is a model of the TrustZone peripheral protection controller (PPC).
+ * It is documented in the ARM CoreLink SIE-200 System IP for Embedded TRM
+ * (DDI 0571G):
+ * https://developer.arm.com/products/architecture/m-profile/docs/ddi0571/g
+ *
+ * The PPC sits in front of peripherals and allows secure software to
+ * configure it to either pass through or reject transactions.
+ * Rejected transactions may be configured to either be aborted, or to
+ * behave as RAZ/WI. An interrupt can be signalled for a rejected transaction.
+ *
+ * The PPC has no register interface -- it is configured purely by a
+ * collection of input signals from other hardware in the system. Typically
+ * they are either hardwired or exposed in an ad-hoc register interface by
+ * the SoC that uses the PPC.
+ *
+ * This QEMU model can be used to model either the AHB5 or APB4 TZ PPC,
+ * since the only difference between them is that the AHB version has a
+ * "default" port which has no security checks applied. In QEMU the default
+ * port can be emulated simply by wiring its downstream devices directly
+ * into the parent address space, since the PPC does not need to intercept
+ * transactions there.
+ *
+ * In the hardware, selection of which downstream port to use is done by
+ * the user's decode logic asserting one of the hsel[] signals. In QEMU,
+ * we provide 16 MMIO regions, one per port, and the user maps these into
+ * the desired addresses to implement the address decode.
+ *
+ * QEMU interface:
+ * + sysbus MMIO regions 0..15: MemoryRegions defining the upstream end
+ *   of each of the 16 ports of the PPC
+ * + Property "port[0..15]": MemoryRegion defining the downstream device(s)
+ *   for each of the 16 ports of the PPC
+ * + Named GPIO inputs "cfg_nonsec[0..15]": set to 1 if the port should be
+ *   accessible to NonSecure transactions
+ * + Named GPIO inputs "cfg_ap[0..15]": set to 1 if the port should be
+ *   accessible to non-privileged transactions
+ * + Named GPIO input "cfg_sec_resp": set to 1 if a rejected transaction should
+ *   result in a transaction error, or 0 for the transaction to RAZ/WI
+ * + Named GPIO input "irq_enable": set to 1 to enable interrupts
+ * + Named GPIO input "irq_clear": set to 1 to clear a pending interrupt
+ * + Named GPIO output "irq": set for a transaction-failed interrupt
+ * + Property "NONSEC_MASK": if a bit is set in this mask then accesses to
+ *   the associated port do not have the TZ security check performed. (This
+ *   corresponds to the hardware allowing this to be set as a Verilog
+ *   parameter.)
+ */
+
+#ifndef TZ_PPC_H
+#define TZ_PPC_H
+
+#include "hw/sysbus.h"
+
+#define TYPE_TZ_PPC "tz-ppc"
+#define TZ_PPC(obj) OBJECT_CHECK(TZPPC, (obj), TYPE_TZ_PPC)
+
+#define TZ_NUM_PORTS 16
+
+typedef struct TZPPC TZPPC;
+
+typedef struct TZPPCPort {
+    TZPPC *ppc;
+    MemoryRegion upstream;
+    AddressSpace downstream_as;
+    MemoryRegion *downstream;
+} TZPPCPort;
+
+struct TZPPC {
+    /*< private >*/
+    SysBusDevice parent_obj;
+
+    /*< public >*/
+
+    /* State: these just track the values of our input signals */
+    bool cfg_nonsec[TZ_NUM_PORTS];
+    bool cfg_ap[TZ_NUM_PORTS];
+    bool cfg_sec_resp;
+    bool irq_enable;
+    bool irq_clear;
+    /* State: are we asserting irq? */
+    bool irq_status;
+
+    qemu_irq irq;
+
+    /* Properties */
+    uint32_t nonsec_mask;
+
+    TZPPCPort port[TZ_NUM_PORTS];
+};
+
+#endif
diff --git a/hw/misc/tz-ppc.c b/hw/misc/tz-ppc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/misc/tz-ppc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM TrustZone peripheral protection controller emulation
+ *
+ * Copyright (c) 2018 Linaro Limited
+ * Written by Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 or
+ * (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "trace.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/misc/tz-ppc.h"
+
+static void tz_ppc_update_irq(TZPPC *s)
+{
+    bool level = s->irq_status && s->irq_enable;
+
+    trace_tz_ppc_update_irq(level);
+    qemu_set_irq(s->irq, level);
+}
+
+static void tz_ppc_cfg_nonsec(void *opaque, int n, int level)
+{
+    TZPPC *s = TZ_PPC(opaque);
+
+    assert(n < TZ_NUM_PORTS);
+    trace_tz_ppc_cfg_nonsec(n, level);
+    s->cfg_nonsec[n] = level;
+}
+
+static void tz_ppc_cfg_ap(void *opaque, int n, int level)
+{
+    TZPPC *s = TZ_PPC(opaque);
+
+    assert(n < TZ_NUM_PORTS);
+    trace_tz_ppc_cfg_ap(n, level);
+    s->cfg_ap[n] = level;
+}
+
+static void tz_ppc_cfg_sec_resp(void *opaque, int n, int level)
+{
+    TZPPC *s = TZ_PPC(opaque);
+
+    trace_tz_ppc_cfg_sec_resp(level);
+    s->cfg_sec_resp = level;
+}
+
+static void tz_ppc_irq_enable(void *opaque, int n, int level)
+{
+    TZPPC *s = TZ_PPC(opaque);
+
+    trace_tz_ppc_irq_enable(level);
+    s->irq_enable = level;
+    tz_ppc_update_irq(s);
+}
+
+static void tz_ppc_irq_clear(void *opaque, int n, int level)
+{
+    TZPPC *s = TZ_PPC(opaque);
+
+    trace_tz_ppc_irq_clear(level);
+
+    s->irq_clear = level;
+    if (level) {
+        s->irq_status = false;
+        tz_ppc_update_irq(s);
+    }
+}
+
+static bool tz_ppc_check(TZPPC *s, int n, MemTxAttrs attrs)
+{
+    /* Check whether to allow an access to port n; return true if
+     * the check passes, and false if the transaction must be blocked.
+     * If the latter, the caller must check cfg_sec_resp to determine
+     * whether to abort or RAZ/WI the transaction.
+     * The checks are:
+     *  + nonsec_mask suppresses any check of the secure attribute
+     *  + otherwise, block if cfg_nonsec is 1 and transaction is secure,
+     *    or if cfg_nonsec is 0 and transaction is non-secure
+     *  + block if transaction is usermode and cfg_ap is 0
+     */
+    if ((attrs.secure == s->cfg_nonsec[n] && !(s->nonsec_mask & (1 << n))) ||
+        (attrs.user && !s->cfg_ap[n])) {
+        /* Block the transaction. */
+        if (!s->irq_clear) {
+            /* Note that holding irq_clear high suppresses interrupts */
+            s->irq_status = true;
+            tz_ppc_update_irq(s);
+        }
+        return false;
+    }
+    return true;
+}
+
+static MemTxResult tz_ppc_read(void *opaque, hwaddr addr, uint64_t *pdata,
+                               unsigned size, MemTxAttrs attrs)
+{
+    TZPPCPort *p = opaque;
+    TZPPC *s = p->ppc;
+    int n = p - s->port;
+    AddressSpace *as = &p->downstream_as;
+    uint64_t data;
+    MemTxResult res;
+
+    if (!tz_ppc_check(s, n, attrs)) {
+        trace_tz_ppc_read_blocked(n, addr, attrs.secure, attrs.user);
+        if (s->cfg_sec_resp) {
+            return MEMTX_ERROR;
+        } else {
+            *pdata = 0;
+            return MEMTX_OK;
+        }
+    }
+
+    switch (size) {
+    case 1:
+        data = address_space_ldub(as, addr, attrs, &res);
+        break;
+    case 2:
+        data = address_space_lduw_le(as, addr, attrs, &res);
+        break;
+    case 4:
+        data = address_space_ldl_le(as, addr, attrs, &res);
+        break;
+    case 8:
+        data = address_space_ldq_le(as, addr, attrs, &res);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    *pdata = data;
+    return res;
+}
+
+static MemTxResult tz_ppc_write(void *opaque, hwaddr addr, uint64_t val,
+                                unsigned size, MemTxAttrs attrs)
+{
+    TZPPCPort *p = opaque;
+    TZPPC *s = p->ppc;
+    AddressSpace *as = &p->downstream_as;
+    int n = p - s->port;
+    MemTxResult res;
+
+    if (!tz_ppc_check(s, n, attrs)) {
+        trace_tz_ppc_write_blocked(n, addr, attrs.secure, attrs.user);
+        if (s->cfg_sec_resp) {
+            return MEMTX_ERROR;
+        } else {
+            return MEMTX_OK;
+        }
+    }
+
+    switch (size) {
+    case 1:
+        address_space_stb(as, addr, val, attrs, &res);
+        break;
+    case 2:
+        address_space_stw_le(as, addr, val, attrs, &res);
+        break;
+    case 4:
+        address_space_stl_le(as, addr, val, attrs, &res);
+        break;
+    case 8:
+        address_space_stq_le(as, addr, val, attrs, &res);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return res;
+}
+
+static const MemoryRegionOps tz_ppc_ops = {
+    .read_with_attrs = tz_ppc_read,
+    .write_with_attrs = tz_ppc_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static void tz_ppc_reset(DeviceState *dev)
+{
+    TZPPC *s = TZ_PPC(dev);
+
+    trace_tz_ppc_reset();
+    s->cfg_sec_resp = false;
+    memset(s->cfg_nonsec, 0, sizeof(s->cfg_nonsec));
+    memset(s->cfg_ap, 0, sizeof(s->cfg_ap));
+}
+
+static void tz_ppc_init(Object *obj)
+{
+    DeviceState *dev = DEVICE(obj);
+    TZPPC *s = TZ_PPC(obj);
+
+    qdev_init_gpio_in_named(dev, tz_ppc_cfg_nonsec, "cfg_nonsec", TZ_NUM_PORTS);
+    qdev_init_gpio_in_named(dev, tz_ppc_cfg_ap, "cfg_ap", TZ_NUM_PORTS);
+    qdev_init_gpio_in_named(dev, tz_ppc_cfg_sec_resp, "cfg_sec_resp", 1);
+    qdev_init_gpio_in_named(dev, tz_ppc_irq_enable, "irq_enable", 1);
+    qdev_init_gpio_in_named(dev, tz_ppc_irq_clear, "irq_clear", 1);
+    qdev_init_gpio_out_named(dev, &s->irq, "irq", 1);
+}
+
+static void tz_ppc_realize(DeviceState *dev, Error **errp)
+{
+    Object *obj = OBJECT(dev);
+    SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+    TZPPC *s = TZ_PPC(dev);
+    int i;
+
+    /* We can't create the upstream end of the port until realize,
+     * as we don't know the size of the MR used as the downstream until then.
+     */
+    for (i = 0; i < TZ_NUM_PORTS; i++) {
+        TZPPCPort *port = &s->port[i];
+        char *name;
+        uint64_t size;
+
+        if (!port->downstream) {
+            continue;
+        }
+
+        name = g_strdup_printf("tz-ppc-port[%d]", i);
+
+        port->ppc = s;
+        address_space_init(&port->downstream_as, port->downstream, name);
+
+        size = memory_region_size(port->downstream);
+        memory_region_init_io(&port->upstream, obj, &tz_ppc_ops,
+                              port, name, size);
+        sysbus_init_mmio(sbd, &port->upstream);
+        g_free(name);
+    }
+}
+
+static const VMStateDescription tz_ppc_vmstate = {
+    .name = "tz-ppc",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_BOOL_ARRAY(cfg_nonsec, TZPPC, 16),
+        VMSTATE_BOOL_ARRAY(cfg_ap, TZPPC, 16),
+        VMSTATE_BOOL(cfg_sec_resp, TZPPC),
+        VMSTATE_BOOL(irq_enable, TZPPC),
+        VMSTATE_BOOL(irq_clear, TZPPC),
+        VMSTATE_BOOL(irq_status, TZPPC),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+#define DEFINE_PORT(N)                                          \
+    DEFINE_PROP_LINK("port[" #N "]", TZPPC, port[N].downstream, \
+                     TYPE_MEMORY_REGION, MemoryRegion *)
+
+static Property tz_ppc_properties[] = {
+    DEFINE_PROP_UINT32("NONSEC_MASK", TZPPC, nonsec_mask, 0),
+    DEFINE_PORT(0),
+    DEFINE_PORT(1),
+    DEFINE_PORT(2),
+    DEFINE_PORT(3),
+    DEFINE_PORT(4),
+    DEFINE_PORT(5),
+    DEFINE_PORT(6),
+    DEFINE_PORT(7),
+    DEFINE_PORT(8),
+    DEFINE_PORT(9),
+    DEFINE_PORT(10),
+    DEFINE_PORT(11),
+    DEFINE_PORT(12),
+    DEFINE_PORT(13),
+    DEFINE_PORT(14),
+    DEFINE_PORT(15),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void tz_ppc_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = tz_ppc_realize;
+    dc->vmsd = &tz_ppc_vmstate;
+    dc->reset = tz_ppc_reset;
+    dc->props = tz_ppc_properties;
+}
+
+static const TypeInfo tz_ppc_info = {
+    .name = TYPE_TZ_PPC,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(TZPPC),
+    .instance_init = tz_ppc_init,
+    .class_init = tz_ppc_class_init,
+};
+
+static void tz_ppc_register_types(void)
+{
+    type_register_static(&tz_ppc_info);
+}
+
+type_init(tz_ppc_register_types);
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index XXXXXXX..XXXXXXX 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_CMSDK_APB_UART=y
 CONFIG_MPS2_FPGAIO=y
 CONFIG_MPS2_SCC=y
 
+CONFIG_TZ_PPC=y
+
 CONFIG_VERSATILE_PCI=y
 CONFIG_VERSATILE_I2C=y
 
diff --git a/hw/misc/trace-events b/hw/misc/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/trace-events
+++ b/hw/misc/trace-events
@@ -XXX,XX +XXX,XX @@ mos6522_get_next_irq_time(uint16_t latch, int64_t d, int64_t delta) "latch=%d co
 mos6522_set_sr_int(void) "set sr_int"
 mos6522_write(uint64_t addr, uint64_t val) "reg=0x%"PRIx64 " val=0x%"PRIx64
 mos6522_read(uint64_t addr, unsigned val) "reg=0x%"PRIx64 " val=0x%x"
+
+# hw/misc/tz-ppc.c
+tz_ppc_reset(void) "TZ PPC: reset"
+tz_ppc_cfg_nonsec(int n, int level) "TZ PPC: cfg_nonsec[%d] = %d"
+tz_ppc_cfg_ap(int n, int level) "TZ PPC: cfg_ap[%d] = %d"
+tz_ppc_cfg_sec_resp(int level) "TZ PPC: cfg_sec_resp = %d"
+tz_ppc_irq_enable(int level) "TZ PPC: int_enable = %d"
+tz_ppc_irq_clear(int level) "TZ PPC: int_clear = %d"
+tz_ppc_update_irq(int level) "TZ PPC: setting irq line to %d"
+tz_ppc_read_blocked(int n, hwaddr offset, bool secure, bool user) "TZ PPC: port %d offset 0x%" HWADDR_PRIx " read (secure %d user %d) blocked"
+tz_ppc_write_blocked(int n, hwaddr offset, bool secure, bool user) "TZ PPC: port %d offset 0x%" HWADDR_PRIx " write (secure %d user %d) blocked"
-- 
2.16.2

The Arm IoT Kit includes a "security controller" which is largely a
collection of registers for controlling the PPCs and other bits of
glue in the system.  This commit provides the initial skeleton of the
device, implementing just the ID registers, and a couple of read-only
read-as-zero registers.
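
As a rough illustration of how the skeleton serves the ID registers, here
is a minimal standalone sketch (not QEMU code). It assumes the S-block ID
values from this patch and the PID4..CID3 window at offsets 0xfd0..0xffc,
and it uses extract_bits32(), a hypothetical local stand-in for QEMU's
extract32(), to return the byte or halfword that a narrow read selects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's extract32(): returns 'length' bits of
 * 'value' starting at bit 'start'. Valid here for 0 < length < 32 only.
 */
static uint32_t extract_bits32(uint32_t value, int start, int length)
{
    return (value >> start) & ((1u << length) - 1);
}

/* Offsets of the first and last ID registers (PID4 and CID3). */
#define SKETCH_A_PID4 0xfd0u
#define SKETCH_A_CID3 0xffcu

/* S-block ID values in address order: PID4..PID7, PID0..PID3, CID0..CID3
 * (copied from iotkit_secctl_s_idregs[] in this patch).
 */
static const uint8_t sketch_s_idregs[] = {
    0x04, 0x00, 0x00, 0x00,
    0x52, 0xb8, 0x0b, 0x00,
    0x0d, 0xf0, 0x05, 0xb1,
};

/* Model a read of 'size' bytes (1, 2 or 4) at 'addr': decode the
 * word-aligned register, then narrow the result for sub-word reads.
 * Offsets outside the ID window read as zero in this sketch.
 */
static uint32_t sketch_idreg_read(uint32_t addr, unsigned size)
{
    uint32_t offset = addr & ~0x3u;
    uint32_t r = 0;

    assert(size == 1 || size == 2 || size == 4);

    if (offset >= SKETCH_A_PID4 && offset <= SKETCH_A_CID3) {
        r = sketch_s_idregs[(offset - SKETCH_A_PID4) / 4];
    }
    if (size != 4) {
        r = extract_bits32(r, (addr & 3) * 8, size * 8);
    }
    return r;
}
```

Because each ID register holds only an 8-bit value, a byte read of any
upper byte of an ID register returns zero in this model, which is what
the extract32() path in iotkit_secctl_s_read() produces.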

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-16-peter.maydell@linaro.org
---
 hw/misc/Makefile.objs           |   1 +
 include/hw/misc/iotkit-secctl.h |  39 ++++
 hw/misc/iotkit-secctl.c         | 448 ++++++++++++++++++++++++++++++++++++++++
 default-configs/arm-softmmu.mak |   1 +
 hw/misc/trace-events            |   7 +
 5 files changed, 496 insertions(+)
 create mode 100644 include/hw/misc/iotkit-secctl.h
 create mode 100644 hw/misc/iotkit-secctl.c

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_MPS2_FPGAIO) += mps2-fpgaio.o
 obj-$(CONFIG_MPS2_SCC) += mps2-scc.o
 
 obj-$(CONFIG_TZ_PPC) += tz-ppc.o
+obj-$(CONFIG_IOTKIT_SECCTL) += iotkit-secctl.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
diff --git a/include/hw/misc/iotkit-secctl.h b/include/hw/misc/iotkit-secctl.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/misc/iotkit-secctl.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM IoT Kit security controller
+ *
+ * Copyright (c) 2018 Linaro Limited
+ * Written by Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 or
+ * (at your option) any later version.
+ */
+
+/* This is a model of the security controller which is part of the
+ * Arm IoT Kit and documented in
+ * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ *
+ * QEMU interface:
+ *  + sysbus MMIO region 0 is the "secure privilege control block" registers
+ *  + sysbus MMIO region 1 is the "non-secure privilege control block" registers
+ */
+
+#ifndef IOTKIT_SECCTL_H
+#define IOTKIT_SECCTL_H
+
+#include "hw/sysbus.h"
+
+#define TYPE_IOTKIT_SECCTL "iotkit-secctl"
+#define IOTKIT_SECCTL(obj) OBJECT_CHECK(IoTKitSecCtl, (obj), TYPE_IOTKIT_SECCTL)
+
+typedef struct IoTKitSecCtl {
+    /*< private >*/
+    SysBusDevice parent_obj;
+
+    /*< public >*/
+
+    MemoryRegion s_regs;
+    MemoryRegion ns_regs;
+} IoTKitSecCtl;
+
+#endif
diff --git a/hw/misc/iotkit-secctl.c b/hw/misc/iotkit-secctl.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/misc/iotkit-secctl.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Arm IoT Kit security controller
+ *
+ * Copyright (c) 2018 Linaro Limited
+ * Written by Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 or
+ * (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "trace.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/misc/iotkit-secctl.h"
+
+/* Registers in the secure privilege control block */
+REG32(SECRESPCFG, 0x10)
+REG32(NSCCFG, 0x14)
+REG32(SECMPCINTSTATUS, 0x1c)
+REG32(SECPPCINTSTAT, 0x20)
+REG32(SECPPCINTCLR, 0x24)
+REG32(SECPPCINTEN, 0x28)
+REG32(SECMSCINTSTAT, 0x30)
+REG32(SECMSCINTCLR, 0x34)
+REG32(SECMSCINTEN, 0x38)
+REG32(BRGINTSTAT, 0x40)
+REG32(BRGINTCLR, 0x44)
+REG32(BRGINTEN, 0x48)
+REG32(AHBNSPPC0, 0x50)
+REG32(AHBNSPPCEXP0, 0x60)
+REG32(AHBNSPPCEXP1, 0x64)
+REG32(AHBNSPPCEXP2, 0x68)
+REG32(AHBNSPPCEXP3, 0x6c)
+REG32(APBNSPPC0, 0x70)
+REG32(APBNSPPC1, 0x74)
+REG32(APBNSPPCEXP0, 0x80)
+REG32(APBNSPPCEXP1, 0x84)
+REG32(APBNSPPCEXP2, 0x88)
+REG32(APBNSPPCEXP3, 0x8c)
+REG32(AHBSPPPC0, 0x90)
+REG32(AHBSPPPCEXP0, 0xa0)
+REG32(AHBSPPPCEXP1, 0xa4)
+REG32(AHBSPPPCEXP2, 0xa8)
+REG32(AHBSPPPCEXP3, 0xac)
+REG32(APBSPPPC0, 0xb0)
+REG32(APBSPPPC1, 0xb4)
+REG32(APBSPPPCEXP0, 0xc0)
+REG32(APBSPPPCEXP1, 0xc4)
+REG32(APBSPPPCEXP2, 0xc8)
+REG32(APBSPPPCEXP3, 0xcc)
+REG32(NSMSCEXP, 0xd0)
+REG32(PID4, 0xfd0)
+REG32(PID5, 0xfd4)
+REG32(PID6, 0xfd8)
+REG32(PID7, 0xfdc)
+REG32(PID0, 0xfe0)
+REG32(PID1, 0xfe4)
+REG32(PID2, 0xfe8)
+REG32(PID3, 0xfec)
+REG32(CID0, 0xff0)
+REG32(CID1, 0xff4)
+REG32(CID2, 0xff8)
+REG32(CID3, 0xffc)
+
+/* Registers in the non-secure privilege control block */
+REG32(AHBNSPPPC0, 0x90)
+REG32(AHBNSPPPCEXP0, 0xa0)
+REG32(AHBNSPPPCEXP1, 0xa4)
+REG32(AHBNSPPPCEXP2, 0xa8)
+REG32(AHBNSPPPCEXP3, 0xac)
+REG32(APBNSPPPC0, 0xb0)
+REG32(APBNSPPPC1, 0xb4)
+REG32(APBNSPPPCEXP0, 0xc0)
+REG32(APBNSPPPCEXP1, 0xc4)
+REG32(APBNSPPPCEXP2, 0xc8)
+REG32(APBNSPPPCEXP3, 0xcc)
+/* PID and CID registers are also present in the NS block */
+
+static const uint8_t iotkit_secctl_s_idregs[] = {
+    0x04, 0x00, 0x00, 0x00,
+    0x52, 0xb8, 0x0b, 0x00,
+    0x0d, 0xf0, 0x05, 0xb1,
+};
+
+static const uint8_t iotkit_secctl_ns_idregs[] = {
+    0x04, 0x00, 0x00, 0x00,
+    0x53, 0xb8, 0x0b, 0x00,
+    0x0d, 0xf0, 0x05, 0xb1,
+};
+
+static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
+                                        uint64_t *pdata,
+                                        unsigned size, MemTxAttrs attrs)
+{
+    uint64_t r;
+    uint32_t offset = addr & ~0x3;
+
+    switch (offset) {
+    case A_AHBNSPPC0:
+    case A_AHBSPPPC0:
+        r = 0;
+        break;
+    case A_SECRESPCFG:
+    case A_NSCCFG:
+    case A_SECMPCINTSTATUS:
+    case A_SECPPCINTSTAT:
+    case A_SECPPCINTEN:
+    case A_SECMSCINTSTAT:
+    case A_SECMSCINTEN:
+    case A_BRGINTSTAT:
+    case A_BRGINTEN:
+    case A_AHBNSPPCEXP0:
+    case A_AHBNSPPCEXP1:
+    case A_AHBNSPPCEXP2:
+    case A_AHBNSPPCEXP3:
+    case A_APBNSPPC0:
+    case A_APBNSPPC1:
+    case A_APBNSPPCEXP0:
+    case A_APBNSPPCEXP1:
+    case A_APBNSPPCEXP2:
+    case A_APBNSPPCEXP3:
+    case A_AHBSPPPCEXP0:
+    case A_AHBSPPPCEXP1:
+    case A_AHBSPPPCEXP2:
+    case A_AHBSPPPCEXP3:
+    case A_APBSPPPC0:
+    case A_APBSPPPC1:
+    case A_APBSPPPCEXP0:
+    case A_APBSPPPCEXP1:
+    case A_APBSPPPCEXP2:
+    case A_APBSPPPCEXP3:
+    case A_NSMSCEXP:
+        qemu_log_mask(LOG_UNIMP,
+                      "IoTKit SecCtl S block read: "
+                      "unimplemented offset 0x%x\n", offset);
+        r = 0;
+        break;
+    case A_PID4:
+    case A_PID5:
+    case A_PID6:
+    case A_PID7:
+    case A_PID0:
+    case A_PID1:
+    case A_PID2:
+    case A_PID3:
+    case A_CID0:
+    case A_CID1:
+    case A_CID2:
+    case A_CID3:
+        r = iotkit_secctl_s_idregs[(offset - A_PID4) / 4];
+        break;
+    case A_SECPPCINTCLR:
+    case A_SECMSCINTCLR:
+    case A_BRGINTCLR:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IoTKit SecCtl S block read: write-only offset 0x%x\n",
+                      offset);
+        r = 0;
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IoTKit SecCtl S block read: bad offset 0x%x\n", offset);
+        r = 0;
+        break;
+    }
+
+    if (size != 4) {
+        /* None of our registers are access-sensitive, so just extract the
+         * accessed byte or halfword from the 32-bit read result.
+         */
+        r = extract32(r, (addr & 3) * 8, size * 8);
+    }
+
+    trace_iotkit_secctl_s_read(offset, r, size);
+    *pdata = r;
+    return MEMTX_OK;
+}
+
+static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
+                                         uint64_t value,
+                                         unsigned size, MemTxAttrs attrs)
+{
+    uint32_t offset = addr;
+
+    trace_iotkit_secctl_s_write(offset, value, size);
+
+    if (size != 4) {
+        /* Byte and halfword writes are ignored */
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IoTKit SecCtl S block write: bad size, ignored\n");
+        return MEMTX_OK;
+    }
+
+    switch (offset) {
+    case A_SECRESPCFG:
+    case A_NSCCFG:
+    case A_SECPPCINTCLR:
+    case A_SECPPCINTEN:
+    case A_SECMSCINTCLR:
+    case A_SECMSCINTEN:
+    case A_BRGINTCLR:
+    case A_BRGINTEN:
+    case A_AHBNSPPCEXP0:
+    case A_AHBNSPPCEXP1:
+    case A_AHBNSPPCEXP2:
+    case A_AHBNSPPCEXP3:
+    case A_APBNSPPC0:
+    case A_APBNSPPC1:
+    case A_APBNSPPCEXP0:
+    case A_APBNSPPCEXP1:
+    case A_APBNSPPCEXP2:
+    case A_APBNSPPCEXP3:
+    case A_AHBSPPPCEXP0:
+    case A_AHBSPPPCEXP1:
+    case A_AHBSPPPCEXP2:
+    case A_AHBSPPPCEXP3:
+    case A_APBSPPPC0:
+    case A_APBSPPPC1:
+    case A_APBSPPPCEXP0:
+    case A_APBSPPPCEXP1:
+    case A_APBSPPPCEXP2:
+    case A_APBSPPPCEXP3:
+        qemu_log_mask(LOG_UNIMP,
+                      "IoTKit SecCtl S block write: "
+                      "unimplemented offset 0x%x\n", offset);
+        break;
+    case A_SECMPCINTSTATUS:
+    case A_SECPPCINTSTAT:
+    case A_SECMSCINTSTAT:
+    case A_BRGINTSTAT:
+    case A_AHBNSPPC0:
+    case A_AHBSPPPC0:
+    case A_NSMSCEXP:
+    case A_PID4:
+    case A_PID5:
+    case A_PID6:
+    case A_PID7:
+    case A_PID0:
+    case A_PID1:
+    case A_PID2:
+    case A_PID3:
+    case A_CID0:
+    case A_CID1:
+    case A_CID2:
+    case A_CID3:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IoTKit SecCtl S block write: "
+                      "read-only offset 0x%x\n", offset);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IoTKit SecCtl S block write: bad offset 0x%x\n",
+                      offset);
+        break;
+    }
+
+    return MEMTX_OK;
+}
+
+static MemTxResult iotkit_secctl_ns_read(void *opaque, hwaddr addr,
+                                         uint64_t *pdata,
+                                         unsigned size, MemTxAttrs attrs)
+{
+    uint64_t r;
+    uint32_t offset = addr & ~0x3;
+
+    switch (offset) {
+    case A_AHBNSPPPC0:
+        r = 0;
+        break;
+    case A_AHBNSPPPCEXP0:
+    case A_AHBNSPPPCEXP1:
+    case A_AHBNSPPPCEXP2:
+    case A_AHBNSPPPCEXP3:
+    case A_APBNSPPPC0:
+    case A_APBNSPPPC1:
+    case A_APBNSPPPCEXP0:
+    case A_APBNSPPPCEXP1:
+    case A_APBNSPPPCEXP2:
+    case A_APBNSPPPCEXP3:
+        qemu_log_mask(LOG_UNIMP,
+                      "IoTKit SecCtl NS block read: "
+                      "unimplemented offset 0x%x\n", offset);
+        r = 0;
+        break;
+    case A_PID4:
+    case A_PID5:
+    case A_PID6:
+    case A_PID7:
+    case A_PID0:
+    case A_PID1:
+    case A_PID2:
+    case A_PID3:
+    case A_CID0:
+    case A_CID1:
+    case A_CID2:
+    case A_CID3:
+        r = iotkit_secctl_ns_idregs[(offset - A_PID4) / 4];
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IoTKit SecCtl NS block read: bad offset 0x%x\n",
+                      offset);
+        r = 0;
+        break;
+    }
+
+    if (size != 4) {
+        /* None of our registers are access-sensitive, so just extract the
+         * accessed byte or halfword from the 32-bit read result.
+         */
+        r = extract32(r, (addr & 3) * 8, size * 8);
+    }
+
+    trace_iotkit_secctl_ns_read(offset, r, size);
+    *pdata = r;
+    return MEMTX_OK;
+}
+
+static MemTxResult iotkit_secctl_ns_write(void *opaque, hwaddr addr,
+                                          uint64_t value,
+                                          unsigned size, MemTxAttrs attrs)
+{
+    uint32_t offset = addr;
+
+    trace_iotkit_secctl_ns_write(offset, value, size);
+
+    if (size != 4) {
+        /* Byte and halfword writes are ignored */
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IoTKit SecCtl NS block write: bad size, ignored\n");
+        return MEMTX_OK;
+    }
+
+    switch (offset) {
+    case A_AHBNSPPPCEXP0:
+    case A_AHBNSPPPCEXP1:
+    case A_AHBNSPPPCEXP2:
+    case A_AHBNSPPPCEXP3:
+    case A_APBNSPPPC0:
+    case A_APBNSPPPC1:
+    case A_APBNSPPPCEXP0:
+    case A_APBNSPPPCEXP1:
+    case A_APBNSPPPCEXP2:
+    case A_APBNSPPPCEXP3:
+        qemu_log_mask(LOG_UNIMP,
+                      "IoTKit SecCtl NS block write: "
+                      "unimplemented offset 0x%x\n", offset);
+        break;
+    case A_AHBNSPPPC0:
+    case A_PID4:
+    case A_PID5:
+    case A_PID6:
+    case A_PID7:
+    case A_PID0:
+    case A_PID1:
+    case A_PID2:
+    case A_PID3:
+    case A_CID0:
+    case A_CID1:
+    case A_CID2:
+    case A_CID3:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IoTKit SecCtl NS block write: "
+                      "read-only offset 0x%x\n", offset);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IoTKit SecCtl NS block write: bad offset 0x%x\n",
+                      offset);
+        break;
+    }
+
+    return MEMTX_OK;
+}
+
+static const MemoryRegionOps iotkit_secctl_s_ops = {
+    .read_with_attrs = iotkit_secctl_s_read,
+    .write_with_attrs = iotkit_secctl_s_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid.min_access_size = 1,
+    .valid.max_access_size = 4,
+    .impl.min_access_size = 1,
+    .impl.max_access_size = 4,
+};
+
+static const MemoryRegionOps iotkit_secctl_ns_ops = {
+    .read_with_attrs = iotkit_secctl_ns_read,
+    .write_with_attrs = iotkit_secctl_ns_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid.min_access_size = 1,
+    .valid.max_access_size = 4,
+    .impl.min_access_size = 1,
+    .impl.max_access_size = 4,
+};
+
+static void iotkit_secctl_reset(DeviceState *dev)
+{
+
+}
+
+static void iotkit_secctl_init(Object *obj)
+{
+    IoTKitSecCtl *s = IOTKIT_SECCTL(obj);
+    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+
+    memory_region_init_io(&s->s_regs, obj, &iotkit_secctl_s_ops,
+                          s, "iotkit-secctl-s-regs", 0x1000);
+    memory_region_init_io(&s->ns_regs, obj, &iotkit_secctl_ns_ops,
+                          s, "iotkit-secctl-ns-regs", 0x1000);
+    sysbus_init_mmio(sbd, &s->s_regs);
+    sysbus_init_mmio(sbd, &s->ns_regs);
+}
+
+static const VMStateDescription iotkit_secctl_vmstate = {
+    .name = "iotkit-secctl",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void iotkit_secctl_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->vmsd = &iotkit_secctl_vmstate;
+    dc->reset = iotkit_secctl_reset;
+}
+
+static const TypeInfo iotkit_secctl_info = {
+    .name = TYPE_IOTKIT_SECCTL,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(IoTKitSecCtl),
+    .instance_init = iotkit_secctl_init,
+    .class_init = iotkit_secctl_class_init,
+};
+
+static void iotkit_secctl_register_types(void)
+{
+    type_register_static(&iotkit_secctl_info);
+}
+
+type_init(iotkit_secctl_register_types);
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index XXXXXXX..XXXXXXX 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_MPS2_FPGAIO=y
 CONFIG_MPS2_SCC=y
 
 CONFIG_TZ_PPC=y
+CONFIG_IOTKIT_SECCTL=y
 
 CONFIG_VERSATILE_PCI=y
 CONFIG_VERSATILE_I2C=y
diff --git a/hw/misc/trace-events b/hw/misc/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/trace-events
+++ b/hw/misc/trace-events
@@ -XXX,XX +XXX,XX @@ tz_ppc_irq_clear(int level) "TZ PPC: int_clear = %d"
 tz_ppc_update_irq(int level) "TZ PPC: setting irq line to %d"
 tz_ppc_read_blocked(int n, hwaddr offset, bool secure, bool user) "TZ PPC: port %d offset 0x%" HWADDR_PRIx " read (secure %d user %d) blocked"
 tz_ppc_write_blocked(int n, hwaddr offset, bool secure, bool user) "TZ PPC: port %d offset 0x%" HWADDR_PRIx " write (secure %d user %d) blocked"
+
+# hw/misc/iotkit-secctl.c
+iotkit_secctl_s_read(uint32_t offset, uint64_t data, unsigned size) "IoTKit SecCtl S regs read: offset 0x%x data 0x%" PRIx64 " size %u"
+iotkit_secctl_s_write(uint32_t offset, uint64_t data, unsigned size) "IoTKit SecCtl S regs write: offset 0x%x data 0x%" PRIx64 " size %u"
+iotkit_secctl_ns_read(uint32_t offset, uint64_t data, unsigned size) "IoTKit SecCtl NS regs read: offset 0x%x data 0x%" PRIx64 " size %u"
+iotkit_secctl_ns_write(uint32_t offset, uint64_t data, unsigned size) "IoTKit SecCtl NS regs write: offset 0x%x data 0x%" PRIx64 " size %u"
+iotkit_secctl_reset(void) "IoTKit SecCtl: reset"
-- 
2.16.2

The IoTKit Security Controller includes various registers
that expose to software the controls for the Peripheral
Protection Controllers in the system. Implement these.
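
The core of these controls can be sketched in isolation. The standalone
model below (not QEMU code; SketchPPC and the sketch_* functions are
hypothetical names) mirrors the logic this patch adds: offsets 0xN0..0xNC
within a register block select PPC 0..3, each PPC's NS register drives
its nonsec lines, and each port's AP line follows the NSP bit when the
port is non-secure and the SP bit otherwise. It assumes at most 16 ports
per PPC, as with IOTS_PPC_NUM_PORTS, and records outputs in bool arrays
instead of raising qemu_irq lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 16

/* Hypothetical stand-in for IoTKitSecCtlPPC: outputs are recorded in
 * bool arrays instead of being sent over qemu_irq lines.
 */
typedef struct SketchPPC {
    int numports;   /* ports actually present, <= SKETCH_MAX_PORTS */
    uint32_t ns;    /* bit set: port is non-secure */
    uint32_t sp;    /* secure unprivileged access enables */
    uint32_t nsp;   /* non-secure unprivileged access enables */
    bool nonsec[SKETCH_MAX_PORTS];
    bool ap[SKETCH_MAX_PORTS];
} SketchPPC;

/* Bits [3:2] of the register offset pick PPC 0..3 of a given type. */
static int sketch_offset_to_ppc_idx(uint32_t offset)
{
    return (offset >> 2) & 3;
}

/* Mask of the bits corresponding to ports that actually exist. */
static uint32_t sketch_port_mask(const SketchPPC *ppc)
{
    return (1u << ppc->numports) - 1;
}

/* AP for port i comes from NSP if the port is non-secure, else from SP. */
static void sketch_update_ap(SketchPPC *ppc)
{
    for (int i = 0; i < ppc->numports; i++) {
        uint32_t src = ((ppc->ns >> i) & 1) ? ppc->nsp : ppc->sp;
        ppc->ap[i] = (src >> i) & 1;
    }
}

static void sketch_ns_write(SketchPPC *ppc, uint32_t value)
{
    ppc->ns = value & sketch_port_mask(ppc);
    for (int i = 0; i < ppc->numports; i++) {
        ppc->nonsec[i] = (ppc->ns >> i) & 1;
    }
    sketch_update_ap(ppc);
}

static void sketch_sp_write(SketchPPC *ppc, uint32_t value)
{
    ppc->sp = value & sketch_port_mask(ppc);
    sketch_update_ap(ppc);
}

static void sketch_nsp_write(SketchPPC *ppc, uint32_t value)
{
    ppc->nsp = value & sketch_port_mask(ppc);
    sketch_update_ap(ppc);
}
```

A tz-ppc port wired to these lines then blocks user-mode accesses
whenever its ap input is 0, and blocks accesses whose secure attribute
disagrees with its nonsec input (unless NONSEC_MASK exempts the port).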

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-17-peter.maydell@linaro.org
---
 include/hw/misc/iotkit-secctl.h |  64 +++++++++-
 hw/misc/iotkit-secctl.c         | 270 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 315 insertions(+), 19 deletions(-)

diff --git a/include/hw/misc/iotkit-secctl.h b/include/hw/misc/iotkit-secctl.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/iotkit-secctl.h
+++ b/include/hw/misc/iotkit-secctl.h
@@ -XXX,XX +XXX,XX @@
  * QEMU interface:
  *  + sysbus MMIO region 0 is the "secure privilege control block" registers
  *  + sysbus MMIO region 1 is the "non-secure privilege control block" registers
+ *  + named GPIO output "sec_resp_cfg" indicating whether blocked accesses
+ *    should RAZ/WI or bus error
+ * Controlling the 2 APB PPCs in the IoTKit:
+ *  + named GPIO outputs apb_ppc0_nonsec[0..2] and apb_ppc1_nonsec
+ *  + named GPIO outputs apb_ppc0_ap[0..2] and apb_ppc1_ap
+ *  + named GPIO outputs apb_ppc{0,1}_irq_enable
+ *  + named GPIO outputs apb_ppc{0,1}_irq_clear
+ *  + named GPIO inputs apb_ppc{0,1}_irq_status
+ * Controlling each of the 4 expansion APB PPCs which a system using the IoTKit
+ * might provide:
+ *  + named GPIO outputs apb_ppcexp{0,1,2,3}_nonsec[0..15]
+ *  + named GPIO outputs apb_ppcexp{0,1,2,3}_ap[0..15]
+ *  + named GPIO outputs apb_ppcexp{0,1,2,3}_irq_enable
+ *  + named GPIO outputs apb_ppcexp{0,1,2,3}_irq_clear
+ *  + named GPIO inputs apb_ppcexp{0,1,2,3}_irq_status
+ * Controlling each of the 4 expansion AHB PPCs which a system using the IoTKit
+ * might provide:
+ *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_nonsec[0..15]
+ *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_ap[0..15]
+ *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_enable
+ *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_clear
+ *  + named GPIO inputs ahb_ppcexp{0,1,2,3}_irq_status
  */
 
 #ifndef IOTKIT_SECCTL_H
@@ -XXX,XX +XXX,XX @@
 #define TYPE_IOTKIT_SECCTL "iotkit-secctl"
 #define IOTKIT_SECCTL(obj) OBJECT_CHECK(IoTKitSecCtl, (obj), TYPE_IOTKIT_SECCTL)
 
-typedef struct IoTKitSecCtl {
+#define IOTS_APB_PPC0_NUM_PORTS 3
+#define IOTS_APB_PPC1_NUM_PORTS 1
+#define IOTS_PPC_NUM_PORTS 16
+#define IOTS_NUM_APB_PPC 2
+#define IOTS_NUM_APB_EXP_PPC 4
+#define IOTS_NUM_AHB_EXP_PPC 4
+
+typedef struct IoTKitSecCtl IoTKitSecCtl;
+
+/* State and IRQ lines relating to a PPC. For the
+ * PPCs in the IoTKit not all the IRQ lines are used.
+ */
+typedef struct IoTKitSecCtlPPC {
+    qemu_irq nonsec[IOTS_PPC_NUM_PORTS];
+    qemu_irq ap[IOTS_PPC_NUM_PORTS];
+    qemu_irq irq_enable;
+    qemu_irq irq_clear;
+
+    uint32_t ns;
+    uint32_t sp;
+    uint32_t nsp;
+
+    /* Number of ports actually present */
+    int numports;
+    /* Offset of this PPC's interrupt bits in SECPPCINTSTAT */
+    int irq_bit_offset;
+    IoTKitSecCtl *parent;
+} IoTKitSecCtlPPC;
+
+struct IoTKitSecCtl {
     /*< private >*/
     SysBusDevice parent_obj;
 
     /*< public >*/
+    qemu_irq sec_resp_cfg;
 
     MemoryRegion s_regs;
     MemoryRegion ns_regs;
-} IoTKitSecCtl;
+
+    uint32_t secppcintstat;
+    uint32_t secppcinten;
+    uint32_t secrespcfg;
+
+    IoTKitSecCtlPPC apb[IOTS_NUM_APB_PPC];
+    IoTKitSecCtlPPC apbexp[IOTS_NUM_APB_EXP_PPC];
+    IoTKitSecCtlPPC ahbexp[IOTS_NUM_AHB_EXP_PPC];
+};
 
 #endif
diff --git a/hw/misc/iotkit-secctl.c b/hw/misc/iotkit-secctl.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/iotkit-secctl.c
+++ b/hw/misc/iotkit-secctl.c
@@ -XXX,XX +XXX,XX @@ static const uint8_t iotkit_secctl_ns_idregs[] = {
     0x0d, 0xf0, 0x05, 0xb1,
 };
 
+/* The register sets for the various PPCs (AHB internal, APB internal,
+ * AHB expansion, APB expansion) are all laid out in 16-byte-aligned
+ * blocks, so offsets 0xN0, 0xN4, 0xN8 and 0xNC are PPCs 0, 1, 2 and 3
+ * of that type. This lets us convert a register address offset into
+ * an index into a PPC array easily.
+ */
+static inline int offset_to_ppc_idx(uint32_t offset)
+{
+    return extract32(offset, 2, 2);
+}
+
+typedef void PerPPCFunction(IoTKitSecCtlPPC *ppc);
+
+static void foreach_ppc(IoTKitSecCtl *s, PerPPCFunction *fn)
+{
+    int i;
+
+    for (i = 0; i < IOTS_NUM_APB_PPC; i++) {
+        fn(&s->apb[i]);
+    }
+    for (i = 0; i < IOTS_NUM_APB_EXP_PPC; i++) {
+        fn(&s->apbexp[i]);
+    }
+    for (i = 0; i < IOTS_NUM_AHB_EXP_PPC; i++) {
+        fn(&s->ahbexp[i]);
+    }
+}
+
 static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
                                         uint64_t *pdata,
                                         unsigned size, MemTxAttrs attrs)
 {
     uint64_t r;
     uint32_t offset = addr & ~0x3;
+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
 
     switch (offset) {
     case A_AHBNSPPC0:
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
         r = 0;
         break;
     case A_SECRESPCFG:
-    case A_NSCCFG:
-    case A_SECMPCINTSTATUS:
+        r = s->secrespcfg;
+        break;
     case A_SECPPCINTSTAT:
+        r = s->secppcintstat;
+        break;
     case A_SECPPCINTEN:
-    case A_SECMSCINTSTAT:
-    case A_SECMSCINTEN:
-    case A_BRGINTSTAT:
-    case A_BRGINTEN:
+        r = s->secppcinten;
+        break;
     case A_AHBNSPPCEXP0:
     case A_AHBNSPPCEXP1:
     case A_AHBNSPPCEXP2:
     case A_AHBNSPPCEXP3:
+        r = s->ahbexp[offset_to_ppc_idx(offset)].ns;
+        break;
     case A_APBNSPPC0:
     case A_APBNSPPC1:
+        r = s->apb[offset_to_ppc_idx(offset)].ns;
+        break;
     case A_APBNSPPCEXP0:
     case A_APBNSPPCEXP1:
     case A_APBNSPPCEXP2:
     case A_APBNSPPCEXP3:
+        r = s->apbexp[offset_to_ppc_idx(offset)].ns;
+        break;
     case A_AHBSPPPCEXP0:
     case A_AHBSPPPCEXP1:
     case A_AHBSPPPCEXP2:
     case A_AHBSPPPCEXP3:
+        r = s->ahbexp[offset_to_ppc_idx(offset)].sp;
+        break;
     case A_APBSPPPC0:
     case A_APBSPPPC1:
+        r = s->apb[offset_to_ppc_idx(offset)].sp;
+        break;
     case A_APBSPPPCEXP0:
     case A_APBSPPPCEXP1:
     case A_APBSPPPCEXP2:
     case A_APBSPPPCEXP3:
+        r = s->apbexp[offset_to_ppc_idx(offset)].sp;
+        break;
+    case A_NSCCFG:
+    case A_SECMPCINTSTATUS:
+    case A_SECMSCINTSTAT:
+    case A_SECMSCINTEN:
+    case A_BRGINTSTAT:
+    case A_BRGINTEN:
     case A_NSMSCEXP:
         qemu_log_mask(LOG_UNIMP,
                       "IoTKit SecCtl S block read: "
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
     return MEMTX_OK;
 }
 
+static void iotkit_secctl_update_ppc_ap(IoTKitSecCtlPPC *ppc)
+{
+    int i;
+
+    for (i = 0; i < ppc->numports; i++) {
+        bool v;
+
+        if (extract32(ppc->ns, i, 1)) {
+            v = extract32(ppc->nsp, i, 1);
+        } else {
+            v = extract32(ppc->sp, i, 1);
+        }
+        qemu_set_irq(ppc->ap[i], v);
+    }
+}
+
+static void iotkit_secctl_ppc_ns_write(IoTKitSecCtlPPC *ppc, uint32_t value)
+{
+    int i;
+
+    ppc->ns = value & MAKE_64BIT_MASK(0, ppc->numports);
+    for (i = 0; i < ppc->numports; i++) {
+        qemu_set_irq(ppc->nonsec[i], extract32(ppc->ns, i, 1));
+    }
+    iotkit_secctl_update_ppc_ap(ppc);
+}
+
+static void iotkit_secctl_ppc_sp_write(IoTKitSecCtlPPC *ppc, uint32_t value)
+{
+    ppc->sp = value & MAKE_64BIT_MASK(0, ppc->numports);
+    iotkit_secctl_update_ppc_ap(ppc);
+}
+
+static void iotkit_secctl_ppc_nsp_write(IoTKitSecCtlPPC *ppc, uint32_t value)
+{
+    ppc->nsp = value & MAKE_64BIT_MASK(0, ppc->numports);
+    iotkit_secctl_update_ppc_ap(ppc);
+}
+
+static void iotkit_secctl_ppc_update_irq_clear(IoTKitSecCtlPPC *ppc)
+{
+    uint32_t value = ppc->parent->secppcintstat;
+
+    qemu_set_irq(ppc->irq_clear, extract32(value, ppc->irq_bit_offset, 1));
+}
+
+static void iotkit_secctl_ppc_update_irq_enable(IoTKitSecCtlPPC *ppc)
+{
+    uint32_t value = ppc->parent->secppcinten;
+
+    qemu_set_irq(ppc->irq_enable, extract32(value, ppc->irq_bit_offset, 1));
+}
+
 static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
                                          uint64_t value,
                                          unsigned size, MemTxAttrs attrs)
 {
+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
     uint32_t offset = addr;
+    IoTKitSecCtlPPC *ppc;
 
     trace_iotkit_secctl_s_write(offset, value, size);
 
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
 
     switch (offset) {
     case A_SECRESPCFG:
-    case A_NSCCFG:
+        value &= 1;
+        s->secrespcfg = value;
+        qemu_set_irq(s->sec_resp_cfg, s->secrespcfg);
+        break;
     case A_SECPPCINTCLR:
+        value &= 0x00f000f3;
+        foreach_ppc(s, iotkit_secctl_ppc_update_irq_clear);
+        break;
     case A_SECPPCINTEN:
-    case A_SECMSCINTCLR:
-    case A_SECMSCINTEN:
-    case A_BRGINTCLR:
-    case A_BRGINTEN:
+        s->secppcinten = value & 0x00f000f3;
+        foreach_ppc(s, iotkit_secctl_ppc_update_irq_enable);
+        break;
     case A_AHBNSPPCEXP0:
     case A_AHBNSPPCEXP1:
     case A_AHBNSPPCEXP2:
     case A_AHBNSPPCEXP3:
+        ppc = &s->ahbexp[offset_to_ppc_idx(offset)];
+        iotkit_secctl_ppc_ns_write(ppc, value);
+        break;
     case A_APBNSPPC0:
     case A_APBNSPPC1:
+        ppc = &s->apb[offset_to_ppc_idx(offset)];
+        iotkit_secctl_ppc_ns_write(ppc, value);
+        break;
     case A_APBNSPPCEXP0:
     case A_APBNSPPCEXP1:
     case A_APBNSPPCEXP2:
     case A_APBNSPPCEXP3:
+        ppc = &s->apbexp[offset_to_ppc_idx(offset)];
+        iotkit_secctl_ppc_ns_write(ppc, value);
+        break;
     case A_AHBSPPPCEXP0:
     case A_AHBSPPPCEXP1:
     case A_AHBSPPPCEXP2:
     case A_AHBSPPPCEXP3:
+        ppc = &s->ahbexp[offset_to_ppc_idx(offset)];
+        iotkit_secctl_ppc_sp_write(ppc, value);
+        break;
     case A_APBSPPPC0:
     case A_APBSPPPC1:
+        ppc = &s->apb[offset_to_ppc_idx(offset)];
+        iotkit_secctl_ppc_sp_write(ppc, value);
+        break;
     case A_APBSPPPCEXP0:
     case A_APBSPPPCEXP1:
     case A_APBSPPPCEXP2:
     case A_APBSPPPCEXP3:
+        ppc = &s->apbexp[offset_to_ppc_idx(offset)];
+        iotkit_secctl_ppc_sp_write(ppc, value);
+        break;
+    case A_NSCCFG:
+    case A_SECMSCINTCLR:
+    case A_SECMSCINTEN:
+    case A_BRGINTCLR:
+    case A_BRGINTEN:
         qemu_log_mask(LOG_UNIMP,
                       "IoTKit SecCtl S block write: "
                       "unimplemented offset 0x%x\n", offset);
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_ns_read(void *opaque, hwaddr addr,
                                          uint64_t *pdata,
                                          unsigned size, MemTxAttrs attrs)
 {
+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
     uint64_t r;
     uint32_t offset = addr & ~0x3;
 
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_ns_read(void *opaque, hwaddr addr,
     case A_AHBNSPPPCEXP1:
     case A_AHBNSPPPCEXP2:
     case A_AHBNSPPPCEXP3:
+        r = s->ahbexp[offset_to_ppc_idx(offset)].nsp;
+        break;
     case A_APBNSPPPC0:
     case A_APBNSPPPC1:
+        r = s->apb[offset_to_ppc_idx(offset)].nsp;
+        break;
     case A_APBNSPPPCEXP0:
     case A_APBNSPPPCEXP1:
     case A_APBNSPPPCEXP2:
     case A_APBNSPPPCEXP3:
-        qemu_log_mask(LOG_UNIMP,
-                      "IoTKit SecCtl NS block read: "
-                      "unimplemented offset 0x%x\n", offset);
+        r = s->apbexp[offset_to_ppc_idx(offset)].nsp;
         break;
     case A_PID4:
     case A_PID5:
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_ns_write(void *opaque, hwaddr addr,
                                           uint64_t value,
                                           unsigned size, MemTxAttrs attrs)
 {
+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
     uint32_t offset = addr;
+    IoTKitSecCtlPPC *ppc;
 
     trace_iotkit_secctl_ns_write(offset, value, size);
 
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_ns_write(void *opaque, hwaddr addr,
     case A_AHBNSPPPCEXP1:
     case A_AHBNSPPPCEXP2:
     case A_AHBNSPPPCEXP3:
+        ppc = &s->ahbexp[offset_to_ppc_idx(offset)];
+        iotkit_secctl_ppc_nsp_write(ppc, value);
+        break;
     case A_APBNSPPPC0:
     case A_APBNSPPPC1:
+        ppc = &s->apb[offset_to_ppc_idx(offset)];
+        iotkit_secctl_ppc_nsp_write(ppc, value);
+        break;
     case A_APBNSPPPCEXP0:
     case A_APBNSPPPCEXP1:
     case A_APBNSPPPCEXP2:
     case A_APBNSPPPCEXP3:
-        qemu_log_mask(LOG_UNIMP,
-                      "IoTKit SecCtl NS block write: "
-                      "unimplemented offset 0x%x\n", offset);
+        ppc = &s->apbexp[offset_to_ppc_idx(offset)];
+        iotkit_secctl_ppc_nsp_write(ppc, value);
         break;
     case A_AHBNSPPPC0:
     case A_PID4:
@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps iotkit_secctl_ns_ops = {
     .impl.max_access_size = 4,
 };
 
+static void iotkit_secctl_reset_ppc(IoTKitSecCtlPPC *ppc)
+{
+    ppc->ns = 0;
+    ppc->sp = 0;
+    ppc->nsp = 0;
+}
+
 static void iotkit_secctl_reset(DeviceState *dev)
 {
+    IoTKitSecCtl *s = IOTKIT_SECCTL(dev);
 
+    s->secppcintstat = 0;
+    s->secppcinten = 0;
+    s->secrespcfg = 0;
+
+    foreach_ppc(s, iotkit_secctl_reset_ppc);
+}
+
+static void iotkit_secctl_ppc_irqstatus(void *opaque, int n, int level)
+{
+    IoTKitSecCtlPPC *ppc = opaque;
+    IoTKitSecCtl *s = IOTKIT_SECCTL(ppc->parent);
+    int irqbit = ppc->irq_bit_offset + n;
+
+    s->secppcintstat = deposit32(s->secppcintstat, irqbit, 1, level);
+}
+
+static void iotkit_secctl_init_ppc(IoTKitSecCtl *s,
+                                   IoTKitSecCtlPPC *ppc,
+                                   const char *name,
+                                   int numports,
+                                   int irq_bit_offset)
+{
+    char *gpioname;
+    DeviceState *dev = DEVICE(s);
+
+    ppc->numports = numports;
+    ppc->irq_bit_offset = irq_bit_offset;
+    ppc->parent = s;
+
+    gpioname = g_strdup_printf("%s_nonsec", name);
+    qdev_init_gpio_out_named(dev, ppc->nonsec, gpioname, numports);
+    g_free(gpioname);
+    gpioname = g_strdup_printf("%s_ap", name);
+    qdev_init_gpio_out_named(dev, ppc->ap, gpioname, numports);
+    g_free(gpioname);
+    gpioname = g_strdup_printf("%s_irq_enable", name);
+    qdev_init_gpio_out_named(dev, &ppc->irq_enable, gpioname, 1);
+    g_free(gpioname);
+    gpioname = g_strdup_printf("%s_irq_clear", name);
+    qdev_init_gpio_out_named(dev, &ppc->irq_clear, gpioname, 1);
+    g_free(gpioname);
+    gpioname = g_strdup_printf("%s_irq_status", name);
+    qdev_init_gpio_in_named_with_opaque(dev, iotkit_secctl_ppc_irqstatus,
+                                        ppc, gpioname, 1);
+    g_free(gpioname);
 }
 
 static void iotkit_secctl_init(Object *obj)
 {
     IoTKitSecCtl *s = IOTKIT_SECCTL(obj);
     SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+    DeviceState *dev = DEVICE(obj);
+    int i;
+
+    iotkit_secctl_init_ppc(s, &s->apb[0], "apb_ppc0",
+                           IOTS_APB_PPC0_NUM_PORTS, 0);
+    iotkit_secctl_init_ppc(s, &s->apb[1], "apb_ppc1",
+                           IOTS_APB_PPC1_NUM_PORTS, 1);
+
+    for (i = 0; i < IOTS_NUM_APB_EXP_PPC; i++) {
+        IoTKitSecCtlPPC *ppc = &s->apbexp[i];
+        char *ppcname = g_strdup_printf("apb_ppcexp%d", i);
+        iotkit_secctl_init_ppc(s, ppc, ppcname, IOTS_PPC_NUM_PORTS, 4 + i);
+        g_free(ppcname);
+    }
+    for (i = 0; i < IOTS_NUM_AHB_EXP_PPC; i++) {
+        IoTKitSecCtlPPC *ppc = &s->ahbexp[i];
+        char *ppcname = g_strdup_printf("ahb_ppcexp%d", i);
+        iotkit_secctl_init_ppc(s, ppc, ppcname, IOTS_PPC_NUM_PORTS, 20 + i);
+        g_free(ppcname);
+    }
+
+    qdev_init_gpio_out_named(dev, &s->sec_resp_cfg, "sec_resp_cfg", 1);
 
     memory_region_init_io(&s->s_regs, obj, &iotkit_secctl_s_ops,
                           s, "iotkit-secctl-s-regs", 0x1000);
@@ -XXX,XX +XXX,XX @@ static void iotkit_secctl_init(Object *obj)
     sysbus_init_mmio(sbd, &s->ns_regs);
 }
 
+static const VMStateDescription iotkit_secctl_ppc_vmstate = {
+    .name = "iotkit-secctl-ppc",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(ns, IoTKitSecCtlPPC),
+        VMSTATE_UINT32(sp, IoTKitSecCtlPPC),
+        VMSTATE_UINT32(nsp, IoTKitSecCtlPPC),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription iotkit_secctl_vmstate = {
     .name = "iotkit-secctl",
     .version_id = 1,
     .minimum_version_id = 1,
     .fields = (VMStateField[]) {
+        VMSTATE_UINT32(secppcintstat, IoTKitSecCtl),
+        VMSTATE_UINT32(secppcinten, IoTKitSecCtl),
+        VMSTATE_UINT32(secrespcfg, IoTKitSecCtl),
+        VMSTATE_STRUCT_ARRAY(apb, IoTKitSecCtl, IOTS_NUM_APB_PPC, 1,
+                             iotkit_secctl_ppc_vmstate, IoTKitSecCtlPPC),
+        VMSTATE_STRUCT_ARRAY(apbexp, IoTKitSecCtl, IOTS_NUM_APB_EXP_PPC, 1,
+                             iotkit_secctl_ppc_vmstate, IoTKitSecCtlPPC),
+        VMSTATE_STRUCT_ARRAY(ahbexp, IoTKitSecCtl, IOTS_NUM_AHB_EXP_PPC, 1,
+                             iotkit_secctl_ppc_vmstate, IoTKitSecCtlPPC),
         VMSTATE_END_OF_LIST()
     }
 };
-- 
2.16.2

Add remaining easy registers to iotkit-secctl:
 * NSCCFG just routes its two bits out to external GPIO lines
 * BRGINTSTAT/BRGINTCLR/BRGINTEN can be dummies, because QEMU's
   bus fabric can never report errors
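
A minimal standalone sketch of the behaviour described above. It is not
the patch's code: the struct, enum and function names are hypothetical,
and a plain callback stands in for a QEMU GPIO line. It assumes, as
nsccfg_handler() in hw/arm/iotkit.c suggests, that both NSCCFG bits
travel as the level of a single output line. The bridge registers read
as zero and ignore writes.

```c
#include <stdint.h>

/* Hypothetical stand-in for a GPIO output line (qemu_set_irq() in QEMU). */
typedef void (*line_setter)(void *opaque, int level);

/* Hypothetical model of the register subset; not IoTKitSecCtl. */
typedef struct {
    uint32_t nsccfg;          /* two-bit NSCCFG value */
    line_setter nsc_cfg_out;  /* output that carries the NSCCFG bits */
    void *opaque;             /* passed back to nsc_cfg_out */
} SecCtlSketch;

enum { REG_NSCCFG, REG_BRGINTSTAT, REG_BRGINTCLR, REG_BRGINTEN };

static uint32_t sketch_read(const SecCtlSketch *s, int reg)
{
    switch (reg) {
    case REG_NSCCFG:
        return s->nsccfg;
    default:
        /* BRGINTSTAT/BRGINTCLR/BRGINTEN: the modelled bus fabric never
         * reports errors, so there is never a bridge interrupt to show. */
        return 0;
    }
}

static void sketch_write(SecCtlSketch *s, int reg, uint32_t value)
{
    switch (reg) {
    case REG_NSCCFG:
        s->nsccfg = value & 3;    /* only bits [1:0] are implemented */
        s->nsc_cfg_out(s->opaque, (int)s->nsccfg);
        break;
    default:
        /* Writes to the bridge interrupt registers are ignored. */
        break;
    }
}

/* Example receiver that just records the level, like nsccfg_handler(). */
static void record_level(void *opaque, int level)
{
    *(int *)opaque = level;
}
```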

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180220180325.29818-18-peter.maydell@linaro.org
---
 include/hw/misc/iotkit-secctl.h |  4 ++++
 hw/misc/iotkit-secctl.c         | 32 ++++++++++++++++++++++++++------
 2 files changed, 30 insertions(+), 6 deletions(-)

Model the Arm IoT Kit documented in
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html

The Arm IoT Kit is a subsystem which includes a CPU and some devices,
and is intended to be extended by adding extra devices to form a
complete system.  It is used in the MPS2 board's AN505 image for the
Cortex-M33.
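
The IDAU rule described in the comment in iotkit_realize() (the top 4
bits of the physical address are the region ID, and bit 28 clear means
a Non-secure region) can be sketched as two small helpers. This is an
illustration under that stated rule only; the function names are
hypothetical and are not QEMU's IDAU interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the IDAU region ID is the top 4 address bits. */
static unsigned iotkit_idau_region(uint32_t addr)
{
    return addr >> 28;
}

/* Hypothetical helper: bit 28 (lowest bit of the region ID) set means
 * Secure, clear means Non-secure. */
static bool iotkit_idau_is_secure(uint32_t addr)
{
    return (addr >> 28) & 1;
}
```

For instance, the security controller's S register block is mapped at
0x50080000 (region 5, Secure) and its NS block at 0x40080000 (region 4,
Non-secure), matching the two sysbus_mmio_map() calls in the patch.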

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-19-peter.maydell@linaro.org
---
 hw/arm/Makefile.objs            |   1 +
 include/hw/arm/iotkit.h         | 109 ++++++++
 hw/arm/iotkit.c                 | 598 ++++++++++++++++++++++++++++++++++++++++
 default-configs/arm-softmmu.mak |   1 +
 4 files changed, 709 insertions(+)
 create mode 100644 include/hw/arm/iotkit.h
 create mode 100644 hw/arm/iotkit.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o sabrelite.o
 obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
 obj-$(CONFIG_MPS2) += mps2.o
 obj-$(CONFIG_MSF2) += msf2-soc.o msf2-som.o
+obj-$(CONFIG_IOTKIT) += iotkit.o
diff --git a/include/hw/arm/iotkit.h b/include/hw/arm/iotkit.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/arm/iotkit.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM IoT Kit
+ *
+ * Copyright (c) 2018 Linaro Limited
+ * Written by Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 or
+ * (at your option) any later version.
+ */
+
+/* This is a model of the Arm IoT Kit which is documented in
+ * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ * It contains:
+ *  a Cortex-M33
+ *  the IDAU
+ *  some timers and watchdogs
+ *  two peripheral protection controllers
+ *  a memory protection controller
+ *  a security controller
+ *  a bus fabric which arranges that some parts of the address
+ *  space are secure and non-secure aliases of each other
+ *
+ * QEMU interface:
+ *  + QOM property "memory" is a MemoryRegion containing the devices provided
+ *    by the board model.
+ *  + QOM property "MAINCLK" is the frequency of the main system clock
+ *  + QOM property "EXP_NUMIRQ" sets the number of expansion interrupts
+ *  + Named GPIO inputs "EXP_IRQ" 0..n are the expansion interrupts, which
+ *    are wired to the NVIC lines 32 .. n+32
+ * Controlling up to 4 APB expansion PPCs which a system using the IoTKit
+ * might provide:
+ *  + named GPIO outputs apb_ppcexp{0,1,2,3}_nonsec[0..15]
+ *  + named GPIO outputs apb_ppcexp{0,1,2,3}_ap[0..15]
+ *  + named GPIO outputs apb_ppcexp{0,1,2,3}_irq_enable
+ *  + named GPIO outputs apb_ppcexp{0,1,2,3}_irq_clear
+ *  + named GPIO inputs apb_ppcexp{0,1,2,3}_irq_status
+ * Controlling each of the 4 expansion AHB PPCs which a system using the IoTKit
+ * might provide:
+ *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_nonsec[0..15]
+ *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_ap[0..15]
+ *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_enable
+ *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_clear
+ *  + named GPIO inputs ahb_ppcexp{0,1,2,3}_irq_status
+ */
+
+#ifndef IOTKIT_H
+#define IOTKIT_H
+
+#include "hw/sysbus.h"
+#include "hw/arm/armv7m.h"
+#include "hw/misc/iotkit-secctl.h"
+#include "hw/misc/tz-ppc.h"
+#include "hw/timer/cmsdk-apb-timer.h"
+#include "hw/misc/unimp.h"
+#include "hw/or-irq.h"
+#include "hw/core/split-irq.h"
+
+#define TYPE_IOTKIT "iotkit"
+#define IOTKIT(obj) OBJECT_CHECK(IoTKit, (obj), TYPE_IOTKIT)
+
+/* We have an IRQ splitter and an OR gate input for each external PPC
+ * and the 2 internal PPCs
+ */
+#define NUM_EXTERNAL_PPCS (IOTS_NUM_AHB_EXP_PPC + IOTS_NUM_APB_EXP_PPC)
+#define NUM_PPCS (NUM_EXTERNAL_PPCS + 2)
+
+typedef struct IoTKit {
+    /*< private >*/
+    SysBusDevice parent_obj;
+
+    /*< public >*/
+    ARMv7MState armv7m;
+    IoTKitSecCtl secctl;
+    TZPPC apb_ppc0;
+    TZPPC apb_ppc1;
+    CMSDKAPBTIMER timer0;
+    CMSDKAPBTIMER timer1;
+    qemu_or_irq ppc_irq_orgate;
+    SplitIRQ sec_resp_splitter;
+    SplitIRQ ppc_irq_splitter[NUM_PPCS];
+
+    UnimplementedDeviceState dualtimer;
+    UnimplementedDeviceState s32ktimer;
+
+    MemoryRegion container;
+    MemoryRegion alias1;
+    MemoryRegion alias2;
+    MemoryRegion alias3;
+    MemoryRegion sram0;
+
+    qemu_irq *exp_irqs;
+    qemu_irq ppc0_irq;
+    qemu_irq ppc1_irq;
+    qemu_irq sec_resp_cfg;
+    qemu_irq sec_resp_cfg_in;
+    qemu_irq nsc_cfg_in;
+
+    qemu_irq irq_status_in[NUM_EXTERNAL_PPCS];
+
+    uint32_t nsccfg;
+
+    /* Properties */
+    MemoryRegion *board_memory;
+    uint32_t exp_numirq;
+    uint32_t mainclk_frq;
+} IoTKit;
+
+#endif
diff --git a/hw/arm/iotkit.c b/hw/arm/iotkit.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/arm/iotkit.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Arm IoT Kit
+ *
+ * Copyright (c) 2018 Linaro Limited
+ * Written by Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 or
+ * (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "trace.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/arm/iotkit.h"
+#include "hw/misc/unimp.h"
+#include "hw/arm/arm.h"
+
+/* Create an alias region of @size bytes starting at @base
+ * which mirrors the memory starting at @orig.
+ */
+static void make_alias(IoTKit *s, MemoryRegion *mr, const char *name,
+                       hwaddr base, hwaddr size, hwaddr orig)
+{
+    memory_region_init_alias(mr, NULL, name, &s->container, orig, size);
+    /* The alias is even lower priority than unimplemented_device regions */
+    memory_region_add_subregion_overlap(&s->container, base, mr, -1500);
+}
+
+static void init_sysbus_child(Object *parent, const char *childname,
+                              void *child, size_t childsize,
+                              const char *childtype)
+{
+    object_initialize(child, childsize, childtype);
+    object_property_add_child(parent, childname, OBJECT(child), &error_abort);
+    qdev_set_parent_bus(DEVICE(child), sysbus_get_default());
+}
+
+static void irq_status_forwarder(void *opaque, int n, int level)
+{
+    qemu_irq destirq = opaque;
+
+    qemu_set_irq(destirq, level);
+}
+
+static void nsccfg_handler(void *opaque, int n, int level)
+{
+    IoTKit *s = IOTKIT(opaque);
+
+    s->nsccfg = level;
+}
+
+static void iotkit_forward_ppc(IoTKit *s, const char *ppcname, int ppcnum)
+{
+    /* Each of the 4 AHB and 4 APB PPCs that might be present in a
+     * system using the IoTKit has a collection of control lines which
+     * are provided by the security controller and which we want to
+     * expose as control lines on the IoTKit device itself, so the
+     * code using the IoTKit can wire them up to the PPCs.
+     */
+    SplitIRQ *splitter = &s->ppc_irq_splitter[ppcnum];
+    DeviceState *iotkitdev = DEVICE(s);
+    DeviceState *dev_secctl = DEVICE(&s->secctl);
+    DeviceState *dev_splitter = DEVICE(splitter);
+    char *name;
+
+    name = g_strdup_printf("%s_nonsec", ppcname);
+    qdev_pass_gpios(dev_secctl, iotkitdev, name);
+    g_free(name);
+    name = g_strdup_printf("%s_ap", ppcname);
+    qdev_pass_gpios(dev_secctl, iotkitdev, name);
+    g_free(name);
+    name = g_strdup_printf("%s_irq_enable", ppcname);
+    qdev_pass_gpios(dev_secctl, iotkitdev, name);
+    g_free(name);
+    name = g_strdup_printf("%s_irq_clear", ppcname);
+    qdev_pass_gpios(dev_secctl, iotkitdev, name);
+    g_free(name);
+
+    /* irq_status is a little more tricky, because we need to
+     * split it so we can send it both to the security controller
+     * and to our OR gate for the NVIC interrupt line.
+     * Connect up the splitter's outputs, and create a GPIO input
+     * which will pass the line state to the input splitter.
+     */
+    name = g_strdup_printf("%s_irq_status", ppcname);
+    qdev_connect_gpio_out(dev_splitter, 0,
+                          qdev_get_gpio_in_named(dev_secctl,
+                                                 name, 0));
+    qdev_connect_gpio_out(dev_splitter, 1,
+                          qdev_get_gpio_in(DEVICE(&s->ppc_irq_orgate), ppcnum));
+    s->irq_status_in[ppcnum] = qdev_get_gpio_in(dev_splitter, 0);
+    qdev_init_gpio_in_named_with_opaque(iotkitdev, irq_status_forwarder,
+                                        s->irq_status_in[ppcnum], name, 1);
+    g_free(name);
+}
+
+static void iotkit_forward_sec_resp_cfg(IoTKit *s)
+{
+    /* Forward the 3rd output from the splitter device as a
+     * named GPIO output of the iotkit object.
+     */
+    DeviceState *dev = DEVICE(s);
+    DeviceState *dev_splitter = DEVICE(&s->sec_resp_splitter);
+
+    qdev_init_gpio_out_named(dev, &s->sec_resp_cfg, "sec_resp_cfg", 1);
+    s->sec_resp_cfg_in = qemu_allocate_irq(irq_status_forwarder,
+                                           s->sec_resp_cfg, 1);
+    qdev_connect_gpio_out(dev_splitter, 2, s->sec_resp_cfg_in);
+}
+
+static void iotkit_init(Object *obj)
+{
+    IoTKit *s = IOTKIT(obj);
+    int i;
+
+    memory_region_init(&s->container, obj, "iotkit-container", UINT64_MAX);
+
+    init_sysbus_child(obj, "armv7m", &s->armv7m, sizeof(s->armv7m),
+                      TYPE_ARMV7M);
+    qdev_prop_set_string(DEVICE(&s->armv7m), "cpu-type",
+                         ARM_CPU_TYPE_NAME("cortex-m33"));
+
+    init_sysbus_child(obj, "secctl", &s->secctl, sizeof(s->secctl),
+                      TYPE_IOTKIT_SECCTL);
+    init_sysbus_child(obj, "apb-ppc0", &s->apb_ppc0, sizeof(s->apb_ppc0),
+                      TYPE_TZ_PPC);
+    init_sysbus_child(obj, "apb-ppc1", &s->apb_ppc1, sizeof(s->apb_ppc1),
+                      TYPE_TZ_PPC);
+    init_sysbus_child(obj, "timer0", &s->timer0, sizeof(s->timer0),
+                      TYPE_CMSDK_APB_TIMER);
+    init_sysbus_child(obj, "timer1", &s->timer1, sizeof(s->timer1),
+                      TYPE_CMSDK_APB_TIMER);
+    init_sysbus_child(obj, "dualtimer", &s->dualtimer, sizeof(s->dualtimer),
+                      TYPE_UNIMPLEMENTED_DEVICE);
+    object_initialize(&s->ppc_irq_orgate, sizeof(s->ppc_irq_orgate),
+                      TYPE_OR_IRQ);
+    object_property_add_child(obj, "ppc-irq-orgate",
+                              OBJECT(&s->ppc_irq_orgate), &error_abort);
+    object_initialize(&s->sec_resp_splitter, sizeof(s->sec_resp_splitter),
+                      TYPE_SPLIT_IRQ);
+    object_property_add_child(obj, "sec-resp-splitter",
+                              OBJECT(&s->sec_resp_splitter), &error_abort);
+    for (i = 0; i < ARRAY_SIZE(s->ppc_irq_splitter); i++) {
+        char *name = g_strdup_printf("ppc-irq-splitter-%d", i);
+        SplitIRQ *splitter = &s->ppc_irq_splitter[i];
+
+        object_initialize(splitter, sizeof(*splitter), TYPE_SPLIT_IRQ);
+        object_property_add_child(obj, name, OBJECT(splitter), &error_abort);
+    }
+    init_sysbus_child(obj, "s32ktimer", &s->s32ktimer, sizeof(s->s32ktimer),
+                      TYPE_UNIMPLEMENTED_DEVICE);
+}
+
+static void iotkit_exp_irq(void *opaque, int n, int level)
+{
+    IoTKit *s = IOTKIT(opaque);
+
+    qemu_set_irq(s->exp_irqs[n], level);
+}
+
+static void iotkit_realize(DeviceState *dev, Error **errp)
+{
+    IoTKit *s = IOTKIT(dev);
+    int i;
+    MemoryRegion *mr;
+    Error *err = NULL;
+    SysBusDevice *sbd_apb_ppc0;
+    SysBusDevice *sbd_secctl;
+    DeviceState *dev_apb_ppc0;
+    DeviceState *dev_apb_ppc1;
+    DeviceState *dev_secctl;
+    DeviceState *dev_splitter;
+
+    if (!s->board_memory) {
+        error_setg(errp, "memory property was not set");
+        return;
+    }
+
+    if (!s->mainclk_frq) {
+        error_setg(errp, "MAINCLK property was not set");
+        return;
+    }
+
+    /* Handling of which devices should be available only to secure
+     * code is usually done differently for M profile than for A profile.
+     * Instead of putting some devices only into the secure address space,
+     * devices exist in both address spaces but with hard-wired security
+     * permissions that will cause the CPU to fault for non-secure accesses.
+     *
+     * The IoTKit has an IDAU (Implementation Defined Access Unit),
+     * which specifies hard-wired security permissions for different
+     * areas of the physical address space. For the IoTKit IDAU, the
+     * top 4 bits of the physical address are the IDAU region ID, and
+     * if bit 28 (ie the lowest bit of the ID) is 0 then this is an NS
+     * region, otherwise it is an S region.
+     *
+     * The various devices and RAMs are generally all mapped twice,
+     * once into a region that the IDAU defines as secure and once
+     * into a non-secure region. They sit behind either a Memory
+     * Protection Controller (for RAM) or a Peripheral Protection
+     * Controller (for devices), which allow a more fine grained
+     * configuration of whether non-secure accesses are permitted.
+     *
+     * (The other place that guest software can configure security
+     * permissions is in the architected SAU (Security Attribution
+     * Unit), which is entirely inside the CPU. The IDAU can upgrade
+     * the security attributes for a region to more restrictive than
+     * the SAU specifies, but cannot downgrade them.)
+     *
+     * 0x10000000..0x1fffffff  alias of 0x00000000..0x0fffffff
+     * 0x20000000..0x2007ffff  32KB FPGA block RAM
+     * 0x30000000..0x3fffffff  alias of 0x20000000..0x2fffffff
+     * 0x40000000..0x4000ffff  base peripheral region 1
+     * 0x40010000..0x4001ffff  CPU peripherals (none for IoTKit)
+     * 0x40020000..0x4002ffff  system control element peripherals
+     * 0x40080000..0x400fffff  base peripheral region 2
+     * 0x50000000..0x5fffffff  alias of 0x40000000..0x4fffffff
+     */
+
+    memory_region_add_subregion_overlap(&s->container, 0, s->board_memory, -1);
+
+    qdev_prop_set_uint32(DEVICE(&s->armv7m), "num-irq", s->exp_numirq + 32);
+    /* In real hardware the initial Secure VTOR is set from the INITSVTOR0
+     * register in the IoT Kit System Control Register block, and the
+     * initial value of that is in turn specifiable by the FPGA that
+     * instantiates the IoT Kit. In QEMU we don't implement this wrinkle,
+     * and simply set the CPU's init-svtor to the IoT Kit default value.
+     */
+    qdev_prop_set_uint32(DEVICE(&s->armv7m), "init-svtor", 0x10000000);
+    object_property_set_link(OBJECT(&s->armv7m), OBJECT(&s->container),
+                             "memory", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    object_property_set_link(OBJECT(&s->armv7m), OBJECT(s), "idau", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    object_property_set_bool(OBJECT(&s->armv7m), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    /* Connect our EXP_IRQ GPIOs to the NVIC's lines 32 and up. */
+    s->exp_irqs = g_new(qemu_irq, s->exp_numirq);
+    for (i = 0; i < s->exp_numirq; i++) {
+        s->exp_irqs[i] = qdev_get_gpio_in(DEVICE(&s->armv7m), i + 32);
+    }
+    qdev_init_gpio_in_named(dev, iotkit_exp_irq, "EXP_IRQ", s->exp_numirq);
+
+    /* Set up the big aliases first */
+    make_alias(s, &s->alias1, "alias 1", 0x10000000, 0x10000000, 0x00000000);
+    make_alias(s, &s->alias2, "alias 2", 0x30000000, 0x10000000, 0x20000000);
+    /* The 0x50000000..0x5fffffff region is not a pure alias: it has
+     * a few extra devices that only appear there (generally the
+     * control interfaces for the protection controllers).
+     * We implement this by mapping those devices over the top of this
+     * alias MR at a higher priority.
+     */
+    make_alias(s, &s->alias3, "alias 3", 0x50000000, 0x10000000, 0x40000000);
+
+    /* This RAM should be behind a Memory Protection Controller, but we
+     * don't implement that yet.
+     */
+    memory_region_init_ram(&s->sram0, NULL, "iotkit.sram0", 0x00008000, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    memory_region_add_subregion(&s->container, 0x20000000, &s->sram0);
+
+    /* Security controller */
+    object_property_set_bool(OBJECT(&s->secctl), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    sbd_secctl = SYS_BUS_DEVICE(&s->secctl);
+    dev_secctl = DEVICE(&s->secctl);
+    sysbus_mmio_map(sbd_secctl, 0, 0x50080000);
+    sysbus_mmio_map(sbd_secctl, 1, 0x40080000);
+
+    s->nsc_cfg_in = qemu_allocate_irq(nsccfg_handler, s, 1);
+    qdev_connect_gpio_out_named(dev_secctl, "nsc_cfg", 0, s->nsc_cfg_in);
+
+    /* The sec_resp_cfg output from the security controller must be split into
+     * multiple lines, one for each of the PPCs within the IoTKit and one
+     * that will be an output from the IoTKit to the system.
+     */
+    object_property_set_int(OBJECT(&s->sec_resp_splitter), 3,
+                            "num-lines", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    object_property_set_bool(OBJECT(&s->sec_resp_splitter), true,
+                             "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    dev_splitter = DEVICE(&s->sec_resp_splitter);
+    qdev_connect_gpio_out_named(dev_secctl, "sec_resp_cfg", 0,
+                                qdev_get_gpio_in(dev_splitter, 0));
+
+    /* Devices behind APB PPC0:
+     *   0x40000000: timer0
+     *   0x40001000: timer1
+     *   0x40002000: dual timer
+     * We must configure and realize each downstream device and connect
+     * it to the appropriate PPC port; then we can realize the PPC and
+     * map its upstream ends to the right place in the container.
+     */
+    qdev_prop_set_uint32(DEVICE(&s->timer0), "pclk-frq", s->mainclk_frq);
+    object_property_set_bool(OBJECT(&s->timer0), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->timer0), 0,
+                       qdev_get_gpio_in(DEVICE(&s->armv7m), 3));
+    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->timer0), 0);
+    object_property_set_link(OBJECT(&s->apb_ppc0), OBJECT(mr), "port[0]", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    qdev_prop_set_uint32(DEVICE(&s->timer1), "pclk-frq", s->mainclk_frq);
+    object_property_set_bool(OBJECT(&s->timer1), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->timer1), 0,
+                       qdev_get_gpio_in(DEVICE(&s->armv7m), 4));
+    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->timer1), 0);
+    object_property_set_link(OBJECT(&s->apb_ppc0), OBJECT(mr), "port[1]", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    qdev_prop_set_string(DEVICE(&s->dualtimer), "name", "Dual timer");
+    qdev_prop_set_uint64(DEVICE(&s->dualtimer), "size", 0x1000);
+    object_property_set_bool(OBJECT(&s->dualtimer), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->dualtimer), 0);
+    object_property_set_link(OBJECT(&s->apb_ppc0), OBJECT(mr), "port[2]", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    object_property_set_bool(OBJECT(&s->apb_ppc0), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    sbd_apb_ppc0 = SYS_BUS_DEVICE(&s->apb_ppc0);
+    dev_apb_ppc0 = DEVICE(&s->apb_ppc0);
+
+    mr = sysbus_mmio_get_region(sbd_apb_ppc0, 0);
+    memory_region_add_subregion(&s->container, 0x40000000, mr);
+    mr = sysbus_mmio_get_region(sbd_apb_ppc0, 1);
+    memory_region_add_subregion(&s->container, 0x40001000, mr);
+    mr = sysbus_mmio_get_region(sbd_apb_ppc0, 2);
+    memory_region_add_subregion(&s->container, 0x40002000, mr);
+    for (i = 0; i < IOTS_APB_PPC0_NUM_PORTS; i++) {
+        qdev_connect_gpio_out_named(dev_secctl, "apb_ppc0_nonsec", i,
+                                    qdev_get_gpio_in_named(dev_apb_ppc0,
+                                                           "cfg_nonsec", i));
+        qdev_connect_gpio_out_named(dev_secctl, "apb_ppc0_ap", i,
+                                    qdev_get_gpio_in_named(dev_apb_ppc0,
+                                                           "cfg_ap", i));
+    }
+    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc0_irq_enable", 0,
+                                qdev_get_gpio_in_named(dev_apb_ppc0,
+                                                       "irq_enable", 0));
+    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc0_irq_clear", 0,
+                                qdev_get_gpio_in_named(dev_apb_ppc0,
+                                                       "irq_clear", 0));
+    qdev_connect_gpio_out(dev_splitter, 0,
+                          qdev_get_gpio_in_named(dev_apb_ppc0,
+                                                 "cfg_sec_resp", 0));
+
+    /* All the PPC irq lines (from the 2 internal PPCs and the 8 external
+     * ones) are sent individually to the security controller, and also
+     * ORed together to give a single combined PPC interrupt to the NVIC.
+     */
+    object_property_set_int(OBJECT(&s->ppc_irq_orgate),
+                            NUM_PPCS, "num-lines", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    object_property_set_bool(OBJECT(&s->ppc_irq_orgate), true,
+                             "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    qdev_connect_gpio_out(DEVICE(&s->ppc_irq_orgate), 0,
+                          qdev_get_gpio_in(DEVICE(&s->armv7m), 10));
+
+    /* 0x40010000 .. 0x4001ffff: private CPU region: unused in IoTKit */
+
+    /* 0x40020000 .. 0x4002ffff : IoTKit system control peripheral region */
+    /* Devices behind APB PPC1:
+     *   0x4002f000: S32K timer
+     */
+    qdev_prop_set_string(DEVICE(&s->s32ktimer), "name", "S32KTIMER");
+    qdev_prop_set_uint64(DEVICE(&s->s32ktimer), "size", 0x1000);
+    object_property_set_bool(OBJECT(&s->s32ktimer), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->s32ktimer), 0);
+    object_property_set_link(OBJECT(&s->apb_ppc1), OBJECT(mr), "port[0]", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    object_property_set_bool(OBJECT(&s->apb_ppc1), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->apb_ppc1), 0);
+    memory_region_add_subregion(&s->container, 0x4002f000, mr);
+
+    dev_apb_ppc1 = DEVICE(&s->apb_ppc1);
+    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc1_nonsec", 0,
+                                qdev_get_gpio_in_named(dev_apb_ppc1,
+                                                       "cfg_nonsec", 0));
+    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc1_ap", 0,
+                                qdev_get_gpio_in_named(dev_apb_ppc1,
+                                                       "cfg_ap", 0));
+    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc1_irq_enable", 0,
+                                qdev_get_gpio_in_named(dev_apb_ppc1,
+                                                       "irq_enable", 0));
+    qdev_connect_gpio_out_named(dev_secctl, "apb_ppc1_irq_clear", 0,
+                                qdev_get_gpio_in_named(dev_apb_ppc1,
+                                                       "irq_clear", 0));
+    qdev_connect_gpio_out(dev_splitter, 1,
+                          qdev_get_gpio_in_named(dev_apb_ppc1,
+                                                 "cfg_sec_resp", 0));
+
+    /* Using create_unimplemented_device() maps the stub into the
+     * system address space rather than into our container, but the
+     * overall effect to the guest is the same.
+     */
+    create_unimplemented_device("SYSINFO", 0x40020000, 0x1000);
+
+    create_unimplemented_device("SYSCONTROL", 0x50021000, 0x1000);
+    create_unimplemented_device("S32KWATCHDOG", 0x5002e000, 0x1000);
+
+    /* 0x40080000 .. 0x4008ffff : IoTKit second Base peripheral region */
+
+    create_unimplemented_device("NS watchdog", 0x40081000, 0x1000);
+    create_unimplemented_device("S watchdog", 0x50081000, 0x1000);
+
+    create_unimplemented_device("SRAM0 MPC", 0x50083000, 0x1000);
+
+    for (i = 0; i < ARRAY_SIZE(s->ppc_irq_splitter); i++) {
+        Object *splitter = OBJECT(&s->ppc_irq_splitter[i]);
+
+        object_property_set_int(splitter, 2, "num-lines", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
+        object_property_set_bool(splitter, true, "realized", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
+    }
+
+    for (i = 0; i < IOTS_NUM_AHB_EXP_PPC; i++) {
+        char *ppcname = g_strdup_printf("ahb_ppcexp%d", i);
+
+        iotkit_forward_ppc(s, ppcname, i);
+        g_free(ppcname);
+    }
+
+    for (i = 0; i < IOTS_NUM_APB_EXP_PPC; i++) {
+        char *ppcname = g_strdup_printf("apb_ppcexp%d", i);
+
+        iotkit_forward_ppc(s, ppcname, i + IOTS_NUM_AHB_EXP_PPC);
+        g_free(ppcname);
+    }
+
+    for (i = NUM_EXTERNAL_PPCS; i < NUM_PPCS; i++) {
+        /* Wire up IRQ splitter for internal PPCs */
+        DeviceState *devs = DEVICE(&s->ppc_irq_splitter[i]);
+        char *gpioname = g_strdup_printf("apb_ppc%d_irq_status",
+                                         i - NUM_EXTERNAL_PPCS);
+        TZPPC *ppc = (i == NUM_EXTERNAL_PPCS) ? &s->apb_ppc0 : &s->apb_ppc1;
+
+        qdev_connect_gpio_out(devs, 0,
+                              qdev_get_gpio_in_named(dev_secctl, gpioname, 0));
+        qdev_connect_gpio_out(devs, 1,
+                              qdev_get_gpio_in(DEVICE(&s->ppc_irq_orgate), i));
+        qdev_connect_gpio_out_named(DEVICE(ppc), "irq", 0,
+                                    qdev_get_gpio_in(devs, 0));
+    }
+
+    iotkit_forward_sec_resp_cfg(s);
+
+    system_clock_scale = NANOSECONDS_PER_SECOND / s->mainclk_frq;
+}
+
+static void iotkit_idau_check(IDAUInterface *ii, uint32_t address,
+                              int *iregion, bool *exempt, bool *ns, bool *nsc)
+{
+    /* For IoTKit systems the IDAU responses are simple logical functions
+     * of the address bits. The NSC attribute is guest-adjustable via the
+     * NSCCFG register in the security controller.
+     */
+    IoTKit *s = IOTKIT(ii);
+    int region = extract32(address, 28, 4);
+
+    *ns = !(region & 1);
+    *nsc = (region == 1 && (s->nsccfg & 1)) || (region == 3 && (s->nsccfg & 2));
+    /* 0xe0000000..0xe00fffff and 0xf0000000..0xf00fffff are exempt */
+    *exempt = (address & 0xeff00000) == 0xe0000000;
+    *iregion = region;
+}
+
+static const VMStateDescription iotkit_vmstate = {
+    .name = "iotkit",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(nsccfg, IoTKit),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static Property iotkit_properties[] = {
+    DEFINE_PROP_LINK("memory", IoTKit, board_memory, TYPE_MEMORY_REGION,
+                     MemoryRegion *),
+    DEFINE_PROP_UINT32("EXP_NUMIRQ", IoTKit, exp_numirq, 64),
+    DEFINE_PROP_UINT32("MAINCLK", IoTKit, mainclk_frq, 0),
+    DEFINE_PROP_END_OF_LIST()
+};
+
+static void iotkit_reset(DeviceState *dev)
+{
+    IoTKit *s = IOTKIT(dev);
+
+    s->nsccfg = 0;
+}
+
+static void iotkit_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    IDAUInterfaceClass *iic = IDAU_INTERFACE_CLASS(klass);
+
+    dc->realize = iotkit_realize;
+    dc->vmsd = &iotkit_vmstate;
+    dc->props = iotkit_properties;
+    dc->reset = iotkit_reset;
+    iic->check = iotkit_idau_check;
+}
+
+static const TypeInfo iotkit_info = {
+    .name = TYPE_IOTKIT,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(IoTKit),
+    .instance_init = iotkit_init,
+    .class_init = iotkit_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_IDAU_INTERFACE },
+        { }
+    }
+};
+
+static void iotkit_register_types(void)
+{
+    type_register_static(&iotkit_info);
+}
+
+type_init(iotkit_register_types);
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index XXXXXXX..XXXXXXX 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_MPS2_FPGAIO=y
 CONFIG_MPS2_SCC=y
 
 CONFIG_TZ_PPC=y
+CONFIG_IOTKIT=y
 CONFIG_IOTKIT_SECCTL=y
 
 CONFIG_VERSATILE_PCI=y
-- 
2.16.2

Define a new board model for the MPS2 with an AN505 FPGA image
containing a Cortex-M33. Since the FPGA images for TrustZone
cores (AN505, and the similar AN519 for Cortex-M23) have a
significantly different layout of devices to the non-TrustZone
images, we use a new source file rather than shoehorning them
into the existing mps2.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180220180325.29818-20-peter.maydell@linaro.org
---
 hw/arm/Makefile.objs |   1 +
 hw/arm/mps2-tz.c     | 503 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 504 insertions(+)
 create mode 100644 hw/arm/mps2-tz.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_FSL_IMX31) += fsl-imx31.o kzm.o
 obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o sabrelite.o
 obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
 obj-$(CONFIG_MPS2) += mps2.o
+obj-$(CONFIG_MPS2) += mps2-tz.o
 obj-$(CONFIG_MSF2) += msf2-soc.o msf2-som.o
 obj-$(CONFIG_IOTKIT) += iotkit.o
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM V2M MPS2 board emulation, trustzone aware FPGA images
+ *
+ * Copyright (c) 2017 Linaro Limited
+ * Written by Peter Maydell
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 or
+ *  (at your option) any later version.
+ */
+
+/* The MPS2 and MPS2+ dev boards are FPGA based (the 2+ has a bigger
+ * FPGA but is otherwise the same as the 2). Since the CPU itself
+ * and most of the devices are in the FPGA, the details of the board
+ * as seen by the guest depend significantly on the FPGA image.
+ * This source file covers the following FPGA images, for TrustZone cores:
+ *  "mps2-an505" -- Cortex-M33 as documented in ARM Application Note AN505
+ *
+ * Links to the TRM for the board itself and to the various Application
+ * Notes which document the FPGA images can be found here:
+ * https://developer.arm.com/products/system-design/development-boards/fpga-prototyping-boards/mps2
+ *
+ * Board TRM:
+ * http://infocenter.arm.com/help/topic/com.arm.doc.100112_0200_06_en/versatile_express_cortex_m_prototyping_systems_v2m_mps2_and_v2m_mps2plus_technical_reference_100112_0200_06_en.pdf
+ * Application Note AN505:
+ * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
+ *
+ * The AN505 defers to the Cortex-M33 processor ARMv8M IoT Kit FVP User Guide
+ * (ARM ECM0601256) for the details of some of the device layout:
+ *   http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "hw/arm/arm.h"
+#include "hw/arm/armv7m.h"
+#include "hw/or-irq.h"
+#include "hw/boards.h"
+#include "exec/address-spaces.h"
+#include "sysemu/sysemu.h"
+#include "hw/misc/unimp.h"
+#include "hw/char/cmsdk-apb-uart.h"
+#include "hw/timer/cmsdk-apb-timer.h"
+#include "hw/misc/mps2-scc.h"
+#include "hw/misc/mps2-fpgaio.h"
+#include "hw/arm/iotkit.h"
+#include "hw/devices.h"
+#include "net/net.h"
+#include "hw/core/split-irq.h"
+
+typedef enum MPS2TZFPGAType {
+    FPGA_AN505,
+} MPS2TZFPGAType;
+
+typedef struct {
+    MachineClass parent;
+    MPS2TZFPGAType fpga_type;
+    uint32_t scc_id;
+} MPS2TZMachineClass;
+
+typedef struct {
+    MachineState parent;
+
+    IoTKit iotkit;
+    MemoryRegion psram;
+    MemoryRegion ssram1;
+    MemoryRegion ssram1_m;
+    MemoryRegion ssram23;
+    MPS2SCC scc;
+    MPS2FPGAIO fpgaio;
+    TZPPC ppc[5];
+    UnimplementedDeviceState ssram_mpc[3];
+    UnimplementedDeviceState spi[5];
+    UnimplementedDeviceState i2c[4];
+    UnimplementedDeviceState i2s_audio;
+    UnimplementedDeviceState gpio[5];
+    UnimplementedDeviceState dma[4];
+    UnimplementedDeviceState gfx;
+    CMSDKAPBUART uart[5];
+    SplitIRQ sec_resp_splitter;
+    qemu_or_irq uart_irq_orgate;
+} MPS2TZMachineState;
+
+#define TYPE_MPS2TZ_MACHINE "mps2tz"
+#define TYPE_MPS2TZ_AN505_MACHINE MACHINE_TYPE_NAME("mps2-an505")
+
+#define MPS2TZ_MACHINE(obj) \
+    OBJECT_CHECK(MPS2TZMachineState, obj, TYPE_MPS2TZ_MACHINE)
+#define MPS2TZ_MACHINE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(MPS2TZMachineClass, obj, TYPE_MPS2TZ_MACHINE)
+#define MPS2TZ_MACHINE_CLASS(klass) \
+    OBJECT_CLASS_CHECK(MPS2TZMachineClass, klass, TYPE_MPS2TZ_MACHINE)
+
+/* Main SYSCLK frequency in Hz */
+#define SYSCLK_FRQ 20000000
+
+/* Initialize the auxiliary RAM region @mr and map it into
+ * the memory map at @base.
+ */
+static void make_ram(MemoryRegion *mr, const char *name,
+                     hwaddr base, hwaddr size)
+{
+    memory_region_init_ram(mr, NULL, name, size, &error_fatal);
+    memory_region_add_subregion(get_system_memory(), base, mr);
+}
+
+/* Create an alias of an entire original MemoryRegion @orig
+ * located at @base in the memory map.
+ */
+static void make_ram_alias(MemoryRegion *mr, const char *name,
+                           MemoryRegion *orig, hwaddr base)
+{
+    memory_region_init_alias(mr, NULL, name, orig, 0,
+                             memory_region_size(orig));
+    memory_region_add_subregion(get_system_memory(), base, mr);
+}
+
+static void init_sysbus_child(Object *parent, const char *childname,
+                              void *child, size_t childsize,
+                              const char *childtype)
+{
+    object_initialize(child, childsize, childtype);
+    object_property_add_child(parent, childname, OBJECT(child), &error_abort);
+    qdev_set_parent_bus(DEVICE(child), sysbus_get_default());
+
+}
+
+/* Most of the devices in the AN505 FPGA image sit behind
+ * Peripheral Protection Controllers. These data structures
+ * define the layout of which devices sit behind which PPCs.
+ * The devfn for each port is a function which creates, configures
+ * and initializes the device, returning the MemoryRegion which
+ * needs to be plugged into the downstream end of the PPC port.
+ */
+typedef MemoryRegion *MakeDevFn(MPS2TZMachineState *mms, void *opaque,
+                                const char *name, hwaddr size);
+
+typedef struct PPCPortInfo {
+    const char *name;
+    MakeDevFn *devfn;
+    void *opaque;
+    hwaddr addr;
+    hwaddr size;
+} PPCPortInfo;
+
+typedef struct PPCInfo {
+    const char *name;
+    PPCPortInfo ports[TZ_NUM_PORTS];
+} PPCInfo;
+
+static MemoryRegion *make_unimp_dev(MPS2TZMachineState *mms,
+                                    void *opaque,
+                                    const char *name, hwaddr size)
+{
+    /* Initialize, configure and realize a TYPE_UNIMPLEMENTED_DEVICE,
+     * and return a pointer to its MemoryRegion.
+     */
+    UnimplementedDeviceState *uds = opaque;
+
+    init_sysbus_child(OBJECT(mms), name, uds,
+                      sizeof(UnimplementedDeviceState),
+                      TYPE_UNIMPLEMENTED_DEVICE);
+    qdev_prop_set_string(DEVICE(uds), "name", name);
+    qdev_prop_set_uint64(DEVICE(uds), "size", size);
+    object_property_set_bool(OBJECT(uds), true, "realized", &error_fatal);
+    return sysbus_mmio_get_region(SYS_BUS_DEVICE(uds), 0);
+}
+
+static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
+                               const char *name, hwaddr size)
+{
+    CMSDKAPBUART *uart = opaque;
+    int i = uart - &mms->uart[0];
+    Chardev *uartchr = i < MAX_SERIAL_PORTS ? serial_hds[i] : NULL;
+    int rxirqno = i * 2;
+    int txirqno = i * 2 + 1;
+    int combirqno = i + 10;
+    SysBusDevice *s;
+    DeviceState *iotkitdev = DEVICE(&mms->iotkit);
+    DeviceState *orgate_dev = DEVICE(&mms->uart_irq_orgate);
+
+    init_sysbus_child(OBJECT(mms), name, uart,
+                      sizeof(mms->uart[0]), TYPE_CMSDK_APB_UART);
+    qdev_prop_set_chr(DEVICE(uart), "chardev", uartchr);
+    qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", SYSCLK_FRQ);
+    object_property_set_bool(OBJECT(uart), true, "realized", &error_fatal);
+    s = SYS_BUS_DEVICE(uart);
+    sysbus_connect_irq(s, 0, qdev_get_gpio_in_named(iotkitdev,
+                                                    "EXP_IRQ", txirqno));
+    sysbus_connect_irq(s, 1, qdev_get_gpio_in_named(iotkitdev,
+                                                    "EXP_IRQ", rxirqno));
+    sysbus_connect_irq(s, 2, qdev_get_gpio_in(orgate_dev, i * 2));
+    sysbus_connect_irq(s, 3, qdev_get_gpio_in(orgate_dev, i * 2 + 1));
+    sysbus_connect_irq(s, 4, qdev_get_gpio_in_named(iotkitdev,
+                                                    "EXP_IRQ", combirqno));
+    return sysbus_mmio_get_region(SYS_BUS_DEVICE(uart), 0);
+}
+
+static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
+                              const char *name, hwaddr size)
+{
+    MPS2SCC *scc = opaque;
+    DeviceState *sccdev;
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+
+    object_initialize(scc, sizeof(mms->scc), TYPE_MPS2_SCC);
+    sccdev = DEVICE(scc);
+    qdev_set_parent_bus(sccdev, sysbus_get_default());
+    qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
+    qdev_prop_set_uint32(sccdev, "scc-aid", 0x02000008);
+    qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
+    object_property_set_bool(OBJECT(scc), true, "realized", &error_fatal);
+    return sysbus_mmio_get_region(SYS_BUS_DEVICE(sccdev), 0);
+}
+
+static MemoryRegion *make_fpgaio(MPS2TZMachineState *mms, void *opaque,
+                                 const char *name, hwaddr size)
+{
+    MPS2FPGAIO *fpgaio = opaque;
+
+    object_initialize(fpgaio, sizeof(mms->fpgaio), TYPE_MPS2_FPGAIO);
+    qdev_set_parent_bus(DEVICE(fpgaio), sysbus_get_default());
+    object_property_set_bool(OBJECT(fpgaio), true, "realized", &error_fatal);
+    return sysbus_mmio_get_region(SYS_BUS_DEVICE(fpgaio), 0);
+}
+
+static void mps2tz_common_init(MachineState *machine)
+{
+    MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+    MemoryRegion *system_memory = get_system_memory();
+    DeviceState *iotkitdev;
+    DeviceState *dev_splitter;
+    int i;
+
+    if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
+        error_report("This board can only be used with CPU %s",
+                     mc->default_cpu_type);
+        exit(1);
+    }
+
+    init_sysbus_child(OBJECT(machine), "iotkit", &mms->iotkit,
+                      sizeof(mms->iotkit), TYPE_IOTKIT);
+    iotkitdev = DEVICE(&mms->iotkit);
+    object_property_set_link(OBJECT(&mms->iotkit), OBJECT(system_memory),
+                             "memory", &error_abort);
+    qdev_prop_set_uint32(iotkitdev, "EXP_NUMIRQ", 92);
+    qdev_prop_set_uint32(iotkitdev, "MAINCLK", SYSCLK_FRQ);
+    object_property_set_bool(OBJECT(&mms->iotkit), true, "realized",
+                             &error_fatal);
+
+    /* The sec_resp_cfg output from the IoTKit must be split into multiple
+     * lines, one for each of the PPCs we create here.
+     */
+    object_initialize(&mms->sec_resp_splitter, sizeof(mms->sec_resp_splitter),
+                      TYPE_SPLIT_IRQ);
+    object_property_add_child(OBJECT(machine), "sec-resp-splitter",
+                              OBJECT(&mms->sec_resp_splitter), &error_abort);
+    object_property_set_int(OBJECT(&mms->sec_resp_splitter), 5,
+                            "num-lines", &error_fatal);
+    object_property_set_bool(OBJECT(&mms->sec_resp_splitter), true,
+                             "realized", &error_fatal);
+    dev_splitter = DEVICE(&mms->sec_resp_splitter);
+    qdev_connect_gpio_out_named(iotkitdev, "sec_resp_cfg", 0,
+                                qdev_get_gpio_in(dev_splitter, 0));
+
+    /* The IoTKit sets up much of the memory layout, including
+     * the aliases between secure and non-secure regions in the
+     * address space. The FPGA itself contains:
+     *
+     * 0x00000000..0x003fffff  SSRAM1
+     * 0x00400000..0x007fffff  alias of SSRAM1
+     * 0x28000000..0x283fffff  4MB SSRAM2 + SSRAM3
+     * 0x40100000..0x4fffffff  AHB Master Expansion 1 interface devices
+     * 0x80000000..0x80ffffff  16MB PSRAM
+     */
+
+    /* The FPGA images have an odd combination of different RAMs,
+     * because in hardware they are different implementations and
+     * connected to different buses, giving varying performance/size
+     * tradeoffs. For QEMU they're all just RAM, though. We arbitrarily
+     * call the 16MB PSRAM our "system memory", as it's the largest lump.
+     */
+    memory_region_allocate_system_memory(&mms->psram,
+                                         NULL, "mps.ram", 0x01000000);
+    memory_region_add_subregion(system_memory, 0x80000000, &mms->psram);
+
+    /* The SSRAM memories should all be behind Memory Protection Controllers,
+     * but we don't implement that yet.
+     */
+    make_ram(&mms->ssram1, "mps.ssram1", 0x00000000, 0x00400000);
+    make_ram_alias(&mms->ssram1_m, "mps.ssram1_m", &mms->ssram1, 0x00400000);
+
+    make_ram(&mms->ssram23, "mps.ssram23", 0x28000000, 0x00400000);
+
+    /* The overflow IRQs for all UARTs are ORed together.
+     * Tx, Rx and "combined" IRQs are sent to the NVIC separately.
+     * Create the OR gate for this.
+     */
+    object_initialize(&mms->uart_irq_orgate, sizeof(mms->uart_irq_orgate),
+                      TYPE_OR_IRQ);
+    object_property_add_child(OBJECT(mms), "uart-irq-orgate",
+                              OBJECT(&mms->uart_irq_orgate), &error_abort);
+    object_property_set_int(OBJECT(&mms->uart_irq_orgate), 10, "num-lines",
+                            &error_fatal);
+    object_property_set_bool(OBJECT(&mms->uart_irq_orgate), true,
+                             "realized", &error_fatal);
+    qdev_connect_gpio_out(DEVICE(&mms->uart_irq_orgate), 0,
+                          qdev_get_gpio_in_named(iotkitdev, "EXP_IRQ", 15));
+
+    /* Most of the devices in the FPGA are behind Peripheral Protection
+     * Controllers. The required order for initializing things is:
+     *  + initialize the PPC
+     *  + initialize, configure and realize downstream devices
+     *  + connect downstream device MemoryRegions to the PPC
+     *  + realize the PPC
+     *  + map the PPC's MemoryRegions to the places in the address map
+     *    where the downstream devices should appear
+     *  + wire up the PPC's control lines to the IoTKit object
+     */
+
+    const PPCInfo ppcs[] = { {
+            .name = "apb_ppcexp0",
+            .ports = {
+                { "ssram-mpc0", make_unimp_dev, &mms->ssram_mpc[0],
+                  0x58007000, 0x1000 },
+                { "ssram-mpc1", make_unimp_dev, &mms->ssram_mpc[1],
+                  0x58008000, 0x1000 },
+                { "ssram-mpc2", make_unimp_dev, &mms->ssram_mpc[2],
+                  0x58009000, 0x1000 },
+            },
+        }, {
+            .name = "apb_ppcexp1",
+            .ports = {
+                { "spi0", make_unimp_dev, &mms->spi[0], 0x40205000, 0x1000 },
+                { "spi1", make_unimp_dev, &mms->spi[1], 0x40206000, 0x1000 },
+                { "spi2", make_unimp_dev, &mms->spi[2], 0x40209000, 0x1000 },
+                { "spi3", make_unimp_dev, &mms->spi[3], 0x4020a000, 0x1000 },
+                { "spi4", make_unimp_dev, &mms->spi[4], 0x4020b000, 0x1000 },
+                { "uart0", make_uart, &mms->uart[0], 0x40200000, 0x1000 },
+                { "uart1", make_uart, &mms->uart[1], 0x40201000, 0x1000 },
+                { "uart2", make_uart, &mms->uart[2], 0x40202000, 0x1000 },
+                { "uart3", make_uart, &mms->uart[3], 0x40203000, 0x1000 },
+                { "uart4", make_uart, &mms->uart[4], 0x40204000, 0x1000 },
+                { "i2c0", make_unimp_dev, &mms->i2c[0], 0x40207000, 0x1000 },
+                { "i2c1", make_unimp_dev, &mms->i2c[1], 0x40208000, 0x1000 },
+                { "i2c2", make_unimp_dev, &mms->i2c[2], 0x4020c000, 0x1000 },
+                { "i2c3", make_unimp_dev, &mms->i2c[3], 0x4020d000, 0x1000 },
+            },
+        }, {
+            .name = "apb_ppcexp2",
+            .ports = {
+                { "scc", make_scc, &mms->scc, 0x40300000, 0x1000 },
+                { "i2s-audio", make_unimp_dev, &mms->i2s_audio,
+                  0x40301000, 0x1000 },
+                { "fpgaio", make_fpgaio, &mms->fpgaio, 0x40302000, 0x1000 },
+            },
+        }, {
+            .name = "ahb_ppcexp0",
+            .ports = {
+                { "gfx", make_unimp_dev, &mms->gfx, 0x41000000, 0x140000 },
+                { "gpio0", make_unimp_dev, &mms->gpio[0], 0x40100000, 0x1000 },
+                { "gpio1", make_unimp_dev, &mms->gpio[1], 0x40101000, 0x1000 },
+                { "gpio2", make_unimp_dev, &mms->gpio[2], 0x40102000, 0x1000 },
+                { "gpio3", make_unimp_dev, &mms->gpio[3], 0x40103000, 0x1000 },
+                { "gpio4", make_unimp_dev, &mms->gpio[4], 0x40104000, 0x1000 },
+            },
+        }, {
+            .name = "ahb_ppcexp1",
+            .ports = {
+                { "dma0", make_unimp_dev, &mms->dma[0], 0x40110000, 0x1000 },
+                { "dma1", make_unimp_dev, &mms->dma[1], 0x40111000, 0x1000 },
+                { "dma2", make_unimp_dev, &mms->dma[2], 0x40112000, 0x1000 },
+                { "dma3", make_unimp_dev, &mms->dma[3], 0x40113000, 0x1000 },
+            },
+        },
+    };
+
+    for (i = 0; i < ARRAY_SIZE(ppcs); i++) {
+        const PPCInfo *ppcinfo = &ppcs[i];
+        TZPPC *ppc = &mms->ppc[i];
+        DeviceState *ppcdev;
+        int port;
+        char *gpioname;
+
+        init_sysbus_child(OBJECT(machine), ppcinfo->name, ppc,
+                          sizeof(TZPPC), TYPE_TZ_PPC);
+        ppcdev = DEVICE(ppc);
+
+        for (port = 0; port < TZ_NUM_PORTS; port++) {
+            const PPCPortInfo *pinfo = &ppcinfo->ports[port];
+            MemoryRegion *mr;
+            char *portname;
+
+            if (!pinfo->devfn) {
+                continue;
+            }
+
+            mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size);
+            portname = g_strdup_printf("port[%d]", port);
+            object_property_set_link(OBJECT(ppc), OBJECT(mr),
+                                     portname, &error_fatal);
+            g_free(portname);
+        }
+
+        object_property_set_bool(OBJECT(ppc), true, "realized", &error_fatal);
+
+        for (port = 0; port < TZ_NUM_PORTS; port++) {
+            const PPCPortInfo *pinfo = &ppcinfo->ports[port];
+
+            if (!pinfo->devfn) {
+                continue;
+            }
+            sysbus_mmio_map(SYS_BUS_DEVICE(ppc), port, pinfo->addr);
+
+            gpioname = g_strdup_printf("%s_nonsec", ppcinfo->name);
+            qdev_connect_gpio_out_named(iotkitdev, gpioname, port,
+                                        qdev_get_gpio_in_named(ppcdev,
+                                                               "cfg_nonsec",
+                                                               port));
+            g_free(gpioname);
+            gpioname = g_strdup_printf("%s_ap", ppcinfo->name);
+            qdev_connect_gpio_out_named(iotkitdev, gpioname, port,
+                                        qdev_get_gpio_in_named(ppcdev,
+                                                               "cfg_ap", port));
+            g_free(gpioname);
+        }
+
+        gpioname = g_strdup_printf("%s_irq_enable", ppcinfo->name);
+        qdev_connect_gpio_out_named(iotkitdev, gpioname, 0,
+                                    qdev_get_gpio_in_named(ppcdev,
+                                                           "irq_enable", 0));
+        g_free(gpioname);
+        gpioname = g_strdup_printf("%s_irq_clear", ppcinfo->name);
+        qdev_connect_gpio_out_named(iotkitdev, gpioname, 0,
+                                    qdev_get_gpio_in_named(ppcdev,
+                                                           "irq_clear", 0));
+        g_free(gpioname);
+        gpioname = g_strdup_printf("%s_irq_status", ppcinfo->name);
+        qdev_connect_gpio_out_named(ppcdev, "irq", 0,
+                                    qdev_get_gpio_in_named(iotkitdev,
+                                                           gpioname, 0));
+        g_free(gpioname);
+
+        qdev_connect_gpio_out(dev_splitter, i,
+                              qdev_get_gpio_in_named(ppcdev,
+                                                     "cfg_sec_resp", 0));
+    }
+
+    /* In hardware this is a LAN9220; the LAN9118 is software compatible
+     * except that it doesn't support the checksum-offload feature.
+     * The ethernet controller is not behind a PPC.
+     */
+    lan9118_init(&nd_table[0], 0x42000000,
+                 qdev_get_gpio_in_named(iotkitdev, "EXP_IRQ", 16));
+
+    create_unimplemented_device("FPGA NS PC", 0x48007000, 0x1000);
+
+    armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x400000);
+}
+
+static void mps2tz_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->init = mps2tz_common_init;
+    mc->max_cpus = 1;
+}
+
+static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_CLASS(oc);
+
+    mc->desc = "ARM MPS2 with AN505 FPGA image for Cortex-M33";
+    mmc->fpga_type = FPGA_AN505;
+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
+    mmc->scc_id = 0x41040000 | (505 << 4);
+}
+
+static const TypeInfo mps2tz_info = {
+    .name = TYPE_MPS2TZ_MACHINE,
+    .parent = TYPE_MACHINE,
+    .abstract = true,
+    .instance_size = sizeof(MPS2TZMachineState),
+    .class_size = sizeof(MPS2TZMachineClass),
+    .class_init = mps2tz_class_init,
+};
+
+static const TypeInfo mps2tz_an505_info = {
+    .name = TYPE_MPS2TZ_AN505_MACHINE,
+    .parent = TYPE_MPS2TZ_MACHINE,
+    .class_init = mps2tz_an505_class_init,
+};
+
+static void mps2tz_machine_init(void)
+{
+    type_register_static(&mps2tz_info);
+    type_register_static(&mps2tz_an505_info);
+}
+
+type_init(mps2tz_machine_init);
-- 
2.16.2

From: Richard Henderson <richard.henderson@linaro.org>

Not yet enabled for any CPU model.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h     | 1 +
 linux-user/elfload.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ enum arm_features {
     ARM_FEATURE_V8_SHA3, /* implements SHA3 part of v8 Crypto Extensions */
     ARM_FEATURE_V8_SM3, /* implements SM3 part of v8 Crypto Extensions */
     ARM_FEATURE_V8_SM4, /* implements SM4 part of v8 Crypto Extensions */
+    ARM_FEATURE_V8_RDM, /* implements v8.1 SIMD SQRDMLAH/SQRDMLSH */
     ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
 };
 
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void)
     GET_FEATURE(ARM_FEATURE_V8_SHA512, ARM_HWCAP_A64_SHA512);
     GET_FEATURE(ARM_FEATURE_V8_FP16,
                 ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP);
+    GET_FEATURE(ARM_FEATURE_V8_RDM, ARM_HWCAP_A64_ASIMDRDM);
 #undef GET_FEATURE
 
     return hwcaps;
-- 
2.16.2

From: Richard Henderson <richard.henderson@linaro.org>

Fold the U bit into the switch key (16 * u + opcode) rather than
testing it separately in each case.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180228193125.20577-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 129 +++++++++++++++++++++------------------------
 1 file changed, 61 insertions(+), 68 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
     int index;
     TCGv_ptr fpst;
 
-    switch (opcode) {
-    case 0x0: /* MLA */
-    case 0x4: /* MLS */
-        if (!u || is_scalar) {
+    switch (16 * u + opcode) {
+    case 0x08: /* MUL */
+    case 0x10: /* MLA */
+    case 0x14: /* MLS */
+        if (is_scalar) {
             unallocated_encoding(s);
             return;
         }
         break;
-    case 0x2: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
-    case 0x6: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
-    case 0xa: /* SMULL, SMULL2, UMULL, UMULL2 */
+    case 0x02: /* SMLAL, SMLAL2 */
+    case 0x12: /* UMLAL, UMLAL2 */
+    case 0x06: /* SMLSL, SMLSL2 */
+    case 0x16: /* UMLSL, UMLSL2 */
+    case 0x0a: /* SMULL, SMULL2 */
+    case 0x1a: /* UMULL, UMULL2 */
         if (is_scalar) {
             unallocated_encoding(s);
             return;
         }
         is_long = true;
         break;
-    case 0x3: /* SQDMLAL, SQDMLAL2 */
-    case 0x7: /* SQDMLSL, SQDMLSL2 */
-    case 0xb: /* SQDMULL, SQDMULL2 */
+    case 0x03: /* SQDMLAL, SQDMLAL2 */
+    case 0x07: /* SQDMLSL, SQDMLSL2 */
+    case 0x0b: /* SQDMULL, SQDMULL2 */
         is_long = true;
-        /* fall through */
-    case 0xc: /* SQDMULH */
-    case 0xd: /* SQRDMULH */
-        if (u) {
-            unallocated_encoding(s);
-            return;
-        }
         break;
-    case 0x8: /* MUL */
-        if (u || is_scalar) {
-            unallocated_encoding(s);
-            return;
-        }
+    case 0x0c: /* SQDMULH */
+    case 0x0d: /* SQRDMULH */
         break;
-    case 0x1: /* FMLA */
-    case 0x5: /* FMLS */
-        if (u) {
-            unallocated_encoding(s);
-            return;
-        }
-        /* fall through */
-    case 0x9: /* FMUL, FMULX */
+    case 0x01: /* FMLA */
+    case 0x05: /* FMLS */
+    case 0x09: /* FMUL */
+    case 0x19: /* FMULX */
         if (size == 1) {
             unallocated_encoding(s);
             return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
 
             read_vec_element(s, tcg_op, rn, pass, MO_64);
 
-            switch (opcode) {
-            case 0x5: /* FMLS */
+            switch (16 * u + opcode) {
+            case 0x05: /* FMLS */
                 /* As usual for ARM, separate negation for fused multiply-add */
                 gen_helper_vfp_negd(tcg_op, tcg_op);
                 /* fall through */
-            case 0x1: /* FMLA */
+            case 0x01: /* FMLA */
                 read_vec_element(s, tcg_res, rd, pass, MO_64);
                 gen_helper_vfp_muladdd(tcg_res, tcg_op, tcg_idx, tcg_res, fpst);
                 break;
-            case 0x9: /* FMUL, FMULX */
-                if (u) {
-                    gen_helper_vfp_mulxd(tcg_res, tcg_op, tcg_idx, fpst);
-                } else {
-                    gen_helper_vfp_muld(tcg_res, tcg_op, tcg_idx, fpst);
-                }
+            case 0x09: /* FMUL */
+                gen_helper_vfp_muld(tcg_res, tcg_op, tcg_idx, fpst);
+                break;
+            case 0x19: /* FMULX */
+                gen_helper_vfp_mulxd(tcg_res, tcg_op, tcg_idx, fpst);
                 break;
             default:
                 g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
 
             read_vec_element_i32(s, tcg_op, rn, pass, is_scalar ? size : MO_32);
 
-            switch (opcode) {
-            case 0x0: /* MLA */
-            case 0x4: /* MLS */
-            case 0x8: /* MUL */
+            switch (16 * u + opcode) {
+            case 0x08: /* MUL */
+            case 0x10: /* MLA */
+            case 0x14: /* MLS */
             {
                 static NeonGenTwoOpFn * const fns[2][2] = {
                     { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                 genfn(tcg_res, tcg_op, tcg_res);
                 break;
             }
-            case 0x5: /* FMLS */
-            case 0x1: /* FMLA */
+            case 0x05: /* FMLS */
+            case 0x01: /* FMLA */
                 read_vec_element_i32(s, tcg_res, rd, pass,
                                      is_scalar ? size : MO_32);
                 switch (size) {
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                     g_assert_not_reached();
                 }
                 break;
-            case 0x9: /* FMUL, FMULX */
+            case 0x09: /* FMUL */
                 switch (size) {
                 case 1:
-                    if (u) {
-                        if (is_scalar) {
-                            gen_helper_advsimd_mulxh(tcg_res, tcg_op,
-                                                     tcg_idx, fpst);
-                        } else {
-                            gen_helper_advsimd_mulx2h(tcg_res, tcg_op,
-                                                      tcg_idx, fpst);
-                        }
+                    if (is_scalar) {
+                        gen_helper_advsimd_mulh(tcg_res, tcg_op,
+                                                tcg_idx, fpst);
                     } else {
-                        if (is_scalar) {
-                            gen_helper_advsimd_mulh(tcg_res, tcg_op,
-                                                    tcg_idx, fpst);
-                        } else {
-                            gen_helper_advsimd_mul2h(tcg_res, tcg_op,
-                                                     tcg_idx, fpst);
-                        }
+                        gen_helper_advsimd_mul2h(tcg_res, tcg_op,
+                                                 tcg_idx, fpst);
                     }
                     break;
                 case 2:
-                    if (u) {
-                        gen_helper_vfp_mulxs(tcg_res, tcg_op, tcg_idx, fpst);
-                    } else {
-                        gen_helper_vfp_muls(tcg_res, tcg_op, tcg_idx, fpst);
-                    }
+                    gen_helper_vfp_muls(tcg_res, tcg_op, tcg_idx, fpst);
                     break;
                 default:
                     g_assert_not_reached();
                 }
                 break;
-            case 0xc: /* SQDMULH */
+            case 0x19: /* FMULX */
+                switch (size) {
+                case 1:
+                    if (is_scalar) {
+                        gen_helper_advsimd_mulxh(tcg_res, tcg_op,
+                                                 tcg_idx, fpst);
+                    } else {
+                        gen_helper_advsimd_mulx2h(tcg_res, tcg_op,
+                                                  tcg_idx, fpst);
+                    }
+                    break;
+                case 2:
+                    gen_helper_vfp_mulxs(tcg_res, tcg_op, tcg_idx, fpst);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+                break;
+            case 0x0c: /* SQDMULH */
                 if (size == 1) {
                     gen_helper_neon_qdmulh_s16(tcg_res, cpu_env,
                                                tcg_op, tcg_idx);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                                                tcg_op, tcg_idx);
                 }
                 break;
-            case 0xd: /* SQRDMULH */
+            case 0x0d: /* SQRDMULH */
                 if (size == 1) {
                     gen_helper_neon_qrdmulh_s16(tcg_res, cpu_env,
                                                 tcg_op, tcg_idx);
-- 
2.16.2
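
A minimal stand-alone sketch of the idea in the patch above: fold U into the switch key so each (U, opcode) pair gets its own case label. The enum names are hypothetical and are not QEMU identifiers. The key values are the ones in the patch's switch, and every other key is lumped into OP_OTHER rather than being classified as allocated or unallocated.

```c
#include <assert.h>

/* Hypothetical names for illustration only. */
enum indexed_op {
    OP_OTHER,   /* any key not listed in the patch's switch */
    OP_MUL, OP_MLA, OP_MLS,
    OP_FMLA, OP_FMLS, OP_FMUL, OP_FMULX,
};

/* Combine U (bit 29) with the 4-bit opcode so a single switch can
 * tell e.g. FMUL (U=0) from FMULX (U=1). */
static int indexed_key(int u, int opcode)
{
    assert(u == 0 || u == 1);
    assert(opcode >= 0 && opcode <= 0xf);
    return 16 * u + opcode;
}

static enum indexed_op decode_indexed(int u, int opcode)
{
    switch (indexed_key(u, opcode)) {
    case 0x08: return OP_MUL;
    case 0x10: return OP_MLA;
    case 0x14: return OP_MLS;
    case 0x01: return OP_FMLA;
    case 0x05: return OP_FMLS;
    case 0x09: return OP_FMUL;
    case 0x19: return OP_FMULX;
    default:   return OP_OTHER;
    }
}
```

With the combined key, decode_indexed(0, 0x9) and decode_indexed(1, 0x9) land in different cases, which is why the inner `if (u)` tests in the FMUL/FMULX paths disappear.
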

From: Richard Henderson <richard.henderson@linaro.org>

The integer size check was already outside of the opcode switch;
move the floating-point size check outside as well.  Unify the
size vs index adjustment between fp and integer paths.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180228193125.20577-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 65 +++++++++++++++++++++++-----------------------
 1 file changed, 32 insertions(+), 33 deletions(-)
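
The commit message above mentions the size vs index adjustment for by-element instructions. As a rough sketch, assuming the standard A64 by-element layout (H at bit 11, L at bit 21, M at bit 20, Rm at bits 19:16), the element index and register number depend on the element size as shown below. The helper names are hypothetical, and other constraints (such as Q, or how the fp sz bit maps onto size) are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Extract len bits of insn starting at bit start (len < 32). */
static uint32_t bits(uint32_t insn, int start, int len)
{
    return (insn >> start) & ((1u << len) - 1);
}

/* Compute element index and Rm for a by-element insn, given the
 * element size (1 = 16-bit, 2 = 32-bit, 3 = 64-bit).
 * Returns 0 on success, -1 for an encoding this sketch rejects. */
static int decode_index(uint32_t insn, int size, int *index, int *rm)
{
    int h = bits(insn, 11, 1);
    int l = bits(insn, 21, 1);
    int m = bits(insn, 20, 1);

    *rm = bits(insn, 16, 4);
    switch (size) {
    case 1: /* 16-bit: index = H:L:M, so Rm is limited to V0-V15 */
        *index = (h << 2) | (l << 1) | m;
        return 0;
    case 2: /* 32-bit: index = H:L, M becomes bit 4 of Rm */
        *index = (h << 1) | l;
        *rm |= m << 4;
        return 0;
    case 3: /* 64-bit: index = H, L must be zero */
        if (l) {
            return -1;
        }
        *index = h;
        *rm |= m << 4;
        return 0;
    default:
        return -1;
    }
}
```

The same M bit serves either as the low index bit or as the top bit of Rm, which is why a single size-dependent adjustment can be shared by the fp and integer paths.
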

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/Makefile.objs   |   2 +-
 target/arm/helper.h        |   4 ++
 target/arm/translate-a64.c |  84 ++++++++++++++++++++++++++++++++++
 target/arm/vec_helper.c    | 109 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 target/arm/vec_helper.c

diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ obj-$(call land,$(CONFIG_KVM),$(call lnot,$(TARGET_AARCH64))) += kvm32.o
 obj-$(call land,$(CONFIG_KVM),$(TARGET_AARCH64)) += kvm64.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 obj-y += translate.o op_helper.o helper.o cpu.o
-obj-y += neon_helper.o iwmmxt_helper.o
+obj-y += neon_helper.o iwmmxt_helper.o vec_helper.o
 obj-y += gdbstub.o
 obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
 obj-y += crypto_helper.o
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_1(neon_rbit_u8, TCG_CALL_NO_RWG_SE, i32, i32)
 
 DEF_HELPER_3(neon_qdmulh_s16, i32, env, i32, i32)
 DEF_HELPER_3(neon_qrdmulh_s16, i32, env, i32, i32)
+DEF_HELPER_4(neon_qrdmlah_s16, i32, env, i32, i32, i32)
+DEF_HELPER_4(neon_qrdmlsh_s16, i32, env, i32, i32, i32)
 DEF_HELPER_3(neon_qdmulh_s32, i32, env, i32, i32)
 DEF_HELPER_3(neon_qrdmulh_s32, i32, env, i32, i32)
+DEF_HELPER_4(neon_qrdmlah_s32, i32, env, s32, s32, s32)
+DEF_HELPER_4(neon_qrdmlsh_s32, i32, env, s32, s32, s32)
 
 DEF_HELPER_1(neon_narrow_u8, i32, i64)
 DEF_HELPER_1(neon_narrow_u16, i32, i64)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s,
     tcg_temp_free_ptr(fpst);
 }
 
+/* AdvSIMD scalar three same extra
+ *  31 30  29 28       24 23  22  21 20  16  15 14    11  10 9  5 4  0
+ * +-----+---+-----------+------+---+------+---+--------+---+----+----+
+ * | 0 1 | U | 1 1 1 1 0 | size | 0 |  Rm  | 1 | opcode | 1 | Rn | Rd |
+ * +-----+---+-----------+------+---+------+---+--------+---+----+----+
+ */
+static void disas_simd_scalar_three_reg_same_extra(DisasContext *s,
+                                                   uint32_t insn)
+{
+    int rd = extract32(insn, 0, 5);
+    int rn = extract32(insn, 5, 5);
+    int opcode = extract32(insn, 11, 4);
+    int rm = extract32(insn, 16, 5);
+    int size = extract32(insn, 22, 2);
+    bool u = extract32(insn, 29, 1);
+    TCGv_i32 ele1, ele2, ele3;
+    TCGv_i64 res;
+    int feature;
+
+    switch (u * 16 + opcode) {
+    case 0x10: /* SQRDMLAH (vector) */
+    case 0x11: /* SQRDMLSH (vector) */
+        if (size != 1 && size != 2) {
+            unallocated_encoding(s);
+            return;
+        }
+        feature = ARM_FEATURE_V8_RDM;
+        break;
+    default:
+        unallocated_encoding(s);
+        return;
+    }
+    if (!arm_dc_feature(s, feature)) {
+        unallocated_encoding(s);
+        return;
+    }
+    if (!fp_access_check(s)) {
+        return;
+    }
+
+    /* Do a single operation on the lowest element in the vector.
+     * We use the standard Neon helpers and rely on 0 OP 0 == 0
+     * with no side effects for all these operations.
+     * OPTME: special-purpose helpers would avoid doing some
+     * unnecessary work in the helper for the 16 bit cases.
+     */
+    ele1 = tcg_temp_new_i32();
+    ele2 = tcg_temp_new_i32();
+    ele3 = tcg_temp_new_i32();
+
+    read_vec_element_i32(s, ele1, rn, 0, size);
+    read_vec_element_i32(s, ele2, rm, 0, size);
+    read_vec_element_i32(s, ele3, rd, 0, size);
+
+    switch (opcode) {
+    case 0x0: /* SQRDMLAH */
+        if (size == 1) {
+            gen_helper_neon_qrdmlah_s16(ele3, cpu_env, ele1, ele2, ele3);
+        } else {
+            gen_helper_neon_qrdmlah_s32(ele3, cpu_env, ele1, ele2, ele3);
+        }
+        break;
+    case 0x1: /* SQRDMLSH */
+        if (size == 1) {
+            gen_helper_neon_qrdmlsh_s16(ele3, cpu_env, ele1, ele2, ele3);
+        } else {
+            gen_helper_neon_qrdmlsh_s32(ele3, cpu_env, ele1, ele2, ele3);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    tcg_temp_free_i32(ele1);
+    tcg_temp_free_i32(ele2);
+
+    res = tcg_temp_new_i64();
+    tcg_gen_extu_i32_i64(res, ele3);
+    tcg_temp_free_i32(ele3);
+
+    write_fp_dreg(s, rd, res);
+    tcg_temp_free_i64(res);
+}
+
 static void handle_2misc_64(DisasContext *s, int opcode, bool u,
                             TCGv_i64 tcg_rd, TCGv_i64 tcg_rn,
                             TCGv_i32 tcg_rmode, TCGv_ptr tcg_fpstatus)
@@ -XXX,XX +XXX,XX @@ static const AArch64DecodeTable data_proc_simd[] = {
     { 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
     { 0x2e000000, 0xbf208400, disas_simd_ext },
     { 0x5e200400, 0xdf200400, disas_simd_scalar_three_reg_same },
+    { 0x5e008400, 0xdf208400, disas_simd_scalar_three_reg_same_extra },
     { 0x5e200000, 0xdf200c00, disas_simd_scalar_three_reg_diff },
     { 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
     { 0x5e300800, 0xdf3e0c00, disas_simd_scalar_pairwise },
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM AdvSIMD / SVE Vector Operations
+ *
+ * Copyright (c) 2018 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
+
+
+#define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
+
+/* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */
+static uint16_t inl_qrdmlah_s16(CPUARMState *env, int16_t src1,
+                                int16_t src2, int16_t src3)
+{
+    /* Simplify, with e1 = src1, e2 = src2, a3 = src3:
+     * = ((a3 << 16) + ((e1 * e2) << 1) + (1 << 15)) >> 16
+     * = ((a3 << 15) + (e1 * e2) + (1 << 14)) >> 15
+     */
+    int32_t ret = (int32_t)src1 * src2;
+    ret = ((int32_t)src3 << 15) + ret + (1 << 14);
+    ret >>= 15;
+    if (ret != (int16_t)ret) {
+        SET_QC();
+        ret = (ret < 0 ? -0x8000 : 0x7fff);
+    }
+    return ret;
+}
+
+uint32_t HELPER(neon_qrdmlah_s16)(CPUARMState *env, uint32_t src1,
+                                  uint32_t src2, uint32_t src3)
+{
+    uint16_t e1 = inl_qrdmlah_s16(env, src1, src2, src3);
+    uint16_t e2 = inl_qrdmlah_s16(env, src1 >> 16, src2 >> 16, src3 >> 16);
+    return deposit32(e1, 16, 16, e2);
+}
+
+/* Signed saturating rounding doubling multiply-subtract high half, 16-bit */
+static uint16_t inl_qrdmlsh_s16(CPUARMState *env, int16_t src1,
+                                int16_t src2, int16_t src3)
+{
+    /* Similarly, using subtraction:
+     * = ((a3 << 16) - ((e1 * e2) << 1) + (1 << 15)) >> 16
+     * = ((a3 << 15) - (e1 * e2) + (1 << 14)) >> 15
+     */
+    int32_t ret = (int32_t)src1 * src2;
+    ret = ((int32_t)src3 << 15) - ret + (1 << 14);
+    ret >>= 15;
+    if (ret != (int16_t)ret) {
+        SET_QC();
+        ret = (ret < 0 ? -0x8000 : 0x7fff);
+    }
+    return ret;
+}
+
+uint32_t HELPER(neon_qrdmlsh_s16)(CPUARMState *env, uint32_t src1,
+                                  uint32_t src2, uint32_t src3)
+{
+    uint16_t e1 = inl_qrdmlsh_s16(env, src1, src2, src3);
+    uint16_t e2 = inl_qrdmlsh_s16(env, src1 >> 16, src2 >> 16, src3 >> 16);
+    return deposit32(e1, 16, 16, e2);
+}
+
+/* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */
+uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
+                                  int32_t src2, int32_t src3)
+{
+    /* Simplify similarly to inl_qrdmlah_s16 above.  */
+    int64_t ret = (int64_t)src1 * src2;
+    ret = ((int64_t)src3 << 31) + ret + (1 << 30);
+    ret >>= 31;
+    if (ret != (int32_t)ret) {
+        SET_QC();
+        ret = (ret < 0 ? INT32_MIN : INT32_MAX);
+    }
+    return ret;
+}
+
+/* Signed saturating rounding doubling multiply-subtract high half, 32-bit */
+uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
+                                  int32_t src2, int32_t src3)
+{
+    /* Simplify similarly to inl_qrdmlsh_s16 above.  */
+    int64_t ret = (int64_t)src1 * src2;
+    ret = ((int64_t)src3 << 31) - ret + (1 << 30);
+    ret >>= 31;
+    if (ret != (int32_t)ret) {
+        SET_QC();
+        ret = (ret < 0 ? INT32_MIN : INT32_MAX);
+    }
+    return ret;
+}
-- 
2.16.2
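
To check the simplification described in the helper comments outside QEMU, here is a stand-alone model of the 16-bit element operations. The names are hypothetical, and a plain bool stands in for env and SET_QC(). One test also exercises the property the scalar translator relies on: 0 OP 0 == 0, with no saturation side effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp to int16_t, recording saturation in *qc (a stand-in for
 * FPSCR.QC, which QEMU sets via SET_QC()). */
static int16_t sat_s16(int32_t v, bool *qc)
{
    if (v > INT16_MAX || v < INT16_MIN) {
        *qc = true;
        return v < 0 ? INT16_MIN : INT16_MAX;
    }
    return (int16_t)v;
}

/* ((a << 16) + 2*n*m + (1 << 15)) >> 16, computed one bit narrower as
 * in the patch.  Multiplying by 1 << 15 avoids left-shifting a negative
 * value; '>>' on a negative int32_t is assumed to be arithmetic. */
static int16_t qrdmlah_s16(int16_t n, int16_t m, int16_t a, bool *qc)
{
    int32_t r = (int32_t)a * (1 << 15) + (int32_t)n * m + (1 << 14);
    return sat_s16(r >> 15, qc);
}

/* Same as above, but subtracting the doubled product. */
static int16_t qrdmlsh_s16(int16_t n, int16_t m, int16_t a, bool *qc)
{
    int32_t r = (int32_t)a * (1 << 15) - (int32_t)n * m + (1 << 14);
    return sat_s16(r >> 15, qc);
}
```

In Q15 terms, 0x4000 is 0.5, so qrdmlah_s16(0x4000, 0x4000, 0, &qc) yields 0x2000 (0.25). INT16_MIN * INT16_MIN is the one product that overflows on its own and saturates to INT16_MAX. All intermediate sums stay within int32_t for every int16_t input.
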

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  9 +++++
 target/arm/translate-a64.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/vec_helper.c    | 74 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 166 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(dc_zva, void, env, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+DEF_HELPER_FLAGS_5(gvec_qrdmlah_s16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_qrdmlah_s32, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s32, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #endif
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op3(DisasContext *s, bool is_q, int rd,
                    vec_full_reg_size(s), gvec_op);
 }
 
+/* Expand a 3-operand + env pointer operation using
+ * an out-of-line helper.
+ */
+static void gen_gvec_op3_env(DisasContext *s, bool is_q, int rd,
+                             int rn, int rm, gen_helper_gvec_3_ptr *fn)
+{
+    tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm), cpu_env,
+                       is_q ? 16 : 8, vec_full_reg_size(s), 0, fn);
+}
+
 /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
  * than the 32 bit equivalent.
  */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
     clear_vec_high(s, is_q, rd);
 }
 
+/* AdvSIMD three same extra
+ *  31   30  29 28       24 23  22  21 20  16  15 14    11  10 9  5 4  0
+ * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
+ * | 0 | Q | U | 0 1 1 1 0 | size | 0 |  Rm  | 1 | opcode | 1 | Rn | Rd |
+ * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
+ */
+static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
+{
+    int rd = extract32(insn, 0, 5);
+    int rn = extract32(insn, 5, 5);
+    int opcode = extract32(insn, 11, 4);
+    int rm = extract32(insn, 16, 5);
+    int size = extract32(insn, 22, 2);
+    bool u = extract32(insn, 29, 1);
+    bool is_q = extract32(insn, 30, 1);
+    int feature;
+
+    switch (u * 16 + opcode) {
+    case 0x10: /* SQRDMLAH (vector) */
+    case 0x11: /* SQRDMLSH (vector) */
+        if (size != 1 && size != 2) {
+            unallocated_encoding(s);
+            return;
+        }
+        feature = ARM_FEATURE_V8_RDM;
+        break;
+    default:
+        unallocated_encoding(s);
+        return;
+    }
+    if (!arm_dc_feature(s, feature)) {
+        unallocated_encoding(s);
+        return;
+    }
+    if (!fp_access_check(s)) {
+        return;
+    }
+
+    switch (opcode) {
+    case 0x0: /* SQRDMLAH (vector) */
+        switch (size) {
+        case 1:
+            gen_gvec_op3_env(s, is_q, rd, rn, rm, gen_helper_gvec_qrdmlah_s16);
+            break;
+        case 2:
+            gen_gvec_op3_env(s, is_q, rd, rn, rm, gen_helper_gvec_qrdmlah_s32);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        return;
+
+    case 0x1: /* SQRDMLSH (vector) */
+        switch (size) {
+        case 1:
+            gen_gvec_op3_env(s, is_q, rd, rn, rm, gen_helper_gvec_qrdmlsh_s16);
+            break;
+        case 2:
+            gen_gvec_op3_env(s, is_q, rd, rn, rm, gen_helper_gvec_qrdmlsh_s32);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        return;
+
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q,
                                   int size, int rn, int rd)
 {
@@ -XXX,XX +XXX,XX @@ static void disas_crypto_three_reg_imm2(DisasContext *s, uint32_t insn)
 static const AArch64DecodeTable data_proc_simd[] = {
     /* pattern  ,  mask     ,  fn                        */
     { 0x0e200400, 0x9f200400, disas_simd_three_reg_same },
+    { 0x0e008400, 0x9f208400, disas_simd_three_reg_same_extra },
     { 0x0e200000, 0x9f200c00, disas_simd_three_reg_diff },
     { 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc },
     { 0x0e300800, 0x9f3e0c00, disas_simd_across_lanes },
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@
 
 #define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
 
+static void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
+{
+    uint64_t *d = vd + opr_sz;
+    uintptr_t i;
+
+    for (i = opr_sz; i < max_sz; i += 8) {
+        *d++ = 0;
+    }
+}
+
 /* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */
 static uint16_t inl_qrdmlah_s16(CPUARMState *env, int16_t src1,
                                 int16_t src2, int16_t src3)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(neon_qrdmlah_s16)(CPUARMState *env, uint32_t src1,
     return deposit32(e1, 16, 16, e2);
 }
 
+void HELPER(gvec_qrdmlah_s16)(void *vd, void *vn, void *vm,
+                              void *ve, uint32_t desc)
+{
+    uintptr_t opr_sz = simd_oprsz(desc);
+    int16_t *d = vd;
+    int16_t *n = vn;
+    int16_t *m = vm;
+    CPUARMState *env = ve;
+    uintptr_t i;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        d[i] = inl_qrdmlah_s16(env, n[i], m[i], d[i]);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
 /* Signed saturating rounding doubling multiply-subtract high half, 16-bit */
 static uint16_t inl_qrdmlsh_s16(CPUARMState *env, int16_t src1,
                                 int16_t src2, int16_t src3)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(neon_qrdmlsh_s16)(CPUARMState *env, uint32_t src1,
     return deposit32(e1, 16, 16, e2);
 }
 
+void HELPER(gvec_qrdmlsh_s16)(void *vd, void *vn, void *vm,
+                              void *ve, uint32_t desc)
+{
+    uintptr_t opr_sz = simd_oprsz(desc);
+    int16_t *d = vd;
+    int16_t *n = vn;
+    int16_t *m = vm;
+    CPUARMState *env = ve;
+    uintptr_t i;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        d[i] = inl_qrdmlsh_s16(env, n[i], m[i], d[i]);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
 /* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */
 uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
                                   int32_t src2, int32_t src3)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
     return ret;
 }
 
+void HELPER(gvec_qrdmlah_s32)(void *vd, void *vn, void *vm,
+                              void *ve, uint32_t desc)
+{
+    uintptr_t opr_sz = simd_oprsz(desc);
+    int32_t *d = vd;
+    int32_t *n = vn;
+    int32_t *m = vm;
+    CPUARMState *env = ve;
+    uintptr_t i;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] = helper_neon_qrdmlah_s32(env, n[i], m[i], d[i]);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
 /* Signed saturating rounding doubling multiply-subtract high half, 32-bit */
 uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
                                   int32_t src2, int32_t src3)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
     }
     return ret;
 }
+
+void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
+                              void *ve, uint32_t desc)
+{
+    uintptr_t opr_sz = simd_oprsz(desc);
+    int32_t *d = vd;
+    int32_t *n = vn;
+    int32_t *m = vm;
+    CPUARMState *env = ve;
+    uintptr_t i;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] = helper_neon_qrdmlsh_s32(env, n[i], m[i], d[i]);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.16.2
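
The gvec helpers above follow a fixed shape: operate on the first oprsz bytes, then zero bytes [oprsz, maxsz) with clear_tail(). Below is a stand-alone sketch of that shape under stated assumptions: sizes are passed directly instead of through a simd_desc descriptor, and a placeholder lane-wise add (hypothetical, not one of the patch's operations) replaces the qrdmlah/qrdmlsh element function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* oprsz: bytes the instruction operates on (8 or 16 for AArch64 is_q).
 * maxsz: full register size; bytes in [oprsz, maxsz) are zeroed, which
 * is the job clear_tail() does in the patch. */
static void gvec_add_s16(int16_t *d, const int16_t *n, const int16_t *m,
                         size_t oprsz, size_t maxsz)
{
    size_t i;

    assert(oprsz % 8 == 0 && maxsz % 8 == 0 && oprsz <= maxsz);
    for (i = 0; i < oprsz / 2; ++i) {
        d[i] = (int16_t)(n[i] + m[i]);   /* placeholder element op */
    }
    memset((uint8_t *)d + oprsz, 0, maxsz - oprsz);
}
```

A 64-bit operation (oprsz 8) on a 16-byte register therefore clears the upper half of the destination, matching the AArch64 rule for writes to a D-sized vector. The AArch32 do_v81_helper later in this series passes opr_sz for both sizes, so nothing is cleared there.
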

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
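
The decode-table entries added by the two preceding patches, for the scalar and vector "three same extra" groups, select a handler when (insn & mask) == pattern. The sketch below models that lookup over a subset of data_proc_simd[], assuming first-match semantics. Handler names are replaced by strings, and the test encodings were assembled by hand from the field diagrams above the two new decoders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t pattern;
    uint32_t mask;
    const char *name;   /* stands in for the handler function pointer */
} DecodeEntry;

/* Subset of data_proc_simd[], keeping the relative order of the source. */
static const DecodeEntry simd_table[] = {
    { 0x0e200400, 0x9f200400, "three_reg_same" },
    { 0x0e008400, 0x9f208400, "three_reg_same_extra" },
    { 0x0e200000, 0x9f200c00, "three_reg_diff" },
    { 0x5e200400, 0xdf200400, "scalar_three_reg_same" },
    { 0x5e008400, 0xdf208400, "scalar_three_reg_same_extra" },
};

/* Index of the first entry with (insn & mask) == pattern, or -1. */
static int lookup(uint32_t insn)
{
    size_t i;

    for (i = 0; i < sizeof(simd_table) / sizeof(simd_table[0]); i++) {
        if ((insn & simd_table[i].mask) == simd_table[i].pattern) {
            return (int)i;
        }
    }
    return -1;
}
```

Bit 21 is what separates "three same" (bit 21 set) from "three same extra" (bit 21 clear, bit 15 set), and bit 28 keeps the scalar patterns from colliding with the vector ones.
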

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 86 +++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 67 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@
 #include "disas/disas.h"
 #include "exec/exec-all.h"
 #include "tcg-op.h"
+#include "tcg-op-gvec.h"
 #include "qemu/log.h"
 #include "qemu/bitops.h"
 #include "arm_ldst.h"
@@ -XXX,XX +XXX,XX @@ static void gen_neon_narrow_op(int op, int u, int size,
 #define NEON_3R_VPMAX 20
 #define NEON_3R_VPMIN 21
 #define NEON_3R_VQDMULH_VQRDMULH 22
-#define NEON_3R_VPADD 23
+#define NEON_3R_VPADD_VQRDMLAH 23
 #define NEON_3R_SHA 24 /* SHA1C,SHA1P,SHA1M,SHA1SU0,SHA256H{2},SHA256SU1 */
-#define NEON_3R_VFM 25 /* VFMA, VFMS : float fused multiply-add */
+#define NEON_3R_VFM_VQRDMLSH 25 /* VFMA, VFMS, VQRDMLSH */
 #define NEON_3R_FLOAT_ARITH 26 /* float VADD, VSUB, VPADD, VABD */
 #define NEON_3R_FLOAT_MULTIPLY 27 /* float VMLA, VMLS, VMUL */
 #define NEON_3R_FLOAT_CMP 28 /* float VCEQ, VCGE, VCGT */
@@ -XXX,XX +XXX,XX @@ static const uint8_t neon_3r_sizes[] = {
     [NEON_3R_VPMAX] = 0x7,
     [NEON_3R_VPMIN] = 0x7,
     [NEON_3R_VQDMULH_VQRDMULH] = 0x6,
-    [NEON_3R_VPADD] = 0x7,
+    [NEON_3R_VPADD_VQRDMLAH] = 0x7,
     [NEON_3R_SHA] = 0xf, /* size field encodes op type */
-    [NEON_3R_VFM] = 0x5, /* size bit 1 encodes op */
+    [NEON_3R_VFM_VQRDMLSH] = 0x7, /* For VFM, size bit 1 encodes op */
     [NEON_3R_FLOAT_ARITH] = 0x5, /* size bit 1 encodes op */
     [NEON_3R_FLOAT_MULTIPLY] = 0x5, /* size bit 1 encodes op */
     [NEON_3R_FLOAT_CMP] = 0x5, /* size bit 1 encodes op */
@@ -XXX,XX +XXX,XX @@ static const uint8_t neon_2rm_sizes[] = {
     [NEON_2RM_VCVT_UF] = 0x4,
 };
 
+
+/* Expand a v8.1 SIMD (RDM) helper.  Return nonzero if RDM is absent.  */
+static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn,
+                         int q, int rd, int rn, int rm)
+{
+    if (arm_dc_feature(s, ARM_FEATURE_V8_RDM)) {
+        int opr_sz = (1 + q) * 8;
+        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
+                           vfp_reg_offset(1, rn),
+                           vfp_reg_offset(1, rm), cpu_env,
+                           opr_sz, opr_sz, 0, fn);
+        return 0;
+    }
+    return 1;
+}
+
 /* Translate a NEON data processing instruction.  Return nonzero if the
    instruction is invalid.
    We process data in a mixture of 32-bit and 64-bit chunks.
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         if (q && ((rd | rn | rm) & 1)) {
             return 1;
         }
-        /*
-         * The SHA-1/SHA-256 3-register instructions require special treatment
-         * here, as their size field is overloaded as an op type selector, and
-         * they all consume their input in a single pass.
-         */
-        if (op == NEON_3R_SHA) {
+        switch (op) {
+        case NEON_3R_SHA:
+            /* The SHA-1/SHA-256 3-register instructions require special
+             * treatment here, as their size field is overloaded as an
+             * op type selector, and they all consume their input in a
+             * single pass.
+             */
             if (!q) {
                 return 1;
             }
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tcg_temp_free_ptr(ptr2);
             tcg_temp_free_ptr(ptr3);
             return 0;
+
+        case NEON_3R_VPADD_VQRDMLAH:
+            if (!u) {
+                break;  /* VPADD */
+            }
+            /* VQRDMLAH */
+            switch (size) {
+            case 1:
+                return do_v81_helper(s, gen_helper_gvec_qrdmlah_s16,
+                                     q, rd, rn, rm);
+            case 2:
+                return do_v81_helper(s, gen_helper_gvec_qrdmlah_s32,
+                                     q, rd, rn, rm);
+            }
+            return 1;
+
+        case NEON_3R_VFM_VQRDMLSH:
+            if (!u) {
+                /* VFM, VFMS */
+                if (size == 1) {
+                    return 1;
+                }
+                break;
+            }
+            /* VQRDMLSH */
+            switch (size) {
+            case 1:
+                return do_v81_helper(s, gen_helper_gvec_qrdmlsh_s16,
+                                     q, rd, rn, rm);
+            case 2:
+                return do_v81_helper(s, gen_helper_gvec_qrdmlsh_s32,
+                                     q, rd, rn, rm);
+            }
+            return 1;
         }
         if (size == 3 && op != NEON_3R_LOGIC) {
             /* 64-bit element instructions. */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 rm = rtmp;
             }
             break;
-        case NEON_3R_VPADD:
-            if (u) {
-                return 1;
-            }
-            /* Fall through */
+        case NEON_3R_VPADD_VQRDMLAH:
         case NEON_3R_VPMAX:
         case NEON_3R_VPMIN:
             pairwise = 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 return 1;
             }
             break;
-        case NEON_3R_VFM:
-            if (!arm_dc_feature(s, ARM_FEATURE_VFP4) || u) {
+        case NEON_3R_VFM_VQRDMLSH:
+            if (!arm_dc_feature(s, ARM_FEATURE_VFP4)) {
                 return 1;
             }
             break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 }
             }
             break;
-        case NEON_3R_VPADD:
+        case NEON_3R_VPADD_VQRDMLAH:
             switch (size) {
             case 0: gen_helper_neon_padd_u8(tmp, tmp, tmp2); break;
             case 1: gen_helper_neon_padd_u16(tmp, tmp, tmp2); break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
               }
             }
             break;
-        case NEON_3R_VFM:
+        case NEON_3R_VFM_VQRDMLSH:
         {
             /* VFMA, VFMS: fused multiply-add */
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-- 
2.16.2

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static const char *regnames[] =
     { "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
       "r8", "r9", "r10", "r11", "r12", "r13", "r14", "pc" };
 
+/* Function prototypes for gen_ functions calling Neon helpers.  */
+typedef void NeonGenThreeOpEnvFn(TCGv_i32, TCGv_env, TCGv_i32,
+                                 TCGv_i32, TCGv_i32);
+
 /* initialize TCG globals.  */
 void arm_translate_init(void)
 {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         }
                         neon_store_reg64(cpu_V0, rd + pass);
                     }
-
-
                     break;
-                default: /* 14 and 15 are RESERVED */
-                    return 1;
+                case 14: /* VQRDMLAH scalar */
+                case 15: /* VQRDMLSH scalar */
+                    {
+                        NeonGenThreeOpEnvFn *fn;
+
+                        if (!arm_dc_feature(s, ARM_FEATURE_V8_RDM)) {
+                            return 1;
+                        }
+                        if (u && ((rd | rn) & 1)) {
+                            return 1;
+                        }
+                        if (op == 14) {
+                            if (size == 1) {
+                                fn = gen_helper_neon_qrdmlah_s16;
+                            } else {
+                                fn = gen_helper_neon_qrdmlah_s32;
+                            }
+                        } else {
+                            if (size == 1) {
+                                fn = gen_helper_neon_qrdmlsh_s16;
+                            } else {
+                                fn = gen_helper_neon_qrdmlsh_s32;
+                            }
+                        }
+
+                        tmp2 = neon_get_scalar(size, rm);
+                        for (pass = 0; pass < (u ? 4 : 2); pass++) {
+                            tmp = neon_load_reg(rn, pass);
+                            tmp3 = neon_load_reg(rd, pass);
+                            fn(tmp, cpu_env, tmp, tmp2, tmp3);
+                            tcg_temp_free_i32(tmp3);
+                            neon_store_reg(rd, pass, tmp);
+                        }
+                        tcg_temp_free_i32(tmp2);
+                    }
+                    break;
+                default:
+                    g_assert_not_reached();
                 }
             }
         } else { /* size == 3 */
-- 
2.16.2

From: Richard Henderson <richard.henderson@linaro.org>

Enable it for the "any" CPU used by *-linux-user.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180228193125.20577-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c   | 1 +
 target/arm/cpu64.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_any_initfn(Object *obj)
     set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
     set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
     set_feature(&cpu->env, ARM_FEATURE_CRC);
+    set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
     cpu->midr = 0xffffffff;
 }
 #endif
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_any_initfn(Object *obj)
     set_feature(&cpu->env, ARM_FEATURE_V8_SM4);
     set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
     set_feature(&cpu->env, ARM_FEATURE_CRC);
+    set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
     set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
     cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
     cpu->dcz_blocksize = 7; /*  512 bytes */
-- 
2.16.2

From: Richard Henderson <richard.henderson@linaro.org>

Not enabled anywhere yet.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h     | 1 +
 linux-user/elfload.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ enum arm_features {
     ARM_FEATURE_V8_SM4, /* implements SM4 part of v8 Crypto Extensions */
     ARM_FEATURE_V8_RDM, /* implements v8.1 simd round multiply */
     ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
+    ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions.  */
 };
 
 static inline int arm_feature(CPUARMState *env, int feature)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void)
     GET_FEATURE(ARM_FEATURE_V8_FP16,
                 ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP);
     GET_FEATURE(ARM_FEATURE_V8_RDM, ARM_HWCAP_A64_ASIMDRDM);
+    GET_FEATURE(ARM_FEATURE_V8_FCMA, ARM_HWCAP_A64_FCMA);
 #undef GET_FEATURE
 
     return hwcaps;
-- 
2.16.2

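The GET_FEATURE pattern in get_elf_hwcap() ORs a hwcap bit into the result for each enabled CPU feature. The standalone sketch below shows that mapping. The FEAT_* enum, the SKETCH_* macro names and the function name are hypothetical. The hwcap bit positions follow the Linux arm64 UAPI header: FPHP 9, ASIMDHP 10, ASIMDRDM 12, FCMA 14.

```c
#include <assert.h>
#include <stdint.h>

/* Linux arm64 AT_HWCAP bit positions (kernel uapi asm/hwcap.h). */
#define SKETCH_HWCAP_FPHP      (1u << 9)
#define SKETCH_HWCAP_ASIMDHP   (1u << 10)
#define SKETCH_HWCAP_ASIMDRDM  (1u << 12)
#define SKETCH_HWCAP_FCMA      (1u << 14)

/* Hypothetical feature bit numbers standing in for ARM_FEATURE_*. */
enum { FEAT_FP16, FEAT_RDM, FEAT_FCMA };

/* Translate a feature bitmask into the hwcap word a guest would see. */
static uint32_t elf_hwcap_from_features(uint64_t features)
{
    uint32_t hwcaps = 0;

#define GET_FEATURE(feat, hwcap) \
    do { if (features & (1ull << (feat))) { hwcaps |= (hwcap); } } while (0)
    GET_FEATURE(FEAT_FP16, SKETCH_HWCAP_FPHP | SKETCH_HWCAP_ASIMDHP);
    GET_FEATURE(FEAT_RDM, SKETCH_HWCAP_ASIMDRDM);
    GET_FEATURE(FEAT_FCMA, SKETCH_HWCAP_FCMA);
#undef GET_FEATURE

    return hwcaps;
}
```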
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  7 ++++
 target/arm/translate-a64.c | 48 ++++++++++++++++++++++-
 target/arm/vec_helper.c    | 97 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+), 1 deletion(-)

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: renamed e1/e2/e3/e4 to use the same naming as the version
 of the pseudocode in the Arm ARM]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  11 ++++
 target/arm/translate-a64.c |  94 +++++++++++++++++++++++++---
 target/arm/vec_helper.c    | 149 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 246 insertions(+), 8 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_fcaddd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_fcmlah, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcmlah_idx, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcmlas, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcmlas_idx, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #endif
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = ARM_FEATURE_V8_RDM;
         break;
+    case 0x8: /* FCMLA, #0 */
+    case 0x9: /* FCMLA, #90 */
+    case 0xa: /* FCMLA, #180 */
+    case 0xb: /* FCMLA, #270 */
     case 0xc: /* FCADD, #90 */
     case 0xe: /* FCADD, #270 */
         if (size == 0
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         return;
 
+    case 0x8: /* FCMLA, #0 */
+    case 0x9: /* FCMLA, #90 */
+    case 0xa: /* FCMLA, #180 */
+    case 0xb: /* FCMLA, #270 */
+        rot = extract32(opcode, 0, 2);
+        switch (size) {
+        case 1:
+            gen_gvec_op3_fpst(s, is_q, rd, rn, rm, true, rot,
+                              gen_helper_gvec_fcmlah);
+            break;
+        case 2:
+            gen_gvec_op3_fpst(s, is_q, rd, rn, rm, false, rot,
+                              gen_helper_gvec_fcmlas);
+            break;
+        case 3:
+            gen_gvec_op3_fpst(s, is_q, rd, rn, rm, false, rot,
+                              gen_helper_gvec_fcmlad);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        return;
+
     case 0xc: /* FCADD, #90 */
     case 0xe: /* FCADD, #270 */
         rot = extract32(opcode, 1, 1);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
     int rn = extract32(insn, 5, 5);
     int rd = extract32(insn, 0, 5);
     bool is_long = false;
-    bool is_fp = false;
+    int is_fp = 0;
     bool is_fp16 = false;
     int index;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
     case 0x05: /* FMLS */
     case 0x09: /* FMUL */
     case 0x19: /* FMULX */
-        is_fp = true;
+        is_fp = 1;
         break;
     case 0x1d: /* SQRDMLAH */
     case 0x1f: /* SQRDMLSH */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             return;
         }
         break;
+    case 0x11: /* FCMLA #0 */
+    case 0x13: /* FCMLA #90 */
+    case 0x15: /* FCMLA #180 */
+    case 0x17: /* FCMLA #270 */
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) {
+            unallocated_encoding(s);
+            return;
+        }
+        is_fp = 2;
+        break;
     default:
         unallocated_encoding(s);
         return;
     }
 
-    if (is_fp) {
+    switch (is_fp) {
+    case 1: /* normal fp */
         /* convert insn encoded size to TCGMemOp size */
         switch (size) {
         case 0: /* half-precision */
-            if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
-                unallocated_encoding(s);
-                return;
-            }
             size = MO_16;
+            is_fp16 = true;
             break;
         case MO_32: /* single precision */
         case MO_64: /* double precision */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             unallocated_encoding(s);
             return;
         }
-    } else {
+        break;
+
+    case 2: /* complex fp */
+        /* Each indexable element is a complex pair.  */
+        size <<= 1;
+        switch (size) {
+        case MO_32:
+            if (h && !is_q) {
+                unallocated_encoding(s);
+                return;
+            }
+            is_fp16 = true;
+            break;
+        case MO_64:
+            break;
+        default:
+            unallocated_encoding(s);
+            return;
+        }
+        break;
+
+    default: /* integer */
         switch (size) {
         case MO_8:
         case MO_64:
             unallocated_encoding(s);
             return;
         }
+        break;
+    }
+    if (is_fp16 && !arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+        unallocated_encoding(s);
+        return;
     }
 
     /* Given TCGMemOp size, adjust register and indexing.  */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
         fpst = NULL;
     }
 
+    switch (16 * u + opcode) {
+    case 0x11: /* FCMLA #0 */
+    case 0x13: /* FCMLA #90 */
+    case 0x15: /* FCMLA #180 */
+    case 0x17: /* FCMLA #270 */
+        tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+                           vec_full_reg_offset(s, rn),
+                           vec_reg_offset(s, rm, index, size), fpst,
+                           is_q ? 16 : 8, vec_full_reg_size(s),
+                           extract32(insn, 13, 2), /* rot */
+                           size == MO_64
+                           ? gen_helper_gvec_fcmlas_idx
+                           : gen_helper_gvec_fcmlah_idx);
+        tcg_temp_free_ptr(fpst);
+        return;
+    }
+
     if (size == 3) {
         TCGv_i64 tcg_idx = tcg_temp_new_i64();
         int pass;
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm,
+                         void *vfpst, uint32_t desc)
+{
+    uintptr_t opr_sz = simd_oprsz(desc);
+    float16 *d = vd;
+    float16 *n = vn;
+    float16 *m = vm;
+    float_status *fpst = vfpst;
+    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t neg_real = flip ^ neg_imag;
+    uintptr_t i;
+
+    /* Shift boolean to the sign bit so we can xor to negate.  */
+    neg_real <<= 15;
+    neg_imag <<= 15;
+
+    for (i = 0; i < opr_sz / 2; i += 2) {
+        float16 e2 = n[H2(i + flip)];
+        float16 e1 = m[H2(i + flip)] ^ neg_real;
+        float16 e4 = e2;
+        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
+
+        d[H2(i)] = float16_muladd(e2, e1, d[H2(i)], 0, fpst);
+        d[H2(i + 1)] = float16_muladd(e4, e3, d[H2(i + 1)], 0, fpst);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm,
+                             void *vfpst, uint32_t desc)
+{
+    uintptr_t opr_sz = simd_oprsz(desc);
+    float16 *d = vd;
+    float16 *n = vn;
+    float16 *m = vm;
+    float_status *fpst = vfpst;
+    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t neg_real = flip ^ neg_imag;
+    uintptr_t i;
+    float16 e1 = m[H2(flip)];
+    float16 e3 = m[H2(1 - flip)];
+
+    /* Shift boolean to the sign bit so we can xor to negate.  */
+    neg_real <<= 15;
+    neg_imag <<= 15;
+    e1 ^= neg_real;
+    e3 ^= neg_imag;
+
+    for (i = 0; i < opr_sz / 2; i += 2) {
+        float16 e2 = n[H2(i + flip)];
+        float16 e4 = e2;
+
+        d[H2(i)] = float16_muladd(e2, e1, d[H2(i)], 0, fpst);
+        d[H2(i + 1)] = float16_muladd(e4, e3, d[H2(i + 1)], 0, fpst);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm,
+                         void *vfpst, uint32_t desc)
+{
+    uintptr_t opr_sz = simd_oprsz(desc);
+    float32 *d = vd;
+    float32 *n = vn;
+    float32 *m = vm;
+    float_status *fpst = vfpst;
+    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t neg_real = flip ^ neg_imag;
+    uintptr_t i;
+
+    /* Shift boolean to the sign bit so we can xor to negate.  */
+    neg_real <<= 31;
+    neg_imag <<= 31;
+
+    for (i = 0; i < opr_sz / 4; i += 2) {
+        float32 e2 = n[H4(i + flip)];
+        float32 e1 = m[H4(i + flip)] ^ neg_real;
+        float32 e4 = e2;
+        float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
+
+        d[H4(i)] = float32_muladd(e2, e1, d[H4(i)], 0, fpst);
+        d[H4(i + 1)] = float32_muladd(e4, e3, d[H4(i + 1)], 0, fpst);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm,
+                             void *vfpst, uint32_t desc)
+{
+    uintptr_t opr_sz = simd_oprsz(desc);
+    float32 *d = vd;
+    float32 *n = vn;
+    float32 *m = vm;
+    float_status *fpst = vfpst;
+    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t neg_real = flip ^ neg_imag;
+    uintptr_t i;
+    float32 e1 = m[H4(flip)];
+    float32 e3 = m[H4(1 - flip)];
+
+    /* Shift boolean to the sign bit so we can xor to negate.  */
+    neg_real <<= 31;
+    neg_imag <<= 31;
+    e1 ^= neg_real;
+    e3 ^= neg_imag;
+
+    for (i = 0; i < opr_sz / 4; i += 2) {
+        float32 e2 = n[H4(i + flip)];
+        float32 e4 = e2;
+
+        d[H4(i)] = float32_muladd(e2, e1, d[H4(i)], 0, fpst);
+        d[H4(i + 1)] = float32_muladd(e4, e3, d[H4(i + 1)], 0, fpst);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm,
+                         void *vfpst, uint32_t desc)
+{
+    uintptr_t opr_sz = simd_oprsz(desc);
+    float64 *d = vd;
+    float64 *n = vn;
+    float64 *m = vm;
+    float_status *fpst = vfpst;
+    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint64_t neg_real = flip ^ neg_imag;
+    uintptr_t i;
+
+    /* Shift boolean to the sign bit so we can xor to negate.  */
+    neg_real <<= 63;
+    neg_imag <<= 63;
+
+    for (i = 0; i < opr_sz / 8; i += 2) {
+        float64 e2 = n[i + flip];
+        float64 e1 = m[i + flip] ^ neg_real;
+        float64 e4 = e2;
+        float64 e3 = m[i + 1 - flip] ^ neg_imag;
+
+        d[i] = float64_muladd(e2, e1, d[i], 0, fpst);
+        d[i + 1] = float64_muladd(e4, e3, d[i + 1], 0, fpst);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.16.2

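The gvec_fcmla* helpers split each FCMLA rotation into three parts: a flip that selects which half of n is used, and two sign flips applied to m. The scalar sketch below applies the same scheme to doubles, under the hypothetical name fcmla_sketch(). It shows why issuing #0 followed by #90 accumulates a full complex product n * m. The sketch differs from the helpers in two ways: it uses plain negation where they XOR the sign bit, and an unfused a * b + c where they call float*_muladd().

```c
#include <assert.h>
#include <stddef.h>

/*
 * One FCMLA step over interleaved (real, imag) pairs; rot = 0..3 for
 * #0, #90, #180, #270.  Mirrors the flip / neg_imag / neg_real derivation
 * in gvec_fcmlad, but is not fused.
 */
static void fcmla_sketch(double *d, const double *n, const double *m,
                         size_t pairs, unsigned rot)
{
    assert(rot < 4);
    unsigned flip = rot & 1;                 /* use imag half of n for #90/#270 */
    unsigned neg_imag = (rot >> 1) & 1;
    unsigned neg_real = flip ^ neg_imag;

    for (size_t i = 0; i < 2 * pairs; i += 2) {
        double e2 = n[i + flip];
        double e1 = neg_real ? -m[i + flip] : m[i + flip];
        double e3 = neg_imag ? -m[i + 1 - flip] : m[i + 1 - flip];

        d[i] = e2 * e1 + d[i];               /* real accumulator */
        d[i + 1] = e2 * e3 + d[i + 1];       /* imaginary accumulator */
    }
}
```

With n = 1+2i and m = 3+4i, applying rot 0 then rot 1 to d = 0 leaves d = -5+10i, which equals n * m. Applying rot 2 then rot 3 subtracts the same product again.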
From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180228193125.20577-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     return 0;
 }
 
+/* Advanced SIMD three registers of the same length extension.
+ *  31           25    23  22    20   16   12  11   10   9    8        3     0
+ * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
+ * | 1 1 1 1 1 1 0 | op1 | D | op2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
+ * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
+ */
+static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+{
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+    int rd, rn, rm, rot, size, opr_sz;
+    TCGv_ptr fpst;
+    bool q;
+
+    q = extract32(insn, 6, 1);
+    VFP_DREG_D(rd, insn);
+    VFP_DREG_N(rn, insn);
+    VFP_DREG_M(rm, insn);
+    if ((rd | rn | rm) & q) {
+        return 1;
+    }
+
+    if ((insn & 0xfe200f10) == 0xfc200800) {
+        /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
+        size = extract32(insn, 20, 1);
+        rot = extract32(insn, 23, 2);
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)
+            || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
+            return 1;
+        }
+        fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
+    } else if ((insn & 0xfea00f10) == 0xfc800800) {
+        /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
+        size = extract32(insn, 20, 1);
+        rot = extract32(insn, 24, 1);
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)
+            || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
+            return 1;
+        }
+        fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
+    } else {
+        return 1;
+    }
+
+    if (s->fp_excp_el) {
+        gen_exception_insn(s, 4, EXCP_UDEF,
+                           syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
+        return 0;
+    }
+    if (!s->vfp_enabled) {
+        return 1;
+    }
+
+    opr_sz = (1 + q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
+                       vfp_reg_offset(1, rn),
+                       vfp_reg_offset(1, rm), fpst,
+                       opr_sz, opr_sz, rot, fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return 0;
+}
+
 static int disas_coproc_insn(DisasContext *s, uint32_t insn)
 {
     int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                     }
                 }
             }
+        } else if ((insn & 0x0e000a00) == 0x0c000800
+                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
+            if (disas_neon_insn_3same_ext(s, insn)) {
+                goto illegal_op;
+            }
+            return;
         } else if ((insn & 0x0fe00000) == 0x0c400000) {
             /* Coprocessor double register transfer.  */
             ARCH(5TE);
-- 
2.16.2

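The A32 decoder recognises VCMLA and VCADD purely by mask/value tests on the instruction word, then extracts rot, size and Q. The self-contained sketch below covers only that classification step. The ext_decode type, the bits() helper and decode_3same_ext() are hypothetical names. The feature checks and register-alignment checks are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_NONE, OP_VCMLA, OP_VCADD } ext_op;

typedef struct {
    ext_op op;
    unsigned rot, size, q;
} ext_decode;

/* Extract len bits starting at pos (len < 32), like QEMU's extract32(). */
static unsigned bits(uint32_t insn, unsigned pos, unsigned len)
{
    return (insn >> pos) & ((1u << len) - 1);
}

/* Same mask/value tests and field positions as disas_neon_insn_3same_ext. */
static ext_decode decode_3same_ext(uint32_t insn)
{
    ext_decode r = { OP_NONE, 0, 0, bits(insn, 6, 1) };

    if ((insn & 0xfe200f10) == 0xfc200800) {
        r.op = OP_VCMLA;
        r.size = bits(insn, 20, 1);   /* 0: fp16, 1: fp32 */
        r.rot = bits(insn, 23, 2);    /* #0, #90, #180, #270 */
    } else if ((insn & 0xfea00f10) == 0xfc800800) {
        r.op = OP_VCADD;
        r.size = bits(insn, 20, 1);
        r.rot = bits(insn, 24, 1);    /* #90 or #270 */
    }
    return r;
}
```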
From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-15-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
     return 0;
 }
 
+/* Advanced SIMD two registers and a scalar extension.
+ *  31             24   23  22   20   16   12  11   10   9    8        3     0
+ * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
+ * | 1 1 1 1 1 1 1 0 | o1 | D | o2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
+ * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
+ *
+ */
+
+static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
+{
+    int rd, rn, rm, rot, size, opr_sz;
+    TCGv_ptr fpst;
+    bool q;
+
+    q = extract32(insn, 6, 1);
+    VFP_DREG_D(rd, insn);
+    VFP_DREG_N(rn, insn);
+    VFP_DREG_M(rm, insn);
+    if ((rd | rn) & q) {
+        return 1;
+    }
+
+    if ((insn & 0xff000f10) == 0xfe000800) {
+        /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... */
+        rot = extract32(insn, 20, 2);
+        size = extract32(insn, 23, 1);
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)
+            || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
+            return 1;
+        }
+    } else {
+        return 1;
+    }
+
+    if (s->fp_excp_el) {
+        gen_exception_insn(s, 4, EXCP_UDEF,
+                           syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
+        return 0;
+    }
+    if (!s->vfp_enabled) {
+        return 1;
+    }
+
+    opr_sz = (1 + q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
+                       vfp_reg_offset(1, rn),
+                       vfp_reg_offset(1, rm), fpst,
+                       opr_sz, opr_sz, rot,
+                       size ? gen_helper_gvec_fcmlas_idx
+                       : gen_helper_gvec_fcmlah_idx);
+    tcg_temp_free_ptr(fpst);
+    return 0;
+}
+
 static int disas_coproc_insn(DisasContext *s, uint32_t insn)
 {
     int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                 goto illegal_op;
             }
             return;
+        } else if ((insn & 0x0f000a00) == 0x0e000800
+                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
+            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
+                goto illegal_op;
+            }
+            return;
         } else if ((insn & 0x0fe00000) == 0x0c400000) {
             /* Coprocessor double register transfer.  */
             ARCH(5TE);
-- 
2.16.2

From: Richard Henderson <richard.henderson@linaro.org>

Happily, the bits are in the same places as in the A32 encoding.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180228193125.20577-16-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
                                default_exception_el(s));
             break;
         }
-        if (((insn >> 24) & 3) == 3) {
+        if ((insn & 0xfe000a00) == 0xfc000800
+            && arm_dc_feature(s, ARM_FEATURE_V8)) {
+            /* The Thumb2 and ARM encodings are identical.  */
+            if (disas_neon_insn_3same_ext(s, insn)) {
+                goto illegal_op;
+            }
+        } else if ((insn & 0xff000a00) == 0xfe000800
+                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
+            /* The Thumb2 and ARM encodings are identical.  */
+            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
+                goto illegal_op;
+            }
+        } else if (((insn >> 24) & 3) == 3) {
             /* Translate into the equivalent ARM encoding.  */
             insn = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
             if (disas_neon_data_insn(s, insn)) {
-- 
2.16.2

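In the Thumb2 path, the new tests are tried before the existing Neon data-processing check. The sketch below models only that routing decision. It assumes the instruction has already been identified as lying in the coprocessor/Neon space. The t32_route enum and route_t32_coproc() are hypothetical, and T32_OTHER stands for "continue with the existing coprocessor decode".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    T32_3SAME_EXT, T32_2REG_SCALAR_EXT, T32_NEON_DATA, T32_OTHER
} t32_route;

/* Same test order and masks as the hunk in disas_thumb2_insn. */
static t32_route route_t32_coproc(uint32_t insn, bool has_v8)
{
    if ((insn & 0xfe000a00) == 0xfc000800 && has_v8) {
        return T32_3SAME_EXT;          /* encoding identical to A32 */
    } else if ((insn & 0xff000a00) == 0xfe000800 && has_v8) {
        return T32_2REG_SCALAR_EXT;    /* encoding identical to A32 */
    } else if (((insn >> 24) & 3) == 3) {
        return T32_NEON_DATA;          /* rewritten to the ARM encoding */
    }
    return T32_OTHER;
}
```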
From: Richard Henderson <richard.henderson@linaro.org>

Enable it for the "any" CPU used by *-linux-user.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180228193125.20577-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c   | 1 +
 target/arm/cpu64.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_any_initfn(Object *obj)
     set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
     set_feature(&cpu->env, ARM_FEATURE_CRC);
     set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+    set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
     cpu->midr = 0xffffffff;
 }
 #endif
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_any_initfn(Object *obj)
     set_feature(&cpu->env, ARM_FEATURE_CRC);
     set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
     set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
+    set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
     cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
     cpu->dcz_blocksize = 7; /*  512 bytes */
 }
-- 
2.16.2

Just my fp16 work, plus some small stuff for the sbsa-ref board;
but my rule of thumb is to send a pullreq once I get over about
30 patches...

-- PMM

The following changes since commit 2f4c51c0f384d7888a04b4815861e6d5fd244d75:

Merge remote-tracking branch 'remotes/kraxel/tags/usb-20200831-pull-request' into staging (2020-08-31 19:39:13 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200901

for you to fetch changes up to 3f462bf0f6ea6382dd1502d4eb1fcd33c8e774f5:

hw/arm/sbsa-ref : Add embedded controller in secure memory (2020-09-01 14:01:34 +0100)

----------------------------------------------------------------
target-arm queue:
 * Implement fp16 support for AArch32 VFP and Neon
 * hw/arm/sbsa-ref: add "reg" property to DT cpu nodes
 * hw/arm/sbsa-ref : Add embedded controller in secure memory

----------------------------------------------------------------
Graeme Gregory (2):
      hw/misc/sbsa_ec : Add an embedded controller for sbsa-ref
      hw/arm/sbsa-ref : Add embedded controller in secure memory

Leif Lindholm (1):
      hw/arm/sbsa-ref: add "reg" property to DT cpu nodes

Peter Maydell (44):
      target/arm: Remove local definitions of float constants
      target/arm: Use correct ID register check for aa32_fp16_arith
      target/arm: Implement VFP fp16 for VFP_BINOP operations
      target/arm: Implement VFP fp16 VMLA, VMLS, VNMLS, VNMLA, VNMUL
      target/arm: Macroify trans functions for VFMA, VFMS, VFNMA, VFNMS
      target/arm: Implement VFP fp16 for fused-multiply-add
      target/arm: Macroify uses of do_vfp_2op_sp() and do_vfp_2op_dp()
      target/arm: Implement VFP fp16 for VABS, VNEG, VSQRT
      target/arm: Implement VFP fp16 for VMOV immediate
      target/arm: Implement VFP fp16 VCMP
      target/arm: Implement VFP fp16 VLDR and VSTR
      target/arm: Implement VFP fp16 VCVT between float and integer
      target/arm: Make VFP_CONV_FIX macros take separate float type and float size
      target/arm: Use macros instead of open-coding fp16 conversion helpers
      target/arm: Implement VFP fp16 VCVT between float and fixed-point
      target/arm: Implement VFP fp16 VCVT-with-specified-rounding-mode
      target/arm: Implement VFP fp16 VSEL
      target/arm: Implement VFP fp16 VRINT*
      target/arm: Implement new VFP fp16 insn VINS
      target/arm: Implement new VFP fp16 insn VMOVX
      target/arm: Implement VFP fp16 VMOV between gp and halfprec registers
      target/arm: Implement FP16 for Neon VADD, VSUB, VABD, VMUL
      target/arm: Implement fp16 for Neon VRECPE, VRSQRTE using gvec
      target/arm: Implement fp16 for Neon VABS, VNEG of floats
      target/arm: Implement fp16 for VCEQ, VCGE, VCGT comparisons
      target/arm: Implement fp16 for VACGE, VACGT
      target/arm: Implement fp16 for Neon VMAX, VMIN
      target/arm: Implement fp16 for Neon VMAXNM, VMINNM
      target/arm: Implement fp16 for Neon VMLA, VMLS operations
      target/arm: Implement fp16 for Neon VFMA, VFMS
      target/arm: Implement fp16 for Neon fp compare-vs-0
      target/arm: Implement fp16 for Neon VRECPS
      target/arm: Implement fp16 for Neon VRSQRTS
      target/arm: Implement fp16 for Neon pairwise fp ops
      target/arm: Implement fp16 for Neon float-integer VCVT
      target/arm: Convert Neon VCVT fixed-point to gvec
      target/arm: Implement fp16 for Neon VCVT fixed-point
      target/arm: Implement fp16 for Neon VCVT with rounding modes
      target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode
      target/arm: Implement fp16 for Neon VRINTX
      target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations
      target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations
      target/arm: Implement fp16 for Neon VMUL, VMLA, VMLS
      target/arm: Enable FP16 in '-cpu max'

In several places the target/arm code defines local float constants
for 2, 3 and 1.5, which are also provided by include/fpu/softfloat.h.
Remove the unnecessary local duplicate versions.
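
The removed constants are plain IEEE 754 bit patterns, so dropping them in favour of the shared softfloat.h definitions changes no values. As a cross-check, here is a standalone sketch (not QEMU code; half_to_double, bits_to_float32 and bits_to_float64 are hypothetical helpers) that decodes each pattern. It assumes the host float and double are IEEE 754 binary32 and binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a *normal* IEEE 754 binary16 value: 1 sign bit, 5-bit exponent
 * (bias 15), 10-bit fraction. Zero, subnormals, Inf and NaN are rejected. */
static double half_to_double(uint16_t h)
{
    int sign = (h >> 15) & 1;
    int exp = (h >> 10) & 0x1f;
    int frac = h & 0x3ff;
    double v = 1.0 + frac / 1024.0;   /* implicit leading 1 for normals */

    assert(exp != 0 && exp != 0x1f);
    for (int e = exp - 15; e > 0; e--) {
        v *= 2.0;
    }
    for (int e = exp - 15; e < 0; e++) {
        v /= 2.0;
    }
    return sign ? -v : v;
}

/* Reinterpret raw bits as float/double (assumes IEEE binary32/binary64). */
static float bits_to_float32(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

static double bits_to_float64(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof(d));
    return d;
}
```

For example, half_to_double(0x3e00) evaluates to 1.5, the same value as the float32 pattern 0x3fc00000 and the float64 pattern 0x3FF8000000000000ULL.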

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-2-peter.maydell@linaro.org
---
 target/arm/helper-a64.c    | 11 -----------
 target/arm/translate-sve.c |  4 ----
 target/arm/vfp_helper.c    |  4 ----
 3 files changed, 19 deletions(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, void *fpstp)
  * versions, these do a fully fused multiply-add or
  * multiply-add-and-halve.
  */
-#define float16_two make_float16(0x4000)
-#define float16_three make_float16(0x4200)
-#define float16_one_point_five make_float16(0x3e00)
-
-#define float32_two make_float32(0x40000000)
-#define float32_three make_float32(0x40400000)
-#define float32_one_point_five make_float32(0x3fc00000)
-
-#define float64_two make_float64(0x4000000000000000ULL)
-#define float64_three make_float64(0x4008000000000000ULL)
-#define float64_one_point_five make_float64(0x3FF8000000000000ULL)
 
 uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, void *fpstp)
 {
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_##NAME##_zpzi(DisasContext *s, arg_rpri_esz *a)         \
     return true;                                                          \
 }
 
-#define float16_two  make_float16(0x4000)
-#define float32_two  make_float32(0x40000000)
-#define float64_two  make_float64(0x4000000000000000ULL)
-
 DO_FP_IMM(FADD, fadds, half, one)
 DO_FP_IMM(FSUB, fsubs, half, one)
 DO_FP_IMM(FMUL, fmuls, half, two)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
     return r;
 }
 
-#define float32_two make_float32(0x40000000)
-#define float32_three make_float32(0x40400000)
-#define float32_one_point_five make_float32(0x3fc00000)
-
 float32 HELPER(recps_f32)(CPUARMState *env, float32 a, float32 b)
 {
     float_status *s = &env->vfp.standard_fp_status;
-- 
2.20.1

The aa32_fp16_arith feature check function currently looks at the
AArch64 ID_AA64PFR0 register. This is (as the comment notes) not
correct. The bogus check was put in mostly to allow testing of the
fp16 variants of the VCMLA instructions and it was something of
a mistake that we allowed them to exist in master.

Switch the feature check function to testing MVFR1.FPHP, which is
what it ought to be.
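
A standalone sketch of the new check, with a hand-rolled stand-in for QEMU's FIELD_EX32(); the names field_ex32, MVFR1_FPHP_SHIFT and MVFR1_FPHP_LENGTH are hypothetical. It assumes, per the Arm Architecture Reference Manual, that MVFR1.FPHP occupies bits [27:24] and that the value 3 is the first one that indicates half-precision arithmetic (lower values mean conversion-only support or none), which is what the ">= 3" test relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the unsigned bitfield [start, start + len) from a 32-bit value. */
static uint32_t field_ex32(uint32_t value, int start, int len)
{
    assert(start >= 0 && len > 0 && len <= 32 - start);
    return (value >> start) & (~0u >> (32 - len));
}

/* Assumed layout: MVFR1.FPHP is bits [27:24]. */
enum { MVFR1_FPHP_SHIFT = 24, MVFR1_FPHP_LENGTH = 4 };

/* True if the AArch32 ID register advertises fp16 arithmetic. */
static bool aa32_fp16_arith(uint32_t mvfr1)
{
    return field_ex32(mvfr1, MVFR1_FPHP_SHIFT, MVFR1_FPHP_LENGTH) >= 3;
}
```

With this model, an MVFR1 value of 0x03000000 passes the check while 0x02000000 (conversions only) does not; bits outside [27:24] are ignored.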

This will remove emulation of the VCMLA and VCADD insns from
AArch32 code running on an AArch64 '-cpu max' using system emulation.
(They were never enabled for aarch32 linux-user and system-emulation.)
Since we weren't advertising their existence via the AArch32 ID
register, well-behaved guests wouldn't have been using them anyway.

Once we have implemented all the AArch32 support for the FP16 extension
we will advertise it in the MVFR1 ID register field, which will reenable
these insns along with all the others.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-3-peter.maydell@linaro.org
---
 target/arm/cpu.h | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
 {
-    /*
-     * This is a placeholder for use by VCMA until the rest of
-     * the ARMv8.2-FP16 extension is implemented for aa32 mode.
-     * At which point we can properly set and check MVFR1.FPHP.
-     */
-    return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
+    return FIELD_EX32(id->mvfr1, MVFR1, FPHP) >= 3;
 }
 
 static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id)
-- 
2.20.1

Implement VFP fp16 support for simple binary-operator VFP insns VADD,
VSUB, VMUL, VDIV, VMINNM and VMAXNM:

 * make the VFP_BINOP() macro generate float16 helpers as well as
   float32 and float64
 * implement a do_vfp_3op_hp() function similar to the existing
   do_vfp_3op_sp()
 * add decode for the half-precision insn patterns

Note that using the VFP_BINOP macro creates a couple of unused helper
functions, vfp_maxh and vfp_minh, but they're small, so it's not worth
splitting the BINOP operations into "needs halfprec" and "no
halfprec" groups.
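
The VFP_BINOP change relies on ordinary C token pasting: each expansion emits one helper per precision, so extending the macro body gives every operation a new "h" helper at once (including the unused vfp_maxh/vfp_minh). A minimal sketch of the pattern, limited to single and double precision because portable C has no standard half-precision type; all demo_* names are hypothetical and there is no float_status argument.

```c
#include <assert.h>

/* Hypothetical stand-ins for the softfloat float32_add()/float64_add() etc. */
static float  demo_float32_add(float a, float b)   { return a + b; }
static double demo_float64_add(double a, double b) { return a + b; }
static float  demo_float32_mul(float a, float b)   { return a * b; }
static double demo_float64_mul(double a, double b) { return a * b; }

/* One expansion emits demo_vfp_<op>s (single) and demo_vfp_<op>d (double). */
#define DEMO_VFP_BINOP(name)                                        \
    static float demo_vfp_##name##s(float a, float b)               \
    {                                                               \
        return demo_float32_##name(a, b);                           \
    }                                                               \
    static double demo_vfp_##name##d(double a, double b)            \
    {                                                               \
        return demo_float64_##name(a, b);                           \
    }

DEMO_VFP_BINOP(add)
DEMO_VFP_BINOP(mul)
#undef DEMO_VFP_BINOP
```

Here DEMO_VFP_BINOP(add) defines both demo_vfp_adds() and demo_vfp_addd(); the macro in the patch additionally passes a float_status pointer through to the softfloat call.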

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-4-peter.maydell@linaro.org
---
 target/arm/helper.h            |  8 ++++
 target/arm/vfp-uncond.decode   |  3 ++
 target/arm/vfp.decode          |  4 ++
 target/arm/vfp_helper.c        |  5 ++
 target/arm/translate-vfp.c.inc | 86 ++++++++++++++++++++++++++++++++++
 5 files changed, 106 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(probe_access, TCG_CALL_NO_WG, void, env, tl, i32, i32, i32)
 DEF_HELPER_1(vfp_get_fpscr, i32, env)
 DEF_HELPER_2(vfp_set_fpscr, void, env, i32)
 
+DEF_HELPER_3(vfp_addh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_adds, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_addd, f64, f64, f64, ptr)
+DEF_HELPER_3(vfp_subh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_subs, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_subd, f64, f64, f64, ptr)
+DEF_HELPER_3(vfp_mulh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_muls, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_muld, f64, f64, f64, ptr)
+DEF_HELPER_3(vfp_divh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_divs, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_divd, f64, f64, f64, ptr)
+DEF_HELPER_3(vfp_maxh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_maxs, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_maxd, f64, f64, f64, ptr)
+DEF_HELPER_3(vfp_minh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_mins, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_mind, f64, f64, f64, ptr)
+DEF_HELPER_3(vfp_maxnumh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_maxnums, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_maxnumd, f64, f64, f64, ptr)
+DEF_HELPER_3(vfp_minnumh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_minnums, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
 DEF_HELPER_1(vfp_negs, f32, f32)
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VSEL        1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
 VSEL        1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
 
+VMAXNM_hp   1111 1110 1.00 .... .... 1001 .0.0 ....         @vfp_dnm_s
+VMINNM_hp   1111 1110 1.00 .... .... 1001 .1.0 ....         @vfp_dnm_s
+
 VMAXNM_sp   1111 1110 1.00 .... .... 1010 .0.0 ....         @vfp_dnm_s
 VMINNM_sp   1111 1110 1.00 .... .... 1010 .1.0 ....         @vfp_dnm_s
 
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 ....        @vfp_dnm_d
 VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 ....        @vfp_dnm_s
 VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 ....        @vfp_dnm_d
 
+VMUL_hp      ---- 1110 0.10 .... .... 1001 .0.0 ....        @vfp_dnm_s
 VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 ....        @vfp_dnm_s
 VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 ....        @vfp_dnm_d
 
 VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 ....        @vfp_dnm_s
 VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 ....        @vfp_dnm_d
 
+VADD_hp      ---- 1110 0.11 .... .... 1001 .0.0 ....        @vfp_dnm_s
 VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 ....        @vfp_dnm_s
 VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 ....        @vfp_dnm_d
 
+VSUB_hp      ---- 1110 0.11 .... .... 1001 .1.0 ....        @vfp_dnm_s
 VSUB_sp      ---- 1110 0.11 .... .... 1010 .1.0 ....        @vfp_dnm_s
 VSUB_dp      ---- 1110 0.11 .... .... 1011 .1.0 ....        @vfp_dnm_d
 
+VDIV_hp      ---- 1110 1.00 .... .... 1001 .0.0 ....        @vfp_dnm_s
 VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 ....        @vfp_dnm_s
 VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 ....        @vfp_dnm_d
 
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val)
 #define VFP_HELPER(name, p) HELPER(glue(glue(vfp_,name),p))
 
 #define VFP_BINOP(name) \
+dh_ctype_f16 VFP_HELPER(name, h)(dh_ctype_f16 a, dh_ctype_f16 b, void *fpstp) \
+{ \
+    float_status *fpst = fpstp; \
+    return float16_ ## name(a, b, fpst); \
+} \
 float32 VFP_HELPER(name, s)(float32 a, float32 b, void *fpstp) \
 { \
     float_status *fpst = fpstp; \
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     return true;
 }
 
+static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
+                          int vd, int vn, int vm, bool reads_vd)
+{
+    /*
+     * Do a half-precision operation. Functionally this is
+     * the same as do_vfp_3op_sp(), except:
+     *  - it uses the FPST_FPCR_F16 fp_status
+     *  - it doesn't need the VFP vector handling (fp16 is a
+     *    v8 feature, and in v8 VFP vectors don't exist)
+     *  - it does the aa32_fp16_arith feature test
+     */
+    TCGv_i32 f0, f1, fd;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (s->vec_len != 0 || s->vec_stride != 0) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    f0 = tcg_temp_new_i32();
+    f1 = tcg_temp_new_i32();
+    fd = tcg_temp_new_i32();
+    fpst = fpstatus_ptr(FPST_FPCR_F16);
+
+    neon_load_reg32(f0, vn);
+    neon_load_reg32(f1, vm);
+
+    if (reads_vd) {
+        neon_load_reg32(fd, vd);
+    }
+    fn(fd, f0, f1, fpst);
+    neon_store_reg32(fd, vd);
+
+    tcg_temp_free_i32(f0);
+    tcg_temp_free_i32(f1);
+    tcg_temp_free_i32(fd);
+    tcg_temp_free_ptr(fpst);
+
+    return true;
+}
+
 static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
                           int vd, int vn, int vm, bool reads_vd)
 {
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_dp *a)
     return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
 }
 
+static bool trans_VMUL_hp(DisasContext *s, arg_VMUL_sp *a)
+{
+    return do_vfp_3op_hp(s, gen_helper_vfp_mulh, a->vd, a->vn, a->vm, false);
+}
+
 static bool trans_VMUL_sp(DisasContext *s, arg_VMUL_sp *a)
 {
     return do_vfp_3op_sp(s, gen_helper_vfp_muls, a->vd, a->vn, a->vm, false);
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_dp *a)
     return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
 }
 
+static bool trans_VADD_hp(DisasContext *s, arg_VADD_sp *a)
+{
+    return do_vfp_3op_hp(s, gen_helper_vfp_addh, a->vd, a->vn, a->vm, false);
+}
+
 static bool trans_VADD_sp(DisasContext *s, arg_VADD_sp *a)
 {
     return do_vfp_3op_sp(s, gen_helper_vfp_adds, a->vd, a->vn, a->vm, false);
@@ -XXX,XX +XXX,XX @@ static bool trans_VADD_dp(DisasContext *s, arg_VADD_dp *a)
     return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
 }
 
+static bool trans_VSUB_hp(DisasContext *s, arg_VSUB_sp *a)
+{
+    return do_vfp_3op_hp(s, gen_helper_vfp_subh, a->vd, a->vn, a->vm, false);
+}
+
 static bool trans_VSUB_sp(DisasContext *s, arg_VSUB_sp *a)
 {
     return do_vfp_3op_sp(s, gen_helper_vfp_subs, a->vd, a->vn, a->vm, false);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_dp *a)
     return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
 }
 
+static bool trans_VDIV_hp(DisasContext *s, arg_VDIV_sp *a)
+{
+    return do_vfp_3op_hp(s, gen_helper_vfp_divh, a->vd, a->vn, a->vm, false);
+}
+
 static bool trans_VDIV_sp(DisasContext *s, arg_VDIV_sp *a)
 {
     return do_vfp_3op_sp(s, gen_helper_vfp_divs, a->vd, a->vn, a->vm, false);
@@ -XXX,XX +XXX,XX @@ static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_dp *a)
     return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
 }
 
+static bool trans_VMINNM_hp(DisasContext *s, arg_VMINNM_sp *a)
+{
+    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
+        return false;
+    }
+    return do_vfp_3op_hp(s, gen_helper_vfp_minnumh,
+                         a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VMAXNM_hp(DisasContext *s, arg_VMAXNM_sp *a)
+{
+    if (!dc_isar_feature(aa32_vminmaxnm, s)) {
+        return false;
+    }
+    return do_vfp_3op_hp(s, gen_helper_vfp_maxnumh,
+                         a->vd, a->vn, a->vm, false);
+}
+
 static bool trans_VMINNM_sp(DisasContext *s, arg_VMINNM_sp *a)
 {
     if (!dc_isar_feature(aa32_vminmaxnm, s)) {
-- 
2.20.1

Implement fp16 versions of the VFP VMLA, VMLS, VNMLS, VNMLA, VNMUL
instructions. (These are all the remaining ones which we implement
via do_vfp_3op_[hsd]p().)
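
These insns are not fused: the product is rounded before the add, and the sign handling follows the gen_*_hp() functions (for instance VMLS computes vd + -(vn * vm) rather than vd - vn * vm). A rough C model, using float as a stand-in for float16 and ignoring float_status and NaN-propagation order; the demo_* names are hypothetical. With the small integer operands used to exercise it no rounding occurs, so it only checks the sign conventions.

```c
#include <assert.h>

/* Each helper mirrors the operand order of the matching gen_V*_hp(). */
static float demo_vmla(float d, float n, float m)
{
    float prod = n * m;          /* vd = vd + (vn * vm) */
    return d + prod;
}

static float demo_vmls(float d, float n, float m)
{
    float prod = -(n * m);       /* vd = vd + -(vn * vm) */
    return d + prod;
}

static float demo_vnmls(float d, float n, float m)
{
    float prod = n * m;          /* vd = -vd + (vn * vm) */
    return -d + prod;
}

static float demo_vnmla(float d, float n, float m)
{
    float prod = -(n * m);       /* vd = -vd + -(vn * vm) */
    return -d + prod;
}

static float demo_vnmul(float n, float m)
{
    return -(n * m);             /* vd = -(vn * vm) */
}
```

With vd = 1, vn = 2 and vm = 3 the five models give 7, -5, 5, -7 and -6 respectively.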

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-5-peter.maydell@linaro.org
---
 target/arm/helper.h            |  1 +
 target/arm/vfp.decode          |  5 ++
 target/arm/vfp_helper.c        |  5 ++
 target/arm/translate-vfp.c.inc | 84 ++++++++++++++++++++++++++++++++++
 4 files changed, 95 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_maxnumd, f64, f64, f64, ptr)
 DEF_HELPER_3(vfp_minnumh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_minnums, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
+DEF_HELPER_1(vfp_negh, f16, f16)
 DEF_HELPER_1(vfp_negs, f32, f32)
 DEF_HELPER_1(vfp_negd, f64, f64)
 DEF_HELPER_1(vfp_abss, f32, f32)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
              vd=%vd_dp p=1 u=0 w=1
 
 # 3-register VFP data-processing; bits [23,21:20,6] identify the operation.
+VMLA_hp      ---- 1110 0.00 .... .... 1001 .0.0 ....        @vfp_dnm_s
 VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 ....        @vfp_dnm_s
 VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 ....        @vfp_dnm_d
 
+VMLS_hp      ---- 1110 0.00 .... .... 1001 .1.0 ....        @vfp_dnm_s
 VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 ....        @vfp_dnm_s
 VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 ....        @vfp_dnm_d
 
+VNMLS_hp     ---- 1110 0.01 .... .... 1001 .0.0 ....        @vfp_dnm_s
 VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 ....        @vfp_dnm_s
 VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 ....        @vfp_dnm_d
 
+VNMLA_hp     ---- 1110 0.01 .... .... 1001 .1.0 ....        @vfp_dnm_s
 VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 ....        @vfp_dnm_s
 VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 ....        @vfp_dnm_d
 
@@ -XXX,XX +XXX,XX @@ VMUL_hp      ---- 1110 0.10 .... .... 1001 .0.0 ....        @vfp_dnm_s
 VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 ....        @vfp_dnm_s
 VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 ....        @vfp_dnm_d
 
+VNMUL_hp     ---- 1110 0.10 .... .... 1001 .1.0 ....        @vfp_dnm_s
 VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 ....        @vfp_dnm_s
 VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 ....        @vfp_dnm_d
 
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ VFP_BINOP(minnum)
 VFP_BINOP(maxnum)
 #undef VFP_BINOP
 
+dh_ctype_f16 VFP_HELPER(neg, h)(dh_ctype_f16 a)
+{
+    return float16_chs(a);
+}
+
 float32 VFP_HELPER(neg, s)(float32 a)
 {
     return float32_chs(a);
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     return true;
 }
 
+static void gen_VMLA_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* Note that order of inputs to the add matters for NaNs */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_mulh(tmp, vn, vm, fpst);
+    gen_helper_vfp_addh(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VMLA_hp(DisasContext *s, arg_VMLA_sp *a)
+{
+    return do_vfp_3op_hp(s, gen_VMLA_hp, a->vd, a->vn, a->vm, true);
+}
+
 static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 {
     /* Note that order of inputs to the add matters for NaNs */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_dp *a)
     return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
 }
 
+static void gen_VMLS_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /*
+     * VMLS: vd = vd + -(vn * vm)
+     * Note that order of inputs to the add matters for NaNs.
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_mulh(tmp, vn, vm, fpst);
+    gen_helper_vfp_negh(tmp, tmp);
+    gen_helper_vfp_addh(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VMLS_hp(DisasContext *s, arg_VMLS_sp *a)
+{
+    return do_vfp_3op_hp(s, gen_VMLS_hp, a->vd, a->vn, a->vm, true);
+}
+
 static void gen_VMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 {
     /*
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_dp *a)
     return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
 }
 
+static void gen_VNMLS_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /*
+     * VNMLS: -fd + (fn * fm)
+     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+     * plausible looking simplifications because this will give wrong results
+     * for NaNs.
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_mulh(tmp, vn, vm, fpst);
+    gen_helper_vfp_negh(vd, vd);
+    gen_helper_vfp_addh(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLS_hp(DisasContext *s, arg_VNMLS_sp *a)
+{
+    return do_vfp_3op_hp(s, gen_VNMLS_hp, a->vd, a->vn, a->vm, true);
+}
+
 static void gen_VNMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 {
     /*
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_dp *a)
     return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
 }
 
+static void gen_VNMLA_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* VNMLA: -fd + -(fn * fm) */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_mulh(tmp, vn, vm, fpst);
+    gen_helper_vfp_negh(tmp, tmp);
+    gen_helper_vfp_negh(vd, vd);
+    gen_helper_vfp_addh(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLA_hp(DisasContext *s, arg_VNMLA_sp *a)
+{
+    return do_vfp_3op_hp(s, gen_VNMLA_hp, a->vd, a->vn, a->vm, true);
+}
+
 static void gen_VNMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 {
     /* VNMLA: -fd + -(fn * fm) */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_dp *a)
     return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
 }
 
+static void gen_VNMUL_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* VNMUL: -(fn * fm) */
+    gen_helper_vfp_mulh(vd, vn, vm, fpst);
+    gen_helper_vfp_negh(vd, vd);
+}
+
+static bool trans_VNMUL_hp(DisasContext *s, arg_VNMUL_sp *a)
+{
+    return do_vfp_3op_hp(s, gen_VNMUL_hp, a->vd, a->vn, a->vm, false);
+}
+
 static void gen_VNMUL_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 {
     /* VNMUL: -(fn * fm) */
-- 
2.20.1

Macroify creation of the trans functions for single and double
precision VFMA, VFMS, VFNMA, VFNMS. The repetition was OK for
two sizes, but we're about to add halfprec and it will get a bit
more than seems reasonable.
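
With the macros in place, adding a precision costs one MAKE_VFM_TRANS_FNS() invocation instead of four hand-written wrappers. A standalone sketch of the same two-level token-pasting pattern, where each generated wrapper forwards the fixed (neg_n, neg_d) pair used in the patch. The demo_* names and the demo_arg_dp struct are hypothetical, and the double arithmetic only models the operand negation, not the single rounding of a real fused multiply-add.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand bundle standing in for the decodetree arg_VFMA_dp. */
typedef struct {
    double d, n, m;
} demo_arg_dp;

/* Models the negations of do_vfm_dp(): result = (+/-n) * m + (+/-d). */
static double demo_vfm_dp(const demo_arg_dp *a, bool neg_n, bool neg_d)
{
    double n = neg_n ? -a->n : a->n;
    double d = neg_d ? -a->d : a->d;
    return n * a->m + d;
}

#define DEMO_ONE_VFM_FN(INSN, PREC, NEGN, NEGD)                  \
    static double demo_##INSN##_##PREC(const demo_arg_##PREC *a) \
    {                                                            \
        return demo_vfm_##PREC(a, NEGN, NEGD);                   \
    }

#define DEMO_VFM_FNS(PREC)                          \
    DEMO_ONE_VFM_FN(VFMA, PREC, false, false)       \
    DEMO_ONE_VFM_FN(VFMS, PREC, true, false)        \
    DEMO_ONE_VFM_FN(VFNMA, PREC, false, true)       \
    DEMO_ONE_VFM_FN(VFNMS, PREC, true, true)

DEMO_VFM_FNS(dp)
```

DEMO_VFM_FNS(dp) expands to demo_VFMA_dp(), demo_VFMS_dp(), demo_VFNMA_dp() and demo_VFNMS_dp(); a demo_arg_hp type plus a demo_vfm_hp() function would let DEMO_VFM_FNS(hp) add the half-precision set the same way.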

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-6-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c.inc | 50 +++++++++-------------------------
 1 file changed, 13 insertions(+), 37 deletions(-)

diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     return true;
 }
 
-static bool trans_VFMA_sp(DisasContext *s, arg_VFMA_sp *a)
-{
-    return do_vfm_sp(s, a, false, false);
-}
-
-static bool trans_VFMS_sp(DisasContext *s, arg_VFMS_sp *a)
-{
-    return do_vfm_sp(s, a, true, false);
-}
-
-static bool trans_VFNMA_sp(DisasContext *s, arg_VFNMA_sp *a)
-{
-    return do_vfm_sp(s, a, false, true);
-}
-
-static bool trans_VFNMS_sp(DisasContext *s, arg_VFNMS_sp *a)
-{
-    return do_vfm_sp(s, a, true, true);
-}
-
 static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
 {
     /*
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
     return true;
 }
 
-static bool trans_VFMA_dp(DisasContext *s, arg_VFMA_dp *a)
-{
-    return do_vfm_dp(s, a, false, false);
-}
+#define MAKE_ONE_VFM_TRANS_FN(INSN, PREC, NEGN, NEGD)                   \
+    static bool trans_##INSN##_##PREC(DisasContext *s,                  \
+                                      arg_##INSN##_##PREC *a)           \
+    {                                                                   \
+        return do_vfm_##PREC(s, a, NEGN, NEGD);                         \
+    }
 
-static bool trans_VFMS_dp(DisasContext *s, arg_VFMS_dp *a)
-{
-    return do_vfm_dp(s, a, true, false);
-}
+#define MAKE_VFM_TRANS_FNS(PREC) \
+    MAKE_ONE_VFM_TRANS_FN(VFMA, PREC, false, false) \
+    MAKE_ONE_VFM_TRANS_FN(VFMS, PREC, true, false) \
+    MAKE_ONE_VFM_TRANS_FN(VFNMA, PREC, false, true) \
+    MAKE_ONE_VFM_TRANS_FN(VFNMS, PREC, true, true)
 
-static bool trans_VFNMA_dp(DisasContext *s, arg_VFNMA_dp *a)
-{
-    return do_vfm_dp(s, a, false, true);
-}
-
-static bool trans_VFNMS_dp(DisasContext *s, arg_VFNMS_dp *a)
-{
-    return do_vfm_dp(s, a, true, true);
-}
+MAKE_VFM_TRANS_FNS(sp)
+MAKE_VFM_TRANS_FNS(dp)
 
 static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 {
-- 
2.20.1

Implement VFP fp16 support for fused multiply-add insns
VFNMA, VFNMS, VFMA, VFMS.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-7-peter.maydell@linaro.org
---
 target/arm/helper.h            |  1 +
 target/arm/vfp.decode          |  5 +++
 target/arm/vfp_helper.c        |  7 ++++
 target/arm/translate-vfp.c.inc | 64 ++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(vfp_fcvt_f64_to_f16, TCG_CALL_NO_RWG, f16, f64, ptr, i32)
 
 DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
 DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
+DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, ptr)
 
 DEF_HELPER_3(recps_f32, f32, env, f32, f32)
 DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VDIV_hp      ---- 1110 1.00 .... .... 1001 .0.0 ....        @vfp_dnm_s
 VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 ....        @vfp_dnm_s
 VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 ....        @vfp_dnm_d
 
+VFMA_hp      ---- 1110 1.10 .... .... 1001 .0. 0 ....       @vfp_dnm_s
+VFMS_hp      ---- 1110 1.10 .... .... 1001 .1. 0 ....       @vfp_dnm_s
+VFNMA_hp     ---- 1110 1.01 .... .... 1001 .0. 0 ....       @vfp_dnm_s
+VFNMS_hp     ---- 1110 1.01 .... .... 1001 .1. 0 ....       @vfp_dnm_s
+
 VFMA_sp      ---- 1110 1.10 .... .... 1010 .0. 0 ....       @vfp_dnm_s
 VFMS_sp      ---- 1110 1.10 .... .... 1010 .1. 0 ....       @vfp_dnm_s
 VFNMA_sp     ---- 1110 1.01 .... .... 1010 .0. 0 ....       @vfp_dnm_s
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_u32)(uint32_t a)
 }
 
 /* VFPv4 fused multiply-accumulate */
+dh_ctype_f16 VFP_HELPER(muladd, h)(dh_ctype_f16 a, dh_ctype_f16 b,
+                                   dh_ctype_f16 c, void *fpstp)
+{
+    float_status *fpst = fpstp;
+    return float16_muladd(a, b, c, 0, fpst);
+}
+
 float32 VFP_HELPER(muladd, s)(float32 a, float32 b, float32 c, void *fpstp)
 {
     float_status *fpst = fpstp;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMAXNM_dp(DisasContext *s, arg_VMAXNM_dp *a)
                          a->vd, a->vn, a->vm, false);
 }
 
+static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
+{
+    /*
+     * VFNMA : fd = muladd(-fd,  fn, fm)
+     * VFNMS : fd = muladd(-fd, -fn, fm)
+     * VFMA  : fd = muladd( fd,  fn, fm)
+     * VFMS  : fd = muladd( fd, -fn, fm)
+     *
+     * These are fused multiply-add, and must be done as one floating
+     * point operation with no rounding between the multiplication and
+     * addition steps.  NB that doing the negations here as separate
+     * steps is correct : an input NaN should come out with its sign
+     * bit flipped if it is a negated-input.
+     */
+    TCGv_ptr fpst;
+    TCGv_i32 vn, vm, vd;
+
+    /*
+     * Present in VFPv4 only, and only with the FP16 extension.
+     * Note that we can't rely on the SIMDFMAC check alone, because
+     * in a Neon-no-VFP core that ID register field will be non-zero.
+     */
+    if (!dc_isar_feature(aa32_fp16_arith, s) ||
+        !dc_isar_feature(aa32_simdfmac, s) ||
+        !dc_isar_feature(aa32_fpsp_v2, s)) {
+        return false;
+    }
+
+    if (s->vec_len != 0 || s->vec_stride != 0) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vn = tcg_temp_new_i32();
+    vm = tcg_temp_new_i32();
+    vd = tcg_temp_new_i32();
+
+    neon_load_reg32(vn, a->vn);
+    neon_load_reg32(vm, a->vm);
+    if (neg_n) {
+        /* VFNMS, VFMS */
+        gen_helper_vfp_negh(vn, vn);
+    }
+    neon_load_reg32(vd, a->vd);
+    if (neg_d) {
+        /* VFNMA, VFNMS */
+        gen_helper_vfp_negh(vd, vd);
+    }
+    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
+    neon_store_reg32(vd, a->vd);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(vn);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_i32(vd);
+
+    return true;
+}
+
 static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
 {
     /*
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
     MAKE_ONE_VFM_TRANS_FN(VFNMA, PREC, false, true) \
     MAKE_ONE_VFM_TRANS_FN(VFNMS, PREC, true, true)
 
+MAKE_VFM_TRANS_FNS(hp)
 MAKE_VFM_TRANS_FNS(sp)
 MAKE_VFM_TRANS_FNS(dp)
 
-- 
2.20.1

Macroify the uses of do_vfp_2op_sp() and do_vfp_2op_dp(); this will
make it easier to add the halfprec support.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-8-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c.inc | 49 ++++++++++------------------------
 1 file changed, 14 insertions(+), 35 deletions(-)

diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     return true;
 }
 
-static bool trans_VMOV_reg_sp(DisasContext *s, arg_VMOV_reg_sp *a)
-{
-    return do_vfp_2op_sp(s, tcg_gen_mov_i32, a->vd, a->vm);
-}
+#define DO_VFP_2OP(INSN, PREC, FN)                              \
+    static bool trans_##INSN##_##PREC(DisasContext *s,          \
+                                      arg_##INSN##_##PREC *a)   \
+    {                                                           \
+        return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
+    }
 
-static bool trans_VMOV_reg_dp(DisasContext *s, arg_VMOV_reg_dp *a)
-{
-    return do_vfp_2op_dp(s, tcg_gen_mov_i64, a->vd, a->vm);
-}
+DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
+DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
 
-static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
-{
-    return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
-}
+DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
+DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
 
-static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
-{
-    return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
-}
-
-static bool trans_VNEG_sp(DisasContext *s, arg_VNEG_sp *a)
-{
-    return do_vfp_2op_sp(s, gen_helper_vfp_negs, a->vd, a->vm);
-}
-
-static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
-{
-    return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
-}
+DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
+DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
 
 static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
 {
     gen_helper_vfp_sqrts(vd, vm, cpu_env);
 }
 
-static bool trans_VSQRT_sp(DisasContext *s, arg_VSQRT_sp *a)
-{
-    return do_vfp_2op_sp(s, gen_VSQRT_sp, a->vd, a->vm);
-}
-
 static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
 {
     gen_helper_vfp_sqrtd(vd, vm, cpu_env);
 }
 
-static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
-{
-    return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
-}
+DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
+DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
 
 static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
 {
-- 
2.20.1

Implement VFP fp16 for VABS, VNEG and VSQRT. This covers all the
insns that use the DO_VFP_2OP macro except VMOV_reg, which has no
fp16 version.

Notes:
 * gen_helper_vfp_negh already exists, because we needed to create
   it for the fp16 multiply-add insns
 * as usual we need to use the f16 version of the fp_status;
   this is only relevant for VSQRT
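
VABS and VNEG involve no rounding: on an IEEE binary16 bit pattern they only clear or flip the sign bit (bit 15), which is why only VSQRT depends on the fp16 fp_status. A minimal sketch on raw 16-bit values; the function names are hypothetical and are assumed to behave like softfloat's float16_abs() and float16_chs().

```c
#include <assert.h>
#include <stdint.h>

/* Flip the sign bit; works identically for normals, zeros, Inf and NaN. */
static uint16_t demo_f16_neg(uint16_t h)
{
    return (uint16_t)(h ^ 0x8000u);
}

/* Clear the sign bit; a negative NaN becomes a positive NaN. */
static uint16_t demo_f16_abs(uint16_t h)
{
    return (uint16_t)(h & 0x7fffu);
}
```

For example, demo_f16_neg(0x3c00) turns 1.0 into -1.0 (0xbc00), and demo_f16_neg(0x7e00) turns the default quiet NaN into its sign-flipped form 0xfe00 without touching the payload.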

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-9-peter.maydell@linaro.org
---
 target/arm/helper.h            |  2 ++
 target/arm/vfp.decode          |  3 +++
 target/arm/vfp_helper.c        | 10 +++++++++
 target/arm/translate-vfp.c.inc | 40 ++++++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
 DEF_HELPER_1(vfp_negh, f16, f16)
 DEF_HELPER_1(vfp_negs, f32, f32)
 DEF_HELPER_1(vfp_negd, f64, f64)
+DEF_HELPER_1(vfp_absh, f16, f16)
 DEF_HELPER_1(vfp_abss, f32, f32)
 DEF_HELPER_1(vfp_absd, f64, f64)
+DEF_HELPER_2(vfp_sqrth, f16, f16, env)
 DEF_HELPER_2(vfp_sqrts, f32, f32, env)
 DEF_HELPER_2(vfp_sqrtd, f64, f64, env)
 DEF_HELPER_3(vfp_cmps, void, f32, f32, env)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_imm_dp  ---- 1110 1.11 .... .... 1011 0000 .... \
 VMOV_reg_sp  ---- 1110 1.11 0000 .... 1010 01.0 ....        @vfp_dm_ss
 VMOV_reg_dp  ---- 1110 1.11 0000 .... 1011 01.0 ....        @vfp_dm_dd
 
+VABS_hp      ---- 1110 1.11 0000 .... 1001 11.0 ....        @vfp_dm_ss
 VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 ....        @vfp_dm_ss
 VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 ....        @vfp_dm_dd
 
+VNEG_hp      ---- 1110 1.11 0001 .... 1001 01.0 ....        @vfp_dm_ss
 VNEG_sp      ---- 1110 1.11 0001 .... 1010 01.0 ....        @vfp_dm_ss
 VNEG_dp      ---- 1110 1.11 0001 .... 1011 01.0 ....        @vfp_dm_dd
 
+VSQRT_hp     ---- 1110 1.11 0001 .... 1001 11.0 ....        @vfp_dm_ss
 VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 ....        @vfp_dm_ss
 VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 ....        @vfp_dm_dd
 
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float64 VFP_HELPER(neg, d)(float64 a)
     return float64_chs(a);
 }
 
+dh_ctype_f16 VFP_HELPER(abs, h)(dh_ctype_f16 a)
+{
+    return float16_abs(a);
+}
+
 float32 VFP_HELPER(abs, s)(float32 a)
 {
     return float32_abs(a);
@@ -XXX,XX +XXX,XX @@ float64 VFP_HELPER(abs, d)(float64 a)
     return float64_abs(a);
 }
 
+dh_ctype_f16 VFP_HELPER(sqrt, h)(dh_ctype_f16 a, CPUARMState *env)
+{
+    return float16_sqrt(a, &env->vfp.fp_status_f16);
+}
+
 float32 VFP_HELPER(sqrt, s)(float32 a, CPUARMState *env)
 {
     return float32_sqrt(a, &env->vfp.fp_status);
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     return true;
 }
 
+static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+{
+    /*
+     * Do a half-precision operation. Functionally this is
+     * the same as do_vfp_2op_sp(), except:
+     *  - it doesn't need the VFP vector handling (fp16 is a
+     *    v8 feature, and in v8 VFP vectors don't exist)
+     *  - it does the aa32_fp16_arith feature test
+     */
+    TCGv_i32 f0;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (s->vec_len != 0 || s->vec_stride != 0) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    f0 = tcg_temp_new_i32();
+    neon_load_reg32(f0, vm);
+    fn(f0, f0);
+    neon_store_reg32(f0, vd);
+    tcg_temp_free_i32(f0);
+
+    return true;
+}
+
 static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 {
     uint32_t delta_m = 0;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
 DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
 
+DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh)
 DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
 DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
 
+DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh)
 DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
 DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
 
+static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
+{
+    gen_helper_vfp_sqrth(vd, vm, cpu_env);
+}
+
 static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
 {
     gen_helper_vfp_sqrts(vd, vm, cpu_env);
@@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
     gen_helper_vfp_sqrtd(vd, vm, cpu_env);
 }
 
+DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
 DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
 DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
 
-- 
2.20.1

Implement VFP fp16 support for the VMOV immediate insn.
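The VMOV immediate forms carry their constant in 8 bits, which the architecture's VFPExpandImm() pseudocode widens to the destination format; for fp16 that means a 5-bit exponent and a 10-bit fraction. The translator diff for this patch is not reproduced here, so the following is only a standalone sketch of that expansion; the function name vfp_expand_imm_h is hypothetical.

```c
#include <stdint.h>

/*
 * Expand an 8-bit VMOV immediate (bits abcdefgh) to binary16 bits,
 * following VFPExpandImm() with E = 5, F = 10:
 *   sign = a, exponent = NOT(b):b:b:c:d, fraction = efgh:000000
 */
static uint16_t vfp_expand_imm_h(uint8_t imm8)
{
    uint16_t sign = (imm8 >> 7) & 1;
    uint16_t b = (imm8 >> 6) & 1;
    uint16_t cd = (imm8 >> 4) & 3;
    uint16_t efgh = imm8 & 0xf;
    /* NOT(b) in bit 4, b replicated into bits 3:2, cd in bits 1:0 */
    uint16_t exp = (uint16_t)(((b ^ 1) << 4) | (b ? 0xc : 0) | cd);

    return (uint16_t)((sign << 15) | (exp << 10) | (efgh << 6));
}
```

For example, imm8 0x70 expands to 0x3c00 (1.0) and imm8 0x00 to 0x4000 (2.0).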

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-10-peter.maydell@linaro.org
---
 target/arm/vfp.decode          |  2 ++
 target/arm/translate-vfp.c.inc | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

Implement fp16 version of VCMP.
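VCMP and VCMPE set the FPSCR NZCV flags from the comparison result, which the helpers below route through softfloat_to_vfp_compare(). The following rough standalone model shows that flag mapping, with doubles standing in for the fp16 operands. The name vfp_compare_nzcv is hypothetical, and the model ignores the quiet/signaling NaN distinction between VCMP and VCMPE as well as the Z (compare-with-zero) variant.

```c
#include <math.h>
#include <stdint.h>

/* Returns NZCV packed into bits 3..0, in FPSCR[31:28] order. */
static uint32_t vfp_compare_nzcv(double a, double b)
{
    if (isnan(a) || isnan(b)) {
        return 0x3;   /* unordered: C and V set */
    }
    if (a == b) {
        return 0x6;   /* equal: Z and C set */
    }
    if (a < b) {
        return 0x8;   /* less than: N set */
    }
    return 0x2;       /* greater than: C set */
}
```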

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-11-peter.maydell@linaro.org
---
 target/arm/helper.h            |  2 ++
 target/arm/vfp.decode          |  2 ++
 target/arm/vfp_helper.c        | 15 +++++++------
 target/arm/translate-vfp.c.inc | 39 ++++++++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_1(vfp_absd, f64, f64)
 DEF_HELPER_2(vfp_sqrth, f16, f16, env)
 DEF_HELPER_2(vfp_sqrts, f32, f32, env)
 DEF_HELPER_2(vfp_sqrtd, f64, f64, env)
+DEF_HELPER_3(vfp_cmph, void, f16, f16, env)
 DEF_HELPER_3(vfp_cmps, void, f32, f32, env)
 DEF_HELPER_3(vfp_cmpd, void, f64, f64, env)
+DEF_HELPER_3(vfp_cmpeh, void, f16, f16, env)
 DEF_HELPER_3(vfp_cmpes, void, f32, f32, env)
 DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
 
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VSQRT_hp     ---- 1110 1.11 0001 .... 1001 11.0 ....        @vfp_dm_ss
 VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 ....        @vfp_dm_ss
 VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 ....        @vfp_dm_dd
 
+VCMP_hp      ---- 1110 1.11 010 z:1 .... 1001 e:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
 VCMP_sp      ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCMP_dp      ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
 }
 
 /* XXX: check quiet/signaling case */
-#define DO_VFP_cmp(p, type) \
-void VFP_HELPER(cmp, p)(type a, type b, CPUARMState *env)  \
+#define DO_VFP_cmp(P, FLOATTYPE, ARGTYPE, FPST) \
+void VFP_HELPER(cmp, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env)  \
 { \
     softfloat_to_vfp_compare(env, \
-        type ## _compare_quiet(a, b, &env->vfp.fp_status)); \
+        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.FPST)); \
 } \
-void VFP_HELPER(cmpe, p)(type a, type b, CPUARMState *env) \
+void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
 { \
     softfloat_to_vfp_compare(env, \
-        type ## _compare(a, b, &env->vfp.fp_status)); \
+        FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
 }
-DO_VFP_cmp(s, float32)
-DO_VFP_cmp(d, float64)
+DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16)
+DO_VFP_cmp(s, float32, float32, fp_status)
+DO_VFP_cmp(d, float64, float64, fp_status)
 #undef DO_VFP_cmp
 
 /* Integer to float and float to integer conversions */
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
 DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
 DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
 
+static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
+{
+    TCGv_i32 vd, vm;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    /* Vm/M bits must be zero for the Z variant */
+    if (a->z && a->vm != 0) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vd = tcg_temp_new_i32();
+    vm = tcg_temp_new_i32();
+
+    neon_load_reg32(vd, a->vd);
+    if (a->z) {
+        tcg_gen_movi_i32(vm, 0);
+    } else {
+        neon_load_reg32(vm, a->vm);
+    }
+
+    if (a->e) {
+        gen_helper_vfp_cmpeh(vd, vm, cpu_env);
+    } else {
+        gen_helper_vfp_cmph(vd, vm, cpu_env);
+    }
+
+    tcg_temp_free_i32(vd);
+    tcg_temp_free_i32(vm);
+
+    return true;
+}
+
 static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
 {
     TCGv_i32 vd, vm;
-- 
2.20.1

Implement the fp16 versions of the VFP VLDR/VSTR (immediate).
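The only address-calculation difference from the single-precision form is how imm8 is scaled: the fp16 encoding counts halfwords (imm8 << 1), whereas fp32 and fp64 count words (imm8 << 2). The U bit chooses between adding and subtracting the offset. Here is a sketch of that calculation, using a hypothetical helper name and int32_t in place of the uint32_t modular arithmetic the translator uses:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Byte offset for VLDR/VSTR (immediate).
 * size_log2 is 1 for fp16 (imm8 counts halfwords) and 2 for fp32/fp64
 * (imm8 counts words). u set means add the offset; clear means subtract it.
 */
static int32_t vldr_vstr_offset(uint32_t imm8, bool u, unsigned size_log2)
{
    int32_t offset = (int32_t)((imm8 & 0xff) << size_log2);

    return u ? offset : -offset;
}
```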

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-12-peter.maydell@linaro.org
---
 target/arm/vfp.decode          |  3 +--
 target/arm/translate-vfp.c.inc | 35 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000    vn=%vn_sp
 VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 ....   vm=%vm_sp
 VMOV_64_dp   ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 ....   vm=%vm_dp
 
-# Note that the half-precision variants of VLDR and VSTR are
-# not part of this decodetree at all because they have bits [9:8] == 0b01
+VLDR_VSTR_hp ---- 1101 u:1 .0 l:1 rn:4 .... 1001 imm:8      vd=%vd_sp
 VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8      vd=%vd_sp
 VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8      vd=%vd_dp
 
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
     return true;
 }
 
+static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+{
+    uint32_t offset;
+    TCGv_i32 addr, tmp;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /* imm8 field is offset/2 for fp16, unlike fp32 and fp64 */
+    offset = a->imm << 1;
+    if (!a->u) {
+        offset = -offset;
+    }
+
+    /* For thumb, use of PC is UNPREDICTABLE.  */
+    addr = add_reg_for_lit(s, a->rn, offset);
+    tmp = tcg_temp_new_i32();
+    if (a->l) {
+        gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
+        neon_store_reg32(tmp, a->vd);
+    } else {
+        neon_load_reg32(tmp, a->vd);
+        gen_aa32_st16(s, tmp, addr, get_mem_index(s));
+    }
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(addr);
+
+    return true;
+}
+
 static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 {
     uint32_t offset;
-- 
2.20.1

Implement the fp16 versions of the VFP VCVT instruction forms which
convert between floating point and integer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-13-peter.maydell@linaro.org
---
 target/arm/vfp.decode          |  4 +++
 target/arm/translate-vfp.c.inc | 65 ++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_sp      ---- 1110 1.11 0111 .... 1010 11.0 ....        @vfp_dm_ds
 VCVT_dp      ---- 1110 1.11 0111 .... 1011 11.0 ....        @vfp_dm_sd
 
 # VCVT from integer to floating point: Vm always single; Vd depends on size
+VCVT_int_hp  ---- 1110 1.11 1000 .... 1001 s:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
 VCVT_int_sp  ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_int_dp  ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
@@ -XXX,XX +XXX,XX @@ VCVT_fix_dp  ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
              vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
 
 # VCVT float to integer (VCVT and VCVTR): Vd always single; Vm depends on size
+VCVT_hp_int  ---- 1110 1.11 110 s:1 .... 1001 rz:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
 VCVT_sp_int  ---- 1110 1.11 110 s:1 .... 1010 rz:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_dp_int  ---- 1110 1.11 110 s:1 .... 1011 rz:1 1.0 .... \
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     return true;
 }
 
+static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
+{
+    TCGv_i32 vm;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vm = tcg_temp_new_i32();
+    neon_load_reg32(vm, a->vm);
+    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    if (a->s) {
+        /* i32 -> f16 */
+        gen_helper_vfp_sitoh(vm, vm, fpst);
+    } else {
+        /* u32 -> f16 */
+        gen_helper_vfp_uitoh(vm, vm, fpst);
+    }
+    neon_store_reg32(vm, a->vd);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
 static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
 {
     TCGv_i32 vm;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     return true;
 }
 
+static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
+{
+    TCGv_i32 vm;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    vm = tcg_temp_new_i32();
+    neon_load_reg32(vm, a->vm);
+
+    if (a->s) {
+        if (a->rz) {
+            gen_helper_vfp_tosizh(vm, vm, fpst);
+        } else {
+            gen_helper_vfp_tosih(vm, vm, fpst);
+        }
+    } else {
+        if (a->rz) {
+            gen_helper_vfp_touizh(vm, vm, fpst);
+        } else {
+            gen_helper_vfp_touih(vm, vm, fpst);
+        }
+    }
+    neon_store_reg32(vm, a->vd);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
 static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
 {
     TCGv_i32 vm;
-- 
2.20.1

Currently the VFP_CONV_FIX macros take a single fsz argument for the
size of the float type, which is used both to select the name of
the functions to call (e.g. float32_is_any_nan()) and also for the
type to use for the float inputs and outputs (e.g. float32).

Separate these into fsz and ftype arguments, so that we can use them
for fp16, which uses 'float16' in the function names but is still
passing inputs and outputs in a 32-bit sized type.
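Splitting the argument means that token pasting on fsz chooses which float##fsz##_* primitive is called, while ftype independently chooses the C type of the generated function's parameter. The following is a standalone illustration of the same pattern. The float16/float32 typedefs are plain integer stand-ins, and the *_is_neg primitives and DEF_SIGNBIT macro are hypothetical. The fp16 instance takes a uint32_t container, matching the commit message's description of fp16 values being passed in a 32-bit sized type.

```c
#include <stdint.h>

/* Stand-ins for softfloat's raw-bits types. */
typedef uint16_t float16;
typedef uint32_t float32;

/* Width-named primitives, in the style of float16_is_any_nan(). */
static int float16_is_neg(float16 a) { return (a >> 15) & 1; }
static int float32_is_neg(float32 a) { return (a >> 31) & 1; }

/*
 * fsz selects the primitive by name; ftype is the generated function's
 * parameter type. For fp16 the value arrives in a 32-bit container and
 * is narrowed to float16 before the primitive sees it.
 */
#define DEF_SIGNBIT(p, fsz, ftype)                  \
    static int signbit_##p(ftype x)                 \
    {                                               \
        return float##fsz##_is_neg((float##fsz)x);  \
    }

DEF_SIGNBIT(h, 16, uint32_t)
DEF_SIGNBIT(s, 32, float32)
```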

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-14-peter.maydell@linaro.org
---
 target/arm/vfp_helper.c | 46 ++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
 }
 
 /* VFP3 fixed point conversion.  */
-#define VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype) \
-float##fsz HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift, \
+#define VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)            \
+ftype HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift,      \
                                      void *fpstp) \
 { return itype##_to_##float##fsz##_scalbn(x, -shift, fpstp); }
 
-#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, ROUND, suff)   \
-uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
+#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype, ROUND, suff) \
+uint##isz##_t HELPER(vfp_to##name##p##suff)(ftype x, uint32_t shift,      \
                                             void *fpst)                   \
 {                                                                         \
     if (unlikely(float##fsz##_is_any_nan(x))) {                           \
@@ -XXX,XX +XXX,XX @@ uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
     return float##fsz##_to_##itype##_scalbn(x, ROUND, shift, fpst);       \
 }
 
-#define VFP_CONV_FIX(name, p, fsz, isz, itype)                   \
-VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
-VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
+#define VFP_CONV_FIX(name, p, fsz, ftype, isz, itype)            \
+VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)              \
+VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
                          float_round_to_zero, _round_to_zero)    \
-VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
+VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
                          get_float_rounding_mode(fpst), )
 
-#define VFP_CONV_FIX_A64(name, p, fsz, isz, itype)               \
-VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
-VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
+#define VFP_CONV_FIX_A64(name, p, fsz, ftype, isz, itype)        \
+VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)              \
+VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
                          get_float_rounding_mode(fpst), )
 
-VFP_CONV_FIX(sh, d, 64, 64, int16)
-VFP_CONV_FIX(sl, d, 64, 64, int32)
-VFP_CONV_FIX_A64(sq, d, 64, 64, int64)
-VFP_CONV_FIX(uh, d, 64, 64, uint16)
-VFP_CONV_FIX(ul, d, 64, 64, uint32)
-VFP_CONV_FIX_A64(uq, d, 64, 64, uint64)
-VFP_CONV_FIX(sh, s, 32, 32, int16)
-VFP_CONV_FIX(sl, s, 32, 32, int32)
-VFP_CONV_FIX_A64(sq, s, 32, 64, int64)
-VFP_CONV_FIX(uh, s, 32, 32, uint16)
-VFP_CONV_FIX(ul, s, 32, 32, uint32)
-VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
+VFP_CONV_FIX(sh, d, 64, float64, 64, int16)
+VFP_CONV_FIX(sl, d, 64, float64, 64, int32)
+VFP_CONV_FIX_A64(sq, d, 64, float64, 64, int64)
+VFP_CONV_FIX(uh, d, 64, float64, 64, uint16)
+VFP_CONV_FIX(ul, d, 64, float64, 64, uint32)
+VFP_CONV_FIX_A64(uq, d, 64, float64, 64, uint64)
+VFP_CONV_FIX(sh, s, 32, float32, 32, int16)
+VFP_CONV_FIX(sl, s, 32, float32, 32, int32)
+VFP_CONV_FIX_A64(sq, s, 32, float32, 64, int64)
+VFP_CONV_FIX(uh, s, 32, float32, 32, uint16)
+VFP_CONV_FIX(ul, s, 32, float32, 32, uint32)
+VFP_CONV_FIX_A64(uq, s, 32, float32, 64, uint64)
 
 #undef VFP_CONV_FIX
 #undef VFP_CONV_FIX_FLOAT
-- 
2.20.1

Now that the VFP_CONV_FIX macros can handle fp16's distinction between
the width of the operation and the width of the type used to pass
operands, use the macros rather than the open-coded functions.

This creates six extra helper functions, all of which we are going
to need for the AArch32 VFP fp16 instructions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-15-peter.maydell@linaro.org
---
 target/arm/helper.h     |  6 +++
 target/arm/vfp_helper.c | 86 +++--------------------------------------
 2 files changed, 12 insertions(+), 80 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(vfp_tosizh, s32, f16, ptr)
 DEF_HELPER_2(vfp_tosizs, s32, f32, ptr)
 DEF_HELPER_2(vfp_tosizd, s32, f64, ptr)
 
+DEF_HELPER_3(vfp_toshh_round_to_zero, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toslh_round_to_zero, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_touhh_round_to_zero, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toulh_round_to_zero, i32, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshs_round_to_zero, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosls_round_to_zero, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_touhs_round_to_zero, i32, f32, i32, ptr)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_sqtod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_uhtod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
+DEF_HELPER_3(vfp_shtoh, f16, i32, i32, ptr)
+DEF_HELPER_3(vfp_uhtoh, f16, i32, i32, ptr)
 DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
 DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
 DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ VFP_CONV_FIX_A64(sq, s, 32, float32, 64, int64)
 VFP_CONV_FIX(uh, s, 32, float32, 32, uint16)
 VFP_CONV_FIX(ul, s, 32, float32, 32, uint32)
 VFP_CONV_FIX_A64(uq, s, 32, float32, 64, uint64)
+VFP_CONV_FIX(sh, h, 16, dh_ctype_f16, 32, int16)
+VFP_CONV_FIX(sl, h, 16, dh_ctype_f16, 32, int32)
+VFP_CONV_FIX_A64(sq, h, 16, dh_ctype_f16, 64, int64)
+VFP_CONV_FIX(uh, h, 16, dh_ctype_f16, 32, uint16)
+VFP_CONV_FIX(ul, h, 16, dh_ctype_f16, 32, uint32)
+VFP_CONV_FIX_A64(uq, h, 16, dh_ctype_f16, 64, uint64)
 
 #undef VFP_CONV_FIX
 #undef VFP_CONV_FIX_FLOAT
 #undef VFP_CONV_FLOAT_FIX_ROUND
 #undef VFP_CONV_FIX_A64
 
-uint32_t HELPER(vfp_sltoh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    return int32_to_float16_scalbn(x, -shift, fpst);
-}
-
-uint32_t HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    return uint32_to_float16_scalbn(x, -shift, fpst);
-}
-
-uint32_t HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
-{
-    return int64_to_float16_scalbn(x, -shift, fpst);
-}
-
-uint32_t HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
-{
-    return uint64_to_float16_scalbn(x, -shift, fpst);
-}
-
-uint32_t HELPER(vfp_toshh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_int16_scalbn(x, get_float_rounding_mode(fpst),
-                                   shift, fpst);
-}
-
-uint32_t HELPER(vfp_touhh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_uint16_scalbn(x, get_float_rounding_mode(fpst),
-                                    shift, fpst);
-}
-
-uint32_t HELPER(vfp_toslh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_int32_scalbn(x, get_float_rounding_mode(fpst),
-                                   shift, fpst);
-}
-
-uint32_t HELPER(vfp_toulh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_uint32_scalbn(x, get_float_rounding_mode(fpst),
-                                    shift, fpst);
-}
-
-uint64_t HELPER(vfp_tosqh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_int64_scalbn(x, get_float_rounding_mode(fpst),
-                                   shift, fpst);
-}
-
-uint64_t HELPER(vfp_touqh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_uint64_scalbn(x, get_float_rounding_mode(fpst),
-                                    shift, fpst);
-}
-
 /* Set the current fp rounding mode and return the old one.
  * The argument is a softfloat float_round_ value.
  */
-- 
2.20.1

Implement the fp16 versions of the VFP VCVT instruction forms which
convert between floating point and fixed-point.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-16-peter.maydell@linaro.org
---
 target/arm/vfp.decode          |  2 ++
 target/arm/translate-vfp.c.inc | 59 ++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VJCVT        ---- 1110 1.11 1001 .... 1011 11.0 ....        @vfp_dm_sd
 # We assemble bits 18 (op), 16 (u) and 7 (sx) into a single opc field
 # for the convenience of the trans_VCVT_fix functions.
 %vcvt_fix_op 18:1 16:1 7:1
+VCVT_fix_hp  ---- 1110 1.11 1.1. .... 1001 .1.0 .... \
+             vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
 VCVT_fix_sp  ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
              vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
 VCVT_fix_dp  ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     return true;
 }
 
+static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
+{
+    TCGv_i32 vd, shift;
+    TCGv_ptr fpst;
+    int frac_bits;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
+
+    vd = tcg_temp_new_i32();
+    neon_load_reg32(vd, a->vd);
+
+    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    shift = tcg_const_i32(frac_bits);
+
+    /* Switch on op:U:sx bits */
+    switch (a->opc) {
+    case 0:
+        gen_helper_vfp_shtoh(vd, vd, shift, fpst);
+        break;
+    case 1:
+        gen_helper_vfp_sltoh(vd, vd, shift, fpst);
+        break;
+    case 2:
+        gen_helper_vfp_uhtoh(vd, vd, shift, fpst);
+        break;
+    case 3:
+        gen_helper_vfp_ultoh(vd, vd, shift, fpst);
+        break;
+    case 4:
+        gen_helper_vfp_toshh_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 5:
+        gen_helper_vfp_toslh_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 6:
+        gen_helper_vfp_touhh_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 7:
+        gen_helper_vfp_toulh_round_to_zero(vd, vd, shift, fpst);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    neon_store_reg32(vd, a->vd);
+    tcg_temp_free_i32(vd);
+    tcg_temp_free_i32(shift);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
 static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
 {
     TCGv_i32 vd, shift;
-- 
2.20.1

Implement the fp16 versions of the VFP VCVT instruction forms
which convert between floating point and integer with a specified
rounding mode.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-17-peter.maydell@linaro.org
---
 target/arm/vfp-uncond.decode   |  6 ++++--
 target/arm/translate-vfp.c.inc | 32 ++++++++++++++++++++++++--------
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VRINT       1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
             vm=%vm_dp vd=%vd_dp dp=1
 
 # VCVT float to int with specified rounding mode; Vd is always single-precision
+VCVT        1111 1110 1.11 11 rm:2 .... 1001 op:1 1.0 .... \
+            vm=%vm_sp vd=%vd_sp sz=1
 VCVT        1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
-            vm=%vm_sp vd=%vd_sp dp=0
+            vm=%vm_sp vd=%vd_sp sz=2
 VCVT        1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
-            vm=%vm_dp vd=%vd_sp dp=1
+            vm=%vm_dp vd=%vd_sp sz=3
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
 static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
 {
     uint32_t rd, rm;
-    bool dp = a->dp;
+    int sz = a->sz;
     TCGv_ptr fpst;
     TCGv_i32 tcg_rmode, tcg_shift;
     int rounding = fp_decode_rm[a->rm];
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         return false;
     }
 
-    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
+    if (sz == 3 && !dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
+    if (sz == 1 && !dc_isar_feature(aa32_fp16_arith, s)) {
         return false;
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
+    if (sz == 3 && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    if (sz == 1) {
+        fpst = fpstatus_ptr(FPST_FPCR_F16);
+    } else {
+        fpst = fpstatus_ptr(FPST_FPCR);
+    }
 
     tcg_shift = tcg_const_i32(0);
 
     tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 
-    if (dp) {
+    if (sz == 3) {
         TCGv_i64 tcg_double, tcg_res;
         TCGv_i32 tcg_tmp;
         tcg_double = tcg_temp_new_i64();
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         tcg_single = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
         neon_load_reg32(tcg_single, rm);
-        if (is_signed) {
-            gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
+        if (sz == 1) {
+            if (is_signed) {
+                gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
+            } else {
+                gen_helper_vfp_toulh(tcg_res, tcg_single, tcg_shift, fpst);
+            }
         } else {
-            gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
+            if (is_signed) {
+                gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
+            } else {
+                gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
+            }
         }
         neon_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_res);
-- 
2.20.1

Implement the fp16 versions of the VFP VSEL instruction.
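VSEL's 2-bit cc field selects one of four conditions (EQ, VS, GE, GT) evaluated against the NZCV flags. The fp16 form writes zeroes into the top half of the destination register, as the comment in the translator change below notes. Here is a standalone sketch of that selection on 32-bit register images. The function names are hypothetical, and the flags are passed as booleans (C is omitted because none of the four conditions uses it).

```c
#include <stdbool.h>
#include <stdint.h>

/* cc: 0 = EQ, 1 = VS, 2 = GE, 3 = GT. */
static bool vsel_cond(unsigned cc, bool n, bool z, bool v)
{
    switch (cc & 3) {
    case 0:
        return z;               /* EQ */
    case 1:
        return v;               /* VS */
    case 2:
        return n == v;          /* GE */
    default:
        return !z && n == v;    /* GT */
    }
}

/* fp16 VSEL: pick Sn or Sm, then clear the unused top 16 bits. */
static uint32_t vsel_hp(unsigned cc, bool n, bool z, bool v,
                        uint32_t frn, uint32_t frm)
{
    uint32_t r = vsel_cond(cc, n, z, v) ? frn : frm;

    return r & 0xffff;
}
```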

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-18-peter.maydell@linaro.org
---
 target/arm/vfp-uncond.decode   |  6 ++++--
 target/arm/translate-vfp.c.inc | 16 ++++++++++++----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@
 @vfp_dnm_s   ................................ vm=%vm_sp vn=%vn_sp vd=%vd_sp
 @vfp_dnm_d   ................................ vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VSEL        1111 1110 0. cc:2 .... .... 1001 .0.0 .... \
+            vm=%vm_sp vn=%vn_sp vd=%vd_sp sz=1
 VSEL        1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
-            vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
+            vm=%vm_sp vn=%vn_sp vd=%vd_sp sz=2
 VSEL        1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
-            vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
+            vm=%vm_dp vn=%vn_dp vd=%vd_dp sz=3
 
 VMAXNM_hp   1111 1110 1.00 .... .... 1001 .0.0 ....         @vfp_dnm_s
 VMINNM_hp   1111 1110 1.00 .... .... 1001 .1.0 ....         @vfp_dnm_s
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check(DisasContext *s)
 static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
 {
     uint32_t rd, rn, rm;
-    bool dp = a->dp;
+    int sz = a->sz;
 
     if (!dc_isar_feature(aa32_vsel, s)) {
         return false;
     }
 
-    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
+    if (sz == 3 && !dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
+    if (sz == 1 && !dc_isar_feature(aa32_fp16_arith, s)) {
         return false;
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
+    if (sz == 3 && !dc_isar_feature(aa32_simd_r32, s) &&
         ((a->vm | a->vn | a->vd) & 0x10)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         return true;
     }
 
-    if (dp) {
+    if (sz == 3) {
         TCGv_i64 frn, frm, dest;
         TCGv_i64 tmp, zero, zf, nf, vf;
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i32(tmp);
             break;
         }
+        /* For fp16 the top half is always zeroes */
+        if (sz == 1) {
+            tcg_gen_andi_i32(dest, dest, 0xffff);
+        }
         neon_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
-- 
2.20.1

Implement the fp16 version of the VFP VRINT* insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-19-peter.maydell@linaro.org
---
 target/arm/helper.h            |  2 +
 target/arm/vfp-uncond.decode   |  6 ++-
 target/arm/vfp.decode          |  3 ++
 target/arm/vfp_helper.c        | 21 ++++++++
 target/arm/translate-vfp.c.inc | 98 +++++++++++++++++++++++++++++++---
 5 files changed, 122 insertions(+), 8 deletions(-)
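
The `rinth` and `rinth_exact` helpers differ only in flag handling. VRINTX (`rinth_exact`) reports Inexact. The other VRINT forms (`rinth`) suppress an Inexact raised by the rounding itself, but keep one that was already pending. The sketch below models that save/clear pattern with a toy status struct instead of softfloat. `toy_status`, `TOY_FLAG_INEXACT` and the truncating round are hypothetical stand-ins, and the sketch assumes finite inputs with magnitude below 2^63.

```c
#include <stdint.h>

#define TOY_FLAG_INEXACT 0x10   /* hypothetical flag bit */

typedef struct {
    int exception_flags;        /* hypothetical stand-in for float_status */
} toy_status;

/* "Exact" variant: round toward zero and raise Inexact if the value changed. */
static double toy_rint_exact(double x, toy_status *st)
{
    double r = (double)(int64_t)x;  /* cast truncates toward zero */

    if (r != x) {
        st->exception_flags |= TOY_FLAG_INEXACT;
    }
    return r;
}

/* Non-exact variant: drop any Inexact this call raised, keep a prior one. */
static double toy_rint(double x, toy_status *st)
{
    int old_flags = st->exception_flags;
    double r = toy_rint_exact(x, st);

    if (!(old_flags & TOY_FLAG_INEXACT)) {
        st->exception_flags &= ~TOY_FLAG_INEXACT;
    }
    return r;
}
```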

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(shr_cc, i32, env, i32, i32)
 DEF_HELPER_3(sar_cc, i32, env, i32, i32)
 DEF_HELPER_3(ror_cc, i32, env, i32, i32)
 
+DEF_HELPER_FLAGS_2(rinth_exact, TCG_CALL_NO_RWG, f16, f16, ptr)
 DEF_HELPER_FLAGS_2(rints_exact, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(rintd_exact, TCG_CALL_NO_RWG, f64, f64, ptr)
+DEF_HELPER_FLAGS_2(rinth, TCG_CALL_NO_RWG, f16, f16, ptr)
 DEF_HELPER_FLAGS_2(rints, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(rintd, TCG_CALL_NO_RWG, f64, f64, ptr)
 
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VMINNM_sp   1111 1110 1.00 .... .... 1010 .1.0 ....         @vfp_dnm_s
 VMAXNM_dp   1111 1110 1.00 .... .... 1011 .0.0 ....         @vfp_dnm_d
 VMINNM_dp   1111 1110 1.00 .... .... 1011 .1.0 ....         @vfp_dnm_d
 
+VRINT       1111 1110 1.11 10 rm:2 .... 1001 01.0 .... \
+            vm=%vm_sp vd=%vd_sp sz=1
 VRINT       1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
-            vm=%vm_sp vd=%vd_sp dp=0
+            vm=%vm_sp vd=%vd_sp sz=2
 VRINT       1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
-            vm=%vm_dp vd=%vd_dp dp=1
+            vm=%vm_dp vd=%vd_dp sz=3
 
 # VCVT float to int with specified rounding mode; Vd is always single-precision
 VCVT        1111 1110 1.11 11 rm:2 .... 1001 op:1 1.0 .... \
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
 VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
              vd=%vd_sp vm=%vm_dp
 
+VRINTR_hp    ---- 1110 1.11 0110 .... 1001 01.0 ....        @vfp_dm_ss
 VRINTR_sp    ---- 1110 1.11 0110 .... 1010 01.0 ....        @vfp_dm_ss
 VRINTR_dp    ---- 1110 1.11 0110 .... 1011 01.0 ....        @vfp_dm_dd
 
+VRINTZ_hp    ---- 1110 1.11 0110 .... 1001 11.0 ....        @vfp_dm_ss
 VRINTZ_sp    ---- 1110 1.11 0110 .... 1010 11.0 ....        @vfp_dm_ss
 VRINTZ_dp    ---- 1110 1.11 0110 .... 1011 11.0 ....        @vfp_dm_dd
 
+VRINTX_hp    ---- 1110 1.11 0111 .... 1001 01.0 ....        @vfp_dm_ss
 VRINTX_sp    ---- 1110 1.11 0111 .... 1010 01.0 ....        @vfp_dm_ss
 VRINTX_dp    ---- 1110 1.11 0111 .... 1011 01.0 ....        @vfp_dm_dd
 
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float64 VFP_HELPER(muladd, d)(float64 a, float64 b, float64 c, void *fpstp)
 }
 
 /* ARMv8 round to integral */
+dh_ctype_f16 HELPER(rinth_exact)(dh_ctype_f16 x, void *fp_status)
+{
+    return float16_round_to_int(x, fp_status);
+}
+
 float32 HELPER(rints_exact)(float32 x, void *fp_status)
 {
     return float32_round_to_int(x, fp_status);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rintd_exact)(float64 x, void *fp_status)
     return float64_round_to_int(x, fp_status);
 }
 
+dh_ctype_f16 HELPER(rinth)(dh_ctype_f16 x, void *fp_status)
+{
+    int old_flags = get_float_exception_flags(fp_status), new_flags;
+    float16 ret;
+
+    ret = float16_round_to_int(x, fp_status);
+
+    /* Suppress any inexact exceptions the conversion produced */
+    if (!(old_flags & float_flag_inexact)) {
+        new_flags = get_float_exception_flags(fp_status);
+        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
+    }
+
+    return ret;
+}
+
 float32 HELPER(rints)(float32 x, void *fp_status)
 {
     int old_flags = get_float_exception_flags(fp_status), new_flags;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static const uint8_t fp_decode_rm[] = {
 static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
 {
     uint32_t rd, rm;
-    bool dp = a->dp;
+    int sz = a->sz;
     TCGv_ptr fpst;
     TCGv_i32 tcg_rmode;
     int rounding = fp_decode_rm[a->rm];
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         return false;
     }
 
-    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
+    if (sz == 3 && !dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
+    if (sz == 1 && !dc_isar_feature(aa32_fp16_arith, s)) {
         return false;
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
+    if (sz == 3 && !dc_isar_feature(aa32_simd_r32, s) &&
         ((a->vm | a->vd) & 0x10)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         return true;
     }
 
-    fpst = fpstatus_ptr(FPST_FPCR);
+    if (sz == 1) {
+        fpst = fpstatus_ptr(FPST_FPCR_F16);
+    } else {
+        fpst = fpstatus_ptr(FPST_FPCR);
+    }
 
     tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 
-    if (dp) {
+    if (sz == 3) {
         TCGv_i64 tcg_op;
         TCGv_i64 tcg_res;
         tcg_op = tcg_temp_new_i64();
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         tcg_op = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
         neon_load_reg32(tcg_op, rm);
-        gen_helper_rints(tcg_res, tcg_op, fpst);
+        if (sz == 1) {
+            gen_helper_rinth(tcg_res, tcg_op, fpst);
+        } else {
+            gen_helper_rints(tcg_res, tcg_op, fpst);
+        }
         neon_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_op);
         tcg_temp_free_i32(tcg_res);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     return true;
 }
 
+static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    neon_load_reg32(tmp, a->vm);
+    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    gen_helper_rinth(tmp, tmp, fpst);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
 static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
 {
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
     return true;
 }
 
+static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+    TCGv_i32 tcg_rmode;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    neon_load_reg32(tmp, a->vm);
+    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    tcg_rmode = tcg_const_i32(float_round_to_zero);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    gen_helper_rinth(tmp, tmp, fpst);
+    gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_rmode);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
 static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
 {
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
     return true;
 }
 
+static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    neon_load_reg32(tmp, a->vm);
+    fpst = fpstatus_ptr(FPST_FPCR_F16);
+    gen_helper_rinth_exact(tmp, tmp, fpst);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
 static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
 {
     TCGv_ptr fpst;
-- 
2.20.1

The fp16 extension includes a new instruction VINS, which copies the
lower 16 bits of a 32-bit source VFP register into the upper 16 bits
of the destination.  Implement it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-20-peter.maydell@linaro.org
---
 target/arm/vfp-uncond.decode   |  3 +++
 target/arm/translate-vfp.c.inc | 28 ++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)
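
At the bit level, VINS is a 16-bit field insert. The function below is a hypothetical illustration on raw register values, not QEMU code. It assumes, following the Arm architecture's description of VINS, that the low half of the destination is left unchanged.

```c
#include <stdint.h>

/* Hypothetical model of VINS.F16 Sd, Sm: Sd<31:16> = Sm<15:0>, Sd<15:0> kept. */
static uint32_t vins_model(uint32_t sd, uint32_t sm)
{
    return (sd & 0x0000ffffu) | ((sm & 0x0000ffffu) << 16);
}
```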

The fp16 extension includes a new instruction VMOVX, which copies the
upper 16 bits of a 32-bit source VFP register into the lower 16
bits of the destination and zeroes the high half of the destination.
Implement it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-21-peter.maydell@linaro.org
---
 target/arm/vfp-uncond.decode   |  3 +++
 target/arm/translate-vfp.c.inc | 25 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VCVT        1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
 VCVT        1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
             vm=%vm_dp vd=%vd_sp sz=3
 
+VMOVX       1111 1110 1.11 0000 .... 1010 01 . 0 .... \
+            vd=%vd_sp vm=%vm_sp
+
 VINS        1111 1110 1.11 0000 .... 1010 11 . 0 .... \
             vd=%vd_sp vm=%vm_sp
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
     tcg_temp_free_i32(rd);
     return true;
 }
+
+static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
+{
+    TCGv_i32 rm;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (s->vec_len != 0 || s->vec_stride != 0) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /* Set Vd to high half of Vm */
+    rm = tcg_temp_new_i32();
+    neon_load_reg32(rm, a->vm);
+    tcg_gen_shri_i32(rm, rm, 16);
+    neon_store_reg32(rm, a->vd);
+    tcg_temp_free_i32(rm);
+    return true;
+}
-- 
2.20.1

Implement the VFP fp16 variant of VMOV that transfers a 16-bit
value between a general purpose register and a VFP register.

Note that Rt == 15 is UNPREDICTABLE; since this insn exists only in
v8 and later, we have no need to replicate the old "updates CPSR.NZCV"
behaviour that the single-precision version of this insn has.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-22-peter.maydell@linaro.org
---
 target/arm/vfp.decode          |  1 +
 target/arm/translate-vfp.c.inc | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
              vn=%vn_dp
 
 VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
+VMOV_half    ---- 1110 000 l:1 .... rt:4 1001 . 001 0000    vn=%vn_sp
 VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000    vn=%vn_sp
 
 VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 ....   vm=%vm_sp
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
     return true;
 }
 
+static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
+{
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    if (a->rt == 15) {
+        /* UNPREDICTABLE; we choose to UNDEF */
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (a->l) {
+        /* VFP to general purpose register */
+        tmp = tcg_temp_new_i32();
+        neon_load_reg32(tmp, a->vn);
+        tcg_gen_andi_i32(tmp, tmp, 0xffff);
+        store_reg(s, a->rt, tmp);
+    } else {
+        /* general purpose register to VFP */
+        tmp = load_reg(s, a->rt);
+        tcg_gen_andi_i32(tmp, tmp, 0xffff);
+        neon_store_reg32(tmp, a->vn);
+        tcg_temp_free_i32(tmp);
+    }
+
+    return true;
+}
+
 static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
 {
     TCGv_i32 tmp;
-- 
2.20.1

Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
macro: VADD, VSUB, VABD, VMUL.

For VABD this requires us to implement a new gvec_fabd_h helper
using the machinery we have already for the other helpers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-24-peter.maydell@linaro.org
---
 target/arm/helper.h             |  1 +
 target/arm/vec_helper.c         |  6 ++++++
 target/arm/translate-neon.c.inc | 36 +++++++++++++++++----------------
 3 files changed, 26 insertions(+), 17 deletions(-)

We already have gvec helpers for floating-point VRECPE and
VRSQRTE, so convert the Neon decoder to use them and
add the fp16 support.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-25-peter.maydell@linaro.org
---
 target/arm/translate-neon.c.inc | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

Rewrite Neon VABS/VNEG of floats to use gvec logical AND and XOR, so
that we can implement the fp16 version of the insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-26-peter.maydell@linaro.org
---
 target/arm/translate-neon.c.inc | 34 +++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)
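
IEEE formats keep the sign in the top bit of each element, so absolute value and negation need no arithmetic. Clearing or flipping that bit in every lane is enough, which is why a plain vector AND or XOR works for fp16 as well as for fp32. Here is a minimal sketch on one 64-bit D register holding four fp16 lanes. The function names are hypothetical.

```c
#include <stdint.h>

/* Replicate a 16-bit pattern into all four lanes of a 64-bit value. */
static uint64_t dup16(uint16_t x)
{
    return (uint64_t)x * 0x0001000100010001ULL;
}

/* |x| for four fp16 lanes: clear each lane's sign bit (bit 15). */
static uint64_t vabs_f16x4(uint64_t v)
{
    return v & dup16(0x7fff);
}

/* -x for four fp16 lanes: flip each lane's sign bit. */
static uint64_t vneg_f16x4(uint64_t v)
{
    return v ^ dup16(0x8000);
}
```

With lanes -1.0 (0xbc00), 1.0 (0x3c00), -0.0 (0x8000) and +0.0 (0x0000), `vabs_f16x4(0xbc003c0080000000)` gives `0x3c003c0000000000`. NaN lanes keep their payload and only lose their sign.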

Convert the Neon floating-point vector comparison ops VCEQ,
VCGE and VCGT over to using a gvec helper and use this to
implement the fp16 case.

(We put the float16_ceq() etc. functions above the DO_2OP()
macro definition because later, when we convert the
compare-against-zero instructions, we'll want their
definitions to be visible at that point in the source file.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-27-peter.maydell@linaro.org
---
 target/arm/helper.h             |  9 +++++++
 target/arm/vec_helper.c         | 44 +++++++++++++++++++++++++++++++++
 target/arm/translate-neon.c.inc |  6 ++---
 3 files changed, 56 insertions(+), 3 deletions(-)

Convert the Neon floating-point vector absolute comparison ops
VACGE and VACGT over to using a gvec helper and use this to
implement the fp16 case.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-28-peter.maydell@linaro.org
---
 target/arm/helper.h             |  6 ++++++
 target/arm/vec_helper.c         | 26 ++++++++++++++++++++++++++
 target/arm/translate-neon.c.inc |  4 ++--
 3 files changed, 34 insertions(+), 2 deletions(-)

Convert the Neon floating-point VMAX and VMIN insns over to using
a gvec helper, and use this to implement the fp16 case.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-29-peter.maydell@linaro.org
---
 target/arm/helper.h             | 6 ++++++
 target/arm/vec_helper.c         | 6 ++++++
 target/arm/translate-neon.c.inc | 5 ++---
 3 files changed, 14 insertions(+), 3 deletions(-)

Convert the Neon floating-point VMAXNM and VMINNM insns to
using a gvec helper and use this to implement the fp16 case.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-30-peter.maydell@linaro.org
---
 target/arm/helper.h             |  6 ++++++
 target/arm/vec_helper.c         |  6 ++++++
 target/arm/translate-neon.c.inc | 23 +++++++++++++++--------
 3 files changed, 27 insertions(+), 8 deletions(-)
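
What separates VMAXNM from plain VMAX is NaN handling. VMAX returns a NaN if either operand is a NaN. VMAXNM (IEEE 754-2008 maxNum) returns the numeric operand when only one input is a quiet NaN. The following simplified fp32 sketch uses hypothetical function names. It treats every NaN as quiet, does not model default-NaN mode, and orders +0 above -0 as the architecture does.

```c
#include <math.h>

/* Plain VMAX: any NaN operand produces a NaN (the first NaN is returned). */
static float vmax_model(float a, float b)
{
    if (isnan(a)) {
        return a;
    }
    if (isnan(b)) {
        return b;
    }
    if (a == b) {
        /* Equal values; for +0 vs -0 prefer +0. */
        return signbit(a) ? b : a;
    }
    return a > b ? a : b;
}

/* VMAXNM: a single (quiet) NaN operand is ignored in favour of the number. */
static float vmaxnm_model(float a, float b)
{
    if (isnan(a) && !isnan(b)) {
        return b;
    }
    if (isnan(b) && !isnan(a)) {
        return a;
    }
    return vmax_model(a, b);
}
```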

Convert the Neon floating-point VMLA and VMLS insns over to using a
gvec helper, and use this to implement the fp16 case.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-31-peter.maydell@linaro.org
---
 target/arm/helper.h             |  6 +++++
 target/arm/vec_helper.c         | 42 +++++++++++++++++++++++++++++++++
 target/arm/translate-neon.c.inc | 33 ++------------------------
 3 files changed, 50 insertions(+), 31 deletions(-)
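
Neon VMLA and VMLS are non-fused. The product is rounded to the element type before it is added to (or subtracted from) the destination, whereas VFMA and VFMS round only once. The following fp32 sketch models one lane. `vmla_lane()` and `vmls_lane()` are hypothetical, and the sketch ignores the flush-to-zero behaviour of the Neon standard FPSCR value. The `volatile` temporary only prevents a C compiler from contracting the multiply and add into a single fused operation.

```c
/* Hypothetical model of one fp32 lane of Neon VMLA: d + round(a * b). */
static float vmla_lane(float d, float a, float b)
{
    volatile float product = a * b;   /* first rounding: product to fp32 */
    return d + product;               /* second rounding: the sum */
}

/* VMLS is the same with the rounded product subtracted. */
static float vmls_lane(float d, float a, float b)
{
    volatile float product = a * b;
    return d - product;
}
```

Take a = b = 1 + 2^-12 and d = -(1 + 2^-11). The rounded product is exactly 1 + 2^-11, so `vmla_lane()` returns 0. A single-rounding fused multiply-add would instead return 2^-24.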

Convert the Neon floating-point vector operations VFMA and VFMS
to use a gvec helper, and use this to implement the fp16 case.

This is the last use of do_3same_fp() so we can now delete
that function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-32-peter.maydell@linaro.org
---
 target/arm/helper.h             |  6 +++
 target/arm/vec_helper.c         | 33 +++++++++++-
 target/arm/translate-neon.c.inc | 92 +--------------------------------
 3 files changed, 40 insertions(+), 91 deletions(-)

Convert the Neon floating-point vector compare-vs-0 insns VCEQ0,
VCGT0, VCLE0, VCGE0 and VCLT0 to use a gvec helper, and use this to
implement the fp16 case.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-33-peter.maydell@linaro.org
---
 target/arm/helper.h             | 15 +++++++++++++++
 target/arm/vec_helper.c         | 25 +++++++++++++++++++++++++
 target/arm/translate-neon.c.inc | 33 +++++----------------------------
 3 files changed, 45 insertions(+), 28 deletions(-)

Convert the Neon VRECPS insn to using a gvec helper, and
use this to implement the fp16 case.

The phrasing of the new float32_recps_nf() is slightly different from
the old recps_f32() so that it parallels the f16 version; for f16 we
can't assume that flush-to-zero is always enabled.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-34-peter.maydell@linaro.org
---
 target/arm/helper.h             |  4 +++-
 target/arm/vec_helper.c         | 31 +++++++++++++++++++++++++++++++
 target/arm/vfp_helper.c         | 13 -------------
 target/arm/translate-neon.c.inc | 21 +--------------------
 4 files changed, 35 insertions(+), 34 deletions(-)
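
VRECPS computes 2 - a*b, which is the correction factor for one Newton-Raphson step towards 1/d: x' = x * (2 - d*x). The sketch follows the shape of the float32_recps_nf() helper added by this patch, but uses host `float` arithmetic and omits the input-denormal squashing. `recps_nf_model()` and `refine_recip()` are hypothetical names, and `volatile` keeps the product separately rounded (non-fused).

```c
#include <math.h>

/* One-lane model of the AArch32 (non-fused) reciprocal step: 2 - a*b,
 * with the architected special case inf * 0 -> 2. */
static float recps_nf_model(float a, float b)
{
    volatile float product;

    if ((isinf(a) && b == 0.0f) || (isinf(b) && a == 0.0f)) {
        return 2.0f;
    }
    product = a * b;          /* rounded on its own: non-fused */
    return 2.0f - product;
}

/* Newton-Raphson refinement of an estimate x of 1/d. */
static float refine_recip(float d, float x, int steps)
{
    for (int i = 0; i < steps; i++) {
        x = x * recps_nf_model(d, x);
    }
    return x;
}
```

Starting from a rough estimate such as 0.3 for 1/3, the relative error roughly squares on each step, so a few iterations reach fp32 precision.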

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
 DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
 DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, ptr)
 
-DEF_HELPER_3(recps_f32, f32, env, f32, f32)
 DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
 DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
 DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmaxnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i3
 DEF_HELPER_FLAGS_5(gvec_fminnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_recps_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static float32 float32_abd(float32 op1, float32 op2, float_status *stat)
     return float32_abs(float32_sub(op1, op2, stat));
 }
 
+/*
+ * Reciprocal step. These are the AArch32 version which uses a
+ * non-fused multiply-and-subtract.
+ */
+static float16 float16_recps_nf(float16 op1, float16 op2, float_status *stat)
+{
+    op1 = float16_squash_input_denormal(op1, stat);
+    op2 = float16_squash_input_denormal(op2, stat);
+
+    if ((float16_is_infinity(op1) && float16_is_zero(op2)) ||
+        (float16_is_infinity(op2) && float16_is_zero(op1))) {
+        return float16_two;
+    }
+    return float16_sub(float16_two, float16_mul(op1, op2, stat), stat);
+}
+
+static float32 float32_recps_nf(float32 op1, float32 op2, float_status *stat)
+{
+    op1 = float32_squash_input_denormal(op1, stat);
+    op2 = float32_squash_input_denormal(op2, stat);
+
+    if ((float32_is_infinity(op1) && float32_is_zero(op2)) ||
+        (float32_is_infinity(op2) && float32_is_zero(op1))) {
+        return float32_two;
+    }
+    return float32_sub(float32_two, float32_mul(op1, op2, stat), stat);
+}
+
 #define DO_3OP(NAME, FUNC, TYPE) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
 {                                                                          \
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fmaxnum_s, float32_maxnum, float32)
 DO_3OP(gvec_fminnum_h, float16_minnum, float16)
 DO_3OP(gvec_fminnum_s, float32_minnum, float32)
 
+DO_3OP(gvec_recps_nf_h, float16_recps_nf, float16)
+DO_3OP(gvec_recps_nf_s, float32_recps_nf, float32)
+
 #ifdef TARGET_AARCH64
 
 DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
     return r;
 }
 
-float32 HELPER(recps_f32)(CPUARMState *env, float32 a, float32 b)
-{
-    float_status *s = &env->vfp.standard_fp_status;
-    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
-        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
-        if (!(float32_is_zero(a) || float32_is_zero(b))) {
-            float_raise(float_flag_input_denormal, s);
-        }
-        return float32_two;
-    }
-    return float32_sub(float32_two, float32_mul(a, b, s), s);
-}
-
 float32 HELPER(rsqrts_f32)(CPUARMState *env, float32 a, float32 b)
 {
     float_status *s = &env->vfp.standard_fp_status;
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h)
 DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
 DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h)
 DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h)
+DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h)
 
 WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
 WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
     return do_3same(s, a, gen_VMINNM_fp32_3s);
 }
 
-WRAP_ENV_FN(gen_VRECPS_tramp, gen_helper_recps_f32)
-
-static void gen_VRECPS_fp_3s(unsigned vece, uint32_t rd_ofs,
-                             uint32_t rn_ofs, uint32_t rm_ofs,
-                             uint32_t oprsz, uint32_t maxsz)
-{
-    static const GVecGen3 ops = { .fni4 = gen_VRECPS_tramp };
-    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops);
-}
-
-static bool trans_VRECPS_fp_3s(DisasContext *s, arg_3same *a)
-{
-    if (a->size != 0) {
-        /* TODO fp16 support */
-        return false;
-    }
-
-    return do_3same(s, a, gen_VRECPS_fp_3s);
-}
-
 WRAP_ENV_FN(gen_VRSQRTS_tramp, gen_helper_rsqrts_f32)
 
 static void gen_VRSQRTS_fp_3s(unsigned vece, uint32_t rd_ofs,
-- 
2.20.1

Convert the Neon VRSQRTS insn to using a gvec helper,
and use this to implement the fp16 case.

As with VRECPS, we adjust the phrasing of the new implementation
slightly so that the fp32 version parallels the fp16 one.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-35-peter.maydell@linaro.org
---
 target/arm/helper.h             |  4 +++-
 target/arm/vec_helper.c         | 30 ++++++++++++++++++++++++++++++
 target/arm/vfp_helper.c         | 15 ---------------
 target/arm/translate-neon.c.inc | 21 +--------------------
 4 files changed, 34 insertions(+), 36 deletions(-)
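
VRSQRTS computes (3 - a*b) / 2. Passing a = d*x and b = x gives one Newton-Raphson step towards 1/sqrt(d): x' = x * (3 - d*x*x) / 2. The following hypothetical host-`float` sketch mirrors the structure of float32_rsqrts_nf() without its denormal squashing. `rsqrts_nf_model()` and `refine_rsqrt()` are illustrative names only.

```c
#include <math.h>

/* One-lane model of the AArch32 (non-fused) reciprocal square-root step,
 * (3 - a*b) / 2, with the architected special case inf * 0 -> 1.5. */
static float rsqrts_nf_model(float a, float b)
{
    volatile float product;

    if ((isinf(a) && b == 0.0f) || (isinf(b) && a == 0.0f)) {
        return 1.5f;
    }
    product = a * b;          /* rounded on its own: non-fused */
    return (3.0f - product) / 2.0f;
}

/* Newton-Raphson refinement of an estimate x of 1/sqrt(d). */
static float refine_rsqrt(float d, float x, int steps)
{
    for (int i = 0; i < steps; i++) {
        x = x * rsqrts_nf_model(d * x, x);
    }
    return x;
}
```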

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
 DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
 DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, ptr)
 
-DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
 DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
 DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, ptr)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i3
 DEF_HELPER_FLAGS_5(gvec_recps_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static float32 float32_recps_nf(float32 op1, float32 op2, float_status *stat)
     return float32_sub(float32_two, float32_mul(op1, op2, stat), stat);
 }
 
+/* Reciprocal square-root step. AArch32 non-fused semantics. */
+static float16 float16_rsqrts_nf(float16 op1, float16 op2, float_status *stat)
+{
+    op1 = float16_squash_input_denormal(op1, stat);
+    op2 = float16_squash_input_denormal(op2, stat);
+
+    if ((float16_is_infinity(op1) && float16_is_zero(op2)) ||
+        (float16_is_infinity(op2) && float16_is_zero(op1))) {
+        return float16_one_point_five;
+    }
+    op1 = float16_sub(float16_three, float16_mul(op1, op2, stat), stat);
+    return float16_div(op1, float16_two, stat);
+}
+
+static float32 float32_rsqrts_nf(float32 op1, float32 op2, float_status *stat)
+{
+    op1 = float32_squash_input_denormal(op1, stat);
+    op2 = float32_squash_input_denormal(op2, stat);
+
+    if ((float32_is_infinity(op1) && float32_is_zero(op2)) ||
+        (float32_is_infinity(op2) && float32_is_zero(op1))) {
+        return float32_one_point_five;
+    }
+    op1 = float32_sub(float32_three, float32_mul(op1, op2, stat), stat);
+    return float32_div(op1, float32_two, stat);
+}
+
 #define DO_3OP(NAME, FUNC, TYPE) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
 {                                                                          \
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fminnum_s, float32_minnum, float32)
 DO_3OP(gvec_recps_nf_h, float16_recps_nf, float16)
 DO_3OP(gvec_recps_nf_s, float32_recps_nf, float32)
 
+DO_3OP(gvec_rsqrts_nf_h, float16_rsqrts_nf, float16)
+DO_3OP(gvec_rsqrts_nf_s, float32_rsqrts_nf, float32)
+
 #ifdef TARGET_AARCH64
 
 DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
     return r;
 }
 
-float32 HELPER(rsqrts_f32)(CPUARMState *env, float32 a, float32 b)
-{
-    float_status *s = &env->vfp.standard_fp_status;
-    float32 product;
-    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
-        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
-        if (!(float32_is_zero(a) || float32_is_zero(b))) {
-            float_raise(float_flag_input_denormal, s);
-        }
-        return float32_one_point_five;
-    }
-    product = float32_mul(a, b, s);
-    return float32_div(float32_sub(float32_three, product, s), float32_two, s);
-}
-
 /* NEON helpers.  */
 
 /* Constants 256 and 512 are used in some helpers; we avoid relying on
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
 DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h)
 DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h)
 DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h)
+DO_3S_FP_GVEC(VRSQRTS, gen_helper_gvec_rsqrts_nf_s, gen_helper_gvec_rsqrts_nf_h)
 
 WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
 WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
     return do_3same(s, a, gen_VMINNM_fp32_3s);
 }
 
-WRAP_ENV_FN(gen_VRSQRTS_tramp, gen_helper_rsqrts_f32)
-
-static void gen_VRSQRTS_fp_3s(unsigned vece, uint32_t rd_ofs,
-                              uint32_t rn_ofs, uint32_t rm_ofs,
-                              uint32_t oprsz, uint32_t maxsz)
-{
-    static const GVecGen3 ops = { .fni4 = gen_VRSQRTS_tramp };
-    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops);
-}
-
-static bool trans_VRSQRTS_fp_3s(DisasContext *s, arg_3same *a)
-{
-    if (a->size != 0) {
-        /* TODO fp16 support */
-        return false;
-    }
-
-    return do_3same(s, a, gen_VRSQRTS_fp_3s);
-}
-
 static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 {
     /* FP operations handled pairwise 32 bits at a time */
-- 
2.20.1
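
For reference, float32_rsqrts_nf above computes the Newton-Raphson step for 1/sqrt(x). VRSQRTS returns (3 - op1 * op2) / 2, and infinity times zero is defined to give 1.5. Below is a minimal sketch in plain C. It assumes host float arithmetic in round-to-nearest and ignores the input-denormal squashing and exception flags that the softfloat version handles. rsqrts_step() and refine_rsqrt() are hypothetical names, not QEMU functions.

```c
#include <math.h>

/* Hypothetical host-float model of the non-fused VRSQRTS step. */
static float rsqrts_step(float op1, float op2)
{
    /* inf * 0 would give NaN; the architecture defines the result as 1.5. */
    if ((isinf(op1) && op2 == 0.0f) || (isinf(op2) && op1 == 0.0f)) {
        return 1.5f;
    }
    return (3.0f - op1 * op2) / 2.0f;
}

/*
 * One Newton-Raphson refinement of y ~= 1/sqrt(x):
 *   y' = y * (3 - x*y*y) / 2, with the bracketed step done by VRSQRTS.
 */
static float refine_rsqrt(float x, float y)
{
    return y * rsqrts_step(x * y, y);
}
```

Each refine_rsqrt() call roughly doubles the number of correct bits. A VRSQRTE-style estimate of 0.4 for x = 4 therefore converges to 0.5 within a handful of iterations.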

Convert the Neon pairwise fp ops to use a single gvec-style
helper to do the full operation instead of one helper call
for each 32-bit part. This allows us to use the same
framework to implement the fp16 versions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-36-peter.maydell@linaro.org
---
 target/arm/helper.h             |  7 +++++
 target/arm/vec_helper.c         | 45 +++++++++++++++++++++++++++++++++
 target/arm/translate-neon.c.inc | 42 ++++++++++++------------------
 3 files changed, 68 insertions(+), 26 deletions(-)
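
When a single helper does the whole pairwise operation, it must cope with the destination aliasing either source. The low half of the result comes from adjacent pairs of n, and the high half comes from adjacent pairs of m. The sketch below is a pairwise add in plain C that handles aliasing by computing into a temporary first. pairwise_add_f32() is hypothetical, and float stands in for the element type.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical whole-vector pairwise add in the style of Neon VPADD:
 *   d[i]            = n[2i] + n[2i+1]   for i < elems/2
 *   d[elems/2 + i]  = m[2i] + m[2i+1]
 * 'elems' is the number of elements per register.
 */
static void pairwise_add_f32(float *d, const float *n, const float *m,
                             size_t elems)
{
    float tmp[16];
    size_t half = elems / 2;

    assert(elems % 2 == 0 && elems <= 16);
    for (size_t i = 0; i < half; i++) {
        tmp[i] = n[2 * i] + n[2 * i + 1];
        tmp[half + i] = m[2 * i] + m[2 * i + 1];
    }
    /* Only now write d, so d may safely alias n or m. */
    for (size_t i = 0; i < elems; i++) {
        d[i] = tmp[i];
    }
}
```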

Convert the Neon float-integer VCVT insns to gvec, and use this
to implement fp16 support for them.

Note that, unlike the VFP int<->fp16 VCVT insns we converted
earlier, which convert to/from a 32-bit integer, these
Neon insns convert to/from 16-bit integers. So we can use
the existing vfp conversion helpers for the f32<->u32/i32
case but need to provide our own for f16<->u16/i16.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-37-peter.maydell@linaro.org
---
 target/arm/helper.h             |  9 +++++++++
 target/arm/vec_helper.c         | 29 +++++++++++++++++++++++++++++
 target/arm/translate-neon.c.inc | 15 ++++-----------
 3 files changed, 42 insertions(+), 11 deletions(-)
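
These Neon conversions produce 16-bit integers. A value that fits in 32 bits but not in 16 must therefore saturate at the 16-bit limits. The sketch below covers the f16 -> s16 direction with round-towards-zero. It uses host float as a stand-in for fp16, because C has no portable half-precision type. It also follows the Arm convention that NaN converts to 0. f_to_s16_rtz() is a hypothetical name.

```c
#include <math.h>
#include <stdint.h>

/*
 * Hypothetical model of a float -> int16 conversion with
 * round-towards-zero and saturation, as needed for f16->s16 VCVT.
 * Host 'float' stands in for fp16; every fp16 value is exactly
 * representable as a float.
 */
static int16_t f_to_s16_rtz(float f)
{
    if (isnan(f)) {
        return 0;                 /* NaN converts to zero */
    }
    if (f >= 32767.0f) {
        return INT16_MAX;         /* saturate high */
    }
    if (f <= -32768.0f) {
        return INT16_MIN;         /* saturate low */
    }
    return (int16_t)f;            /* C conversion truncates toward zero */
}
```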

Convert the Neon VCVT float<->fixed-point insns to a
gvec style, in preparation for adding fp16 support.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-38-peter.maydell@linaro.org
---
 target/arm/helper.h             |  5 +++++
 target/arm/vec_helper.c         | 20 +++++++++++++++++++
 target/arm/translate-neon.c.inc | 35 +++++++++++++++++----------------
 3 files changed, 43 insertions(+), 17 deletions(-)
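
The float -> fixed-point conversion scales the input by 2^fbits. It then converts to an integer with round-towards-zero and saturation, as the DO_VCVT_FIXED helpers in this series do via the *_round_to_zero VFP conversions. The sketch below covers the fp32 -> signed 32-bit case. It assumes fbits is in [1, 32] and uses a double intermediate so that the scaling is exact. f32_to_fixed_s32() is a hypothetical name.

```c
#include <math.h>
#include <stdint.h>

/*
 * Hypothetical model of VCVT (float -> signed fixed-point):
 * value * 2^fbits, rounded towards zero, saturated to int32.
 */
static int32_t f32_to_fixed_s32(float f, int fbits)
{
    double scaled;

    if (isnan(f)) {
        return 0;
    }
    /* Scaling a float by a power of two is exact in double. */
    scaled = (double)f * (double)(1ULL << fbits);
    if (scaled >= 2147483647.0) {
        return INT32_MAX;
    }
    if (scaled <= -2147483648.0) {
        return INT32_MIN;
    }
    return (int32_t)scaled;       /* truncates toward zero */
}
```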

Implement fp16 for the Neon VCVT insns which convert between
float and fixed-point.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-39-peter.maydell@linaro.org
---
 target/arm/helper.h             | 5 +++++
 target/arm/neon-dp.decode       | 8 +++++++-
 target/arm/vec_helper.c         | 4 ++++
 target/arm/translate-neon.c.inc | 5 +++++
 4 files changed, 21 insertions(+), 1 deletion(-)
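
The opposite direction, fixed-point -> float, divides by 2^fbits. The sketch below handles signed 16-bit fixed-point input, as used by the fp16 forms, and assumes fbits is in [1, 16]. Host float again stands in for fp16. Because float holds every such result exactly, the sketch does not model the rounding that a real fp16 result would need, since fp16 has only an 11-bit significand. s16_fixed_to_f() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical model of VCVT (signed 16-bit fixed-point -> float):
 * value = x / 2^fbits. Rounding to a real fp16 result is not modelled.
 */
static float s16_fixed_to_f(int16_t x, int fbits)
{
    return (float)x / (float)(1u << fbits);
}
```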

Convert the Neon VCVT with-specified-rounding-mode instructions
to gvec, and use this to implement fp16 support for them.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-40-peter.maydell@linaro.org
---
 target/arm/helper.h             |   5 ++
 target/arm/vec_helper.c         |  23 +++++++
 target/arm/translate-neon.c.inc | 105 ++++++++++++--------------------
 3 files changed, 66 insertions(+), 67 deletions(-)
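
The new DO_VCVT_RMODE helpers follow a save/set/restore pattern. They save the rounding mode held in the float_status they are passed and install the mode that the translator encoded in simd_data(desc). They then convert every element and restore the previous mode, so the standard FP status is left as it was. This replaces the old set_neon_rmode calls that bracketed a 32-bit-at-a-time loop.

The self-contained sketch below shows the same pattern. It assumes fp32 elements and in-range inputs, with no saturation or NaN handling. It also uses a hypothetical fp_status type with hand-written rounding in place of softfloat.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for softfloat's rounding modes and status. */
enum rmode { RM_TIEEVEN, RM_TIEAWAY, RM_POSINF, RM_NEGINF, RM_ZERO };
typedef struct { enum rmode rounding_mode; } fp_status;

/* Round x to an integer per st->rounding_mode; assumes |x| < 2^31. */
static int32_t round_to_s32(double x, const fp_status *st)
{
    int64_t t = (int64_t)x;            /* truncates toward zero */
    int64_t fl = t - (x < (double)t);  /* floor(x) */
    double frac = x - (double)fl;      /* in [0, 1) */

    switch (st->rounding_mode) {
    case RM_ZERO:
        return (int32_t)t;
    case RM_NEGINF:
        return (int32_t)fl;
    case RM_POSINF:
        return (int32_t)(fl + (frac > 0.0));
    case RM_TIEAWAY:
        if (frac == 0.5) {
            return (int32_t)(x < 0 ? fl : fl + 1);
        }
        return (int32_t)(fl + (frac > 0.5));
    case RM_TIEEVEN:
    default:
        if (frac == 0.5) {
            return (int32_t)(fl + (fl & 1));
        }
        return (int32_t)(fl + (frac > 0.5));
    }
}

/* Convert a vector under a temporary rounding mode, then restore it. */
static void vcvt_rm_s32(int32_t *d, const float *n, size_t count,
                        enum rmode rmode, fp_status *st)
{
    enum rmode prev = st->rounding_mode;   /* save */

    st->rounding_mode = rmode;             /* set */
    for (size_t i = 0; i < count; i++) {
        d[i] = round_to_s32(n[i], st);
    }
    st->rounding_mode = prev;              /* restore */
}
```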

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vcvt_hs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vcvt_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_vcvt_rm_ss, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vcvt_rm_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vcvt_rm_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(gvec_vcvt_hs, helper_vfp_toshh_round_to_zero, uint16_t)
 DO_VCVT_FIXED(gvec_vcvt_hu, helper_vfp_touhh_round_to_zero, uint16_t)
 
 #undef DO_VCVT_FIXED
+
+#define DO_VCVT_RMODE(NAME, FUNC, TYPE)                                 \
+    void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)    \
+    {                                                                   \
+        float_status *fpst = stat;                                      \
+        intptr_t i, oprsz = simd_oprsz(desc);                           \
+        uint32_t rmode = simd_data(desc);                               \
+        uint32_t prev_rmode = get_float_rounding_mode(fpst);            \
+        TYPE *d = vd, *n = vn;                                          \
+        set_float_rounding_mode(rmode, fpst);                           \
+        for (i = 0; i < oprsz / sizeof(TYPE); i++) {                    \
+            d[i] = FUNC(n[i], 0, fpst);                                 \
+        }                                                               \
+        set_float_rounding_mode(prev_rmode, fpst);                      \
+        clear_tail(d, oprsz, simd_maxsz(desc));                         \
+    }
+
+DO_VCVT_RMODE(gvec_vcvt_rm_ss, helper_vfp_tosls, uint32_t)
+DO_VCVT_RMODE(gvec_vcvt_rm_us, helper_vfp_touls, uint32_t)
+DO_VCVT_RMODE(gvec_vcvt_rm_sh, helper_vfp_toshh, uint16_t)
+DO_VCVT_RMODE(gvec_vcvt_rm_uh, helper_vfp_touhh, uint16_t)
+
+#undef DO_VCVT_RMODE
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ DO_VRINT(VRINTZ, FPROUNDING_ZERO)
 DO_VRINT(VRINTM, FPROUNDING_NEGINF)
 DO_VRINT(VRINTP, FPROUNDING_POSINF)
 
-static bool do_vcvt(DisasContext *s, arg_2misc *a, int rmode, bool is_signed)
-{
-    /*
-     * Handle a VCVT* operation by iterating 32 bits at a time,
-     * with a specified rounding mode in operation.
-     */
-    int pass;
-    TCGv_ptr fpst;
-    TCGv_i32 tcg_rmode, tcg_shift;
-
-    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
-        !arm_dc_feature(s, ARM_FEATURE_V8)) {
-        return false;
+#define DO_VEC_RMODE(INSN, RMODE, OP)                                   \
+    static void gen_##INSN(unsigned vece, uint32_t rd_ofs,              \
+                           uint32_t rm_ofs,                             \
+                           uint32_t oprsz, uint32_t maxsz)              \
+    {                                                                   \
+        static gen_helper_gvec_2_ptr * const fns[4] = {                 \
+            NULL,                                                       \
+            gen_helper_gvec_##OP##h,                                    \
+            gen_helper_gvec_##OP##s,                                    \
+            NULL,                                                       \
+        };                                                              \
+        TCGv_ptr fpst;                                                  \
+        fpst = fpstatus_ptr(vece == 1 ? FPST_STD_F16 : FPST_STD);       \
+        tcg_gen_gvec_2_ptr(rd_ofs, rm_ofs, fpst, oprsz, maxsz,          \
+                           arm_rmode_to_sf(RMODE), fns[vece]);          \
+        tcg_temp_free_ptr(fpst);                                        \
+    }                                                                   \
+    static bool trans_##INSN(DisasContext *s, arg_2misc *a)             \
+    {                                                                   \
+        if (!arm_dc_feature(s, ARM_FEATURE_V8)) {                       \
+            return false;                                               \
+        }                                                               \
+        if (a->size == MO_16) {                                         \
+            if (!dc_isar_feature(aa32_fp16_arith, s)) {                 \
+                return false;                                           \
+            }                                                           \
+        } else if (a->size != MO_32) {                                  \
+            return false;                                               \
+        }                                                               \
+        return do_2misc_vec(s, a, gen_##INSN);                          \
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vd | a->vm) & 0x10)) {
-        return false;
-    }
-
-    if (a->size != 2) {
-        /* TODO: FP16 will be the size == 1 case */
-        return false;
-    }
-
-    if ((a->vd | a->vm) & a->q) {
-        return false;
-    }
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    fpst = fpstatus_ptr(FPST_STD);
-    tcg_shift = tcg_const_i32(0);
-    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rmode));
-    gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
-    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
-        if (is_signed) {
-            gen_helper_vfp_tosls(tmp, tmp, tcg_shift, fpst);
-        } else {
-            gen_helper_vfp_touls(tmp, tmp, tcg_shift, fpst);
-        }
-        neon_store_reg(a->vd, pass, tmp);
-    }
-    gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
-    tcg_temp_free_i32(tcg_rmode);
-    tcg_temp_free_i32(tcg_shift);
-    tcg_temp_free_ptr(fpst);
-
-    return true;
-}
-
-#define DO_VCVT(INSN, RMODE, SIGNED)                            \
-    static bool trans_##INSN(DisasContext *s, arg_2misc *a)     \
-    {                                                           \
-        return do_vcvt(s, a, RMODE, SIGNED);                    \
-    }
-
-DO_VCVT(VCVTAU, FPROUNDING_TIEAWAY, false)
-DO_VCVT(VCVTAS, FPROUNDING_TIEAWAY, true)
-DO_VCVT(VCVTNU, FPROUNDING_TIEEVEN, false)
-DO_VCVT(VCVTNS, FPROUNDING_TIEEVEN, true)
-DO_VCVT(VCVTPU, FPROUNDING_POSINF, false)
-DO_VCVT(VCVTPS, FPROUNDING_POSINF, true)
-DO_VCVT(VCVTMU, FPROUNDING_NEGINF, false)
-DO_VCVT(VCVTMS, FPROUNDING_NEGINF, true)
+DO_VEC_RMODE(VCVTAU, FPROUNDING_TIEAWAY, vcvt_rm_u)
+DO_VEC_RMODE(VCVTAS, FPROUNDING_TIEAWAY, vcvt_rm_s)
+DO_VEC_RMODE(VCVTNU, FPROUNDING_TIEEVEN, vcvt_rm_u)
+DO_VEC_RMODE(VCVTNS, FPROUNDING_TIEEVEN, vcvt_rm_s)
+DO_VEC_RMODE(VCVTPU, FPROUNDING_POSINF, vcvt_rm_u)
+DO_VEC_RMODE(VCVTPS, FPROUNDING_POSINF, vcvt_rm_s)
+DO_VEC_RMODE(VCVTMU, FPROUNDING_NEGINF, vcvt_rm_u)
+DO_VEC_RMODE(VCVTMS, FPROUNDING_NEGINF, vcvt_rm_s)
 
 static bool trans_VSWP(DisasContext *s, arg_2misc *a)
 {
-- 
2.20.1

Convert the Neon VRINT-with-specified-rounding-mode insns to gvec,
and use this to implement the fp16 versions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-41-peter.maydell@linaro.org
---
 target/arm/helper.h             |  4 +-
 target/arm/vec_helper.c         | 21 +++++++++++
 target/arm/vfp_helper.c         | 17 ---------
 target/arm/translate-neon.c.inc | 67 +++------------------------------
 4 files changed, 30 insertions(+), 79 deletions(-)
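
The DO_VEC_RMODE macro builds helper names by token pasting (gen_helper_gvec_##OP##h and gen_helper_gvec_##OP##s). This is why the VRINT lines pass vrint_rm_ with a trailing underscore: it pastes to gvec_vrint_rm_h and gvec_vrint_rm_s, while vcvt_rm_u pastes to gvec_vcvt_rm_uh and gvec_vcvt_rm_us.

The stand-alone mock below shows that pattern. The helper functions are hypothetical stubs. As in the patch, the table is indexed by element size: 1 for MO_16, 2 for MO_32, and NULL otherwise.

```c
#include <stddef.h>

/* Hypothetical stubs standing in for the generated gen_helper_gvec_* fns. */
typedef const char *helper_fn(void);
static const char *gen_helper_gvec_vrint_rm_h(void) { return "vrint_rm_h"; }
static const char *gen_helper_gvec_vrint_rm_s(void) { return "vrint_rm_s"; }
static const char *gen_helper_gvec_vcvt_rm_uh(void) { return "vcvt_rm_uh"; }
static const char *gen_helper_gvec_vcvt_rm_us(void) { return "vcvt_rm_us"; }

/*
 * Same pasting scheme as DO_VEC_RMODE: index 1 (MO_16) selects the
 * fp16 helper, index 2 (MO_32) the fp32 helper, other sizes are NULL.
 */
#define RMODE_FNS(OP)                                   \
    static helper_fn * const OP##fns[4] = {             \
        NULL,                                           \
        gen_helper_gvec_##OP##h,                        \
        gen_helper_gvec_##OP##s,                        \
        NULL,                                           \
    }

RMODE_FNS(vrint_rm_);   /* defines vrint_rm_fns[] */
RMODE_FNS(vcvt_rm_u);   /* defines vcvt_rm_ufns[] */
```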

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
 DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
 
 DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
-DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
 
 DEF_HELPER_FLAGS_3(vfp_fcvt_f16_to_f32, TCG_CALL_NO_RWG, f32, f16, ptr, i32)
 DEF_HELPER_FLAGS_3(vfp_fcvt_f32_to_f16, TCG_CALL_NO_RWG, f16, f32, ptr, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vcvt_rm_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vcvt_rm_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_vrint_rm_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vrint_rm_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VCVT_RMODE(gvec_vcvt_rm_sh, helper_vfp_toshh, uint16_t)
 DO_VCVT_RMODE(gvec_vcvt_rm_uh, helper_vfp_touhh, uint16_t)
 
 #undef DO_VCVT_RMODE
+
+#define DO_VRINT_RMODE(NAME, FUNC, TYPE)                                \
+    void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)    \
+    {                                                                   \
+        float_status *fpst = stat;                                      \
+        intptr_t i, oprsz = simd_oprsz(desc);                           \
+        uint32_t rmode = simd_data(desc);                               \
+        uint32_t prev_rmode = get_float_rounding_mode(fpst);            \
+        TYPE *d = vd, *n = vn;                                          \
+        set_float_rounding_mode(rmode, fpst);                           \
+        for (i = 0; i < oprsz / sizeof(TYPE); i++) {                    \
+            d[i] = FUNC(n[i], fpst);                                    \
+        }                                                               \
+        set_float_rounding_mode(prev_rmode, fpst);                      \
+        clear_tail(d, oprsz, simd_maxsz(desc));                         \
+    }
+
+DO_VRINT_RMODE(gvec_vrint_rm_h, helper_rinth, uint16_t)
+DO_VRINT_RMODE(gvec_vrint_rm_s, helper_rints, uint32_t)
+
+#undef DO_VRINT_RMODE
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(set_rmode)(uint32_t rmode, void *fpstp)
     return prev_rmode;
 }
 
-/* Set the current fp rounding mode in the standard fp status and return
- * the old one. This is for NEON instructions that need to change the
- * rounding mode but wish to use the standard FPSCR values for everything
- * else. Always set the rounding mode back to the correct value after
- * modifying it.
- * The argument is a softfloat float_round_ value.
- */
-uint32_t HELPER(set_neon_rmode)(uint32_t rmode, CPUARMState *env)
-{
-    float_status *fp_status = &env->vfp.standard_fp_status;
-
-    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
-    set_float_rounding_mode(rmode, fp_status);
-
-    return prev_rmode;
-}
-
 /* Half precision conversions.  */
 float32 HELPER(vfp_fcvt_f16_to_f32)(uint32_t a, void *fpstp, uint32_t ahp_mode)
 {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
     return do_2misc_fp(s, a, gen_helper_rints_exact);
 }
 
-static bool do_vrint(DisasContext *s, arg_2misc *a, int rmode)
-{
-    /*
-     * Handle a VRINT* operation by iterating 32 bits at a time,
-     * with a specified rounding mode in operation.
-     */
-    int pass;
-    TCGv_ptr fpst;
-    TCGv_i32 tcg_rmode;
-
-    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
-        !arm_dc_feature(s, ARM_FEATURE_V8)) {
-        return false;
-    }
-
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vd | a->vm) & 0x10)) {
-        return false;
-    }
-
-    if (a->size != 2) {
-        /* TODO: FP16 will be the size == 1 case */
-        return false;
-    }
-
-    if ((a->vd | a->vm) & a->q) {
-        return false;
-    }
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    fpst = fpstatus_ptr(FPST_STD);
-    tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rmode));
-    gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
-    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
-        gen_helper_rints(tmp, tmp, fpst);
-        neon_store_reg(a->vd, pass, tmp);
-    }
-    gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
-    tcg_temp_free_i32(tcg_rmode);
-    tcg_temp_free_ptr(fpst);
-
-    return true;
-}
-
-#define DO_VRINT(INSN, RMODE)                                   \
-    static bool trans_##INSN(DisasContext *s, arg_2misc *a)     \
-    {                                                           \
-        return do_vrint(s, a, RMODE);                           \
-    }
-
-DO_VRINT(VRINTN, FPROUNDING_TIEEVEN)
-DO_VRINT(VRINTA, FPROUNDING_TIEAWAY)
-DO_VRINT(VRINTZ, FPROUNDING_ZERO)
-DO_VRINT(VRINTM, FPROUNDING_NEGINF)
-DO_VRINT(VRINTP, FPROUNDING_POSINF)
-
 #define DO_VEC_RMODE(INSN, RMODE, OP)                                   \
     static void gen_##INSN(unsigned vece, uint32_t rd_ofs,              \
                            uint32_t rm_ofs,                             \
@@ -XXX,XX +XXX,XX @@ DO_VEC_RMODE(VCVTPS, FPROUNDING_POSINF, vcvt_rm_s)
 DO_VEC_RMODE(VCVTMU, FPROUNDING_NEGINF, vcvt_rm_u)
 DO_VEC_RMODE(VCVTMS, FPROUNDING_NEGINF, vcvt_rm_s)
 
+DO_VEC_RMODE(VRINTN, FPROUNDING_TIEEVEN, vrint_rm_)
+DO_VEC_RMODE(VRINTA, FPROUNDING_TIEAWAY, vrint_rm_)
+DO_VEC_RMODE(VRINTZ, FPROUNDING_ZERO, vrint_rm_)
+DO_VEC_RMODE(VRINTM, FPROUNDING_NEGINF, vrint_rm_)
+DO_VEC_RMODE(VRINTP, FPROUNDING_POSINF, vrint_rm_)
+
 static bool trans_VSWP(DisasContext *s, arg_2misc *a)
 {
     TCGv_i64 rm, rd;
-- 
2.20.1

Convert the Neon VRINTX insn to use gvec, and use this to implement
fp16 support for it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-42-peter.maydell@linaro.org
---
 target/arm/helper.h             |  3 +++
 target/arm/vec_helper.c         |  3 +++
 target/arm/translate-neon.c.inc | 45 +++------------------------------
 3 files changed, 9 insertions(+), 42 deletions(-)
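
VRINTX rounds to an integral value in the current rounding mode. Unlike the other Neon VRINT forms, it signals Inexact when the value changes. Neon uses the standard FPSCR value, so the mode here is round-to-nearest-even. The per-element sketch below is in plain C and assumes fp32 elements. vrintx_f32() is a hypothetical name, and NaN handling (default NaN, Invalid) is not modelled.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of VRINTX on one fp32 element: round to an
 * integral value with round-to-nearest-even and report Inexact when
 * the result differs from the input.
 */
static float vrintx_f32(float x, bool *inexact)
{
    /* NaNs, infinities and |x| >= 2^23 are returned unchanged. */
    if (isnan(x) || x >= 8388608.0f || x <= -8388608.0f) {
        *inexact = false;
        return x;
    }

    int64_t t = (int64_t)x;                    /* truncate toward zero */
    int64_t fl = t - ((double)x < (double)t);  /* floor(x) */
    double frac = (double)x - (double)fl;      /* exact in double */
    int64_t n = fl + (frac > 0.5 || (frac == 0.5 && (fl & 1)));
    float r = (float)n;

    /* Keep the sign of zero, e.g. -0.25 rounds to -0.0. */
    if (n == 0 && signbit(x)) {
        r = -0.0f;
    }
    *inexact = (r != x);
    return r;
}
```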

In the gvec helper functions for indexed operations, for AArch32
Neon the oprsz (total size of the vector) can be less than 16 bytes
if the operation is on a D reg. Since the inner loop in these
helpers always goes from 0 to segment, we must clamp segment based
on oprsz to avoid processing a full 16-byte segment when asked to
handle an 8-byte-wide vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-43-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
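
To see why the clamp matters, consider fp32 elements on a D register. There oprsz is 8, so the vector holds two elements. The old segment of 16 / sizeof(float) = 4 made the inner loop write d[2] and d[3] as well. The sketch below is the fixed indexed-multiply loop in plain C. It assumes fp32 elements and uses a hypothetical mul_idx_f32() name; the real helpers are generated by DO_MUL_IDX and DO_FMUL_IDX.

```c
#include <stddef.h>

/*
 * Hypothetical fp32 version of the indexed-multiply loop: within each
 * 16-byte segment, every element of n is multiplied by element 'idx'
 * of the same segment of m. oprsz is the vector size in bytes
 * (8 for a D register, 16 for a Q register).
 */
static void mul_idx_f32(float *d, const float *n, const float *m,
                        size_t oprsz, size_t idx)
{
    /* MIN(16, oprsz) in the patch */
    size_t segment = (oprsz < 16 ? oprsz : 16) / sizeof(float);

    for (size_t i = 0; i < oprsz / sizeof(float); i += segment) {
        float mm = m[i + idx];
        for (size_t j = 0; j < segment; j++) {
            d[i + j] = n[i + j] * mm;
        }
    }
}
```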

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
 #define DO_MUL_IDX(NAME, TYPE, H) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
 {                                                                          \
-    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
+    intptr_t i, j, oprsz = simd_oprsz(desc);                               \
+    intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
     intptr_t idx = simd_data(desc);                                        \
     TYPE *d = vd, *n = vn, *m = vm;                                        \
     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
@@ -XXX,XX +XXX,XX @@ DO_MUL_IDX(gvec_mul_idx_d, uint64_t, )
 #define DO_MLA_IDX(NAME, TYPE, OP, H) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)   \
 {                                                                          \
-    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
+    intptr_t i, j, oprsz = simd_oprsz(desc);                               \
+    intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
     intptr_t idx = simd_data(desc);                                        \
     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
@@ -XXX,XX +XXX,XX @@ DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -,   )
 #define DO_FMUL_IDX(NAME, TYPE, H) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
 {                                                                          \
-    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
+    intptr_t i, j, oprsz = simd_oprsz(desc);                               \
+    intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
     intptr_t idx = simd_data(desc);                                        \
     TYPE *d = vd, *n = vn, *m = vm;                                        \
     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
@@ -XXX,XX +XXX,XX @@ DO_FMUL_IDX(gvec_fmul_idx_d, float64, )
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
                   void *stat, uint32_t desc)                               \
 {                                                                          \
-    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
+    intptr_t i, j, oprsz = simd_oprsz(desc);                               \
+    intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
     TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
     intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
-- 
2.20.1

Add gvec helpers for doing Neon-style indexed non-fused fp
multiply-and-accumulate operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200828183354.27913-44-peter.maydell@linaro.org
---
 target/arm/helper.h     | 10 ++++++++++
 target/arm/vec_helper.c | 27 ++++++++++++++++++++++-----
 2 files changed, 32 insertions(+), 5 deletions(-)
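
Non-fused here means the product is rounded to the element type before the accumulate, as for Neon VMLA/VMLS. A fused multiply-add would instead compute the result with a single rounding. The sketch below is an fp32 indexed version in plain C. It assumes the compiler does not contract the multiply and add into an FMA (for example, build with -ffp-contract=off). fmla_nf_idx_f32() and its sub flag are hypothetical, standing in for the separate fmla_nf/fmls_nf idx helpers.

```c
#include <stddef.h>

/*
 * Hypothetical fp32 model of an indexed non-fused multiply-accumulate:
 * d = a + n * m[idx] (or a - n * m[idx] when 'sub' is set), with the
 * product rounded to float before the add/subtract.
 * oprsz is the vector size in bytes.
 */
static void fmla_nf_idx_f32(float *d, const float *n, const float *m,
                            const float *a, size_t oprsz, size_t idx,
                            int sub)
{
    size_t segment = (oprsz < 16 ? oprsz : 16) / sizeof(float);

    for (size_t i = 0; i < oprsz / sizeof(float); i += segment) {
        float mm = m[i + idx];
        for (size_t j = 0; j < segment; j++) {
            float prod = n[i + j] * mm;          /* first rounding */
            d[i + j] = sub ? a[i + j] - prod     /* second rounding */
                           : a[i + j] + prod;
        }
    }
}
```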

Convert the Neon floating-point VMUL, VMLA and VMLS to use gvec,
and use this to implement fp16 support.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-45-peter.maydell@linaro.org
---
 target/arm/translate-neon.c.inc | 114 ++++++++++++++++----------------
 1 file changed, 57 insertions(+), 57 deletions(-)
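
In do_2scalar_fp_vec, the M:Vm field encodes both the scalar's register and its index, and the split depends on the element size. For fp16 (size 1) the index is bits [4:3] and Vm[2:0] selects D0-D7. For fp32 (size 2) the index is bit 4 and Vm[3:0] selects D0-D15.

The sketch below shows that decode. It includes a local re-implementation of QEMU's extract32() so that it compiles on its own. decode_scalar() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Local re-implementation of a QEMU-style extract32() bitfield helper. */
static uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/*
 * Split the 5-bit M:Vm field into register number and element index,
 * following do_2scalar_fp_vec. size is 1 for fp16 and 2 for fp32.
 */
static void decode_scalar(uint32_t vm, int size,
                          uint32_t *reg, uint32_t *idx)
{
    *idx = extract32(vm, size + 2, 2);
    *reg = extract32(vm, 0, size + 2);
}
```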

diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
     return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 }
 
-/*
- * Rather than have a float-specific version of do_2scalar just for
- * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
- * a NeonGenTwoOpFn.
- */
-#define WRAP_FP_FN(WRAPNAME, FUNC)                              \
-    static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
-    {                                                           \
-        TCGv_ptr fpstatus = fpstatus_ptr(FPST_STD);             \
-        FUNC(rd, rn, rm, fpstatus);                             \
-        tcg_temp_free_ptr(fpstatus);                            \
+static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
+                              gen_helper_gvec_3_ptr *fn)
+{
+    /* Two registers and a scalar, using gvec */
+    int vec_size = a->q ? 16 : 8;
+    int rd_ofs = neon_reg_offset(a->vd, 0);
+    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rm_ofs;
+    int idx;
+    TCGv_ptr fpstatus;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
     }
 
-WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
-WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
-WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
 
-static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
-{
-    static NeonGenTwoOpFn * const opfn[] = {
-        NULL,
-        NULL, /* TODO: fp16 support */
-        gen_VMUL_F_mul,
-        NULL,
-    };
+    if (!fn) {
+        /* Bad size (including size == 3, which is a different insn group) */
+        return false;
+    }
 
-    return do_2scalar(s, a, opfn[a->size], NULL);
+    if (a->q && ((a->vd | a->vn) & 1)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /* a->vm is M:Vm, which encodes both register and index */
+    idx = extract32(a->vm, a->size + 2, 2);
+    a->vm = extract32(a->vm, 0, a->size + 2);
+    rm_ofs = neon_reg_offset(a->vm, 0);
+
+    fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
+    tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
+                       vec_size, vec_size, idx, fn);
+    tcg_temp_free_ptr(fpstatus);
+    return true;
 }
 
-static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
-{
-    static NeonGenTwoOpFn * const opfn[] = {
-        NULL,
-        NULL, /* TODO: fp16 support */
-        gen_VMUL_F_mul,
-        NULL,
-    };
-    static NeonGenTwoOpFn * const accfn[] = {
-        NULL,
-        NULL, /* TODO: fp16 support */
-        gen_VMUL_F_add,
-        NULL,
-    };
+#define DO_VMUL_F_2sc(NAME, FUNC)                                       \
+    static bool trans_##NAME##_F_2sc(DisasContext *s, arg_2scalar *a)   \
+    {                                                                   \
+        static gen_helper_gvec_3_ptr * const opfn[] = {                 \
+            NULL,                                                       \
+            gen_helper_##FUNC##_h,                                      \
+            gen_helper_##FUNC##_s,                                      \
+            NULL,                                                       \
+        };                                                              \
+        if (a->size == MO_16 && !dc_isar_feature(aa32_fp16_arith, s)) { \
+            return false;                                               \
+        }                                                               \
+        return do_2scalar_fp_vec(s, a, opfn[a->size]);                  \
+    }
 
-    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
-}
-
-static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
-{
-    static NeonGenTwoOpFn * const opfn[] = {
-        NULL,
-        NULL, /* TODO: fp16 support */
-        gen_VMUL_F_mul,
-        NULL,
-    };
-    static NeonGenTwoOpFn * const accfn[] = {
-        NULL,
-        NULL, /* TODO: fp16 support */
-        gen_VMUL_F_sub,
-        NULL,
-    };
-
-    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
-}
+DO_VMUL_F_2sc(VMUL, gvec_fmul_idx)
+DO_VMUL_F_2sc(VMLA, gvec_fmla_nf_idx)
+DO_VMUL_F_2sc(VMLS, gvec_fmls_nf_idx)
 
 WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
 WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
-- 
2.20.1

Set the MVFR1 ID register FPHP and SIMDHP fields to indicate
that our "-cpu max" has v8.2-FP16.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-46-peter.maydell@linaro.org
---
 target/arm/cpu.c   |  3 ++-
 target/arm/cpu64.c | 10 ++++------
 2 files changed, 6 insertions(+), 7 deletions(-)
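
FIELD_DP32(t, MVFR1, FPHP, 3) deposits a value into a named bitfield of the register. The sketch below shows the equivalent explicit bit manipulation using a local re-implementation of a deposit32()-style helper. The shift positions (FPHP at bits [27:24], SIMDHP at bits [23:20]) are taken here as assumptions from the Arm ARM MVFR1 description. mvfr1_set_v82_fp16() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Local re-implementation of a QEMU-style deposit32(). */
static uint32_t deposit32(uint32_t value, int start, int length,
                          uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Assumed MVFR1 field positions (see lead-in). */
#define MVFR1_SIMDHP_SHIFT 20
#define MVFR1_FPHP_SHIFT   24
#define MVFR1_FIELD_WIDTH  4

/* Advertise v8.2-FP16: FPHP = 3, SIMDHP = 2, other fields untouched. */
static uint32_t mvfr1_set_v82_fp16(uint32_t mvfr1)
{
    mvfr1 = deposit32(mvfr1, MVFR1_FPHP_SHIFT, MVFR1_FIELD_WIDTH, 3);
    mvfr1 = deposit32(mvfr1, MVFR1_SIMDHP_SHIFT, MVFR1_FIELD_WIDTH, 2);
    return mvfr1;
}
```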

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
             cpu->isar.id_isar6 = t;
 
             t = cpu->isar.mvfr1;
-            t = FIELD_DP32(t, MVFR1, FPHP, 2);     /* v8.0 FP support */
+            t = FIELD_DP32(t, MVFR1, FPHP, 3);     /* v8.2-FP16 */
+            t = FIELD_DP32(t, MVFR1, SIMDHP, 2);   /* v8.2-FP16 */
             cpu->isar.mvfr1 = t;
 
             t = cpu->isar.mvfr2;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
         cpu->isar.id_dfr0 = u;
 
-        /*
-         * FIXME: We do not yet support ARMv8.2-fp16 for AArch32 yet,
-         * so do not set MVFR1.FPHP.  Strictly speaking this is not legal,
-         * but it is also not legal to enable SVE without support for FP16,
-         * and enabling SVE in system mode is more useful in the short term.
-         */
+        u = cpu->isar.mvfr1;
+        u = FIELD_DP32(u, MVFR1, FPHP, 3);      /* v8.2-FP16 */
+        u = FIELD_DP32(u, MVFR1, SIMDHP, 2);    /* v8.2-FP16 */
+        cpu->isar.mvfr1 = u;
 
 #ifdef CONFIG_USER_ONLY
         /* For usermode -cpu max we can use a larger and more efficient DCZ
-- 
2.20.1

From: Leif Lindholm <leif@nuviainc.com>

The sbsa-ref platform uses a minimal device tree to pass the amount of
memory as well as the number of CPUs to the firmware. However, when
that minimal dtb is dumped (with -M sbsa-ref,dumpdtb=<file>), the
resulting blob generates a warning when decompiled by dtc due to the
lack of a reg property.

Add a simple reg property per CPU, representing a 64-bit MPIDR_EL1.

This is also cleaner than having the firmware calculate its own IDs
when generating ACPI tables.

Signed-off-by: Leif Lindholm <leif@nuviainc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200827124335.30586-1-leif@nuviainc.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/sbsa-ref.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
     [SBSA_EHCI] = 11,
 };
 
+static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
+{
+    uint8_t clustersz = ARM_DEFAULT_CPUS_PER_CLUSTER;
+    return arm_cpu_mp_affinity(idx, clustersz);
+}
+
 /*
  * Firmware on this machine only uses ACPI table to load OS, these limited
  * device tree nodes are just to let firmware know the info which varies from
@@ -XXX,XX +XXX,XX @@ static void create_fdt(SBSAMachineState *sms)
         g_free(matrix);
     }
 
+    /*
+     * From Documentation/devicetree/bindings/arm/cpus.yaml
+     *  On ARM v8 64-bit systems this property is required
+     *    and matches the MPIDR_EL1 register affinity bits.
+     *
+     *    * If cpus node's #address-cells property is set to 2
+     *
+     *      The first reg cell bits [7:0] must be set to
+     *      bits [39:32] of MPIDR_EL1.
+     *
+     *      The second reg cell bits [23:0] must be set to
+     *      bits [23:0] of MPIDR_EL1.
+     */
     qemu_fdt_add_subnode(sms->fdt, "/cpus");
+    qemu_fdt_setprop_cell(sms->fdt, "/cpus", "#address-cells", 2);
+    qemu_fdt_setprop_cell(sms->fdt, "/cpus", "#size-cells", 0x0);
 
     for (cpu = sms->smp_cpus - 1; cpu >= 0; cpu--) {
         char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
         ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
         CPUState *cs = CPU(armcpu);
+        uint64_t mpidr = sbsa_ref_cpu_mp_affinity(sms, cpu);
 
         qemu_fdt_add_subnode(sms->fdt, nodename);
+        qemu_fdt_setprop_u64(sms->fdt, nodename, "reg", mpidr);
 
         if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
             qemu_fdt_setprop_cell(sms->fdt, nodename, "numa-node-id",
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
     arm_load_kernel(ARM_CPU(first_cpu), machine, &sms->bootinfo);
 }
 
-static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
-{
-    uint8_t clustersz = ARM_DEFAULT_CPUS_PER_CLUSTER;
-    return arm_cpu_mp_affinity(idx, clustersz);
-}
-
 static const CPUArchIdList *sbsa_ref_possible_cpu_arch_ids(MachineState *ms)
 {
     unsigned int max_cpus = ms->smp.max_cpus;
-- 
2.20.1

From: Graeme Gregory <graeme@nuviainc.com>

One difference between the sbsa-ref platform and the virt platform is
that on sbsa-ref, PSCI is handled by ARM-TF. This means that the PSCI
code there needs to pass some platform power changes down to QEMU for
things like shutdown/reset control.

Space has been left to extend the EC if we find other use cases in the
future where ARM-TF and QEMU need to communicate.
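The EC protocol in this patch is a single write-only command register at offset 0. Writing 0x01 requests power-off, and writing 0x02 requests a reset. Any other value, or a write to any other offset, is logged as a guest error. The sketch below models that decode. The names are hypothetical, and the result enum stands in for the QEMU side effects (qemu_system_shutdown_request() / qemu_system_reset_request()):

```c
#include <stdint.h>

/* Command values copied from the sbsa_ec_powerstates enum in the patch. */
enum sketch_ec_cmd {
    SKETCH_EC_POWEROFF = 0x01,
    SKETCH_EC_REBOOT   = 0x02,
};

/* Hypothetical result type replacing the device's side effects. */
enum sketch_ec_action {
    SKETCH_ACTION_SHUTDOWN,   /* would call qemu_system_shutdown_request() */
    SKETCH_ACTION_RESET,      /* would call qemu_system_reset_request() */
    SKETCH_ACTION_BAD_CMD,    /* unknown command: guest error */
    SKETCH_ACTION_BAD_REG,    /* unknown register offset: guest error */
};

static enum sketch_ec_action sketch_ec_decode(uint64_t offset, uint64_t value)
{
    if (offset != 0) {
        return SKETCH_ACTION_BAD_REG;   /* only offset 0 is defined */
    }
    switch (value) {
    case SKETCH_EC_POWEROFF:
        return SKETCH_ACTION_SHUTDOWN;
    case SKETCH_EC_REBOOT:
        return SKETCH_ACTION_RESET;
    default:
        return SKETCH_ACTION_BAD_CMD;
    }
}
```

Keeping the register map this sparse leaves the rest of the 4 KiB window free for the future extensions mentioned above.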

Signed-off-by: Graeme Gregory <graeme@nuviainc.com>
Reviewed-by: Leif Lindholm <leif@nuviainc.com>
Tested-by: Leif Lindholm <leif@nuviainc.com>
Message-id: 20200826141952.136164-2-graeme@nuviainc.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/sbsa_ec.c   | 98 +++++++++++++++++++++++++++++++++++++++++++++
 hw/misc/meson.build |  2 +
 2 files changed, 100 insertions(+)
 create mode 100644 hw/misc/sbsa_ec.c

diff --git a/hw/misc/sbsa_ec.c b/hw/misc/sbsa_ec.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/misc/sbsa_ec.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM SBSA Reference Platform Embedded Controller
+ *
+ * A device to allow PSCI running in the secure side of sbsa-ref machine
+ * to communicate platform power states to qemu.
+ *
+ * Copyright (c) 2020 Nuvia Inc
+ * Written by Graeme Gregory <graeme@nuviainc.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/log.h"
+#include "hw/sysbus.h"
+#include "sysemu/runstate.h"
+
+typedef struct {
+    SysBusDevice parent_obj;
+    MemoryRegion iomem;
+} SECUREECState;
+
+#define TYPE_SBSA_EC      "sbsa-ec"
+#define SECURE_EC(obj) OBJECT_CHECK(SECUREECState, (obj), TYPE_SBSA_EC)
+
+enum sbsa_ec_powerstates {
+    SBSA_EC_CMD_POWEROFF = 0x01,
+    SBSA_EC_CMD_REBOOT = 0x02,
+};
+
+static uint64_t sbsa_ec_read(void *opaque, hwaddr offset, unsigned size)
+{
+    /* No use for this currently */
+    qemu_log_mask(LOG_GUEST_ERROR, "sbsa-ec: no readable registers\n");
+    return 0;
+}
+
+static void sbsa_ec_write(void *opaque, hwaddr offset,
+                          uint64_t value, unsigned size)
+{
+    if (offset == 0) { /* PSCI machine power command register */
+        switch (value) {
+        case SBSA_EC_CMD_POWEROFF:
+            qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+            break;
+        case SBSA_EC_CMD_REBOOT:
+            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            break;
+        default:
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "sbsa-ec: unknown power command\n");
+        }
+    } else {
+        qemu_log_mask(LOG_GUEST_ERROR, "sbsa-ec: unknown EC register\n");
+    }
+}
+
+static const MemoryRegionOps sbsa_ec_ops = {
+    .read = sbsa_ec_read,
+    .write = sbsa_ec_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid.min_access_size = 4,
+    .valid.max_access_size = 4,
+};
+
+static void sbsa_ec_init(Object *obj)
+{
+    SECUREECState *s = SECURE_EC(obj);
+    SysBusDevice *dev = SYS_BUS_DEVICE(obj);
+
+    memory_region_init_io(&s->iomem, obj, &sbsa_ec_ops, s, "sbsa-ec",
+                          0x1000);
+    sysbus_init_mmio(dev, &s->iomem);
+}
+
+static void sbsa_ec_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    /* No vmstate or reset required: device has no internal state */
+    dc->user_creatable = false;
+}
+
+static const TypeInfo sbsa_ec_info = {
+    .name          = TYPE_SBSA_EC,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(SECUREECState),
+    .instance_init = sbsa_ec_init,
+    .class_init    = sbsa_ec_class_init,
+};
+
+static void sbsa_ec_register_type(void)
+{
+    type_register_static(&sbsa_ec_info);
+}
+
+type_init(sbsa_ec_register_type);
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -XXX,XX +XXX,XX @@ specific_ss.add(when: 'CONFIG_MAC_VIA', if_true: files('mac_via.c'))
 
 specific_ss.add(when: 'CONFIG_MIPS_CPS', if_true: files('mips_cmgcr.c', 'mips_cpc.c'))
 specific_ss.add(when: 'CONFIG_MIPS_ITU', if_true: files('mips_itu.c'))
+
+specific_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa_ec.c'))
-- 
2.20.1

From: Graeme Gregory <graeme@nuviainc.com>

Add the previously created sbsa-ec device to the sbsa-ref machine in
secure memory, so that the PSCI implementation in ARM-TF can access it
but it is not exposed to non-secure firmware or the OS except via ARM-TF.
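The new SBSA_SECURE_EC window (0x50000000, 4 KiB) sits in the gap between the end of the GIC redistributor region (0x40080000 + 0x04000000 = 0x44080000) and the UART at 0x60000000. The sanity check below covers the subset of sbsa_ref_memmap[] entries visible in this patch. The names are hypothetical, and each entry is treated as a half-open [base, base + size) range. The check covers addresses only; it does not model the secure/non-secure split, which comes from mapping the region into secure_sysmem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t base;
    uint64_t size;
} SketchRegion;

/* Subset of sbsa_ref_memmap[] as shown in the patch. */
static const SketchRegion sketch_map[] = {
    { 0x40000000, 0x00040000 },   /* SBSA_CPUPERIPHS */
    { 0x40060000, 0x00010000 },   /* SBSA_GIC_DIST */
    { 0x40080000, 0x04000000 },   /* SBSA_GIC_REDIST */
    { 0x50000000, 0x00001000 },   /* SBSA_SECURE_EC */
    { 0x60000000, 0x00001000 },   /* SBSA_UART */
    { 0x60010000, 0x00001000 },   /* SBSA_RTC */
    { 0x60020000, 0x00001000 },   /* SBSA_GPIO */
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool sketch_overlaps(SketchRegion a, SketchRegion b)
{
    return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* Returns true if no two entries in the table overlap. */
static bool sketch_map_is_disjoint(const SketchRegion *map, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (sketch_overlaps(map[i], map[j])) {
                return false;
            }
        }
    }
    return true;
}
```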

Signed-off-by: Graeme Gregory <graeme@nuviainc.com>
Reviewed-by: Leif Lindholm <leif@nuviainc.com>
Tested-by: Leif Lindholm <leif@nuviainc.com>
Message-id: 20200826141952.136164-3-graeme@nuviainc.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/sbsa-ref.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -XXX,XX +XXX,XX @@ enum {
     SBSA_CPUPERIPHS,
     SBSA_GIC_DIST,
     SBSA_GIC_REDIST,
+    SBSA_SECURE_EC,
     SBSA_SMMU,
     SBSA_UART,
     SBSA_RTC,
@@ -XXX,XX +XXX,XX @@ static const MemMapEntry sbsa_ref_memmap[] = {
     [SBSA_CPUPERIPHS] =         { 0x40000000, 0x00040000 },
     [SBSA_GIC_DIST] =           { 0x40060000, 0x00010000 },
     [SBSA_GIC_REDIST] =         { 0x40080000, 0x04000000 },
+    [SBSA_SECURE_EC] =          { 0x50000000, 0x00001000 },
     [SBSA_UART] =               { 0x60000000, 0x00001000 },
     [SBSA_RTC] =                { 0x60010000, 0x00001000 },
     [SBSA_GPIO] =               { 0x60020000, 0x00001000 },
@@ -XXX,XX +XXX,XX @@ static void *sbsa_ref_dtb(const struct arm_boot_info *binfo, int *fdt_size)
     return board->fdt;
 }
 
+static void create_secure_ec(MemoryRegion *mem)
+{
+    hwaddr base = sbsa_ref_memmap[SBSA_SECURE_EC].base;
+    DeviceState *dev = qdev_new("sbsa-ec");
+    SysBusDevice *s = SYS_BUS_DEVICE(dev);
+
+    sysbus_realize_and_unref(s, &error_fatal);
+    memory_region_add_subregion(mem, base,
+                                sysbus_mmio_get_region(s, 0));
+}
+
 static void sbsa_ref_init(MachineState *machine)
 {
     unsigned int smp_cpus = machine->smp.cpus;
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
 
     create_pcie(sms);
 
+    create_secure_ec(secure_sysmem);
+
     sms->bootinfo.ram_size = machine->ram_size;
     sms->bootinfo.nb_cpus = smp_cpus;
     sms->bootinfo.board_id = -1;
-- 
2.20.1