Series comparison

-[Qemu-devel] [PULL 00/28] target-arm queue
+[PULL 00/68] target-arm queue
-Arm queue. I still have a lot of stuff in my to-review queue, so
+Hi; this pullreq contains only my FEAT_AFP/FEAT_RPRES patches
-won't be long til the next one.
+(plus a fix for a target/alpha latent bug that would otherwise
+be revealed by the fpu changes), because 68 patches is already
-I've thrown in a couple of minor non-arm patches (a xen code
+longer than I prefer to send in at one time...
 cleanup and a vl.c codestyle issue).
 thanks
 -- PMM
-The following changes since commit de44c044420d1139480fa50c2d5be19223391218:
+The following changes since commit ffaf7f0376f8040ce9068d71ae9ae8722505c42e:
-  Merge remote-tracking branch 'remotes/stsquad/tags/pull-tcg-testing-revivial-210618-2' into staging (2018-06-22 10:57:47 +0100)
+  Merge tag 'pull-10.0-testing-and-gdstub-updates-100225-1' of https://gitlab.com/stsquad/qemu into staging (2025-02-10 13:26:17 -0500)
 are available in the Git repository at:
-  git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180622
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250211
-for you to fetch changes up to 6dad8260e82b69bd278685ee25209f5824360455:
+for you to fetch changes up to ca4c34e07d1388df8e396520b5e7d60883cd3690:
-  xen: Don't use memory_region_init_ram_nomigrate() in pci_assign_dev_load_option_rom() (2018-06-22 13:28:42 +0100)
+  target/arm: Sink fp_status and fpcr access into do_fmlal* (2025-02-11 16:22:08 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * hw/intc/arm_gicv3: fix wrong values when reading IPRIORITYR
+ * target/alpha: Don't corrupt error_code with unknown softfloat flags
- * target/arm: fix read of freed memory in kvm_arm_machine_init_done()
+ * target/arm: Implement FEAT_AFP and FEAT_RPRES
  * virt: support up to 512 CPUs
  * virt: support 256MB ECAM PCI region (for more PCI devices)
  * xlnx-zynqmp: Use Cortex-R5F, not Cortex-R5
  * mps2-tz: Implement and use the TrustZone Memory Protection Controller
  * target/arm: enforce alignment checking for v6M cores
  * xen: Don't use memory_region_init_ram_nomigrate() in pci_assign_dev_load_option_rom()
  * vl.c: Don't zero-initialize statics for serial_hds
 ----------------------------------------------------------------
-Amol Surati (1):
+Peter Maydell (49):
-      hw/intc/arm_gicv3: fix an extra left-shift when reading IPRIORITYR
+      target/alpha: Don't corrupt error_code with unknown softfloat flags
       fpu: Add float_class_denormal
       fpu: Implement float_flag_input_denormal_used
       fpu: allow flushing of output denormals to be after rounding
       target/arm: Define FPCR AH, FIZ, NEP bits
       target/arm: Implement FPCR.FIZ handling
       target/arm: Adjust FP behaviour for FPCR.AH = 1
       target/arm: Adjust exception flag handling for AH = 1
       target/arm: Add FPCR.AH to tbflags
       target/arm: Set up float_status to use for FPCR.AH=1 behaviour
       target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
       target/arm: Use FPST_FPCR_AH for BFCVT* insns
       target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
       target/arm: Add FPCR.NEP to TBFLAGS
       target/arm: Define and use new write_fp_*reg_merging() functions
       target/arm: Handle FPCR.NEP for 3-input scalar operations
       target/arm: Handle FPCR.NEP for BFCVT scalar
       target/arm: Handle FPCR.NEP for 1-input scalar operations
       target/arm: Handle FPCR.NEP in do_cvtf_scalar()
       target/arm: Handle FPCR.NEP for scalar FABS and FNEG
       target/arm: Handle FPCR.NEP for FCVTXN (scalar)
       target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
       target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
       target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
       target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
       target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
       target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
       target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
       target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
       target/arm: Implement FPCR.AH handling of negation of NaN
       target/arm: Implement FPCR.AH handling for scalar FABS and FABD
       target/arm: Handle FPCR.AH in vector FABD
       target/arm: Handle FPCR.AH in SVE FNEG
       target/arm: Handle FPCR.AH in SVE FABS
       target/arm: Handle FPCR.AH in SVE FABD
       target/arm: Handle FPCR.AH in negation steps in SVE FCADD
       target/arm: Handle FPCR.AH in negation steps in FCADD
       target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
       target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
       target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
       target/arm: Handle FPCR.AH in negation in FMLS (vector)
       target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
       target/arm: Handle FPCR.AH in SVE FTSSEL
       target/arm: Handle FPCR.AH in SVE FTMAD
       target/arm: Enable FEAT_AFP for '-cpu max'
       target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
       target/arm: Implement increased precision FRECPE
       target/arm: Implement increased precision FRSQRTE
       target/arm: Enable FEAT_RPRES for -cpu max
-Edgar E. Iglesias (2):
+Richard Henderson (19):
-      target-arm: Add the Cortex-R5F
+      target/arm: Handle FPCR.AH in vector FCMLA
-      xlnx-zynqmp: Swap Cortex-R5 for Cortex-R5F
+      target/arm: Handle FPCR.AH in FCMLA by index
       target/arm: Handle FPCR.AH in SVE FCMLA
       target/arm: Handle FPCR.AH in FMLSL (by element and vector)
       target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
       target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
       target/arm: Introduce CPUARMState.vfp.fp_status[]
       target/arm: Remove standard_fp_status_f16
       target/arm: Remove standard_fp_status
       target/arm: Remove ah_fp_status_f16
       target/arm: Remove ah_fp_status
       target/arm: Remove fp_status_f16_a64
       target/arm: Remove fp_status_f16_a32
       target/arm: Remove fp_status_a64
       target/arm: Remove fp_status_a32
       target/arm: Simplify fp_status indexing in mve_helper.c
       target/arm: Simplify DO_VFP_cmp in vfp_helper.c
       target/arm: Read fz16 from env->vfp.fpcr
       target/arm: Sink fp_status and fpcr access into do_fmlal*
-Eric Auger (11):
+ docs/system/arm/emulation.rst   |   2 +
-      linux-headers: Update to kernel mainline commit b357bf602
+ include/fpu/softfloat-helpers.h |  11 +
-      target/arm: Allow KVM device address overwriting
+ include/fpu/softfloat-types.h   |  25 ++
-      hw/intc/arm_gicv3: Introduce redist-region-count array property
+ target/arm/cpu-features.h       |  10 +
-      hw/intc/arm_gicv3_kvm: Get prepared to handle multiple redist regions
+ target/arm/cpu.h                |  97 +++--
-      hw/arm/virt: GICv3 DT node with one or two redistributor regions
+ target/arm/helper.h             |  26 ++
-      hw/arm/virt-acpi-build: Advertise one or two GICR structures
+ target/arm/internals.h          |   6 +
-      hw/arm/virt: Register two redistributor regions when necessary
+ target/arm/tcg/helper-a64.h     |  13 +
-      hw/arm/virt: Add a new 256MB ECAM region
+ target/arm/tcg/helper-sve.h     | 120 ++++++
-      hw/arm/virt: Add virt-3.0 machine type
+ target/arm/tcg/translate-a64.h  |  13 +
-      hw/arm/virt: Use 256MB ECAM region by default
+ target/arm/tcg/translate.h      |  54 +--
-      hw/arm/virt: Increase max_cpus to 512
+ target/arm/tcg/vec_internal.h   |  35 ++
+ target/mips/fpu_helper.h        |   6 +
-Julia Suvorova (3):
+ fpu/softfloat.c                 |  66 +++-
-      target/arm: Minor cleanup for ARMv6-M 32-bit instructions
+ target/alpha/cpu.c              |   7 +
-      target/arm: Introduce ARM_FEATURE_M_MAIN
+ target/alpha/fpu_helper.c       |   2 +
-      target/arm: Strict alignment for ARMv6-M and ARMv8-M Baseline
+ target/arm/cpu.c                |  46 +--
+ target/arm/helper.c             |   2 +-
-Peter Maydell (10):
+ target/arm/tcg/cpu64.c          |   2 +
-      hw/misc/tz-mpc.c: Implement the Arm TrustZone Memory Protection Controller
+ target/arm/tcg/helper-a64.c     | 151 ++++----
-      hw/misc/tz-mpc.c: Implement registers
+ target/arm/tcg/hflags.c         |  13 +
-      hw/misc/tz-mpc.c: Implement correct blocked-access behaviour
+ target/arm/tcg/mve_helper.c     |  44 +--
-      hw/misc/tz_mpc.c: Honour the BLK_LUT settings in translate
+ target/arm/tcg/sme_helper.c     |   4 +-
-      hw/misc/iotkit-secctl.c: Implement SECMPCINTSTATUS
+ target/arm/tcg/sve_helper.c     | 367 ++++++++++++++-----
-      hw/arm/iotkit: Instantiate MPC
+ target/arm/tcg/translate-a64.c  | 782 ++++++++++++++++++++++++++++++++--------
-      hw/arm/iotkit: Wire up MPC interrupt lines
+ target/arm/tcg/translate-sve.c  | 193 +++++++---
-      hw/arm/mps2-tz.c: Instantiate MPCs
+ target/arm/tcg/vec_helper.c     | 387 ++++++++++++++------
-      vl.c: Don't zero-initialize statics for serial_hds
+ target/arm/vfp_helper.c         | 374 +++++++++++++++----
-      xen: Don't use memory_region_init_ram_nomigrate() in pci_assign_dev_load_option_rom()
+ target/hppa/fpu_helper.c        |  11 +
+ target/i386/tcg/fpu_helper.c    |   8 +
-Zheng Xiang (1):
+ target/mips/msa.c               |   9 +
-      target-arm: fix a segmentation fault due to illegal memory access
+ target/ppc/cpu_init.c           |   3 +
+ target/rx/cpu.c                 |   8 +
- hw/misc/Makefile.objs                              |   1 +
+ target/sh4/cpu.c                |   8 +
- hw/xen/xen_pt.h                                    |   2 +-
+ target/tricore/helper.c         |   1 +
- include/hw/arm/iotkit.h                            |   8 +
+ tests/fp/fp-bench.c             |   1 +
- include/hw/arm/virt.h                              |  19 +
+ fpu/softfloat-parts.c.inc       | 127 +++++--
- include/hw/intc/arm_gicv3_common.h                 |   8 +-
+files changed, 2325 insertions(+), 709 deletions(-)
  include/hw/misc/iotkit-secctl.h                    |   8 +
  include/hw/misc/tz-mpc.h                           |  80 +++
  include/standard-headers/linux/pci_regs.h          |   8 +
  include/standard-headers/linux/virtio_gpu.h        |   1 +
  include/standard-headers/linux/virtio_net.h        |   3 +
  linux-headers/asm-arm/kvm.h                        |   1 +
  linux-headers/asm-arm/unistd-common.h              |   1 +
  linux-headers/asm-arm64/kvm.h                      |   1 +
  linux-headers/asm-generic/unistd.h                 |   4 +-
  linux-headers/asm-powerpc/unistd.h                 |   1 +
  linux-headers/asm-x86/unistd_32.h                  |   2 +
  linux-headers/asm-x86/unistd_64.h                  |   2 +
  linux-headers/asm-x86/unistd_x32.h                 |   2 +
  linux-headers/linux/kvm.h                          |   5 +-
  linux-headers/linux/psp-sev.h                      |  12 +
  target/arm/cpu.h                                   |   1 +
  target/arm/kvm_arm.h                               |   3 +-
  hw/arm/iotkit.c                                    | 112 +++-
  hw/arm/mps2-tz.c                                   |  71 ++-
  hw/arm/virt-acpi-build.c                           |  30 +-
  hw/arm/virt.c                                      | 100 +++-
  hw/arm/xlnx-zcu102.c                               |   2 +-
  hw/arm/xlnx-zynqmp.c                               |   2 +-
  hw/intc/arm_gic_kvm.c                              |   4 +-
  hw/intc/arm_gicv3.c                                |  12 +-
  hw/intc/arm_gicv3_common.c                         |  38 +-
  hw/intc/arm_gicv3_dist.c                           |   3 +-
  hw/intc/arm_gicv3_its_kvm.c                        |   2 +-
  hw/intc/arm_gicv3_kvm.c                            |  44 +-
  hw/intc/arm_gicv3_redist.c                         |   3 +-
  hw/misc/iotkit-secctl.c                            |  38 +-
  hw/misc/tz-mpc.c                                   | 628 +++++++++++++++++++++
  hw/xen/xen_pt_graphics.c                           |   2 +-
  hw/xen/xen_pt_load_rom.c                           |   6 +-
  target/arm/cpu.c                                   |  12 +
  target/arm/kvm.c                                   |  11 +-
  target/arm/translate.c                             |  45 +-
  vl.c                                               |   4 +-
  MAINTAINERS                                        |   2 +
  default-configs/arm-softmmu.mak                    |   1 +
  hw/misc/trace-events                               |   8 +
  .../LICENSES/exceptions/Linux-syscall-note         |   2 +-
  linux-headers/LICENSES/preferred/GPL-2.0           |   6 +
 files changed, 1250 insertions(+), 111 deletions(-)
  create mode 100644 include/hw/misc/tz-mpc.h
  create mode 100644 hw/misc/tz-mpc.c

-New patch
+[PULL 01/68] target/alpha: Don't corrupt error_code with unknown softfloat flags
+In do_cvttq() we set env->error_code with what is supposed to be a
+set of FPCR exception bit values.  However, if the set of float
+exception flags we get back from softfloat for the conversion
+includes a flag which is not one of the three we expect here
+(invalid_cvti, invalid, inexact) then we will fall through the
+if-ladder and set env->error_code to the unconverted softfloat
+exception_flag value.  This will then cause us to take a spurious
+exception.
+This is harmless now, but when we add new floating point exception
+flags to softfloat it will cause problems.  Add an else clause to the
+if-ladder to make it ignore any float exception flags it doesn't care
+about.
+Specifically, without this fix, 'make check-tcg' will fail for Alpha
+when the commit adding float_flag_input_denormal_used lands.
+Fixes: aa3bad5b59e7 ("target/alpha: Use float64_to_int64_modulo for CVTTQ")
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+---
+ target/alpha/fpu_helper.c | 2 ++
+file changed, 2 insertions(+)
+diff --git a/target/alpha/fpu_helper.c b/target/alpha/fpu_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/alpha/fpu_helper.c
++++ b/target/alpha/fpu_helper.c
+@@ -XXX,XX +XXX,XX @@ static uint64_t do_cvttq(CPUAlphaState *env, uint64_t a, int roundmode)
+             exc = FPCR_INV;
+         } else if (exc & float_flag_inexact) {
+             exc = FPCR_INE;
++        } else {
++            exc = 0;
+         }
+     }
+     env->error_code = exc;
+--
+.34.1
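
The effect of the added else clause can be sketched outside QEMU. The following is a simplified illustration only: the TOY_* names and bit values are hypothetical stand-ins, not the definitions from softfloat or target/alpha, and the real do_cvttq() has additional handling in the invalid_cvti branch that is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical softfloat-style flag bits, for illustration only. */
enum {
    TOY_FLAG_INVALID_CVTI = 1u << 0,
    TOY_FLAG_INVALID      = 1u << 1,
    TOY_FLAG_INEXACT      = 1u << 2,
    TOY_FLAG_NEW          = 1u << 3, /* stands in for a flag added later */
};

/* Hypothetical FPCR-style exception bits, for illustration only. */
enum {
    TOY_FPCR_IOV = 1u << 8,
    TOY_FPCR_INV = 1u << 9,
    TOY_FPCR_INE = 1u << 10,
};

/*
 * Convert a set of softfloat-style flags into an FPCR-style value.
 * Without the final else, a set containing only TOY_FLAG_NEW would
 * fall through the ladder and be returned unconverted.
 */
static uint32_t toy_exc_from_flags(uint32_t flags)
{
    uint32_t exc = flags;

    if (exc) {
        if (exc & TOY_FLAG_INVALID_CVTI) {
            exc = TOY_FPCR_IOV;
        } else if (exc & TOY_FLAG_INVALID) {
            exc = TOY_FPCR_INV;
        } else if (exc & TOY_FLAG_INEXACT) {
            exc = TOY_FPCR_INE;
        } else {
            exc = 0; /* ignore flags this conversion does not care about */
        }
    }
    /* The result now only ever contains FPCR-style bits. */
    assert((exc & ~(uint32_t)(TOY_FPCR_IOV | TOY_FPCR_INV | TOY_FPCR_INE)) == 0);
    return exc;
}
```

With the else clause in place, toy_exc_from_flags(TOY_FLAG_NEW) yields 0, so no spurious exception value is produced; without it the internal assert would fire for that input.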

-[Qemu-devel] [PULL 21/28] hw/misc/iotkit-secctl.c: Implement SECMPCINTSTATUS
+[PULL 02/68] fpu: Add float_class_denormal
-Implement the SECMPCINTSTATUS register. This is the only register
+Currently in softfloat we canonicalize input denormals and so the
-in the security controller that deals with Memory Protection
+code that implements floating point operations does not need to care
-Controllers, and it simply provides a read-only view of the
+whether the input value was originally normal or denormal.  However,
-interrupt lines from the various MPCs in the system.
+both x86 and Arm FEAT_AFP require that an exception flag is set if:
  * an input is denormal
  * that input is not squashed to zero
  * that input is actually used in the calculation (e.g. we
    did not find the other input was a NaN)
 So we need to track that the input was a non-squashed denormal.  To
 do this we add a new value to the FloatClass enum.  In this commit we
 add the value and adjust the code everywhere that looks at FloatClass
 values so that the new float_class_denormal behaves identically to
 float_class_normal.  We will add the code that does the "raise a new
 float exception flag if an input was an unsquashed denormal and we
 used it" in a subsequent commit.
 There should be no behavioural change in this commit.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20180620132032.28865-6-peter.maydell@linaro.org
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/misc/iotkit-secctl.h |  8 +++++++
+ fpu/softfloat.c           | 32 ++++++++++++++++++++++++++++---
- hw/misc/iotkit-secctl.c         | 38 +++++++++++++++++++++++++++++++--
+ fpu/softfloat-parts.c.inc | 40 ++++++++++++++++++++++++---------------
-files changed, 44 insertions(+), 2 deletions(-)
+files changed, 54 insertions(+), 18 deletions(-)
-diff --git a/include/hw/misc/iotkit-secctl.h b/include/hw/misc/iotkit-secctl.h
+diff --git a/fpu/softfloat.c b/fpu/softfloat.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/iotkit-secctl.h
+--- a/fpu/softfloat.c
-+++ b/include/hw/misc/iotkit-secctl.h
++++ b/fpu/softfloat.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ float64_gen2(float64 xa, float64 xb, float_status *s,
-  *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_enable
+ /*
-  *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_clear
+  * Classify a floating point number. Everything above float_class_qnan
-  *  + named GPIO inputs ahb_ppcexp{0,1,2,3}_irq_status
+  * is a NaN so cls >= float_class_qnan is any NaN.
-+ * Controlling the MPC in the IoTKit:
++ *
-+ *  + named GPIO input mpc_status
++ * Note that we canonicalize denormals, so most code should treat
-+ * Controlling each of the 16 expansion MPCs which a system using the IoTKit
++ * class_normal and class_denormal identically.
 + * might provide:
 + *  + named GPIO inputs mpcexp_status[0..15]
   */
- #ifndef IOTKIT_SECCTL_H
+ typedef enum __attribute__ ((__packed__)) {
-@@ -XXX,XX +XXX,XX @@
+     float_class_unclassified,
- #define IOTS_NUM_APB_PPC 2
+     float_class_zero,
- #define IOTS_NUM_APB_EXP_PPC 4
+     float_class_normal,
- #define IOTS_NUM_AHB_EXP_PPC 4
++    float_class_denormal, /* input was a non-squashed denormal */
-+#define IOTS_NUM_EXP_MPC 16
+     float_class_inf,
-+#define IOTS_NUM_MPC 1
+     float_class_qnan,  /* all NaNs from here */
+     float_class_snan,
- typedef struct IoTKitSecCtl IoTKitSecCtl;
+@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__ ((__packed__)) {
+ enum {
-@@ -XXX,XX +XXX,XX @@ struct IoTKitSecCtl {
+     float_cmask_zero    = float_cmask(float_class_zero),
-     uint32_t secrespcfg;
+     float_cmask_normal  = float_cmask(float_class_normal),
-     uint32_t nsccfg;
++    float_cmask_denormal = float_cmask(float_class_denormal),
-     uint32_t brginten;
+     float_cmask_inf     = float_cmask(float_class_inf),
-+    uint32_t mpcintstatus;
+     float_cmask_qnan    = float_cmask(float_class_qnan),
+     float_cmask_snan    = float_cmask(float_class_snan),
-     IoTKitSecCtlPPC apb[IOTS_NUM_APB_PPC];
-     IoTKitSecCtlPPC apbexp[IOTS_NUM_APB_EXP_PPC];
+     float_cmask_infzero = float_cmask_zero | float_cmask_inf,
-diff --git a/hw/misc/iotkit-secctl.c b/hw/misc/iotkit-secctl.c
+     float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
-index XXXXXXX..XXXXXXX 100644
++    float_cmask_anynorm = float_cmask_normal | float_cmask_denormal,
---- a/hw/misc/iotkit-secctl.c
+ };
-+++ b/hw/misc/iotkit-secctl.c
-@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
+ /* Flags for parts_minmax. */
-     case A_NSCCFG:
+@@ -XXX,XX +XXX,XX @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
-         r = s->nsccfg;
+     return c == float_class_qnan;
          break;
 +    case A_SECMPCINTSTATUS:
 +        r = s->mpcintstatus;
 +        break;
      case A_SECPPCINTSTAT:
          r = s->secppcintstat;
          break;
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
      case A_APBSPPPCEXP3:
          r = s->apbexp[offset_to_ppc_idx(offset)].sp;
          break;
 -    case A_SECMPCINTSTATUS:
      case A_SECMSCINTSTAT:
      case A_SECMSCINTEN:
      case A_NSMSCEXP:
@@ -XXX,XX +XXX,XX @@ static void iotkit_secctl_reset(DeviceState *dev)
      foreach_ppc(s, iotkit_secctl_reset_ppc);
  }
-+static void iotkit_secctl_mpc_status(void *opaque, int n, int level)
++/*
 + * Return true if the float_cmask has only normals in it
 + * (including input denormals that were canonicalized)
 + */
 +static inline bool cmask_is_only_normals(int cmask)
 +{
-+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
++    return !(cmask & ~float_cmask_anynorm);
 +
 +    s->mpcintstatus = deposit32(s->mpcintstatus, 0, 1, !!level);
 +}
 +
-+static void iotkit_secctl_mpcexp_status(void *opaque, int n, int level)
++static inline bool is_anynorm(FloatClass c)
 +{
-+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
++    return float_cmask(c) & float_cmask_anynorm;
 +
 +    s->mpcintstatus = deposit32(s->mpcintstatus, n + 16, 1, !!level);
 +}
 +
- static void iotkit_secctl_ppc_irqstatus(void *opaque, int n, int level)
+ /*
   * Structure holding all of the decomposed parts of a float.
   * The exponent is unbiased and the fraction is normalized.
@@ -XXX,XX +XXX,XX @@ static float64 float64r32_round_pack_canonical(FloatParts64 *p,
       */
      switch (p->cls) {
      case float_class_normal:
 +    case float_class_denormal:
          if (unlikely(p->exp == 0)) {
              /*
               * The result is denormal for float32, but can be represented
@@ -XXX,XX +XXX,XX @@ static floatx80 floatx80_round_pack_canonical(FloatParts128 *p,
      switch (p->cls) {
      case float_class_normal:
 +    case float_class_denormal:
          if (s->floatx80_rounding_precision == floatx80_precision_x) {
              parts_uncanon_normal(p, s, fmt);
              frac = p->frac_hi;
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
          break;
      case float_class_normal:
 +    case float_class_denormal:
      case float_class_zero:
          break;
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
      a->sign = b->sign;
      a->exp = b->exp;
 -    if (a->cls == float_class_normal) {
 +    if (is_anynorm(a->cls)) {
          frac_truncjam(a, b);
      } else if (is_nan(a->cls)) {
          /* Discard the low bits of the NaN. */
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_int128_scalbn(float128 a, FloatRoundMode rmode,
          return int128_zero();
      case float_class_normal:
 +    case float_class_denormal:
          if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
              flags = float_flag_inexact;
          }
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_uint128_scalbn(float128 a, FloatRoundMode rmode,
          return int128_zero();
      case float_class_normal:
 +    case float_class_denormal:
          if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
              flags = float_flag_inexact;
              if (p.cls == float_class_zero) {
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
      float32_unpack_canonical(&xp, a, status);
      if (unlikely(xp.cls != float_class_normal)) {
          switch (xp.cls) {
 +        case float_class_denormal:
 +            break;
          case float_class_snan:
          case float_class_qnan:
              parts_return_nan(&xp, status);
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
          case float_class_zero:
              return float32_one;
          default:
 -            break;
 +            g_assert_not_reached();
          }
 -        g_assert_not_reached();
      }
      float_raise(float_flag_inexact, status);
 diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/fpu/softfloat-parts.c.inc
 +++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
              frac_clear(p);
          } else {
              int shift = frac_normalize(p);
 -            p->cls = float_class_normal;
 +            p->cls = float_class_denormal;
              p->exp = fmt->frac_shift - fmt->exp_bias
                     - shift + !fmt->m68k_denormal;
          }
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
  static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                              const FloatFmt *fmt)
  {
-     IoTKitSecCtlPPC *ppc = opaque;
+-    if (likely(p->cls == float_class_normal)) {
-@@ -XXX,XX +XXX,XX @@ static void iotkit_secctl_init(Object *obj)
++    if (likely(is_anynorm(p->cls))) {
-     qdev_init_gpio_out_named(dev, &s->sec_resp_cfg, "sec_resp_cfg", 1);
+         parts_uncanon_normal(p, s, fmt);
-     qdev_init_gpio_out_named(dev, &s->nsc_cfg_irq, "nsc_cfg", 1);
+     } else {
+         switch (p->cls) {
-+    qdev_init_gpio_in_named(dev, iotkit_secctl_mpc_status, "mpc_status", 1);
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
-+    qdev_init_gpio_in_named(dev, iotkit_secctl_mpcexp_status,
-+                            "mpcexp_status", IOTS_NUM_EXP_MPC);
+     if (a->sign != b_sign) {
-+
+         /* Subtraction */
-     memory_region_init_io(&s->s_regs, obj, &iotkit_secctl_s_ops,
+-        if (likely(ab_mask == float_cmask_normal)) {
-                           s, "iotkit-secctl-s-regs", 0x1000);
++        if (likely(cmask_is_only_normals(ab_mask))) {
-     memory_region_init_io(&s->ns_regs, obj, &iotkit_secctl_ns_ops,
+             if (parts_sub_normal(a, b)) {
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription iotkit_secctl_ppc_vmstate = {
+                 return a;
-     }
+             }
- };
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
+         }
-+static const VMStateDescription iotkit_secctl_mpcintstatus_vmstate = {
+     } else {
-+    .name = "iotkit-secctl-mpcintstatus",
+         /* Addition */
-+    .version_id = 1,
+-        if (likely(ab_mask == float_cmask_normal)) {
-+    .minimum_version_id = 1,
++        if (likely(cmask_is_only_normals(ab_mask))) {
-+    .fields = (VMStateField[]) {
+             parts_add_normal(a, b);
-+        VMSTATE_UINT32(mpcintstatus, IoTKitSecCtl),
+             return a;
-+        VMSTATE_END_OF_LIST()
+         }
-+    }
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
-+};
+     }
-+
- static const VMStateDescription iotkit_secctl_vmstate = {
+     if (b->cls == float_class_zero) {
-     .name = "iotkit-secctl",
+-        g_assert(a->cls == float_class_normal);
-     .version_id = 1,
++        g_assert(is_anynorm(a->cls));
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription iotkit_secctl_vmstate = {
+         return a;
-         VMSTATE_STRUCT_ARRAY(ahbexp, IoTKitSecCtl, IOTS_NUM_AHB_EXP_PPC, 1,
+     }
-                              iotkit_secctl_ppc_vmstate, IoTKitSecCtlPPC),
-         VMSTATE_END_OF_LIST()
+     g_assert(a->cls == float_class_zero);
--    }
+-    g_assert(b->cls == float_class_normal);
-+    },
++    g_assert(is_anynorm(b->cls));
-+    .subsections = (const VMStateDescription*[]) {
+  return_b:
-+        &iotkit_secctl_mpcintstatus_vmstate,
+     b->sign = b_sign;
-+        NULL
+     return b;
-+    },
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
- };
+     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+     bool sign = a->sign ^ b->sign;
- static void iotkit_secctl_class_init(ObjectClass *klass, void *data)
 -    if (likely(ab_mask == float_cmask_normal)) {
 +    if (likely(cmask_is_only_normals(ab_mask))) {
          FloatPartsW tmp;
          frac_mulw(&tmp, a, b);
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
          a->sign ^= 1;
      }
 -    if (unlikely(ab_mask != float_cmask_normal)) {
 +    if (unlikely(!cmask_is_only_normals(ab_mask))) {
          if (unlikely(ab_mask == float_cmask_infzero)) {
              float_raise(float_flag_invalid | float_flag_invalid_imz, s);
              goto d_nan;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
          }
          g_assert(ab_mask & float_cmask_zero);
 -        if (c->cls == float_class_normal) {
 +        if (is_anynorm(c->cls)) {
              *a = *c;
              goto return_normal;
          }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
      int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
      bool sign = a->sign ^ b->sign;
 -    if (likely(ab_mask == float_cmask_normal)) {
 +    if (likely(cmask_is_only_normals(ab_mask))) {
          a->sign = sign;
          a->exp -= b->exp + frac_div(a, b);
          return a;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
  {
      int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 -    if (likely(ab_mask == float_cmask_normal)) {
 +    if (likely(cmask_is_only_normals(ab_mask))) {
          frac_modrem(a, b, mod_quot);
          return a;
      }
@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
      if (unlikely(a->cls != float_class_normal)) {
          switch (a->cls) {
 +        case float_class_denormal:
 +            break;
          case float_class_snan:
          case float_class_qnan:
              parts_return_nan(a, status);
@@ -XXX,XX +XXX,XX @@ static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
      case float_class_inf:
          break;
      case float_class_normal:
 +    case float_class_denormal:
          if (parts_round_to_int_normal(a, rmode, scale, fmt->frac_size)) {
              float_raise(float_flag_inexact, s);
          }
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
          return 0;
      case float_class_normal:
 +    case float_class_denormal:
          /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
          if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
              flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static uint64_t partsN(float_to_uint)(FloatPartsN *p, FloatRoundMode rmode,
          return 0;
      case float_class_normal:
 +    case float_class_denormal:
          /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
          if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
              flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint_modulo)(FloatPartsN *p,
          return 0;
      case float_class_normal:
 +    case float_class_denormal:
          /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
          if (parts_round_to_int_normal(p, rmode, 0, N - 2)) {
              flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
      a_exp = a->exp;
      b_exp = b->exp;
 -    if (unlikely(ab_mask != float_cmask_normal)) {
 +    if (unlikely(!cmask_is_only_normals(ab_mask))) {
          switch (a->cls) {
          case float_class_normal:
 +        case float_class_denormal:
              break;
          case float_class_inf:
              a_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
          }
          switch (b->cls) {
          case float_class_normal:
 +        case float_class_denormal:
              break;
          case float_class_inf:
              b_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
  {
      int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 -    if (likely(ab_mask == float_cmask_normal)) {
 +    if (likely(cmask_is_only_normals(ab_mask))) {
          FloatRelation cmp;
          if (a->sign != b->sign) {
@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
      case float_class_inf:
          break;
      case float_class_normal:
 +    case float_class_denormal:
          a->exp += MIN(MAX(n, -0x10000), 0x10000);
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
      if (unlikely(a->cls != float_class_normal)) {
          switch (a->cls) {
 +        case float_class_denormal:
 +            break;
          case float_class_snan:
          case float_class_qnan:
              parts_return_nan(a, s);
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
              }
              return;
          default:
 -            break;
 +            g_assert_not_reached();
          }
 -        g_assert_not_reached();
      }
      if (unlikely(a->sign)) {
          goto d_nan;
 --
-.17.1
+.34.1

-New patch
+[PULL 03/68] fpu: Implement float_flag_input_denormal_used
+For the x86 and the Arm FEAT_AFP semantics, we need to be able to
+tell the target code that the FPU operation has used an input
+denormal.  Implement this; when it happens we set the new
+float_flag_input_denormal_used.
+Note that we only set this when an input denormal is actually used by
+the operation: if the operation results in Invalid Operation or
+Divide By Zero or the result is a NaN because some other input was a
+NaN then we never needed to look at the input denormal and do not set
+input_denormal_used.
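
As an illustration of the rule above, here is a minimal sketch of the
two-operand add/subtract case. The toy_* names and the class encoding
are hypothetical and are not the softfloat definitions; only the shape
of the mask test follows the addsub hunk in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand classes, loosely modelled on FloatClass. */
enum toy_class {
    TOY_ZERO, TOY_NORMAL, TOY_DENORMAL, TOY_INF, TOY_QNAN, TOY_SNAN
};

#define TOY_CMASK(c)        (1u << (c))
#define TOY_CMASK_DENORMAL  TOY_CMASK(TOY_DENORMAL)
#define TOY_CMASK_ANYNAN    (TOY_CMASK(TOY_QNAN) | TOY_CMASK(TOY_SNAN))

/*
 * For add/sub an input denormal is consumed unless the other
 * operand is a NaN: true only if the combined mask contains a
 * denormal and contains no NaN of either kind.
 */
static bool toy_addsub_uses_input_denormal(enum toy_class a, enum toy_class b)
{
    unsigned ab_mask = TOY_CMASK(a) | TOY_CMASK(b);

    return (ab_mask & (TOY_CMASK_DENORMAL | TOY_CMASK_ANYNAN))
           == TOY_CMASK_DENORMAL;
}
```

With this sketch, denormal + normal reports the flag, while
denormal + qNaN does not, because the NaN determines the result
without the denormal ever being examined.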
+We mostly do not need to adjust the hardfloat codepaths to deal with
+this flag, because almost all hardfloat operations are already gated
+on the input not being a denormal, and will fall back to softfloat
+for a denormal input.  The only exception is the comparison
+operations, where we need to add the check for input denormals, which
+must now fall back to softfloat where they did not before.
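
The comparison gating can be sketched as below. This is an assumption-
laden illustration working on raw IEEE binary32 bit patterns, with
hypothetical toy_* helpers; it is not the float32_is_denormal() code,
and it only shows the decision, not the comparison itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* binary32 denormal: exponent field all zeroes, fraction non-zero. */
static bool toy_f32_bits_is_denormal(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0;
}

/*
 * A host-FPU comparison cannot report "input denormal used", so the
 * fast path is only taken when neither operand is a denormal.
 */
static bool toy_compare_can_use_host_fpu(uint32_t a_bits, uint32_t b_bits)
{
    return !toy_f32_bits_is_denormal(a_bits) &&
           !toy_f32_bits_is_denormal(b_bits);
}
```

Comparing 1.0f (0x3f800000) with the smallest negative denormal
(0x80000001) would therefore take the slow path in this sketch.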
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ include/fpu/softfloat-types.h |  7 ++++
+ fpu/softfloat.c               | 38 +++++++++++++++++---
+ fpu/softfloat-parts.c.inc     | 68 ++++++++++++++++++++++++++++++++++-
+files changed, 107 insertions(+), 6 deletions(-)
+diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/fpu/softfloat-types.h
++++ b/include/fpu/softfloat-types.h
+@@ -XXX,XX +XXX,XX @@ enum {
+     float_flag_invalid_sqrt    = 0x0800,  /* sqrt(-x) */
+     float_flag_invalid_cvti    = 0x1000,  /* non-nan to integer */
+     float_flag_invalid_snan    = 0x2000,  /* any operand was snan */
++    /*
++     * An input was denormal and we used it (without flushing it to zero).
++     * Not set if we do not actually use the denormal input (e.g.
++     * because some other input was a NaN, or because the operation
++     * wasn't actually carried out (divide-by-zero; invalid))
++     */
++    float_flag_input_denormal_used = 0x4000,
+ };
+ /*
+diff --git a/fpu/softfloat.c b/fpu/softfloat.c
+index XXXXXXX..XXXXXXX 100644
+--- a/fpu/softfloat.c
++++ b/fpu/softfloat.c
+@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
+                                   float16_params_ahp.frac_size + 1);
+         break;
+-    case float_class_normal:
+     case float_class_denormal:
++        float_raise(float_flag_input_denormal_used, s);
++        break;
++    case float_class_normal:
+     case float_class_zero:
+         break;
+@@ -XXX,XX +XXX,XX @@ static void parts64_float_to_float(FloatParts64 *a, float_status *s)
+     if (is_nan(a->cls)) {
+         parts_return_nan(a, s);
+     }
++    if (a->cls == float_class_denormal) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
+ }
+ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
+@@ -XXX,XX +XXX,XX @@ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
+     if (is_nan(a->cls)) {
+         parts_return_nan(a, s);
+     }
++    if (a->cls == float_class_denormal) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
+ }
+ #define parts_float_to_float(P, S) \
+@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
+     a->sign = b->sign;
+     a->exp = b->exp;
+-    if (is_anynorm(a->cls)) {
++    switch (a->cls) {
++    case float_class_denormal:
++        float_raise(float_flag_input_denormal_used, s);
++        /* fall through */
++    case float_class_normal:
+         frac_truncjam(a, b);
+-    } else if (is_nan(a->cls)) {
++        break;
++    case float_class_snan:
++    case float_class_qnan:
+         /* Discard the low bits of the NaN. */
+         a->frac = b->frac_hi;
+         parts_return_nan(a, s);
++        break;
++    default:
++        break;
+     }
+ }
+@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_widen(FloatParts128 *a, FloatParts64 *b,
+     if (is_nan(a->cls)) {
+         parts_return_nan(a, s);
+     }
++    if (a->cls == float_class_denormal) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
+ }
+ float32 float16_to_float32(float16 a, bool ieee, float_status *s)
+@@ -XXX,XX +XXX,XX @@ float32_hs_compare(float32 xa, float32 xb, float_status *s, bool is_quiet)
+         goto soft;
+     }
+-    float32_input_flush2(&ua.s, &ub.s, s);
++    if (unlikely(float32_is_denormal(ua.s) || float32_is_denormal(ub.s))) {
++        /* We may need to set the input_denormal_used flag */
++        goto soft;
++    }
++
+     if (isgreaterequal(ua.h, ub.h)) {
+         if (isgreater(ua.h, ub.h)) {
+             return float_relation_greater;
+@@ -XXX,XX +XXX,XX @@ float64_hs_compare(float64 xa, float64 xb, float_status *s, bool is_quiet)
+         goto soft;
+     }
+-    float64_input_flush2(&ua.s, &ub.s, s);
++    if (unlikely(float64_is_denormal(ua.s) || float64_is_denormal(ub.s))) {
++        /* We may need to set the input_denormal_used flag */
++        goto soft;
++    }
++
+     if (isgreaterequal(ua.h, ub.h)) {
+         if (isgreater(ua.h, ub.h)) {
+             return float_relation_greater;
+diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
+index XXXXXXX..XXXXXXX 100644
+--- a/fpu/softfloat-parts.c.inc
++++ b/fpu/softfloat-parts.c.inc
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
+     bool b_sign = b->sign ^ subtract;
+     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
++    /*
++     * For addition and subtraction, we will consume an
++     * input denormal unless the other input is a NaN.
++     */
++    if ((ab_mask & (float_cmask_denormal | float_cmask_anynan)) ==
++        float_cmask_denormal) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
++
+     if (a->sign != b_sign) {
+         /* Subtraction */
+         if (likely(cmask_is_only_normals(ab_mask))) {
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
+     if (likely(cmask_is_only_normals(ab_mask))) {
+         FloatPartsW tmp;
++        if (ab_mask & float_cmask_denormal) {
++            float_raise(float_flag_input_denormal_used, s);
++        }
++
+         frac_mulw(&tmp, a, b);
+         frac_truncjam(a, &tmp);
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
+     }
+     /* Multiply by 0 or Inf */
++    if (ab_mask & float_cmask_denormal) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
++
+     if (ab_mask & float_cmask_inf) {
+         a->cls = float_class_inf;
+         a->sign = sign;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
+     if (flags & float_muladd_negate_result) {
+         a->sign ^= 1;
+     }
++
++    /*
++     * All result types except for "return the default NaN
++     * because this is an Invalid Operation" go through here;
++     * this matches the set of cases where we consumed a
++     * denormal input.
++     */
++    if (abc_mask & float_cmask_denormal) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
+     return a;
+  return_sub_zero:
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
+     bool sign = a->sign ^ b->sign;
+     if (likely(cmask_is_only_normals(ab_mask))) {
++        if (ab_mask & float_cmask_denormal) {
++            float_raise(float_flag_input_denormal_used, s);
++        }
+         a->sign = sign;
+         a->exp -= b->exp + frac_div(a, b);
+         return a;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
+         return parts_pick_nan(a, b, s);
+     }
++    if ((ab_mask & float_cmask_denormal) && b->cls != float_class_zero) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
++
+     a->sign = sign;
+     /* Inf / X */
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
+     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+     if (likely(cmask_is_only_normals(ab_mask))) {
++        if (ab_mask & float_cmask_denormal) {
++            float_raise(float_flag_input_denormal_used, s);
++        }
+         frac_modrem(a, b, mod_quot);
+         return a;
+     }
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
+         return a;
+     }
++    if (ab_mask & float_cmask_denormal) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
++
+     /* N % Inf; 0 % N */
+     g_assert(b->cls == float_class_inf || a->cls == float_class_zero);
+     return a;
+@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
+     if (unlikely(a->cls != float_class_normal)) {
+         switch (a->cls) {
+         case float_class_denormal:
++            if (!a->sign) {
++                /* -ve denormal will be InvalidOperation */
++                float_raise(float_flag_input_denormal_used, status);
++            }
+             break;
+         case float_class_snan:
+         case float_class_qnan:
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
+         if ((flags & (minmax_isnum | minmax_isnumber))
+             && !(ab_mask & float_cmask_snan)
+             && (ab_mask & ~float_cmask_qnan)) {
++            if (ab_mask & float_cmask_denormal) {
++                float_raise(float_flag_input_denormal_used, s);
++            }
+             return is_nan(a->cls) ? b : a;
+         }
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
+         return parts_pick_nan(a, b, s);
+     }
++    if (ab_mask & float_cmask_denormal) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
++
+     a_exp = a->exp;
+     b_exp = b->exp;
+@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
+     if (likely(cmask_is_only_normals(ab_mask))) {
+         FloatRelation cmp;
++        if (ab_mask & float_cmask_denormal) {
++            float_raise(float_flag_input_denormal_used, s);
++        }
++
+         if (a->sign != b->sign) {
+             goto a_sign;
+         }
+@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
+         return float_relation_unordered;
+     }
++    if (ab_mask & float_cmask_denormal) {
++        float_raise(float_flag_input_denormal_used, s);
++    }
++
+     if (ab_mask & float_cmask_zero) {
+         if (ab_mask == float_cmask_zero) {
+             return float_relation_equal;
+@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
+     case float_class_zero:
+     case float_class_inf:
+         break;
+-    case float_class_normal:
+     case float_class_denormal:
++        float_raise(float_flag_input_denormal_used, s);
++        /* fall through */
++    case float_class_normal:
+         a->exp += MIN(MAX(n, -0x10000), 0x10000);
+         break;
+     default:
+@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
+     if (unlikely(a->cls != float_class_normal)) {
+         switch (a->cls) {
+         case float_class_denormal:
++            if (!a->sign) {
++                /* -ve denormal will be InvalidOperation */
++                float_raise(float_flag_input_denormal_used, s);
++            }
+             break;
+         case float_class_snan:
+         case float_class_qnan:
+--
+.34.1

-New patch
+[PULL 04/68] fpu: allow flushing of output denormals to be after rounding
+Currently we handle flushing of output denormals in uncanon_normal
 always before we deal with rounding.  This works for architectures
 that detect tininess before rounding, but is usually not the right
 place when the architecture detects tininess after rounding.  For
 example, for x86 the SDM states that the MXCSR FTZ control bit causes
 outputs to be flushed to zero "when it detects a floating-point
 underflow condition".  This means that we mustn't flush to zero if
 the input is such that after rounding it is no longer tiny.
 At least one of our guest architectures does underflow detection
 after rounding but flushing of denormals before rounding (MIPS MSA);
 this means we need to have a config knob for this that is separate
 from our existing tininess_before_rounding setting.
 Add an ftz_detection flag.  For consistency with
 tininess_before_rounding, we make it default to "detect ftz after
 rounding"; this means that we need to explicitly set the flag to
 "detect ftz before rounding" on every existing architecture that sets
 flush_to_zero, so that this commit has no behaviour change.
 (This means more code change here but for the long term a less
 confusing API.)
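
To make the before/after distinction concrete, here is a small sketch
on a made-up format: 4 significant bits, minimum normal exponent
TOY_EMIN, and round-half-up instead of the real rounding modes. All
names are hypothetical and none of this is softfloat code; it only
shows that a result just below the smallest normal can stop being tiny
once it is rounded with an unbounded exponent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_ftz_detection {
    TOY_FTZ_AFTER_ROUNDING = 0,
    TOY_FTZ_BEFORE_ROUNDING = 1,
};

#define TOY_EMIN 0  /* smallest normal exponent of the toy format */

/*
 * The exact result is (sig / 128) * 2^exp with bit 7 of sig set,
 * i.e. a significand 1.xxxxxxx carrying more bits than the toy
 * format's 4. Returns true if flush-to-zero would zero the result.
 */
static bool toy_result_is_flushed(uint8_t sig, int exp,
                                  enum toy_ftz_detection d)
{
    assert(sig & 0x80);

    if (d == TOY_FTZ_AFTER_ROUNDING) {
        /* Round to 4 bits as if the exponent range were unbounded. */
        unsigned rounded = ((unsigned)sig + 8u) >> 4;
        if (rounded == 16u) {
            exp += 1;   /* rounding carried into the next binade */
        }
    }
    return exp < TOY_EMIN;
}
```

For sig = 0xff and exp = TOY_EMIN - 1 the value is tiny before
rounding, but rounding carries it up to exactly the smallest normal,
so only the "before rounding" setting flushes it.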
 For several architectures the current behaviour is either
 definitely or possibly wrong; annotate those with TODO comments.
 These architectures are definitely wrong (and should detect
 ftz after rounding):
  * x86
  * Alpha
 For these architectures the spec is unclear:
  * MIPS (for non-MSA)
  * RX
  * SH4
 PA-RISC makes ftz detection IMPDEF, but we aren't setting the
 "tininess before rounding" setting that we ought to.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  include/fpu/softfloat-helpers.h | 11 +++++++++++
  include/fpu/softfloat-types.h   | 18 ++++++++++++++++++
  target/mips/fpu_helper.h        |  6 ++++++
  target/alpha/cpu.c              |  7 +++++++
  target/arm/cpu.c                |  1 +
  target/hppa/fpu_helper.c        | 11 +++++++++++
  target/i386/tcg/fpu_helper.c    |  8 ++++++++
  target/mips/msa.c               |  9 +++++++++
  target/ppc/cpu_init.c           |  3 +++
  target/rx/cpu.c                 |  8 ++++++++
  target/sh4/cpu.c                |  8 ++++++++
  target/tricore/helper.c         |  1 +
  tests/fp/fp-bench.c             |  1 +
  fpu/softfloat-parts.c.inc       | 21 +++++++++++++++------
 files changed, 107 insertions(+), 6 deletions(-)
 diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/fpu/softfloat-helpers.h
 +++ b/include/fpu/softfloat-helpers.h
@@ -XXX,XX +XXX,XX @@ static inline void set_flush_inputs_to_zero(bool val, float_status *status)
      status->flush_inputs_to_zero = val;
  }
 +static inline void set_float_ftz_detection(FloatFTZDetection d,
 +                                           float_status *status)
 +{
 +    status->ftz_detection = d;
 +}
 +
  static inline void set_default_nan_mode(bool val, float_status *status)
  {
      status->default_nan_mode = val;
@@ -XXX,XX +XXX,XX @@ static inline bool get_default_nan_mode(const float_status *status)
      return status->default_nan_mode;
  }
 +static inline FloatFTZDetection get_float_ftz_detection(const float_status *status)
 +{
 +    return status->ftz_detection;
 +}
 +
  #endif /* SOFTFLOAT_HELPERS_H */
 diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/fpu/softfloat-types.h
 +++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
      float_infzeronan_suppress_invalid = (1 << 7),
  } FloatInfZeroNaNRule;
 +/*
 + * When flush_to_zero is set, should we detect denormal results to
 + * be flushed before or after rounding? For most architectures this
 + * should be set to match the tininess_before_rounding setting,
 + * but a few architectures, e.g. MIPS MSA, detect FTZ before
 + * rounding but tininess after rounding.
 + *
 + * This enum is arranged so that the default if the target doesn't
 + * configure it matches the default for tininess_before_rounding
 + * (i.e. "after rounding").
 + */
 +typedef enum __attribute__((__packed__)) {
 +    float_ftz_after_rounding = 0,
 +    float_ftz_before_rounding = 1,
 +} FloatFTZDetection;
 +
  /*
   * Floating Point Status. Individual architectures may maintain
   * several versions of float_status for different functions. The
@@ -XXX,XX +XXX,XX @@ typedef struct float_status {
      bool tininess_before_rounding;
      /* should denormalised results go to zero and set output_denormal_flushed? */
      bool flush_to_zero;
 +    /* do we detect and flush denormal results before or after rounding? */
 +    FloatFTZDetection ftz_detection;
      /* should denormalised inputs go to zero and set input_denormal_flushed? */
      bool flush_inputs_to_zero;
      bool default_nan_mode;
 diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/mips/fpu_helper.h
 +++ b/target/mips/fpu_helper.h
@@ -XXX,XX +XXX,XX @@ static inline void fp_reset(CPUMIPSState *env)
       */
      set_float_2nan_prop_rule(float_2nan_prop_s_ab,
                               &env->active_fpu.fp_status);
 +    /*
 +     * TODO: the spec doesn't say clearly whether FTZ happens before
 +     * or after rounding for normal FPU operations.
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding,
 +                            &env->active_fpu.fp_status);
  }
  /* MSA */
 diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/alpha/cpu.c
 +++ b/target/alpha/cpu.c
@@ -XXX,XX +XXX,XX @@ static void alpha_cpu_initfn(Object *obj)
      set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
      /* Default NaN: sign bit clear, msb frac bit set */
      set_float_default_nan_pattern(0b01000000, &env->fp_status);
 +    /*
 +     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
 +     * section 4.7.7.11 says that we flush to zero for underflow cases, so
 +     * this should be float_ftz_after_rounding to match the
 +     * tininess_after_rounding (which is specified in section 4.7.5).
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
  #if defined(CONFIG_USER_ONLY)
      env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
      cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
  static void arm_set_default_fp_behaviours(float_status *s)
  {
      set_float_detect_tininess(float_tininess_before_rounding, s);
 +    set_float_ftz_detection(float_ftz_before_rounding, s);
      set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
      set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
      set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
 diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/hppa/fpu_helper.c
 +++ b/target/hppa/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
      set_float_infzeronan_rule(float_infzeronan_dnan_never, &env->fp_status);
      /* Default NaN: sign bit clear, msb-1 frac bit set */
      set_float_default_nan_pattern(0b00100000, &env->fp_status);
 +    /*
 +     * "PA-RISC 2.0 Architecture" says it is IMPDEF whether the flushing
 +     * enabled by FPSR.D happens before or after rounding. We pick "before"
 +     * for consistency with tininess detection.
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 +    /*
 +     * TODO: "PA-RISC 2.0 Architecture" chapter 10 says that we should
 +     * detect tininess before rounding, but we don't set that here so we
 +     * get the default tininess after rounding.
 +     */
  }
  void cpu_hppa_loaded_fr0(CPUHPPAState *env)
 diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/tcg/fpu_helper.c
 +++ b/target/i386/tcg/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_init_fp_statuses(CPUX86State *env)
      set_float_default_nan_pattern(0b11000000, &env->fp_status);
      set_float_default_nan_pattern(0b11000000, &env->mmx_status);
      set_float_default_nan_pattern(0b11000000, &env->sse_status);
 +    /*
 +     * TODO: x86 does flush-to-zero detection after rounding (the SDM
 +     * section 10.2.3.3 on the FTZ bit of MXCSR says that we flush
 +     * when we detect underflow, which x86 does after rounding).
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->mmx_status);
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->sse_status);
  }
  static inline uint8_t save_exception_flags(CPUX86State *env)
 diff --git a/target/mips/msa.c b/target/mips/msa.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/mips/msa.c
 +++ b/target/mips/msa.c
@@ -XXX,XX +XXX,XX @@ void msa_reset(CPUMIPSState *env)
      /* tininess detected after rounding.*/
      set_float_detect_tininess(float_tininess_after_rounding,
                                &env->active_tc.msa_fp_status);
 +    /*
 +     * MSACSR.FS detects tiny results to flush to zero before rounding
 +     * (per "MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD
 +     * Architecture Module, Revision 1.1" section 3.5.4), even though it
 +     * detects tininess after rounding for underflow purposes (section 3.4.2
 +     * table 3.3).
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding,
 +                            &env->active_tc.msa_fp_status);
      /*
       * According to MIPS specifications, if one of the two operands is
 diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/ppc/cpu_init.c
 +++ b/target/ppc/cpu_init.c
@@ -XXX,XX +XXX,XX @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
      /* tininess for underflow is detected before rounding */
      set_float_detect_tininess(float_tininess_before_rounding,
                                &env->fp_status);
 +    /* Similarly for flush-to-zero */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 +
      /*
       * PowerPC propagation rules:
       *  1. A if it sNaN or qNaN
 diff --git a/target/rx/cpu.c b/target/rx/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/rx/cpu.c
 +++ b/target/rx/cpu.c
@@ -XXX,XX +XXX,XX @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
      set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
      /* Default NaN value: sign bit clear, set frac msb */
      set_float_default_nan_pattern(0b01000000, &env->fp_status);
 +    /*
 +     * TODO: "RX Family RXv1 Instruction Set Architecture" is not 100% clear
 +     * on whether flush-to-zero should happen before or after rounding, but
 +     * section 1.3.2 says that it happens when underflow is detected, and
 +     * implies that underflow is detected after rounding. So this may not
 +     * be the correct setting.
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
  }
  static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
 diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/sh4/cpu.c
 +++ b/target/sh4/cpu.c
@@ -XXX,XX +XXX,XX @@ static void superh_cpu_reset_hold(Object *obj, ResetType type)
      set_default_nan_mode(1, &env->fp_status);
      /* sign bit clear, set all frac bits other than msb */
      set_float_default_nan_pattern(0b00111111, &env->fp_status);
 +    /*
 +     * TODO: "SH-4 CPU Core Architecture ADCS 7182230F" doesn't say whether
 +     * it detects tininess before or after rounding. Section 6.4 is clear
 +     * that flush-to-zero happens when the result underflows, though, so
 +     * either this should be "detect ftz after rounding" or else we should
 +     * be setting "detect tininess before rounding".
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
  }
  static void superh_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
 diff --git a/target/tricore/helper.c b/target/tricore/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/tricore/helper.c
 +++ b/target/tricore/helper.c
@@ -XXX,XX +XXX,XX @@ void fpu_set_state(CPUTriCoreState *env)
      set_flush_inputs_to_zero(1, &env->fp_status);
      set_flush_to_zero(1, &env->fp_status);
      set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
      set_default_nan_mode(1, &env->fp_status);
      /* Default NaN pattern: sign bit clear, frac msb set */
      set_float_default_nan_pattern(0b01000000, &env->fp_status);
 diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/fp/fp-bench.c
 +++ b/tests/fp/fp-bench.c
@@ -XXX,XX +XXX,XX @@ static void run_bench(void)
      set_float_3nan_prop_rule(float_3nan_prop_s_cab, &soft_status);
      set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, &soft_status);
      set_float_default_nan_pattern(0b01000000, &soft_status);
 +    set_float_ftz_detection(float_ftz_before_rounding, &soft_status);
      f = bench_funcs[operation][precision];
      g_assert(f);
 diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/fpu/softfloat-parts.c.inc
 +++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
              p->frac_lo &= ~round_mask;
          }
          frac_shr(p, frac_shift);
 -    } else if (s->flush_to_zero) {
 +    } else if (s->flush_to_zero &&
 +               s->ftz_detection == float_ftz_before_rounding) {
          flags |= float_flag_output_denormal_flushed;
          p->cls = float_class_zero;
          exp = 0;
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
          exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) && !fmt->m68k_denormal;
          frac_shr(p, frac_shift);
 -        if (is_tiny && (flags & float_flag_inexact)) {
 -            flags |= float_flag_underflow;
 -        }
 -        if (exp == 0 && frac_eqz(p)) {
 -            p->cls = float_class_zero;
 +        if (is_tiny) {
 +            if (s->flush_to_zero) {
 +                assert(s->ftz_detection == float_ftz_after_rounding);
 +                flags |= float_flag_output_denormal_flushed;
 +                p->cls = float_class_zero;
 +                exp = 0;
 +                frac_clear(p);
 +            } else if (flags & float_flag_inexact) {
 +                flags |= float_flag_underflow;
 +            }
 +            if (exp == 0 && frac_eqz(p)) {
 +                p->cls = float_class_zero;
 +            }
          }
      }
      p->exp = exp;
 --
 .34.1

-New patch
+[PULL 05/68] target/arm: Define FPCR AH, FIZ, NEP bits
+The Armv8.7 FEAT_AFP feature defines three new control bits in
+the FPCR:
+ * FPCR.AH: "alternate floating point mode"; this changes floating
+   point behaviour in a variety of ways, including:
+    - the sign of a default NaN is 1, not 0
+    - if FPCR.FZ is also 1, results that are still denormal after
+      rounding with an unbounded exponent are flushed to zero
+    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
+    - miscellaneous other corner-case behaviour changes
+ * FPCR.FIZ: flush denormalized numbers to zero on input for
+   most instructions
+ * FPCR.NEP: makes scalar SIMD operations merge the result with
+   higher vector elements in one of the source registers, instead
+   of zeroing the higher elements of the destination
+This commit defines the new bits in the FPCR, and allows them to be
+read or written when FEAT_AFP is implemented.  Actual behaviour
+changes will be implemented in subsequent commits.
+Note that these are the first FPCR bits which don't appear in the
+AArch32 FPSCR view of the register, and which share bit positions
+with FPSR bits.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/cpu-features.h |  5 +++++
+ target/arm/cpu.h          |  3 +++
+ target/arm/vfp_helper.c   | 11 ++++++++---
+files changed, 16 insertions(+), 3 deletions(-)
+diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu-features.h
++++ b/target/arm/cpu-features.h
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_hcx(const ARMISARegisters *id)
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HCX) != 0;
+ }
++static inline bool isar_feature_aa64_afp(const ARMISARegisters *id)
++{
++    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, AFP) != 0;
++}
++
+ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
+ {
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
+  */
+ /* FPCR bits */
++#define FPCR_FIZ    (1 << 0)    /* Flush Inputs to Zero (FEAT_AFP) */
++#define FPCR_AH     (1 << 1)    /* Alternate Handling (FEAT_AFP) */
++#define FPCR_NEP    (1 << 2)    /* SIMD scalar ops preserve elts (FEAT_AFP) */
+ #define FPCR_IOE    (1 << 8)    /* Invalid Operation exception trap enable */
+ #define FPCR_DZE    (1 << 9)    /* Divide by Zero exception trap enable */
+ #define FPCR_OFE    (1 << 10)   /* Overflow exception trap enable */
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
+     if (!cpu_isar_feature(any_fp16, cpu)) {
+         val &= ~FPCR_FZ16;
+     }
++    if (!cpu_isar_feature(aa64_afp, cpu)) {
++        val &= ~(FPCR_FIZ | FPCR_AH | FPCR_NEP);
++    }
+     if (!cpu_isar_feature(aa64_ebf16, cpu)) {
+         val &= ~FPCR_EBF;
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
+      * We don't implement trapped exception handling, so the
+      * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
+      *
+-     * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode, EBF
+-     * and FZ16. Len, Stride and LTPSIZE we just handled. Store those bits
++     * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode, EBF, FZ16,
++     * FIZ, AH, and NEP.
++     * Len, Stride and LTPSIZE we just handled. Store those bits
+      * there, and zero any of the other FPCR bits and the RES0 and RAZ/WI
+      * bits.
+      */
+-    val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16 | FPCR_EBF;
++    val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16 |
++        FPCR_EBF | FPCR_FIZ | FPCR_AH | FPCR_NEP;
+     env->vfp.fpcr &= ~mask;
+     env->vfp.fpcr |= val;
+ }
+--
+.34.1

-New patch
+[PULL 06/68] target/arm: Implement FPCR.FIZ handling
+Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
+flushing of single and double precision denormal inputs to zero for
+AArch64 floating point instructions.  (For half-precision, the
+existing FPCR.FZ16 control remains the only one.)
+FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
+only because of FPCR.FIZ then we should *not* set the cumulative
+exception bit FPSR.IDC.
+FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
+applies when FPCR.AH is 0.
+We can implement this by setting the "flush inputs to zero" state
+appropriately when FPCR is written, and by not reflecting the
+float_flag_input_denormal status flag into FPSR reads when it is the
+result only of FPCR.FIZ.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
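Note (illustration only, not part of the patch): the FPCR.FZ/AH/FIZ rules
described in the commit message reduce to two predicates. The sketch below is
a minimal standalone restatement. The function names are hypothetical;
FPCR_FIZ and FPCR_AH use the bit positions defined earlier in this series, and
FPCR_FZ is assumed to be bit 24 as in the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FIZ/AH positions follow the series; FZ = bit 24 is an assumption here. */
#define FPCR_FIZ (1u << 0)
#define FPCR_AH  (1u << 1)
#define FPCR_FZ  (1u << 24)

/*
 * A64: denormal inputs are flushed if FIZ == 1, or if AH == 0 and FZ == 1.
 * (Hypothetical helper name; mirrors the fitz_enabled expression.)
 */
static bool a64_flush_inputs(uint32_t fpcr)
{
    return (fpcr & FPCR_FIZ) || (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
}

/*
 * A flushed input is reported in FPSR.IDC only when AH == 0 and FZ == 1;
 * a flush caused purely by FIZ must stay silent.
 */
static bool a64_flush_sets_idc(uint32_t fpcr)
{
    return (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
}
```

With only FIZ set, a64_flush_inputs() is true while a64_flush_sets_idc() is
false; that is the case in which the patch squashes
float_flag_input_denormal_flushed before building the FPSR value.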
+ target/arm/vfp_helper.c | 60 ++++++++++++++++++++++++++++++++++-------
+ 1 file changed, 50 insertions(+), 10 deletions(-)
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
+ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+ {
+-    uint32_t i = 0;
++    uint32_t a32_flags = 0, a64_flags = 0;
+-    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
+-    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
+-    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
++    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
++    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
+     /* FZ16 does not generate an input denormal exception.  */
+-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
++    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
+           & ~float_flag_input_denormal_flushed);
+-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
++    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
+           & ~float_flag_input_denormal_flushed);
+-    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
++
++    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
++    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+           & ~float_flag_input_denormal_flushed);
+-    return vfp_exceptbits_from_host(i);
++    /*
++     * Flushing an input denormal *only* because FPCR.FIZ == 1 does
++     * not set FPSR.IDC; if FPCR.FZ is also set then this takes
++     * precedence and IDC is set (see the FPUnpackBase pseudocode).
++     * So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
++     * We only do this for the a64 flags because FIZ has no effect
++     * on AArch32 even if it is set.
++     */
++    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
++        a64_flags &= ~float_flag_input_denormal_flushed;
++    }
++    return vfp_exceptbits_from_host(a32_flags | a64_flags);
+ }
+ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
+     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
+ }
++static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
++{
++    /*
++     * Synchronize any pending exception-flag information in the
++     * float_status values into env->vfp.fpsr, and then clear out
++     * the float_status data.
++     */
++    env->vfp.fpsr |= vfp_get_fpsr_from_host(env);
++    vfp_clear_float_status_exc_flags(env);
++}
++
+ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
+ {
+     uint64_t changed = env->vfp.fpcr;
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
+     if (changed & FPCR_FZ) {
+         bool ftz_enabled = val & FPCR_FZ;
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
++        /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
++    }
++    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
++        /*
++         * A64: Flush denormalized inputs to zero if FPCR.FIZ = 1, or
++         * both FPCR.AH = 0 and FPCR.FZ = 1.
++         */
++        bool fitz_enabled = (val & FPCR_FIZ) ||
++            (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
++        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
+     }
+     if (changed & FPCR_DN) {
+         bool dnan_enabled = val & FPCR_DN;
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
+     }
++    /*
++     * If any bits changed that we look at in vfp_get_fpsr_from_host(),
++     * we must sync the float_status flags into vfp.fpsr now (under the
++     * old regime) before we update vfp.fpcr.
++     */
++    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
++        vfp_sync_and_clear_float_status_exc_flags(env);
++    }
+ }
+ #else
+--
+.34.1

-New patch
+[PULL 07/68] target/arm: Adjust FP behaviour for FPCR.AH = 1
+When FPCR.AH is set, various behaviours of AArch64 floating point
+operations which are controlled by softfloat config settings change:
+ * tininess and ftz detection before/after rounding
+ * NaN propagation order
+ * result of 0 * Inf + NaN
+ * default NaN value
+When the guest changes the value of the AH bit, switch these config
+settings on the fp_status_a64 and fp_status_f16_a64 float_status
+fields.
+This requires us to make the arm_set_default_fp_behaviours() function
+global, since we now need to call it from cpu.c and vfp_helper.c; we
+move it to vfp_helper.c so it can be next to the new
+arm_set_ah_fp_behaviours().
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
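Note (illustration only, not part of the patch): the default and AH=1
default-NaN patterns (0b01000000 and 0b11000000) differ only in the sign bit.
The sketch below expands such an 8-bit pattern to a single-precision value. It
assumes the pattern is read as the sign bit followed by the top seven fraction
bits, with the pattern's lowest bit replicated into the remaining fraction
bits; that reading of the softfloat convention is an assumption, and
default_nan_f32() is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Expand an 8-bit default-NaN pattern into an IEEE single-precision
 * bit pattern (assumed layout: bit 7 = sign, bits 6..0 = top fraction bits).
 */
static uint32_t default_nan_f32(uint8_t pattern)
{
    uint32_t sign = (uint32_t)(pattern >> 7) << 31;
    uint32_t exponent = 0xFFu << 23;                   /* all ones: NaN/Inf */
    uint32_t frac = (uint32_t)(pattern & 0x7F) << 16;  /* top 7 of 23 bits */

    if (pattern & 1) {
        frac |= 0xFFFF;     /* replicate the lowest pattern bit downwards */
    }
    return sign | exponent | frac;
}
```

Under these assumptions the default pattern 0x40 gives 0x7FC00000 (sign clear,
msb fraction bit set) and the AH=1 pattern 0xC0 gives 0xFFC00000 (sign set,
msb fraction bit set), matching the two comment blocks in the patch.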
+ target/arm/internals.h  |  4 +++
+ target/arm/cpu.c        | 23 ----------------
+ target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++++++++-
+ 3 files changed, 61 insertions(+), 24 deletions(-)
+diff --git a/target/arm/internals.h b/target/arm/internals.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/internals.h
++++ b/target/arm/internals.h
+@@ -XXX,XX +XXX,XX @@ uint64_t gt_virt_cnt_offset(CPUARMState *env);
+  * all EL1" scope; this covers stage 1 and stage 2.
+  */
+ int alle1_tlbmask(CPUARMState *env);
++
++/* Set the float_status behaviour to match the Arm defaults */
++void arm_set_default_fp_behaviours(float_status *s);
++
+ #endif
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
+     QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
+ }
+-/*
+- * Set the float_status behaviour to match the Arm defaults:
+- *  * tininess-before-rounding
+- *  * 2-input NaN propagation prefers SNaN over QNaN, and then
+- *    operand A over operand B (see FPProcessNaNs() pseudocode)
+- *  * 3-input NaN propagation prefers SNaN over QNaN, and then
+- *    operand C over A over B (see FPProcessNaNs3() pseudocode,
+- *    but note that for QEMU muladd is a * b + c, whereas for
+- *    the pseudocode function the arguments are in the order c, a, b.
+- *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
+- *    and the input NaN if it is signalling
+- *  * Default NaN has sign bit clear, msb frac bit set
+- */
+-static void arm_set_default_fp_behaviours(float_status *s)
+-{
+-    set_float_detect_tininess(float_tininess_before_rounding, s);
+-    set_float_ftz_detection(float_ftz_before_rounding, s);
+-    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
+-    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
+-    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
+-    set_float_default_nan_pattern(0b01000000, s);
+-}
+-
+ static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
+ {
+     /* Reset a single ARMCPRegInfo register */
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@
+ #include "exec/helper-proto.h"
+ #include "internals.h"
+ #include "cpu-features.h"
++#include "fpu/softfloat.h"
+ #ifdef CONFIG_TCG
+ #include "qemu/log.h"
+-#include "fpu/softfloat.h"
+ #endif
+ /* VFP support.  We follow the convention used for VFP instructions:
+    Single precision routines have a "s" suffix, double precision a
+    "d" suffix.  */
++/*
++ * Set the float_status behaviour to match the Arm defaults:
++ *  * tininess-before-rounding
++ *  * 2-input NaN propagation prefers SNaN over QNaN, and then
++ *    operand A over operand B (see FPProcessNaNs() pseudocode)
++ *  * 3-input NaN propagation prefers SNaN over QNaN, and then
++ *    operand C over A over B (see FPProcessNaNs3() pseudocode,
++ *    but note that for QEMU muladd is a * b + c, whereas for
++ *    the pseudocode function the arguments are in the order c, a, b.
++ *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
++ *    and the input NaN if it is signalling
++ *  * Default NaN has sign bit clear, msb frac bit set
++ */
++void arm_set_default_fp_behaviours(float_status *s)
++{
++    set_float_detect_tininess(float_tininess_before_rounding, s);
++    set_float_ftz_detection(float_ftz_before_rounding, s);
++    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
++    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
++    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
++    set_float_default_nan_pattern(0b01000000, s);
++}
++
++/*
++ * Set the float_status behaviour to match the FEAT_AFP
++ * FPCR.AH=1 requirements:
++ *  * tininess-after-rounding
++ *  * 2-input NaN propagation prefers the first NaN
++ *  * 3-input NaN propagation prefers a over b over c
++ *  * 0 * Inf + NaN always returns the input NaN and doesn't
++ *    set Invalid for a QNaN
++ *  * default NaN has sign bit set, msb frac bit set
++ */
++static void arm_set_ah_fp_behaviours(float_status *s)
++{
++    set_float_detect_tininess(float_tininess_after_rounding, s);
++    set_float_ftz_detection(float_ftz_after_rounding, s);
++    set_float_2nan_prop_rule(float_2nan_prop_ab, s);
++    set_float_3nan_prop_rule(float_3nan_prop_abc, s);
++    set_float_infzeronan_rule(float_infzeronan_dnan_never |
++                              float_infzeronan_suppress_invalid, s);
++    set_float_default_nan_pattern(0b11000000, s);
++}
++
+ #ifdef CONFIG_TCG
+ /* Convert host exception flags to vfp form.  */
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
+     }
++    if (changed & FPCR_AH) {
++        bool ah_enabled = val & FPCR_AH;
++
++        if (ah_enabled) {
++            /* Change behaviours for A64 FP operations */
++            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
++            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
++        } else {
++            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
++            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
++        }
++    }
+     /*
+      * If any bits changed that we look at in vfp_get_fpsr_from_host(),
+      * we must sync the float_status flags into vfp.fpsr now (under the
+--
+.34.1

-New patch
+[PULL 08/68] target/arm: Adjust exception flag handling for AH = 1
+When FPCR.AH = 1, some of the cumulative exception flags in the FPSR
+behave slightly differently for A64 operations:
+ * IDC is set when a denormal input is used without flushing
+ * IXC (Inexact) is set when an output denormal is flushed to zero
+Update vfp_get_fpsr_from_host() to do this.
+Note that because half-precision operations never set IDC, we now
+need to add float_flag_input_denormal_used to the set we mask out of
+fp_status_f16_a64.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
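Note (illustration only, not part of the patch): a reduced model of the flag
translation may help when reviewing the AH-specific cases. Only the three
denormal-related host flags are modelled. The HOST_* values and the function
name are hypothetical stand-ins for the softfloat float_flag_* constants and
vfp_exceptbits_from_host(); the UFC mapping for a flushed output is assumed
from the code comment ("sets both IXC and UFC") rather than shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the softfloat float_flag_* values. */
enum {
    HOST_INPUT_DENORMAL_FLUSHED  = 1 << 0,
    HOST_INPUT_DENORMAL_USED     = 1 << 1,
    HOST_OUTPUT_DENORMAL_FLUSHED = 1 << 2,
};

/* Architectural FPSR cumulative exception bits used in this sketch. */
#define FPSR_UFC (1u << 3)
#define FPSR_IXC (1u << 4)
#define FPSR_IDC (1u << 7)

static uint32_t sketch_exceptbits(int host_bits, bool ah)
{
    uint32_t target_bits = 0;

    if (host_bits & HOST_INPUT_DENORMAL_FLUSHED) {
        target_bits |= FPSR_IDC;
    }
    if (host_bits & HOST_OUTPUT_DENORMAL_FLUSHED) {
        target_bits |= FPSR_UFC;    /* assumed pre-existing behaviour */
    }
    /* AH == 1 only: a denormal input that was consumed also sets IDC */
    if (ah && (host_bits & HOST_INPUT_DENORMAL_USED)) {
        target_bits |= FPSR_IDC;
    }
    /* AH == 1 only: a flushed output additionally sets IXC */
    if (ah && (host_bits & HOST_OUTPUT_DENORMAL_FLUSHED)) {
        target_bits |= FPSR_IXC;
    }
    return target_bits;
}
```

Calling the translation separately for the A64 flags (with the AH state) and
the A32 flags (with ah == false), then ORing the results, keeps the AH-only
mappings from leaking into AArch32 state.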
+ target/arm/vfp_helper.c | 17 ++++++++++++++---
+ 1 file changed, 14 insertions(+), 3 deletions(-)
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ static void arm_set_ah_fp_behaviours(float_status *s)
+ #ifdef CONFIG_TCG
+ /* Convert host exception flags to vfp form.  */
+-static inline uint32_t vfp_exceptbits_from_host(int host_bits)
++static inline uint32_t vfp_exceptbits_from_host(int host_bits, bool ah)
+ {
+     uint32_t target_bits = 0;
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
+     if (host_bits & float_flag_input_denormal_flushed) {
+         target_bits |= FPSR_IDC;
+     }
++    /*
++     * With FPCR.AH, IDC is set when an input denormal is used,
++     * and flushing an output denormal to zero sets both IXC and UFC.
++     */
++    if (ah && (host_bits & float_flag_input_denormal_used)) {
++        target_bits |= FPSR_IDC;
++    }
++    if (ah && (host_bits & float_flag_output_denormal_flushed)) {
++        target_bits |= FPSR_IXC;
++    }
+     return target_bits;
+ }
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
+     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+-          & ~float_flag_input_denormal_flushed);
++          & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
+     /*
+      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
+      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
+         a64_flags &= ~float_flag_input_denormal_flushed;
+     }
+-    return vfp_exceptbits_from_host(a32_flags | a64_flags);
++    return vfp_exceptbits_from_host(a64_flags, env->vfp.fpcr & FPCR_AH) |
++        vfp_exceptbits_from_host(a32_flags, false);
+ }
+ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
+--
+.34.1

-New patch
+[PULL 09/68] target/arm: Add FPCR.AH to tbflags
+We are going to need to generate different code in some cases when
+FPCR.AH is 1.  For example:
+ * Floating point neg and abs must not flip the sign bit of NaNs
+ * some insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, and various
+   BFCVT and BFM bfloat16 ops) need to use a different float_status
+   to the usual one
+Encode FPCR.AH into the A64 tbflags, so we can refer to it at
+translate time.
+Because we now have a bit in FPCR that affects codegen, we can't mark
+the AArch64 FPCR register as being SUPPRESS_TB_END any more; writes
+to it will now end the TB and trigger a regeneration of hflags.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
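Note (illustration only, not part of the patch): the reason FPCR writes must
now end the TB is that the AH bit becomes part of the key under which
translated code is cached. The sketch below models that with a single 64-bit
flags word; the helper names are hypothetical and the real TBFLAG accessors
are more involved. Only the bit position (37) and FPCR_AH are taken from the
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCR_AH                (1u << 1)
#define SKETCH_TBFLAG_AH_SHIFT 37

/* Recompute the AH bit of the flags word from the current FPCR value. */
static uint64_t sketch_rebuild_flags(uint64_t flags, uint32_t fpcr)
{
    flags &= ~(UINT64_C(1) << SKETCH_TBFLAG_AH_SHIFT);
    if (fpcr & FPCR_AH) {
        flags |= UINT64_C(1) << SKETCH_TBFLAG_AH_SHIFT;
    }
    return flags;
}

/* What the translator would read back at translate time. */
static bool sketch_tb_fpcr_ah(uint64_t flags)
{
    return (flags >> SKETCH_TBFLAG_AH_SHIFT) & 1;
}
```

Two FPCR values that differ only in AH produce different flag words, so code
translated under one setting is never reused under the other.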
+ target/arm/cpu.h               | 1 +
+ target/arm/tcg/translate.h     | 2 ++
+ target/arm/helper.c            | 2 +-
+ target/arm/tcg/hflags.c        | 4 ++++
+ target/arm/tcg/translate-a64.c | 1 +
+ 5 files changed, 9 insertions(+), 1 deletion(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2, 34, 1)
+ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
+ /* Set if FEAT_NV2 RAM accesses are big-endian */
+ FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
++FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
+ /*
+  * Helpers for using the above. Note that only the A64 accessors use
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate.h
++++ b/target/arm/tcg/translate.h
+@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
+     bool nv2_mem_e20;
+     /* True if NV2 enabled and NV2 RAM accesses are big-endian */
+     bool nv2_mem_be;
++    /* True if FPCR.AH is 1 (alternate floating point handling) */
++    bool fpcr_ah;
+     /*
+      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
+      *  < 0, set by the current instruction.
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
+       .writefn = aa64_daif_write, .resetfn = arm_cp_reset_ignore },
+     { .name = "FPCR", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 3, .opc2 = 0, .crn = 4, .crm = 4,
+-      .access = PL0_RW, .type = ARM_CP_FPU | ARM_CP_SUPPRESS_TB_END,
++      .access = PL0_RW, .type = ARM_CP_FPU,
+       .readfn = aa64_fpcr_read, .writefn = aa64_fpcr_write },
+     { .name = "FPSR", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 3, .opc2 = 1, .crn = 4, .crm = 4,
+diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/hflags.c
++++ b/target/arm/tcg/hflags.c
+@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+         DP_TBFLAG_A64(flags, TCMA, aa64_va_parameter_tcma(tcr, mmu_idx));
+     }
++    if (env->vfp.fpcr & FPCR_AH) {
++        DP_TBFLAG_A64(flags, AH, 1);
++    }
++
+     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+ }
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
+     dc->nv2 = EX_TBFLAG_A64(tb_flags, NV2);
+     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
+     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
++    dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
+     dc->vec_len = 0;
+     dc->vec_stride = 0;
+     dc->cp_regs = arm_cpu->cp_regs;
+--
+.34.1

-New patch
+[PULL 10/68] target/arm: Set up float_status to use for FPCR.AH=1 behaviour
+When FPCR.AH is 1, the behaviour of some instructions changes:
  * AdvSIMD BFCVT, BFCVTN, BFCVTN2, BFMLALB, BFMLALT
  * SVE BFCVT, BFCVTNT, BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
  * SME BFCVT, BFCVTN, BFMLAL, BFMLSL (these are all in SME2 which
    QEMU does not yet implement)
  * FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
 The behaviour change is:
  * the instructions do not update the FPSR cumulative exception flags
  * trapped floating point exceptions are disabled (a no-op for QEMU,
    which doesn't implement FPCR.{IDE,IXE,UFE,OFE,DZE,IOE})
  * rounding is always round-to-nearest-even regardless of FPCR.RMode
  * denormalized inputs and outputs are always flushed to zero, as if
    FPCR.{FZ,FIZ} is {1,1}
  * FPCR.FZ16 is still honoured for half-precision inputs
 (See the Arm ARM DDI0487L.a section A1.5.9.)
 We can provide all these behaviours with another pair of float_status fields
 which we use only for these insns, when FPCR.AH is 1. These float_status
 fields will always have:
  * flush_to_zero and flush_inputs_to_zero set for the non-F16 field
  * rounding mode set to round-to-nearest-even
 and so the only FPCR fields they need to honour are DN and FZ16.
 In this commit we only define the new fp_status fields and give them
 the required behaviour when FPCR is updated.  In subsequent commits
 we will arrange to use these new fp_status fields for the instructions
 that should be affected by FPCR.AH in this way.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
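Note (illustration only, not part of the patch): the fixed and FPCR-driven
parts of the two new status fields can be summarised in one function. The
struct and function below are hypothetical simplifications of float_status
and of the relevant parts of reset plus vfp_set_fpcr_to_host(); the FPCR bit
positions for FZ16, RMode and DN are the architectural ones and are assumed
here rather than shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCR_FZ16       (1u << 19)
#define FPCR_RMODE_MASK (3u << 22)
#define FPCR_DN         (1u << 25)

enum { SKETCH_ROUND_NEAREST_EVEN = 0 };

/* Hypothetical cut-down stand-in for softfloat's float_status. */
struct sketch_fp_status {
    bool flush_to_zero;
    bool flush_inputs_to_zero;
    bool default_nan_mode;
    int rounding;
};

static void sketch_sync_ah_status(uint32_t fpcr,
                                  struct sketch_fp_status *ah,
                                  struct sketch_fp_status *ah_f16)
{
    bool fz16 = (fpcr & FPCR_FZ16) != 0;
    bool dn = (fpcr & FPCR_DN) != 0;

    /* Non-F16 field: always behaves as if FPCR.{FZ,FIZ} == {1,1} */
    ah->flush_to_zero = true;
    ah->flush_inputs_to_zero = true;
    /* F16 field: flushing still follows FPCR.FZ16 */
    ah_f16->flush_to_zero = fz16;
    ah_f16->flush_inputs_to_zero = fz16;
    /* FPCR.RMode is ignored: always round-to-nearest-even */
    ah->rounding = SKETCH_ROUND_NEAREST_EVEN;
    ah_f16->rounding = SKETCH_ROUND_NEAREST_EVEN;
    /* FPCR.DN is honoured by both fields */
    ah->default_nan_mode = dn;
    ah_f16->default_nan_mode = dn;
}
```

The exception flags that accumulate in these two fields are deliberately
never merged into FPSR, which is how the "do not update the cumulative
exception flags" requirement is met.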
  target/arm/cpu.h           | 15 +++++++++++++++
  target/arm/internals.h     |  2 ++
  target/arm/tcg/translate.h | 14 ++++++++++++++
  target/arm/cpu.c           |  4 ++++
  target/arm/vfp_helper.c    | 13 ++++++++++++-
  5 files changed, 47 insertions(+), 1 deletion(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
           *  standard_fp_status : the ARM "Standard FPSCR Value"
           *  standard_fp_status_fp16 : used for half-precision
           *       calculations with the ARM "Standard FPSCR Value"
 +         *  ah_fp_status: used for the A64 insns which change behaviour
 +         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 +         *       and the reciprocal and square root estimate/step insns)
 +         *  ah_fp_status_f16: used for the A64 insns which change behaviour
 +         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 +         *       and the reciprocal and square root estimate/step insns);
 +         *       for half-precision
           *
           * Half-precision operations are governed by a separate
           * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
           * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
           * using a fixed value for it.
           *
 +         * The ah_fp_status is needed because some insns have different
 +         * behaviour when FPCR.AH == 1: they don't update cumulative
 +         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
 +         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 +         * which means we need an ah_fp_status_f16 as well.
 +         *
           * To avoid having to transfer exception bits around, we simply
           * say that the FPSCR cumulative exception flags are the logical
           * OR of the flags in the four fp statuses. This relies on the
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          float_status fp_status_f16_a64;
          float_status standard_fp_status;
          float_status standard_fp_status_f16;
 +        float_status ah_fp_status;
 +        float_status ah_fp_status_f16;
          uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
          uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/internals.h
 +++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ int alle1_tlbmask(CPUARMState *env);
  /* Set the float_status behaviour to match the Arm defaults */
  void arm_set_default_fp_behaviours(float_status *s);
 +/* Set the float_status behaviour to match Arm FPCR.AH=1 behaviour */
 +void arm_set_ah_fp_behaviours(float_status *s);
  #endif
 diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate.h
 +++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
      FPST_A64,
      FPST_A32_F16,
      FPST_A64_F16,
 +    FPST_AH,
 +    FPST_AH_F16,
      FPST_STD,
      FPST_STD_F16,
  } ARMFPStatusFlavour;
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
   *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
   * FPST_A64_F16
   *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
 + * FPST_AH:
 + *   for AArch64 operations which change behaviour when AH=1 (specifically,
 + *   bfloat16 conversions and multiplies, and the reciprocal and square root
 + *   estimate/step insns)
 + * FPST_AH_F16:
 + *   ditto, but for half-precision operations
   * FPST_STD
   *   for A32/T32 Neon operations using the "standard FPSCR value"
   * FPST_STD_F16
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
      case FPST_A64_F16:
          offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
          break;
 +    case FPST_AH:
 +        offset = offsetof(CPUARMState, vfp.ah_fp_status);
 +        break;
 +    case FPST_AH_F16:
 +        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
 +        break;
      case FPST_STD:
          offset = offsetof(CPUARMState, vfp.standard_fp_status);
          break;
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
      arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
 +    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
 +    set_flush_to_zero(1, &env->vfp.ah_fp_status);
 +    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
 +    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
  #ifndef CONFIG_USER_ONLY
      if (kvm_enabled()) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void arm_set_default_fp_behaviours(float_status *s)
   *    set Invalid for a QNaN
   *  * default NaN has sign bit set, msb frac bit set
   */
 -static void arm_set_ah_fp_behaviours(float_status *s)
 +void arm_set_ah_fp_behaviours(float_status *s)
  {
      set_float_detect_tininess(float_tininess_after_rounding, s);
      set_float_ftz_detection(float_ftz_after_rounding, s);
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
      a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
 +    /*
 +     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
 +     * they are used for insns that must not set the cumulative exception bits.
 +     */
 +
      /*
       * Flushing an input denormal *only* because FPCR.FIZ == 1 does
       * not set FPSR.IDC; if FPCR.FZ is also set then this takes
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
      set_float_exception_flags(0, &env->vfp.standard_fp_status);
      set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 +    set_float_exception_flags(0, &env->vfp.ah_fp_status);
 +    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
  }
  static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
          set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
 +        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
      }
      if (changed & FPCR_FZ) {
          bool ftz_enabled = val & FPCR_FZ;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
      }
      if (changed & FPCR_AH) {
          bool ah_enabled = val & FPCR_AH;
 --
 .34.1

-New patch
+[PULL 11/68] target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
+For the instructions FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, use
+FPST_AH or FPST_AH_F16 when FPCR.AH is 1, so that they get
+the required behaviour changes.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.h |  13 ++++
+ target/arm/tcg/translate-a64.c | 119 +++++++++++++++++++++++++--------
+ target/arm/tcg/translate-sve.c |  30 ++++++---
+ 3 files changed, 127 insertions(+), 35 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.h
++++ b/target/arm/tcg/translate-a64.h
+@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr pred_full_reg_ptr(DisasContext *s, int regno)
+     return ret;
+ }
++/*
++ * Return the ARMFPStatusFlavour to use based on element size and
++ * whether FPCR.AH is set.
++ */
++static inline ARMFPStatusFlavour select_ah_fpst(DisasContext *s, MemOp esz)
++{
++    if (s->fpcr_ah) {
++        return esz == MO_16 ? FPST_AH_F16 : FPST_AH;
++    } else {
++        return esz == MO_16 ? FPST_A64_F16 : FPST_A64;
++    }
++}
++
+ bool disas_sve(DisasContext *, uint32_t);
+ bool disas_sme(DisasContext *, uint32_t);
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op3_ool(DisasContext *s, bool is_q, int rd,
+  * an out-of-line helper.
+  */
+ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
+-                              int rm, bool is_fp16, int data,
++                              int rm, ARMFPStatusFlavour fpsttype, int data,
+                               gen_helper_gvec_3_ptr *fn)
+ {
+-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
++    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
+     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+                        vec_full_reg_offset(s, rn),
+                        vec_full_reg_offset(s, rm), fpst,
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
+     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
+ } FPScalar;
+-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
++static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
++                                        const FPScalar *f,
++                                        ARMFPStatusFlavour fpsttype)
+ {
+     switch (a->esz) {
+     case MO_64:
+         if (fp_access_check(s)) {
+             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
+             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
+-            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
++            f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
+             write_fp_dreg(s, a->rd, t0);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+         if (fp_access_check(s)) {
+             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
+             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
+-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
++            f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
+             write_fp_sreg(s, a->rd, t0);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+         if (fp_access_check(s)) {
+             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
+             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
+-            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
++            f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
+             write_fp_sreg(s, a->rd, t0);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+     return true;
+ }
++static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
++{
++    return do_fp3_scalar_with_fpsttype(s, a, f,
++                                       a->esz == MO_16 ?
++                                       FPST_A64_F16 : FPST_A64);
++}
++
++static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
++{
++    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
++}
++
+ static const FPScalar f_scalar_fadd = {
+     gen_helper_vfp_addh,
+     gen_helper_vfp_adds,
+@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
+     gen_helper_recpsf_f32,
+     gen_helper_recpsf_f64,
+ };
+-TRANS(FRECPS_s, do_fp3_scalar, a, &f_scalar_frecps)
++TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
+ static const FPScalar f_scalar_frsqrts = {
+     gen_helper_rsqrtsf_f16,
+     gen_helper_rsqrtsf_f32,
+     gen_helper_rsqrtsf_f64,
+ };
+-TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
++TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
+ static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
+                        const FPScalar *f, bool swap)
+@@ -XXX,XX +XXX,XX @@ TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
+ TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
+ TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
+-static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+-                          gen_helper_gvec_3_ptr * const fns[3])
++static bool do_fp3_vector_with_fpsttype(DisasContext *s, arg_qrrr_e *a,
++                                        int data,
++                                        gen_helper_gvec_3_ptr * const fns[3],
++                                        ARMFPStatusFlavour fpsttype)
+ {
+     MemOp esz = a->esz;
+     int check = fp_access_check_vector_hsd(s, a->q, esz);
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+         return check == 0;
+     }
+-    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
+-                      esz == MO_16, data, fns[esz - 1]);
++    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, fpsttype,
++                      data, fns[esz - 1]);
+     return true;
+ }
++static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
++                          gen_helper_gvec_3_ptr * const fns[3])
++{
++    return do_fp3_vector_with_fpsttype(s, a, data, fns,
++                                       a->esz == MO_16 ?
++                                       FPST_A64_F16 : FPST_A64);
++}
++
++static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
++                             gen_helper_gvec_3_ptr * const f[3])
++{
++    return do_fp3_vector_with_fpsttype(s, a, data, f,
++                                       select_ah_fpst(s, a->esz));
++}
++
+ static gen_helper_gvec_3_ptr * const f_vector_fadd[3] = {
+     gen_helper_gvec_fadd_h,
+     gen_helper_gvec_fadd_s,
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
+     gen_helper_gvec_recps_s,
+     gen_helper_gvec_recps_d,
+ };
+-TRANS(FRECPS_v, do_fp3_vector, a, 0, f_vector_frecps)
++TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
+ static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
+     gen_helper_gvec_rsqrts_h,
+     gen_helper_gvec_rsqrts_s,
+     gen_helper_gvec_rsqrts_d,
+ };
+-TRANS(FRSQRTS_v, do_fp3_vector, a, 0, f_vector_frsqrts)
++TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
+ static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
+     gen_helper_gvec_faddp_h,
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
+     }
+     gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
+-                      esz == MO_16, a->idx, fns[esz - 1]);
++                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
++                      a->idx, fns[esz - 1]);
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1 {
+     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_ptr);
+ } FPScalar1;
+-static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+-                          const FPScalar1 *f, int rmode)
++static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
++                                        const FPScalar1 *f, int rmode,
++                                        ARMFPStatusFlavour fpsttype)
+ {
+     TCGv_i32 tcg_rmode = NULL;
+     TCGv_ptr fpst;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+         return check == 0;
+     }
+-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
++    fpst = fpstatus_ptr(fpsttype);
+     if (rmode >= 0) {
+         tcg_rmode = gen_set_rmode(rmode, fpst);
+     }
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+     return true;
+ }
++static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
++                          const FPScalar1 *f, int rmode)
++{
++    return do_fp1_scalar_with_fpsttype(s, a, f, rmode,
++                                       a->esz == MO_16 ?
++                                       FPST_A64_F16 : FPST_A64);
++}
++
++static bool do_fp1_scalar_ah(DisasContext *s, arg_rr_e *a,
++                             const FPScalar1 *f, int rmode)
++{
++    return do_fp1_scalar_with_fpsttype(s, a, f, rmode, select_ah_fpst(s, a->esz));
++}
++
+ static const FPScalar1 f_scalar_fsqrt = {
+     gen_helper_vfp_sqrth,
+     gen_helper_vfp_sqrts,
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
+     gen_helper_recpe_f32,
+     gen_helper_recpe_f64,
+ };
+-TRANS(FRECPE_s, do_fp1_scalar, a, &f_scalar_frecpe, -1)
++TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
+ static const FPScalar1 f_scalar_frecpx = {
+     gen_helper_frecpx_f16,
+     gen_helper_frecpx_f32,
+     gen_helper_frecpx_f64,
+ };
+-TRANS(FRECPX_s, do_fp1_scalar, a, &f_scalar_frecpx, -1)
++TRANS(FRECPX_s, do_fp1_scalar_ah, a, &f_scalar_frecpx, -1)
+ static const FPScalar1 f_scalar_frsqrte = {
+     gen_helper_rsqrte_f16,
+     gen_helper_rsqrte_f32,
+     gen_helper_rsqrte_f64,
+ };
+-TRANS(FRSQRTE_s, do_fp1_scalar, a, &f_scalar_frsqrte, -1)
++TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
+ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
+ {
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINT64Z_v, aa64_frint, do_fp1_vector, a,
+            &f_scalar_frint64, FPROUNDING_ZERO)
+ TRANS_FEAT(FRINT64X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, -1)
+-static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+-                             int rd, int rn, int data,
+-                             gen_helper_gvec_2_ptr * const fns[3])
++static bool do_gvec_op2_fpst_with_fpsttype(DisasContext *s, MemOp esz,
++                                           bool is_q, int rd, int rn, int data,
++                                           gen_helper_gvec_2_ptr * const fns[3],
++                                           ARMFPStatusFlavour fpsttype)
+ {
+     int check = fp_access_check_vector_hsd(s, is_q, esz);
+     TCGv_ptr fpst;
+@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+         return check == 0;
+     }
+-    fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
++    fpst = fpstatus_ptr(fpsttype);
+     tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
+                        vec_full_reg_offset(s, rn), fpst,
+                        is_q ? 16 : 8, vec_full_reg_size(s),
+@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+     return true;
+ }
++static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
++                             int rd, int rn, int data,
++                             gen_helper_gvec_2_ptr * const fns[3])
++{
++    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data, fns,
++                                          esz == MO_16 ? FPST_A64_F16 :
++                                          FPST_A64);
++}
++
++static bool do_gvec_op2_ah_fpst(DisasContext *s, MemOp esz, bool is_q,
++                                int rd, int rn, int data,
++                                gen_helper_gvec_2_ptr * const fns[3])
++{
++    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data,
++                                          fns, select_ah_fpst(s, esz));
++}
++
+ static gen_helper_gvec_2_ptr * const f_scvtf_v[] = {
+     gen_helper_gvec_vcvt_sh,
+     gen_helper_gvec_vcvt_sf,
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
+     gen_helper_gvec_frecpe_s,
+     gen_helper_gvec_frecpe_d,
+ };
+-TRANS(FRECPE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
++TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+ static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
+     gen_helper_gvec_frsqrte_h,
+     gen_helper_gvec_frsqrte_s,
+     gen_helper_gvec_frsqrte_d,
+ };
+-TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
++TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+ static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
+ {
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
+     return true;
+ }
+-static bool gen_gvec_fpst_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
+-                                 arg_rr_esz *a, int data)
++static bool gen_gvec_fpst_ah_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
++                                    arg_rr_esz *a, int data)
+ {
+     return gen_gvec_fpst_zz(s, fn, a->rd, a->rn, data,
+-                            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
++                            select_ah_fpst(s, a->esz));
+ }
+ /* Invoke an out-of-line helper on 3 Zregs. */
+@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
+                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+ }
++static bool gen_gvec_fpst_ah_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
++                                     arg_rrr_esz *a, int data)
++{
++    return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
++                             select_ah_fpst(s, a->esz));
++}
++
+ /* Invoke an out-of-line helper on 4 Zregs. */
+ static bool gen_gvec_ool_zzzz(DisasContext *s, gen_helper_gvec_4 *fn,
+                               int rd, int rn, int rm, int ra, int data)
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
+     NULL,                     gen_helper_gvec_frecpe_h,
+     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
+ };
+-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_arg_zz, frecpe_fns[a->esz], a, 0)
++TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
+ static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
+     NULL,                      gen_helper_gvec_frsqrte_h,
+     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
+ };
+-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_arg_zz, frsqrte_fns[a->esz], a, 0)
++TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
+ /*
+  *** SVE Floating Point Compare with Zero Group
+@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
+     };                                                              \
+     TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_arg_zzz, name##_fns[a->esz], a, 0)
++#define DO_FP3_AH(NAME, name) \
++    static gen_helper_gvec_3_ptr * const name##_fns[4] = {          \
++        NULL, gen_helper_gvec_##name##_h,                           \
++        gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
++    };                                                              \
++    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
++
+ DO_FP3(FADD_zzz, fadd)
+ DO_FP3(FSUB_zzz, fsub)
+ DO_FP3(FMUL_zzz, fmul)
+-DO_FP3(FRECPS, recps)
+-DO_FP3(FRSQRTS, rsqrts)
++DO_FP3_AH(FRECPS, recps)
++DO_FP3_AH(FRSQRTS, rsqrts)
+ #undef DO_FP3
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const frecpx_fns[] = {
+     gen_helper_sve_frecpx_s, gen_helper_sve_frecpx_d,
+ };
+ TRANS_FEAT(FRECPX, aa64_sve, gen_gvec_fpst_arg_zpz, frecpx_fns[a->esz],
+-           a, 0, a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
++           a, 0, select_ah_fpst(s, a->esz))
+ static gen_helper_gvec_3_ptr * const fsqrt_fns[] = {
+     NULL,                   gen_helper_sve_fsqrt_h,
+--
+.34.1
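
Note on the `_ah` wrappers in the patch above: they all defer to select_ah_fpst(), whose definition is not part of the quoted hunks. The following is a minimal standalone sketch of the selection those wrappers appear to rely on. The enum values, the `FPST_AH_F16` name, and the one-field `DisasContext` are stand-ins assumed for illustration, not QEMU's real definitions.

```c
#include <stdbool.h>

/* Stand-ins for QEMU's types; only names used in the patch are mirrored. */
typedef enum { MO_16 = 1, MO_32 = 2, MO_64 = 3 } MemOp;

typedef enum {
    FPST_A64,
    FPST_A64_F16,
    FPST_AH,
    FPST_AH_F16,    /* assumed name for the half-precision FPCR.AH flavour */
} ARMFPStatusFlavour;

typedef struct {
    bool fpcr_ah;   /* mirrors DisasContext.fpcr_ah */
} DisasContext;

/* Choice made by the plain wrappers: depends on the element size only. */
static ARMFPStatusFlavour default_fpst(MemOp esz)
{
    return esz == MO_16 ? FPST_A64_F16 : FPST_A64;
}

/* Hypothetical model of select_ah_fpst(): additionally consults FPCR.AH. */
static ARMFPStatusFlavour select_ah_fpst(const DisasContext *s, MemOp esz)
{
    if (s->fpcr_ah) {
        return esz == MO_16 ? FPST_AH_F16 : FPST_AH;
    }
    return default_fpst(esz);
}
```

With FPCR.AH clear, the `_ah` wrappers and the plain wrappers pick the same flavour in this model, which is why only the insns that change behaviour under FPCR.AH need to switch wrappers.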

-New patch
+[PULL 12/68] target/arm: Use FPST_FPCR_AH for BFCVT* insns
+When FPCR.AH is 1, use FPST_FPCR_AH for:
+ * AdvSIMD BFCVT, BFCVTN, BFCVTN2
+ * SVE BFCVT, BFCVTNT
+so that they get the required behaviour changes.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
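
Note on the AdvSIMD BFCVTN change: the function table gains a leading dimension indexed by FPCR.AH, so the choice is made once at translate time. The sketch below models that lookup in isolation. `NarrowFn`, `narrow_default`, `narrow_ah` and `pick_narrow_fn` are hypothetical stand-ins, and the column is assumed to be indexed by element size with only the 16-bit slot populated.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { FPST_A64, FPST_AH } Flavour;  /* stand-in flavour enum */
typedef Flavour NarrowFn(void);              /* stand-in for ArithOneOp */

/* Each stand-in reports which float_status flavour it would use. */
static Flavour narrow_default(void) { return FPST_A64; }
static Flavour narrow_ah(void) { return FPST_AH; }

/* Row: FPCR.AH (0 or 1). Column: element size; only index 1 is valid. */
static NarrowFn * const narrow_fns[2][3] = {
    { NULL, narrow_default, NULL },
    { NULL, narrow_ah, NULL },
};

/* Returns NULL for element sizes that have no narrowing function. */
static NarrowFn *pick_narrow_fn(bool fpcr_ah, int esz)
{
    if (esz < 0 || esz > 2) {
        return NULL;
    }
    return narrow_fns[fpcr_ah ? 1 : 0][esz];
}
```

Keeping both rows the same shape means the consumer of the table needs no knowledge of FPCR.AH; it only ever sees a one-dimensional array.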
+ target/arm/tcg/translate-a64.c | 27 +++++++++++++++++++++------
+ target/arm/tcg/translate-sve.c |  6 ++++--
+files changed, 25 insertions(+), 8 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
+ static const FPScalar1 f_scalar_bfcvt = {
+     .gen_s = gen_helper_bfcvt,
+ };
+-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar, a, &f_scalar_bfcvt, -1)
++TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
+ static const FPScalar1 f_scalar_frint32 = {
+     NULL,
+@@ -XXX,XX +XXX,XX @@ static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n)
+     tcg_gen_extu_i32_i64(d, tmp);
+ }
+-static ArithOneOp * const f_vector_bfcvtn[] = {
+-    NULL,
+-    gen_bfcvtn_hs,
+-    NULL,
++static void gen_bfcvtn_ah_hs(TCGv_i64 d, TCGv_i64 n)
++{
++    TCGv_ptr fpst = fpstatus_ptr(FPST_AH);
++    TCGv_i32 tmp = tcg_temp_new_i32();
++    gen_helper_bfcvt_pair(tmp, n, fpst);
++    tcg_gen_extu_i32_i64(d, tmp);
++}
++
++static ArithOneOp * const f_vector_bfcvtn[2][3] = {
++    {
++        NULL,
++        gen_bfcvtn_hs,
++        NULL,
++    }, {
++        NULL,
++        gen_bfcvtn_ah_hs,
++        NULL,
++    }
+ };
+-TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a, f_vector_bfcvtn)
++TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a,
++           f_vector_bfcvtn[s->fpcr_ah])
+ static bool trans_SHLL_v(DisasContext *s, arg_qrr_e *a)
+ {
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve_fcvt_hs, a, 0, FPST_A64_F16)
+ TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
+-           gen_helper_sve_bfcvt, a, 0, FPST_A64)
++           gen_helper_sve_bfcvt, a, 0,
++           s->fpcr_ah ? FPST_AH : FPST_A64)
+ TRANS_FEAT(FCVT_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve_fcvt_dh, a, 0, FPST_A64)
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVTNT_ds, aa64_sve2, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve2_fcvtnt_ds, a, 0, FPST_A64)
+ TRANS_FEAT(BFCVTNT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
+-           gen_helper_sve_bfcvtnt, a, 0, FPST_A64)
++           gen_helper_sve_bfcvtnt, a, 0,
++           s->fpcr_ah ? FPST_AH : FPST_A64)
+ TRANS_FEAT(FCVTLT_hs, aa64_sve2, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve2_fcvtlt_hs, a, 0, FPST_A64)
+--
+.34.1

-New patch
+[PULL 13/68] target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
+When FPCR.AH is 1, use FPST_FPCR_AH for:
+ * AdvSIMD BFMLALB, BFMLALT
+ * SVE BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
+so that they get the required behaviour changes.
+We do this by making gen_gvec_op4_fpst() take an ARMFPStatusFlavour
+rather than a bool is_fp16; existing callsites now select
+FPST_A64_F16 vs FPST_A64 themselves rather than passing in
+the boolean.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
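
Note on the indexed BFMLAL call sites: they pass `(a->idx << 1) | a->q` as the single `data` argument, so the low bit carries the BFMLALB/BFMLALT selector and the remaining bits carry the element index. A small sketch of that packing with a matching decode follows. The function names are hypothetical and the decode side is an assumption for illustration; it does not claim to reproduce the out-of-line helper.

```c
#include <stdbool.h>

/* Pack the element index and the bottom/top selector into one integer. */
static int pack_bfmlal_idx_data(int idx, bool top)
{
    return (idx << 1) | (top ? 1 : 0);
}

/* Low bit: false selects the "B" (bottom) form, true the "T" (top) form. */
static bool unpack_top(int data)
{
    return (data & 1) != 0;
}

/* Remaining bits: the element index. */
static int unpack_idx(int data)
{
    return data >> 1;
}
```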
+ target/arm/tcg/translate-a64.c | 20 +++++++++++++-------
+ target/arm/tcg/translate-sve.c |  6 ++++--
+files changed, 17 insertions(+), 9 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_env(DisasContext *s, bool is_q, int rd, int rn,
+  * an out-of-line helper.
+  */
+ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
+-                              int rm, int ra, bool is_fp16, int data,
++                              int rm, int ra, ARMFPStatusFlavour fpsttype,
++                              int data,
+                               gen_helper_gvec_4_ptr *fn)
+ {
+-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
++    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
+     tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
+                        vec_full_reg_offset(s, rn),
+                        vec_full_reg_offset(s, rm),
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
+     }
+     if (fp_access_check(s)) {
+         /* Q bit selects BFMLALB vs BFMLALT. */
+-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, false, a->q,
++        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
++                          s->fpcr_ah ? FPST_AH : FPST_A64, a->q,
+                           gen_helper_gvec_bfmlal);
+     }
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
+     }
+     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+-                      a->esz == MO_16, a->rot, fn[a->esz]);
++                      a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
++                      a->rot, fn[a->esz]);
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
+     }
+     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+-                      esz == MO_16, (a->idx << 1) | neg,
++                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
++                      (a->idx << 1) | neg,
+                       fns[esz - 1]);
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
+     }
+     if (fp_access_check(s)) {
+         /* Q bit selects BFMLALB vs BFMLALT. */
+-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, 0,
++        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
++                          s->fpcr_ah ? FPST_AH : FPST_A64,
+                           (a->idx << 1) | a->q,
+                           gen_helper_gvec_bfmlal_idx);
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
+     }
+     if (fp_access_check(s)) {
+         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+-                          a->esz == MO_16, (a->idx << 2) | a->rot, fn);
++                          a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
++                          (a->idx << 2) | a->rot, fn);
+     }
+     return true;
+ }
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_env_arg_zzzz,
+ static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
+ {
+     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal,
+-                              a->rd, a->rn, a->rm, a->ra, sel, FPST_A64);
++                              a->rd, a->rn, a->rm, a->ra, sel,
++                              s->fpcr_ah ? FPST_AH : FPST_A64);
+ }
+ TRANS_FEAT(BFMLALB_zzzw, aa64_sve_bf16, do_BFMLAL_zzzw, a, false)
+@@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
+ {
+     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal_idx,
+                               a->rd, a->rn, a->rm, a->ra,
+-                              (a->index << 1) | sel, FPST_A64);
++                              (a->index << 1) | sel,
++                              s->fpcr_ah ? FPST_AH : FPST_A64);
+ }
+ TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
+--
+.34.1

-New patch
+[PULL 14/68] target/arm: Add FPCR.NEP to TBFLAGS
+For FEAT_AFP, we want to emit different code when FPCR.NEP is set, so
+that instead of zeroing the high elements of a vector register when
+we write the output of a scalar operation to it, we instead merge in
+those elements from one of the source registers.  Since this affects
+the generated code, we need to put FPCR.NEP into the TBFLAGS.
+FPCR.NEP is treated as 0 when in streaming SVE mode and FEAT_SME_FA64
+is not implemented or not enabled; we can implement this logic in
+rebuild_hflags_a64().
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
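
Note on the NEP flag computation: the condition in rebuild_hflags_a64() reduces to a three-input predicate. The sketch below states it as a pure function so the cases can be checked by hand. The function and parameter names are hypothetical; `fa64_enabled` stands for "FEAT_SME_FA64 is implemented and enabled at this EL".

```c
#include <stdbool.h>

/*
 * Model of the TBFLAG computation: NEP takes effect only when FPCR.NEP
 * is set and we are not in streaming SVE mode without usable FA64.
 */
static bool effective_nep(bool fpcr_nep, bool pstate_sm, bool fa64_enabled)
{
    if (!fpcr_nep) {
        return false;
    }
    return !(pstate_sm && !fa64_enabled);
}
```

Only one of the four combinations with FPCR.NEP set yields false: streaming mode with FA64 unavailable.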
+ target/arm/cpu.h               | 1 +
+ target/arm/tcg/translate.h     | 2 ++
+ target/arm/tcg/hflags.c        | 9 +++++++++
+ target/arm/tcg/translate-a64.c | 1 +
+files changed, 13 insertions(+)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
+ /* Set if FEAT_NV2 RAM accesses are big-endian */
+ FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
+ FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
++FIELD(TBFLAG_A64, NEP, 38, 1)   /* FPCR.NEP */
+ /*
+  * Helpers for using the above. Note that only the A64 accessors use
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate.h
++++ b/target/arm/tcg/translate.h
+@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
+     bool nv2_mem_be;
+     /* True if FPCR.AH is 1 (alternate floating point handling) */
+     bool fpcr_ah;
++    /* True if FPCR.NEP is 1 (FEAT_AFP scalar upper-element result handling) */
++    bool fpcr_nep;
+     /*
+      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
+      *  < 0, set by the current instruction.
+diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/hflags.c
++++ b/target/arm/tcg/hflags.c
+@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+     if (env->vfp.fpcr & FPCR_AH) {
+         DP_TBFLAG_A64(flags, AH, 1);
+     }
++    if (env->vfp.fpcr & FPCR_NEP) {
++        /*
++         * In streaming-SVE without FA64, NEP behaves as if zero;
++         * compare pseudocode IsMerging()
++         */
++        if (!(EX_TBFLAG_A64(flags, PSTATE_SM) && !sme_fa64(env, el))) {
++            DP_TBFLAG_A64(flags, NEP, 1);
++        }
++    }
+     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+ }
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
+     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
+     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
+     dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
++    dc->fpcr_nep = EX_TBFLAG_A64(tb_flags, NEP);
+     dc->vec_len = 0;
+     dc->vec_stride = 0;
+     dc->cp_regs = arm_cpu->cp_regs;
+--
+.34.1

-[Qemu-devel] [PULL 06/28] hw/intc/arm_gicv3: Introduce redist-region-count array property
+[PULL 15/68] target/arm: Define and use new write_fp_*reg_merging() functions
-From: Eric Auger <eric.auger@redhat.com>
+For FEAT_AFP's FPCR.NEP bit, we need to programmatically change the
+behaviour of the writeback of the result for most SIMD scalar
-To prepare for multiple redistributor regions, we introduce
+operations, so that instead of zeroing the upper part of the result
-an array of uint32_t properties that stores the redistributor
+register it merges the upper elements from one of the input
-count of each redistributor region.
+registers.
-Non accelerated VGICv3 only supports a single redistributor region.
+Provide new functions write_fp_*reg_merging() which can be used
-The capacity of all redist regions is checked against the number of
+instead of the existing write_fp_*reg() functions when we want this
-vcpus.
+"merge the result with one of the input registers if FPCR.NEP is
+enabled" handling, and use them in do_fp3_scalar_with_fpsttype().
-Machvirt is updated to set those properties, ie. a single
-redistributor region with count set to the number of vcpus
+Note that (as documented in the description of the FPCR.NEP bit)
-capped by 123.
+which input register to use as the merge source varies by
+instruction: for these 2-input scalar operations, the comparison
-Signed-off-by: Eric Auger <eric.auger@redhat.com>
+instructions take from Rm, not Rn.
-Reviewed-by: Andrew Jones <drjones@redhat.com>
-Message-id: 1529072910-16156-4-git-send-email-eric.auger@redhat.com
+We'll extend this to also provide the merging behaviour for
 the remaining scalar insns in subsequent commits.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
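
Note on the merging writeback: the rule described in the commit message can be modelled on a plain 128-bit register file. This is a sketch under stated assumptions: `VReg` and both function names are hypothetical, lane 0 is the low 64 bits, and SVE bits above 128 are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for one 128-bit vector register: d[0] is the low 64 bits. */
typedef struct {
    uint64_t d[2];
} VReg;

/* Write a 64-bit scalar result, merging high lanes when nep is set. */
static void write_dreg_merging(bool nep, VReg *regs, int reg, int mergereg,
                               uint64_t v)
{
    if (!nep) {
        regs[reg].d[0] = v;
        regs[reg].d[1] = 0;          /* NEP == 0: high element is zeroed */
        return;
    }
    regs[reg] = regs[mergereg];      /* take every lane from the source */
    regs[reg].d[0] = v;              /* then overwrite the low element */
}

/* Write a 32-bit scalar result; bits [127:32] follow the same rule. */
static void write_sreg_merging(bool nep, VReg *regs, int reg, int mergereg,
                               uint32_t v)
{
    if (!nep) {
        regs[reg].d[0] = v;
        regs[reg].d[1] = 0;
        return;
    }
    regs[reg] = regs[mergereg];
    regs[reg].d[0] = (regs[reg].d[0] & ~0xffffffffull) | v;
}
```

The result value is passed in already computed, so copying the merge source first and storing the low element second is safe even when `reg` is also one of the operation's inputs.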
- include/hw/intc/arm_gicv3_common.h |  8 +++++--
+ target/arm/tcg/translate-a64.c | 117 +++++++++++++++++++++++++--------
- hw/arm/virt.c                      | 11 ++++++++-
+file changed, 91 insertions(+), 26 deletions(-)
- hw/intc/arm_gicv3.c                | 12 +++++++++-
- hw/intc/arm_gicv3_common.c         | 38 ++++++++++++++++++++++++++----
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
  hw/intc/arm_gicv3_kvm.c            |  9 +++++--
 files changed, 67 insertions(+), 11 deletions(-)
 diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/intc/arm_gicv3_common.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/include/hw/intc/arm_gicv3_common.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
- #define GICV3_MAXIRQ 1020
+     write_fp_dreg(s, reg, tmp);
- #define GICV3_MAXSPI (GICV3_MAXIRQ - GIC_INTERNAL)
+ }
-+#define GICV3_REDIST_SIZE 0x20000
++/*
-+
++ * Write a double result to 128 bit vector register reg, honouring FPCR.NEP:
- /* Number of SGI target-list bits */
++ * - if FPCR.NEP == 0, clear the high elements of reg
- #define GICV3_TARGETLIST_BITS 16
++ * - if FPCR.NEP == 1, set the high elements of reg from mergereg
++ *   (i.e. merge the result with those high elements)
-@@ -XXX,XX +XXX,XX @@ struct GICv3State {
++ * In either case, SVE register bits above 128 are zeroed (per R_WKYLB).
-     /*< public >*/
++ */
++static void write_fp_dreg_merging(DisasContext *s, int reg, int mergereg,
-     MemoryRegion iomem_dist; /* Distributor */
++                                  TCGv_i64 v)
--    MemoryRegion iomem_redist; /* Redistributors */
++{
-+    MemoryRegion *iomem_redist; /* Redistributor Regions */
++    if (!s->fpcr_nep) {
-+    uint32_t *redist_region_count; /* redistributor count within each region */
++        write_fp_dreg(s, reg, v);
 +    uint32_t nb_redist_regions; /* number of redist regions */
      uint32_t num_cpu;
      uint32_t num_irq;
@@ -XXX,XX +XXX,XX @@ typedef struct ARMGICv3CommonClass {
  } ARMGICv3CommonClass;
  void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
 -                              const MemoryRegionOps *ops);
 +                              const MemoryRegionOps *ops, Error **errp);
  #endif
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
      if (!kvm_irqchip_in_kernel()) {
          qdev_prop_set_bit(gicdev, "has-security-extensions", vms->secure);
      }
 +
 +    if (type == 3) {
 +        uint32_t redist0_capacity =
 +                    vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
 +        uint32_t redist0_count = MIN(smp_cpus, redist0_capacity);
 +
 +        qdev_prop_set_uint32(gicdev, "len-redist-region-count", 1);
 +        qdev_prop_set_uint32(gicdev, "redist-region-count[0]", redist0_count);
 +    }
      qdev_init_nofail(gicdev);
      gicbusdev = SYS_BUS_DEVICE(gicdev);
      sysbus_mmio_map(gicbusdev, 0, vms->memmap[VIRT_GIC_DIST].base);
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
       * many redistributors we can fit into the memory map.
       */
      if (vms->gic_version == 3) {
 -        virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / 0x20000;
 +        virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
      } else {
          virt_max_cpus = GIC_NCPU;
      }
 diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/arm_gicv3.c
 +++ b/hw/intc/arm_gicv3.c
@@ -XXX,XX +XXX,XX @@ static void arm_gic_realize(DeviceState *dev, Error **errp)
          return;
      }
 -    gicv3_init_irqs_and_mmio(s, gicv3_set_irq, gic_ops);
 +    if (s->nb_redist_regions != 1) {
 +        error_setg(errp, "VGICv3 redist region number(%d) not equal to 1",
 +                   s->nb_redist_regions);
 +        return;
 +    }
 +
-+    gicv3_init_irqs_and_mmio(s, gicv3_set_irq, gic_ops, &local_err);
++    /*
-+    if (local_err) {
++     * Move from mergereg to reg; this sets the high elements and
-+        error_propagate(errp, local_err);
++     * clears the bits above 128 as a side effect.
 +     */
 +    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
 +                     vec_full_reg_offset(s, mergereg),
 +                     16, vec_full_reg_size(s));
 +    tcg_gen_st_i64(v, tcg_env, vec_full_reg_offset(s, reg));
 +}
 +
 +/*
 + * Write a single-prec result to reg. If FPCR.NEP is 0, clear the higher
 + * elements of reg; otherwise set them from mergereg.
 + */
 +static void write_fp_sreg_merging(DisasContext *s, int reg, int mergereg,
 +                                  TCGv_i32 v)
 +{
 +    if (!s->fpcr_nep) {
 +        write_fp_sreg(s, reg, v);
 +        return;
 +    }
++
-     gicv3_init_cpuif(s);
++    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
- }
++                     vec_full_reg_offset(s, mergereg),
-diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
++                     16, vec_full_reg_size(s));
-index XXXXXXX..XXXXXXX 100644
++    tcg_gen_st_i32(v, tcg_env, fp_reg_offset(s, reg, MO_32));
---- a/hw/intc/arm_gicv3_common.c
++}
-+++ b/hw/intc/arm_gicv3_common.c
++
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_gicv3 = {
++/*
- };
++ * Write a half-prec result to reg. If FPCR.NEP is 0, clear the higher
++ * elements of reg; otherwise set them from mergereg.
- void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
++ * The caller must ensure that the top 16 bits of v are zero.
--                              const MemoryRegionOps *ops)
++ */
-+                              const MemoryRegionOps *ops, Error **errp)
++static void write_fp_hreg_merging(DisasContext *s, int reg, int mergereg,
- {
++                                  TCGv_i32 v)
-     SysBusDevice *sbd = SYS_BUS_DEVICE(s);
++{
-+    int rdist_capacity = 0;
++    if (!s->fpcr_nep) {
-     int i;
++        write_fp_sreg(s, reg, v);
 +    for (i = 0; i < s->nb_redist_regions; i++) {
 +        rdist_capacity += s->redist_region_count[i];
 +    }
 +    if (rdist_capacity < s->num_cpu) {
 +        error_setg(errp, "Capacity of the redist regions(%d) "
 +                   "is less than number of vcpus(%d)",
 +                   rdist_capacity, s->num_cpu);
 +        return;
 +    }
 +
-     /* For the GIC, also expose incoming GPIO lines for PPIs for each CPU.
++    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
-      * GPIO array layout is thus:
++                     vec_full_reg_offset(s, mergereg),
-      *  [0..N-1] spi
++                     16, vec_full_reg_size(s));
-@@ -XXX,XX +XXX,XX @@ void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
++    tcg_gen_st16_i32(v, tcg_env, fp_reg_offset(s, reg, MO_16));
++}
-     memory_region_init_io(&s->iomem_dist, OBJECT(s), ops, s,
++
-                           "gicv3_dist", 0x10000);
+ /* Expand a 2-operand AdvSIMD vector operation using an expander function.  */
--    memory_region_init_io(&s->iomem_redist, OBJECT(s), ops ? &ops[1] : NULL, s,
+ static void gen_gvec_fn2(DisasContext *s, bool is_q, int rd, int rn,
--                          "gicv3_redist", 0x20000 * s->num_cpu);
+                          GVecGen2Fn *gvec_fn, int vece)
--
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
-     sysbus_init_mmio(sbd, &s->iomem_dist);
+ } FPScalar;
--    sysbus_init_mmio(sbd, &s->iomem_redist);
-+
+ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
-+    s->iomem_redist = g_new0(MemoryRegion, s->nb_redist_regions);
+-                                        const FPScalar *f,
-+    for (i = 0; i < s->nb_redist_regions; i++) {
++                                        const FPScalar *f, int mergereg,
-+        char *name = g_strdup_printf("gicv3_redist_region[%d]", i);
+                                         ARMFPStatusFlavour fpsttype)
-+
+ {
-+        memory_region_init_io(&s->iomem_redist[i], OBJECT(s),
+     switch (a->esz) {
-+                              ops ? &ops[1] : NULL, s, name,
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
-+                              s->redist_region_count[i] * GICV3_REDIST_SIZE);
+             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
-+        sysbus_init_mmio(sbd, &s->iomem_redist[i]);
+             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
-+        g_free(name);
+             f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
-+    }
+-            write_fp_dreg(s, a->rd, t0);
 +            write_fp_dreg_merging(s, a->rd, mergereg, t0);
          }
          break;
      case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
              TCGv_i32 t0 = read_fp_sreg(s, a->rn);
              TCGv_i32 t1 = read_fp_sreg(s, a->rm);
              f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
 -            write_fp_sreg(s, a->rd, t0);
 +            write_fp_sreg_merging(s, a->rd, mergereg, t0);
          }
          break;
      case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
              TCGv_i32 t0 = read_fp_hreg(s, a->rn);
              TCGv_i32 t1 = read_fp_hreg(s, a->rm);
              f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
 -            write_fp_sreg(s, a->rd, t0);
 +            write_fp_hreg_merging(s, a->rd, mergereg, t0);
          }
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
      return true;
  }
- static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
+-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
-@@ -XXX,XX +XXX,XX @@ static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
++static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
-     }
++                          int mergereg)
  {
 -    return do_fp3_scalar_with_fpsttype(s, a, f,
 +    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
                                         a->esz == MO_16 ?
                                         FPST_A64_F16 : FPST_A64);
  }
-+static void arm_gicv3_finalize(Object *obj)
+-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
-+{
++static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
-+    GICv3State *s = ARM_GICV3_COMMON(obj);
++                             int mergereg)
-+
+ {
-+    g_free(s->redist_region_count);
+-    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
-+}
++    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
-+
++                                       select_ah_fpst(s, a->esz));
- static void arm_gicv3_common_reset(DeviceState *dev)
+ }
- {
-     GICv3State *s = ARM_GICV3_COMMON(dev);
+ static const FPScalar f_scalar_fadd = {
-@@ -XXX,XX +XXX,XX @@ static Property arm_gicv3_common_properties[] = {
+@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fadd = {
-     DEFINE_PROP_UINT32("num-irq", GICv3State, num_irq, 32),
+     gen_helper_vfp_adds,
-     DEFINE_PROP_UINT32("revision", GICv3State, revision, 3),
+     gen_helper_vfp_addd,
-     DEFINE_PROP_BOOL("has-security-extensions", GICv3State, security_extn, 0),
+ };
-+    DEFINE_PROP_ARRAY("redist-region-count", GICv3State, nb_redist_regions,
+-TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd)
-+                      redist_region_count, qdev_prop_uint32, uint32_t),
++TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd, a->rn)
-     DEFINE_PROP_END_OF_LIST(),
- };
+ static const FPScalar f_scalar_fsub = {
+     gen_helper_vfp_subh,
-@@ -XXX,XX +XXX,XX @@ static const TypeInfo arm_gicv3_common_type = {
+     gen_helper_vfp_subs,
-     .instance_size = sizeof(GICv3State),
+     gen_helper_vfp_subd,
-     .class_size = sizeof(ARMGICv3CommonClass),
+ };
-     .class_init = arm_gicv3_common_class_init,
+-TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub)
-+    .instance_finalize = arm_gicv3_finalize,
++TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub, a->rn)
-     .abstract = true,
-     .interfaces = (InterfaceInfo []) {
+ static const FPScalar f_scalar_fdiv = {
-         { TYPE_ARM_LINUX_BOOT_IF },
+     gen_helper_vfp_divh,
-diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
+     gen_helper_vfp_divs,
-index XXXXXXX..XXXXXXX 100644
+     gen_helper_vfp_divd,
---- a/hw/intc/arm_gicv3_kvm.c
+ };
-+++ b/hw/intc/arm_gicv3_kvm.c
+-TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv)
-@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
++TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv, a->rn)
-         return;
-     }
+ static const FPScalar f_scalar_fmul = {
+     gen_helper_vfp_mulh,
--    gicv3_init_irqs_and_mmio(s, kvm_arm_gicv3_set_irq, NULL);
+     gen_helper_vfp_muls,
-+    gicv3_init_irqs_and_mmio(s, kvm_arm_gicv3_set_irq, NULL, &local_err);
+     gen_helper_vfp_muld,
-+    if (local_err) {
+ };
-+        error_propagate(errp, local_err);
+-TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul)
-+        return;
++TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul, a->rn)
-+    }
+ static const FPScalar f_scalar_fmax = {
-     for (i = 0; i < s->num_cpu; i++) {
+     gen_helper_vfp_maxh,
-         ARMCPU *cpu = ARM_CPU(qemu_get_cpu(i));
+     gen_helper_vfp_maxs,
-@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
+     gen_helper_vfp_maxd,
+ };
-     kvm_arm_register_device(&s->iomem_dist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
+-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax)
-                             KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd, 0);
++TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
--    kvm_arm_register_device(&s->iomem_redist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
-+    kvm_arm_register_device(&s->iomem_redist[0], -1,
+ static const FPScalar f_scalar_fmin = {
-+                            KVM_DEV_ARM_VGIC_GRP_ADDR,
+     gen_helper_vfp_minh,
-                             KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd, 0);
+     gen_helper_vfp_mins,
+     gen_helper_vfp_mind,
-     if (kvm_has_gsi_routing()) {
+ };
 -TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin)
 +TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
  static const FPScalar f_scalar_fmaxnm = {
      gen_helper_vfp_maxnumh,
      gen_helper_vfp_maxnums,
      gen_helper_vfp_maxnumd,
  };
 -TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm)
 +TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm, a->rn)
  static const FPScalar f_scalar_fminnm = {
      gen_helper_vfp_minnumh,
      gen_helper_vfp_minnums,
      gen_helper_vfp_minnumd,
  };
 -TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm)
 +TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm, a->rn)
  static const FPScalar f_scalar_fmulx = {
      gen_helper_advsimd_mulxh,
      gen_helper_vfp_mulxs,
      gen_helper_vfp_mulxd,
  };
 -TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx)
 +TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx, a->rn)
  static void gen_fnmul_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
  {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fnmul = {
      gen_fnmul_s,
      gen_fnmul_d,
  };
 -TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul)
 +TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
  static const FPScalar f_scalar_fcmeq = {
      gen_helper_advsimd_ceq_f16,
      gen_helper_neon_ceq_f32,
      gen_helper_neon_ceq_f64,
  };
 -TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq)
 +TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq, a->rm)
  static const FPScalar f_scalar_fcmge = {
      gen_helper_advsimd_cge_f16,
      gen_helper_neon_cge_f32,
      gen_helper_neon_cge_f64,
  };
 -TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge)
 +TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge, a->rm)
  static const FPScalar f_scalar_fcmgt = {
      gen_helper_advsimd_cgt_f16,
      gen_helper_neon_cgt_f32,
      gen_helper_neon_cgt_f64,
  };
 -TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt)
 +TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt, a->rm)
  static const FPScalar f_scalar_facge = {
      gen_helper_advsimd_acge_f16,
      gen_helper_neon_acge_f32,
      gen_helper_neon_acge_f64,
  };
 -TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge)
 +TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge, a->rm)
  static const FPScalar f_scalar_facgt = {
      gen_helper_advsimd_acgt_f16,
      gen_helper_neon_acgt_f32,
      gen_helper_neon_acgt_f64,
  };
 -TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt)
 +TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt, a->rm)
  static void gen_fabd_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
  {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fabd = {
      gen_fabd_s,
      gen_fabd_d,
  };
 -TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd)
 +TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
  static const FPScalar f_scalar_frecps = {
      gen_helper_recpsf_f16,
      gen_helper_recpsf_f32,
      gen_helper_recpsf_f64,
  };
 -TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
 +TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
  static const FPScalar f_scalar_frsqrts = {
      gen_helper_rsqrtsf_f16,
      gen_helper_rsqrtsf_f32,
      gen_helper_rsqrtsf_f64,
  };
 -TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
 +TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
  static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                         const FPScalar *f, bool swap)
 --
-.17.1
+.34.1

-New patch
+[PULL 16/68] target/arm: Handle FPCR.NEP for 3-input scalar operations
+Handle FPCR.NEP for the 3-input scalar operations which use
+do_fmla_scalar_idx() and do_fmadd(), by making them call the
+appropriate write_fp_*reg_merging() functions.
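
The merging behaviour that these write_fp_*reg_merging() calls rely on can be sketched outside of TCG as a plain C model. This is only an illustration under stated assumptions: VRegModel and model_write_scalar_merging() are hypothetical names that do not exist in QEMU, the register is modelled as 16 bytes with byte 0 lowest, and any bits above 128 (which the real helpers also clear) are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one 128-bit vector register as 16 bytes, byte 0 lowest. */
typedef struct {
    uint8_t b[16];
} VRegModel;

/*
 * Model of a scalar result write. With nep == 0 the rest of the destination
 * is cleared; with nep != 0 it is taken from regs[mergereg] instead.
 */
static void model_write_scalar_merging(VRegModel *regs, int rd, int mergereg,
                                       uint64_t val, int elem_bytes, int nep)
{
    VRegModel result;

    assert(elem_bytes == 2 || elem_bytes == 4 || elem_bytes == 8);

    if (nep) {
        /* Start from the merge source so its upper elements survive. */
        result = regs[mergereg];
    } else {
        /* Classic behaviour: everything above the scalar is zeroed. */
        memset(&result, 0, sizeof(result));
    }
    /* Overwrite only the low element with the new scalar result. */
    for (int i = 0; i < elem_bytes; i++) {
        result.b[i] = (uint8_t)(val >> (8 * i));
    }
    regs[rd] = result;
}
```

In terms of this model, the hunks below pass rd itself as the merge source for do_fmla_scalar_idx() and ra for do_fmadd().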
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 12 ++++++------
+file changed, 6 insertions(+), 6 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+                 gen_vfp_negd(t1, t1);
+             }
+             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
+-            write_fp_dreg(s, a->rd, t0);
++            write_fp_dreg_merging(s, a->rd, a->rd, t0);
+         }
+         break;
+     case MO_32:
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+                 gen_vfp_negs(t1, t1);
+             }
+             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_sreg_merging(s, a->rd, a->rd, t0);
+         }
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+             }
+             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
+                                        fpstatus_ptr(FPST_A64_F16));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_hreg_merging(s, a->rd, a->rd, t0);
+         }
+         break;
+     default:
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+             }
+             fpst = fpstatus_ptr(FPST_A64);
+             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
+-            write_fp_dreg(s, a->rd, ta);
++            write_fp_dreg_merging(s, a->rd, a->ra, ta);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+             }
+             fpst = fpstatus_ptr(FPST_A64);
+             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
+-            write_fp_sreg(s, a->rd, ta);
++            write_fp_sreg_merging(s, a->rd, a->ra, ta);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+             }
+             fpst = fpstatus_ptr(FPST_A64_F16);
+             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
+-            write_fp_sreg(s, a->rd, ta);
++            write_fp_hreg_merging(s, a->rd, a->ra, ta);
+         }
+         break;
+--
+.34.1

-New patch
+[PULL 17/68] target/arm: Handle FPCR.NEP for BFCVT scalar
+Currently we implement BFCVT scalar via do_fp1_scalar().  This works
+even though BFCVT is a narrowing operation from 32 to 16 bits,
+because we can use write_fp_sreg() for float16. However, FPCR.NEP
+support requires that we use write_fp_hreg_merging() for float16
+outputs, so we can't continue to borrow the non-narrowing
+do_fp1_scalar() function for this. Split out trans_BFCVT_s()
+into its own implementation that honours FPCR.NEP.
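
A small sketch of why the store width matters once the upper bits are preserved. It assumes a hypothetical helper, model_store_low(), acting on a 16-byte register image that already holds the merged contents; it is not QEMU code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: store the low nbytes of v at the bottom of a
 * 16-byte register image (byte 0 is the least significant byte).
 */
static void model_store_low(uint8_t reg[16], uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        reg[i] = (uint8_t)(v >> (8 * i));
    }
}
```

With nbytes == 4 and a 16-bit result whose top half is zero, bytes 2 and 3 of the image are overwritten with zeroes. That is harmless when the rest of the register is being cleared anyway, but it would destroy merged data when FPCR.NEP is 1. With nbytes == 2 only the half-precision element is replaced.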
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
+file changed, 21 insertions(+), 4 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frintx = {
+ };
+ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
+-static const FPScalar1 f_scalar_bfcvt = {
+-    .gen_s = gen_helper_bfcvt,
+-};
+-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
++static bool trans_BFCVT_s(DisasContext *s, arg_rr_e *a)
++{
++    ARMFPStatusFlavour fpsttype = s->fpcr_ah ? FPST_AH : FPST_A64;
++    TCGv_i32 t32;
++    int check;
++
++    if (!dc_isar_feature(aa64_bf16, s)) {
++        return false;
++    }
++
++    check = fp_access_check_scalar_hsd(s, a->esz);
++
++    if (check <= 0) {
++        return check == 0;
++    }
++
++    t32 = read_fp_sreg(s, a->rn);
++    gen_helper_bfcvt(t32, t32, fpstatus_ptr(fpsttype));
++    write_fp_hreg_merging(s, a->rd, a->rd, t32);
++    return true;
++}
+ static const FPScalar1 f_scalar_frint32 = {
+     NULL,
+--
+.34.1

-New patch
+[PULL 18/68] target/arm: Handle FPCR.NEP for 1-input scalar operations
+Handle FPCR.NEP for the 1-input scalar operations.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 26 ++++++++++++++------------
+file changed, 14 insertions(+), 12 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
+     case MO_64:
+         t64 = read_fp_dreg(s, a->rn);
+         f->gen_d(t64, t64, fpst);
+-        write_fp_dreg(s, a->rd, t64);
++        write_fp_dreg_merging(s, a->rd, a->rd, t64);
+         break;
+     case MO_32:
+         t32 = read_fp_sreg(s, a->rn);
+         f->gen_s(t32, t32, fpst);
+-        write_fp_sreg(s, a->rd, t32);
++        write_fp_sreg_merging(s, a->rd, a->rd, t32);
+         break;
+     case MO_16:
+         t32 = read_fp_hreg(s, a->rn);
+         f->gen_h(t32, t32, fpst);
+-        write_fp_sreg(s, a->rd, t32);
++        write_fp_hreg_merging(s, a->rd, a->rd, t32);
+         break;
+     default:
+         g_assert_not_reached();
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
+         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
+         gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, fpst);
+-        write_fp_dreg(s, a->rd, tcg_rd);
++        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hs(DisasContext *s, arg_rr *a)
+         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
+         gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
+-        /* write_fp_sreg is OK here because top half of result is zero */
+-        write_fp_sreg(s, a->rd, tmp);
++        /* write_fp_hreg_merging is OK here because top half of result is zero */
++        write_fp_hreg_merging(s, a->rd, a->rd, tmp);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sd(DisasContext *s, arg_rr *a)
+         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
+         gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, fpst);
+-        write_fp_sreg(s, a->rd, tcg_rd);
++        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hd(DisasContext *s, arg_rr *a)
+         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
+         gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp);
+-        /* write_fp_sreg is OK here because top half of tcg_rd is zero */
+-        write_fp_sreg(s, a->rd, tcg_rd);
++        /* write_fp_hreg_merging is OK here because top half of tcg_rd is zero */
++        write_fp_hreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a)
+         TCGv_i32 tcg_ahp = get_ahp_flag();
+         gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
+-        write_fp_sreg(s, a->rd, tcg_rd);
++        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a)
+         TCGv_i32 tcg_ahp = get_ahp_flag();
+         gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
+-        write_fp_dreg(s, a->rd, tcg_rd);
++        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool do_fcvt_f(DisasContext *s, arg_fcvt *a,
+     do_fcvt_scalar(s, a->esz | (is_signed ? MO_SIGN : 0),
+                    a->esz, tcg_int, a->shift, a->rn, rmode);
+-    clear_vec(s, a->rd);
++    if (!s->fpcr_nep) {
++        clear_vec(s, a->rd);
++    }
+     write_vec_element(s, tcg_int, a->rd, 0, a->esz);
+     return true;
+ }
+--
+.34.1

-New patch
+[PULL 19/68] target/arm: Handle FPCR.NEP in do_cvtf_scalar()
+Handle FPCR.NEP in the operations handled by do_cvtf_scalar().
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 6 +++---
+file changed, 3 insertions(+), 3 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
+         } else {
+             gen_helper_vfp_uqtod(tcg_double, tcg_int, tcg_shift, tcg_fpstatus);
+         }
+-        write_fp_dreg(s, rd, tcg_double);
++        write_fp_dreg_merging(s, rd, rd, tcg_double);
+         break;
+     case MO_32:
+@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
+         } else {
+             gen_helper_vfp_uqtos(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
+         }
+-        write_fp_sreg(s, rd, tcg_single);
++        write_fp_sreg_merging(s, rd, rd, tcg_single);
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
+         } else {
+             gen_helper_vfp_uqtoh(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
+         }
+-        write_fp_sreg(s, rd, tcg_single);
++        write_fp_hreg_merging(s, rd, rd, tcg_single);
+         break;
+     default:
+--
+.34.1

-New patch
+[PULL 20/68] target/arm: Handle FPCR.NEP for scalar FABS and FNEG
+Handle FPCR.NEP merging for scalar FABS and FNEG; this requires
+an extra parameter to do_fp1_scalar_int(), since FMOV scalar
+does not have the merging behaviour.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 27 ++++++++++++++++++++-------
+file changed, 20 insertions(+), 7 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1Int {
+ } FPScalar1Int;
+ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
+-                              const FPScalar1Int *f)
++                              const FPScalar1Int *f,
++                              bool merging)
+ {
+     switch (a->esz) {
+     case MO_64:
+         if (fp_access_check(s)) {
+             TCGv_i64 t = read_fp_dreg(s, a->rn);
+             f->gen_d(t, t);
+-            write_fp_dreg(s, a->rd, t);
++            if (merging) {
++                write_fp_dreg_merging(s, a->rd, a->rd, t);
++            } else {
++                write_fp_dreg(s, a->rd, t);
++            }
+         }
+         break;
+     case MO_32:
+         if (fp_access_check(s)) {
+             TCGv_i32 t = read_fp_sreg(s, a->rn);
+             f->gen_s(t, t);
+-            write_fp_sreg(s, a->rd, t);
++            if (merging) {
++                write_fp_sreg_merging(s, a->rd, a->rd, t);
++            } else {
++                write_fp_sreg(s, a->rd, t);
++            }
+         }
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
+         if (fp_access_check(s)) {
+             TCGv_i32 t = read_fp_hreg(s, a->rn);
+             f->gen_h(t, t);
+-            write_fp_sreg(s, a->rd, t);
++            if (merging) {
++                write_fp_hreg_merging(s, a->rd, a->rd, t);
++            } else {
++                write_fp_sreg(s, a->rd, t);
++            }
+         }
+         break;
+     default:
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fmov = {
+     tcg_gen_mov_i32,
+     tcg_gen_mov_i64,
+ };
+-TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov)
++TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov, false)
+ static const FPScalar1Int f_scalar_fabs = {
+     gen_vfp_absh,
+     gen_vfp_abss,
+     gen_vfp_absd,
+ };
+-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs)
++TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
+ static const FPScalar1Int f_scalar_fneg = {
+     gen_vfp_negh,
+     gen_vfp_negs,
+     gen_vfp_negd,
+ };
+-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg)
++TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
+ typedef struct FPScalar1 {
+     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
+--
+.34.1

-New patch
+[PULL 21/68] target/arm: Handle FPCR.NEP for FCVTXN (scalar)
+Unlike the other users of do_2misc_narrow_scalar(), FCVTXN (scalar)
+is always double-to-single and must honour FPCR.NEP.  Implement this
+directly in a trans function rather than using
+do_2misc_narrow_scalar().
+We still need gen_fcvtxn_sd() and the f_scalar_fcvtxn[] array for
+the FCVTXN (vector) insn, so we move those down in the file to
+where they are used.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 43 ++++++++++++++++++++++------------
+file changed, 28 insertions(+), 15 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static ArithOneOp * const f_scalar_uqxtn[] = {
+ };
+ TRANS(UQXTN_s, do_2misc_narrow_scalar, a, f_scalar_uqxtn)
+-static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
++static bool trans_FCVTXN_s(DisasContext *s, arg_rr_e *a)
+ {
+-    /*
+-     * 64 bit to 32 bit float conversion
+-     * with von Neumann rounding (round to odd)
+-     */
+-    TCGv_i32 tmp = tcg_temp_new_i32();
+-    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
+-    tcg_gen_extu_i32_i64(d, tmp);
++    if (fp_access_check(s)) {
++        /*
++         * 64 bit to 32 bit float conversion
++         * with von Neumann rounding (round to odd)
++         */
++        TCGv_i64 src = read_fp_dreg(s, a->rn);
++        TCGv_i32 dst = tcg_temp_new_i32();
++        gen_helper_fcvtx_f64_to_f32(dst, src, fpstatus_ptr(FPST_A64));
++        write_fp_sreg_merging(s, a->rd, a->rd, dst);
++    }
++    return true;
+ }
+-static ArithOneOp * const f_scalar_fcvtxn[] = {
+-    NULL,
+-    NULL,
+-    gen_fcvtxn_sd,
+-};
+-TRANS(FCVTXN_s, do_2misc_narrow_scalar, a, f_scalar_fcvtxn)
+-
+ #undef WRAP_ENV
+ static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn)
+@@ -XXX,XX +XXX,XX @@ static void gen_fcvtn_sd(TCGv_i64 d, TCGv_i64 n)
+     tcg_gen_extu_i32_i64(d, tmp);
+ }
++static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
++{
++    /*
++     * 64 bit to 32 bit float conversion
++     * with von Neumann rounding (round to odd)
++     */
++    TCGv_i32 tmp = tcg_temp_new_i32();
++    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
++    tcg_gen_extu_i32_i64(d, tmp);
++}
++
+ static ArithOneOp * const f_vector_fcvtn[] = {
+     NULL,
+     gen_fcvtn_hs,
+     gen_fcvtn_sd,
+ };
++static ArithOneOp * const f_scalar_fcvtxn[] = {
++    NULL,
++    NULL,
++    gen_fcvtxn_sd,
++};
+ TRANS(FCVTN_v, do_2misc_narrow_vector, a, f_vector_fcvtn)
+ TRANS(FCVTXN_v, do_2misc_narrow_vector, a, f_scalar_fcvtxn)
+--
+.34.1

-New patch
+[PULL 22/68] target/arm: Handle FPCR.NEP for FMUL, FMULX scalar by element
+do_fp3_scalar_idx() is used only for the FMUL and FMULX scalar by
+element instructions; these both need to merge the result with the Rn
+register when FPCR.NEP is set.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 6 +++---
+file changed, 3 insertions(+), 3 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
+             read_vec_element(s, t1, a->rm, a->idx, MO_64);
+             f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
+-            write_fp_dreg(s, a->rd, t0);
++            write_fp_dreg_merging(s, a->rd, a->rn, t0);
+         }
+         break;
+     case MO_32:
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
+             read_vec_element_i32(s, t1, a->rm, a->idx, MO_32);
+             f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_sreg_merging(s, a->rd, a->rn, t0);
+         }
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
+             read_vec_element_i32(s, t1, a->rm, a->idx, MO_16);
+             f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_hreg_merging(s, a->rd, a->rn, t0);
+         }
+         break;
+     default:
+--
+.34.1

-[Qemu-devel] [PULL 24/28] hw/arm/mps2-tz.c: Instantiate MPCs
+[PULL 23/68] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
-Instantiate and wire up the Memory Protection Controllers
+When FPCR.AH == 1, floating point FMIN and FMAX have some odd special
-in the MPS2 board itself.
+cases:
  * comparing two zeroes (even of different sign) or comparing a NaN
    with anything always returns the second argument (possibly
    squashed to zero)
  * denormal outputs are not squashed to zero regardless of FZ or FZ16
 Implement these semantics in new helper functions and select them at
 translate time if FPCR.AH is 1 for the scalar FMAX and FMIN insns.
 (We will convert the other FMAX and FMIN insns in subsequent
 commits.)
 Note that FMINNM and FMAXNM are not affected.
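
The special cases listed above can be illustrated with a host-float sketch. This is an assumption-laden model, not the helper added by the patch: ah_min_f32_model() and ah_max_f32_model() are hypothetical names, they use the host's float type rather than softfloat, and they leave out denormal squashing and exception flags entirely.

```c
#include <math.h>

/*
 * Hypothetical model of AH=1 FMIN on host floats: two zeroes (of any
 * sign) or any NaN operand select the second argument unchanged.
 */
static float ah_min_f32_model(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a < b ? a : b;
}

/* Same special cases for AH=1 FMAX. */
static float ah_max_f32_model(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}
```

In this model, min(+0, -0) gives -0 while min(-0, +0) gives +0, and min(NaN, 1.0) gives 1.0, because the second argument is returned in each special case.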
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20180620132032.28865-9-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 71 ++++++++++++++++++++++++++++++------------------
+ target/arm/tcg/helper-a64.h    |  7 +++++++
-file changed, 44 insertions(+), 27 deletions(-)
+ target/arm/tcg/helper-a64.c    | 36 ++++++++++++++++++++++++++++++++++
  target/arm/tcg/translate-a64.c | 23 ++++++++++++++++++++--
 files changed, 64 insertions(+), 2 deletions(-)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/tcg/helper-a64.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/tcg/helper-a64.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(advsimd_muladd2h, i32, i32, i32, i32, fpst)
- #include "hw/timer/cmsdk-apb-timer.h"
+ DEF_HELPER_2(advsimd_rinth_exact, f16, f16, fpst)
- #include "hw/misc/mps2-scc.h"
+ DEF_HELPER_2(advsimd_rinth, f16, f16, fpst)
- #include "hw/misc/mps2-fpgaio.h"
-+#include "hw/misc/tz-mpc.h"
++DEF_HELPER_3(vfp_ah_minh, f16, f16, f16, fpst)
- #include "hw/arm/iotkit.h"
++DEF_HELPER_3(vfp_ah_mins, f32, f32, f32, fpst)
- #include "hw/devices.h"
++DEF_HELPER_3(vfp_ah_mind, f64, f64, f64, fpst)
- #include "net/net.h"
++DEF_HELPER_3(vfp_ah_maxh, f16, f16, f16, fpst)
-@@ -XXX,XX +XXX,XX @@ typedef struct {
++DEF_HELPER_3(vfp_ah_maxs, f32, f32, f32, fpst)
++DEF_HELPER_3(vfp_ah_maxd, f64, f64, f64, fpst)
-     IoTKit iotkit;
++
-     MemoryRegion psram;
+ DEF_HELPER_2(exception_return, void, env, i64)
--    MemoryRegion ssram1;
+ DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64)
-+    MemoryRegion ssram[3];
-     MemoryRegion ssram1_m;
+diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
--    MemoryRegion ssram23;
+index XXXXXXX..XXXXXXX 100644
-     MPS2SCC scc;
+--- a/target/arm/tcg/helper-a64.c
-     MPS2FPGAIO fpgaio;
++++ b/target/arm/tcg/helper-a64.c
-     TZPPC ppc[5];
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(fcvtx_f64_to_f32)(float64 a, float_status *fpst)
--    UnimplementedDeviceState ssram_mpc[3];
+     return r;
 +    TZMPC ssram_mpc[3];
      UnimplementedDeviceState spi[5];
      UnimplementedDeviceState i2c[4];
      UnimplementedDeviceState i2s_audio;
@@ -XXX,XX +XXX,XX @@ typedef struct {
  /* Main SYSCLK frequency in Hz */
  #define SYSCLK_FRQ 20000000
 -/* Initialize the auxiliary RAM region @mr and map it into
 - * the memory map at @base.
 - */
 -static void make_ram(MemoryRegion *mr, const char *name,
 -                     hwaddr base, hwaddr size)
 -{
 -    memory_region_init_ram(mr, NULL, name, size, &error_fatal);
 -    memory_region_add_subregion(get_system_memory(), base, mr);
 -}
 -
  /* Create an alias of an entire original MemoryRegion @orig
   * located at @base in the memory map.
   */
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
      return sysbus_mmio_get_region(s, 0);
  }
-+static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
++/*
-+                              const char *name, hwaddr size)
++ * AH=1 min/max have some odd special cases:
-+{
++ * comparing two zeroes (regardless of sign), (NaN, anything),
-+    TZMPC *mpc = opaque;
++ * or (anything, NaN) should return the second argument (possibly
-+    int i = mpc - &mms->ssram_mpc[0];
++ * squashed to zero).
-+    MemoryRegion *ssram = &mms->ssram[i];
++ * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
-+    MemoryRegion *upstream;
++ */
-+    char *mpcname = g_strdup_printf("%s-mpc", name);
++#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
-+    static uint32_t ramsize[] = { 0x00400000, 0x00200000, 0x00200000 };
++    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
-+    static uint32_t rambase[] = { 0x00000000, 0x28000000, 0x28200000 };
++    {                                                                   \
-+
++        bool save;                                                      \
-+    memory_region_init_ram(ssram, NULL, name, ramsize[i], &error_fatal);
++        CTYPE r;                                                        \
-+
++        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
-+    init_sysbus_child(OBJECT(mms), mpcname, mpc,
++        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
-+                      sizeof(mms->ssram_mpc[0]), TYPE_TZ_MPC);
++        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \
-+    object_property_set_link(OBJECT(mpc), OBJECT(ssram),
++            return b;                                                   \
-+                             "downstream", &error_fatal);
++        }                                                               \
-+    object_property_set_bool(OBJECT(mpc), true, "realized", &error_fatal);
++        if (FLOATTYPE ## _is_any_nan(a) ||                              \
-+    /* Map the upstream end of the MPC into system memory */
++            FLOATTYPE ## _is_any_nan(b)) {                              \
-+    upstream = sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 1);
++            float_raise(float_flag_invalid, fpst);                      \
-+    memory_region_add_subregion(get_system_memory(), rambase[i], upstream);
++            return b;                                                   \
-+    /* and connect its interrupt to the IoTKit */
++        }                                                               \
-+    qdev_connect_gpio_out_named(DEVICE(mpc), "irq", 0,
++        save = get_flush_to_zero(fpst);                                 \
-+                                qdev_get_gpio_in_named(DEVICE(&mms->iotkit),
++        set_flush_to_zero(false, fpst);                                 \
-+                                                       "mpcexp_status", i));
++        r = FLOATTYPE ## _ ## MINMAX(a, b, fpst);                       \
-+
++        set_flush_to_zero(save, fpst);                                  \
-+    /* The first SSRAM is a special case as it has an alias; accesses to
++        return r;                                                       \
 +     * the alias region at 0x00400000 must also go to the MPC upstream.
 +     */
 +    if (i == 0) {
 +        make_ram_alias(&mms->ssram1_m, "mps.ssram1_m", upstream, 0x00400000);
 +    }
 +
-+    g_free(mpcname);
++AH_MINMAX_HELPER(vfp_ah_minh, dh_ctype_f16, float16, min)
-+    /* Return the register interface MR for our caller to map behind the PPC */
++AH_MINMAX_HELPER(vfp_ah_mins, float32, float32, min)
-+    return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
++AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min)
 +AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max)
 +AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max)
 +AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max)
 +
  /* 64-bit versions of the CRC helpers. Note that although the operation
   * (and the prototypes of crc32c() and crc32() mean that only the bottom
   * 32 bits of the accumulator and result are used, we pass and return
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                         select_ah_fpst(s, a->esz));
  }
 +/* Some insns need to call different helpers when FPCR.AH == 1 */
 +static bool do_fp3_scalar_2fn(DisasContext *s, arg_rrr_e *a,
 +                              const FPScalar *fnormal,
 +                              const FPScalar *fah,
 +                              int mergereg)
 +{
 +    return do_fp3_scalar(s, a, s->fpcr_ah ? fah : fnormal, mergereg);
 +}
 +
- static void mps2tz_common_init(MachineState *machine)
+ static const FPScalar f_scalar_fadd = {
- {
+     gen_helper_vfp_addh,
-     MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
+     gen_helper_vfp_adds,
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fmax = {
-                                          NULL, "mps.ram", 0x01000000);
+     gen_helper_vfp_maxs,
-     memory_region_add_subregion(system_memory, 0x80000000, &mms->psram);
+     gen_helper_vfp_maxd,
+ };
--    /* The SSRAM memories should all be behind Memory Protection Controllers,
+-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
--     * but we don't implement that yet.
++static const FPScalar f_scalar_fmax_ah = {
--     */
++    gen_helper_vfp_ah_maxh,
--    make_ram(&mms->ssram1, "mps.ssram1", 0x00000000, 0x00400000);
++    gen_helper_vfp_ah_maxs,
--    make_ram_alias(&mms->ssram1_m, "mps.ssram1_m", &mms->ssram1, 0x00400000);
++    gen_helper_vfp_ah_maxd,
--
++};
--    make_ram(&mms->ssram23, "mps.ssram23", 0x28000000, 0x00400000);
++TRANS(FMAX_s, do_fp3_scalar_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah, a->rn)
--
-     /* The overflow IRQs for all UARTs are ORed together.
+ static const FPScalar f_scalar_fmin = {
-      * Tx, Rx and "combined" IRQs are sent to the NVIC separately.
+     gen_helper_vfp_minh,
-      * Create the OR gate for this.
+     gen_helper_vfp_mins,
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+     gen_helper_vfp_mind,
-     const PPCInfo ppcs[] = { {
+ };
-             .name = "apb_ppcexp0",
+-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
-             .ports = {
++static const FPScalar f_scalar_fmin_ah = {
--                { "ssram-mpc0", make_unimp_dev, &mms->ssram_mpc[0],
++    gen_helper_vfp_ah_minh,
--                  0x58007000, 0x1000 },
++    gen_helper_vfp_ah_mins,
--                { "ssram-mpc1", make_unimp_dev, &mms->ssram_mpc[1],
++    gen_helper_vfp_ah_mind,
--                  0x58008000, 0x1000 },
++};
--                { "ssram-mpc2", make_unimp_dev, &mms->ssram_mpc[2],
++TRANS(FMIN_s, do_fp3_scalar_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah, a->rn)
--                  0x58009000, 0x1000 },
-+                { "ssram-0", make_mpc, &mms->ssram_mpc[0], 0x58007000, 0x1000 },
+ static const FPScalar f_scalar_fmaxnm = {
-+                { "ssram-1", make_mpc, &mms->ssram_mpc[1], 0x58008000, 0x1000 },
+     gen_helper_vfp_maxnumh,
 +                { "ssram-2", make_mpc, &mms->ssram_mpc[2], 0x58009000, 0x1000 },
              },
          }, {
              .name = "apb_ppcexp1",
 --
-.17.1
+.34.1
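
The AH=1 special cases listed in the commit message and in the AH_MINMAX_HELPER comment above can be sketched in plain C. This is only an illustration on host `float` values: the names `ah_min_sketch` and `ah_max_sketch` are hypothetical, and the sketch deliberately leaves out the input-denormal squashing, the Invalid Operation flag and the flush-to-zero save/restore that the real helper does through `float_status`.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true for both +0.0 and -0.0. */
static bool is_zero_sketch(float x)
{
    return x == 0.0f;
}

/*
 * Sketch of the FPCR.AH == 1 rule for FMIN: two zeroes (any signs),
 * (NaN, anything) or (anything, NaN) all yield the SECOND operand.
 * Exception flags and denormal handling are omitted (assumption).
 */
static float ah_min_sketch(float a, float b)
{
    if (is_zero_sketch(a) && is_zero_sketch(b)) {
        return b;
    }
    if (isnan(a) || isnan(b)) {
        return b;
    }
    return a < b ? a : b;
}

/* Same special cases, with the ordinary comparison reversed. */
static float ah_max_sketch(float a, float b)
{
    if (is_zero_sketch(a) && is_zero_sketch(b)) {
        return b;
    }
    if (isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}
```

Under these assumptions, a call with operands (+0, -0) gives -0 and a call with (NaN, 1.0) gives 1.0, for both min and max, because the second operand wins in every special case.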

-New patch
+[PULL 24/68] target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
+Implement the FPCR.AH == 1 semantics for vector FMIN/FMAX, by
+creating new _ah_ versions of the gvec helpers which invoke the
+scalar fmin_ah and fmax_ah helpers on each element.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/translate-a64.c | 21 +++++++++++++++++++--
+ target/arm/tcg/vec_helper.c    |  8 ++++++++
+files changed, 41 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmax_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmax_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmax_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_5(gvec_ah_fmin_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmin_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmin_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_4(sve_faddv_h, TCG_CALL_NO_RWG,
+                    i64, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+                                        FPST_A64_F16 : FPST_A64);
+ }
++static bool do_fp3_vector_2fn(DisasContext *s, arg_qrrr_e *a, int data,
++                              gen_helper_gvec_3_ptr * const fnormal[3],
++                              gen_helper_gvec_3_ptr * const fah[3])
++{
++    return do_fp3_vector(s, a, data, s->fpcr_ah ? fah : fnormal);
++}
++
+ static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
+                              gen_helper_gvec_3_ptr * const f[3])
+ {
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmax[3] = {
+     gen_helper_gvec_fmax_s,
+     gen_helper_gvec_fmax_d,
+ };
+-TRANS(FMAX_v, do_fp3_vector, a, 0, f_vector_fmax)
++static gen_helper_gvec_3_ptr * const f_vector_fmax_ah[3] = {
++    gen_helper_gvec_ah_fmax_h,
++    gen_helper_gvec_ah_fmax_s,
++    gen_helper_gvec_ah_fmax_d,
++};
++TRANS(FMAX_v, do_fp3_vector_2fn, a, 0, f_vector_fmax, f_vector_fmax_ah)
+ static gen_helper_gvec_3_ptr * const f_vector_fmin[3] = {
+     gen_helper_gvec_fmin_h,
+     gen_helper_gvec_fmin_s,
+     gen_helper_gvec_fmin_d,
+ };
+-TRANS(FMIN_v, do_fp3_vector, a, 0, f_vector_fmin)
++static gen_helper_gvec_3_ptr * const f_vector_fmin_ah[3] = {
++    gen_helper_gvec_ah_fmin_h,
++    gen_helper_gvec_ah_fmin_s,
++    gen_helper_gvec_ah_fmin_d,
++};
++TRANS(FMIN_v, do_fp3_vector_2fn, a, 0, f_vector_fmin, f_vector_fmin_ah)
+ static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[3] = {
+     gen_helper_gvec_fmaxnum_h,
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
+ DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
+ DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
++DO_3OP(gvec_ah_fmax_h, helper_vfp_ah_maxh, float16)
++DO_3OP(gvec_ah_fmax_s, helper_vfp_ah_maxs, float32)
++DO_3OP(gvec_ah_fmax_d, helper_vfp_ah_maxd, float64)
++
++DO_3OP(gvec_ah_fmin_h, helper_vfp_ah_minh, float16)
++DO_3OP(gvec_ah_fmin_s, helper_vfp_ah_mins, float32)
++DO_3OP(gvec_ah_fmin_d, helper_vfp_ah_mind, float64)
++
+ #endif
+ #undef DO_3OP
+--
+.34.1

-New patch
+[PULL 25/68] target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
+Implement the FPCR.AH semantics for FMAXV and FMINV.  These are the
+"recursively reduce all lanes of a vector to a scalar result" insns;
+we just need to use the _ah_ helper for the reduction step when
+FPCR.AH == 1.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 28 ++++++++++++++++++----------
+file changed, 18 insertions(+), 10 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz,
+ }
+ static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
+-                              NeonGenTwoSingleOpFn *fn)
++                            NeonGenTwoSingleOpFn *fnormal,
++                            NeonGenTwoSingleOpFn *fah)
+ {
+     if (fp_access_check(s)) {
+         MemOp esz = a->esz;
+         int elts = (a->q ? 16 : 8) >> esz;
+         TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+-        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
++        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst,
++                                       s->fpcr_ah ? fah : fnormal);
+         write_fp_sreg(s, a->rd, res);
+     }
+     return true;
+ }
+-TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxnumh)
+-TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minnumh)
+-TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxh)
+-TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minh)
++TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a,
++           gen_helper_vfp_maxnumh, gen_helper_vfp_maxnumh)
++TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a,
++           gen_helper_vfp_minnumh, gen_helper_vfp_minnumh)
++TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a,
++           gen_helper_vfp_maxh, gen_helper_vfp_ah_maxh)
++TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a,
++           gen_helper_vfp_minh, gen_helper_vfp_ah_minh)
+-TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
+-TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
+-TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
+-TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
++TRANS(FMAXNMV_s, do_fp_reduction, a,
++      gen_helper_vfp_maxnums, gen_helper_vfp_maxnums)
++TRANS(FMINNMV_s, do_fp_reduction, a,
++      gen_helper_vfp_minnums, gen_helper_vfp_minnums)
++TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs, gen_helper_vfp_ah_maxs)
++TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins, gen_helper_vfp_ah_mins)
+ /*
+  * Floating-point Immediate
+--
+.34.1

-New patch
+[PULL 26/68] target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
+Implement the FPCR.AH semantics for the pairwise floating
+point minimum/maximum insns FMINP and FMAXP.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
+ target/arm/tcg/vec_helper.c    | 10 ++++++++++
+files changed, 45 insertions(+), 4 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ah_fmin_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_ah_fmin_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_5(gvec_ah_fminp_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fminp_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fminp_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_4(sve_faddv_h, TCG_CALL_NO_RWG,
+                    i64, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmaxp[3] = {
+     gen_helper_gvec_fmaxp_s,
+     gen_helper_gvec_fmaxp_d,
+ };
+-TRANS(FMAXP_v, do_fp3_vector, a, 0, f_vector_fmaxp)
++static gen_helper_gvec_3_ptr * const f_vector_ah_fmaxp[3] = {
++    gen_helper_gvec_ah_fmaxp_h,
++    gen_helper_gvec_ah_fmaxp_s,
++    gen_helper_gvec_ah_fmaxp_d,
++};
++TRANS(FMAXP_v, do_fp3_vector_2fn, a, 0, f_vector_fmaxp, f_vector_ah_fmaxp)
+ static gen_helper_gvec_3_ptr * const f_vector_fminp[3] = {
+     gen_helper_gvec_fminp_h,
+     gen_helper_gvec_fminp_s,
+     gen_helper_gvec_fminp_d,
+ };
+-TRANS(FMINP_v, do_fp3_vector, a, 0, f_vector_fminp)
++static gen_helper_gvec_3_ptr * const f_vector_ah_fminp[3] = {
++    gen_helper_gvec_ah_fminp_h,
++    gen_helper_gvec_ah_fminp_s,
++    gen_helper_gvec_ah_fminp_d,
++};
++TRANS(FMINP_v, do_fp3_vector_2fn, a, 0, f_vector_fminp, f_vector_ah_fminp)
+ static gen_helper_gvec_3_ptr * const f_vector_fmaxnmp[3] = {
+     gen_helper_gvec_fmaxnump_h,
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_pair(DisasContext *s, arg_rr_e *a, const FPScalar *f)
+     return true;
+ }
++static bool do_fp3_scalar_pair_2fn(DisasContext *s, arg_rr_e *a,
++                                   const FPScalar *fnormal,
++                                   const FPScalar *fah)
++{
++    return do_fp3_scalar_pair(s, a, s->fpcr_ah ? fah : fnormal);
++}
++
+ TRANS(FADDP_s, do_fp3_scalar_pair, a, &f_scalar_fadd)
+-TRANS(FMAXP_s, do_fp3_scalar_pair, a, &f_scalar_fmax)
+-TRANS(FMINP_s, do_fp3_scalar_pair, a, &f_scalar_fmin)
++TRANS(FMAXP_s, do_fp3_scalar_pair_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah)
++TRANS(FMINP_s, do_fp3_scalar_pair_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah)
+ TRANS(FMAXNMP_s, do_fp3_scalar_pair, a, &f_scalar_fmaxnm)
+ TRANS(FMINNMP_s, do_fp3_scalar_pair, a, &f_scalar_fminnm)
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_3OP_PAIR(gvec_fminnump_h, float16_minnum, float16, H2)
+ DO_3OP_PAIR(gvec_fminnump_s, float32_minnum, float32, H4)
+ DO_3OP_PAIR(gvec_fminnump_d, float64_minnum, float64, )
++#ifdef TARGET_AARCH64
++DO_3OP_PAIR(gvec_ah_fmaxp_h, helper_vfp_ah_maxh, float16, H2)
++DO_3OP_PAIR(gvec_ah_fmaxp_s, helper_vfp_ah_maxs, float32, H4)
++DO_3OP_PAIR(gvec_ah_fmaxp_d, helper_vfp_ah_maxd, float64, )
++
++DO_3OP_PAIR(gvec_ah_fminp_h, helper_vfp_ah_minh, float16, H2)
++DO_3OP_PAIR(gvec_ah_fminp_s, helper_vfp_ah_mins, float32, H4)
++DO_3OP_PAIR(gvec_ah_fminp_d, helper_vfp_ah_mind, float64, )
++#endif
++
+ #undef DO_3OP_PAIR
+ #define DO_3OP_PAIR(NAME, FUNC, TYPE, H) \
+--
+.34.1

-New patch
+[PULL 27/68] target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
+Implement the FPCR.AH semantics for the SVE FMAXV and FMINV
+vector-reduction-to-scalar max/min operations.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 +++++++++++
+ target/arm/tcg/sve_helper.c    | 43 +++++++++++++++++++++-------------
+ target/arm/tcg/translate-sve.c | 16 +++++++++++--
+files changed, 55 insertions(+), 18 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_fminv_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_4(sve_fminv_d, TCG_CALL_NO_RWG,
+                    i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fmaxv_h, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fmaxv_s, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fmaxv_d, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_4(sve_ah_fminv_h, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fminv_s, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fminv_d, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG,
+                    i64, i64, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ static TYPE NAME##_reduce(TYPE *data, float_status *status, uintptr_t n) \
+         uintptr_t half = n / 2;                                       \
+         TYPE lo = NAME##_reduce(data, status, half);                  \
+         TYPE hi = NAME##_reduce(data + half, status, half);           \
+-        return TYPE##_##FUNC(lo, hi, status);                         \
++        return FUNC(lo, hi, status);                                  \
+     }                                                                 \
+ }                                                                     \
+ uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \
+     return NAME##_reduce(data, s, maxsz / sizeof(TYPE));              \
+ }
+-DO_REDUCE(sve_faddv_h, float16, H1_2, add, float16_zero)
+-DO_REDUCE(sve_faddv_s, float32, H1_4, add, float32_zero)
+-DO_REDUCE(sve_faddv_d, float64, H1_8, add, float64_zero)
++DO_REDUCE(sve_faddv_h, float16, H1_2, float16_add, float16_zero)
++DO_REDUCE(sve_faddv_s, float32, H1_4, float32_add, float32_zero)
++DO_REDUCE(sve_faddv_d, float64, H1_8, float64_add, float64_zero)
+ /* Identity is floatN_default_nan, without the function call.  */
+-DO_REDUCE(sve_fminnmv_h, float16, H1_2, minnum, 0x7E00)
+-DO_REDUCE(sve_fminnmv_s, float32, H1_4, minnum, 0x7FC00000)
+-DO_REDUCE(sve_fminnmv_d, float64, H1_8, minnum, 0x7FF8000000000000ULL)
++DO_REDUCE(sve_fminnmv_h, float16, H1_2, float16_minnum, 0x7E00)
++DO_REDUCE(sve_fminnmv_s, float32, H1_4, float32_minnum, 0x7FC00000)
++DO_REDUCE(sve_fminnmv_d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL)
+-DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, maxnum, 0x7E00)
+-DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, maxnum, 0x7FC00000)
+-DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, maxnum, 0x7FF8000000000000ULL)
++DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, float16_maxnum, 0x7E00)
++DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, float32_maxnum, 0x7FC00000)
++DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL)
+-DO_REDUCE(sve_fminv_h, float16, H1_2, min, float16_infinity)
+-DO_REDUCE(sve_fminv_s, float32, H1_4, min, float32_infinity)
+-DO_REDUCE(sve_fminv_d, float64, H1_8, min, float64_infinity)
++DO_REDUCE(sve_fminv_h, float16, H1_2, float16_min, float16_infinity)
++DO_REDUCE(sve_fminv_s, float32, H1_4, float32_min, float32_infinity)
++DO_REDUCE(sve_fminv_d, float64, H1_8, float64_min, float64_infinity)
+-DO_REDUCE(sve_fmaxv_h, float16, H1_2, max, float16_chs(float16_infinity))
+-DO_REDUCE(sve_fmaxv_s, float32, H1_4, max, float32_chs(float32_infinity))
+-DO_REDUCE(sve_fmaxv_d, float64, H1_8, max, float64_chs(float64_infinity))
++DO_REDUCE(sve_fmaxv_h, float16, H1_2, float16_max, float16_chs(float16_infinity))
++DO_REDUCE(sve_fmaxv_s, float32, H1_4, float32_max, float32_chs(float32_infinity))
++DO_REDUCE(sve_fmaxv_d, float64, H1_8, float64_max, float64_chs(float64_infinity))
++
++DO_REDUCE(sve_ah_fminv_h, float16, H1_2, helper_vfp_ah_minh, float16_infinity)
++DO_REDUCE(sve_ah_fminv_s, float32, H1_4, helper_vfp_ah_mins, float32_infinity)
++DO_REDUCE(sve_ah_fminv_d, float64, H1_8, helper_vfp_ah_mind, float64_infinity)
++
++DO_REDUCE(sve_ah_fmaxv_h, float16, H1_2, helper_vfp_ah_maxh,
++          float16_chs(float16_infinity))
++DO_REDUCE(sve_ah_fmaxv_s, float32, H1_4, helper_vfp_ah_maxs,
++          float32_chs(float32_infinity))
++DO_REDUCE(sve_ah_fmaxv_d, float64, H1_8, helper_vfp_ah_maxd,
++          float64_chs(float64_infinity))
+ #undef DO_REDUCE
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool do_reduce(DisasContext *s, arg_rpr_esz *a,
+     };                                                                   \
+     TRANS_FEAT(NAME, aa64_sve, do_reduce, a, name##_fns[a->esz])
++#define DO_VPZ_AH(NAME, name)                                            \
++    static gen_helper_fp_reduce * const name##_fns[4] = {                \
++        NULL,                      gen_helper_sve_##name##_h,            \
++        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,            \
++    };                                                                   \
++    static gen_helper_fp_reduce * const name##_ah_fns[4] = {             \
++        NULL,                      gen_helper_sve_ah_##name##_h,         \
++        gen_helper_sve_ah_##name##_s, gen_helper_sve_ah_##name##_d,      \
++    };                                                                   \
++    TRANS_FEAT(NAME, aa64_sve, do_reduce, a,                             \
++               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz])
++
+ DO_VPZ(FADDV, faddv)
+ DO_VPZ(FMINNMV, fminnmv)
+ DO_VPZ(FMAXNMV, fmaxnmv)
+-DO_VPZ(FMINV, fminv)
+-DO_VPZ(FMAXV, fmaxv)
++DO_VPZ_AH(FMINV, fminv)
++DO_VPZ_AH(FMAXV, fmaxv)
+ #undef DO_VPZ
+--
+.34.1

-New patch
+[PULL 28/68] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
+Implement the FPCR.AH semantics for the SVE FMAX and FMIN operations
+that take an immediate as the second operand.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/sve_helper.c    |  8 ++++++++
+ target/arm/tcg/translate-sve.c | 25 +++++++++++++++++++++++--
+files changed, 45 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_fmins_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(sve_fmins_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmaxs_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmaxs_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmaxs_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++
++DEF_HELPER_FLAGS_6(sve_ah_fmins_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmins_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmins_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(sve_fcvt_sh, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(sve_fcvt_dh, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZS_FP(sve_fmins_h, float16, H1_2, float16_min)
+ DO_ZPZS_FP(sve_fmins_s, float32, H1_4, float32_min)
+ DO_ZPZS_FP(sve_fmins_d, float64, H1_8, float64_min)
++DO_ZPZS_FP(sve_ah_fmaxs_h, float16, H1_2, helper_vfp_ah_maxh)
++DO_ZPZS_FP(sve_ah_fmaxs_s, float32, H1_4, helper_vfp_ah_maxs)
++DO_ZPZS_FP(sve_ah_fmaxs_d, float64, H1_8, helper_vfp_ah_maxd)
++
++DO_ZPZS_FP(sve_ah_fmins_h, float16, H1_2, helper_vfp_ah_minh)
++DO_ZPZS_FP(sve_ah_fmins_s, float32, H1_4, helper_vfp_ah_mins)
++DO_ZPZS_FP(sve_ah_fmins_d, float64, H1_8, helper_vfp_ah_mind)
++
+ /* Fully general two-operand expander, controlled by a predicate,
+  * With the extra float_status parameter.
+  */
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp_imm(DisasContext *s, arg_rpri_esz *a, uint64_t imm,
+     TRANS_FEAT(NAME##_zpzi, aa64_sve, do_fp_imm, a,                     \
+                name##_const[a->esz][a->imm], name##_fns[a->esz])
++#define DO_FP_AH_IMM(NAME, name, const0, const1)                        \
++    static gen_helper_sve_fp2scalar * const name##_fns[4] = {           \
++        NULL, gen_helper_sve_##name##_h,                                \
++        gen_helper_sve_##name##_s,                                      \
++        gen_helper_sve_##name##_d                                       \
++    };                                                                  \
++    static gen_helper_sve_fp2scalar * const name##_ah_fns[4] = {        \
++        NULL, gen_helper_sve_ah_##name##_h,                             \
++        gen_helper_sve_ah_##name##_s,                                   \
++        gen_helper_sve_ah_##name##_d                                    \
++    };                                                                  \
++    static uint64_t const name##_const[4][2] = {                        \
++        { -1, -1 },                                                     \
++        { float16_##const0, float16_##const1 },                         \
++        { float32_##const0, float32_##const1 },                         \
++        { float64_##const0, float64_##const1 },                         \
++    };                                                                  \
++    TRANS_FEAT(NAME##_zpzi, aa64_sve, do_fp_imm, a,                     \
++               name##_const[a->esz][a->imm],                            \
++               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz])
++
+ DO_FP_IMM(FADD, fadds, half, one)
+ DO_FP_IMM(FSUB, fsubs, half, one)
+ DO_FP_IMM(FMUL, fmuls, half, two)
+ DO_FP_IMM(FSUBR, fsubrs, half, one)
+ DO_FP_IMM(FMAXNM, fmaxnms, zero, one)
+ DO_FP_IMM(FMINNM, fminnms, zero, one)
+-DO_FP_IMM(FMAX, fmaxs, zero, one)
+-DO_FP_IMM(FMIN, fmins, zero, one)
++DO_FP_AH_IMM(FMAX, fmaxs, zero, one)
++DO_FP_AH_IMM(FMIN, fmins, zero, one)
+ #undef DO_FP_IMM
+--
+.34.1

-New patch
+[PULL 29/68] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
+Implement the FPCR.AH semantics for the SVE FMAX and FMIN
+operations that take two vector operands.
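The translate-time choice between the normal and the FPCR.AH helper tables can be pictured with the following stand-alone sketch. It is only an illustration of the "two tables indexed by element size, selected by a flag" pattern; the helper type, the stub functions and select_helper() are hypothetical stand-ins and not QEMU's generated gen_helper_* functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a helper-function pointer type. */
typedef const char *(*helper_fn)(void);

/* Stub helpers: each one just reports which variant it is. */
static const char *fmin_h(void)    { return "fmin_h"; }
static const char *fmin_s(void)    { return "fmin_s"; }
static const char *fmin_d(void)    { return "fmin_d"; }
static const char *ah_fmin_h(void) { return "ah_fmin_h"; }
static const char *ah_fmin_s(void) { return "ah_fmin_s"; }
static const char *ah_fmin_d(void) { return "ah_fmin_d"; }

/* Index 0 (byte elements) has no floating-point helper. */
static helper_fn const fmin_fns[4] = {
    NULL, fmin_h, fmin_s, fmin_d
};
static helper_fn const fmin_ah_fns[4] = {
    NULL, ah_fmin_h, ah_fmin_s, ah_fmin_d
};

/* Pick the table from the flag, then the entry from the element size. */
static helper_fn select_helper(bool fpcr_ah, unsigned esz)
{
    return fpcr_ah ? fmin_ah_fns[esz] : fmin_fns[esz];
}
```

In this sketch select_helper(true, 2) yields the single-precision AH stub, and a NULL result for esz 0 stands for "no such encoding". The point of the pattern is that the flag is tested once when the instruction is translated rather than inside each helper.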
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/sve_helper.c    |  8 ++++++++
+ target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
+files changed, 37 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_fmax_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(sve_fmax_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmin_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmin_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmin_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_6(sve_ah_fmax_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmax_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmax_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_6(sve_fminnum_h, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_6(sve_fminnum_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_FP(sve_fmax_h, uint16_t, H1_2, float16_max)
+ DO_ZPZZ_FP(sve_fmax_s, uint32_t, H1_4, float32_max)
+ DO_ZPZZ_FP(sve_fmax_d, uint64_t, H1_8, float64_max)
++DO_ZPZZ_FP(sve_ah_fmin_h, uint16_t, H1_2, helper_vfp_ah_minh)
++DO_ZPZZ_FP(sve_ah_fmin_s, uint32_t, H1_4, helper_vfp_ah_mins)
++DO_ZPZZ_FP(sve_ah_fmin_d, uint64_t, H1_8, helper_vfp_ah_mind)
++
++DO_ZPZZ_FP(sve_ah_fmax_h, uint16_t, H1_2, helper_vfp_ah_maxh)
++DO_ZPZZ_FP(sve_ah_fmax_s, uint32_t, H1_4, helper_vfp_ah_maxs)
++DO_ZPZZ_FP(sve_ah_fmax_d, uint64_t, H1_8, helper_vfp_ah_maxd)
++
+ DO_ZPZZ_FP(sve_fminnum_h, uint16_t, H1_2, float16_minnum)
+ DO_ZPZZ_FP(sve_fminnum_s, uint32_t, H1_4, float32_minnum)
+ DO_ZPZZ_FP(sve_fminnum_d, uint64_t, H1_8, float64_minnum)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(FTSMUL, aa64_sve, gen_gvec_fpst_arg_zzz,
+     };                                                          \
+     TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a)
++#define DO_ZPZZ_AH_FP(NAME, FEAT, name, ah_name)                        \
++    static gen_helper_gvec_4_ptr * const name##_zpzz_fns[4] = {         \
++        NULL,                  gen_helper_##name##_h,                   \
++        gen_helper_##name##_s, gen_helper_##name##_d                    \
++    };                                                                  \
++    static gen_helper_gvec_4_ptr * const name##_ah_zpzz_fns[4] = {      \
++        NULL,                  gen_helper_##ah_name##_h,                \
++        gen_helper_##ah_name##_s, gen_helper_##ah_name##_d              \
++    };                                                                  \
++    TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz,                      \
++               s->fpcr_ah ? name##_ah_zpzz_fns[a->esz] :                \
++               name##_zpzz_fns[a->esz], a)
++
+ DO_ZPZZ_FP(FADD_zpzz, aa64_sve, sve_fadd)
+ DO_ZPZZ_FP(FSUB_zpzz, aa64_sve, sve_fsub)
+ DO_ZPZZ_FP(FMUL_zpzz, aa64_sve, sve_fmul)
+-DO_ZPZZ_FP(FMIN_zpzz, aa64_sve, sve_fmin)
+-DO_ZPZZ_FP(FMAX_zpzz, aa64_sve, sve_fmax)
++DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
++DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
+ DO_ZPZZ_FP(FMINNM_zpzz, aa64_sve, sve_fminnum)
+ DO_ZPZZ_FP(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
+ DO_ZPZZ_FP(FABD, aa64_sve, sve_fabd)
+--
+.34.1

-[Qemu-devel] [PULL 20/28] hw/misc/tz_mpc.c: Honour the BLK_LUT settings in translate
+[PULL 30/68] target/arm: Implement FPCR.AH handling of negation of NaN
-The final part of the Memory Protection Controller we need to
+FPCR.AH == 1 mandates that negation of a NaN value should not flip
-implement is actually using the BLK_LUT data programmed by the
+its sign bit.  This means we can no longer use gen_vfp_neg*()
-guest to determine whether to block the transaction or not.
+everywhere but must instead generate slightly more complex code when
+FPCR.AH is set.
-Since this means we now change transaction mappings when
-the guest writes to BLK_LUT, we must also call the IOMMU
+Make this change for the scalar FNEG and for those places in
-notifiers at that point.
+translate-a64.c which were previously directly calling
 gen_vfp_neg*().
 This change in semantics also affects any other instruction whose
 pseudocode calls FPNeg(); in following commits we extend this
 change to the other affected instructions.
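As a rough illustration of the rule described above (this is not the TCG code in the patch), the same selection can be written directly on float32 bit patterns in plain C. The name ah_neg_f32() is made up for this sketch; the 0x7f800000 constant and the unsigned greater-than comparison mirror the movcond used in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: negate a float32 given as its raw bit pattern,
 * but leave any NaN (quiet or signalling) completely unchanged.
 */
static uint32_t ah_neg_f32(uint32_t s)
{
    uint32_t abs_s = s & ~(UINT32_C(1) << 31);  /* sign bit cleared */
    uint32_t chs_s = s ^ (UINT32_C(1) << 31);   /* sign bit flipped */

    /* A magnitude above the infinity encoding can only be a NaN. */
    return abs_s > UINT32_C(0x7f800000) ? s : chs_s;
}
```

With this sketch, 0x3f800000 (1.0) becomes 0xbf800000 (-1.0) and 0x7f800000 (+inf) becomes 0xff800000 (-inf), while 0x7fc00000 and 0xffc00000 (quiet NaNs of either sign) come back unmodified.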
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20180620132032.28865-5-peter.maydell@linaro.org
 ---
- hw/misc/tz-mpc.c     | 53 ++++++++++++++++++++++++++++++++++++++++++--
+ target/arm/tcg/translate-a64.c | 125 ++++++++++++++++++++++++++++++---
- hw/misc/trace-events |  1 +
+file changed, 114 insertions(+), 11 deletions(-)
-files changed, 52 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 diff --git a/hw/misc/tz-mpc.c b/hw/misc/tz-mpc.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/tz-mpc.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/hw/misc/tz-mpc.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void tz_mpc_irq_update(TZMPC *s)
+@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
-     qemu_set_irq(s->irq, s->int_stat && s->int_en);
+                        is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
  }
-+static void tz_mpc_iommu_notify(TZMPC *s, uint32_t lutidx,
++/*
-+                                uint32_t oldlut, uint32_t newlut)
++ * When FPCR.AH == 1, NEG and ABS do not flip the sign bit of a NaN.
-+{
++ * These functions implement
-+    /* Called when the LUT word at lutidx has changed from oldlut to newlut;
++ *   d = floatN_is_any_nan(s) ? s : floatN_chs(s)
-+     * must call the IOMMU notifiers for the changed blocks.
++ * which for float32 is
-+     */
++ *   d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s ^ (1 << 31))
-+    IOMMUTLBEntry entry = {
++ * and similarly for the other float sizes.
-+        .addr_mask = s->blocksize - 1,
++ */
-+    };
++static void gen_vfp_ah_negh(TCGv_i32 d, TCGv_i32 s)
-+    hwaddr addr = lutidx * s->blocksize * 32;
++{
-+    int i;
++    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
 +
-+    for (i = 0; i < 32; i++, addr += s->blocksize) {
++    gen_vfp_negh(chs_s, s);
-+        bool block_is_ns;
++    gen_vfp_absh(abs_s, s);
-+
++    tcg_gen_movcond_i32(TCG_COND_GTU, d,
-+        if (!((oldlut ^ newlut) & (1 << i))) {
++                        abs_s, tcg_constant_i32(0x7c00),
-+            continue;
++                        s, chs_s);
-+        }
++}
-+        /* This changes the mappings for both the S and the NS space,
++
-+         * so we need to do four notifies: an UNMAP then a MAP for each.
++static void gen_vfp_ah_negs(TCGv_i32 d, TCGv_i32 s)
-+         */
++{
-+        block_is_ns = newlut & (1 << i);
++    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
 +
-+        trace_tz_mpc_iommu_notify(addr);
++    gen_vfp_negs(chs_s, s);
-+        entry.iova = addr;
++    gen_vfp_abss(abs_s, s);
-+        entry.translated_addr = addr;
++    tcg_gen_movcond_i32(TCG_COND_GTU, d,
-+
++                        abs_s, tcg_constant_i32(0x7f800000UL),
-+        entry.perm = IOMMU_NONE;
++                        s, chs_s);
-+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_S, entry);
++}
-+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_NS, entry);
++
-+
++static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
-+        entry.perm = IOMMU_RW;
++{
-+        if (block_is_ns) {
++    TCGv_i64 abs_s = tcg_temp_new_i64(), chs_s = tcg_temp_new_i64();
-+            entry.target_as = &s->blocked_io_as;
++
-+        } else {
++    gen_vfp_negd(chs_s, s);
-+            entry.target_as = &s->downstream_as;
++    gen_vfp_absd(abs_s, s);
-+        }
++    tcg_gen_movcond_i64(TCG_COND_GTU, d,
-+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_S, entry);
++                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
-+        if (block_is_ns) {
++                        s, chs_s);
-+            entry.target_as = &s->downstream_as;
++}
-+        } else {
++
-+            entry.target_as = &s->blocked_io_as;
++static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
-+        }
++{
-+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_NS, entry);
++    if (dc->fpcr_ah) {
 +        gen_vfp_ah_negh(d, s);
 +    } else {
 +        gen_vfp_negh(d, s);
 +    }
 +}
 +
- static void tz_mpc_autoinc_idx(TZMPC *s, unsigned access_size)
++static void gen_vfp_maybe_ah_negs(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
- {
++{
-     /* Auto-increment BLK_IDX if necessary */
++    if (dc->fpcr_ah) {
-@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
++        gen_vfp_ah_negs(d, s);
-         s->blk_idx = value % s->blk_max;
++    } else {
-         break;
++        gen_vfp_negs(d, s);
-     case A_BLK_LUT:
++    }
-+        tz_mpc_iommu_notify(s, s->blk_idx, s->blk_lut[s->blk_idx], value);
++}
-         s->blk_lut[s->blk_idx] = value;
++
-         tz_mpc_autoinc_idx(s, size);
++static void gen_vfp_maybe_ah_negd(DisasContext *dc, TCGv_i64 d, TCGv_i64 s)
-         break;
++{
-@@ -XXX,XX +XXX,XX @@ static IOMMUTLBEntry tz_mpc_translate(IOMMUMemoryRegion *iommu,
++    if (dc->fpcr_ah) {
-     /* Look at the per-block configuration for this address, and
++        gen_vfp_ah_negd(d, s);
-      * return a TLB entry directing the transaction at either
++    } else {
-      * downstream_as or blocked_io_as, as appropriate.
++        gen_vfp_negd(d, s);
--     * For the moment, always permit accesses.
++    }
-+     * If the LUT cfg_ns bit is 1, only non-secure transactions
++}
-+     * may pass. If the bit is 0, only secure transactions may pass.
++
-      */
+ /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
--    ok = true;
+  * than the 32 bit equivalent.
-+    ok = tz_mpc_cfg_ns(s, addr) == (iommu_idx == IOMMU_IDX_NS);
+  */
+@@ -XXX,XX +XXX,XX @@ static void gen_fnmul_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
-     trace_tz_mpc_translate(addr, flags,
+     gen_vfp_negd(d, d);
-                            iommu_idx == IOMMU_IDX_S ? "S" : "NS",
+ }
-diff --git a/hw/misc/trace-events b/hw/misc/trace-events
-index XXXXXXX..XXXXXXX 100644
++static void gen_fnmul_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
---- a/hw/misc/trace-events
++{
-+++ b/hw/misc/trace-events
++    gen_helper_vfp_mulh(d, n, m, s);
-@@ -XXX,XX +XXX,XX @@ tz_mpc_reg_write(uint32_t offset, uint64_t data, unsigned size) "TZ MPC regs wri
++    gen_vfp_ah_negh(d, d);
- tz_mpc_mem_blocked_read(uint64_t addr, unsigned size, bool secure) "TZ MPC blocked read: offset 0x%" PRIx64 " size %u secure %d"
++}
- tz_mpc_mem_blocked_write(uint64_t addr, uint64_t data, unsigned size, bool secure) "TZ MPC blocked write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u secure %d"
++
- tz_mpc_translate(uint64_t addr, int flags, const char *idx, const char *res) "TZ MPC translate: addr 0x%" PRIx64 " flags 0x%x iommu_idx %s: %s"
++static void gen_fnmul_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
-+tz_mpc_iommu_notify(uint64_t addr) "TZ MPC iommu: notifying UNMAP/MAP for 0x%" PRIx64
++{
++    gen_helper_vfp_muls(d, n, m, s);
- # hw/misc/tz-ppc.c
++    gen_vfp_ah_negs(d, d);
- tz_ppc_reset(void) "TZ PPC: reset"
++}
 +
 +static void gen_fnmul_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
 +{
 +    gen_helper_vfp_muld(d, n, m, s);
 +    gen_vfp_ah_negd(d, d);
 +}
 +
  static const FPScalar f_scalar_fnmul = {
      gen_fnmul_h,
      gen_fnmul_s,
      gen_fnmul_d,
  };
 -TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
 +static const FPScalar f_scalar_ah_fnmul = {
 +    gen_fnmul_ah_h,
 +    gen_fnmul_ah_s,
 +    gen_fnmul_ah_d,
 +};
 +TRANS(FNMUL_s, do_fp3_scalar_2fn, a, &f_scalar_fnmul, &f_scalar_ah_fnmul, a->rn)
  static const FPScalar f_scalar_fcmeq = {
      gen_helper_advsimd_ceq_f16,
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
              read_vec_element(s, t2, a->rm, a->idx, MO_64);
              if (neg) {
 -                gen_vfp_negd(t1, t1);
 +                gen_vfp_maybe_ah_negd(s, t1, t1);
              }
              gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
              write_fp_dreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
              read_vec_element_i32(s, t2, a->rm, a->idx, MO_32);
              if (neg) {
 -                gen_vfp_negs(t1, t1);
 +                gen_vfp_maybe_ah_negs(s, t1, t1);
              }
              gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
              write_fp_sreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
              read_vec_element_i32(s, t2, a->rm, a->idx, MO_16);
              if (neg) {
 -                gen_vfp_negh(t1, t1);
 +                gen_vfp_maybe_ah_negh(s, t1, t1);
              }
              gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                         fpstatus_ptr(FPST_A64_F16));
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
              TCGv_i64 ta = read_fp_dreg(s, a->ra);
              if (neg_a) {
 -                gen_vfp_negd(ta, ta);
 +                gen_vfp_maybe_ah_negd(s, ta, ta);
              }
              if (neg_n) {
 -                gen_vfp_negd(tn, tn);
 +                gen_vfp_maybe_ah_negd(s, tn, tn);
              }
              fpst = fpstatus_ptr(FPST_A64);
              gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
              TCGv_i32 ta = read_fp_sreg(s, a->ra);
              if (neg_a) {
 -                gen_vfp_negs(ta, ta);
 +                gen_vfp_maybe_ah_negs(s, ta, ta);
              }
              if (neg_n) {
 -                gen_vfp_negs(tn, tn);
 +                gen_vfp_maybe_ah_negs(s, tn, tn);
              }
              fpst = fpstatus_ptr(FPST_A64);
              gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
              TCGv_i32 ta = read_fp_hreg(s, a->ra);
              if (neg_a) {
 -                gen_vfp_negh(ta, ta);
 +                gen_vfp_maybe_ah_negh(s, ta, ta);
              }
              if (neg_n) {
 -                gen_vfp_negh(tn, tn);
 +                gen_vfp_maybe_ah_negh(s, tn, tn);
              }
              fpst = fpstatus_ptr(FPST_A64_F16);
              gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
      return true;
  }
 +static bool do_fp1_scalar_int_2fn(DisasContext *s, arg_rr_e *a,
 +                                  const FPScalar1Int *fnormal,
 +                                  const FPScalar1Int *fah)
 +{
 +    return do_fp1_scalar_int(s, a, s->fpcr_ah ? fah : fnormal, true);
 +}
 +
  static const FPScalar1Int f_scalar_fmov = {
      tcg_gen_mov_i32,
      tcg_gen_mov_i32,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fneg = {
      gen_vfp_negs,
      gen_vfp_negd,
  };
 -TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
 +static const FPScalar1Int f_scalar_ah_fneg = {
 +    gen_vfp_ah_negh,
 +    gen_vfp_ah_negs,
 +    gen_vfp_ah_negd,
 +};
 +TRANS(FNEG_s, do_fp1_scalar_int_2fn, a, &f_scalar_fneg, &f_scalar_ah_fneg)
  typedef struct FPScalar1 {
      void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
 --
-.17.1
+.34.1

-[Qemu-devel] [PULL 17/28] hw/misc/tz-mpc.c: Implement the Arm TrustZone Memory Protection Controller
+[PULL 31/68] target/arm: Implement FPCR.AH handling for scalar FABS and FABD
-Implement the Arm TrustZone Memory Protection Controller, which sits
+FPCR.AH == 1 mandates that taking the absolute value of a NaN should
-in front of RAM and allows secure software to configure it to either
+not change its sign bit.  This means we can no longer use
-pass through or reject transactions.
+gen_vfp_abs*() everywhere but must instead generate slightly more
 complex code when FPCR.AH is set.
-We implement the MPC as a QEMU IOMMU, which will direct transactions
+Implement these semantics for scalar FABS and FABD.  This change also
-either through to the devices and memory behind it or to a special
+affects all other instructions whose pseudocode calls FPAbs(); we
-"never works" AddressSpace if they are blocked.
+will extend the change to those instructions in following commits.
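For readers who want to check the constants, here is a small plain-C sketch of the same "absolute value unless NaN" rule for the half- and double-precision encodings. It only restates the comparison used by the movcond ops in the patch; the function names are invented for this example and do not exist in QEMU.

```c
#include <stdint.h>

/* Hypothetical: float16 abs on the raw bit pattern, NaNs left untouched. */
static uint16_t ah_abs_f16(uint16_t s)
{
    uint16_t abs_s = s & 0x7fff;            /* sign bit cleared */

    /* 0x7c00 is the float16 infinity; anything larger is a NaN. */
    return abs_s > 0x7c00 ? s : abs_s;
}

/* Hypothetical: float64 abs on the raw bit pattern, NaNs left untouched. */
static uint64_t ah_abs_f64(uint64_t s)
{
    uint64_t abs_s = s & ~(UINT64_C(1) << 63);

    /* 0x7ff0000000000000 is the float64 infinity encoding. */
    return abs_s > UINT64_C(0x7ff0000000000000) ? s : abs_s;
}
```

In this sketch -1.0 (0xbc00) maps to 1.0 (0x3c00) and -inf maps to +inf, but a negative NaN such as 0xfe00 keeps its sign bit, which is the FPCR.AH == 1 behaviour the commit message describes.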
 This initial commit implements the skeleton of the device:
  * it always permits accesses
  * it doesn't implement most of the registers
  * it doesn't implement the interrupt or other behaviour
    for blocked transactions
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Eric Auger <eric.auger@redhat.com>
 Message-id: 20180620132032.28865-2-peter.maydell@linaro.org
 ---
- hw/misc/Makefile.objs           |   1 +
+ target/arm/tcg/translate-a64.c | 69 +++++++++++++++++++++++++++++++++-
- include/hw/misc/tz-mpc.h        |  70 ++++++
+file changed, 67 insertions(+), 2 deletions(-)
  hw/misc/tz-mpc.c                | 399 ++++++++++++++++++++++++++++++++
  MAINTAINERS                     |   2 +
  default-configs/arm-softmmu.mak |   1 +
  hw/misc/trace-events            |   7 +
 files changed, 480 insertions(+)
  create mode 100644 include/hw/misc/tz-mpc.h
  create mode 100644 hw/misc/tz-mpc.c
-diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/Makefile.objs
+--- a/target/arm/tcg/translate-a64.c
-+++ b/hw/misc/Makefile.objs
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_MIPS_ITU) += mips_itu.o
+@@ -XXX,XX +XXX,XX @@ static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
- obj-$(CONFIG_MPS2_FPGAIO) += mps2-fpgaio.o
+                         s, chs_s);
- obj-$(CONFIG_MPS2_SCC) += mps2-scc.o
+ }
 +obj-$(CONFIG_TZ_MPC) += tz-mpc.o
  obj-$(CONFIG_TZ_PPC) += tz-ppc.o
  obj-$(CONFIG_IOTKIT_SECCTL) += iotkit-secctl.o
 diff --git a/include/hw/misc/tz-mpc.h b/include/hw/misc/tz-mpc.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/misc/tz-mpc.h
@@ -XXX,XX +XXX,XX @@
 +/*
-+ * ARM AHB5 TrustZone Memory Protection Controller emulation
++ * These functions implement
-+ *
++ *  d = floatN_is_any_nan(s) ? s : floatN_abs(s)
-+ * Copyright (c) 2018 Linaro Limited
++ * which for float32 is
-+ * Written by Peter Maydell
++ *  d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s & ~(1 << 31))
-+ *
++ * and similarly for the other float sizes.
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 or
 + * (at your option) any later version.
 + */
++static void gen_vfp_ah_absh(TCGv_i32 d, TCGv_i32 s)
++{
++    TCGv_i32 abs_s = tcg_temp_new_i32();
 +
-+/* This is a model of the TrustZone memory protection controller (MPC).
++    gen_vfp_absh(abs_s, s);
-+ * It is documented in the ARM CoreLink SIE-200 System IP for Embedded TRM
++    tcg_gen_movcond_i32(TCG_COND_GTU, d,
-+ * (DDI 0571G):
++                        abs_s, tcg_constant_i32(0x7c00),
-+ * https://developer.arm.com/products/architecture/m-profile/docs/ddi0571/g
++                        s, abs_s);
 + *
 + * The MPC sits in front of memory and allows secure software to
 + * configure it to either pass through or reject transactions.
 + * Rejected transactions may be configured to either be aborted, or to
 + * behave as RAZ/WI. An interrupt can be signalled for a rejected transaction.
 + *
 + * The MPC has a register interface which the guest uses to configure it.
 + *
 + * QEMU interface:
 + * + sysbus MMIO region 0: MemoryRegion for the MPC's config registers
 + * + sysbus MMIO region 1: MemoryRegion for the upstream end of the MPC
 + * + Property "downstream": MemoryRegion defining the downstream memory
 + * + Named GPIO output "irq": set for a transaction-failed interrupt
 + */
 +
 +#ifndef TZ_MPC_H
 +#define TZ_MPC_H
 +
 +#include "hw/sysbus.h"
 +
 +#define TYPE_TZ_MPC "tz-mpc"
 +#define TZ_MPC(obj) OBJECT_CHECK(TZMPC, (obj), TYPE_TZ_MPC)
 +
 +#define TZ_NUM_PORTS 16
 +
 +#define TYPE_TZ_MPC_IOMMU_MEMORY_REGION "tz-mpc-iommu-memory-region"
 +
 +typedef struct TZMPC TZMPC;
 +
 +struct TZMPC {
 +    /*< private >*/
 +    SysBusDevice parent_obj;
 +
 +    /*< public >*/
 +
 +    qemu_irq irq;
 +
 +    /* Properties */
 +    MemoryRegion *downstream;
 +
 +    hwaddr blocksize;
 +    uint32_t blk_max;
 +
 +    /* MemoryRegions exposed to user */
 +    MemoryRegion regmr;
 +    IOMMUMemoryRegion upstream;
 +
 +    /* MemoryRegion used internally */
 +    MemoryRegion blocked_io;
 +
 +    AddressSpace downstream_as;
 +    AddressSpace blocked_io_as;
 +};
 +
 +#endif
 diff --git a/hw/misc/tz-mpc.c b/hw/misc/tz-mpc.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/misc/tz-mpc.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM AHB5 TrustZone Memory Protection Controller emulation
 + *
 + * Copyright (c) 2018 Linaro Limited
 + * Written by Peter Maydell
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 or
 + * (at your option) any later version.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "qapi/error.h"
 +#include "trace.h"
 +#include "hw/sysbus.h"
 +#include "hw/registerfields.h"
 +#include "hw/misc/tz-mpc.h"
 +
 +/* Our IOMMU has two IOMMU indexes, one for secure transactions and one for
 + * non-secure transactions.
 + */
 +enum {
 +    IOMMU_IDX_S,
 +    IOMMU_IDX_NS,
 +    IOMMU_NUM_INDEXES,
 +};
 +
 +/* Config registers */
 +REG32(CTRL, 0x00)
 +REG32(BLK_MAX, 0x10)
 +REG32(BLK_CFG, 0x14)
 +REG32(BLK_IDX, 0x18)
 +REG32(BLK_LUT, 0x1c)
 +REG32(INT_STAT, 0x20)
 +REG32(INT_CLEAR, 0x24)
 +REG32(INT_EN, 0x28)
 +REG32(INT_INFO1, 0x2c)
 +REG32(INT_INFO2, 0x30)
 +REG32(INT_SET, 0x34)
 +REG32(PIDR4, 0xfd0)
 +REG32(PIDR5, 0xfd4)
 +REG32(PIDR6, 0xfd8)
 +REG32(PIDR7, 0xfdc)
 +REG32(PIDR0, 0xfe0)
 +REG32(PIDR1, 0xfe4)
 +REG32(PIDR2, 0xfe8)
 +REG32(PIDR3, 0xfec)
 +REG32(CIDR0, 0xff0)
 +REG32(CIDR1, 0xff4)
 +REG32(CIDR2, 0xff8)
 +REG32(CIDR3, 0xffc)
 +
 +static const uint8_t tz_mpc_idregs[] = {
 +    0x04, 0x00, 0x00, 0x00,
 +    0x60, 0xb8, 0x1b, 0x00,
 +    0x0d, 0xf0, 0x05, 0xb1,
 +};
 +
 +static MemTxResult tz_mpc_reg_read(void *opaque, hwaddr addr,
 +                                   uint64_t *pdata,
 +                                   unsigned size, MemTxAttrs attrs)
 +{
 +    uint64_t r;
 +    uint32_t offset = addr & ~0x3;
 +
 +    if (!attrs.secure && offset < A_PIDR4) {
 +        /* NS accesses can only see the ID registers */
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "TZ MPC register read: NS access to offset 0x%x\n",
 +                      offset);
 +        r = 0;
 +        goto read_out;
 +    }
 +
 +    switch (offset) {
 +    case A_PIDR4:
 +    case A_PIDR5:
 +    case A_PIDR6:
 +    case A_PIDR7:
 +    case A_PIDR0:
 +    case A_PIDR1:
 +    case A_PIDR2:
 +    case A_PIDR3:
 +    case A_CIDR0:
 +    case A_CIDR1:
 +    case A_CIDR2:
 +    case A_CIDR3:
 +        r = tz_mpc_idregs[(offset - A_PIDR4) / 4];
 +        break;
 +    case A_INT_CLEAR:
 +    case A_INT_SET:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "TZ MPC register read: write-only offset 0x%x\n",
 +                      offset);
 +        r = 0;
 +        break;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "TZ MPC register read: bad offset 0x%x\n", offset);
 +        r = 0;
 +        break;
 +    }
 +
 +    if (size != 4) {
 +        /* None of our registers are read-sensitive (except BLK_LUT,
 +         * which can special case the "size not 4" case), so just
 +         * pull the right bytes out of the word read result.
 +         */
 +        r = extract32(r, (addr & 3) * 8, size * 8);
 +    }
 +
 +read_out:
 +    trace_tz_mpc_reg_read(addr, r, size);
 +    *pdata = r;
 +    return MEMTX_OK;
 +}
 +
-+static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
++static void gen_vfp_ah_abss(TCGv_i32 d, TCGv_i32 s)
 +                                    uint64_t value,
 +                                    unsigned size, MemTxAttrs attrs)
 +{
-+    uint32_t offset = addr & ~0x3;
++    TCGv_i32 abs_s = tcg_temp_new_i32();
 +
-+    trace_tz_mpc_reg_write(addr, value, size);
++    gen_vfp_abss(abs_s, s);
-+
++    tcg_gen_movcond_i32(TCG_COND_GTU, d,
-+    if (!attrs.secure && offset < A_PIDR4) {
++                        abs_s, tcg_constant_i32(0x7f800000UL),
-+        /* NS accesses can only see the ID registers */
++                        s, abs_s);
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "TZ MPC register write: NS access to offset 0x%x\n",
 +                      offset);
 +        return MEMTX_OK;
 +    }
 +
 +    if (size != 4) {
 +        /* Expand the byte or halfword write to a full word size.
 +         * In most cases we can do this with zeroes; the exceptions
 +         * are CTRL, BLK_IDX and BLK_LUT.
 +         */
 +        uint32_t oldval;
 +
 +        switch (offset) {
 +            /* As we add support for registers which need expansions
 +             * other than zeroes we'll fill in cases here.
 +             */
 +        default:
 +            oldval = 0;
 +            break;
 +        }
 +        value = deposit32(oldval, (addr & 3) * 8, size * 8, value);
 +    }
 +
 +    switch (offset) {
 +    case A_PIDR4:
 +    case A_PIDR5:
 +    case A_PIDR6:
 +    case A_PIDR7:
 +    case A_PIDR0:
 +    case A_PIDR1:
 +    case A_PIDR2:
 +    case A_PIDR3:
 +    case A_CIDR0:
 +    case A_CIDR1:
 +    case A_CIDR2:
 +    case A_CIDR3:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "TZ MPC register write: read-only offset 0x%x\n", offset);
 +        break;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "TZ MPC register write: bad offset 0x%x\n", offset);
 +        break;
 +    }
 +
 +    return MEMTX_OK;
 +}
 +
-+static const MemoryRegionOps tz_mpc_reg_ops = {
++static void gen_vfp_ah_absd(TCGv_i64 d, TCGv_i64 s)
-+    .read_with_attrs = tz_mpc_reg_read,
++{
-+    .write_with_attrs = tz_mpc_reg_write,
++    TCGv_i64 abs_s = tcg_temp_new_i64();
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 +    .valid.min_access_size = 1,
 +    .valid.max_access_size = 4,
 +    .impl.min_access_size = 1,
 +    .impl.max_access_size = 4,
 +};
 +
-+/* Accesses only reach these read and write functions if the MPC is
++    gen_vfp_absd(abs_s, s);
-+ * blocking them; non-blocked accesses go directly to the downstream
++    tcg_gen_movcond_i64(TCG_COND_GTU, d,
-+ * memory region without passing through this code.
++                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
-+ */
++                        s, abs_s);
 +static MemTxResult tz_mpc_mem_blocked_read(void *opaque, hwaddr addr,
 +                                           uint64_t *pdata,
 +                                           unsigned size, MemTxAttrs attrs)
 +{
 +    trace_tz_mpc_mem_blocked_read(addr, size, attrs.secure);
 +
 +    *pdata = 0;
 +    return MEMTX_OK;
 +}
 +
-+static MemTxResult tz_mpc_mem_blocked_write(void *opaque, hwaddr addr,
+ static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
-+                                            uint64_t value,
+ {
-+                                            unsigned size, MemTxAttrs attrs)
+     if (dc->fpcr_ah) {
@@ -XXX,XX +XXX,XX @@ static void gen_fabd_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
      gen_vfp_absd(d, d);
  }
 +static void gen_fabd_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 +{
-+    trace_tz_mpc_mem_blocked_write(addr, value, size, attrs.secure);
++    gen_helper_vfp_subh(d, n, m, s);
-+
++    gen_vfp_ah_absh(d, d);
 +    return MEMTX_OK;
 +}
 +
-+static const MemoryRegionOps tz_mpc_mem_blocked_ops = {
++static void gen_fabd_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 +    .read_with_attrs = tz_mpc_mem_blocked_read,
 +    .write_with_attrs = tz_mpc_mem_blocked_write,
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 +    .valid.min_access_size = 1,
 +    .valid.max_access_size = 8,
 +    .impl.min_access_size = 1,
 +    .impl.max_access_size = 8,
 +};
 +
 +static IOMMUTLBEntry tz_mpc_translate(IOMMUMemoryRegion *iommu,
 +                                      hwaddr addr, IOMMUAccessFlags flags,
 +                                      int iommu_idx)
 +{
-+    TZMPC *s = TZ_MPC(container_of(iommu, TZMPC, upstream));
++    gen_helper_vfp_subs(d, n, m, s);
-+    bool ok;
++    gen_vfp_ah_abss(d, d);
 +
 +    IOMMUTLBEntry ret = {
 +        .iova = addr & ~(s->blocksize - 1),
 +        .translated_addr = addr & ~(s->blocksize - 1),
 +        .addr_mask = s->blocksize - 1,
 +        .perm = IOMMU_RW,
 +    };
 +
 +    /* Look at the per-block configuration for this address, and
 +     * return a TLB entry directing the transaction at either
 +     * downstream_as or blocked_io_as, as appropriate.
 +     * For the moment, always permit accesses.
 +     */
 +    ok = true;
 +
 +    trace_tz_mpc_translate(addr, flags,
 +                           iommu_idx == IOMMU_IDX_S ? "S" : "NS",
 +                           ok ? "pass" : "block");
 +
 +    ret.target_as = ok ? &s->downstream_as : &s->blocked_io_as;
 +    return ret;
 +}
 +
-+static int tz_mpc_attrs_to_index(IOMMUMemoryRegion *iommu, MemTxAttrs attrs)
++static void gen_fabd_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
 +{
-+    /* We treat unspecified attributes like secure. Transactions with
++    gen_helper_vfp_subd(d, n, m, s);
-+     * unspecified attributes come from places like
++    gen_vfp_ah_absd(d, d);
 +     * cpu_physical_memory_write_rom() for initial image load, and we want
 +     * those to pass through the from-reset "everything is secure" config.
 +     * All the real during-emulation transactions from the CPU will
 +     * specify attributes.
 +     */
 +    return (attrs.unspecified || attrs.secure) ? IOMMU_IDX_S : IOMMU_IDX_NS;
 +}
 +
-+static int tz_mpc_num_indexes(IOMMUMemoryRegion *iommu)
+ static const FPScalar f_scalar_fabd = {
-+{
+     gen_fabd_h,
-+    return IOMMU_NUM_INDEXES;
+     gen_fabd_s,
-+}
+     gen_fabd_d,
-+
+ };
-+static void tz_mpc_reset(DeviceState *dev)
+-TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
-+{
++static const FPScalar f_scalar_ah_fabd = {
-+}
++    gen_fabd_ah_h,
-+
++    gen_fabd_ah_s,
-+static void tz_mpc_init(Object *obj)
++    gen_fabd_ah_d,
 +{
 +    DeviceState *dev = DEVICE(obj);
 +    TZMPC *s = TZ_MPC(obj);
 +
 +    qdev_init_gpio_out_named(dev, &s->irq, "irq", 1);
 +}
 +
 +static void tz_mpc_realize(DeviceState *dev, Error **errp)
 +{
 +    Object *obj = OBJECT(dev);
 +    SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 +    TZMPC *s = TZ_MPC(dev);
 +    uint64_t size;
 +
 +    /* We can't create the upstream end of the port until realize,
 +     * as we don't know the size of the MR used as the downstream until then.
 +     * We insist on having a downstream, to avoid complicating the code
 +     * with handling the "don't know how big this is" case. It's easy
 +     * enough for the user to create an unimplemented_device as downstream
 +     * if they have nothing else to plug into this.
 +     */
 +    if (!s->downstream) {
 +        error_setg(errp, "MPC 'downstream' link not set");
 +        return;
 +    }
 +
 +    size = memory_region_size(s->downstream);
 +
 +    memory_region_init_iommu(&s->upstream, sizeof(s->upstream),
 +                             TYPE_TZ_MPC_IOMMU_MEMORY_REGION,
 +                             obj, "tz-mpc-upstream", size);
 +
 +    /* In real hardware the block size is configurable. In QEMU we could
 +     * make it configurable but will need it to be at least as big as the
 +     * target page size so we can execute out of the resulting MRs. Guest
 +     * software is supposed to check the block size using the BLK_CFG
 +     * register, so make it fixed at the page size.
 +     */
 +    s->blocksize = memory_region_iommu_get_min_page_size(&s->upstream);
 +    if (size % s->blocksize != 0) {
 +        error_setg(errp,
 +                   "MPC 'downstream' size %" PRId64
 +                   " is not a multiple of %" HWADDR_PRIx " bytes",
 +                   size, s->blocksize);
 +        object_unref(OBJECT(&s->upstream));
 +        return;
 +    }
 +
 +    /* BLK_MAX is the max value of BLK_IDX, which indexes an array of 32-bit
 +     * words, each bit of which indicates one block.
 +     */
 +    s->blk_max = DIV_ROUND_UP(size / s->blocksize, 32);
 +
 +    memory_region_init_io(&s->regmr, obj, &tz_mpc_reg_ops,
 +                          s, "tz-mpc-regs", 0x1000);
 +    sysbus_init_mmio(sbd, &s->regmr);
 +
 +    sysbus_init_mmio(sbd, MEMORY_REGION(&s->upstream));
 +
 +    /* This memory region is not exposed to users of this device as a
 +     * sysbus MMIO region, but is instead used internally as something
 +     * that our IOMMU translate function might direct accesses to.
 +     */
 +    memory_region_init_io(&s->blocked_io, obj, &tz_mpc_mem_blocked_ops,
 +                          s, "tz-mpc-blocked-io", size);
 +
 +    address_space_init(&s->downstream_as, s->downstream,
 +                       "tz-mpc-downstream");
 +    address_space_init(&s->blocked_io_as, &s->blocked_io,
 +                       "tz-mpc-blocked-io");
 +}
 +
 +static const VMStateDescription tz_mpc_vmstate = {
 +    .name = "tz-mpc",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
-+
++TRANS(FABD_s, do_fp3_scalar_2fn, a, &f_scalar_fabd, &f_scalar_ah_fabd, a->rn)
-+static Property tz_mpc_properties[] = {
-+    DEFINE_PROP_LINK("downstream", TZMPC, downstream,
+ static const FPScalar f_scalar_frecps = {
-+                     TYPE_MEMORY_REGION, MemoryRegion *),
+     gen_helper_recpsf_f16,
-+    DEFINE_PROP_END_OF_LIST(),
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fabs = {
      gen_vfp_abss,
      gen_vfp_absd,
  };
 -TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
 +static const FPScalar1Int f_scalar_ah_fabs = {
 +    gen_vfp_ah_absh,
 +    gen_vfp_ah_abss,
 +    gen_vfp_ah_absd,
 +};
-+
++TRANS(FABS_s, do_fp1_scalar_int_2fn, a, &f_scalar_fabs, &f_scalar_ah_fabs)
-+static void tz_mpc_class_init(ObjectClass *klass, void *data)
-+{
+ static const FPScalar1Int f_scalar_fneg = {
-+    DeviceClass *dc = DEVICE_CLASS(klass);
+     gen_vfp_negh,
 +
 +    dc->realize = tz_mpc_realize;
 +    dc->vmsd = &tz_mpc_vmstate;
 +    dc->reset = tz_mpc_reset;
 +    dc->props = tz_mpc_properties;
 +}
 +
 +static const TypeInfo tz_mpc_info = {
 +    .name = TYPE_TZ_MPC,
 +    .parent = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(TZMPC),
 +    .instance_init = tz_mpc_init,
 +    .class_init = tz_mpc_class_init,
 +};
 +
 +static void tz_mpc_iommu_memory_region_class_init(ObjectClass *klass,
 +                                                  void *data)
 +{
 +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
 +
 +    imrc->translate = tz_mpc_translate;
 +    imrc->attrs_to_index = tz_mpc_attrs_to_index;
 +    imrc->num_indexes = tz_mpc_num_indexes;
 +}
 +
 +static const TypeInfo tz_mpc_iommu_memory_region_info = {
 +    .name = TYPE_TZ_MPC_IOMMU_MEMORY_REGION,
 +    .parent = TYPE_IOMMU_MEMORY_REGION,
 +    .class_init = tz_mpc_iommu_memory_region_class_init,
 +};
 +
 +static void tz_mpc_register_types(void)
 +{
 +    type_register_static(&tz_mpc_info);
 +    type_register_static(&tz_mpc_iommu_memory_region_info);
 +}
 +
 +type_init(tz_mpc_register_types);
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/char/cmsdk-apb-uart.c
  F: include/hw/char/cmsdk-apb-uart.h
  F: hw/misc/tz-ppc.c
  F: include/hw/misc/tz-ppc.h
 +F: hw/misc/tz-mpc.c
 +F: include/hw/misc/tz-mpc.h
  ARM cores
  M: Peter Maydell <peter.maydell@linaro.org>
 diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
 index XXXXXXX..XXXXXXX 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_CMSDK_APB_UART=y
  CONFIG_MPS2_FPGAIO=y
  CONFIG_MPS2_SCC=y
 +CONFIG_TZ_MPC=y
  CONFIG_TZ_PPC=y
  CONFIG_IOTKIT=y
  CONFIG_IOTKIT_SECCTL=y
 diff --git a/hw/misc/trace-events b/hw/misc/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/trace-events
 +++ b/hw/misc/trace-events
@@ -XXX,XX +XXX,XX @@ mos6522_set_sr_int(void) "set sr_int"
  mos6522_write(uint64_t addr, uint64_t val) "reg=0x%"PRIx64 " val=0x%"PRIx64
  mos6522_read(uint64_t addr, unsigned val) "reg=0x%"PRIx64 " val=0x%x"
 +# hw/misc/tz-mpc.c
 +tz_mpc_reg_read(uint32_t offset, uint64_t data, unsigned size) "TZ MPC regs read: offset 0x%x data 0x%" PRIx64 " size %u"
 +tz_mpc_reg_write(uint32_t offset, uint64_t data, unsigned size) "TZ MPC regs write: offset 0x%x data 0x%" PRIx64 " size %u"
 +tz_mpc_mem_blocked_read(uint64_t addr, unsigned size, bool secure) "TZ MPC blocked read: offset 0x%" PRIx64 " size %u secure %d"
 +tz_mpc_mem_blocked_write(uint64_t addr, uint64_t data, unsigned size, bool secure) "TZ MPC blocked write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u secure %d"
 +tz_mpc_translate(uint64_t addr, int flags, const char *idx, const char *res) "TZ MPC translate: addr 0x%" PRIx64 " flags 0x%x iommu_idx %s: %s"
 +
  # hw/misc/tz-ppc.c
  tz_ppc_reset(void) "TZ PPC: reset"
  tz_ppc_cfg_nonsec(int n, int level) "TZ PPC: cfg_nonsec[%d] = %d"
 --
-.17.1
+.34.1

-New patch
+[PULL 32/68] target/arm: Handle FPCR.AH in vector FABD
+Split the handling of vector FABD so that it calls a different set
+of helpers when FPCR.AH is 1, which implement the "no negation of
+the sign of a NaN" semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/helper.h            |  4 ++++
+ target/arm/tcg/translate-a64.c |  7 ++++++-
+ target/arm/tcg/vec_helper.c    | 23 +++++++++++++++++++++++
+files changed, 33 insertions(+), 1 deletion(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_fabd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fabd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(gvec_fceq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_fceq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_fceq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fabd[3] = {
+     gen_helper_gvec_fabd_s,
+     gen_helper_gvec_fabd_d,
+ };
+-TRANS(FABD_v, do_fp3_vector, a, 0, f_vector_fabd)
++static gen_helper_gvec_3_ptr * const f_vector_ah_fabd[3] = {
++    gen_helper_gvec_ah_fabd_h,
++    gen_helper_gvec_ah_fabd_s,
++    gen_helper_gvec_ah_fabd_d,
++};
++TRANS(FABD_v, do_fp3_vector_2fn, a, 0, f_vector_fabd, f_vector_ah_fabd)
+ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
+     gen_helper_gvec_recps_h,
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ static float64 float64_abd(float64 op1, float64 op2, float_status *stat)
+     return float64_abs(float64_sub(op1, op2, stat));
+ }
++/* ABD when FPCR.AH = 1: avoid flipping sign bit of a NaN result */
++static float16 float16_ah_abd(float16 op1, float16 op2, float_status *stat)
++{
++    float16 r = float16_sub(op1, op2, stat);
++    return float16_is_any_nan(r) ? r : float16_abs(r);
++}
++
++static float32 float32_ah_abd(float32 op1, float32 op2, float_status *stat)
++{
++    float32 r = float32_sub(op1, op2, stat);
++    return float32_is_any_nan(r) ? r : float32_abs(r);
++}
++
++static float64 float64_ah_abd(float64 op1, float64 op2, float_status *stat)
++{
++    float64 r = float64_sub(op1, op2, stat);
++    return float64_is_any_nan(r) ? r : float64_abs(r);
++}
++
+ /*
+  * Reciprocal step. These are the AArch32 version which uses a
+  * non-fused multiply-and-subtract.
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fabd_h, float16_abd, float16)
+ DO_3OP(gvec_fabd_s, float32_abd, float32)
+ DO_3OP(gvec_fabd_d, float64_abd, float64)
++DO_3OP(gvec_ah_fabd_h, float16_ah_abd, float16)
++DO_3OP(gvec_ah_fabd_s, float32_ah_abd, float32)
++DO_3OP(gvec_ah_fabd_d, float64_ah_abd, float64)
++
+ DO_3OP(gvec_fceq_h, float16_ceq, float16)
+ DO_3OP(gvec_fceq_s, float32_ceq, float32)
+ DO_3OP(gvec_fceq_d, float64_ceq, float64)
+--
+.34.1
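
Note on the FABD change above: the new helpers take the absolute value of the subtraction result only when that result is not a NaN. The following is a minimal standalone sketch of that rule on raw single-precision bit patterns. It assumes host float arithmetic rather than QEMU's softfloat, and every name in it (f32_to_bits, abs_bits, ah_abs_bits, ah_abd_bits) is hypothetical and not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for this sketch; QEMU uses softfloat types instead. */
static uint32_t f32_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static float f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* A NaN has an all-ones exponent and a non-zero fraction. */
static int f32_bits_is_nan(uint32_t u)
{
    return (u & 0x7fffffffu) > 0x7f800000u;
}

/* FPCR.AH = 0 behaviour: always clear the sign bit. */
static uint32_t abs_bits(uint32_t r)
{
    return r & 0x7fffffffu;
}

/* FPCR.AH = 1 behaviour: leave a NaN untouched, sign bit included. */
static uint32_t ah_abs_bits(uint32_t r)
{
    return f32_bits_is_nan(r) ? r : abs_bits(r);
}

/* Absolute difference with the AH rule applied to the subtraction result. */
static uint32_t ah_abd_bits(uint32_t a, uint32_t b)
{
    float r = f32_from_bits(a) - f32_from_bits(b);
    return ah_abs_bits(f32_to_bits(r));
}
```

The NaN check is applied to the result of the subtraction, not to the inputs, which matches the shape of float32_ah_abd() in the patch.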

-New patch
+[PULL 33/68] target/arm: Handle FPCR.AH in SVE FNEG
+Make SVE FNEG honour the FPCR.AH "don't negate the sign of a NaN"
+semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 4 ++++
+ target/arm/tcg/sve_helper.c    | 8 ++++++++
+ target/arm/tcg/translate-sve.c | 7 ++++++-
+files changed, 18 insertions(+), 1 deletion(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++
+ DEF_HELPER_FLAGS_4(sve_not_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_not_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_not_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
+ DO_ZPZ(sve_fneg_s, uint32_t, H1_4, DO_FNEG)
+ DO_ZPZ_D(sve_fneg_d, uint64_t, DO_FNEG)
++#define DO_AH_FNEG_H(N) (float16_is_any_nan(N) ? (N) : DO_FNEG(N))
++#define DO_AH_FNEG_S(N) (float32_is_any_nan(N) ? (N) : DO_FNEG(N))
++#define DO_AH_FNEG_D(N) (float64_is_any_nan(N) ? (N) : DO_FNEG(N))
++
++DO_ZPZ(sve_ah_fneg_h, uint16_t, H1_2, DO_AH_FNEG_H)
++DO_ZPZ(sve_ah_fneg_s, uint32_t, H1_4, DO_AH_FNEG_S)
++DO_ZPZ_D(sve_ah_fneg_d, uint64_t, DO_AH_FNEG_D)
++
+ #define DO_NOT(N)    (~N)
+ DO_ZPZ(sve_not_zpz_b, uint8_t, H1, DO_NOT)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const fneg_fns[4] = {
+     NULL,                  gen_helper_sve_fneg_h,
+     gen_helper_sve_fneg_s, gen_helper_sve_fneg_d,
+ };
+-TRANS_FEAT(FNEG, aa64_sve, gen_gvec_ool_arg_zpz, fneg_fns[a->esz], a, 0)
++static gen_helper_gvec_3 * const fneg_ah_fns[4] = {
++    NULL,                  gen_helper_sve_ah_fneg_h,
++    gen_helper_sve_ah_fneg_s, gen_helper_sve_ah_fneg_d,
++};
++TRANS_FEAT(FNEG, aa64_sve, gen_gvec_ool_arg_zpz,
++           s->fpcr_ah ? fneg_ah_fns[a->esz] : fneg_fns[a->esz], a, 0)
+ static gen_helper_gvec_3 * const sxtb_fns[4] = {
+     NULL,                  gen_helper_sve_sxtb_h,
+--
+.34.1
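
Note on the SVE FNEG change above: the DO_AH_FNEG_* macros skip the sign flip for NaN elements at each element size. A rough sketch of the same rule for the three widths follows, written against raw IEEE bit patterns. The function names are hypothetical and the NaN tests are open-coded here instead of calling softfloat's float16_is_any_nan() and friends.

```c
#include <stdint.h>

/* Half precision: sign bit 15, exponent mask 0x7c00. */
static uint16_t ah_fneg16(uint16_t n)
{
    int is_nan = (n & 0x7fffu) > 0x7c00u;
    return is_nan ? n : (uint16_t)(n ^ 0x8000u);
}

/* Single precision: sign bit 31, exponent mask 0x7f800000. */
static uint32_t ah_fneg32(uint32_t n)
{
    int is_nan = (n & 0x7fffffffu) > 0x7f800000u;
    return is_nan ? n : (n ^ 0x80000000u);
}

/* Double precision: sign bit 63, exponent mask 0x7ff0000000000000. */
static uint64_t ah_fneg64(uint64_t n)
{
    int is_nan = (n & 0x7fffffffffffffffull) > 0x7ff0000000000000ull;
    return is_nan ? n : (n ^ 0x8000000000000000ull);
}
```

Infinities are not NaNs, so they are still negated; only values with a non-zero fraction under an all-ones exponent keep their sign.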

-New patch
+[PULL 34/68] target/arm: Handle FPCR.AH in SVE FABS
+Make SVE FABS honour the FPCR.AH "don't negate the sign of a NaN"
+semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 4 ++++
+ target/arm/tcg/sve_helper.c    | 8 ++++++++
+ target/arm/tcg/translate-sve.c | 7 ++++++-
+files changed, 18 insertions(+), 1 deletion(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++
+ DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZ(sve_fabs_h, uint16_t, H1_2, DO_FABS)
+ DO_ZPZ(sve_fabs_s, uint32_t, H1_4, DO_FABS)
+ DO_ZPZ_D(sve_fabs_d, uint64_t, DO_FABS)
++#define DO_AH_FABS_H(N) (float16_is_any_nan(N) ? (N) : DO_FABS(N))
++#define DO_AH_FABS_S(N) (float32_is_any_nan(N) ? (N) : DO_FABS(N))
++#define DO_AH_FABS_D(N) (float64_is_any_nan(N) ? (N) : DO_FABS(N))
++
++DO_ZPZ(sve_ah_fabs_h, uint16_t, H1_2, DO_AH_FABS_H)
++DO_ZPZ(sve_ah_fabs_s, uint32_t, H1_4, DO_AH_FABS_S)
++DO_ZPZ_D(sve_ah_fabs_d, uint64_t, DO_AH_FABS_D)
++
+ #define DO_FNEG(N)    (N ^ ~((__typeof(N))-1 >> 1))
+ DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const fabs_fns[4] = {
+     NULL,                  gen_helper_sve_fabs_h,
+     gen_helper_sve_fabs_s, gen_helper_sve_fabs_d,
+ };
+-TRANS_FEAT(FABS, aa64_sve, gen_gvec_ool_arg_zpz, fabs_fns[a->esz], a, 0)
++static gen_helper_gvec_3 * const fabs_ah_fns[4] = {
++    NULL,                  gen_helper_sve_ah_fabs_h,
++    gen_helper_sve_ah_fabs_s, gen_helper_sve_ah_fabs_d,
++};
++TRANS_FEAT(FABS, aa64_sve, gen_gvec_ool_arg_zpz,
++           s->fpcr_ah ? fabs_ah_fns[a->esz] : fabs_fns[a->esz], a, 0)
+ static gen_helper_gvec_3 * const fneg_fns[4] = {
+     NULL,                  gen_helper_sve_fneg_h,
+--
+.34.1

-New patch
+[PULL 35/68] target/arm: Handle FPCR.AH in SVE FABD
+Make the SVE FABD insn honour the FPCR.AH "don't negate the sign
+of a NaN" semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    |  7 +++++++
+ target/arm/tcg/sve_helper.c    | 22 ++++++++++++++++++++++
+ target/arm/tcg/translate-sve.c |  2 +-
+files changed, 30 insertions(+), 1 deletion(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_fabd_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(sve_fabd_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fabd_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fabd_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fabd_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_6(sve_fscalbn_h, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_6(sve_fscalbn_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ static inline float64 abd_d(float64 a, float64 b, float_status *s)
+     return float64_abs(float64_sub(a, b, s));
+ }
++/* ABD when FPCR.AH = 1: avoid flipping sign bit of a NaN result */
++static float16 ah_abd_h(float16 op1, float16 op2, float_status *stat)
++{
++    float16 r = float16_sub(op1, op2, stat);
++    return float16_is_any_nan(r) ? r : float16_abs(r);
++}
++
++static float32 ah_abd_s(float32 op1, float32 op2, float_status *stat)
++{
++    float32 r = float32_sub(op1, op2, stat);
++    return float32_is_any_nan(r) ? r : float32_abs(r);
++}
++
++static float64 ah_abd_d(float64 op1, float64 op2, float_status *stat)
++{
++    float64 r = float64_sub(op1, op2, stat);
++    return float64_is_any_nan(r) ? r : float64_abs(r);
++}
++
+ DO_ZPZZ_FP(sve_fabd_h, uint16_t, H1_2, abd_h)
+ DO_ZPZZ_FP(sve_fabd_s, uint32_t, H1_4, abd_s)
+ DO_ZPZZ_FP(sve_fabd_d, uint64_t, H1_8, abd_d)
++DO_ZPZZ_FP(sve_ah_fabd_h, uint16_t, H1_2, ah_abd_h)
++DO_ZPZZ_FP(sve_ah_fabd_s, uint32_t, H1_4, ah_abd_s)
++DO_ZPZZ_FP(sve_ah_fabd_d, uint64_t, H1_8, ah_abd_d)
+ static inline float64 scalbn_d(float64 a, int64_t b, float_status *s)
+ {
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
+ DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
+ DO_ZPZZ_FP(FMINNM_zpzz, aa64_sve, sve_fminnum)
+ DO_ZPZZ_FP(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
+-DO_ZPZZ_FP(FABD, aa64_sve, sve_fabd)
++DO_ZPZZ_AH_FP(FABD, aa64_sve, sve_fabd, sve_ah_fabd)
+ DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
+ DO_ZPZZ_FP(FDIV, aa64_sve, sve_fdiv)
+ DO_ZPZZ_FP(FMULX, aa64_sve, sve_fmulx)
+--
+.34.1

-New patch
+[PULL 36/68] target/arm: Handle FPCR.AH in negation steps in SVE FCADD
+The negation steps in FCADD must honour FPCR.AH's "don't change the
+sign of a NaN" semantics.  Implement this in the same way we did for
+the base ASIMD FCADD, by encoding FPCR.AH into the SIMD data field
+passed to the helper and using that to decide whether to negate the
+values.
+The construction of neg_imag and neg_real was done to make it easy
+to apply both in parallel with two simple logical operations.  This
+changed with FPCR.AH, which is more complex than that. Switch to
+an approach that follows the pseudocode more closely, by extracting
+the 'rot=1' parameter from the SIMD data field and changing the
+sign of the appropriate input value.
+Note that there was a naming issue with neg_imag and neg_real.
+They were named backward, with neg_imag being non-zero for rot=1,
+and vice versa.  This was combined with reversed usage within the
+loop, so that the negation in the end turned out correct.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/vec_internal.h  | 17 ++++++++++++++
+ target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++----------
+ target/arm/tcg/translate-sve.c |  2 +-
+files changed, 48 insertions(+), 13 deletions(-)
+diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_internal.h
++++ b/target/arm/tcg/vec_internal.h
+@@ -XXX,XX +XXX,XX @@
+ #ifndef TARGET_ARM_VEC_INTERNAL_H
+ #define TARGET_ARM_VEC_INTERNAL_H
++#include "fpu/softfloat.h"
++
+ /*
+  * Note that vector data is stored in host-endian 64-bit chunks,
+  * so addressing units smaller than that needs a host-endian fixup.
+@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
+  */
+ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
++static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
++{
++    return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
++}
++
++static inline float32 float32_maybe_ah_chs(float32 a, bool fpcr_ah)
++{
++    return fpcr_ah && float32_is_any_nan(a) ? a : float32_chs(a);
++}
++
++static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah)
++{
++    return fpcr_ah && float64_is_any_nan(a) ? a : float64_chs(a);
++}
++
+ #endif /* TARGET_ARM_VEC_INTERNAL_H */
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
+ {
+     intptr_t j, i = simd_oprsz(desc);
+     uint64_t *g = vg;
+-    float16 neg_imag = float16_set_sign(0, simd_data(desc));
+-    float16 neg_real = float16_chs(neg_imag);
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     do {
+         uint64_t pg = g[(i - 1) >> 6];
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
+             i -= 2 * sizeof(float16);
+             e0 = *(float16 *)(vn + H1_2(i));
+-            e1 = *(float16 *)(vm + H1_2(j)) ^ neg_real;
++            e1 = *(float16 *)(vm + H1_2(j));
+             e2 = *(float16 *)(vn + H1_2(j));
+-            e3 = *(float16 *)(vm + H1_2(i)) ^ neg_imag;
++            e3 = *(float16 *)(vm + H1_2(i));
++
++            if (rot) {
++                e3 = float16_maybe_ah_chs(e3, fpcr_ah);
++            } else {
++                e1 = float16_maybe_ah_chs(e1, fpcr_ah);
++            }
+             if (likely((pg >> (i & 63)) & 1)) {
+                 *(float16 *)(vd + H1_2(i)) = float16_add(e0, e1, s);
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
+ {
+     intptr_t j, i = simd_oprsz(desc);
+     uint64_t *g = vg;
+-    float32 neg_imag = float32_set_sign(0, simd_data(desc));
+-    float32 neg_real = float32_chs(neg_imag);
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     do {
+         uint64_t pg = g[(i - 1) >> 6];
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
+             i -= 2 * sizeof(float32);
+             e0 = *(float32 *)(vn + H1_2(i));
+-            e1 = *(float32 *)(vm + H1_2(j)) ^ neg_real;
++            e1 = *(float32 *)(vm + H1_2(j));
+             e2 = *(float32 *)(vn + H1_2(j));
+-            e3 = *(float32 *)(vm + H1_2(i)) ^ neg_imag;
++            e3 = *(float32 *)(vm + H1_2(i));
++
++            if (rot) {
++                e3 = float32_maybe_ah_chs(e3, fpcr_ah);
++            } else {
++                e1 = float32_maybe_ah_chs(e1, fpcr_ah);
++            }
+             if (likely((pg >> (i & 63)) & 1)) {
+                 *(float32 *)(vd + H1_2(i)) = float32_add(e0, e1, s);
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
+ {
+     intptr_t j, i = simd_oprsz(desc);
+     uint64_t *g = vg;
+-    float64 neg_imag = float64_set_sign(0, simd_data(desc));
+-    float64 neg_real = float64_chs(neg_imag);
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     do {
+         uint64_t pg = g[(i - 1) >> 6];
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
+             i -= 2 * sizeof(float64);
+             e0 = *(float64 *)(vn + H1_2(i));
+-            e1 = *(float64 *)(vm + H1_2(j)) ^ neg_real;
++            e1 = *(float64 *)(vm + H1_2(j));
+             e2 = *(float64 *)(vn + H1_2(j));
+-            e3 = *(float64 *)(vm + H1_2(i)) ^ neg_imag;
++            e3 = *(float64 *)(vm + H1_2(i));
++
++            if (rot) {
++                e3 = float64_maybe_ah_chs(e3, fpcr_ah);
++            } else {
++                e1 = float64_maybe_ah_chs(e1, fpcr_ah);
++            }
+             if (likely((pg >> (i & 63)) & 1)) {
+                 *(float64 *)(vd + H1_2(i)) = float64_add(e0, e1, s);
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4_ptr * const fcadd_fns[] = {
+     gen_helper_sve_fcadd_s, gen_helper_sve_fcadd_d,
+ };
+ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
+-           a->rd, a->rn, a->rm, a->pg, a->rot,
++           a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
+            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+ #define DO_FMLA(NAME, name) \
+--
+.34.1

-New patch
+[PULL 37/68] target/arm: Handle FPCR.AH in negation steps in FCADD
+The negation steps in FCADD must honour FPCR.AH's "don't change the
+sign of a NaN" semantics.  Implement this by encoding FPCR.AH into
+the SIMD data field passed to the helper and using that to decide
+whether to negate the values.
+The construction of neg_imag and neg_real was done to make it easy
+to apply both in parallel with two simple logical operations.  This
+changed with FPCR.AH, which is more complex than that. Switch to
+an approach closer to the pseudocode, where we extract the rot
+parameter from the SIMD data word and negate the appropriate
+input value.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 10 +++++--
+ target/arm/tcg/vec_helper.c    | 54 +++++++++++++++++++---------------
+files changed, 38 insertions(+), 26 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
+     gen_helper_gvec_fcadds,
+     gen_helper_gvec_fcaddd,
+ };
+-TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
+-TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
++/*
++ * Encode FPCR.AH into the data so the helper knows whether the
++ * negations it does should avoid flipping the sign bit on a NaN
++ */
++TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0 | (s->fpcr_ah << 1),
++           f_vector_fcadd)
++TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1 | (s->fpcr_ah << 1),
++           f_vector_fcadd)
+ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
+ {
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
+     float16 *d = vd;
+     float16 *n = vn;
+     float16 *m = vm;
+-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = neg_real ^ 1;
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 15;
+-    neg_imag <<= 15;
+-
+     for (i = 0; i < opr_sz / 2; i += 2) {
+         float16 e0 = n[H2(i)];
+-        float16 e1 = m[H2(i + 1)] ^ neg_imag;
++        float16 e1 = m[H2(i + 1)];
+         float16 e2 = n[H2(i + 1)];
+-        float16 e3 = m[H2(i)] ^ neg_real;
++        float16 e3 = m[H2(i)];
++
++        if (rot) {
++            e3 = float16_maybe_ah_chs(e3, fpcr_ah);
++        } else {
++            e1 = float16_maybe_ah_chs(e1, fpcr_ah);
++        }
+         d[H2(i)] = float16_add(e0, e1, fpst);
+         d[H2(i + 1)] = float16_add(e2, e3, fpst);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
+     float32 *d = vd;
+     float32 *n = vn;
+     float32 *m = vm;
+-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = neg_real ^ 1;
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 31;
+-    neg_imag <<= 31;
+-
+     for (i = 0; i < opr_sz / 4; i += 2) {
+         float32 e0 = n[H4(i)];
+-        float32 e1 = m[H4(i + 1)] ^ neg_imag;
++        float32 e1 = m[H4(i + 1)];
+         float32 e2 = n[H4(i + 1)];
+-        float32 e3 = m[H4(i)] ^ neg_real;
++        float32 e3 = m[H4(i)];
++
++        if (rot) {
++            e3 = float32_maybe_ah_chs(e3, fpcr_ah);
++        } else {
++            e1 = float32_maybe_ah_chs(e1, fpcr_ah);
++        }
+         d[H4(i)] = float32_add(e0, e1, fpst);
+         d[H4(i + 1)] = float32_add(e2, e3, fpst);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
+     float64 *d = vd;
+     float64 *n = vn;
+     float64 *m = vm;
+-    uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
+-    uint64_t neg_imag = neg_real ^ 1;
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 63;
+-    neg_imag <<= 63;
+-
+     for (i = 0; i < opr_sz / 8; i += 2) {
+         float64 e0 = n[i];
+-        float64 e1 = m[i + 1] ^ neg_imag;
++        float64 e1 = m[i + 1];
+         float64 e2 = n[i + 1];
+-        float64 e3 = m[i] ^ neg_real;
++        float64 e3 = m[i];
++
++        if (rot) {
++            e3 = float64_maybe_ah_chs(e3, fpcr_ah);
++        } else {
++            e1 = float64_maybe_ah_chs(e1, fpcr_ah);
++        }
+         d[i] = float64_add(e0, e1, fpst);
+         d[i + 1] = float64_add(e2, e3, fpst);
+--
+.34.1
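
Note on the FCADD hunks above: the old code shifted a boolean up to the sign bit and XORed it into the operand, which also flips the sign of a NaN. The new code picks one element by rot and negates it through a helper that can leave NaNs alone. A minimal standalone sketch of that selection follows, assuming float32 values are carried as raw bit patterns; f32_bits, f32_is_any_nan, f32_maybe_ah_chs and fcadd_prepare are hypothetical names for this illustration, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: a float32 is carried as its raw IEEE 754 bit pattern. */
typedef uint32_t f32_bits;

#define F32_SIGN 0x80000000u

static bool f32_is_any_nan(f32_bits a)
{
    /* Exponent all ones and a non-zero fraction. */
    return (a & ~F32_SIGN) > 0x7f800000u;
}

/* Flip the sign bit, but leave a NaN untouched when fpcr_ah is set. */
static f32_bits f32_maybe_ah_chs(f32_bits a, bool fpcr_ah)
{
    return (fpcr_ah && f32_is_any_nan(a)) ? a : (a ^ F32_SIGN);
}

/*
 * Hypothetical helper: choose which element of one complex pair of the
 * second operand is negated before the two additions. e1 is later added
 * to the real part of the first operand, e3 to its imaginary part.
 */
static void fcadd_prepare(f32_bits m_real, f32_bits m_imag, bool rot,
                          bool fpcr_ah, f32_bits *e1, f32_bits *e3)
{
    assert(e1 != NULL && e3 != NULL);
    *e1 = m_imag;
    *e3 = m_real;
    if (rot) {
        *e3 = f32_maybe_ah_chs(*e3, fpcr_ah);
    } else {
        *e1 = f32_maybe_ah_chs(*e1, fpcr_ah);
    }
}
```

With fpcr_ah clear the helper behaves like the old XOR; with it set, only non-NaN values change sign.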

-[Qemu-devel] [PULL 18/28] hw/misc/tz-mpc.c: Implement registers
+[PULL 38/68] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
-Implement the missing registers for the TZ MPC.
+Handle the FPCR.AH semantics that we do not change the sign of an
 input NaN in the FRECPS and FRSQRTS scalar insns, by providing
 new helper functions that do the CHS part of the operation
 differently.
 Since the extra helper functions would be very repetitive if written
 out longhand, we condense them and the existing non-AH helpers into
 being emitted via macros.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20180620132032.28865-3-peter.maydell@linaro.org
 ---
- include/hw/misc/tz-mpc.h |  10 +++
+ target/arm/tcg/helper-a64.h    |   6 ++
- hw/misc/tz-mpc.c         | 140 ++++++++++++++++++++++++++++++++++++++-
+ target/arm/tcg/vec_internal.h  |  18 ++++++
- 2 files changed, 147 insertions(+), 3 deletions(-)
+ target/arm/tcg/helper-a64.c    | 115 ++++++++++++---------------------
  target/arm/tcg/translate-a64.c |  25 +++++--
  4 files changed, 83 insertions(+), 81 deletions(-)
-diff --git a/include/hw/misc/tz-mpc.h b/include/hw/misc/tz-mpc.h
+diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/tz-mpc.h
+--- a/target/arm/tcg/helper-a64.h
-+++ b/include/hw/misc/tz-mpc.h
++++ b/target/arm/tcg/helper-a64.h
-@@ -XXX,XX +XXX,XX @@ struct TZMPC {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(neon_cgt_f64, TCG_CALL_NO_RWG, i64, i64, i64, fpst)
+ DEF_HELPER_FLAGS_3(recpsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
-     /*< public >*/
+ DEF_HELPER_FLAGS_3(recpsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+ DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
-+    /* State */
++DEF_HELPER_FLAGS_3(recpsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
-+    uint32_t ctrl;
++DEF_HELPER_FLAGS_3(recpsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
-+    uint32_t blk_idx;
++DEF_HELPER_FLAGS_3(recpsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
-+    uint32_t int_stat;
+ DEF_HELPER_FLAGS_3(rsqrtsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
-+    uint32_t int_en;
+ DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
-+    uint32_t int_info1;
+ DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
-+    uint32_t int_info2;
++DEF_HELPER_FLAGS_3(rsqrtsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
-+
++DEF_HELPER_FLAGS_3(rsqrtsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
-+    uint32_t *blk_lut;
++DEF_HELPER_FLAGS_3(rsqrtsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
-+
+ DEF_HELPER_FLAGS_2(frecpx_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
-     qemu_irq irq;
+ DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+ DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
-     /* Properties */
+diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
-diff --git a/hw/misc/tz-mpc.c b/hw/misc/tz-mpc.c
+index XXXXXXX..XXXXXXX 100644
-index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_internal.h
---- a/hw/misc/tz-mpc.c
++++ b/target/arm/tcg/vec_internal.h
-+++ b/hw/misc/tz-mpc.c
+@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
-@@ -XXX,XX +XXX,XX @@ enum {
+  */
+ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
- /* Config registers */
- REG32(CTRL, 0x00)
++/*
-+    FIELD(CTRL, SEC_RESP, 4, 1)
++ * Negate as for FPCR.AH=1 -- do not negate NaNs.
-+    FIELD(CTRL, AUTOINC, 8, 1)
++ */
-+    FIELD(CTRL, LOCKDOWN, 31, 1)
++static inline float16 float16_ah_chs(float16 a)
  REG32(BLK_MAX, 0x10)
  REG32(BLK_CFG, 0x14)
  REG32(BLK_IDX, 0x18)
  REG32(BLK_LUT, 0x1c)
  REG32(INT_STAT, 0x20)
 +    FIELD(INT_STAT, IRQ, 0, 1)
  REG32(INT_CLEAR, 0x24)
 +    FIELD(INT_CLEAR, IRQ, 0, 1)
  REG32(INT_EN, 0x28)
 +    FIELD(INT_EN, IRQ, 0, 1)
  REG32(INT_INFO1, 0x2c)
  REG32(INT_INFO2, 0x30)
  REG32(INT_SET, 0x34)
 +    FIELD(INT_SET, IRQ, 0, 1)
  REG32(PIDR4, 0xfd0)
  REG32(PIDR5, 0xfd4)
  REG32(PIDR6, 0xfd8)
@@ -XXX,XX +XXX,XX @@ static const uint8_t tz_mpc_idregs[] = {
      0x0d, 0xf0, 0x05, 0xb1,
  };
 +static void tz_mpc_irq_update(TZMPC *s)
 +{
-+    qemu_set_irq(s->irq, s->int_stat && s->int_en);
++    return float16_is_any_nan(a) ? a : float16_chs(a);
 +}
 +
-+static void tz_mpc_autoinc_idx(TZMPC *s, unsigned access_size)
++static inline float32 float32_ah_chs(float32 a)
 +{
-+    /* Auto-increment BLK_IDX if necessary */
++    return float32_is_any_nan(a) ? a : float32_chs(a);
 +    if (access_size == 4 && (s->ctrl & R_CTRL_AUTOINC_MASK)) {
 +        s->blk_idx++;
 +        s->blk_idx %= s->blk_max;
 +    }
 +}
 +
- static MemTxResult tz_mpc_reg_read(void *opaque, hwaddr addr,
++static inline float64 float64_ah_chs(float64 a)
-                                    uint64_t *pdata,
++{
-                                    unsigned size, MemTxAttrs attrs)
++    return float64_is_any_nan(a) ? a : float64_chs(a);
  {
 +    TZMPC *s = TZ_MPC(opaque);
      uint64_t r;
      uint32_t offset = addr & ~0x3;
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_read(void *opaque, hwaddr addr,
      }
      switch (offset) {
 +    case A_CTRL:
 +        r = s->ctrl;
 +        break;
 +    case A_BLK_MAX:
 +        r = s->blk_max;
 +        break;
 +    case A_BLK_CFG:
 +        /* We are never in "init in progress state", so this just indicates
 +         * the block size. s->blocksize == (1 << BLK_CFG + 5), so
 +         * BLK_CFG == ctz32(s->blocksize) - 5
 +         */
 +        r = ctz32(s->blocksize) - 5;
 +        break;
 +    case A_BLK_IDX:
 +        r = s->blk_idx;
 +        break;
 +    case A_BLK_LUT:
 +        r = s->blk_lut[s->blk_idx];
 +        tz_mpc_autoinc_idx(s, size);
 +        break;
 +    case A_INT_STAT:
 +        r = s->int_stat;
 +        break;
 +    case A_INT_EN:
 +        r = s->int_en;
 +        break;
 +    case A_INT_INFO1:
 +        r = s->int_info1;
 +        break;
 +    case A_INT_INFO2:
 +        r = s->int_info2;
 +        break;
      case A_PIDR4:
      case A_PIDR5:
      case A_PIDR6:
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
                                      uint64_t value,
                                      unsigned size, MemTxAttrs attrs)
  {
 +    TZMPC *s = TZ_MPC(opaque);
      uint32_t offset = addr & ~0x3;
      trace_tz_mpc_reg_write(addr, value, size);
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
          uint32_t oldval;
          switch (offset) {
 -            /* As we add support for registers which need expansions
 -             * other than zeroes we'll fill in cases here.
 -             */
 +        case A_CTRL:
 +            oldval = s->ctrl;
 +            break;
 +        case A_BLK_IDX:
 +            oldval = s->blk_idx;
 +            break;
 +        case A_BLK_LUT:
 +            oldval = s->blk_lut[s->blk_idx];
 +            break;
          default:
              oldval = 0;
              break;
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
          value = deposit32(oldval, (addr & 3) * 8, size * 8, value);
      }
 +    if ((s->ctrl & R_CTRL_LOCKDOWN_MASK) &&
 +        (offset == A_CTRL || offset == A_BLK_LUT || offset == A_INT_EN)) {
 +        /* Lockdown mode makes these three registers read-only, and
 +         * the only way out of it is to reset the device.
 +         */
 +        qemu_log_mask(LOG_GUEST_ERROR, "TZ MPC register write to offset 0x%x "
 +                      "while MPC is in lockdown mode\n", offset);
 +        return MEMTX_OK;
 +    }
 +
      switch (offset) {
 +    case A_CTRL:
 +        /* We don't implement the 'data gating' feature so all other bits
 +         * are reserved and we make them RAZ/WI.
 +         */
 +        s->ctrl = value & (R_CTRL_SEC_RESP_MASK |
 +                           R_CTRL_AUTOINC_MASK |
 +                           R_CTRL_LOCKDOWN_MASK);
 +        break;
 +    case A_BLK_IDX:
 +        s->blk_idx = value % s->blk_max;
 +        break;
 +    case A_BLK_LUT:
 +        s->blk_lut[s->blk_idx] = value;
 +        tz_mpc_autoinc_idx(s, size);
 +        break;
 +    case A_INT_CLEAR:
 +        if (value & R_INT_CLEAR_IRQ_MASK) {
 +            s->int_stat = 0;
 +            tz_mpc_irq_update(s);
 +        }
 +        break;
 +    case A_INT_EN:
 +        s->int_en = value & R_INT_EN_IRQ_MASK;
 +        tz_mpc_irq_update(s);
 +        break;
 +    case A_INT_SET:
 +        if (value & R_INT_SET_IRQ_MASK) {
 +            s->int_stat = R_INT_STAT_IRQ_MASK;
 +            tz_mpc_irq_update(s);
 +        }
 +        break;
      case A_PIDR4:
      case A_PIDR5:
      case A_PIDR6:
@@ -XXX,XX +XXX,XX @@ static int tz_mpc_num_indexes(IOMMUMemoryRegion *iommu)
  static void tz_mpc_reset(DeviceState *dev)
  {
 +    TZMPC *s = TZ_MPC(dev);
 +
 +    s->ctrl = 0x00000100;
 +    s->blk_idx = 0;
 +    s->int_stat = 0;
 +    s->int_en = 1;
 +    s->int_info1 = 0;
 +    s->int_info2 = 0;
 +
 +    memset(s->blk_lut, 0, s->blk_max * sizeof(uint32_t));
  }
  static void tz_mpc_init(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void tz_mpc_realize(DeviceState *dev, Error **errp)
                         "tz-mpc-downstream");
      address_space_init(&s->blocked_io_as, &s->blocked_io,
                         "tz-mpc-blocked-io");
 +
 +    s->blk_lut = g_new(uint32_t, s->blk_max);
 +}
 +
-+static int tz_mpc_post_load(void *opaque, int version_id)
+ static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
-+{
+ {
-+    TZMPC *s = TZ_MPC(opaque);
+     return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
-+
+diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
-+    /* Check the incoming data doesn't point blk_idx off the end of blk_lut. */
+index XXXXXXX..XXXXXXX 100644
-+    if (s->blk_idx >= s->blk_max) {
+--- a/target/arm/tcg/helper-a64.c
-+        return -1;
++++ b/target/arm/tcg/helper-a64.c
-+    }
+@@ -XXX,XX +XXX,XX @@
-+    return 0;
+ #ifdef CONFIG_USER_ONLY
  #include "user/page-protection.h"
  #endif
 +#include "vec_internal.h"
  /* C2.4.7 Multiply and divide */
  /* special cases for 0 and LLONG_MIN are mandated by the standard */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, float_status *fpst)
      return -float64_lt(b, a, fpst);
  }
- static const VMStateDescription tz_mpc_vmstate = {
+-/* Reciprocal step and sqrt step. Note that unlike the A32/T32
-     .name = "tz-mpc",
++/*
-     .version_id = 1,
++ * Reciprocal step and sqrt step. Note that unlike the A32/T32
-     .minimum_version_id = 1,
+  * versions, these do a fully fused multiply-add or
-+    .post_load = tz_mpc_post_load,
+  * multiply-add-and-halve.
-     .fields = (VMStateField[]) {
++ * The FPCR.AH == 1 versions need to avoid flipping the sign of NaN.
-+        VMSTATE_UINT32(ctrl, TZMPC),
+  */
-+        VMSTATE_UINT32(blk_idx, TZMPC),
+-
-+        VMSTATE_UINT32(int_stat, TZMPC),
+-uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
-+        VMSTATE_UINT32(int_en, TZMPC),
+-{
-+        VMSTATE_UINT32(int_info1, TZMPC),
+-    a = float16_squash_input_denormal(a, fpst);
-+        VMSTATE_UINT32(int_info2, TZMPC),
+-    b = float16_squash_input_denormal(b, fpst);
-+        VMSTATE_VARRAY_UINT32(blk_lut, TZMPC, blk_max,
+-
-+                              0, vmstate_info_uint32, uint32_t),
+-    a = float16_chs(a);
-         VMSTATE_END_OF_LIST()
+-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
 -        (float16_is_infinity(b) && float16_is_zero(a))) {
 -        return float16_two;
 +#define DO_RECPS(NAME, CTYPE, FLOATTYPE, CHSFN)                         \
 +    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
 +    {                                                                   \
 +        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
 +        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
 +        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
 +        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
 +            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
 +            return FLOATTYPE ## _two;                                   \
 +        }                                                               \
 +        return FLOATTYPE ## _muladd(a, b, FLOATTYPE ## _two, 0, fpst);  \
      }
+-    return float16_muladd(a, b, float16_two, 0, fpst);
+-}
+-float32 HELPER(recpsf_f32)(float32 a, float32 b, float_status *fpst)
+-{
+-    a = float32_squash_input_denormal(a, fpst);
+-    b = float32_squash_input_denormal(b, fpst);
++DO_RECPS(recpsf_f16, uint32_t, float16, chs)
++DO_RECPS(recpsf_f32, float32, float32, chs)
++DO_RECPS(recpsf_f64, float64, float64, chs)
++DO_RECPS(recpsf_ah_f16, uint32_t, float16, ah_chs)
++DO_RECPS(recpsf_ah_f32, float32, float32, ah_chs)
++DO_RECPS(recpsf_ah_f64, float64, float64, ah_chs)
+-    a = float32_chs(a);
+-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
+-        (float32_is_infinity(b) && float32_is_zero(a))) {
+-        return float32_two;
+-    }
+-    return float32_muladd(a, b, float32_two, 0, fpst);
+-}
++#define DO_RSQRTSF(NAME, CTYPE, FLOATTYPE, CHSFN)                       \
++    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
++    {                                                                   \
++        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
++        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
++        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
++        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
++            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
++            return FLOATTYPE ## _one_point_five;                        \
++        }                                                               \
++        return FLOATTYPE ## _muladd_scalbn(a, b, FLOATTYPE ## _three,   \
++                                           -1, 0, fpst);                \
++    }                                                                   \
+-float64 HELPER(recpsf_f64)(float64 a, float64 b, float_status *fpst)
+-{
+-    a = float64_squash_input_denormal(a, fpst);
+-    b = float64_squash_input_denormal(b, fpst);
+-
+-    a = float64_chs(a);
+-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
+-        (float64_is_infinity(b) && float64_is_zero(a))) {
+-        return float64_two;
+-    }
+-    return float64_muladd(a, b, float64_two, 0, fpst);
+-}
+-
+-uint32_t HELPER(rsqrtsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
+-{
+-    a = float16_squash_input_denormal(a, fpst);
+-    b = float16_squash_input_denormal(b, fpst);
+-
+-    a = float16_chs(a);
+-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
+-        (float16_is_infinity(b) && float16_is_zero(a))) {
+-        return float16_one_point_five;
+-    }
+-    return float16_muladd_scalbn(a, b, float16_three, -1, 0, fpst);
+-}
+-
+-float32 HELPER(rsqrtsf_f32)(float32 a, float32 b, float_status *fpst)
+-{
+-    a = float32_squash_input_denormal(a, fpst);
+-    b = float32_squash_input_denormal(b, fpst);
+-
+-    a = float32_chs(a);
+-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
+-        (float32_is_infinity(b) && float32_is_zero(a))) {
+-        return float32_one_point_five;
+-    }
+-    return float32_muladd_scalbn(a, b, float32_three, -1, 0, fpst);
+-}
+-
+-float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, float_status *fpst)
+-{
+-    a = float64_squash_input_denormal(a, fpst);
+-    b = float64_squash_input_denormal(b, fpst);
+-
+-    a = float64_chs(a);
+-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
+-        (float64_is_infinity(b) && float64_is_zero(a))) {
+-        return float64_one_point_five;
+-    }
+-    return float64_muladd_scalbn(a, b, float64_three, -1, 0, fpst);
+-}
++DO_RSQRTSF(rsqrtsf_f16, uint32_t, float16, chs)
++DO_RSQRTSF(rsqrtsf_f32, float32, float32, chs)
++DO_RSQRTSF(rsqrtsf_f64, float64, float64, chs)
++DO_RSQRTSF(rsqrtsf_ah_f16, uint32_t, float16, ah_chs)
++DO_RSQRTSF(rsqrtsf_ah_f32, float32, float32, ah_chs)
++DO_RSQRTSF(rsqrtsf_ah_f64, float64, float64, ah_chs)
+ /* Floating-point reciprocal exponent - see FPRecpX in ARM ARM */
+ uint32_t HELPER(frecpx_f16)(uint32_t a, float_status *fpst)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                                        FPST_A64_F16 : FPST_A64);
+ }
+-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+-                             int mergereg)
++static bool do_fp3_scalar_ah_2fn(DisasContext *s, arg_rrr_e *a,
++                                 const FPScalar *fnormal, const FPScalar *fah,
++                                 int mergereg)
+ {
+-    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
+-                                       select_ah_fpst(s, a->esz));
++    return do_fp3_scalar_with_fpsttype(s, a, s->fpcr_ah ? fah : fnormal,
++                                       mergereg, select_ah_fpst(s, a->esz));
+ }
+ /* Some insns need to call different helpers when FPCR.AH == 1 */
+@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
+     gen_helper_recpsf_f32,
+     gen_helper_recpsf_f64,
  };
+-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
++static const FPScalar f_scalar_ah_frecps = {
++    gen_helper_recpsf_ah_f16,
++    gen_helper_recpsf_ah_f32,
++    gen_helper_recpsf_ah_f64,
++};
++TRANS(FRECPS_s, do_fp3_scalar_ah_2fn, a,
++      &f_scalar_frecps, &f_scalar_ah_frecps, a->rn)
+ static const FPScalar f_scalar_frsqrts = {
+     gen_helper_rsqrtsf_f16,
+     gen_helper_rsqrtsf_f32,
+     gen_helper_rsqrtsf_f64,
+ };
+-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
++static const FPScalar f_scalar_ah_frsqrts = {
++    gen_helper_rsqrtsf_ah_f16,
++    gen_helper_rsqrtsf_ah_f32,
++    gen_helper_rsqrtsf_ah_f64,
++};
++TRANS(FRSQRTS_s, do_fp3_scalar_ah_2fn, a,
++      &f_scalar_frsqrts, &f_scalar_ah_frsqrts, a->rn)
+ static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
+                        const FPScalar *f, bool swap)
 --
-.17.1
+.34.1
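
Note on the macro approach in the patch above: the normal and FPCR.AH == 1 helpers differ only in which change-sign function they call, so a single macro parameterised on that function can emit both. The sketch below shows the pattern on plain C float. It is an assumption-laden illustration: DEFINE_RECPS and the f32_* functions are hypothetical names, and the multiply-add is unfused, whereas the real helpers use softfloat's fused muladd and also squash input denormals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t f32_to_bits(float f)
{
    uint32_t u;
    assert(sizeof(f) == sizeof(u)); /* sketch assumes a 32-bit float */
    memcpy(&u, &f, sizeof(u));
    return u;
}

static float bits_to_f32(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

static int f32_is_any_nan(float a)
{
    return (f32_to_bits(a) & 0x7fffffffu) > 0x7f800000u;
}

static int f32_is_infinity(float a)
{
    return (f32_to_bits(a) & 0x7fffffffu) == 0x7f800000u;
}

static int f32_is_zero(float a)
{
    return (f32_to_bits(a) & 0x7fffffffu) == 0;
}

/* Plain change-sign: flips the sign bit of every value, NaNs included. */
static float f32_chs(float a)
{
    return bits_to_f32(f32_to_bits(a) ^ 0x80000000u);
}

/* FPCR.AH == 1 style change-sign: NaNs pass through unchanged. */
static float f32_ah_chs(float a)
{
    return f32_is_any_nan(a) ? a : f32_chs(a);
}

/* One macro emits both variants; only the change-sign function differs. */
#define DEFINE_RECPS(NAME, CHSFN)                               \
    static float NAME(float a, float b)                         \
    {                                                           \
        a = CHSFN(a);                                           \
        if ((f32_is_infinity(a) && f32_is_zero(b)) ||           \
            (f32_is_infinity(b) && f32_is_zero(a))) {           \
            return 2.0f;                                        \
        }                                                       \
        return a * b + 2.0f; /* unfused in this sketch */       \
    }

DEFINE_RECPS(recps_f32, f32_chs)
DEFINE_RECPS(recps_ah_f32, f32_ah_chs)
```

For non-NaN inputs the two generated functions agree; they differ only in whether a NaN first operand has its sign flipped before the multiply-add.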

-New patch
+[PULL 39/68] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics
+in the vector versions of FRECPS and FRSQRTS, by implementing
+new vector wrappers that call the _ah_ scalar helpers.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/translate-a64.c | 21 ++++++++++++++++-----
+ target/arm/tcg/translate-sve.c |  7 ++++++-
+ target/arm/tcg/vec_helper.c    |  8 ++++++++
+ 4 files changed, 44 insertions(+), 6 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_recps_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_recps_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_recps_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(gvec_ah_fmax_h, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_ah_fmax_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_2fn(DisasContext *s, arg_qrrr_e *a, int data,
+     return do_fp3_vector(s, a, data, s->fpcr_ah ? fah : fnormal);
+ }
+-static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
+-                             gen_helper_gvec_3_ptr * const f[3])
++static bool do_fp3_vector_ah_2fn(DisasContext *s, arg_qrrr_e *a, int data,
++                                 gen_helper_gvec_3_ptr * const fnormal[3],
++                                 gen_helper_gvec_3_ptr * const fah[3])
+ {
+-    return do_fp3_vector_with_fpsttype(s, a, data, f,
++    return do_fp3_vector_with_fpsttype(s, a, data, s->fpcr_ah ? fah : fnormal,
+                                        select_ah_fpst(s, a->esz));
+ }
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
+     gen_helper_gvec_recps_s,
+     gen_helper_gvec_recps_d,
+ };
+-TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
++static gen_helper_gvec_3_ptr * const f_vector_ah_frecps[3] = {
++    gen_helper_gvec_ah_recps_h,
++    gen_helper_gvec_ah_recps_s,
++    gen_helper_gvec_ah_recps_d,
++};
++TRANS(FRECPS_v, do_fp3_vector_ah_2fn, a, 0, f_vector_frecps, f_vector_ah_frecps)
+ static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
+     gen_helper_gvec_rsqrts_h,
+     gen_helper_gvec_rsqrts_s,
+     gen_helper_gvec_rsqrts_d,
+ };
+-TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
++static gen_helper_gvec_3_ptr * const f_vector_ah_frsqrts[3] = {
++    gen_helper_gvec_ah_rsqrts_h,
++    gen_helper_gvec_ah_rsqrts_s,
++    gen_helper_gvec_ah_rsqrts_d,
++};
++TRANS(FRSQRTS_v, do_fp3_vector_ah_2fn, a, 0, f_vector_frsqrts, f_vector_ah_frsqrts)
+ static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
+     gen_helper_gvec_faddp_h,
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
+         NULL, gen_helper_gvec_##name##_h,                           \
+         gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
+     };                                                              \
+-    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
++    static gen_helper_gvec_3_ptr * const name##_ah_fns[4] = {       \
++        NULL, gen_helper_gvec_ah_##name##_h,                        \
++        gen_helper_gvec_ah_##name##_s, gen_helper_gvec_ah_##name##_d    \
++    };                                                              \
++    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz,            \
++               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], a, 0)
+ DO_FP3(FADD_zzz, fadd)
+ DO_FP3(FSUB_zzz, fsub)
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
+ DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
+ DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
++DO_3OP(gvec_ah_recps_h, helper_recpsf_ah_f16, float16)
++DO_3OP(gvec_ah_recps_s, helper_recpsf_ah_f32, float32)
++DO_3OP(gvec_ah_recps_d, helper_recpsf_ah_f64, float64)
++
++DO_3OP(gvec_ah_rsqrts_h, helper_rsqrtsf_ah_f16, float16)
++DO_3OP(gvec_ah_rsqrts_s, helper_rsqrtsf_ah_f32, float32)
++DO_3OP(gvec_ah_rsqrts_d, helper_rsqrtsf_ah_f64, float64)
++
+ DO_3OP(gvec_ah_fmax_h, helper_vfp_ah_maxh, float16)
+ DO_3OP(gvec_ah_fmax_s, helper_vfp_ah_maxs, float32)
+ DO_3OP(gvec_ah_fmax_d, helper_vfp_ah_maxd, float64)
+--
+.34.1
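
Note on the vector wrappers in the patch above: each vector helper is the scalar helper applied element by element, and the translator picks the normal or the _ah_ table entry from the FPCR.AH state at translate time. The following is a rough standalone sketch of that shape, not the QEMU DO_3OP macro itself. DEFINE_VEC_3OP, select_vec_fn and the two stand-in scalar functions are hypothetical; the stand-ins model only the negation step, not the full reciprocal-step arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t f32_bits; /* raw IEEE 754 single-precision bit pattern */

/* Stand-in scalar helpers that model only the negation of operand a. */
static f32_bits neg_first(f32_bits a, f32_bits b)
{
    (void)b;
    return a ^ 0x80000000u;
}

static f32_bits neg_first_ah(f32_bits a, f32_bits b)
{
    (void)b;
    /* Leave NaNs (exponent all ones, fraction non-zero) untouched. */
    return (a & 0x7fffffffu) > 0x7f800000u ? a : (a ^ 0x80000000u);
}

/* Emit a vector wrapper that calls a scalar helper on every element. */
#define DEFINE_VEC_3OP(NAME, FUNC)                                   \
    static void NAME(f32_bits *d, const f32_bits *n,                 \
                     const f32_bits *m, size_t count)                \
    {                                                                \
        assert(d != NULL && n != NULL && m != NULL);                 \
        for (size_t i = 0; i < count; i++) {                         \
            d[i] = FUNC(n[i], m[i]);                                 \
        }                                                            \
    }

DEFINE_VEC_3OP(vec_neg_first, neg_first)
DEFINE_VEC_3OP(vec_neg_first_ah, neg_first_ah)

typedef void vec_fn(f32_bits *d, const f32_bits *n,
                    const f32_bits *m, size_t count);

/* Pick the wrapper once, up front, from the FPCR.AH state. */
static vec_fn *select_vec_fn(bool fpcr_ah)
{
    return fpcr_ah ? vec_neg_first_ah : vec_neg_first;
}
```

Choosing the function pointer ahead of the loop keeps the per-element code free of any AH test.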

-New patch
+[PULL 40/68] target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics in FMLS
+(indexed). We do this by creating 6 new helpers, which allow us to
+do the negation either by XOR (for AH=0) or by muladd flags
+(for AH=1).
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+[PMM: Mostly from RTH's patch; error in index order into fns[][]
+ fixed]
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/helper.h            | 14 ++++++++++++++
+ target/arm/tcg/translate-a64.c | 17 +++++++++++------
+ target/arm/tcg/translate-sve.c | 31 +++++++++++++++++--------------
+ target/arm/tcg/vec_helper.c    | 24 +++++++++++++++---------
+ 4 files changed, 57 insertions(+), 29 deletions(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_fmls_idx_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_fmls_idx_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_fmls_idx_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_5(gvec_uqadd_h, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ TRANS(FMULX_vi, do_fp3_vector_idx, a, f_vector_idx_fmulx)
+ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
+ {
+-    static gen_helper_gvec_4_ptr * const fns[3] = {
+-        gen_helper_gvec_fmla_idx_h,
+-        gen_helper_gvec_fmla_idx_s,
+-        gen_helper_gvec_fmla_idx_d,
++    static gen_helper_gvec_4_ptr * const fns[3][3] = {
++        { gen_helper_gvec_fmla_idx_h,
++          gen_helper_gvec_fmla_idx_s,
++          gen_helper_gvec_fmla_idx_d },
++        { gen_helper_gvec_fmls_idx_h,
++          gen_helper_gvec_fmls_idx_s,
++          gen_helper_gvec_fmls_idx_d },
++        { gen_helper_gvec_ah_fmls_idx_h,
++          gen_helper_gvec_ah_fmls_idx_s,
++          gen_helper_gvec_ah_fmls_idx_d },
+     };
+     MemOp esz = a->esz;
+     int check = fp_access_check_vector_hsd(s, a->q, esz);
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
+     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+                       esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+-                      (a->idx << 1) | neg,
+-                      fns[esz - 1]);
++                      a->idx, fns[neg ? 1 + s->fpcr_ah : 0][esz - 1]);
+     return true;
+ }
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ DO_SVE2_RRXR_ROT(CDOT_zzxw_d, gen_helper_sve2_cdot_idx_d)
+  *** SVE Floating Point Multiply-Add Indexed Group
+  */
+-static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub)
+-{
+-    static gen_helper_gvec_4_ptr * const fns[4] = {
+-        NULL,
+-        gen_helper_gvec_fmla_idx_h,
+-        gen_helper_gvec_fmla_idx_s,
+-        gen_helper_gvec_fmla_idx_d,
+-    };
+-    return gen_gvec_fpst_zzzz(s, fns[a->esz], a->rd, a->rn, a->rm, a->ra,
+-                              (a->index << 1) | sub,
+-                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+-}
++static gen_helper_gvec_4_ptr * const fmla_idx_fns[4] = {
++    NULL,                       gen_helper_gvec_fmla_idx_h,
++    gen_helper_gvec_fmla_idx_s, gen_helper_gvec_fmla_idx_d
++};
++TRANS_FEAT(FMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
++           fmla_idx_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->index,
++           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+-TRANS_FEAT(FMLA_zzxz, aa64_sve, do_FMLA_zzxz, a, false)
+-TRANS_FEAT(FMLS_zzxz, aa64_sve, do_FMLA_zzxz, a, true)
++static gen_helper_gvec_4_ptr * const fmls_idx_fns[4][2] = {
++    { NULL, NULL },
++    { gen_helper_gvec_fmls_idx_h, gen_helper_gvec_ah_fmls_idx_h },
++    { gen_helper_gvec_fmls_idx_s, gen_helper_gvec_ah_fmls_idx_s },
++    { gen_helper_gvec_fmls_idx_d, gen_helper_gvec_ah_fmls_idx_d },
++};
++TRANS_FEAT(FMLS_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
++           fmls_idx_fns[a->esz][s->fpcr_ah],
++           a->rd, a->rn, a->rm, a->ra, a->index,
++           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+ /*
+  *** SVE Floating Point Multiply Indexed Group
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_FMUL_IDX(gvec_fmls_nf_idx_s, float32_sub, float32_mul, float32, H4)
+ #undef DO_FMUL_IDX
+-#define DO_FMLA_IDX(NAME, TYPE, H)                                         \
++#define DO_FMLA_IDX(NAME, TYPE, H, NEGX, NEGF)                             \
+ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
+                   float_status *stat, uint32_t desc)                       \
+ {                                                                          \
+     intptr_t i, j, oprsz = simd_oprsz(desc);                               \
+     intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
+-    TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
+-    intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
++    intptr_t idx = simd_data(desc);                                        \
+     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
+-    op1_neg <<= (8 * sizeof(TYPE) - 1);                                    \
+     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
+         TYPE mm = m[H(i + idx)];                                           \
+         for (j = 0; j < segment; j++) {                                    \
+-            d[i + j] = TYPE##_muladd(n[i + j] ^ op1_neg,                   \
+-                                     mm, a[i + j], 0, stat);               \
++            d[i + j] = TYPE##_muladd(n[i + j] ^ NEGX, mm,                  \
++                                     a[i + j], NEGF, stat);                \
+         }                                                                  \
+     }                                                                      \
+     clear_tail(d, oprsz, simd_maxsz(desc));                                \
+ }
+-DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2)
+-DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
+-DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8)
++DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2, 0, 0)
++DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4, 0, 0)
++DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8, 0, 0)
++
++DO_FMLA_IDX(gvec_fmls_idx_h, float16, H2, INT16_MIN, 0)
++DO_FMLA_IDX(gvec_fmls_idx_s, float32, H4, INT32_MIN, 0)
++DO_FMLA_IDX(gvec_fmls_idx_d, float64, H8, INT64_MIN, 0)
++
++DO_FMLA_IDX(gvec_ah_fmls_idx_h, float16, H2, 0, float_muladd_negate_product)
++DO_FMLA_IDX(gvec_ah_fmls_idx_s, float32, H4, 0, float_muladd_negate_product)
++DO_FMLA_IDX(gvec_ah_fmls_idx_d, float64, H8, 0, float_muladd_negate_product)
+ #undef DO_FMLA_IDX
+--
+.34.1

-New patch
+[PULL 41/68] target/arm: Handle FPCR.AH in negation in FMLS (vector)
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics
+in FMLS (vector), by implementing a new set of helpers for
+the AH=1 case.
+The float_muladd_negate_product flag produces the same result
+as negating either of the multiplication operands, assuming
+neither of the operands is a NaN.  But since FEAT_AFP does not
+negate NaNs, this behaviour is exactly what we need.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
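Reader's note: the two negation strategies described above can be compared outside QEMU with a toy model. This is only an illustrative sketch, not QEMU code. The names toy_muladd(), toy_fmls() and toy_ah_fmls() are hypothetical, NaN selection is simplified to "return the first NaN input unchanged", and an unfused multiply followed by an add stands in for the real fused operation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_SIGN32 0x80000000u

static float toy_from_bits(uint32_t b) { float f; memcpy(&f, &b, sizeof(f)); return f; }
static uint32_t toy_to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof(b)); return b; }

/* True for any single-precision NaN bit pattern (quiet or signalling). */
static bool toy_is_nan(uint32_t b) { return (b & 0x7fffffffu) > 0x7f800000u; }

/*
 * Toy muladd on raw bit patterns.  Assumption: a NaN input is returned
 * untouched, and the negate_product flag is only looked at afterwards.
 */
static uint32_t toy_muladd(uint32_t op1, uint32_t op2, uint32_t addend,
                           bool negate_product)
{
    if (toy_is_nan(op1)) { return op1; }
    if (toy_is_nan(op2)) { return op2; }
    if (toy_is_nan(addend)) { return addend; }
    float product = toy_from_bits(op1) * toy_from_bits(op2);
    if (negate_product) { product = -product; }
    return toy_to_bits(product + toy_from_bits(addend));
}

/* AH=0 style: flip the sign bit of op1 up front, even if op1 is a NaN. */
static uint32_t toy_fmls(uint32_t dest, uint32_t op1, uint32_t op2)
{
    return toy_muladd(op1 ^ TOY_SIGN32, op2, dest, false);
}

/* AH=1 style: leave the inputs alone and ask muladd to negate the product. */
static uint32_t toy_ah_fmls(uint32_t dest, uint32_t op1, uint32_t op2)
{
    return toy_muladd(op1, op2, dest, true);
}
```

With non-NaN inputs both variants agree (10 - 2 * 3 gives 4 either way). With op1 set to the quiet NaN 0x7fc00000, toy_fmls() hands back the NaN with its sign bit flipped, while toy_ah_fmls() hands it back unchanged.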
+ target/arm/helper.h            |  4 ++++
+ target/arm/tcg/translate-a64.c |  7 ++++++-
+ target/arm/tcg/vec_helper.c    | 22 ++++++++++++++++++++++
+files changed, 32 insertions(+), 1 deletion(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmls[3] = {
+     gen_helper_gvec_vfms_s,
+     gen_helper_gvec_vfms_d,
+ };
+-TRANS(FMLS_v, do_fp3_vector, a, 0, f_vector_fmls)
++static gen_helper_gvec_3_ptr * const f_vector_fmls_ah[3] = {
++    gen_helper_gvec_ah_vfms_h,
++    gen_helper_gvec_ah_vfms_s,
++    gen_helper_gvec_ah_vfms_d,
++};
++TRANS(FMLS_v, do_fp3_vector_2fn, a, 0, f_vector_fmls, f_vector_fmls_ah)
+ static gen_helper_gvec_3_ptr * const f_vector_fcmeq[3] = {
+     gen_helper_gvec_fceq_h,
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ static float64 float64_mulsub_f(float64 dest, float64 op1, float64 op2,
+     return float64_muladd(float64_chs(op1), op2, dest, 0, stat);
+ }
++static float16 float16_ah_mulsub_f(float16 dest, float16 op1, float16 op2,
++                                 float_status *stat)
++{
++    return float16_muladd(op1, op2, dest, float_muladd_negate_product, stat);
++}
++
++static float32 float32_ah_mulsub_f(float32 dest, float32 op1, float32 op2,
++                                 float_status *stat)
++{
++    return float32_muladd(op1, op2, dest, float_muladd_negate_product, stat);
++}
++
++static float64 float64_ah_mulsub_f(float64 dest, float64 op1, float64 op2,
++                                 float_status *stat)
++{
++    return float64_muladd(op1, op2, dest, float_muladd_negate_product, stat);
++}
++
+ #define DO_MULADD(NAME, FUNC, TYPE)                                        \
+ void HELPER(NAME)(void *vd, void *vn, void *vm,                            \
+                   float_status *stat, uint32_t desc)                       \
+@@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_vfms_h, float16_mulsub_f, float16)
+ DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
+ DO_MULADD(gvec_vfms_d, float64_mulsub_f, float64)
++DO_MULADD(gvec_ah_vfms_h, float16_ah_mulsub_f, float16)
++DO_MULADD(gvec_ah_vfms_s, float32_ah_mulsub_f, float32)
++DO_MULADD(gvec_ah_vfms_d, float64_ah_mulsub_f, float64)
++
+ /* For the indexed ops, SVE applies the index per 128-bit vector segment.
+  * For AdvSIMD, there is of course only one such vector segment.
+  */
+--
+.34.1

-[Qemu-devel] [PULL 22/28] hw/arm/iotkit: Instantiate MPC
+[PULL 42/68] target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
-Wire up the one MPC that is part of the IoTKit itself. For the
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics for the
-moment we don't wire up its interrupt line.
+SVE FMLS (vector) insns, by providing new helpers for the AH=1 case
 which end up passing fpcr_ah = true to the do_fmla_zpzzz_* functions
 that do the work.
 The float*_muladd functions have a flags argument that can
 perform optional negation of various operands.  We don't use
 that for "normal" arm fmla, because the muladd flags are not
 applied when an input is a NaN.  But since FEAT_AFP does not
 negate NaNs, this behaviour is exactly what we need.
 The non-AH helpers pass in a zero flags argument and control the
 negation via the neg1 and neg3 arguments; the AH helpers always pass
 in neg1 and neg3 as zero and control the negation via the flags
 argument.  This allows us to avoid conditional branches within the
 inner loop.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20180620132032.28865-7-peter.maydell@linaro.org
 ---
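Reader's note: the split between the neg1/neg3 arguments and the flags argument can be pictured as a small lookup table. The sketch below is an illustration under assumptions, not the QEMU implementation. All toy_* names and the TOY_* flag values are hypothetical, only non-NaN inputs are modelled, and the multiply-add is unfused.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SIGN32 0x80000000u

/* Hypothetical stand-ins for the softfloat muladd flags; values are arbitrary. */
enum { TOY_NEGATE_C = 1, TOY_NEGATE_PRODUCT = 2 };
enum toy_op { TOY_FMLA, TOY_FMLS, TOY_FNMLA, TOY_FNMLS };

struct toy_variant {
    uint32_t neg1;   /* XOR mask applied to the first multiplicand */
    uint32_t neg3;   /* XOR mask applied to the addend */
    int flags;       /* negation deferred to the muladd step */
};

/* Row 0: AH=0 (negate via XOR).  Row 1: AH=1 (negate via flags). */
static const struct toy_variant toy_variants[2][4] = {
    {
        [TOY_FMLA]  = { 0, 0, 0 },
        [TOY_FMLS]  = { TOY_SIGN32, 0, 0 },
        [TOY_FNMLA] = { TOY_SIGN32, TOY_SIGN32, 0 },
        [TOY_FNMLS] = { 0, TOY_SIGN32, 0 },
    },
    {
        [TOY_FMLA]  = { 0, 0, 0 },
        [TOY_FMLS]  = { 0, 0, TOY_NEGATE_PRODUCT },
        [TOY_FNMLA] = { 0, 0, TOY_NEGATE_PRODUCT | TOY_NEGATE_C },
        [TOY_FNMLS] = { 0, 0, TOY_NEGATE_C },
    },
};

/* XOR a mask into the bit pattern of a float. */
static float toy_flip(float f, uint32_t mask)
{
    uint32_t b;
    memcpy(&b, &f, sizeof(b));
    b ^= mask;
    memcpy(&f, &b, sizeof(f));
    return f;
}

/* One element of the inner loop: no branch on AH, only data from the table. */
static float toy_apply(int ah, enum toy_op op, float n, float m, float a)
{
    const struct toy_variant *v = &toy_variants[ah][op];
    float product = toy_flip(n, v->neg1) * m;
    float addend = toy_flip(a, v->neg3);

    if (v->flags & TOY_NEGATE_PRODUCT) { product = -product; }
    if (v->flags & TOY_NEGATE_C) { addend = -addend; }
    return product + addend;
}
```

For ordinary numbers the two rows are interchangeable: toy_apply(0, TOY_FMLS, 2, 3, 10) and toy_apply(1, TOY_FMLS, 2, 3, 10) both yield 4. The rows only differ once a NaN input is involved, which this sketch deliberately leaves out.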
- include/hw/arm/iotkit.h |  2 ++
+ target/arm/tcg/helper-sve.h    | 21 ++++++++
- hw/arm/iotkit.c         | 38 +++++++++++++++++++++++++++-----------
+ target/arm/tcg/sve_helper.c    | 99 +++++++++++++++++++++++++++-------
-files changed, 29 insertions(+), 11 deletions(-)
+ target/arm/tcg/translate-sve.c | 18 ++++---
 files changed, 114 insertions(+), 24 deletions(-)
-diff --git a/include/hw/arm/iotkit.h b/include/hw/arm/iotkit.h
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/iotkit.h
+--- a/target/arm/tcg/helper-sve.h
-+++ b/include/hw/arm/iotkit.h
++++ b/target/arm/tcg/helper-sve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
- #include "hw/arm/armv7m.h"
+ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
- #include "hw/misc/iotkit-secctl.h"
+                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
- #include "hw/misc/tz-ppc.h"
-+#include "hw/misc/tz-mpc.h"
++DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_h, TCG_CALL_NO_RWG,
- #include "hw/timer/cmsdk-apb-timer.h"
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
- #include "hw/misc/unimp.h"
++DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_s, TCG_CALL_NO_RWG,
- #include "hw/or-irq.h"
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-@@ -XXX,XX +XXX,XX @@ typedef struct IoTKit {
++DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_d, TCG_CALL_NO_RWG,
-     IoTKitSecCtl secctl;
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-     TZPPC apb_ppc0;
++
-     TZPPC apb_ppc1;
++DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
-+    TZMPC mpc;
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-     CMSDKAPBTIMER timer0;
++DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
-     CMSDKAPBTIMER timer1;
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-     qemu_or_irq ppc_irq_orgate;
++DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
-diff --git a/hw/arm/iotkit.c b/hw/arm/iotkit.c
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +
 +DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +
  DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/iotkit.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/hw/arm/iotkit.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void iotkit_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
-                       TYPE_TZ_PPC);
-     init_sysbus_child(obj, "apb-ppc1", &s->apb_ppc1, sizeof(s->apb_ppc1),
+ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
-                       TYPE_TZ_PPC);
+                             float_status *status, uint32_t desc,
-+    init_sysbus_child(obj, "mpc", &s->mpc, sizeof(s->mpc), TYPE_TZ_MPC);
+-                            uint16_t neg1, uint16_t neg3)
-     init_sysbus_child(obj, "timer0", &s->timer0, sizeof(s->timer0),
++                            uint16_t neg1, uint16_t neg3, int flags)
-                       TYPE_CMSDK_APB_TIMER);
+ {
-     init_sysbus_child(obj, "timer1", &s->timer1, sizeof(s->timer1),
+     intptr_t i = simd_oprsz(desc);
-@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
+     uint64_t *g = vg;
-      */
+@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
-     make_alias(s, &s->alias3, "alias 3", 0x50000000, 0x10000000, 0x40000000);
+                 e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
+                 e2 = *(uint16_t *)(vm + H1_2(i));
--    /* This RAM should be behind a Memory Protection Controller, but we
+                 e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
--     * don't implement that yet.
+-                r = float16_muladd(e1, e2, e3, 0, status);
--     */
++                r = float16_muladd(e1, e2, e3, flags, status);
--    memory_region_init_ram(&s->sram0, NULL, "iotkit.sram0", 0x00008000, &err);
+                 *(uint16_t *)(vd + H1_2(i)) = r;
--    if (err) {
+             }
--        error_propagate(errp, err);
+         } while (i & 63);
--        return;
+@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
--    }
+ void HELPER(sve_fmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
--    memory_region_add_subregion(&s->container, 0x20000000, &s->sram0);
+                               void *vg, float_status *status, uint32_t desc)
+ {
-     /* Security controller */
+-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0);
-     object_property_set_bool(OBJECT(&s->secctl), true, "realized", &err);
++    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
-@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
+ }
-     qdev_connect_gpio_out_named(dev_secctl, "sec_resp_cfg", 0,
-                                 qdev_get_gpio_in(dev_splitter, 0));
+ void HELPER(sve_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
-+    /* This RAM lives behind the Memory Protection Controller */
+ {
-+    memory_region_init_ram(&s->sram0, NULL, "iotkit.sram0", 0x00008000, &err);
+-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0);
-+    if (err) {
++    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0, 0);
-+        error_propagate(errp, err);
+ }
-+        return;
-+    }
+ void HELPER(sve_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
-+    object_property_set_link(OBJECT(&s->mpc), OBJECT(&s->sram0),
+                                void *vg, float_status *status, uint32_t desc)
-+                             "downstream", &err);
+ {
-+    if (err) {
+-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000);
-+        error_propagate(errp, err);
++    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000, 0);
-+        return;
+ }
-+    }
-+    object_property_set_bool(OBJECT(&s->mpc), true, "realized", &err);
+ void HELPER(sve_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
-+    if (err) {
+                                void *vg, float_status *status, uint32_t desc)
-+        error_propagate(errp, err);
+ {
-+        return;
+-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000);
-+    }
++    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000, 0);
-+    /* Map the upstream end of the MPC into the right place... */
++}
-+    memory_region_add_subregion(&s->container, 0x20000000,
++
-+                                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->mpc),
++void HELPER(sve_ah_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
-+                                                       1));
++                              void *vg, float_status *status, uint32_t desc)
-+    /* ...and its register interface */
++{
-+    memory_region_add_subregion(&s->container, 0x50083000,
++    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
-+                                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->mpc),
++                    float_muladd_negate_product);
-+                                                       0));
++}
 +
-     /* Devices behind APB PPC0:
++void HELPER(sve_ah_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
-      *   0x40000000: timer0
++                               void *vg, float_status *status, uint32_t desc)
-      *   0x40001000: timer1
++{
-@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
++    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
-     create_unimplemented_device("NS watchdog", 0x40081000, 0x1000);
++                    float_muladd_negate_product | float_muladd_negate_c);
-     create_unimplemented_device("S watchdog", 0x50081000, 0x1000);
++}
++
--    create_unimplemented_device("SRAM0 MPC", 0x50083000, 0x1000);
++void HELPER(sve_ah_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
--
++                               void *vg, float_status *status, uint32_t desc)
-     for (i = 0; i < ARRAY_SIZE(s->ppc_irq_splitter); i++) {
++{
-         Object *splitter = OBJECT(&s->ppc_irq_splitter[i]);
++    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_c);
  }
  static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                              float_status *status, uint32_t desc,
 -                            uint32_t neg1, uint32_t neg3)
 +                            uint32_t neg1, uint32_t neg3, int flags)
  {
      intptr_t i = simd_oprsz(desc);
      uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                  e1 = *(uint32_t *)(vn + H1_4(i)) ^ neg1;
                  e2 = *(uint32_t *)(vm + H1_4(i));
                  e3 = *(uint32_t *)(va + H1_4(i)) ^ neg3;
 -                r = float32_muladd(e1, e2, e3, 0, status);
 +                r = float32_muladd(e1, e2, e3, flags, status);
                  *(uint32_t *)(vd + H1_4(i)) = r;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
  void HELPER(sve_fmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
  }
  void HELPER(sve_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0, 0);
  }
  void HELPER(sve_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000, 0);
  }
  void HELPER(sve_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000, 0);
 +}
 +
 +void HELPER(sve_ah_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
 +                              void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product);
 +}
 +
 +void HELPER(sve_ah_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product | float_muladd_negate_c);
 +}
 +
 +void HELPER(sve_ah_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_c);
  }
  static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                              float_status *status, uint32_t desc,
 -                            uint64_t neg1, uint64_t neg3)
 +                            uint64_t neg1, uint64_t neg3, int flags)
  {
      intptr_t i = simd_oprsz(desc);
      uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                  e1 = *(uint64_t *)(vn + i) ^ neg1;
                  e2 = *(uint64_t *)(vm + i);
                  e3 = *(uint64_t *)(va + i) ^ neg3;
 -                r = float64_muladd(e1, e2, e3, 0, status);
 +                r = float64_muladd(e1, e2, e3, flags, status);
                  *(uint64_t *)(vd + i) = r;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
  void HELPER(sve_fmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
  }
  void HELPER(sve_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0, 0);
  }
  void HELPER(sve_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN, 0);
  }
  void HELPER(sve_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN, 0);
 +}
 +
 +void HELPER(sve_ah_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
 +                              void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product);
 +}
 +
 +void HELPER(sve_ah_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product | float_muladd_negate_c);
 +}
 +
 +void HELPER(sve_ah_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_c);
  }
  /* Two operand floating-point comparison controlled by a predicate.
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
             a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
             a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 -#define DO_FMLA(NAME, name) \
 +#define DO_FMLA(NAME, name, ah_name)                                    \
      static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
          NULL, gen_helper_sve_##name##_h,                                \
          gen_helper_sve_##name##_s, gen_helper_sve_##name##_d            \
      };                                                                  \
 -    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp, name##_fns[a->esz], \
 +    static gen_helper_gvec_5_ptr * const name##_ah_fns[4] = {           \
 +        NULL, gen_helper_sve_##ah_name##_h,                             \
 +        gen_helper_sve_##ah_name##_s, gen_helper_sve_##ah_name##_d      \
 +    };                                                                  \
 +    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp,                     \
 +               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], \
                 a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
                 a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 -DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
 -DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
 -DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz)
 -DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz)
 +/* We don't need an ah_fmla_zpzzz because fmla doesn't negate anything */
 +DO_FMLA(FMLA_zpzzz, fmla_zpzzz, fmla_zpzzz)
 +DO_FMLA(FMLS_zpzzz, fmls_zpzzz, ah_fmls_zpzzz)
 +DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz, ah_fnmla_zpzzz)
 +DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz, ah_fnmls_zpzzz)
  #undef DO_FMLA
 --
-.17.1
+.34.1

-New patch
+[PULL 43/68] target/arm: Handle FPCR.AH in SVE FTSSEL
+The negation step in the SVE FTSSEL insn mustn't negate a NaN when
+FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
+and use that to determine whether to do the negation.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
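Reader's note: the per-element behaviour described above can be sketched on raw single-precision bit patterns. This is a standalone illustration, not the QEMU helper. toy_is_nan(), toy_maybe_ah_chs() and toy_ftssel_s() are hypothetical names, and the body of toy_maybe_ah_chs() is an assumption about what a "change sign unless AH is set and the value is a NaN" helper would look like.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SIGN32 0x80000000u
#define TOY_ONE32  0x3f800000u   /* bit pattern of 1.0f */

/* True for any single-precision NaN bit pattern. */
static bool toy_is_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* Flip the sign bit, except that a NaN is left alone when AH is set. */
static uint32_t toy_maybe_ah_chs(uint32_t bits, bool fpcr_ah)
{
    return (fpcr_ah && toy_is_nan(bits)) ? bits : bits ^ TOY_SIGN32;
}

/*
 * One element of a 32-bit FTSSEL-like operation:
 * bit 0 of mm replaces the value with 1.0, bit 1 requests negation.
 */
static uint32_t toy_ftssel_s(uint32_t nn, uint32_t mm, bool fpcr_ah)
{
    if (mm & 1) {
        nn = TOY_ONE32;
    }
    if (mm & 2) {
        nn = toy_maybe_ah_chs(nn, fpcr_ah);
    }
    return nn;
}
```

With AH clear, negating the quiet NaN 0x7fc00000 yields 0xffc00000, matching the old XOR-based code. With AH set the NaN comes back unchanged, while ordinary values such as 2.0 are still negated.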
+ target/arm/tcg/sve_helper.c    | 18 +++++++++++++++---
+ target/arm/tcg/translate-sve.c |  4 ++--
+files changed, 17 insertions(+), 5 deletions(-)
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
+ void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
+ {
+     intptr_t i, opr_sz = simd_oprsz(desc) / 2;
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
+     uint16_t *d = vd, *n = vn, *m = vm;
+     for (i = 0; i < opr_sz; i += 1) {
+         uint16_t nn = n[i];
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
+         if (mm & 1) {
+             nn = float16_one;
+         }
+-        d[i] = nn ^ (mm & 2) << 14;
++        if (mm & 2) {
++            nn = float16_maybe_ah_chs(nn, fpcr_ah);
++        }
++        d[i] = nn;
+     }
+ }
+ void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
+ {
+     intptr_t i, opr_sz = simd_oprsz(desc) / 4;
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
+     uint32_t *d = vd, *n = vn, *m = vm;
+     for (i = 0; i < opr_sz; i += 1) {
+         uint32_t nn = n[i];
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
+         if (mm & 1) {
+             nn = float32_one;
+         }
+-        d[i] = nn ^ (mm & 2) << 30;
++        if (mm & 2) {
++            nn = float32_maybe_ah_chs(nn, fpcr_ah);
++        }
++        d[i] = nn;
+     }
+ }
+ void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
+ {
+     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
+     uint64_t *d = vd, *n = vn, *m = vm;
+     for (i = 0; i < opr_sz; i += 1) {
+         uint64_t nn = n[i];
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
+         if (mm & 1) {
+             nn = float64_one;
+         }
+-        d[i] = nn ^ (mm & 2) << 62;
++        if (mm & 2) {
++            nn = float64_maybe_ah_chs(nn, fpcr_ah);
++        }
++        d[i] = nn;
+     }
+ }
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2 * const fexpa_fns[4] = {
+     gen_helper_sve_fexpa_s, gen_helper_sve_fexpa_d,
+ };
+ TRANS_FEAT_NONSTREAMING(FEXPA, aa64_sve, gen_gvec_ool_zz,
+-                        fexpa_fns[a->esz], a->rd, a->rn, 0)
++                        fexpa_fns[a->esz], a->rd, a->rn, s->fpcr_ah)
+ static gen_helper_gvec_3 * const ftssel_fns[4] = {
+     NULL,                    gen_helper_sve_ftssel_h,
+     gen_helper_sve_ftssel_s, gen_helper_sve_ftssel_d,
+ };
+ TRANS_FEAT_NONSTREAMING(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz,
+-                        ftssel_fns[a->esz], a, 0)
++                        ftssel_fns[a->esz], a, s->fpcr_ah)
+ /*
+  *** SVE Predicate Logical Operations Group
+--
+.34.1

-New patch
+[PULL 44/68] target/arm: Handle FPCR.AH in SVE FTMAD
+The negation step in the SVE FTMAD insn mustn't negate a NaN when
+FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field,
+so we can select the correct behaviour.
+Because the operand is known to be negative, negating the operand
+is the same as taking the absolute value.  Defer this to the muladd
+operation via flags, so that it happens after NaN detection, which
+is correct for FPCR.AH.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
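Reader's note: the way the 3-bit immediate and the FPCR.AH bit share one data field can be sketched as a pack/unpack pair. This is an illustration only. The toy_* names are hypothetical, toy_extract32() is a local reimplementation of a bitfield extract, and the sketch works on the data field alone rather than on a full descriptor word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract 'length' bits starting at bit 'start'; valid for 1 <= length <= 32. */
static uint32_t toy_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* Translator side: immediate in bits [2:0], FPCR.AH in bit 3. */
static uint32_t toy_ftmad_pack(uint32_t imm3, bool fpcr_ah)
{
    return (imm3 & 7u) | ((uint32_t)fpcr_ah << 3);
}

/* Helper side: recover the coefficient index. */
static uint32_t toy_ftmad_imm(uint32_t data)
{
    return toy_extract32(data, 0, 3);
}

/* Helper side: recover the FPCR.AH bit. */
static bool toy_ftmad_ah(uint32_t data)
{
    return toy_extract32(data, 3, 1) != 0;
}
```

The point of the explicit 3-bit extract is that once a fourth bit is packed in, reading the whole field as the index would fold the AH bit into the coefficient lookup.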
+ target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++++--------
+ target/arm/tcg/translate-sve.c |  3 ++-
+files changed, 35 insertions(+), 10 deletions(-)
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftmad_h)(void *vd, void *vn, void *vm,
+         0x3c00, 0xb800, 0x293a, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
+     };
+     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float16);
+-    intptr_t x = simd_data(desc);
++    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
+     float16 *d = vd, *n = vn, *m = vm;
++
+     for (i = 0; i < opr_sz; i++) {
+         float16 mm = m[i];
+         intptr_t xx = x;
++        int flags = 0;
++
+         if (float16_is_neg(mm)) {
+-            mm = float16_abs(mm);
++            if (fpcr_ah) {
++                flags = float_muladd_negate_product;
++            } else {
++                mm = float16_abs(mm);
++            }
+             xx += 8;
+         }
+-        d[i] = float16_muladd(n[i], mm, coeff[xx], 0, s);
++        d[i] = float16_muladd(n[i], mm, coeff[xx], flags, s);
+     }
+ }
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftmad_s)(void *vd, void *vn, void *vm,
+         0x37cd37cc, 0x00000000, 0x00000000, 0x00000000,
+     };
+     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float32);
+-    intptr_t x = simd_data(desc);
++    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
+     float32 *d = vd, *n = vn, *m = vm;
++
+     for (i = 0; i < opr_sz; i++) {
+         float32 mm = m[i];
+         intptr_t xx = x;
++        int flags = 0;
++
+         if (float32_is_neg(mm)) {
+-            mm = float32_abs(mm);
++            if (fpcr_ah) {
++                flags = float_muladd_negate_product;
++            } else {
++                mm = float32_abs(mm);
++            }
+             xx += 8;
+         }
+-        d[i] = float32_muladd(n[i], mm, coeff[xx], 0, s);
++        d[i] = float32_muladd(n[i], mm, coeff[xx], flags, s);
+     }
+ }
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftmad_d)(void *vd, void *vn, void *vm,
+         0x3e21ee96d2641b13ull, 0xbda8f76380fbb401ull,
+     };
+     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float64);
+-    intptr_t x = simd_data(desc);
++    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
+     float64 *d = vd, *n = vn, *m = vm;
++
+     for (i = 0; i < opr_sz; i++) {
+         float64 mm = m[i];
+         intptr_t xx = x;
++        int flags = 0;
++
+         if (float64_is_neg(mm)) {
+-            mm = float64_abs(mm);
++            if (fpcr_ah) {
++                flags = float_muladd_negate_product;
++            } else {
++                mm = float64_abs(mm);
++            }
+             xx += 8;
+         }
+-        d[i] = float64_muladd(n[i], mm, coeff[xx], 0, s);
++        d[i] = float64_muladd(n[i], mm, coeff[xx], flags, s);
+     }
+ }
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const ftmad_fns[4] = {
+     gen_helper_sve_ftmad_s, gen_helper_sve_ftmad_d,
+ };
+ TRANS_FEAT_NONSTREAMING(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
+-                        ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
++                        ftmad_fns[a->esz], a->rd, a->rn, a->rm,
++                        a->imm | (s->fpcr_ah << 3),
+                         a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+ /*
+--
+.34.1

-New patch
+[PULL 45/68] target/arm: Handle FPCR.AH in vector FCMLA
+From: Richard Henderson <richard.henderson@linaro.org>
+The negation step in FCMLA mustn't negate a NaN when FPCR.AH
+is set. Handle this by passing FPCR.AH to the helper via the
+SIMD data field, and use this to select whether to do the
+negation via XOR or via the muladd negate_product flag.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20250129013857.135256-26-richard.henderson@linaro.org
+[PMM: Expanded commit message]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/tcg/translate-a64.c |  2 +-
+ target/arm/tcg/vec_helper.c    | 66 ++++++++++++++++++++--------------
+files changed, 40 insertions(+), 28 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
+     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+                       a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+-                      a->rot, fn[a->esz]);
++                      a->rot | (s->fpcr_ah << 2), fn[a->esz]);
+     return true;
+ }
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm, void *va,
+     uintptr_t opr_sz = simd_oprsz(desc);
+     float16 *d = vd, *n = vn, *m = vm, *a = va;
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+-    uint32_t neg_real = flip ^ neg_imag;
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
++    uint32_t negf_real = flip ^ negf_imag;
++    float16 negx_imag, negx_real;
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 15;
+-    neg_imag <<= 15;
++    /* With AH=0, use negx; with AH=1 use negf. */
++    negx_real = (negf_real & ~fpcr_ah) << 15;
++    negx_imag = (negf_imag & ~fpcr_ah) << 15;
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
+     for (i = 0; i < opr_sz / 2; i += 2) {
+         float16 e2 = n[H2(i + flip)];
+-        float16 e1 = m[H2(i + flip)] ^ neg_real;
++        float16 e1 = m[H2(i + flip)] ^ negx_real;
+         float16 e4 = e2;
+-        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
++        float16 e3 = m[H2(i + 1 - flip)] ^ negx_imag;
+-        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], 0, fpst);
+-        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], 0, fpst);
++        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], negf_real, fpst);
++        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], negf_imag, fpst);
+     }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+ }
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm, void *va,
+     uintptr_t opr_sz = simd_oprsz(desc);
+     float32 *d = vd, *n = vn, *m = vm, *a = va;
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+-    uint32_t neg_real = flip ^ neg_imag;
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
++    uint32_t negf_real = flip ^ negf_imag;
++    float32 negx_imag, negx_real;
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 31;
+-    neg_imag <<= 31;
++    /* With AH=0, use negx; with AH=1 use negf. */
++    negx_real = (negf_real & ~fpcr_ah) << 31;
++    negx_imag = (negf_imag & ~fpcr_ah) << 31;
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
+     for (i = 0; i < opr_sz / 4; i += 2) {
+         float32 e2 = n[H4(i + flip)];
+-        float32 e1 = m[H4(i + flip)] ^ neg_real;
++        float32 e1 = m[H4(i + flip)] ^ negx_real;
+         float32 e4 = e2;
+-        float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
++        float32 e3 = m[H4(i + 1 - flip)] ^ negx_imag;
+-        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], 0, fpst);
+-        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], 0, fpst);
++        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], negf_real, fpst);
++        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], negf_imag, fpst);
+     }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+ }
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm, void *va,
+     uintptr_t opr_sz = simd_oprsz(desc);
+     float64 *d = vd, *n = vn, *m = vm, *a = va;
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+-    uint64_t neg_real = flip ^ neg_imag;
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
++    uint32_t negf_real = flip ^ negf_imag;
++    float64 negx_real, negx_imag;
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 63;
+-    neg_imag <<= 63;
++    /* With AH=0, use negx; with AH=1 use negf. */
++    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
++    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
+     for (i = 0; i < opr_sz / 8; i += 2) {
+         float64 e2 = n[i + flip];
+-        float64 e1 = m[i + flip] ^ neg_real;
++        float64 e1 = m[i + flip] ^ negx_real;
+         float64 e4 = e2;
+-        float64 e3 = m[i + 1 - flip] ^ neg_imag;
++        float64 e3 = m[i + 1 - flip] ^ negx_imag;
+-        d[i] = float64_muladd(e2, e1, a[i], 0, fpst);
+-        d[i + 1] = float64_muladd(e4, e3, a[i + 1], 0, fpst);
++        d[i] = float64_muladd(e2, e1, a[i], negf_real, fpst);
++        d[i + 1] = float64_muladd(e4, e3, a[i + 1], negf_imag, fpst);
+     }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+ }
+--
+.34.1
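
The choice between the two negation methods described in the commit message above can be sketched in isolation. This is a minimal illustration, not the QEMU helper: DEMO_NEGATE_PRODUCT, DemoNeg and demo_select_neg are hypothetical names, and DEMO_NEGATE_PRODUCT is only a stand-in for softfloat's float_muladd_negate_product flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_muladd_negate_product flag. */
#define DEMO_NEGATE_PRODUCT 2

typedef struct {
    uint32_t negx;  /* XOR mask applied to the multiplicand's raw float32 bits */
    int negf;       /* flags handed on to the fused multiply-add */
} DemoNeg;

/* Pick exactly one negation method for a float32 operand. */
static DemoNeg demo_select_neg(bool negate, bool fpcr_ah)
{
    DemoNeg r = { 0, 0 };

    if (negate) {
        if (fpcr_ah) {
            r.negf = DEMO_NEGATE_PRODUCT;   /* AH=1: let muladd negate the product */
        } else {
            r.negx = UINT32_C(1) << 31;     /* AH=0: flip the sign bit by XOR */
        }
    }
    /* The two methods are mutually exclusive. */
    assert(!(r.negx && r.negf));
    return r;
}
```

With AH=0 the XOR mask also flips the sign of a NaN input (0x7fc00000 becomes 0xffc00000), which is exactly what AH=1 must avoid; in that case the mask is zero and the input bits reach the multiply-add untouched.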

-New patch
+[PULL 46/68] target/arm: Handle FPCR.AH in FCMLA by index
+From: Richard Henderson <richard.henderson@linaro.org>
+The negation step in FCMLA by index mustn't negate a NaN when
+FPCR.AH is set. Use the same approach as vector FCMLA of
+passing in FPCR.AH and using it to select whether to negate
+by XOR or by the muladd negate_product flag.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20250129013857.135256-27-richard.henderson@linaro.org
+[PMM: Expanded commit message]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/tcg/translate-a64.c |  2 +-
+ target/arm/tcg/vec_helper.c    | 44 ++++++++++++++++++++--------------
+files changed, 27 insertions(+), 19 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
+     if (fp_access_check(s)) {
+         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+                           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+-                          (a->idx << 2) | a->rot, fn);
++                          (s->fpcr_ah << 4) | (a->idx << 2) | a->rot, fn);
+     }
+     return true;
+ }
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, void *va,
+     uintptr_t opr_sz = simd_oprsz(desc);
+     float16 *d = vd, *n = vn, *m = vm, *a = va;
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
+-    uint32_t neg_real = flip ^ neg_imag;
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
++    uint32_t negf_real = flip ^ negf_imag;
+     intptr_t elements = opr_sz / sizeof(float16);
+     intptr_t eltspersegment = MIN(16 / sizeof(float16), elements);
++    float16 negx_imag, negx_real;
+     intptr_t i, j;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 15;
+-    neg_imag <<= 15;
++    /* With AH=0, use negx; with AH=1 use negf. */
++    negx_real = (negf_real & ~fpcr_ah) << 15;
++    negx_imag = (negf_imag & ~fpcr_ah) << 15;
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
+     for (i = 0; i < elements; i += eltspersegment) {
+         float16 mr = m[H2(i + 2 * index + 0)];
+         float16 mi = m[H2(i + 2 * index + 1)];
+-        float16 e1 = neg_real ^ (flip ? mi : mr);
+-        float16 e3 = neg_imag ^ (flip ? mr : mi);
++        float16 e1 = negx_real ^ (flip ? mi : mr);
++        float16 e3 = negx_imag ^ (flip ? mr : mi);
+         for (j = i; j < i + eltspersegment; j += 2) {
+             float16 e2 = n[H2(j + flip)];
+             float16 e4 = e2;
+-            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], 0, fpst);
+-            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], 0, fpst);
++            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], negf_real, fpst);
++            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], negf_imag, fpst);
+         }
+     }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, void *va,
+     uintptr_t opr_sz = simd_oprsz(desc);
+     float32 *d = vd, *n = vn, *m = vm, *a = va;
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
+-    uint32_t neg_real = flip ^ neg_imag;
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
++    uint32_t negf_real = flip ^ negf_imag;
+     intptr_t elements = opr_sz / sizeof(float32);
+     intptr_t eltspersegment = MIN(16 / sizeof(float32), elements);
++    float32 negx_imag, negx_real;
+     intptr_t i, j;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 31;
+-    neg_imag <<= 31;
++    /* With AH=0, use negx; with AH=1 use negf. */
++    negx_real = (negf_real & ~fpcr_ah) << 31;
++    negx_imag = (negf_imag & ~fpcr_ah) << 31;
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
+     for (i = 0; i < elements; i += eltspersegment) {
+         float32 mr = m[H4(i + 2 * index + 0)];
+         float32 mi = m[H4(i + 2 * index + 1)];
+-        float32 e1 = neg_real ^ (flip ? mi : mr);
+-        float32 e3 = neg_imag ^ (flip ? mr : mi);
++        float32 e1 = negx_real ^ (flip ? mi : mr);
++        float32 e3 = negx_imag ^ (flip ? mr : mi);
+         for (j = i; j < i + eltspersegment; j += 2) {
+             float32 e2 = n[H4(j + flip)];
+             float32 e4 = e2;
+-            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], 0, fpst);
+-            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], 0, fpst);
++            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], negf_real, fpst);
++            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], negf_imag, fpst);
+         }
+     }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+--
+.34.1
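
For the by-index form, the translator packs the rotation, the index and FPCR.AH into one data word, and the helper pulls the fields back out. The sketch below mirrors that layout on a plain integer (the value after SIMD_DATA_SHIFT has been accounted for). demo_extract, demo_pack, demo_unpack and DemoFcmlaIdx are hypothetical names for illustration and are not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local bitfield extract, written for this sketch; not QEMU's extract32(). */
static uint32_t demo_extract(uint32_t value, unsigned start, unsigned length)
{
    assert(length > 0 && length < 32 && start + length <= 32);
    return (value >> start) & ((UINT32_C(1) << length) - 1);
}

typedef struct {
    bool flip;       /* bit 0: low bit of the rotation */
    bool neg_imag;   /* bit 1: high bit of the rotation */
    unsigned index;  /* bits 2-3: element index */
    bool fpcr_ah;    /* bit 4: FPCR.AH at translation time */
} DemoFcmlaIdx;

/* Same layout as (fpcr_ah << 4) | (idx << 2) | rot in the translator. */
static uint32_t demo_pack(unsigned rot, unsigned idx, bool fpcr_ah)
{
    assert(rot < 4 && idx < 4);
    return ((uint32_t)fpcr_ah << 4) | (idx << 2) | rot;
}

static DemoFcmlaIdx demo_unpack(uint32_t data)
{
    DemoFcmlaIdx f = {
        .flip = demo_extract(data, 0, 1),
        .neg_imag = demo_extract(data, 1, 1),
        .index = demo_extract(data, 2, 2),
        .fpcr_ah = demo_extract(data, 4, 1),
    };
    return f;
}
```

The real-part negation is then derived as flip ^ neg_imag, so it is set for rotations 1 and 2 only.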

-[Qemu-devel] [PULL 03/28] target-arm: fix a segmentation fault due to illegal memory access
+[PULL 47/68] target/arm: Handle FPCR.AH in SVE FCMLA
-From: Zheng Xiang <xiang.zheng@linaro.org>
+From: Richard Henderson <richard.henderson@linaro.org>
-The elements of kvm_devices_head list are freed in kvm_arm_machine_init_done(),
+The negation step in SVE FCMLA mustn't negate a NaN when FPCR.AH is
-but we still access these illegal memory in kvm_arm_devlistener_del().
+set.  Use the same approach as we did for A64 FCMLA of passing in
 FPCR.AH and using it to select whether to negate by XOR or by the
 muladd negate_product flag.
-This will cause segment fault when booting guest with MALLOC_PERTURB_=1.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20250129013857.135256-28-richard.henderson@linaro.org
 Signed-off-by: Zheng Xiang <xiang.zheng@linaro.org>
 Message-id: 20180619075821.9884-1-zhengxiang9@huawei.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/kvm.c | 1 +
+ target/arm/tcg/sve_helper.c    | 69 +++++++++++++++++++++-------------
-file changed, 1 insertion(+)
+ target/arm/tcg/translate-sve.c |  2 +-
 files changed, 43 insertions(+), 28 deletions(-)
-diff --git a/target/arm/kvm.c b/target/arm/kvm.c
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/target/arm/kvm.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void kvm_arm_machine_init_done(Notifier *notifier, void *data)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
-             kvm_arm_set_device_addr(kd);
+                                void *vg, float_status *status, uint32_t desc)
-         }
+ {
-         memory_region_unref(kd->mr);
+     intptr_t j, i = simd_oprsz(desc);
-+        QSLIST_REMOVE_HEAD(&kvm_devices_head, entries);
+-    unsigned rot = simd_data(desc);
-         g_free(kd);
+-    bool flip = rot & 1;
-     }
+-    float16 neg_imag, neg_real;
-     memory_listener_unregister(&devlistener);
++    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float16 negx_imag, negx_real;
      uint64_t *g = vg;
 -    neg_imag = float16_set_sign(0, (rot & 2) != 0);
 -    neg_real = float16_set_sign(0, rot == 1 || rot == 2);
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (negf_real & ~fpcr_ah) << 15;
 +    negx_imag = (negf_imag & ~fpcr_ah) << 15;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
              mi = *(float16 *)(vm + H1_2(j));
              e2 = (flip ? ni : nr);
 -            e1 = (flip ? mi : mr) ^ neg_real;
 +            e1 = (flip ? mi : mr) ^ negx_real;
              e4 = e2;
 -            e3 = (flip ? mr : mi) ^ neg_imag;
 +            e3 = (flip ? mr : mi) ^ negx_imag;
              if (likely((pg >> (i & 63)) & 1)) {
                  d = *(float16 *)(va + H1_2(i));
 -                d = float16_muladd(e2, e1, d, 0, status);
 +                d = float16_muladd(e2, e1, d, negf_real, status);
                  *(float16 *)(vd + H1_2(i)) = d;
              }
              if (likely((pg >> (j & 63)) & 1)) {
                  d = *(float16 *)(va + H1_2(j));
 -                d = float16_muladd(e4, e3, d, 0, status);
 +                d = float16_muladd(e4, e3, d, negf_imag, status);
                  *(float16 *)(vd + H1_2(j)) = d;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
      intptr_t j, i = simd_oprsz(desc);
 -    unsigned rot = simd_data(desc);
 -    bool flip = rot & 1;
 -    float32 neg_imag, neg_real;
 +    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float32 negx_imag, negx_real;
      uint64_t *g = vg;
 -    neg_imag = float32_set_sign(0, (rot & 2) != 0);
 -    neg_real = float32_set_sign(0, rot == 1 || rot == 2);
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (negf_real & ~fpcr_ah) << 31;
 +    negx_imag = (negf_imag & ~fpcr_ah) << 31;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
              mi = *(float32 *)(vm + H1_2(j));
              e2 = (flip ? ni : nr);
 -            e1 = (flip ? mi : mr) ^ neg_real;
 +            e1 = (flip ? mi : mr) ^ negx_real;
              e4 = e2;
 -            e3 = (flip ? mr : mi) ^ neg_imag;
 +            e3 = (flip ? mr : mi) ^ negx_imag;
              if (likely((pg >> (i & 63)) & 1)) {
                  d = *(float32 *)(va + H1_2(i));
 -                d = float32_muladd(e2, e1, d, 0, status);
 +                d = float32_muladd(e2, e1, d, negf_real, status);
                  *(float32 *)(vd + H1_2(i)) = d;
              }
              if (likely((pg >> (j & 63)) & 1)) {
                  d = *(float32 *)(va + H1_2(j));
 -                d = float32_muladd(e4, e3, d, 0, status);
 +                d = float32_muladd(e4, e3, d, negf_imag, status);
                  *(float32 *)(vd + H1_2(j)) = d;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
      intptr_t j, i = simd_oprsz(desc);
 -    unsigned rot = simd_data(desc);
 -    bool flip = rot & 1;
 -    float64 neg_imag, neg_real;
 +    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float64 negx_imag, negx_real;
      uint64_t *g = vg;
 -    neg_imag = float64_set_sign(0, (rot & 2) != 0);
 -    neg_real = float64_set_sign(0, rot == 1 || rot == 2);
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
 +    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
              mi = *(float64 *)(vm + H1_2(j));
              e2 = (flip ? ni : nr);
 -            e1 = (flip ? mi : mr) ^ neg_real;
 +            e1 = (flip ? mi : mr) ^ negx_real;
              e4 = e2;
 -            e3 = (flip ? mr : mi) ^ neg_imag;
 +            e3 = (flip ? mr : mi) ^ negx_imag;
              if (likely((pg >> (i & 63)) & 1)) {
                  d = *(float64 *)(va + H1_2(i));
 -                d = float64_muladd(e2, e1, d, 0, status);
 +                d = float64_muladd(e2, e1, d, negf_real, status);
                  *(float64 *)(vd + H1_2(i)) = d;
              }
              if (likely((pg >> (j & 63)) & 1)) {
                  d = *(float64 *)(va + H1_2(j));
 -                d = float64_muladd(e4, e3, d, 0, status);
 +                d = float64_muladd(e4, e3, d, negf_imag, status);
                  *(float64 *)(vd + H1_2(j)) = d;
              }
          } while (i & 63);
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_5_ptr * const fcmla_fns[4] = {
      gen_helper_sve_fcmla_zpzzz_s, gen_helper_sve_fcmla_zpzzz_d,
  };
  TRANS_FEAT(FCMLA_zpzzz, aa64_sve, gen_gvec_fpst_zzzzp, fcmla_fns[a->esz],
 -           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot,
 +           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot | (s->fpcr_ah << 2),
             a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
  static gen_helper_gvec_4_ptr * const fcmla_idx_fns[4] = {
 --
-.17.1
+.34.1

-[Qemu-devel] [PULL 16/28] xlnx-zynqmp: Swap Cortex-R5 for Cortex-R5F
+[PULL 48/68] target/arm: Handle FPCR.AH in FMLSL (by element and vector)
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-The ZynqMP has Cortex-R5Fs with the optional FPU enabled.
+Handle FPCR.AH's requirement to not negate the sign of a NaN
 in FMLSL by element and vector, using the usual trick of
 negating by XOR when AH=0 and by muladd flags when AH=1.
-Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
+Since we have the CPUARMState* in the helper anyway, we can
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+look directly at env->vfp.fpcr and don't need to pass in the
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+FPCR.AH value via the SIMD data word.
-Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180529124707.3025-3-edgar.iglesias@gmail.com
+Message-id: 20250129013857.135256-31-richard.henderson@linaro.org
 [PMM: commit message tweaked]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/xlnx-zcu102.c | 2 +-
+ target/arm/tcg/vec_helper.c | 71 ++++++++++++++++++++++++-------------
- hw/arm/xlnx-zynqmp.c | 2 +-
+file changed, 46 insertions(+), 25 deletions(-)
 files changed, 2 insertions(+), 2 deletions(-)
-diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-zcu102.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/hw/arm/xlnx-zcu102.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static void xlnx_zcu102_machine_class_init(ObjectClass *oc, void *data)
+@@ -XXX,XX +XXX,XX @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
   */
  static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
 -                     uint32_t desc, bool fz16)
 +                     uint64_t negx, int negf, uint32_t desc, bool fz16)
  {
-     MachineClass *mc = MACHINE_CLASS(oc);
+     intptr_t i, oprsz = simd_oprsz(desc);
+-    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
--    mc->desc = "Xilinx ZynqMP ZCU102 board with 4xA53s and 2xR5s based on " \
+     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-+    mc->desc = "Xilinx ZynqMP ZCU102 board with 4xA53s and 2xR5Fs based on " \
+     int is_q = oprsz == 16;
-                "the value of smp";
+     uint64_t n_4, m_4;
-     mc->init = xlnx_zcu102_init;
-     mc->block_default_type = IF_IDE;
+-    /* Pre-load all of the f16 data, avoiding overlap issues.  */
-diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
+-    n_4 = load4_f16(vn, is_q, is_2);
-index XXXXXXX..XXXXXXX 100644
++    /*
---- a/hw/arm/xlnx-zynqmp.c
++     * Pre-load all of the f16 data, avoiding overlap issues.
-+++ b/hw/arm/xlnx-zynqmp.c
++     * Negate all inputs for AH=0 FMLSL at once.
-@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_create_rpu(XlnxZynqMPState *s, const char *boot_cpu,
++     */
-         char *name;
++    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
+     m_4 = load4_f16(vm, is_q, is_2);
-         object_initialize(&s->rpu_cpu[i], sizeof(s->rpu_cpu[i]),
--                          "cortex-r5-" TYPE_ARM_CPU);
+-    /* Negate all inputs for FMLSL at once.  */
-+                          "cortex-r5f-" TYPE_ARM_CPU);
+-    if (is_s) {
-         object_property_add_child(OBJECT(s), "rpu-cpu[*]",
+-        n_4 ^= 0x8000800080008000ull;
-                                   OBJECT(&s->rpu_cpu[i]), &error_abort);
+-    }
 -
      for (i = 0; i < oprsz / 4; i++) {
          float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
          float32 m_1 = float16_to_float32_by_bits(m_4 >> (i * 16), fz16);
 -        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
 +        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
      }
      clear_tail(d, oprsz, simd_maxsz(desc));
  }
@@ -XXX,XX +XXX,XX @@ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
  void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
                              CPUARMState *env, uint32_t desc)
  {
 -    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, desc,
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 +
 +    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
  void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
                              CPUARMState *env, uint32_t desc)
  {
 -    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, desc,
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = 0;
 +    int negf = 0;
 +
 +    if (is_s) {
 +        if (env->vfp.fpcr & FPCR_AH) {
 +            negf = float_muladd_negate_product;
 +        } else {
 +            negx = 0x8000800080008000ull;
 +        }
 +    }
 +    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
  }
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
  }
  static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
 -                         uint32_t desc, bool fz16)
 +                         uint64_t negx, int negf, uint32_t desc, bool fz16)
  {
      intptr_t i, oprsz = simd_oprsz(desc);
 -    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
      int is_q = oprsz == 16;
      uint64_t n_4;
      float32 m_1;
 -    /* Pre-load all of the f16 data, avoiding overlap issues.  */
 -    n_4 = load4_f16(vn, is_q, is_2);
 -
 -    /* Negate all inputs for FMLSL at once.  */
 -    if (is_s) {
 -        n_4 ^= 0x8000800080008000ull;
 -    }
 -
 +    /*
 +     * Pre-load all of the f16 data, avoiding overlap issues.
 +     * Negate all inputs for AH=0 FMLSL at once.
 +     */
 +    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
      m_1 = float16_to_float32_by_bits(((float16 *)vm)[H2(index)], fz16);
      for (i = 0; i < oprsz / 4; i++) {
          float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
 -        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
 +        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
      }
      clear_tail(d, oprsz, simd_maxsz(desc));
  }
@@ -XXX,XX +XXX,XX @@ static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
  void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
                                  CPUARMState *env, uint32_t desc)
  {
 -    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, desc,
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 +
 +    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
                   get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
  void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
                                  CPUARMState *env, uint32_t desc)
  {
 -    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, desc,
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = 0;
 +    int negf = 0;
 +
 +    if (is_s) {
 +        if (env->vfp.fpcr & FPCR_AH) {
 +            negf = float_muladd_negate_product;
 +        } else {
 +            negx = 0x8000800080008000ull;
 +        }
 +    }
 +    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
                   get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
  }
 --
-.17.1
+.34.1
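
The AH=0 path of FMLSL negates four packed float16 inputs with a single XOR. A small sketch of that bit trick on raw float16 encodings follows; demo_negate_4xf16 and demo_lane are hypothetical helper names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Flip the sign bit of four float16 values packed into one 64-bit word. */
static uint64_t demo_negate_4xf16(uint64_t packed)
{
    return packed ^ UINT64_C(0x8000800080008000);
}

/* Return lane i (0..3) of the packed word as raw float16 bits. */
static uint16_t demo_lane(uint64_t packed, unsigned i)
{
    assert(i < 4);
    return (uint16_t)(packed >> (i * 16));
}
```

Applied to the lanes 1.0 (0x3c00), -1.0 (0xbc00), +0 (0x0000) and a quiet NaN (0x7e00), the XOR yields 0xbc00, 0x3c00, 0x8000 and 0xfe00. The last lane shows the NaN's sign changing, which is why the AH=1 path leaves the inputs alone and passes the negate-product flag to the multiply-add instead.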

-[Qemu-devel] [PULL 26/28] target/arm: Strict alignment for ARMv6-M and ARMv8-M Baseline
+[PULL 49/68] target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
-From: Julia Suvorova <jusual@mail.ru>
+From: Richard Henderson <richard.henderson@linaro.org>
-Unlike ARMv7-M, ARMv6-M and ARMv8-M Baseline only supports naturally
+Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
-aligned memory accesses for load/store instructions.
+FMLSL (indexed), using the usual trick of negating by XOR when AH=0
 and by muladd flags when AH=1.
-Signed-off-by: Julia Suvorova <jusual@mail.ru>
+Since we have the CPUARMState* in the helper anyway, we can
-Message-id: 20180622080138.17702-3-jusual@mail.ru
+look directly at env->vfp.fpcr and don't need to pass in the
 FPCR.AH value via the SIMD data word.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20250129013857.135256-32-richard.henderson@linaro.org
 [PMM: commit message tweaked]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 18 ++++++++++++++++--
+ target/arm/tcg/vec_helper.c | 15 ++++++++++++---
-file changed, 16 insertions(+), 2 deletions(-)
+file changed, 12 insertions(+), 3 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static inline TCGv gen_aa32_addr(DisasContext *s, TCGv_i32 a32, TCGMemOp op)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
- static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+                                CPUARMState *env, uint32_t desc)
                              int index, TCGMemOp opc)
  {
--    TCGv addr = gen_aa32_addr(s, a32, opc);
+     intptr_t i, j, oprsz = simd_oprsz(desc);
-+    TCGv addr;
+-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
      intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
      float_status *status = &env->vfp.fp_status_a64;
      bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
 +    int negx = 0, negf = 0;
 +
-+    if (arm_dc_feature(s, ARM_FEATURE_M) &&
++    if (is_s) {
-+        !arm_dc_feature(s, ARM_FEATURE_M_MAIN)) {
++        if (env->vfp.fpcr & FPCR_AH) {
-+        opc |= MO_ALIGN;
++            negf = float_muladd_negate_product;
 +        } else {
 +            negx = 0x8000;
 +        }
 +    }
-+
-+    addr = gen_aa32_addr(s, a32, opc);
+     for (i = 0; i < oprsz; i += 16) {
-     tcg_gen_qemu_ld_i32(val, addr, index, opc);
+         float16 mm_16 = *(float16 *)(vm + i + idx);
-     tcg_temp_free(addr);
+         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
- }
-@@ -XXX,XX +XXX,XX @@ static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+         for (j = 0; j < 16; j += sizeof(float32)) {
- static void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+-            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negn;
-                             int index, TCGMemOp opc)
++            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negx;
- {
+             float32 nn = float16_to_float32_by_bits(nn_16, fz16);
--    TCGv addr = gen_aa32_addr(s, a32, opc);
+             float32 aa = *(float32 *)(va + H1_4(i + j));
-+    TCGv addr;
-+
+             *(float32 *)(vd + H1_4(i + j)) =
-+    if (arm_dc_feature(s, ARM_FEATURE_M) &&
+-                float32_muladd(nn, mm, aa, 0, status);
-+        !arm_dc_feature(s, ARM_FEATURE_M_MAIN)) {
++                float32_muladd(nn, mm, aa, negf, status);
-+        opc |= MO_ALIGN;
+         }
-+    }
+     }
 +
 +    addr = gen_aa32_addr(s, a32, opc);
      tcg_gen_qemu_st_i32(val, addr, index, opc);
      tcg_temp_free(addr);
  }
 --
-.17.1
+.34.1

-[Qemu-devel] [PULL 07/28] hw/intc/arm_gicv3_kvm: Get prepared to handle multiple redist regions
+[PULL 50/68] target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
-From: Eric Auger <eric.auger@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Let's check if KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION is supported.
+Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
-If not, we check the number of redist region is equal to 1 and use the
+FMLSL (vectors), using the usual trick of negating by XOR when AH=0
-legacy KVM_VGIC_V3_ADDR_TYPE_REDIST attribute. Otherwise we use
+and by muladd flags when AH=1.
 the new attribute and allow to register multiple regions to the
 KVM device.
-Signed-off-by: Eric Auger <eric.auger@redhat.com>
+Since we have the CPUARMState* in the helper anyway, we can
 look directly at env->vfp.fpcr and don't need to pass in the
 FPCR.AH value via the SIMD data word.
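
The negation choice described above can be sketched in isolation. This is
only an illustration under stated assumptions: NegChoice, choose_negation(),
f16_is_nan() and apply_negx() are hypothetical names, and the boolean
negate_product merely stands in for the muladd "negate product" flag; the
NaN-preserving behaviour itself is assumed to come from the softfloat
muladd implementation and is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define F16_SIGN 0x8000u

/* Hypothetical container for the two ways a helper can negate. */
typedef struct {
    uint16_t negx;        /* XOR mask applied to the raw f16 input bits */
    bool negate_product;  /* stand-in for a muladd "negate product" flag */
} NegChoice;

/* Pick XOR negation when AH=0 and flag-based negation when AH=1. */
static NegChoice choose_negation(bool is_sub, bool fpcr_ah)
{
    NegChoice c = { 0, false };

    if (is_sub) {
        if (fpcr_ah) {
            c.negate_product = true;
        } else {
            c.negx = F16_SIGN;
        }
    }
    return c;
}

/* True if the half-precision bit pattern encodes a NaN. */
static bool f16_is_nan(uint16_t h)
{
    return (h & 0x7c00u) == 0x7c00u && (h & 0x03ffu) != 0;
}

/* Apply the XOR mask to the raw input bits, as the helpers do. */
static uint16_t apply_negx(uint16_t h, uint16_t negx)
{
    return h ^ negx;
}
```

With AH=0 the XOR mask flips the sign bit of every input, NaNs included;
with AH=1 the mask is zero, so a NaN input reaches the multiply-add with
its sign untouched and the negation is left to the flag.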
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20250129013857.135256-33-richard.henderson@linaro.org
 [PMM: tweaked commit message]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Andrew Jones <drjones@redhat.com>
-Message-id: 1529072910-16156-5-git-send-email-eric.auger@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/intc/arm_gicv3_kvm.c | 37 ++++++++++++++++++++++++++++++++++---
+ target/arm/tcg/vec_helper.c | 15 ++++++++++++---
-file changed, 34 insertions(+), 3 deletions(-)
+file changed, 12 insertions(+), 3 deletions(-)
-diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/arm_gicv3_kvm.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/hw/intc/arm_gicv3_kvm.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
                                 CPUARMState *env, uint32_t desc)
  {
-     GICv3State *s = KVM_ARM_GICV3(dev);
+     intptr_t i, oprsz = simd_oprsz(desc);
-     KVMARMGICv3Class *kgc = KVM_ARM_GICV3_GET_CLASS(s);
+-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
-+    bool multiple_redist_region_allowed;
++    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-     Error *local_err = NULL;
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-     int i;
+     float_status *status = &env->vfp.fp_status_a64;
+     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
-@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
++    int negx = 0, negf = 0;
          return;
      }
 +    multiple_redist_region_allowed =
 +        kvm_device_check_attr(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
 +                              KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION);
 +
-+    if (!multiple_redist_region_allowed && s->nb_redist_regions > 1) {
++    if (is_s) {
-+        error_setg(errp, "Multiple VGICv3 redistributor regions are not "
++        if (env->vfp.fpcr & FPCR_AH) {
-+                   "supported by this host kernel");
++            negf = float_muladd_negate_product;
-+        error_append_hint(errp, "A maximum of %d VCPUs can be used",
++        } else {
-+                          s->redist_region_count[0]);
++            negx = 0x8000;
 +        return;
 +    }
 +
      kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
 , &s->num_irq, true, &error_abort);
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
      kvm_arm_register_device(&s->iomem_dist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
                              KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd, 0);
 -    kvm_arm_register_device(&s->iomem_redist[0], -1,
 -                            KVM_DEV_ARM_VGIC_GRP_ADDR,
 -                            KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd, 0);
 +
 +    if (!multiple_redist_region_allowed) {
 +        kvm_arm_register_device(&s->iomem_redist[0], -1,
 +                                KVM_DEV_ARM_VGIC_GRP_ADDR,
 +                                KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd, 0);
 +    } else {
 +        /* we register regions in reverse order as "devices" are inserted at
 +         * the head of a QSLIST and the list is then popped from the head
 +         * onwards by kvm_arm_machine_init_done()
 +         */
 +        for (i = s->nb_redist_regions - 1; i >= 0; i--) {
 +            /* Address mask made of the rdist region index and count */
 +            uint64_t addr_ormask =
 +                        i | ((uint64_t)s->redist_region_count[i] << 52);
 +
 +            kvm_arm_register_device(&s->iomem_redist[i], -1,
 +                                    KVM_DEV_ARM_VGIC_GRP_ADDR,
 +                                    KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION,
 +                                    s->dev_fd, addr_ormask);
 +        }
 +    }
-     if (kvm_has_gsi_routing()) {
+     for (i = 0; i < oprsz; i += sizeof(float32)) {
-         /* set up irq routing */
+-        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negn;
 +        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negx;
          float16 mm_16 = *(float16 *)(vm + H1_2(i + sel));
          float32 nn = float16_to_float32_by_bits(nn_16, fz16);
          float32 mm = float16_to_float32_by_bits(mm_16, fz16);
          float32 aa = *(float32 *)(va + H1_4(i));
 -        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, 0, status);
 +        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, negf, status);
      }
  }
 --
-.17.1
+.34.1

-New patch
+[PULL 51/68] target/arm: Enable FEAT_AFP for '-cpu max'
+Now that we have completed the handling for FPCR.{AH,FIZ,NEP}, we
+can enable FEAT_AFP for '-cpu max', and document that we support it.
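
Advertising the feature amounts to depositing a value into one field of an
ID register. A minimal sketch of such a deposit follows; deposit_field() is
a hypothetical stand-in rather than the FIELD_DP64 macro itself, and the
shift of 44 with a width of 4 for the AFP field is stated from memory of
the ID_AA64MMFR1_EL1 layout, so treat it as an assumption.

```c
#include <stdint.h>

/* Assumed position of the AFP field; check against the register layout. */
#define EXAMPLE_AFP_SHIFT 44
#define EXAMPLE_AFP_WIDTH 4

/* Replace the width-bit field at shift in reg with val; other bits kept. */
static uint64_t deposit_field(uint64_t reg, unsigned shift, unsigned width,
                              uint64_t val)
{
    uint64_t ones = width >= 64 ? ~0ULL : ((1ULL << width) - 1);
    uint64_t mask = ones << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}
```

Calling deposit_field(t, EXAMPLE_AFP_SHIFT, EXAMPLE_AFP_WIDTH, 1) leaves
every other field of t unchanged, which is why the real code can chain one
such update per feature on the same temporary.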
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ docs/system/arm/emulation.rst | 1 +
+ target/arm/tcg/cpu64.c        | 1 +
+files changed, 2 insertions(+)
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
+index XXXXXXX..XXXXXXX 100644
+--- a/docs/system/arm/emulation.rst
++++ b/docs/system/arm/emulation.rst
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
+ - FEAT_AA64EL3 (Support for AArch64 at EL3)
+ - FEAT_AdvSIMD (Advanced SIMD Extension)
+ - FEAT_AES (AESD and AESE instructions)
++- FEAT_AFP (Alternate floating-point behavior)
+ - FEAT_Armv9_Crypto (Armv9 Cryptographic Extension)
+ - FEAT_ASID16 (16 bit ASID)
+ - FEAT_BBM at level 2 (Translation table break-before-make levels)
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/cpu64.c
++++ b/target/arm/tcg/cpu64.c
+@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
+     t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1);      /* FEAT_XNX */
+     t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2);      /* FEAT_ETS2 */
+     t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);      /* FEAT_HCX */
++    t = FIELD_DP64(t, ID_AA64MMFR1, AFP, 1);      /* FEAT_AFP */
+     t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1);   /* FEAT_TIDCP1 */
+     t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1);     /* FEAT_CMOW */
+     cpu->isar.id_aa64mmfr1 = t;
+--
+.34.1

-[Qemu-devel] [PULL 04/28] linux-headers: Update to kernel mainline commit b357bf602
+[PULL 52/68] target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
-From: Eric Auger <eric.auger@redhat.com>
+FEAT_RPRES implements an "increased precision" variant of the single
 precision FRECPE and FRSQRTE instructions, going from an 8 bit to a 12
 bit mantissa. This applies only when FPCR.AH == 1. Note that the
 halfprec and double versions of these insns retain the 8 bit
 precision regardless.
-Update our kernel headers to mainline commit
+In this commit we add all the plumbing to make these instructions
-b357bf6023a948cf6a9472f07a1b0caac0e4f8e8
+call a new helper function when increased precision is in
-("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
+effect. In the following commit we will provide the actual change
 in behaviour in the helpers.
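
The selection rule in the message above can be written down as a small
predicate. This is a sketch only: ElemSize and estimate_mantissa_bits() are
hypothetical names, and have_rpres stands for "the CPU implements
FEAT_RPRES", which the real code tests through its feature-check helpers.

```c
#include <stdbool.h>

/* Hypothetical element-size tag for the three FP formats involved. */
typedef enum { ESZ_HALF, ESZ_SINGLE, ESZ_DOUBLE } ElemSize;

/*
 * Mantissa bits of the FRECPE/FRSQRTE estimate: 12 only for single
 * precision with FEAT_RPRES present and FPCR.AH == 1, otherwise 8.
 */
static int estimate_mantissa_bits(ElemSize esz, bool fpcr_ah, bool have_rpres)
{
    if (esz == ESZ_SINGLE && fpcr_ah && have_rpres) {
        return 12;
    }
    return 8;
}
```

Keeping the half and double cases on the 8 bit path matches the way the
new helper tables reuse the existing f16 and f64 helpers and swap in only
the f32 entry.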
-Signed-off-by: Eric Auger <eric.auger@redhat.com>
-Message-id: 1529072910-16156-2-git-send-email-eric.auger@redhat.com
-[PMM:  clarified commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/standard-headers/linux/pci_regs.h            |  8 ++++++++
+ target/arm/cpu-features.h      |  5 +++++
- include/standard-headers/linux/virtio_gpu.h          |  1 +
+ target/arm/helper.h            |  4 ++++
- include/standard-headers/linux/virtio_net.h          |  3 +++
+ target/arm/tcg/translate-a64.c | 34 ++++++++++++++++++++++++++++++----
- linux-headers/asm-arm/kvm.h                          |  1 +
+ target/arm/tcg/translate-sve.c | 16 ++++++++++++++--
- linux-headers/asm-arm/unistd-common.h                |  1 +
+ target/arm/tcg/vec_helper.c    |  2 ++
- linux-headers/asm-arm64/kvm.h                        |  1 +
+ target/arm/vfp_helper.c        | 32 ++++++++++++++++++++++++++++++--
- linux-headers/asm-generic/unistd.h                   |  4 +++-
+files changed, 85 insertions(+), 8 deletions(-)
  linux-headers/asm-powerpc/unistd.h                   |  1 +
  linux-headers/asm-x86/unistd_32.h                    |  2 ++
  linux-headers/asm-x86/unistd_64.h                    |  2 ++
  linux-headers/asm-x86/unistd_x32.h                   |  2 ++
  linux-headers/linux/kvm.h                            |  5 +++--
  linux-headers/linux/psp-sev.h                        | 12 ++++++++++++
  linux-headers/LICENSES/exceptions/Linux-syscall-note |  2 +-
  linux-headers/LICENSES/preferred/GPL-2.0             |  6 ++++++
 files changed, 47 insertions(+), 4 deletions(-)
-diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
+diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/standard-headers/linux/pci_regs.h
+--- a/target/arm/cpu-features.h
-+++ b/include/standard-headers/linux/pci_regs.h
++++ b/target/arm/cpu-features.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_mops(const ARMISARegisters *id)
- #define  PCI_EXP_DEVCTL_READRQ_256B  0x1000 /* 256 Bytes */
+     return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, MOPS);
- #define  PCI_EXP_DEVCTL_READRQ_512B  0x2000 /* 512 Bytes */
+ }
- #define  PCI_EXP_DEVCTL_READRQ_1024B 0x3000 /* 1024 Bytes */
-+#define  PCI_EXP_DEVCTL_READRQ_2048B 0x4000 /* 2048 Bytes */
++static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id)
-+#define  PCI_EXP_DEVCTL_READRQ_4096B 0x5000 /* 4096 Bytes */
++{
- #define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
++    return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, RPRES);
- #define PCI_EXP_DEVSTA        10    /* Device Status */
++}
- #define  PCI_EXP_DEVSTA_CED    0x0001    /* Correctable Error Detected */
++
-@@ -XXX,XX +XXX,XX @@
+ static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
- #define  PCI_EXP_LNKCAP2_SLS_16_0GB    0x00000010 /* Supported Speed 16GT/s */
+ {
- #define  PCI_EXP_LNKCAP2_CROSSLINK    0x00000100 /* Crosslink supported */
+     /* We always set the AdvSIMD and FP fields identically.  */
- #define PCI_EXP_LNKCTL2        48    /* Link Control 2 */
+diff --git a/target/arm/helper.h b/target/arm/helper.h
-+#define PCI_EXP_LNKCTL2_TLS        0x000f
+index XXXXXXX..XXXXXXX 100644
-+#define PCI_EXP_LNKCTL2_TLS_2_5GT    0x0001 /* Supported Speed 2.5GT/s */
+--- a/target/arm/helper.h
-+#define PCI_EXP_LNKCTL2_TLS_5_0GT    0x0002 /* Supported Speed 5GT/s */
++++ b/target/arm/helper.h
-+#define PCI_EXP_LNKCTL2_TLS_8_0GT    0x0003 /* Supported Speed 8GT/s */
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, fpst)
-+#define PCI_EXP_LNKCTL2_TLS_16_0GT    0x0004 /* Supported Speed 16GT/s */
- #define PCI_EXP_LNKSTA2        50    /* Link Status 2 */
+ DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
- #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2    52    /* v2 endpoints with link end here */
+ DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
- #define PCI_EXP_SLTCAP2        52    /* Slot Capabilities 2 */
++DEF_HELPER_FLAGS_2(recpe_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
-@@ -XXX,XX +XXX,XX @@
+ DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
- #define  PCI_EXP_DPC_CAP_DL_ACTIVE    0x1000    /* ERR_COR signal on DL_Active supported */
+ DEF_HELPER_FLAGS_2(rsqrte_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
+ DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
- #define PCI_EXP_DPC_CTL            6    /* DPC control */
++DEF_HELPER_FLAGS_2(rsqrte_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
-+#define  PCI_EXP_DPC_CTL_EN_FATAL     0x0001    /* Enable trigger on ERR_FATAL message */
+ DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
- #define  PCI_EXP_DPC_CTL_EN_NONFATAL     0x0002    /* Enable trigger on ERR_NONFATAL message */
+ DEF_HELPER_FLAGS_1(recpe_u32, TCG_CALL_NO_RWG, i32, i32)
- #define  PCI_EXP_DPC_CTL_INT_EN     0x0008    /* DPC Interrupt Enable */
+ DEF_HELPER_FLAGS_1(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vrintx_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-diff --git a/include/standard-headers/linux/virtio_gpu.h b/include/standard-headers/linux/virtio_gpu.h
-index XXXXXXX..XXXXXXX 100644
+ DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
---- a/include/standard-headers/linux/virtio_gpu.h
+ DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-+++ b/include/standard-headers/linux/virtio_gpu.h
++DEF_HELPER_FLAGS_4(gvec_frecpe_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-@@ -XXX,XX +XXX,XX @@ struct virtio_gpu_cmd_submit {
+ DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
- };
+ DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
- #define VIRTIO_GPU_CAPSET_VIRGL 1
+ DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-+#define VIRTIO_GPU_CAPSET_VIRGL2 2
++DEF_HELPER_FLAGS_4(gvec_frsqrte_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
- /* VIRTIO_GPU_CMD_GET_CAPSET_INFO */
- struct virtio_gpu_get_capset_info {
+ DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/standard-headers/linux/virtio_net.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/include/standard-headers/linux/virtio_net.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
-                      * Steering */
+     gen_helper_recpe_f32,
- #define VIRTIO_NET_F_CTRL_MAC_ADDR 23    /* Set MAC address */
+     gen_helper_recpe_f64,
+ };
-+#define VIRTIO_NET_F_STANDBY      62    /* Act as standby for another device
+-TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
-+                     * with the same MAC.
++static const FPScalar1 f_scalar_frecpe_rpres = {
-+                     */
++    gen_helper_recpe_f16,
- #define VIRTIO_NET_F_SPEED_DUPLEX 63    /* Device set linkspeed and duplex */
++    gen_helper_recpe_rpres_f32,
++    gen_helper_recpe_f64,
- #ifndef VIRTIO_NET_NO_LEGACY
++};
-diff --git a/linux-headers/asm-arm/kvm.h b/linux-headers/asm-arm/kvm.h
++TRANS(FRECPE_s, do_fp1_scalar_ah, a,
-index XXXXXXX..XXXXXXX 100644
++      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
---- a/linux-headers/asm-arm/kvm.h
++      &f_scalar_frecpe_rpres : &f_scalar_frecpe, -1)
-+++ b/linux-headers/asm-arm/kvm.h
-@@ -XXX,XX +XXX,XX @@ struct kvm_regs {
+ static const FPScalar1 f_scalar_frecpx = {
- #define KVM_VGIC_V3_ADDR_TYPE_DIST    2
+     gen_helper_frecpx_f16,
- #define KVM_VGIC_V3_ADDR_TYPE_REDIST    3
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frsqrte = {
- #define KVM_VGIC_ITS_ADDR_TYPE        4
+     gen_helper_rsqrte_f32,
-+#define KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION    5
+     gen_helper_rsqrte_f64,
+ };
- #define KVM_VGIC_V3_DIST_SIZE        SZ_64K
+-TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
- #define KVM_VGIC_V3_REDIST_SIZE        (2 * SZ_64K)
++static const FPScalar1 f_scalar_frsqrte_rpres = {
-diff --git a/linux-headers/asm-arm/unistd-common.h b/linux-headers/asm-arm/unistd-common.h
++    gen_helper_rsqrte_f16,
-index XXXXXXX..XXXXXXX 100644
++    gen_helper_rsqrte_rpres_f32,
---- a/linux-headers/asm-arm/unistd-common.h
++    gen_helper_rsqrte_f64,
-+++ b/linux-headers/asm-arm/unistd-common.h
++};
-@@ -XXX,XX +XXX,XX @@
++TRANS(FRSQRTE_s, do_fp1_scalar_ah, a,
- #define __NR_pkey_alloc (__NR_SYSCALL_BASE + 395)
++      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
- #define __NR_pkey_free (__NR_SYSCALL_BASE + 396)
++      &f_scalar_frsqrte_rpres : &f_scalar_frsqrte, -1)
- #define __NR_statx (__NR_SYSCALL_BASE + 397)
-+#define __NR_rseq (__NR_SYSCALL_BASE + 398)
+ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
+ {
- #endif /* _ASM_ARM_UNISTD_COMMON_H */
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
-diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
+     gen_helper_gvec_frecpe_s,
-index XXXXXXX..XXXXXXX 100644
+     gen_helper_gvec_frecpe_d,
---- a/linux-headers/asm-arm64/kvm.h
+ };
-+++ b/linux-headers/asm-arm64/kvm.h
+-TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
-@@ -XXX,XX +XXX,XX @@ struct kvm_regs {
++static gen_helper_gvec_2_ptr * const f_frecpe_rpres[] = {
- #define KVM_VGIC_V3_ADDR_TYPE_DIST    2
++    gen_helper_gvec_frecpe_h,
- #define KVM_VGIC_V3_ADDR_TYPE_REDIST    3
++    gen_helper_gvec_frecpe_rpres_s,
- #define KVM_VGIC_ITS_ADDR_TYPE        4
++    gen_helper_gvec_frecpe_d,
-+#define KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION    5
++};
++TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
- #define KVM_VGIC_V3_DIST_SIZE        SZ_64K
++      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frecpe_rpres : f_frecpe)
- #define KVM_VGIC_V3_REDIST_SIZE        (2 * SZ_64K)
-diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
+ static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
-index XXXXXXX..XXXXXXX 100644
+     gen_helper_gvec_frsqrte_h,
---- a/linux-headers/asm-generic/unistd.h
+     gen_helper_gvec_frsqrte_s,
-+++ b/linux-headers/asm-generic/unistd.h
+     gen_helper_gvec_frsqrte_d,
-@@ -XXX,XX +XXX,XX @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
+ };
- __SYSCALL(__NR_pkey_free,     sys_pkey_free)
+-TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
- #define __NR_statx 291
++static gen_helper_gvec_2_ptr * const f_frsqrte_rpres[] = {
- __SYSCALL(__NR_statx,     sys_statx)
++    gen_helper_gvec_frsqrte_h,
-+#define __NR_io_pgetevents 292
++    gen_helper_gvec_frsqrte_rpres_s,
-+__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
++    gen_helper_gvec_frsqrte_d,
++};
- #undef __NR_syscalls
++TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
--#define __NR_syscalls 292
++      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frsqrte_rpres : f_frsqrte)
-+#define __NR_syscalls 293
  static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
  {
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
      NULL,                     gen_helper_gvec_frecpe_h,
      gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
  };
 -TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
 +static gen_helper_gvec_2_ptr * const frecpe_rpres_fns[] = {
 +    NULL,                           gen_helper_gvec_frecpe_h,
 +    gen_helper_gvec_frecpe_rpres_s, gen_helper_gvec_frecpe_d,
 +};
 +TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
 +           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
 +           frecpe_rpres_fns[a->esz] : frecpe_fns[a->esz], a, 0)
  static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
      NULL,                      gen_helper_gvec_frsqrte_h,
      gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
  };
 -TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
 +static gen_helper_gvec_2_ptr * const frsqrte_rpres_fns[] = {
 +    NULL,                            gen_helper_gvec_frsqrte_h,
 +    gen_helper_gvec_frsqrte_rpres_s, gen_helper_gvec_frsqrte_d,
 +};
 +TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
 +           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
 +           frsqrte_rpres_fns[a->esz] : frsqrte_fns[a->esz], a, 0)
  /*
-  * 32 bit systems traditionally used different
+  *** SVE Floating Point Compare with Zero Group
-diff --git a/linux-headers/asm-powerpc/unistd.h b/linux-headers/asm-powerpc/unistd.h
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/linux-headers/asm-powerpc/unistd.h
+--- a/target/arm/tcg/vec_helper.c
-+++ b/linux-headers/asm-powerpc/unistd.h
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, float_status *stat, uint32_t desc)  \
- #define __NR_pkey_alloc        384
- #define __NR_pkey_free        385
+ DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
- #define __NR_pkey_mprotect    386
+ DO_2OP(gvec_frecpe_s, helper_recpe_f32, float32)
-+#define __NR_rseq        387
++DO_2OP(gvec_frecpe_rpres_s, helper_recpe_rpres_f32, float32)
+ DO_2OP(gvec_frecpe_d, helper_recpe_f64, float64)
- #endif /* _ASM_POWERPC_UNISTD_H_ */
-diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
+ DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
-index XXXXXXX..XXXXXXX 100644
+ DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
---- a/linux-headers/asm-x86/unistd_32.h
++DO_2OP(gvec_frsqrte_rpres_s, helper_rsqrte_rpres_f32, float32)
-+++ b/linux-headers/asm-x86/unistd_32.h
+ DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
-@@ -XXX,XX +XXX,XX @@
- #define __NR_pkey_free 382
+ DO_2OP(gvec_vrintx_h, float16_round_to_int, float16)
- #define __NR_statx 383
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
- #define __NR_arch_prctl 384
+index XXXXXXX..XXXXXXX 100644
-+#define __NR_io_pgetevents 385
+--- a/target/arm/vfp_helper.c
-+#define __NR_rseq 386
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
- #endif /* _ASM_X86_UNISTD_32_H */
+     return make_float16(f16_val);
-diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
+ }
-index XXXXXXX..XXXXXXX 100644
---- a/linux-headers/asm-x86/unistd_64.h
+-float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
-+++ b/linux-headers/asm-x86/unistd_64.h
++/*
-@@ -XXX,XX +XXX,XX @@
++ * FEAT_RPRES means the f32 FRECPE has an "increased precision" variant
- #define __NR_pkey_alloc 330
++ * which is used when FPCR.AH == 1.
  #define __NR_pkey_free 331
  #define __NR_statx 332
 +#define __NR_io_pgetevents 333
 +#define __NR_rseq 334
  #endif /* _ASM_X86_UNISTD_64_H */
 diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
 index XXXXXXX..XXXXXXX 100644
 --- a/linux-headers/asm-x86/unistd_x32.h
 +++ b/linux-headers/asm-x86/unistd_x32.h
@@ -XXX,XX +XXX,XX @@
  #define __NR_pkey_alloc (__X32_SYSCALL_BIT + 330)
  #define __NR_pkey_free (__X32_SYSCALL_BIT + 331)
  #define __NR_statx (__X32_SYSCALL_BIT + 332)
 +#define __NR_io_pgetevents (__X32_SYSCALL_BIT + 333)
 +#define __NR_rseq (__X32_SYSCALL_BIT + 334)
  #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
  #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
  #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
 diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
 index XXXXXXX..XXXXXXX 100644
 --- a/linux-headers/linux/kvm.h
 +++ b/linux-headers/linux/kvm.h
@@ -XXX,XX +XXX,XX @@ struct kvm_ioeventfd {
  };
  #define KVM_X86_DISABLE_EXITS_MWAIT          (1 << 0)
 -#define KVM_X86_DISABLE_EXITS_HTL            (1 << 1)
 +#define KVM_X86_DISABLE_EXITS_HLT            (1 << 1)
  #define KVM_X86_DISABLE_EXITS_PAUSE          (1 << 2)
  #define KVM_X86_DISABLE_VALID_EXITS          (KVM_X86_DISABLE_EXITS_MWAIT | \
 -                                              KVM_X86_DISABLE_EXITS_HTL | \
 +                                              KVM_X86_DISABLE_EXITS_HLT | \
                                                KVM_X86_DISABLE_EXITS_PAUSE)
  /* for KVM_ENABLE_CAP */
@@ -XXX,XX +XXX,XX @@ struct kvm_ppc_resize_hpt {
  #define KVM_CAP_S390_BPB 152
  #define KVM_CAP_GET_MSR_FEATURES 153
  #define KVM_CAP_HYPERV_EVENTFD 154
 +#define KVM_CAP_HYPERV_TLBFLUSH 155
  #ifdef KVM_CAP_IRQ_ROUTING
 diff --git a/linux-headers/linux/psp-sev.h b/linux-headers/linux/psp-sev.h
 index XXXXXXX..XXXXXXX 100644
 --- a/linux-headers/linux/psp-sev.h
 +++ b/linux-headers/linux/psp-sev.h
@@ -XXX,XX +XXX,XX @@ enum {
      SEV_PDH_GEN,
      SEV_PDH_CERT_EXPORT,
      SEV_PEK_CERT_IMPORT,
 +    SEV_GET_ID,
      SEV_MAX,
  };
@@ -XXX,XX +XXX,XX @@ struct sev_user_data_pdh_cert_export {
      __u32 cert_chain_len;            /* In/Out */
  } __attribute__((packed));
 +/**
 + * struct sev_user_data_get_id - GET_ID command parameters
 + *
 + * @socket1: Buffer to pass unique ID of first socket
 + * @socket2: Buffer to pass unique ID of second socket
 + */
-+struct sev_user_data_get_id {
++static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
-+    __u8 socket1[64];            /* Out */
+ {
-+    __u8 socket2[64];            /* Out */
+     float32 f32 = float32_squash_input_denormal(input, fpst);
-+} __attribute__((packed));
+     uint32_t f32_val = float32_val(f32);
-+
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
- /**
+     return make_float32(f32_val);
-  * struct sev_issue_cmd - SEV ioctl parameters
+ }
-  *
-diff --git a/linux-headers/LICENSES/exceptions/Linux-syscall-note b/linux-headers/LICENSES/exceptions/Linux-syscall-note
++float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
-index XXXXXXX..XXXXXXX 100644
++{
---- a/linux-headers/LICENSES/exceptions/Linux-syscall-note
++    return do_recpe_f32(input, fpst, false);
-+++ b/linux-headers/LICENSES/exceptions/Linux-syscall-note
++}
-@@ -XXX,XX +XXX,XX @@
++
- SPDX-Exception-Identifier: Linux-syscall-note
++float32 HELPER(recpe_rpres_f32)(float32 input, float_status *fpst)
- SPDX-URL: https://spdx.org/licenses/Linux-syscall-note.html
++{
--SPDX-Licenses: GPL-2.0, GPL-2.0+, GPL-1.0+, LGPL-2.0, LGPL-2.0+, LGPL-2.1, LGPL-2.1+
++    return do_recpe_f32(input, fpst, true);
-+SPDX-Licenses: GPL-2.0, GPL-2.0+, GPL-1.0+, LGPL-2.0, LGPL-2.0+, LGPL-2.1, LGPL-2.1+, GPL-2.0-only, GPL-2.0-or-later
++}
- Usage-Guide:
++
-   This exception is used together with one of the above SPDX-Licenses
+ float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
-   to mark user space API (uapi) header files so they can be included
+ {
-diff --git a/linux-headers/LICENSES/preferred/GPL-2.0 b/linux-headers/LICENSES/preferred/GPL-2.0
+     float64 f64 = float64_squash_input_denormal(input, fpst);
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
---- a/linux-headers/LICENSES/preferred/GPL-2.0
+     return make_float16(val);
-+++ b/linux-headers/LICENSES/preferred/GPL-2.0
+ }
-@@ -XXX,XX +XXX,XX @@
- Valid-License-Identifier: GPL-2.0
+-float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
-+Valid-License-Identifier: GPL-2.0-only
++/*
- Valid-License-Identifier: GPL-2.0+
++ * FEAT_RPRES means the f32 FRSQRTE has an "increased precision" variant
-+Valid-License-Identifier: GPL-2.0-or-later
++ * which is used when FPCR.AH == 1.
- SPDX-URL: https://spdx.org/licenses/GPL-2.0.html
++ */
- Usage-Guide:
++static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
-   To use this license in source code, put one of the following SPDX
+ {
-@@ -XXX,XX +XXX,XX @@ Usage-Guide:
+     float32 f32 = float32_squash_input_denormal(input, s);
-   guidelines in the licensing rules documentation.
+     uint32_t val = float32_val(f32);
-   For 'GNU General Public License (GPL) version 2 only' use:
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
-     SPDX-License-Identifier: GPL-2.0
+     return make_float32(val);
-+  or
+ }
-+    SPDX-License-Identifier: GPL-2.0-only
-   For 'GNU General Public License (GPL) version 2 or any later version' use:
++float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
-     SPDX-License-Identifier: GPL-2.0+
++{
-+  or
++    return do_rsqrte_f32(input, s, false);
-+    SPDX-License-Identifier: GPL-2.0-or-later
++}
- License-Text:
++
++float32 HELPER(rsqrte_rpres_f32)(float32 input, float_status *s)
-             GNU GENERAL PUBLIC LICENSE
++{
 +    return do_rsqrte_f32(input, s, true);
 +}
 +
  float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
  {
      float64 f64 = float64_squash_input_denormal(input, s);
 --
-.17.1
+.34.1

-[Qemu-devel] [PULL 23/28] hw/arm/iotkit: Wire up MPC interrupt lines
+[PULL 53/68] target/arm: Implement increased precision FRECPE
-The interrupt outputs from the MPC in the IoTKit and the expansion
+Implement the increased precision variation of FRECPE.  In the
-MPCs in the board must be wired up to the security controller, and
+pseudocode this corresponds to the handling of the
-also all ORed together to produce a single line to the NVIC.
+"increasedprecision" boolean in the FPRecipEstimate() and
 RecipEstimate() functions.
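
For orientation, the existing 8 bit estimate that the increased-precision
variant parallels can be sketched as below. The name recip_estimate_8bit()
is hypothetical and the body is a reconstruction of the non-increased
branch of the pseudocode, not a copy of this patch; the wider variant is
not reproduced here.

```c
#include <assert.h>

/*
 * 8 bit reciprocal estimate sketch.
 * input:  256..511, a 9 bit fixed-point value for 0.5 <= x < 1.0
 * result: 256..511, a 9 bit fixed-point value for 1.0 <= 1/x < 2.0
 */
static int recip_estimate_8bit(int input)
{
    int a, b, r;

    assert(256 <= input && input < 512);
    a = (input * 2) + 1;   /* midpoint of the input interval, scaled by 2 */
    b = (1 << 19) / a;     /* twice the estimate, truncated */
    r = (b + 1) >> 1;      /* add one and halve to round to nearest */
    assert(256 <= r && r < 512);
    return r;
}
```

The "add one and halve" step performs the rounding with integers only, so
no floating-point arithmetic on the host is needed.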
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20180620132032.28865-8-peter.maydell@linaro.org
 ---
- include/hw/arm/iotkit.h |  6 ++++
+ target/arm/vfp_helper.c | 54 +++++++++++++++++++++++++++++++++++------
- hw/arm/iotkit.c         | 74 +++++++++++++++++++++++++++++++++++++++++
+file changed, 46 insertions(+), 8 deletions(-)
 files changed, 80 insertions(+)
-diff --git a/include/hw/arm/iotkit.h b/include/hw/arm/iotkit.h
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/iotkit.h
+--- a/target/arm/vfp_helper.c
-+++ b/include/hw/arm/iotkit.h
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
-  *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_enable
+     return r;
   *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_clear
   *  + named GPIO inputs ahb_ppcexp{0,1,2,3}_irq_status
 + * Controlling each of the 16 expansion MPCs which a system using the IoTKit
 + * might provide:
 + *  + named GPIO inputs mpcexp_status[0..15]
   */
  #ifndef IOTKIT_H
@@ -XXX,XX +XXX,XX @@ typedef struct IoTKit {
      qemu_or_irq ppc_irq_orgate;
      SplitIRQ sec_resp_splitter;
      SplitIRQ ppc_irq_splitter[NUM_PPCS];
 +    SplitIRQ mpc_irq_splitter[IOTS_NUM_EXP_MPC + IOTS_NUM_MPC];
 +    qemu_or_irq mpc_irq_orgate;
      UnimplementedDeviceState dualtimer;
      UnimplementedDeviceState s32ktimer;
@@ -XXX,XX +XXX,XX @@ typedef struct IoTKit {
      qemu_irq nsc_cfg_in;
      qemu_irq irq_status_in[NUM_EXTERNAL_PPCS];
 +    qemu_irq mpcexp_status_in[IOTS_NUM_EXP_MPC];
      uint32_t nsccfg;
 diff --git a/hw/arm/iotkit.c b/hw/arm/iotkit.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/iotkit.c
 +++ b/hw/arm/iotkit.c
@@ -XXX,XX +XXX,XX @@ static void iotkit_init(Object *obj)
      init_sysbus_child(obj, "apb-ppc1", &s->apb_ppc1, sizeof(s->apb_ppc1),
                        TYPE_TZ_PPC);
      init_sysbus_child(obj, "mpc", &s->mpc, sizeof(s->mpc), TYPE_TZ_MPC);
 +    object_initialize(&s->mpc_irq_orgate, sizeof(s->mpc_irq_orgate),
 +                      TYPE_OR_IRQ);
 +    object_property_add_child(obj, "mpc-irq-orgate",
 +                              OBJECT(&s->mpc_irq_orgate), &error_abort);
 +    for (i = 0; i < ARRAY_SIZE(s->mpc_irq_splitter); i++) {
 +        char *name = g_strdup_printf("mpc-irq-splitter-%d", i);
 +        SplitIRQ *splitter = &s->mpc_irq_splitter[i];
 +
 +        object_initialize(splitter, sizeof(*splitter), TYPE_SPLIT_IRQ);
 +        object_property_add_child(obj, name, OBJECT(splitter), &error_abort);
 +        g_free(name);
 +    }
      init_sysbus_child(obj, "timer0", &s->timer0, sizeof(s->timer0),
                        TYPE_CMSDK_APB_TIMER);
      init_sysbus_child(obj, "timer1", &s->timer1, sizeof(s->timer1),
@@ -XXX,XX +XXX,XX @@ static void iotkit_exp_irq(void *opaque, int n, int level)
      qemu_set_irq(s->exp_irqs[n], level);
  }
-+static void iotkit_mpcexp_status(void *opaque, int n, int level)
++/*
 + * Increased precision version:
 + * input is a 12 bit fixed point number
 + * input range 2048 .. 4095 for a number from 0.5 <= x < 1.0.
 + * result range 4096 .. 8191 for a number from 1.0 to 2.0
 + */
 +static int recip_estimate_incprec(int input)
 +{
-+    IoTKit *s = IOTKIT(opaque);
++    int a, b, r;
-+    qemu_set_irq(s->mpcexp_status_in[n], level);
++    assert(2048 <= input && input < 4096);
 +    a = (input * 2) + 1;
 +    /*
 +     * The pseudocode expresses this as an operation on infinite
 +     * precision reals where it calculates 2^25 / a and then looks
 +     * at the error between that and the rounded-down-to-integer
 +     * value to see if it should instead round up. We instead
 +     * follow the same approach as the pseudocode for the 8-bit
 +     * precision version, and calculate (2 * (2^25 / a)) as an
 +     * integer so we can do the "add one and halve" to round it.
 +     * So the 1 << 26 here is correct.
 +     */
 +    b = (1 << 26) / a;
 +    r = (b + 1) >> 1;
 +    assert(4096 <= r && r < 8192);
 +    return r;
 +}
 +
- static void iotkit_realize(DeviceState *dev, Error **errp)
+ /*
   * Common wrapper to call recip_estimate
   *
@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
   * callee.
   */
 -static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
 +static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac,
 +                                    bool increasedprecision)
  {
-     IoTKit *s = IOTKIT(dev);
+     uint32_t scaled, estimate;
-@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
+     uint64_t result_frac;
-                                 sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->mpc),
+@@ -XXX,XX +XXX,XX @@ static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
-));
+         }
+     }
-+    /* We must OR together lines from the MPC splitters to go to the NVIC */
-+    object_property_set_int(OBJECT(&s->mpc_irq_orgate),
+-    /* scaled = UInt('1':fraction<51:44>) */
-+                            IOTS_NUM_EXP_MPC + IOTS_NUM_MPC, "num-lines", &err);
+-    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-+    if (err) {
+-    estimate = recip_estimate(scaled);
-+        error_propagate(errp, err);
++    if (increasedprecision) {
-+        return;
++        /* scaled = UInt('1':fraction<51:41>) */
 +        scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
 +        estimate = recip_estimate_incprec(scaled);
 +    } else {
 +        /* scaled = UInt('1':fraction<51:44>) */
 +        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
 +        estimate = recip_estimate(scaled);
 +    }
-+    object_property_set_bool(OBJECT(&s->mpc_irq_orgate), true,
-+                             "realized", &err);
+     result_exp = exp_off - *exp;
-+    if (err) {
+-    result_frac = deposit64(0, 44, 8, estimate);
-+        error_propagate(errp, err);
++    if (increasedprecision) {
-+        return;
++        result_frac = deposit64(0, 40, 12, estimate);
 +    } else {
 +        result_frac = deposit64(0, 44, 8, estimate);
 +    }
-+    qdev_connect_gpio_out(DEVICE(&s->mpc_irq_orgate), 0,
+     if (result_exp == 0) {
-+                          qdev_get_gpio_in(DEVICE(&s->armv7m), 9));
+         result_frac = deposit64(result_frac >> 1, 51, 1, 1);
-+
+     } else if (result_exp == -1) {
-     /* Devices behind APB PPC0:
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
       *   0x40000000: timer0
       *   0x40001000: timer1
@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
          g_free(gpioname);
      }
-+    /* Wire up the splitters for the MPC IRQs */
+     f64_frac = call_recip_estimate(&f16_exp, 29,
-+    for (i = 0; i < IOTS_NUM_EXP_MPC + IOTS_NUM_MPC; i++) {
+-                                   ((uint64_t) f16_frac) << (52 - 10));
-+        SplitIRQ *splitter = &s->mpc_irq_splitter[i];
++                                   ((uint64_t) f16_frac) << (52 - 10), false);
-+        DeviceState *dev_splitter = DEVICE(splitter);
-+
+     /* result = sign : result_exp<4:0> : fraction<51:42> */
-+        object_property_set_int(OBJECT(splitter), 2, "num-lines", &err);
+     f16_val = deposit32(0, 15, 1, f16_sign);
-+        if (err) {
+@@ -XXX,XX +XXX,XX @@ static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
-+            error_propagate(errp, err);
+     }
-+            return;
-+        }
+     f64_frac = call_recip_estimate(&f32_exp, 253,
-+        object_property_set_bool(OBJECT(splitter), true, "realized", &err);
+-                                   ((uint64_t) f32_frac) << (52 - 23));
-+        if (err) {
++                                   ((uint64_t) f32_frac) << (52 - 23), rpres);
-+            error_propagate(errp, err);
-+            return;
+     /* result = sign : result_exp<7:0> : fraction<51:29> */
-+        }
+     f32_val = deposit32(0, 31, 1, f32_sign);
-+
+@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
-+        if (i < IOTS_NUM_EXP_MPC) {
+         return float64_set_sign(float64_zero, float64_is_neg(f64));
-+            /* Splitter input is from GPIO input line */
+     }
-+            s->mpcexp_status_in[i] = qdev_get_gpio_in(dev_splitter, 0);
-+            qdev_connect_gpio_out(dev_splitter, 0,
+-    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
-+                                  qdev_get_gpio_in_named(dev_secctl,
++    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac, false);
-+                                                         "mpcexp_status", i));
-+        } else {
+     /* result = sign : result_exp<10:0> : fraction<51:0>; */
-+            /* Splitter input is from our own MPC */
+     f64_val = deposit64(0, 63, 1, f64_sign);
 +            qdev_connect_gpio_out_named(DEVICE(&s->mpc), "irq", 0,
 +                                        qdev_get_gpio_in(dev_splitter, 0));
 +            qdev_connect_gpio_out(dev_splitter, 0,
 +                                  qdev_get_gpio_in_named(dev_secctl,
 +                                                         "mpc_status", 0));
 +        }
 +
 +        qdev_connect_gpio_out(dev_splitter, 1,
 +                              qdev_get_gpio_in(DEVICE(&s->mpc_irq_orgate), i));
 +    }
 +    /* Create GPIO inputs which will pass the line state for our
 +     * mpcexp_irq inputs to the correct splitter devices.
 +     */
 +    qdev_init_gpio_in_named(dev, iotkit_mpcexp_status, "mpcexp_status",
 +                            IOTS_NUM_EXP_MPC);
 +
      iotkit_forward_sec_resp_cfg(s);
      system_clock_scale = NANOSECONDS_PER_SECOND / s->mainclk_frq;
 --
-.17.1
+.34.1
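
Note on [PULL 53/68]: the integer rounding used by recip_estimate_incprec()
can be tried outside QEMU. The following is a minimal standalone sketch, not
the upstream file: extract64_sketch() and scaled_incprec() are hypothetical
stand-ins written for this note (QEMU itself uses extract64() and deposit32()
from its bitops header), while the body of recip_estimate_incprec() follows
the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's extract64(): bits [start, start+length). */
static uint64_t extract64_sketch(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/*
 * scaled = UInt('1':fraction<51:41>): a leading 1 followed by the top
 * 11 fraction bits, giving a value in the range 2048 .. 4095.
 */
static uint32_t scaled_incprec(uint64_t frac)
{
    return (1u << 11) | (uint32_t)extract64_sketch(frac, 41, 11);
}

/* Same arithmetic as the patch: compute 2 * (2^25 / a), then round. */
static int recip_estimate_incprec(int input)
{
    int a, b, r;

    assert(2048 <= input && input < 4096);
    a = (input * 2) + 1;
    b = (1 << 26) / a;      /* twice the truncated quotient 2^25 / a */
    r = (b + 1) >> 1;       /* "add one and halve" rounds to nearest */
    assert(4096 <= r && r < 8192);
    return r;
}
```

For the smallest input, 2048, this gives a = 4097, b = 16380 and r = 8190;
for the largest, 4095, it gives a = 8191, b = 8193 and r = 4097. Both results
stay inside the asserted 4096 .. 8191 range.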

-[Qemu-devel] [PULL 19/28] hw/misc/tz-mpc.c: Implement correct blocked-access behaviour
+[PULL 54/68] target/arm: Implement increased precision FRSQRTE
-The MPC is guest-configurable for whether blocked accesses:
+Implement the increased precision variation of FRSQRTE.  In the
- * should be RAZ/WI or cause a bus error
+pseudocode this corresponds to the handling of the
- * should generate an interrupt or not
+"increasedprecision" boolean in the FPRSqrtEstimate() and
+RecipSqrtEstimate() functions.
 Implement this behaviour in the blocked-access handlers.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20180620132032.28865-4-peter.maydell@linaro.org
 ---
- hw/misc/tz-mpc.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++--
+ target/arm/vfp_helper.c | 77 ++++++++++++++++++++++++++++++++++-------
-file changed, 48 insertions(+), 2 deletions(-)
+file changed, 64 insertions(+), 13 deletions(-)
-diff --git a/hw/misc/tz-mpc.c b/hw/misc/tz-mpc.c
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/tz-mpc.c
+--- a/target/arm/vfp_helper.c
-+++ b/hw/misc/tz-mpc.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ REG32(INT_EN, 0x28)
+@@ -XXX,XX +XXX,XX @@ static int do_recip_sqrt_estimate(int a)
-     FIELD(INT_EN, IRQ, 0, 1)
+     return estimate;
- REG32(INT_INFO1, 0x2c)
+ }
- REG32(INT_INFO2, 0x30)
-+    FIELD(INT_INFO2, HMASTER, 0, 16)
++static int do_recip_sqrt_estimate_incprec(int a)
 +    FIELD(INT_INFO2, HNONSEC, 16, 1)
 +    FIELD(INT_INFO2, CFG_NS, 17, 1)
  REG32(INT_SET, 0x34)
      FIELD(INT_SET, IRQ, 0, 1)
  REG32(PIDR4, 0xfd0)
@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps tz_mpc_reg_ops = {
      .impl.max_access_size = 4,
  };
 +static inline bool tz_mpc_cfg_ns(TZMPC *s, hwaddr addr)
 +{
-+    /* Return the cfg_ns bit from the LUT for the specified address */
++    /*
-+    hwaddr blknum = addr / s->blocksize;
++     * The Arm ARM describes the 12-bit precision version of RecipSqrtEstimate
-+    hwaddr blkword = blknum / 32;
++     * in terms of an infinite-precision floating point calculation of a
-+    uint32_t blkbit = 1U << (blknum % 32);
++     * square root. We implement this using the same kind of pure integer
 +     * algorithm as the 8-bit mantissa, to get the same bit-for-bit result.
 +     */
 +    int64_t b, estimate;
 -static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
 +    assert(1024 <= a && a < 4096);
 +    if (a < 2048) {
 +        a = a * 2 + 1;
 +    } else {
 +        a = (a >> 1) << 1;
 +        a = (a + 1) * 2;
 +    }
 +    b = 8192;
 +    while (a * (b + 1) * (b + 1) < (1ULL << 39)) {
 +        b += 1;
 +    }
 +    estimate = (b + 1) / 2;
 +
-+    /* This would imply the address was larger than the size we
++    assert(4096 <= estimate && estimate < 8192);
-+     * defined this memory region to be, so it can't happen.
++
-+     */
++    return estimate;
 +    assert(blkword < s->blk_max);
 +    return s->blk_lut[blkword] & blkbit;
 +}
 +
-+static MemTxResult tz_mpc_handle_block(TZMPC *s, hwaddr addr, MemTxAttrs attrs)
++static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac,
-+{
++                                    bool increasedprecision)
-+    /* Handle a blocked transaction: raise IRQ, capture info, etc */
+ {
-+    if (!s->int_stat) {
+     int estimate;
-+        /* First blocked transfer: capture information into INT_INFO1 and
+     uint32_t scaled;
-+         * INT_INFO2. Subsequent transfers are still blocked but don't
+@@ -XXX,XX +XXX,XX @@ static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
-+         * capture information until the guest clears the interrupt.
+         frac = extract64(frac, 0, 51) << 1;
-+         */
+     }
-+
-+        s->int_info1 = addr;
+-    if (*exp & 1) {
-+        s->int_info2 = 0;
+-        /* scaled = UInt('01':fraction<51:45>) */
-+        s->int_info2 = FIELD_DP32(s->int_info2, INT_INFO2, HMASTER,
+-        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
-+                                  attrs.requester_id & 0xffff);
++    if (increasedprecision) {
-+        s->int_info2 = FIELD_DP32(s->int_info2, INT_INFO2, HNONSEC,
++        if (*exp & 1) {
-+                                  ~attrs.secure);
++            /* scaled = UInt('01':fraction<51:42>) */
-+        s->int_info2 = FIELD_DP32(s->int_info2, INT_INFO2, CFG_NS,
++            scaled = deposit32(1 << 10, 0, 10, extract64(frac, 42, 10));
-+                                  tz_mpc_cfg_ns(s, addr));
++        } else {
-+        s->int_stat |= R_INT_STAT_IRQ_MASK;
++            /* scaled = UInt('1':fraction<51:41>) */
-+        tz_mpc_irq_update(s);
++            scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
 +        }
 +        estimate = do_recip_sqrt_estimate_incprec(scaled);
      } else {
 -        /* scaled = UInt('1':fraction<51:44>) */
 -        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
 +        if (*exp & 1) {
 +            /* scaled = UInt('01':fraction<51:45>) */
 +            scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
 +        } else {
 +            /* scaled = UInt('1':fraction<51:44>) */
 +            scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
 +        }
 +        estimate = do_recip_sqrt_estimate(scaled);
      }
 -    estimate = do_recip_sqrt_estimate(scaled);
      *exp = (exp_off - *exp) / 2;
 -    return extract64(estimate, 0, 8) << 44;
 +    if (increasedprecision) {
 +        return extract64(estimate, 0, 12) << 40;
 +    } else {
 +        return extract64(estimate, 0, 8) << 44;
 +    }
-+
-+    /* Generate bus error if desired; otherwise RAZ/WI */
-+    return (s->ctrl & R_CTRL_SEC_RESP_MASK) ? MEMTX_ERROR : MEMTX_OK;
-+}
-+
- /* Accesses only reach these read and write functions if the MPC is
-  * blocking them; non-blocked accesses go directly to the downstream
-  * memory region without passing through this code.
-@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_mem_blocked_read(void *opaque, hwaddr addr,
-                                            uint64_t *pdata,
-                                            unsigned size, MemTxAttrs attrs)
- {
-+    TZMPC *s = TZ_MPC(opaque);
-+
-     trace_tz_mpc_mem_blocked_read(addr, size, attrs.secure);
-     *pdata = 0;
--    return MEMTX_OK;
-+    return tz_mpc_handle_block(s, addr, attrs);
  }
- static MemTxResult tz_mpc_mem_blocked_write(void *opaque, hwaddr addr,
+ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
-                                             uint64_t value,
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
-                                             unsigned size, MemTxAttrs attrs)
- {
+     f64_frac = ((uint64_t) f16_frac) << (52 - 10);
-+    TZMPC *s = TZ_MPC(opaque);
-+
+-    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
-     trace_tz_mpc_mem_blocked_write(addr, value, size, attrs.secure);
++    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac, false);
--    return MEMTX_OK;
+     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
-+    return tz_mpc_handle_block(s, addr, attrs);
+     val = deposit32(0, 15, 1, f16_sign);
@@ -XXX,XX +XXX,XX @@ static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
      f64_frac = ((uint64_t) f32_frac) << 29;
 -    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
 +    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac, rpres);
 -    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(15) */
 +    /*
 +     * result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15)
 +     * or for increased precision
 +     * result = sign : result_exp<7:0> : estimate<11:0> : Zeros(11)
 +     */
      val = deposit32(0, 31, 1, f32_sign);
      val = deposit32(val, 23, 8, f32_exp);
 -    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
 +    if (rpres) {
 +        val = deposit32(val, 11, 12, extract64(f64_frac, 52 - 12, 12));
 +    } else {
 +        val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
 +    }
      return make_float32(val);
  }
- static const MemoryRegionOps tz_mpc_mem_blocked_ops = {
+@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
          return float64_zero;
      }
 -    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
 +    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac, false);
      /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(44) */
      val = deposit64(0, 61, 1, f64_sign);
 --
-.17.1
+.34.1
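
Note on [PULL 54/68]: the pure-integer search in
do_recip_sqrt_estimate_incprec() can likewise be exercised in isolation. This
is a sketch under the assumption that the logic matches the hunk above; the
function name recip_sqrt_estimate_incprec_sketch() is made up for this note,
and the arithmetic is done in int64_t throughout to avoid mixed-sign
comparisons.

```c
#include <assert.h>
#include <stdint.h>

static int recip_sqrt_estimate_incprec_sketch(int input)
{
    int64_t a = input;
    int64_t b, estimate;

    assert(1024 <= a && a < 4096);
    if (a < 2048) {
        /* Odd-exponent case: input represents 0.25 <= x < 0.5 */
        a = a * 2 + 1;
    } else {
        /* Even-exponent case: drop the low bit, then take the midpoint */
        a = (a >> 1) << 1;
        a = (a + 1) * 2;
    }
    /* Find the largest b such that a * b^2 is still below 2^39 */
    b = 8192;
    while (a * (b + 1) * (b + 1) < (INT64_C(1) << 39)) {
        b += 1;
    }
    /* b holds twice the estimate; round to nearest when halving */
    estimate = (b + 1) / 2;
    assert(4096 <= estimate && estimate < 8192);
    return (int)estimate;
}
```

The loop runs at most a few thousand iterations per call, which is the price
of avoiding floating point on the host. The largest product it evaluates is
just above 2^39, so the int64_t arithmetic cannot overflow.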

-[Qemu-devel] [PULL 27/28] vl.c: Don't zero-initialize statics for serial_hds
+[PULL 55/68] target/arm: Enable FEAT_RPRES for -cpu max
-checkpatch reminds us that statics shouldn't be zero-initialized:
+Now that the emulation is complete, we can enable FEAT_RPRES for the 'max'
+CPU type.
 ERROR: do not initialise statics to 0 or NULL
 #35: FILE: vl.c:157:
 +static int num_serial_hds = 0;
 ERROR: do not initialise statics to 0 or NULL
 #36: FILE: vl.c:158:
 +static Chardev **serial_hds = NULL;
 I forgot to fix this in 6af2692e86f9fdfb3d; do so now.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Thomas Huth <thuth@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20180426140253.3918-1-peter.maydell@linaro.org
 ---
- vl.c | 4 ++--
+ docs/system/arm/emulation.rst | 1 +
-file changed, 2 insertions(+), 2 deletions(-)
+ target/arm/tcg/cpu64.c        | 1 +
 files changed, 2 insertions(+)
-diff --git a/vl.c b/vl.c
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/vl.c
+--- a/docs/system/arm/emulation.rst
-+++ b/vl.c
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ QEMUClockType rtc_clock;
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
- int vga_interface_type = VGA_NONE;
+ - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
- static DisplayOptions dpy;
+ - FEAT_RME (Realm Management Extension) (NB: support status in QEMU is experimental)
- int no_frame;
+ - FEAT_RNG (Random number generator)
--static int num_serial_hds = 0;
++- FEAT_RPRES (Increased precision of FRECPE and FRSQRTE)
--static Chardev **serial_hds = NULL;
+ - FEAT_S2FWB (Stage 2 forced Write-Back)
-+static int num_serial_hds;
+ - FEAT_SB (Speculation Barrier)
-+static Chardev **serial_hds;
+ - FEAT_SEL2 (Secure EL2)
- Chardev *parallel_hds[MAX_PARALLEL_PORTS];
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
- Chardev *virtcon_hds[MAX_VIRTIO_CONSOLES];
+index XXXXXXX..XXXXXXX 100644
- int win2k_install_hack = 0;
+--- a/target/arm/tcg/cpu64.c
 +++ b/target/arm/tcg/cpu64.c
@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
      cpu->isar.id_aa64isar1 = t;
      t = cpu->isar.id_aa64isar2;
 +    t = FIELD_DP64(t, ID_AA64ISAR2, RPRES, 1);    /* FEAT_RPRES */
      t = FIELD_DP64(t, ID_AA64ISAR2, MOPS, 1);     /* FEAT_MOPS */
      t = FIELD_DP64(t, ID_AA64ISAR2, BC, 1);       /* FEAT_HBC */
      t = FIELD_DP64(t, ID_AA64ISAR2, WFXT, 2);     /* FEAT_WFxT */
 --
-.17.1
+.34.1
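
Note on [PULL 55/68]: FIELD_DP64() deposits a value into a named bitfield of
a 64-bit ID register. The sketch below shows the effect with a hand-written
deposit helper. deposit64_sketch() and the ISAR2_* constants are hypothetical
names for this note; the field positions (WFxT at bits [3:0], RPRES at [7:4],
MOPS at [19:16], BC at [23:20]) are taken from the Arm ARM description of
ID_AA64ISAR2_EL1 and should be checked against QEMU's own FIELD() definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's deposit64(). */
static uint64_t deposit64_sketch(uint64_t value, int start, int length,
                                 uint64_t fieldval)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Assumed field layout of ID_AA64ISAR2_EL1 (shift, width = 4 bits each). */
enum {
    ISAR2_WFXT_SHIFT  = 0,
    ISAR2_RPRES_SHIFT = 4,
    ISAR2_MOPS_SHIFT  = 16,
    ISAR2_BC_SHIFT    = 20,
    ISAR2_FIELD_LEN   = 4,
};

/* Mirrors the four FIELD_DP64() lines in aarch64_max_tcg_initfn(). */
static uint64_t max_cpu_isar2_sketch(uint64_t t)
{
    t = deposit64_sketch(t, ISAR2_RPRES_SHIFT, ISAR2_FIELD_LEN, 1); /* FEAT_RPRES */
    t = deposit64_sketch(t, ISAR2_MOPS_SHIFT, ISAR2_FIELD_LEN, 1);  /* FEAT_MOPS */
    t = deposit64_sketch(t, ISAR2_BC_SHIFT, ISAR2_FIELD_LEN, 1);    /* FEAT_HBC */
    t = deposit64_sketch(t, ISAR2_WFXT_SHIFT, ISAR2_FIELD_LEN, 2);  /* FEAT_WFxT */
    return t;
}
```

Starting from zero, the four deposits produce 0x110012, with the RPRES field
contributing the 0x10 term.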

-[Qemu-devel] [PULL 05/28] target/arm: Allow KVM device address overwriting
+[PULL 56/68] target/arm: Introduce CPUARMState.vfp.fp_status[]
-From: Eric Auger <eric.auger@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-for KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION attribute, the attribute
+Move ARMFPStatusFlavour to cpu.h so that it can be used to index
-data pointed to by kvm_device_attr.addr is a OR of the
+this array.  For now, place the array in an anonymous
-redistributor region address and other fields such as the index
+union with the existing structures.  Adjust the order
-of the redistributor region and the number of redistributors the
+of the existing structures to match the enum.
-region can contain.
+Simplify fpstatus_ptr() using the new array.
-The existing machine init done notifier framework sets the address
-field to the actual address of the device and does not allow to OR
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-this value with other fields.
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20250129013857.135256-7-richard.henderson@linaro.org
 This patch extends the KVMDevice struct with a new kda_addr_ormask
 member. Its value is passed at registration time and OR'ed with the
 resolved address on kvm_arm_set_device_addr().
 Signed-off-by: Eric Auger <eric.auger@redhat.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 1529072910-16156-3-git-send-email-eric.auger@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
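
Note on [PULL 56/68]: the array-plus-anonymous-struct union described in the
commit message can be sketched as follows. This is an illustration only:
float_status_sketch, struct vfp_sketch and fpst_ptr_sketch() are hypothetical
names, and the real float_status is a larger softfloat structure. The static
assertions check the property the patch relies on, namely that the enum order
matches the order of the named fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct float_status_sketch {
    int exception_flags;
} float_status_sketch;

typedef enum ARMFPStatusFlavour {
    FPST_A32,
    FPST_A64,
    FPST_A32_F16,
    FPST_A64_F16,
    FPST_AH,
    FPST_AH_F16,
    FPST_STD,
    FPST_STD_F16,
} ARMFPStatusFlavour;
#define FPST_COUNT 8

struct vfp_sketch {
    union {
        float_status_sketch fp_status[FPST_COUNT];
        struct {
            float_status_sketch fp_status_a32;
            float_status_sketch fp_status_a64;
            float_status_sketch fp_status_f16_a32;
            float_status_sketch fp_status_f16_a64;
            float_status_sketch ah_fp_status;
            float_status_sketch ah_fp_status_f16;
            float_status_sketch standard_fp_status;
            float_status_sketch standard_fp_status_f16;
        };
    };
};

/* Each named field must sit at its enum index within the array. */
_Static_assert(offsetof(struct vfp_sketch, ah_fp_status) ==
               FPST_AH * sizeof(float_status_sketch), "FPST_AH mismatch");
_Static_assert(offsetof(struct vfp_sketch, standard_fp_status_f16) ==
               FPST_STD_F16 * sizeof(float_status_sketch),
               "FPST_STD_F16 mismatch");

/* Index-based lookup, replacing a switch over the flavour. */
static float_status_sketch *fpst_ptr_sketch(struct vfp_sketch *vfp,
                                            ARMFPStatusFlavour flavour)
{
    assert((unsigned)flavour < FPST_COUNT);
    return &vfp->fp_status[flavour];
}
```

With this layout a lookup by flavour reduces to an array index, which is
presumably what lets fpstatus_ptr() be simplified; the named fields remain
usable until their users are converted.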
- target/arm/kvm_arm.h        |  3 ++-
+ target/arm/cpu.h           | 119 +++++++++++++++++++++----------------
- hw/intc/arm_gic_kvm.c       |  4 ++--
+ target/arm/tcg/translate.h |  64 +-------------------
- hw/intc/arm_gicv3_its_kvm.c |  2 +-
+files changed, 70 insertions(+), 113 deletions(-)
- hw/intc/arm_gicv3_kvm.c     |  4 ++--
- target/arm/kvm.c            | 10 +++++++++-
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 files changed, 16 insertions(+), 7 deletions(-)
 diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm_arm.h
+--- a/target/arm/cpu.h
-+++ b/target/arm/kvm_arm.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ int kvm_arm_vcpu_init(CPUState *cs);
+@@ -XXX,XX +XXX,XX @@ typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
-  * @group: device control API group for setting addresses
-  * @attr: device control API address type
+ typedef struct NVICState NVICState;
-  * @dev_fd: device control device file descriptor (or -1 if not supported)
-+ * @addr_ormask: value to be OR'ed with resolved address
++/*
 + * Enum for indexing vfp.fp_status[].
 + *
 + * FPST_A32: is the "normal" fp status for AArch32 insns
 + * FPST_A64: is the "normal" fp status for AArch64 insns
 + * FPST_A32_F16: used for AArch32 half-precision calculations
 + * FPST_A64_F16: used for AArch64 half-precision calculations
 + * FPST_STD: the ARM "Standard FPSCR Value"
 + * FPST_STD_F16: used for half-precision
 + *       calculations with the ARM "Standard FPSCR Value"
 + * FPST_AH: used for the A64 insns which change behaviour
 + *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 + *       and the reciprocal and square root estimate/step insns)
 + * FPST_AH_F16: used for the A64 insns which change behaviour
 + *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 + *       and the reciprocal and square root estimate/step insns);
 + *       for half-precision
 + *
 + * Half-precision operations are governed by a separate
 + * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
 + * status structure to control this.
 + *
 + * The "Standard FPSCR" is default-NaN, flush-to-zero,
 + * round-to-nearest, and is used by any operations (generally
 + * Neon) which the architecture defines as controlled by the
 + * standard FPSCR value rather than the FPSCR.
 + *
 + * The "standard FPSCR but for fp16 ops" is needed because
 + * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
 + * using a fixed value for it.
 + *
 + * The ah_fp_status is needed because some insns have different
 + * behaviour when FPCR.AH == 1: they don't update cumulative
 + * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
 + * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 + * which means we need an ah_fp_status_f16 as well.
 + *
 + * To avoid having to transfer exception bits around, we simply
 + * say that the FPSCR cumulative exception flags are the logical
 + * OR of the flags in the four fp statuses. This relies on the
 + * only thing which needs to read the exception flags being
 + * an explicit FPSCR read.
 + */
 +typedef enum ARMFPStatusFlavour {
 +    FPST_A32,
 +    FPST_A64,
 +    FPST_A32_F16,
 +    FPST_A64_F16,
 +    FPST_AH,
 +    FPST_AH_F16,
 +    FPST_STD,
 +    FPST_STD_F16,
 +} ARMFPStatusFlavour;
 +#define FPST_COUNT  8
 +
  typedef struct CPUArchState {
      /* Regs for current mode.  */
      uint32_t regs[16];
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          /* Scratch space for aa32 neon expansion.  */
          uint32_t scratch[8];
 -        /* There are a number of distinct float control structures:
 -         *
 -         *  fp_status_a32: is the "normal" fp status for AArch32 insns
 -         *  fp_status_a64: is the "normal" fp status for AArch64 insns
 -         *  fp_status_fp16_a32: used for AArch32 half-precision calculations
 -         *  fp_status_fp16_a64: used for AArch64 half-precision calculations
 -         *  standard_fp_status : the ARM "Standard FPSCR Value"
 -         *  standard_fp_status_fp16 : used for half-precision
 -         *       calculations with the ARM "Standard FPSCR Value"
 -         *  ah_fp_status: used for the A64 insns which change behaviour
 -         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 -         *       and the reciprocal and square root estimate/step insns)
 -         *  ah_fp_status_f16: used for the A64 insns which change behaviour
 -         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 -         *       and the reciprocal and square root estimate/step insns);
 -         *       for half-precision
 -         *
 -         * Half-precision operations are governed by a separate
 -         * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
 -         * status structure to control this.
 -         *
 -         * The "Standard FPSCR", ie default-NaN, flush-to-zero,
 -         * round-to-nearest and is used by any operations (generally
 -         * Neon) which the architecture defines as controlled by the
 -         * standard FPSCR value rather than the FPSCR.
 -         *
 -         * The "standard FPSCR but for fp16 ops" is needed because
 -         * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
 -         * using a fixed value for it.
 -         *
 -         * The ah_fp_status is needed because some insns have different
 -         * behaviour when FPCR.AH == 1: they don't update cumulative
 -         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
 -         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 -         * which means we need an ah_fp_status_f16 as well.
 -         *
 -         * To avoid having to transfer exception bits around, we simply
 -         * say that the FPSCR cumulative exception flags are the logical
 -         * OR of the flags in the four fp statuses. This relies on the
 -         * only thing which needs to read the exception flags being
 -         * an explicit FPSCR read.
 -         */
 -        float_status fp_status_a32;
 -        float_status fp_status_a64;
 -        float_status fp_status_f16_a32;
 -        float_status fp_status_f16_a64;
 -        float_status standard_fp_status;
 -        float_status standard_fp_status_f16;
 -        float_status ah_fp_status;
 -        float_status ah_fp_status_f16;
 +        /* There are a number of distinct float control structures. */
 +        union {
 +            float_status fp_status[FPST_COUNT];
 +            struct {
 +                float_status fp_status_a32;
 +                float_status fp_status_a64;
 +                float_status fp_status_f16_a32;
 +                float_status fp_status_f16_a64;
 +                float_status ah_fp_status;
 +                float_status ah_fp_status_f16;
 +                float_status standard_fp_status;
 +                float_status standard_fp_status_f16;
 +            };
 +        };
          uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
          uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
 diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate.h
 +++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
      return (CPUARMTBFlags){ tb->flags, tb->cs_base };
  }
 -/*
 - * Enum for argument to fpstatus_ptr().
 - */
 -typedef enum ARMFPStatusFlavour {
 -    FPST_A32,
 -    FPST_A64,
 -    FPST_A32_F16,
 -    FPST_A64_F16,
 -    FPST_AH,
 -    FPST_AH_F16,
 -    FPST_STD,
 -    FPST_STD_F16,
 -} ARMFPStatusFlavour;
 -
  /**
   * fpstatus_ptr: return TCGv_ptr to the specified fp_status field
   *
-  * Remember the memory region @mr, and when it is mapped by the
+  * We have multiple softfloat float_status fields in the Arm CPU state struct
-  * machine model, tell the kernel that base address using the
+  * (see the comment in cpu.h for details). Return a TCGv_ptr which has
-@@ -XXX,XX +XXX,XX @@ int kvm_arm_vcpu_init(CPUState *cs);
+  * been set up to point to the requested field in the CPU state struct.
-  * address at the point where machine init is complete.
+- * The options are:
 - *
 - * FPST_A32
 - *   for AArch32 non-FP16 operations controlled by the FPCR
 - * FPST_A64
 - *   for AArch64 non-FP16 operations controlled by the FPCR
 - * FPST_A32_F16
 - *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
 - * FPST_A64_F16
 - *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
 - * FPST_AH:
 - *   for AArch64 operations which change behaviour when AH=1 (specifically,
 - *   bfloat16 conversions and multiplies, and the reciprocal and square root
 - *   estimate/step insns)
 - * FPST_AH_F16:
 - *   ditto, but for half-precision operations
 - * FPST_STD
 - *   for A32/T32 Neon operations using the "standard FPSCR value"
 - * FPST_STD_F16
 - *   as FPST_STD, but where FPCR.FZ16 is to be used
   */
- void kvm_arm_register_device(MemoryRegion *mr, uint64_t devid, uint64_t group,
+ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
 -                             uint64_t attr, int dev_fd);
 +                             uint64_t attr, int dev_fd, uint64_t addr_ormask);
  /**
   * kvm_arm_init_cpreg_list:
 diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/arm_gic_kvm.c
 +++ b/hw/intc/arm_gic_kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gic_realize(DeviceState *dev, Error **errp)
                              | KVM_VGIC_V2_ADDR_TYPE_DIST,
                              KVM_DEV_ARM_VGIC_GRP_ADDR,
                              KVM_VGIC_V2_ADDR_TYPE_DIST,
 -                            s->dev_fd);
 +                            s->dev_fd, 0);
      /* CPU interface for current core. Unlike arm_gic, we don't
       * provide the "interface for core #N" memory regions, because
       * cores with a VGIC don't have those.
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gic_realize(DeviceState *dev, Error **errp)
                              | KVM_VGIC_V2_ADDR_TYPE_CPU,
                              KVM_DEV_ARM_VGIC_GRP_ADDR,
                              KVM_VGIC_V2_ADDR_TYPE_CPU,
 -                            s->dev_fd);
 +                            s->dev_fd, 0);
      if (kvm_has_gsi_routing()) {
          /* set up irq routing */
 diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/arm_gicv3_its_kvm.c
 +++ b/hw/intc/arm_gicv3_its_kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_its_realize(DeviceState *dev, Error **errp)
      /* register the base address */
      kvm_arm_register_device(&s->iomem_its_cntrl, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
 -                            KVM_VGIC_ITS_ADDR_TYPE, s->dev_fd);
 +                            KVM_VGIC_ITS_ADDR_TYPE, s->dev_fd, 0);
      gicv3_its_init_mmio(s, NULL);
 diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/arm_gicv3_kvm.c
 +++ b/hw/intc/arm_gicv3_kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
                        KVM_DEV_ARM_VGIC_CTRL_INIT, NULL, true, &error_abort);
      kvm_arm_register_device(&s->iomem_dist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
 -                            KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd);
 +                            KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd, 0);
      kvm_arm_register_device(&s->iomem_redist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
 -                            KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd);
 +                            KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd, 0);
      if (kvm_has_gsi_routing()) {
          /* set up irq routing */
 diff --git a/target/arm/kvm.c b/target/arm/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm.c
 +++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
   * We use a MemoryListener to track mapping and unmapping of
   * the regions during board creation, so the board models don't
   * need to do anything special for the KVM case.
 + *
 + * Sometimes the address must be OR'ed with some other fields
 + * (for example for KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION).
 + * @kda_addr_ormask aims at storing the value of those fields.
   */
  typedef struct KVMDevice {
      struct kvm_arm_device_addr kda;
      struct kvm_device_attr kdattr;
 +    uint64_t kda_addr_ormask;
      MemoryRegion *mr;
      QSLIST_ENTRY(KVMDevice) entries;
      int dev_fd;
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_set_device_addr(KVMDevice *kd)
       */
      if (kd->dev_fd >= 0) {
          uint64_t addr = kd->kda.addr;
 +
 +        addr |= kd->kda_addr_ormask;
          attr->addr = (uintptr_t)&addr;
          ret = kvm_device_ioctl(kd->dev_fd, KVM_SET_DEVICE_ATTR, attr);
      } else {
@@ -XXX,XX +XXX,XX @@ static Notifier notify = {
  };
  void kvm_arm_register_device(MemoryRegion *mr, uint64_t devid, uint64_t group,
 -                             uint64_t attr, int dev_fd)
 +                             uint64_t attr, int dev_fd, uint64_t addr_ormask)
  {
-     KVMDevice *kd;
+     TCGv_ptr statusptr = tcg_temp_new_ptr();
+-    int offset;
-@@ -XXX,XX +XXX,XX @@ void kvm_arm_register_device(MemoryRegion *mr, uint64_t devid, uint64_t group,
++    int offset = offsetof(CPUARMState, vfp.fp_status[flavour]);
-     kd->kdattr.group = group;
-     kd->kdattr.attr = attr;
+-    switch (flavour) {
-     kd->dev_fd = dev_fd;
+-    case FPST_A32:
-+    kd->kda_addr_ormask = addr_ormask;
+-        offset = offsetof(CPUARMState, vfp.fp_status_a32);
-     QSLIST_INSERT_HEAD(&kvm_devices_head, kd, entries);
+-        break;
-     memory_region_ref(kd->mr);
+-    case FPST_A64:
 -        offset = offsetof(CPUARMState, vfp.fp_status_a64);
 -        break;
 -    case FPST_A32_F16:
 -        offset = offsetof(CPUARMState, vfp.fp_status_f16_a32);
 -        break;
 -    case FPST_A64_F16:
 -        offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
 -        break;
 -    case FPST_AH:
 -        offset = offsetof(CPUARMState, vfp.ah_fp_status);
 -        break;
 -    case FPST_AH_F16:
 -        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
 -        break;
 -    case FPST_STD:
 -        offset = offsetof(CPUARMState, vfp.standard_fp_status);
 -        break;
 -    case FPST_STD_F16:
 -        offset = offsetof(CPUARMState, vfp.standard_fp_status_f16);
 -        break;
 -    default:
 -        g_assert_not_reached();
 -    }
      tcg_gen_addi_ptr(statusptr, tcg_env, offset);
      return statusptr;
  }
 --
-.17.1
+.34.1
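
Note on the fpstatus_ptr() hunk above: the new side replaces a per-flavour switch over named fields with a single offset into an array indexed by the flavour enum. The following is a minimal standalone sketch of that idea, not QEMU code. Every `Sketch`/`sketch_` name is hypothetical, the enum is shortened, and the offset is written as base plus index times element size so that it does not rely on compiler support for a runtime index inside offsetof().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    int rounding_mode;
    int exception_flags;
} SketchFloatStatus;

/* Hypothetical, shortened flavour list; the last entry is only a count. */
typedef enum {
    SKETCH_FPST_A32,
    SKETCH_FPST_A64,
    SKETCH_FPST_STD,
    SKETCH_FPST_COUNT
} SketchFlavour;

/* Hypothetical CPU state: one float_status per flavour, stored as an array. */
typedef struct {
    int unrelated_state;
    SketchFloatStatus fp_status[SKETCH_FPST_COUNT];
} SketchEnv;

/*
 * Byte offset of the float_status for a given flavour. With an array,
 * no switch statement is needed: the enum value is the index.
 */
static size_t sketch_status_offset(SketchFlavour flavour)
{
    assert((unsigned)flavour < SKETCH_FPST_COUNT);
    return offsetof(SketchEnv, fp_status)
        + (size_t)flavour * sizeof(SketchFloatStatus);
}
```

A caller would add the returned offset to a pointer to the state structure, which is the role tcg_gen_addi_ptr() plays in the hunk above.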

-[Qemu-devel] [PULL 25/28] target/arm: Introduce ARM_FEATURE_M_MAIN
+[PULL 57/68] target/arm: Remove standard_fp_status_f16
-From: Julia Suvorova <jusual@mail.ru>
+From: Richard Henderson <richard.henderson@linaro.org>
-This feature is intended to distinguish ARMv8-M variants: Baseline and
+Replace with fp_status[FPST_STD_F16].
 Mainline. ARMv7-M compatibility requires the Main Extension. ARMv6-M
 compatibility is provided by all ARMv8-M implementations.
-Signed-off-by: Julia Suvorova <jusual@mail.ru>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180622080138.17702-2-jusual@mail.ru
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20250129013857.135256-8-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h | 1 +
+ target/arm/cpu.h            |  1 -
- target/arm/cpu.c | 3 +++
+ target/arm/cpu.c            |  4 ++--
-files changed, 4 insertions(+)
+ target/arm/tcg/mve_helper.c | 24 ++++++++++++------------
  target/arm/vfp_helper.c     |  8 ++++----
 files changed, 18 insertions(+), 19 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ enum arm_features {
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-     ARM_FEATURE_V8_RDM, /* implements v8.1 simd round multiply */
+                 float_status ah_fp_status;
-     ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
+                 float_status ah_fp_status_f16;
-     ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions.  */
+                 float_status standard_fp_status;
-+    ARM_FEATURE_M_MAIN, /* M profile Main Extension */
+-                float_status standard_fp_status_f16;
- };
+             };
+         };
- static inline int arm_feature(CPUARMState *env, int feature)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void cortex_m3_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-     ARMCPU *cpu = ARM_CPU(obj);
+     set_flush_to_zero(1, &env->vfp.standard_fp_status);
-     set_feature(&cpu->env, ARM_FEATURE_V7);
+     set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
-     set_feature(&cpu->env, ARM_FEATURE_M);
+     set_default_nan_mode(1, &env->vfp.standard_fp_status);
-+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
+-    set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
-     cpu->midr = 0x410fc231;
++    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
-     cpu->pmsav7_dregion = 8;
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
-     cpu->id_pfr0 = 0x00000030;
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
-@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
+     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
-     set_feature(&cpu->env, ARM_FEATURE_V7);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
-     set_feature(&cpu->env, ARM_FEATURE_M);
+-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
-+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
++    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
-     set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
-     cpu->midr = 0x410fc240; /* r0p0 */
+     set_flush_to_zero(1, &env->vfp.ah_fp_status);
-     cpu->pmsav7_dregion = 8;
+     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
-@@ -XXX,XX +XXX,XX @@ static void cortex_m33_initfn(Object *obj)
+diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
-     set_feature(&cpu->env, ARM_FEATURE_V8);
+--- a/target/arm/tcg/mve_helper.c
-     set_feature(&cpu->env, ARM_FEATURE_M);
++++ b/target/arm/tcg/mve_helper.c
-+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
+@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
-     set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
-     set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+                 continue;                                               \
-     cpu->midr = 0x410fd213; /* r0p3 */
+             }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                  r[e] = 0;                                               \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(tm & 1)) {                                            \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
                  continue;                                               \
              }                                                           \
 -            fpst0 = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :   \
 +            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
                  &env->vfp.standard_fp_status;                           \
              fpst1 = fpst0;                                              \
              if (!(mask & 1)) {                                          \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
          TYPE *m = vm;                                           \
          TYPE ra = (TYPE)ra_in;                                  \
          float_status *fpst = (ESIZE == 2) ?                     \
 -            &env->vfp.standard_fp_status_f16 :                  \
 +            &env->vfp.fp_status[FPST_STD_F16] :                 \
              &env->vfp.standard_fp_status;                       \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
              if (mask & 1) {                                     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
              if ((mask & emask) == 0) {                                  \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
              if ((mask & emask) == 0) {                                  \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
          float_status *fpst;                                             \
          float_status scratch_fpst;                                      \
          float_status *base_fpst = (ESIZE == 2) ?                        \
 -            &env->vfp.standard_fp_status_f16 :                          \
 +            &env->vfp.fp_status[FPST_STD_F16] :                         \
              &env->vfp.standard_fp_status;                               \
          uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
          set_float_rounding_mode(rmode, base_fpst);                      \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      /* FZ16 does not generate an input denormal exception.  */
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
            & ~float_flag_input_denormal_flushed);
 -    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
 +    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
            & ~float_flag_input_denormal_flushed);
      a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
      set_float_exception_flags(0, &env->vfp.standard_fp_status);
 -    set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
      set_float_exception_flags(0, &env->vfp.ah_fp_status);
      set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
  }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          bool ftz_enabled = val & FPCR_FZ16;
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
 -        set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
          set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
      }
      if (changed & FPCR_FZ) {
 --
-.17.1
+.34.1
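
Note on the cpu.h hunk above: the two closing braces after the removed field suggest that the remaining named float_status fields sit in an anonymous struct overlaid on the fp_status[] array through a union, which would let each field be retired in its own patch. The sketch below illustrates that migration pattern under that assumption. It is not the QEMU definition: all `Sketch`/`sketch_` names are hypothetical, the field list is shortened, and it needs C11 for anonymous members and static_assert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    int exception_flags;
} SketchFloatStatus;

/* Hypothetical, shortened flavour list. */
typedef enum {
    SKETCH_FPST_A32,
    SKETCH_FPST_A64,
    SKETCH_FPST_STD,
    SKETCH_FPST_STD_F16,
    SKETCH_FPST_COUNT
} SketchFlavour;

typedef struct {
    union {
        /* New-style access: indexed by flavour. */
        SketchFloatStatus fp_status[SKETCH_FPST_COUNT];
        /* Old-style access: named fields, in the same order as the enum. */
        struct {
            SketchFloatStatus fp_status_a32;
            SketchFloatStatus fp_status_a64;
            SketchFloatStatus standard_fp_status;
            /*
             * The last named field has already been retired; its storage
             * still exists as fp_status[SKETCH_FPST_STD_F16].
             */
        };
    };
} SketchVfp;

/* The named field and the array element must occupy the same bytes. */
static_assert(offsetof(SketchVfp, standard_fp_status) ==
              offsetof(SketchVfp, fp_status) +
              SKETCH_FPST_STD * sizeof(SketchFloatStatus),
              "named field must alias its array slot");

/* Not-yet-converted code reads through the old name. */
static int sketch_flags_via_name(const SketchVfp *vfp)
{
    return vfp->standard_fp_status.exception_flags;
}

/* Converted code writes through the array. */
static void sketch_set_flags(SketchVfp *vfp, SketchFlavour flavour, int flags)
{
    vfp->fp_status[flavour].exception_flags = flags;
}
```

Removing named fields from the end of the struct, as this patch and the next one do, keeps every remaining name aligned with its enum index.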

-[Qemu-devel] [PULL 13/28] hw/arm/virt: Use 256MB ECAM region by default
+[PULL 58/68] target/arm: Remove standard_fp_status
-From: Eric Auger <eric.auger@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-With this patch, virt-3.0 machine uses a new 256MB ECAM region
+Replace with fp_status[FPST_STD].
-by default instead of the legacy 16MB one, if highmem is set
-(LPAE supported by the guest) and (!firmware_loaded || aarch64).
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Indeed aarch32 mode FW may not support this high ECAM region.
+Message-id: 20250129013857.135256-9-richard.henderson@linaro.org
 Signed-off-by: Eric Auger <eric.auger@redhat.com>
 Reviewed-by: Laszlo Ersek <lersek@redhat.com>
 Reviewed-by: Andrew Jones <drjones@redhat.com>
 Message-id: 1529072910-16156-11-git-send-email-eric.auger@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/arm/virt.h |  1 +
+ target/arm/cpu.h            |  1 -
- hw/arm/virt.c         | 10 ++++++++++
+ target/arm/cpu.c            |  8 ++++----
-files changed, 11 insertions(+)
+ target/arm/tcg/mve_helper.c | 28 ++++++++++++++--------------
+ target/arm/tcg/vec_helper.c |  4 ++--
-diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
+ target/arm/vfp_helper.c     |  4 ++--
-index XXXXXXX..XXXXXXX 100644
+files changed, 22 insertions(+), 23 deletions(-)
---- a/include/hw/arm/virt.h
-+++ b/include/hw/arm/virt.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ typedef struct {
+index XXXXXXX..XXXXXXX 100644
-     bool no_pmu;
+--- a/target/arm/cpu.h
-     bool claim_edge_triggered_timers;
++++ b/target/arm/cpu.h
-     bool smbios_old_sys_ver;
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-+    bool no_highmem_ecam;
+                 float_status fp_status_f16_a64;
- } VirtMachineClass;
+                 float_status ah_fp_status;
+                 float_status ah_fp_status_f16;
- typedef struct {
+-                float_status standard_fp_status;
-diff --git a/hw/arm/virt.c b/hw/arm/virt.c
+             };
-index XXXXXXX..XXXXXXX 100644
+         };
---- a/hw/arm/virt.c
-+++ b/hw/arm/virt.c
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
+index XXXXXXX..XXXXXXX 100644
-     int n, virt_max_cpus;
+--- a/target/arm/cpu.c
-     MemoryRegion *ram = g_new(MemoryRegion, 1);
++++ b/target/arm/cpu.c
-     bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-+    bool aarch64 = true;
+         env->sau.ctrl = 0;
      /* We can probe only here because during property set
       * KVM is not available yet
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
          numa_cpu_pre_plug(&possible_cpus->cpus[cs->cpu_index], DEVICE(cpuobj),
                            &error_fatal);
 +        aarch64 &= object_property_get_bool(cpuobj, "aarch64", NULL);
 +
          if (!vms->secure) {
              object_property_set_bool(cpuobj, false, "has_el3", NULL);
          }
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
          create_uart(vms, pic, VIRT_SECURE_UART, secure_sysmem, serial_hd(1));
      }
-+    vms->highmem_ecam &= vms->highmem && (!firmware_loaded || aarch64);
+-    set_flush_to_zero(1, &env->vfp.standard_fp_status);
-+
+-    set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
-     create_rtc(vms, pic);
+-    set_default_nan_mode(1, &env->vfp.standard_fp_status);
++    set_flush_to_zero(1, &env->vfp.fp_status[FPST_STD]);
-     create_pcie(vms, pic);
++    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
-@@ -XXX,XX +XXX,XX @@ static void virt_3_0_instance_init(Object *obj)
++    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
-                                     "Set GIC version. "
+     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
-                                     "Valid values are 2, 3 and host", NULL);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
-+    vms->highmem_ecam = !vmc->no_highmem_ecam;
+-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
-+
++    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
-     if (vmc->no_its) {
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
-         vms->its = false;
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
-     } else {
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
-@@ -XXX,XX +XXX,XX @@ static void virt_2_12_instance_init(Object *obj)
+diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
- static void virt_machine_2_12_options(MachineClass *mc)
+--- a/target/arm/tcg/mve_helper.c
- {
++++ b/target/arm/tcg/mve_helper.c
-+    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
-+
+                 continue;                                               \
-     virt_machine_3_0_options(mc);
+             }                                                           \
-     SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_12);
+             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-+    vmc->no_highmem_ecam = true;
+-                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(tm & 1)) {                                            \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
                  continue;                                               \
              }                                                           \
              fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              fpst1 = fpst0;                                              \
              if (!(mask & 1)) {                                          \
                  scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
          TYPE ra = (TYPE)ra_in;                                  \
          float_status *fpst = (ESIZE == 2) ?                     \
              &env->vfp.fp_status[FPST_STD_F16] :                 \
 -            &env->vfp.standard_fp_status;                       \
 +            &env->vfp.fp_status[FPST_STD];                       \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
              if (mask & 1) {                                     \
                  TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
          float_status scratch_fpst;                                      \
          float_status *base_fpst = (ESIZE == 2) ?                        \
              &env->vfp.fp_status[FPST_STD_F16] :                         \
 -            &env->vfp.standard_fp_status;                               \
 +            &env->vfp.fp_status[FPST_STD];                               \
          uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
          set_float_rounding_mode(rmode, base_fpst);                      \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_sh(CPUARMState *env, void *vd, void *vm, int top)
      unsigned e;
      float_status *fpst;
      float_status scratch_fpst;
 -    float_status *base_fpst = &env->vfp.standard_fp_status;
 +    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
      bool old_fz = get_flush_to_zero(base_fpst);
      set_flush_to_zero(false, base_fpst);
      for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_hs(CPUARMState *env, void *vd, void *vm, int top)
      unsigned e;
      float_status *fpst;
      float_status scratch_fpst;
 -    float_status *base_fpst = &env->vfp.standard_fp_status;
 +    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
      bool old_fiz = get_flush_inputs_to_zero(base_fpst);
      set_flush_inputs_to_zero(false, base_fpst);
      for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 -    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
 +    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
- DEFINE_VIRT_MACHINE(2, 12)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 -    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
 +    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
                   get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      uint32_t a32_flags = 0, a64_flags = 0;
      a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
 -    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
 +    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
      /* FZ16 does not generate an input denormal exception.  */
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
            & ~float_flag_input_denormal_flushed);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_a64);
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
 -    set_float_exception_flags(0, &env->vfp.standard_fp_status);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
      set_float_exception_flags(0, &env->vfp.ah_fp_status);
      set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 --
-2.17.1
+2.34.1

-[Qemu-devel] [PULL 28/28] xen: Don't use memory_region_init_ram_nomigrate() in pci_assign_dev_load_option_rom()
+[PULL 59/68] target/arm: Remove ah_fp_status_f16
-The xen pci_assign_dev_load_option_rom() currently creates a RAM
+From: Richard Henderson <richard.henderson@linaro.org>
 memory region with memory_region_init_ram_nomigrate(), and then
 manually registers it with vmstate_register_ram(). In fact for
 its only callsite, the 'owner' pointer we use for the init call
 and the '&dev->qdev' pointer we use for the vmstate_register_ram()
 call refer to the same object. Simplify the function to only
 take a pointer to the device once instead of twice, and use
 memory_region_init_ram() which automatically does the vmstate
 register for us.
-Acked-by: Anthony PERARD <anthony.perard@citrix.com>
+Replace with fp_status[FPST_AH_F16].
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-10-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/xen/xen_pt.h          | 2 +-
+ target/arm/cpu.h        |  3 +--
- hw/xen/xen_pt_graphics.c | 2 +-
+ target/arm/cpu.c        |  2 +-
- hw/xen/xen_pt_load_rom.c | 6 +++---
+ target/arm/vfp_helper.c | 10 +++++-----
- 3 files changed, 5 insertions(+), 5 deletions(-)
+ 3 files changed, 7 insertions(+), 8 deletions(-)
-diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/xen/xen_pt.h
+--- a/target/arm/cpu.h
-+++ b/hw/xen/xen_pt.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline bool xen_pt_has_msix_mapping(XenPCIPassthroughState *s, int bar)
+@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
   * behaviour when FPCR.AH == 1: they don't update cumulative
   * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
   * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 - * which means we need an ah_fp_status_f16 as well.
 + * which means we need an FPST_AH_F16 as well.
   *
   * To avoid having to transfer exception bits around, we simply
   * say that the FPSCR cumulative exception flags are the logical
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                  float_status fp_status_f16_a32;
                  float_status fp_status_f16_a64;
                  float_status ah_fp_status;
 -                float_status ah_fp_status_f16;
              };
          };
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
      set_flush_to_zero(1, &env->vfp.ah_fp_status);
      set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
 -    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
 +    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
  #ifndef CONFIG_USER_ONLY
      if (kvm_enabled()) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
      /*
 -     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
 +     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
       * they are used for insns that must not set the cumulative exception bits.
       */
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
      set_float_exception_flags(0, &env->vfp.ah_fp_status);
 -    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
  }
- extern void *pci_assign_dev_load_option_rom(PCIDevice *dev,
+ static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
--                                            struct Object *owner, int *size,
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-+                                            int *size,
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-                                             unsigned int domain,
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-                                             unsigned int bus, unsigned int slot,
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-                                             unsigned int function);
+-        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
-diff --git a/hw/xen/xen_pt_graphics.c b/hw/xen/xen_pt_graphics.c
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-index XXXXXXX..XXXXXXX 100644
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
---- a/hw/xen/xen_pt_graphics.c
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-+++ b/hw/xen/xen_pt_graphics.c
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-@@ -XXX,XX +XXX,XX @@ int xen_pt_unregister_vga_regions(XenHostPCIDevice *dev)
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
- static void *get_vgabios(XenPCIPassthroughState *s, int *size,
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-                        XenHostPCIDevice *dev)
+     }
- {
+     if (changed & FPCR_FZ) {
--    return pci_assign_dev_load_option_rom(&s->dev, OBJECT(&s->dev), size,
+         bool ftz_enabled = val & FPCR_FZ;
-+    return pci_assign_dev_load_option_rom(&s->dev, size,
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-                                           dev->domain, dev->bus,
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
-                                           dev->dev, dev->func);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
- }
+         set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
-diff --git a/hw/xen/xen_pt_load_rom.c b/hw/xen/xen_pt_load_rom.c
+-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
-index XXXXXXX..XXXXXXX 100644
++        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
---- a/hw/xen/xen_pt_load_rom.c
+     }
-+++ b/hw/xen/xen_pt_load_rom.c
+     if (changed & FPCR_AH) {
-@@ -XXX,XX +XXX,XX @@
+         bool ah_enabled = val & FPCR_AH;
   * load the corresponding ROM data to RAM. If an error occurs while loading an
   * option ROM, we just ignore that option ROM and continue with the next one.
   */
 -void *pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
 +void *pci_assign_dev_load_option_rom(PCIDevice *dev,
                                       int *size, unsigned int domain,
                                       unsigned int bus, unsigned int slot,
                                       unsigned int function)
@@ -XXX,XX +XXX,XX @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
      uint8_t val;
      struct stat st;
      void *ptr = NULL;
 +    Object *owner = OBJECT(dev);
      /* If loading ROM from file, pci handles it */
      if (dev->romfile || !dev->rom_bar) {
@@ -XXX,XX +XXX,XX @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
      fseek(fp, 0, SEEK_SET);
      snprintf(name, sizeof(name), "%s.rom", object_get_typename(owner));
 -    memory_region_init_ram_nomigrate(&dev->rom, owner, name, st.st_size, &error_abort);
 -    vmstate_register_ram(&dev->rom, &dev->qdev);
 +    memory_region_init_ram(&dev->rom, owner, name, st.st_size, &error_abort);
      ptr = memory_region_get_ram_ptr(&dev->rom);
      memset(ptr, 0xff, st.st_size);
 --
-2.17.1
+2.34.1
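
Illustration only, not part of the series: the patches above replace named
float_status members with entries of an fp_status[] array indexed by FPST_*
constants. The sketch below shows, under stated assumptions, what that layout
makes possible. The FPST_* names come from the patches; the toy_* types, the
enum order and the TOY_FPST_COUNT sentinel are hypothetical stand-ins, and
QEMU itself keeps one explicit call per entry rather than the loops used here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for softfloat's float_status; NOT the QEMU definition. */
typedef struct {
    uint8_t exception_flags;
} toy_float_status;

/* Index names as used in the patches; order and sentinel are assumptions. */
typedef enum {
    FPST_A32_F16,
    FPST_A64_F16,
    FPST_STD,
    FPST_STD_F16,
    FPST_AH,
    FPST_AH_F16,
    TOY_FPST_COUNT /* hypothetical sentinel, not taken from the series */
} toy_fpst;

typedef struct {
    toy_float_status fp_status[TOY_FPST_COUNT];
} toy_vfp;

/* Clear the cumulative exception flags of every flavour. */
static void toy_clear_exc_flags(toy_vfp *vfp)
{
    for (int i = 0; i < TOY_FPST_COUNT; i++) {
        vfp->fp_status[i].exception_flags = 0;
    }
}

/*
 * OR together the flags of all flavours except the AH ones, which are
 * used for insns that must not set the cumulative exception bits.
 */
static uint8_t toy_merge_exc_flags(const toy_vfp *vfp)
{
    uint8_t flags = 0;

    for (int i = 0; i < TOY_FPST_COUNT; i++) {
        if (i == FPST_AH || i == FPST_AH_F16) {
            continue;
        }
        flags |= vfp->fp_status[i].exception_flags;
    }
    return flags;
}
```

A caller would zero a toy_vfp, let operations raise flags in individual
entries, and read the merged value with toy_merge_exc_flags(). The real
vfp_get_fpsr_from_host() additionally masks out input-denormal flags for
the F16 entries, which this sketch omits.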

-[Qemu-devel] [PULL 12/28] hw/arm/virt: Add virt-3.0 machine type
+[PULL 60/68] target/arm: Remove ah_fp_status
-From: Eric Auger <eric.auger@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Add virt-3.0 machine type.
+Replace with fp_status[FPST_AH].
-Signed-off-by: Eric Auger <eric.auger@redhat.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Laszlo Ersek <lersek@redhat.com>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Andrew Jones <drjones@redhat.com>
+Message-id: 20250129013857.135256-11-richard.henderson@linaro.org
 Message-id: 1529072910-16156-10-git-send-email-eric.auger@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/virt.c | 15 +++++++++++++--
+ target/arm/cpu.h        | 3 +--
- 1 file changed, 13 insertions(+), 2 deletions(-)
+ target/arm/cpu.c        | 6 +++---
  target/arm/vfp_helper.c | 6 +++---
  3 files changed, 7 insertions(+), 8 deletions(-)
-diff --git a/hw/arm/virt.c b/hw/arm/virt.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/virt.c
+--- a/target/arm/cpu.h
-+++ b/hw/arm/virt.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ type_init(machvirt_machine_init);
+@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
- #define VIRT_COMPAT_2_12 \
+  * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
-     HW_COMPAT_2_12
+  * using a fixed value for it.
+  *
--static void virt_2_12_instance_init(Object *obj)
+- * The ah_fp_status is needed because some insns have different
-+static void virt_3_0_instance_init(Object *obj)
++ * FPST_AH is needed because some insns have different
- {
+  * behaviour when FPCR.AH == 1: they don't update cumulative
-     VirtMachineState *vms = VIRT_MACHINE(obj);
+  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
-     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
-@@ -XXX,XX +XXX,XX @@ static void virt_2_12_instance_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-     vms->irqmap = a15irqmap;
+                 float_status fp_status_a64;
                  float_status fp_status_f16_a32;
                  float_status fp_status_f16_a64;
 -                float_status ah_fp_status;
              };
          };
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
 -    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
 -    set_flush_to_zero(1, &env->vfp.ah_fp_status);
 -    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
 +    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
 +    set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]);
 +    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_AH]);
      arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
  #ifndef CONFIG_USER_ONLY
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
      /*
 -     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
 +     * We do not merge in flags from FPST_AH or FPST_AH_F16, because
       * they are used for insns that must not set the cumulative exception bits.
       */
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
 -    set_float_exception_flags(0, &env->vfp.ah_fp_status);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
  }
-+static void virt_machine_3_0_options(MachineClass *mc)
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-+{
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
-+}
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
-+DEFINE_VIRT_MACHINE_AS_LATEST(3, 0)
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
-+
+-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
-+static void virt_2_12_instance_init(Object *obj)
++        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
-+{
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-+    virt_3_0_instance_init(obj);
+     }
-+}
+     if (changed & FPCR_AH) {
 +
  static void virt_machine_2_12_options(MachineClass *mc)
  {
 +    virt_machine_3_0_options(mc);
      SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_12);
  }
 -DEFINE_VIRT_MACHINE_AS_LATEST(2, 12)
 +DEFINE_VIRT_MACHINE(2, 12)
  #define VIRT_COMPAT_2_11 \
      HW_COMPAT_2_11
 --
-2.17.1
+2.34.1

-[Qemu-devel] [PULL 15/28] target-arm: Add the Cortex-R5F
+[PULL 61/68] target/arm: Remove fp_status_f16_a64
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Add the Cortex-R5F with the optional FPU enabled.
+Replace with fp_status[FPST_A64_F16].
-Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20250129013857.135256-12-richard.henderson@linaro.org
 Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 Message-id: 20180529124707.3025-2-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.c | 9 +++++++++
+ target/arm/cpu.h            |  1 -
- 1 file changed, 9 insertions(+)
+ target/arm/cpu.c            |  2 +-
  target/arm/tcg/sme_helper.c |  2 +-
  target/arm/tcg/vec_helper.c |  9 ++++-----
  target/arm/vfp_helper.c     | 16 ++++++++--------
  5 files changed, 14 insertions(+), 16 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
+                 float_status fp_status_a32;
+                 float_status fp_status_a64;
+                 float_status fp_status_f16_a32;
+-                float_status fp_status_f16_a64;
+             };
+         };
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-     define_arm_cp_regs(cpu, cortexr5_cp_reginfo);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
 -    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
      arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
      set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]);
 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/sme_helper.c
 +++ b/target/arm/tcg/sme_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
       * produces default NaNs. We also need a second copy of fp_status with
       * round-to-odd -- see above.
       */
 -    fpst_f16 = env->vfp.fp_status_f16_a64;
 +    fpst_f16 = env->vfp.fp_status[FPST_A64_F16];
      fpst_std = env->vfp.fp_status_a64;
      set_default_nan_mode(true, &fpst_std);
      set_default_nan_mode(true, &fpst_f16);
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
          }
      }
      do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
 -             get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 +             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
  }
-+static void cortex_r5f_initfn(Object *obj)
+ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-+{
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-+    ARMCPU *cpu = ARM_CPU(obj);
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-+
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-+    cortex_r5_initfn(obj);
+     float_status *status = &env->vfp.fp_status_a64;
-+    set_feature(&cpu->env, ARM_FEATURE_VFP3);
+-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
-+}
++    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
-+
+     int negx = 0, negf = 0;
- static const ARMCPRegInfo cortexa8_cp_reginfo[] = {
-     { .name = "L2LOCKDOWN", .cp = 15, .crn = 9, .crm = 0, .opc1 = 1, .opc2 = 0,
+     if (is_s) {
-       .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
+         }
-     { .name = "cortex-m33",  .initfn = cortex_m33_initfn,
+     }
-                              .class_init = arm_v7m_class_init },
+     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
-     { .name = "cortex-r5",   .initfn = cortex_r5_initfn },
+-                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
-+    { .name = "cortex-r5f",  .initfn = cortex_r5f_initfn },
++                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
-     { .name = "cortex-a7",   .initfn = cortex_a7_initfn },
+ }
-     { .name = "cortex-a8",   .initfn = cortex_a8_initfn },
-     { .name = "cortex-a9",   .initfn = cortex_a9_initfn },
+ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
      intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
      float_status *status = &env->vfp.fp_status_a64;
 -    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
 +    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
      int negx = 0, negf = 0;
      if (is_s) {
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
              negx = 0x8000;
          }
      }
 -
      for (i = 0; i < oprsz; i += 16) {
          float16 mm_16 = *(float16 *)(vm + i + idx);
          float32 mm = float16_to_float32_by_bits(mm_16, fz16);
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
            & ~float_flag_input_denormal_flushed);
      a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
 -    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
 +    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A64_F16])
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
      /*
       * We do not merge in flags from FPST_AH or FPST_AH_F16, because
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_a32);
      set_float_exception_flags(0, &env->vfp.fp_status_a64);
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
 -    set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_float_rounding_mode(i, &env->vfp.fp_status_a32);
          set_float_rounding_mode(i, &env->vfp.fp_status_a64);
          set_float_rounding_mode(i, &env->vfp.fp_status_f16_a32);
 -        set_float_rounding_mode(i, &env->vfp.fp_status_f16_a64);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
      }
      if (changed & FPCR_FZ16) {
          bool ftz_enabled = val & FPCR_FZ16;
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
 -        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          if (ah_enabled) {
              /* Change behaviours for A64 FP operations */
              arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
 -            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +            arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          } else {
              arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 -            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +            arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          }
      }
      /*
 --
-2.17.1
+2.34.1
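
Illustration only, not part of the series: the vfp_set_fpcr_to_host() hunks
above apply a control bit to the affected status entries only when that bit
has changed. A minimal sketch of the pattern for FZ16 follows, assuming that
"changed" is the XOR of old and new values restricted to the write mask (the
hunks do not show how it is computed). All toy16_* and TOY16_* names are
hypothetical, and the bit position is chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY16_FPCR_FZ16 (1u << 19) /* bit position assumed for illustration */

/* Toy stand-in for the two flush controls of a float_status. */
typedef struct {
    bool flush_to_zero;
    bool flush_inputs_to_zero;
} toy16_status;

/* The four half-precision flavours that the FZ16 hunk touches. */
enum { TOY16_A32, TOY16_A64, TOY16_STD, TOY16_AH, TOY16_COUNT };

typedef struct {
    uint32_t fpcr;
    toy16_status f16[TOY16_COUNT];
} toy16_vfp;

/* Write the bits selected by mask and propagate FZ16 only if it changed. */
static void toy16_set_fpcr(toy16_vfp *vfp, uint32_t val, uint32_t mask)
{
    /* Assumption: only bits that are both writable and different count. */
    uint32_t changed = (vfp->fpcr ^ val) & mask;

    if (changed & TOY16_FPCR_FZ16) {
        bool ftz_enabled = (val & TOY16_FPCR_FZ16) != 0;

        for (int i = 0; i < TOY16_COUNT; i++) {
            vfp->f16[i].flush_to_zero = ftz_enabled;
            vfp->f16[i].flush_inputs_to_zero = ftz_enabled;
        }
    }
    vfp->fpcr = (vfp->fpcr & ~mask) | (val & mask);
}
```

Checking "changed" first means a write that leaves FZ16 alone does not touch
any status entry, which is the same reason the real function tests
(changed & FPCR_FZ16) before calling set_flush_to_zero().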

-[Qemu-devel] [PULL 11/28] hw/arm/virt: Add a new 256MB ECAM region
+[PULL 62/68] target/arm: Remove fp_status_f16_a32
-From: Eric Auger <eric.auger@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-This patch defines a new ECAM region located after the 256GB limit.
+Replace with fp_status[FPST_A32_F16].
-The virt machine state is augmented with a new highmem_ecam field
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-which guards the usage of this new ECAM region instead of the legacy
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-16MB one. With the highmem ECAM region, up to 256 PCIe buses can be
+Message-id: 20250129013857.135256-13-richard.henderson@linaro.org
 used.
 Signed-off-by: Eric Auger <eric.auger@redhat.com>
 Reviewed-by: Laszlo Ersek <lersek@redhat.com>
 Reviewed-by: Andrew Jones <drjones@redhat.com>
 Message-id: 1529072910-16156-9-git-send-email-eric.auger@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/arm/virt.h    |  4 ++++
+ target/arm/cpu.h            |  1 -
- hw/arm/virt-acpi-build.c | 21 +++++++++++++--------
+ target/arm/cpu.c            |  2 +-
- hw/arm/virt.c            | 12 ++++++++----
+ target/arm/tcg/vec_helper.c |  4 ++--
- 3 files changed, 25 insertions(+), 12 deletions(-)
+ target/arm/vfp_helper.c     | 14 +++++++-------
  4 files changed, 10 insertions(+), 11 deletions(-)
-diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/virt.h
+--- a/target/arm/cpu.h
-+++ b/include/hw/arm/virt.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ enum {
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-     VIRT_PCIE_MMIO,
+             struct {
-     VIRT_PCIE_PIO,
+                 float_status fp_status_a32;
-     VIRT_PCIE_ECAM,
+                 float_status fp_status_a64;
-+    VIRT_PCIE_ECAM_HIGH,
+-                float_status fp_status_f16_a32;
-     VIRT_PLATFORM_BUS,
+             };
-     VIRT_PCIE_MMIO_HIGH,
+         };
-     VIRT_GPIO,
-@@ -XXX,XX +XXX,XX @@ typedef struct {
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
      FWCfgState *fw_cfg;
      bool secure;
      bool highmem;
 +    bool highmem_ecam;
      bool its;
      bool virt;
      int32_t gic_version;
@@ -XXX,XX +XXX,XX @@ typedef struct {
      int psci_conduit;
  } VirtMachineState;
 +#define VIRT_ECAM_ID(high) (high ? VIRT_PCIE_ECAM_HIGH : VIRT_PCIE_ECAM)
 +
  #define TYPE_VIRT_MACHINE   MACHINE_TYPE_NAME("virt")
  #define VIRT_MACHINE(obj) \
      OBJECT_CHECK(VirtMachineState, (obj), TYPE_VIRT_MACHINE)
 diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/virt-acpi-build.c
+--- a/target/arm/cpu.c
-+++ b/hw/arm/virt-acpi-build.c
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void acpi_dsdt_add_virtio(Aml *scope,
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
 -    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
      arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
      do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -             get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 +             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
  }
- static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
+ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
--                              uint32_t irq, bool use_highmem)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
-+                              uint32_t irq, bool use_highmem, bool highmem_ecam)
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
- {
-+    int ecam_id = VIRT_ECAM_ID(highmem_ecam);
+     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-     Aml *method, *crs, *ifctx, *UUID, *ifctx1, *elsectx, *buf;
+-                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
-     int i, bus_no;
++                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
-     hwaddr base_mmio = memmap[VIRT_PCIE_MMIO].base;
+ }
-     hwaddr size_mmio = memmap[VIRT_PCIE_MMIO].size;
-     hwaddr base_pio = memmap[VIRT_PCIE_PIO].base;
+ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-     hwaddr size_pio = memmap[VIRT_PCIE_PIO].size;
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 -    hwaddr base_ecam = memmap[VIRT_PCIE_ECAM].base;
 -    hwaddr size_ecam = memmap[VIRT_PCIE_ECAM].size;
 +    hwaddr base_ecam = memmap[ecam_id].base;
 +    hwaddr size_ecam = memmap[ecam_id].size;
      int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
      Aml *dev = aml_device("%s", "PCI0");
@@ -XXX,XX +XXX,XX @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
      aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
      /* Declare the PCI Routing Table. */
 -    Aml *rt_pkg = aml_package(nr_pcie_buses * PCI_NUM_PINS);
 +    Aml *rt_pkg = aml_varpackage(nr_pcie_buses * PCI_NUM_PINS);
      for (bus_no = 0; bus_no < nr_pcie_buses; bus_no++) {
          for (i = 0; i < PCI_NUM_PINS; i++) {
              int gsi = (i + bus_no) % PCI_NUM_PINS;
@@ -XXX,XX +XXX,XX @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
      Aml *dev_res0 = aml_device("%s", "RES0");
      aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
      crs = aml_resource_template();
 -    aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE));
 +    aml_append(crs,
 +        aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
 +                         AML_NON_CACHEABLE, AML_READ_WRITE, 0x0000, base_ecam,
 +                         base_ecam + size_ecam - 1, 0x0000, size_ecam));
      aml_append(dev_res0, aml_name_decl("_CRS", crs));
      aml_append(dev, dev_res0);
      aml_append(scope, dev);
@@ -XXX,XX +XXX,XX @@ build_mcfg(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  {
      AcpiTableMcfg *mcfg;
      const MemMapEntry *memmap = vms->memmap;
 +    int ecam_id = VIRT_ECAM_ID(vms->highmem_ecam);
      int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
      int mcfg_start = table_data->len;
      mcfg = acpi_data_push(table_data, len);
 -    mcfg->allocation[0].address = cpu_to_le64(memmap[VIRT_PCIE_ECAM].base);
 +    mcfg->allocation[0].address = cpu_to_le64(memmap[ecam_id].base);
      /* Only a single allocation so no need to play with segments */
      mcfg->allocation[0].pci_segment = cpu_to_le16(0);
      mcfg->allocation[0].start_bus_number = 0;
 -    mcfg->allocation[0].end_bus_number = (memmap[VIRT_PCIE_ECAM].size
 +    mcfg->allocation[0].end_bus_number = (memmap[ecam_id].size
                                            / PCIE_MMCFG_SIZE_MIN) - 1;
      build_header(linker, table_data, (void *)(table_data->data + mcfg_start),
@@ -XXX,XX +XXX,XX @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
      acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
                      (irqmap[VIRT_MMIO] + ARM_SPI_BASE), NUM_VIRTIO_TRANSPORTS);
      acpi_dsdt_add_pci(scope, memmap, (irqmap[VIRT_PCIE] + ARM_SPI_BASE),
 -                      vms->highmem);
 +                      vms->highmem, vms->highmem_ecam);
      acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
                         (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
      acpi_dsdt_add_power_button(scope);
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/virt.c
+--- a/target/arm/vfp_helper.c
-+++ b/hw/arm/virt.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static const MemMapEntry a15memmap[] = {
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
-     [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
+     a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
-     /* Additional 64 MB redist region (can contain up to 512 redistributors) */
+     a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
-     [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
+     /* FZ16 does not generate an input denormal exception.  */
-+    [VIRT_PCIE_ECAM_HIGH] =     { 0x4010000000ULL, 0x10000000 },
+-    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
-     /* Second PCIe window, 512GB wide at the 512GB boundary */
++    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A32_F16])
-     [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
+           & ~float_flag_input_denormal_flushed);
- };
+     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
-@@ -XXX,XX +XXX,XX @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
+           & ~float_flag_input_denormal_flushed);
-     hwaddr size_mmio_high = vms->memmap[VIRT_PCIE_MMIO_HIGH].size;
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
-     hwaddr base_pio = vms->memmap[VIRT_PCIE_PIO].base;
+      */
-     hwaddr size_pio = vms->memmap[VIRT_PCIE_PIO].size;
+     set_float_exception_flags(0, &env->vfp.fp_status_a32);
--    hwaddr base_ecam = vms->memmap[VIRT_PCIE_ECAM].base;
+     set_float_exception_flags(0, &env->vfp.fp_status_a64);
--    hwaddr size_ecam = vms->memmap[VIRT_PCIE_ECAM].size;
+-    set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
-+    hwaddr base_ecam, size_ecam;
++    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
-     hwaddr base = base_mmio;
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
--    int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
-+    int nr_pcie_buses;
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
-     int irq = vms->irqmap[VIRT_PCIE];
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-     MemoryRegion *mmio_alias;
+         }
-     MemoryRegion *mmio_reg;
+         set_float_rounding_mode(i, &env->vfp.fp_status_a32);
-@@ -XXX,XX +XXX,XX @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
+         set_float_rounding_mode(i, &env->vfp.fp_status_a64);
-     MemoryRegion *ecam_reg;
+-        set_float_rounding_mode(i, &env->vfp.fp_status_f16_a32);
-     DeviceState *dev;
++        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
-     char *nodename;
+         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
--    int i;
+     }
-+    int i, ecam_id;
+     if (changed & FPCR_FZ16) {
-     PCIHostState *pci;
+         bool ftz_enabled = val & FPCR_FZ16;
+-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-     dev = qdev_create(NULL, TYPE_GPEX_HOST);
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32_F16]);
-     qdev_init_nofail(dev);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-+    ecam_id = VIRT_ECAM_ID(vms->highmem_ecam);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-+    base_ecam = vms->memmap[ecam_id].base;
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-+    size_ecam = vms->memmap[ecam_id].size;
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32_F16]);
-+    nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
-     /* Map only the first size_ecam bytes of ECAM space */
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-     ecam_alias = g_new0(MemoryRegion, 1);
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-     ecam_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          bool dnan_enabled = val & FPCR_DN;
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
@@ -XXX,XX +XXX,XX @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
      softfloat_to_vfp_compare(env, \
          FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
  }
 -DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16_a32)
 +DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
  DO_VFP_cmp(s, float32, float32, fp_status_a32)
  DO_VFP_cmp(d, float64, float64, fp_status_a32)
  #undef DO_VFP_cmp
 --
-.17.1
+.34.1

-[Qemu-devel] [PULL 10/28] hw/arm/virt: Register two redistributor regions when necessary
+[PULL 63/68] target/arm: Remove fp_status_a64
-From: Eric Auger <eric.auger@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-With a VGICv3 KVM device, if the number of vcpus exceeds the
+Replace with fp_status[FPST_A64].
 capacity of the legacy redistributor region (123 redistributors),
 we now attempt to register a second redistributor region. Up to
 512 redistributors can fit in the latter, on top of the 123 allowed
 by the legacy redistributor region.
-Registering this second redistributor region is possible if the
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-host kernel supports the following VGICv3 KVM device group/attribute:
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-KVM_DEV_ARM_VGIC_GRP_ADDR/KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION.
+Message-id: 20250129013857.135256-14-richard.henderson@linaro.org
 In case the host kernel does not support the registration of several
 redistributor regions and the requested number of vcpus exceeds the
 capacity of the legacy redistributor region, the GICv3 device
 initialization fails with a proper error message and qemu exits.
 At the moment the max number of vcpus still is capped by the
 virt machine class max_cpus.
 Signed-off-by: Eric Auger <eric.auger@redhat.com>
 Reviewed-by: Andrew Jones <drjones@redhat.com>
 Message-id: 1529072910-16156-8-git-send-email-eric.auger@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/virt.c | 18 +++++++++++++++++-
+ target/arm/cpu.h            |  1 -
-file changed, 17 insertions(+), 1 deletion(-)
+ target/arm/cpu.c            |  2 +-
  target/arm/tcg/sme_helper.c |  2 +-
  target/arm/tcg/vec_helper.c | 10 +++++-----
  target/arm/vfp_helper.c     | 16 ++++++++--------
 files changed, 15 insertions(+), 16 deletions(-)
-diff --git a/hw/arm/virt.c b/hw/arm/virt.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/virt.c
+--- a/target/arm/cpu.h
-+++ b/hw/arm/virt.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-     SysBusDevice *gicbusdev;
+             float_status fp_status[FPST_COUNT];
-     const char *gictype;
+             struct {
-     int type = vms->gic_version, i;
+                 float_status fp_status_a32;
-+    uint32_t nb_redist_regions = 0;
+-                float_status fp_status_a64;
+             };
-     gictype = (type == 3) ? gicv3_class_name() : gic_class_name();
+         };
-@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-                     vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
+index XXXXXXX..XXXXXXX 100644
-         uint32_t redist0_count = MIN(smp_cpus, redist0_capacity);
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
--        qdev_prop_set_uint32(gicdev, "len-redist-region-count", 1);
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-+        nb_redist_regions = virt_gicv3_redist_region_count(vms);
+     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
-+
+     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
-+        qdev_prop_set_uint32(gicdev, "len-redist-region-count",
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
-+                             nb_redist_regions);
+-    arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
-         qdev_prop_set_uint32(gicdev, "redist-region-count[0]", redist0_count);
++    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
-+
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
-+        if (nb_redist_regions == 2) {
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
-+            uint32_t redist1_capacity =
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
-+                        vms->memmap[VIRT_GIC_REDIST2].size / GICV3_REDIST_SIZE;
+diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
-+
+index XXXXXXX..XXXXXXX 100644
-+            qdev_prop_set_uint32(gicdev, "redist-region-count[1]",
+--- a/target/arm/tcg/sme_helper.c
-+                MIN(smp_cpus - redist0_count, redist1_capacity));
++++ b/target/arm/tcg/sme_helper.c
-+        }
+@@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
       * round-to-odd -- see above.
       */
      fpst_f16 = env->vfp.fp_status[FPST_A64_F16];
 -    fpst_std = env->vfp.fp_status_a64;
 +    fpst_std = env->vfp.fp_status[FPST_A64];
      set_default_nan_mode(true, &fpst_std);
      set_default_nan_mode(true, &fpst_f16);
      fpst_odd = fpst_std;
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
              negx = 0x8000800080008000ull;
          }
      }
-     qdev_init_nofail(gicdev);
+-    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
-     gicbusdev = SYS_BUS_DEVICE(gicdev);
++    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-     sysbus_mmio_map(gicbusdev, 0, vms->memmap[VIRT_GIC_DIST].base);
+              get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
-     if (type == 3) {
+ }
-         sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_REDIST].base);
-+        if (nb_redist_regions == 2) {
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-+            sysbus_mmio_map(gicbusdev, 2, vms->memmap[VIRT_GIC_REDIST2].base);
+     intptr_t i, oprsz = simd_oprsz(desc);
-+        }
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-     } else {
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-         sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_CPU].base);
+-    float_status *status = &env->vfp.fp_status_a64;
 +    float_status *status = &env->vfp.fp_status[FPST_A64];
      bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
      int negx = 0, negf = 0;
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
              negx = 0x8000800080008000ull;
          }
      }
-@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
+-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
 +    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
                   get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
  }
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
      intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
 -    float_status *status = &env->vfp.fp_status_a64;
 +    float_status *status = &env->vfp.fp_status[FPST_A64];
      bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
      int negx = 0, negf = 0;
@@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
       */
-     if (vms->gic_version == 3) {
+     bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
-         virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
-+        virt_max_cpus += vms->memmap[VIRT_GIC_REDIST2].size / GICV3_REDIST_SIZE;
+-    *statusp = is_a64(env) ? env->vfp.fp_status_a64 : env->vfp.fp_status_a32;
-     } else {
++    *statusp = is_a64(env) ? env->vfp.fp_status[FPST_A64] : env->vfp.fp_status_a32;
-         virt_max_cpus = GIC_NCPU;
+     set_default_nan_mode(true, statusp);
      if (ebf) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
            & ~float_flag_input_denormal_flushed);
 -    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
 +    a64_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_A64]);
      a64_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A64_F16])
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
      /*
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
       * be the architecturally up-to-date exception flag information first.
       */
      set_float_exception_flags(0, &env->vfp.fp_status_a32);
 -    set_float_exception_flags(0, &env->vfp.fp_status_a64);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
              break;
          }
          set_float_rounding_mode(i, &env->vfp.fp_status_a32);
 -        set_float_rounding_mode(i, &env->vfp.fp_status_a64);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
      if (changed & FPCR_FZ) {
          bool ftz_enabled = val & FPCR_FZ;
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 -        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]);
          /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
           */
          bool fitz_enabled = (val & FPCR_FIZ) ||
              (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
 -        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
 +        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status[FPST_A64]);
      }
      if (changed & FPCR_DN) {
          bool dnan_enabled = val & FPCR_DN;
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          if (ah_enabled) {
              /* Change behaviours for A64 FP operations */
 -            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
 +            arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
              arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          } else {
 -            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 +            arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
              arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          }
      }
 --
-.17.1
+.34.1

-[Qemu-devel] [PULL 14/28] hw/arm/virt: Increase max_cpus to 512
+[PULL 64/68] target/arm: Remove fp_status_a32
-From: Eric Auger <eric.auger@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-virt 3.0 now allows up to 512 vcpus whereas for earlier machine
+Replace with fp_status[FPST_A32].  As this was the last of the
-types, max_cpus was set to 255 and any attempt to start the
+old structures, we can remove the anonymous union and struct.
 machine with vcpus > 255 was rejected at a very early stage,
 in vl.c/main level.
-512 is the max supported by KVM. Anyway the actual vcpu count
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-that can be achieved depends on other parameters such as the
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-acceleration mode, the vgic version, the host kernel version.
+Message-id: 20250129013857.135256-15-richard.henderson@linaro.org
-Those are discovered later on.
+[PMM: tweak to account for change to is_ebf()]
 Signed-off-by: Eric Auger <eric.auger@redhat.com>
 Reviewed-by: Andrew Jones <drjones@redhat.com>
 Message-id: 1529072910-16156-12-git-send-email-eric.auger@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
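The layout change described in this commit message can be sketched in isolation. This is a minimal, self-contained illustration, not the QEMU definitions: `float_status` is a two-field stand-in, the enum is a hypothetical two-entry subset, and `vfp_old`, `vfp_new`, `check_layouts` and `alias_roundtrip` are names invented for the example. It assumes `FPST_A32` is index 0, which is what would let the named member alias the array slot in the transitional layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for float_status; the real type has more fields. */
typedef struct {
    int rounding_mode;
    int exception_flags;
} float_status;

/* Hypothetical subset of the index enum; assumes FPST_A32 comes first. */
enum { FPST_A32, FPST_A64, FPST_COUNT };

/* Transitional layout: the last named member overlays array slot 0. */
struct vfp_old {
    union {
        float_status fp_status[FPST_COUNT];
        struct {
            float_status fp_status_a32;
        };
    };
};

/* Final layout: only the indexed array remains. */
struct vfp_new {
    float_status fp_status[FPST_COUNT];
};

/* Both layouts occupy the same storage, so removing the union is a rename. */
void check_layouts(void)
{
    assert(offsetof(struct vfp_old, fp_status_a32) ==
           offsetof(struct vfp_old, fp_status));
    assert(sizeof(struct vfp_old) == sizeof(struct vfp_new));
}

/* Write through the old name, read back through the index. */
int alias_roundtrip(int mode)
{
    struct vfp_old v;
    memset(&v, 0, sizeof(v));
    v.fp_status_a32.rounding_mode = mode;
    return v.fp_status[FPST_A32].rounding_mode;
}
```

Under these assumptions, `fp_status_a32` and `fp_status[FPST_A32]` name the same storage, so dropping the union changes only how the slot is spelled.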
- hw/arm/virt.c | 7 ++++---
+ target/arm/cpu.h            |  7 +------
-file changed, 4 insertions(+), 3 deletions(-)
+ target/arm/cpu.c            |  2 +-
  target/arm/tcg/vec_helper.c |  2 +-
  target/arm/vfp_helper.c     | 18 +++++++++---------
 files changed, 12 insertions(+), 17 deletions(-)
-diff --git a/hw/arm/virt.c b/hw/arm/virt.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/virt.c
+--- a/target/arm/cpu.h
-+++ b/hw/arm/virt.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-     HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
+         uint32_t scratch[8];
-     mc->init = machvirt_init;
+         /* There are a number of distinct float control structures. */
--    /* Start max_cpus at the maximum QEMU supports. We'll further restrict
+-        union {
--     * it later in machvirt_init, where we have more information about the
+-            float_status fp_status[FPST_COUNT];
-+    /* Start with max_cpus set to 512, which is the maximum supported by KVM.
+-            struct {
-+     * The value may be reduced later when we have more information about the
+-                float_status fp_status_a32;
-      * configuration of the particular instance.
+-            };
 -        };
 +        float_status fp_status[FPST_COUNT];
          uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
          uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
      set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
      set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
 -    arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
       */
--    mc->max_cpus = 255;
+     bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
-+    mc->max_cpus = 512;
-     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_CALXEDA_XGMAC);
+-    *statusp = is_a64(env) ? env->vfp.fp_status[FPST_A64] : env->vfp.fp_status_a32;
-     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
++    *statusp = env->vfp.fp_status[is_a64(env) ? FPST_A64 : FPST_A32];
-     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
+     set_default_nan_mode(true, statusp);
-@@ -XXX,XX +XXX,XX @@ static void virt_machine_2_12_options(MachineClass *mc)
-     virt_machine_3_0_options(mc);
+     if (ebf) {
-     SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_12);
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
-     vmc->no_highmem_ecam = true;
+index XXXXXXX..XXXXXXX 100644
-+    mc->max_cpus = 255;
+--- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
  {
      uint32_t a32_flags = 0, a64_flags = 0;
 -    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
 +    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_A32]);
      a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
      /* FZ16 does not generate an input denormal exception.  */
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A32_F16])
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
       * values. The caller should have arranged for env->vfp.fpsr to
       * be the architecturally up-to-date exception flag information first.
       */
 -    set_float_exception_flags(0, &env->vfp.fp_status_a32);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
              i = float_round_to_zero;
              break;
          }
 -        set_float_rounding_mode(i, &env->vfp.fp_status_a32);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
      }
      if (changed & FPCR_FZ) {
          bool ftz_enabled = val & FPCR_FZ;
 -        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]);
          /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
      }
      if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
          /*
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
      }
      if (changed & FPCR_DN) {
          bool dnan_enabled = val & FPCR_DN;
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
          FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
  }
- DEFINE_VIRT_MACHINE(2, 12)
+ DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
 -DO_VFP_cmp(s, float32, float32, fp_status_a32)
 -DO_VFP_cmp(d, float64, float64, fp_status_a32)
 +DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
 +DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
  #undef DO_VFP_cmp
  /* Integer to float and float to integer conversions */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(fjcvtzs)(float64 value, float_status *status)
  uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
  {
 -    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status_a32);
 +    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status[FPST_A32]);
      uint32_t result = pair;
      uint32_t z = (pair >> 32) == 0;
 --
-.17.1
+.34.1

-[Qemu-devel] [PULL 08/28] hw/arm/virt: GICv3 DT node with one or two redistributor regions
+[PULL 65/68] target/arm: Simplify fp_status indexing in mve_helper.c
-From: Eric Auger <eric.auger@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-This patch allows the creation of a GICv3 node with 1 or 2
+Select on index instead of pointer.
-redistributor regions depending on the number of smp_cpus.
+No functional change.
 The second redistributor region is located just after the
 existing RAM region, at 256GB and contains up to 512 vcpus.
-Please refer to kernel documentation for further node details:
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20250129013857.135256-16-richard.henderson@linaro.org
 Signed-off-by: Eric Auger <eric.auger@redhat.com>
 Reviewed-by: Andrew Jones <drjones@redhat.com>
 Message-id: 1529072910-16156-6-git-send-email-eric.auger@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
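The "select on index instead of pointer" rewrite can be checked in isolation. This is a minimal sketch under stated assumptions, not the MVE helper code: `float_status` and `vfp_state` are stand-in types, the enum is a hypothetical two-entry subset, and `select_by_pointer`, `select_by_index` and `check_selection` are illustrative names. It shows that choosing between two element addresses and choosing between two indices before taking the address give the same pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for float_status. */
typedef struct {
    int exception_flags;
} float_status;

/* Hypothetical subset of the index enum. */
enum { FPST_STD, FPST_STD_F16, FPST_COUNT };

/* Hypothetical container holding the indexed status array. */
typedef struct {
    float_status fp_status[FPST_COUNT];
} vfp_state;

/* Old form: the conditional picks between two element addresses. */
float_status *select_by_pointer(vfp_state *vfp, int esize)
{
    return (esize == 2) ? &vfp->fp_status[FPST_STD_F16]
                        : &vfp->fp_status[FPST_STD];
}

/* New form: the conditional picks the index, then the address is taken once. */
float_status *select_by_index(vfp_state *vfp, int esize)
{
    return &vfp->fp_status[esize == 2 ? FPST_STD_F16 : FPST_STD];
}

/* Both forms yield the same pointer for half and single precision sizes. */
void check_selection(void)
{
    vfp_state v;
    assert(select_by_pointer(&v, 2) == select_by_index(&v, 2));
    assert(select_by_pointer(&v, 4) == select_by_index(&v, 4));
}
```

The index form is shorter, which matters inside the backslash-continued macros touched by this patch, and it is consistent with the commit message's "no functional change".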
- include/hw/arm/virt.h | 14 ++++++++++++++
+ target/arm/tcg/mve_helper.c | 40 +++++++++++++------------------------
- hw/arm/virt.c         | 29 ++++++++++++++++++++++++-----
+file changed, 14 insertions(+), 26 deletions(-)
 files changed, 38 insertions(+), 5 deletions(-)
-diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
+diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/virt.h
+--- a/target/arm/tcg/mve_helper.c
-+++ b/include/hw/arm/virt.h
++++ b/target/arm/tcg/mve_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
- #include "qemu/notify.h"
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
- #include "hw/boards.h"
+                 continue;                                               \
- #include "hw/arm/arm.h"
+             }                                                           \
-+#include "sysemu/kvm.h"
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-+#include "hw/intc/arm_gicv3_common.h"
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
- #define NUM_GICV2M_SPIS       64
+             if (!(mask & 1)) {                                          \
- #define NUM_VIRTIO_TRANSPORTS 32
+                 /* We need the result but without updating flags */     \
-@@ -XXX,XX +XXX,XX @@ enum {
+                 scratch_fpst = *fpst;                                   \
-     VIRT_GIC_V2M,
+@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
-     VIRT_GIC_ITS,
+                 r[e] = 0;                                               \
-     VIRT_GIC_REDIST,
+                 continue;                                               \
-+    VIRT_GIC_REDIST2,
+             }                                                           \
-     VIRT_SMMU,
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-     VIRT_UART,
+-                &env->vfp.fp_status[FPST_STD];                           \
-     VIRT_MMIO,
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
-@@ -XXX,XX +XXX,XX @@ typedef struct {
+             if (!(tm & 1)) {                                            \
+                 /* We need the result but without updating flags */     \
- void virt_acpi_setup(VirtMachineState *vms);
+                 scratch_fpst = *fpst;                                   \
+@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
-+/* Return the number of used redistributor regions  */
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
-+static inline int virt_gicv3_redist_region_count(VirtMachineState *vms)
+                 continue;                                               \
-+{
+             }                                                           \
-+    uint32_t redist0_capacity =
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-+                vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
+-                &env->vfp.fp_status[FPST_STD];                           \
-+
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
-+    assert(vms->gic_version == 3);
+             if (!(mask & 1)) {                                          \
-+
+                 /* We need the result but without updating flags */     \
-+    return vms->smp_cpus > redist0_capacity ? 2 : 1;
+                 scratch_fpst = *fpst;                                   \
-+}
+@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
-+
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
- #endif /* QEMU_ARM_VIRT_H */
+                 continue;                                               \
-diff --git a/hw/arm/virt.c b/hw/arm/virt.c
+             }                                                           \
-index XXXXXXX..XXXXXXX 100644
+-            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
---- a/hw/arm/virt.c
+-                &env->vfp.fp_status[FPST_STD];                           \
-+++ b/hw/arm/virt.c
++            fpst0 = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
-@@ -XXX,XX +XXX,XX @@ static const MemMapEntry a15memmap[] = {
+             fpst1 = fpst0;                                              \
-     [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
+             if (!(mask & 1)) {                                          \
-     [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
+                 scratch_fpst = *fpst0;                                  \
-     [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
+@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
-+    /* Additional 64 MB redist region (can contain up to 512 redistributors) */
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
-+    [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
+                 continue;                                               \
-     /* Second PCIe window, 512GB wide at the 512GB boundary */
+             }                                                           \
-     [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
- };
+-                &env->vfp.fp_status[FPST_STD];                           \
-@@ -XXX,XX +XXX,XX @@ static void fdt_add_gic_node(VirtMachineState *vms)
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
-     qemu_fdt_setprop_cell(vms->fdt, "/intc", "#size-cells", 0x2);
+             if (!(mask & 1)) {                                          \
-     qemu_fdt_setprop(vms->fdt, "/intc", "ranges", NULL, 0);
+                 /* We need the result but without updating flags */     \
-     if (vms->gic_version == 3) {
+                 scratch_fpst = *fpst;                                   \
-+        int nb_redist_regions = virt_gicv3_redist_region_count(vms);
+@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
-+
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
-         qemu_fdt_setprop_string(vms->fdt, "/intc", "compatible",
+                 continue;                                               \
-                                 "arm,gic-v3");
+             }                                                           \
--        qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
--                                     2, vms->memmap[VIRT_GIC_DIST].base,
+-                &env->vfp.fp_status[FPST_STD];                           \
--                                     2, vms->memmap[VIRT_GIC_DIST].size,
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
--                                     2, vms->memmap[VIRT_GIC_REDIST].base,
+             if (!(mask & 1)) {                                          \
--                                     2, vms->memmap[VIRT_GIC_REDIST].size);
+                 /* We need the result but without updating flags */     \
-+
+                 scratch_fpst = *fpst;                                   \
-+        qemu_fdt_setprop_cell(vms->fdt, "/intc",
+@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
-+                              "#redistributor-regions", nb_redist_regions);
+         unsigned e;                                             \
-+
+         TYPE *m = vm;                                           \
-+        if (nb_redist_regions == 1) {
+         TYPE ra = (TYPE)ra_in;                                  \
-+            qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
+-        float_status *fpst = (ESIZE == 2) ?                     \
-+                                         2, vms->memmap[VIRT_GIC_DIST].base,
+-            &env->vfp.fp_status[FPST_STD_F16] :                 \
-+                                         2, vms->memmap[VIRT_GIC_DIST].size,
+-            &env->vfp.fp_status[FPST_STD];                       \
-+                                         2, vms->memmap[VIRT_GIC_REDIST].base,
++        float_status *fpst =                                    \
-+                                         2, vms->memmap[VIRT_GIC_REDIST].size);
++            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
-+        } else {
+         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
-+            qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
+             if (mask & 1) {                                     \
-+                                         2, vms->memmap[VIRT_GIC_DIST].base,
+                 TYPE v = m[H##ESIZE(e)];                        \
-+                                         2, vms->memmap[VIRT_GIC_DIST].size,
+@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
-+                                         2, vms->memmap[VIRT_GIC_REDIST].base,
+             if ((mask & emask) == 0) {                                  \
-+                                         2, vms->memmap[VIRT_GIC_REDIST].size,
+                 continue;                                               \
-+                                         2, vms->memmap[VIRT_GIC_REDIST2].base,
+             }                                                           \
-+                                         2, vms->memmap[VIRT_GIC_REDIST2].size);
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-+        }
+-                &env->vfp.fp_status[FPST_STD];                           \
-+
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
-         if (vms->virt) {
+             if (!(mask & (1 << (e * ESIZE)))) {                         \
-             qemu_fdt_setprop_cells(vms->fdt, "/intc", "interrupts",
+                 /* We need the result but without updating flags */     \
-                                    GIC_FDT_IRQ_TYPE_PPI, ARCH_GICV3_MAINT_IRQ,
+                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
              if ((mask & emask) == 0) {                                  \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
          unsigned e;                                                     \
          float_status *fpst;                                             \
          float_status scratch_fpst;                                      \
 -        float_status *base_fpst = (ESIZE == 2) ?                        \
 -            &env->vfp.fp_status[FPST_STD_F16] :                         \
 -            &env->vfp.fp_status[FPST_STD];                               \
 +        float_status *base_fpst =                                       \
 +            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD];  \
          uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
          set_float_rounding_mode(rmode, base_fpst);                      \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
 --
-.17.1
+.34.1

-[Qemu-devel] [PULL 09/28] hw/arm/virt-acpi-build: Advertise one or two GICR structures
+[PULL 66/68] target/arm: Simplify DO_VFP_cmp in vfp_helper.c
-From: Eric Auger <eric.auger@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Depending on the number of smp_cpus we now register one or two
+Pass ARMFPStatusFlavour index instead of fp_status[FOO].
 GICR structures.
-Signed-off-by: Eric Auger <eric.auger@redhat.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Andrew Jones <drjones@redhat.com>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Message-id: 1529072910-16156-7-git-send-email-eric.auger@redhat.com
+Message-id: 20250129013857.135256-17-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/virt-acpi-build.c | 9 +++++++++
+ target/arm/vfp_helper.c | 10 +++++-----
-file changed, 9 insertions(+)
+file changed, 5 insertions(+), 5 deletions(-)
-diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/virt-acpi-build.c
+--- a/target/arm/vfp_helper.c
-+++ b/hw/arm/virt-acpi-build.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
+@@ -XXX,XX +XXX,XX @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
+ void VFP_HELPER(cmp, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env)  \
-     if (vms->gic_version == 3) {
+ { \
-         AcpiMadtGenericTranslator *gic_its;
+     softfloat_to_vfp_compare(env, \
-+        int nb_redist_regions = virt_gicv3_redist_region_count(vms);
+-        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.FPST)); \
-         AcpiMadtGenericRedistributor *gicr = acpi_data_push(table_data,
++        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.fp_status[FPST])); \
-                                                          sizeof *gicr);
+ } \
+ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
-@@ -XXX,XX +XXX,XX @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
+ { \
-         gicr->base_address = cpu_to_le64(memmap[VIRT_GIC_REDIST].base);
+     softfloat_to_vfp_compare(env, \
-         gicr->range_length = cpu_to_le32(memmap[VIRT_GIC_REDIST].size);
+-        FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
++        FLOATTYPE ## _compare(a, b, &env->vfp.fp_status[FPST])); \
-+        if (nb_redist_regions == 2) {
+ }
-+            gicr = acpi_data_push(table_data, sizeof(*gicr));
+-DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
-+            gicr->type = ACPI_APIC_GENERIC_REDISTRIBUTOR;
+-DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
-+            gicr->length = sizeof(*gicr);
+-DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
-+            gicr->base_address = cpu_to_le64(memmap[VIRT_GIC_REDIST2].base);
++DO_VFP_cmp(h, float16, dh_ctype_f16, FPST_A32_F16)
-+            gicr->range_length = cpu_to_le32(memmap[VIRT_GIC_REDIST2].size);
++DO_VFP_cmp(s, float32, float32, FPST_A32)
-+        }
++DO_VFP_cmp(d, float64, float64, FPST_A32)
-+
+ #undef DO_VFP_cmp
-         if (its_class_name() && !vmc->no_its) {
-             gic_its = acpi_data_push(table_data, sizeof *gic_its);
+ /* Integer to float and float to integer conversions */
              gic_its->type = ACPI_APIC_GENERIC_TRANSLATOR;
 --
-.17.1
+.34.1
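
The macro change above can be sketched in isolation. This is a minimal
standalone illustration, not the QEMU code: MockVfp, MockFlavour,
STATUS_OLD and STATUS_NEW are hypothetical names invented for the
example. It only shows that passing the index and letting the macro
supply "fp_status[...]" selects the same element as pasting the whole
member expression.

```c
#include <assert.h>

/* Mock flavour index, standing in for ARMFPStatusFlavour. */
typedef enum { MOCK_FPST_A32, MOCK_FPST_A32_F16, MOCK_FPST_COUNT } MockFlavour;

typedef struct { int id; } MockFloatStatus;
typedef struct { MockFloatStatus fp_status[MOCK_FPST_COUNT]; } MockVfp;

/* Old shape: the argument is a whole member expression placed after "->". */
#define STATUS_OLD(vfp, FPST) (&(vfp)->FPST)
/* New shape: the argument is only the flavour index. */
#define STATUS_NEW(vfp, FPST) (&(vfp)->fp_status[FPST])

/* Both spellings resolve to the same array element. */
static void check_same_status(MockVfp *vfp)
{
    assert(STATUS_OLD(vfp, fp_status[MOCK_FPST_A32_F16]) ==
           STATUS_NEW(vfp, MOCK_FPST_A32_F16));
}
```

The second form keeps the array name in one place, so call sites only
name the flavour they want.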

-[Qemu-devel] [PULL 02/28] target/arm: Minor cleanup for ARMv6-M 32-bit instructions
+[PULL 67/68] target/arm: Read fz16 from env->vfp.fpcr
-From: Julia Suvorova <jusual@mail.ru>
+From: Richard Henderson <richard.henderson@linaro.org>
-The arrays were made static, and the "if" was simplified because V7M
+Read the bit from the source, rather than from the proxy via
-and V8M define the V6 feature.
+get_flush_inputs_to_zero.  This makes it clear that it does
 not matter which of the float_status structures is used.
-Signed-off-by: Julia Suvorova <jusual@mail.ru>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
+Message-id: 20250129013857.135256-34-richard.henderson@linaro.org
 Message-id: 20180618214604.6777-1-jusual@mail.ru
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 27 +++++++++++++--------------
+ target/arm/tcg/vec_helper.c | 12 ++++++------
-file changed, 13 insertions(+), 14 deletions(-)
+file changed, 6 insertions(+), 6 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
-         !arm_dc_feature(s, ARM_FEATURE_V7)) {
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
-         int i;
-         bool found = false;
+     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
--        const uint32_t armv6m_insn[] = {0xf3808000 /* msr */,
+-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
--                                        0xf3b08040 /* dsb */,
++             env->vfp.fpcr & FPCR_FZ16);
--                                        0xf3b08050 /* dmb */,
+ }
--                                        0xf3b08060 /* isb */,
--                                        0xf3e08000 /* mrs */,
+ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
--                                        0xf000d000 /* bl */};
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
--        const uint32_t armv6m_mask[] = {0xffe0d000,
+         }
--                                        0xfff0d0f0,
+     }
--                                        0xfff0d0f0,
+     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
--                                        0xfff0d0f0,
+-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
--                                        0xffe0d000,
++             env->vfp.fpcr & FPCR_FZ16);
--                                        0xf800d000};
+ }
-+        static const uint32_t armv6m_insn[] = {0xf3808000 /* msr */,
-+                                               0xf3b08040 /* dsb */,
+ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-+                                               0xf3b08050 /* dmb */,
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-+                                               0xf3b08060 /* isb */,
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-+                                               0xf3e08000 /* mrs */,
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-+                                               0xf000d000 /* bl */};
+     float_status *status = &env->vfp.fp_status[FPST_A64];
-+        static const uint32_t armv6m_mask[] = {0xffe0d000,
+-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
-+                                               0xfff0d0f0,
++    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
-+                                               0xfff0d0f0,
+     int negx = 0, negf = 0;
-+                                               0xfff0d0f0,
-+                                               0xffe0d000,
+     if (is_s) {
-+                                               0xf800d000};
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
-         for (i = 0; i < ARRAY_SIZE(armv6m_insn); i++) {
-             if ((insn & armv6m_mask[i]) == armv6m_insn[i]) {
+     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
+-                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
-                         break;
++                 env->vfp.fpcr & FPCR_FZ16);
-                     case 3: /* Special control operations.  */
+ }
-                         if (!arm_dc_feature(s, ARM_FEATURE_V7) &&
--                            !(arm_dc_feature(s, ARM_FEATURE_V6) &&
+ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
--                              arm_dc_feature(s, ARM_FEATURE_M))) {
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-+                            !arm_dc_feature(s, ARM_FEATURE_M)) {
+         }
-                             goto illegal_op;
+     }
-                         }
+     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-                         op = (insn >> 4) & 0xf;
+-                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
 +                 env->vfp.fpcr & FPCR_FZ16);
  }
  void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
      intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
      float_status *status = &env->vfp.fp_status[FPST_A64];
 -    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
 +    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
      int negx = 0, negf = 0;
      if (is_s) {
 --
-.17.1
+.34.1
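
As a rough sketch of the "source versus proxy" point in the commit
message: the types and functions below are mocks invented for this
example, not the QEMU definitions. The sketch assumes that a write to
the control register mirrors the FZ16 bit into every half-precision
status, which is why any one of them could serve as the proxy. FPCR_FZ16
is defined here as bit 19, matching the Arm FPCR layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCR_FZ16 (1u << 19)   /* FZ16 is bit 19 of the Arm FPCR */

/* Mock types, not the QEMU definitions. */
typedef struct {
    bool flush_inputs_to_zero;
} MockFloatStatus;

enum { MOCK_A32_F16, MOCK_A64_F16, MOCK_NUM_F16 };

typedef struct {
    uint32_t fpcr;
    MockFloatStatus f16_status[MOCK_NUM_F16];
} MockVfp;

/* Assumption: an FPCR write copies FZ16 into each half-precision status. */
static void mock_set_fpcr(MockVfp *vfp, uint32_t val)
{
    int i;

    vfp->fpcr = val;
    for (i = 0; i < MOCK_NUM_F16; i++) {
        vfp->f16_status[i].flush_inputs_to_zero = (val & FPCR_FZ16) != 0;
    }
}

/* Old style: read the proxy, which forces the caller to pick a status. */
static bool fz16_from_proxy(const MockVfp *vfp, int which)
{
    assert(which >= 0 && which < MOCK_NUM_F16);
    return vfp->f16_status[which].flush_inputs_to_zero;
}

/* New style: read the bit from the register it came from. */
static bool fz16_from_fpcr(const MockVfp *vfp)
{
    return vfp->fpcr & FPCR_FZ16;
}
```

Under that assumption both readers always agree, but only the second
one makes it obvious that the choice of float_status is irrelevant.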

-[Qemu-devel] [PULL 01/28] hw/intc/arm_gicv3: fix an extra left-shift when reading IPRIORITYR
+[PULL 68/68] target/arm: Sink fp_status and fpcr access into do_fmlal*
-From: Amol Surati <suratiamol@gmail.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-When either GICD_IPRIORITYR or GICR_IPRIORITYR is read as a 32-bit
+Sink common code from the callers into do_fmlal
-register, the post left-shift operator in the for loop causes an
+and do_fmlal_idx.  Reorder the arguments to minimize
-extra shift after the least significant byte has been placed.
+the re-sorting from the caller's arguments.
-The 32-bit value actually returned is therefore the expected value
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-shifted left by 8 bits.
+Message-id: 20250129013857.135256-35-richard.henderson@linaro.org
 Signed-off-by: Amol Surati <suratiamol@gmail.com>
 Message-id: 20180614054857.26248-1-suratiamol@gmail.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/intc/arm_gicv3_dist.c   | 3 ++-
+ target/arm/tcg/vec_helper.c | 28 ++++++++++++++++------------
- hw/intc/arm_gicv3_redist.c | 3 ++-
+file changed, 16 insertions(+), 12 deletions(-)
 files changed, 4 insertions(+), 2 deletions(-)
-diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/arm_gicv3_dist.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/hw/intc/arm_gicv3_dist.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static MemTxResult gicd_readl(GICv3State *s, hwaddr offset,
+@@ -XXX,XX +XXX,XX @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
-         int i, irq = offset - GICD_IPRIORITYR;
+  * as there is not yet SVE versions that might use blocking.
-         uint32_t value = 0;
+  */
--        for (i = irq + 3; i >= irq; i--, value <<= 8) {
+-static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
-+        for (i = irq + 3; i >= irq; i--) {
+-                     uint64_t negx, int negf, uint32_t desc, bool fz16)
-+            value <<= 8;
++static void do_fmlal(float32 *d, void *vn, void *vm,
-             value |= gicd_read_ipriorityr(s, attrs, i);
++                     CPUARMState *env, uint32_t desc,
 +                     ARMFPStatusFlavour fpst_idx,
 +                     uint64_t negx, int negf)
  {
 +    float_status *fpst = &env->vfp.fp_status[fpst_idx];
 +    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
      intptr_t i, oprsz = simd_oprsz(desc);
      int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      int is_q = oprsz == 16;
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 -    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -             env->vfp.fpcr & FPCR_FZ16);
 +    do_fmlal(vd, vn, vm, env, desc, FPST_STD, negx, 0);
  }
  void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
              negx = 0x8000800080008000ull;
          }
-         *data = value;
+     }
-diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
+-    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-index XXXXXXX..XXXXXXX 100644
+-             env->vfp.fpcr & FPCR_FZ16);
---- a/hw/intc/arm_gicv3_redist.c
++    do_fmlal(vd, vn, vm, env, desc, FPST_A64, negx, negf);
-+++ b/hw/intc/arm_gicv3_redist.c
+ }
-@@ -XXX,XX +XXX,XX @@ static MemTxResult gicr_readl(GICv3CPUState *cs, hwaddr offset,
-         int i, irq = offset - GICR_IPRIORITYR;
+ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-         uint32_t value = 0;
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
+     }
--        for (i = irq + 3; i >= irq; i--, value <<= 8) {
+ }
-+        for (i = irq + 3; i >= irq; i--) {
-+            value <<= 8;
+-static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
-             value |= gicr_read_ipriorityr(cs, attrs, i);
+-                         uint64_t negx, int negf, uint32_t desc, bool fz16)
 +static void do_fmlal_idx(float32 *d, void *vn, void *vm,
 +                         CPUARMState *env, uint32_t desc,
 +                         ARMFPStatusFlavour fpst_idx,
 +                         uint64_t negx, int negf)
  {
 +    float_status *fpst = &env->vfp.fp_status[fpst_idx];
 +    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
      intptr_t i, oprsz = simd_oprsz(desc);
      int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 -    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -                 env->vfp.fpcr & FPCR_FZ16);
 +    do_fmlal_idx(vd, vn, vm, env, desc, FPST_STD, negx, 0);
  }
  void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
              negx = 0x8000800080008000ull;
          }
-         *data = value;
+     }
 -    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
 -                 env->vfp.fpcr & FPCR_FZ16);
 +    do_fmlal_idx(vd, vn, vm, env, desc, FPST_A64, negx, negf);
  }
  void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
 --
-.17.1
+.34.1
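
The "sink common code into the callee" refactoring can be sketched as
follows. Everything here is a mock made up for illustration (MockEnv,
MockFlavour, helper_before, helper_after); the arithmetic in
helper_before is arbitrary and exists only so the result is observable.
The point is the shape of the signatures: the callee takes env plus a
flavour index and derives the status pointer and the FZ16 flag itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCR_FZ16 (1u << 19)   /* FZ16 is bit 19 of the Arm FPCR */

typedef enum { MOCK_FPST_STD, MOCK_FPST_A64, MOCK_FPST_COUNT } MockFlavour;

typedef struct { int rounding_mode; } MockFloatStatus;

typedef struct {
    uint32_t fpcr;
    MockFloatStatus fp_status[MOCK_FPST_COUNT];
} MockEnv;

/* Before: every caller looks up the status and the FZ16 bit itself. */
static int helper_before(MockFloatStatus *fpst, bool fz16)
{
    /* Arbitrary combination, only to make the inputs visible in the result. */
    return fpst->rounding_mode * 2 + (fz16 ? 1 : 0);
}

/* After: the callee derives both values from env and a flavour index. */
static int helper_after(MockEnv *env, MockFlavour idx)
{
    MockFloatStatus *fpst;
    bool fz16;

    assert(idx < MOCK_FPST_COUNT);
    fpst = &env->fp_status[idx];
    fz16 = env->fpcr & FPCR_FZ16;

    return helper_before(fpst, fz16);
}
```

Each call site then shrinks to passing env and one enum value, as the
do_fmlal() callers do in the diff above.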

Arm queue. I still have a lot of stuff in my to-review queue, so
it won't be long until the next one.

I've thrown in a couple of minor non-arm patches (a xen code
cleanup and a vl.c codestyle issue).

thanks
-- PMM

The following changes since commit de44c044420d1139480fa50c2d5be19223391218:

Merge remote-tracking branch 'remotes/stsquad/tags/pull-tcg-testing-revivial-210618-2' into staging (2018-06-22 10:57:47 +0100)

are available in the Git repository at:

git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180622

for you to fetch changes up to 6dad8260e82b69bd278685ee25209f5824360455:

xen: Don't use memory_region_init_ram_nomigrate() in pci_assign_dev_load_option_rom() (2018-06-22 13:28:42 +0100)

----------------------------------------------------------------
target-arm queue:
 * hw/intc/arm_gicv3: fix wrong values when reading IPRIORITYR
 * target/arm: fix read of freed memory in kvm_arm_machine_init_done()
 * virt: support up to 512 CPUs
 * virt: support 256MB ECAM PCI region (for more PCI devices)
 * xlnx-zynqmp: Use Cortex-R5F, not Cortex-R5
 * mps2-tz: Implement and use the TrustZone Memory Protection Controller
 * target/arm: enforce alignment checking for v6M cores
 * xen: Don't use memory_region_init_ram_nomigrate() in pci_assign_dev_load_option_rom()
 * vl.c: Don't zero-initialize statics for serial_hds

----------------------------------------------------------------
Amol Surati (1):
      hw/intc/arm_gicv3: fix an extra left-shift when reading IPRIORITYR

Edgar E. Iglesias (2):
      target-arm: Add the Cortex-R5F
      xlnx-zynqmp: Swap Cortex-R5 for Cortex-R5F

Eric Auger (11):
      linux-headers: Update to kernel mainline commit b357bf602
      target/arm: Allow KVM device address overwriting
      hw/intc/arm_gicv3: Introduce redist-region-count array property
      hw/intc/arm_gicv3_kvm: Get prepared to handle multiple redist regions
      hw/arm/virt: GICv3 DT node with one or two redistributor regions
      hw/arm/virt-acpi-build: Advertise one or two GICR structures
      hw/arm/virt: Register two redistributor regions when necessary
      hw/arm/virt: Add a new 256MB ECAM region
      hw/arm/virt: Add virt-3.0 machine type
      hw/arm/virt: Use 256MB ECAM region by default
      hw/arm/virt: Increase max_cpus to 512

Julia Suvorova (3):
      target/arm: Minor cleanup for ARMv6-M 32-bit instructions
      target/arm: Introduce ARM_FEATURE_M_MAIN
      target/arm: Strict alignment for ARMv6-M and ARMv8-M Baseline

Peter Maydell (10):
      hw/misc/tz-mpc.c: Implement the Arm TrustZone Memory Protection Controller
      hw/misc/tz-mpc.c: Implement registers
      hw/misc/tz-mpc.c: Implement correct blocked-access behaviour
      hw/misc/tz_mpc.c: Honour the BLK_LUT settings in translate
      hw/misc/iotkit-secctl.c: Implement SECMPCINTSTATUS
      hw/arm/iotkit: Instantiate MPC
      hw/arm/iotkit: Wire up MPC interrupt lines
      hw/arm/mps2-tz.c: Instantiate MPCs
      vl.c: Don't zero-initialize statics for serial_hds
      xen: Don't use memory_region_init_ram_nomigrate() in pci_assign_dev_load_option_rom()

Zheng Xiang (1):
      target-arm: fix a segmentation fault due to illegal memory access

From: Amol Surati <suratiamol@gmail.com>

When either GICD_IPRIORITYR or GICR_IPRIORITYR is read as a 32-bit
register, the post left-shift operator in the for loop causes an
extra shift after the least significant byte has been placed.

The 32-bit value actually returned is therefore the expected value
shifted left by 8 bits.

Signed-off-by: Amol Surati <suratiamol@gmail.com>
Message-id: 20180614054857.26248-1-suratiamol@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/intc/arm_gicv3_dist.c   | 3 ++-
 hw/intc/arm_gicv3_redist.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -XXX,XX +XXX,XX @@ static MemTxResult gicd_readl(GICv3State *s, hwaddr offset,
         int i, irq = offset - GICD_IPRIORITYR;
         uint32_t value = 0;
 
-        for (i = irq + 3; i >= irq; i--, value <<= 8) {
+        for (i = irq + 3; i >= irq; i--) {
+            value <<= 8;
             value |= gicd_read_ipriorityr(s, attrs, i);
         }
         *data = value;
diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_redist.c
+++ b/hw/intc/arm_gicv3_redist.c
@@ -XXX,XX +XXX,XX @@ static MemTxResult gicr_readl(GICv3CPUState *cs, hwaddr offset,
         int i, irq = offset - GICR_IPRIORITYR;
         uint32_t value = 0;
 
-        for (i = irq + 3; i >= irq; i--, value <<= 8) {
+        for (i = irq + 3; i >= irq; i--) {
+            value <<= 8;
             value |= gicr_read_ipriorityr(cs, attrs, i);
         }
         *data = value;
-- 
2.17.1

From: Julia Suvorova <jusual@mail.ru>

Make the armv6m_insn and armv6m_mask arrays static, and simplify the
"if" condition: V7M and V8M also define the V6 feature, so the separate
ARM_FEATURE_V6 check is redundant.
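
For readers unfamiliar with the pattern, here is a minimal standalone sketch
of the mask/match lookup that those tables drive. The helper name and the
DEMO_ARRAY_SIZE macro are hypothetical; the table values are copied from the
diff in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: true if insn matches one of the 32-bit
 * encodings that the tables accept for ARMv6-M. */
static bool demo_is_armv6m_32bit_insn(uint32_t insn)
{
    /* static const: the tables are built once rather than being
     * re-initialized on the stack on every call. */
    static const uint32_t match[] = {0xf3808000 /* msr */,
                                     0xf3b08040 /* dsb */,
                                     0xf3b08050 /* dmb */,
                                     0xf3b08060 /* isb */,
                                     0xf3e08000 /* mrs */,
                                     0xf000d000 /* bl */};
    static const uint32_t mask[] = {0xffe0d000,
                                    0xfff0d0f0,
                                    0xfff0d0f0,
                                    0xfff0d0f0,
                                    0xffe0d000,
                                    0xf800d000};
    size_t i;

    assert(DEMO_ARRAY_SIZE(match) == DEMO_ARRAY_SIZE(mask));
    for (i = 0; i < DEMO_ARRAY_SIZE(match); i++) {
        if ((insn & mask[i]) == match[i]) {
            return true;
        }
    }
    return false;
}
```

An instruction is accepted when masking off its operand bits leaves exactly
one of the listed opcode patterns.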

Signed-off-by: Julia Suvorova <jusual@mail.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180618214604.6777-1-jusual@mail.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
         !arm_dc_feature(s, ARM_FEATURE_V7)) {
         int i;
         bool found = false;
-        const uint32_t armv6m_insn[] = {0xf3808000 /* msr */,
-                                        0xf3b08040 /* dsb */,
-                                        0xf3b08050 /* dmb */,
-                                        0xf3b08060 /* isb */,
-                                        0xf3e08000 /* mrs */,
-                                        0xf000d000 /* bl */};
-        const uint32_t armv6m_mask[] = {0xffe0d000,
-                                        0xfff0d0f0,
-                                        0xfff0d0f0,
-                                        0xfff0d0f0,
-                                        0xffe0d000,
-                                        0xf800d000};
+        static const uint32_t armv6m_insn[] = {0xf3808000 /* msr */,
+                                               0xf3b08040 /* dsb */,
+                                               0xf3b08050 /* dmb */,
+                                               0xf3b08060 /* isb */,
+                                               0xf3e08000 /* mrs */,
+                                               0xf000d000 /* bl */};
+        static const uint32_t armv6m_mask[] = {0xffe0d000,
+                                               0xfff0d0f0,
+                                               0xfff0d0f0,
+                                               0xfff0d0f0,
+                                               0xffe0d000,
+                                               0xf800d000};
 
         for (i = 0; i < ARRAY_SIZE(armv6m_insn); i++) {
             if ((insn & armv6m_mask[i]) == armv6m_insn[i]) {
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
                         break;
                     case 3: /* Special control operations.  */
                         if (!arm_dc_feature(s, ARM_FEATURE_V7) &&
-                            !(arm_dc_feature(s, ARM_FEATURE_V6) &&
-                              arm_dc_feature(s, ARM_FEATURE_M))) {
+                            !arm_dc_feature(s, ARM_FEATURE_M)) {
                             goto illegal_op;
                         }
                         op = (insn >> 4) & 0xf;
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

Update our kernel headers to mainline commit
b357bf6023a948cf6a9472f07a1b0caac0e4f8e8
("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1529072910-16156-2-git-send-email-eric.auger@redhat.com
[PMM:  clarified commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/standard-headers/linux/pci_regs.h            |  8 ++++++++
 include/standard-headers/linux/virtio_gpu.h          |  1 +
 include/standard-headers/linux/virtio_net.h          |  3 +++
 linux-headers/asm-arm/kvm.h                          |  1 +
 linux-headers/asm-arm/unistd-common.h                |  1 +
 linux-headers/asm-arm64/kvm.h                        |  1 +
 linux-headers/asm-generic/unistd.h                   |  4 +++-
 linux-headers/asm-powerpc/unistd.h                   |  1 +
 linux-headers/asm-x86/unistd_32.h                    |  2 ++
 linux-headers/asm-x86/unistd_64.h                    |  2 ++
 linux-headers/asm-x86/unistd_x32.h                   |  2 ++
 linux-headers/linux/kvm.h                            |  5 +++--
 linux-headers/linux/psp-sev.h                        | 12 ++++++++++++
 linux-headers/LICENSES/exceptions/Linux-syscall-note |  2 +-
 linux-headers/LICENSES/preferred/GPL-2.0             |  6 ++++++
 15 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index XXXXXXX..XXXXXXX 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -XXX,XX +XXX,XX @@
 #define  PCI_EXP_DEVCTL_READRQ_256B  0x1000 /* 256 Bytes */
 #define  PCI_EXP_DEVCTL_READRQ_512B  0x2000 /* 512 Bytes */
 #define  PCI_EXP_DEVCTL_READRQ_1024B 0x3000 /* 1024 Bytes */
+#define  PCI_EXP_DEVCTL_READRQ_2048B 0x4000 /* 2048 Bytes */
+#define  PCI_EXP_DEVCTL_READRQ_4096B 0x5000 /* 4096 Bytes */
 #define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
 #define PCI_EXP_DEVSTA		10	/* Device Status */
 #define  PCI_EXP_DEVSTA_CED	0x0001	/* Correctable Error Detected */
@@ -XXX,XX +XXX,XX @@
 #define  PCI_EXP_LNKCAP2_SLS_16_0GB	0x00000010 /* Supported Speed 16GT/s */
 #define  PCI_EXP_LNKCAP2_CROSSLINK	0x00000100 /* Crosslink supported */
 #define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
+#define PCI_EXP_LNKCTL2_TLS		0x000f
+#define PCI_EXP_LNKCTL2_TLS_2_5GT	0x0001 /* Supported Speed 2.5GT/s */
+#define PCI_EXP_LNKCTL2_TLS_5_0GT	0x0002 /* Supported Speed 5GT/s */
+#define PCI_EXP_LNKCTL2_TLS_8_0GT	0x0003 /* Supported Speed 8GT/s */
+#define PCI_EXP_LNKCTL2_TLS_16_0GT	0x0004 /* Supported Speed 16GT/s */
 #define PCI_EXP_LNKSTA2		50	/* Link Status 2 */
 #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	52	/* v2 endpoints with link end here */
 #define PCI_EXP_SLTCAP2		52	/* Slot Capabilities 2 */
@@ -XXX,XX +XXX,XX @@
 #define  PCI_EXP_DPC_CAP_DL_ACTIVE	0x1000	/* ERR_COR signal on DL_Active supported */
 
 #define PCI_EXP_DPC_CTL			6	/* DPC control */
+#define  PCI_EXP_DPC_CTL_EN_FATAL 	0x0001	/* Enable trigger on ERR_FATAL message */
 #define  PCI_EXP_DPC_CTL_EN_NONFATAL 	0x0002	/* Enable trigger on ERR_NONFATAL message */
 #define  PCI_EXP_DPC_CTL_INT_EN 	0x0008	/* DPC Interrupt Enable */
 
diff --git a/include/standard-headers/linux/virtio_gpu.h b/include/standard-headers/linux/virtio_gpu.h
index XXXXXXX..XXXXXXX 100644
--- a/include/standard-headers/linux/virtio_gpu.h
+++ b/include/standard-headers/linux/virtio_gpu.h
@@ -XXX,XX +XXX,XX @@ struct virtio_gpu_cmd_submit {
 };
 
 #define VIRTIO_GPU_CAPSET_VIRGL 1
+#define VIRTIO_GPU_CAPSET_VIRGL2 2
 
 /* VIRTIO_GPU_CMD_GET_CAPSET_INFO */
 struct virtio_gpu_get_capset_info {
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index XXXXXXX..XXXXXXX 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -XXX,XX +XXX,XX @@
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
 
+#define VIRTIO_NET_F_STANDBY	  62	/* Act as standby for another device
+					 * with the same MAC.
+					 */
 #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
 
 #ifndef VIRTIO_NET_NO_LEGACY
diff --git a/linux-headers/asm-arm/kvm.h b/linux-headers/asm-arm/kvm.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-arm/kvm.h
+++ b/linux-headers/asm-arm/kvm.h
@@ -XXX,XX +XXX,XX @@ struct kvm_regs {
 #define KVM_VGIC_V3_ADDR_TYPE_DIST	2
 #define KVM_VGIC_V3_ADDR_TYPE_REDIST	3
 #define KVM_VGIC_ITS_ADDR_TYPE		4
+#define KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION	5
 
 #define KVM_VGIC_V3_DIST_SIZE		SZ_64K
 #define KVM_VGIC_V3_REDIST_SIZE		(2 * SZ_64K)
diff --git a/linux-headers/asm-arm/unistd-common.h b/linux-headers/asm-arm/unistd-common.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-arm/unistd-common.h
+++ b/linux-headers/asm-arm/unistd-common.h
@@ -XXX,XX +XXX,XX @@
 #define __NR_pkey_alloc (__NR_SYSCALL_BASE + 395)
 #define __NR_pkey_free (__NR_SYSCALL_BASE + 396)
 #define __NR_statx (__NR_SYSCALL_BASE + 397)
+#define __NR_rseq (__NR_SYSCALL_BASE + 398)
 
 #endif /* _ASM_ARM_UNISTD_COMMON_H */
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -XXX,XX +XXX,XX @@ struct kvm_regs {
 #define KVM_VGIC_V3_ADDR_TYPE_DIST	2
 #define KVM_VGIC_V3_ADDR_TYPE_REDIST	3
 #define KVM_VGIC_ITS_ADDR_TYPE		4
+#define KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION	5
 
 #define KVM_VGIC_V3_DIST_SIZE		SZ_64K
 #define KVM_VGIC_V3_REDIST_SIZE		(2 * SZ_64K)
diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -XXX,XX +XXX,XX @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
 __SYSCALL(__NR_pkey_free,     sys_pkey_free)
 #define __NR_statx 291
 __SYSCALL(__NR_statx,     sys_statx)
+#define __NR_io_pgetevents 292
+__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
 
 #undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
 
 /*
  * 32 bit systems traditionally used different
diff --git a/linux-headers/asm-powerpc/unistd.h b/linux-headers/asm-powerpc/unistd.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-powerpc/unistd.h
+++ b/linux-headers/asm-powerpc/unistd.h
@@ -XXX,XX +XXX,XX @@
 #define __NR_pkey_alloc		384
 #define __NR_pkey_free		385
 #define __NR_pkey_mprotect	386
+#define __NR_rseq		387
 
 #endif /* _ASM_POWERPC_UNISTD_H_ */
diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-x86/unistd_32.h
+++ b/linux-headers/asm-x86/unistd_32.h
@@ -XXX,XX +XXX,XX @@
 #define __NR_pkey_free 382
 #define __NR_statx 383
 #define __NR_arch_prctl 384
+#define __NR_io_pgetevents 385
+#define __NR_rseq 386
 
 #endif /* _ASM_X86_UNISTD_32_H */
diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-x86/unistd_64.h
+++ b/linux-headers/asm-x86/unistd_64.h
@@ -XXX,XX +XXX,XX @@
 #define __NR_pkey_alloc 330
 #define __NR_pkey_free 331
 #define __NR_statx 332
+#define __NR_io_pgetevents 333
+#define __NR_rseq 334
 
 #endif /* _ASM_X86_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-x86/unistd_x32.h
+++ b/linux-headers/asm-x86/unistd_x32.h
@@ -XXX,XX +XXX,XX @@
 #define __NR_pkey_alloc (__X32_SYSCALL_BIT + 330)
 #define __NR_pkey_free (__X32_SYSCALL_BIT + 331)
 #define __NR_statx (__X32_SYSCALL_BIT + 332)
+#define __NR_io_pgetevents (__X32_SYSCALL_BIT + 333)
+#define __NR_rseq (__X32_SYSCALL_BIT + 334)
 #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
 #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
 #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -XXX,XX +XXX,XX @@ struct kvm_ioeventfd {
 };
 
 #define KVM_X86_DISABLE_EXITS_MWAIT          (1 << 0)
-#define KVM_X86_DISABLE_EXITS_HTL            (1 << 1)
+#define KVM_X86_DISABLE_EXITS_HLT            (1 << 1)
 #define KVM_X86_DISABLE_EXITS_PAUSE          (1 << 2)
 #define KVM_X86_DISABLE_VALID_EXITS          (KVM_X86_DISABLE_EXITS_MWAIT | \
-                                              KVM_X86_DISABLE_EXITS_HTL | \
+                                              KVM_X86_DISABLE_EXITS_HLT | \
                                               KVM_X86_DISABLE_EXITS_PAUSE)
 
 /* for KVM_ENABLE_CAP */
@@ -XXX,XX +XXX,XX @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_BPB 152
 #define KVM_CAP_GET_MSR_FEATURES 153
 #define KVM_CAP_HYPERV_EVENTFD 154
+#define KVM_CAP_HYPERV_TLBFLUSH 155
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/linux-headers/linux/psp-sev.h b/linux-headers/linux/psp-sev.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/linux/psp-sev.h
+++ b/linux-headers/linux/psp-sev.h
@@ -XXX,XX +XXX,XX @@ enum {
 	SEV_PDH_GEN,
 	SEV_PDH_CERT_EXPORT,
 	SEV_PEK_CERT_IMPORT,
+	SEV_GET_ID,
 
 	SEV_MAX,
 };
@@ -XXX,XX +XXX,XX @@ struct sev_user_data_pdh_cert_export {
 	__u32 cert_chain_len;			/* In/Out */
 } __attribute__((packed));
 
+/**
+ * struct sev_user_data_get_id - GET_ID command parameters
+ *
+ * @socket1: Buffer to pass unique ID of first socket
+ * @socket2: Buffer to pass unique ID of second socket
+ */
+struct sev_user_data_get_id {
+	__u8 socket1[64];			/* Out */
+	__u8 socket2[64];			/* Out */
+} __attribute__((packed));
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
diff --git a/linux-headers/LICENSES/exceptions/Linux-syscall-note b/linux-headers/LICENSES/exceptions/Linux-syscall-note
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/LICENSES/exceptions/Linux-syscall-note
+++ b/linux-headers/LICENSES/exceptions/Linux-syscall-note
@@ -XXX,XX +XXX,XX @@
 SPDX-Exception-Identifier: Linux-syscall-note
 SPDX-URL: https://spdx.org/licenses/Linux-syscall-note.html
-SPDX-Licenses: GPL-2.0, GPL-2.0+, GPL-1.0+, LGPL-2.0, LGPL-2.0+, LGPL-2.1, LGPL-2.1+
+SPDX-Licenses: GPL-2.0, GPL-2.0+, GPL-1.0+, LGPL-2.0, LGPL-2.0+, LGPL-2.1, LGPL-2.1+, GPL-2.0-only, GPL-2.0-or-later
 Usage-Guide:
   This exception is used together with one of the above SPDX-Licenses
   to mark user space API (uapi) header files so they can be included
diff --git a/linux-headers/LICENSES/preferred/GPL-2.0 b/linux-headers/LICENSES/preferred/GPL-2.0
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/LICENSES/preferred/GPL-2.0
+++ b/linux-headers/LICENSES/preferred/GPL-2.0
@@ -XXX,XX +XXX,XX @@
 Valid-License-Identifier: GPL-2.0
+Valid-License-Identifier: GPL-2.0-only
 Valid-License-Identifier: GPL-2.0+
+Valid-License-Identifier: GPL-2.0-or-later
 SPDX-URL: https://spdx.org/licenses/GPL-2.0.html
 Usage-Guide:
   To use this license in source code, put one of the following SPDX
@@ -XXX,XX +XXX,XX @@ Usage-Guide:
   guidelines in the licensing rules documentation.
   For 'GNU General Public License (GPL) version 2 only' use:
     SPDX-License-Identifier: GPL-2.0
+  or
+    SPDX-License-Identifier: GPL-2.0-only
   For 'GNU General Public License (GPL) version 2 or any later version' use:
     SPDX-License-Identifier: GPL-2.0+
+  or
+    SPDX-License-Identifier: GPL-2.0-or-later
 License-Text:
 
 		    GNU GENERAL PUBLIC LICENSE
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

For the KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION attribute, the attribute
data pointed to by kvm_device_attr.addr is an OR of the
redistributor region address and other fields such as the index
of the redistributor region and the number of redistributors the
region can contain.

The existing machine init done notifier framework sets the address
field to the actual address of the device and does not allow this
value to be OR'ed with other fields.

This patch extends the KVMDevice struct with a new kda_addr_ormask
member. Its value is passed at registration time and OR'ed with the
resolved address on kvm_arm_set_device_addr().

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1529072910-16156-3-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/kvm_arm.h        |  3 ++-
 hw/intc/arm_gic_kvm.c       |  4 ++--
 hw/intc/arm_gicv3_its_kvm.c |  2 +-
 hw/intc/arm_gicv3_kvm.c     |  4 ++--
 target/arm/kvm.c            | 10 +++++++++-
 5 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -XXX,XX +XXX,XX @@ int kvm_arm_vcpu_init(CPUState *cs);
  * @group: device control API group for setting addresses
  * @attr: device control API address type
  * @dev_fd: device control device file descriptor (or -1 if not supported)
+ * @addr_ormask: value to be OR'ed with resolved address
  *
  * Remember the memory region @mr, and when it is mapped by the
  * machine model, tell the kernel that base address using the
@@ -XXX,XX +XXX,XX @@ int kvm_arm_vcpu_init(CPUState *cs);
  * address at the point where machine init is complete.
  */
 void kvm_arm_register_device(MemoryRegion *mr, uint64_t devid, uint64_t group,
-                             uint64_t attr, int dev_fd);
+                             uint64_t attr, int dev_fd, uint64_t addr_ormask);
 
 /**
  * kvm_arm_init_cpreg_list:
diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gic_kvm.c
+++ b/hw/intc/arm_gic_kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gic_realize(DeviceState *dev, Error **errp)
                             | KVM_VGIC_V2_ADDR_TYPE_DIST,
                             KVM_DEV_ARM_VGIC_GRP_ADDR,
                             KVM_VGIC_V2_ADDR_TYPE_DIST,
-                            s->dev_fd);
+                            s->dev_fd, 0);
     /* CPU interface for current core. Unlike arm_gic, we don't
      * provide the "interface for core #N" memory regions, because
      * cores with a VGIC don't have those.
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gic_realize(DeviceState *dev, Error **errp)
                             | KVM_VGIC_V2_ADDR_TYPE_CPU,
                             KVM_DEV_ARM_VGIC_GRP_ADDR,
                             KVM_VGIC_V2_ADDR_TYPE_CPU,
-                            s->dev_fd);
+                            s->dev_fd, 0);
 
     if (kvm_has_gsi_routing()) {
         /* set up irq routing */
diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_its_kvm.c
+++ b/hw/intc/arm_gicv3_its_kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_its_realize(DeviceState *dev, Error **errp)
 
     /* register the base address */
     kvm_arm_register_device(&s->iomem_its_cntrl, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
-                            KVM_VGIC_ITS_ADDR_TYPE, s->dev_fd);
+                            KVM_VGIC_ITS_ADDR_TYPE, s->dev_fd, 0);
 
     gicv3_its_init_mmio(s, NULL);
 
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
                       KVM_DEV_ARM_VGIC_CTRL_INIT, NULL, true, &error_abort);
 
     kvm_arm_register_device(&s->iomem_dist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
-                            KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd);
+                            KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd, 0);
     kvm_arm_register_device(&s->iomem_redist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
-                            KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd);
+                            KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd, 0);
 
     if (kvm_has_gsi_routing()) {
         /* set up irq routing */
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
  * We use a MemoryListener to track mapping and unmapping of
  * the regions during board creation, so the board models don't
  * need to do anything special for the KVM case.
+ *
+ * Sometimes the address must be OR'ed with some other fields
+ * (for example for KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION).
+ * @kda_addr_ormask aims at storing the value of those fields.
  */
 typedef struct KVMDevice {
     struct kvm_arm_device_addr kda;
     struct kvm_device_attr kdattr;
+    uint64_t kda_addr_ormask;
     MemoryRegion *mr;
     QSLIST_ENTRY(KVMDevice) entries;
     int dev_fd;
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_set_device_addr(KVMDevice *kd)
      */
     if (kd->dev_fd >= 0) {
         uint64_t addr = kd->kda.addr;
+
+        addr |= kd->kda_addr_ormask;
         attr->addr = (uintptr_t)&addr;
         ret = kvm_device_ioctl(kd->dev_fd, KVM_SET_DEVICE_ATTR, attr);
     } else {
@@ -XXX,XX +XXX,XX @@ static Notifier notify = {
 };
 
 void kvm_arm_register_device(MemoryRegion *mr, uint64_t devid, uint64_t group,
-                             uint64_t attr, int dev_fd)
+                             uint64_t attr, int dev_fd, uint64_t addr_ormask)
 {
     KVMDevice *kd;
 
@@ -XXX,XX +XXX,XX @@ void kvm_arm_register_device(MemoryRegion *mr, uint64_t devid, uint64_t group,
     kd->kdattr.group = group;
     kd->kdattr.attr = attr;
     kd->dev_fd = dev_fd;
+    kd->kda_addr_ormask = addr_ormask;
     QSLIST_INSERT_HEAD(&kvm_devices_head, kd, entries);
     memory_region_ref(kd->mr);
 }
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

To prepare for multiple redistributor regions, we introduce
an array of uint32_t properties that stores the redistributor
count of each redistributor region.

Non accelerated VGICv3 only supports a single redistributor region.
The capacity of all redist regions is checked against the number of
vcpus.

Machvirt is updated to set those properties, i.e. a single
redistributor region whose count is the number of vcpus,
capped at 123.
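
The arithmetic behind these two rules can be sketched as follows. This is an
illustration under stated assumptions, not the patch code: the function names
are hypothetical, and DEMO_REDIST_SIZE mirrors the GICV3_REDIST_SIZE value
(0x20000) introduced by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_REDIST_SIZE 0x20000u /* same value as GICV3_REDIST_SIZE */

/* Count for region 0: the number of vcpus, capped by how many
 * redistributors fit into a region of region_size bytes. */
static uint32_t demo_redist0_count(uint32_t smp_cpus, uint64_t region_size)
{
    uint32_t capacity = (uint32_t)(region_size / DEMO_REDIST_SIZE);

    return smp_cpus < capacity ? smp_cpus : capacity;
}

/* The summed capacity of all regions must cover every vcpu. */
static bool demo_redist_capacity_ok(const uint32_t *counts,
                                    uint32_t nregions, uint32_t num_cpu)
{
    uint64_t capacity = 0;
    uint32_t i;

    assert(counts != NULL || nregions == 0);
    for (i = 0; i < nregions; i++) {
        capacity += counts[i];
    }
    return capacity >= num_cpu;
}
```

A region of 123 * 0x20000 bytes therefore yields a count of 8 for 8 vcpus
and a count of 123 for 200 vcpus, and in the latter case the capacity check
fails for a single region.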

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1529072910-16156-4-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/intc/arm_gicv3_common.h |  8 +++++--
 hw/arm/virt.c                      | 11 ++++++++-
 hw/intc/arm_gicv3.c                | 12 +++++++++-
 hw/intc/arm_gicv3_common.c         | 38 ++++++++++++++++++++++++++----
 hw/intc/arm_gicv3_kvm.c            |  9 +++++--
 5 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -XXX,XX +XXX,XX @@
 #define GICV3_MAXIRQ 1020
 #define GICV3_MAXSPI (GICV3_MAXIRQ - GIC_INTERNAL)
 
+#define GICV3_REDIST_SIZE 0x20000
+
 /* Number of SGI target-list bits */
 #define GICV3_TARGETLIST_BITS 16
 
@@ -XXX,XX +XXX,XX @@ struct GICv3State {
     /*< public >*/
 
     MemoryRegion iomem_dist; /* Distributor */
-    MemoryRegion iomem_redist; /* Redistributors */
+    MemoryRegion *iomem_redist; /* Redistributor Regions */
+    uint32_t *redist_region_count; /* redistributor count within each region */
+    uint32_t nb_redist_regions; /* number of redist regions */
 
     uint32_t num_cpu;
     uint32_t num_irq;
@@ -XXX,XX +XXX,XX @@ typedef struct ARMGICv3CommonClass {
 } ARMGICv3CommonClass;
 
 void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
-                              const MemoryRegionOps *ops);
+                              const MemoryRegionOps *ops, Error **errp);
 
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
     if (!kvm_irqchip_in_kernel()) {
         qdev_prop_set_bit(gicdev, "has-security-extensions", vms->secure);
     }
+
+    if (type == 3) {
+        uint32_t redist0_capacity =
+                    vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
+        uint32_t redist0_count = MIN(smp_cpus, redist0_capacity);
+
+        qdev_prop_set_uint32(gicdev, "len-redist-region-count", 1);
+        qdev_prop_set_uint32(gicdev, "redist-region-count[0]", redist0_count);
+    }
     qdev_init_nofail(gicdev);
     gicbusdev = SYS_BUS_DEVICE(gicdev);
     sysbus_mmio_map(gicbusdev, 0, vms->memmap[VIRT_GIC_DIST].base);
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
      * many redistributors we can fit into the memory map.
      */
     if (vms->gic_version == 3) {
-        virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / 0x20000;
+        virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
     } else {
         virt_max_cpus = GIC_NCPU;
     }
diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -XXX,XX +XXX,XX @@ static void arm_gic_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    gicv3_init_irqs_and_mmio(s, gicv3_set_irq, gic_ops);
+    if (s->nb_redist_regions != 1) {
+        error_setg(errp, "VGICv3 redist region number(%d) not equal to 1",
+                   s->nb_redist_regions);
+        return;
+    }
+
+    gicv3_init_irqs_and_mmio(s, gicv3_set_irq, gic_ops, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
 
     gicv3_init_cpuif(s);
 }
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_gicv3 = {
 };
 
 void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
-                              const MemoryRegionOps *ops)
+                              const MemoryRegionOps *ops, Error **errp)
 {
     SysBusDevice *sbd = SYS_BUS_DEVICE(s);
+    int rdist_capacity = 0;
     int i;
 
+    for (i = 0; i < s->nb_redist_regions; i++) {
+        rdist_capacity += s->redist_region_count[i];
+    }
+    if (rdist_capacity < s->num_cpu) {
+        error_setg(errp, "Capacity of the redist regions(%d) "
+                   "is less than number of vcpus(%d)",
+                   rdist_capacity, s->num_cpu);
+        return;
+    }
+
     /* For the GIC, also expose incoming GPIO lines for PPIs for each CPU.
      * GPIO array layout is thus:
      *  [0..N-1] spi
@@ -XXX,XX +XXX,XX @@ void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
 
     memory_region_init_io(&s->iomem_dist, OBJECT(s), ops, s,
                           "gicv3_dist", 0x10000);
-    memory_region_init_io(&s->iomem_redist, OBJECT(s), ops ? &ops[1] : NULL, s,
-                          "gicv3_redist", 0x20000 * s->num_cpu);
-
     sysbus_init_mmio(sbd, &s->iomem_dist);
-    sysbus_init_mmio(sbd, &s->iomem_redist);
+
+    s->iomem_redist = g_new0(MemoryRegion, s->nb_redist_regions);
+    for (i = 0; i < s->nb_redist_regions; i++) {
+        char *name = g_strdup_printf("gicv3_redist_region[%d]", i);
+
+        memory_region_init_io(&s->iomem_redist[i], OBJECT(s),
+                              ops ? &ops[1] : NULL, s, name,
+                              s->redist_region_count[i] * GICV3_REDIST_SIZE);
+        sysbus_init_mmio(sbd, &s->iomem_redist[i]);
+        g_free(name);
+    }
 }
 
 static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
     }
 }
 
+static void arm_gicv3_finalize(Object *obj)
+{
+    GICv3State *s = ARM_GICV3_COMMON(obj);
+
+    g_free(s->redist_region_count);
+}
+
 static void arm_gicv3_common_reset(DeviceState *dev)
 {
     GICv3State *s = ARM_GICV3_COMMON(dev);
@@ -XXX,XX +XXX,XX @@ static Property arm_gicv3_common_properties[] = {
     DEFINE_PROP_UINT32("num-irq", GICv3State, num_irq, 32),
     DEFINE_PROP_UINT32("revision", GICv3State, revision, 3),
     DEFINE_PROP_BOOL("has-security-extensions", GICv3State, security_extn, 0),
+    DEFINE_PROP_ARRAY("redist-region-count", GICv3State, nb_redist_regions,
+                      redist_region_count, qdev_prop_uint32, uint32_t),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -XXX,XX +XXX,XX @@ static const TypeInfo arm_gicv3_common_type = {
     .instance_size = sizeof(GICv3State),
     .class_size = sizeof(ARMGICv3CommonClass),
     .class_init = arm_gicv3_common_class_init,
+    .instance_finalize = arm_gicv3_finalize,
     .abstract = true,
     .interfaces = (InterfaceInfo []) {
         { TYPE_ARM_LINUX_BOOT_IF },
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    gicv3_init_irqs_and_mmio(s, kvm_arm_gicv3_set_irq, NULL);
+    gicv3_init_irqs_and_mmio(s, kvm_arm_gicv3_set_irq, NULL, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
 
     for (i = 0; i < s->num_cpu; i++) {
         ARMCPU *cpu = ARM_CPU(qemu_get_cpu(i));
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
 
     kvm_arm_register_device(&s->iomem_dist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
                             KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd, 0);
-    kvm_arm_register_device(&s->iomem_redist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
+    kvm_arm_register_device(&s->iomem_redist[0], -1,
+                            KVM_DEV_ARM_VGIC_GRP_ADDR,
                             KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd, 0);
 
     if (kvm_has_gsi_routing()) {
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

Check whether KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION is supported.
If it is not, we check that there is only one redist region and use the
legacy KVM_VGIC_V3_ADDR_TYPE_REDIST attribute. Otherwise we use
the new attribute, which allows multiple regions to be registered with
the KVM device.
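
As a rough sketch of the value that ends up in the attribute, the code below
builds the OR mask the same way the patch does (region index in the low bits,
redistributor count shifted left by 52) and combines it with a base address.
The function names are hypothetical, and the no-overlap assertion is an
assumption of this sketch rather than something the patch checks.

```c
#include <assert.h>
#include <stdint.h>

/* Mask made of the region index and the redistributor count,
 * following the expression used in the patch. */
static uint64_t demo_redist_ormask(uint32_t index, uint32_t count)
{
    return index | ((uint64_t)count << 52);
}

/* Value written for one region: resolved base address OR'ed with
 * the mask. */
static uint64_t demo_redist_attr_value(uint64_t base, uint32_t index,
                                       uint32_t count)
{
    uint64_t ormask = demo_redist_ormask(index, count);

    /* Assumption: the base address never overlaps the OR'ed fields. */
    assert((base & ormask) == 0);
    return base | ormask;
}
```

For example, a base of 0x080A0000 with index 0 and count 123 gives
0x07B00000080A0000.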

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1529072910-16156-5-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/intc/arm_gicv3_kvm.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
 {
     GICv3State *s = KVM_ARM_GICV3(dev);
     KVMARMGICv3Class *kgc = KVM_ARM_GICV3_GET_CLASS(s);
+    bool multiple_redist_region_allowed;
     Error *local_err = NULL;
     int i;
 
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    multiple_redist_region_allowed =
+        kvm_device_check_attr(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
+                              KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION);
+
+    if (!multiple_redist_region_allowed && s->nb_redist_regions > 1) {
+        error_setg(errp, "Multiple VGICv3 redistributor regions are not "
+                   "supported by this host kernel");
+        error_append_hint(errp, "A maximum of %d VCPUs can be used",
+                          s->redist_region_count[0]);
+        return;
+    }
+
     kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
                       0, &s->num_irq, true, &error_abort);
 
@@ -XXX,XX +XXX,XX @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
 
     kvm_arm_register_device(&s->iomem_dist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
                             KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd, 0);
-    kvm_arm_register_device(&s->iomem_redist[0], -1,
-                            KVM_DEV_ARM_VGIC_GRP_ADDR,
-                            KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd, 0);
+
+    if (!multiple_redist_region_allowed) {
+        kvm_arm_register_device(&s->iomem_redist[0], -1,
+                                KVM_DEV_ARM_VGIC_GRP_ADDR,
+                                KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd, 0);
+    } else {
+        /* we register regions in reverse order as "devices" are inserted at
+         * the head of a QSLIST and the list is then popped from the head
+         * onwards by kvm_arm_machine_init_done()
+         */
+        for (i = s->nb_redist_regions - 1; i >= 0; i--) {
+            /* Address mask made of the rdist region index and count */
+            uint64_t addr_ormask =
+                        i | ((uint64_t)s->redist_region_count[i] << 52);
+
+            kvm_arm_register_device(&s->iomem_redist[i], -1,
+                                    KVM_DEV_ARM_VGIC_GRP_ADDR,
+                                    KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION,
+                                    s->dev_fd, addr_ormask);
+        }
+    }
 
     if (kvm_has_gsi_routing()) {
         /* set up irq routing */
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

This patch allows the creation of a GICv3 node with 1 or 2
redistributor regions depending on the number of smp_cpus.
The second redistributor region is located just after the
existing RAM region, at 256GB, and contains up to 512 vcpus.
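
A minimal sketch of the region-count decision follows. The two
constants are assumptions chosen for illustration (128KB per
redistributor and a legacy region of 0x00F60000 bytes, giving 123
redistributors); the real values come from GICV3_REDIST_SIZE and the
board memmap, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; see GICV3_REDIST_SIZE and memmap. */
#define REDIST_SIZE          0x20000ULL    /* 128KB per redistributor */
#define REDIST0_REGION_SIZE  0x00F60000ULL /* legacy region size */

/* Return how many redistributor regions are needed for smp_cpus. */
static inline int redist_region_count(uint32_t smp_cpus)
{
    uint32_t redist0_capacity = REDIST0_REGION_SIZE / REDIST_SIZE;

    return smp_cpus > redist0_capacity ? 2 : 1;
}
```

With these assumed sizes the first region holds 123 redistributors, so
124 vcpus is the smallest configuration that needs the second region.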

Please refer to kernel documentation for further node details:
Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1529072910-16156-6-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/virt.h | 14 ++++++++++++++
 hw/arm/virt.c         | 29 ++++++++++++++++++++++++-----
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -XXX,XX +XXX,XX @@
 #include "qemu/notify.h"
 #include "hw/boards.h"
 #include "hw/arm/arm.h"
+#include "sysemu/kvm.h"
+#include "hw/intc/arm_gicv3_common.h"
 
 #define NUM_GICV2M_SPIS       64
 #define NUM_VIRTIO_TRANSPORTS 32
@@ -XXX,XX +XXX,XX @@ enum {
     VIRT_GIC_V2M,
     VIRT_GIC_ITS,
     VIRT_GIC_REDIST,
+    VIRT_GIC_REDIST2,
     VIRT_SMMU,
     VIRT_UART,
     VIRT_MMIO,
@@ -XXX,XX +XXX,XX @@ typedef struct {
 
 void virt_acpi_setup(VirtMachineState *vms);
 
+/* Return the number of used redistributor regions  */
+static inline int virt_gicv3_redist_region_count(VirtMachineState *vms)
+{
+    uint32_t redist0_capacity =
+                vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
+
+    assert(vms->gic_version == 3);
+
+    return vms->smp_cpus > redist0_capacity ? 2 : 1;
+}
+
 #endif /* QEMU_ARM_VIRT_H */
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static const MemMapEntry a15memmap[] = {
     [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
     [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
     [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
+    /* Additional 64 MB redist region (can contain up to 512 redistributors) */
+    [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
     /* Second PCIe window, 512GB wide at the 512GB boundary */
     [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
 };
@@ -XXX,XX +XXX,XX @@ static void fdt_add_gic_node(VirtMachineState *vms)
     qemu_fdt_setprop_cell(vms->fdt, "/intc", "#size-cells", 0x2);
     qemu_fdt_setprop(vms->fdt, "/intc", "ranges", NULL, 0);
     if (vms->gic_version == 3) {
+        int nb_redist_regions = virt_gicv3_redist_region_count(vms);
+
         qemu_fdt_setprop_string(vms->fdt, "/intc", "compatible",
                                 "arm,gic-v3");
-        qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
-                                     2, vms->memmap[VIRT_GIC_DIST].base,
-                                     2, vms->memmap[VIRT_GIC_DIST].size,
-                                     2, vms->memmap[VIRT_GIC_REDIST].base,
-                                     2, vms->memmap[VIRT_GIC_REDIST].size);
+
+        qemu_fdt_setprop_cell(vms->fdt, "/intc",
+                              "#redistributor-regions", nb_redist_regions);
+
+        if (nb_redist_regions == 1) {
+            qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
+                                         2, vms->memmap[VIRT_GIC_DIST].base,
+                                         2, vms->memmap[VIRT_GIC_DIST].size,
+                                         2, vms->memmap[VIRT_GIC_REDIST].base,
+                                         2, vms->memmap[VIRT_GIC_REDIST].size);
+        } else {
+            qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
+                                         2, vms->memmap[VIRT_GIC_DIST].base,
+                                         2, vms->memmap[VIRT_GIC_DIST].size,
+                                         2, vms->memmap[VIRT_GIC_REDIST].base,
+                                         2, vms->memmap[VIRT_GIC_REDIST].size,
+                                         2, vms->memmap[VIRT_GIC_REDIST2].base,
+                                         2, vms->memmap[VIRT_GIC_REDIST2].size);
+        }
+
         if (vms->virt) {
             qemu_fdt_setprop_cells(vms->fdt, "/intc", "interrupts",
                                    GIC_FDT_IRQ_TYPE_PPI, ARCH_GICV3_MAINT_IRQ,
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

Depending on the number of smp_cpus we now register one or two
GICR structures.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1529072910-16156-7-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt-acpi-build.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -XXX,XX +XXX,XX @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 
     if (vms->gic_version == 3) {
         AcpiMadtGenericTranslator *gic_its;
+        int nb_redist_regions = virt_gicv3_redist_region_count(vms);
         AcpiMadtGenericRedistributor *gicr = acpi_data_push(table_data,
                                                          sizeof *gicr);
 
@@ -XXX,XX +XXX,XX @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
         gicr->base_address = cpu_to_le64(memmap[VIRT_GIC_REDIST].base);
         gicr->range_length = cpu_to_le32(memmap[VIRT_GIC_REDIST].size);
 
+        if (nb_redist_regions == 2) {
+            gicr = acpi_data_push(table_data, sizeof(*gicr));
+            gicr->type = ACPI_APIC_GENERIC_REDISTRIBUTOR;
+            gicr->length = sizeof(*gicr);
+            gicr->base_address = cpu_to_le64(memmap[VIRT_GIC_REDIST2].base);
+            gicr->range_length = cpu_to_le32(memmap[VIRT_GIC_REDIST2].size);
+        }
+
         if (its_class_name() && !vmc->no_its) {
             gic_its = acpi_data_push(table_data, sizeof *gic_its);
             gic_its->type = ACPI_APIC_GENERIC_TRANSLATOR;
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

With a VGICv3 KVM device, if the number of vcpus exceeds the
capacity of the legacy redistributor region (123 redistributors),
we now attempt to register a second redistributor region. Up to
512 redistributors fit in this second region, in addition to the
123 allowed by the legacy redistributor region.
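
The way vcpus are spread over the two regions can be sketched as
below. This is an illustration under stated assumptions, not the code
in this patch: the struct and function names are hypothetical, and the
capacities are passed in rather than derived from the memmap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the per-region redistributor counts. */
struct redist_split {
    uint32_t count0;   /* redistributors in the legacy region */
    uint32_t count1;   /* redistributors in the second region */
};

/* Fill the legacy region first, then place the remainder in the
 * second region, clamping each count to its region's capacity. */
static inline struct redist_split
split_redistributors(uint32_t smp_cpus, uint32_t cap0, uint32_t cap1)
{
    struct redist_split s;
    uint32_t rest;

    s.count0 = smp_cpus < cap0 ? smp_cpus : cap0;
    rest = smp_cpus - s.count0;
    s.count1 = rest < cap1 ? rest : cap1;
    return s;
}
```

With capacities of 123 and 512, a 512-vcpu guest would place 123
redistributors in the legacy region and 389 in the second one.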

Registering this second redistributor region is possible if the
host kernel supports the following VGICv3 KVM device group/attribute:
KVM_DEV_ARM_VGIC_GRP_ADDR/KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION.

In case the host kernel does not support the registration of several
redistributor regions and the requested number of vcpus exceeds the
capacity of the legacy redistributor region, the GICv3 device
initialization fails with a proper error message and qemu exits.

At the moment the maximum number of vcpus is still capped by the
virt machine class max_cpus.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1529072910-16156-8-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
     SysBusDevice *gicbusdev;
     const char *gictype;
     int type = vms->gic_version, i;
+    uint32_t nb_redist_regions = 0;
 
     gictype = (type == 3) ? gicv3_class_name() : gic_class_name();
 
@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
                     vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
         uint32_t redist0_count = MIN(smp_cpus, redist0_capacity);
 
-        qdev_prop_set_uint32(gicdev, "len-redist-region-count", 1);
+        nb_redist_regions = virt_gicv3_redist_region_count(vms);
+
+        qdev_prop_set_uint32(gicdev, "len-redist-region-count",
+                             nb_redist_regions);
         qdev_prop_set_uint32(gicdev, "redist-region-count[0]", redist0_count);
+
+        if (nb_redist_regions == 2) {
+            uint32_t redist1_capacity =
+                        vms->memmap[VIRT_GIC_REDIST2].size / GICV3_REDIST_SIZE;
+
+            qdev_prop_set_uint32(gicdev, "redist-region-count[1]",
+                MIN(smp_cpus - redist0_count, redist1_capacity));
+        }
     }
     qdev_init_nofail(gicdev);
     gicbusdev = SYS_BUS_DEVICE(gicdev);
     sysbus_mmio_map(gicbusdev, 0, vms->memmap[VIRT_GIC_DIST].base);
     if (type == 3) {
         sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_REDIST].base);
+        if (nb_redist_regions == 2) {
+            sysbus_mmio_map(gicbusdev, 2, vms->memmap[VIRT_GIC_REDIST2].base);
+        }
     } else {
         sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_CPU].base);
     }
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
      */
     if (vms->gic_version == 3) {
         virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
+        virt_max_cpus += vms->memmap[VIRT_GIC_REDIST2].size / GICV3_REDIST_SIZE;
     } else {
         virt_max_cpus = GIC_NCPU;
     }
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

This patch defines a new ECAM region located after the 256GB limit.

The virt machine state is augmented with a new highmem_ecam field
which guards the usage of this new ECAM region instead of the legacy
16MB one. With the highmem ECAM region, up to 256 PCIe buses can be
used.
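
The bus count follows from the ECAM size: each bus needs 1MB of
configuration space (32 devices x 8 functions x 4KB). A small sketch,
with hypothetical helper names and the per-bus size written out as an
assumption standing in for PCIE_MMCFG_SIZE_MIN:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match PCIE_MMCFG_SIZE_MIN: 1MB of ECAM space per bus. */
#define ECAM_BYTES_PER_BUS (1ULL << 20)

/* Number of PCIe buses addressable through an ECAM window. */
static inline unsigned ecam_nr_buses(uint64_t ecam_size)
{
    return (unsigned)(ecam_size / ECAM_BYTES_PER_BUS);
}

/* Last bus number, as reported in the MCFG end_bus_number field. */
static inline unsigned ecam_end_bus(uint64_t ecam_size)
{
    return ecam_nr_buses(ecam_size) - 1;
}
```

The legacy 16MB window (0x01000000) gives 16 buses, while the new
256MB window (0x10000000) gives 256 buses, numbered 0 to 255.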

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1529072910-16156-9-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/virt.h    |  4 ++++
 hw/arm/virt-acpi-build.c | 21 +++++++++++++--------
 hw/arm/virt.c            | 12 ++++++++----
 3 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -XXX,XX +XXX,XX @@ enum {
     VIRT_PCIE_MMIO,
     VIRT_PCIE_PIO,
     VIRT_PCIE_ECAM,
+    VIRT_PCIE_ECAM_HIGH,
     VIRT_PLATFORM_BUS,
     VIRT_PCIE_MMIO_HIGH,
     VIRT_GPIO,
@@ -XXX,XX +XXX,XX @@ typedef struct {
     FWCfgState *fw_cfg;
     bool secure;
     bool highmem;
+    bool highmem_ecam;
     bool its;
     bool virt;
     int32_t gic_version;
@@ -XXX,XX +XXX,XX @@ typedef struct {
     int psci_conduit;
 } VirtMachineState;
 
+#define VIRT_ECAM_ID(high) (high ? VIRT_PCIE_ECAM_HIGH : VIRT_PCIE_ECAM)
+
 #define TYPE_VIRT_MACHINE   MACHINE_TYPE_NAME("virt")
 #define VIRT_MACHINE(obj) \
     OBJECT_CHECK(VirtMachineState, (obj), TYPE_VIRT_MACHINE)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -XXX,XX +XXX,XX @@ static void acpi_dsdt_add_virtio(Aml *scope,
 }
 
 static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
-                              uint32_t irq, bool use_highmem)
+                              uint32_t irq, bool use_highmem, bool highmem_ecam)
 {
+    int ecam_id = VIRT_ECAM_ID(highmem_ecam);
     Aml *method, *crs, *ifctx, *UUID, *ifctx1, *elsectx, *buf;
     int i, bus_no;
     hwaddr base_mmio = memmap[VIRT_PCIE_MMIO].base;
     hwaddr size_mmio = memmap[VIRT_PCIE_MMIO].size;
     hwaddr base_pio = memmap[VIRT_PCIE_PIO].base;
     hwaddr size_pio = memmap[VIRT_PCIE_PIO].size;
-    hwaddr base_ecam = memmap[VIRT_PCIE_ECAM].base;
-    hwaddr size_ecam = memmap[VIRT_PCIE_ECAM].size;
+    hwaddr base_ecam = memmap[ecam_id].base;
+    hwaddr size_ecam = memmap[ecam_id].size;
     int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
 
     Aml *dev = aml_device("%s", "PCI0");
@@ -XXX,XX +XXX,XX @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
     aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
 
     /* Declare the PCI Routing Table. */
-    Aml *rt_pkg = aml_package(nr_pcie_buses * PCI_NUM_PINS);
+    Aml *rt_pkg = aml_varpackage(nr_pcie_buses * PCI_NUM_PINS);
     for (bus_no = 0; bus_no < nr_pcie_buses; bus_no++) {
         for (i = 0; i < PCI_NUM_PINS; i++) {
             int gsi = (i + bus_no) % PCI_NUM_PINS;
@@ -XXX,XX +XXX,XX @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
     Aml *dev_res0 = aml_device("%s", "RES0");
     aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
     crs = aml_resource_template();
-    aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE));
+    aml_append(crs,
+        aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
+                         AML_NON_CACHEABLE, AML_READ_WRITE, 0x0000, base_ecam,
+                         base_ecam + size_ecam - 1, 0x0000, size_ecam));
     aml_append(dev_res0, aml_name_decl("_CRS", crs));
     aml_append(dev, dev_res0);
     aml_append(scope, dev);
@@ -XXX,XX +XXX,XX @@ build_mcfg(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 {
     AcpiTableMcfg *mcfg;
     const MemMapEntry *memmap = vms->memmap;
+    int ecam_id = VIRT_ECAM_ID(vms->highmem_ecam);
     int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
     int mcfg_start = table_data->len;
 
     mcfg = acpi_data_push(table_data, len);
-    mcfg->allocation[0].address = cpu_to_le64(memmap[VIRT_PCIE_ECAM].base);
+    mcfg->allocation[0].address = cpu_to_le64(memmap[ecam_id].base);
 
     /* Only a single allocation so no need to play with segments */
     mcfg->allocation[0].pci_segment = cpu_to_le16(0);
     mcfg->allocation[0].start_bus_number = 0;
-    mcfg->allocation[0].end_bus_number = (memmap[VIRT_PCIE_ECAM].size
+    mcfg->allocation[0].end_bus_number = (memmap[ecam_id].size
                                           / PCIE_MMCFG_SIZE_MIN) - 1;
 
     build_header(linker, table_data, (void *)(table_data->data + mcfg_start),
@@ -XXX,XX +XXX,XX @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
                     (irqmap[VIRT_MMIO] + ARM_SPI_BASE), NUM_VIRTIO_TRANSPORTS);
     acpi_dsdt_add_pci(scope, memmap, (irqmap[VIRT_PCIE] + ARM_SPI_BASE),
-                      vms->highmem);
+                      vms->highmem, vms->highmem_ecam);
     acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
                        (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
     acpi_dsdt_add_power_button(scope);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static const MemMapEntry a15memmap[] = {
     [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
     /* Additional 64 MB redist region (can contain up to 512 redistributors) */
     [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
+    [VIRT_PCIE_ECAM_HIGH] =     { 0x4010000000ULL, 0x10000000 },
     /* Second PCIe window, 512GB wide at the 512GB boundary */
     [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
 };
@@ -XXX,XX +XXX,XX @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
     hwaddr size_mmio_high = vms->memmap[VIRT_PCIE_MMIO_HIGH].size;
     hwaddr base_pio = vms->memmap[VIRT_PCIE_PIO].base;
     hwaddr size_pio = vms->memmap[VIRT_PCIE_PIO].size;
-    hwaddr base_ecam = vms->memmap[VIRT_PCIE_ECAM].base;
-    hwaddr size_ecam = vms->memmap[VIRT_PCIE_ECAM].size;
+    hwaddr base_ecam, size_ecam;
     hwaddr base = base_mmio;
-    int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
+    int nr_pcie_buses;
     int irq = vms->irqmap[VIRT_PCIE];
     MemoryRegion *mmio_alias;
     MemoryRegion *mmio_reg;
@@ -XXX,XX +XXX,XX @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
     MemoryRegion *ecam_reg;
     DeviceState *dev;
     char *nodename;
-    int i;
+    int i, ecam_id;
     PCIHostState *pci;
 
     dev = qdev_create(NULL, TYPE_GPEX_HOST);
     qdev_init_nofail(dev);
 
+    ecam_id = VIRT_ECAM_ID(vms->highmem_ecam);
+    base_ecam = vms->memmap[ecam_id].base;
+    size_ecam = vms->memmap[ecam_id].size;
+    nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
     /* Map only the first size_ecam bytes of ECAM space */
     ecam_alias = g_new0(MemoryRegion, 1);
     ecam_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

Add virt-3.0 machine type.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1529072910-16156-10-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ type_init(machvirt_machine_init);
 #define VIRT_COMPAT_2_12 \
     HW_COMPAT_2_12
 
-static void virt_2_12_instance_init(Object *obj)
+static void virt_3_0_instance_init(Object *obj)
 {
     VirtMachineState *vms = VIRT_MACHINE(obj);
     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
@@ -XXX,XX +XXX,XX @@ static void virt_2_12_instance_init(Object *obj)
     vms->irqmap = a15irqmap;
 }
 
+static void virt_machine_3_0_options(MachineClass *mc)
+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(3, 0)
+
+static void virt_2_12_instance_init(Object *obj)
+{
+    virt_3_0_instance_init(obj);
+}
+
 static void virt_machine_2_12_options(MachineClass *mc)
 {
+    virt_machine_3_0_options(mc);
     SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_12);
 }
-DEFINE_VIRT_MACHINE_AS_LATEST(2, 12)
+DEFINE_VIRT_MACHINE(2, 12)
 
 #define VIRT_COMPAT_2_11 \
     HW_COMPAT_2_11
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

With this patch, virt-3.0 machine uses a new 256MB ECAM region
by default instead of the legacy 16MB one, if highmem is set
(LPAE supported by the guest) and (!firmware_loaded || aarch64).

This is because firmware running in AArch32 mode may not support
this high ECAM region.
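
The resulting condition can be summarised as a single predicate. This
is a sketch with a hypothetical function name; the parameters mirror
the class flag, the machine state and the locals used in
machvirt_init().

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether the high 256MB ECAM region is used.
 * no_highmem_ecam: set by older machine types (e.g. virt-2.12)
 * highmem:         guest can address memory above 4GB
 * firmware_loaded: a BIOS image or pflash drive was supplied
 * aarch64:         all CPUs are in AArch64 mode
 */
static inline bool use_highmem_ecam(bool no_highmem_ecam, bool highmem,
                                    bool firmware_loaded, bool aarch64)
{
    return !no_highmem_ecam && highmem && (!firmware_loaded || aarch64);
}
```

In other words, only the combination "firmware loaded and not aarch64"
forces a highmem virt-3.0 machine back to the legacy ECAM region.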

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1529072910-16156-11-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/virt.h |  1 +
 hw/arm/virt.c         | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -XXX,XX +XXX,XX @@ typedef struct {
     bool no_pmu;
     bool claim_edge_triggered_timers;
     bool smbios_old_sys_ver;
+    bool no_highmem_ecam;
 } VirtMachineClass;
 
 typedef struct {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
     int n, virt_max_cpus;
     MemoryRegion *ram = g_new(MemoryRegion, 1);
     bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
+    bool aarch64 = true;
 
     /* We can probe only here because during property set
      * KVM is not available yet
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
         numa_cpu_pre_plug(&possible_cpus->cpus[cs->cpu_index], DEVICE(cpuobj),
                           &error_fatal);
 
+        aarch64 &= object_property_get_bool(cpuobj, "aarch64", NULL);
+
         if (!vms->secure) {
             object_property_set_bool(cpuobj, false, "has_el3", NULL);
         }
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
         create_uart(vms, pic, VIRT_SECURE_UART, secure_sysmem, serial_hd(1));
     }
 
+    vms->highmem_ecam &= vms->highmem && (!firmware_loaded || aarch64);
+
     create_rtc(vms, pic);
 
     create_pcie(vms, pic);
@@ -XXX,XX +XXX,XX @@ static void virt_3_0_instance_init(Object *obj)
                                     "Set GIC version. "
                                     "Valid values are 2, 3 and host", NULL);
 
+    vms->highmem_ecam = !vmc->no_highmem_ecam;
+
     if (vmc->no_its) {
         vms->its = false;
     } else {
@@ -XXX,XX +XXX,XX @@ static void virt_2_12_instance_init(Object *obj)
 
 static void virt_machine_2_12_options(MachineClass *mc)
 {
+    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
     virt_machine_3_0_options(mc);
     SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_12);
+    vmc->no_highmem_ecam = true;
 }
 DEFINE_VIRT_MACHINE(2, 12)
 
-- 
2.17.1

From: Eric Auger <eric.auger@redhat.com>

virt 3.0 now allows up to 512 vcpus whereas for earlier machine
types, max_cpus was set to 255 and any attempt to start the
machine with vcpus > 255 was rejected at a very early stage,
in vl.c/main level.

512 is the maximum supported by KVM. However, the actual vcpu count
that can be achieved depends on other parameters such as the
acceleration mode, the vgic version and the host kernel version.
Those are discovered later on.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1529072910-16156-12-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
     mc->init = machvirt_init;
-    /* Start max_cpus at the maximum QEMU supports. We'll further restrict
-     * it later in machvirt_init, where we have more information about the
+    /* Start with max_cpus set to 512, which is the maximum supported by KVM.
+     * The value may be reduced later when we have more information about the
      * configuration of the particular instance.
      */
-    mc->max_cpus = 255;
+    mc->max_cpus = 512;
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_CALXEDA_XGMAC);
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
@@ -XXX,XX +XXX,XX @@ static void virt_machine_2_12_options(MachineClass *mc)
     virt_machine_3_0_options(mc);
     SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_12);
     vmc->no_highmem_ecam = true;
+    mc->max_cpus = 255;
 }
 DEFINE_VIRT_MACHINE(2, 12)
 
-- 
2.17.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Add the Cortex-R5F with the optional FPU enabled.

Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20180529124707.3025-2-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
     define_arm_cp_regs(cpu, cortexr5_cp_reginfo);
 }
 
+static void cortex_r5f_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+
+    cortex_r5_initfn(obj);
+    set_feature(&cpu->env, ARM_FEATURE_VFP3);
+}
+
 static const ARMCPRegInfo cortexa8_cp_reginfo[] = {
     { .name = "L2LOCKDOWN", .cp = 15, .crn = 9, .crm = 0, .opc1 = 1, .opc2 = 0,
       .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
     { .name = "cortex-m33",  .initfn = cortex_m33_initfn,
                              .class_init = arm_v7m_class_init },
     { .name = "cortex-r5",   .initfn = cortex_r5_initfn },
+    { .name = "cortex-r5f",  .initfn = cortex_r5f_initfn },
     { .name = "cortex-a7",   .initfn = cortex_a7_initfn },
     { .name = "cortex-a8",   .initfn = cortex_a8_initfn },
     { .name = "cortex-a9",   .initfn = cortex_a9_initfn },
-- 
2.17.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

The ZynqMP has Cortex-R5Fs with the optional FPU enabled.

Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20180529124707.3025-3-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-zcu102.c | 2 +-
 hw/arm/xlnx-zynqmp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-zcu102.c
+++ b/hw/arm/xlnx-zcu102.c
@@ -XXX,XX +XXX,XX @@ static void xlnx_zcu102_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
 
-    mc->desc = "Xilinx ZynqMP ZCU102 board with 4xA53s and 2xR5s based on " \
+    mc->desc = "Xilinx ZynqMP ZCU102 board with 4xA53s and 2xR5Fs based on " \
                "the value of smp";
     mc->init = xlnx_zcu102_init;
     mc->block_default_type = IF_IDE;
diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_create_rpu(XlnxZynqMPState *s, const char *boot_cpu,
         char *name;
 
         object_initialize(&s->rpu_cpu[i], sizeof(s->rpu_cpu[i]),
-                          "cortex-r5-" TYPE_ARM_CPU);
+                          "cortex-r5f-" TYPE_ARM_CPU);
         object_property_add_child(OBJECT(s), "rpu-cpu[*]",
                                   OBJECT(&s->rpu_cpu[i]), &error_abort);
 
-- 
2.17.1

Implement the Arm TrustZone Memory Protection Controller, which sits
in front of RAM and allows secure software to configure it to either
pass through or reject transactions.

We implement the MPC as a QEMU IOMMU, which will direct transactions
either through to the devices and memory behind it or to a special
"never works" AddressSpace if they are blocked.

This initial commit implements the skeleton of the device:
 * it always permits accesses
 * it doesn't implement most of the registers
 * it doesn't implement the interrupt or other behaviour
   for blocked transactions
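
Among the registers that the skeleton does implement are the ID
registers. As a rough standalone sketch of that lookup (the function
and macro names here are hypothetical, and returning -1 for other
offsets is a choice made for this example only):

```c
#include <assert.h>
#include <stdint.h>

/* Offsets of the first and last ID registers (PIDR4 and CIDR3). */
#define MPC_PIDR4_OFFSET 0xfd0u
#define MPC_CIDR3_OFFSET 0xffcu

/* ID register values in address order: PIDR4..7, PIDR0..3, CIDR0..3. */
static const uint8_t mpc_idregs[] = {
    0x04, 0x00, 0x00, 0x00,
    0x60, 0xb8, 0x1b, 0x00,
    0x0d, 0xf0, 0x05, 0xb1,
};

/* Return the ID register value for addr, or -1 if addr is not in the
 * ID register range. The address is rounded down to a word boundary. */
static inline int mpc_idreg_read(uint32_t addr)
{
    uint32_t offset = addr & ~0x3u;

    if (offset < MPC_PIDR4_OFFSET || offset > MPC_CIDR3_OFFSET) {
        return -1;
    }
    return mpc_idregs[(offset - MPC_PIDR4_OFFSET) / 4];
}
```

The twelve ID registers are contiguous words, so the table index is
simply the word distance from PIDR4.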

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20180620132032.28865-2-peter.maydell@linaro.org
---
 hw/misc/Makefile.objs           |   1 +
 include/hw/misc/tz-mpc.h        |  70 ++++++
 hw/misc/tz-mpc.c                | 399 ++++++++++++++++++++++++++++++++
 MAINTAINERS                     |   2 +
 default-configs/arm-softmmu.mak |   1 +
 hw/misc/trace-events            |   7 +
 6 files changed, 480 insertions(+)
 create mode 100644 include/hw/misc/tz-mpc.h
 create mode 100644 hw/misc/tz-mpc.c

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_MIPS_ITU) += mips_itu.o
 obj-$(CONFIG_MPS2_FPGAIO) += mps2-fpgaio.o
 obj-$(CONFIG_MPS2_SCC) += mps2-scc.o
 
+obj-$(CONFIG_TZ_MPC) += tz-mpc.o
 obj-$(CONFIG_TZ_PPC) += tz-ppc.o
 obj-$(CONFIG_IOTKIT_SECCTL) += iotkit-secctl.o
 
diff --git a/include/hw/misc/tz-mpc.h b/include/hw/misc/tz-mpc.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/misc/tz-mpc.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM AHB5 TrustZone Memory Protection Controller emulation
+ *
+ * Copyright (c) 2018 Linaro Limited
+ * Written by Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 or
+ * (at your option) any later version.
+ */
+
+/* This is a model of the TrustZone memory protection controller (MPC).
+ * It is documented in the ARM CoreLink SIE-200 System IP for Embedded TRM
+ * (DDI 0571G):
+ * https://developer.arm.com/products/architecture/m-profile/docs/ddi0571/g
+ *
+ * The MPC sits in front of memory and allows secure software to
+ * configure it to either pass through or reject transactions.
+ * Rejected transactions may be configured to either be aborted, or to
+ * behave as RAZ/WI. An interrupt can be signalled for a rejected transaction.
+ *
+ * The MPC has a register interface which the guest uses to configure it.
+ *
+ * QEMU interface:
+ * + sysbus MMIO region 0: MemoryRegion for the MPC's config registers
+ * + sysbus MMIO region 1: MemoryRegion for the upstream end of the MPC
+ * + Property "downstream": MemoryRegion defining the downstream memory
+ * + Named GPIO output "irq": set for a transaction-failed interrupt
+ */
+
+#ifndef TZ_MPC_H
+#define TZ_MPC_H
+
+#include "hw/sysbus.h"
+
+#define TYPE_TZ_MPC "tz-mpc"
+#define TZ_MPC(obj) OBJECT_CHECK(TZMPC, (obj), TYPE_TZ_MPC)
+
+#define TZ_NUM_PORTS 16
+
+#define TYPE_TZ_MPC_IOMMU_MEMORY_REGION "tz-mpc-iommu-memory-region"
+
+typedef struct TZMPC TZMPC;
+
+struct TZMPC {
+    /*< private >*/
+    SysBusDevice parent_obj;
+
+    /*< public >*/
+
+    qemu_irq irq;
+
+    /* Properties */
+    MemoryRegion *downstream;
+
+    hwaddr blocksize;
+    uint32_t blk_max;
+
+    /* MemoryRegions exposed to user */
+    MemoryRegion regmr;
+    IOMMUMemoryRegion upstream;
+
+    /* MemoryRegion used internally */
+    MemoryRegion blocked_io;
+
+    AddressSpace downstream_as;
+    AddressSpace blocked_io_as;
+};
+
+#endif
diff --git a/hw/misc/tz-mpc.c b/hw/misc/tz-mpc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/misc/tz-mpc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM AHB5 TrustZone Memory Protection Controller emulation
+ *
+ * Copyright (c) 2018 Linaro Limited
+ * Written by Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 or
+ * (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "trace.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/misc/tz-mpc.h"
+
+/* Our IOMMU has two IOMMU indexes, one for secure transactions and one for
+ * non-secure transactions.
+ */
+enum {
+    IOMMU_IDX_S,
+    IOMMU_IDX_NS,
+    IOMMU_NUM_INDEXES,
+};
+
+/* Config registers */
+REG32(CTRL, 0x00)
+REG32(BLK_MAX, 0x10)
+REG32(BLK_CFG, 0x14)
+REG32(BLK_IDX, 0x18)
+REG32(BLK_LUT, 0x1c)
+REG32(INT_STAT, 0x20)
+REG32(INT_CLEAR, 0x24)
+REG32(INT_EN, 0x28)
+REG32(INT_INFO1, 0x2c)
+REG32(INT_INFO2, 0x30)
+REG32(INT_SET, 0x34)
+REG32(PIDR4, 0xfd0)
+REG32(PIDR5, 0xfd4)
+REG32(PIDR6, 0xfd8)
+REG32(PIDR7, 0xfdc)
+REG32(PIDR0, 0xfe0)
+REG32(PIDR1, 0xfe4)
+REG32(PIDR2, 0xfe8)
+REG32(PIDR3, 0xfec)
+REG32(CIDR0, 0xff0)
+REG32(CIDR1, 0xff4)
+REG32(CIDR2, 0xff8)
+REG32(CIDR3, 0xffc)
+
+static const uint8_t tz_mpc_idregs[] = {
+    0x04, 0x00, 0x00, 0x00,
+    0x60, 0xb8, 0x1b, 0x00,
+    0x0d, 0xf0, 0x05, 0xb1,
+};
+
+static MemTxResult tz_mpc_reg_read(void *opaque, hwaddr addr,
+                                   uint64_t *pdata,
+                                   unsigned size, MemTxAttrs attrs)
+{
+    uint64_t r;
+    uint32_t offset = addr & ~0x3;
+
+    if (!attrs.secure && offset < A_PIDR4) {
+        /* NS accesses can only see the ID registers */
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "TZ MPC register read: NS access to offset 0x%x\n",
+                      offset);
+        r = 0;
+        goto read_out;
+    }
+
+    switch (offset) {
+    case A_PIDR4:
+    case A_PIDR5:
+    case A_PIDR6:
+    case A_PIDR7:
+    case A_PIDR0:
+    case A_PIDR1:
+    case A_PIDR2:
+    case A_PIDR3:
+    case A_CIDR0:
+    case A_CIDR1:
+    case A_CIDR2:
+    case A_CIDR3:
+        r = tz_mpc_idregs[(offset - A_PIDR4) / 4];
+        break;
+    case A_INT_CLEAR:
+    case A_INT_SET:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "TZ MPC register read: write-only offset 0x%x\n",
+                      offset);
+        r = 0;
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "TZ MPC register read: bad offset 0x%x\n", offset);
+        r = 0;
+        break;
+    }
+
+    if (size != 4) {
+        /* None of our registers are read-sensitive (except BLK_LUT,
+         * which can special case the "size not 4" case), so just
+         * pull the right bytes out of the word read result.
+         */
+        r = extract32(r, (addr & 3) * 8, size * 8);
+    }
+
+read_out:
+    trace_tz_mpc_reg_read(addr, r, size);
+    *pdata = r;
+    return MEMTX_OK;
+}
+
+static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
+                                    uint64_t value,
+                                    unsigned size, MemTxAttrs attrs)
+{
+    uint32_t offset = addr & ~0x3;
+
+    trace_tz_mpc_reg_write(addr, value, size);
+
+    if (!attrs.secure && offset < A_PIDR4) {
+        /* NS accesses can only see the ID registers */
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "TZ MPC register write: NS access to offset 0x%x\n",
+                      offset);
+        return MEMTX_OK;
+    }
+
+    if (size != 4) {
+        /* Expand the byte or halfword write to a full word size.
+         * In most cases we can do this with zeroes; the exceptions
+         * are CTRL, BLK_IDX and BLK_LUT.
+         */
+        uint32_t oldval;
+
+        switch (offset) {
+            /* As we add support for registers which need expansions
+             * other than zeroes we'll fill in cases here.
+             */
+        default:
+            oldval = 0;
+            break;
+        }
+        value = deposit32(oldval, (addr & 3) * 8, size * 8, value);
+    }
+
+    switch (offset) {
+    case A_PIDR4:
+    case A_PIDR5:
+    case A_PIDR6:
+    case A_PIDR7:
+    case A_PIDR0:
+    case A_PIDR1:
+    case A_PIDR2:
+    case A_PIDR3:
+    case A_CIDR0:
+    case A_CIDR1:
+    case A_CIDR2:
+    case A_CIDR3:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "TZ MPC register write: read-only offset 0x%x\n", offset);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "TZ MPC register write: bad offset 0x%x\n", offset);
+        break;
+    }
+
+    return MEMTX_OK;
+}
+
+static const MemoryRegionOps tz_mpc_reg_ops = {
+    .read_with_attrs = tz_mpc_reg_read,
+    .write_with_attrs = tz_mpc_reg_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid.min_access_size = 1,
+    .valid.max_access_size = 4,
+    .impl.min_access_size = 1,
+    .impl.max_access_size = 4,
+};
+
+/* Accesses only reach these read and write functions if the MPC is
+ * blocking them; non-blocked accesses go directly to the downstream
+ * memory region without passing through this code.
+ */
+static MemTxResult tz_mpc_mem_blocked_read(void *opaque, hwaddr addr,
+                                           uint64_t *pdata,
+                                           unsigned size, MemTxAttrs attrs)
+{
+    trace_tz_mpc_mem_blocked_read(addr, size, attrs.secure);
+
+    *pdata = 0;
+    return MEMTX_OK;
+}
+
+static MemTxResult tz_mpc_mem_blocked_write(void *opaque, hwaddr addr,
+                                            uint64_t value,
+                                            unsigned size, MemTxAttrs attrs)
+{
+    trace_tz_mpc_mem_blocked_write(addr, value, size, attrs.secure);
+
+    return MEMTX_OK;
+}
+
+static const MemoryRegionOps tz_mpc_mem_blocked_ops = {
+    .read_with_attrs = tz_mpc_mem_blocked_read,
+    .write_with_attrs = tz_mpc_mem_blocked_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid.min_access_size = 1,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 1,
+    .impl.max_access_size = 8,
+};
+
+static IOMMUTLBEntry tz_mpc_translate(IOMMUMemoryRegion *iommu,
+                                      hwaddr addr, IOMMUAccessFlags flags,
+                                      int iommu_idx)
+{
+    TZMPC *s = TZ_MPC(container_of(iommu, TZMPC, upstream));
+    bool ok;
+
+    IOMMUTLBEntry ret = {
+        .iova = addr & ~(s->blocksize - 1),
+        .translated_addr = addr & ~(s->blocksize - 1),
+        .addr_mask = s->blocksize - 1,
+        .perm = IOMMU_RW,
+    };
+
+    /* Look at the per-block configuration for this address, and
+     * return a TLB entry directing the transaction at either
+     * downstream_as or blocked_io_as, as appropriate.
+     * For the moment, always permit accesses.
+     */
+    ok = true;
+
+    trace_tz_mpc_translate(addr, flags,
+                           iommu_idx == IOMMU_IDX_S ? "S" : "NS",
+                           ok ? "pass" : "block");
+
+    ret.target_as = ok ? &s->downstream_as : &s->blocked_io_as;
+    return ret;
+}
+
+static int tz_mpc_attrs_to_index(IOMMUMemoryRegion *iommu, MemTxAttrs attrs)
+{
+    /* We treat unspecified attributes like secure. Transactions with
+     * unspecified attributes come from places like
+     * cpu_physical_memory_write_rom() for initial image load, and we want
+     * those to pass through the from-reset "everything is secure" config.
+     * All the real during-emulation transactions from the CPU will
+     * specify attributes.
+     */
+    return (attrs.unspecified || attrs.secure) ? IOMMU_IDX_S : IOMMU_IDX_NS;
+}
+
+static int tz_mpc_num_indexes(IOMMUMemoryRegion *iommu)
+{
+    return IOMMU_NUM_INDEXES;
+}
+
+static void tz_mpc_reset(DeviceState *dev)
+{
+}
+
+static void tz_mpc_init(Object *obj)
+{
+    DeviceState *dev = DEVICE(obj);
+    TZMPC *s = TZ_MPC(obj);
+
+    qdev_init_gpio_out_named(dev, &s->irq, "irq", 1);
+}
+
+static void tz_mpc_realize(DeviceState *dev, Error **errp)
+{
+    Object *obj = OBJECT(dev);
+    SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+    TZMPC *s = TZ_MPC(dev);
+    uint64_t size;
+
+    /* We can't create the upstream end of the port until realize,
+     * as we don't know the size of the MR used as the downstream until then.
+     * We insist on having a downstream, to avoid complicating the code
+     * with handling the "don't know how big this is" case. It's easy
+     * enough for the user to create an unimplemented_device as downstream
+     * if they have nothing else to plug into this.
+     */
+    if (!s->downstream) {
+        error_setg(errp, "MPC 'downstream' link not set");
+        return;
+    }
+
+    size = memory_region_size(s->downstream);
+
+    memory_region_init_iommu(&s->upstream, sizeof(s->upstream),
+                             TYPE_TZ_MPC_IOMMU_MEMORY_REGION,
+                             obj, "tz-mpc-upstream", size);
+
+    /* In real hardware the block size is configurable. In QEMU we could
+     * make it configurable but will need it to be at least as big as the
+     * target page size so we can execute out of the resulting MRs. Guest
+     * software is supposed to check the block size using the BLK_CFG
+     * register, so make it fixed at the page size.
+     */
+    s->blocksize = memory_region_iommu_get_min_page_size(&s->upstream);
+    if (size % s->blocksize != 0) {
+        error_setg(errp,
+                   "MPC 'downstream' size %" PRIu64
+                   " is not a multiple of 0x%" HWADDR_PRIx " bytes",
+                   size, s->blocksize);
+        object_unref(OBJECT(&s->upstream));
+        return;
+    }
+
+    /* BLK_MAX is the max value of BLK_IDX, which indexes an array of 32-bit
+     * words, each bit of which indicates one block.
+     */
+    s->blk_max = DIV_ROUND_UP(size / s->blocksize, 32);
+
+    memory_region_init_io(&s->regmr, obj, &tz_mpc_reg_ops,
+                          s, "tz-mpc-regs", 0x1000);
+    sysbus_init_mmio(sbd, &s->regmr);
+
+    sysbus_init_mmio(sbd, MEMORY_REGION(&s->upstream));
+
+    /* This memory region is not exposed to users of this device as a
+     * sysbus MMIO region, but is instead used internally as something
+     * that our IOMMU translate function might direct accesses to.
+     */
+    memory_region_init_io(&s->blocked_io, obj, &tz_mpc_mem_blocked_ops,
+                          s, "tz-mpc-blocked-io", size);
+
+    address_space_init(&s->downstream_as, s->downstream,
+                       "tz-mpc-downstream");
+    address_space_init(&s->blocked_io_as, &s->blocked_io,
+                       "tz-mpc-blocked-io");
+}
+
+static const VMStateDescription tz_mpc_vmstate = {
+    .name = "tz-mpc",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static Property tz_mpc_properties[] = {
+    DEFINE_PROP_LINK("downstream", TZMPC, downstream,
+                     TYPE_MEMORY_REGION, MemoryRegion *),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void tz_mpc_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = tz_mpc_realize;
+    dc->vmsd = &tz_mpc_vmstate;
+    dc->reset = tz_mpc_reset;
+    dc->props = tz_mpc_properties;
+}
+
+static const TypeInfo tz_mpc_info = {
+    .name = TYPE_TZ_MPC,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(TZMPC),
+    .instance_init = tz_mpc_init,
+    .class_init = tz_mpc_class_init,
+};
+
+static void tz_mpc_iommu_memory_region_class_init(ObjectClass *klass,
+                                                  void *data)
+{
+    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
+
+    imrc->translate = tz_mpc_translate;
+    imrc->attrs_to_index = tz_mpc_attrs_to_index;
+    imrc->num_indexes = tz_mpc_num_indexes;
+}
+
+static const TypeInfo tz_mpc_iommu_memory_region_info = {
+    .name = TYPE_TZ_MPC_IOMMU_MEMORY_REGION,
+    .parent = TYPE_IOMMU_MEMORY_REGION,
+    .class_init = tz_mpc_iommu_memory_region_class_init,
+};
+
+static void tz_mpc_register_types(void)
+{
+    type_register_static(&tz_mpc_info);
+    type_register_static(&tz_mpc_iommu_memory_region_info);
+}
+
+type_init(tz_mpc_register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/char/cmsdk-apb-uart.c
 F: include/hw/char/cmsdk-apb-uart.h
 F: hw/misc/tz-ppc.c
 F: include/hw/misc/tz-ppc.h
+F: hw/misc/tz-mpc.c
+F: include/hw/misc/tz-mpc.h
 
 ARM cores
 M: Peter Maydell <peter.maydell@linaro.org>
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index XXXXXXX..XXXXXXX 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_CMSDK_APB_UART=y
 CONFIG_MPS2_FPGAIO=y
 CONFIG_MPS2_SCC=y
 
+CONFIG_TZ_MPC=y
 CONFIG_TZ_PPC=y
 CONFIG_IOTKIT=y
 CONFIG_IOTKIT_SECCTL=y
diff --git a/hw/misc/trace-events b/hw/misc/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/trace-events
+++ b/hw/misc/trace-events
@@ -XXX,XX +XXX,XX @@ mos6522_set_sr_int(void) "set sr_int"
 mos6522_write(uint64_t addr, uint64_t val) "reg=0x%"PRIx64 " val=0x%"PRIx64
 mos6522_read(uint64_t addr, unsigned val) "reg=0x%"PRIx64 " val=0x%x"
 
+# hw/misc/tz-mpc.c
+tz_mpc_reg_read(uint32_t offset, uint64_t data, unsigned size) "TZ MPC regs read: offset 0x%x data 0x%" PRIx64 " size %u"
+tz_mpc_reg_write(uint32_t offset, uint64_t data, unsigned size) "TZ MPC regs write: offset 0x%x data 0x%" PRIx64 " size %u"
+tz_mpc_mem_blocked_read(uint64_t addr, unsigned size, bool secure) "TZ MPC blocked read: offset 0x%" PRIx64 " size %u secure %d"
+tz_mpc_mem_blocked_write(uint64_t addr, uint64_t data, unsigned size, bool secure) "TZ MPC blocked write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u secure %d"
+tz_mpc_translate(uint64_t addr, int flags, const char *idx, const char *res) "TZ MPC translate: addr 0x%" PRIx64 " flags 0x%x iommu_idx %s: %s"
+
 # hw/misc/tz-ppc.c
 tz_ppc_reset(void) "TZ PPC: reset"
 tz_ppc_cfg_nonsec(int n, int level) "TZ PPC: cfg_nonsec[%d] = %d"
-- 
2.17.1

Implement the missing registers for the TZ MPC.
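
As a rough illustration of two of the behaviours added here, the
standalone sketch below models the BLK_CFG encoding and the BLK_IDX
auto-increment outside of QEMU. The names sketch_blk_cfg and
sketch_autoinc are hypothetical and exist only for this example; the
actual implementation is the one in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: BLK_CFG encodes the
 * block size as blocksize == 1 << (BLK_CFG + 5), so
 * BLK_CFG == log2(blocksize) - 5.
 * Assumes blocksize is a power of two and at least 32 bytes.
 */
static uint32_t sketch_blk_cfg(uint64_t blocksize)
{
    uint32_t shift = 0;

    assert(blocksize >= 32 && (blocksize & (blocksize - 1)) == 0);
    while ((blocksize >> shift) != 1) {
        shift++;
    }
    return shift - 5;
}

/* Hypothetical helper, not part of the patch: BLK_IDX advances only
 * on full word accesses with CTRL.AUTOINC set, and wraps at blk_max.
 */
static uint32_t sketch_autoinc(uint32_t blk_idx, uint32_t blk_max,
                               unsigned access_size, int autoinc)
{
    assert(blk_max != 0);
    if (access_size == 4 && autoinc) {
        blk_idx = (blk_idx + 1) % blk_max;
    }
    return blk_idx;
}
```

With a 4K block size this gives a BLK_CFG value of 7, and a byte or
halfword access to BLK_LUT leaves BLK_IDX unchanged.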

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20180620132032.28865-3-peter.maydell@linaro.org
---
 include/hw/misc/tz-mpc.h |  10 +++
 hw/misc/tz-mpc.c         | 140 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/include/hw/misc/tz-mpc.h b/include/hw/misc/tz-mpc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/tz-mpc.h
+++ b/include/hw/misc/tz-mpc.h
@@ -XXX,XX +XXX,XX @@ struct TZMPC {
 
     /*< public >*/
 
+    /* State */
+    uint32_t ctrl;
+    uint32_t blk_idx;
+    uint32_t int_stat;
+    uint32_t int_en;
+    uint32_t int_info1;
+    uint32_t int_info2;
+
+    uint32_t *blk_lut;
+
     qemu_irq irq;
 
     /* Properties */
diff --git a/hw/misc/tz-mpc.c b/hw/misc/tz-mpc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/tz-mpc.c
+++ b/hw/misc/tz-mpc.c
@@ -XXX,XX +XXX,XX @@ enum {
 
 /* Config registers */
 REG32(CTRL, 0x00)
+    FIELD(CTRL, SEC_RESP, 4, 1)
+    FIELD(CTRL, AUTOINC, 8, 1)
+    FIELD(CTRL, LOCKDOWN, 31, 1)
 REG32(BLK_MAX, 0x10)
 REG32(BLK_CFG, 0x14)
 REG32(BLK_IDX, 0x18)
 REG32(BLK_LUT, 0x1c)
 REG32(INT_STAT, 0x20)
+    FIELD(INT_STAT, IRQ, 0, 1)
 REG32(INT_CLEAR, 0x24)
+    FIELD(INT_CLEAR, IRQ, 0, 1)
 REG32(INT_EN, 0x28)
+    FIELD(INT_EN, IRQ, 0, 1)
 REG32(INT_INFO1, 0x2c)
 REG32(INT_INFO2, 0x30)
 REG32(INT_SET, 0x34)
+    FIELD(INT_SET, IRQ, 0, 1)
 REG32(PIDR4, 0xfd0)
 REG32(PIDR5, 0xfd4)
 REG32(PIDR6, 0xfd8)
@@ -XXX,XX +XXX,XX @@ static const uint8_t tz_mpc_idregs[] = {
     0x0d, 0xf0, 0x05, 0xb1,
 };
 
+static void tz_mpc_irq_update(TZMPC *s)
+{
+    qemu_set_irq(s->irq, s->int_stat && s->int_en);
+}
+
+static void tz_mpc_autoinc_idx(TZMPC *s, unsigned access_size)
+{
+    /* Auto-increment BLK_IDX if necessary */
+    if (access_size == 4 && (s->ctrl & R_CTRL_AUTOINC_MASK)) {
+        s->blk_idx++;
+        s->blk_idx %= s->blk_max;
+    }
+}
+
 static MemTxResult tz_mpc_reg_read(void *opaque, hwaddr addr,
                                    uint64_t *pdata,
                                    unsigned size, MemTxAttrs attrs)
 {
+    TZMPC *s = TZ_MPC(opaque);
     uint64_t r;
     uint32_t offset = addr & ~0x3;
 
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_read(void *opaque, hwaddr addr,
     }
 
     switch (offset) {
+    case A_CTRL:
+        r = s->ctrl;
+        break;
+    case A_BLK_MAX:
+        r = s->blk_max;
+        break;
+    case A_BLK_CFG:
+        /* We are never in the "init in progress" state, so this just
+         * indicates the block size. s->blocksize == (1 << (BLK_CFG + 5)),
+         * so BLK_CFG == ctz32(s->blocksize) - 5
+         */
+        r = ctz32(s->blocksize) - 5;
+        break;
+    case A_BLK_IDX:
+        r = s->blk_idx;
+        break;
+    case A_BLK_LUT:
+        r = s->blk_lut[s->blk_idx];
+        tz_mpc_autoinc_idx(s, size);
+        break;
+    case A_INT_STAT:
+        r = s->int_stat;
+        break;
+    case A_INT_EN:
+        r = s->int_en;
+        break;
+    case A_INT_INFO1:
+        r = s->int_info1;
+        break;
+    case A_INT_INFO2:
+        r = s->int_info2;
+        break;
     case A_PIDR4:
     case A_PIDR5:
     case A_PIDR6:
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
                                     uint64_t value,
                                     unsigned size, MemTxAttrs attrs)
 {
+    TZMPC *s = TZ_MPC(opaque);
     uint32_t offset = addr & ~0x3;
 
     trace_tz_mpc_reg_write(addr, value, size);
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
         uint32_t oldval;
 
         switch (offset) {
-            /* As we add support for registers which need expansions
-             * other than zeroes we'll fill in cases here.
-             */
+        case A_CTRL:
+            oldval = s->ctrl;
+            break;
+        case A_BLK_IDX:
+            oldval = s->blk_idx;
+            break;
+        case A_BLK_LUT:
+            oldval = s->blk_lut[s->blk_idx];
+            break;
         default:
             oldval = 0;
             break;
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
         value = deposit32(oldval, (addr & 3) * 8, size * 8, value);
     }
 
+    if ((s->ctrl & R_CTRL_LOCKDOWN_MASK) &&
+        (offset == A_CTRL || offset == A_BLK_LUT || offset == A_INT_EN)) {
+        /* Lockdown mode makes these three registers read-only, and
+         * the only way out of it is to reset the device.
+         */
+        qemu_log_mask(LOG_GUEST_ERROR, "TZ MPC register write to offset 0x%x "
+                      "while MPC is in lockdown mode\n", offset);
+        return MEMTX_OK;
+    }
+
     switch (offset) {
+    case A_CTRL:
+        /* We don't implement the 'data gating' feature so all other bits
+         * are reserved and we make them RAZ/WI.
+         */
+        s->ctrl = value & (R_CTRL_SEC_RESP_MASK |
+                           R_CTRL_AUTOINC_MASK |
+                           R_CTRL_LOCKDOWN_MASK);
+        break;
+    case A_BLK_IDX:
+        s->blk_idx = value % s->blk_max;
+        break;
+    case A_BLK_LUT:
+        s->blk_lut[s->blk_idx] = value;
+        tz_mpc_autoinc_idx(s, size);
+        break;
+    case A_INT_CLEAR:
+        if (value & R_INT_CLEAR_IRQ_MASK) {
+            s->int_stat = 0;
+            tz_mpc_irq_update(s);
+        }
+        break;
+    case A_INT_EN:
+        s->int_en = value & R_INT_EN_IRQ_MASK;
+        tz_mpc_irq_update(s);
+        break;
+    case A_INT_SET:
+        if (value & R_INT_SET_IRQ_MASK) {
+            s->int_stat = R_INT_STAT_IRQ_MASK;
+            tz_mpc_irq_update(s);
+        }
+        break;
     case A_PIDR4:
     case A_PIDR5:
     case A_PIDR6:
@@ -XXX,XX +XXX,XX @@ static int tz_mpc_num_indexes(IOMMUMemoryRegion *iommu)
 
 static void tz_mpc_reset(DeviceState *dev)
 {
+    TZMPC *s = TZ_MPC(dev);
+
+    s->ctrl = 0x00000100;
+    s->blk_idx = 0;
+    s->int_stat = 0;
+    s->int_en = 1;
+    s->int_info1 = 0;
+    s->int_info2 = 0;
+
+    memset(s->blk_lut, 0, s->blk_max * sizeof(uint32_t));
 }
 
 static void tz_mpc_init(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void tz_mpc_realize(DeviceState *dev, Error **errp)
                        "tz-mpc-downstream");
     address_space_init(&s->blocked_io_as, &s->blocked_io,
                        "tz-mpc-blocked-io");
+
+    s->blk_lut = g_new(uint32_t, s->blk_max);
+}
+
+static int tz_mpc_post_load(void *opaque, int version_id)
+{
+    TZMPC *s = TZ_MPC(opaque);
+
+    /* Check the incoming data doesn't point blk_idx off the end of blk_lut. */
+    if (s->blk_idx >= s->blk_max) {
+        return -1;
+    }
+    return 0;
 }
 
 static const VMStateDescription tz_mpc_vmstate = {
     .name = "tz-mpc",
     .version_id = 1,
     .minimum_version_id = 1,
+    .post_load = tz_mpc_post_load,
     .fields = (VMStateField[]) {
+        VMSTATE_UINT32(ctrl, TZMPC),
+        VMSTATE_UINT32(blk_idx, TZMPC),
+        VMSTATE_UINT32(int_stat, TZMPC),
+        VMSTATE_UINT32(int_en, TZMPC),
+        VMSTATE_UINT32(int_info1, TZMPC),
+        VMSTATE_UINT32(int_info2, TZMPC),
+        VMSTATE_VARRAY_UINT32(blk_lut, TZMPC, blk_max,
+                              0, vmstate_info_uint32, uint32_t),
         VMSTATE_END_OF_LIST()
     }
 };
-- 
2.17.1

The guest can configure whether accesses blocked by the MPC:
 * should be RAZ/WI or cause a bus error
 * should generate an interrupt or not

Implement this behaviour in the blocked-access handlers.
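
The intended behaviour can be sketched in isolation as below. This is
only a simplified model under stated assumptions: SketchMPC,
sketch_handle_block and sketch_irq_level are hypothetical names used
for illustration, and the model tracks just the address of the first
blocked access rather than the full INT_INFO1/INT_INFO2 contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few TZMPC fields this example needs */
typedef struct {
    bool sec_resp;      /* CTRL.SEC_RESP: true = bus error, false = RAZ/WI */
    bool int_stat;      /* interrupt pending */
    bool int_en;        /* interrupt enabled */
    uint32_t int_info1; /* address of the first blocked access */
} SketchMPC;

/* Returns true if the blocked access should be reported as a bus error */
static bool sketch_handle_block(SketchMPC *s, uint32_t addr)
{
    assert(s);
    if (!s->int_stat) {
        /* Only the first blocked access latches its details; later
         * ones are still blocked but leave the captured info alone
         * until the guest clears the interrupt.
         */
        s->int_info1 = addr;
        s->int_stat = true;
    }
    return s->sec_resp;
}

/* Level of the interrupt line: pending and enabled */
static bool sketch_irq_level(const SketchMPC *s)
{
    assert(s);
    return s->int_stat && s->int_en;
}
```

Note that the RAZ/WI versus bus error choice is independent of the
interrupt: the status bit is set for a blocked access either way, and
INT_EN only gates whether the line is raised.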

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20180620132032.28865-4-peter.maydell@linaro.org
---
 hw/misc/tz-mpc.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/hw/misc/tz-mpc.c b/hw/misc/tz-mpc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/tz-mpc.c
+++ b/hw/misc/tz-mpc.c
@@ -XXX,XX +XXX,XX @@ REG32(INT_EN, 0x28)
     FIELD(INT_EN, IRQ, 0, 1)
 REG32(INT_INFO1, 0x2c)
 REG32(INT_INFO2, 0x30)
+    FIELD(INT_INFO2, HMASTER, 0, 16)
+    FIELD(INT_INFO2, HNONSEC, 16, 1)
+    FIELD(INT_INFO2, CFG_NS, 17, 1)
 REG32(INT_SET, 0x34)
     FIELD(INT_SET, IRQ, 0, 1)
 REG32(PIDR4, 0xfd0)
@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps tz_mpc_reg_ops = {
     .impl.max_access_size = 4,
 };
 
+static inline bool tz_mpc_cfg_ns(TZMPC *s, hwaddr addr)
+{
+    /* Return the cfg_ns bit from the LUT for the specified address */
+    hwaddr blknum = addr / s->blocksize;
+    hwaddr blkword = blknum / 32;
+    uint32_t blkbit = 1U << (blknum % 32);
+
+    /* This would imply the address was larger than the size we
+     * defined this memory region to be, so it can't happen.
+     */
+    assert(blkword < s->blk_max);
+    return s->blk_lut[blkword] & blkbit;
+}
+
+static MemTxResult tz_mpc_handle_block(TZMPC *s, hwaddr addr, MemTxAttrs attrs)
+{
+    /* Handle a blocked transaction: raise IRQ, capture info, etc */
+    if (!s->int_stat) {
+        /* First blocked transfer: capture information into INT_INFO1 and
+         * INT_INFO2. Subsequent transfers are still blocked but don't
+         * capture information until the guest clears the interrupt.
+         */
+
+        s->int_info1 = addr;
+        s->int_info2 = 0;
+        s->int_info2 = FIELD_DP32(s->int_info2, INT_INFO2, HMASTER,
+                                  attrs.requester_id & 0xffff);
+        s->int_info2 = FIELD_DP32(s->int_info2, INT_INFO2, HNONSEC,
+                                  !attrs.secure);
+        s->int_info2 = FIELD_DP32(s->int_info2, INT_INFO2, CFG_NS,
+                                  tz_mpc_cfg_ns(s, addr));
+        s->int_stat |= R_INT_STAT_IRQ_MASK;
+        tz_mpc_irq_update(s);
+    }
+
+    /* Generate bus error if desired; otherwise RAZ/WI */
+    return (s->ctrl & R_CTRL_SEC_RESP_MASK) ? MEMTX_ERROR : MEMTX_OK;
+}
+
 /* Accesses only reach these read and write functions if the MPC is
  * blocking them; non-blocked accesses go directly to the downstream
  * memory region without passing through this code.
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_mem_blocked_read(void *opaque, hwaddr addr,
                                            uint64_t *pdata,
                                            unsigned size, MemTxAttrs attrs)
 {
+    TZMPC *s = TZ_MPC(opaque);
+
     trace_tz_mpc_mem_blocked_read(addr, size, attrs.secure);
 
     *pdata = 0;
-    return MEMTX_OK;
+    return tz_mpc_handle_block(s, addr, attrs);
 }
 
 static MemTxResult tz_mpc_mem_blocked_write(void *opaque, hwaddr addr,
                                             uint64_t value,
                                             unsigned size, MemTxAttrs attrs)
 {
+    TZMPC *s = TZ_MPC(opaque);
+
     trace_tz_mpc_mem_blocked_write(addr, value, size, attrs.secure);
 
-    return MEMTX_OK;
+    return tz_mpc_handle_block(s, addr, attrs);
 }
 
 static const MemoryRegionOps tz_mpc_mem_blocked_ops = {
-- 
2.17.1

The final part of the Memory Protection Controller we need to
implement is actually using the BLK_LUT data programmed by the
guest to determine whether to block the transaction or not.

Since this means we now change transaction mappings when
the guest writes to BLK_LUT, we must also call the IOMMU
notifiers at that point.
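
For reference, the lookup amounts to the following standalone sketch.
It assumes the LUT is a plain array of 32-bit words with one bit per
block; sketch_cfg_ns and sketch_access_ok are hypothetical names for
this example and are not functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the cfg_ns bit for the block containing
 * addr. Each LUT word covers 32 consecutive blocks.
 */
static bool sketch_cfg_ns(const uint32_t *lut, size_t lut_words,
                          uint64_t blocksize, uint64_t addr)
{
    uint64_t blknum = addr / blocksize;
    uint64_t blkword = blknum / 32;
    uint32_t blkbit = 1U << (blknum % 32);

    assert(blkword < lut_words);
    return (lut[blkword] & blkbit) != 0;
}

/* Hypothetical helper: a transaction passes only if its security
 * state matches the block's cfg_ns bit (1 = non-secure only,
 * 0 = secure only).
 */
static bool sketch_access_ok(const uint32_t *lut, size_t lut_words,
                             uint64_t blocksize, uint64_t addr,
                             bool access_is_ns)
{
    return sketch_cfg_ns(lut, lut_words, blocksize, addr) == access_is_ns;
}
```

Because a single LUT bit flips the result for both the secure and the
non-secure view of a block, a BLK_LUT write has to notify for both
IOMMU indexes.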

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20180620132032.28865-5-peter.maydell@linaro.org
---
 hw/misc/tz-mpc.c     | 53 ++++++++++++++++++++++++++++++++++++++++++--
 hw/misc/trace-events |  1 +
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/hw/misc/tz-mpc.c b/hw/misc/tz-mpc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/tz-mpc.c
+++ b/hw/misc/tz-mpc.c
@@ -XXX,XX +XXX,XX @@ static void tz_mpc_irq_update(TZMPC *s)
     qemu_set_irq(s->irq, s->int_stat && s->int_en);
 }
 
+static void tz_mpc_iommu_notify(TZMPC *s, uint32_t lutidx,
+                                uint32_t oldlut, uint32_t newlut)
+{
+    /* Called when the LUT word at lutidx has changed from oldlut to newlut;
+     * must call the IOMMU notifiers for the changed blocks.
+     */
+    IOMMUTLBEntry entry = {
+        .addr_mask = s->blocksize - 1,
+    };
+    hwaddr addr = lutidx * s->blocksize * 32;
+    int i;
+
+    for (i = 0; i < 32; i++, addr += s->blocksize) {
+        bool block_is_ns;
+
+        if (!((oldlut ^ newlut) & (1U << i))) {
+            continue;
+        }
+        /* This changes the mappings for both the S and the NS space,
+         * so we need to do four notifies: an UNMAP then a MAP for each.
+         */
+        block_is_ns = newlut & (1U << i);
+
+        trace_tz_mpc_iommu_notify(addr);
+        entry.iova = addr;
+        entry.translated_addr = addr;
+
+        entry.perm = IOMMU_NONE;
+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_S, entry);
+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_NS, entry);
+
+        entry.perm = IOMMU_RW;
+        if (block_is_ns) {
+            entry.target_as = &s->blocked_io_as;
+        } else {
+            entry.target_as = &s->downstream_as;
+        }
+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_S, entry);
+        if (block_is_ns) {
+            entry.target_as = &s->downstream_as;
+        } else {
+            entry.target_as = &s->blocked_io_as;
+        }
+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_NS, entry);
+    }
+}
+
 static void tz_mpc_autoinc_idx(TZMPC *s, unsigned access_size)
 {
     /* Auto-increment BLK_IDX if necessary */
@@ -XXX,XX +XXX,XX @@ static MemTxResult tz_mpc_reg_write(void *opaque, hwaddr addr,
         s->blk_idx = value % s->blk_max;
         break;
     case A_BLK_LUT:
+        tz_mpc_iommu_notify(s, s->blk_idx, s->blk_lut[s->blk_idx], value);
         s->blk_lut[s->blk_idx] = value;
         tz_mpc_autoinc_idx(s, size);
         break;
@@ -XXX,XX +XXX,XX @@ static IOMMUTLBEntry tz_mpc_translate(IOMMUMemoryRegion *iommu,
     /* Look at the per-block configuration for this address, and
      * return a TLB entry directing the transaction at either
      * downstream_as or blocked_io_as, as appropriate.
-     * For the moment, always permit accesses.
+     * If the LUT cfg_ns bit is 1, only non-secure transactions
+     * may pass. If the bit is 0, only secure transactions may pass.
      */
-    ok = true;
+    ok = tz_mpc_cfg_ns(s, addr) == (iommu_idx == IOMMU_IDX_NS);
 
     trace_tz_mpc_translate(addr, flags,
                            iommu_idx == IOMMU_IDX_S ? "S" : "NS",
diff --git a/hw/misc/trace-events b/hw/misc/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/trace-events
+++ b/hw/misc/trace-events
@@ -XXX,XX +XXX,XX @@ tz_mpc_reg_write(uint32_t offset, uint64_t data, unsigned size) "TZ MPC regs wri
 tz_mpc_mem_blocked_read(uint64_t addr, unsigned size, bool secure) "TZ MPC blocked read: offset 0x%" PRIx64 " size %u secure %d"
 tz_mpc_mem_blocked_write(uint64_t addr, uint64_t data, unsigned size, bool secure) "TZ MPC blocked write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u secure %d"
 tz_mpc_translate(uint64_t addr, int flags, const char *idx, const char *res) "TZ MPC translate: addr 0x%" PRIx64 " flags 0x%x iommu_idx %s: %s"
+tz_mpc_iommu_notify(uint64_t addr) "TZ MPC iommu: notifying UNMAP/MAP for 0x%" PRIx64
 
 # hw/misc/tz-ppc.c
 tz_ppc_reset(void) "TZ PPC: reset"
-- 
2.17.1

Implement the SECMPCINTSTATUS register. This is the only register
in the security controller that deals with Memory Protection
Controllers, and it simply provides a read-only view of the
interrupt lines from the various MPCs in the system.
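
The bit layout used by this patch is bit 0 for the MPC inside the
IoTKit and bits [31:16] for the 16 expansion MPCs. A minimal
standalone sketch of that mapping follows; the sketch_* names are
hypothetical and the helpers only mirror what the GPIO input handlers
do with deposit32().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: set or clear one bit of the status word */
static uint32_t sketch_set_bit(uint32_t reg, unsigned bit, int level)
{
    assert(bit < 32);
    reg &= ~(1U << bit);
    reg |= (uint32_t)(level != 0) << bit;
    return reg;
}

/* IoTKit-internal MPC interrupt line: bit 0 */
static uint32_t sketch_mpc_status(uint32_t reg, int level)
{
    return sketch_set_bit(reg, 0, level);
}

/* Expansion MPC interrupt line n (0..15): bit 16 + n */
static uint32_t sketch_mpcexp_status(uint32_t reg, unsigned n, int level)
{
    assert(n < 16);
    return sketch_set_bit(reg, n + 16, level);
}
```

Any nonzero level is treated as 1, matching the !!level in the
handlers.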

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180620132032.28865-6-peter.maydell@linaro.org
---
 include/hw/misc/iotkit-secctl.h |  8 +++++++
 hw/misc/iotkit-secctl.c         | 38 +++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/include/hw/misc/iotkit-secctl.h b/include/hw/misc/iotkit-secctl.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/iotkit-secctl.h
+++ b/include/hw/misc/iotkit-secctl.h
@@ -XXX,XX +XXX,XX @@
  *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_enable
  *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_clear
  *  + named GPIO inputs ahb_ppcexp{0,1,2,3}_irq_status
+ * Controlling the MPC in the IoTKit:
+ *  + named GPIO input mpc_status
+ * Controlling each of the 16 expansion MPCs which a system using the IoTKit
+ * might provide:
+ *  + named GPIO inputs mpcexp_status[0..15]
  */
 
 #ifndef IOTKIT_SECCTL_H
@@ -XXX,XX +XXX,XX @@
 #define IOTS_NUM_APB_PPC 2
 #define IOTS_NUM_APB_EXP_PPC 4
 #define IOTS_NUM_AHB_EXP_PPC 4
+#define IOTS_NUM_EXP_MPC 16
+#define IOTS_NUM_MPC 1
 
 typedef struct IoTKitSecCtl IoTKitSecCtl;
 
@@ -XXX,XX +XXX,XX @@ struct IoTKitSecCtl {
     uint32_t secrespcfg;
     uint32_t nsccfg;
     uint32_t brginten;
+    uint32_t mpcintstatus;
 
     IoTKitSecCtlPPC apb[IOTS_NUM_APB_PPC];
     IoTKitSecCtlPPC apbexp[IOTS_NUM_APB_EXP_PPC];
diff --git a/hw/misc/iotkit-secctl.c b/hw/misc/iotkit-secctl.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/iotkit-secctl.c
+++ b/hw/misc/iotkit-secctl.c
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
     case A_NSCCFG:
         r = s->nsccfg;
         break;
+    case A_SECMPCINTSTATUS:
+        r = s->mpcintstatus;
+        break;
     case A_SECPPCINTSTAT:
         r = s->secppcintstat;
         break;
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_read(void *opaque, hwaddr addr,
     case A_APBSPPPCEXP3:
         r = s->apbexp[offset_to_ppc_idx(offset)].sp;
         break;
-    case A_SECMPCINTSTATUS:
     case A_SECMSCINTSTAT:
     case A_SECMSCINTEN:
     case A_NSMSCEXP:
@@ -XXX,XX +XXX,XX @@ static void iotkit_secctl_reset(DeviceState *dev)
     foreach_ppc(s, iotkit_secctl_reset_ppc);
 }
 
+static void iotkit_secctl_mpc_status(void *opaque, int n, int level)
+{
+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
+
+    s->mpcintstatus = deposit32(s->mpcintstatus, 0, 1, !!level);
+}
+
+static void iotkit_secctl_mpcexp_status(void *opaque, int n, int level)
+{
+    IoTKitSecCtl *s = IOTKIT_SECCTL(opaque);
+
+    s->mpcintstatus = deposit32(s->mpcintstatus, n + 16, 1, !!level);
+}
+
 static void iotkit_secctl_ppc_irqstatus(void *opaque, int n, int level)
 {
     IoTKitSecCtlPPC *ppc = opaque;
@@ -XXX,XX +XXX,XX @@ static void iotkit_secctl_init(Object *obj)
     qdev_init_gpio_out_named(dev, &s->sec_resp_cfg, "sec_resp_cfg", 1);
     qdev_init_gpio_out_named(dev, &s->nsc_cfg_irq, "nsc_cfg", 1);
 
+    qdev_init_gpio_in_named(dev, iotkit_secctl_mpc_status, "mpc_status", 1);
+    qdev_init_gpio_in_named(dev, iotkit_secctl_mpcexp_status,
+                            "mpcexp_status", IOTS_NUM_EXP_MPC);
+
     memory_region_init_io(&s->s_regs, obj, &iotkit_secctl_s_ops,
                           s, "iotkit-secctl-s-regs", 0x1000);
     memory_region_init_io(&s->ns_regs, obj, &iotkit_secctl_ns_ops,
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription iotkit_secctl_ppc_vmstate = {
     }
 };
 
+static const VMStateDescription iotkit_secctl_mpcintstatus_vmstate = {
+    .name = "iotkit-secctl-mpcintstatus",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(mpcintstatus, IoTKitSecCtl),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription iotkit_secctl_vmstate = {
     .name = "iotkit-secctl",
     .version_id = 1,
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription iotkit_secctl_vmstate = {
         VMSTATE_STRUCT_ARRAY(ahbexp, IoTKitSecCtl, IOTS_NUM_AHB_EXP_PPC, 1,
                              iotkit_secctl_ppc_vmstate, IoTKitSecCtlPPC),
         VMSTATE_END_OF_LIST()
-    }
+    },
+    .subsections = (const VMStateDescription*[]) {
+        &iotkit_secctl_mpcintstatus_vmstate,
+        NULL
+    },
 };
 
 static void iotkit_secctl_class_init(ObjectClass *klass, void *data)
-- 
2.17.1

Wire up the one MPC that is part of the IoTKit itself. For the
moment we don't wire up its interrupt line.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180620132032.28865-7-peter.maydell@linaro.org
---
 include/hw/arm/iotkit.h |  2 ++
 hw/arm/iotkit.c         | 38 +++++++++++++++++++++++++++-----------
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/include/hw/arm/iotkit.h b/include/hw/arm/iotkit.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/iotkit.h
+++ b/include/hw/arm/iotkit.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/armv7m.h"
 #include "hw/misc/iotkit-secctl.h"
 #include "hw/misc/tz-ppc.h"
+#include "hw/misc/tz-mpc.h"
 #include "hw/timer/cmsdk-apb-timer.h"
 #include "hw/misc/unimp.h"
 #include "hw/or-irq.h"
@@ -XXX,XX +XXX,XX @@ typedef struct IoTKit {
     IoTKitSecCtl secctl;
     TZPPC apb_ppc0;
     TZPPC apb_ppc1;
+    TZMPC mpc;
     CMSDKAPBTIMER timer0;
     CMSDKAPBTIMER timer1;
     qemu_or_irq ppc_irq_orgate;
diff --git a/hw/arm/iotkit.c b/hw/arm/iotkit.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/iotkit.c
+++ b/hw/arm/iotkit.c
@@ -XXX,XX +XXX,XX @@ static void iotkit_init(Object *obj)
                       TYPE_TZ_PPC);
     init_sysbus_child(obj, "apb-ppc1", &s->apb_ppc1, sizeof(s->apb_ppc1),
                       TYPE_TZ_PPC);
+    init_sysbus_child(obj, "mpc", &s->mpc, sizeof(s->mpc), TYPE_TZ_MPC);
     init_sysbus_child(obj, "timer0", &s->timer0, sizeof(s->timer0),
                       TYPE_CMSDK_APB_TIMER);
     init_sysbus_child(obj, "timer1", &s->timer1, sizeof(s->timer1),
@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
      */
     make_alias(s, &s->alias3, "alias 3", 0x50000000, 0x10000000, 0x40000000);
 
-    /* This RAM should be behind a Memory Protection Controller, but we
-     * don't implement that yet.
-     */
-    memory_region_init_ram(&s->sram0, NULL, "iotkit.sram0", 0x00008000, &err);
-    if (err) {
-        error_propagate(errp, err);
-        return;
-    }
-    memory_region_add_subregion(&s->container, 0x20000000, &s->sram0);
 
     /* Security controller */
     object_property_set_bool(OBJECT(&s->secctl), true, "realized", &err);
@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
     qdev_connect_gpio_out_named(dev_secctl, "sec_resp_cfg", 0,
                                 qdev_get_gpio_in(dev_splitter, 0));
 
+    /* This RAM lives behind the Memory Protection Controller */
+    memory_region_init_ram(&s->sram0, NULL, "iotkit.sram0", 0x00008000, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    object_property_set_link(OBJECT(&s->mpc), OBJECT(&s->sram0),
+                             "downstream", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    object_property_set_bool(OBJECT(&s->mpc), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    /* Map the upstream end of the MPC into the right place... */
+    memory_region_add_subregion(&s->container, 0x20000000,
+                                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->mpc),
+                                                       1));
+    /* ...and its register interface */
+    memory_region_add_subregion(&s->container, 0x50083000,
+                                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->mpc),
+                                                       0));
+
     /* Devices behind APB PPC0:
      *   0x40000000: timer0
      *   0x40001000: timer1
@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
     create_unimplemented_device("NS watchdog", 0x40081000, 0x1000);
     create_unimplemented_device("S watchdog", 0x50081000, 0x1000);
 
-    create_unimplemented_device("SRAM0 MPC", 0x50083000, 0x1000);
-
     for (i = 0; i < ARRAY_SIZE(s->ppc_irq_splitter); i++) {
         Object *splitter = OBJECT(&s->ppc_irq_splitter[i]);
 
-- 
2.17.1

The interrupt outputs from the MPC in the IoTKit and the expansion
MPCs in the board must be wired up to the security controller, and
also all ORed together to produce a single line to the NVIC.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180620132032.28865-8-peter.maydell@linaro.org
---
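As a rough illustration of the routing described above, here is a
standalone model (not QEMU code). The struct and function names are
hypothetical; the bit positions (bit 0 for the IoTKit's own MPC, bit
16 + n for expansion MPC n) and the splitter index order (0..15
expansion MPCs, 16 the IoTKit MPC) are taken from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_EXP_MPC 16  /* mirrors IOTS_NUM_EXP_MPC */
#define NUM_MPC 1       /* mirrors IOTS_NUM_MPC */

/* Hypothetical stand-in for the device state; not the QEMU structs. */
typedef struct {
    uint32_t mpcintstatus;             /* as seen by the security controller */
    bool line[NUM_EXP_MPC + NUM_MPC];  /* inputs of the OR gate */
} MpcIrqModel;

/* Set or clear a single bit, like a 1-bit deposit32(). */
uint32_t set_bit_to(uint32_t word, int bit, bool value)
{
    return (word & ~(UINT32_C(1) << bit)) | ((uint32_t)value << bit);
}

/*
 * Model one splitter: the incoming MPC interrupt level is copied to
 * the status register (output 0) and to the OR gate (output 1).
 */
void mpc_irq_update(MpcIrqModel *m, int i, bool level)
{
    assert(i >= 0 && i < NUM_EXP_MPC + NUM_MPC);

    /* Expansion MPC n lands in bit 16 + n; the IoTKit MPC in bit 0. */
    int bit = (i < NUM_EXP_MPC) ? 16 + i : 0;

    m->mpcintstatus = set_bit_to(m->mpcintstatus, bit, level);
    m->line[i] = level;
}

/* The single line to the NVIC is the OR of all MPC interrupt lines. */
bool mpc_nvic_level(const MpcIrqModel *m)
{
    for (int i = 0; i < NUM_EXP_MPC + NUM_MPC; i++) {
        if (m->line[i]) {
            return true;
        }
    }
    return false;
}
```

Raising expansion MPC 3 in this model sets bit 19 of mpcintstatus and
asserts the NVIC line; the line only drops again once every MPC
interrupt is low.
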
 include/hw/arm/iotkit.h |  6 ++++
 hw/arm/iotkit.c         | 74 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/include/hw/arm/iotkit.h b/include/hw/arm/iotkit.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/iotkit.h
+++ b/include/hw/arm/iotkit.h
@@ -XXX,XX +XXX,XX @@
  *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_enable
  *  + named GPIO outputs ahb_ppcexp{0,1,2,3}_irq_clear
  *  + named GPIO inputs ahb_ppcexp{0,1,2,3}_irq_status
+ * Controlling each of the 16 expansion MPCs which a system using the IoTKit
+ * might provide:
+ *  + named GPIO inputs mpcexp_status[0..15]
  */
 
 #ifndef IOTKIT_H
@@ -XXX,XX +XXX,XX @@ typedef struct IoTKit {
     qemu_or_irq ppc_irq_orgate;
     SplitIRQ sec_resp_splitter;
     SplitIRQ ppc_irq_splitter[NUM_PPCS];
+    SplitIRQ mpc_irq_splitter[IOTS_NUM_EXP_MPC + IOTS_NUM_MPC];
+    qemu_or_irq mpc_irq_orgate;
 
     UnimplementedDeviceState dualtimer;
     UnimplementedDeviceState s32ktimer;
@@ -XXX,XX +XXX,XX @@ typedef struct IoTKit {
     qemu_irq nsc_cfg_in;
 
     qemu_irq irq_status_in[NUM_EXTERNAL_PPCS];
+    qemu_irq mpcexp_status_in[IOTS_NUM_EXP_MPC];
 
     uint32_t nsccfg;
 
diff --git a/hw/arm/iotkit.c b/hw/arm/iotkit.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/iotkit.c
+++ b/hw/arm/iotkit.c
@@ -XXX,XX +XXX,XX @@ static void iotkit_init(Object *obj)
     init_sysbus_child(obj, "apb-ppc1", &s->apb_ppc1, sizeof(s->apb_ppc1),
                       TYPE_TZ_PPC);
     init_sysbus_child(obj, "mpc", &s->mpc, sizeof(s->mpc), TYPE_TZ_MPC);
+    object_initialize(&s->mpc_irq_orgate, sizeof(s->mpc_irq_orgate),
+                      TYPE_OR_IRQ);
+    object_property_add_child(obj, "mpc-irq-orgate",
+                              OBJECT(&s->mpc_irq_orgate), &error_abort);
+    for (i = 0; i < ARRAY_SIZE(s->mpc_irq_splitter); i++) {
+        char *name = g_strdup_printf("mpc-irq-splitter-%d", i);
+        SplitIRQ *splitter = &s->mpc_irq_splitter[i];
+
+        object_initialize(splitter, sizeof(*splitter), TYPE_SPLIT_IRQ);
+        object_property_add_child(obj, name, OBJECT(splitter), &error_abort);
+        g_free(name);
+    }
     init_sysbus_child(obj, "timer0", &s->timer0, sizeof(s->timer0),
                       TYPE_CMSDK_APB_TIMER);
     init_sysbus_child(obj, "timer1", &s->timer1, sizeof(s->timer1),
@@ -XXX,XX +XXX,XX @@ static void iotkit_exp_irq(void *opaque, int n, int level)
     qemu_set_irq(s->exp_irqs[n], level);
 }
 
+static void iotkit_mpcexp_status(void *opaque, int n, int level)
+{
+    IoTKit *s = IOTKIT(opaque);
+    qemu_set_irq(s->mpcexp_status_in[n], level);
+}
+
 static void iotkit_realize(DeviceState *dev, Error **errp)
 {
     IoTKit *s = IOTKIT(dev);
@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
                                 sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->mpc),
                                                        0));
 
+    /* We must OR together lines from the MPC splitters to go to the NVIC */
+    object_property_set_int(OBJECT(&s->mpc_irq_orgate),
+                            IOTS_NUM_EXP_MPC + IOTS_NUM_MPC, "num-lines", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    object_property_set_bool(OBJECT(&s->mpc_irq_orgate), true,
+                             "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    qdev_connect_gpio_out(DEVICE(&s->mpc_irq_orgate), 0,
+                          qdev_get_gpio_in(DEVICE(&s->armv7m), 9));
+
     /* Devices behind APB PPC0:
      *   0x40000000: timer0
      *   0x40001000: timer1
@@ -XXX,XX +XXX,XX @@ static void iotkit_realize(DeviceState *dev, Error **errp)
         g_free(gpioname);
     }
 
+    /* Wire up the splitters for the MPC IRQs */
+    for (i = 0; i < IOTS_NUM_EXP_MPC + IOTS_NUM_MPC; i++) {
+        SplitIRQ *splitter = &s->mpc_irq_splitter[i];
+        DeviceState *dev_splitter = DEVICE(splitter);
+
+        object_property_set_int(OBJECT(splitter), 2, "num-lines", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
+        object_property_set_bool(OBJECT(splitter), true, "realized", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
+
+        if (i < IOTS_NUM_EXP_MPC) {
+            /* Splitter input is from GPIO input line */
+            s->mpcexp_status_in[i] = qdev_get_gpio_in(dev_splitter, 0);
+            qdev_connect_gpio_out(dev_splitter, 0,
+                                  qdev_get_gpio_in_named(dev_secctl,
+                                                         "mpcexp_status", i));
+        } else {
+            /* Splitter input is from our own MPC */
+            qdev_connect_gpio_out_named(DEVICE(&s->mpc), "irq", 0,
+                                        qdev_get_gpio_in(dev_splitter, 0));
+            qdev_connect_gpio_out(dev_splitter, 0,
+                                  qdev_get_gpio_in_named(dev_secctl,
+                                                         "mpc_status", 0));
+        }
+
+        qdev_connect_gpio_out(dev_splitter, 1,
+                              qdev_get_gpio_in(DEVICE(&s->mpc_irq_orgate), i));
+    }
+    /* Create GPIO inputs which will pass the line state for our
+     * mpcexp_irq inputs to the correct splitter devices.
+     */
+    qdev_init_gpio_in_named(dev, iotkit_mpcexp_status, "mpcexp_status",
+                            IOTS_NUM_EXP_MPC);
+
     iotkit_forward_sec_resp_cfg(s);
 
     system_clock_scale = NANOSECONDS_PER_SECOND / s->mainclk_frq;
-- 
2.17.1

Instantiate and wire up the Memory Protection Controllers
in the MPS2 board itself.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180620132032.28865-9-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 71 ++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 27 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/timer/cmsdk-apb-timer.h"
 #include "hw/misc/mps2-scc.h"
 #include "hw/misc/mps2-fpgaio.h"
+#include "hw/misc/tz-mpc.h"
 #include "hw/arm/iotkit.h"
 #include "hw/devices.h"
 #include "net/net.h"
@@ -XXX,XX +XXX,XX @@ typedef struct {
 
     IoTKit iotkit;
     MemoryRegion psram;
-    MemoryRegion ssram1;
+    MemoryRegion ssram[3];
     MemoryRegion ssram1_m;
-    MemoryRegion ssram23;
     MPS2SCC scc;
     MPS2FPGAIO fpgaio;
     TZPPC ppc[5];
-    UnimplementedDeviceState ssram_mpc[3];
+    TZMPC ssram_mpc[3];
     UnimplementedDeviceState spi[5];
     UnimplementedDeviceState i2c[4];
     UnimplementedDeviceState i2s_audio;
@@ -XXX,XX +XXX,XX @@ typedef struct {
 /* Main SYSCLK frequency in Hz */
 #define SYSCLK_FRQ 20000000
 
-/* Initialize the auxiliary RAM region @mr and map it into
- * the memory map at @base.
- */
-static void make_ram(MemoryRegion *mr, const char *name,
-                     hwaddr base, hwaddr size)
-{
-    memory_region_init_ram(mr, NULL, name, size, &error_fatal);
-    memory_region_add_subregion(get_system_memory(), base, mr);
-}
-
 /* Create an alias of an entire original MemoryRegion @orig
  * located at @base in the memory map.
  */
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
     return sysbus_mmio_get_region(s, 0);
 }
 
+static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
+                              const char *name, hwaddr size)
+{
+    TZMPC *mpc = opaque;
+    int i = mpc - &mms->ssram_mpc[0];
+    MemoryRegion *ssram = &mms->ssram[i];
+    MemoryRegion *upstream;
+    char *mpcname = g_strdup_printf("%s-mpc", name);
+    static uint32_t ramsize[] = { 0x00400000, 0x00200000, 0x00200000 };
+    static uint32_t rambase[] = { 0x00000000, 0x28000000, 0x28200000 };
+
+    memory_region_init_ram(ssram, NULL, name, ramsize[i], &error_fatal);
+
+    init_sysbus_child(OBJECT(mms), mpcname, mpc,
+                      sizeof(mms->ssram_mpc[0]), TYPE_TZ_MPC);
+    object_property_set_link(OBJECT(mpc), OBJECT(ssram),
+                             "downstream", &error_fatal);
+    object_property_set_bool(OBJECT(mpc), true, "realized", &error_fatal);
+    /* Map the upstream end of the MPC into system memory */
+    upstream = sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 1);
+    memory_region_add_subregion(get_system_memory(), rambase[i], upstream);
+    /* and connect its interrupt to the IoTKit */
+    qdev_connect_gpio_out_named(DEVICE(mpc), "irq", 0,
+                                qdev_get_gpio_in_named(DEVICE(&mms->iotkit),
+                                                       "mpcexp_status", i));
+
+    /* The first SSRAM is a special case as it has an alias; accesses to
+     * the alias region at 0x00400000 must also go to the MPC upstream.
+     */
+    if (i == 0) {
+        make_ram_alias(&mms->ssram1_m, "mps.ssram1_m", upstream, 0x00400000);
+    }
+
+    g_free(mpcname);
+    /* Return the register interface MR for our caller to map behind the PPC */
+    return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
+}
+
 static void mps2tz_common_init(MachineState *machine)
 {
     MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
                                          NULL, "mps.ram", 0x01000000);
     memory_region_add_subregion(system_memory, 0x80000000, &mms->psram);
 
-    /* The SSRAM memories should all be behind Memory Protection Controllers,
-     * but we don't implement that yet.
-     */
-    make_ram(&mms->ssram1, "mps.ssram1", 0x00000000, 0x00400000);
-    make_ram_alias(&mms->ssram1_m, "mps.ssram1_m", &mms->ssram1, 0x00400000);
-
-    make_ram(&mms->ssram23, "mps.ssram23", 0x28000000, 0x00400000);
-
     /* The overflow IRQs for all UARTs are ORed together.
      * Tx, Rx and "combined" IRQs are sent to the NVIC separately.
      * Create the OR gate for this.
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
     const PPCInfo ppcs[] = { {
             .name = "apb_ppcexp0",
             .ports = {
-                { "ssram-mpc0", make_unimp_dev, &mms->ssram_mpc[0],
-                  0x58007000, 0x1000 },
-                { "ssram-mpc1", make_unimp_dev, &mms->ssram_mpc[1],
-                  0x58008000, 0x1000 },
-                { "ssram-mpc2", make_unimp_dev, &mms->ssram_mpc[2],
-                  0x58009000, 0x1000 },
+                { "ssram-0", make_mpc, &mms->ssram_mpc[0], 0x58007000, 0x1000 },
+                { "ssram-1", make_mpc, &mms->ssram_mpc[1], 0x58008000, 0x1000 },
+                { "ssram-2", make_mpc, &mms->ssram_mpc[2], 0x58009000, 0x1000 },
             },
         }, {
             .name = "apb_ppcexp1",
-- 
2.17.1

From: Julia Suvorova <jusual@mail.ru>

This feature is intended to distinguish ARMv8-M variants: Baseline and
Mainline. ARMv7-M compatibility requires the Main Extension. ARMv6-M
compatibility is provided by all ARMv8-M implementations.

Signed-off-by: Julia Suvorova <jusual@mail.ru>
Message-id: 20180622080138.17702-2-jusual@mail.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 1 +
 target/arm/cpu.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ enum arm_features {
     ARM_FEATURE_V8_RDM, /* implements v8.1 simd round multiply */
     ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
     ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions.  */
+    ARM_FEATURE_M_MAIN, /* M profile Main Extension */
 };
 
 static inline int arm_feature(CPUARMState *env, int feature)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void cortex_m3_initfn(Object *obj)
     ARMCPU *cpu = ARM_CPU(obj);
     set_feature(&cpu->env, ARM_FEATURE_V7);
     set_feature(&cpu->env, ARM_FEATURE_M);
+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
     cpu->midr = 0x410fc231;
     cpu->pmsav7_dregion = 8;
     cpu->id_pfr0 = 0x00000030;
@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
 
     set_feature(&cpu->env, ARM_FEATURE_V7);
     set_feature(&cpu->env, ARM_FEATURE_M);
+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
     set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
     cpu->midr = 0x410fc240; /* r0p0 */
     cpu->pmsav7_dregion = 8;
@@ -XXX,XX +XXX,XX @@ static void cortex_m33_initfn(Object *obj)
 
     set_feature(&cpu->env, ARM_FEATURE_V8);
     set_feature(&cpu->env, ARM_FEATURE_M);
+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
     set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
     set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
     cpu->midr = 0x410fd213; /* r0p3 */
-- 
2.17.1

From: Julia Suvorova <jusual@mail.ru>

Unlike ARMv7-M, ARMv6-M and ARMv8-M Baseline only support naturally
aligned memory accesses for load/store instructions.

Signed-off-by: Julia Suvorova <jusual@mail.ru>
Message-id: 20180622080138.17702-3-jusual@mail.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
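A minimal standalone sketch of the rule this patch enforces, assuming
a hypothetical CpuModel struct in place of QEMU's ARM_FEATURE_* bits.
It models only the check added here (M profile without the Main
Extension implies MO_ALIGN), not any other source of alignment faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU description; QEMU tests ARM_FEATURE_* bits instead. */
typedef struct {
    bool m_profile;  /* corresponds to ARM_FEATURE_M */
    bool m_main;     /* corresponds to ARM_FEATURE_M_MAIN */
} CpuModel;

/* M profile without the Main Extension: ARMv6-M and ARMv8-M Baseline. */
bool requires_natural_alignment(const CpuModel *cpu)
{
    return cpu->m_profile && !cpu->m_main;
}

/*
 * True if an access of 'size' bytes (a power of two) at 'addr' would
 * be rejected as misaligned under the rule above.
 */
bool access_is_misaligned_fault(const CpuModel *cpu, uint32_t addr,
                                uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return requires_natural_alignment(cpu) && (addr & (size - 1)) != 0;
}
```

With this model a 4-byte load from 0x1002 is rejected on a Baseline
core but allowed on a Cortex-M3-like core that sets both flags.
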
 target/arm/translate.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline TCGv gen_aa32_addr(DisasContext *s, TCGv_i32 a32, TCGMemOp op)
 static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
                             int index, TCGMemOp opc)
 {
-    TCGv addr = gen_aa32_addr(s, a32, opc);
+    TCGv addr;
+
+    if (arm_dc_feature(s, ARM_FEATURE_M) &&
+        !arm_dc_feature(s, ARM_FEATURE_M_MAIN)) {
+        opc |= MO_ALIGN;
+    }
+
+    addr = gen_aa32_addr(s, a32, opc);
     tcg_gen_qemu_ld_i32(val, addr, index, opc);
     tcg_temp_free(addr);
 }
@@ -XXX,XX +XXX,XX @@ static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 static void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
                             int index, TCGMemOp opc)
 {
-    TCGv addr = gen_aa32_addr(s, a32, opc);
+    TCGv addr;
+
+    if (arm_dc_feature(s, ARM_FEATURE_M) &&
+        !arm_dc_feature(s, ARM_FEATURE_M_MAIN)) {
+        opc |= MO_ALIGN;
+    }
+
+    addr = gen_aa32_addr(s, a32, opc);
     tcg_gen_qemu_st_i32(val, addr, index, opc);
     tcg_temp_free(addr);
 }
-- 
2.17.1

checkpatch reminds us that statics shouldn't be zero-initialized:

ERROR: do not initialise statics to 0 or NULL
#35: FILE: vl.c:157:
+static int num_serial_hds = 0;

ERROR: do not initialise statics to 0 or NULL
#36: FILE: vl.c:158:
+static Chardev **serial_hds = NULL;

I forgot to fix this in 6af2692e86f9fdfb3d; do so now.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20180426140253.3918-1-peter.maydell@linaro.org
---
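For context on why the initializers are redundant: C gives objects
with static storage duration a zero (or null pointer) value when no
initializer is written. A small standalone sketch, with hypothetical
variable names rather than the ones in vl.c:

```c
#include <assert.h>
#include <stddef.h>

/* No initializers: the C standard guarantees zero / NULL here. */
static int example_count;
static char **example_list;

/* Returns nonzero if both statics still hold their implicit values. */
int statics_start_zeroed(void)
{
    return example_count == 0 && example_list == NULL;
}

/* Usage example: this assertion holds before anything writes to them. */
void check_statics(void)
{
    assert(statics_start_zeroed());
}
```
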
 vl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index XXXXXXX..XXXXXXX 100644
--- a/vl.c
+++ b/vl.c
@@ -XXX,XX +XXX,XX @@ QEMUClockType rtc_clock;
 int vga_interface_type = VGA_NONE;
 static DisplayOptions dpy;
 int no_frame;
-static int num_serial_hds = 0;
-static Chardev **serial_hds = NULL;
+static int num_serial_hds;
+static Chardev **serial_hds;
 Chardev *parallel_hds[MAX_PARALLEL_PORTS];
 Chardev *virtcon_hds[MAX_VIRTIO_CONSOLES];
 int win2k_install_hack = 0;
-- 
2.17.1

The xen pci_assign_dev_load_option_rom() currently creates a RAM
memory region with memory_region_init_ram_nomigrate(), and then
manually registers it with vmstate_register_ram(). In fact for
its only callsite, the 'owner' pointer we use for the init call
and the '&dev->qdev' pointer we use for the vmstate_register_ram()
call refer to the same object. Simplify the function to only
take a pointer to the device once instead of twice, and use
memory_region_init_ram() which automatically does the vmstate
register for us.

Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/xen/xen_pt.h          | 2 +-
 hw/xen/xen_pt_graphics.c | 2 +-
 hw/xen/xen_pt_load_rom.c | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -XXX,XX +XXX,XX @@ static inline bool xen_pt_has_msix_mapping(XenPCIPassthroughState *s, int bar)
 }
 
 extern void *pci_assign_dev_load_option_rom(PCIDevice *dev,
-                                            struct Object *owner, int *size,
+                                            int *size,
                                             unsigned int domain,
                                             unsigned int bus, unsigned int slot,
                                             unsigned int function);
diff --git a/hw/xen/xen_pt_graphics.c b/hw/xen/xen_pt_graphics.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/xen/xen_pt_graphics.c
+++ b/hw/xen/xen_pt_graphics.c
@@ -XXX,XX +XXX,XX @@ int xen_pt_unregister_vga_regions(XenHostPCIDevice *dev)
 static void *get_vgabios(XenPCIPassthroughState *s, int *size,
                        XenHostPCIDevice *dev)
 {
-    return pci_assign_dev_load_option_rom(&s->dev, OBJECT(&s->dev), size,
+    return pci_assign_dev_load_option_rom(&s->dev, size,
                                           dev->domain, dev->bus,
                                           dev->dev, dev->func);
 }
diff --git a/hw/xen/xen_pt_load_rom.c b/hw/xen/xen_pt_load_rom.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/xen/xen_pt_load_rom.c
+++ b/hw/xen/xen_pt_load_rom.c
@@ -XXX,XX +XXX,XX @@
  * load the corresponding ROM data to RAM. If an error occurs while loading an
  * option ROM, we just ignore that option ROM and continue with the next one.
  */
-void *pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
+void *pci_assign_dev_load_option_rom(PCIDevice *dev,
                                      int *size, unsigned int domain,
                                      unsigned int bus, unsigned int slot,
                                      unsigned int function)
@@ -XXX,XX +XXX,XX @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
     uint8_t val;
     struct stat st;
     void *ptr = NULL;
+    Object *owner = OBJECT(dev);
 
     /* If loading ROM from file, pci handles it */
     if (dev->romfile || !dev->rom_bar) {
@@ -XXX,XX +XXX,XX @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
     fseek(fp, 0, SEEK_SET);
 
     snprintf(name, sizeof(name), "%s.rom", object_get_typename(owner));
-    memory_region_init_ram_nomigrate(&dev->rom, owner, name, st.st_size, &error_abort);
-    vmstate_register_ram(&dev->rom, &dev->qdev);
+    memory_region_init_ram(&dev->rom, owner, name, st.st_size, &error_abort);
     ptr = memory_region_get_ram_ptr(&dev->rom);
     memset(ptr, 0xff, st.st_size);
 
-- 
2.17.1

Hi; this pullreq contains only my FEAT_AFP/FEAT_RPRES patches
(plus a fix for a target/alpha latent bug that would otherwise
be revealed by the fpu changes), because 68 patches is already
longer than I prefer to send in at one time...

thanks
-- PMM

The following changes since commit ffaf7f0376f8040ce9068d71ae9ae8722505c42e:

Merge tag 'pull-10.0-testing-and-gdstub-updates-100225-1' of https://gitlab.com/stsquad/qemu into staging (2025-02-10 13:26:17 -0500)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250211

for you to fetch changes up to ca4c34e07d1388df8e396520b5e7d60883cd3690:

target/arm: Sink fp_status and fpcr access into do_fmlal* (2025-02-11 16:22:08 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/alpha: Don't corrupt error_code with unknown softfloat flags
 * target/arm: Implement FEAT_AFP and FEAT_RPRES

----------------------------------------------------------------
Peter Maydell (49):
      target/alpha: Don't corrupt error_code with unknown softfloat flags
      fpu: Add float_class_denormal
      fpu: Implement float_flag_input_denormal_used
      fpu: allow flushing of output denormals to be after rounding
      target/arm: Define FPCR AH, FIZ, NEP bits
      target/arm: Implement FPCR.FIZ handling
      target/arm: Adjust FP behaviour for FPCR.AH = 1
      target/arm: Adjust exception flag handling for AH = 1
      target/arm: Add FPCR.AH to tbflags
      target/arm: Set up float_status to use for FPCR.AH=1 behaviour
      target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
      target/arm: Use FPST_FPCR_AH for BFCVT* insns
      target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
      target/arm: Add FPCR.NEP to TBFLAGS
      target/arm: Define and use new write_fp_*reg_merging() functions
      target/arm: Handle FPCR.NEP for 3-input scalar operations
      target/arm: Handle FPCR.NEP for BFCVT scalar
      target/arm: Handle FPCR.NEP for 1-input scalar operations
      target/arm: Handle FPCR.NEP in do_cvtf_scalar()
      target/arm: Handle FPCR.NEP for scalar FABS and FNEG
      target/arm: Handle FPCR.NEP for FCVTXN (scalar)
      target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
      target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
      target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
      target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
      target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
      target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
      target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
      target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
      target/arm: Implement FPCR.AH handling of negation of NaN
      target/arm: Implement FPCR.AH handling for scalar FABS and FABD
      target/arm: Handle FPCR.AH in vector FABD
      target/arm: Handle FPCR.AH in SVE FNEG
      target/arm: Handle FPCR.AH in SVE FABS
      target/arm: Handle FPCR.AH in SVE FABD
      target/arm: Handle FPCR.AH in negation steps in SVE FCADD
      target/arm: Handle FPCR.AH in negation steps in FCADD
      target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
      target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
      target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
      target/arm: Handle FPCR.AH in negation in FMLS (vector)
      target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
      target/arm: Handle FPCR.AH in SVE FTSSEL
      target/arm: Handle FPCR.AH in SVE FTMAD
      target/arm: Enable FEAT_AFP for '-cpu max'
      target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
      target/arm: Implement increased precision FRECPE
      target/arm: Implement increased precision FRSQRTE
      target/arm: Enable FEAT_RPRES for -cpu max

Richard Henderson (19):
      target/arm: Handle FPCR.AH in vector FCMLA
      target/arm: Handle FPCR.AH in FCMLA by index
      target/arm: Handle FPCR.AH in SVE FCMLA
      target/arm: Handle FPCR.AH in FMLSL (by element and vector)
      target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
      target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
      target/arm: Introduce CPUARMState.vfp.fp_status[]
      target/arm: Remove standard_fp_status_f16
      target/arm: Remove standard_fp_status
      target/arm: Remove ah_fp_status_f16
      target/arm: Remove ah_fp_status
      target/arm: Remove fp_status_f16_a64
      target/arm: Remove fp_status_f16_a32
      target/arm: Remove fp_status_a64
      target/arm: Remove fp_status_a32
      target/arm: Simplify fp_status indexing in mve_helper.c
      target/arm: Simplify DO_VFP_cmp in vfp_helper.c
      target/arm: Read fz16 from env->vfp.fpcr
      target/arm: Sink fp_status and fpcr access into do_fmlal*

In do_cvttq() we set env->error_code with what is supposed to be a
set of FPCR exception bit values.  However, if the set of float
exception flags we get back from softfloat for the conversion
includes a flag which is not one of the three we expect here
(invalid_cvti, invalid, inexact) then we will fall through the
if-ladder and set env->error_code to the unconverted softfloat
exception_flag value.  This will then cause us to take a spurious
exception.

This is harmless now, but when we add new floating point exception
flags to softfloat it will cause problems.  Add an else clause to the
if-ladder to make it ignore any float exception flags it doesn't care
about.

Specifically, without this fix, 'make check-tcg' will fail for Alpha
when the commit adding float_flag_input_denormal_used lands.

Fixes: aa3bad5b59e7 ("target/alpha: Use float64_to_int64_modulo for CVTTQ")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
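A reduced standalone model of the fall-through, for illustration
only. The flag encodings and names below are hypothetical and do not
match softfloat or the Alpha FPCR; only the shape of the if-ladder
follows the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "softfloat" flag bits. */
enum {
    SF_INVALID  = 1 << 0,
    SF_INEXACT  = 1 << 1,
    SF_NEW_FLAG = 1 << 2,  /* a flag the ladder was never taught about */
};

/* Hypothetical target exception encodings. */
enum {
    TGT_INV = 0x100,
    TGT_INE = 0x200,
};

/*
 * Convert softfloat flags to a target error code. With with_fix set,
 * unknown flags are ignored; without it they leak through unconverted.
 */
uint32_t convert_flags(uint32_t exc, bool with_fix)
{
    if (exc) {
        if (exc & SF_INVALID) {
            exc = TGT_INV;
        } else if (exc & SF_INEXACT) {
            exc = TGT_INE;
        } else if (with_fix) {
            exc = 0;
        }
        /* Without the fix, exc still holds raw softfloat bits here. */
    }
    return exc;
}

/* Usage example contrasting the two behaviours. */
void convert_flags_example(void)
{
    assert(convert_flags(SF_NEW_FLAG, false) == SF_NEW_FLAG);
    assert(convert_flags(SF_NEW_FLAG, true) == 0);
}
```

In this model an unknown flag on its own yields a nonzero, meaningless
error code unless the final else branch is present.
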
 target/alpha/fpu_helper.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/alpha/fpu_helper.c b/target/alpha/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/alpha/fpu_helper.c
+++ b/target/alpha/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t do_cvttq(CPUAlphaState *env, uint64_t a, int roundmode)
             exc = FPCR_INV;
         } else if (exc & float_flag_inexact) {
             exc = FPCR_INE;
+        } else {
+            exc = 0;
         }
     }
     env->error_code = exc;
-- 
2.34.1

Currently in softfloat we canonicalize input denormals and so the
code that implements floating point operations does not need to care
whether the input value was originally normal or denormal.  However,
both x86 and Arm FEAT_AFP require that an exception flag is set if:
 * an input is denormal
 * that input is not squashed to zero
 * that input is actually used in the calculation (e.g. we
   did not find the other input was a NaN)

So we need to track that the input was a non-squashed denormal.  To
do this we add a new value to the FloatClass enum.  In this commit we
add the value and adjust the code everywhere that looks at FloatClass
values so that the new float_class_denormal behaves identically to
float_class_normal.  We will add the code that does the "raise a new
float exception flag if an input was an unsquashed denormal and we
used it" in a subsequent commit.

There should be no behavioural change in this commit.
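
To illustrate why the fast-path checks have to change, here is a
minimal standalone sketch.  The DEMO_* names are hypothetical, and it
assumes that a class mask is simply one bit per enum value:

```c
#include <stdbool.h>

/* Hypothetical mirror of the FloatClass enum after this change. */
typedef enum {
    DEMO_CLASS_UNCLASSIFIED,
    DEMO_CLASS_ZERO,
    DEMO_CLASS_NORMAL,
    DEMO_CLASS_DENORMAL,   /* input was a non-squashed denormal */
    DEMO_CLASS_INF,
    DEMO_CLASS_QNAN,
    DEMO_CLASS_SNAN,
} DemoClass;

/* Assumption: one mask bit per class. */
#define DEMO_CMASK(c) (1u << (c))
#define DEMO_CMASK_ANYNORM \
    (DEMO_CMASK(DEMO_CLASS_NORMAL) | DEMO_CMASK(DEMO_CLASS_DENORMAL))

/* Old-style fast-path test: exact match against the "normal" bit. */
static bool demo_old_fast_path(unsigned cmask)
{
    return cmask == DEMO_CMASK(DEMO_CLASS_NORMAL);
}

/* New-style test: true if no class other than normal/denormal is set. */
static bool demo_only_normals(unsigned cmask)
{
    return !(cmask & ~DEMO_CMASK_ANYNORM);
}
```

For a normal input combined with a denormal input the old equality
test would miss the fast path, whereas the new test still takes it.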

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 32 ++++++++++++++++++++++++++++---
 fpu/softfloat-parts.c.inc | 40 ++++++++++++++++++++++++---------------
 2 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ float64_gen2(float64 xa, float64 xb, float_status *s,
 /*
  * Classify a floating point number. Everything above float_class_qnan
  * is a NaN so cls >= float_class_qnan is any NaN.
+ *
+ * Note that we canonicalize denormals, so most code should treat
+ * class_normal and class_denormal identically.
  */
 
 typedef enum __attribute__ ((__packed__)) {
     float_class_unclassified,
     float_class_zero,
     float_class_normal,
+    float_class_denormal, /* input was a non-squashed denormal */
     float_class_inf,
     float_class_qnan,  /* all NaNs from here */
     float_class_snan,
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__ ((__packed__)) {
 enum {
     float_cmask_zero    = float_cmask(float_class_zero),
     float_cmask_normal  = float_cmask(float_class_normal),
+    float_cmask_denormal = float_cmask(float_class_denormal),
     float_cmask_inf     = float_cmask(float_class_inf),
     float_cmask_qnan    = float_cmask(float_class_qnan),
     float_cmask_snan    = float_cmask(float_class_snan),
 
     float_cmask_infzero = float_cmask_zero | float_cmask_inf,
     float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
+    float_cmask_anynorm = float_cmask_normal | float_cmask_denormal,
 };
 
 /* Flags for parts_minmax. */
@@ -XXX,XX +XXX,XX @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
     return c == float_class_qnan;
 }
 
+/*
+ * Return true if the float_cmask has only normals in it
+ * (including input denormals that were canonicalized)
+ */
+static inline bool cmask_is_only_normals(int cmask)
+{
+    return !(cmask & ~float_cmask_anynorm);
+}
+
+static inline bool is_anynorm(FloatClass c)
+{
+    return float_cmask(c) & float_cmask_anynorm;
+}
+
 /*
  * Structure holding all of the decomposed parts of a float.
  * The exponent is unbiased and the fraction is normalized.
@@ -XXX,XX +XXX,XX @@ static float64 float64r32_round_pack_canonical(FloatParts64 *p,
      */
     switch (p->cls) {
     case float_class_normal:
+    case float_class_denormal:
         if (unlikely(p->exp == 0)) {
             /*
              * The result is denormal for float32, but can be represented
@@ -XXX,XX +XXX,XX @@ static floatx80 floatx80_round_pack_canonical(FloatParts128 *p,
 
     switch (p->cls) {
     case float_class_normal:
+    case float_class_denormal:
         if (s->floatx80_rounding_precision == floatx80_precision_x) {
             parts_uncanon_normal(p, s, fmt);
             frac = p->frac_hi;
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
         break;
 
     case float_class_normal:
+    case float_class_denormal:
     case float_class_zero:
         break;
 
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
     a->sign = b->sign;
     a->exp = b->exp;
 
-    if (a->cls == float_class_normal) {
+    if (is_anynorm(a->cls)) {
         frac_truncjam(a, b);
     } else if (is_nan(a->cls)) {
         /* Discard the low bits of the NaN. */
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_int128_scalbn(float128 a, FloatRoundMode rmode,
         return int128_zero();
 
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
             flags = float_flag_inexact;
         }
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_uint128_scalbn(float128 a, FloatRoundMode rmode,
         return int128_zero();
 
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
             flags = float_flag_inexact;
             if (p.cls == float_class_zero) {
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
     float32_unpack_canonical(&xp, a, status);
     if (unlikely(xp.cls != float_class_normal)) {
         switch (xp.cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(&xp, status);
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
         case float_class_zero:
             return float32_one;
         default:
-            break;
+            g_assert_not_reached();
         }
-        g_assert_not_reached();
     }
 
     float_raise(float_flag_inexact, status);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
             frac_clear(p);
         } else {
             int shift = frac_normalize(p);
-            p->cls = float_class_normal;
+            p->cls = float_class_denormal;
             p->exp = fmt->frac_shift - fmt->exp_bias
                    - shift + !fmt->m68k_denormal;
         }
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
 static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                             const FloatFmt *fmt)
 {
-    if (likely(p->cls == float_class_normal)) {
+    if (likely(is_anynorm(p->cls))) {
         parts_uncanon_normal(p, s, fmt);
     } else {
         switch (p->cls) {
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
 
     if (a->sign != b_sign) {
         /* Subtraction */
-        if (likely(ab_mask == float_cmask_normal)) {
+        if (likely(cmask_is_only_normals(ab_mask))) {
             if (parts_sub_normal(a, b)) {
                 return a;
             }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
         }
     } else {
         /* Addition */
-        if (likely(ab_mask == float_cmask_normal)) {
+        if (likely(cmask_is_only_normals(ab_mask))) {
             parts_add_normal(a, b);
             return a;
         }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
     }
 
     if (b->cls == float_class_zero) {
-        g_assert(a->cls == float_class_normal);
+        g_assert(is_anynorm(a->cls));
         return a;
     }
 
     g_assert(a->cls == float_class_zero);
-    g_assert(b->cls == float_class_normal);
+    g_assert(is_anynorm(b->cls));
  return_b:
     b->sign = b_sign;
     return b;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
     bool sign = a->sign ^ b->sign;
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         FloatPartsW tmp;
 
         frac_mulw(&tmp, a, b);
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
         a->sign ^= 1;
     }
 
-    if (unlikely(ab_mask != float_cmask_normal)) {
+    if (unlikely(!cmask_is_only_normals(ab_mask))) {
         if (unlikely(ab_mask == float_cmask_infzero)) {
             float_raise(float_flag_invalid | float_flag_invalid_imz, s);
             goto d_nan;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
         }
 
         g_assert(ab_mask & float_cmask_zero);
-        if (c->cls == float_class_normal) {
+        if (is_anynorm(c->cls)) {
             *a = *c;
             goto return_normal;
         }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
     bool sign = a->sign ^ b->sign;
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         a->sign = sign;
         a->exp -= b->exp + frac_div(a, b);
         return a;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
 {
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         frac_modrem(a, b, mod_quot);
         return a;
     }
@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
 
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(a, status);
@@ -XXX,XX +XXX,XX @@ static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
     case float_class_inf:
         break;
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(a, rmode, scale, fmt->frac_size)) {
             float_raise(float_flag_inexact, s);
         }
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
             flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static uint64_t partsN(float_to_uint)(FloatPartsN *p, FloatRoundMode rmode,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
             flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint_modulo)(FloatPartsN *p,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, 0, N - 2)) {
             flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
     a_exp = a->exp;
     b_exp = b->exp;
 
-    if (unlikely(ab_mask != float_cmask_normal)) {
+    if (unlikely(!cmask_is_only_normals(ab_mask))) {
         switch (a->cls) {
         case float_class_normal:
+        case float_class_denormal:
             break;
         case float_class_inf:
             a_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         }
         switch (b->cls) {
         case float_class_normal:
+        case float_class_denormal:
             break;
         case float_class_inf:
             b_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
 {
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         FloatRelation cmp;
 
         if (a->sign != b->sign) {
@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
     case float_class_inf:
         break;
     case float_class_normal:
+    case float_class_denormal:
         a->exp += MIN(MAX(n, -0x10000), 0x10000);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
 
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(a, s);
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
             }
             return;
         default:
-            break;
+            g_assert_not_reached();
         }
-        g_assert_not_reached();
     }
     if (unlikely(a->sign)) {
         goto d_nan;
-- 
2.34.1

For the x86 and the Arm FEAT_AFP semantics, we need to be able to
tell the target code that the FPU operation has used an input
denormal.  Implement this; when it happens we set the new
float_flag_input_denormal_used.

Note that we only set this when an input denormal is actually used by
the operation: if the operation results in Invalid Operation or
Divide By Zero or the result is a NaN because some other input was a
NaN then we never needed to look at the input denormal and do not set
float_flag_input_denormal_used.

We mostly do not need to adjust the hardfloat codepaths to deal with
this flag, because almost all hardfloat operations are already gated
on the input not being a denormal, and will fall back to softfloat
for a denormal input.  The only exception is the comparison
operations, where we need to add the check for input denormals, which
must now fall back to softfloat where they did not before.
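
The "was the denormal actually used" rule can be sketched for two of
the operations as below.  This is an illustration only: the SK_*
names and bit values are hypothetical, and the functions model just
the flag decision, not the arithmetic:

```c
#include <stdbool.h>

/* Hypothetical one-bit-per-class masks (values are assumptions). */
enum {
    SK_CMASK_ZERO     = 1u << 0,
    SK_CMASK_NORMAL   = 1u << 1,
    SK_CMASK_DENORMAL = 1u << 2,
    SK_CMASK_INF      = 1u << 3,
    SK_CMASK_QNAN     = 1u << 4,
    SK_CMASK_SNAN     = 1u << 5,
    SK_CMASK_ANYNAN   = SK_CMASK_QNAN | SK_CMASK_SNAN,
};

/* add/sub: a denormal input is consumed unless an input is a NaN. */
static bool sk_addsub_uses_denormal(unsigned a_mask, unsigned b_mask)
{
    unsigned ab_mask = a_mask | b_mask;

    return (ab_mask & (SK_CMASK_DENORMAL | SK_CMASK_ANYNAN))
        == SK_CMASK_DENORMAL;
}

/*
 * div: a denormal input is consumed unless an input is a NaN or the
 * divisor is zero (which is Divide By Zero instead).
 */
static bool sk_div_uses_denormal(unsigned a_mask, unsigned b_mask)
{
    unsigned ab_mask = a_mask | b_mask;

    if (ab_mask & SK_CMASK_ANYNAN) {
        return false;
    }
    return (ab_mask & SK_CMASK_DENORMAL) && b_mask != SK_CMASK_ZERO;
}
```

So denormal + normal sets the flag, denormal + NaN does not, and
denormal / 0 does not because the operation raises Divide By Zero.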

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-types.h |  7 ++++
 fpu/softfloat.c               | 38 +++++++++++++++++---
 fpu/softfloat-parts.c.inc     | 68 ++++++++++++++++++++++++++++++++++-
 3 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ enum {
     float_flag_invalid_sqrt    = 0x0800,  /* sqrt(-x) */
     float_flag_invalid_cvti    = 0x1000,  /* non-nan to integer */
     float_flag_invalid_snan    = 0x2000,  /* any operand was snan */
+    /*
+     * An input was denormal and we used it (without flushing it to zero).
+     * Not set if we do not actually use the denormal input (e.g.
+     * because some other input was a NaN, or because the operation
+     * wasn't actually carried out (divide-by-zero; invalid))
+     */
+    float_flag_input_denormal_used = 0x4000,
 };
 
 /*
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
                                   float16_params_ahp.frac_size + 1);
         break;
 
-    case float_class_normal:
     case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        break;
+    case float_class_normal:
     case float_class_zero:
         break;
 
@@ -XXX,XX +XXX,XX @@ static void parts64_float_to_float(FloatParts64 *a, float_status *s)
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 static void parts128_float_to_float(FloatParts128 *a, float_status *s)
@@ -XXX,XX +XXX,XX @@ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 #define parts_float_to_float(P, S) \
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
     a->sign = b->sign;
     a->exp = b->exp;
 
-    if (is_anynorm(a->cls)) {
+    switch (a->cls) {
+    case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        /* fall through */
+    case float_class_normal:
         frac_truncjam(a, b);
-    } else if (is_nan(a->cls)) {
+        break;
+    case float_class_snan:
+    case float_class_qnan:
         /* Discard the low bits of the NaN. */
         a->frac = b->frac_hi;
         parts_return_nan(a, s);
+        break;
+    default:
+        break;
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_widen(FloatParts128 *a, FloatParts64 *b,
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 float32 float16_to_float32(float16 a, bool ieee, float_status *s)
@@ -XXX,XX +XXX,XX @@ float32_hs_compare(float32 xa, float32 xb, float_status *s, bool is_quiet)
         goto soft;
     }
 
-    float32_input_flush2(&ua.s, &ub.s, s);
+    if (unlikely(float32_is_denormal(ua.s) || float32_is_denormal(ub.s))) {
+        /* We may need to set the input_denormal_used flag */
+        goto soft;
+    }
+
     if (isgreaterequal(ua.h, ub.h)) {
         if (isgreater(ua.h, ub.h)) {
             return float_relation_greater;
@@ -XXX,XX +XXX,XX @@ float64_hs_compare(float64 xa, float64 xb, float_status *s, bool is_quiet)
         goto soft;
     }
 
-    float64_input_flush2(&ua.s, &ub.s, s);
+    if (unlikely(float64_is_denormal(ua.s) || float64_is_denormal(ub.s))) {
+        /* We may need to set the input_denormal_used flag */
+        goto soft;
+    }
+
     if (isgreaterequal(ua.h, ub.h)) {
         if (isgreater(ua.h, ub.h)) {
             return float_relation_greater;
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
     bool b_sign = b->sign ^ subtract;
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
+    /*
+     * For addition and subtraction, we will consume an
+     * input denormal unless the other input is a NaN.
+     */
+    if ((ab_mask & (float_cmask_denormal | float_cmask_anynan)) ==
+        float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (a->sign != b_sign) {
         /* Subtraction */
         if (likely(cmask_is_only_normals(ab_mask))) {
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     if (likely(cmask_is_only_normals(ab_mask))) {
         FloatPartsW tmp;
 
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
+
         frac_mulw(&tmp, a, b);
         frac_truncjam(a, &tmp);
 
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     }
 
     /* Multiply by 0 or Inf */
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (ab_mask & float_cmask_inf) {
         a->cls = float_class_inf;
         a->sign = sign;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
     if (flags & float_muladd_negate_result) {
         a->sign ^= 1;
     }
+
+    /*
+     * All result types except for "return the default NaN
+     * because this is an Invalid Operation" go through here;
+     * this matches the set of cases where we consumed a
+     * denormal input.
+     */
+    if (abc_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
     return a;
 
  return_sub_zero:
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     bool sign = a->sign ^ b->sign;
 
     if (likely(cmask_is_only_normals(ab_mask))) {
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
         a->sign = sign;
         a->exp -= b->exp + frac_div(a, b);
         return a;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
         return parts_pick_nan(a, b, s);
     }
 
+    if ((ab_mask & float_cmask_denormal) && b->cls != float_class_zero) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     a->sign = sign;
 
     /* Inf / X */
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
     if (likely(cmask_is_only_normals(ab_mask))) {
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
         frac_modrem(a, b, mod_quot);
         return a;
     }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
         return a;
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     /* N % Inf; 0 % N */
     g_assert(b->cls == float_class_inf || a->cls == float_class_zero);
     return a;
@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
         case float_class_denormal:
+            if (!a->sign) {
+                /* -ve denormal will be InvalidOperation */
+                float_raise(float_flag_input_denormal_used, status);
+            }
             break;
         case float_class_snan:
         case float_class_qnan:
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         if ((flags & (minmax_isnum | minmax_isnumber))
             && !(ab_mask & float_cmask_snan)
             && (ab_mask & ~float_cmask_qnan)) {
+            if (ab_mask & float_cmask_denormal) {
+                float_raise(float_flag_input_denormal_used, s);
+            }
             return is_nan(a->cls) ? b : a;
         }
 
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         return parts_pick_nan(a, b, s);
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     a_exp = a->exp;
     b_exp = b->exp;
 
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
     if (likely(cmask_is_only_normals(ab_mask))) {
         FloatRelation cmp;
 
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
+
         if (a->sign != b->sign) {
             goto a_sign;
         }
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
         return float_relation_unordered;
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (ab_mask & float_cmask_zero) {
         if (ab_mask == float_cmask_zero) {
             return float_relation_equal;
@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
     case float_class_zero:
     case float_class_inf:
         break;
-    case float_class_normal:
     case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        /* fall through */
+    case float_class_normal:
         a->exp += MIN(MAX(n, -0x10000), 0x10000);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
         case float_class_denormal:
+            if (!a->sign) {
+                /* -ve denormal will be InvalidOperation */
+                float_raise(float_flag_input_denormal_used, s);
+            }
             break;
         case float_class_snan:
         case float_class_qnan:
-- 
2.34.1

Currently we always handle flushing of output denormals in
uncanon_normal before we deal with rounding.  This works for architectures
that detect tininess before rounding, but is usually not the right
place when the architecture detects tininess after rounding.  For
example, for x86 the SDM states that the MXCSR FTZ control bit causes
outputs to be flushed to zero "when it detects a floating-point
underflow condition".  This means that we mustn't flush to zero if
the input is such that after rounding it is no longer tiny.

At least one of our guest architectures does underflow detection
after rounding but flushing of denormals before rounding (MIPS MSA);
this means we need to have a config knob for this that is separate
from our existing tininess_before_rounding setting.

Add an ftz_detection flag.  For consistency with
tininess_before_rounding, we make it default to "detect ftz after
rounding"; this means that we need to explicitly set the flag to
"detect ftz before rounding" on every existing architecture that sets
flush_to_zero, so that this commit has no behaviour change.
(This means more code change here but for the long term a less
confusing API.)
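
The difference between the two settings can be seen in a toy model
that rounds a double to float.  This is a deliberately simplified
sketch, not how uncanon_normal is written: the names are
hypothetical, and "after rounding" here means "after C's
double-to-float conversion on an IEEE host", which is an assumption
made for illustration:

```c
#include <float.h>

typedef enum {
    TOYFTZ_AFTER_ROUNDING = 0,
    TOYFTZ_BEFORE_ROUNDING = 1,
} ToyFTZWhen;

/* Round 'exact' to float, flushing tiny results to a signed zero. */
static float toyftz_round(double exact, ToyFTZWhen when)
{
    double mag = exact < 0 ? -exact : exact;
    float zero = exact < 0 ? -0.0f : 0.0f;
    float rounded, rmag;

    /* Before rounding: decide using the unrounded magnitude. */
    if (when == TOYFTZ_BEFORE_ROUNDING && mag != 0 && mag < FLT_MIN) {
        return zero;
    }

    rounded = (float)exact;
    rmag = rounded < 0 ? -rounded : rounded;

    /* After rounding: decide using the rounded magnitude. */
    if (when == TOYFTZ_AFTER_ROUNDING && rmag != 0 && rmag < FLT_MIN) {
        return zero;
    }
    return rounded;
}
```

A value just below FLT_MIN that rounds up to FLT_MIN, such as
0x1p-126 - 0x1p-152, is flushed to zero in the "before rounding"
mode but survives as FLT_MIN in the "after rounding" mode.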

For several architectures the current behaviour is either
definitely or possibly wrong; annotate those with TODO comments.
These architectures are definitely wrong (and should detect
ftz after rounding):
 * x86
 * Alpha

For these architectures the spec is unclear:
 * MIPS (for non-MSA)
 * RX
 * SH4

PA-RISC makes ftz detection IMPDEF, but we aren't setting the
"tininess before rounding" setting that we ought to.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-helpers.h | 11 +++++++++++
 include/fpu/softfloat-types.h   | 18 ++++++++++++++++++
 target/mips/fpu_helper.h        |  6 ++++++
 target/alpha/cpu.c              |  7 +++++++
 target/arm/cpu.c                |  1 +
 target/hppa/fpu_helper.c        | 11 +++++++++++
 target/i386/tcg/fpu_helper.c    |  8 ++++++++
 target/mips/msa.c               |  9 +++++++++
 target/ppc/cpu_init.c           |  3 +++
 target/rx/cpu.c                 |  8 ++++++++
 target/sh4/cpu.c                |  8 ++++++++
 target/tricore/helper.c         |  1 +
 tests/fp/fp-bench.c             |  1 +
 fpu/softfloat-parts.c.inc       | 21 +++++++++++++++------
 14 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-helpers.h
+++ b/include/fpu/softfloat-helpers.h
@@ -XXX,XX +XXX,XX @@ static inline void set_flush_inputs_to_zero(bool val, float_status *status)
     status->flush_inputs_to_zero = val;
 }
 
+static inline void set_float_ftz_detection(FloatFTZDetection d,
+                                           float_status *status)
+{
+    status->ftz_detection = d;
+}
+
 static inline void set_default_nan_mode(bool val, float_status *status)
 {
     status->default_nan_mode = val;
@@ -XXX,XX +XXX,XX @@ static inline bool get_default_nan_mode(const float_status *status)
     return status->default_nan_mode;
 }
 
+static inline FloatFTZDetection get_float_ftz_detection(const float_status *status)
+{
+    return status->ftz_detection;
+}
+
 #endif /* SOFTFLOAT_HELPERS_H */
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
     float_infzeronan_suppress_invalid = (1 << 7),
 } FloatInfZeroNaNRule;
 
+/*
+ * When flush_to_zero is set, should we detect denormal results to
+ * be flushed before or after rounding? For most architectures this
+ * should be set to match the tininess_before_rounding setting,
+ * but a few architectures, e.g. MIPS MSA, detect FTZ before
+ * rounding but tininess after rounding.
+ *
+ * This enum is arranged so that the default if the target doesn't
+ * configure it matches the default for tininess_before_rounding
+ * (i.e. "after rounding").
+ */
+typedef enum __attribute__((__packed__)) {
+    float_ftz_after_rounding = 0,
+    float_ftz_before_rounding = 1,
+} FloatFTZDetection;
+
 /*
  * Floating Point Status. Individual architectures may maintain
  * several versions of float_status for different functions. The
@@ -XXX,XX +XXX,XX @@ typedef struct float_status {
     bool tininess_before_rounding;
     /* should denormalised results go to zero and set output_denormal_flushed? */
     bool flush_to_zero;
+    /* do we detect and flush denormal results before or after rounding? */
+    FloatFTZDetection ftz_detection;
     /* should denormalised inputs go to zero and set input_denormal_flushed? */
     bool flush_inputs_to_zero;
     bool default_nan_mode;
diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/fpu_helper.h
+++ b/target/mips/fpu_helper.h
@@ -XXX,XX +XXX,XX @@ static inline void fp_reset(CPUMIPSState *env)
      */
     set_float_2nan_prop_rule(float_2nan_prop_s_ab,
                              &env->active_fpu.fp_status);
+    /*
+     * TODO: the spec doesn't say clearly whether FTZ happens before
+     * or after rounding for normal FPU operations.
+     */
+    set_float_ftz_detection(float_ftz_before_rounding,
+                            &env->active_fpu.fp_status);
 }
 
 /* MSA */
diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -XXX,XX +XXX,XX @@ static void alpha_cpu_initfn(Object *obj)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN: sign bit clear, msb frac bit set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
+     * section 4.7.7.11 says that we flush to zero for underflow cases, so
+     * this should be float_ftz_after_rounding to match the
+     * tininess_after_rounding (which is specified in section 4.7.5).
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 #if defined(CONFIG_USER_ONLY)
     env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
     cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
 static void arm_set_default_fp_behaviours(float_status *s)
 {
     set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_ftz_detection(float_ftz_before_rounding, s);
     set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/hppa/fpu_helper.c
+++ b/target/hppa/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
     set_float_infzeronan_rule(float_infzeronan_dnan_never, &env->fp_status);
     /* Default NaN: sign bit clear, msb-1 frac bit set */
     set_float_default_nan_pattern(0b00100000, &env->fp_status);
+    /*
+     * "PA-RISC 2.0 Architecture" says it is IMPDEF whether the flushing
+     * enabled by FPSR.D happens before or after rounding. We pick "before"
+     * for consistency with tininess detection.
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
+    /*
+     * TODO: "PA-RISC 2.0 Architecture" chapter 10 says that we should
+     * detect tininess before rounding, but we don't set that here so we
+     * get the default tininess after rounding.
+     */
 }
 
 void cpu_hppa_loaded_fr0(CPUHPPAState *env)
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_init_fp_statuses(CPUX86State *env)
     set_float_default_nan_pattern(0b11000000, &env->fp_status);
     set_float_default_nan_pattern(0b11000000, &env->mmx_status);
     set_float_default_nan_pattern(0b11000000, &env->sse_status);
+    /*
+     * TODO: x86 does flush-to-zero detection after rounding (the SDM
+     * section 10.2.3.3 on the FTZ bit of MXCSR says that we flush
+     * when we detect underflow, which x86 does after rounding).
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &env->mmx_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &env->sse_status);
 }
 
 static inline uint8_t save_exception_flags(CPUX86State *env)
diff --git a/target/mips/msa.c b/target/mips/msa.c
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/msa.c
+++ b/target/mips/msa.c
@@ -XXX,XX +XXX,XX @@ void msa_reset(CPUMIPSState *env)
     /* tininess detected after rounding.*/
     set_float_detect_tininess(float_tininess_after_rounding,
                               &env->active_tc.msa_fp_status);
+    /*
+     * MSACSR.FS detects tiny results to flush to zero before rounding
+     * (per "MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD
+     * Architecture Module, Revision 1.1" section 3.5.4), even though it
+     * detects tininess after rounding for underflow purposes (section 3.4.2
+     * table 3.3).
+     */
+    set_float_ftz_detection(float_ftz_before_rounding,
+                            &env->active_tc.msa_fp_status);
 
     /*
      * According to MIPS specifications, if one of the two operands is
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index XXXXXXX..XXXXXXX 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -XXX,XX +XXX,XX @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
     /* tininess for underflow is detected before rounding */
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->fp_status);
+    /* Similarly for flush-to-zero */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
+
     /*
      * PowerPC propagation rules:
      *  1. A if it sNaN or qNaN
diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -XXX,XX +XXX,XX @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN value: sign bit clear, set frac msb */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: "RX Family RXv1 Instruction Set Architecture" is not 100% clear
+     * on whether flush-to-zero should happen before or after rounding, but
+     * section 1.3.2 says that it happens when underflow is detected, and
+     * implies that underflow is detected after rounding. So this may not
+     * be the correct setting.
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 }
 
 static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -XXX,XX +XXX,XX @@ static void superh_cpu_reset_hold(Object *obj, ResetType type)
     set_default_nan_mode(1, &env->fp_status);
     /* sign bit clear, set all frac bits other than msb */
     set_float_default_nan_pattern(0b00111111, &env->fp_status);
+    /*
+     * TODO: "SH-4 CPU Core Architecture ADCS 7182230F" doesn't say whether
+     * it detects tininess before or after rounding. Section 6.4 is clear
+     * that flush-to-zero happens when the result underflows, though, so
+     * either this should be "detect ftz after rounding" or else we should
+     * be setting "detect tininess before rounding".
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 }
 
 static void superh_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
diff --git a/target/tricore/helper.c b/target/tricore/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/tricore/helper.c
+++ b/target/tricore/helper.c
@@ -XXX,XX +XXX,XX @@ void fpu_set_state(CPUTriCoreState *env)
     set_flush_inputs_to_zero(1, &env->fp_status);
     set_flush_to_zero(1, &env->fp_status);
     set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
     set_default_nan_mode(1, &env->fp_status);
     /* Default NaN pattern: sign bit clear, frac msb set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/fp/fp-bench.c
+++ b/tests/fp/fp-bench.c
@@ -XXX,XX +XXX,XX @@ static void run_bench(void)
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, &soft_status);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, &soft_status);
     set_float_default_nan_pattern(0b01000000, &soft_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &soft_status);
 
     f = bench_funcs[operation][precision];
     g_assert(f);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
             p->frac_lo &= ~round_mask;
         }
         frac_shr(p, frac_shift);
-    } else if (s->flush_to_zero) {
+    } else if (s->flush_to_zero &&
+               s->ftz_detection == float_ftz_before_rounding) {
         flags |= float_flag_output_denormal_flushed;
         p->cls = float_class_zero;
         exp = 0;
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
         exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) && !fmt->m68k_denormal;
         frac_shr(p, frac_shift);
 
-        if (is_tiny && (flags & float_flag_inexact)) {
-            flags |= float_flag_underflow;
-        }
-        if (exp == 0 && frac_eqz(p)) {
-            p->cls = float_class_zero;
+        if (is_tiny) {
+            if (s->flush_to_zero) {
+                assert(s->ftz_detection == float_ftz_after_rounding);
+                flags |= float_flag_output_denormal_flushed;
+                p->cls = float_class_zero;
+                exp = 0;
+                frac_clear(p);
+            } else if (flags & float_flag_inexact) {
+                flags |= float_flag_underflow;
+            }
+            if (exp == 0 && frac_eqz(p)) {
+                p->cls = float_class_zero;
+            }
         }
     }
     p->exp = exp;
-- 
2.34.1

The Armv8.7 FEAT_AFP feature defines three new control bits in
the FPCR:
 * FPCR.AH: "alternate floating point mode"; this changes floating
   point behaviour in a variety of ways, including:
    - the sign of a default NaN is 1, not 0
    - if FPCR.FZ is also 1, denormals are detected after rounding
      (with an unbounded exponent) has been applied, and are then
      flushed to zero
    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
    - miscellaneous other corner-case behaviour changes
 * FPCR.FIZ: flush denormalized numbers to zero on input for
   most instructions
 * FPCR.NEP: makes scalar SIMD operations merge the result with
   higher vector elements in one of the source registers, instead
   of zeroing the higher elements of the destination
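
The difference between detecting a denormal before and after rounding
only shows up for results just below the smallest normal number.  As a
rough illustration (a toy integer model, not softfloat code; every name
here is hypothetical), take an exact positive result that has one more
significand bit than float32 can hold:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model: the exact result is sig25 * 2^(exp - 24), where sig25 is a
 * 25-bit significand with its top bit set (one bit more than float32
 * keeps) and exp is the unbiased exponent of that top bit.
 */
#define TOY_MIN_NORMAL_EXP (-126)   /* float32 minimum normal exponent */

/* Tininess judged on the exact value, before any rounding. */
static bool toy_tiny_before_rounding(uint32_t sig25, int exp)
{
    assert(sig25 >> 24 == 1);
    return exp < TOY_MIN_NORMAL_EXP;
}

/*
 * Tininess judged after rounding to 24 bits (ties to even), as though
 * the exponent range were unbounded.
 */
static bool toy_tiny_after_rounding(uint32_t sig25, int exp)
{
    uint32_t q = sig25 >> 1;        /* keep the top 24 bits */

    assert(sig25 >> 24 == 1);
    /* The single dropped bit is an exact tie: round to even. */
    if ((sig25 & 1) && (q & 1)) {
        q++;
    }
    /* A carry out of the significand bumps the exponent. */
    if (q == (UINT32_C(1) << 24)) {
        exp++;
    }
    return exp < TOY_MIN_NORMAL_EXP;
}
```

With sig25 = 0x1FFFFFF and exp = -127 (just under 2^-126) the value is
tiny before rounding but rounds up to the smallest normal, so in this
model only "before rounding" detection would flush it to zero.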

This commit defines the new bits in the FPCR, and allows them to be
read or written when FEAT_AFP is implemented.  Actual behaviour
changes will be implemented in subsequent commits.

Note that these are the first FPCR bits which don't appear in the
AArch32 FPSCR view of the register, and which share bit positions
with FPSR bits.

Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
flushing of single and double precision denormal inputs to zero for
AArch64 floating point instructions.  (For half-precision, the
existing FPCR.FZ16 control remains the only one.)

FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
only because of FPCR.FIZ then we should *not* set the cumulative
exception bit FPSR.IDC.

FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
applies when FPCR.AH is 0.

We can implement this by setting the "flush inputs to zero" state
appropriately when FPCR is written, and by not reflecting the
float_flag_input_denormal_flushed status flag into FPSR reads when it
is the result only of FPCR.FIZ.
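
The two conditions described above can be sketched as a pair of
predicates on the FPCR value.  This is only an illustration: the SK_*
names are stand-ins rather than QEMU's own definitions, and the bit
positions are my reading of the Arm ARM FPCR layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in FPCR bit definitions for this sketch (assumed positions). */
#define SK_FPCR_FIZ (UINT32_C(1) << 0)
#define SK_FPCR_AH  (UINT32_C(1) << 1)
#define SK_FPCR_FZ  (UINT32_C(1) << 24)

/* FZ only flushes inputs when AH is 0. */
static bool sk_fz_flushes_inputs(uint32_t fpcr)
{
    return (fpcr & (SK_FPCR_FZ | SK_FPCR_AH)) == SK_FPCR_FZ;
}

/* A64: flush denormal inputs if FIZ == 1, or if AH == 0 and FZ == 1. */
static bool sk_a64_flush_inputs(uint32_t fpcr)
{
    return (fpcr & SK_FPCR_FIZ) || sk_fz_flushes_inputs(fpcr);
}

/* A flushed input is only reported in FPSR.IDC when FZ caused it. */
static bool sk_a64_flush_sets_idc(uint32_t fpcr)
{
    return sk_fz_flushes_inputs(fpcr);
}

/* Small self-check showing the interesting combinations. */
void sk_check_fiz_rules(void)
{
    /* FIZ alone: inputs flushed, IDC not reported. */
    assert(sk_a64_flush_inputs(SK_FPCR_FIZ));
    assert(!sk_a64_flush_sets_idc(SK_FPCR_FIZ));
    /* FZ with AH == 1: FZ no longer flushes inputs. */
    assert(!sk_a64_flush_inputs(SK_FPCR_FZ | SK_FPCR_AH));
}
```

Keeping "does FZ apply" in one helper makes it harder for the flush
decision and the IDC decision to drift apart.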

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vfp_helper.c | 60 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
 
 static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 {
-    uint32_t i = 0;
+    uint32_t a32_flags = 0, a64_flags = 0;
 
-    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
-    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
-    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
+    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
     /* FZ16 does not generate an input denormal exception.  */
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
+    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
+
+    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
+    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~float_flag_input_denormal_flushed);
-    return vfp_exceptbits_from_host(i);
+    /*
+     * Flushing an input denormal *only* because FPCR.FIZ == 1 does
+     * not set FPSR.IDC; if FPCR.FZ is also set then this takes
+     * precedence and IDC is set (see the FPUnpackBase pseudocode).
+     * So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
+     * We only do this for the a64 flags because FIZ has no effect
+     * on AArch32 even if it is set.
+     */
+    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
+        a64_flags &= ~float_flag_input_denormal_flushed;
+    }
+    return vfp_exceptbits_from_host(a32_flags | a64_flags);
 }
 
 static void vfp_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 }
 
+static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
+{
+    /*
+     * Synchronize any pending exception-flag information in the
+     * float_status values into env->vfp.fpsr, and then clear out
+     * the float_status data.
+     */
+    env->vfp.fpsr |= vfp_get_fpsr_from_host(env);
+    vfp_clear_float_status_exc_flags(env);
+}
+
 static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
 {
     uint64_t changed = env->vfp.fpcr;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
+        /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+    }
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        /*
+         * A64: Flush denormalized inputs to zero if FPCR.FIZ = 1, or
+         * both FPCR.AH = 0 and FPCR.FZ = 1.
+         */
+        bool fitz_enabled = (val & FPCR_FIZ) ||
+            (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
+        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
+    /*
+     * If any bits changed that we look at in vfp_get_fpsr_from_host(),
+     * we must sync the float_status flags into vfp.fpsr now (under the
+     * old regime) before we update vfp.fpcr.
+     */
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        vfp_sync_and_clear_float_status_exc_flags(env);
+    }
 }
 
 #else
-- 
2.34.1

When FPCR.AH is set, various behaviours of AArch64 floating point
operations which are controlled by softfloat config settings change:
 * tininess and ftz detection before/after rounding
 * NaN propagation order
 * result of 0 * Inf + NaN
 * default NaN value

When the guest changes the value of the AH bit, switch these config
settings on the fp_status_a64 and fp_status_f16_a64 float_status
fields.

This requires us to make the arm_set_default_fp_behaviours() function
global, since we now need to call it from cpu.c and vfp_helper.c; we
move it to vfp_helper.c so it can be next to the new
arm_set_ah_fp_behaviours().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/internals.h  |  4 +++
 target/arm/cpu.c        | 23 ----------------
 target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ uint64_t gt_virt_cnt_offset(CPUARMState *env);
  * all EL1" scope; this covers stage 1 and stage 2.
  */
 int alle1_tlbmask(CPUARMState *env);
+
+/* Set the float_status behaviour to match the Arm defaults */
+void arm_set_default_fp_behaviours(float_status *s);
+
 #endif
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
     QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
 }
 
-/*
- * Set the float_status behaviour to match the Arm defaults:
- *  * tininess-before-rounding
- *  * 2-input NaN propagation prefers SNaN over QNaN, and then
- *    operand A over operand B (see FPProcessNaNs() pseudocode)
- *  * 3-input NaN propagation prefers SNaN over QNaN, and then
- *    operand C over A over B (see FPProcessNaNs3() pseudocode,
- *    but note that for QEMU muladd is a * b + c, whereas for
- *    the pseudocode function the arguments are in the order c, a, b.
- *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
- *    and the input NaN if it is signalling
- *  * Default NaN has sign bit clear, msb frac bit set
- */
-static void arm_set_default_fp_behaviours(float_status *s)
-{
-    set_float_detect_tininess(float_tininess_before_rounding, s);
-    set_float_ftz_detection(float_ftz_before_rounding, s);
-    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
-    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
-    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
-    set_float_default_nan_pattern(0b01000000, s);
-}
-
 static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
 {
     /* Reset a single ARMCPRegInfo register */
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/helper-proto.h"
 #include "internals.h"
 #include "cpu-features.h"
+#include "fpu/softfloat.h"
 #ifdef CONFIG_TCG
 #include "qemu/log.h"
-#include "fpu/softfloat.h"
 #endif
 
 /* VFP support.  We follow the convention used for VFP instructions:
    Single precision routines have a "s" suffix, double precision a
    "d" suffix.  */
 
+/*
+ * Set the float_status behaviour to match the Arm defaults:
+ *  * tininess-before-rounding
+ *  * 2-input NaN propagation prefers SNaN over QNaN, and then
+ *    operand A over operand B (see FPProcessNaNs() pseudocode)
+ *  * 3-input NaN propagation prefers SNaN over QNaN, and then
+ *    operand C over A over B (see FPProcessNaNs3() pseudocode,
+ *    but note that for QEMU muladd is a * b + c, whereas for
+ *    the pseudocode function the arguments are in the order c, a, b).
+ *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
+ *    and the input NaN if it is signalling
+ *  * Default NaN has sign bit clear, msb frac bit set
+ */
+void arm_set_default_fp_behaviours(float_status *s)
+{
+    set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_ftz_detection(float_ftz_before_rounding, s);
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
+    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
+    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
+    set_float_default_nan_pattern(0b01000000, s);
+}
+
+/*
+ * Set the float_status behaviour to match the FEAT_AFP
+ * FPCR.AH=1 requirements:
+ *  * tininess-after-rounding
+ *  * 2-input NaN propagation prefers the first NaN
+ *  * 3-input NaN propagation prefers a over b over c
+ *  * 0 * Inf + NaN always returns the input NaN and doesn't
+ *    set Invalid for a QNaN
+ *  * default NaN has sign bit set, msb frac bit set
+ */
+static void arm_set_ah_fp_behaviours(float_status *s)
+{
+    set_float_detect_tininess(float_tininess_after_rounding, s);
+    set_float_ftz_detection(float_ftz_after_rounding, s);
+    set_float_2nan_prop_rule(float_2nan_prop_ab, s);
+    set_float_3nan_prop_rule(float_3nan_prop_abc, s);
+    set_float_infzeronan_rule(float_infzeronan_dnan_never |
+                              float_infzeronan_suppress_invalid, s);
+    set_float_default_nan_pattern(0b11000000, s);
+}
+
 #ifdef CONFIG_TCG
 
 /* Convert host exception flags to vfp form.  */
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
+    if (changed & FPCR_AH) {
+        bool ah_enabled = val & FPCR_AH;
+
+        if (ah_enabled) {
+            /* Change behaviours for A64 FP operations */
+            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
+            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
+        } else {
+            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
+            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
+        }
+    }
     /*
      * If any bits changed that we look at in vfp_get_fpsr_from_host(),
      * we must sync the float_status flags into vfp.fpsr now (under the
-- 
2.34.1

When FPCR.AH = 1, some of the cumulative exception flags in the FPSR
behave slightly differently for A64 operations:
 * IDC is set when a denormal input is used without flushing
 * IXC (Inexact) is set when an output denormal is flushed to zero

Update vfp_get_fpsr_from_host() to do this.

Note that because half-precision operations never set IDC, we now
need to add float_flag_input_denormal_used to the set we mask out of
fp_status_f16_a64.
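
As a rough sketch of the denormal-related part of that mapping (the
host flag values below are made up for the example and do not match
QEMU's float_flag_* constants; the FPSR bit positions are assumed from
the Arm ARM; the other exception bits are omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side flag bits, for illustration only. */
enum {
    SK_FLAG_INPUT_DENORMAL_FLUSHED  = 1 << 0,
    SK_FLAG_INPUT_DENORMAL_USED     = 1 << 1,
    SK_FLAG_OUTPUT_DENORMAL_FLUSHED = 1 << 2,
};

/* Assumed FPSR cumulative exception bit positions. */
#define SK_FPSR_IXC (UINT32_C(1) << 4)
#define SK_FPSR_IDC (UINT32_C(1) << 7)

/* Map the denormal-related host flags to FPSR bits. */
static uint32_t sk_denormal_bits_to_fpsr(int host_bits, bool ah)
{
    uint32_t fpsr = 0;

    if (host_bits & SK_FLAG_INPUT_DENORMAL_FLUSHED) {
        fpsr |= SK_FPSR_IDC;
    }
    /* AH == 1: using an input denormal without flushing sets IDC. */
    if (ah && (host_bits & SK_FLAG_INPUT_DENORMAL_USED)) {
        fpsr |= SK_FPSR_IDC;
    }
    /* AH == 1: flushing an output denormal also sets IXC. */
    if (ah && (host_bits & SK_FLAG_OUTPUT_DENORMAL_FLUSHED)) {
        fpsr |= SK_FPSR_IXC;
    }
    return fpsr;
}

/* Half-precision operations never set IDC: drop both input flags. */
static int sk_mask_f16_flags(int host_bits)
{
    return host_bits & ~(SK_FLAG_INPUT_DENORMAL_FLUSHED |
                         SK_FLAG_INPUT_DENORMAL_USED);
}
```

Masking the half-precision flags before the mapping is what keeps an
"input denormal used" event from a half-precision operation out of IDC.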

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vfp_helper.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static void arm_set_ah_fp_behaviours(float_status *s)
 #ifdef CONFIG_TCG
 
 /* Convert host exception flags to vfp form.  */
-static inline uint32_t vfp_exceptbits_from_host(int host_bits)
+static inline uint32_t vfp_exceptbits_from_host(int host_bits, bool ah)
 {
     uint32_t target_bits = 0;
 
@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
     if (host_bits & float_flag_input_denormal_flushed) {
         target_bits |= FPSR_IDC;
     }
+    /*
+     * With FPCR.AH, IDC is set when an input denormal is used,
+     * and flushing an output denormal to zero sets both IXC and UFC.
+     */
+    if (ah && (host_bits & float_flag_input_denormal_used)) {
+        target_bits |= FPSR_IDC;
+    }
+    if (ah && (host_bits & float_flag_output_denormal_flushed)) {
+        target_bits |= FPSR_IXC;
+    }
     return target_bits;
 }
 
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
-          & ~float_flag_input_denormal_flushed);
+          & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
         a64_flags &= ~float_flag_input_denormal_flushed;
     }
-    return vfp_exceptbits_from_host(a32_flags | a64_flags);
+    return vfp_exceptbits_from_host(a64_flags, env->vfp.fpcr & FPCR_AH) |
+        vfp_exceptbits_from_host(a32_flags, false);
 }
 
 static void vfp_clear_float_status_exc_flags(CPUARMState *env)
-- 
2.34.1

We are going to need to generate different code in some cases when
FPCR.AH is 1.  For example:
 * Floating point neg and abs must not flip the sign bit of NaNs
 * some insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, and various
   BFCVT and BFM bfloat16 ops) need to use a different float_status
   to the usual one

Encode FPCR.AH into the A64 tbflags, so we can refer to it at
translate time.

Because we now have a bit in FPCR that affects codegen, we can't mark
the AArch64 FPCR register as being SUPPRESS_TB_END any more; writes
to it will now end the TB and trigger a regeneration of hflags.
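
To illustrate why the TB has to end, here is a simplified model of the
idea (hypothetical names throughout; the real hflags code is more
involved).  The AH bit is copied into the per-TB flags word at bit 37,
and translated code is only valid for the flags it was generated under:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_FPCR_AH          (UINT32_C(1) << 1)  /* assumed FPCR.AH position */
#define SK_TBFLAG_AH_SHIFT  37                  /* as in the new FIELD() */

/* Recompute the AH bit in a flags word from the current FPCR value. */
static uint64_t sk_rebuild_flags(uint64_t flags, uint32_t fpcr)
{
    flags &= ~(UINT64_C(1) << SK_TBFLAG_AH_SHIFT);
    if (fpcr & SK_FPCR_AH) {
        flags |= UINT64_C(1) << SK_TBFLAG_AH_SHIFT;
    }
    return flags;
}

/* What the translator would read back at translate time. */
static bool sk_flags_ah(uint64_t flags)
{
    return (flags >> SK_TBFLAG_AH_SHIFT) & 1;
}

/* Simplification: a cached TB is reusable only if the flags match. */
static bool sk_tb_reusable(uint64_t tb_flags, uint64_t cur_flags)
{
    return tb_flags == cur_flags;
}
```

In this model a write that flips FPCR.AH changes the flags word, so
code translated under the old value must not keep running past the
write.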

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               | 1 +
 target/arm/tcg/translate.h     | 2 ++
 target/arm/helper.c            | 2 +-
 target/arm/tcg/hflags.c        | 4 ++++
 target/arm/tcg/translate-a64.c | 1 +
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2, 34, 1)
 FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
 /* Set if FEAT_NV2 RAM accesses are big-endian */
 FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
+FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
 
 /*
  * Helpers for using the above. Note that only the A64 accessors use
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool nv2_mem_e20;
     /* True if NV2 enabled and NV2 RAM accesses are big-endian */
     bool nv2_mem_be;
+    /* True if FPCR.AH is 1 (alternate floating point handling) */
+    bool fpcr_ah;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = aa64_daif_write, .resetfn = arm_cp_reset_ignore },
     { .name = "FPCR", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 0, .crn = 4, .crm = 4,
-      .access = PL0_RW, .type = ARM_CP_FPU | ARM_CP_SUPPRESS_TB_END,
+      .access = PL0_RW, .type = ARM_CP_FPU,
       .readfn = aa64_fpcr_read, .writefn = aa64_fpcr_write },
     { .name = "FPSR", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 1, .crn = 4, .crm = 4,
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
         DP_TBFLAG_A64(flags, TCMA, aa64_va_parameter_tcma(tcr, mmu_idx));
     }
 
+    if (env->vfp.fpcr & FPCR_AH) {
+        DP_TBFLAG_A64(flags, AH, 1);
+    }
+
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->nv2 = EX_TBFLAG_A64(tb_flags, NV2);
     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
+    dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
     dc->vec_len = 0;
     dc->vec_stride = 0;
     dc->cp_regs = arm_cpu->cp_regs;
-- 
2.34.1

When FPCR.AH is 1, the behaviour of some instructions changes:
 * AdvSIMD BFCVT, BFCVTN, BFCVTN2, BFMLALB, BFMLALT
 * SVE BFCVT, BFCVTNT, BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
 * SME BFCVT, BFCVTN, BFMLAL, BFMLSL (these are all in SME2 which
   QEMU does not yet implement)
 * FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS

The behaviour change is:
 * the instructions do not update the FPSR cumulative exception flags
 * trapped floating point exceptions are disabled (a no-op for QEMU,
   which doesn't implement FPCR.{IDE,IXE,UFE,OFE,DZE,IOE})
 * rounding is always round-to-nearest-even regardless of FPCR.RMode
 * denormalized inputs and outputs are always flushed to zero, as if
   FPCR.{FZ,FIZ} is {1,1}
 * FPCR.FZ16 is still honoured for half-precision inputs

(See the Arm ARM DDI0487L.a section A1.5.9.)

We can provide all these behaviours with another pair of float_status fields
which we use only for these insns, when FPCR.AH is 1. These float_status
fields will always have:
 * flush_to_zero and flush_inputs_to_zero set for the non-F16 field
 * rounding mode set to round-to-nearest-even
and so the only FPCR fields they need to honour are DN and FZ16.
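
A minimal sketch of that rule, using a hypothetical stand-in struct
rather than QEMU's float_status, and assumed FPCR bit positions.  It
also assumes FZ16 controls both input and output flushing for the
half-precision field:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the few float_status settings that matter here. */
typedef struct {
    bool flush_to_zero;
    bool flush_inputs_to_zero;
    bool default_nan_mode;
    int rounding_mode;              /* 0 = round-to-nearest-even here */
} SkStatus;

/* Assumed FPCR bit positions. */
#define SK_FPCR_FZ16       (UINT32_C(1) << 19)
#define SK_FPCR_RMODE_MASK (UINT32_C(3) << 22)
#define SK_FPCR_FZ         (UINT32_C(1) << 24)
#define SK_FPCR_DN         (UINT32_C(1) << 25)

/* Derive both AH status fields from an FPCR value. */
static void sk_update_ah_statuses(uint32_t fpcr, SkStatus *ah,
                                  SkStatus *ah_f16)
{
    bool dn = fpcr & SK_FPCR_DN;
    bool fz16 = fpcr & SK_FPCR_FZ16;

    /* Non-F16 field: always flush, whatever FPCR.FZ and FIZ say. */
    ah->flush_to_zero = true;
    ah->flush_inputs_to_zero = true;
    ah->rounding_mode = 0;          /* FPCR.RMode is ignored */
    ah->default_nan_mode = dn;

    /* F16 field: flushing still follows FPCR.FZ16. */
    ah_f16->flush_to_zero = fz16;
    ah_f16->flush_inputs_to_zero = fz16;
    ah_f16->rounding_mode = 0;
    ah_f16->default_nan_mode = dn;
}
```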

In this commit we only define the new fp_status fields and give them
the required behaviour when FPCR is updated.  In subsequent commits
we will arrange to use these new fp_status fields for the instructions
that should be affected by FPCR.AH in this way.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h           | 15 +++++++++++++++
 target/arm/internals.h     |  2 ++
 target/arm/tcg/translate.h | 14 ++++++++++++++
 target/arm/cpu.c           |  4 ++++
 target/arm/vfp_helper.c    | 13 ++++++++++++-
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          *  standard_fp_status : the ARM "Standard FPSCR Value"
          *  standard_fp_status_fp16 : used for half-precision
          *       calculations with the ARM "Standard FPSCR Value"
+         *  ah_fp_status: used for the A64 insns which change behaviour
+         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+         *       and the reciprocal and square root estimate/step insns)
+         *  ah_fp_status_f16: used for the A64 insns which change behaviour
+         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+         *       and the reciprocal and square root estimate/step insns);
+         *       for half-precision
          *
          * Half-precision operations are governed by a separate
          * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
          * using a fixed value for it.
          *
+         * The ah_fp_status is needed because some insns have different
+         * behaviour when FPCR.AH == 1: they don't update cumulative
+         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
+         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
+         * which means we need an ah_fp_status_f16 as well.
+         *
          * To avoid having to transfer exception bits around, we simply
          * say that the FPSCR cumulative exception flags are the logical
          * OR of the flags in the four fp statuses. This relies on the
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         float_status fp_status_f16_a64;
         float_status standard_fp_status;
         float_status standard_fp_status_f16;
+        float_status ah_fp_status;
+        float_status ah_fp_status_f16;
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ int alle1_tlbmask(CPUARMState *env);
 
 /* Set the float_status behaviour to match the Arm defaults */
 void arm_set_default_fp_behaviours(float_status *s);
+/* Set the float_status behaviour to match Arm FPCR.AH=1 behaviour */
+void arm_set_ah_fp_behaviours(float_status *s);
 
 #endif
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
     FPST_A64,
     FPST_A32_F16,
     FPST_A64_F16,
+    FPST_AH,
+    FPST_AH_F16,
     FPST_STD,
     FPST_STD_F16,
 } ARMFPStatusFlavour;
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
  *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
  * FPST_A64_F16
  *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
+ * FPST_AH:
+ *   for AArch64 operations which change behaviour when AH=1 (specifically,
+ *   bfloat16 conversions and multiplies, and the reciprocal and square root
+ *   estimate/step insns)
+ * FPST_AH_F16:
+ *   ditto, but for half-precision operations
  * FPST_STD
  *   for A32/T32 Neon operations using the "standard FPSCR value"
  * FPST_STD_F16
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
     case FPST_A64_F16:
         offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
         break;
+    case FPST_AH:
+        offset = offsetof(CPUARMState, vfp.ah_fp_status);
+        break;
+    case FPST_AH_F16:
+        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
+        break;
     case FPST_STD:
         offset = offsetof(CPUARMState, vfp.standard_fp_status);
         break;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
+    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
+    set_flush_to_zero(1, &env->vfp.ah_fp_status);
+    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
+    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
 
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void arm_set_default_fp_behaviours(float_status *s)
  *    set Invalid for a QNaN
  *  * default NaN has sign bit set, msb frac bit set
  */
-static void arm_set_ah_fp_behaviours(float_status *s)
+void arm_set_ah_fp_behaviours(float_status *s)
 {
     set_float_detect_tininess(float_tininess_after_rounding, s);
     set_float_ftz_detection(float_ftz_after_rounding, s);
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
+    /*
+     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
+     * they are used for insns that must not set the cumulative exception bits.
+     */
+
     /*
      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.ah_fp_status);
+    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 }
 
 static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
+        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
+        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_AH) {
         bool ah_enabled = val & FPCR_AH;
-- 
2.34.1

For the instructions FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, use
FPST_AH or FPST_AH_F16 when FPCR.AH is 1, so that they get
the required behaviour changes.
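
As a rough illustration of the selection rule described above, here is a
standalone sketch. The enum and function names are hypothetical stand-ins
chosen for this example (they are not the QEMU identifiers); only the
decision logic is meant to mirror select_ah_fpst() in the diff below.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the status flavours and element sizes. */
enum fp_flavour { FLAVOUR_A64, FLAVOUR_A64_F16, FLAVOUR_AH, FLAVOUR_AH_F16 };
enum elem_size { ESZ_16, ESZ_32, ESZ_64 };

/*
 * Pick a float_status flavour from FPCR.AH and the element size:
 * half-precision gets the _F16 variant, and AH=1 switches both
 * variants over to the "alternate handling" statuses.
 */
static enum fp_flavour pick_flavour(bool fpcr_ah, enum elem_size esz)
{
    if (fpcr_ah) {
        return esz == ESZ_16 ? FLAVOUR_AH_F16 : FLAVOUR_AH;
    }
    return esz == ESZ_16 ? FLAVOUR_A64_F16 : FLAVOUR_A64;
}
```

The point of routing this through one helper is that every affected
translate function makes the same two-way choice, so the callers only
need to say "this insn is AH-sensitive".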

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.h |  13 ++++
 target/arm/tcg/translate-a64.c | 119 +++++++++++++++++++++++++--------
 target/arm/tcg/translate-sve.c |  30 ++++++---
 3 files changed, 127 insertions(+), 35 deletions(-)

diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr pred_full_reg_ptr(DisasContext *s, int regno)
     return ret;
 }
 
+/*
+ * Return the ARMFPStatusFlavour to use based on element size and
+ * whether FPCR.AH is set.
+ */
+static inline ARMFPStatusFlavour select_ah_fpst(DisasContext *s, MemOp esz)
+{
+    if (s->fpcr_ah) {
+        return esz == MO_16 ? FPST_AH_F16 : FPST_AH;
+    } else {
+        return esz == MO_16 ? FPST_A64_F16 : FPST_A64;
+    }
+}
+
 bool disas_sve(DisasContext *, uint32_t);
 bool disas_sme(DisasContext *, uint32_t);
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op3_ool(DisasContext *s, bool is_q, int rd,
  * an out-of-line helper.
  */
 static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
-                              int rm, bool is_fp16, int data,
+                              int rm, ARMFPStatusFlavour fpsttype, int data,
                               gen_helper_gvec_3_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
+    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm), fpst,
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 } FPScalar;
 
-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
+                                        const FPScalar *f,
+                                        ARMFPStatusFlavour fpsttype)
 {
     switch (a->esz) {
     case MO_64:
         if (fp_access_check(s)) {
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
-            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
+            f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_dreg(s, a->rd, t0);
         }
         break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
+            f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
-            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
+            f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
     return true;
 }
 
+static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+{
+    return do_fp3_scalar_with_fpsttype(s, a, f,
+                                       a->esz == MO_16 ?
+                                       FPST_A64_F16 : FPST_A64);
+}
+
+static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+{
+    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
+}
+
 static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_addh,
     gen_helper_vfp_adds,
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar, a, &f_scalar_frecps)
+TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
+TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
@@ -XXX,XX +XXX,XX @@ TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
 TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
 TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
 
-static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
-                          gen_helper_gvec_3_ptr * const fns[3])
+static bool do_fp3_vector_with_fpsttype(DisasContext *s, arg_qrrr_e *a,
+                                        int data,
+                                        gen_helper_gvec_3_ptr * const fns[3],
+                                        ARMFPStatusFlavour fpsttype)
 {
     MemOp esz = a->esz;
     int check = fp_access_check_vector_hsd(s, a->q, esz);
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
         return check == 0;
     }
 
-    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
-                      esz == MO_16, data, fns[esz - 1]);
+    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, fpsttype,
+                      data, fns[esz - 1]);
     return true;
 }
 
+static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+                          gen_helper_gvec_3_ptr * const fns[3])
+{
+    return do_fp3_vector_with_fpsttype(s, a, data, fns,
+                                       a->esz == MO_16 ?
+                                       FPST_A64_F16 : FPST_A64);
+}
+
+static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
+                             gen_helper_gvec_3_ptr * const f[3])
+{
+    return do_fp3_vector_with_fpsttype(s, a, data, f,
+                                       select_ah_fpst(s, a->esz));
+}
+
 static gen_helper_gvec_3_ptr * const f_vector_fadd[3] = {
     gen_helper_gvec_fadd_h,
     gen_helper_gvec_fadd_s,
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
     gen_helper_gvec_recps_s,
     gen_helper_gvec_recps_d,
 };
-TRANS(FRECPS_v, do_fp3_vector, a, 0, f_vector_frecps)
+TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
 
 static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
     gen_helper_gvec_rsqrts_h,
     gen_helper_gvec_rsqrts_s,
     gen_helper_gvec_rsqrts_d,
 };
-TRANS(FRSQRTS_v, do_fp3_vector, a, 0, f_vector_frsqrts)
+TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
 
 static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
     gen_helper_gvec_faddp_h,
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
     }
 
     gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
-                      esz == MO_16, a->idx, fns[esz - 1]);
+                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                      a->idx, fns[esz - 1]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1 {
     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_ptr);
 } FPScalar1;
 
-static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
-                          const FPScalar1 *f, int rmode)
+static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
+                                        const FPScalar1 *f, int rmode,
+                                        ARMFPStatusFlavour fpsttype)
 {
     TCGv_i32 tcg_rmode = NULL;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+    fpst = fpstatus_ptr(fpsttype);
     if (rmode >= 0) {
         tcg_rmode = gen_set_rmode(rmode, fpst);
     }
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
     return true;
 }
 
+static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+                          const FPScalar1 *f, int rmode)
+{
+    return do_fp1_scalar_with_fpsttype(s, a, f, rmode,
+                                       a->esz == MO_16 ?
+                                       FPST_A64_F16 : FPST_A64);
+}
+
+static bool do_fp1_scalar_ah(DisasContext *s, arg_rr_e *a,
+                             const FPScalar1 *f, int rmode)
+{
+    return do_fp1_scalar_with_fpsttype(s, a, f, rmode, select_ah_fpst(s, a->esz));
+}
+
 static const FPScalar1 f_scalar_fsqrt = {
     gen_helper_vfp_sqrth,
     gen_helper_vfp_sqrts,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
     gen_helper_recpe_f32,
     gen_helper_recpe_f64,
 };
-TRANS(FRECPE_s, do_fp1_scalar, a, &f_scalar_frecpe, -1)
+TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
 
 static const FPScalar1 f_scalar_frecpx = {
     gen_helper_frecpx_f16,
     gen_helper_frecpx_f32,
     gen_helper_frecpx_f64,
 };
-TRANS(FRECPX_s, do_fp1_scalar, a, &f_scalar_frecpx, -1)
+TRANS(FRECPX_s, do_fp1_scalar_ah, a, &f_scalar_frecpx, -1)
 
 static const FPScalar1 f_scalar_frsqrte = {
     gen_helper_rsqrte_f16,
     gen_helper_rsqrte_f32,
     gen_helper_rsqrte_f64,
 };
-TRANS(FRSQRTE_s, do_fp1_scalar, a, &f_scalar_frsqrte, -1)
+TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
 
 static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
 {
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINT64Z_v, aa64_frint, do_fp1_vector, a,
            &f_scalar_frint64, FPROUNDING_ZERO)
 TRANS_FEAT(FRINT64X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, -1)
 
-static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
-                             int rd, int rn, int data,
-                             gen_helper_gvec_2_ptr * const fns[3])
+static bool do_gvec_op2_fpst_with_fpsttype(DisasContext *s, MemOp esz,
+                                           bool is_q, int rd, int rn, int data,
+                                           gen_helper_gvec_2_ptr * const fns[3],
+                                           ARMFPStatusFlavour fpsttype)
 {
     int check = fp_access_check_vector_hsd(s, is_q, esz);
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+    fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn), fpst,
                        is_q ? 16 : 8, vec_full_reg_size(s),
@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
     return true;
 }
 
+static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+                             int rd, int rn, int data,
+                             gen_helper_gvec_2_ptr * const fns[3])
+{
+    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data, fns,
+                                          esz == MO_16 ? FPST_A64_F16 :
+                                          FPST_A64);
+}
+
+static bool do_gvec_op2_ah_fpst(DisasContext *s, MemOp esz, bool is_q,
+                                int rd, int rn, int data,
+                                gen_helper_gvec_2_ptr * const fns[3])
+{
+    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data,
+                                          fns, select_ah_fpst(s, esz));
+}
+
 static gen_helper_gvec_2_ptr * const f_scvtf_v[] = {
     gen_helper_gvec_vcvt_sh,
     gen_helper_gvec_vcvt_sf,
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
     gen_helper_gvec_frecpe_s,
     gen_helper_gvec_frecpe_d,
 };
-TRANS(FRECPE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
 
 static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
     gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s,
     gen_helper_gvec_frsqrte_d,
 };
-TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
 
 static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
     return true;
 }
 
-static bool gen_gvec_fpst_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
-                                 arg_rr_esz *a, int data)
+static bool gen_gvec_fpst_ah_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
+                                    arg_rr_esz *a, int data)
 {
     return gen_gvec_fpst_zz(s, fn, a->rd, a->rn, data,
-                            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+                            select_ah_fpst(s, a->esz));
 }
 
 /* Invoke an out-of-line helper on 3 Zregs. */
@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
 }
 
+static bool gen_gvec_fpst_ah_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
+                                     arg_rrr_esz *a, int data)
+{
+    return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
+                             select_ah_fpst(s, a->esz));
+}
+
 /* Invoke an out-of-line helper on 4 Zregs. */
 static bool gen_gvec_ool_zzzz(DisasContext *s, gen_helper_gvec_4 *fn,
                               int rd, int rn, int rm, int ra, int data)
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
     NULL,                     gen_helper_gvec_frecpe_h,
     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
 };
-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_arg_zz, frecpe_fns[a->esz], a, 0)
+TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
 
 static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
     NULL,                      gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
 };
-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_arg_zz, frsqrte_fns[a->esz], a, 0)
+TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
 
 /*
  *** SVE Floating Point Compare with Zero Group
@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
     };                                                              \
     TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_arg_zzz, name##_fns[a->esz], a, 0)
 
+#define DO_FP3_AH(NAME, name) \
+    static gen_helper_gvec_3_ptr * const name##_fns[4] = {          \
+        NULL, gen_helper_gvec_##name##_h,                           \
+        gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
+    };                                                              \
+    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
+
 DO_FP3(FADD_zzz, fadd)
 DO_FP3(FSUB_zzz, fsub)
 DO_FP3(FMUL_zzz, fmul)
-DO_FP3(FRECPS, recps)
-DO_FP3(FRSQRTS, rsqrts)
+DO_FP3_AH(FRECPS, recps)
+DO_FP3_AH(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const frecpx_fns[] = {
     gen_helper_sve_frecpx_s, gen_helper_sve_frecpx_d,
 };
 TRANS_FEAT(FRECPX, aa64_sve, gen_gvec_fpst_arg_zpz, frecpx_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+           a, 0, select_ah_fpst(s, a->esz))
 
 static gen_helper_gvec_3_ptr * const fsqrt_fns[] = {
     NULL,                   gen_helper_sve_fsqrt_h,
-- 
2.34.1

When FPCR.AH is 1, use FPST_AH for:
 * AdvSIMD BFCVT, BFCVTN, BFCVTN2
 * SVE BFCVT, BFCVTNT

so that they get the required behaviour changes.
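
For the AdvSIMD BFCVTN case the diff below picks the generator function
from a two-row table indexed by the FPCR.AH flag. A minimal standalone
sketch of that pattern follows; the function names and the integer
payloads are invented for illustration and do not correspond to any
QEMU helper.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*narrow_fn)(int);

/* Hypothetical generators; they differ only so the choice is observable. */
static int narrow_default(int x) { return x; }
static int narrow_ah(int x) { return -x; }

/*
 * Row 0 is used when FPCR.AH == 0, row 1 when FPCR.AH == 1.
 * Only the middle element size has an implementation; the NULL
 * entries mark element sizes that do not decode.
 */
static narrow_fn const narrow_table[2][3] = {
    { NULL, narrow_default, NULL },
    { NULL, narrow_ah, NULL },
};

/* A bool converts to 0 or 1, so it can index the outer dimension directly. */
static narrow_fn const *select_row(bool fpcr_ah)
{
    return narrow_table[fpcr_ah];
}
```

Keeping both rows the same shape means the existing per-element-size
lookup in the caller does not need to change.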

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 27 +++++++++++++++++++++------
 target/arm/tcg/translate-sve.c |  6 ++++--
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
 static const FPScalar1 f_scalar_bfcvt = {
     .gen_s = gen_helper_bfcvt,
 };
-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar, a, &f_scalar_bfcvt, -1)
+TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
 
 static const FPScalar1 f_scalar_frint32 = {
     NULL,
@@ -XXX,XX +XXX,XX @@ static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n)
     tcg_gen_extu_i32_i64(d, tmp);
 }
 
-static ArithOneOp * const f_vector_bfcvtn[] = {
-    NULL,
-    gen_bfcvtn_hs,
-    NULL,
+static void gen_bfcvtn_ah_hs(TCGv_i64 d, TCGv_i64 n)
+{
+    TCGv_ptr fpst = fpstatus_ptr(FPST_AH);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    gen_helper_bfcvt_pair(tmp, n, fpst);
+    tcg_gen_extu_i32_i64(d, tmp);
+}
+
+static ArithOneOp * const f_vector_bfcvtn[2][3] = {
+    {
+        NULL,
+        gen_bfcvtn_hs,
+        NULL,
+    }, {
+        NULL,
+        gen_bfcvtn_ah_hs,
+        NULL,
+    }
 };
-TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a, f_vector_bfcvtn)
+TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a,
+           f_vector_bfcvtn[s->fpcr_ah])
 
 static bool trans_SHLL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_hs, a, 0, FPST_A64_F16)
 
 TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvt, a, 0, FPST_A64)
+           gen_helper_sve_bfcvt, a, 0,
+           s->fpcr_ah ? FPST_AH : FPST_A64)
 
 TRANS_FEAT(FCVT_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_dh, a, 0, FPST_A64)
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVTNT_ds, aa64_sve2, gen_gvec_fpst_arg_zpz,
            gen_helper_sve2_fcvtnt_ds, a, 0, FPST_A64)
 
 TRANS_FEAT(BFCVTNT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvtnt, a, 0, FPST_A64)
+           gen_helper_sve_bfcvtnt, a, 0,
+           s->fpcr_ah ? FPST_AH : FPST_A64)
 
 TRANS_FEAT(FCVTLT_hs, aa64_sve2, gen_gvec_fpst_arg_zpz,
            gen_helper_sve2_fcvtlt_hs, a, 0, FPST_A64)
-- 
2.34.1

When FPCR.AH is 1, use FPST_AH for:
 * AdvSIMD BFMLALB, BFMLALT
 * SVE BFMLALB, BFMLALT, BFMLSLB, BFMLSLT

so that they get the required behaviour changes.

We do this by making gen_gvec_op4_fpst() take an ARMFPStatusFlavour
rather than a bool is_fp16; existing callsites now select
FPST_A64_F16 vs FPST_A64 themselves rather than passing in
the boolean.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 20 +++++++++++++-------
 target/arm/tcg/translate-sve.c |  6 ++++--
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_env(DisasContext *s, bool is_q, int rd, int rn,
  * an out-of-line helper.
  */
 static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
-                              int rm, int ra, bool is_fp16, int data,
+                              int rm, int ra, ARMFPStatusFlavour fpsttype,
+                              int data,
                               gen_helper_gvec_4_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
+    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm),
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
     }
     if (fp_access_check(s)) {
         /* Q bit selects BFMLALB vs BFMLALT. */
-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, false, a->q,
+        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
+                          s->fpcr_ah ? FPST_AH : FPST_A64, a->q,
                           gen_helper_gvec_bfmlal);
     }
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
     }
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                      a->esz == MO_16, a->rot, fn[a->esz]);
+                      a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                      a->rot, fn[a->esz]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
     }
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                      esz == MO_16, (a->idx << 1) | neg,
+                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                      (a->idx << 1) | neg,
                       fns[esz - 1]);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
     }
     if (fp_access_check(s)) {
         /* Q bit selects BFMLALB vs BFMLALT. */
-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, 0,
+        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
+                          s->fpcr_ah ? FPST_AH : FPST_A64,
                           (a->idx << 1) | a->q,
                           gen_helper_gvec_bfmlal_idx);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
     }
     if (fp_access_check(s)) {
         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                          a->esz == MO_16, (a->idx << 2) | a->rot, fn);
+                          a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                          (a->idx << 2) | a->rot, fn);
     }
     return true;
 }
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_env_arg_zzzz,
 static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal,
-                              a->rd, a->rn, a->rm, a->ra, sel, FPST_A64);
+                              a->rd, a->rn, a->rm, a->ra, sel,
+                              s->fpcr_ah ? FPST_AH : FPST_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzzw, aa64_sve_bf16, do_BFMLAL_zzzw, a, false)
@@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal_idx,
                               a->rd, a->rn, a->rm, a->ra,
-                              (a->index << 1) | sel, FPST_A64);
+                              (a->index << 1) | sel,
+                              s->fpcr_ah ? FPST_AH : FPST_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
-- 
2.34.1

For FEAT_AFP, we want to emit different code when FPCR.NEP is set, so
that instead of zeroing the high elements of a vector register when
we write the output of a scalar operation to it, we instead merge in
those elements from one of the source registers.  Since this affects
the generated code, we need to put FPCR.NEP into the TBFLAGS.

FPCR.NEP is treated as 0 when in streaming SVE mode and FEAT_SME_FA64
is not implemented or not enabled; we can implement this logic in
rebuild_hflags_a64().
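
The condition for setting the new TB flag can be sketched as a small
pure function. This is an illustrative model only: the function and
macro names are hypothetical, and the bit position used for FPCR.NEP is
an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of FPCR.NEP, for illustration only. */
#define MODEL_FPCR_NEP (UINT32_C(1) << 2)

/*
 * Return true if the translator should generate "merging" code.
 * NEP takes effect only if the FPCR bit is set and we are not in
 * streaming SVE mode with FA64 unavailable.
 */
static bool effective_nep(uint32_t fpcr, bool streaming_mode,
                          bool fa64_enabled)
{
    if (!(fpcr & MODEL_FPCR_NEP)) {
        return false;
    }
    return !(streaming_mode && !fa64_enabled);
}
```

Folding the streaming-mode check into the flag computation means the
translator itself only ever has to test a single bool.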

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               | 1 +
 target/arm/tcg/translate.h     | 2 ++
 target/arm/tcg/hflags.c        | 9 +++++++++
 target/arm/tcg/translate-a64.c | 1 +
 4 files changed, 13 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
 /* Set if FEAT_NV2 RAM accesses are big-endian */
 FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
 FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
+FIELD(TBFLAG_A64, NEP, 38, 1)   /* FPCR.NEP */
 
 /*
  * Helpers for using the above. Note that only the A64 accessors use
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool nv2_mem_be;
     /* True if FPCR.AH is 1 (alternate floating point handling) */
     bool fpcr_ah;
+    /* True if FPCR.NEP is 1 (FEAT_AFP scalar upper-element result handling) */
+    bool fpcr_nep;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
     if (env->vfp.fpcr & FPCR_AH) {
         DP_TBFLAG_A64(flags, AH, 1);
     }
+    if (env->vfp.fpcr & FPCR_NEP) {
+        /*
+         * In streaming-SVE without FA64, NEP behaves as if zero;
+         * compare pseudocode IsMerging()
+         */
+        if (!(EX_TBFLAG_A64(flags, PSTATE_SM) && !sme_fa64(env, el))) {
+            DP_TBFLAG_A64(flags, NEP, 1);
+        }
+    }
 
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
     dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
+    dc->fpcr_nep = EX_TBFLAG_A64(tb_flags, NEP);
     dc->vec_len = 0;
     dc->vec_stride = 0;
     dc->cp_regs = arm_cpu->cp_regs;
-- 
2.34.1

For FEAT_AFP's FPCR.NEP bit, we need to programmatically change the
behaviour of the writeback of the result for most SIMD scalar
operations, so that instead of zeroing the upper part of the result
register it merges the upper elements from one of the input
registers.

Provide new functions write_fp_*reg_merging() which can be used
instead of the existing write_fp_*reg() functions when we want this
"merge the result with one of the input registers if FPCR.NEP is
enabled" handling, and use them in do_fp3_scalar_with_fpsttype().

Note that (as documented in the description of the FPCR.NEP bit)
which input register to use as the merge source varies by
instruction: for these 2-input scalar operations, the comparison
instructions take from Rm, not Rn.
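
As an illustration only (not part of the patch), the writeback rule
described above can be modelled in plain C. The VReg type and the
write_d_merging() name are hypothetical stand-ins for the TCG-level
code; the model ignores SVE bits above 128.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register as two 64-bit lanes. */
typedef struct {
    uint64_t d[2];
} VReg;

/*
 * Model of the scalar double writeback: lane 0 receives the result,
 * lane 1 is zeroed when NEP == 0 or copied from the merge source
 * register when NEP == 1.
 */
static void write_d_merging(VReg *regs, int rd, int mergereg,
                            uint64_t result, bool nep)
{
    /* Read the merge source first so that rd == mergereg is safe. */
    uint64_t high = nep ? regs[mergereg].d[1] : 0;

    regs[rd].d[0] = result;
    regs[rd].d[1] = high;
}
```

In this model an arithmetic insn such as FADD would pass Rn as
mergereg, while a comparison such as FCMEQ would pass Rm.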

We'll extend this to also provide the merging behaviour for
the remaining scalar insns in subsequent commits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 117 +++++++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 26 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
     write_fp_dreg(s, reg, tmp);
 }
 
+/*
+ * Write a double result to 128 bit vector register reg, honouring FPCR.NEP:
+ * - if FPCR.NEP == 0, clear the high elements of reg
+ * - if FPCR.NEP == 1, set the high elements of reg from mergereg
+ *   (i.e. merge the result with those high elements)
+ * In either case, SVE register bits above 128 are zeroed (per R_WKYLB).
+ */
+static void write_fp_dreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i64 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_dreg(s, reg, v);
+        return;
+    }
+
+    /*
+     * Move from mergereg to reg; this sets the high elements and
+     * clears the bits above 128 as a side effect.
+     */
+    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
+                     vec_full_reg_offset(s, mergereg),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st_i64(v, tcg_env, vec_full_reg_offset(s, reg));
+}
+
+/*
+ * Write a single-prec result, but only clear the higher elements
+ * of the destination register if FPCR.NEP is 0; otherwise preserve them.
+ */
+static void write_fp_sreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i32 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_sreg(s, reg, v);
+        return;
+    }
+
+    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
+                     vec_full_reg_offset(s, mergereg),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st_i32(v, tcg_env, fp_reg_offset(s, reg, MO_32));
+}
+
+/*
+ * Write a half-prec result, but only clear the higher elements
+ * of the destination register if FPCR.NEP is 0; otherwise preserve them.
+ * The caller must ensure that the top 16 bits of v are zero.
+ */
+static void write_fp_hreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i32 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_sreg(s, reg, v);
+        return;
+    }
+
+    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
+                     vec_full_reg_offset(s, mergereg),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st16_i32(v, tcg_env, fp_reg_offset(s, reg, MO_16));
+}
+
 /* Expand a 2-operand AdvSIMD vector operation using an expander function.  */
 static void gen_gvec_fn2(DisasContext *s, bool is_q, int rd, int rn,
                          GVecGen2Fn *gvec_fn, int vece)
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
 } FPScalar;
 
 static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
-                                        const FPScalar *f,
+                                        const FPScalar *f, int mergereg,
                                         ARMFPStatusFlavour fpsttype)
 {
     switch (a->esz) {
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
             f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
             f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
             f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
     return true;
 }
 
-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                          int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f,
+    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
                                        a->esz == MO_16 ?
                                        FPST_A64_F16 : FPST_A64);
 }
 
-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                             int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
+    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
+                                       select_ah_fpst(s, a->esz));
 }
 
 static const FPScalar f_scalar_fadd = {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_adds,
     gen_helper_vfp_addd,
 };
-TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd)
+TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd, a->rn)
 
 static const FPScalar f_scalar_fsub = {
     gen_helper_vfp_subh,
     gen_helper_vfp_subs,
     gen_helper_vfp_subd,
 };
-TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub)
+TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub, a->rn)
 
 static const FPScalar f_scalar_fdiv = {
     gen_helper_vfp_divh,
     gen_helper_vfp_divs,
     gen_helper_vfp_divd,
 };
-TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv)
+TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv, a->rn)
 
 static const FPScalar f_scalar_fmul = {
     gen_helper_vfp_mulh,
     gen_helper_vfp_muls,
     gen_helper_vfp_muld,
 };
-TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul)
+TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul, a->rn)
 
 static const FPScalar f_scalar_fmax = {
     gen_helper_vfp_maxh,
     gen_helper_vfp_maxs,
     gen_helper_vfp_maxd,
 };
-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax)
+TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
 
 static const FPScalar f_scalar_fmin = {
     gen_helper_vfp_minh,
     gen_helper_vfp_mins,
     gen_helper_vfp_mind,
 };
-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin)
+TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
 
 static const FPScalar f_scalar_fmaxnm = {
     gen_helper_vfp_maxnumh,
     gen_helper_vfp_maxnums,
     gen_helper_vfp_maxnumd,
 };
-TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm)
+TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm, a->rn)
 
 static const FPScalar f_scalar_fminnm = {
     gen_helper_vfp_minnumh,
     gen_helper_vfp_minnums,
     gen_helper_vfp_minnumd,
 };
-TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm)
+TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm, a->rn)
 
 static const FPScalar f_scalar_fmulx = {
     gen_helper_advsimd_mulxh,
     gen_helper_vfp_mulxs,
     gen_helper_vfp_mulxd,
 };
-TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx)
+TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx, a->rn)
 
 static void gen_fnmul_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fnmul = {
     gen_fnmul_s,
     gen_fnmul_d,
 };
-TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul)
+TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
 
 static const FPScalar f_scalar_fcmeq = {
     gen_helper_advsimd_ceq_f16,
     gen_helper_neon_ceq_f32,
     gen_helper_neon_ceq_f64,
 };
-TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq)
+TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq, a->rm)
 
 static const FPScalar f_scalar_fcmge = {
     gen_helper_advsimd_cge_f16,
     gen_helper_neon_cge_f32,
     gen_helper_neon_cge_f64,
 };
-TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge)
+TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge, a->rm)
 
 static const FPScalar f_scalar_fcmgt = {
     gen_helper_advsimd_cgt_f16,
     gen_helper_neon_cgt_f32,
     gen_helper_neon_cgt_f64,
 };
-TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt)
+TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt, a->rm)
 
 static const FPScalar f_scalar_facge = {
     gen_helper_advsimd_acge_f16,
     gen_helper_neon_acge_f32,
     gen_helper_neon_acge_f64,
 };
-TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge)
+TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge, a->rm)
 
 static const FPScalar f_scalar_facgt = {
     gen_helper_advsimd_acgt_f16,
     gen_helper_neon_acgt_f32,
     gen_helper_neon_acgt_f64,
 };
-TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt)
+TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt, a->rm)
 
 static void gen_fabd_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fabd = {
     gen_fabd_s,
     gen_fabd_d,
 };
-TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd)
+TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
 
 static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f16,
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
+TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
+TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
-- 
2.34.1

Handle FPCR.NEP for the 3-input scalar operations which use
do_fmla_scalar_idx() and do_fmadd(), by making them call the
appropriate write_fp_*reg_merging() functions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
                 gen_vfp_negd(t1, t1);
             }
             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
                 gen_vfp_negs(t1, t1);
             }
             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
             }
             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                        fpstatus_ptr(FPST_A64_F16));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
-            write_fp_dreg(s, a->rd, ta);
+            write_fp_dreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
-            write_fp_sreg(s, a->rd, ta);
+            write_fp_sreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_A64_F16);
             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
-            write_fp_sreg(s, a->rd, ta);
+            write_fp_hreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
-- 
2.34.1

Currently we implement BFCVT scalar via do_fp1_scalar().  This works
even though BFCVT is a narrowing operation from 32 to 16 bits,
because we can use write_fp_sreg() for float16. However, FPCR.NEP
support requires that we use write_fp_hreg_merging() for float16
outputs, so we can't continue to borrow the non-narrowing
do_fp1_scalar() function for this. Split out trans_BFCVT_s()
into its own implementation that honours FPCR.NEP.
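
A rough sketch of why the 32-bit write stops being adequate once
merging is involved, using a hypothetical lane model (VReg16 and both
function names are invented for illustration and do not exist in QEMU):

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit register as eight 16-bit lanes. */
typedef struct {
    uint16_t h[8];
} VReg16;

/* NEP == 1 done with a 16-bit store: only lane 0 is replaced. */
static void write_h_merging(VReg16 *rd, const VReg16 *merge, uint16_t result)
{
    VReg16 tmp = *merge;

    tmp.h[0] = result;
    *rd = tmp;
}

/*
 * The same writeback done with a 32-bit store of a zero-extended
 * half-precision value: lane 1 of the merged data is lost.
 */
static void write_h_as_word(VReg16 *rd, const VReg16 *merge, uint16_t result)
{
    VReg16 tmp = *merge;

    tmp.h[0] = result;
    tmp.h[1] = 0;   /* the zero top half of the 32-bit value lands here */
    *rd = tmp;
}
```

With NEP == 0 the two are indistinguishable because every lane above
the result is zero anyway; with NEP == 1 only the 16-bit store keeps
lane 1 of the merge source intact.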

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frintx = {
 };
 TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
 
-static const FPScalar1 f_scalar_bfcvt = {
-    .gen_s = gen_helper_bfcvt,
-};
-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
+static bool trans_BFCVT_s(DisasContext *s, arg_rr_e *a)
+{
+    ARMFPStatusFlavour fpsttype = s->fpcr_ah ? FPST_AH : FPST_A64;
+    TCGv_i32 t32;
+    int check;
+
+    if (!dc_isar_feature(aa64_bf16, s)) {
+        return false;
+    }
+
+    check = fp_access_check_scalar_hsd(s, a->esz);
+
+    if (check <= 0) {
+        return check == 0;
+    }
+
+    t32 = read_fp_sreg(s, a->rn);
+    gen_helper_bfcvt(t32, t32, fpstatus_ptr(fpsttype));
+    write_fp_hreg_merging(s, a->rd, a->rd, t32);
+    return true;
+}
 
 static const FPScalar1 f_scalar_frint32 = {
     NULL,
-- 
2.34.1

Handle FPCR.NEP for the 1-input scalar operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
     case MO_64:
         t64 = read_fp_dreg(s, a->rn);
         f->gen_d(t64, t64, fpst);
-        write_fp_dreg(s, a->rd, t64);
+        write_fp_dreg_merging(s, a->rd, a->rd, t64);
         break;
     case MO_32:
         t32 = read_fp_sreg(s, a->rn);
         f->gen_s(t32, t32, fpst);
-        write_fp_sreg(s, a->rd, t32);
+        write_fp_sreg_merging(s, a->rd, a->rd, t32);
         break;
     case MO_16:
         t32 = read_fp_hreg(s, a->rn);
         f->gen_h(t32, t32, fpst);
-        write_fp_sreg(s, a->rd, t32);
+        write_fp_hreg_merging(s, a->rd, a->rd, t32);
         break;
     default:
         g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, fpst);
-        write_fp_dreg(s, a->rd, tcg_rd);
+        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hs(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-        /* write_fp_sreg is OK here because top half of result is zero */
-        write_fp_sreg(s, a->rd, tmp);
+        /* write_fp_hreg_merging is OK here because top half of result is zero */
+        write_fp_hreg_merging(s, a->rd, a->rd, tmp);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sd(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, fpst);
-        write_fp_sreg(s, a->rd, tcg_rd);
+        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hd(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp);
-        /* write_fp_sreg is OK here because top half of tcg_rd is zero */
-        write_fp_sreg(s, a->rd, tcg_rd);
+        /* write_fp_hreg_merging is OK here because top half of tcg_rd is zero */
+        write_fp_hreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a)
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
-        write_fp_sreg(s, a->rd, tcg_rd);
+        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a)
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
-        write_fp_dreg(s, a->rd, tcg_rd);
+        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_fcvt_f(DisasContext *s, arg_fcvt *a,
     do_fcvt_scalar(s, a->esz | (is_signed ? MO_SIGN : 0),
                    a->esz, tcg_int, a->shift, a->rn, rmode);
 
-    clear_vec(s, a->rd);
+    if (!s->fpcr_nep) {
+        clear_vec(s, a->rd);
+    }
     write_vec_element(s, tcg_int, a->rd, 0, a->esz);
     return true;
 }
-- 
2.34.1

Handle FPCR.NEP in the operations handled by do_cvtf_scalar().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtod(tcg_double, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_dreg(s, rd, tcg_double);
+        write_fp_dreg_merging(s, rd, rd, tcg_double);
         break;
 
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtos(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_sreg(s, rd, tcg_single);
+        write_fp_sreg_merging(s, rd, rd, tcg_single);
         break;
 
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtoh(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_sreg(s, rd, tcg_single);
+        write_fp_hreg_merging(s, rd, rd, tcg_single);
         break;
 
     default:
-- 
2.34.1

Handle FPCR.NEP merging for scalar FABS and FNEG; this requires
an extra parameter to do_fp1_scalar_int(), since FMOV scalar
does not have the merging behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1Int {
 } FPScalar1Int;
 
 static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
-                              const FPScalar1Int *f)
+                              const FPScalar1Int *f,
+                              bool merging)
 {
     switch (a->esz) {
     case MO_64:
         if (fp_access_check(s)) {
             TCGv_i64 t = read_fp_dreg(s, a->rn);
             f->gen_d(t, t);
-            write_fp_dreg(s, a->rd, t);
+            if (merging) {
+                write_fp_dreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_dreg(s, a->rd, t);
+            }
         }
         break;
     case MO_32:
         if (fp_access_check(s)) {
             TCGv_i32 t = read_fp_sreg(s, a->rn);
             f->gen_s(t, t);
-            write_fp_sreg(s, a->rd, t);
+            if (merging) {
+                write_fp_sreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_sreg(s, a->rd, t);
+            }
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
         if (fp_access_check(s)) {
             TCGv_i32 t = read_fp_hreg(s, a->rn);
             f->gen_h(t, t);
-            write_fp_sreg(s, a->rd, t);
+            if (merging) {
+                write_fp_hreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_sreg(s, a->rd, t);
+            }
         }
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fmov = {
     tcg_gen_mov_i32,
     tcg_gen_mov_i64,
 };
-TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov)
+TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov, false)
 
 static const FPScalar1Int f_scalar_fabs = {
     gen_vfp_absh,
     gen_vfp_abss,
     gen_vfp_absd,
 };
-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs)
+TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
 
 static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negh,
     gen_vfp_negs,
     gen_vfp_negd,
 };
-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg)
+TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
 
 typedef struct FPScalar1 {
     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
-- 
2.34.1

Unlike the other users of do_2misc_narrow_scalar(), FCVTXN (scalar)
is always double-to-single and must honour FPCR.NEP.  Implement this
directly in a trans function rather than using
do_2misc_narrow_scalar().

We still need gen_fcvtxn_sd() and the f_scalar_fcvtxn[] array for
the FCVTXN (vector) insn, so we move those down in the file to
where they are used.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 43 ++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static ArithOneOp * const f_scalar_uqxtn[] = {
 };
 TRANS(UQXTN_s, do_2misc_narrow_scalar, a, f_scalar_uqxtn)
 
-static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
+static bool trans_FCVTXN_s(DisasContext *s, arg_rr_e *a)
 {
-    /*
-     * 64 bit to 32 bit float conversion
-     * with von Neumann rounding (round to odd)
-     */
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
-    tcg_gen_extu_i32_i64(d, tmp);
+    if (fp_access_check(s)) {
+        /*
+         * 64 bit to 32 bit float conversion
+         * with von Neumann rounding (round to odd)
+         */
+        TCGv_i64 src = read_fp_dreg(s, a->rn);
+        TCGv_i32 dst = tcg_temp_new_i32();
+        gen_helper_fcvtx_f64_to_f32(dst, src, fpstatus_ptr(FPST_A64));
+        write_fp_sreg_merging(s, a->rd, a->rd, dst);
+    }
+    return true;
 }
 
-static ArithOneOp * const f_scalar_fcvtxn[] = {
-    NULL,
-    NULL,
-    gen_fcvtxn_sd,
-};
-TRANS(FCVTXN_s, do_2misc_narrow_scalar, a, f_scalar_fcvtxn)
-
 #undef WRAP_ENV
 
 static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn)
@@ -XXX,XX +XXX,XX @@ static void gen_fcvtn_sd(TCGv_i64 d, TCGv_i64 n)
     tcg_gen_extu_i32_i64(d, tmp);
 }
 
+static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
+{
+    /*
+     * 64 bit to 32 bit float conversion
+     * with von Neumann rounding (round to odd)
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
+    tcg_gen_extu_i32_i64(d, tmp);
+}
+
 static ArithOneOp * const f_vector_fcvtn[] = {
     NULL,
     gen_fcvtn_hs,
     gen_fcvtn_sd,
 };
+static ArithOneOp * const f_scalar_fcvtxn[] = {
+    NULL,
+    NULL,
+    gen_fcvtxn_sd,
+};
 TRANS(FCVTN_v, do_2misc_narrow_vector, a, f_vector_fcvtn)
 TRANS(FCVTXN_v, do_2misc_narrow_vector, a, f_scalar_fcvtxn)
 
-- 
2.34.1

do_fp3_scalar_idx() is used only for the FMUL and FMULX scalar by
element instructions; these both need to merge the result with the Rn
register when FPCR.NEP is set.

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element(s, t1, a->rm, a->idx, MO_64);
             f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_32);
             f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_16);
             f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     default:
-- 
2.34.1

When FPCR.AH == 1, floating point FMIN and FMAX have some odd special
cases:

 * comparing two zeroes (even of different sign) or comparing a NaN
   with anything always returns the second argument (possibly
   squashed to zero)
 * denormal outputs are not squashed to zero regardless of FZ or FZ16

Implement these semantics in new helper functions and select them at
translate time if FPCR.AH is 1 for the scalar FMAX and FMIN insns.
(We will convert the other FMAX and FMIN insns in subsequent
commits.)
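
For readers who want to experiment, the operand-selection rules above
can be sketched in portable C. This is a simplified model, not the
helper added by this patch: ah_min_model() and ah_max_model() are
hypothetical names, and exception flags, input denormal squashing and
the flush-to-zero handling are all left out.

```c
#include <math.h>

/* Simplified model of FPCR.AH == 1 FMIN on single precision. */
static float ah_min_model(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;           /* two zeroes of any sign: second operand */
    }
    if (isnan(a) || isnan(b)) {
        return b;           /* NaN on either side: second operand */
    }
    return a < b ? a : b;
}

/* Simplified model of FPCR.AH == 1 FMAX on single precision. */
static float ah_max_model(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;
    }
    if (isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}
```

In this model ah_min_model(-0.0f, 0.0f) gives +0.0 and
ah_min_model(NAN, 1.0f) gives 1.0, because the second operand wins
in both special cases.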

Note that FMINNM and FMAXNM are not affected.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-a64.h    |  7 +++++++
 target/arm/tcg/helper-a64.c    | 36 ++++++++++++++++++++++++++++++++++
 target/arm/tcg/translate-a64.c | 23 ++++++++++++++++++++--
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(advsimd_muladd2h, i32, i32, i32, i32, fpst)
 DEF_HELPER_2(advsimd_rinth_exact, f16, f16, fpst)
 DEF_HELPER_2(advsimd_rinth, f16, f16, fpst)
 
+DEF_HELPER_3(vfp_ah_minh, f16, f16, f16, fpst)
+DEF_HELPER_3(vfp_ah_mins, f32, f32, f32, fpst)
+DEF_HELPER_3(vfp_ah_mind, f64, f64, f64, fpst)
+DEF_HELPER_3(vfp_ah_maxh, f16, f16, f16, fpst)
+DEF_HELPER_3(vfp_ah_maxs, f32, f32, f32, fpst)
+DEF_HELPER_3(vfp_ah_maxd, f64, f64, f64, fpst)
+
 DEF_HELPER_2(exception_return, void, env, i64)
 DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64)
 
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -XXX,XX +XXX,XX @@ float32 HELPER(fcvtx_f64_to_f32)(float64 a, float_status *fpst)
     return r;
 }
 
+/*
+ * AH=1 min/max have some odd special cases:
+ * comparing two zeroes (regardless of sign), (NaN, anything),
+ * or (anything, NaN) should return the second argument (possibly
+ * squashed to zero).
+ * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
+ */
+#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        bool save;                                                      \
+        CTYPE r;                                                        \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \
+            return b;                                                   \
+        }                                                               \
+        if (FLOATTYPE ## _is_any_nan(a) ||                              \
+            FLOATTYPE ## _is_any_nan(b)) {                              \
+            float_raise(float_flag_invalid, fpst);                      \
+            return b;                                                   \
+        }                                                               \
+        save = get_flush_to_zero(fpst);                                 \
+        set_flush_to_zero(false, fpst);                                 \
+        r = FLOATTYPE ## _ ## MINMAX(a, b, fpst);                       \
+        set_flush_to_zero(save, fpst);                                  \
+        return r;                                                       \
+    }
+
+AH_MINMAX_HELPER(vfp_ah_minh, dh_ctype_f16, float16, min)
+AH_MINMAX_HELPER(vfp_ah_mins, float32, float32, min)
+AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min)
+AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max)
+AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max)
+AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max)
+
 /* 64-bit versions of the CRC helpers. Note that although the operation
  * (and the prototypes of crc32c() and crc32() mean that only the bottom
  * 32 bits of the accumulator and result are used, we pass and return
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                        select_ah_fpst(s, a->esz));
 }
 
+/* Some insns need to call different helpers when FPCR.AH == 1 */
+static bool do_fp3_scalar_2fn(DisasContext *s, arg_rrr_e *a,
+                              const FPScalar *fnormal,
+                              const FPScalar *fah,
+                              int mergereg)
+{
+    return do_fp3_scalar(s, a, s->fpcr_ah ? fah : fnormal, mergereg);
+}
+
 static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_addh,
     gen_helper_vfp_adds,
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fmax = {
     gen_helper_vfp_maxs,
     gen_helper_vfp_maxd,
 };
-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
+static const FPScalar f_scalar_fmax_ah = {
+    gen_helper_vfp_ah_maxh,
+    gen_helper_vfp_ah_maxs,
+    gen_helper_vfp_ah_maxd,
+};
+TRANS(FMAX_s, do_fp3_scalar_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah, a->rn)
 
 static const FPScalar f_scalar_fmin = {
     gen_helper_vfp_minh,
     gen_helper_vfp_mins,
     gen_helper_vfp_mind,
 };
-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
+static const FPScalar f_scalar_fmin_ah = {
+    gen_helper_vfp_ah_minh,
+    gen_helper_vfp_ah_mins,
+    gen_helper_vfp_ah_mind,
+};
+TRANS(FMIN_s, do_fp3_scalar_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah, a->rn)
 
 static const FPScalar f_scalar_fmaxnm = {
     gen_helper_vfp_maxnumh,
-- 
2.34.1

Implement the FPCR.AH == 1 semantics for vector FMIN/FMAX, by
creating new _ah_ versions of the gvec helpers which invoke the
scalar fmin_ah and fmax_ah helpers on each element.
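
As a rough illustration of the "invoke the scalar helper on each
element" shape, here is a minimal standalone sketch. It uses host
floats rather than softfloat types, and the names fmin_ah and
vec_fmin_ah are hypothetical, not the helpers added by this patch.
It models only the NaN rule visible in the scalar helper macro
(return the second operand if either input is a NaN); any other
special cases are left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical scalar minimum with the AH NaN rule: if either input
 * is a NaN (x != x holds only for NaN), return the second operand.
 */
static float fmin_ah(float a, float b)
{
    if (a != a || b != b) {
        return b;
    }
    return a < b ? a : b;
}

/* Apply the scalar operation independently to every element. */
static void vec_fmin_ah(float *d, const float *n, const float *m,
                        size_t elems)
{
    for (size_t i = 0; i < elems; i++) {
        d[i] = fmin_ah(n[i], m[i]);
    }
}
```

Keeping the per-element rule in one scalar function means a vector
variant only has to supply the loop.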

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 21 +++++++++++++++++++--
 target/arm/tcg/vec_helper.c    |  8 ++++++++
 3 files changed, 41 insertions(+), 2 deletions(-)

Implement the FPCR.AH semantics for FMAXV and FMINV.  These are the
"recursively reduce all lanes of a vector to a scalar result" insns;
we just need to use the _ah_ helper for the reduction step when
FPCR.AH == 1.
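
The idea can be sketched in standalone C as below. This is an
illustration under assumptions, not the TCG code: it uses host floats,
the names combine_fn, max_normal, max_ah, reduce and reduce_max are
hypothetical, and max_normal is a plain C maximum rather than a model
of the non-AH helper's NaN handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-operand combiner, standing in for a TCG helper. */
typedef float (*combine_fn)(float a, float b);

/* Plain C maximum; stands in for the non-AH helper. */
static float max_normal(float a, float b)
{
    return a > b ? a : b;
}

/*
 * Stand-in for the AH helper: return the second operand if either
 * input is a NaN (x != x holds only for NaN), else the maximum.
 */
static float max_ah(float a, float b)
{
    if (a != a || b != b) {
        return b;
    }
    return a > b ? a : b;
}

/* Recursively reduce lanes[0..n) by halving; n must be a power of 2. */
static float reduce(const float *lanes, size_t n, combine_fn fn)
{
    if (n == 1) {
        return lanes[0];
    }
    return fn(reduce(lanes, n / 2, fn),
              reduce(lanes + n / 2, n / 2, fn));
}

/* Choose the combiner once, from the mode flag, then reduce. */
static float reduce_max(const float *lanes, size_t n, bool fpcr_ah)
{
    return reduce(lanes, n, fpcr_ah ? max_ah : max_normal);
}
```

Only the choice of combiner depends on the flag; the reduction
structure itself is the same in both modes.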

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz,
 }
 
 static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
-                              NeonGenTwoSingleOpFn *fn)
+                            NeonGenTwoSingleOpFn *fnormal,
+                            NeonGenTwoSingleOpFn *fah)
 {
     if (fp_access_check(s)) {
         MemOp esz = a->esz;
         int elts = (a->q ? 16 : 8) >> esz;
         TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
-        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
+        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst,
+                                       s->fpcr_ah ? fah : fnormal);
         write_fp_sreg(s, a->rd, res);
     }
     return true;
 }
 
-TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxnumh)
-TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minnumh)
-TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxh)
-TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minh)
+TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_maxnumh, gen_helper_vfp_maxnumh)
+TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_minnumh, gen_helper_vfp_minnumh)
+TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_maxh, gen_helper_vfp_ah_maxh)
+TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_minh, gen_helper_vfp_ah_minh)
 
-TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
-TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
-TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
-TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
+TRANS(FMAXNMV_s, do_fp_reduction, a,
+      gen_helper_vfp_maxnums, gen_helper_vfp_maxnums)
+TRANS(FMINNMV_s, do_fp_reduction, a,
+      gen_helper_vfp_minnums, gen_helper_vfp_minnums)
+TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs, gen_helper_vfp_ah_maxs)
+TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins, gen_helper_vfp_ah_mins)
 
 /*
  * Floating-point Immediate
-- 
2.34.1

Implement the FPCR.AH semantics for the pairwise floating
point minimum/maximum insns FMINP and FMAXP.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
 target/arm/tcg/vec_helper.c    | 10 ++++++++++
 3 files changed, 45 insertions(+), 4 deletions(-)

Implement the FPCR.AH semantics for the SVE FMAXV and FMINV
vector-reduction-to-scalar max/min operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 +++++++++++
 target/arm/tcg/sve_helper.c    | 43 +++++++++++++++++++++-------------
 target/arm/tcg/translate-sve.c | 16 +++++++++++--
 3 files changed, 55 insertions(+), 18 deletions(-)

Implement the FPCR.AH semantics for the SVE FMAX and FMIN operations
that take an immediate as the second operand.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/sve_helper.c    |  8 ++++++++
 target/arm/tcg/translate-sve.c | 25 +++++++++++++++++++++++--
 3 files changed, 45 insertions(+), 2 deletions(-)

Implement the FPCR.AH semantics for the SVE FMAX and FMIN
operations that take two vector operands.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/sve_helper.c    |  8 ++++++++
 target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
 3 files changed, 37 insertions(+), 2 deletions(-)

FPCR.AH == 1 mandates that negation of a NaN value should not flip
its sign bit.  This means we can no longer use gen_vfp_neg*()
everywhere but must instead generate slightly more complex code when
FPCR.AH is set.

Make this change for the scalar FNEG and for those places in
translate-a64.c which were previously directly calling
gen_vfp_neg*().

This change in semantics also affects any other instruction whose
pseudocode calls FPNeg(); in following commits we extend this
change to the other affected instructions.
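
For reference, the intended semantics can be written as plain integer
operations on the float32 bit pattern. This is a minimal standalone
sketch, not the TCG code generated by the patch; the names f32_chs,
f32_ah_chs and f32_maybe_ah_chs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F32_SIGN 0x80000000u  /* sign bit of a float32 */
#define F32_INF  0x7f800000u  /* bit pattern of +infinity */

/* Unconditional negation: flip the sign bit. */
static uint32_t f32_chs(uint32_t s)
{
    return s ^ F32_SIGN;
}

/*
 * AH negation: a value whose magnitude bits compare above +infinity
 * is a NaN and is returned unchanged; anything else is negated.
 */
static uint32_t f32_ah_chs(uint32_t s)
{
    uint32_t abs_s = s & ~F32_SIGN;

    return abs_s > F32_INF ? s : f32_chs(s);
}

/* Select between the two behaviours from the mode flag. */
static uint32_t f32_maybe_ah_chs(uint32_t s, bool fpcr_ah)
{
    return fpcr_ah ? f32_ah_chs(s) : f32_chs(s);
}
```

Infinities compare equal to the threshold, not above it, so they are
still negated; only NaNs take the "unchanged" path.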

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 125 ++++++++++++++++++++++++++++++---
 1 file changed, 114 insertions(+), 11 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
                        is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
 }
 
+/*
+ * When FPCR.AH == 1, NEG and ABS do not flip the sign bit of a NaN.
+ * These functions implement
+ *   d = floatN_is_any_nan(s) ? s : floatN_chs(s)
+ * which for float32 is
+ *   d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s ^ (1 << 31))
+ * and similarly for the other float sizes.
+ */
+static void gen_vfp_ah_negh(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
+
+    gen_vfp_negh(chs_s, s);
+    gen_vfp_absh(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7c00),
+                        s, chs_s);
+}
+
+static void gen_vfp_ah_negs(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
+
+    gen_vfp_negs(chs_s, s);
+    gen_vfp_abss(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7f800000UL),
+                        s, chs_s);
+}
+
+static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
+{
+    TCGv_i64 abs_s = tcg_temp_new_i64(), chs_s = tcg_temp_new_i64();
+
+    gen_vfp_negd(chs_s, s);
+    gen_vfp_absd(abs_s, s);
+    tcg_gen_movcond_i64(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
+                        s, chs_s);
+}
+
+static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negh(d, s);
+    } else {
+        gen_vfp_negh(d, s);
+    }
+}
+
+static void gen_vfp_maybe_ah_negs(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negs(d, s);
+    } else {
+        gen_vfp_negs(d, s);
+    }
+}
+
+static void gen_vfp_maybe_ah_negd(DisasContext *dc, TCGv_i64 d, TCGv_i64 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negd(d, s);
+    } else {
+        gen_vfp_negd(d, s);
+    }
+}
+
 /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
  * than the 32 bit equivalent.
  */
@@ -XXX,XX +XXX,XX @@ static void gen_fnmul_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
     gen_vfp_negd(d, d);
 }
 
+static void gen_fnmul_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_mulh(d, n, m, s);
+    gen_vfp_ah_negh(d, d);
+}
+
+static void gen_fnmul_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_muls(d, n, m, s);
+    gen_vfp_ah_negs(d, d);
+}
+
+static void gen_fnmul_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+    gen_helper_vfp_muld(d, n, m, s);
+    gen_vfp_ah_negd(d, d);
+}
+
 static const FPScalar f_scalar_fnmul = {
     gen_fnmul_h,
     gen_fnmul_s,
     gen_fnmul_d,
 };
-TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
+static const FPScalar f_scalar_ah_fnmul = {
+    gen_fnmul_ah_h,
+    gen_fnmul_ah_s,
+    gen_fnmul_ah_d,
+};
+TRANS(FNMUL_s, do_fp3_scalar_2fn, a, &f_scalar_fnmul, &f_scalar_ah_fnmul, a->rn)
 
 static const FPScalar f_scalar_fcmeq = {
     gen_helper_advsimd_ceq_f16,
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element(s, t2, a->rm, a->idx, MO_64);
             if (neg) {
-                gen_vfp_negd(t1, t1);
+                gen_vfp_maybe_ah_negd(s, t1, t1);
             }
             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
             write_fp_dreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element_i32(s, t2, a->rm, a->idx, MO_32);
             if (neg) {
-                gen_vfp_negs(t1, t1);
+                gen_vfp_maybe_ah_negs(s, t1, t1);
             }
             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
             write_fp_sreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element_i32(s, t2, a->rm, a->idx, MO_16);
             if (neg) {
-                gen_vfp_negh(t1, t1);
+                gen_vfp_maybe_ah_negh(s, t1, t1);
             }
             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                        fpstatus_ptr(FPST_A64_F16));
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i64 ta = read_fp_dreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negd(ta, ta);
+                gen_vfp_maybe_ah_negd(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negd(tn, tn);
+                gen_vfp_maybe_ah_negd(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i32 ta = read_fp_sreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negs(ta, ta);
+                gen_vfp_maybe_ah_negs(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negs(tn, tn);
+                gen_vfp_maybe_ah_negs(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i32 ta = read_fp_hreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negh(ta, ta);
+                gen_vfp_maybe_ah_negh(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negh(tn, tn);
+                gen_vfp_maybe_ah_negh(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_A64_F16);
             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
     return true;
 }
 
+static bool do_fp1_scalar_int_2fn(DisasContext *s, arg_rr_e *a,
+                                  const FPScalar1Int *fnormal,
+                                  const FPScalar1Int *fah)
+{
+    return do_fp1_scalar_int(s, a, s->fpcr_ah ? fah : fnormal, true);
+}
+
 static const FPScalar1Int f_scalar_fmov = {
     tcg_gen_mov_i32,
     tcg_gen_mov_i32,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negs,
     gen_vfp_negd,
 };
-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
+static const FPScalar1Int f_scalar_ah_fneg = {
+    gen_vfp_ah_negh,
+    gen_vfp_ah_negs,
+    gen_vfp_ah_negd,
+};
+TRANS(FNEG_s, do_fp1_scalar_int_2fn, a, &f_scalar_fneg, &f_scalar_ah_fneg)
 
 typedef struct FPScalar1 {
     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
-- 
2.34.1

FPCR.AH == 1 mandates that taking the absolute value of a NaN should
not change its sign bit.  This means we can no longer use
gen_vfp_abs*() everywhere but must instead generate slightly more
complex code when FPCR.AH is set.

Implement these semantics for scalar FABS and FABD.  This change also
affects all other instructions whose pseudocode calls FPAbs(); we
will extend the change to those instructions in following commits.
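
The same rule for the absolute value, shown here on float16 bit
patterns, can be sketched as follows. This is a standalone
illustration only; the names f16_abs and f16_ah_abs are hypothetical
and are not the generator functions added by this patch.

```c
#include <assert.h>
#include <stdint.h>

#define F16_SIGN 0x8000u  /* sign bit of a float16 */
#define F16_INF  0x7c00u  /* bit pattern of +infinity */

/* Unconditional absolute value: clear the sign bit. */
static uint16_t f16_abs(uint16_t s)
{
    return (uint16_t)(s & ~F16_SIGN);
}

/*
 * AH absolute value: a NaN (magnitude bits above +infinity) is
 * returned with its sign bit untouched; anything else has the
 * sign bit cleared.
 */
static uint16_t f16_ah_abs(uint16_t s)
{
    uint16_t abs_s = f16_abs(s);

    return abs_s > F16_INF ? s : abs_s;
}
```

The magnitude is computed once and used both for the NaN test and as
the non-NaN result, mirroring the single temporary in the generated
code.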

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 69 +++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
                         s, chs_s);
 }
 
+/*
+ * These functions implement
+ *  d = floatN_is_any_nan(s) ? s : floatN_abs(s)
+ * which for float32 is
+ *  d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s & ~(1 << 31))
+ * and similarly for the other float sizes.
+ */
+static void gen_vfp_ah_absh(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32();
+
+    gen_vfp_absh(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7c00),
+                        s, abs_s);
+}
+
+static void gen_vfp_ah_abss(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32();
+
+    gen_vfp_abss(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7f800000UL),
+                        s, abs_s);
+}
+
+static void gen_vfp_ah_absd(TCGv_i64 d, TCGv_i64 s)
+{
+    TCGv_i64 abs_s = tcg_temp_new_i64();
+
+    gen_vfp_absd(abs_s, s);
+    tcg_gen_movcond_i64(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
+                        s, abs_s);
+}
+
 static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
 {
     if (dc->fpcr_ah) {
@@ -XXX,XX +XXX,XX @@ static void gen_fabd_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
     gen_vfp_absd(d, d);
 }
 
+static void gen_fabd_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subh(d, n, m, s);
+    gen_vfp_ah_absh(d, d);
+}
+
+static void gen_fabd_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subs(d, n, m, s);
+    gen_vfp_ah_abss(d, d);
+}
+
+static void gen_fabd_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subd(d, n, m, s);
+    gen_vfp_ah_absd(d, d);
+}
+
 static const FPScalar f_scalar_fabd = {
     gen_fabd_h,
     gen_fabd_s,
     gen_fabd_d,
 };
-TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
+static const FPScalar f_scalar_ah_fabd = {
+    gen_fabd_ah_h,
+    gen_fabd_ah_s,
+    gen_fabd_ah_d,
+};
+TRANS(FABD_s, do_fp3_scalar_2fn, a, &f_scalar_fabd, &f_scalar_ah_fabd, a->rn)
 
 static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fabs = {
     gen_vfp_abss,
     gen_vfp_absd,
 };
-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
+static const FPScalar1Int f_scalar_ah_fabs = {
+    gen_vfp_ah_absh,
+    gen_vfp_ah_abss,
+    gen_vfp_ah_absd,
+};
+TRANS(FABS_s, do_fp1_scalar_int_2fn, a, &f_scalar_fabs, &f_scalar_ah_fabs)
 
 static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negh,
-- 
2.34.1

Split the handling of vector FABD so that it calls a different set
of helpers when FPCR.AH is 1, which implement the "no negation of
the sign of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    | 23 +++++++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

Make SVE FNEG honour the FPCR.AH "don't negate the sign of a NaN"
semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 4 ++++
 target/arm/tcg/sve_helper.c    | 8 ++++++++
 target/arm/tcg/translate-sve.c | 7 ++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

Make SVE FABS honour the FPCR.AH "don't negate the sign of a NaN"
semantics.

Make the SVE FABD insn honour the FPCR.AH "don't negate the sign
of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    |  7 +++++++
 target/arm/tcg/sve_helper.c    | 22 ++++++++++++++++++++++
 target/arm/tcg/translate-sve.c |  2 +-
 3 files changed, 30 insertions(+), 1 deletion(-)

The negation steps in FCADD must honour FPCR.AH's "don't change the
sign of a NaN" semantics.  Implement this in the same way we did for
the base ASIMD FCADD, by encoding FPCR.AH into the SIMD data field
passed to the helper and using that to decide whether to negate the
values.

The construction of neg_imag and neg_real was done to make it easy
to apply both in parallel with two simple logical operations.  This
changed with FPCR.AH, which is more complex than that. Switch to
an approach that follows the pseudocode more closely, by extracting
the 'rot=1' parameter from the SIMD data field and changing the
sign of the appropriate input value.

Note that there was a naming issue with neg_imag and neg_real.
They were named backward, with neg_imag being non-zero for rot=1,
and vice versa.  This was combined with reversed usage within the
loop, so that the negation in the end turned out correct.
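
A minimal standalone sketch of the new approach for one complex pair
is given below. It uses host floats instead of softfloat types, and
the names fcadd_data, maybe_ah_chs and fcadd_pair are hypothetical.
The bit layout (bit 0 = rot, bit 1 = FPCR.AH) follows the value the
translator passes as the data field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack the two flags: bit 0 = rot, bit 1 = FPCR.AH. */
static uint32_t fcadd_data(bool rot, bool fpcr_ah)
{
    return (uint32_t)rot | ((uint32_t)fpcr_ah << 1);
}

/* Negate, but leave a NaN alone when AH is set (x != x only for NaN). */
static float maybe_ah_chs(float a, bool fpcr_ah)
{
    return (fpcr_ah && a != a) ? a : -a;
}

/*
 * One complex pair, stored as { real, imag }:
 * d = n + (m rotated by 90 degrees if rot == 0, by 270 if rot == 1).
 */
static void fcadd_pair(float d[2], const float n[2], const float m[2],
                       uint32_t data)
{
    bool rot = data & 1;
    bool fpcr_ah = (data >> 1) & 1;
    float e1 = m[1];    /* added to the real part */
    float e3 = m[0];    /* added to the imaginary part */

    if (rot) {
        e3 = maybe_ah_chs(e3, fpcr_ah);
    } else {
        e1 = maybe_ah_chs(e1, fpcr_ah);
    }

    d[0] = n[0] + e1;
    d[1] = n[1] + e3;
}
```

Exactly one of the two inputs taken from m is negated per pair, which
is why a single rot flag is enough to select it.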

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/vec_internal.h  | 17 ++++++++++++++
 target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++----------
 target/arm/tcg/translate-sve.c |  2 +-
 3 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -XXX,XX +XXX,XX @@
 #ifndef TARGET_ARM_VEC_INTERNAL_H
 #define TARGET_ARM_VEC_INTERNAL_H
 
+#include "fpu/softfloat.h"
+
 /*
  * Note that vector data is stored in host-endian 64-bit chunks,
  * so addressing units smaller than that needs a host-endian fixup.
@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
  */
 bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
 
+static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
+{
+    return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
+}
+
+static inline float32 float32_maybe_ah_chs(float32 a, bool fpcr_ah)
+{
+    return fpcr_ah && float32_is_any_nan(a) ? a : float32_chs(a);
+}
+
+static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah)
+{
+    return fpcr_ah && float64_is_any_nan(a) ? a : float64_chs(a);
+}
+
 #endif /* TARGET_ARM_VEC_INTERNAL_H */
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float16 neg_imag = float16_set_sign(0, simd_data(desc));
-    float16 neg_real = float16_chs(neg_imag);
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float16);
 
             e0 = *(float16 *)(vn + H1_2(i));
-            e1 = *(float16 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float16 *)(vm + H1_2(j));
             e2 = *(float16 *)(vn + H1_2(j));
-            e3 = *(float16 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float16 *)(vm + H1_2(i));
+
+            if (rot) {
+                e3 = float16_maybe_ah_chs(e3, fpcr_ah);
+            } else {
+                e1 = float16_maybe_ah_chs(e1, fpcr_ah);
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float16 *)(vd + H1_2(i)) = float16_add(e0, e1, s);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float32 neg_imag = float32_set_sign(0, simd_data(desc));
-    float32 neg_real = float32_chs(neg_imag);
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float32);
 
             e0 = *(float32 *)(vn + H1_2(i));
-            e1 = *(float32 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float32 *)(vm + H1_2(j));
             e2 = *(float32 *)(vn + H1_2(j));
-            e3 = *(float32 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float32 *)(vm + H1_2(i));
+
+            if (rot) {
+                e3 = float32_maybe_ah_chs(e3, fpcr_ah);
+            } else {
+                e1 = float32_maybe_ah_chs(e1, fpcr_ah);
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float32 *)(vd + H1_2(i)) = float32_add(e0, e1, s);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float64 neg_imag = float64_set_sign(0, simd_data(desc));
-    float64 neg_real = float64_chs(neg_imag);
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float64);
 
             e0 = *(float64 *)(vn + H1_2(i));
-            e1 = *(float64 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float64 *)(vm + H1_2(j));
             e2 = *(float64 *)(vn + H1_2(j));
-            e3 = *(float64 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float64 *)(vm + H1_2(i));
+
+            if (rot) {
+                e3 = float64_maybe_ah_chs(e3, fpcr_ah);
+            } else {
+                e1 = float64_maybe_ah_chs(e1, fpcr_ah);
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float64 *)(vd + H1_2(i)) = float64_add(e0, e1, s);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4_ptr * const fcadd_fns[] = {
     gen_helper_sve_fcadd_s, gen_helper_sve_fcadd_d,
 };
 TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
-           a->rd, a->rn, a->rm, a->pg, a->rot,
+           a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 #define DO_FMLA(NAME, name) \
-- 
2.34.1

The negation steps in FCADD must honour FPCR.AH's "don't change the
sign of a NaN" semantics.  Implement this by encoding FPCR.AH into
the SIMD data field passed to the helper and using that to decide
whether to negate the values.

The construction of neg_imag and neg_real was done to make it easy
to apply both in parallel with two simple logical operations.  This
changed with FPCR.AH, which is more complex than that. Switch to
an approach closer to the pseudocode, where we extract the rot
parameter from the SIMD data word and negate the appropriate
input value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 10 +++++--
 target/arm/tcg/vec_helper.c    | 54 +++++++++++++++++++---------------
 2 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
     gen_helper_gvec_fcadds,
     gen_helper_gvec_fcaddd,
 };
-TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
-TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
+/*
+ * Encode FPCR.AH into the data so the helper knows whether the
+ * negations it does should avoid flipping the sign bit on a NaN
+ */
+TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0 | (s->fpcr_ah << 1),
+           f_vector_fcadd)
+TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1 | (s->fpcr_ah << 1),
+           f_vector_fcadd)
 
 static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
 {
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
     float16 *d = vd;
     float16 *n = vn;
     float16 *m = vm;
-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = neg_real ^ 1;
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 15;
-    neg_imag <<= 15;
-
     for (i = 0; i < opr_sz / 2; i += 2) {
         float16 e0 = n[H2(i)];
-        float16 e1 = m[H2(i + 1)] ^ neg_imag;
+        float16 e1 = m[H2(i + 1)];
         float16 e2 = n[H2(i + 1)];
-        float16 e3 = m[H2(i)] ^ neg_real;
+        float16 e3 = m[H2(i)];
+
+        if (rot) {
+            e3 = float16_maybe_ah_chs(e3, fpcr_ah);
+        } else {
+            e1 = float16_maybe_ah_chs(e1, fpcr_ah);
+        }
 
         d[H2(i)] = float16_add(e0, e1, fpst);
         d[H2(i + 1)] = float16_add(e2, e3, fpst);
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
     float32 *d = vd;
     float32 *n = vn;
     float32 *m = vm;
-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = neg_real ^ 1;
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 31;
-    neg_imag <<= 31;
-
     for (i = 0; i < opr_sz / 4; i += 2) {
         float32 e0 = n[H4(i)];
-        float32 e1 = m[H4(i + 1)] ^ neg_imag;
+        float32 e1 = m[H4(i + 1)];
         float32 e2 = n[H4(i + 1)];
-        float32 e3 = m[H4(i)] ^ neg_real;
+        float32 e3 = m[H4(i)];
+
+        if (rot) {
+            e3 = float32_maybe_ah_chs(e3, fpcr_ah);
+        } else {
+            e1 = float32_maybe_ah_chs(e1, fpcr_ah);
+        }
 
         d[H4(i)] = float32_add(e0, e1, fpst);
         d[H4(i + 1)] = float32_add(e2, e3, fpst);
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
     float64 *d = vd;
     float64 *n = vn;
     float64 *m = vm;
-    uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
-    uint64_t neg_imag = neg_real ^ 1;
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 63;
-    neg_imag <<= 63;
-
     for (i = 0; i < opr_sz / 8; i += 2) {
         float64 e0 = n[i];
-        float64 e1 = m[i + 1] ^ neg_imag;
+        float64 e1 = m[i + 1];
         float64 e2 = n[i + 1];
-        float64 e3 = m[i] ^ neg_real;
+        float64 e3 = m[i];
+
+        if (rot) {
+            e3 = float64_maybe_ah_chs(e3, fpcr_ah);
+        } else {
+            e1 = float64_maybe_ah_chs(e1, fpcr_ah);
+        }
 
         d[i] = float64_add(e0, e1, fpst);
         d[i + 1] = float64_add(e2, e3, fpst);
-- 
2.34.1

Handle the FPCR.AH semantics that we do not change the sign of an
input NaN in the FRECPS and FRSQRTS scalar insns, by providing
new helper functions that do the CHS part of the operation
differently.

Since the extra helper functions would be very repetitive if written
out longhand, we condense them and the existing non-AH helpers into
being emitted via macros.
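
The macro pattern looks roughly like the standalone sketch below.
This is an illustration under assumptions, not the helper code: the
names DO_RECPS, chs_plain, chs_ah, recps_f32 and recps_ah_f32 are
hypothetical, it uses host floats with an unfused multiply and add
where the real helpers use a fused multiply-add, and any special-case
handling in the real helpers is omitted.

```c
#include <assert.h>

/* Unconditional negation. */
static float chs_plain(float a)
{
    return -a;
}

/* AH negation: a NaN (x != x) is returned unchanged. */
static float chs_ah(float a)
{
    return a != a ? a : -a;
}

/*
 * Emit one reciprocal-step function computing 2 - a * b as
 * 2 + CHSFN(a) * b.  The AH and non-AH variants differ only in
 * which CHS function is plugged in.
 */
#define DO_RECPS(NAME, CHSFN)                   \
    static float NAME(float a, float b)         \
    {                                           \
        a = CHSFN(a);                           \
        return 2.0f + a * b;                    \
    }

DO_RECPS(recps_f32, chs_plain)
DO_RECPS(recps_ah_f32, chs_ah)
```

Passing the CHS function as a macro argument keeps the two variants
textually identical apart from that one step.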

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-a64.h    |   6 ++
 target/arm/tcg/vec_internal.h  |  18 ++++++
 target/arm/tcg/helper-a64.c    | 115 ++++++++++++---------------------
 target/arm/tcg/translate-a64.c |  25 +++++--
 4 files changed, 83 insertions(+), 81 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(neon_cgt_f64, TCG_CALL_NO_RWG, i64, i64, i64, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
  */
 bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
 
+/*
+ * Negate as for FPCR.AH=1 -- do not negate NaNs.
+ */
+static inline float16 float16_ah_chs(float16 a)
+{
+    return float16_is_any_nan(a) ? a : float16_chs(a);
+}
+
+static inline float32 float32_ah_chs(float32 a)
+{
+    return float32_is_any_nan(a) ? a : float32_chs(a);
+}
+
+static inline float64 float64_ah_chs(float64 a)
+{
+    return float64_is_any_nan(a) ? a : float64_chs(a);
+}
+
 static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
 {
     return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -XXX,XX +XXX,XX @@
 #ifdef CONFIG_USER_ONLY
 #include "user/page-protection.h"
 #endif
+#include "vec_internal.h"
 
 /* C2.4.7 Multiply and divide */
 /* special cases for 0 and LLONG_MIN are mandated by the standard */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, float_status *fpst)
     return -float64_lt(b, a, fpst);
 }
 
-/* Reciprocal step and sqrt step. Note that unlike the A32/T32
+/*
+ * Reciprocal step and sqrt step. Note that unlike the A32/T32
  * versions, these do a fully fused multiply-add or
  * multiply-add-and-halve.
+ * The FPCR.AH == 1 versions need to avoid flipping the sign of NaN.
  */
-
-uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
-{
-    a = float16_squash_input_denormal(a, fpst);
-    b = float16_squash_input_denormal(b, fpst);
-
-    a = float16_chs(a);
-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
-        (float16_is_infinity(b) && float16_is_zero(a))) {
-        return float16_two;
+#define DO_RECPS(NAME, CTYPE, FLOATTYPE, CHSFN)                         \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
+        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
+            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
+            return FLOATTYPE ## _two;                                   \
+        }                                                               \
+        return FLOATTYPE ## _muladd(a, b, FLOATTYPE ## _two, 0, fpst);  \
     }
-    return float16_muladd(a, b, float16_two, 0, fpst);
-}
 
-float32 HELPER(recpsf_f32)(float32 a, float32 b, float_status *fpst)
-{
-    a = float32_squash_input_denormal(a, fpst);
-    b = float32_squash_input_denormal(b, fpst);
+DO_RECPS(recpsf_f16, uint32_t, float16, chs)
+DO_RECPS(recpsf_f32, float32, float32, chs)
+DO_RECPS(recpsf_f64, float64, float64, chs)
+DO_RECPS(recpsf_ah_f16, uint32_t, float16, ah_chs)
+DO_RECPS(recpsf_ah_f32, float32, float32, ah_chs)
+DO_RECPS(recpsf_ah_f64, float64, float64, ah_chs)
 
-    a = float32_chs(a);
-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
-        (float32_is_infinity(b) && float32_is_zero(a))) {
-        return float32_two;
-    }
-    return float32_muladd(a, b, float32_two, 0, fpst);
-}
+#define DO_RSQRTSF(NAME, CTYPE, FLOATTYPE, CHSFN)                       \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
+        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
+            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
+            return FLOATTYPE ## _one_point_five;                        \
+        }                                                               \
+        return FLOATTYPE ## _muladd_scalbn(a, b, FLOATTYPE ## _three,   \
+                                           -1, 0, fpst);                \
+    }                                                                   \
 
-float64 HELPER(recpsf_f64)(float64 a, float64 b, float_status *fpst)
-{
-    a = float64_squash_input_denormal(a, fpst);
-    b = float64_squash_input_denormal(b, fpst);
-
-    a = float64_chs(a);
-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
-        (float64_is_infinity(b) && float64_is_zero(a))) {
-        return float64_two;
-    }
-    return float64_muladd(a, b, float64_two, 0, fpst);
-}
-
-uint32_t HELPER(rsqrtsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
-{
-    a = float16_squash_input_denormal(a, fpst);
-    b = float16_squash_input_denormal(b, fpst);
-
-    a = float16_chs(a);
-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
-        (float16_is_infinity(b) && float16_is_zero(a))) {
-        return float16_one_point_five;
-    }
-    return float16_muladd_scalbn(a, b, float16_three, -1, 0, fpst);
-}
-
-float32 HELPER(rsqrtsf_f32)(float32 a, float32 b, float_status *fpst)
-{
-    a = float32_squash_input_denormal(a, fpst);
-    b = float32_squash_input_denormal(b, fpst);
-
-    a = float32_chs(a);
-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
-        (float32_is_infinity(b) && float32_is_zero(a))) {
-        return float32_one_point_five;
-    }
-    return float32_muladd_scalbn(a, b, float32_three, -1, 0, fpst);
-}
-
-float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, float_status *fpst)
-{
-    a = float64_squash_input_denormal(a, fpst);
-    b = float64_squash_input_denormal(b, fpst);
-
-    a = float64_chs(a);
-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
-        (float64_is_infinity(b) && float64_is_zero(a))) {
-        return float64_one_point_five;
-    }
-    return float64_muladd_scalbn(a, b, float64_three, -1, 0, fpst);
-}
+DO_RSQRTSF(rsqrtsf_f16, uint32_t, float16, chs)
+DO_RSQRTSF(rsqrtsf_f32, float32, float32, chs)
+DO_RSQRTSF(rsqrtsf_f64, float64, float64, chs)
+DO_RSQRTSF(rsqrtsf_ah_f16, uint32_t, float16, ah_chs)
+DO_RSQRTSF(rsqrtsf_ah_f32, float32, float32, ah_chs)
+DO_RSQRTSF(rsqrtsf_ah_f64, float64, float64, ah_chs)
 
 /* Floating-point reciprocal exponent - see FPRecpX in ARM ARM */
 uint32_t HELPER(frecpx_f16)(uint32_t a, float_status *fpst)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                        FPST_A64_F16 : FPST_A64);
 }
 
-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
-                             int mergereg)
+static bool do_fp3_scalar_ah_2fn(DisasContext *s, arg_rrr_e *a,
+                                 const FPScalar *fnormal, const FPScalar *fah,
+                                 int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
-                                       select_ah_fpst(s, a->esz));
+    return do_fp3_scalar_with_fpsttype(s, a, s->fpcr_ah ? fah : fnormal,
+                                       mergereg, select_ah_fpst(s, a->esz));
 }
 
 /* Some insns need to call different helpers when FPCR.AH == 1 */
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
+static const FPScalar f_scalar_ah_frecps = {
+    gen_helper_recpsf_ah_f16,
+    gen_helper_recpsf_ah_f32,
+    gen_helper_recpsf_ah_f64,
+};
+TRANS(FRECPS_s, do_fp3_scalar_ah_2fn, a,
+      &f_scalar_frecps, &f_scalar_ah_frecps, a->rn)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
+static const FPScalar f_scalar_ah_frsqrts = {
+    gen_helper_rsqrtsf_ah_f16,
+    gen_helper_rsqrtsf_ah_f32,
+    gen_helper_rsqrtsf_ah_f64,
+};
+TRANS(FRSQRTS_s, do_fp3_scalar_ah_2fn, a,
+      &f_scalar_frsqrts, &f_scalar_ah_frsqrts, a->rn)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
-- 
2.34.1

Handle the FPCR.AH "don't negate the sign of a NaN" semantics
in the vector versions of FRECPS and FRSQRTS, by implementing
new vector wrappers that call the _ah_ scalar helpers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 21 ++++++++++++++++-----
 target/arm/tcg/translate-sve.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    |  8 ++++++++
 4 files changed, 44 insertions(+), 6 deletions(-)

Handle the FPCR.AH "don't negate the sign of a NaN" semantics in FMLS
(indexed). We do this by creating 6 new helpers, which allow us to
do the negation either by XOR (for AH=0) or by muladd flags
(for AH=1).
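
The helper selection this implies can be sketched in isolation as a
two-level table lookup. This is only an illustration: the sk_ names are
hypothetical, and the table holds strings rather than function pointers
so that the sketch is self-contained.

```c
#include <stdbool.h>

/* Element sizes, numbered as in QEMU's MemOp: MO_16 = 1, MO_32 = 2, MO_64 = 3. */
enum { SK_MO_16 = 1, SK_MO_32 = 2, SK_MO_64 = 3 };

/* Row 0: FMLA; row 1: FMLS with AH=0 (XOR); row 2: FMLS with AH=1 (muladd flag). */
static const char *const sk_helper_names[3][3] = {
    { "gvec_fmla_idx_h",    "gvec_fmla_idx_s",    "gvec_fmla_idx_d" },
    { "gvec_fmls_idx_h",    "gvec_fmls_idx_s",    "gvec_fmls_idx_d" },
    { "gvec_ah_fmls_idx_h", "gvec_ah_fmls_idx_s", "gvec_ah_fmls_idx_d" },
};

/* FPCR.AH only matters when the insn negates, so FMLA always uses row 0. */
static int sk_fmla_idx_row(bool neg, bool fpcr_ah)
{
    return neg ? 1 + fpcr_ah : 0;
}

/* The first index picks the operation, the second the element size. */
static const char *sk_pick_helper(bool neg, bool fpcr_ah, int esz)
{
    return sk_helper_names[sk_fmla_idx_row(neg, fpcr_ah)][esz - 1];
}
```

Keeping the operation as the outer index and the element size as the
inner one matches the order used in the fns[][] lookup in the diff
below.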

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: Mostly from RTH's patch; error in index order into fns[][]
 fixed]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 17 +++++++++++------
 target/arm/tcg/translate-sve.c | 31 +++++++++++++++++--------------
 target/arm/tcg/vec_helper.c    | 24 +++++++++++++++---------
 4 files changed, 57 insertions(+), 29 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(gvec_fmls_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_fmls_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_fmls_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_uqadd_h, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ TRANS(FMULX_vi, do_fp3_vector_idx, a, f_vector_idx_fmulx)
 
 static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
 {
-    static gen_helper_gvec_4_ptr * const fns[3] = {
-        gen_helper_gvec_fmla_idx_h,
-        gen_helper_gvec_fmla_idx_s,
-        gen_helper_gvec_fmla_idx_d,
+    static gen_helper_gvec_4_ptr * const fns[3][3] = {
+        { gen_helper_gvec_fmla_idx_h,
+          gen_helper_gvec_fmla_idx_s,
+          gen_helper_gvec_fmla_idx_d },
+        { gen_helper_gvec_fmls_idx_h,
+          gen_helper_gvec_fmls_idx_s,
+          gen_helper_gvec_fmls_idx_d },
+        { gen_helper_gvec_ah_fmls_idx_h,
+          gen_helper_gvec_ah_fmls_idx_s,
+          gen_helper_gvec_ah_fmls_idx_d },
     };
     MemOp esz = a->esz;
     int check = fp_access_check_vector_hsd(s, a->q, esz);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                       esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-                      (a->idx << 1) | neg,
-                      fns[esz - 1]);
+                      a->idx, fns[neg ? 1 + s->fpcr_ah : 0][esz - 1]);
     return true;
 }
 
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ DO_SVE2_RRXR_ROT(CDOT_zzxw_d, gen_helper_sve2_cdot_idx_d)
  *** SVE Floating Point Multiply-Add Indexed Group
  */
 
-static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub)
-{
-    static gen_helper_gvec_4_ptr * const fns[4] = {
-        NULL,
-        gen_helper_gvec_fmla_idx_h,
-        gen_helper_gvec_fmla_idx_s,
-        gen_helper_gvec_fmla_idx_d,
-    };
-    return gen_gvec_fpst_zzzz(s, fns[a->esz], a->rd, a->rn, a->rm, a->ra,
-                              (a->index << 1) | sub,
-                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
-}
+static gen_helper_gvec_4_ptr * const fmla_idx_fns[4] = {
+    NULL,                       gen_helper_gvec_fmla_idx_h,
+    gen_helper_gvec_fmla_idx_s, gen_helper_gvec_fmla_idx_d
+};
+TRANS_FEAT(FMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
+           fmla_idx_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->index,
+           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
-TRANS_FEAT(FMLA_zzxz, aa64_sve, do_FMLA_zzxz, a, false)
-TRANS_FEAT(FMLS_zzxz, aa64_sve, do_FMLA_zzxz, a, true)
+static gen_helper_gvec_4_ptr * const fmls_idx_fns[4][2] = {
+    { NULL, NULL },
+    { gen_helper_gvec_fmls_idx_h, gen_helper_gvec_ah_fmls_idx_h },
+    { gen_helper_gvec_fmls_idx_s, gen_helper_gvec_ah_fmls_idx_s },
+    { gen_helper_gvec_fmls_idx_d, gen_helper_gvec_ah_fmls_idx_d },
+};
+TRANS_FEAT(FMLS_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
+           fmls_idx_fns[a->esz][s->fpcr_ah],
+           a->rd, a->rn, a->rm, a->ra, a->index,
+           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 /*
  *** SVE Floating Point Multiply Indexed Group
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_FMUL_IDX(gvec_fmls_nf_idx_s, float32_sub, float32_mul, float32, H4)
 
 #undef DO_FMUL_IDX
 
-#define DO_FMLA_IDX(NAME, TYPE, H)                                         \
+#define DO_FMLA_IDX(NAME, TYPE, H, NEGX, NEGF)                             \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
                   float_status *stat, uint32_t desc)                       \
 {                                                                          \
     intptr_t i, j, oprsz = simd_oprsz(desc);                               \
     intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
-    TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
-    intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
+    intptr_t idx = simd_data(desc);                                        \
     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
-    op1_neg <<= (8 * sizeof(TYPE) - 1);                                    \
     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
         TYPE mm = m[H(i + idx)];                                           \
         for (j = 0; j < segment; j++) {                                    \
-            d[i + j] = TYPE##_muladd(n[i + j] ^ op1_neg,                   \
-                                     mm, a[i + j], 0, stat);               \
+            d[i + j] = TYPE##_muladd(n[i + j] ^ NEGX, mm,                  \
+                                     a[i + j], NEGF, stat);                \
         }                                                                  \
     }                                                                      \
     clear_tail(d, oprsz, simd_maxsz(desc));                                \
 }
 
-DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2)
-DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
-DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8)
+DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2, 0, 0)
+DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4, 0, 0)
+DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8, 0, 0)
+
+DO_FMLA_IDX(gvec_fmls_idx_h, float16, H2, INT16_MIN, 0)
+DO_FMLA_IDX(gvec_fmls_idx_s, float32, H4, INT32_MIN, 0)
+DO_FMLA_IDX(gvec_fmls_idx_d, float64, H8, INT64_MIN, 0)
+
+DO_FMLA_IDX(gvec_ah_fmls_idx_h, float16, H2, 0, float_muladd_negate_product)
+DO_FMLA_IDX(gvec_ah_fmls_idx_s, float32, H4, 0, float_muladd_negate_product)
+DO_FMLA_IDX(gvec_ah_fmls_idx_d, float64, H8, 0, float_muladd_negate_product)
 
 #undef DO_FMLA_IDX
 
-- 
2.34.1

Handle the FPCR.AH "don't negate the sign of a NaN" semantics
in FMLS (vector), by implementing a new set of helpers for
the AH=1 case.

The float_muladd_negate_product flag produces the same result
as negating either of the multiplication operands, assuming
neither of the operands is a NaN.  But since FEAT_AFP does not
negate NaNs, this behaviour is exactly what we need.
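
A toy model may make the difference concrete. The sketch below is not
softfloat: toy_muladd, TOY_NEGATE_PRODUCT and the fmls_ wrappers are
hypothetical, the arithmetic is unfused host float arithmetic, and NaN
handling is simplified to "return the first NaN input unchanged". The
only point it illustrates is that an XOR applied before the call
changes the sign of a NaN input, whereas a negation applied to the
product after the NaN check does not.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag, modelled loosely on float_muladd_negate_product. */
#define TOY_NEGATE_PRODUCT 1

static bool toy_is_nan(uint32_t a)
{
    return (a & 0x7fffffffu) > 0x7f800000u;
}

static float toy_to_float(uint32_t a)
{
    float f;
    memcpy(&f, &a, sizeof(f));
    return f;
}

static uint32_t toy_to_bits(float f)
{
    uint32_t a;
    memcpy(&a, &f, sizeof(a));
    return a;
}

/* Toy n * m + a on raw bits; NaN inputs are detected before any negation. */
static uint32_t toy_muladd(uint32_t n, uint32_t m, uint32_t a, int flags)
{
    if (toy_is_nan(n)) {
        return n;
    }
    if (toy_is_nan(m)) {
        return m;
    }
    if (toy_is_nan(a)) {
        return a;
    }
    float prod = toy_to_float(n) * toy_to_float(m);
    if (flags & TOY_NEGATE_PRODUCT) {
        prod = -prod;
    }
    return toy_to_bits(prod + toy_to_float(a));
}

/* AH=0 style: flip the sign bit of the first operand up front. */
static uint32_t fmls_xor(uint32_t n, uint32_t m, uint32_t a)
{
    return toy_muladd(n ^ 0x80000000u, m, a, 0);
}

/* AH=1 style: leave the operand alone and negate the product instead. */
static uint32_t fmls_ah(uint32_t n, uint32_t m, uint32_t a)
{
    return toy_muladd(n, m, a, TOY_NEGATE_PRODUCT);
}
```

For n = 2.0, m = 3.0, a = 10.0 both wrappers return 4.0. For a NaN in
n, the XOR variant hands back the NaN with its sign bit flipped, while
the flag variant returns it untouched.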

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    | 22 ++++++++++++++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)

Handle the FPCR.AH "don't negate the sign of a NaN" semantics for the
SVE FMLS (vector) insns, by providing new helpers for the AH=1 case
which end up passing fpcr_ah = true to the do_fmla_zpzzz_* functions
that do the work.

The float*_muladd functions have a flags argument that can
perform optional negation of various operands.  We don't use
that for "normal" arm fmla, because the muladd flags are not
applied when an input is a NaN.  But since FEAT_AFP does not
negate NaNs, this behaviour is exactly what we need.

The non-AH helpers pass in a zero flags argument and control the
negation via the neg1 and neg3 arguments; the AH helpers always pass
in neg1 and neg3 as zero and control the negation via the flags
argument.  This allows us to avoid conditional branches within the
inner loop.
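
The shape of that arrangement can be sketched with a toy FNMLA. The
names below (sk_muladd, sk_fmla_loop, sk_fnmla_one, SK_NEG_PRODUCT,
SK_NEG_C) are hypothetical. The arithmetic is unfused host float
arithmetic with simplified NaN handling, and predication is omitted, so
this is an assumption-laden model rather than the real helper. What it
shows is that the loop body is identical for both modes and only the
three parameters differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flags, modelled loosely on the float_muladd_negate_* flags. */
enum { SK_NEG_PRODUCT = 1, SK_NEG_C = 2 };

#define SK_SIGN 0x80000000u

static bool sk_is_nan(uint32_t a)
{
    return (a & 0x7fffffffu) > 0x7f800000u;
}

static float sk_f(uint32_t a)
{
    float f;
    memcpy(&f, &a, sizeof(f));
    return f;
}

static uint32_t sk_bits(float f)
{
    uint32_t a;
    memcpy(&a, &f, sizeof(a));
    return a;
}

/* Toy muladd: NaN inputs are returned as-is, before any flag is applied. */
static uint32_t sk_muladd(uint32_t e1, uint32_t e2, uint32_t e3, int flags)
{
    if (sk_is_nan(e1)) {
        return e1;
    }
    if (sk_is_nan(e2)) {
        return e2;
    }
    if (sk_is_nan(e3)) {
        return e3;
    }
    float prod = sk_f(e1) * sk_f(e2);
    float c = sk_f(e3);
    if (flags & SK_NEG_PRODUCT) {
        prod = -prod;
    }
    if (flags & SK_NEG_C) {
        c = -c;
    }
    return sk_bits(prod + c);
}

/* One loop body for every variant; it never tests the AH mode itself. */
static void sk_fmla_loop(uint32_t *d, const uint32_t *n, const uint32_t *m,
                         const uint32_t *a, size_t len,
                         uint32_t neg1, uint32_t neg3, int flags)
{
    for (size_t i = 0; i < len; i++) {
        d[i] = sk_muladd(n[i] ^ neg1, m[i], a[i] ^ neg3, flags);
    }
}

/* FNMLA on a single lane: XOR masks for AH=0, muladd flags for AH=1. */
static uint32_t sk_fnmla_one(bool ah, uint32_t n, uint32_t m, uint32_t a)
{
    uint32_t d;
    if (ah) {
        sk_fmla_loop(&d, &n, &m, &a, 1, 0, 0, SK_NEG_PRODUCT | SK_NEG_C);
    } else {
        sk_fmla_loop(&d, &n, &m, &a, 1, SK_SIGN, SK_SIGN, 0);
    }
    return d;
}
```

With n = 2.0, m = 3.0, a = 10.0 both modes give -16.0. With a NaN in n
the AH=0 path returns the NaN with its sign flipped, and the AH=1 path
returns it unchanged.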

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 21 ++++++++
 target/arm/tcg/sve_helper.c    | 99 +++++++++++++++++++++++++++-------
 target/arm/tcg/translate-sve.c | 18 ++++---
 3 files changed, 114 insertions(+), 24 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
 
 static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint16_t neg1, uint16_t neg3)
+                            uint16_t neg1, uint16_t neg3, int flags)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                 e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
                 e2 = *(uint16_t *)(vm + H1_2(i));
                 e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
-                r = float16_muladd(e1, e2, e3, 0, status);
+                r = float16_muladd(e1, e2, e3, flags, status);
                 *(uint16_t *)(vd + H1_2(i)) = r;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
 }
 
 void HELPER(sve_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0, 0);
 }
 
 void HELPER(sve_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000, 0);
 }
 
 void HELPER(sve_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_c);
 }
 
 static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint32_t neg1, uint32_t neg3)
+                            uint32_t neg1, uint32_t neg3, int flags)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                 e1 = *(uint32_t *)(vn + H1_4(i)) ^ neg1;
                 e2 = *(uint32_t *)(vm + H1_4(i));
                 e3 = *(uint32_t *)(va + H1_4(i)) ^ neg3;
-                r = float32_muladd(e1, e2, e3, 0, status);
+                r = float32_muladd(e1, e2, e3, flags, status);
                 *(uint32_t *)(vd + H1_4(i)) = r;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
 }
 
 void HELPER(sve_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0, 0);
 }
 
 void HELPER(sve_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000, 0);
 }
 
 void HELPER(sve_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_c);
 }
 
 static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint64_t neg1, uint64_t neg3)
+                            uint64_t neg1, uint64_t neg3, int flags)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                 e1 = *(uint64_t *)(vn + i) ^ neg1;
                 e2 = *(uint64_t *)(vm + i);
                 e3 = *(uint64_t *)(va + i) ^ neg3;
-                r = float64_muladd(e1, e2, e3, 0, status);
+                r = float64_muladd(e1, e2, e3, flags, status);
                 *(uint64_t *)(vd + i) = r;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
 }
 
 void HELPER(sve_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0, 0);
 }
 
 void HELPER(sve_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN, 0);
 }
 
 void HELPER(sve_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_c);
 }
 
 /* Two operand floating-point comparison controlled by a predicate.
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
            a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
-#define DO_FMLA(NAME, name) \
+#define DO_FMLA(NAME, name, ah_name)                                    \
     static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
         NULL, gen_helper_sve_##name##_h,                                \
         gen_helper_sve_##name##_s, gen_helper_sve_##name##_d            \
     };                                                                  \
-    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp, name##_fns[a->esz], \
+    static gen_helper_gvec_5_ptr * const name##_ah_fns[4] = {           \
+        NULL, gen_helper_sve_##ah_name##_h,                             \
+        gen_helper_sve_##ah_name##_s, gen_helper_sve_##ah_name##_d      \
+    };                                                                  \
+    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp,                     \
+               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], \
                a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
                a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
-DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
-DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
-DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz)
-DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz)
+/* We don't need an ah_fmla_zpzzz because fmla doesn't negate anything */
+DO_FMLA(FMLA_zpzzz, fmla_zpzzz, fmla_zpzzz)
+DO_FMLA(FMLS_zpzzz, fmls_zpzzz, ah_fmls_zpzzz)
+DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz, ah_fnmla_zpzzz)
+DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz, ah_fnmls_zpzzz)
 
 #undef DO_FMLA
 
-- 
2.34.1

The negation step in the SVE FTSSEL insn mustn't negate a NaN when
FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
and use that to determine whether to do the negation.
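
As a rough illustration of the rule (a standalone sketch, not the QEMU
helper; the function names are hypothetical and it works on raw binary32
bit patterns rather than softfloat types), the difference between an
unconditional sign flip and one that leaves NaNs alone looks like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the IEEE binary32 bit pattern encodes a NaN. */
static bool sketch_f32_is_nan(uint32_t bits)
{
    /* NaN: exponent field all ones and a non-zero fraction field. */
    return (bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0;
}

/*
 * Hypothetical helper: negate a binary32 bit pattern.
 * With ah == false the sign bit is always flipped (the AH=0 behaviour);
 * with ah == true a NaN is returned unchanged (the AH=1 behaviour).
 */
static uint32_t sketch_f32_negate(uint32_t bits, bool ah)
{
    if (ah && sketch_f32_is_nan(bits)) {
        return bits;
    }
    return bits ^ 0x80000000u;
}
```

Ordinary values and infinities are negated either way; only the NaN
case differs between the two settings.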

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 18 +++++++++++++++---
 target/arm/tcg/translate-sve.c |  4 ++--
 2 files changed, 17 insertions(+), 5 deletions(-)

The negation step in the SVE FTMAD insn mustn't negate a NaN when
FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field,
so we can select the correct behaviour.

Because the operand is known to be negative, negating the operand
is the same as taking the absolute value.  Defer this to the muladd
operation via flags, so that it happens after NaN detection, which
is correct for FPCR.AH.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++++--------
 target/arm/tcg/translate-sve.c |  3 ++-
 2 files changed, 35 insertions(+), 10 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

The negation step in FCMLA mustn't negate a NaN when FPCR.AH
is set. Handle this by passing FPCR.AH to the helper via the
SIMD data field, and use this to select whether to do the
negation via XOR or via the muladd negate_product flag.
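
The selection described above can be sketched in isolation as follows.
This is an illustrative model for the single-precision case only, not
the helper itself: the struct, the function name and the value of
SKETCH_NEGATE_PRODUCT are assumptions made for the example, and the
data layout (bit 0 = flip, bit 1 = negate imaginary, bit 2 = FPCR.AH)
mirrors what the patch below passes in the SIMD data field.

```c
#include <stdint.h>

/* Stand-in for float_muladd_negate_product; the value here is arbitrary. */
#define SKETCH_NEGATE_PRODUCT 2

struct sketch_fcmla_neg {
    uint32_t flip;       /* swap real/imaginary inputs (data bit 0) */
    uint32_t negx_real;  /* XOR mask for the real-lane operand (AH=0) */
    uint32_t negx_imag;  /* XOR mask for the imaginary-lane operand (AH=0) */
    int negf_real;       /* muladd flags for the real lane (AH=1) */
    int negf_imag;       /* muladd flags for the imaginary lane (AH=1) */
};

/* Decode "rot | (ah << 2)" into either XOR masks or muladd flags. */
static struct sketch_fcmla_neg sketch_fcmla_decode(uint32_t data)
{
    struct sketch_fcmla_neg r;
    uint32_t flip = data & 1;
    uint32_t neg_imag = (data >> 1) & 1;
    uint32_t ah = (data >> 2) & 1;
    uint32_t neg_real = flip ^ neg_imag;

    r.flip = flip;
    /* With AH=0 the sign bit is flipped up front, NaN or not. */
    r.negx_real = (neg_real & ~ah) << 31;
    r.negx_imag = (neg_imag & ~ah) << 31;
    /* With AH=1 the negation is left to the fused multiply-add. */
    r.negf_real = (neg_real & ah) ? SKETCH_NEGATE_PRODUCT : 0;
    r.negf_imag = (neg_imag & ah) ? SKETCH_NEGATE_PRODUCT : 0;
    return r;
}
```

For any input, at most one of the XOR mask and the muladd flag is
non-zero per lane, so the operand is never negated twice.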

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-26-richard.henderson@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c |  2 +-
 target/arm/tcg/vec_helper.c    | 66 ++++++++++++++++++++--------------
 2 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                       a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-                      a->rot, fn[a->esz]);
+                      a->rot | (s->fpcr_ah << 2), fn[a->esz]);
     return true;
 }
 
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float16 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float16 negx_imag, negx_real;
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 15;
-    neg_imag <<= 15;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 15;
+    negx_imag = (negf_imag & ~fpcr_ah) << 15;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < opr_sz / 2; i += 2) {
         float16 e2 = n[H2(i + flip)];
-        float16 e1 = m[H2(i + flip)] ^ neg_real;
+        float16 e1 = m[H2(i + flip)] ^ negx_real;
         float16 e4 = e2;
-        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
+        float16 e3 = m[H2(i + 1 - flip)] ^ negx_imag;
 
-        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], 0, fpst);
-        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], 0, fpst);
+        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], negf_real, fpst);
+        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], negf_imag, fpst);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float32 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float32 negx_imag, negx_real;
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 31;
-    neg_imag <<= 31;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 31;
+    negx_imag = (negf_imag & ~fpcr_ah) << 31;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < opr_sz / 4; i += 2) {
         float32 e2 = n[H4(i + flip)];
-        float32 e1 = m[H4(i + flip)] ^ neg_real;
+        float32 e1 = m[H4(i + flip)] ^ negx_real;
         float32 e4 = e2;
-        float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
+        float32 e3 = m[H4(i + 1 - flip)] ^ negx_imag;
 
-        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], 0, fpst);
-        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], 0, fpst);
+        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], negf_real, fpst);
+        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], negf_imag, fpst);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float64 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-    uint64_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float64 negx_real, negx_imag;
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 63;
-    neg_imag <<= 63;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
+    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < opr_sz / 8; i += 2) {
         float64 e2 = n[i + flip];
-        float64 e1 = m[i + flip] ^ neg_real;
+        float64 e1 = m[i + flip] ^ negx_real;
         float64 e4 = e2;
-        float64 e3 = m[i + 1 - flip] ^ neg_imag;
+        float64 e3 = m[i + 1 - flip] ^ negx_imag;
 
-        d[i] = float64_muladd(e2, e1, a[i], 0, fpst);
-        d[i + 1] = float64_muladd(e4, e3, a[i + 1], 0, fpst);
+        d[i] = float64_muladd(e2, e1, a[i], negf_real, fpst);
+        d[i + 1] = float64_muladd(e4, e3, a[i + 1], negf_imag, fpst);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

The negation step in FCMLA by index mustn't negate a NaN when
FPCR.AH is set. Use the same approach as vector FCMLA of
passing in FPCR.AH and using it to select whether to negate
by XOR or by the muladd negate_product flag.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-27-richard.henderson@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c |  2 +-
 target/arm/tcg/vec_helper.c    | 44 ++++++++++++++++++++--------------
 2 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
     if (fp_access_check(s)) {
         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-                          (a->idx << 2) | a->rot, fn);
+                          (s->fpcr_ah << 4) | (a->idx << 2) | a->rot, fn);
     }
     return true;
 }
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float16 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
+    uint32_t negf_real = flip ^ negf_imag;
     intptr_t elements = opr_sz / sizeof(float16);
     intptr_t eltspersegment = MIN(16 / sizeof(float16), elements);
+    float16 negx_imag, negx_real;
     intptr_t i, j;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 15;
-    neg_imag <<= 15;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 15;
+    negx_imag = (negf_imag & ~fpcr_ah) << 15;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < elements; i += eltspersegment) {
         float16 mr = m[H2(i + 2 * index + 0)];
         float16 mi = m[H2(i + 2 * index + 1)];
-        float16 e1 = neg_real ^ (flip ? mi : mr);
-        float16 e3 = neg_imag ^ (flip ? mr : mi);
+        float16 e1 = negx_real ^ (flip ? mi : mr);
+        float16 e3 = negx_imag ^ (flip ? mr : mi);
 
         for (j = i; j < i + eltspersegment; j += 2) {
             float16 e2 = n[H2(j + flip)];
             float16 e4 = e2;
 
-            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], 0, fpst);
-            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], 0, fpst);
+            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], negf_real, fpst);
+            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], negf_imag, fpst);
         }
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float32 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
+    uint32_t negf_real = flip ^ negf_imag;
     intptr_t elements = opr_sz / sizeof(float32);
     intptr_t eltspersegment = MIN(16 / sizeof(float32), elements);
+    float32 negx_imag, negx_real;
     intptr_t i, j;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 31;
-    neg_imag <<= 31;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 31;
+    negx_imag = (negf_imag & ~fpcr_ah) << 31;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < elements; i += eltspersegment) {
         float32 mr = m[H4(i + 2 * index + 0)];
         float32 mi = m[H4(i + 2 * index + 1)];
-        float32 e1 = neg_real ^ (flip ? mi : mr);
-        float32 e3 = neg_imag ^ (flip ? mr : mi);
+        float32 e1 = negx_real ^ (flip ? mi : mr);
+        float32 e3 = negx_imag ^ (flip ? mr : mi);
 
         for (j = i; j < i + eltspersegment; j += 2) {
             float32 e2 = n[H4(j + flip)];
             float32 e4 = e2;
 
-            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], 0, fpst);
-            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], 0, fpst);
+            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], negf_real, fpst);
+            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], negf_imag, fpst);
         }
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

The negation step in SVE FCMLA mustn't negate a NaN when FPCR.AH is
set.  Use the same approach as we did for A64 FCMLA of passing in
FPCR.AH and using it to select whether to negate by XOR or by the
muladd negate_product flag.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-28-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 69 +++++++++++++++++++++-------------
 target/arm/tcg/translate-sve.c |  2 +-
 2 files changed, 43 insertions(+), 28 deletions(-)

diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
     intptr_t j, i = simd_oprsz(desc);
-    unsigned rot = simd_data(desc);
-    bool flip = rot & 1;
-    float16 neg_imag, neg_real;
+    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float16 negx_imag, negx_real;
     uint64_t *g = vg;
 
-    neg_imag = float16_set_sign(0, (rot & 2) != 0);
-    neg_real = float16_set_sign(0, rot == 1 || rot == 2);
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 15;
+    negx_imag = (negf_imag & ~fpcr_ah) << 15;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
             mi = *(float16 *)(vm + H1_2(j));
 
             e2 = (flip ? ni : nr);
-            e1 = (flip ? mi : mr) ^ neg_real;
+            e1 = (flip ? mi : mr) ^ negx_real;
             e4 = e2;
-            e3 = (flip ? mr : mi) ^ neg_imag;
+            e3 = (flip ? mr : mi) ^ negx_imag;
 
             if (likely((pg >> (i & 63)) & 1)) {
                 d = *(float16 *)(va + H1_2(i));
-                d = float16_muladd(e2, e1, d, 0, status);
+                d = float16_muladd(e2, e1, d, negf_real, status);
                 *(float16 *)(vd + H1_2(i)) = d;
             }
             if (likely((pg >> (j & 63)) & 1)) {
                 d = *(float16 *)(va + H1_2(j));
-                d = float16_muladd(e4, e3, d, 0, status);
+                d = float16_muladd(e4, e3, d, negf_imag, status);
                 *(float16 *)(vd + H1_2(j)) = d;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
     intptr_t j, i = simd_oprsz(desc);
-    unsigned rot = simd_data(desc);
-    bool flip = rot & 1;
-    float32 neg_imag, neg_real;
+    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float32 negx_imag, negx_real;
     uint64_t *g = vg;
 
-    neg_imag = float32_set_sign(0, (rot & 2) != 0);
-    neg_real = float32_set_sign(0, rot == 1 || rot == 2);
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 31;
+    negx_imag = (negf_imag & ~fpcr_ah) << 31;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
             mi = *(float32 *)(vm + H1_2(j));
 
             e2 = (flip ? ni : nr);
-            e1 = (flip ? mi : mr) ^ neg_real;
+            e1 = (flip ? mi : mr) ^ negx_real;
             e4 = e2;
-            e3 = (flip ? mr : mi) ^ neg_imag;
+            e3 = (flip ? mr : mi) ^ negx_imag;
 
             if (likely((pg >> (i & 63)) & 1)) {
                 d = *(float32 *)(va + H1_2(i));
-                d = float32_muladd(e2, e1, d, 0, status);
+                d = float32_muladd(e2, e1, d, negf_real, status);
                 *(float32 *)(vd + H1_2(i)) = d;
             }
             if (likely((pg >> (j & 63)) & 1)) {
                 d = *(float32 *)(va + H1_2(j));
-                d = float32_muladd(e4, e3, d, 0, status);
+                d = float32_muladd(e4, e3, d, negf_imag, status);
                 *(float32 *)(vd + H1_2(j)) = d;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
     intptr_t j, i = simd_oprsz(desc);
-    unsigned rot = simd_data(desc);
-    bool flip = rot & 1;
-    float64 neg_imag, neg_real;
+    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float64 negx_imag, negx_real;
     uint64_t *g = vg;
 
-    neg_imag = float64_set_sign(0, (rot & 2) != 0);
-    neg_real = float64_set_sign(0, rot == 1 || rot == 2);
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
+    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
             mi = *(float64 *)(vm + H1_2(j));
 
             e2 = (flip ? ni : nr);
-            e1 = (flip ? mi : mr) ^ neg_real;
+            e1 = (flip ? mi : mr) ^ negx_real;
             e4 = e2;
-            e3 = (flip ? mr : mi) ^ neg_imag;
+            e3 = (flip ? mr : mi) ^ negx_imag;
 
             if (likely((pg >> (i & 63)) & 1)) {
                 d = *(float64 *)(va + H1_2(i));
-                d = float64_muladd(e2, e1, d, 0, status);
+                d = float64_muladd(e2, e1, d, negf_real, status);
                 *(float64 *)(vd + H1_2(i)) = d;
             }
             if (likely((pg >> (j & 63)) & 1)) {
                 d = *(float64 *)(va + H1_2(j));
-                d = float64_muladd(e4, e3, d, 0, status);
+                d = float64_muladd(e4, e3, d, negf_imag, status);
                 *(float64 *)(vd + H1_2(j)) = d;
             }
         } while (i & 63);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_5_ptr * const fcmla_fns[4] = {
     gen_helper_sve_fcmla_zpzzz_s, gen_helper_sve_fcmla_zpzzz_d,
 };
 TRANS_FEAT(FCMLA_zpzzz, aa64_sve, gen_gvec_fpst_zzzzp, fcmla_fns[a->esz],
-           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot,
+           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot | (s->fpcr_ah << 2),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 static gen_helper_gvec_4_ptr * const fcmla_idx_fns[4] = {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle FPCR.AH's requirement to not negate the sign of a NaN
in FMLSL by element and vector, using the usual trick of
negating by XOR when AH=0 and by muladd flags when AH=1.

Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need to pass in the
FPCR.AH value via the SIMD data word.
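
To show why the XOR form is only suitable for AH=0, here is a small
standalone sketch of negating four packed half-precision inputs at
once. The function name is hypothetical; the mask is the one used in
the patch below. Note that the NaN lane has its sign flipped along
with the others, which is exactly what AH=1 forbids.

```c
#include <stdint.h>

/* Sign bit of each of four packed IEEE binary16 lanes. */
#define SKETCH_F16X4_SIGN_MASK 0x8000800080008000ull

/*
 * Hypothetical helper: negate four packed binary16 values by flipping
 * their sign bits. Every lane is affected, including NaNs.
 */
static uint64_t sketch_negate_f16x4(uint64_t n_4)
{
    return n_4 ^ SKETCH_F16X4_SIGN_MASK;
}
```

With lanes (from low to high) holding 1.0 (0x3c00), a quiet NaN
(0x7e00), -2.0 (0xc000) and +0 (0x0000), the result holds -1.0, a
NaN with the sign bit set, 2.0 and -0.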

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-31-richard.henderson@linaro.org
[PMM: commit message tweaked]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 71 ++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 25 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
  */
 
 static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
-                     uint32_t desc, bool fz16)
+                     uint64_t negx, int negf, uint32_t desc, bool fz16)
 {
     intptr_t i, oprsz = simd_oprsz(desc);
-    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     int is_q = oprsz == 16;
     uint64_t n_4, m_4;
 
-    /* Pre-load all of the f16 data, avoiding overlap issues.  */
-    n_4 = load4_f16(vn, is_q, is_2);
+    /*
+     * Pre-load all of the f16 data, avoiding overlap issues.
+     * Negate all inputs for AH=0 FMLSL at once.
+     */
+    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
     m_4 = load4_f16(vm, is_q, is_2);
 
-    /* Negate all inputs for FMLSL at once.  */
-    if (is_s) {
-        n_4 ^= 0x8000800080008000ull;
-    }
-
     for (i = 0; i < oprsz / 4; i++) {
         float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
         float32 m_1 = float16_to_float32_by_bits(m_4 >> (i * 16), fz16);
-        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
+        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
     }
     clear_tail(d, oprsz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
 void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
                             CPUARMState *env, uint32_t desc)
 {
-    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
+
+    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
 void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
                             CPUARMState *env, uint32_t desc)
 {
-    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = 0;
+    int negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000800080008000ull;
+        }
+    }
+    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
 }
 
 static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
-                         uint32_t desc, bool fz16)
+                         uint64_t negx, int negf, uint32_t desc, bool fz16)
 {
     intptr_t i, oprsz = simd_oprsz(desc);
-    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
     int is_q = oprsz == 16;
     uint64_t n_4;
     float32 m_1;
 
-    /* Pre-load all of the f16 data, avoiding overlap issues.  */
-    n_4 = load4_f16(vn, is_q, is_2);
-
-    /* Negate all inputs for FMLSL at once.  */
-    if (is_s) {
-        n_4 ^= 0x8000800080008000ull;
-    }
-
+    /*
+     * Pre-load all of the f16 data, avoiding overlap issues.
+     * Negate all inputs for AH=0 FMLSL at once.
+     */
+    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
     m_1 = float16_to_float32_by_bits(((float16 *)vm)[H2(index)], fz16);
 
     for (i = 0; i < oprsz / 4; i++) {
         float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
-        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
+        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
     }
     clear_tail(d, oprsz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
 void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
                                 CPUARMState *env, uint32_t desc)
 {
-    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
+
+    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
 void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
                                 CPUARMState *env, uint32_t desc)
 {
-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = 0;
+    int negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000800080008000ull;
+        }
+    }
+    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 }
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
FMLSL (indexed), using the usual trick of negating by XOR when AH=0
and by muladd flags when AH=1.

Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need to pass in the
FPCR.AH value via the SIMD data word.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-32-richard.henderson@linaro.org
[PMM: commit message tweaked]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
                                CPUARMState *env, uint32_t desc)
 {
     intptr_t i, j, oprsz = simd_oprsz(desc);
-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
     float_status *status = &env->vfp.fp_status_a64;
     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
+    int negx = 0, negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000;
+        }
+    }
 
     for (i = 0; i < oprsz; i += 16) {
         float16 mm_16 = *(float16 *)(vm + i + idx);
         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
 
         for (j = 0; j < 16; j += sizeof(float32)) {
-            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negn;
+            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negx;
             float32 nn = float16_to_float32_by_bits(nn_16, fz16);
             float32 aa = *(float32 *)(va + H1_4(i + j));
 
             *(float32 *)(vd + H1_4(i + j)) =
-                float32_muladd(nn, mm, aa, 0, status);
+                float32_muladd(nn, mm, aa, negf, status);
         }
     }
 }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
FMLSL (vectors), using the usual trick of negating by XOR when AH=0
and by muladd flags when AH=1.

Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need to pass in the
FPCR.AH value via the SIMD data word.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-33-richard.henderson@linaro.org
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
                                CPUARMState *env, uint32_t desc)
 {
     intptr_t i, oprsz = simd_oprsz(desc);
-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     float_status *status = &env->vfp.fp_status_a64;
     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
+    int negx = 0, negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000;
+        }
+    }
 
     for (i = 0; i < oprsz; i += sizeof(float32)) {
-        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negn;
+        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negx;
         float16 mm_16 = *(float16 *)(vm + H1_2(i + sel));
         float32 nn = float16_to_float32_by_bits(nn_16, fz16);
         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
         float32 aa = *(float32 *)(va + H1_4(i));
 
-        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, 0, status);
+        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, negf, status);
     }
 }
 
-- 
2.34.1

Now that we have completed the handling for FPCR.{AH,FIZ,NEP}, we
can enable FEAT_AFP for '-cpu max', and document that we support it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/tcg/cpu64.c        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_AA64EL3 (Support for AArch64 at EL3)
 - FEAT_AdvSIMD (Advanced SIMD Extension)
 - FEAT_AES (AESD and AESE instructions)
+- FEAT_AFP (Alternate floating-point behavior)
 - FEAT_Armv9_Crypto (Armv9 Cryptographic Extension)
 - FEAT_ASID16 (16 bit ASID)
 - FEAT_BBM at level 2 (Translation table break-before-make levels)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1);      /* FEAT_XNX */
     t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2);      /* FEAT_ETS2 */
     t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);      /* FEAT_HCX */
+    t = FIELD_DP64(t, ID_AA64MMFR1, AFP, 1);      /* FEAT_AFP */
     t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1);   /* FEAT_TIDCP1 */
     t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1);     /* FEAT_CMOW */
     cpu->isar.id_aa64mmfr1 = t;
-- 
2.34.1

FEAT_RPRES implements an "increased precision" variant of the single
precision FRECPE and FRSQRTE instructions, which raises the precision
of the estimate from an 8 bit to a 12 bit mantissa. This applies only
when FPCR.AH == 1. Note that the halfprec and double versions of these
insns retain the 8 bit precision regardless.

In this commit we add all the plumbing to make these instructions
call a new helper function when increased precision is in
effect. In the following commit we will provide the actual change
in behaviour in the helpers.
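
As a rough illustration of the plumbing (not the QEMU code itself),
the choice between the base and increased-precision helper tables can
be sketched in standalone C. The helper_id enum, the fp_table struct
and select_table() are hypothetical stand-ins for the FPScalar1 tables
and for the "s->fpcr_ah && dc_isar_feature(aa64_rpres, s)" test used
in the translator:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tags standing in for the generated helper functions. */
enum helper_id { RECPE_F16, RECPE_F32, RECPE_RPRES_F32, RECPE_F64 };

/* Hypothetical analogue of FPScalar1: one helper per element size. */
struct fp_table {
    enum helper_id f16;
    enum helper_id f32;
    enum helper_id f64;
};

static const struct fp_table table_base = {
    RECPE_F16, RECPE_F32, RECPE_F64,
};

/* Only the single precision slot differs in the RPRES table. */
static const struct fp_table table_rpres = {
    RECPE_F16, RECPE_RPRES_F32, RECPE_F64,
};

/* Pick the RPRES table only when FPCR.AH is set and the CPU has FEAT_RPRES. */
const struct fp_table *select_table(bool fpcr_ah, bool have_rpres)
{
    return (fpcr_ah && have_rpres) ? &table_rpres : &table_base;
}

/* Small usage example: check the three interesting combinations. */
void check_selection(void)
{
    assert(select_table(true, true)->f32 == RECPE_RPRES_F32);
    assert(select_table(true, false)->f32 == RECPE_F32);
    assert(select_table(false, true)->f32 == RECPE_F32);
}
```

Only the single precision slot differs between the two tables, which
matches the note above that the halfprec and double versions keep the
8 bit precision.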

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu-features.h      |  5 +++++
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c | 34 ++++++++++++++++++++++++++++++----
 target/arm/tcg/translate-sve.c | 16 ++++++++++++++--
 target/arm/tcg/vec_helper.c    |  2 ++
 target/arm/vfp_helper.c        | 32 ++++++++++++++++++++++++++++++--
 6 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_mops(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, MOPS);
 }
 
+static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, RPRES);
+}
+
 static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
 {
     /* We always set the AdvSIMD and FP fields identically.  */
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, fpst)
 
 DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
 DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+DEF_HELPER_FLAGS_2(recpe_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+DEF_HELPER_FLAGS_2(rsqrte_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_1(recpe_u32, TCG_CALL_NO_RWG, i32, i32)
 DEF_HELPER_FLAGS_1(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vrintx_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(gvec_frecpe_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(gvec_frsqrte_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
     gen_helper_recpe_f32,
     gen_helper_recpe_f64,
 };
-TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
+static const FPScalar1 f_scalar_frecpe_rpres = {
+    gen_helper_recpe_f16,
+    gen_helper_recpe_rpres_f32,
+    gen_helper_recpe_f64,
+};
+TRANS(FRECPE_s, do_fp1_scalar_ah, a,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+      &f_scalar_frecpe_rpres : &f_scalar_frecpe, -1)
 
 static const FPScalar1 f_scalar_frecpx = {
     gen_helper_frecpx_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frsqrte = {
     gen_helper_rsqrte_f32,
     gen_helper_rsqrte_f64,
 };
-TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
+static const FPScalar1 f_scalar_frsqrte_rpres = {
+    gen_helper_rsqrte_f16,
+    gen_helper_rsqrte_rpres_f32,
+    gen_helper_rsqrte_f64,
+};
+TRANS(FRSQRTE_s, do_fp1_scalar_ah, a,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+      &f_scalar_frsqrte_rpres : &f_scalar_frsqrte, -1)
 
 static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
 {
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
     gen_helper_gvec_frecpe_s,
     gen_helper_gvec_frecpe_d,
 };
-TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+static gen_helper_gvec_2_ptr * const f_frecpe_rpres[] = {
+    gen_helper_gvec_frecpe_h,
+    gen_helper_gvec_frecpe_rpres_s,
+    gen_helper_gvec_frecpe_d,
+};
+TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frecpe_rpres : f_frecpe)
 
 static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
     gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s,
     gen_helper_gvec_frsqrte_d,
 };
-TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+static gen_helper_gvec_2_ptr * const f_frsqrte_rpres[] = {
+    gen_helper_gvec_frsqrte_h,
+    gen_helper_gvec_frsqrte_rpres_s,
+    gen_helper_gvec_frsqrte_d,
+};
+TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frsqrte_rpres : f_frsqrte)
 
 static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
     NULL,                     gen_helper_gvec_frecpe_h,
     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
 };
-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
+static gen_helper_gvec_2_ptr * const frecpe_rpres_fns[] = {
+    NULL,                           gen_helper_gvec_frecpe_h,
+    gen_helper_gvec_frecpe_rpres_s, gen_helper_gvec_frecpe_d,
+};
+TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
+           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+           frecpe_rpres_fns[a->esz] : frecpe_fns[a->esz], a, 0)
 
 static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
     NULL,                      gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
 };
-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
+static gen_helper_gvec_2_ptr * const frsqrte_rpres_fns[] = {
+    NULL,                            gen_helper_gvec_frsqrte_h,
+    gen_helper_gvec_frsqrte_rpres_s, gen_helper_gvec_frsqrte_d,
+};
+TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
+           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+           frsqrte_rpres_fns[a->esz] : frsqrte_fns[a->esz], a, 0)
 
 /*
  *** SVE Floating Point Compare with Zero Group
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, float_status *stat, uint32_t desc)  \
 
 DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
 DO_2OP(gvec_frecpe_s, helper_recpe_f32, float32)
+DO_2OP(gvec_frecpe_rpres_s, helper_recpe_rpres_f32, float32)
 DO_2OP(gvec_frecpe_d, helper_recpe_f64, float64)
 
 DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
 DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
+DO_2OP(gvec_frsqrte_rpres_s, helper_rsqrte_rpres_f32, float32)
 DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
 
 DO_2OP(gvec_vrintx_h, float16_round_to_int, float16)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
     return make_float16(f16_val);
 }
 
-float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
+/*
+ * FEAT_RPRES means the f32 FRECPE has an "increased precision" variant
+ * which is used when FPCR.AH == 1.
+ */
+static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
 {
     float32 f32 = float32_squash_input_denormal(input, fpst);
     uint32_t f32_val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
     return make_float32(f32_val);
 }
 
+float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
+{
+    return do_recpe_f32(input, fpst, false);
+}
+
+float32 HELPER(recpe_rpres_f32)(float32 input, float_status *fpst)
+{
+    return do_recpe_f32(input, fpst, true);
+}
+
 float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
 {
     float64 f64 = float64_squash_input_denormal(input, fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
     return make_float16(val);
 }
 
-float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
+/*
+ * FEAT_RPRES means the f32 FRSQRTE has an "increased precision" variant
+ * which is used when FPCR.AH == 1.
+ */
+static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
 {
     float32 f32 = float32_squash_input_denormal(input, s);
     uint32_t val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
     return make_float32(val);
 }
 
+float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
+{
+    return do_rsqrte_f32(input, s, false);
+}
+
+float32 HELPER(rsqrte_rpres_f32)(float32 input, float_status *s)
+{
+    return do_rsqrte_f32(input, s, true);
+}
+
 float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
 {
     float64 f64 = float64_squash_input_denormal(input, s);
-- 
2.34.1

Implement the increased precision variation of FRECPE.  In the
pseudocode this corresponds to the handling of the
"increasedprecision" boolean in the FPRecipEstimate() and
RecipEstimate() functions.
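
The integer rounding trick used for the 12-bit estimate can be tried
outside QEMU with a small standalone sketch. The first function is
adapted from recip_estimate_incprec() in this patch; the "_sketch"
suffix and the incprec_estimates_in_range() self-check are
hypothetical additions for illustration only:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Adapted from recip_estimate_incprec() in this patch.
 * input:  2048 .. 4095, representing 0.5 <= x < 1.0
 * result: 4096 .. 8191, representing the estimate of 1/x
 */
int recip_estimate_incprec_sketch(int input)
{
    int a, b;

    assert(2048 <= input && input < 4096);
    a = (input * 2) + 1;
    /* 2 * (2^25 / a), truncated to an integer */
    b = (1 << 26) / a;
    /* "add one and halve" rounds the quotient to nearest */
    return (b + 1) >> 1;
}

/*
 * Hypothetical self-check: every valid input must give a result in
 * 4096 .. 8191, and the estimate must never grow as the input grows.
 */
bool incprec_estimates_in_range(void)
{
    int prev = 8191;

    for (int input = 2048; input < 4096; input++) {
        int r = recip_estimate_incprec_sketch(input);

        if (r < 4096 || r > 8191 || r > prev) {
            return false;
        }
        prev = r;
    }
    return true;
}
```

For the two ends of the input range this gives 8190 for an input of
2048 and 4097 for an input of 4095.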

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vfp_helper.c | 54 +++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
     return r;
 }
 
+/*
+ * Increased precision version:
+ * input is a 13 bit fixed point number
+ * input range 2048 .. 4095 for a number from 0.5 <= x < 1.0.
+ * result range 4096 .. 8191 for a number from 1.0 to 2.0
+ */
+static int recip_estimate_incprec(int input)
+{
+    int a, b, r;
+    assert(2048 <= input && input < 4096);
+    a = (input * 2) + 1;
+    /*
+     * The pseudocode expresses this as an operation on infinite
+     * precision reals where it calculates 2^25 / a and then looks
+     * at the error between that and the rounded-down-to-integer
+     * value to see if it should instead round up. We instead
+     * follow the same approach as the pseudocode for the 8-bit
+     * precision version, and calculate (2 * (2^25 / a)) as an
+     * integer so we can do the "add one and halve" to round it.
+     * So the 1 << 26 here is correct.
+     */
+    b = (1 << 26) / a;
+    r = (b + 1) >> 1;
+    assert(4096 <= r && r < 8192);
+    return r;
+}
+
 /*
  * Common wrapper to call recip_estimate
  *
@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
  * callee.
  */
 
-static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
+static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac,
+                                    bool increasedprecision)
 {
     uint32_t scaled, estimate;
     uint64_t result_frac;
@@ -XXX,XX +XXX,XX @@ static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
         }
     }
 
-    /* scaled = UInt('1':fraction<51:44>) */
-    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-    estimate = recip_estimate(scaled);
+    if (increasedprecision) {
+        /* scaled = UInt('1':fraction<51:41>) */
+        scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
+        estimate = recip_estimate_incprec(scaled);
+    } else {
+        /* scaled = UInt('1':fraction<51:44>) */
+        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        estimate = recip_estimate(scaled);
+    }
 
     result_exp = exp_off - *exp;
-    result_frac = deposit64(0, 44, 8, estimate);
+    if (increasedprecision) {
+        result_frac = deposit64(0, 40, 12, estimate);
+    } else {
+        result_frac = deposit64(0, 44, 8, estimate);
+    }
     if (result_exp == 0) {
         result_frac = deposit64(result_frac >> 1, 51, 1, 1);
     } else if (result_exp == -1) {
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
     }
 
     f64_frac = call_recip_estimate(&f16_exp, 29,
-                                   ((uint64_t) f16_frac) << (52 - 10));
+                                   ((uint64_t) f16_frac) << (52 - 10), false);
 
     /* result = sign : result_exp<4:0> : fraction<51:42> */
     f16_val = deposit32(0, 15, 1, f16_sign);
@@ -XXX,XX +XXX,XX @@ static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
     }
 
     f64_frac = call_recip_estimate(&f32_exp, 253,
-                                   ((uint64_t) f32_frac) << (52 - 23));
+                                   ((uint64_t) f32_frac) << (52 - 23), rpres);
 
     /* result = sign : result_exp<7:0> : fraction<51:29> */
     f32_val = deposit32(0, 31, 1, f32_sign);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
         return float64_set_sign(float64_zero, float64_is_neg(f64));
     }
 
-    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
+    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac, false);
 
     /* result = sign : result_exp<10:0> : fraction<51:0>; */
     f64_val = deposit64(0, 63, 1, f64_sign);
-- 
2.34.1

Implement the increased precision variation of FRSQRTE.  In the
pseudocode this corresponds to the handling of the
"increasedprecision" boolean in the FPRSqrtEstimate() and
RecipSqrtEstimate() functions.
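
The pure integer search used for the 12-bit square root estimate can
be exercised with a standalone sketch. It is adapted from
do_recip_sqrt_estimate_incprec() in this patch; the "_sketch" suffix
and the use of int64_t for all intermediates are choices made here
for illustration, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Adapted from do_recip_sqrt_estimate_incprec() in this patch.
 * a:      1024 .. 4095 (inputs below 2048 come from the
 *         '01':fraction form, the rest from the '1':fraction form)
 * result: 4096 .. 8191
 */
int recip_sqrt_estimate_incprec_sketch(int a)
{
    int64_t scaled, b;

    assert(1024 <= a && a < 4096);
    if (a < 2048) {
        scaled = a * 2 + 1;
    } else {
        scaled = (a >> 1) << 1;     /* clear the low bit */
        scaled = (scaled + 1) * 2;
    }

    /*
     * Linear search for the largest b such that
     * scaled * b * b stays below 2^39, starting from 8192.
     */
    b = 8192;
    while (scaled * (b + 1) * (b + 1) < (INT64_C(1) << 39)) {
        b += 1;
    }

    /* Halve with rounding to get the 12-bit estimate plus leading 1. */
    return (int)((b + 1) / 2);
}
```

The loop runs at most a few thousand iterations per call, which keeps
the sketch simple at the cost of speed. For the ends of the input
range it returns 8190 for a == 1024 and 4097 for a == 4095.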

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vfp_helper.c | 77 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 64 insertions(+), 13 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static int do_recip_sqrt_estimate(int a)
     return estimate;
 }
 
+static int do_recip_sqrt_estimate_incprec(int a)
+{
+    /*
+     * The Arm ARM describes the 12-bit precision version of RecipSqrtEstimate
+     * in terms of an infinite-precision floating point calculation of a
+     * square root. We implement this using the same kind of pure integer
+     * algorithm as the 8-bit mantissa, to get the same bit-for-bit result.
+     */
+    int64_t b, estimate;
 
-static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
+    assert(1024 <= a && a < 4096);
+    if (a < 2048) {
+        a = a * 2 + 1;
+    } else {
+        a = (a >> 1) << 1;
+        a = (a + 1) * 2;
+    }
+    b = 8192;
+    while (a * (b + 1) * (b + 1) < (1ULL << 39)) {
+        b += 1;
+    }
+    estimate = (b + 1) / 2;
+
+    assert(4096 <= estimate && estimate < 8192);
+
+    return estimate;
+}
+
+static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac,
+                                    bool increasedprecision)
 {
     int estimate;
     uint32_t scaled;
@@ -XXX,XX +XXX,XX @@ static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
         frac = extract64(frac, 0, 51) << 1;
     }
 
-    if (*exp & 1) {
-        /* scaled = UInt('01':fraction<51:45>) */
-        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
+    if (increasedprecision) {
+        if (*exp & 1) {
+            /* scaled = UInt('01':fraction<51:42>) */
+            scaled = deposit32(1 << 10, 0, 10, extract64(frac, 42, 10));
+        } else {
+            /* scaled = UInt('1':fraction<51:41>) */
+            scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
+        }
+        estimate = do_recip_sqrt_estimate_incprec(scaled);
     } else {
-        /* scaled = UInt('1':fraction<51:44>) */
-        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        if (*exp & 1) {
+            /* scaled = UInt('01':fraction<51:45>) */
+            scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
+        } else {
+            /* scaled = UInt('1':fraction<51:44>) */
+            scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        }
+        estimate = do_recip_sqrt_estimate(scaled);
     }
-    estimate = do_recip_sqrt_estimate(scaled);
 
     *exp = (exp_off - *exp) / 2;
-    return extract64(estimate, 0, 8) << 44;
+    if (increasedprecision) {
+        return extract64(estimate, 0, 12) << 40;
+    } else {
+        return extract64(estimate, 0, 8) << 44;
+    }
 }
 
 uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
 
     f64_frac = ((uint64_t) f16_frac) << (52 - 10);
 
-    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac, false);
 
     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
     val = deposit32(0, 15, 1, f16_sign);
@@ -XXX,XX +XXX,XX @@ static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
 
     f64_frac = ((uint64_t) f32_frac) << 29;
 
-    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac, rpres);
 
-    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(15) */
+    /*
+     * result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15)
+     * or for increased precision
+     * result = sign : result_exp<7:0> : estimate<11:0> : Zeros(11)
+     */
     val = deposit32(0, 31, 1, f32_sign);
     val = deposit32(val, 23, 8, f32_exp);
-    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
+    if (rpres) {
+        val = deposit32(val, 11, 12, extract64(f64_frac, 52 - 12, 12));
+    } else {
+        val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
+    }
     return make_float32(val);
 }
 
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
         return float64_zero;
     }
 
-    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac, false);
 
     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(44) */
     val = deposit64(0, 61, 1, f64_sign);
-- 
2.34.1

Now that the emulation is complete, we can enable FEAT_RPRES for the 'max'
CPU type.
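
Enabling the feature amounts to writing 1 into the RPRES field of the
ID register value, which isar_feature_aa64_rpres() later reads back.
A minimal standalone sketch of that deposit/extract pair follows. The
helper names are hypothetical, and the field position (bits [7:4] of
ID_AA64ISAR2) is an assumption taken from the Arm ARM register
description that should be checked there; QEMU itself gets it from
its FIELD() definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of ID_AA64ISAR2.RPRES: 4 bits starting at bit 4. */
#define RPRES_SHIFT 4
#define RPRES_LEN   4

/* Write val into reg[shift + len - 1 : shift]; len must be below 64. */
uint64_t field_deposit64(uint64_t reg, unsigned shift, unsigned len,
                         uint64_t val)
{
    uint64_t mask = ((UINT64_C(1) << len) - 1) << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}

/* Read reg[shift + len - 1 : shift]; len must be below 64. */
uint64_t field_extract64(uint64_t reg, unsigned shift, unsigned len)
{
    return (reg >> shift) & ((UINT64_C(1) << len) - 1);
}

/* Hypothetical analogue of isar_feature_aa64_rpres(). */
bool have_rpres(uint64_t id_aa64isar2)
{
    return field_extract64(id_aa64isar2, RPRES_SHIFT, RPRES_LEN) != 0;
}
```

With these helpers, depositing 1 into an all-zero register value
yields 0x10, and have_rpres() then reports true for it.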

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
 - FEAT_RME (Realm Management Extension) (NB: support status in QEMU is experimental)
 - FEAT_RNG (Random number generator)
+- FEAT_RPRES (Increased precision of FRECPE and FRSQRTE)
 - FEAT_S2FWB (Stage 2 forced Write-Back)
 - FEAT_SB (Speculation Barrier)
 - FEAT_SEL2 (Secure EL2)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
     cpu->isar.id_aa64isar1 = t;
 
     t = cpu->isar.id_aa64isar2;
+    t = FIELD_DP64(t, ID_AA64ISAR2, RPRES, 1);    /* FEAT_RPRES */
     t = FIELD_DP64(t, ID_AA64ISAR2, MOPS, 1);     /* FEAT_MOPS */
     t = FIELD_DP64(t, ID_AA64ISAR2, BC, 1);       /* FEAT_HBC */
     t = FIELD_DP64(t, ID_AA64ISAR2, WFXT, 2);     /* FEAT_WFxT */
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Add an fp_status[] array to the vfp state, and move ARMFPStatusFlavour
to cpu.h so that it can be used to index this array.  For now, place
the array in an anonymous union with the existing structures.  Adjust
the order of the existing structures to match the enum.

Simplify fpstatus_ptr() using the new array.
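
The layout trick can be sketched in standalone C11. FakeStatus and
FakeVFP are hypothetical stand-ins for float_status and the vfp part
of CPUARMState, and status_offset() computes the offset portably
rather than with the indexed offsetof() form used in fpstatus_ptr():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for float_status. */
typedef struct {
    int flags;
    int rounding_mode;
} FakeStatus;

/* Same order as ARMFPStatusFlavour in the patch. */
typedef enum {
    FPST_A32,
    FPST_A64,
    FPST_A32_F16,
    FPST_A64_F16,
    FPST_AH,
    FPST_AH_F16,
    FPST_STD,
    FPST_STD_F16,
} Flavour;
#define FPST_COUNT 8

/* Hypothetical stand-in for the vfp state: array and names overlap. */
typedef struct {
    int other_state;
    union {
        FakeStatus fp_status[FPST_COUNT];
        struct {
            FakeStatus fp_status_a32;
            FakeStatus fp_status_a64;
            FakeStatus fp_status_f16_a32;
            FakeStatus fp_status_f16_a64;
            FakeStatus ah_fp_status;
            FakeStatus ah_fp_status_f16;
            FakeStatus standard_fp_status;
            FakeStatus standard_fp_status_f16;
        };
    };
} FakeVFP;

/* The named members must line up with the enum order. */
static_assert(offsetof(FakeVFP, ah_fp_status) ==
              offsetof(FakeVFP, fp_status) + FPST_AH * sizeof(FakeStatus),
              "struct order must match the enum");

/* Offset of the status selected by 'flavour', replacing a switch. */
size_t status_offset(Flavour flavour)
{
    return offsetof(FakeVFP, fp_status) + (size_t)flavour * sizeof(FakeStatus);
}
```

Because the array and the named members alias each other, existing
users of the old names keep working while new code indexes by enum.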

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           | 119 +++++++++++++++++++++----------------
 target/arm/tcg/translate.h |  64 +-------------------
 2 files changed, 70 insertions(+), 113 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
 
 typedef struct NVICState NVICState;
 
+/*
+ * Enum for indexing vfp.fp_status[].
+ *
+ * FPST_A32: is the "normal" fp status for AArch32 insns
+ * FPST_A64: is the "normal" fp status for AArch64 insns
+ * FPST_A32_F16: used for AArch32 half-precision calculations
+ * FPST_A64_F16: used for AArch64 half-precision calculations
+ * FPST_STD: the ARM "Standard FPSCR Value"
+ * FPST_STD_F16: used for half-precision
+ *       calculations with the ARM "Standard FPSCR Value"
+ * FPST_AH: used for the A64 insns which change behaviour
+ *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+ *       and the reciprocal and square root estimate/step insns)
+ * FPST_AH_F16: used for the A64 insns which change behaviour
+ *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+ *       and the reciprocal and square root estimate/step insns);
+ *       for half-precision
+ *
+ * Half-precision operations are governed by a separate
+ * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
+ * status structure to control this.
+ *
+ * The "Standard FPSCR", ie default-NaN, flush-to-zero,
+ * round-to-nearest and is used by any operations (generally
+ * Neon) which the architecture defines as controlled by the
+ * standard FPSCR value rather than the FPSCR.
+ *
+ * The "standard FPSCR but for fp16 ops" is needed because
+ * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
+ * using a fixed value for it.
+ *
+ * The ah_fp_status is needed because some insns have different
+ * behaviour when FPCR.AH == 1: they don't update cumulative
+ * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
+ * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
+ * which means we need an ah_fp_status_f16 as well.
+ *
+ * To avoid having to transfer exception bits around, we simply
+ * say that the FPSCR cumulative exception flags are the logical
+ * OR of the flags in the four fp statuses. This relies on the
+ * only thing which needs to read the exception flags being
+ * an explicit FPSCR read.
+ */
+typedef enum ARMFPStatusFlavour {
+    FPST_A32,
+    FPST_A64,
+    FPST_A32_F16,
+    FPST_A64_F16,
+    FPST_AH,
+    FPST_AH_F16,
+    FPST_STD,
+    FPST_STD_F16,
+} ARMFPStatusFlavour;
+#define FPST_COUNT  8
+
 typedef struct CPUArchState {
     /* Regs for current mode.  */
     uint32_t regs[16];
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         /* Scratch space for aa32 neon expansion.  */
         uint32_t scratch[8];
 
-        /* There are a number of distinct float control structures:
-         *
-         *  fp_status_a32: is the "normal" fp status for AArch32 insns
-         *  fp_status_a64: is the "normal" fp status for AArch64 insns
-         *  fp_status_fp16_a32: used for AArch32 half-precision calculations
-         *  fp_status_fp16_a64: used for AArch64 half-precision calculations
-         *  standard_fp_status : the ARM "Standard FPSCR Value"
-         *  standard_fp_status_fp16 : used for half-precision
-         *       calculations with the ARM "Standard FPSCR Value"
-         *  ah_fp_status: used for the A64 insns which change behaviour
-         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
-         *       and the reciprocal and square root estimate/step insns)
-         *  ah_fp_status_f16: used for the A64 insns which change behaviour
-         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
-         *       and the reciprocal and square root estimate/step insns);
-         *       for half-precision
-         *
-         * Half-precision operations are governed by a separate
-         * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
-         * status structure to control this.
-         *
-         * The "Standard FPSCR", ie default-NaN, flush-to-zero,
-         * round-to-nearest and is used by any operations (generally
-         * Neon) which the architecture defines as controlled by the
-         * standard FPSCR value rather than the FPSCR.
-         *
-         * The "standard FPSCR but for fp16 ops" is needed because
-         * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
-         * using a fixed value for it.
-         *
-         * The ah_fp_status is needed because some insns have different
-         * behaviour when FPCR.AH == 1: they don't update cumulative
-         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
-         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
-         * which means we need an ah_fp_status_f16 as well.
-         *
-         * To avoid having to transfer exception bits around, we simply
-         * say that the FPSCR cumulative exception flags are the logical
-         * OR of the flags in the four fp statuses. This relies on the
-         * only thing which needs to read the exception flags being
-         * an explicit FPSCR read.
-         */
-        float_status fp_status_a32;
-        float_status fp_status_a64;
-        float_status fp_status_f16_a32;
-        float_status fp_status_f16_a64;
-        float_status standard_fp_status;
-        float_status standard_fp_status_f16;
-        float_status ah_fp_status;
-        float_status ah_fp_status_f16;
+        /* There are a number of distinct float control structures. */
+        union {
+            float_status fp_status[FPST_COUNT];
+            struct {
+                float_status fp_status_a32;
+                float_status fp_status_a64;
+                float_status fp_status_f16_a32;
+                float_status fp_status_f16_a64;
+                float_status ah_fp_status;
+                float_status ah_fp_status_f16;
+                float_status standard_fp_status;
+                float_status standard_fp_status_f16;
+            };
+        };
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
     return (CPUARMTBFlags){ tb->flags, tb->cs_base };
 }
 
-/*
- * Enum for argument to fpstatus_ptr().
- */
-typedef enum ARMFPStatusFlavour {
-    FPST_A32,
-    FPST_A64,
-    FPST_A32_F16,
-    FPST_A64_F16,
-    FPST_AH,
-    FPST_AH_F16,
-    FPST_STD,
-    FPST_STD_F16,
-} ARMFPStatusFlavour;
-
 /**
  * fpstatus_ptr: return TCGv_ptr to the specified fp_status field
  *
  * We have multiple softfloat float_status fields in the Arm CPU state struct
  * (see the comment in cpu.h for details). Return a TCGv_ptr which has
  * been set up to point to the requested field in the CPU state struct.
- * The options are:
- *
- * FPST_A32
- *   for AArch32 non-FP16 operations controlled by the FPCR
- * FPST_A64
- *   for AArch64 non-FP16 operations controlled by the FPCR
- * FPST_A32_F16
- *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
- * FPST_A64_F16
- *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
- * FPST_AH:
- *   for AArch64 operations which change behaviour when AH=1 (specifically,
- *   bfloat16 conversions and multiplies, and the reciprocal and square root
- *   estimate/step insns)
- * FPST_AH_F16:
- *   ditto, but for half-precision operations
- * FPST_STD
- *   for A32/T32 Neon operations using the "standard FPSCR value"
- * FPST_STD_F16
- *   as FPST_STD, but where FPCR.FZ16 is to be used
  */
 static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
 {
     TCGv_ptr statusptr = tcg_temp_new_ptr();
-    int offset;
+    int offset = offsetof(CPUARMState, vfp.fp_status[flavour]);
 
-    switch (flavour) {
-    case FPST_A32:
-        offset = offsetof(CPUARMState, vfp.fp_status_a32);
-        break;
-    case FPST_A64:
-        offset = offsetof(CPUARMState, vfp.fp_status_a64);
-        break;
-    case FPST_A32_F16:
-        offset = offsetof(CPUARMState, vfp.fp_status_f16_a32);
-        break;
-    case FPST_A64_F16:
-        offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
-        break;
-    case FPST_AH:
-        offset = offsetof(CPUARMState, vfp.ah_fp_status);
-        break;
-    case FPST_AH_F16:
-        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
-        break;
-    case FPST_STD:
-        offset = offsetof(CPUARMState, vfp.standard_fp_status);
-        break;
-    case FPST_STD_F16:
-        offset = offsetof(CPUARMState, vfp.standard_fp_status_f16);
-        break;
-    default:
-        g_assert_not_reached();
-    }
     tcg_gen_addi_ptr(statusptr, tcg_env, offset);
     return statusptr;
 }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace standard_fp_status_f16 with fp_status[FPST_STD_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  4 ++--
 target/arm/tcg/mve_helper.c | 24 ++++++++++++------------
 target/arm/vfp_helper.c     |  8 ++++----
 4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status ah_fp_status;
                 float_status ah_fp_status_f16;
                 float_status standard_fp_status;
-                float_status standard_fp_status_f16;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     set_flush_to_zero(1, &env->vfp.standard_fp_status);
     set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
     set_default_nan_mode(1, &env->vfp.standard_fp_status);
-    set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
+    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
     set_flush_to_zero(1, &env->vfp.ah_fp_status);
     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                 r[e] = 0;                                               \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(tm & 1)) {                                            \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
                 continue;                                               \
             }                                                           \
-            fpst0 = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :   \
+            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
                 &env->vfp.standard_fp_status;                           \
             fpst1 = fpst0;                                              \
             if (!(mask & 1)) {                                          \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
         TYPE *m = vm;                                           \
         TYPE ra = (TYPE)ra_in;                                  \
         float_status *fpst = (ESIZE == 2) ?                     \
-            &env->vfp.standard_fp_status_f16 :                  \
+            &env->vfp.fp_status[FPST_STD_F16] :                 \
             &env->vfp.standard_fp_status;                       \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
             if (mask & 1) {                                     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
         float_status *fpst;                                             \
         float_status scratch_fpst;                                      \
         float_status *base_fpst = (ESIZE == 2) ?                        \
-            &env->vfp.standard_fp_status_f16 :                          \
+            &env->vfp.fp_status[FPST_STD_F16] :                         \
             &env->vfp.standard_fp_status;                               \
         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
         set_float_rounding_mode(rmode, base_fpst);                      \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     /* FZ16 does not generate an input denormal exception.  */
     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
-    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
+    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
           & ~float_flag_input_denormal_flushed);
 
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
-    set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
     set_float_exception_flags(0, &env->vfp.ah_fp_status);
     set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         bool ftz_enabled = val & FPCR_FZ16;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-        set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
         set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_FZ) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace standard_fp_status with fp_status[FPST_STD].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  8 ++++----
 target/arm/tcg/mve_helper.c | 28 ++++++++++++++--------------
 target/arm/tcg/vec_helper.c |  4 ++--
 target/arm/vfp_helper.c     |  4 ++--
 5 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status fp_status_f16_a64;
                 float_status ah_fp_status;
                 float_status ah_fp_status_f16;
-                float_status standard_fp_status;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
         env->sau.ctrl = 0;
     }
 
-    set_flush_to_zero(1, &env->vfp.standard_fp_status);
-    set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
-    set_default_nan_mode(1, &env->vfp.standard_fp_status);
+    set_flush_to_zero(1, &env->vfp.fp_status[FPST_STD]);
+    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
+    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(tm & 1)) {                                            \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
                 continue;                                               \
             }                                                           \
             fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             fpst1 = fpst0;                                              \
             if (!(mask & 1)) {                                          \
                 scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
         TYPE ra = (TYPE)ra_in;                                  \
         float_status *fpst = (ESIZE == 2) ?                     \
             &env->vfp.fp_status[FPST_STD_F16] :                 \
-            &env->vfp.standard_fp_status;                       \
+            &env->vfp.fp_status[FPST_STD];                       \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
             if (mask & 1) {                                     \
                 TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
         float_status scratch_fpst;                                      \
         float_status *base_fpst = (ESIZE == 2) ?                        \
             &env->vfp.fp_status[FPST_STD_F16] :                         \
-            &env->vfp.standard_fp_status;                               \
+            &env->vfp.fp_status[FPST_STD];                               \
         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
         set_float_rounding_mode(rmode, base_fpst);                      \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_sh(CPUARMState *env, void *vd, void *vm, int top)
     unsigned e;
     float_status *fpst;
     float_status scratch_fpst;
-    float_status *base_fpst = &env->vfp.standard_fp_status;
+    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
     bool old_fz = get_flush_to_zero(base_fpst);
     set_flush_to_zero(false, base_fpst);
     for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_hs(CPUARMState *env, void *vd, void *vm, int top)
     unsigned e;
     float_status *fpst;
     float_status scratch_fpst;
-    float_status *base_fpst = &env->vfp.standard_fp_status;
+    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
     bool old_fiz = get_flush_inputs_to_zero(base_fpst);
     set_flush_inputs_to_zero(false, base_fpst);
     for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
-    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
+    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
-    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
+    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     uint32_t a32_flags = 0, a64_flags = 0;
 
     a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
-    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
     /* FZ16 does not generate an input denormal exception.  */
     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_a64);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
-    set_float_exception_flags(0, &env->vfp.standard_fp_status);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
     set_float_exception_flags(0, &env->vfp.ah_fp_status);
     set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace ah_fp_status_f16 with fp_status[FPST_AH_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h        |  3 +--
 target/arm/cpu.c        |  2 +-
 target/arm/vfp_helper.c | 10 +++++-----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
  * behaviour when FPCR.AH == 1: they don't update cumulative
  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
- * which means we need an ah_fp_status_f16 as well.
+ * which means we need an FPST_AH_F16 as well.
  *
  * To avoid having to transfer exception bits around, we simply
  * say that the FPSCR cumulative exception flags are the logical
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status fp_status_f16_a32;
                 float_status fp_status_f16_a64;
                 float_status ah_fp_status;
-                float_status ah_fp_status_f16;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
     set_flush_to_zero(1, &env->vfp.ah_fp_status);
     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
-    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
+    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
 
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
-     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
+     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
      * they are used for insns that must not set the cumulative exception bits.
      */
 
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
     set_float_exception_flags(0, &env->vfp.ah_fp_status);
-    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
 }
 
 static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
     }
     if (changed & FPCR_AH) {
         bool ah_enabled = val & FPCR_AH;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace ah_fp_status with fp_status[FPST_AH].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h        | 3 +--
 target/arm/cpu.c        | 6 +++---
 target/arm/vfp_helper.c | 6 +++---
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
  * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
  * using a fixed value for it.
  *
- * The ah_fp_status is needed because some insns have different
+ * FPST_AH is needed because some insns have different
  * behaviour when FPCR.AH == 1: they don't update cumulative
  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status fp_status_a64;
                 float_status fp_status_f16_a32;
                 float_status fp_status_f16_a64;
-                float_status ah_fp_status;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
-    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
-    set_flush_to_zero(1, &env->vfp.ah_fp_status);
-    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
+    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
+    set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]);
+    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_AH]);
     arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
-     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
+     * We do not merge in flags from FPST_AH or FPST_AH_F16, because
      * they are used for insns that must not set the cumulative exception bits.
      */
 
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
-    set_float_exception_flags(0, &env->vfp.ah_fp_status);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
 }
 
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
     }
     if (changed & FPCR_AH) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace fp_status_f16_a64 with fp_status[FPST_A64_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/sme_helper.c |  2 +-
 target/arm/tcg/vec_helper.c |  9 ++++-----
 target/arm/vfp_helper.c     | 16 ++++++++--------
 5 files changed, 14 insertions(+), 16 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace fp_status_f16_a32 with fp_status[FPST_A32_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/vec_helper.c |  4 ++--
 target/arm/vfp_helper.c     | 14 +++++++-------
 4 files changed, 10 insertions(+), 11 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace fp_status_a64 with fp_status[FPST_A64].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/sme_helper.c |  2 +-
 target/arm/tcg/vec_helper.c | 10 +++++-----
 target/arm/vfp_helper.c     | 16 ++++++++--------
 5 files changed, 15 insertions(+), 16 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace fp_status_a32 with fp_status[FPST_A32].  As this was the
last of the old structures, we can remove the anonymous union and struct.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-15-richard.henderson@linaro.org
[PMM: tweak to account for change to is_ebf()]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  7 +------
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/vec_helper.c |  2 +-
 target/arm/vfp_helper.c     | 18 +++++++++---------
 4 files changed, 12 insertions(+), 17 deletions(-)
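
Note for readers: the anonymous union removed here is what let the old
named fields and the new array coexist while callers were converted one
field at a time. A minimal sketch of that layout follows. Every Demo*
and demo_* name is hypothetical and is not a QEMU definition; the only
property assumed is that the A32 flavour is element 0 of the array, as
the union in the hunk below implies.

```c
#include <string.h>

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    int rounding_mode;
    int exception_flags;
} demo_float_status;

/* Hypothetical flavour enum; only "A32 is element 0" matters here. */
typedef enum {
    DEMO_FPST_A32,
    DEMO_FPST_A64,
    DEMO_FPST_COUNT
} DemoFlavour;

/* Transitional layout: the named field overlays fp_status[DEMO_FPST_A32]. */
typedef struct {
    union {
        demo_float_status fp_status[DEMO_FPST_COUNT];
        struct {
            demo_float_status fp_status_a32;
        };
    };
} DemoVfpOld;

/* Final layout: only the array remains. */
typedef struct {
    demo_float_status fp_status[DEMO_FPST_COUNT];
} DemoVfpNew;

_Static_assert(sizeof(DemoVfpOld) == sizeof(DemoVfpNew),
               "dropping the union must not change the size");

/* Returns 1 if a write through the old name is visible through the array. */
int demo_alias_holds(void)
{
    DemoVfpOld v;

    memset(&v, 0, sizeof(v));
    v.fp_status_a32.rounding_mode = 3;
    return v.fp_status[DEMO_FPST_A32].rounding_mode == 3;
}
```

Because both spellings name the same storage, each conversion in this
part of the series is a pure rename, and the union can go once no caller
uses the old field names.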

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         uint32_t scratch[8];
 
         /* There are a number of distinct float control structures. */
-        union {
-            float_status fp_status[FPST_COUNT];
-            struct {
-                float_status fp_status_a32;
-            };
-        };
+        float_status fp_status[FPST_COUNT];
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
-    arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
      */
     bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
 
-    *statusp = is_a64(env) ? env->vfp.fp_status[FPST_A64] : env->vfp.fp_status_a32;
+    *statusp = env->vfp.fp_status[is_a64(env) ? FPST_A64 : FPST_A32];
     set_default_nan_mode(true, statusp);
 
     if (ebf) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 {
     uint32_t a32_flags = 0, a64_flags = 0;
 
-    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_A32]);
     a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
     /* FZ16 does not generate an input denormal exception.  */
     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A32_F16])
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      * values. The caller should have arranged for env->vfp.fpsr to
      * be the architecturally up-to-date exception flag information first.
      */
-    set_float_exception_flags(0, &env->vfp.fp_status_a32);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
             i = float_round_to_zero;
             break;
         }
-        set_float_rounding_mode(i, &env->vfp.fp_status_a32);
+        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32]);
         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]);
         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]);
         /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
     }
     if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
         /*
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
-        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
         FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
 }
 DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
-DO_VFP_cmp(s, float32, float32, fp_status_a32)
-DO_VFP_cmp(d, float64, float64, fp_status_a32)
+DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
+DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
 #undef DO_VFP_cmp
 
 /* Integer to float and float to integer conversions */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(fjcvtzs)(float64 value, float_status *status)
 
 uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
 {
-    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status_a32);
+    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status[FPST_A32]);
     uint32_t result = pair;
     uint32_t z = (pair >> 32) == 0;
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Select the float_status by array index instead of choosing
between two pointers.  No functional change.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-16-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/mve_helper.c | 40 +++++++++++++------------------------
 1 file changed, 14 insertions(+), 26 deletions(-)
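
Note for readers: the two spellings this patch swaps yield the same
address. The sketch below shows this under simplified assumptions; the
Demo* types and demo_* functions are hypothetical stand-ins, not the
MVE helper macros.

```c
/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    int exception_flags;
} demo_float_status;

typedef enum {
    DEMO_FPST_STD,
    DEMO_FPST_STD_F16,
    DEMO_FPST_COUNT
} DemoFlavour;

typedef struct {
    demo_float_status fp_status[DEMO_FPST_COUNT];
} DemoVfp;

/* Old spelling: pick one of two element addresses. */
demo_float_status *demo_select_by_pointer(DemoVfp *vfp, int esize)
{
    return (esize == 2) ? &vfp->fp_status[DEMO_FPST_STD_F16]
                        : &vfp->fp_status[DEMO_FPST_STD];
}

/* New spelling: pick the index, then take one address. */
demo_float_status *demo_select_by_index(DemoVfp *vfp, int esize)
{
    return &vfp->fp_status[esize == 2 ? DEMO_FPST_STD_F16 : DEMO_FPST_STD];
}
```

The index form names the array only once, so each macro needs a single
line for the selection.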

diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                 r[e] = 0;                                               \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(tm & 1)) {                                            \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
                 continue;                                               \
             }                                                           \
-            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst0 = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             fpst1 = fpst0;                                              \
             if (!(mask & 1)) {                                          \
                 scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
         unsigned e;                                             \
         TYPE *m = vm;                                           \
         TYPE ra = (TYPE)ra_in;                                  \
-        float_status *fpst = (ESIZE == 2) ?                     \
-            &env->vfp.fp_status[FPST_STD_F16] :                 \
-            &env->vfp.fp_status[FPST_STD];                       \
+        float_status *fpst =                                    \
+            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
             if (mask & 1) {                                     \
                 TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
         unsigned e;                                                     \
         float_status *fpst;                                             \
         float_status scratch_fpst;                                      \
-        float_status *base_fpst = (ESIZE == 2) ?                        \
-            &env->vfp.fp_status[FPST_STD_F16] :                         \
-            &env->vfp.fp_status[FPST_STD];                               \
+        float_status *base_fpst =                                       \
+            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD];  \
         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
         set_float_rounding_mode(rmode, base_fpst);                      \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Pass the ARMFPStatusFlavour index to DO_VFP_cmp instead of fp_status[FOO].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
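
Note for readers: the macro now receives only the flavour index and
spells out the array access itself. A small sketch of the pattern,
using hypothetical names (DEMO_DO_CMP, DemoEnv, demo_cmp_*) and a
trivial comparison in place of the softfloat compare routines:

```c
/* Hypothetical stand-in for float_status; counts uses for illustration. */
typedef struct {
    int compare_calls;
} demo_float_status;

typedef enum {
    DEMO_FPST_A32,
    DEMO_FPST_A32_F16,
    DEMO_FPST_COUNT
} DemoFlavour;

typedef struct {
    demo_float_status fp_status[DEMO_FPST_COUNT];
} DemoEnv;

/* FPST is a flavour index; the macro body owns the fp_status[] access. */
#define DEMO_DO_CMP(P, ARGTYPE, FPST)                               \
int demo_cmp_##P(ARGTYPE a, ARGTYPE b, DemoEnv *env)                \
{                                                                   \
    demo_float_status *st = &env->fp_status[FPST];                  \
    st->compare_calls++;                                            \
    return (a > b) - (a < b);                                       \
}

DEMO_DO_CMP(s, float, DEMO_FPST_A32)
DEMO_DO_CMP(d, double, DEMO_FPST_A32)
#undef DEMO_DO_CMP
```

Passing the index keeps the member name out of the instantiation
lines, so they would not need editing if the array were renamed or
moved.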

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
 void VFP_HELPER(cmp, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env)  \
 { \
     softfloat_to_vfp_compare(env, \
-        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.FPST)); \
+        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.fp_status[FPST])); \
 } \
 void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
 { \
     softfloat_to_vfp_compare(env, \
-        FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
+        FLOATTYPE ## _compare(a, b, &env->vfp.fp_status[FPST])); \
 }
-DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
-DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
-DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
+DO_VFP_cmp(h, float16, dh_ctype_f16, FPST_A32_F16)
+DO_VFP_cmp(s, float32, float32, FPST_A32)
+DO_VFP_cmp(d, float64, float64, FPST_A32)
 #undef DO_VFP_cmp
 
 /* Integer to float and float to integer conversions */
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Read the FZ16 bit from its source, env->vfp.fpcr, rather than from
the proxy via get_flush_inputs_to_zero.  This makes it clear that it
does not matter which of the float_status structures is used.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-34-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
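
Note for readers: a sketch of the proxy-versus-source distinction. It
assumes, for illustration only, that a write to the control register
mirrors the FZ16 bit into the half-precision status structures. All
Demo* and demo_* names are hypothetical; the bit position follows the
architectural FPCR.FZ16 at bit 19.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FPCR_FZ16 (UINT32_C(1) << 19)

/* Hypothetical stand-in for float_status. */
typedef struct {
    bool flush_inputs_to_zero;
} demo_float_status;

typedef enum {
    DEMO_FPST_A32_F16,
    DEMO_FPST_A64_F16,
    DEMO_FPST_COUNT
} DemoFlavour;

typedef struct {
    uint32_t fpcr;
    demo_float_status fp_status[DEMO_FPST_COUNT];
} DemoVfp;

/* Assumed behaviour: the register write mirrors FZ16 into each proxy. */
void demo_set_fpcr(DemoVfp *vfp, uint32_t val)
{
    bool fz16 = (val & DEMO_FPCR_FZ16) != 0;

    vfp->fpcr = val;
    vfp->fp_status[DEMO_FPST_A32_F16].flush_inputs_to_zero = fz16;
    vfp->fp_status[DEMO_FPST_A64_F16].flush_inputs_to_zero = fz16;
}

/* Old style: ask one particular proxy structure. */
bool demo_fz16_via_proxy(const DemoVfp *vfp)
{
    return vfp->fp_status[DEMO_FPST_A64_F16].flush_inputs_to_zero;
}

/* New style: read the bit where it is architecturally defined. */
bool demo_fz16_via_fpcr(const DemoVfp *vfp)
{
    return (vfp->fpcr & DEMO_FPCR_FZ16) != 0;
}
```

Both readers agree as long as the mirroring holds, but the second does
not have to choose between the A32 and A64 half-precision structures.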

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
+             env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
         }
     }
     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
+             env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     float_status *status = &env->vfp.fp_status[FPST_A64];
-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
+    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
     int negx = 0, negf = 0;
 
     if (is_s) {
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
+                 env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
         }
     }
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
+                 env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
     float_status *status = &env->vfp.fp_status[FPST_A64];
-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
+    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
     int negx = 0, negf = 0;
 
     if (is_s) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Sink common code from the callers into do_fmlal
and do_fmlal_idx.  Reorder the arguments to minimize
the re-sorting from the caller's arguments.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-35-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)