Series comparison

-[PULL 00/35] target-arm queue
+[PULL 00/68] target-arm queue
-Hi; here's the first arm pullreq for the 8.2 cycle. These are
+Hi; this pullreq contains only my FEAT_AFP/FEAT_RPRES patches
-pretty much all bug fixes (mostly for the experimental FEAT_RME),
+(plus a fix for a target/alpha latent bug that would otherwise
-rather than any major features.
+be revealed by the fpu changes), because 68 patches is already
 longer than I prefer to send in at one time...
+thanks
 -- PMM
-The following changes since commit b0dd9a7d6dd15a6898e9c585b521e6bec79b25aa:
+The following changes since commit ffaf7f0376f8040ce9068d71ae9ae8722505c42e:
-  Open 8.2 development tree (2023-08-22 07:14:07 -0700)
+  Merge tag 'pull-10.0-testing-and-gdstub-updates-100225-1' of https://gitlab.com/stsquad/qemu into staging (2025-02-10 13:26:17 -0500)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230824
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250211
-for you to fetch changes up to cd1e4db73646006039f25879af3bff55b2295ff3:
+for you to fetch changes up to ca4c34e07d1388df8e396520b5e7d60883cd3690:
-  target/arm: Fix 64-bit SSRA (2023-08-22 17:31:14 +0100)
+  target/arm: Sink fp_status and fpcr access into do_fmlal* (2025-02-11 16:22:08 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * hw/gpio/nrf51: implement DETECT signal
+ * target/alpha: Don't corrupt error_code with unknown softfloat flags
- * accel/kvm: Specify default IPA size for arm64
+ * target/arm: Implement FEAT_AFP and FEAT_RPRES
  * ptw: refactor, fix some FEAT_RME bugs
  * target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
  * target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
  * Fix SME ST1Q
  * Fix 64-bit SSRA
 ----------------------------------------------------------------
-Akihiko Odaki (6):
+Peter Maydell (49):
-      kvm: Introduce kvm_arch_get_default_type hook
+      target/alpha: Don't corrupt error_code with unknown softfloat flags
-      accel/kvm: Specify default IPA size for arm64
+      fpu: Add float_class_denormal
-      mips: Report an error when KVM_VM_MIPS_VZ is unavailable
+      fpu: Implement float_flag_input_denormal_used
-      accel/kvm: Use negative KVM type for error propagation
+      fpu: allow flushing of output denormals to be after rounding
-      accel/kvm: Free as when an error occurred
+      target/arm: Define FPCR AH, FIZ, NEP bits
-      accel/kvm: Make kvm_dirty_ring_reaper_init() void
+      target/arm: Implement FPCR.FIZ handling
       target/arm: Adjust FP behaviour for FPCR.AH = 1
       target/arm: Adjust exception flag handling for AH = 1
       target/arm: Add FPCR.AH to tbflags
       target/arm: Set up float_status to use for FPCR.AH=1 behaviour
       target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
       target/arm: Use FPST_FPCR_AH for BFCVT* insns
       target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
       target/arm: Add FPCR.NEP to TBFLAGS
       target/arm: Define and use new write_fp_*reg_merging() functions
       target/arm: Handle FPCR.NEP for 3-input scalar operations
       target/arm: Handle FPCR.NEP for BFCVT scalar
       target/arm: Handle FPCR.NEP for 1-input scalar operations
       target/arm: Handle FPCR.NEP in do_cvtf_scalar()
       target/arm: Handle FPCR.NEP for scalar FABS and FNEG
       target/arm: Handle FPCR.NEP for FCVTXN (scalar)
       target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
       target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
       target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
       target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
       target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
       target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
       target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
       target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
       target/arm: Implement FPCR.AH handling of negation of NaN
       target/arm: Implement FPCR.AH handling for scalar FABS and FABD
       target/arm: Handle FPCR.AH in vector FABD
       target/arm: Handle FPCR.AH in SVE FNEG
       target/arm: Handle FPCR.AH in SVE FABS
       target/arm: Handle FPCR.AH in SVE FABD
       target/arm: Handle FPCR.AH in negation steps in SVE FCADD
       target/arm: Handle FPCR.AH in negation steps in FCADD
       target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
       target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
       target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
       target/arm: Handle FPCR.AH in negation in FMLS (vector)
       target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
       target/arm: Handle FPCR.AH in SVE FTSSEL
       target/arm: Handle FPCR.AH in SVE FTMAD
       target/arm: Enable FEAT_AFP for '-cpu max'
       target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
       target/arm: Implement increased precision FRECPE
       target/arm: Implement increased precision FRSQRTE
       target/arm: Enable FEAT_RPRES for -cpu max
-Chris Laplante (6):
+Richard Henderson (19):
-      hw/gpio/nrf51: implement DETECT signal
+      target/arm: Handle FPCR.AH in vector FCMLA
-      qtest: factor out qtest_install_gpio_out_intercept
+      target/arm: Handle FPCR.AH in FCMLA by index
-      qtest: implement named interception of out-GPIO
+      target/arm: Handle FPCR.AH in SVE FCMLA
-      qtest: bail from irq_intercept_in if name is specified
+      target/arm: Handle FPCR.AH in FMLSL (by element and vector)
-      qtest: irq_intercept_[out/in]: return FAIL if no intercepts are installed
+      target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
-      qtest: microbit-test: add tests for nRF51 DETECT
+      target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
       target/arm: Introduce CPUARMState.vfp.fp_status[]
       target/arm: Remove standard_fp_status_f16
       target/arm: Remove standard_fp_status
       target/arm: Remove ah_fp_status_f16
       target/arm: Remove ah_fp_status
       target/arm: Remove fp_status_f16_a64
       target/arm: Remove fp_status_f16_a32
       target/arm: Remove fp_status_a64
       target/arm: Remove fp_status_a32
       target/arm: Simplify fp_status indexing in mve_helper.c
       target/arm: Simplify DO_VFP_cmp in vfp_helper.c
       target/arm: Read fz16 from env->vfp.fpcr
       target/arm: Sink fp_status and fpcr access into do_fmlal*
-Jean-Philippe Brucker (6):
+ docs/system/arm/emulation.rst   |   2 +
-      target/arm/ptw: Load stage-2 tables from realm physical space
+ include/fpu/softfloat-helpers.h |  11 +
-      target/arm/helper: Fix tlbmask and tlbbits for TLBI VAE2*
+ include/fpu/softfloat-types.h   |  25 ++
-      target/arm: Skip granule protection checks for AT instructions
+ target/arm/cpu-features.h       |  10 +
-      target/arm: Pass security space rather than flag for AT instructions
+ target/arm/cpu.h                |  97 +++--
-      target/arm/helper: Check SCR_EL3.{NSE, NS} encoding for AT instructions
+ target/arm/helper.h             |  26 ++
-      target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
+ target/arm/internals.h          |   6 +
+ target/arm/tcg/helper-a64.h     |  13 +
-Peter Maydell (15):
+ target/arm/tcg/helper-sve.h     | 120 ++++++
-      target/arm/ptw: Don't set fi->s1ptw for UnsuppAtomicUpdate fault
+ target/arm/tcg/translate-a64.h  |  13 +
-      target/arm/ptw: Don't report GPC faults on stage 1 ptw as stage2 faults
+ target/arm/tcg/translate.h      |  54 +--
-      target/arm/ptw: Set s1ns bit in fault info more consistently
+ target/arm/tcg/vec_internal.h   |  35 ++
-      target/arm/ptw: Pass ptw into get_phys_addr_pmsa*() and get_phys_addr_disabled()
+ target/mips/fpu_helper.h        |   6 +
-      target/arm/ptw: Pass ARMSecurityState to regime_translation_disabled()
+ fpu/softfloat.c                 |  66 +++-
-      target/arm/ptw: Pass an ARMSecuritySpace to arm_hcr_el2_eff_secstate()
+ target/alpha/cpu.c              |   7 +
-      target/arm: Pass an ARMSecuritySpace to arm_is_el2_enabled_secstate()
+ target/alpha/fpu_helper.c       |   2 +
-      target/arm/ptw: Only fold in NSTable bit effects in Secure state
+ target/arm/cpu.c                |  46 +--
-      target/arm/ptw: Remove last uses of ptw->in_secure
+ target/arm/helper.c             |   2 +-
-      target/arm/ptw: Remove S1Translate::in_secure
+ target/arm/tcg/cpu64.c          |   2 +
-      target/arm/ptw: Drop S1Translate::out_secure
+ target/arm/tcg/helper-a64.c     | 151 ++++----
-      target/arm/ptw: Set attributes correctly for MMU disabled data accesses
+ target/arm/tcg/hflags.c         |  13 +
-      target/arm/ptw: Check for block descriptors at invalid levels
+ target/arm/tcg/mve_helper.c     |  44 +--
-      target/arm/ptw: Report stage 2 fault level for stage 2 faults on stage 1 ptw
+ target/arm/tcg/sme_helper.c     |   4 +-
-      target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
+ target/arm/tcg/sve_helper.c     | 367 ++++++++++++++-----
+ target/arm/tcg/translate-a64.c  | 782 ++++++++++++++++++++++++++++++++--------
-Richard Henderson (2):
+ target/arm/tcg/translate-sve.c  | 193 +++++++---
-      target/arm: Fix SME ST1Q
+ target/arm/tcg/vec_helper.c     | 387 ++++++++++++++------
-      target/arm: Fix 64-bit SSRA
+ target/arm/vfp_helper.c         | 374 +++++++++++++++----
+ target/hppa/fpu_helper.c        |  11 +
- include/hw/gpio/nrf51_gpio.h |   1 +
+ target/i386/tcg/fpu_helper.c    |   8 +
- include/sysemu/kvm.h         |   2 +
+ target/mips/msa.c               |   9 +
- target/arm/cpu.h             |  19 ++--
+ target/ppc/cpu_init.c           |   3 +
- target/arm/internals.h       |  25 ++---
+ target/rx/cpu.c                 |   8 +
- target/mips/kvm_mips.h       |   9 --
+ target/sh4/cpu.c                |   8 +
- tests/qtest/libqtest.h       |  11 +++
+ target/tricore/helper.c         |   1 +
- accel/kvm/kvm-all.c          |  19 ++--
+ tests/fp/fp-bench.c             |   1 +
- hw/arm/virt.c                |   2 +-
+ fpu/softfloat-parts.c.inc       | 127 +++++--
- hw/gpio/nrf51_gpio.c         |  14 ++-
+files changed, 2325 insertions(+), 709 deletions(-)
  hw/mips/loongson3_virt.c     |   2 -
  hw/ppc/spapr.c               |   2 +-
  softmmu/qtest.c              |  52 +++++++---
  target/arm/cpu.c             |   6 ++
  target/arm/helper.c          | 207 ++++++++++++++++++++++++++++----------
  target/arm/kvm.c             |   7 ++
  target/arm/ptw.c             | 231 ++++++++++++++++++++++++++-----------------
  target/arm/tcg/sme_helper.c  |   2 +-
  target/arm/tcg/translate.c   |   2 +-
  target/i386/kvm/kvm.c        |   5 +
  target/mips/kvm.c            |   3 +-
  target/ppc/kvm.c             |   5 +
  target/riscv/kvm.c           |   5 +
  target/s390x/kvm/kvm.c       |   5 +
  tests/qtest/libqtest.c       |   6 ++
  tests/qtest/microbit-test.c  |  44 +++++++++
  target/arm/trace-events      |   7 +-
 files changed, 494 insertions(+), 199 deletions(-)

-New patch
+[PULL 01/68] target/alpha: Don't corrupt error_code with unknown softfloat flags
+In do_cvttq() we set env->error_code with what is supposed to be a
+set of FPCR exception bit values.  However, if the set of float
+exception flags we get back from softfloat for the conversion
+includes a flag which is not one of the three we expect here
+(invalid_cvti, invalid, inexact) then we will fall through the
+if-ladder and set env->error_code to the unconverted softfloat
+exception_flag value.  This will then cause us to take a spurious
+exception.
+This is harmless now, but when we add new floating point exception
+flags to softfloat it will cause problems.  Add an else clause to the
+if-ladder to make it ignore any float exception flags it doesn't care
+about.
+Specifically, without this fix, 'make check-tcg' will fail for Alpha
+when the commit adding float_flag_input_denormal_used lands.
+Fixes: aa3bad5b59e7 ("target/alpha: Use float64_to_int64_modulo for CVTTQ")
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+---
+ target/alpha/fpu_helper.c | 2 ++
+file changed, 2 insertions(+)
+diff --git a/target/alpha/fpu_helper.c b/target/alpha/fpu_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/alpha/fpu_helper.c
++++ b/target/alpha/fpu_helper.c
+@@ -XXX,XX +XXX,XX @@ static uint64_t do_cvttq(CPUAlphaState *env, uint64_t a, int roundmode)
+             exc = FPCR_INV;
+         } else if (exc & float_flag_inexact) {
+             exc = FPCR_INE;
++        } else {
++            exc = 0;
+         }
+     }
+     env->error_code = exc;
+--
+.34.1
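The effect of the added else clause can be sketched in isolation. This is a minimal illustration, not the Alpha code itself: the `FLAG_*` and `TGT_*` names are hypothetical stand-ins, the target bit values are made up, and only the 0x1000 and 0x4000 flag values are taken from the softfloat enum quoted later in this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-ins for softfloat exception flags. 0x1000 and 0x4000 match the
 * enum quoted in this series; the other two values are assumed.
 */
enum {
    FLAG_INVALID             = 0x0001,
    FLAG_INEXACT             = 0x0010,
    FLAG_INVALID_CVTI        = 0x1000,
    FLAG_INPUT_DENORMAL_USED = 0x4000,
};

/* Hypothetical target exception bits (not the real Alpha FPCR layout). */
enum {
    TGT_INV = 1u << 0,
    TGT_INE = 1u << 1,
    TGT_IOV = 1u << 2,
};

/* The new flag must not alias any flag the ladder already handles. */
static_assert((FLAG_INPUT_DENORMAL_USED &
               (FLAG_INVALID | FLAG_INEXACT | FLAG_INVALID_CVTI)) == 0,
              "new softfloat flag overlaps a handled flag");

/* Convert softfloat flags to target bits, ignoring flags we don't know. */
static uint32_t map_cvt_flags(uint32_t flags)
{
    uint32_t exc = flags;

    if (exc) {
        if (exc & FLAG_INVALID_CVTI) {
            exc = TGT_IOV | TGT_INE;
        } else if (exc & FLAG_INVALID) {
            exc = TGT_INV;
        } else if (exc & FLAG_INEXACT) {
            exc = TGT_INE;
        } else {
            /* Without this branch the raw softfloat value would leak out. */
            exc = 0;
        }
    }
    return exc;
}
```

With the else branch in place, a flag set containing only an unhandled flag maps to 0 rather than to the raw softfloat value.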

-New patch
+[PULL 02/68] fpu: Add float_class_denormal
+Currently in softfloat we canonicalize input denormals and so the
+code that implements floating point operations does not need to care
+whether the input value was originally normal or denormal.  However,
+both x86 and Arm FEAT_AFP require that an exception flag is set if:
+ * an input is denormal
+ * that input is not squashed to zero
+ * that input is actually used in the calculation (e.g. we
+   did not find the other input was a NaN)
+So we need to track that the input was a non-squashed denormal.  To
+do this we add a new value to the FloatClass enum.  In this commit we
+add the value and adjust the code everywhere that looks at FloatClass
+values so that the new float_class_denormal behaves identically to
+float_class_normal.  We will add the code that does the "raise a new
+float exception flag if an input was an unsquashed denormal and we
+used it" in a subsequent commit.
+There should be no behavioural change in this commit.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ fpu/softfloat.c           | 32 ++++++++++++++++++++++++++++---
+ fpu/softfloat-parts.c.inc | 40 ++++++++++++++++++++++++---------------
+files changed, 54 insertions(+), 18 deletions(-)
+diff --git a/fpu/softfloat.c b/fpu/softfloat.c
+index XXXXXXX..XXXXXXX 100644
+--- a/fpu/softfloat.c
++++ b/fpu/softfloat.c
+@@ -XXX,XX +XXX,XX @@ float64_gen2(float64 xa, float64 xb, float_status *s,
+ /*
+  * Classify a floating point number. Everything above float_class_qnan
+  * is a NaN so cls >= float_class_qnan is any NaN.
++ *
++ * Note that we canonicalize denormals, so most code should treat
++ * class_normal and class_denormal identically.
+  */
+ typedef enum __attribute__ ((__packed__)) {
+     float_class_unclassified,
+     float_class_zero,
+     float_class_normal,
++    float_class_denormal, /* input was a non-squashed denormal */
+     float_class_inf,
+     float_class_qnan,  /* all NaNs from here */
+     float_class_snan,
+@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__ ((__packed__)) {
+ enum {
+     float_cmask_zero    = float_cmask(float_class_zero),
+     float_cmask_normal  = float_cmask(float_class_normal),
++    float_cmask_denormal = float_cmask(float_class_denormal),
+     float_cmask_inf     = float_cmask(float_class_inf),
+     float_cmask_qnan    = float_cmask(float_class_qnan),
+     float_cmask_snan    = float_cmask(float_class_snan),
+     float_cmask_infzero = float_cmask_zero | float_cmask_inf,
+     float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
++    float_cmask_anynorm = float_cmask_normal | float_cmask_denormal,
+ };
+ /* Flags for parts_minmax. */
+@@ -XXX,XX +XXX,XX @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
+     return c == float_class_qnan;
+ }
++/*
++ * Return true if the float_cmask has only normals in it
++ * (including input denormals that were canonicalized)
++ */
++static inline bool cmask_is_only_normals(int cmask)
++{
++    return !(cmask & ~float_cmask_anynorm);
++}
++
++static inline bool is_anynorm(FloatClass c)
++{
++    return float_cmask(c) & float_cmask_anynorm;
++}
++
+ /*
+  * Structure holding all of the decomposed parts of a float.
+  * The exponent is unbiased and the fraction is normalized.
+@@ -XXX,XX +XXX,XX @@ static float64 float64r32_round_pack_canonical(FloatParts64 *p,
+      */
+     switch (p->cls) {
+     case float_class_normal:
++    case float_class_denormal:
+         if (unlikely(p->exp == 0)) {
+             /*
+              * The result is denormal for float32, but can be represented
+@@ -XXX,XX +XXX,XX @@ static floatx80 floatx80_round_pack_canonical(FloatParts128 *p,
+     switch (p->cls) {
+     case float_class_normal:
++    case float_class_denormal:
+         if (s->floatx80_rounding_precision == floatx80_precision_x) {
+             parts_uncanon_normal(p, s, fmt);
+             frac = p->frac_hi;
+@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
+         break;
+     case float_class_normal:
++    case float_class_denormal:
+     case float_class_zero:
+         break;
+@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
+     a->sign = b->sign;
+     a->exp = b->exp;
+-    if (a->cls == float_class_normal) {
++    if (is_anynorm(a->cls)) {
+         frac_truncjam(a, b);
+     } else if (is_nan(a->cls)) {
+         /* Discard the low bits of the NaN. */
+@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_int128_scalbn(float128 a, FloatRoundMode rmode,
+         return int128_zero();
+     case float_class_normal:
++    case float_class_denormal:
+         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
+             flags = float_flag_inexact;
+         }
+@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_uint128_scalbn(float128 a, FloatRoundMode rmode,
+         return int128_zero();
+     case float_class_normal:
++    case float_class_denormal:
+         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
+             flags = float_flag_inexact;
+             if (p.cls == float_class_zero) {
+@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
+     float32_unpack_canonical(&xp, a, status);
+     if (unlikely(xp.cls != float_class_normal)) {
+         switch (xp.cls) {
++        case float_class_denormal:
++            break;
+         case float_class_snan:
+         case float_class_qnan:
+             parts_return_nan(&xp, status);
+@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
+         case float_class_zero:
+             return float32_one;
+         default:
+-            break;
++            g_assert_not_reached();
+         }
+-        g_assert_not_reached();
+     }
+     float_raise(float_flag_inexact, status);
+diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
+index XXXXXXX..XXXXXXX 100644
+--- a/fpu/softfloat-parts.c.inc
++++ b/fpu/softfloat-parts.c.inc
+@@ -XXX,XX +XXX,XX @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
+             frac_clear(p);
+         } else {
+             int shift = frac_normalize(p);
+-            p->cls = float_class_normal;
++            p->cls = float_class_denormal;
+             p->exp = fmt->frac_shift - fmt->exp_bias
+                    - shift + !fmt->m68k_denormal;
+         }
+@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
+ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
+                             const FloatFmt *fmt)
+ {
+-    if (likely(p->cls == float_class_normal)) {
++    if (likely(is_anynorm(p->cls))) {
+         parts_uncanon_normal(p, s, fmt);
+     } else {
+         switch (p->cls) {
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
+     if (a->sign != b_sign) {
+         /* Subtraction */
+-        if (likely(ab_mask == float_cmask_normal)) {
++        if (likely(cmask_is_only_normals(ab_mask))) {
+             if (parts_sub_normal(a, b)) {
+                 return a;
+             }
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
+         }
+     } else {
+         /* Addition */
+-        if (likely(ab_mask == float_cmask_normal)) {
++        if (likely(cmask_is_only_normals(ab_mask))) {
+             parts_add_normal(a, b);
+             return a;
+         }
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
+     }
+     if (b->cls == float_class_zero) {
+-        g_assert(a->cls == float_class_normal);
++        g_assert(is_anynorm(a->cls));
+         return a;
+     }
+     g_assert(a->cls == float_class_zero);
+-    g_assert(b->cls == float_class_normal);
++    g_assert(is_anynorm(b->cls));
+  return_b:
+     b->sign = b_sign;
+     return b;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
+     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+     bool sign = a->sign ^ b->sign;
+-    if (likely(ab_mask == float_cmask_normal)) {
++    if (likely(cmask_is_only_normals(ab_mask))) {
+         FloatPartsW tmp;
+         frac_mulw(&tmp, a, b);
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
+         a->sign ^= 1;
+     }
+-    if (unlikely(ab_mask != float_cmask_normal)) {
++    if (unlikely(!cmask_is_only_normals(ab_mask))) {
+         if (unlikely(ab_mask == float_cmask_infzero)) {
+             float_raise(float_flag_invalid | float_flag_invalid_imz, s);
+             goto d_nan;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
+         }
+         g_assert(ab_mask & float_cmask_zero);
+-        if (c->cls == float_class_normal) {
++        if (is_anynorm(c->cls)) {
+             *a = *c;
+             goto return_normal;
+         }
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
+     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+     bool sign = a->sign ^ b->sign;
+-    if (likely(ab_mask == float_cmask_normal)) {
++    if (likely(cmask_is_only_normals(ab_mask))) {
+         a->sign = sign;
+         a->exp -= b->exp + frac_div(a, b);
+         return a;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
+ {
+     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+-    if (likely(ab_mask == float_cmask_normal)) {
++    if (likely(cmask_is_only_normals(ab_mask))) {
+         frac_modrem(a, b, mod_quot);
+         return a;
+     }
+@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
+     if (unlikely(a->cls != float_class_normal)) {
+         switch (a->cls) {
++        case float_class_denormal:
++            break;
+         case float_class_snan:
+         case float_class_qnan:
+             parts_return_nan(a, status);
+@@ -XXX,XX +XXX,XX @@ static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
+     case float_class_inf:
+         break;
+     case float_class_normal:
++    case float_class_denormal:
+         if (parts_round_to_int_normal(a, rmode, scale, fmt->frac_size)) {
+             float_raise(float_flag_inexact, s);
+         }
+@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
+         return 0;
+     case float_class_normal:
++    case float_class_denormal:
+         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
+         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
+             flags = float_flag_inexact;
+@@ -XXX,XX +XXX,XX @@ static uint64_t partsN(float_to_uint)(FloatPartsN *p, FloatRoundMode rmode,
+         return 0;
+     case float_class_normal:
++    case float_class_denormal:
+         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
+         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
+             flags = float_flag_inexact;
+@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint_modulo)(FloatPartsN *p,
+         return 0;
+     case float_class_normal:
++    case float_class_denormal:
+         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
+         if (parts_round_to_int_normal(p, rmode, 0, N - 2)) {
+             flags = float_flag_inexact;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
+     a_exp = a->exp;
+     b_exp = b->exp;
+-    if (unlikely(ab_mask != float_cmask_normal)) {
++    if (unlikely(!cmask_is_only_normals(ab_mask))) {
+         switch (a->cls) {
+         case float_class_normal:
++        case float_class_denormal:
+             break;
+         case float_class_inf:
+             a_exp = INT16_MAX;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
+         }
+         switch (b->cls) {
+         case float_class_normal:
++        case float_class_denormal:
+             break;
+         case float_class_inf:
+             b_exp = INT16_MAX;
+@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
+ {
+     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+-    if (likely(ab_mask == float_cmask_normal)) {
++    if (likely(cmask_is_only_normals(ab_mask))) {
+         FloatRelation cmp;
+         if (a->sign != b->sign) {
+@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
+     case float_class_inf:
+         break;
+     case float_class_normal:
++    case float_class_denormal:
+         a->exp += MIN(MAX(n, -0x10000), 0x10000);
+         break;
+     default:
+@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
+     if (unlikely(a->cls != float_class_normal)) {
+         switch (a->cls) {
++        case float_class_denormal:
++            break;
+         case float_class_snan:
+         case float_class_qnan:
+             parts_return_nan(a, s);
+@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
+             }
+             return;
+         default:
+-            break;
++            g_assert_not_reached();
+         }
+-        g_assert_not_reached();
+     }
+     if (unlikely(a->sign)) {
+         goto d_nan;
+--
+.34.1
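The reason the `ab_mask == float_cmask_normal` tests had to change can be shown with a small standalone sketch. The `Cls` enum and `CMASK()` macro below are simplified stand-ins for FloatClass and float_cmask(); the one-bit-per-class encoding is an assumption about how float_cmask() works, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for FloatClass, in the same order as the patch. */
typedef enum {
    CLS_UNCLASSIFIED,
    CLS_ZERO,
    CLS_NORMAL,
    CLS_DENORMAL,   /* input was a non-squashed denormal */
    CLS_INF,
    CLS_QNAN,       /* all NaNs from here */
    CLS_SNAN,
} Cls;

/* Assumed encoding: one bit per class. */
#define CMASK(c) (1 << (c))

enum {
    CMASK_NORMAL   = CMASK(CLS_NORMAL),
    CMASK_DENORMAL = CMASK(CLS_DENORMAL),
    CMASK_ANYNORM  = CMASK_NORMAL | CMASK_DENORMAL,
};

/* The NaN classes must stay last so that cls >= CLS_QNAN means "any NaN". */
static_assert(CLS_DENORMAL < CLS_QNAN, "new class must sort below the NaNs");

/* True if the mask contains no class other than normal or denormal. */
static bool only_normals(int cmask)
{
    return !(cmask & ~CMASK_ANYNORM);
}

/* True if a single class is normal or denormal. */
static bool is_anynorm_cls(Cls c)
{
    return (CMASK(c) & CMASK_ANYNORM) != 0;
}
```

An equality test against the normal mask alone rejects the combination "one normal input, one denormal input", whereas the mask-complement test accepts it and still rejects any zero, infinity or NaN.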

-[PULL 04/35] qtest: bail from irq_intercept_in if name is specified
+[PULL 03/68] fpu: Implement float_flag_input_denormal_used
-From: Chris Laplante <chris@laplante.io>
+For the x86 and the Arm FEAT_AFP semantics, we need to be able to
 tell the target code that the FPU operation has used an input
 denormal.  Implement this; when it happens we set the new
 float_flag_input_denormal_used.
-Named interception of in-GPIOs is not supported yet.
+Note that we only set this when an input denormal is actually used by
 the operation: if the operation results in Invalid Operation or
 Divide By Zero or the result is a NaN because some other input was a
 NaN then we never needed to look at the input denormal and do not set
 input_denormal_used.
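For addition and subtraction the "used" rule reduces to a single mask test, which can be sketched on its own. The `FC_*` names and the one-bit-per-class `FC_MASK()` encoding are illustrative assumptions rather than the softfloat definitions; only the shape of the condition follows the addsub hunk in this patch, and it covers the NaN case only.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative class numbering; only the relative grouping matters here. */
enum {
    FC_ZERO,
    FC_NORMAL,
    FC_DENORMAL,
    FC_INF,
    FC_QNAN,
    FC_SNAN,
};

/* Assumed encoding: one bit per class. */
#define FC_MASK(c) (1 << (c))

enum {
    FC_MASK_DENORMAL = FC_MASK(FC_DENORMAL),
    FC_MASK_ANYNAN   = FC_MASK(FC_QNAN) | FC_MASK(FC_SNAN),
};

static_assert((FC_MASK_DENORMAL & FC_MASK_ANYNAN) == 0,
              "denormal and NaN bits must be distinct");

/*
 * ab_mask is the OR of both operands' class bits. The denormal input is
 * consumed when at least one operand is denormal and neither is a NaN.
 */
static bool addsub_consumes_denormal(int ab_mask)
{
    return (ab_mask & (FC_MASK_DENORMAL | FC_MASK_ANYNAN)) == FC_MASK_DENORMAL;
}
```

Masking with both the denormal and NaN bits and comparing against the denormal bit alone checks "denormal present" and "no NaN present" in one comparison.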
-Signed-off-by: Chris Laplante <chris@laplante.io>
+We mostly do not need to adjust the hardfloat codepaths to deal with
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+this flag, because almost all hardfloat operations are already gated
-Message-id: 20230728160324.1159090-5-chris@laplante.io
+on the input not being a denormal, and will fall back to softfloat
 for a denormal input.  The only exception is the comparison
 operations, where we need to add the check for input denormals, which
 must now fall back to softfloat where they did not before.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- softmmu/qtest.c | 8 ++++++++
+ include/fpu/softfloat-types.h |  7 ++++
-file changed, 8 insertions(+)
+ fpu/softfloat.c               | 38 +++++++++++++++++---
  fpu/softfloat-parts.c.inc     | 68 ++++++++++++++++++++++++++++++++++-
 files changed, 107 insertions(+), 6 deletions(-)
-diff --git a/softmmu/qtest.c b/softmmu/qtest.c
+diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
 index XXXXXXX..XXXXXXX 100644
---- a/softmmu/qtest.c
+--- a/include/fpu/softfloat-types.h
-+++ b/softmmu/qtest.c
++++ b/include/fpu/softfloat-types.h
-@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
+@@ -XXX,XX +XXX,XX @@ enum {
-         || strcmp(words[0], "irq_intercept_in") == 0) {
+     float_flag_invalid_sqrt    = 0x0800,  /* sqrt(-x) */
-         DeviceState *dev;
+     float_flag_invalid_cvti    = 0x1000,  /* non-nan to integer */
-         NamedGPIOList *ngl;
+     float_flag_invalid_snan    = 0x2000,  /* any operand was snan */
-+        bool is_named;
++    /*
-         bool is_outbound;
++     * An input was denormal and we used it (without flushing it to zero).
++     * Not set if we do not actually use the denormal input (e.g.
-         g_assert(words[1]);
++     * because some other input was a NaN, or because the operation
-+        is_named = words[2] != NULL;
++     * wasn't actually carried out (divide-by-zero; invalid))
-         is_outbound = words[0][14] == 'o';
++     */
-         dev = DEVICE(object_resolve_path(words[1], NULL));
++    float_flag_input_denormal_used = 0x4000,
-         if (!dev) {
+ };
-@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
-             return;
+ /*
 diff --git a/fpu/softfloat.c b/fpu/softfloat.c
 index XXXXXXX..XXXXXXX 100644
 --- a/fpu/softfloat.c
 +++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
                                    float16_params_ahp.frac_size + 1);
          break;
 -    case float_class_normal:
      case float_class_denormal:
 +        float_raise(float_flag_input_denormal_used, s);
 +        break;
 +    case float_class_normal:
      case float_class_zero:
          break;
@@ -XXX,XX +XXX,XX @@ static void parts64_float_to_float(FloatParts64 *a, float_status *s)
      if (is_nan(a->cls)) {
          parts_return_nan(a, s);
      }
 +    if (a->cls == float_class_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
  }
  static void parts128_float_to_float(FloatParts128 *a, float_status *s)
@@ -XXX,XX +XXX,XX @@ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
      if (is_nan(a->cls)) {
          parts_return_nan(a, s);
      }
 +    if (a->cls == float_class_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
  }
  #define parts_float_to_float(P, S) \
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
      a->sign = b->sign;
      a->exp = b->exp;
 -    if (is_anynorm(a->cls)) {
 +    switch (a->cls) {
 +    case float_class_denormal:
 +        float_raise(float_flag_input_denormal_used, s);
 +        /* fall through */
 +    case float_class_normal:
          frac_truncjam(a, b);
 -    } else if (is_nan(a->cls)) {
 +        break;
 +    case float_class_snan:
 +    case float_class_qnan:
          /* Discard the low bits of the NaN. */
          a->frac = b->frac_hi;
          parts_return_nan(a, s);
 +        break;
 +    default:
 +        break;
      }
  }
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_widen(FloatParts128 *a, FloatParts64 *b,
      if (is_nan(a->cls)) {
          parts_return_nan(a, s);
      }
 +    if (a->cls == float_class_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
  }
  float32 float16_to_float32(float16 a, bool ieee, float_status *s)
@@ -XXX,XX +XXX,XX @@ float32_hs_compare(float32 xa, float32 xb, float_status *s, bool is_quiet)
          goto soft;
      }
 -    float32_input_flush2(&ua.s, &ub.s, s);
 +    if (unlikely(float32_is_denormal(ua.s) || float32_is_denormal(ub.s))) {
 +        /* We may need to set the input_denormal_used flag */
 +        goto soft;
 +    }
 +
      if (isgreaterequal(ua.h, ub.h)) {
          if (isgreater(ua.h, ub.h)) {
              return float_relation_greater;
@@ -XXX,XX +XXX,XX @@ float64_hs_compare(float64 xa, float64 xb, float_status *s, bool is_quiet)
          goto soft;
      }
 -    float64_input_flush2(&ua.s, &ub.s, s);
 +    if (unlikely(float64_is_denormal(ua.s) || float64_is_denormal(ub.s))) {
 +        /* We may need to set the input_denormal_used flag */
 +        goto soft;
 +    }
 +
      if (isgreaterequal(ua.h, ub.h)) {
          if (isgreater(ua.h, ub.h)) {
              return float_relation_greater;
 diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/fpu/softfloat-parts.c.inc
 +++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
      bool b_sign = b->sign ^ subtract;
      int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 +    /*
 +     * For addition and subtraction, we will consume an
 +     * input denormal unless the other input is a NaN.
 +     */
 +    if ((ab_mask & (float_cmask_denormal | float_cmask_anynan)) ==
 +        float_cmask_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
 +
      if (a->sign != b_sign) {
          /* Subtraction */
          if (likely(cmask_is_only_normals(ab_mask))) {
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
      if (likely(cmask_is_only_normals(ab_mask))) {
          FloatPartsW tmp;
 +        if (ab_mask & float_cmask_denormal) {
 +            float_raise(float_flag_input_denormal_used, s);
 +        }
 +
          frac_mulw(&tmp, a, b);
          frac_truncjam(a, &tmp);
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
      }
      /* Multiply by 0 or Inf */
 +    if (ab_mask & float_cmask_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
 +
      if (ab_mask & float_cmask_inf) {
          a->cls = float_class_inf;
          a->sign = sign;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
      if (flags & float_muladd_negate_result) {
          a->sign ^= 1;
      }
 +
 +    /*
 +     * All result types except for "return the default NaN
 +     * because this is an Invalid Operation" go through here;
 +     * this matches the set of cases where we consumed a
 +     * denormal input.
 +     */
 +    if (abc_mask & float_cmask_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
      return a;
   return_sub_zero:
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
      bool sign = a->sign ^ b->sign;
      if (likely(cmask_is_only_normals(ab_mask))) {
 +        if (ab_mask & float_cmask_denormal) {
 +            float_raise(float_flag_input_denormal_used, s);
 +        }
          a->sign = sign;
          a->exp -= b->exp + frac_div(a, b);
          return a;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
          return parts_pick_nan(a, b, s);
      }
 +    if ((ab_mask & float_cmask_denormal) && b->cls != float_class_zero) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
 +
      a->sign = sign;
      /* Inf / X */
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
      int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
      if (likely(cmask_is_only_normals(ab_mask))) {
 +        if (ab_mask & float_cmask_denormal) {
 +            float_raise(float_flag_input_denormal_used, s);
 +        }
          frac_modrem(a, b, mod_quot);
          return a;
      }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
          return a;
      }
 +    if (ab_mask & float_cmask_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
 +
      /* N % Inf; 0 % N */
      g_assert(b->cls == float_class_inf || a->cls == float_class_zero);
      return a;
@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
      if (unlikely(a->cls != float_class_normal)) {
          switch (a->cls) {
          case float_class_denormal:
 +            if (!a->sign) {
 +                /* -ve denormal will be InvalidOperation */
 +                float_raise(float_flag_input_denormal_used, status);
 +            }
              break;
          case float_class_snan:
          case float_class_qnan:
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
          if ((flags & (minmax_isnum | minmax_isnumber))
              && !(ab_mask & float_cmask_snan)
              && (ab_mask & ~float_cmask_qnan)) {
 +            if (ab_mask & float_cmask_denormal) {
 +                float_raise(float_flag_input_denormal_used, s);
 +            }
              return is_nan(a->cls) ? b : a;
          }
-+        if (is_named && !is_outbound) {
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
-+            qtest_send_prefix(chr);
+         return parts_pick_nan(a, b, s);
-+            qtest_send(chr, "FAIL Interception of named in-GPIOs not yet supported\n");
+     }
-+            return;
 +    if (ab_mask & float_cmask_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
 +
      a_exp = a->exp;
      b_exp = b->exp;
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
      if (likely(cmask_is_only_normals(ab_mask))) {
          FloatRelation cmp;
 +        if (ab_mask & float_cmask_denormal) {
 +            float_raise(float_flag_input_denormal_used, s);
 +        }
 +
-         if (irq_intercept_dev) {
+         if (a->sign != b->sign) {
-             qtest_send_prefix(chr);
+             goto a_sign;
-             if (irq_intercept_dev != dev) {
+         }
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
          return float_relation_unordered;
      }
 +    if (ab_mask & float_cmask_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
 +
      if (ab_mask & float_cmask_zero) {
          if (ab_mask == float_cmask_zero) {
              return float_relation_equal;
@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
      case float_class_zero:
      case float_class_inf:
          break;
 -    case float_class_normal:
      case float_class_denormal:
 +        float_raise(float_flag_input_denormal_used, s);
 +        /* fall through */
 +    case float_class_normal:
          a->exp += MIN(MAX(n, -0x10000), 0x10000);
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
      if (unlikely(a->cls != float_class_normal)) {
          switch (a->cls) {
          case float_class_denormal:
 +            if (!a->sign) {
 +                /* -ve denormal will be InvalidOperation */
 +                float_raise(float_flag_input_denormal_used, s);
 +            }
              break;
          case float_class_snan:
          case float_class_qnan:
 --
 .34.1

-New patch
+[PULL 04/68] fpu: allow flushing of output denormals to be after rounding
+Currently we handle flushing of output denormals in uncanon_normal
 always before we deal with rounding.  This works for architectures
 that detect tininess before rounding, but is usually not the right
 place when the architecture detects tininess after rounding.  For
 example, for x86 the SDM states that the MXCSR FTZ control bit causes
 outputs to be flushed to zero "when it detects a floating-point
 underflow condition".  This means that we mustn't flush to zero if
 the input is such that after rounding it is no longer tiny.
 At least one of our guest architectures does underflow detection
 after rounding but flushing of denormals before rounding (MIPS MSA);
 this means we need to have a config knob for this that is separate
 from our existing tininess_before_rounding setting.
 Add an ftz_detection flag.  For consistency with
 tininess_before_rounding, we make it default to "detect ftz after
 rounding"; this means that we need to explicitly set the flag to
 "detect ftz before rounding" on every existing architecture that sets
 flush_to_zero, so that this commit has no behaviour change.
 (This means more code change here but for the long term a less
 confusing API.)
 For several architectures the current behaviour is either
 definitely or possibly wrong; annotate those with TODO comments.
 These architectures are definitely wrong (and should detect
 ftz after rounding):
  * x86
  * Alpha
 For these architectures the spec is unclear:
  * MIPS (for non-MSA)
  * RX
  * SH4
 PA-RISC makes ftz detection IMPDEF, but we aren't setting the
 "tininess before rounding" setting that we ought to.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
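The distinction the commit message draws between flushing before and after rounding can be illustrated with a small standalone sketch. This is not the softfloat implementation: the names `ToyFTZDetection`, `toy_is_tiny`, `toy_narrow_with_ftz` and `TOY_JUST_BELOW_FLT_MIN` are hypothetical, and the "after rounding" check below inspects the value already rounded to the destination format, which is a simplification of "rounded with an unbounded exponent" and can differ for inputs extremely close to the smallest normal.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical stand-in for the FloatFTZDetection enum in the patch. */
typedef enum {
    TOY_FTZ_AFTER_ROUNDING = 0,
    TOY_FTZ_BEFORE_ROUNDING = 1,
} ToyFTZDetection;

/* 2^-126 * (1 - 2^-30): below FLT_MIN, but rounds up to FLT_MIN as a float. */
#define TOY_JUST_BELOW_FLT_MIN ((double)FLT_MIN * (1.0 - 0x1p-30))

/* Nonzero and smaller in magnitude than the smallest normal float. */
static bool toy_is_tiny(double v)
{
    double mag = v < 0 ? -v : v;
    return mag != 0.0 && mag < (double)FLT_MIN;
}

/*
 * Narrow an "infinitely precise" result (modelled as a double) to float
 * with flush-to-zero enabled, detecting denormal results either before
 * or after rounding.
 */
static float toy_narrow_with_ftz(double exact, ToyFTZDetection when)
{
    float signed_zero = exact < 0 ? -0.0f : 0.0f;
    float rounded;

    assert(exact == exact); /* NaNs are out of scope for this sketch. */

    if (when == TOY_FTZ_BEFORE_ROUNDING && toy_is_tiny(exact)) {
        return signed_zero;             /* flushed on the unrounded value */
    }
    rounded = (float)exact;             /* host round-to-nearest-even */
    if (when == TOY_FTZ_AFTER_ROUNDING && toy_is_tiny((double)rounded)) {
        return signed_zero;             /* flushed only if still tiny */
    }
    return rounded;
}
```

With `TOY_JUST_BELOW_FLT_MIN` as input, the "before rounding" mode flushes to zero while the "after rounding" mode returns `FLT_MIN`, which is the case the x86 SDM wording quoted in the commit message is concerned with.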
 ---
  include/fpu/softfloat-helpers.h | 11 +++++++++++
  include/fpu/softfloat-types.h   | 18 ++++++++++++++++++
  target/mips/fpu_helper.h        |  6 ++++++
  target/alpha/cpu.c              |  7 +++++++
  target/arm/cpu.c                |  1 +
  target/hppa/fpu_helper.c        | 11 +++++++++++
  target/i386/tcg/fpu_helper.c    |  8 ++++++++
  target/mips/msa.c               |  9 +++++++++
  target/ppc/cpu_init.c           |  3 +++
  target/rx/cpu.c                 |  8 ++++++++
  target/sh4/cpu.c                |  8 ++++++++
  target/tricore/helper.c         |  1 +
  tests/fp/fp-bench.c             |  1 +
  fpu/softfloat-parts.c.inc       | 21 +++++++++++++++------
 files changed, 107 insertions(+), 6 deletions(-)
 diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/fpu/softfloat-helpers.h
 +++ b/include/fpu/softfloat-helpers.h
@@ -XXX,XX +XXX,XX @@ static inline void set_flush_inputs_to_zero(bool val, float_status *status)
      status->flush_inputs_to_zero = val;
  }
 +static inline void set_float_ftz_detection(FloatFTZDetection d,
 +                                           float_status *status)
 +{
 +    status->ftz_detection = d;
 +}
 +
  static inline void set_default_nan_mode(bool val, float_status *status)
  {
      status->default_nan_mode = val;
@@ -XXX,XX +XXX,XX @@ static inline bool get_default_nan_mode(const float_status *status)
      return status->default_nan_mode;
  }
 +static inline FloatFTZDetection get_float_ftz_detection(const float_status *status)
 +{
 +    return status->ftz_detection;
 +}
 +
  #endif /* SOFTFLOAT_HELPERS_H */
 diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/fpu/softfloat-types.h
 +++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
      float_infzeronan_suppress_invalid = (1 << 7),
  } FloatInfZeroNaNRule;
 +/*
 + * When flush_to_zero is set, should we detect denormal results to
 + * be flushed before or after rounding? For most architectures this
 + * should be set to match the tininess_before_rounding setting,
 + * but a few architectures, e.g. MIPS MSA, detect FTZ before
 + * rounding but tininess after rounding.
 + *
 + * This enum is arranged so that the default if the target doesn't
 + * configure it matches the default for tininess_before_rounding
 + * (i.e. "after rounding").
 + */
 +typedef enum __attribute__((__packed__)) {
 +    float_ftz_after_rounding = 0,
 +    float_ftz_before_rounding = 1,
 +} FloatFTZDetection;
 +
  /*
   * Floating Point Status. Individual architectures may maintain
   * several versions of float_status for different functions. The
@@ -XXX,XX +XXX,XX @@ typedef struct float_status {
      bool tininess_before_rounding;
      /* should denormalised results go to zero and set output_denormal_flushed? */
      bool flush_to_zero;
 +    /* do we detect and flush denormal results before or after rounding? */
 +    FloatFTZDetection ftz_detection;
      /* should denormalised inputs go to zero and set input_denormal_flushed? */
      bool flush_inputs_to_zero;
      bool default_nan_mode;
 diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/mips/fpu_helper.h
 +++ b/target/mips/fpu_helper.h
@@ -XXX,XX +XXX,XX @@ static inline void fp_reset(CPUMIPSState *env)
       */
      set_float_2nan_prop_rule(float_2nan_prop_s_ab,
                               &env->active_fpu.fp_status);
 +    /*
 +     * TODO: the spec doesn't say clearly whether FTZ happens before
 +     * or after rounding for normal FPU operations.
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding,
 +                            &env->active_fpu.fp_status);
  }
  /* MSA */
 diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/alpha/cpu.c
 +++ b/target/alpha/cpu.c
@@ -XXX,XX +XXX,XX @@ static void alpha_cpu_initfn(Object *obj)
      set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
      /* Default NaN: sign bit clear, msb frac bit set */
      set_float_default_nan_pattern(0b01000000, &env->fp_status);
 +    /*
 +     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
 +     * section 4.7.7.11 says that we flush to zero for underflow cases, so
 +     * this should be float_ftz_after_rounding to match the
 +     * tininess_after_rounding (which is specified in section 4.7.5).
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
  #if defined(CONFIG_USER_ONLY)
      env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
      cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
  static void arm_set_default_fp_behaviours(float_status *s)
  {
      set_float_detect_tininess(float_tininess_before_rounding, s);
 +    set_float_ftz_detection(float_ftz_before_rounding, s);
      set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
      set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
      set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
 diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/hppa/fpu_helper.c
 +++ b/target/hppa/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
      set_float_infzeronan_rule(float_infzeronan_dnan_never, &env->fp_status);
      /* Default NaN: sign bit clear, msb-1 frac bit set */
      set_float_default_nan_pattern(0b00100000, &env->fp_status);
 +    /*
 +     * "PA-RISC 2.0 Architecture" says it is IMPDEF whether the flushing
 +     * enabled by FPSR.D happens before or after rounding. We pick "before"
 +     * for consistency with tininess detection.
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 +    /*
 +     * TODO: "PA-RISC 2.0 Architecture" chapter 10 says that we should
 +     * detect tininess before rounding, but we don't set that here so we
 +     * get the default tininess after rounding.
 +     */
  }
  void cpu_hppa_loaded_fr0(CPUHPPAState *env)
 diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/tcg/fpu_helper.c
 +++ b/target/i386/tcg/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_init_fp_statuses(CPUX86State *env)
      set_float_default_nan_pattern(0b11000000, &env->fp_status);
      set_float_default_nan_pattern(0b11000000, &env->mmx_status);
      set_float_default_nan_pattern(0b11000000, &env->sse_status);
 +    /*
 +     * TODO: x86 does flush-to-zero detection after rounding (the SDM
 +     * section 10.2.3.3 on the FTZ bit of MXCSR says that we flush
 +     * when we detect underflow, which x86 does after rounding).
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->mmx_status);
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->sse_status);
  }
  static inline uint8_t save_exception_flags(CPUX86State *env)
 diff --git a/target/mips/msa.c b/target/mips/msa.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/mips/msa.c
 +++ b/target/mips/msa.c
@@ -XXX,XX +XXX,XX @@ void msa_reset(CPUMIPSState *env)
      /* tininess detected after rounding.*/
      set_float_detect_tininess(float_tininess_after_rounding,
                                &env->active_tc.msa_fp_status);
 +    /*
 +     * MSACSR.FS detects tiny results to flush to zero before rounding
 +     * (per "MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD
 +     * Architecture Module, Revision 1.1" section 3.5.4), even though it
 +     * detects tininess after rounding for underflow purposes (section 3.4.2
 +     * table 3.3).
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding,
 +                            &env->active_tc.msa_fp_status);
      /*
       * According to MIPS specifications, if one of the two operands is
 diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/ppc/cpu_init.c
 +++ b/target/ppc/cpu_init.c
@@ -XXX,XX +XXX,XX @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
      /* tininess for underflow is detected before rounding */
      set_float_detect_tininess(float_tininess_before_rounding,
                                &env->fp_status);
 +    /* Similarly for flush-to-zero */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 +
      /*
       * PowerPC propagation rules:
       *  1. A if it sNaN or qNaN
 diff --git a/target/rx/cpu.c b/target/rx/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/rx/cpu.c
 +++ b/target/rx/cpu.c
@@ -XXX,XX +XXX,XX @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
      set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
      /* Default NaN value: sign bit clear, set frac msb */
      set_float_default_nan_pattern(0b01000000, &env->fp_status);
 +    /*
 +     * TODO: "RX Family RXv1 Instruction Set Architecture" is not 100% clear
 +     * on whether flush-to-zero should happen before or after rounding, but
 +     * section 1.3.2 says that it happens when underflow is detected, and
 +     * implies that underflow is detected after rounding. So this may not
 +     * be the correct setting.
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
  }
  static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
 diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/sh4/cpu.c
 +++ b/target/sh4/cpu.c
@@ -XXX,XX +XXX,XX @@ static void superh_cpu_reset_hold(Object *obj, ResetType type)
      set_default_nan_mode(1, &env->fp_status);
      /* sign bit clear, set all frac bits other than msb */
      set_float_default_nan_pattern(0b00111111, &env->fp_status);
 +    /*
 +     * TODO: "SH-4 CPU Core Architecture ADCS 7182230F" doesn't say whether
 +     * it detects tininess before or after rounding. Section 6.4 is clear
 +     * that flush-to-zero happens when the result underflows, though, so
 +     * either this should be "detect ftz after rounding" or else we should
 +     * be setting "detect tininess before rounding".
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
  }
  static void superh_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
 diff --git a/target/tricore/helper.c b/target/tricore/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/tricore/helper.c
 +++ b/target/tricore/helper.c
@@ -XXX,XX +XXX,XX @@ void fpu_set_state(CPUTriCoreState *env)
      set_flush_inputs_to_zero(1, &env->fp_status);
      set_flush_to_zero(1, &env->fp_status);
      set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
      set_default_nan_mode(1, &env->fp_status);
      /* Default NaN pattern: sign bit clear, frac msb set */
      set_float_default_nan_pattern(0b01000000, &env->fp_status);
 diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/fp/fp-bench.c
 +++ b/tests/fp/fp-bench.c
@@ -XXX,XX +XXX,XX @@ static void run_bench(void)
      set_float_3nan_prop_rule(float_3nan_prop_s_cab, &soft_status);
      set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, &soft_status);
      set_float_default_nan_pattern(0b01000000, &soft_status);
 +    set_float_ftz_detection(float_ftz_before_rounding, &soft_status);
      f = bench_funcs[operation][precision];
      g_assert(f);
 diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/fpu/softfloat-parts.c.inc
 +++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
              p->frac_lo &= ~round_mask;
          }
          frac_shr(p, frac_shift);
 -    } else if (s->flush_to_zero) {
 +    } else if (s->flush_to_zero &&
 +               s->ftz_detection == float_ftz_before_rounding) {
          flags |= float_flag_output_denormal_flushed;
          p->cls = float_class_zero;
          exp = 0;
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
          exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) && !fmt->m68k_denormal;
          frac_shr(p, frac_shift);
 -        if (is_tiny && (flags & float_flag_inexact)) {
 -            flags |= float_flag_underflow;
 -        }
 -        if (exp == 0 && frac_eqz(p)) {
 -            p->cls = float_class_zero;
 +        if (is_tiny) {
 +            if (s->flush_to_zero) {
 +                assert(s->ftz_detection == float_ftz_after_rounding);
 +                flags |= float_flag_output_denormal_flushed;
 +                p->cls = float_class_zero;
 +                exp = 0;
 +                frac_clear(p);
 +            } else if (flags & float_flag_inexact) {
 +                flags |= float_flag_underflow;
 +            }
 +            if (exp == 0 && frac_eqz(p)) {
 +                p->cls = float_class_zero;
 +            }
          }
      }
      p->exp = exp;
 --
 .34.1

-New patch
+[PULL 05/68] target/arm: Define FPCR AH, FIZ, NEP bits
+The Armv8.7 FEAT_AFP feature defines three new control bits in
+the FPCR:
+ * FPCR.AH: "alternate floating point mode"; this changes floating
+   point behaviour in a variety of ways, including:
+    - the sign of a default NaN is 1, not 0
+    - if FPCR.FZ is also 1, results that are still denormal after
+      rounding with an unbounded exponent are flushed to zero
+    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
+    - miscellaneous other corner-case behaviour changes
+ * FPCR.FIZ: flush denormalized numbers to zero on input for
+   most instructions
+ * FPCR.NEP: makes scalar SIMD operations merge the result with
+   higher vector elements in one of the source registers, instead
+   of zeroing the higher elements of the destination
+This commit defines the new bits in the FPCR, and allows them to be
+read or written when FEAT_AFP is implemented.  Actual behaviour
+changes will be implemented in subsequent commits.
+Note that these are the first FPCR bits which don't appear in the
+AArch32 FPSCR view of the register, and which share bit positions
+with FPSR bits.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
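As a rough standalone illustration of the gating described in this commit message, the sketch below clears the three FEAT_AFP bits on a write when the feature is absent. The bit positions are the ones defined in the patch; the function name `toy_filter_fpcr_write`, the `TOY_FPCR_AFP_BITS` macro and the boolean feature flag are hypothetical and stand in for the `cpu_isar_feature(aa64_afp, cpu)` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined by the patch. */
#define FPCR_FIZ (1u << 0)  /* Flush Inputs to Zero */
#define FPCR_AH  (1u << 1)  /* Alternate Handling */
#define FPCR_NEP (1u << 2)  /* SIMD scalar ops preserve elements */

/* Hypothetical convenience mask covering all three FEAT_AFP bits. */
#define TOY_FPCR_AFP_BITS (FPCR_FIZ | FPCR_AH | FPCR_NEP)

/*
 * Return the value that would actually be stored for an FPCR write:
 * without FEAT_AFP the three new bits are dropped from the write.
 */
static uint32_t toy_filter_fpcr_write(uint32_t val, bool have_feat_afp)
{
    if (!have_feat_afp) {
        val &= ~TOY_FPCR_AFP_BITS;
    }
    /* Postcondition: without FEAT_AFP none of the three bits survive. */
    assert(have_feat_afp || (val & TOY_FPCR_AFP_BITS) == 0);
    return val;
}
```

For example, writing `0x107` without FEAT_AFP leaves only bit 8 set in this sketch, while the same write with the feature present keeps all four bits. The real function additionally masks the trap-enable bits, which this sketch does not model.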
+---
+ target/arm/cpu-features.h |  5 +++++
+ target/arm/cpu.h          |  3 +++
+ target/arm/vfp_helper.c   | 11 ++++++++---
+files changed, 16 insertions(+), 3 deletions(-)
+diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu-features.h
++++ b/target/arm/cpu-features.h
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_hcx(const ARMISARegisters *id)
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HCX) != 0;
+ }
++static inline bool isar_feature_aa64_afp(const ARMISARegisters *id)
++{
++    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, AFP) != 0;
++}
++
+ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
+ {
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
+  */
+ /* FPCR bits */
++#define FPCR_FIZ    (1 << 0)    /* Flush Inputs to Zero (FEAT_AFP) */
++#define FPCR_AH     (1 << 1)    /* Alternate Handling (FEAT_AFP) */
++#define FPCR_NEP    (1 << 2)    /* SIMD scalar ops preserve elts (FEAT_AFP) */
+ #define FPCR_IOE    (1 << 8)    /* Invalid Operation exception trap enable */
+ #define FPCR_DZE    (1 << 9)    /* Divide by Zero exception trap enable */
+ #define FPCR_OFE    (1 << 10)   /* Overflow exception trap enable */
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
+     if (!cpu_isar_feature(any_fp16, cpu)) {
+         val &= ~FPCR_FZ16;
+     }
++    if (!cpu_isar_feature(aa64_afp, cpu)) {
++        val &= ~(FPCR_FIZ | FPCR_AH | FPCR_NEP);
++    }
+     if (!cpu_isar_feature(aa64_ebf16, cpu)) {
+         val &= ~FPCR_EBF;
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
+      * We don't implement trapped exception handling, so the
+      * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
+      *
+-     * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode, EBF
+-     * and FZ16. Len, Stride and LTPSIZE we just handled. Store those bits
++     * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode, EBF, FZ16,
++     * FIZ, AH, and NEP.
++     * Len, Stride and LTPSIZE we just handled. Store those bits
+      * there, and zero any of the other FPCR bits and the RES0 and RAZ/WI
+      * bits.
+      */
+-    val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16 | FPCR_EBF;
++    val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16 |
++        FPCR_EBF | FPCR_FIZ | FPCR_AH | FPCR_NEP;
+     env->vfp.fpcr &= ~mask;
+     env->vfp.fpcr |= val;
+ }
+--
+.34.1

-[PULL 32/35] target/arm/helper: Check SCR_EL3.{NSE, NS} encoding for AT instructions
+[PULL 06/68] target/arm: Implement FPCR.FIZ handling
-From: Jean-Philippe Brucker <jean-philippe@linaro.org>
+Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
 flushing of single and double precision denormal inputs to zero for
 AArch64 floating point instructions.  (For half-precision, the
 existing FPCR.FZ16 control remains the only one.)
-The AT instruction is UNDEFINED if the {NSE,NS} configuration is
+FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
-invalid. Add a function to check this on all AT instructions that apply
+only because of FPCR.FIZ then we should *not* set the cumulative
-to an EL lower than 3.
+exception bit FPSR.IDC.
-Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
-Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
+applies when FPCR.AH is 0.
-Message-id: 20230809123706.1842548-6-jean-philippe@linaro.org
 We can implement this by setting the "flush inputs to zero" state
 appropriately when FPCR is written, and by not reflecting the
 float_flag_input_denormal_flushed status flag into FPSR reads when it
 is the result only of FPCR.FIZ.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
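The two conditions this commit message describes for AArch64 (when inputs are flushed, and when that flush is reported in FPSR.IDC) can be written as a pair of predicates. This is a sketch under assumptions, not the code in the patch: the `toy_` function names are hypothetical, FPCR.FIZ and FPCR.AH use the bit positions from the previous patch, and FPCR.FZ is assumed to be bit 24 as in the Arm architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCR_FIZ (1u << 0)   /* from the FEAT_AFP definitions */
#define FPCR_AH  (1u << 1)   /* from the FEAT_AFP definitions */
#define FPCR_FZ  (1u << 24)  /* assumed architectural bit position */

/* A64: flush denormal inputs if FIZ == 1, or if AH == 0 and FZ == 1. */
static bool toy_a64_flush_inputs(uint32_t fpcr)
{
    return (fpcr & FPCR_FIZ) ||
           (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
}

/*
 * A64: a flushed input denormal is reported in FPSR.IDC only when the
 * flush is due to FZ (with AH == 0); a flush caused only by FIZ is silent.
 */
static bool toy_a64_flush_sets_idc(uint32_t fpcr)
{
    bool sets_idc = (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;

    /* Reporting a flush only makes sense if inputs are being flushed. */
    assert(!sets_idc || toy_a64_flush_inputs(fpcr));
    return sets_idc;
}
```

Note that with both FZ and FIZ set and AH clear, FZ takes precedence and IDC is reported; with AH set, FZ no longer flushes inputs at all, so only FIZ can enable the flush and it does so without setting IDC.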
 ---
- target/arm/helper.c | 38 +++++++++++++++++++++++++++-----------
+ target/arm/vfp_helper.c | 60 ++++++++++++++++++++++++++++++++++-------
-file changed, 27 insertions(+), 11 deletions(-)
+file changed, 50 insertions(+), 10 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static void ats1h_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
- #endif /* CONFIG_TCG */
  static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
  {
 -    uint32_t i = 0;
 +    uint32_t a32_flags = 0, a64_flags = 0;
 -    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
 -    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
 -    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
 +    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
 +    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
      /* FZ16 does not generate an input denormal exception.  */
 -    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
 +    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
            & ~float_flag_input_denormal_flushed);
 -    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
 +    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
            & ~float_flag_input_denormal_flushed);
 -    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
 +
 +    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
 +    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
            & ~float_flag_input_denormal_flushed);
 -    return vfp_exceptbits_from_host(i);
 +    /*
 +     * Flushing an input denormal *only* because FPCR.FIZ == 1 does
 +     * not set FPSR.IDC; if FPCR.FZ is also set then this takes
 +     * precedence and IDC is set (see the FPUnpackBase pseudocode).
 +     * So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
 +     * We only do this for the a64 flags because FIZ has no effect
 +     * on AArch32 even if it is set.
 +     */
 +    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
 +        a64_flags &= ~float_flag_input_denormal_flushed;
 +    }
 +    return vfp_exceptbits_from_host(a32_flags | a64_flags);
  }
-+static CPAccessResult at_e012_access(CPUARMState *env, const ARMCPRegInfo *ri,
+ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
-+                                     bool isread)
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
  }
 +static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
 +{
 +    /*
-+     * R_NYXTL: instruction is UNDEFINED if it applies to an Exception level
++     * Synchronize any pending exception-flag information in the
-+     * lower than EL3 and the combination SCR_EL3.{NSE,NS} is reserved. This can
++     * float_status values into env->vfp.fpsr, and then clear out
-+     * only happen when executing at EL3 because that combination also causes an
++     * the float_status data.
 +     * illegal exception return. We don't need to check FEAT_RME either, because
 +     * scr_write() ensures that the NSE bit is not set otherwise.
 +     */
-+    if ((env->cp15.scr_el3 & (SCR_NSE | SCR_NS)) == SCR_NSE) {
++    env->vfp.fpsr |= vfp_get_fpsr_from_host(env);
-+        return CP_ACCESS_TRAP;
++    vfp_clear_float_status_exc_flags(env);
 +    }
 +    return CP_ACCESS_OK;
 +}
 +
- static CPAccessResult at_s1e2_access(CPUARMState *env, const ARMCPRegInfo *ri,
+ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
                                       bool isread)
  {
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult at_s1e2_access(CPUARMState *env, const ARMCPRegInfo *ri,
+     uint64_t changed = env->vfp.fpcr;
-         !(env->cp15.scr_el3 & (SCR_NS | SCR_EEL2))) {
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-         return CP_ACCESS_TRAP;
+     if (changed & FPCR_FZ) {
          bool ftz_enabled = val & FPCR_FZ;
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
 +        /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 +    }
 +    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
 +        /*
 +         * A64: Flush denormalized inputs to zero if FPCR.FIZ = 1, or
 +         * both FPCR.AH = 0 and FPCR.FZ = 1.
 +         */
 +        bool fitz_enabled = (val & FPCR_FIZ) ||
 +            (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
 +        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
      }
--    return CP_ACCESS_OK;
+     if (changed & FPCR_DN) {
-+    return at_e012_access(env, ri, isread);
+         bool dnan_enabled = val & FPCR_DN;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
      }
 +    /*
 +     * If any bits changed that we look at in vfp_get_fpsr_from_host(),
 +     * we must sync the float_status flags into vfp.fpsr now (under the
 +     * old regime) before we update vfp.fpcr.
 +     */
 +    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
 +        vfp_sync_and_clear_float_status_exc_flags(env);
 +    }
  }
- static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
+ #else
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 0,
        .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
        .fgt = FGT_ATS1E1R,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
      { .name = "AT_S1E1W", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 1,
        .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
        .fgt = FGT_ATS1E1W,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
      { .name = "AT_S1E0R", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 2,
        .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
        .fgt = FGT_ATS1E0R,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
      { .name = "AT_S1E0W", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 3,
        .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
        .fgt = FGT_ATS1E0W,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
      { .name = "AT_S12E1R", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 4,
        .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
      { .name = "AT_S12E1W", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 5,
        .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
      { .name = "AT_S12E0R", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 6,
        .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
      { .name = "AT_S12E0W", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 7,
        .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
      /* AT S1E2* are elsewhere as they UNDEF from EL3 if EL2 is not present */
      { .name = "AT_S1E3R", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 6, .crn = 7, .crm = 8, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo ats1e1_reginfo[] = {
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 0,
        .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
        .fgt = FGT_ATS1E1RP,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
      { .name = "AT_S1E1WP", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 1,
        .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
        .fgt = FGT_ATS1E1WP,
 -      .writefn = ats_write64 },
 +      .accessfn = at_e012_access, .writefn = ats_write64 },
  };
  static const ARMCPRegInfo ats1cp_reginfo[] = {
 --
 .34.1
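
The A64 input-flushing rules from the vfp_set_fpcr_to_host() and
vfp_get_fpsr_from_host() hunks above can be restated as a small
standalone sketch. This is not QEMU code: the helper names are
hypothetical, and the FPCR bit positions are written out as
assumptions so that the block compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed FPCR bit positions, defined here only for illustration. */
#define FPCR_FIZ (1u << 0)
#define FPCR_AH  (1u << 1)
#define FPCR_FZ  (1u << 24)

/*
 * Hypothetical helper: A64 code flushes denormal inputs to zero if
 * FPCR.FIZ == 1, or if FPCR.AH == 0 and FPCR.FZ == 1.
 */
static bool a64_flush_inputs(uint32_t fpcr)
{
    return (fpcr & FPCR_FIZ) ||
           (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
}

/*
 * Hypothetical helper: a flushed input denormal is reported in
 * FPSR.IDC only when FPCR.AH == 0 and FPCR.FZ == 1. A flush caused
 * by FIZ alone does not set IDC.
 */
static bool a64_flush_sets_idc(uint32_t fpcr)
{
    return (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
}
```

Under these assumptions, a64_flush_inputs(FPCR_FZ | FPCR_AH) is false
because AH disables the FZ input flush, while adding FPCR_FIZ makes it
true again without causing IDC to be reported.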

-[PULL 31/35] target/arm: Pass security space rather than flag for AT instructions
+[PULL 07/68] target/arm: Adjust FP behaviour for FPCR.AH = 1
-From: Jean-Philippe Brucker <jean-philippe@linaro.org>
+When FPCR.AH is set, various behaviours of AArch64 floating point
 operations which are controlled by softfloat config settings change:
  * tininess and ftz detection before/after rounding
  * NaN propagation order
  * result of 0 * Inf + NaN
  * default NaN value
-At the moment we only handle Secure and Nonsecure security spaces for
+When the guest changes the value of the AH bit, switch these config
-the AT instructions. Add support for Realm and Root.
+settings on the fp_status_a64 and fp_status_f16_a64 float_status
 fields.
-For AArch64, arm_security_space() gives the desired space. ARM DDI0487J
+This requires us to make the arm_set_default_fp_behaviours() function
-says (R_NYXTL):
+global, since we now need to call it from cpu.c and vfp_helper.c; we
 move it to vfp_helper.c so it can be next to the new
 arm_set_ah_fp_behaviours().
-  If EL3 is implemented, then when an address translation instruction
-  that applies to an Exception level lower than EL3 is executed, the
-  Effective value of SCR_EL3.{NSE, NS} determines the target Security
-  state that the instruction applies to.
-For AArch32, some instructions can access NonSecure space from Secure,
-so we still need to pass the state explicitly to do_ats_write().
-Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20230809123706.1842548-5-jean-philippe@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/internals.h | 18 +++++++++---------
+ target/arm/internals.h  |  4 +++
- target/arm/helper.c    | 27 ++++++++++++---------------
+ target/arm/cpu.c        | 23 ----------------
- target/arm/ptw.c       | 12 ++++++------
+ target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++++++++-
- 3 files changed, 27 insertions(+), 30 deletions(-)
+ 3 files changed, 61 insertions(+), 24 deletions(-)
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/internals.h
 +++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
+@@ -XXX,XX +XXX,XX @@ uint64_t gt_virt_cnt_offset(CPUARMState *env);
-     __attribute__((nonnull));
+  * all EL1" scope; this covers stage 1 and stage 2.
  /**
 - * get_phys_addr_with_secure_nogpc: get the physical address for a virtual
 - *                                  address
 + * get_phys_addr_with_space_nogpc: get the physical address for a virtual
 + *                                 address
   * @env: CPUARMState
   * @address: virtual address to get physical address for
   * @access_type: 0 for read, 1 for write, 2 for execute
   * @mmu_idx: MMU index indicating required translation regime
 - * @is_secure: security state for the access
 + * @space: security space for the access
   * @result: set on translation success.
   * @fi: set to fault info if the translation fails
   *
 - * Similar to get_phys_addr, but use the given security regime and don't perform
 + * Similar to get_phys_addr, but use the given security space and don't perform
   * a Granule Protection Check on the resulting address.
   */
--bool get_phys_addr_with_secure_nogpc(CPUARMState *env, target_ulong address,
+ int alle1_tlbmask(CPUARMState *env);
--                                     MMUAccessType access_type,
++
--                                     ARMMMUIdx mmu_idx, bool is_secure,
++/* Set the float_status behaviour to match the Arm defaults */
--                                     GetPhysAddrResult *result,
++void arm_set_default_fp_behaviours(float_status *s);
--                                     ARMMMUFaultInfo *fi)
++
-+bool get_phys_addr_with_space_nogpc(CPUARMState *env, target_ulong address,
+ #endif
-+                                    MMUAccessType access_type,
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 +                                    ARMMMUIdx mmu_idx, ARMSecuritySpace space,
 +                                    GetPhysAddrResult *result,
 +                                    ARMMMUFaultInfo *fi)
      __attribute__((nonnull));
  bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/cpu.c
-+++ b/target/arm/helper.c
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static int par_el1_shareability(GetPhysAddrResult *res)
+@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
+     QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
- static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
+ }
-                              MMUAccessType access_type, ARMMMUIdx mmu_idx,
--                             bool is_secure)
+-/*
-+                             ARMSecuritySpace ss)
+- * Set the float_status behaviour to match the Arm defaults:
 - *  * tininess-before-rounding
 - *  * 2-input NaN propagation prefers SNaN over QNaN, and then
 - *    operand A over operand B (see FPProcessNaNs() pseudocode)
 - *  * 3-input NaN propagation prefers SNaN over QNaN, and then
 - *    operand C over A over B (see FPProcessNaNs3() pseudocode,
 - *    but note that for QEMU muladd is a * b + c, whereas for
 - *    the pseudocode function the arguments are in the order c, a, b.
 - *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
 - *    and the input NaN if it is signalling
 - *  * Default NaN has sign bit clear, msb frac bit set
 - */
 -static void arm_set_default_fp_behaviours(float_status *s)
 -{
 -    set_float_detect_tininess(float_tininess_before_rounding, s);
 -    set_float_ftz_detection(float_ftz_before_rounding, s);
 -    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
 -    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
 -    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
 -    set_float_default_nan_pattern(0b01000000, s);
 -}
 -
  static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
  {
-     bool ret;
+     /* Reset a single ARMCPRegInfo register */
-     uint64_t par64;
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
+index XXXXXXX..XXXXXXX 100644
-      * I_MXTJT: Granule protection checks are not performed on the final address
+--- a/target/arm/vfp_helper.c
-      * of a successful translation.
++++ b/target/arm/vfp_helper.c
-      */
+@@ -XXX,XX +XXX,XX @@
--    ret = get_phys_addr_with_secure_nogpc(env, value, access_type, mmu_idx,
+ #include "exec/helper-proto.h"
--                                          is_secure, &res, &fi);
+ #include "internals.h"
-+    ret = get_phys_addr_with_space_nogpc(env, value, access_type, mmu_idx, ss,
+ #include "cpu-features.h"
-+                                         &res, &fi);
++#include "fpu/softfloat.h"
+ #ifdef CONFIG_TCG
  #include "qemu/log.h"
 -#include "fpu/softfloat.h"
  #endif
  /* VFP support.  We follow the convention used for VFP instructions:
     Single precision routines have a "s" suffix, double precision a
     "d" suffix.  */
 +/*
 + * Set the float_status behaviour to match the Arm defaults:
 + *  * tininess-before-rounding
 + *  * 2-input NaN propagation prefers SNaN over QNaN, and then
 + *    operand A over operand B (see FPProcessNaNs() pseudocode)
 + *  * 3-input NaN propagation prefers SNaN over QNaN, and then
 + *    operand C over A over B (see FPProcessNaNs3() pseudocode,
 + *    but note that for QEMU muladd is a * b + c, whereas for
 + *    the pseudocode function the arguments are in the order c, a, b)
 + *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
 + *    and the input NaN if it is signalling
 + *  * Default NaN has sign bit clear, msb frac bit set
 + */
 +void arm_set_default_fp_behaviours(float_status *s)
 +{
 +    set_float_detect_tininess(float_tininess_before_rounding, s);
 +    set_float_ftz_detection(float_ftz_before_rounding, s);
 +    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
 +    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
 +    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
 +    set_float_default_nan_pattern(0b01000000, s);
 +}
 +
 +/*
 + * Set the float_status behaviour to match the FEAT_AFP
 + * FPCR.AH=1 requirements:
 + *  * tininess-after-rounding
 + *  * 2-input NaN propagation prefers the first NaN
 + *  * 3-input NaN propagation prefers a over b over c
 + *  * 0 * Inf + NaN always returns the input NaN and doesn't
 + *    set Invalid for a QNaN
 + *  * default NaN has sign bit set, msb frac bit set
 + */
 +static void arm_set_ah_fp_behaviours(float_status *s)
 +{
 +    set_float_detect_tininess(float_tininess_after_rounding, s);
 +    set_float_ftz_detection(float_ftz_after_rounding, s);
 +    set_float_2nan_prop_rule(float_2nan_prop_ab, s);
 +    set_float_3nan_prop_rule(float_3nan_prop_abc, s);
 +    set_float_infzeronan_rule(float_infzeronan_dnan_never |
 +                              float_infzeronan_suppress_invalid, s);
 +    set_float_default_nan_pattern(0b11000000, s);
 +}
 +
  #ifdef CONFIG_TCG
  /* Convert host exception flags to vfp form.  */
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
      }
 +    if (changed & FPCR_AH) {
 +        bool ah_enabled = val & FPCR_AH;
 +
 +        if (ah_enabled) {
 +            /* Change behaviours for A64 FP operations */
 +            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
 +            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +        } else {
 +            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 +            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +        }
 +    }
      /*
-      * ATS operations only do S1 or S1+S2 translations, so we never
+      * If any bits changed that we look at in vfp_get_fpsr_from_host(),
-@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+      * we must sync the float_status flags into vfp.fpsr now (under the
      uint64_t par64;
      ARMMMUIdx mmu_idx;
      int el = arm_current_el(env);
 -    bool secure = arm_is_secure_below_el3(env);
 +    ARMSecuritySpace ss = arm_security_space(env);
      switch (ri->opc2 & 6) {
      case 0:
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
          switch (el) {
          case 3:
              mmu_idx = ARMMMUIdx_E3;
 -            secure = true;
              break;
          case 2:
 -            g_assert(!secure);  /* ARMv8.4-SecEL2 is 64-bit only */
 +            g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
              /* fall through */
          case 1:
              if (ri->crm == 9 && (env->uncached_cpsr & CPSR_PAN)) {
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
          switch (el) {
          case 3:
              mmu_idx = ARMMMUIdx_E10_0;
 -            secure = true;
              break;
          case 2:
 -            g_assert(!secure);  /* ARMv8.4-SecEL2 is 64-bit only */
 +            g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
              mmu_idx = ARMMMUIdx_Stage1_E0;
              break;
          case 1:
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
      case 4:
          /* stage 1+2 NonSecure PL1: ATS12NSOPR, ATS12NSOPW */
          mmu_idx = ARMMMUIdx_E10_1;
 -        secure = false;
 +        ss = ARMSS_NonSecure;
          break;
      case 6:
          /* stage 1+2 NonSecure PL0: ATS12NSOUR, ATS12NSOUW */
          mmu_idx = ARMMMUIdx_E10_0;
 -        secure = false;
 +        ss = ARMSS_NonSecure;
          break;
      default:
          g_assert_not_reached();
      }
 -    par64 = do_ats_write(env, value, access_type, mmu_idx, secure);
 +    par64 = do_ats_write(env, value, access_type, mmu_idx, ss);
      A32_BANKED_CURRENT_REG_SET(env, par, par64);
  #else
@@ -XXX,XX +XXX,XX @@ static void ats1h_write(CPUARMState *env, const ARMCPRegInfo *ri,
      uint64_t par64;
      /* There is no SecureEL2 for AArch32. */
 -    par64 = do_ats_write(env, value, access_type, ARMMMUIdx_E2, false);
 +    par64 = do_ats_write(env, value, access_type, ARMMMUIdx_E2,
 +                         ARMSS_NonSecure);
      A32_BANKED_CURRENT_REG_SET(env, par, par64);
  #else
@@ -XXX,XX +XXX,XX @@ static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
  #ifdef CONFIG_TCG
      MMUAccessType access_type = ri->opc2 & 1 ? MMU_DATA_STORE : MMU_DATA_LOAD;
      ARMMMUIdx mmu_idx;
 -    int secure = arm_is_secure_below_el3(env);
      uint64_t hcr_el2 = arm_hcr_el2_eff(env);
      bool regime_e20 = (hcr_el2 & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE);
@@ -XXX,XX +XXX,XX @@ static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
              break;
          case 6: /* AT S1E3R, AT S1E3W */
              mmu_idx = ARMMMUIdx_E3;
 -            secure = true;
              break;
          default:
              g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
      }
      env->cp15.par_el[1] = do_ats_write(env, value, access_type,
 -                                       mmu_idx, secure);
 +                                       mmu_idx, arm_security_space(env));
  #else
      /* Handled by hardware accelerator. */
      g_assert_not_reached();
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_gpc(CPUARMState *env, S1Translate *ptw,
      return false;
  }
 -bool get_phys_addr_with_secure_nogpc(CPUARMState *env, target_ulong address,
 -                                     MMUAccessType access_type,
 -                                     ARMMMUIdx mmu_idx, bool is_secure,
 -                                     GetPhysAddrResult *result,
 -                                     ARMMMUFaultInfo *fi)
 +bool get_phys_addr_with_space_nogpc(CPUARMState *env, target_ulong address,
 +                                    MMUAccessType access_type,
 +                                    ARMMMUIdx mmu_idx, ARMSecuritySpace space,
 +                                    GetPhysAddrResult *result,
 +                                    ARMMMUFaultInfo *fi)
  {
      S1Translate ptw = {
          .in_mmu_idx = mmu_idx,
 -        .in_space = arm_secure_to_space(is_secure),
 +        .in_space = space,
      };
      return get_phys_addr_nogpc(env, &ptw, address, access_type, result, fi);
  }
 --
 .34.1

-New patch
+[PULL 08/68] target/arm: Adjust exception flag handling for AH = 1
+When FPCR.AH = 1, some of the cumulative exception flags in the FPSR
+behave slightly differently for A64 operations:
+ * IDC is set when a denormal input is used without flushing
+ * IXC (Inexact) is set when an output denormal is flushed to zero
+Update vfp_get_fpsr_from_host() to do this.
+Note that because half-precision operations never set IDC, we now
+need to add float_flag_input_denormal_used to the set we mask out of
+fp_status_f16_a64.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/vfp_helper.c | 17 ++++++++++++++---
+ 1 file changed, 14 insertions(+), 3 deletions(-)
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ static void arm_set_ah_fp_behaviours(float_status *s)
+ #ifdef CONFIG_TCG
+ /* Convert host exception flags to vfp form.  */
+-static inline uint32_t vfp_exceptbits_from_host(int host_bits)
++static inline uint32_t vfp_exceptbits_from_host(int host_bits, bool ah)
+ {
+     uint32_t target_bits = 0;
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
+     if (host_bits & float_flag_input_denormal_flushed) {
+         target_bits |= FPSR_IDC;
+     }
++    /*
++     * With FPCR.AH, IDC is set when an input denormal is used,
++     * and flushing an output denormal to zero sets both IXC and UFC.
++     */
++    if (ah && (host_bits & float_flag_input_denormal_used)) {
++        target_bits |= FPSR_IDC;
++    }
++    if (ah && (host_bits & float_flag_output_denormal_flushed)) {
++        target_bits |= FPSR_IXC;
++    }
+     return target_bits;
+ }
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
+     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+-          & ~float_flag_input_denormal_flushed);
++          & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
+     /*
+      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
+      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
+         a64_flags &= ~float_flag_input_denormal_flushed;
+     }
+-    return vfp_exceptbits_from_host(a32_flags | a64_flags);
++    return vfp_exceptbits_from_host(a64_flags, env->vfp.fpcr & FPCR_AH) |
++        vfp_exceptbits_from_host(a32_flags, false);
+ }
+ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
+--
+.34.1
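
The AH-dependent flag mapping in the patch above can be tried out in
isolation with a sketch like the following. It models only the three
rules touched by this patch. The HOST_* values are hypothetical
stand-ins for the softfloat flags, and the FPSR bit positions are
assumptions stated in the code, so this is an illustration rather
than the QEMU implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the softfloat exception flags. */
enum {
    HOST_INPUT_DENORMAL_FLUSHED  = 1 << 0,
    HOST_INPUT_DENORMAL_USED     = 1 << 1,
    HOST_OUTPUT_DENORMAL_FLUSHED = 1 << 2,
};

/* Assumed FPSR cumulative flag positions, for illustration only. */
#define FPSR_IXC (1u << 4)
#define FPSR_IDC (1u << 7)

/* Convert the modelled host flags to FPSR bits; 'ah' mirrors FPCR.AH. */
static uint32_t exceptbits_from_host(int host_bits, bool ah)
{
    uint32_t target_bits = 0;

    if (host_bits & HOST_INPUT_DENORMAL_FLUSHED) {
        target_bits |= FPSR_IDC;
    }
    /* With AH set, using an input denormal unflushed also sets IDC. */
    if (ah && (host_bits & HOST_INPUT_DENORMAL_USED)) {
        target_bits |= FPSR_IDC;
    }
    /* With AH set, flushing an output denormal also sets IXC. */
    if (ah && (host_bits & HOST_OUTPUT_DENORMAL_FLUSHED)) {
        target_bits |= FPSR_IXC;
    }
    return target_bits;
}

/* A32 flags are always converted with ah == false, as in the patch. */
static uint32_t fpsr_from_flags(int a32_flags, int a64_flags, bool ah)
{
    return exceptbits_from_host(a64_flags, ah) |
           exceptbits_from_host(a32_flags, false);
}
```

Converting the two flag sets separately is what keeps an A32 operation
from picking up the AH-only behaviour: in this sketch,
fpsr_from_flags(HOST_OUTPUT_DENORMAL_FLUSHED, 0, true) returns 0,
whereas the same flag passed as a64_flags returns FPSR_IXC.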

-[PULL 18/35] target/arm/ptw: Pass an ARMSecuritySpace to arm_hcr_el2_eff_secstate()
+[PULL 09/68] target/arm: Add FPCR.AH to tbflags
-arm_hcr_el2_eff_secstate() takes a bool secure, which it uses to
+We are going to need to generate different code in some cases when
-determine whether EL2 is enabled in the current security state.
+FPCR.AH is 1.  For example:
-With the advent of FEAT_RME this is no longer sufficient, because
+ * Floating point neg and abs must not flip the sign bit of NaNs
-EL2 can be enabled for Secure state but not for Root, and both
+ * some insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, and various
-of those will pass 'secure == true' in the callsites in ptw.c.
+   BFCVT and BFM bfloat16 ops) need to use a different float_status
    to the usual one
-As it happens in all of our callsites in ptw.c we either avoid making
+Encode FPCR.AH into the A64 tbflags, so we can refer to it at
-the call or else avoid using the returned value if we're doing a
+translate time.
 translation for Root, so this is not a behaviour change even if the
 experimental FEAT_RME is enabled.  But it is less confusing in the
 ptw.c code if we avoid the use of a bool secure that duplicates some
 of the information in the ARMSecuritySpace argument.
-Make arm_hcr_el2_eff_secstate() take an ARMSecuritySpace argument
+Because we now have a bit in FPCR that affects codegen, we can't mark
-instead. Because we always want to know the HCR_EL2 for the
+the AArch64 FPCR register as being SUPPRESS_TB_END any more; writes
-security state defined by the current effective value of
+to it will now end the TB and trigger a regeneration of hflags.
 SCR_EL3.{NSE,NS}, it makes no sense to pass ARMSS_Root here,
 and we assert that callers don't do that.
 To avoid the assert(), we thus push the call to
 arm_hcr_el2_eff_secstate() down into the cases in
 regime_translation_disabled() that need it, rather than calling the
 function and ignoring the result for the Root space translations.
 All other calls to this function in ptw.c are already in places
 where we have confirmed that the mmu_idx is a stage 2 translation
 or that the regime EL is not 3.
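
The FPCR.AH tbflags encoding described in this commit message comes
down to copying one control bit into a flags word when hflags are
rebuilt, and reading it back when the translator sets up its context.
A minimal standalone sketch follows. The function names are
hypothetical, the flags are simplified to a single uint64_t, the FPCR
bit position is an assumption, and the shift of 37 is taken from the
FIELD(TBFLAG_A64, AH, 37, 1) line in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPCR_AH          (1u << 1)  /* assumed bit position */
#define TBFLAG_AH_SHIFT  37         /* from FIELD(TBFLAG_A64, AH, 37, 1) */

/* Hypothetical: fold FPCR.AH into the flags word at rebuild time. */
static uint64_t flags_with_ah(uint64_t flags, uint32_t fpcr)
{
    if (fpcr & FPCR_AH) {
        flags |= (uint64_t)1 << TBFLAG_AH_SHIFT;
    }
    return flags;
}

/* Hypothetical: what the translator would read back as fpcr_ah. */
static bool flags_fpcr_ah(uint64_t flags)
{
    return (flags >> TBFLAG_AH_SHIFT) & 1;
}
```

Because the translator only sees the cached bit, any write that
changes FPCR.AH has to end the current TB so the flags are rebuilt,
which is why the patch drops ARM_CP_SUPPRESS_TB_END from FPCR.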
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-7-peter.maydell@linaro.org
 ---
- target/arm/cpu.h    |  2 +-
+ target/arm/cpu.h               | 1 +
- target/arm/helper.c |  8 +++++---
+ target/arm/tcg/translate.h     | 2 ++
- target/arm/ptw.c    | 15 +++++++--------
+ target/arm/helper.c            | 2 +-
- 3 files changed, 13 insertions(+), 12 deletions(-)
+ target/arm/tcg/hflags.c        | 4 ++++
  target/arm/tcg/translate-a64.c | 1 +
  5 files changed, 9 insertions(+), 1 deletion(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_el2_enabled(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2, 34, 1)
-  * "for all purposes other than a direct read or write access of HCR_EL2."
+ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
-  * Not included here is HCR_RW.
+ /* Set if FEAT_NV2 RAM accesses are big-endian */
-  */
+ FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
--uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, bool secure);
++FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
-+uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, ARMSecuritySpace space);
- uint64_t arm_hcr_el2_eff(CPUARMState *env);
+ /*
- uint64_t arm_hcrx_el2_eff(CPUARMState *env);
+  * Helpers for using the above. Note that only the A64 accessors use
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate.h
 +++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
      bool nv2_mem_e20;
      /* True if NV2 enabled and NV2 RAM accesses are big-endian */
      bool nv2_mem_be;
 +    /* True if FPCR.AH is 1 (alternate floating point handling) */
 +    bool fpcr_ah;
      /*
       * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
       *  < 0, set by the current instruction.
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static void hcr_writelow(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
-  * Bits that are not included here:
+       .writefn = aa64_daif_write, .resetfn = arm_cp_reset_ignore },
-  * RW       (read from SCR_EL3.RW as needed)
+     { .name = "FPCR", .state = ARM_CP_STATE_AA64,
-  */
+       .opc0 = 3, .opc1 = 3, .opc2 = 0, .crn = 4, .crm = 4,
--uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, bool secure)
+-      .access = PL0_RW, .type = ARM_CP_FPU | ARM_CP_SUPPRESS_TB_END,
-+uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, ARMSecuritySpace space)
++      .access = PL0_RW, .type = ARM_CP_FPU,
- {
+       .readfn = aa64_fpcr_read, .writefn = aa64_fpcr_write },
-     uint64_t ret = env->cp15.hcr_el2;
+     { .name = "FPSR", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 3, .opc2 = 1, .crn = 4, .crm = 4,
--    if (!arm_is_el2_enabled_secstate(env, secure)) {
+diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
-+    assert(space != ARMSS_Root);
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/hflags.c
 +++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
          DP_TBFLAG_A64(flags, TCMA, aa64_va_parameter_tcma(tcr, mmu_idx));
      }
 +    if (env->vfp.fpcr & FPCR_AH) {
 +        DP_TBFLAG_A64(flags, AH, 1);
 +    }
 +
-+    if (!arm_is_el2_enabled_secstate(env, arm_space_is_secure(space))) {
+     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
          /*
           * "This register has no effect if EL2 is not enabled in the
           * current Security state".  This is ARMv8.4-SecEL2 speak for
@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
      if (arm_feature(env, ARM_FEATURE_M)) {
          return 0;
      }
 -    return arm_hcr_el2_eff_secstate(env, arm_is_secure_below_el3(env));
 +    return arm_hcr_el2_eff_secstate(env, arm_security_space_below_el3(env));
  }
- /*
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
+@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
-                                         ARMSecuritySpace space)
+     dc->nv2 = EX_TBFLAG_A64(tb_flags, NV2);
- {
+     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
-     uint64_t hcr_el2;
+     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
--    bool is_secure = arm_space_is_secure(space);
++    dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
+     dc->vec_len = 0;
-     if (arm_feature(env, ARM_FEATURE_M)) {
+     dc->vec_stride = 0;
-+        bool is_secure = arm_space_is_secure(space);
+     dc->cp_regs = arm_cpu->cp_regs;
          switch (env->v7m.mpu_ctrl[is_secure] &
                  (R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK)) {
          case R_V7M_MPU_CTRL_ENABLE_MASK:
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
          }
      }
 -    hcr_el2 = arm_hcr_el2_eff_secstate(env, is_secure);
      switch (mmu_idx) {
      case ARMMMUIdx_Stage2:
      case ARMMMUIdx_Stage2_S:
          /* HCR.DC means HCR.VM behaves as 1 */
 +        hcr_el2 = arm_hcr_el2_eff_secstate(env, space);
          return (hcr_el2 & (HCR_DC | HCR_VM)) == 0;
      case ARMMMUIdx_E10_0:
      case ARMMMUIdx_E10_1:
      case ARMMMUIdx_E10_1_PAN:
          /* TGE means that EL0/1 act as if SCTLR_EL1.M is zero */
 +        hcr_el2 = arm_hcr_el2_eff_secstate(env, space);
          if (hcr_el2 & HCR_TGE) {
              return true;
          }
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
      case ARMMMUIdx_Stage1_E1:
      case ARMMMUIdx_Stage1_E1_PAN:
          /* HCR.DC means SCTLR_EL1.M behaves as 0 */
 +        hcr_el2 = arm_hcr_el2_eff_secstate(env, space);
          if (hcr_el2 & HCR_DC) {
              return true;
          }
@@ -XXX,XX +XXX,XX @@ static bool fault_s1ns(ARMSecuritySpace space, ARMMMUIdx s2_mmu_idx)
  static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
                               hwaddr addr, ARMMMUFaultInfo *fi)
  {
 -    bool is_secure = ptw->in_secure;
      ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
      ARMMMUIdx s2_mmu_idx = ptw->in_ptw_idx;
      uint8_t pte_attrs;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
      }
      if (regime_is_stage2(s2_mmu_idx)) {
 -        uint64_t hcr = arm_hcr_el2_eff_secstate(env, is_secure);
 +        uint64_t hcr = arm_hcr_el2_eff_secstate(env, ptw->in_space);
          if ((hcr & HCR_PTW) && S2_attrs_are_device(hcr, pte_attrs)) {
              /*
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env,
                                     ARMMMUFaultInfo *fi)
  {
      ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
 -    bool is_secure = arm_space_is_secure(ptw->in_space);
      uint8_t memattr = 0x00;    /* Device nGnRnE */
      uint8_t shareability = 0;  /* non-shareable */
      int r_el;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env,
          /* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
          if (r_el == 1) {
 -            uint64_t hcr = arm_hcr_el2_eff_secstate(env, is_secure);
 +            uint64_t hcr = arm_hcr_el2_eff_secstate(env, ptw->in_space);
              if (hcr & HCR_DC) {
                  if (hcr & HCR_DCT) {
                      memattr = 0xf0;  /* Tagged, Normal, WB, RWA */
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
  {
      hwaddr ipa;
      int s1_prot, s1_lgpgsz;
 -    bool is_secure = ptw->in_secure;
      ARMSecuritySpace in_space = ptw->in_space;
      bool ret, ipa_secure;
      ARMCacheAttrs cacheattrs1;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      }
      /* Combine the S1 and S2 cache attributes. */
 -    hcr = arm_hcr_el2_eff_secstate(env, is_secure);
 +    hcr = arm_hcr_el2_eff_secstate(env, in_space);
      if (hcr & HCR_DC) {
          /*
           * HCR.DC forces the first stage attributes to
 --
 .34.1

-[PULL 30/35] target/arm: Skip granule protection checks for AT instructions
+[PULL 10/68] target/arm: Set up float_status to use for FPCR.AH=1 behaviour
-From: Jean-Philippe Brucker <jean-philippe@linaro.org>
+When FPCR.AH is 1, the behaviour of some instructions changes:
+ * AdvSIMD BFCVT, BFCVTN, BFCVTN2, BFMLALB, BFMLALT
-GPC checks are not performed on the output address for AT instructions,
+ * SVE BFCVT, BFCVTNT, BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
-as stated by ARM DDI 0487J in D8.12.2:
+ * SME BFCVT, BFCVTN, BFMLAL, BFMLSL (these are all in SME2 which
+   QEMU does not yet implement)
-  When populating PAR_EL1 with the result of an address translation
+ * FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
-  instruction, granule protection checks are not performed on the final
-  output address of a successful translation.
+The behaviour change is:
+ * the instructions do not update the FPSR cumulative exception flags
-Rename get_phys_addr_with_secure(), since it's only used to handle AT
+ * trapped floating point exceptions are disabled (a no-op for QEMU,
-instructions.
+   which doesn't implement FPCR.{IDE,IXE,UFE,OFE,DZE,IOE})
+ * rounding is always round-to-nearest-even regardless of FPCR.RMode
-Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
+ * denormalized inputs and outputs are always flushed to zero, as if
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+   FPCR.{FZ,FIZ} is {1,1}
-Message-id: 20230809123706.1842548-4-jean-philippe@linaro.org
+ * FPCR.FZ16 is still honoured for half-precision inputs
 (See the Arm ARM DDI0487L.a section A1.5.9.)
 We can provide all these behaviours with another pair of float_status fields
 which we use only for these insns, when FPCR.AH is 1. These float_status
 fields will always have:
  * flush_to_zero and flush_inputs_to_zero set for the non-F16 field
  * rounding mode set to round-to-nearest-even
 and so the only FPCR fields they need to honour are DN and FZ16.
 In this commit we only define the new fp_status fields and give them
 the required behaviour when FPSR is updated.  In subsequent commits
 we will arrange to use this new fp_status field for the instructions
 that should be affected by FPCR.AH in this way.
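The state handling described above can be sketched in isolation. The block below is a minimal model, not QEMU code: `ModelFloatStatus`, `model_ah_reset()` and `model_ah_fpcr_write()` are hypothetical names introduced only for illustration, and the struct is a heavily reduced stand-in for softfloat's `float_status`. The FPCR bit positions are those given in the Arm ARM. It shows the one property the commit message relies on: for the AH=1 statuses, rounding and (for the non-F16 field) flushing are fixed, so an FPCR write only needs to propagate DN and FZ16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FPCR bit positions (Arm ARM); only the bits this sketch looks at. */
#define FPCR_FZ16  (1u << 19)
#define FPCR_FZ    (1u << 24)
#define FPCR_DN    (1u << 25)

/* Hypothetical, much-reduced stand-in for softfloat's float_status. */
typedef enum { MODEL_ROUND_NEAREST_EVEN, MODEL_ROUND_OTHER } ModelRounding;

typedef struct ModelFloatStatus {
    ModelRounding rounding;
    bool flush_to_zero;
    bool flush_inputs_to_zero;
    bool default_nan_mode;
} ModelFloatStatus;

/*
 * Reset-time setup (hypothetical helper): both statuses round to
 * nearest-even; only the non-F16 one has flushing forced on.
 */
static void model_ah_reset(ModelFloatStatus *ah, ModelFloatStatus *ah_f16)
{
    assert(ah != NULL && ah_f16 != NULL);

    ah->rounding = MODEL_ROUND_NEAREST_EVEN;
    ah->flush_to_zero = true;
    ah->flush_inputs_to_zero = true;
    ah->default_nan_mode = false;

    ah_f16->rounding = MODEL_ROUND_NEAREST_EVEN;
    ah_f16->flush_to_zero = false;
    ah_f16->flush_inputs_to_zero = false;
    ah_f16->default_nan_mode = false;
}

/*
 * FPCR write (hypothetical helper): FZ, FIZ and RMode are deliberately
 * ignored; only DN (both statuses) and FZ16 (F16 status) are honoured.
 */
static void model_ah_fpcr_write(ModelFloatStatus *ah,
                                ModelFloatStatus *ah_f16, uint32_t fpcr)
{
    bool dn = (fpcr & FPCR_DN) != 0;
    bool fz16 = (fpcr & FPCR_FZ16) != 0;

    assert(ah != NULL && ah_f16 != NULL);

    ah->default_nan_mode = dn;
    ah_f16->default_nan_mode = dn;
    ah_f16->flush_to_zero = fz16;
    ah_f16->flush_inputs_to_zero = fz16;
}
```

In this model, writing an FPCR value with only FZ set leaves the F16 status unflushed, while the non-F16 status stays flushed whatever is written.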
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/internals.h | 25 ++++++++++++++-----------
+ target/arm/cpu.h           | 15 +++++++++++++++
- target/arm/helper.c    |  8 ++++++--
+ target/arm/internals.h     |  2 ++
- target/arm/ptw.c       | 11 ++++++-----
+ target/arm/tcg/translate.h | 14 ++++++++++++++
-files changed, 26 insertions(+), 18 deletions(-)
+ target/arm/cpu.c           |  4 ++++
+ target/arm/vfp_helper.c    | 13 ++++++++++++-
 files changed, 47 insertions(+), 1 deletion(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
           *  standard_fp_status : the ARM "Standard FPSCR Value"
           *  standard_fp_status_fp16 : used for half-precision
           *       calculations with the ARM "Standard FPSCR Value"
 +         *  ah_fp_status: used for the A64 insns which change behaviour
 +         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 +         *       and the reciprocal and square root estimate/step insns)
 +         *  ah_fp_status_f16: used for the A64 insns which change behaviour
 +         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 +         *       and the reciprocal and square root estimate/step insns);
 +         *       for half-precision
           *
           * Half-precision operations are governed by a separate
           * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
           * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
           * using a fixed value for it.
           *
 +         * The ah_fp_status is needed because some insns have different
 +         * behaviour when FPCR.AH == 1: they don't update cumulative
 +         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
 +         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 +         * which means we need an ah_fp_status_f16 as well.
 +         *
           * To avoid having to transfer exception bits around, we simply
           * say that the FPSCR cumulative exception flags are the logical
           * OR of the flags in the four fp statuses. This relies on the
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          float_status fp_status_f16_a64;
          float_status standard_fp_status;
          float_status standard_fp_status_f16;
 +        float_status ah_fp_status;
 +        float_status ah_fp_status_f16;
          uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
          uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/internals.h
 +++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ typedef struct GetPhysAddrResult {
+@@ -XXX,XX +XXX,XX @@ int alle1_tlbmask(CPUARMState *env);
- } GetPhysAddrResult;
+ /* Set the float_status behaviour to match the Arm defaults */
- /**
+ void arm_set_default_fp_behaviours(float_status *s);
-- * get_phys_addr_with_secure: get the physical address for a virtual address
++/* Set the float_status behaviour to match Arm FPCR.AH=1 behaviour */
-+ * get_phys_addr: get the physical address for a virtual address
++void arm_set_ah_fp_behaviours(float_status *s);
-  * @env: CPUARMState
-  * @address: virtual address to get physical address for
+ #endif
-  * @access_type: 0 for read, 1 for write, 2 for execute
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
-  * @mmu_idx: MMU index indicating required translation regime
+index XXXXXXX..XXXXXXX 100644
-- * @is_secure: security state for the access
+--- a/target/arm/tcg/translate.h
-  * @result: set on translation success.
++++ b/target/arm/tcg/translate.h
-  * @fi: set to fault info if the translation fails
+@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
-  *
+     FPST_A64,
-@@ -XXX,XX +XXX,XX @@ typedef struct GetPhysAddrResult {
+     FPST_A32_F16,
-  *  * for PMSAv5 based systems we don't bother to return a full FSR format
+     FPST_A64_F16,
-  *    value.
++    FPST_AH,
 +    FPST_AH_F16,
      FPST_STD,
      FPST_STD_F16,
  } ARMFPStatusFlavour;
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
   *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
   * FPST_A64_F16
   *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
 + * FPST_AH:
 + *   for AArch64 operations which change behaviour when AH=1 (specifically,
 + *   bfloat16 conversions and multiplies, and the reciprocal and square root
 + *   estimate/step insns)
 + * FPST_AH_F16:
 + *   ditto, but for half-precision operations
   * FPST_STD
   *   for A32/T32 Neon operations using the "standard FPSCR value"
   * FPST_STD_F16
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
      case FPST_A64_F16:
          offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
          break;
 +    case FPST_AH:
 +        offset = offsetof(CPUARMState, vfp.ah_fp_status);
 +        break;
 +    case FPST_AH_F16:
 +        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
 +        break;
      case FPST_STD:
          offset = offsetof(CPUARMState, vfp.standard_fp_status);
          break;
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
      arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
 +    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
 +    set_flush_to_zero(1, &env->vfp.ah_fp_status);
 +    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
 +    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
  #ifndef CONFIG_USER_ONLY
      if (kvm_enabled()) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void arm_set_default_fp_behaviours(float_status *s)
   *    set Invalid for a QNaN
   *  * default NaN has sign bit set, msb frac bit set
   */
--bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
+-static void arm_set_ah_fp_behaviours(float_status *s)
--                               MMUAccessType access_type,
++void arm_set_ah_fp_behaviours(float_status *s)
--                               ARMMMUIdx mmu_idx, bool is_secure,
+ {
--                               GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
+     set_float_detect_tininess(float_tininess_after_rounding, s);
-+bool get_phys_addr(CPUARMState *env, target_ulong address,
+     set_float_ftz_detection(float_ftz_after_rounding, s);
-+                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
-+                   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
+     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
-     __attribute__((nonnull));
+     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
  /**
 - * get_phys_addr: get the physical address for a virtual address
 + * get_phys_addr_with_secure_nogpc: get the physical address for a virtual
 + *                                  address
   * @env: CPUARMState
   * @address: virtual address to get physical address for
   * @access_type: 0 for read, 1 for write, 2 for execute
   * @mmu_idx: MMU index indicating required translation regime
 + * @is_secure: security state for the access
   * @result: set on translation success.
   * @fi: set to fault info if the translation fails
   *
 - * Similarly, but use the security regime of @mmu_idx.
 + * Similar to get_phys_addr, but use the given security regime and don't perform
 + * a Granule Protection Check on the resulting address.
   */
 -bool get_phys_addr(CPUARMState *env, target_ulong address,
 -                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 +bool get_phys_addr_with_secure_nogpc(CPUARMState *env, target_ulong address,
 +                                     MMUAccessType access_type,
 +                                     ARMMMUIdx mmu_idx, bool is_secure,
 +                                     GetPhysAddrResult *result,
 +                                     ARMMMUFaultInfo *fi)
      __attribute__((nonnull));
  bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
      ARMMMUFaultInfo fi = {};
      GetPhysAddrResult res = {};
 -    ret = get_phys_addr_with_secure(env, value, access_type, mmu_idx,
 -                                    is_secure, &res, &fi);
 +    /*
-+     * I_MXTJT: Granule protection checks are not performed on the final address
++     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
-+     * of a successful translation.
++     * they are used for insns that must not set the cumulative exception bits.
 +     */
-+    ret = get_phys_addr_with_secure_nogpc(env, value, access_type, mmu_idx,
++
 +                                          is_secure, &res, &fi);
      /*
-      * ATS operations only do S1 or S1+S2 translations, so we never
+      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
---- a/target/arm/ptw.c
+     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
-+++ b/target/arm/ptw.c
+     set_float_exception_flags(0, &env->vfp.standard_fp_status);
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_gpc(CPUARMState *env, S1Translate *ptw,
+     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
-     return false;
++    set_float_exception_flags(0, &env->vfp.ah_fp_status);
 +    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
  }
--bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
+ static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
--                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
--                               bool is_secure, GetPhysAddrResult *result,
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
--                               ARMMMUFaultInfo *fi)
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-+bool get_phys_addr_with_secure_nogpc(CPUARMState *env, target_ulong address,
+         set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
-+                                     MMUAccessType access_type,
++        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
-+                                     ARMMMUIdx mmu_idx, bool is_secure,
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-+                                     GetPhysAddrResult *result,
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-+                                     ARMMMUFaultInfo *fi)
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
- {
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
-     S1Translate ptw = {
+     }
-         .in_mmu_idx = mmu_idx,
+     if (changed & FPCR_FZ) {
-         .in_space = arm_secure_to_space(is_secure),
+         bool ftz_enabled = val & FPCR_FZ;
-     };
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
--    return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
-+    return get_phys_addr_nogpc(env, &ptw, address, access_type, result, fi);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
- }
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
++        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
- bool get_phys_addr(CPUARMState *env, target_ulong address,
++        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
      }
      if (changed & FPCR_AH) {
          bool ah_enabled = val & FPCR_AH;
 --
 .34.1

-New patch
+[PULL 11/68] target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
+For the instructions FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, use
+the FPST_AH or FPST_AH_F16 float_status flavour when FPCR.AH is 1, so
+that they get the required behaviour changes.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.h |  13 ++++
+ target/arm/tcg/translate-a64.c | 119 +++++++++++++++++++++++++--------
+ target/arm/tcg/translate-sve.c |  30 ++++++---
+files changed, 127 insertions(+), 35 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.h
++++ b/target/arm/tcg/translate-a64.h
+@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr pred_full_reg_ptr(DisasContext *s, int regno)
+     return ret;
+ }
++/*
++ * Return the ARMFPStatusFlavour to use based on element size and
++ * whether FPCR.AH is set.
++ */
++static inline ARMFPStatusFlavour select_ah_fpst(DisasContext *s, MemOp esz)
++{
++    if (s->fpcr_ah) {
++        return esz == MO_16 ? FPST_AH_F16 : FPST_AH;
++    } else {
++        return esz == MO_16 ? FPST_A64_F16 : FPST_A64;
++    }
++}
++
+ bool disas_sve(DisasContext *, uint32_t);
+ bool disas_sme(DisasContext *, uint32_t);
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op3_ool(DisasContext *s, bool is_q, int rd,
+  * an out-of-line helper.
+  */
+ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
+-                              int rm, bool is_fp16, int data,
++                              int rm, ARMFPStatusFlavour fpsttype, int data,
+                               gen_helper_gvec_3_ptr *fn)
+ {
+-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
++    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
+     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+                        vec_full_reg_offset(s, rn),
+                        vec_full_reg_offset(s, rm), fpst,
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
+     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
+ } FPScalar;
+-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
++static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
++                                        const FPScalar *f,
++                                        ARMFPStatusFlavour fpsttype)
+ {
+     switch (a->esz) {
+     case MO_64:
+         if (fp_access_check(s)) {
+             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
+             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
+-            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
++            f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
+             write_fp_dreg(s, a->rd, t0);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+         if (fp_access_check(s)) {
+             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
+             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
+-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
++            f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
+             write_fp_sreg(s, a->rd, t0);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+         if (fp_access_check(s)) {
+             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
+             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
+-            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
++            f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
+             write_fp_sreg(s, a->rd, t0);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+     return true;
+ }
++static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
++{
++    return do_fp3_scalar_with_fpsttype(s, a, f,
++                                       a->esz == MO_16 ?
++                                       FPST_A64_F16 : FPST_A64);
++}
++
++static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
++{
++    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
++}
++
+ static const FPScalar f_scalar_fadd = {
+     gen_helper_vfp_addh,
+     gen_helper_vfp_adds,
+@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
+     gen_helper_recpsf_f32,
+     gen_helper_recpsf_f64,
+ };
+-TRANS(FRECPS_s, do_fp3_scalar, a, &f_scalar_frecps)
++TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
+ static const FPScalar f_scalar_frsqrts = {
+     gen_helper_rsqrtsf_f16,
+     gen_helper_rsqrtsf_f32,
+     gen_helper_rsqrtsf_f64,
+ };
+-TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
++TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
+ static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
+                        const FPScalar *f, bool swap)
+@@ -XXX,XX +XXX,XX @@ TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
+ TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
+ TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
+-static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+-                          gen_helper_gvec_3_ptr * const fns[3])
++static bool do_fp3_vector_with_fpsttype(DisasContext *s, arg_qrrr_e *a,
++                                        int data,
++                                        gen_helper_gvec_3_ptr * const fns[3],
++                                        ARMFPStatusFlavour fpsttype)
+ {
+     MemOp esz = a->esz;
+     int check = fp_access_check_vector_hsd(s, a->q, esz);
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+         return check == 0;
+     }
+-    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
+-                      esz == MO_16, data, fns[esz - 1]);
++    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, fpsttype,
++                      data, fns[esz - 1]);
+     return true;
+ }
++static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
++                          gen_helper_gvec_3_ptr * const fns[3])
++{
++    return do_fp3_vector_with_fpsttype(s, a, data, fns,
++                                       a->esz == MO_16 ?
++                                       FPST_A64_F16 : FPST_A64);
++}
++
++static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
++                             gen_helper_gvec_3_ptr * const f[3])
++{
++    return do_fp3_vector_with_fpsttype(s, a, data, f,
++                                       select_ah_fpst(s, a->esz));
++}
++
+ static gen_helper_gvec_3_ptr * const f_vector_fadd[3] = {
+     gen_helper_gvec_fadd_h,
+     gen_helper_gvec_fadd_s,
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
+     gen_helper_gvec_recps_s,
+     gen_helper_gvec_recps_d,
+ };
+-TRANS(FRECPS_v, do_fp3_vector, a, 0, f_vector_frecps)
++TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
+ static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
+     gen_helper_gvec_rsqrts_h,
+     gen_helper_gvec_rsqrts_s,
+     gen_helper_gvec_rsqrts_d,
+ };
+-TRANS(FRSQRTS_v, do_fp3_vector, a, 0, f_vector_frsqrts)
++TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
+ static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
+     gen_helper_gvec_faddp_h,
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
+     }
+     gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
+-                      esz == MO_16, a->idx, fns[esz - 1]);
++                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
++                      a->idx, fns[esz - 1]);
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1 {
+     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_ptr);
+ } FPScalar1;
+-static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+-                          const FPScalar1 *f, int rmode)
++static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
++                                        const FPScalar1 *f, int rmode,
++                                        ARMFPStatusFlavour fpsttype)
+ {
+     TCGv_i32 tcg_rmode = NULL;
+     TCGv_ptr fpst;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+         return check == 0;
+     }
+-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
++    fpst = fpstatus_ptr(fpsttype);
+     if (rmode >= 0) {
+         tcg_rmode = gen_set_rmode(rmode, fpst);
+     }
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+     return true;
+ }
++static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
++                          const FPScalar1 *f, int rmode)
++{
++    return do_fp1_scalar_with_fpsttype(s, a, f, rmode,
++                                       a->esz == MO_16 ?
++                                       FPST_A64_F16 : FPST_A64);
++}
++
++static bool do_fp1_scalar_ah(DisasContext *s, arg_rr_e *a,
++                             const FPScalar1 *f, int rmode)
++{
++    return do_fp1_scalar_with_fpsttype(s, a, f, rmode, select_ah_fpst(s, a->esz));
++}
++
+ static const FPScalar1 f_scalar_fsqrt = {
+     gen_helper_vfp_sqrth,
+     gen_helper_vfp_sqrts,
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
+     gen_helper_recpe_f32,
+     gen_helper_recpe_f64,
+ };
+-TRANS(FRECPE_s, do_fp1_scalar, a, &f_scalar_frecpe, -1)
++TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
+ static const FPScalar1 f_scalar_frecpx = {
+     gen_helper_frecpx_f16,
+     gen_helper_frecpx_f32,
+     gen_helper_frecpx_f64,
+ };
+-TRANS(FRECPX_s, do_fp1_scalar, a, &f_scalar_frecpx, -1)
++TRANS(FRECPX_s, do_fp1_scalar_ah, a, &f_scalar_frecpx, -1)
+ static const FPScalar1 f_scalar_frsqrte = {
+     gen_helper_rsqrte_f16,
+     gen_helper_rsqrte_f32,
+     gen_helper_rsqrte_f64,
+ };
+-TRANS(FRSQRTE_s, do_fp1_scalar, a, &f_scalar_frsqrte, -1)
++TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
+ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
+ {
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINT64Z_v, aa64_frint, do_fp1_vector, a,
+            &f_scalar_frint64, FPROUNDING_ZERO)
+ TRANS_FEAT(FRINT64X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, -1)
+-static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+-                             int rd, int rn, int data,
+-                             gen_helper_gvec_2_ptr * const fns[3])
++static bool do_gvec_op2_fpst_with_fpsttype(DisasContext *s, MemOp esz,
++                                           bool is_q, int rd, int rn, int data,
++                                           gen_helper_gvec_2_ptr * const fns[3],
++                                           ARMFPStatusFlavour fpsttype)
+ {
+     int check = fp_access_check_vector_hsd(s, is_q, esz);
+     TCGv_ptr fpst;
+@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+         return check == 0;
+     }
+-    fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
++    fpst = fpstatus_ptr(fpsttype);
+     tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
+                        vec_full_reg_offset(s, rn), fpst,
+                        is_q ? 16 : 8, vec_full_reg_size(s),
+@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+     return true;
+ }
++static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
++                             int rd, int rn, int data,
++                             gen_helper_gvec_2_ptr * const fns[3])
++{
++    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data, fns,
++                                          esz == MO_16 ? FPST_A64_F16 :
++                                          FPST_A64);
++}
++
++static bool do_gvec_op2_ah_fpst(DisasContext *s, MemOp esz, bool is_q,
++                                int rd, int rn, int data,
++                                gen_helper_gvec_2_ptr * const fns[3])
++{
++    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data,
++                                          fns, select_ah_fpst(s, esz));
++}
++
+ static gen_helper_gvec_2_ptr * const f_scvtf_v[] = {
+     gen_helper_gvec_vcvt_sh,
+     gen_helper_gvec_vcvt_sf,
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
+     gen_helper_gvec_frecpe_s,
+     gen_helper_gvec_frecpe_d,
+ };
+-TRANS(FRECPE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
++TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+ static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
+     gen_helper_gvec_frsqrte_h,
+     gen_helper_gvec_frsqrte_s,
+     gen_helper_gvec_frsqrte_d,
+ };
+-TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
++TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+ static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
+ {
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
+     return true;
+ }
+-static bool gen_gvec_fpst_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
+-                                 arg_rr_esz *a, int data)
++static bool gen_gvec_fpst_ah_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
++                                    arg_rr_esz *a, int data)
+ {
+     return gen_gvec_fpst_zz(s, fn, a->rd, a->rn, data,
+-                            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
++                            select_ah_fpst(s, a->esz));
+ }
+ /* Invoke an out-of-line helper on 3 Zregs. */
+@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
+                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+ }
++static bool gen_gvec_fpst_ah_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
++                                     arg_rrr_esz *a, int data)
++{
++    return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
++                             select_ah_fpst(s, a->esz));
++}
++
+ /* Invoke an out-of-line helper on 4 Zregs. */
+ static bool gen_gvec_ool_zzzz(DisasContext *s, gen_helper_gvec_4 *fn,
+                               int rd, int rn, int rm, int ra, int data)
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
+     NULL,                     gen_helper_gvec_frecpe_h,
+     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
+ };
+-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_arg_zz, frecpe_fns[a->esz], a, 0)
++TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
+ static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
+     NULL,                      gen_helper_gvec_frsqrte_h,
+     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
+ };
+-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_arg_zz, frsqrte_fns[a->esz], a, 0)
++TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
+ /*
+  *** SVE Floating Point Compare with Zero Group
+@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
+     };                                                              \
+     TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_arg_zzz, name##_fns[a->esz], a, 0)
++#define DO_FP3_AH(NAME, name) \
++    static gen_helper_gvec_3_ptr * const name##_fns[4] = {          \
++        NULL, gen_helper_gvec_##name##_h,                           \
++        gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
++    };                                                              \
++    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
++
+ DO_FP3(FADD_zzz, fadd)
+ DO_FP3(FSUB_zzz, fsub)
+ DO_FP3(FMUL_zzz, fmul)
+-DO_FP3(FRECPS, recps)
+-DO_FP3(FRSQRTS, rsqrts)
++DO_FP3_AH(FRECPS, recps)
++DO_FP3_AH(FRSQRTS, rsqrts)
+ #undef DO_FP3
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const frecpx_fns[] = {
+     gen_helper_sve_frecpx_s, gen_helper_sve_frecpx_d,
+ };
+ TRANS_FEAT(FRECPX, aa64_sve, gen_gvec_fpst_arg_zpz, frecpx_fns[a->esz],
+-           a, 0, a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
++           a, 0, select_ah_fpst(s, a->esz))
+ static gen_helper_gvec_3_ptr * const fsqrt_fns[] = {
+     NULL,                   gen_helper_sve_fsqrt_h,
+--
+.34.1
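
Note on the patch above: the open-coded `a->esz == MO_16 ? FPST_A64_F16 : FPST_A64` selections are replaced by calls to `select_ah_fpst()`, whose definition is not part of these hunks. The standalone sketch below models what such a selector might look like. The `Model*` types, the enum values, and the fp16 variant of the AH flavour are assumptions made for illustration, not QEMU's actual definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's MemOp element sizes. */
typedef enum { MODEL_MO_16 = 1, MODEL_MO_32 = 2, MODEL_MO_64 = 3 } ModelMemOp;

/* Hypothetical stand-ins for the float_status flavours. */
typedef enum {
    MODEL_FPST_A64,
    MODEL_FPST_A64_F16,
    MODEL_FPST_AH,
    MODEL_FPST_AH_F16,   /* assumed fp16 variant of the AH flavour */
} ModelFPStatusFlavour;

/*
 * Pick the status flavour for an insn whose behaviour changes when
 * FPCR.AH is 1: the AH flavours if the flag is set, otherwise the
 * usual A64 flavours, with the fp16 variant for 16-bit elements.
 */
static ModelFPStatusFlavour model_select_ah_fpst(bool fpcr_ah, ModelMemOp esz)
{
    if (fpcr_ah) {
        return esz == MODEL_MO_16 ? MODEL_FPST_AH_F16 : MODEL_FPST_AH;
    }
    return esz == MODEL_MO_16 ? MODEL_FPST_A64_F16 : MODEL_FPST_A64;
}
```

Centralising the choice in one helper means each translator function (FRECPE, FRSQRTE, FRECPS, FRSQRTS, FRECPX in the hunks above) only has to swap its callee rather than repeat the FPCR.AH test.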

-New patch
+[PULL 12/68] target/arm: Use FPST_FPCR_AH for BFCVT* insns
+When FPCR.AH is 1, use FPST_FPCR_AH for:
+ * AdvSIMD BFCVT, BFCVTN, BFCVTN2
+ * SVE BFCVT, BFCVTNT
+so that they get the required behaviour changes.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 27 +++++++++++++++++++++------
+ target/arm/tcg/translate-sve.c |  6 ++++--
+ 2 files changed, 25 insertions(+), 8 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
+ static const FPScalar1 f_scalar_bfcvt = {
+     .gen_s = gen_helper_bfcvt,
+ };
+-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar, a, &f_scalar_bfcvt, -1)
++TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
+ static const FPScalar1 f_scalar_frint32 = {
+     NULL,
+@@ -XXX,XX +XXX,XX @@ static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n)
+     tcg_gen_extu_i32_i64(d, tmp);
+ }
+-static ArithOneOp * const f_vector_bfcvtn[] = {
+-    NULL,
+-    gen_bfcvtn_hs,
+-    NULL,
++static void gen_bfcvtn_ah_hs(TCGv_i64 d, TCGv_i64 n)
++{
++    TCGv_ptr fpst = fpstatus_ptr(FPST_AH);
++    TCGv_i32 tmp = tcg_temp_new_i32();
++    gen_helper_bfcvt_pair(tmp, n, fpst);
++    tcg_gen_extu_i32_i64(d, tmp);
++}
++
++static ArithOneOp * const f_vector_bfcvtn[2][3] = {
++    {
++        NULL,
++        gen_bfcvtn_hs,
++        NULL,
++    }, {
++        NULL,
++        gen_bfcvtn_ah_hs,
++        NULL,
++    }
+ };
+-TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a, f_vector_bfcvtn)
++TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a,
++           f_vector_bfcvtn[s->fpcr_ah])
+ static bool trans_SHLL_v(DisasContext *s, arg_qrr_e *a)
+ {
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve_fcvt_hs, a, 0, FPST_A64_F16)
+ TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
+-           gen_helper_sve_bfcvt, a, 0, FPST_A64)
++           gen_helper_sve_bfcvt, a, 0,
++           s->fpcr_ah ? FPST_AH : FPST_A64)
+ TRANS_FEAT(FCVT_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve_fcvt_dh, a, 0, FPST_A64)
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVTNT_ds, aa64_sve2, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve2_fcvtnt_ds, a, 0, FPST_A64)
+ TRANS_FEAT(BFCVTNT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
+-           gen_helper_sve_bfcvtnt, a, 0, FPST_A64)
++           gen_helper_sve_bfcvtnt, a, 0,
++           s->fpcr_ah ? FPST_AH : FPST_A64)
+ TRANS_FEAT(FCVTLT_hs, aa64_sve2, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve2_fcvtlt_hs, a, 0, FPST_A64)
+--
+.34.1
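
Note on the patch above: `f_vector_bfcvtn` grows from a one-dimensional table indexed by element size into a `[2][3]` table whose first index is `s->fpcr_ah`. A minimal standalone model of that lookup pattern follows; the function names and the integer tags they return are hypothetical placeholders, since the real entries emit TCG ops.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op type; the real table holds TCG expander functions. */
typedef int ModelOpFn(void);

static int model_bfcvtn_default(void) { return 0; }  /* FPCR.AH == 0 variant */
static int model_bfcvtn_ah(void)      { return 1; }  /* FPCR.AH == 1 variant */

/* First index: FPCR.AH (0 or 1). Second index: element size. */
static ModelOpFn * const model_bfcvtn_table[2][3] = {
    { NULL, model_bfcvtn_default, NULL },
    { NULL, model_bfcvtn_ah,      NULL },
};

/* Return the op for this flag and size, or NULL if the size is unsupported. */
static ModelOpFn *model_lookup_bfcvtn(bool fpcr_ah, int esz)
{
    if (esz < 0 || esz >= 3) {
        return NULL;
    }
    /* A bool converts to 0 or 1, so it can index the outer dimension. */
    return model_bfcvtn_table[fpcr_ah][esz];
}
```

Keeping the NULL slots in both rows preserves the existing convention that an unsupported element size is rejected by the table lookup itself.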

-[PULL 16/35] target/arm/ptw: Pass ptw into get_phys_addr_pmsa*() and get_phys_addr_disabled()
+[PULL 13/68] target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
-In commit 6d2654ffacea813916176 we created the S1Translate struct and
+When FPCR.AH is 1, use FPST_FPCR_AH for:
-used it to plumb through various arguments that we were previously
+ * AdvSIMD BFMLALB, BFMLALT
-passing one-at-a-time to get_phys_addr_v5(), get_phys_addr_v6(), and
+ * SVE BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
 get_phys_addr_lpae().  Extend that pattern to get_phys_addr_pmsav5(),
 get_phys_addr_pmsav7(), get_phys_addr_pmsav8() and
 get_phys_addr_disabled(), so that all the get_phys_addr_* functions
 we call from get_phys_addr_nogpc() take the S1Translate struct rather
 than the mmu_idx and is_secure bool.
-(This refactoring is a prelude to having the called functions look
+so that they get the required behaviour changes.
-at ptw->is_space rather than using an is_secure boolean.)
 We do this by making gen_gvec_op4_fpst() take an ARMFPStatusFlavour
 rather than a bool is_fp16; existing callsites now select
 FPST_FPCR_F16_A64 vs FPST_FPCR_A64 themselves rather than passing in
 the boolean.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-5-peter.maydell@linaro.org
 ---
- target/arm/ptw.c | 57 ++++++++++++++++++++++++++++++------------------
+ target/arm/tcg/translate-a64.c | 20 +++++++++++++-------
- 1 file changed, 36 insertions(+), 21 deletions(-)
+ target/arm/tcg/translate-sve.c |  6 ++++--
  2 files changed, 17 insertions(+), 9 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_env(DisasContext *s, bool is_q, int rd, int rn,
   * an out-of-line helper.
   */
  static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
 -                              int rm, int ra, bool is_fp16, int data,
 +                              int rm, int ra, ARMFPStatusFlavour fpsttype,
 +                              int data,
                                gen_helper_gvec_4_ptr *fn)
  {
 -    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
 +    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
      tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
                         vec_full_reg_offset(s, rn),
                         vec_full_reg_offset(s, rm),
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
      }
      if (fp_access_check(s)) {
          /* Q bit selects BFMLALB vs BFMLALT. */
 -        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, false, a->q,
 +        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
 +                          s->fpcr_ah ? FPST_AH : FPST_A64, a->q,
                            gen_helper_gvec_bfmlal);
      }
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
      }
      gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
 -                      a->esz == MO_16, a->rot, fn[a->esz]);
 +                      a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
 +                      a->rot, fn[a->esz]);
      return true;
  }
--static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
--                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
+     }
--                                 bool is_secure, GetPhysAddrResult *result,
-+static bool get_phys_addr_pmsav5(CPUARMState *env,
+     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-+                                 S1Translate *ptw,
+-                      esz == MO_16, (a->idx << 1) | neg,
-+                                 uint32_t address,
++                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-+                                 MMUAccessType access_type,
++                      (a->idx << 1) | neg,
-+                                 GetPhysAddrResult *result,
+                       fns[esz - 1]);
-                                  ARMMMUFaultInfo *fi)
+     return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
      }
      if (fp_access_check(s)) {
          /* Q bit selects BFMLALB vs BFMLALT. */
 -        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, 0,
 +        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
 +                          s->fpcr_ah ? FPST_AH : FPST_A64,
                            (a->idx << 1) | a->q,
                            gen_helper_gvec_bfmlal_idx);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
      }
      if (fp_access_check(s)) {
          gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
 -                          a->esz == MO_16, (a->idx << 2) | a->rot, fn);
 +                          a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
 +                          (a->idx << 2) | a->rot, fn);
      }
      return true;
  }
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_env_arg_zzzz,
  static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
  {
-     int n;
+     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal,
-     uint32_t mask;
+-                              a->rd, a->rn, a->rm, a->ra, sel, FPST_A64);
-     uint32_t base;
++                              a->rd, a->rn, a->rm, a->ra, sel,
-+    ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
++                              s->fpcr_ah ? FPST_AH : FPST_A64);
      bool is_user = regime_is_user(env, mmu_idx);
 +    bool is_secure = arm_space_is_secure(ptw->in_space);
      if (regime_translation_disabled(env, mmu_idx, is_secure)) {
          /* MPU disabled.  */
@@ -XXX,XX +XXX,XX @@ static bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx,
      return regime_sctlr(env, mmu_idx) & SCTLR_BR;
  }
--static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
+ TRANS_FEAT(BFMLALB_zzzw, aa64_sve_bf16, do_BFMLAL_zzzw, a, false)
--                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
+@@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
 -                                 bool secure, GetPhysAddrResult *result,
 +static bool get_phys_addr_pmsav7(CPUARMState *env,
 +                                 S1Translate *ptw,
 +                                 uint32_t address,
 +                                 MMUAccessType access_type,
 +                                 GetPhysAddrResult *result,
                                   ARMMMUFaultInfo *fi)
  {
-     ARMCPU *cpu = env_archcpu(env);
+     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal_idx,
-     int n;
+                               a->rd, a->rn, a->rm, a->ra,
-+    ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
+-                              (a->index << 1) | sel, FPST_A64);
-     bool is_user = regime_is_user(env, mmu_idx);
++                              (a->index << 1) | sel,
-+    bool secure = arm_space_is_secure(ptw->in_space);
++                              s->fpcr_ah ? FPST_AH : FPST_A64);
      result->f.phys_addr = address;
      result->f.lg_page_size = TARGET_PAGE_BITS;
@@ -XXX,XX +XXX,XX @@ void v8m_security_lookup(CPUARMState *env, uint32_t address,
      }
  }
--static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
+ TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
 -                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                                 bool secure, GetPhysAddrResult *result,
 +static bool get_phys_addr_pmsav8(CPUARMState *env,
 +                                 S1Translate *ptw,
 +                                 uint32_t address,
 +                                 MMUAccessType access_type,
 +                                 GetPhysAddrResult *result,
                                   ARMMMUFaultInfo *fi)
  {
      V8M_SAttributes sattrs = {};
 +    ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
 +    bool secure = arm_space_is_secure(ptw->in_space);
      bool ret;
      if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
@@ -XXX,XX +XXX,XX @@ static ARMCacheAttrs combine_cacheattrs(uint64_t hcr,
   * MMU disabled.  S1 addresses within aa64 translation regimes are
   * still checked for bounds -- see AArch64.S1DisabledOutput().
   */
 -static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
 +static bool get_phys_addr_disabled(CPUARMState *env,
 +                                   S1Translate *ptw,
 +                                   target_ulong address,
                                     MMUAccessType access_type,
 -                                   ARMMMUIdx mmu_idx, bool is_secure,
                                     GetPhysAddrResult *result,
                                     ARMMMUFaultInfo *fi)
  {
 +    ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
 +    bool is_secure = arm_space_is_secure(ptw->in_space);
      uint8_t memattr = 0x00;    /* Device nGnRnE */
      uint8_t shareability = 0;  /* non-shareable */
      int r_el;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
      case ARMMMUIdx_Phys_Root:
      case ARMMMUIdx_Phys_Realm:
          /* Checking Phys early avoids special casing later vs regime_el. */
 -        return get_phys_addr_disabled(env, address, access_type, mmu_idx,
 -                                      is_secure, result, fi);
 +        return get_phys_addr_disabled(env, ptw, address, access_type,
 +                                      result, fi);
      case ARMMMUIdx_Stage1_E0:
      case ARMMMUIdx_Stage1_E1:
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
          if (arm_feature(env, ARM_FEATURE_V8)) {
              /* PMSAv8 */
 -            ret = get_phys_addr_pmsav8(env, address, access_type, mmu_idx,
 -                                       is_secure, result, fi);
 +            ret = get_phys_addr_pmsav8(env, ptw, address, access_type,
 +                                       result, fi);
          } else if (arm_feature(env, ARM_FEATURE_V7)) {
              /* PMSAv7 */
 -            ret = get_phys_addr_pmsav7(env, address, access_type, mmu_idx,
 -                                       is_secure, result, fi);
 +            ret = get_phys_addr_pmsav7(env, ptw, address, access_type,
 +                                       result, fi);
          } else {
              /* Pre-v7 MPU */
 -            ret = get_phys_addr_pmsav5(env, address, access_type, mmu_idx,
 -                                       is_secure, result, fi);
 +            ret = get_phys_addr_pmsav5(env, ptw, address, access_type,
 +                                       result, fi);
          }
          qemu_log_mask(CPU_LOG_MMU, "PMSA MPU lookup for %s at 0x%08" PRIx32
                        " mmu_idx %u -> %s (prot %c%c%c)\n",
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
      /* Definitely a real MMU, not an MPU */
      if (regime_translation_disabled(env, mmu_idx, is_secure)) {
 -        return get_phys_addr_disabled(env, address, access_type, mmu_idx,
 -                                      is_secure, result, fi);
 +        return get_phys_addr_disabled(env, ptw, address, access_type,
 +                                      result, fi);
      }
      if (regime_using_lpae_format(env, mmu_idx)) {
 --
 .34.1
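
Note on the BFMLAL*/BFMLSL* patch above: its commit message describes changing `gen_gvec_op4_fpst()` to take an `ARMFPStatusFlavour` instead of a `bool is_fp16`. The sketch below models why that is needed: a bool can only express two flavours, while the BFMLAL callsites need a third. All `model_*` names and the enum are illustrative assumptions, not the QEMU definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for ARMFPStatusFlavour. */
typedef enum {
    MODEL_FPST_A64,
    MODEL_FPST_A64_F16,
    MODEL_FPST_AH,
} ModelFlavour;

/* Old shape: the helper derives the flavour from a bool (two outcomes only). */
static ModelFlavour model_op4_old(bool is_fp16)
{
    return is_fp16 ? MODEL_FPST_A64_F16 : MODEL_FPST_A64;
}

/* New shape: the helper uses whatever flavour the caller selected. */
static ModelFlavour model_op4_new(ModelFlavour fpsttype)
{
    return fpsttype;
}

/* An FCMLA-like callsite keeps its old behaviour by selecting explicitly. */
static ModelFlavour model_fcmla_callsite(bool esz_is_16)
{
    return model_op4_new(esz_is_16 ? MODEL_FPST_A64_F16 : MODEL_FPST_A64);
}

/* A BFMLAL-like callsite can now request the AH flavour. */
static ModelFlavour model_bfmlal_callsite(bool fpcr_ah)
{
    return model_op4_new(fpcr_ah ? MODEL_FPST_AH : MODEL_FPST_A64);
}
```

The existing callsites are behaviour-preserving under this change; only the callers that care about FPCR.AH pass something new.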

-[PULL 19/35] target/arm: Pass an ARMSecuritySpace to arm_is_el2_enabled_secstate()
+[PULL 14/68] target/arm: Add FPCR.NEP to TBFLAGS
-Pass an ARMSecuritySpace instead of a bool secure to
+For FEAT_AFP, we want to emit different code when FPCR.NEP is set, so
-arm_is_el2_enabled_secstate(). This doesn't change behaviour.
+that instead of zeroing the high elements of a vector register when
 we write the output of a scalar operation to it, we instead merge in
 those elements from one of the source registers.  Since this affects
 the generated code, we need to put FPCR.NEP into the TBFLAGS.
 FPCR.NEP is treated as 0 when in streaming SVE mode and FEAT_SME_FA64
 is not implemented or not enabled; we can implement this logic in
 rebuild_hflags_a64().
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-8-peter.maydell@linaro.org
 ---
- target/arm/cpu.h    | 13 ++++++++-----
+ target/arm/cpu.h               | 1 +
- target/arm/helper.c |  2 +-
+ target/arm/tcg/translate.h     | 2 ++
- 2 files changed, 9 insertions(+), 6 deletions(-)
+ target/arm/tcg/hflags.c        | 9 +++++++++
  target/arm/tcg/translate-a64.c | 1 +
  4 files changed, 13 insertions(+)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_secure(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
  /* Set if FEAT_NV2 RAM accesses are big-endian */
  FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
  FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
 +FIELD(TBFLAG_A64, NEP, 38, 1)   /* FPCR.NEP */
  /*
-  * Return true if the current security state has AArch64 EL2 or AArch32 Hyp.
+  * Helpers for using the above. Note that only the A64 accessors use
-- * This corresponds to the pseudocode EL2Enabled()
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
-+ * This corresponds to the pseudocode EL2Enabled().
+index XXXXXXX..XXXXXXX 100644
-  */
+--- a/target/arm/tcg/translate.h
--static inline bool arm_is_el2_enabled_secstate(CPUARMState *env, bool secure)
++++ b/target/arm/tcg/translate.h
-+static inline bool arm_is_el2_enabled_secstate(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
-+                                               ARMSecuritySpace space)
+     bool nv2_mem_be;
- {
+     /* True if FPCR.AH is 1 (alternate floating point handling) */
-+    assert(space != ARMSS_Root);
+     bool fpcr_ah;
-     return arm_feature(env, ARM_FEATURE_EL2)
++    /* True if FPCR.NEP is 1 (FEAT_AFP scalar upper-element result handling) */
--           && (!secure || (env->cp15.scr_el3 & SCR_EEL2));
++    bool fpcr_nep;
-+           && (space != ARMSS_Secure || (env->cp15.scr_el3 & SCR_EEL2));
+     /*
       * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
       *  < 0, set by the current instruction.
 diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/hflags.c
 +++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
      if (env->vfp.fpcr & FPCR_AH) {
          DP_TBFLAG_A64(flags, AH, 1);
      }
 +    if (env->vfp.fpcr & FPCR_NEP) {
 +        /*
 +         * In streaming-SVE without FA64, NEP behaves as if zero;
 +         * compare pseudocode IsMerging()
 +         */
 +        if (!(EX_TBFLAG_A64(flags, PSTATE_SM) && !sme_fa64(env, el))) {
 +            DP_TBFLAG_A64(flags, NEP, 1);
 +        }
 +    }
      return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
  }
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
  static inline bool arm_is_el2_enabled(CPUARMState *env)
  {
 -    return arm_is_el2_enabled_secstate(env, arm_is_secure_below_el3(env));
 +    return arm_is_el2_enabled_secstate(env, arm_security_space_below_el3(env));
  }
  #else
@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_secure(CPUARMState *env)
      return false;
  }
 -static inline bool arm_is_el2_enabled_secstate(CPUARMState *env, bool secure)
 +static inline bool arm_is_el2_enabled_secstate(CPUARMState *env,
 +                                               ARMSecuritySpace space)
  {
      return false;
  }
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, ARMSecuritySpace space)
+@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
+     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
-     assert(space != ARMSS_Root);
+     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
+     dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
--    if (!arm_is_el2_enabled_secstate(env, arm_space_is_secure(space))) {
++    dc->fpcr_nep = EX_TBFLAG_A64(tb_flags, NEP);
-+    if (!arm_is_el2_enabled_secstate(env, space)) {
+     dc->vec_len = 0;
-         /*
+     dc->vec_stride = 0;
-          * "This register has no effect if EL2 is not enabled in the
+     dc->cp_regs = arm_cpu->cp_regs;
           * current Security state".  This is ARMv8.4-SecEL2 speak for
 --
 .34.1
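
Note on the TBFLAGS patch above: the `rebuild_hflags_a64()` hunk sets the NEP TB flag only when FPCR.NEP is 1 and the CPU is not in streaming SVE mode without FA64. The standalone model below reproduces just that condition. The TB flag position (bit 38) comes from the `FIELD()` line in the patch; the FPCR.NEP bit position (bit 2) and the `model_*` names are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FPCR_NEP    (UINT64_C(1) << 2)   /* assumed FPCR.NEP position */
#define MODEL_TBFLAG_NEP  (UINT64_C(1) << 38)  /* per FIELD(TBFLAG_A64, NEP, 38, 1) */

/*
 * Return 'flags' with the NEP TB flag added when NEP is effective.
 * pstate_sm:    true if in streaming SVE mode
 * fa64_enabled: true if FEAT_SME_FA64 is implemented and enabled
 */
static uint64_t model_rebuild_nep(uint64_t fpcr, bool pstate_sm,
                                  bool fa64_enabled, uint64_t flags)
{
    if (fpcr & MODEL_FPCR_NEP) {
        /* In streaming SVE without FA64, NEP behaves as if it were zero. */
        if (!(pstate_sm && !fa64_enabled)) {
            flags |= MODEL_TBFLAG_NEP;
        }
    }
    return flags;
}
```

Folding the streaming-mode rule into the flag at hflags-rebuild time means the translator can test a single `fpcr_nep` bool without re-deriving the condition per instruction.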

-[PULL 06/35] qtest: microbit-test: add tests for nRF51 DETECT
+[PULL 15/68] target/arm: Define and use new write_fp_*reg_merging() functions
-From: Chris Laplante <chris@laplante.io>
+For FEAT_AFP's FPCR.NEP bit, we need to programmatically change the
+behaviour of the writeback of the result for most SIMD scalar
-Exercise the DETECT mechanism of the GPIO peripheral.
+operations, so that instead of zeroing the upper part of the result
+register it merges the upper elements from one of the input
-Signed-off-by: Chris Laplante <chris@laplante.io>
+registers.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20230728160324.1159090-7-chris@laplante.io
+Provide new functions write_fp_*reg_merging() which can be used
-[PMM: fixed coding style nits]
+instead of the existing write_fp_*reg() functions when we want this
 "merge the result with one of the input registers if FPCR.NEP is
 enabled" handling, and use them in do_fp3_scalar_with_fpsttype().
 Note that (as documented in the description of the FPCR.NEP bit)
 which input register to use as the merge source varies by
 instruction: for these 2-input scalar operations, the comparison
 instructions take from Rm, not Rn.
 We'll extend this to also provide the merging behaviour for
 the remaining scalar insns in subsequent commits.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
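Note on the commit message above: the merging writeback it describes can be modelled outside TCG. The sketch below treats a 128-bit vector register as two 64-bit lanes and shows the two outcomes: zeroing the high elements when NEP is 0, or taking them from a merge source register when NEP is 1. The `ModelVReg` type and `model_*` functions are hypothetical, and the model ignores SVE bits above 128.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit vector register as two 64-bit lanes; lane[0] is the low half. */
typedef struct {
    uint64_t lane[2];
} ModelVReg;

/* Write a 64-bit scalar result to regs[rd], honouring NEP. */
static void model_write_dreg_merging(ModelVReg *regs, int rd, int mergereg,
                                     uint64_t v, bool nep)
{
    if (nep) {
        /* Copy the whole merge source first, then overwrite element 0. */
        regs[rd] = regs[mergereg];
    } else {
        regs[rd].lane[1] = 0;
    }
    regs[rd].lane[0] = v;
}

/* Write a 32-bit scalar result to regs[rd], honouring NEP. */
static void model_write_sreg_merging(ModelVReg *regs, int rd, int mergereg,
                                     uint32_t v, bool nep)
{
    if (nep) {
        regs[rd] = regs[mergereg];
        /* Replace only the low 32 bits; bits 32..127 come from mergereg. */
        regs[rd].lane[0] = (regs[rd].lane[0] & ~UINT64_C(0xffffffff)) | v;
    } else {
        regs[rd].lane[0] = v;
        regs[rd].lane[1] = 0;
    }
}
```

The merge source is a parameter rather than always being Rn because, as the commit message notes, which input register supplies the high elements varies by instruction.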
- tests/qtest/microbit-test.c | 44 +++++++++++++++++++++++++++++++++++++
+ target/arm/tcg/translate-a64.c | 117 +++++++++++++++++++++++++--------
- 1 file changed, 44 insertions(+)
+ 1 file changed, 91 insertions(+), 26 deletions(-)
-diff --git a/tests/qtest/microbit-test.c b/tests/qtest/microbit-test.c
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/microbit-test.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/tests/qtest/microbit-test.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void test_nrf51_gpio(void)
+@@ -XXX,XX +XXX,XX @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
-     qtest_quit(qts);
+     write_fp_dreg(s, reg, tmp);
  }
-+static void test_nrf51_gpio_detect(void)
++/*
 + * Write a double result to 128 bit vector register reg, honouring FPCR.NEP:
 + * - if FPCR.NEP == 0, clear the high elements of reg
 + * - if FPCR.NEP == 1, set the high elements of reg from mergereg
 + *   (i.e. merge the result with those high elements)
 + * In either case, SVE register bits above 128 are zeroed (per R_WKYLB).
 + */
 +static void write_fp_dreg_merging(DisasContext *s, int reg, int mergereg,
 +                                  TCGv_i64 v)
 +{
-+    QTestState *qts = qtest_init("-M microbit");
++    if (!s->fpcr_nep) {
-+    int i;
++        write_fp_dreg(s, reg, v);
-+
++        return;
 +    /* Connect input buffer on pins 1-7, configure SENSE for high level */
 +    for (i = 1; i <= 7; i++) {
 +        qtest_writel(qts, NRF51_GPIO_BASE + NRF51_GPIO_REG_CNF_START + i * 4,
 +                     deposit32(0, 16, 2, 2));
 +    }
 +
-+    qtest_irq_intercept_out_named(qts, "/machine/nrf51/gpio", "detect");
++    /*
-+
++     * Move from mergereg to reg; this sets the high elements and
-+    for (i = 1; i <= 7; i++) {
++     * clears the bits above 128 as a side effect.
-+        /* Set pin high */
++     */
-+        qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", i, 1);
++    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
-+        uint32_t actual = qtest_readl(qts, NRF51_GPIO_BASE + NRF51_GPIO_REG_IN);
++                     vec_full_reg_offset(s, mergereg),
-+        g_assert_cmpuint(actual, ==, 1 << i);
++                     16, vec_full_reg_size(s));
-+
++    tcg_gen_st_i64(v, tcg_env, vec_full_reg_offset(s, reg));
-+        /* Check that DETECT is high */
++}
-+        g_assert_true(qtest_get_irq(qts, 0));
++
-+
++/*
-+        /* Set pin low, check that DETECT goes low. */
++ * Write a single-prec result, but only clear the higher elements
-+        qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", i, 0);
++ * of the destination register if FPCR.NEP is 0; otherwise preserve them.
-+        actual = qtest_readl(qts, NRF51_GPIO_BASE + NRF51_GPIO_REG_IN);
++ */
-+        g_assert_cmpuint(actual, ==, 0x0);
++static void write_fp_sreg_merging(DisasContext *s, int reg, int mergereg,
-+        g_assert_false(qtest_get_irq(qts, 0));
++                                  TCGv_i32 v)
 +{
 +    if (!s->fpcr_nep) {
 +        write_fp_sreg(s, reg, v);
 +        return;
 +    }
 +
-+    /* Set pin 0 high, check that DETECT doesn't fire */
++    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
-+    qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", 0, 1);
++                     vec_full_reg_offset(s, mergereg),
-+    g_assert_false(qtest_get_irq(qts, 0));
++                     16, vec_full_reg_size(s));
-+    qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", 0, 0);
++    tcg_gen_st_i32(v, tcg_env, fp_reg_offset(s, reg, MO_32));
-+
++}
-+    /* Set pins 1, 2, and 3 high, then set 3 low. Check DETECT is still high */
++
-+    for (i = 1; i <= 3; i++) {
++/*
-+        qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", i, 1);
++ * Write a half-prec result, but only clear the higher elements
 + * of the destination register if FPCR.NEP is 0; otherwise preserve them.
 + * The caller must ensure that the top 16 bits of v are zero.
 + */
 +static void write_fp_hreg_merging(DisasContext *s, int reg, int mergereg,
 +                                  TCGv_i32 v)
 +{
 +    if (!s->fpcr_nep) {
 +        write_fp_sreg(s, reg, v);
 +        return;
 +    }
-+    g_assert_true(qtest_get_irq(qts, 0));
++
-+    qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", 3, 0);
++    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
-+    g_assert_true(qtest_get_irq(qts, 0));
++                     vec_full_reg_offset(s, mergereg),
 +                     16, vec_full_reg_size(s));
 +    tcg_gen_st16_i32(v, tcg_env, fp_reg_offset(s, reg, MO_16));
 +}
 +
- static void timer_task(QTestState *qts, hwaddr task)
+ /* Expand a 2-operand AdvSIMD vector operation using an expander function.  */
- {
+ static void gen_gvec_fn2(DisasContext *s, bool is_q, int rd, int rn,
-     qtest_writel(qts, NRF51_TIMER_BASE + task, NRF51_TRIGGER_TASK);
+                          GVecGen2Fn *gvec_fn, int vece)
-@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
+ } FPScalar;
-     qtest_add_func("/microbit/nrf51/uart", test_nrf51_uart);
-     qtest_add_func("/microbit/nrf51/gpio", test_nrf51_gpio);
+ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
-+    qtest_add_func("/microbit/nrf51/gpio_detect", test_nrf51_gpio_detect);
+-                                        const FPScalar *f,
-     qtest_add_func("/microbit/nrf51/nvmc", test_nrf51_nvmc);
++                                        const FPScalar *f, int mergereg,
-     qtest_add_func("/microbit/nrf51/timer", test_nrf51_timer);
+                                         ARMFPStatusFlavour fpsttype)
-     qtest_add_func("/microbit/microbit/i2c", test_microbit_i2c);
+ {
      switch (a->esz) {
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
              TCGv_i64 t0 = read_fp_dreg(s, a->rn);
              TCGv_i64 t1 = read_fp_dreg(s, a->rm);
              f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
 -            write_fp_dreg(s, a->rd, t0);
 +            write_fp_dreg_merging(s, a->rd, mergereg, t0);
          }
          break;
      case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
              TCGv_i32 t0 = read_fp_sreg(s, a->rn);
              TCGv_i32 t1 = read_fp_sreg(s, a->rm);
              f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
 -            write_fp_sreg(s, a->rd, t0);
 +            write_fp_sreg_merging(s, a->rd, mergereg, t0);
          }
          break;
      case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
              TCGv_i32 t0 = read_fp_hreg(s, a->rn);
              TCGv_i32 t1 = read_fp_hreg(s, a->rm);
              f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
 -            write_fp_sreg(s, a->rd, t0);
 +            write_fp_hreg_merging(s, a->rd, mergereg, t0);
          }
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
      return true;
  }
 -static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
 +static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
 +                          int mergereg)
  {
 -    return do_fp3_scalar_with_fpsttype(s, a, f,
 +    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
                                         a->esz == MO_16 ?
                                         FPST_A64_F16 : FPST_A64);
  }
 -static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
 +static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
 +                             int mergereg)
  {
 -    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
 +    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
 +                                       select_ah_fpst(s, a->esz));
  }
  static const FPScalar f_scalar_fadd = {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fadd = {
      gen_helper_vfp_adds,
      gen_helper_vfp_addd,
  };
 -TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd)
 +TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd, a->rn)
  static const FPScalar f_scalar_fsub = {
      gen_helper_vfp_subh,
      gen_helper_vfp_subs,
      gen_helper_vfp_subd,
  };
 -TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub)
 +TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub, a->rn)
  static const FPScalar f_scalar_fdiv = {
      gen_helper_vfp_divh,
      gen_helper_vfp_divs,
      gen_helper_vfp_divd,
  };
 -TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv)
 +TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv, a->rn)
  static const FPScalar f_scalar_fmul = {
      gen_helper_vfp_mulh,
      gen_helper_vfp_muls,
      gen_helper_vfp_muld,
  };
 -TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul)
 +TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul, a->rn)
  static const FPScalar f_scalar_fmax = {
      gen_helper_vfp_maxh,
      gen_helper_vfp_maxs,
      gen_helper_vfp_maxd,
  };
 -TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax)
 +TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
  static const FPScalar f_scalar_fmin = {
      gen_helper_vfp_minh,
      gen_helper_vfp_mins,
      gen_helper_vfp_mind,
  };
 -TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin)
 +TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
  static const FPScalar f_scalar_fmaxnm = {
      gen_helper_vfp_maxnumh,
      gen_helper_vfp_maxnums,
      gen_helper_vfp_maxnumd,
  };
 -TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm)
 +TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm, a->rn)
  static const FPScalar f_scalar_fminnm = {
      gen_helper_vfp_minnumh,
      gen_helper_vfp_minnums,
      gen_helper_vfp_minnumd,
  };
 -TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm)
 +TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm, a->rn)
  static const FPScalar f_scalar_fmulx = {
      gen_helper_advsimd_mulxh,
      gen_helper_vfp_mulxs,
      gen_helper_vfp_mulxd,
  };
 -TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx)
 +TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx, a->rn)
  static void gen_fnmul_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
  {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fnmul = {
      gen_fnmul_s,
      gen_fnmul_d,
  };
 -TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul)
 +TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
  static const FPScalar f_scalar_fcmeq = {
      gen_helper_advsimd_ceq_f16,
      gen_helper_neon_ceq_f32,
      gen_helper_neon_ceq_f64,
  };
 -TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq)
 +TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq, a->rm)
  static const FPScalar f_scalar_fcmge = {
      gen_helper_advsimd_cge_f16,
      gen_helper_neon_cge_f32,
      gen_helper_neon_cge_f64,
  };
 -TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge)
 +TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge, a->rm)
  static const FPScalar f_scalar_fcmgt = {
      gen_helper_advsimd_cgt_f16,
      gen_helper_neon_cgt_f32,
      gen_helper_neon_cgt_f64,
  };
 -TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt)
 +TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt, a->rm)
  static const FPScalar f_scalar_facge = {
      gen_helper_advsimd_acge_f16,
      gen_helper_neon_acge_f32,
      gen_helper_neon_acge_f64,
  };
 -TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge)
 +TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge, a->rm)
  static const FPScalar f_scalar_facgt = {
      gen_helper_advsimd_acgt_f16,
      gen_helper_neon_acgt_f32,
      gen_helper_neon_acgt_f64,
  };
 -TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt)
 +TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt, a->rm)
  static void gen_fabd_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
  {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fabd = {
      gen_fabd_s,
      gen_fabd_d,
  };
 -TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd)
 +TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
  static const FPScalar f_scalar_frecps = {
      gen_helper_recpsf_f16,
      gen_helper_recpsf_f32,
      gen_helper_recpsf_f64,
  };
 -TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
 +TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
  static const FPScalar f_scalar_frsqrts = {
      gen_helper_rsqrtsf_f16,
      gen_helper_rsqrtsf_f32,
      gen_helper_rsqrtsf_f64,
  };
 -TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
 +TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
  static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                         const FPScalar *f, bool swap)
 --
 .34.1

-New patch
+[PULL 16/68] target/arm: Handle FPCR.NEP for 3-input scalar operations
+Handle FPCR.NEP for the 3-input scalar operations which use
+do_fmla_scalar_idx() and do_fmadd(), by making them call the
+appropriate write_fp_*reg_merging() functions.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 12 ++++++------
+ 1 file changed, 6 insertions(+), 6 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+                 gen_vfp_negd(t1, t1);
+             }
+             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
+-            write_fp_dreg(s, a->rd, t0);
++            write_fp_dreg_merging(s, a->rd, a->rd, t0);
+         }
+         break;
+     case MO_32:
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+                 gen_vfp_negs(t1, t1);
+             }
+             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_sreg_merging(s, a->rd, a->rd, t0);
+         }
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+             }
+             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
+                                        fpstatus_ptr(FPST_A64_F16));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_hreg_merging(s, a->rd, a->rd, t0);
+         }
+         break;
+     default:
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+             }
+             fpst = fpstatus_ptr(FPST_A64);
+             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
+-            write_fp_dreg(s, a->rd, ta);
++            write_fp_dreg_merging(s, a->rd, a->ra, ta);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+             }
+             fpst = fpstatus_ptr(FPST_A64);
+             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
+-            write_fp_sreg(s, a->rd, ta);
++            write_fp_sreg_merging(s, a->rd, a->ra, ta);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+             }
+             fpst = fpstatus_ptr(FPST_A64_F16);
+             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
+-            write_fp_sreg(s, a->rd, ta);
++            write_fp_hreg_merging(s, a->rd, a->ra, ta);
+         }
+         break;
+--
+.34.1
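
For readers unfamiliar with FPCR.NEP, the effect of switching from
write_fp_*reg() to write_fp_*reg_merging() can be modelled outside QEMU.
The following is a minimal sketch, not QEMU code: VReg, model_write_sreg()
and the four-lane layout are hypothetical names invented for illustration,
and it assumes that with NEP set the upper lanes of the destination are
taken from the merge register instead of being zeroed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit vector register (4 x 32-bit lanes). */
typedef struct {
    uint32_t lane[4];
} VReg;

/*
 * Model of a single-precision scalar result write (illustration only).
 * nep == false: lane 0 gets the result, upper lanes are zeroed.
 * nep == true:  lane 0 gets the result, upper lanes are copied from
 *               regs[mergereg] (assumed FPCR.NEP behaviour).
 */
static void model_write_sreg(VReg *regs, bool nep, int rd, int mergereg,
                             uint32_t result)
{
    VReg out;

    assert(rd >= 0 && rd < 32 && mergereg >= 0 && mergereg < 32);
    if (nep) {
        out = regs[mergereg];
    } else {
        memset(&out, 0, sizeof(out));
    }
    out.lane[0] = result;
    regs[rd] = out;
}
```

In terms of this model, the do_fmla_scalar_idx() hunks pass rd as the
merge register and the do_fmadd() hunks pass ra.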

-New patch
+[PULL 17/68] target/arm: Handle FPCR.NEP for BFCVT scalar
+Currently we implement BFCVT scalar via do_fp1_scalar().  This works
+even though BFCVT is a narrowing operation from 32 to 16 bits,
+because we can use write_fp_sreg() for float16. However, FPCR.NEP
+support requires that we use write_fp_hreg_merging() for float16
+outputs, so we can't continue to borrow the non-narrowing
+do_fp1_scalar() function for this. Split out trans_BFCVT_s()
+into its own implementation that honours FPCR.NEP.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
+ 1 file changed, 21 insertions(+), 4 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frintx = {
+ };
+ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
+-static const FPScalar1 f_scalar_bfcvt = {
+-    .gen_s = gen_helper_bfcvt,
+-};
+-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
++static bool trans_BFCVT_s(DisasContext *s, arg_rr_e *a)
++{
++    ARMFPStatusFlavour fpsttype = s->fpcr_ah ? FPST_AH : FPST_A64;
++    TCGv_i32 t32;
++    int check;
++
++    if (!dc_isar_feature(aa64_bf16, s)) {
++        return false;
++    }
++
++    check = fp_access_check_scalar_hsd(s, a->esz);
++
++    if (check <= 0) {
++        return check == 0;
++    }
++
++    t32 = read_fp_sreg(s, a->rn);
++    gen_helper_bfcvt(t32, t32, fpstatus_ptr(fpsttype));
++    write_fp_hreg_merging(s, a->rd, a->rd, t32);
++    return true;
++}
+ static const FPScalar1 f_scalar_frint32 = {
+     NULL,
+--
+.34.1

-[PULL 26/35] target/arm/ptw: Report stage 2 fault level for stage 2 faults on stage 1 ptw
+[PULL 18/68] target/arm: Handle FPCR.NEP for 1-input scalar operations
-When we report faults due to stage 2 faults during a stage 1
+Handle FPCR.NEP for the 1-input scalar operations.
 page table walk, the 'level' parameter should be the level
 of the walk in stage 2 that faulted, not the level of the
 walk in stage 1. Correct the reporting of these faults.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-15-peter.maydell@linaro.org
 ---
- target/arm/ptw.c | 10 +++++++---
+ target/arm/tcg/translate-a64.c | 26 ++++++++++++++------------
- 1 file changed, 7 insertions(+), 3 deletions(-)
+ 1 file changed, 14 insertions(+), 12 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
-  do_translation_fault:
+     case MO_64:
-     fi->type = ARMFault_Translation;
+         t64 = read_fp_dreg(s, a->rn);
-  do_fault:
+         f->gen_d(t64, t64, fpst);
--    fi->level = level;
+-        write_fp_dreg(s, a->rd, t64);
--    /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
++        write_fp_dreg_merging(s, a->rd, a->rd, t64);
--    fi->stage2 = fi->s1ptw || regime_is_stage2(mmu_idx);
+         break;
-+    if (fi->s1ptw) {
+     case MO_32:
-+        /* Retain the existing stage 2 fi->level */
+         t32 = read_fp_sreg(s, a->rn);
-+        assert(fi->stage2);
+         f->gen_s(t32, t32, fpst);
-+    } else {
+-        write_fp_sreg(s, a->rd, t32);
-+        fi->level = level;
++        write_fp_sreg_merging(s, a->rd, a->rd, t32);
-+        fi->stage2 = regime_is_stage2(mmu_idx);
+         break;
      case MO_16:
          t32 = read_fp_hreg(s, a->rn);
          f->gen_h(t32, t32, fpst);
 -        write_fp_sreg(s, a->rd, t32);
 +        write_fp_hreg_merging(s, a->rd, a->rd, t32);
          break;
      default:
          g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
          TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
          gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, fpst);
 -        write_fp_dreg(s, a->rd, tcg_rd);
 +        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
      }
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hs(DisasContext *s, arg_rr *a)
          TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
          gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -        /* write_fp_sreg is OK here because top half of result is zero */
 -        write_fp_sreg(s, a->rd, tmp);
 +        /* write_fp_hreg_merging is OK here because top half of result is zero */
 +        write_fp_hreg_merging(s, a->rd, a->rd, tmp);
      }
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sd(DisasContext *s, arg_rr *a)
          TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
          gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, fpst);
 -        write_fp_sreg(s, a->rd, tcg_rd);
 +        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
      }
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hd(DisasContext *s, arg_rr *a)
          TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
          gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp);
 -        /* write_fp_sreg is OK here because top half of tcg_rd is zero */
 -        write_fp_sreg(s, a->rd, tcg_rd);
 +        /* write_fp_hreg_merging is OK here because top half of tcg_rd is zero */
 +        write_fp_hreg_merging(s, a->rd, a->rd, tcg_rd);
      }
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a)
          TCGv_i32 tcg_ahp = get_ahp_flag();
          gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
 -        write_fp_sreg(s, a->rd, tcg_rd);
 +        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
      }
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a)
          TCGv_i32 tcg_ahp = get_ahp_flag();
          gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
 -        write_fp_dreg(s, a->rd, tcg_rd);
 +        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
      }
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_fcvt_f(DisasContext *s, arg_fcvt *a,
      do_fcvt_scalar(s, a->esz | (is_signed ? MO_SIGN : 0),
                     a->esz, tcg_int, a->shift, a->rn, rmode);
 -    clear_vec(s, a->rd);
 +    if (!s->fpcr_nep) {
 +        clear_vec(s, a->rd);
 +    }
-     fi->s1ns = fault_s1ns(ptw->in_space, mmu_idx);
+     write_vec_element(s, tcg_int, a->rd, 0, a->esz);
      return true;
  }
 --
 .34.1

-[PULL 21/35] target/arm/ptw: Remove last uses of ptw->in_secure
+[PULL 19/68] target/arm: Handle FPCR.NEP in do_cvtf_scalar()
-Replace the last uses of ptw->in_secure with appropriate
+Handle FPCR.NEP in the operations handled by do_cvtf_scalar().
 checks on ptw->in_space.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-10-peter.maydell@linaro.org
 ---
- target/arm/ptw.c | 11 +++++++----
+ target/arm/tcg/translate-a64.c | 6 +++---
- 1 file changed, 7 insertions(+), 4 deletions(-)
+ 1 file changed, 3 insertions(+), 3 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
+@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
-                                       ARMMMUFaultInfo *fi)
+         } else {
- {
+             gen_helper_vfp_uqtod(tcg_double, tcg_int, tcg_shift, tcg_fpstatus);
-     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
+         }
--    bool is_secure = ptw->in_secure;
+-        write_fp_dreg(s, rd, tcg_double);
-     ARMMMUIdx s1_mmu_idx;
++        write_fp_dreg_merging(s, rd, rd, tcg_double);
      /*
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
       * cannot upgrade a NonSecure translation regime's attributes
       * to Secure or Realm.
       */
 -    result->f.attrs.secure = is_secure;
      result->f.attrs.space = ptw->in_space;
 +    result->f.attrs.secure = arm_space_is_secure(ptw->in_space);
      switch (mmu_idx) {
      case ARMMMUIdx_Phys_S:
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
      case ARMMMUIdx_Stage1_E0:
      case ARMMMUIdx_Stage1_E1:
      case ARMMMUIdx_Stage1_E1_PAN:
 -        /* First stage lookup uses second stage for ptw. */
 -        ptw->in_ptw_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 +        /*
 +         * First stage lookup uses second stage for ptw; only
 +         * Secure has both S and NS IPA and starts with Stage2_S.
 +         */
 +        ptw->in_ptw_idx = (ptw->in_space == ARMSS_Secure) ?
 +            ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
          break;
-     case ARMMMUIdx_Stage2:
+     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
          } else {
              gen_helper_vfp_uqtos(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
          }
 -        write_fp_sreg(s, rd, tcg_single);
 +        write_fp_sreg_merging(s, rd, rd, tcg_single);
          break;
      case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
          } else {
              gen_helper_vfp_uqtoh(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
          }
 -        write_fp_sreg(s, rd, tcg_single);
 +        write_fp_hreg_merging(s, rd, rd, tcg_single);
          break;
      default:
 --
 .34.1

-New patch
+[PULL 20/68] target/arm: Handle FPCR.NEP for scalar FABS and FNEG
+Handle FPCR.NEP merging for scalar FABS and FNEG; this requires
+an extra parameter to do_fp1_scalar_int(), since FMOV scalar
+does not have the merging behaviour.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 27 ++++++++++++++++++++-------
+ 1 file changed, 20 insertions(+), 7 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1Int {
+ } FPScalar1Int;
+ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
+-                              const FPScalar1Int *f)
++                              const FPScalar1Int *f,
++                              bool merging)
+ {
+     switch (a->esz) {
+     case MO_64:
+         if (fp_access_check(s)) {
+             TCGv_i64 t = read_fp_dreg(s, a->rn);
+             f->gen_d(t, t);
+-            write_fp_dreg(s, a->rd, t);
++            if (merging) {
++                write_fp_dreg_merging(s, a->rd, a->rd, t);
++            } else {
++                write_fp_dreg(s, a->rd, t);
++            }
+         }
+         break;
+     case MO_32:
+         if (fp_access_check(s)) {
+             TCGv_i32 t = read_fp_sreg(s, a->rn);
+             f->gen_s(t, t);
+-            write_fp_sreg(s, a->rd, t);
++            if (merging) {
++                write_fp_sreg_merging(s, a->rd, a->rd, t);
++            } else {
++                write_fp_sreg(s, a->rd, t);
++            }
+         }
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
+         if (fp_access_check(s)) {
+             TCGv_i32 t = read_fp_hreg(s, a->rn);
+             f->gen_h(t, t);
+-            write_fp_sreg(s, a->rd, t);
++            if (merging) {
++                write_fp_hreg_merging(s, a->rd, a->rd, t);
++            } else {
++                write_fp_sreg(s, a->rd, t);
++            }
+         }
+         break;
+     default:
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fmov = {
+     tcg_gen_mov_i32,
+     tcg_gen_mov_i64,
+ };
+-TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov)
++TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov, false)
+ static const FPScalar1Int f_scalar_fabs = {
+     gen_vfp_absh,
+     gen_vfp_abss,
+     gen_vfp_absd,
+ };
+-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs)
++TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
+ static const FPScalar1Int f_scalar_fneg = {
+     gen_vfp_negh,
+     gen_vfp_negs,
+     gen_vfp_negd,
+ };
+-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg)
++TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
+ typedef struct FPScalar1 {
+     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
+--
+.34.1

-[PULL 27/35] target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
+[PULL 21/68] target/arm: Handle FPCR.NEP for FCVTXN (scalar)
-The PAR_EL1.SH field documents that for the cases of:
+Unlike the other users of do_2misc_narrow_scalar(), FCVTXN (scalar)
- * Device memory
+is always double-to-single and must honour FPCR.NEP.  Implement this
- * Normal memory with both Inner and Outer Non-Cacheable
+directly in a trans function rather than using
-the field should be 0b10 rather than whatever was in the
+do_2misc_narrow_scalar().
-translation table descriptor field. (In the pseudocode this
-is handled by PAREncodeShareability().) Perform this
+We still need gen_fcvtxn_sd() and the f_scalar_fcvtxn[] array for
-adjustment when assembling a PAR value.
+the FCVTXN (vector) insn, so we move those down in the file to
 where they are used.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-16-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 15 ++++++++++++++-
+ target/arm/tcg/translate-a64.c | 43 ++++++++++++++++++++++------------
- 1 file changed, 14 insertions(+), 1 deletion(-)
+ 1 file changed, 28 insertions(+), 15 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult ats_access(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ static ArithOneOp * const f_scalar_uqxtn[] = {
  };
  TRANS(UQXTN_s, do_2misc_narrow_scalar, a, f_scalar_uqxtn)
 -static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
 +static bool trans_FCVTXN_s(DisasContext *s, arg_rr_e *a)
  {
 -    /*
 -     * 64 bit to 32 bit float conversion
 -     * with von Neumann rounding (round to odd)
 -     */
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
 -    tcg_gen_extu_i32_i64(d, tmp);
 +    if (fp_access_check(s)) {
 +        /*
 +         * 64 bit to 32 bit float conversion
 +         * with von Neumann rounding (round to odd)
 +         */
 +        TCGv_i64 src = read_fp_dreg(s, a->rn);
 +        TCGv_i32 dst = tcg_temp_new_i32();
 +        gen_helper_fcvtx_f64_to_f32(dst, src, fpstatus_ptr(FPST_A64));
 +        write_fp_sreg_merging(s, a->rd, a->rd, dst);
 +    }
 +    return true;
  }
- #ifdef CONFIG_TCG
+-static ArithOneOp * const f_scalar_fcvtxn[] = {
-+static int par_el1_shareability(GetPhysAddrResult *res)
+-    NULL,
 -    NULL,
 -    gen_fcvtxn_sd,
 -};
 -TRANS(FCVTXN_s, do_2misc_narrow_scalar, a, f_scalar_fcvtxn)
 -
  #undef WRAP_ENV
  static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn)
@@ -XXX,XX +XXX,XX @@ static void gen_fcvtn_sd(TCGv_i64 d, TCGv_i64 n)
      tcg_gen_extu_i32_i64(d, tmp);
  }
 +static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
 +{
 +    /*
-+     * The PAR_EL1.SH field must be 0b10 for Device or Normal-NC
++     * 64 bit to 32 bit float conversion
-+     * memory -- see pseudocode PAREncodeShareability().
++     * with von Neumann rounding (round to odd)
 +     */
-+    if (((res->cacheattrs.attrs & 0xf0) == 0) ||
++    TCGv_i32 tmp = tcg_temp_new_i32();
-+        res->cacheattrs.attrs == 0x44 || res->cacheattrs.attrs == 0x40) {
++    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
-+        return 2;
++    tcg_gen_extu_i32_i64(d, tmp);
 +    }
 +    return res->cacheattrs.shareability;
 +}
 +
- static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
+ static ArithOneOp * const f_vector_fcvtn[] = {
-                              MMUAccessType access_type, ARMMMUIdx mmu_idx,
+     NULL,
-                              bool is_secure)
+     gen_fcvtn_hs,
-@@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
+     gen_fcvtn_sd,
-                 par64 |= (1 << 9); /* NS */
+ };
-             }
++static ArithOneOp * const f_scalar_fcvtxn[] = {
-             par64 |= (uint64_t)res.cacheattrs.attrs << 56; /* ATTR */
++    NULL,
--            par64 |= res.cacheattrs.shareability << 7; /* SH */
++    NULL,
-+            par64 |= par_el1_shareability(&res) << 7; /* SH */
++    gen_fcvtxn_sd,
-         } else {
++};
-             uint32_t fsr = arm_fi_to_lfsc(&fi);
+ TRANS(FCVTN_v, do_2misc_narrow_vector, a, f_vector_fcvtn)
  TRANS(FCVTXN_v, do_2misc_narrow_vector, a, f_scalar_fcvtxn)
 --
 .34.1
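
The "round to odd" conversion mentioned in the FCVTXN comments can be
illustrated on the host. This is a rough sketch only, not the
fcvtx_f64_to_f32 helper: model_round_to_odd() is a hypothetical name, and
the code assumes IEEE 754 float/double with the default round-to-nearest
mode for the initial cast. The idea is to truncate toward zero and, if the
conversion was inexact, force the least significant mantissa bit to 1.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative double -> float conversion with round-to-odd. */
static float model_round_to_odd(double d)
{
    float f = (float)d;         /* host rounds to nearest here */
    double fd = (double)f;
    double abs_d = d < 0 ? -d : d;
    double abs_f = fd < 0 ? -fd : fd;
    uint32_t bits;

    if (d != d || fd == d) {
        return f;               /* NaN, or exactly representable */
    }
    memcpy(&bits, &f, sizeof(bits));
    if (abs_f > abs_d) {
        /* Host rounded away from zero: step back to the truncated value. */
        bits -= 1;
    }
    /* Inexact result: the low mantissa bit is forced to 1. */
    bits |= 1;
    memcpy(&f, &bits, sizeof(f));
    return f;
}
```

For example, an input slightly above 1.0 that would round to 1.0f under
round-to-nearest instead yields the next float above 1.0f, because the
discarded bits were nonzero.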

-New patch
+[PULL 22/68] target/arm: Handle FPCR.NEP for FMUL, FMULX scalar by element
+do_fp3_scalar_idx() is used only for the FMUL and FMULX scalar by
+element instructions; these both need to merge the result with the Rn
+register when FPCR.NEP is set.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
+             read_vec_element(s, t1, a->rm, a->idx, MO_64);
+             f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
+-            write_fp_dreg(s, a->rd, t0);
++            write_fp_dreg_merging(s, a->rd, a->rn, t0);
+         }
+         break;
+     case MO_32:
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
+             read_vec_element_i32(s, t1, a->rm, a->idx, MO_32);
+             f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_sreg_merging(s, a->rd, a->rn, t0);
+         }
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
+             read_vec_element_i32(s, t1, a->rm, a->idx, MO_16);
+             f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_hreg_merging(s, a->rd, a->rn, t0);
+         }
+         break;
+     default:
+--
+.34.1

-New patch
+[PULL 23/68] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
+When FPCR.AH == 1, floating point FMIN and FMAX have some odd special
+cases:
+ * comparing two zeroes (even of different sign) or comparing a NaN
+   with anything always returns the second argument (possibly
+   squashed to zero)
+ * denormal outputs are not squashed to zero regardless of FZ or FZ16
+Implement these semantics in new helper functions and select them at
+translate time if FPCR.AH is 1 for the scalar FMAX and FMIN insns.
+(We will convert the other FMAX and FMIN insns in subsequent
+commits.)
+Note that FMINNM and FMAXNM are not affected.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-a64.h    |  7 +++++++
+ target/arm/tcg/helper-a64.c    | 36 ++++++++++++++++++++++++++++++++++
+ target/arm/tcg/translate-a64.c | 23 ++++++++++++++++++++--
+ 3 files changed, 64 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-a64.h
++++ b/target/arm/tcg/helper-a64.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(advsimd_muladd2h, i32, i32, i32, i32, fpst)
+ DEF_HELPER_2(advsimd_rinth_exact, f16, f16, fpst)
+ DEF_HELPER_2(advsimd_rinth, f16, f16, fpst)
++DEF_HELPER_3(vfp_ah_minh, f16, f16, f16, fpst)
++DEF_HELPER_3(vfp_ah_mins, f32, f32, f32, fpst)
++DEF_HELPER_3(vfp_ah_mind, f64, f64, f64, fpst)
++DEF_HELPER_3(vfp_ah_maxh, f16, f16, f16, fpst)
++DEF_HELPER_3(vfp_ah_maxs, f32, f32, f32, fpst)
++DEF_HELPER_3(vfp_ah_maxd, f64, f64, f64, fpst)
++
+ DEF_HELPER_2(exception_return, void, env, i64)
+ DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64)
+diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-a64.c
++++ b/target/arm/tcg/helper-a64.c
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(fcvtx_f64_to_f32)(float64 a, float_status *fpst)
+     return r;
+ }
++/*
++ * AH=1 min/max have some odd special cases:
++ * comparing two zeroes (regardless of sign), (NaN, anything),
++ * or (anything, NaN) should return the second argument (possibly
++ * squashed to zero).
++ * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
++ */
++#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
++    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
++    {                                                                   \
++        bool save;                                                      \
++        CTYPE r;                                                        \
++        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
++        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
++        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \
++            return b;                                                   \
++        }                                                               \
++        if (FLOATTYPE ## _is_any_nan(a) ||                              \
++            FLOATTYPE ## _is_any_nan(b)) {                              \
++            float_raise(float_flag_invalid, fpst);                      \
++            return b;                                                   \
++        }                                                               \
++        save = get_flush_to_zero(fpst);                                 \
++        set_flush_to_zero(false, fpst);                                 \
++        r = FLOATTYPE ## _ ## MINMAX(a, b, fpst);                       \
++        set_flush_to_zero(save, fpst);                                  \
++        return r;                                                       \
++    }
++
++AH_MINMAX_HELPER(vfp_ah_minh, dh_ctype_f16, float16, min)
++AH_MINMAX_HELPER(vfp_ah_mins, float32, float32, min)
++AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min)
++AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max)
++AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max)
++AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max)
++
+ /* 64-bit versions of the CRC helpers. Note that although the operation
+  * (and the prototypes of crc32c() and crc32() mean that only the bottom
+  * 32 bits of the accumulator and result are used, we pass and return
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                                        select_ah_fpst(s, a->esz));
+ }
++/* Some insns need to call different helpers when FPCR.AH == 1 */
++static bool do_fp3_scalar_2fn(DisasContext *s, arg_rrr_e *a,
++                              const FPScalar *fnormal,
++                              const FPScalar *fah,
++                              int mergereg)
++{
++    return do_fp3_scalar(s, a, s->fpcr_ah ? fah : fnormal, mergereg);
++}
++
+ static const FPScalar f_scalar_fadd = {
+     gen_helper_vfp_addh,
+     gen_helper_vfp_adds,
+@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fmax = {
+     gen_helper_vfp_maxs,
+     gen_helper_vfp_maxd,
+ };
+-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
++static const FPScalar f_scalar_fmax_ah = {
++    gen_helper_vfp_ah_maxh,
++    gen_helper_vfp_ah_maxs,
++    gen_helper_vfp_ah_maxd,
++};
++TRANS(FMAX_s, do_fp3_scalar_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah, a->rn)
+ static const FPScalar f_scalar_fmin = {
+     gen_helper_vfp_minh,
+     gen_helper_vfp_mins,
+     gen_helper_vfp_mind,
+ };
+-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
++static const FPScalar f_scalar_fmin_ah = {
++    gen_helper_vfp_ah_minh,
++    gen_helper_vfp_ah_mins,
++    gen_helper_vfp_ah_mind,
++};
++TRANS(FMIN_s, do_fp3_scalar_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah, a->rn)
+ static const FPScalar f_scalar_fmaxnm = {
+     gen_helper_vfp_maxnumh,
+--
+.34.1

-New patch
+[PULL 24/68] target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
+Implement the FPCR.AH == 1 semantics for vector FMIN/FMAX, by
+creating new _ah_ versions of the gvec helpers which invoke the
+scalar fmin_ah and fmax_ah helpers on each element.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
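Reading aid for the commit message above: the "invoke the scalar helper on each element" pattern can be sketched in plain C. This is only a sketch under stated assumptions, not the QEMU code. It uses host `float` and `isnan()` in place of softfloat, leaves out the denormal squashing and exception-flag handling that the scalar helper performs, and the names `ah_minf` and `vec_ah_min` are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the scalar FPCR.AH == 1 minimum:
 * if both inputs are zeroes, or either input is a NaN, the
 * second operand is returned.
 */
static float ah_minf(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;               /* covers +0/-0 in either order */
    }
    if (isnan(a) || isnan(b)) {
        return b;               /* the real helper also raises Invalid */
    }
    return a < b ? a : b;
}

/* Apply the scalar operation to each element of the two inputs. */
static void vec_ah_min(float *d, const float *n, const float *m,
                       size_t elts)
{
    for (size_t i = 0; i < elts; i++) {
        d[i] = ah_minf(n[i], m[i]);
    }
}
```

In this sketch the result for a pair of zeroes or a NaN input depends on operand order, which is the observable difference from an ordinary minimum.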
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/translate-a64.c | 21 +++++++++++++++++++--
+ target/arm/tcg/vec_helper.c    |  8 ++++++++
+ 3 files changed, 41 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmax_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmax_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmax_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_5(gvec_ah_fmin_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmin_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmin_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_4(sve_faddv_h, TCG_CALL_NO_RWG,
+                    i64, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+                                        FPST_A64_F16 : FPST_A64);
+ }
++static bool do_fp3_vector_2fn(DisasContext *s, arg_qrrr_e *a, int data,
++                              gen_helper_gvec_3_ptr * const fnormal[3],
++                              gen_helper_gvec_3_ptr * const fah[3])
++{
++    return do_fp3_vector(s, a, data, s->fpcr_ah ? fah : fnormal);
++}
++
+ static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
+                              gen_helper_gvec_3_ptr * const f[3])
+ {
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmax[3] = {
+     gen_helper_gvec_fmax_s,
+     gen_helper_gvec_fmax_d,
+ };
+-TRANS(FMAX_v, do_fp3_vector, a, 0, f_vector_fmax)
++static gen_helper_gvec_3_ptr * const f_vector_fmax_ah[3] = {
++    gen_helper_gvec_ah_fmax_h,
++    gen_helper_gvec_ah_fmax_s,
++    gen_helper_gvec_ah_fmax_d,
++};
++TRANS(FMAX_v, do_fp3_vector_2fn, a, 0, f_vector_fmax, f_vector_fmax_ah)
+ static gen_helper_gvec_3_ptr * const f_vector_fmin[3] = {
+     gen_helper_gvec_fmin_h,
+     gen_helper_gvec_fmin_s,
+     gen_helper_gvec_fmin_d,
+ };
+-TRANS(FMIN_v, do_fp3_vector, a, 0, f_vector_fmin)
++static gen_helper_gvec_3_ptr * const f_vector_fmin_ah[3] = {
++    gen_helper_gvec_ah_fmin_h,
++    gen_helper_gvec_ah_fmin_s,
++    gen_helper_gvec_ah_fmin_d,
++};
++TRANS(FMIN_v, do_fp3_vector_2fn, a, 0, f_vector_fmin, f_vector_fmin_ah)
+ static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[3] = {
+     gen_helper_gvec_fmaxnum_h,
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
+ DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
+ DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
++DO_3OP(gvec_ah_fmax_h, helper_vfp_ah_maxh, float16)
++DO_3OP(gvec_ah_fmax_s, helper_vfp_ah_maxs, float32)
++DO_3OP(gvec_ah_fmax_d, helper_vfp_ah_maxd, float64)
++
++DO_3OP(gvec_ah_fmin_h, helper_vfp_ah_minh, float16)
++DO_3OP(gvec_ah_fmin_s, helper_vfp_ah_mins, float32)
++DO_3OP(gvec_ah_fmin_d, helper_vfp_ah_mind, float64)
++
+ #endif
+ #undef DO_3OP
+--
+.34.1

-New patch
+[PULL 25/68] target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
+Implement the FPCR.AH semantics for FMAXV and FMINV.  These are the
+"recursively reduce all lanes of a vector to a scalar result" insns;
+we just need to use the _ah_ helper for the reduction step when
+FPCR.AH == 1.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
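Reading aid for the commit message above: a recursive lane reduction with the combining function chosen by an AH flag might look like the following in plain C. This is a sketch under assumptions, not the translator code. It works on host `float` values rather than TCG temporaries, it requires a power-of-two element count, and `max_normal`, `max_ah`, `reduce` and `fmaxv` are hypothetical names. `max_normal` in particular is a placeholder and does not model the full non-AH FMAX rules.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

typedef float (*reduce_fn)(float, float);

/* Placeholder for the ordinary maximum. */
static float max_normal(float a, float b)
{
    return a > b ? a : b;
}

/* Placeholder for the AH maximum: zeroes and NaNs yield operand 2. */
static float max_ah(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}

/* Reduce n lanes (n a power of two) by halving recursively. */
static float reduce(const float *data, size_t n, reduce_fn fn)
{
    if (n == 1) {
        return data[0];
    }
    size_t half = n / 2;
    float lo = reduce(data, half, fn);
    float hi = reduce(data + half, half, fn);
    return fn(lo, hi);
}

/* Only the combining step depends on the flag. */
static float fmaxv(const float *data, size_t n, bool fpcr_ah)
{
    return reduce(data, n, fpcr_ah ? max_ah : max_normal);
}
```

The shape of the reduction is the same in both modes, which is why the patch only needs to pass a second function pointer.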
+ target/arm/tcg/translate-a64.c | 28 ++++++++++++++++++----------
+ 1 file changed, 18 insertions(+), 10 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz,
+ }
+ static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
+-                              NeonGenTwoSingleOpFn *fn)
++                            NeonGenTwoSingleOpFn *fnormal,
++                            NeonGenTwoSingleOpFn *fah)
+ {
+     if (fp_access_check(s)) {
+         MemOp esz = a->esz;
+         int elts = (a->q ? 16 : 8) >> esz;
+         TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+-        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
++        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst,
++                                       s->fpcr_ah ? fah : fnormal);
+         write_fp_sreg(s, a->rd, res);
+     }
+     return true;
+ }
+-TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxnumh)
+-TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minnumh)
+-TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxh)
+-TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minh)
++TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a,
++           gen_helper_vfp_maxnumh, gen_helper_vfp_maxnumh)
++TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a,
++           gen_helper_vfp_minnumh, gen_helper_vfp_minnumh)
++TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a,
++           gen_helper_vfp_maxh, gen_helper_vfp_ah_maxh)
++TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a,
++           gen_helper_vfp_minh, gen_helper_vfp_ah_minh)
+-TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
+-TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
+-TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
+-TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
++TRANS(FMAXNMV_s, do_fp_reduction, a,
++      gen_helper_vfp_maxnums, gen_helper_vfp_maxnums)
++TRANS(FMINNMV_s, do_fp_reduction, a,
++      gen_helper_vfp_minnums, gen_helper_vfp_minnums)
++TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs, gen_helper_vfp_ah_maxs)
++TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins, gen_helper_vfp_ah_mins)
+ /*
+  * Floating-point Immediate
+--
+.34.1

-New patch
+[PULL 26/68] target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
+Implement the FPCR.AH semantics for the pairwise floating
+point minimum/maximum insns FMINP and FMAXP.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
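Reading aid for the commit message above: a pairwise operation combines adjacent elements rather than corresponding ones. The sketch below assumes the usual AdvSIMD pairwise layout, in which the low half of the result comes from adjacent pairs of the first source and the high half from adjacent pairs of the second source. It is plain C on host `float` values; `pairwise` and `max_simple` are hypothetical names, and the combining function is a parameter so that a normal or an AH variant could be passed in.

```c
#include <stddef.h>

typedef float (*pair_fn)(float, float);

/* Placeholder combining function for the example. */
static float max_simple(float a, float b)
{
    return a > b ? a : b;
}

/*
 * elts must be even, and d must not alias n or m, because
 * results are written while the inputs are still being read.
 */
static void pairwise(float *d, const float *n, const float *m,
                     size_t elts, pair_fn fn)
{
    size_t half = elts / 2;

    for (size_t i = 0; i < half; i++) {
        d[i] = fn(n[2 * i], n[2 * i + 1]);          /* pairs from n */
        d[i + half] = fn(m[2 * i], m[2 * i + 1]);   /* pairs from m */
    }
}
```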
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
+ target/arm/tcg/vec_helper.c    | 10 ++++++++++
+ 3 files changed, 45 insertions(+), 4 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ah_fmin_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_ah_fmin_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_5(gvec_ah_fminp_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fminp_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fminp_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_4(sve_faddv_h, TCG_CALL_NO_RWG,
+                    i64, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmaxp[3] = {
+     gen_helper_gvec_fmaxp_s,
+     gen_helper_gvec_fmaxp_d,
+ };
+-TRANS(FMAXP_v, do_fp3_vector, a, 0, f_vector_fmaxp)
++static gen_helper_gvec_3_ptr * const f_vector_ah_fmaxp[3] = {
++    gen_helper_gvec_ah_fmaxp_h,
++    gen_helper_gvec_ah_fmaxp_s,
++    gen_helper_gvec_ah_fmaxp_d,
++};
++TRANS(FMAXP_v, do_fp3_vector_2fn, a, 0, f_vector_fmaxp, f_vector_ah_fmaxp)
+ static gen_helper_gvec_3_ptr * const f_vector_fminp[3] = {
+     gen_helper_gvec_fminp_h,
+     gen_helper_gvec_fminp_s,
+     gen_helper_gvec_fminp_d,
+ };
+-TRANS(FMINP_v, do_fp3_vector, a, 0, f_vector_fminp)
++static gen_helper_gvec_3_ptr * const f_vector_ah_fminp[3] = {
++    gen_helper_gvec_ah_fminp_h,
++    gen_helper_gvec_ah_fminp_s,
++    gen_helper_gvec_ah_fminp_d,
++};
++TRANS(FMINP_v, do_fp3_vector_2fn, a, 0, f_vector_fminp, f_vector_ah_fminp)
+ static gen_helper_gvec_3_ptr * const f_vector_fmaxnmp[3] = {
+     gen_helper_gvec_fmaxnump_h,
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_pair(DisasContext *s, arg_rr_e *a, const FPScalar *f)
+     return true;
+ }
++static bool do_fp3_scalar_pair_2fn(DisasContext *s, arg_rr_e *a,
++                                   const FPScalar *fnormal,
++                                   const FPScalar *fah)
++{
++    return do_fp3_scalar_pair(s, a, s->fpcr_ah ? fah : fnormal);
++}
++
+ TRANS(FADDP_s, do_fp3_scalar_pair, a, &f_scalar_fadd)
+-TRANS(FMAXP_s, do_fp3_scalar_pair, a, &f_scalar_fmax)
+-TRANS(FMINP_s, do_fp3_scalar_pair, a, &f_scalar_fmin)
++TRANS(FMAXP_s, do_fp3_scalar_pair_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah)
++TRANS(FMINP_s, do_fp3_scalar_pair_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah)
+ TRANS(FMAXNMP_s, do_fp3_scalar_pair, a, &f_scalar_fmaxnm)
+ TRANS(FMINNMP_s, do_fp3_scalar_pair, a, &f_scalar_fminnm)
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_3OP_PAIR(gvec_fminnump_h, float16_minnum, float16, H2)
+ DO_3OP_PAIR(gvec_fminnump_s, float32_minnum, float32, H4)
+ DO_3OP_PAIR(gvec_fminnump_d, float64_minnum, float64, )
++#ifdef TARGET_AARCH64
++DO_3OP_PAIR(gvec_ah_fmaxp_h, helper_vfp_ah_maxh, float16, H2)
++DO_3OP_PAIR(gvec_ah_fmaxp_s, helper_vfp_ah_maxs, float32, H4)
++DO_3OP_PAIR(gvec_ah_fmaxp_d, helper_vfp_ah_maxd, float64, )
++
++DO_3OP_PAIR(gvec_ah_fminp_h, helper_vfp_ah_minh, float16, H2)
++DO_3OP_PAIR(gvec_ah_fminp_s, helper_vfp_ah_mins, float32, H4)
++DO_3OP_PAIR(gvec_ah_fminp_d, helper_vfp_ah_mind, float64, )
++#endif
++
+ #undef DO_3OP_PAIR
+ #define DO_3OP_PAIR(NAME, FUNC, TYPE, H) \
+--
+.34.1

-New patch
+[PULL 27/68] target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
+Implement the FPCR.AH semantics for the SVE FMAXV and FMINV
+vector-reduction-to-scalar max/min operations.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 +++++++++++
+ target/arm/tcg/sve_helper.c    | 43 +++++++++++++++++++++-------------
+ target/arm/tcg/translate-sve.c | 16 +++++++++++--
+ 3 files changed, 55 insertions(+), 18 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_fminv_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_4(sve_fminv_d, TCG_CALL_NO_RWG,
+                    i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fmaxv_h, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fmaxv_s, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fmaxv_d, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_4(sve_ah_fminv_h, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fminv_s, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fminv_d, TCG_CALL_NO_RWG,
++                   i64, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG,
+                    i64, i64, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ static TYPE NAME##_reduce(TYPE *data, float_status *status, uintptr_t n) \
+         uintptr_t half = n / 2;                                       \
+         TYPE lo = NAME##_reduce(data, status, half);                  \
+         TYPE hi = NAME##_reduce(data + half, status, half);           \
+-        return TYPE##_##FUNC(lo, hi, status);                         \
++        return FUNC(lo, hi, status);                                  \
+     }                                                                 \
+ }                                                                     \
+ uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \
+     return NAME##_reduce(data, s, maxsz / sizeof(TYPE));              \
+ }
+-DO_REDUCE(sve_faddv_h, float16, H1_2, add, float16_zero)
+-DO_REDUCE(sve_faddv_s, float32, H1_4, add, float32_zero)
+-DO_REDUCE(sve_faddv_d, float64, H1_8, add, float64_zero)
++DO_REDUCE(sve_faddv_h, float16, H1_2, float16_add, float16_zero)
++DO_REDUCE(sve_faddv_s, float32, H1_4, float32_add, float32_zero)
++DO_REDUCE(sve_faddv_d, float64, H1_8, float64_add, float64_zero)
+ /* Identity is floatN_default_nan, without the function call.  */
+-DO_REDUCE(sve_fminnmv_h, float16, H1_2, minnum, 0x7E00)
+-DO_REDUCE(sve_fminnmv_s, float32, H1_4, minnum, 0x7FC00000)
+-DO_REDUCE(sve_fminnmv_d, float64, H1_8, minnum, 0x7FF8000000000000ULL)
++DO_REDUCE(sve_fminnmv_h, float16, H1_2, float16_minnum, 0x7E00)
++DO_REDUCE(sve_fminnmv_s, float32, H1_4, float32_minnum, 0x7FC00000)
++DO_REDUCE(sve_fminnmv_d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL)
+-DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, maxnum, 0x7E00)
+-DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, maxnum, 0x7FC00000)
+-DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, maxnum, 0x7FF8000000000000ULL)
++DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, float16_maxnum, 0x7E00)
++DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, float32_maxnum, 0x7FC00000)
++DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL)
+-DO_REDUCE(sve_fminv_h, float16, H1_2, min, float16_infinity)
+-DO_REDUCE(sve_fminv_s, float32, H1_4, min, float32_infinity)
+-DO_REDUCE(sve_fminv_d, float64, H1_8, min, float64_infinity)
++DO_REDUCE(sve_fminv_h, float16, H1_2, float16_min, float16_infinity)
++DO_REDUCE(sve_fminv_s, float32, H1_4, float32_min, float32_infinity)
++DO_REDUCE(sve_fminv_d, float64, H1_8, float64_min, float64_infinity)
+-DO_REDUCE(sve_fmaxv_h, float16, H1_2, max, float16_chs(float16_infinity))
+-DO_REDUCE(sve_fmaxv_s, float32, H1_4, max, float32_chs(float32_infinity))
+-DO_REDUCE(sve_fmaxv_d, float64, H1_8, max, float64_chs(float64_infinity))
++DO_REDUCE(sve_fmaxv_h, float16, H1_2, float16_max, float16_chs(float16_infinity))
++DO_REDUCE(sve_fmaxv_s, float32, H1_4, float32_max, float32_chs(float32_infinity))
++DO_REDUCE(sve_fmaxv_d, float64, H1_8, float64_max, float64_chs(float64_infinity))
++
++DO_REDUCE(sve_ah_fminv_h, float16, H1_2, helper_vfp_ah_minh, float16_infinity)
++DO_REDUCE(sve_ah_fminv_s, float32, H1_4, helper_vfp_ah_mins, float32_infinity)
++DO_REDUCE(sve_ah_fminv_d, float64, H1_8, helper_vfp_ah_mind, float64_infinity)
++
++DO_REDUCE(sve_ah_fmaxv_h, float16, H1_2, helper_vfp_ah_maxh,
++          float16_chs(float16_infinity))
++DO_REDUCE(sve_ah_fmaxv_s, float32, H1_4, helper_vfp_ah_maxs,
++          float32_chs(float32_infinity))
++DO_REDUCE(sve_ah_fmaxv_d, float64, H1_8, helper_vfp_ah_maxd,
++          float64_chs(float64_infinity))
+ #undef DO_REDUCE
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool do_reduce(DisasContext *s, arg_rpr_esz *a,
+     };                                                                   \
+     TRANS_FEAT(NAME, aa64_sve, do_reduce, a, name##_fns[a->esz])
++#define DO_VPZ_AH(NAME, name)                                            \
++    static gen_helper_fp_reduce * const name##_fns[4] = {                \
++        NULL,                      gen_helper_sve_##name##_h,            \
++        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,            \
++    };                                                                   \
++    static gen_helper_fp_reduce * const name##_ah_fns[4] = {             \
++        NULL,                      gen_helper_sve_ah_##name##_h,         \
++        gen_helper_sve_ah_##name##_s, gen_helper_sve_ah_##name##_d,      \
++    };                                                                   \
++    TRANS_FEAT(NAME, aa64_sve, do_reduce, a,                             \
++               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz])
++
+ DO_VPZ(FADDV, faddv)
+ DO_VPZ(FMINNMV, fminnmv)
+ DO_VPZ(FMAXNMV, fmaxnmv)
+-DO_VPZ(FMINV, fminv)
+-DO_VPZ(FMAXV, fmaxv)
++DO_VPZ_AH(FMINV, fminv)
++DO_VPZ_AH(FMAXV, fmaxv)
+ #undef DO_VPZ
+--
+.34.1

-New patch
+[PULL 28/68] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
+Implement the FPCR.AH semantics for the SVE FMAX and FMIN operations
+that take an immediate as the second operand.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/sve_helper.c    |  8 ++++++++
+ target/arm/tcg/translate-sve.c | 25 +++++++++++++++++++++++--
+ 3 files changed, 45 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_fmins_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(sve_fmins_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmaxs_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmaxs_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmaxs_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++
++DEF_HELPER_FLAGS_6(sve_ah_fmins_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmins_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmins_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(sve_fcvt_sh, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(sve_fcvt_dh, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZS_FP(sve_fmins_h, float16, H1_2, float16_min)
+ DO_ZPZS_FP(sve_fmins_s, float32, H1_4, float32_min)
+ DO_ZPZS_FP(sve_fmins_d, float64, H1_8, float64_min)
++DO_ZPZS_FP(sve_ah_fmaxs_h, float16, H1_2, helper_vfp_ah_maxh)
++DO_ZPZS_FP(sve_ah_fmaxs_s, float32, H1_4, helper_vfp_ah_maxs)
++DO_ZPZS_FP(sve_ah_fmaxs_d, float64, H1_8, helper_vfp_ah_maxd)
++
++DO_ZPZS_FP(sve_ah_fmins_h, float16, H1_2, helper_vfp_ah_minh)
++DO_ZPZS_FP(sve_ah_fmins_s, float32, H1_4, helper_vfp_ah_mins)
++DO_ZPZS_FP(sve_ah_fmins_d, float64, H1_8, helper_vfp_ah_mind)
++
+ /* Fully general two-operand expander, controlled by a predicate,
+  * With the extra float_status parameter.
+  */
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp_imm(DisasContext *s, arg_rpri_esz *a, uint64_t imm,
+     TRANS_FEAT(NAME##_zpzi, aa64_sve, do_fp_imm, a,                     \
+                name##_const[a->esz][a->imm], name##_fns[a->esz])
++#define DO_FP_AH_IMM(NAME, name, const0, const1)                        \
++    static gen_helper_sve_fp2scalar * const name##_fns[4] = {           \
++        NULL, gen_helper_sve_##name##_h,                                \
++        gen_helper_sve_##name##_s,                                      \
++        gen_helper_sve_##name##_d                                       \
++    };                                                                  \
++    static gen_helper_sve_fp2scalar * const name##_ah_fns[4] = {        \
++        NULL, gen_helper_sve_ah_##name##_h,                             \
++        gen_helper_sve_ah_##name##_s,                                   \
++        gen_helper_sve_ah_##name##_d                                    \
++    };                                                                  \
++    static uint64_t const name##_const[4][2] = {                        \
++        { -1, -1 },                                                     \
++        { float16_##const0, float16_##const1 },                         \
++        { float32_##const0, float32_##const1 },                         \
++        { float64_##const0, float64_##const1 },                         \
++    };                                                                  \
++    TRANS_FEAT(NAME##_zpzi, aa64_sve, do_fp_imm, a,                     \
++               name##_const[a->esz][a->imm],                            \
++               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz])
++
+ DO_FP_IMM(FADD, fadds, half, one)
+ DO_FP_IMM(FSUB, fsubs, half, one)
+ DO_FP_IMM(FMUL, fmuls, half, two)
+ DO_FP_IMM(FSUBR, fsubrs, half, one)
+ DO_FP_IMM(FMAXNM, fmaxnms, zero, one)
+ DO_FP_IMM(FMINNM, fminnms, zero, one)
+-DO_FP_IMM(FMAX, fmaxs, zero, one)
+-DO_FP_IMM(FMIN, fmins, zero, one)
++DO_FP_AH_IMM(FMAX, fmaxs, zero, one)
++DO_FP_AH_IMM(FMIN, fmins, zero, one)
+ #undef DO_FP_IMM
+--
+.34.1

-New patch
+[PULL 29/68] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
+Implement the FPCR.AH semantics for the SVE FMAX and FMIN
+operations that take two vector operands.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/sve_helper.c    |  8 ++++++++
+ target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
+ 3 files changed, 37 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_fmax_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(sve_fmax_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmin_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmin_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmin_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_6(sve_ah_fmax_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmax_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmax_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_6(sve_fminnum_h, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_6(sve_fminnum_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_FP(sve_fmax_h, uint16_t, H1_2, float16_max)
+ DO_ZPZZ_FP(sve_fmax_s, uint32_t, H1_4, float32_max)
+ DO_ZPZZ_FP(sve_fmax_d, uint64_t, H1_8, float64_max)
++DO_ZPZZ_FP(sve_ah_fmin_h, uint16_t, H1_2, helper_vfp_ah_minh)
++DO_ZPZZ_FP(sve_ah_fmin_s, uint32_t, H1_4, helper_vfp_ah_mins)
++DO_ZPZZ_FP(sve_ah_fmin_d, uint64_t, H1_8, helper_vfp_ah_mind)
++
++DO_ZPZZ_FP(sve_ah_fmax_h, uint16_t, H1_2, helper_vfp_ah_maxh)
++DO_ZPZZ_FP(sve_ah_fmax_s, uint32_t, H1_4, helper_vfp_ah_maxs)
++DO_ZPZZ_FP(sve_ah_fmax_d, uint64_t, H1_8, helper_vfp_ah_maxd)
++
+ DO_ZPZZ_FP(sve_fminnum_h, uint16_t, H1_2, float16_minnum)
+ DO_ZPZZ_FP(sve_fminnum_s, uint32_t, H1_4, float32_minnum)
+ DO_ZPZZ_FP(sve_fminnum_d, uint64_t, H1_8, float64_minnum)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(FTSMUL, aa64_sve, gen_gvec_fpst_arg_zzz,
+     };                                                          \
+     TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a)
++#define DO_ZPZZ_AH_FP(NAME, FEAT, name, ah_name)                        \
++    static gen_helper_gvec_4_ptr * const name##_zpzz_fns[4] = {         \
++        NULL,                  gen_helper_##name##_h,                   \
++        gen_helper_##name##_s, gen_helper_##name##_d                    \
++    };                                                                  \
++    static gen_helper_gvec_4_ptr * const name##_ah_zpzz_fns[4] = {      \
++        NULL,                  gen_helper_##ah_name##_h,                \
++        gen_helper_##ah_name##_s, gen_helper_##ah_name##_d              \
++    };                                                                  \
++    TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz,                      \
++               s->fpcr_ah ? name##_ah_zpzz_fns[a->esz] :                \
++               name##_zpzz_fns[a->esz], a)
++
+ DO_ZPZZ_FP(FADD_zpzz, aa64_sve, sve_fadd)
+ DO_ZPZZ_FP(FSUB_zpzz, aa64_sve, sve_fsub)
+ DO_ZPZZ_FP(FMUL_zpzz, aa64_sve, sve_fmul)
+-DO_ZPZZ_FP(FMIN_zpzz, aa64_sve, sve_fmin)
+-DO_ZPZZ_FP(FMAX_zpzz, aa64_sve, sve_fmax)
++DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
++DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
+ DO_ZPZZ_FP(FMINNM_zpzz, aa64_sve, sve_fminnum)
+ DO_ZPZZ_FP(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
+ DO_ZPZZ_FP(FABD, aa64_sve, sve_fabd)
+--
+.34.1

-New patch
+[PULL 30/68] target/arm: Implement FPCR.AH handling of negation of NaN
+FPCR.AH == 1 mandates that negation of a NaN value should not flip
 its sign bit.  This means we can no longer use gen_vfp_neg*()
 everywhere but must instead generate slightly more complex code when
 FPCR.AH is set.
 Make this change for the scalar FNEG and for those places in
 translate-a64.c which were previously directly calling
 gen_vfp_neg*().
 This change in semantics also affects any other instruction whose
 pseudocode calls FPNeg(); in following commits we extend this
 change to the other affected instructions.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
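Reading aid for the commit message above: the rule "negation of a NaN does not flip its sign bit" can be written on the raw float32 bit pattern in plain C. This is a sketch, not the TCG code generated by the patch; `ah_neg_f32` is a hypothetical name and the value is passed as a `uint32_t` bit pattern.

```c
#include <stdint.h>

/*
 * Negate a float32 bit pattern unless it encodes a NaN.
 * With the sign bit cleared, any value above the infinity
 * encoding 0x7f800000 is a NaN and is returned unchanged.
 */
static uint32_t ah_neg_f32(uint32_t s)
{
    const uint32_t sign = UINT32_C(1) << 31;

    return ((s & ~sign) > UINT32_C(0x7f800000)) ? s : (s ^ sign);
}
```

Infinities and zeroes are still negated here, since only encodings strictly above the infinity pattern count as NaNs.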
  target/arm/tcg/translate-a64.c | 125 ++++++++++++++++++++++++++++++---
  1 file changed, 114 insertions(+), 11 deletions(-)
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
                         is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
  }
 +/*
 + * When FPCR.AH == 1, NEG and ABS do not flip the sign bit of a NaN.
 + * These functions implement
 + *   d = floatN_is_any_nan(s) ? s : floatN_chs(s)
 + * which for float32 is
 + *   d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s ^ (1 << 31))
 + * and similarly for the other float sizes.
 + */
 +static void gen_vfp_ah_negh(TCGv_i32 d, TCGv_i32 s)
 +{
 +    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
 +
 +    gen_vfp_negh(chs_s, s);
 +    gen_vfp_absh(abs_s, s);
 +    tcg_gen_movcond_i32(TCG_COND_GTU, d,
 +                        abs_s, tcg_constant_i32(0x7c00),
 +                        s, chs_s);
 +}
 +
 +static void gen_vfp_ah_negs(TCGv_i32 d, TCGv_i32 s)
 +{
 +    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
 +
 +    gen_vfp_negs(chs_s, s);
 +    gen_vfp_abss(abs_s, s);
 +    tcg_gen_movcond_i32(TCG_COND_GTU, d,
 +                        abs_s, tcg_constant_i32(0x7f800000UL),
 +                        s, chs_s);
 +}
 +
 +static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
 +{
 +    TCGv_i64 abs_s = tcg_temp_new_i64(), chs_s = tcg_temp_new_i64();
 +
 +    gen_vfp_negd(chs_s, s);
 +    gen_vfp_absd(abs_s, s);
 +    tcg_gen_movcond_i64(TCG_COND_GTU, d,
 +                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
 +                        s, chs_s);
 +}
 +
 +static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
 +{
 +    if (dc->fpcr_ah) {
 +        gen_vfp_ah_negh(d, s);
 +    } else {
 +        gen_vfp_negh(d, s);
 +    }
 +}
 +
 +static void gen_vfp_maybe_ah_negs(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
 +{
 +    if (dc->fpcr_ah) {
 +        gen_vfp_ah_negs(d, s);
 +    } else {
 +        gen_vfp_negs(d, s);
 +    }
 +}
 +
 +static void gen_vfp_maybe_ah_negd(DisasContext *dc, TCGv_i64 d, TCGv_i64 s)
 +{
 +    if (dc->fpcr_ah) {
 +        gen_vfp_ah_negd(d, s);
 +    } else {
 +        gen_vfp_negd(d, s);
 +    }
 +}
 +
  /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
   * than the 32 bit equivalent.
   */
@@ -XXX,XX +XXX,XX @@ static void gen_fnmul_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
      gen_vfp_negd(d, d);
  }
 +static void gen_fnmul_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 +{
 +    gen_helper_vfp_mulh(d, n, m, s);
 +    gen_vfp_ah_negh(d, d);
 +}
 +
 +static void gen_fnmul_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 +{
 +    gen_helper_vfp_muls(d, n, m, s);
 +    gen_vfp_ah_negs(d, d);
 +}
 +
 +static void gen_fnmul_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
 +{
 +    gen_helper_vfp_muld(d, n, m, s);
 +    gen_vfp_ah_negd(d, d);
 +}
 +
  static const FPScalar f_scalar_fnmul = {
      gen_fnmul_h,
      gen_fnmul_s,
      gen_fnmul_d,
  };
 -TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
 +static const FPScalar f_scalar_ah_fnmul = {
 +    gen_fnmul_ah_h,
 +    gen_fnmul_ah_s,
 +    gen_fnmul_ah_d,
 +};
 +TRANS(FNMUL_s, do_fp3_scalar_2fn, a, &f_scalar_fnmul, &f_scalar_ah_fnmul, a->rn)
  static const FPScalar f_scalar_fcmeq = {
      gen_helper_advsimd_ceq_f16,
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
              read_vec_element(s, t2, a->rm, a->idx, MO_64);
              if (neg) {
 -                gen_vfp_negd(t1, t1);
 +                gen_vfp_maybe_ah_negd(s, t1, t1);
              }
              gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
              write_fp_dreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
              read_vec_element_i32(s, t2, a->rm, a->idx, MO_32);
              if (neg) {
 -                gen_vfp_negs(t1, t1);
 +                gen_vfp_maybe_ah_negs(s, t1, t1);
              }
              gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
              write_fp_sreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
              read_vec_element_i32(s, t2, a->rm, a->idx, MO_16);
              if (neg) {
 -                gen_vfp_negh(t1, t1);
 +                gen_vfp_maybe_ah_negh(s, t1, t1);
              }
              gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                         fpstatus_ptr(FPST_A64_F16));
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
              TCGv_i64 ta = read_fp_dreg(s, a->ra);
              if (neg_a) {
 -                gen_vfp_negd(ta, ta);
 +                gen_vfp_maybe_ah_negd(s, ta, ta);
              }
              if (neg_n) {
 -                gen_vfp_negd(tn, tn);
 +                gen_vfp_maybe_ah_negd(s, tn, tn);
              }
              fpst = fpstatus_ptr(FPST_A64);
              gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
              TCGv_i32 ta = read_fp_sreg(s, a->ra);
              if (neg_a) {
 -                gen_vfp_negs(ta, ta);
 +                gen_vfp_maybe_ah_negs(s, ta, ta);
              }
              if (neg_n) {
 -                gen_vfp_negs(tn, tn);
 +                gen_vfp_maybe_ah_negs(s, tn, tn);
              }
              fpst = fpstatus_ptr(FPST_A64);
              gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
              TCGv_i32 ta = read_fp_hreg(s, a->ra);
              if (neg_a) {
 -                gen_vfp_negh(ta, ta);
 +                gen_vfp_maybe_ah_negh(s, ta, ta);
              }
              if (neg_n) {
 -                gen_vfp_negh(tn, tn);
 +                gen_vfp_maybe_ah_negh(s, tn, tn);
              }
              fpst = fpstatus_ptr(FPST_A64_F16);
              gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
      return true;
  }
 +static bool do_fp1_scalar_int_2fn(DisasContext *s, arg_rr_e *a,
 +                                  const FPScalar1Int *fnormal,
 +                                  const FPScalar1Int *fah)
 +{
 +    return do_fp1_scalar_int(s, a, s->fpcr_ah ? fah : fnormal, true);
 +}
 +
  static const FPScalar1Int f_scalar_fmov = {
      tcg_gen_mov_i32,
      tcg_gen_mov_i32,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fneg = {
      gen_vfp_negs,
      gen_vfp_negd,
  };
 -TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
 +static const FPScalar1Int f_scalar_ah_fneg = {
 +    gen_vfp_ah_negh,
 +    gen_vfp_ah_negs,
 +    gen_vfp_ah_negd,
 +};
 +TRANS(FNEG_s, do_fp1_scalar_int_2fn, a, &f_scalar_fneg, &f_scalar_ah_fneg)
  typedef struct FPScalar1 {
      void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
 --
 .34.1
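
To make the comment in the patch above concrete, here is a minimal sketch of the FPCR.AH negate rule applied directly to float32 bit patterns. The names f32_chs, f32_ah_chs and the two constants are hypothetical and exist only for this illustration; the patch itself emits TCG ops (a movcond with an unsigned greater-than comparison) rather than running host C like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values are raw IEEE 754 binary32 bit patterns. */
#define F32_SIGN 0x80000000u
#define F32_INF  0x7f800000u   /* exponent all ones, fraction zero */

/* Plain negate: always flips the sign bit, even for a NaN. */
static uint32_t f32_chs(uint32_t s)
{
    return s ^ F32_SIGN;
}

/* Negate as described for FPCR.AH == 1: a NaN input is returned unchanged. */
static uint32_t f32_ah_chs(uint32_t s)
{
    uint32_t abs_s = s & ~F32_SIGN;

    /* Unsigned compare: any magnitude above the infinity encoding is a NaN. */
    return abs_s > F32_INF ? s : f32_chs(s);
}

/* Small self-check covering a normal number, an infinity and a NaN. */
static void f32_ah_chs_self_check(void)
{
    assert(f32_ah_chs(0x3f800000u) == 0xbf800000u);  /* 1.0 -> -1.0 */
    assert(f32_ah_chs(0x7f800000u) == 0xff800000u);  /* +inf -> -inf */
    assert(f32_ah_chs(0xffc00000u) == 0xffc00000u);  /* negative NaN kept */
    assert(f32_chs(0xffc00000u) == 0x7fc00000u);     /* plain negate flips it */
}
```

The single unsigned comparison works because clearing the sign bit leaves every NaN encoding strictly greater than the infinity encoding, which is the same property the movcond in the patch relies on.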

-[PULL 07/35] kvm: Introduce kvm_arch_get_default_type hook
+[PULL 31/68] target/arm: Implement FPCR.AH handling for scalar FABS and FABD
-From: Akihiko Odaki <akihiko.odaki@daynix.com>
+FPCR.AH == 1 mandates that taking the absolute value of a NaN should
 not change its sign bit.  This means we can no longer use
 gen_vfp_abs*() everywhere but must instead generate slightly more
 complex code when FPCR.AH is set.
-kvm_arch_get_default_type() returns the default KVM type. This hook is
+Implement these semantics for scalar FABS and FABD.  This change also
-particularly useful to derive a KVM type that is valid for "none"
+affects all other instructions whose pseudocode calls FPAbs(); we
-machine model, which is used by libvirt to probe the availability of
+will extend the change to those instructions in following commits.
 KVM.
-For MIPS, the existing mips_kvm_type() is reused. This function ensures
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-the availability of VZ which is mandatory to use KVM on the current
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-QEMU.
+---
  target/arm/tcg/translate-a64.c | 69 +++++++++++++++++++++++++++++++++-
 file changed, 67 insertions(+), 2 deletions(-)
-Cc: qemu-stable@nongnu.org
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
 Message-id: 20230727073134.134102-2-akihiko.odaki@daynix.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 [PMM: added doc comment for new function]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 ---
  include/sysemu/kvm.h     | 2 ++
  target/mips/kvm_mips.h   | 9 ---------
  accel/kvm/kvm-all.c      | 4 +++-
  hw/mips/loongson3_virt.c | 2 --
  target/arm/kvm.c         | 5 +++++
  target/i386/kvm/kvm.c    | 5 +++++
  target/mips/kvm.c        | 2 +-
  target/ppc/kvm.c         | 5 +++++
  target/riscv/kvm.c       | 5 +++++
  target/s390x/kvm/kvm.c   | 5 +++++
 files changed, 31 insertions(+), 13 deletions(-)
 diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/sysemu/kvm.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/include/sysemu/kvm.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_get_registers(CPUState *cpu);
+@@ -XXX,XX +XXX,XX @@ static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
+                         s, chs_s);
- int kvm_arch_put_registers(CPUState *cpu, int level);
+ }
-+int kvm_arch_get_default_type(MachineState *ms);
++/*
 + * These functions implement
 + *  d = floatN_is_any_nan(s) ? s : floatN_abs(s)
 + * which for float32 is
 + *  d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s & ~(1 << 31))
 + * and similarly for the other float sizes.
 + */
 +static void gen_vfp_ah_absh(TCGv_i32 d, TCGv_i32 s)
 +{
 +    TCGv_i32 abs_s = tcg_temp_new_i32();
 +
- int kvm_arch_init(MachineState *ms, KVMState *s);
++    gen_vfp_absh(abs_s, s);
++    tcg_gen_movcond_i32(TCG_COND_GTU, d,
- int kvm_arch_init_vcpu(CPUState *cpu);
++                        abs_s, tcg_constant_i32(0x7c00),
-diff --git a/target/mips/kvm_mips.h b/target/mips/kvm_mips.h
++                        s, abs_s);
 index XXXXXXX..XXXXXXX 100644
 --- a/target/mips/kvm_mips.h
 +++ b/target/mips/kvm_mips.h
@@ -XXX,XX +XXX,XX @@ void kvm_mips_reset_vcpu(MIPSCPU *cpu);
  int kvm_mips_set_interrupt(MIPSCPU *cpu, int irq, int level);
  int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, int level);
 -#ifdef CONFIG_KVM
 -int mips_kvm_type(MachineState *machine, const char *vm_type);
 -#else
 -static inline int mips_kvm_type(MachineState *machine, const char *vm_type)
 -{
 -    return 0;
 -}
 -#endif
 -
  #endif /* KVM_MIPS_H */
 diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/kvm/kvm-all.c
 +++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ static int kvm_init(MachineState *ms)
      KVMState *s;
      const KVMCapabilityInfo *missing_cap;
      int ret;
 -    int type = 0;
 +    int type;
      uint64_t dirty_log_manual_caps;
      qemu_mutex_init(&kml_slots_lock);
@@ -XXX,XX +XXX,XX @@ static int kvm_init(MachineState *ms)
          type = mc->kvm_type(ms, kvm_type);
      } else if (mc->kvm_type) {
          type = mc->kvm_type(ms, NULL);
 +    } else {
 +        type = kvm_arch_get_default_type(ms);
      }
      do {
 diff --git a/hw/mips/loongson3_virt.c b/hw/mips/loongson3_virt.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/mips/loongson3_virt.c
 +++ b/hw/mips/loongson3_virt.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/datadir.h"
  #include "qapi/error.h"
  #include "elf.h"
 -#include "kvm_mips.h"
  #include "hw/char/serial.h"
  #include "hw/intc/loongson_liointc.h"
  #include "hw/mips/mips.h"
@@ -XXX,XX +XXX,XX @@ static void loongson3v_machine_class_init(ObjectClass *oc, void *data)
      mc->max_cpus = LOONGSON_MAX_VCPUS;
      mc->default_ram_id = "loongson3.highram";
      mc->default_ram_size = 1600 * MiB;
 -    mc->kvm_type = mips_kvm_type;
      mc->minimum_page_bits = 14;
      mc->default_nic = "virtio-net-pci";
  }
 diff --git a/target/arm/kvm.c b/target/arm/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm.c
 +++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa)
      return ret > 0 ? ret : 40;
  }
 +int kvm_arch_get_default_type(MachineState *ms)
 +{
 +    return 0;
 +}
 +
- int kvm_arch_init(MachineState *ms, KVMState *s)
++static void gen_vfp_ah_abss(TCGv_i32 d, TCGv_i32 s)
  {
      int ret = 0;
 diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/kvm/kvm.c
 +++ b/target/i386/kvm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void register_smram_listener(Notifier *n, void *unused)
                                   &smram_address_space, 1, "kvm-smram");
  }
 +int kvm_arch_get_default_type(MachineState *ms)
 +{
-+    return 0;
++    TCGv_i32 abs_s = tcg_temp_new_i32();
 +
 +    gen_vfp_abss(abs_s, s);
 +    tcg_gen_movcond_i32(TCG_COND_GTU, d,
 +                        abs_s, tcg_constant_i32(0x7f800000UL),
 +                        s, abs_s);
 +}
 +
- int kvm_arch_init(MachineState *ms, KVMState *s)
++static void gen_vfp_ah_absd(TCGv_i64 d, TCGv_i64 s)
  {
      uint64_t identity_base = 0xfffbc000;
 diff --git a/target/mips/kvm.c b/target/mips/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/mips/kvm.c
 +++ b/target/mips/kvm.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
      abort();
  }
 -int mips_kvm_type(MachineState *machine, const char *vm_type)
 +int kvm_arch_get_default_type(MachineState *machine)
  {
  #if defined(KVM_CAP_MIPS_VZ)
      int r;
 diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/ppc/kvm.c
 +++ b/target/ppc/kvm.c
@@ -XXX,XX +XXX,XX @@ static int kvm_ppc_register_host_cpu_type(void);
  static void kvmppc_get_cpu_characteristics(KVMState *s);
  static int kvmppc_get_dec_bits(void);
 +int kvm_arch_get_default_type(MachineState *ms)
 +{
-+    return 0;
++    TCGv_i64 abs_s = tcg_temp_new_i64();
 +
 +    gen_vfp_absd(abs_s, s);
 +    tcg_gen_movcond_i64(TCG_COND_GTU, d,
 +                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
 +                        s, abs_s);
 +}
 +
- int kvm_arch_init(MachineState *ms, KVMState *s)
+ static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
  {
-     cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
+     if (dc->fpcr_ah) {
-diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
+@@ -XXX,XX +XXX,XX @@ static void gen_fabd_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
-index XXXXXXX..XXXXXXX 100644
+     gen_vfp_absd(d, d);
 --- a/target/riscv/kvm.c
 +++ b/target/riscv/kvm.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
      return 0;
  }
-+int kvm_arch_get_default_type(MachineState *ms)
++static void gen_fabd_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 +{
-+    return 0;
++    gen_helper_vfp_subh(d, n, m, s);
 +    gen_vfp_ah_absh(d, d);
 +}
 +
- int kvm_arch_init(MachineState *ms, KVMState *s)
++static void gen_fabd_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
  {
      return 0;
 diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/s390x/kvm/kvm.c
 +++ b/target/s390x/kvm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void ccw_machine_class_foreach(ObjectClass *oc, void *opaque)
      mc->default_cpu_type = S390_CPU_TYPE_NAME("host");
  }
 +int kvm_arch_get_default_type(MachineState *ms)
 +{
-+    return 0;
++    gen_helper_vfp_subs(d, n, m, s);
 +    gen_vfp_ah_abss(d, d);
 +}
 +
- int kvm_arch_init(MachineState *ms, KVMState *s)
++static void gen_fabd_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
- {
++{
-     object_class_foreach(ccw_machine_class_foreach, TYPE_S390_CCW_MACHINE,
++    gen_helper_vfp_subd(d, n, m, s);
 +    gen_vfp_ah_absd(d, d);
 +}
 +
  static const FPScalar f_scalar_fabd = {
      gen_fabd_h,
      gen_fabd_s,
      gen_fabd_d,
  };
 -TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
 +static const FPScalar f_scalar_ah_fabd = {
 +    gen_fabd_ah_h,
 +    gen_fabd_ah_s,
 +    gen_fabd_ah_d,
 +};
 +TRANS(FABD_s, do_fp3_scalar_2fn, a, &f_scalar_fabd, &f_scalar_ah_fabd, a->rn)
  static const FPScalar f_scalar_frecps = {
      gen_helper_recpsf_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fabs = {
      gen_vfp_abss,
      gen_vfp_absd,
  };
 -TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
 +static const FPScalar1Int f_scalar_ah_fabs = {
 +    gen_vfp_ah_absh,
 +    gen_vfp_ah_abss,
 +    gen_vfp_ah_absd,
 +};
 +TRANS(FABS_s, do_fp1_scalar_int_2fn, a, &f_scalar_fabs, &f_scalar_ah_fabs)
  static const FPScalar1Int f_scalar_fneg = {
      gen_vfp_negh,
 --
 .34.1
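
The half-precision case of the patch above compares against 0x7c00. The sketch below shows why that constant works, using a hypothetical helper f16_ah_abs that operates on raw float16 bit patterns; it is an illustration under that assumption, not the generated TCG code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values are raw IEEE 754 binary16 bit patterns. */
#define F16_SIGN 0x8000u
#define F16_INF  0x7c00u   /* exponent all ones, fraction zero */

/* Absolute value as described for FPCR.AH == 1: a NaN keeps its sign bit. */
static uint16_t f16_ah_abs(uint16_t s)
{
    uint16_t abs_s = s & (uint16_t)~F16_SIGN;

    /* If the magnitude is above the infinity encoding, s is a NaN. */
    return abs_s > F16_INF ? s : abs_s;
}

/* Small self-check covering a normal number, an infinity and two NaNs. */
static void f16_ah_abs_self_check(void)
{
    assert(f16_ah_abs(0xbc00u) == 0x3c00u);  /* -1.0 -> 1.0 */
    assert(f16_ah_abs(0xfc00u) == 0x7c00u);  /* -inf -> +inf */
    assert(f16_ah_abs(0xfe00u) == 0xfe00u);  /* negative quiet NaN kept */
    assert(f16_ah_abs(0xfc01u) == 0xfc01u);  /* negative signalling NaN kept */
}
```

Note that the already-computed magnitude doubles as the non-NaN result, which is why the abs helpers in the patch need one temporary where the negate helpers need two.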

-New patch
+[PULL 32/68] target/arm: Handle FPCR.AH in vector FABD
+Split the handling of vector FABD so that it calls a different set
+of helpers when FPCR.AH is 1, which implement the "no negation of
+the sign of a NaN" semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/helper.h            |  4 ++++
+ target/arm/tcg/translate-a64.c |  7 ++++++-
+ target/arm/tcg/vec_helper.c    | 23 +++++++++++++++++++++++
+files changed, 33 insertions(+), 1 deletion(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_fabd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_fabd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(gvec_fceq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_fceq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_fceq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fabd[3] = {
+     gen_helper_gvec_fabd_s,
+     gen_helper_gvec_fabd_d,
+ };
+-TRANS(FABD_v, do_fp3_vector, a, 0, f_vector_fabd)
++static gen_helper_gvec_3_ptr * const f_vector_ah_fabd[3] = {
++    gen_helper_gvec_ah_fabd_h,
++    gen_helper_gvec_ah_fabd_s,
++    gen_helper_gvec_ah_fabd_d,
++};
++TRANS(FABD_v, do_fp3_vector_2fn, a, 0, f_vector_fabd, f_vector_ah_fabd)
+ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
+     gen_helper_gvec_recps_h,
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ static float64 float64_abd(float64 op1, float64 op2, float_status *stat)
+     return float64_abs(float64_sub(op1, op2, stat));
+ }
++/* ABD when FPCR.AH = 1: avoid flipping sign bit of a NaN result */
++static float16 float16_ah_abd(float16 op1, float16 op2, float_status *stat)
++{
++    float16 r = float16_sub(op1, op2, stat);
++    return float16_is_any_nan(r) ? r : float16_abs(r);
++}
++
++static float32 float32_ah_abd(float32 op1, float32 op2, float_status *stat)
++{
++    float32 r = float32_sub(op1, op2, stat);
++    return float32_is_any_nan(r) ? r : float32_abs(r);
++}
++
++static float64 float64_ah_abd(float64 op1, float64 op2, float_status *stat)
++{
++    float64 r = float64_sub(op1, op2, stat);
++    return float64_is_any_nan(r) ? r : float64_abs(r);
++}
++
+ /*
+  * Reciprocal step. These are the AArch32 version which uses a
+  * non-fused multiply-and-subtract.
+@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fabd_h, float16_abd, float16)
+ DO_3OP(gvec_fabd_s, float32_abd, float32)
+ DO_3OP(gvec_fabd_d, float64_abd, float64)
++DO_3OP(gvec_ah_fabd_h, float16_ah_abd, float16)
++DO_3OP(gvec_ah_fabd_s, float32_ah_abd, float32)
++DO_3OP(gvec_ah_fabd_d, float64_ah_abd, float64)
++
+ DO_3OP(gvec_fceq_h, float16_ceq, float16)
+ DO_3OP(gvec_fceq_s, float32_ceq, float32)
+ DO_3OP(gvec_fceq_d, float64_ceq, float64)
+--
+.34.1
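
As a rough illustration of the float*_ah_abd helpers in the patch above, the following sketch computes an absolute difference on host floats and leaves a NaN result untouched. It assumes the host uses IEEE 754 binary32 for float; the helper names are hypothetical, and the real helpers use QEMU's softfloat routines rather than host arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers to move between a host float and its bit pattern. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static float f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* |a - b|, except that a NaN difference is returned with its sign untouched. */
static float ah_abd_f32(float a, float b)
{
    uint32_t r = f32_bits(a - b);

    if ((r & 0x7fffffffu) <= 0x7f800000u) {
        r &= 0x7fffffffu;            /* not a NaN: clear the sign bit */
    }
    return f32_from_bits(r);
}

/* Self-check: ordinary values, and a NaN difference compared bit for bit. */
static void ah_abd_self_check(void)
{
    /* volatile keeps both subtractions at run time so they agree exactly */
    volatile float neg_nan = f32_from_bits(0xffc00000u);
    volatile float one = 1.0f;
    float diff = neg_nan - one;
    float r = ah_abd_f32(neg_nan, one);

    assert(ah_abd_f32(1.0f, 3.5f) == 2.5f);
    assert(ah_abd_f32(3.5f, 1.0f) == 2.5f);
    assert((f32_bits(r) & 0x7fffffffu) > 0x7f800000u);  /* still a NaN */
    assert(f32_bits(r) == f32_bits(diff));              /* sign not altered */
}
```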

-New patch
+[PULL 33/68] target/arm: Handle FPCR.AH in SVE FNEG
+Make SVE FNEG honour the FPCR.AH "don't negate the sign of a NaN"
+semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 4 ++++
+ target/arm/tcg/sve_helper.c    | 8 ++++++++
+ target/arm/tcg/translate-sve.c | 7 ++++++-
+files changed, 18 insertions(+), 1 deletion(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++
+ DEF_HELPER_FLAGS_4(sve_not_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_not_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_not_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
+ DO_ZPZ(sve_fneg_s, uint32_t, H1_4, DO_FNEG)
+ DO_ZPZ_D(sve_fneg_d, uint64_t, DO_FNEG)
++#define DO_AH_FNEG_H(N) (float16_is_any_nan(N) ? (N) : DO_FNEG(N))
++#define DO_AH_FNEG_S(N) (float32_is_any_nan(N) ? (N) : DO_FNEG(N))
++#define DO_AH_FNEG_D(N) (float64_is_any_nan(N) ? (N) : DO_FNEG(N))
++
++DO_ZPZ(sve_ah_fneg_h, uint16_t, H1_2, DO_AH_FNEG_H)
++DO_ZPZ(sve_ah_fneg_s, uint32_t, H1_4, DO_AH_FNEG_S)
++DO_ZPZ_D(sve_ah_fneg_d, uint64_t, DO_AH_FNEG_D)
++
+ #define DO_NOT(N)    (~N)
+ DO_ZPZ(sve_not_zpz_b, uint8_t, H1, DO_NOT)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const fneg_fns[4] = {
+     NULL,                  gen_helper_sve_fneg_h,
+     gen_helper_sve_fneg_s, gen_helper_sve_fneg_d,
+ };
+-TRANS_FEAT(FNEG, aa64_sve, gen_gvec_ool_arg_zpz, fneg_fns[a->esz], a, 0)
++static gen_helper_gvec_3 * const fneg_ah_fns[4] = {
++    NULL,                  gen_helper_sve_ah_fneg_h,
++    gen_helper_sve_ah_fneg_s, gen_helper_sve_ah_fneg_d,
++};
++TRANS_FEAT(FNEG, aa64_sve, gen_gvec_ool_arg_zpz,
++           s->fpcr_ah ? fneg_ah_fns[a->esz] : fneg_fns[a->esz], a, 0)
+ static gen_helper_gvec_3 * const sxtb_fns[4] = {
+     NULL,                  gen_helper_sve_sxtb_h,
+--
+.34.1
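
The SVE helpers in the patch above apply the AH-aware negate per element under a governing predicate. The sketch below shows that idea for 16-bit elements. It is deliberately simplified: the predicate is one bool per element rather than QEMU's packed predicate register, inactive elements are assumed to keep their old destination value, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* AH-aware negate on a raw binary16 bit pattern: NaNs pass through. */
static uint16_t f16_ah_neg(uint16_t n)
{
    bool is_nan = (n & 0x7fffu) > 0x7c00u;

    return is_nan ? n : (uint16_t)(n ^ 0x8000u);
}

/*
 * Apply the negate to each active element.
 * Sketch assumption: pg[i] is true when element i is active.
 */
static void ah_fneg_h_predicated(uint16_t *d, const uint16_t *n,
                                 const bool *pg, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (pg[i]) {
            d[i] = f16_ah_neg(n[i]);
        }
        /* Inactive elements are left as they were (sketch assumption). */
    }
}

/* Self-check: one active NaN, one inactive element, two ordinary values. */
static void ah_fneg_self_check(void)
{
    const uint16_t n[4] = { 0x3c00u, 0xfe00u, 0x7c00u, 0x4000u };
    const bool pg[4] = { true, true, false, true };
    uint16_t d[4] = { 1, 2, 3, 4 };

    ah_fneg_h_predicated(d, n, pg, 4);
    assert(d[0] == 0xbc00u);   /* 1.0 -> -1.0 */
    assert(d[1] == 0xfe00u);   /* NaN unchanged */
    assert(d[2] == 3);         /* inactive: destination kept */
    assert(d[3] == 0xc000u);   /* 2.0 -> -2.0 */
}
```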

-New patch
+[PULL 34/68] target/arm: Handle FPCR.AH in SVE FABS
+Make SVE FABS honour the FPCR.AH "don't negate the sign of a NaN"
+semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 4 ++++
+ target/arm/tcg/sve_helper.c    | 8 ++++++++
+ target/arm/tcg/translate-sve.c | 7 ++++++-
+files changed, 18 insertions(+), 1 deletion(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++
+ DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZ(sve_fabs_h, uint16_t, H1_2, DO_FABS)
+ DO_ZPZ(sve_fabs_s, uint32_t, H1_4, DO_FABS)
+ DO_ZPZ_D(sve_fabs_d, uint64_t, DO_FABS)
++#define DO_AH_FABS_H(N) (float16_is_any_nan(N) ? (N) : DO_FABS(N))
++#define DO_AH_FABS_S(N) (float32_is_any_nan(N) ? (N) : DO_FABS(N))
++#define DO_AH_FABS_D(N) (float64_is_any_nan(N) ? (N) : DO_FABS(N))
++
++DO_ZPZ(sve_ah_fabs_h, uint16_t, H1_2, DO_AH_FABS_H)
++DO_ZPZ(sve_ah_fabs_s, uint32_t, H1_4, DO_AH_FABS_S)
++DO_ZPZ_D(sve_ah_fabs_d, uint64_t, DO_AH_FABS_D)
++
+ #define DO_FNEG(N)    (N ^ ~((__typeof(N))-1 >> 1))
+ DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const fabs_fns[4] = {
+     NULL,                  gen_helper_sve_fabs_h,
+     gen_helper_sve_fabs_s, gen_helper_sve_fabs_d,
+ };
+-TRANS_FEAT(FABS, aa64_sve, gen_gvec_ool_arg_zpz, fabs_fns[a->esz], a, 0)
++static gen_helper_gvec_3 * const fabs_ah_fns[4] = {
++    NULL,                  gen_helper_sve_ah_fabs_h,
++    gen_helper_sve_ah_fabs_s, gen_helper_sve_ah_fabs_d,
++};
++TRANS_FEAT(FABS, aa64_sve, gen_gvec_ool_arg_zpz,
++           s->fpcr_ah ? fabs_ah_fns[a->esz] : fabs_fns[a->esz], a, 0)
+ static gen_helper_gvec_3 * const fneg_fns[4] = {
+     NULL,                  gen_helper_sve_fneg_h,
+--
+.34.1
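
The translate-time change in the patch above amounts to choosing between two parallel function tables indexed by element size, with no entry for byte elements. A minimal sketch of that dispatch pattern follows. The function and table names are hypothetical stand-ins for the generated helper pointers, and each element is assumed to be held zero-extended in a uint64_t purely to give all entries one signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*abs_fn)(uint64_t);

/* Plain absolute value: clear the sign bit for each element size. */
static uint64_t abs_h(uint64_t n) { return n & 0x7fffull; }
static uint64_t abs_s(uint64_t n) { return n & 0x7fffffffull; }
static uint64_t abs_d(uint64_t n) { return n & 0x7fffffffffffffffull; }

/* AH variants: a NaN (magnitude above infinity) is returned unchanged. */
static uint64_t ah_abs_h(uint64_t n)
{
    return abs_h(n) > 0x7c00ull ? n : abs_h(n);
}

static uint64_t ah_abs_s(uint64_t n)
{
    return abs_s(n) > 0x7f800000ull ? n : abs_s(n);
}

static uint64_t ah_abs_d(uint64_t n)
{
    return abs_d(n) > 0x7ff0000000000000ull ? n : abs_d(n);
}

/* Index 0 (byte elements) has no floating-point form, hence NULL. */
static const abs_fn abs_fns[4]    = { NULL, abs_h, abs_s, abs_d };
static const abs_fn ah_abs_fns[4] = { NULL, ah_abs_h, ah_abs_s, ah_abs_d };

/* Pick the table once, based on the AH flag known at translate time. */
static abs_fn pick_abs_fn(bool fpcr_ah, unsigned esz)
{
    assert(esz < 4);
    return fpcr_ah ? ah_abs_fns[esz] : abs_fns[esz];
}
```

Selecting the helper up front keeps the FPCR.AH == 0 path identical to what it was before, so only code translated with the flag set pays for the extra NaN check.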

-New patch
+[PULL 35/68] target/arm: Handle FPCR.AH in SVE FABD
+Make the SVE FABD insn honour the FPCR.AH "don't negate the sign
+of a NaN" semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    |  7 +++++++
+ target/arm/tcg/sve_helper.c    | 22 ++++++++++++++++++++++
+ target/arm/tcg/translate-sve.c |  2 +-
+files changed, 30 insertions(+), 1 deletion(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_fabd_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(sve_fabd_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fabd_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fabd_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fabd_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_6(sve_fscalbn_h, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_6(sve_fscalbn_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ static inline float64 abd_d(float64 a, float64 b, float_status *s)
+     return float64_abs(float64_sub(a, b, s));
+ }
++/* ABD when FPCR.AH = 1: avoid flipping sign bit of a NaN result */
++static float16 ah_abd_h(float16 op1, float16 op2, float_status *stat)
++{
++    float16 r = float16_sub(op1, op2, stat);
++    return float16_is_any_nan(r) ? r : float16_abs(r);
++}
++
++static float32 ah_abd_s(float32 op1, float32 op2, float_status *stat)
++{
++    float32 r = float32_sub(op1, op2, stat);
++    return float32_is_any_nan(r) ? r : float32_abs(r);
++}
++
++static float64 ah_abd_d(float64 op1, float64 op2, float_status *stat)
++{
++    float64 r = float64_sub(op1, op2, stat);
++    return float64_is_any_nan(r) ? r : float64_abs(r);
++}
++
+ DO_ZPZZ_FP(sve_fabd_h, uint16_t, H1_2, abd_h)
+ DO_ZPZZ_FP(sve_fabd_s, uint32_t, H1_4, abd_s)
+ DO_ZPZZ_FP(sve_fabd_d, uint64_t, H1_8, abd_d)
++DO_ZPZZ_FP(sve_ah_fabd_h, uint16_t, H1_2, ah_abd_h)
++DO_ZPZZ_FP(sve_ah_fabd_s, uint32_t, H1_4, ah_abd_s)
++DO_ZPZZ_FP(sve_ah_fabd_d, uint64_t, H1_8, ah_abd_d)
+ static inline float64 scalbn_d(float64 a, int64_t b, float_status *s)
+ {
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
+ DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
+ DO_ZPZZ_FP(FMINNM_zpzz, aa64_sve, sve_fminnum)
+ DO_ZPZZ_FP(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
+-DO_ZPZZ_FP(FABD, aa64_sve, sve_fabd)
++DO_ZPZZ_AH_FP(FABD, aa64_sve, sve_fabd, sve_ah_fabd)
+ DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
+ DO_ZPZZ_FP(FDIV, aa64_sve, sve_fdiv)
+ DO_ZPZZ_FP(FMULX, aa64_sve, sve_fmulx)
+--
+.34.1

-New patch
+[PULL 36/68] target/arm: Handle FPCR.AH in negation steps in SVE FCADD
+The negation steps in FCADD must honour FPCR.AH's "don't change the
+sign of a NaN" semantics.  Implement this in the same way we did for
+the base ASIMD FCADD, by encoding FPCR.AH into the SIMD data field
+passed to the helper and using that to decide whether to negate the
+values.
+The construction of neg_imag and neg_real was done to make it easy
+to apply both in parallel with two simple logical operations.  This
+changed with FPCR.AH, which is more complex than that. Switch to
+an approach that follows the pseudocode more closely, by extracting
+the 'rot=1' parameter from the SIMD data field and changing the
+sign of the appropriate input value.
+Note that there was a naming issue with neg_imag and neg_real.
+They were named backward, with neg_imag being non-zero for rot=1,
+and vice versa.  This was combined with reversed usage within the
+loop, so that the negation in the end turned out correct.
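
The approach described above can be sketched for a single complex pair as follows. This is an illustration only: it uses host floats instead of softfloat, the names fcadd_pair and maybe_ah_chs are hypothetical, and the data layout (bit 0 = rot, bit 1 = FPCR.AH) mirrors the two extract32() calls in the patch without the SIMD_DATA_SHIFT offset.

```c
#include <assert.h>
#include <stdbool.h>

/* Negate x, unless FPCR.AH is set and x is a NaN. */
static float maybe_ah_chs(float x, bool fpcr_ah)
{
    bool is_nan = (x != x);

    return (fpcr_ah && is_nan) ? x : -x;
}

/*
 * Complex add with rotation for one (real, imag) pair.
 * Index 0 is the real part and index 1 the imaginary part.
 * Sketch assumption: data bit 0 is rot, data bit 1 is FPCR.AH.
 */
static void fcadd_pair(float d[2], const float n[2], const float m[2],
                       unsigned data)
{
    bool rot = data & 1;
    bool fpcr_ah = (data >> 1) & 1;
    float m_real = m[0];
    float m_imag = m[1];

    /* Exactly one of the two m inputs has its sign changed. */
    if (rot) {
        m_real = maybe_ah_chs(m_real, fpcr_ah);
    } else {
        m_imag = maybe_ah_chs(m_imag, fpcr_ah);
    }

    /* Read both inputs before writing, so d may alias n. */
    float real = n[0] + m_imag;
    float imag = n[1] + m_real;

    d[0] = real;
    d[1] = imag;
}

/* Self-check with ordinary values for both rotations. */
static void fcadd_self_check(void)
{
    const float n[2] = { 1.0f, 2.0f };
    const float m[2] = { 3.0f, 4.0f };
    float d[2];

    fcadd_pair(d, n, m, 0);          /* rot = 0 */
    assert(d[0] == -3.0f && d[1] == 5.0f);

    fcadd_pair(d, n, m, 1);          /* rot = 1 */
    assert(d[0] == 5.0f && d[1] == -1.0f);
}
```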
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/vec_internal.h  | 17 ++++++++++++++
+ target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++----------
+ target/arm/tcg/translate-sve.c |  2 +-
+files changed, 48 insertions(+), 13 deletions(-)
+diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_internal.h
++++ b/target/arm/tcg/vec_internal.h
+@@ -XXX,XX +XXX,XX @@
+ #ifndef TARGET_ARM_VEC_INTERNAL_H
+ #define TARGET_ARM_VEC_INTERNAL_H
++#include "fpu/softfloat.h"
++
+ /*
+  * Note that vector data is stored in host-endian 64-bit chunks,
+  * so addressing units smaller than that needs a host-endian fixup.
+@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
+  */
+ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
++static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
++{
++    return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
++}
++
++static inline float32 float32_maybe_ah_chs(float32 a, bool fpcr_ah)
++{
++    return fpcr_ah && float32_is_any_nan(a) ? a : float32_chs(a);
++}
++
++static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah)
++{
++    return fpcr_ah && float64_is_any_nan(a) ? a : float64_chs(a);
++}
++
+ #endif /* TARGET_ARM_VEC_INTERNAL_H */
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
+ {
+     intptr_t j, i = simd_oprsz(desc);
+     uint64_t *g = vg;
+-    float16 neg_imag = float16_set_sign(0, simd_data(desc));
+-    float16 neg_real = float16_chs(neg_imag);
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     do {
+         uint64_t pg = g[(i - 1) >> 6];
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
+             i -= 2 * sizeof(float16);
+             e0 = *(float16 *)(vn + H1_2(i));
+-            e1 = *(float16 *)(vm + H1_2(j)) ^ neg_real;
++            e1 = *(float16 *)(vm + H1_2(j));
+             e2 = *(float16 *)(vn + H1_2(j));
+-            e3 = *(float16 *)(vm + H1_2(i)) ^ neg_imag;
++            e3 = *(float16 *)(vm + H1_2(i));
++
++            if (rot) {
++                e3 = float16_maybe_ah_chs(e3, fpcr_ah);
++            } else {
++                e1 = float16_maybe_ah_chs(e1, fpcr_ah);
++            }
+             if (likely((pg >> (i & 63)) & 1)) {
+                 *(float16 *)(vd + H1_2(i)) = float16_add(e0, e1, s);
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
+ {
+     intptr_t j, i = simd_oprsz(desc);
+     uint64_t *g = vg;
+-    float32 neg_imag = float32_set_sign(0, simd_data(desc));
+-    float32 neg_real = float32_chs(neg_imag);
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     do {
+         uint64_t pg = g[(i - 1) >> 6];
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
+             i -= 2 * sizeof(float32);
+             e0 = *(float32 *)(vn + H1_2(i));
+-            e1 = *(float32 *)(vm + H1_2(j)) ^ neg_real;
++            e1 = *(float32 *)(vm + H1_2(j));
+             e2 = *(float32 *)(vn + H1_2(j));
+-            e3 = *(float32 *)(vm + H1_2(i)) ^ neg_imag;
++            e3 = *(float32 *)(vm + H1_2(i));
++
++            if (rot) {
++                e3 = float32_maybe_ah_chs(e3, fpcr_ah);
++            } else {
++                e1 = float32_maybe_ah_chs(e1, fpcr_ah);
++            }
+             if (likely((pg >> (i & 63)) & 1)) {
+                 *(float32 *)(vd + H1_2(i)) = float32_add(e0, e1, s);
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
+ {
+     intptr_t j, i = simd_oprsz(desc);
+     uint64_t *g = vg;
+-    float64 neg_imag = float64_set_sign(0, simd_data(desc));
+-    float64 neg_real = float64_chs(neg_imag);
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     do {
+         uint64_t pg = g[(i - 1) >> 6];
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
+             i -= 2 * sizeof(float64);
+             e0 = *(float64 *)(vn + H1_2(i));
+-            e1 = *(float64 *)(vm + H1_2(j)) ^ neg_real;
++            e1 = *(float64 *)(vm + H1_2(j));
+             e2 = *(float64 *)(vn + H1_2(j));
+-            e3 = *(float64 *)(vm + H1_2(i)) ^ neg_imag;
++            e3 = *(float64 *)(vm + H1_2(i));
++
++            if (rot) {
++                e3 = float64_maybe_ah_chs(e3, fpcr_ah);
++            } else {
++                e1 = float64_maybe_ah_chs(e1, fpcr_ah);
++            }
+             if (likely((pg >> (i & 63)) & 1)) {
+                 *(float64 *)(vd + H1_2(i)) = float64_add(e0, e1, s);
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4_ptr * const fcadd_fns[] = {
+     gen_helper_sve_fcadd_s, gen_helper_sve_fcadd_d,
+ };
+ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
+-           a->rd, a->rn, a->rm, a->pg, a->rot,
++           a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
+            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+ #define DO_FMLA(NAME, name) \
+--
+.34.1

-New patch
+[PULL 37/68] target/arm: Handle FPCR.AH in negation steps in FCADD
+The negation steps in FCADD must honour FPCR.AH's "don't change the
+sign of a NaN" semantics.  Implement this by encoding FPCR.AH into
+the SIMD data field passed to the helper and using that to decide
+whether to negate the values.
+The construction of neg_imag and neg_real was done to make it easy
+to apply both in parallel with two simple logical operations.  This
+changed with FPCR.AH, which is more complex than that. Switch to
+an approach closer to the pseudocode, where we extract the rot
+parameter from the SIMD data word and negate the appropriate
+input value.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 10 +++++--
+ target/arm/tcg/vec_helper.c    | 54 +++++++++++++++++++---------------
+ 2 files changed, 38 insertions(+), 26 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
+     gen_helper_gvec_fcadds,
+     gen_helper_gvec_fcaddd,
+ };
+-TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
+-TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
++/*
++ * Encode FPCR.AH into the data so the helper knows whether the
++ * negations it does should avoid flipping the sign bit on a NaN
++ */
++TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0 | (s->fpcr_ah << 1),
++           f_vector_fcadd)
++TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1 | (s->fpcr_ah << 1),
++           f_vector_fcadd)
+ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
+ {
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
+     float16 *d = vd;
+     float16 *n = vn;
+     float16 *m = vm;
+-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = neg_real ^ 1;
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 15;
+-    neg_imag <<= 15;
+-
+     for (i = 0; i < opr_sz / 2; i += 2) {
+         float16 e0 = n[H2(i)];
+-        float16 e1 = m[H2(i + 1)] ^ neg_imag;
++        float16 e1 = m[H2(i + 1)];
+         float16 e2 = n[H2(i + 1)];
+-        float16 e3 = m[H2(i)] ^ neg_real;
++        float16 e3 = m[H2(i)];
++
++        if (rot) {
++            e3 = float16_maybe_ah_chs(e3, fpcr_ah);
++        } else {
++            e1 = float16_maybe_ah_chs(e1, fpcr_ah);
++        }
+         d[H2(i)] = float16_add(e0, e1, fpst);
+         d[H2(i + 1)] = float16_add(e2, e3, fpst);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
+     float32 *d = vd;
+     float32 *n = vn;
+     float32 *m = vm;
+-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = neg_real ^ 1;
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 31;
+-    neg_imag <<= 31;
+-
+     for (i = 0; i < opr_sz / 4; i += 2) {
+         float32 e0 = n[H4(i)];
+-        float32 e1 = m[H4(i + 1)] ^ neg_imag;
++        float32 e1 = m[H4(i + 1)];
+         float32 e2 = n[H4(i + 1)];
+-        float32 e3 = m[H4(i)] ^ neg_real;
++        float32 e3 = m[H4(i)];
++
++        if (rot) {
++            e3 = float32_maybe_ah_chs(e3, fpcr_ah);
++        } else {
++            e1 = float32_maybe_ah_chs(e1, fpcr_ah);
++        }
+         d[H4(i)] = float32_add(e0, e1, fpst);
+         d[H4(i + 1)] = float32_add(e2, e3, fpst);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
+     float64 *d = vd;
+     float64 *n = vn;
+     float64 *m = vm;
+-    uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
+-    uint64_t neg_imag = neg_real ^ 1;
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 63;
+-    neg_imag <<= 63;
+-
+     for (i = 0; i < opr_sz / 8; i += 2) {
+         float64 e0 = n[i];
+-        float64 e1 = m[i + 1] ^ neg_imag;
++        float64 e1 = m[i + 1];
+         float64 e2 = n[i + 1];
+-        float64 e3 = m[i] ^ neg_real;
++        float64 e3 = m[i];
++
++        if (rot) {
++            e3 = float64_maybe_ah_chs(e3, fpcr_ah);
++        } else {
++            e1 = float64_maybe_ah_chs(e1, fpcr_ah);
++        }
+         d[i] = float64_add(e0, e1, fpst);
+         d[i + 1] = float64_add(e2, e3, fpst);
+--
+.34.1
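
The FCADD change above combines two ideas: a sign flip that leaves NaNs alone when FPCR.AH is set, and two flag bits (rot and AH) packed into the helper's data word. A minimal standalone sketch of both follows. It assumes raw IEEE single-precision bit patterns; f32_is_any_nan, f32_chs, f32_maybe_ah_chs, pack_fcadd_data, data_rot and data_ah are hypothetical stand-ins rather than the softfloat API, and the real helpers read the flags from the descriptor at SIMD_DATA_SHIFT, not from bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: true for any quiet or signalling NaN pattern. */
static bool f32_is_any_nan(uint32_t a)
{
    return (a & 0x7fffffffu) > 0x7f800000u;
}

/* Hypothetical stand-in: unconditional sign flip, as an XOR would do. */
static uint32_t f32_chs(uint32_t a)
{
    return a ^ 0x80000000u;
}

/* Same shape as float32_maybe_ah_chs(): with AH set, a NaN keeps its sign. */
static uint32_t f32_maybe_ah_chs(uint32_t a, bool fpcr_ah)
{
    return fpcr_ah && f32_is_any_nan(a) ? a : f32_chs(a);
}

/* Pack rot into bit 0 and FPCR.AH into bit 1 of the data word. */
static uint32_t pack_fcadd_data(bool rot, bool fpcr_ah)
{
    return (uint32_t)rot | ((uint32_t)fpcr_ah << 1);
}

/* Unpack the two flags again on the helper side. */
static bool data_rot(uint32_t data)
{
    return data & 1;
}

static bool data_ah(uint32_t data)
{
    return (data >> 1) & 1;
}
```

In this sketch a quiet NaN (0x7fc00000) becomes 0xffc00000 when fpcr_ah is false and is returned unchanged when it is true, while 1.0 (0x3f800000) is negated to 0xbf800000 in both cases.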

-New patch
+[PULL 38/68] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
+Handle the FPCR.AH semantics that we do not change the sign of an
+input NaN in the FRECPS and FRSQRTS scalar insns, by providing
+new helper functions that do the CHS part of the operation
+differently.
+Since the extra helper functions would be very repetitive if written
+out longhand, we generate both them and the existing non-AH helpers
+from macros.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-a64.h    |   6 ++
+ target/arm/tcg/vec_internal.h  |  18 ++++++
+ target/arm/tcg/helper-a64.c    | 115 ++++++++++++---------------------
+ target/arm/tcg/translate-a64.c |  25 +++++--
+ 4 files changed, 83 insertions(+), 81 deletions(-)
+diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-a64.h
++++ b/target/arm/tcg/helper-a64.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(neon_cgt_f64, TCG_CALL_NO_RWG, i64, i64, i64, fpst)
+ DEF_HELPER_FLAGS_3(recpsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+ DEF_HELPER_FLAGS_3(recpsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+ DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
++DEF_HELPER_FLAGS_3(recpsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
++DEF_HELPER_FLAGS_3(recpsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
++DEF_HELPER_FLAGS_3(recpsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+ DEF_HELPER_FLAGS_3(rsqrtsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+ DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+ DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
++DEF_HELPER_FLAGS_3(rsqrtsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
++DEF_HELPER_FLAGS_3(rsqrtsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
++DEF_HELPER_FLAGS_3(rsqrtsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+ DEF_HELPER_FLAGS_2(frecpx_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
+ DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+ DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
+diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_internal.h
++++ b/target/arm/tcg/vec_internal.h
+@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
+  */
+ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
++/*
++ * Negate as for FPCR.AH=1 -- do not negate NaNs.
++ */
++static inline float16 float16_ah_chs(float16 a)
++{
++    return float16_is_any_nan(a) ? a : float16_chs(a);
++}
++
++static inline float32 float32_ah_chs(float32 a)
++{
++    return float32_is_any_nan(a) ? a : float32_chs(a);
++}
++
++static inline float64 float64_ah_chs(float64 a)
++{
++    return float64_is_any_nan(a) ? a : float64_chs(a);
++}
++
+ static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
+ {
+     return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
+diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-a64.c
++++ b/target/arm/tcg/helper-a64.c
+@@ -XXX,XX +XXX,XX @@
+ #ifdef CONFIG_USER_ONLY
+ #include "user/page-protection.h"
+ #endif
++#include "vec_internal.h"
+ /* C2.4.7 Multiply and divide */
+ /* special cases for 0 and LLONG_MIN are mandated by the standard */
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, float_status *fpst)
+     return -float64_lt(b, a, fpst);
+ }
+-/* Reciprocal step and sqrt step. Note that unlike the A32/T32
++/*
++ * Reciprocal step and sqrt step. Note that unlike the A32/T32
+  * versions, these do a fully fused multiply-add or
+  * multiply-add-and-halve.
++ * The FPCR.AH == 1 versions need to avoid flipping the sign of NaN.
+  */
+-
+-uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
+-{
+-    a = float16_squash_input_denormal(a, fpst);
+-    b = float16_squash_input_denormal(b, fpst);
+-
+-    a = float16_chs(a);
+-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
+-        (float16_is_infinity(b) && float16_is_zero(a))) {
+-        return float16_two;
++#define DO_RECPS(NAME, CTYPE, FLOATTYPE, CHSFN)                         \
++    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
++    {                                                                   \
++        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
++        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
++        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
++        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
++            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
++            return FLOATTYPE ## _two;                                   \
++        }                                                               \
++        return FLOATTYPE ## _muladd(a, b, FLOATTYPE ## _two, 0, fpst);  \
+     }
+-    return float16_muladd(a, b, float16_two, 0, fpst);
+-}
+-float32 HELPER(recpsf_f32)(float32 a, float32 b, float_status *fpst)
+-{
+-    a = float32_squash_input_denormal(a, fpst);
+-    b = float32_squash_input_denormal(b, fpst);
++DO_RECPS(recpsf_f16, uint32_t, float16, chs)
++DO_RECPS(recpsf_f32, float32, float32, chs)
++DO_RECPS(recpsf_f64, float64, float64, chs)
++DO_RECPS(recpsf_ah_f16, uint32_t, float16, ah_chs)
++DO_RECPS(recpsf_ah_f32, float32, float32, ah_chs)
++DO_RECPS(recpsf_ah_f64, float64, float64, ah_chs)
+-    a = float32_chs(a);
+-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
+-        (float32_is_infinity(b) && float32_is_zero(a))) {
+-        return float32_two;
+-    }
+-    return float32_muladd(a, b, float32_two, 0, fpst);
+-}
++#define DO_RSQRTSF(NAME, CTYPE, FLOATTYPE, CHSFN)                       \
++    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
++    {                                                                   \
++        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
++        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
++        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
++        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
++            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
++            return FLOATTYPE ## _one_point_five;                        \
++        }                                                               \
++        return FLOATTYPE ## _muladd_scalbn(a, b, FLOATTYPE ## _three,   \
++                                           -1, 0, fpst);                \
++    }                                                                   \
+-float64 HELPER(recpsf_f64)(float64 a, float64 b, float_status *fpst)
+-{
+-    a = float64_squash_input_denormal(a, fpst);
+-    b = float64_squash_input_denormal(b, fpst);
+-
+-    a = float64_chs(a);
+-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
+-        (float64_is_infinity(b) && float64_is_zero(a))) {
+-        return float64_two;
+-    }
+-    return float64_muladd(a, b, float64_two, 0, fpst);
+-}
+-
+-uint32_t HELPER(rsqrtsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
+-{
+-    a = float16_squash_input_denormal(a, fpst);
+-    b = float16_squash_input_denormal(b, fpst);
+-
+-    a = float16_chs(a);
+-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
+-        (float16_is_infinity(b) && float16_is_zero(a))) {
+-        return float16_one_point_five;
+-    }
+-    return float16_muladd_scalbn(a, b, float16_three, -1, 0, fpst);
+-}
+-
+-float32 HELPER(rsqrtsf_f32)(float32 a, float32 b, float_status *fpst)
+-{
+-    a = float32_squash_input_denormal(a, fpst);
+-    b = float32_squash_input_denormal(b, fpst);
+-
+-    a = float32_chs(a);
+-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
+-        (float32_is_infinity(b) && float32_is_zero(a))) {
+-        return float32_one_point_five;
+-    }
+-    return float32_muladd_scalbn(a, b, float32_three, -1, 0, fpst);
+-}
+-
+-float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, float_status *fpst)
+-{
+-    a = float64_squash_input_denormal(a, fpst);
+-    b = float64_squash_input_denormal(b, fpst);
+-
+-    a = float64_chs(a);
+-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
+-        (float64_is_infinity(b) && float64_is_zero(a))) {
+-        return float64_one_point_five;
+-    }
+-    return float64_muladd_scalbn(a, b, float64_three, -1, 0, fpst);
+-}
++DO_RSQRTSF(rsqrtsf_f16, uint32_t, float16, chs)
++DO_RSQRTSF(rsqrtsf_f32, float32, float32, chs)
++DO_RSQRTSF(rsqrtsf_f64, float64, float64, chs)
++DO_RSQRTSF(rsqrtsf_ah_f16, uint32_t, float16, ah_chs)
++DO_RSQRTSF(rsqrtsf_ah_f32, float32, float32, ah_chs)
++DO_RSQRTSF(rsqrtsf_ah_f64, float64, float64, ah_chs)
+ /* Floating-point reciprocal exponent - see FPRecpX in ARM ARM */
+ uint32_t HELPER(frecpx_f16)(uint32_t a, float_status *fpst)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                                        FPST_A64_F16 : FPST_A64);
+ }
+-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+-                             int mergereg)
++static bool do_fp3_scalar_ah_2fn(DisasContext *s, arg_rrr_e *a,
++                                 const FPScalar *fnormal, const FPScalar *fah,
++                                 int mergereg)
+ {
+-    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
+-                                       select_ah_fpst(s, a->esz));
++    return do_fp3_scalar_with_fpsttype(s, a, s->fpcr_ah ? fah : fnormal,
++                                       mergereg, select_ah_fpst(s, a->esz));
+ }
+ /* Some insns need to call different helpers when FPCR.AH == 1 */
+@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
+     gen_helper_recpsf_f32,
+     gen_helper_recpsf_f64,
+ };
+-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
++static const FPScalar f_scalar_ah_frecps = {
++    gen_helper_recpsf_ah_f16,
++    gen_helper_recpsf_ah_f32,
++    gen_helper_recpsf_ah_f64,
++};
++TRANS(FRECPS_s, do_fp3_scalar_ah_2fn, a,
++      &f_scalar_frecps, &f_scalar_ah_frecps, a->rn)
+ static const FPScalar f_scalar_frsqrts = {
+     gen_helper_rsqrtsf_f16,
+     gen_helper_rsqrtsf_f32,
+     gen_helper_rsqrtsf_f64,
+ };
+-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
++static const FPScalar f_scalar_ah_frsqrts = {
++    gen_helper_rsqrtsf_ah_f16,
++    gen_helper_rsqrtsf_ah_f32,
++    gen_helper_rsqrtsf_ah_f64,
++};
++TRANS(FRSQRTS_s, do_fp3_scalar_ah_2fn, a,
++      &f_scalar_frsqrts, &f_scalar_ah_frsqrts, a->rn)
+ static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
+                        const FPScalar *f, bool swap)
+--
+.34.1
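
The macro approach in the FRECPS/FRSQRTS scalar patch above can be illustrated with a small standalone sketch: one macro body, instantiated twice with a different sign-change function pasted in. This assumes host float arithmetic instead of softfloat, omits the denormal squashing, and uses an unfused multiply-add; DO_RECPS_SKETCH, f32_chs, f32_ah_chs, recps_sketch and recps_ah_sketch are hypothetical names, not QEMU's.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Flip the sign bit unconditionally, NaN or not. */
static float f32_chs(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof(bits));
    bits ^= 0x80000000u;
    memcpy(&a, &bits, sizeof(a));
    return a;
}

/* FPCR.AH == 1 flavour: leave a NaN's sign alone. */
static float f32_ah_chs(float a)
{
    return isnan(a) ? a : f32_chs(a);
}

/* One body, two instantiations: CHSFN selects the negation behaviour. */
#define DO_RECPS_SKETCH(NAME, CHSFN)                                    \
    static float NAME(float a, float b)                                 \
    {                                                                   \
        a = f32_ ## CHSFN(a);                                           \
        if ((isinf(a) && b == 0.0f) || (isinf(b) && a == 0.0f)) {       \
            return 2.0f;                                                \
        }                                                               \
        return a * b + 2.0f; /* unfused, unlike the real helper */      \
    }

DO_RECPS_SKETCH(recps_sketch, chs)
DO_RECPS_SKETCH(recps_ah_sketch, ah_chs)
```

Both instantiations compute 2 - a*b for ordinary inputs and return 2 for the infinity-times-zero special case; they differ only in whether a NaN operand has its sign bit flipped before the multiply.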

-[PULL 24/35] target/arm/ptw: Set attributes correctly for MMU disabled data accesses
+[PULL 39/68] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
-When the MMU is disabled, data accesses should be Device nGnRnE,
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics
-Outer Shareable, Untagged.  We handle the other cases from
+in the vector versions of FRECPS and FRSQRTS, by implementing
-AArch64.S1DisabledOutput() correctly but missed this one.
+new vector wrappers that call the _ah_ scalar helpers.
-Device nGnRnE is memattr == 0, so the only part we were missing
-was that shareability should be set to 2 for both insn fetches
-and data accesses.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-13-peter.maydell@linaro.org
 ---
- target/arm/ptw.c | 12 +++++++-----
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
- 1 file changed, 7 insertions(+), 5 deletions(-)
+ target/arm/tcg/translate-a64.c | 21 ++++++++++++++++-----
  target/arm/tcg/translate-sve.c |  7 ++++++-
  target/arm/tcg/vec_helper.c    |  8 ++++++++
  4 files changed, 44 insertions(+), 6 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/helper-sve.h
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/helper-sve.h
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
-                 }
+ DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
-             }
+                    void, ptr, ptr, ptr, fpst, i32)
-         }
--        if (memattr == 0 && access_type == MMU_INST_FETCH) {
++DEF_HELPER_FLAGS_5(gvec_ah_recps_h, TCG_CALL_NO_RWG,
--            if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
++                   void, ptr, ptr, ptr, fpst, i32)
--                memattr = 0xee;  /* Normal, WT, RA, NT */
++DEF_HELPER_FLAGS_5(gvec_ah_recps_s, TCG_CALL_NO_RWG,
--            } else {
++                   void, ptr, ptr, ptr, fpst, i32)
--                memattr = 0x44;  /* Normal, NC, No */
++DEF_HELPER_FLAGS_5(gvec_ah_recps_d, TCG_CALL_NO_RWG,
-+        if (memattr == 0) {
++                   void, ptr, ptr, ptr, fpst, i32)
-+            if (access_type == MMU_INST_FETCH) {
++
-+                if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
++DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_h, TCG_CALL_NO_RWG,
-+                    memattr = 0xee;  /* Normal, WT, RA, NT */
++                   void, ptr, ptr, ptr, fpst, i32)
-+                } else {
++DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_s, TCG_CALL_NO_RWG,
-+                    memattr = 0x44;  /* Normal, NC, No */
++                   void, ptr, ptr, ptr, fpst, i32)
-+                }
++DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_d, TCG_CALL_NO_RWG,
-             }
++                   void, ptr, ptr, ptr, fpst, i32)
-             shareability = 2; /* outer shareable */
++
-         }
+ DEF_HELPER_FLAGS_5(gvec_ah_fmax_h, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_5(gvec_ah_fmax_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_2fn(DisasContext *s, arg_qrrr_e *a, int data,
      return do_fp3_vector(s, a, data, s->fpcr_ah ? fah : fnormal);
  }
 -static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
 -                             gen_helper_gvec_3_ptr * const f[3])
 +static bool do_fp3_vector_ah_2fn(DisasContext *s, arg_qrrr_e *a, int data,
 +                                 gen_helper_gvec_3_ptr * const fnormal[3],
 +                                 gen_helper_gvec_3_ptr * const fah[3])
  {
 -    return do_fp3_vector_with_fpsttype(s, a, data, f,
 +    return do_fp3_vector_with_fpsttype(s, a, data, s->fpcr_ah ? fah : fnormal,
                                         select_ah_fpst(s, a->esz));
  }
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
      gen_helper_gvec_recps_s,
      gen_helper_gvec_recps_d,
  };
 -TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
 +static gen_helper_gvec_3_ptr * const f_vector_ah_frecps[3] = {
 +    gen_helper_gvec_ah_recps_h,
 +    gen_helper_gvec_ah_recps_s,
 +    gen_helper_gvec_ah_recps_d,
 +};
 +TRANS(FRECPS_v, do_fp3_vector_ah_2fn, a, 0, f_vector_frecps, f_vector_ah_frecps)
  static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
      gen_helper_gvec_rsqrts_h,
      gen_helper_gvec_rsqrts_s,
      gen_helper_gvec_rsqrts_d,
  };
 -TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
 +static gen_helper_gvec_3_ptr * const f_vector_ah_frsqrts[3] = {
 +    gen_helper_gvec_ah_rsqrts_h,
 +    gen_helper_gvec_ah_rsqrts_s,
 +    gen_helper_gvec_ah_rsqrts_d,
 +};
 +TRANS(FRSQRTS_v, do_fp3_vector_ah_2fn, a, 0, f_vector_frsqrts, f_vector_ah_frsqrts)
  static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
      gen_helper_gvec_faddp_h,
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
          NULL, gen_helper_gvec_##name##_h,                           \
          gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
      };                                                              \
 -    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
 +    static gen_helper_gvec_3_ptr * const name##_ah_fns[4] = {       \
 +        NULL, gen_helper_gvec_ah_##name##_h,                        \
 +        gen_helper_gvec_ah_##name##_s, gen_helper_gvec_ah_##name##_d    \
 +    };                                                              \
 +    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz,            \
 +               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], a, 0)
  DO_FP3(FADD_zzz, fadd)
  DO_FP3(FSUB_zzz, fsub)
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
  DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
  DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
 +DO_3OP(gvec_ah_recps_h, helper_recpsf_ah_f16, float16)
 +DO_3OP(gvec_ah_recps_s, helper_recpsf_ah_f32, float32)
 +DO_3OP(gvec_ah_recps_d, helper_recpsf_ah_f64, float64)
 +
 +DO_3OP(gvec_ah_rsqrts_h, helper_rsqrtsf_ah_f16, float16)
 +DO_3OP(gvec_ah_rsqrts_s, helper_rsqrtsf_ah_f32, float32)
 +DO_3OP(gvec_ah_rsqrts_d, helper_rsqrtsf_ah_f64, float64)
 +
  DO_3OP(gvec_ah_fmax_h, helper_vfp_ah_maxh, float16)
  DO_3OP(gvec_ah_fmax_s, helper_vfp_ah_maxs, float32)
  DO_3OP(gvec_ah_fmax_d, helper_vfp_ah_maxd, float64)
 --
 .34.1
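
The vector FRECPS/FRSQRTS patch above follows a simple pattern: a macro wraps a scalar helper in an element loop, and the translator picks the normal or the AH wrapper depending on FPCR.AH. A rough standalone sketch of that pattern follows. It assumes host floats and a plain element count in place of the gvec descriptor; step_normal, step_ah, DO_3OP_SKETCH, vec_recps_sketch, vec_ah_recps_sketch and pick_recps are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar step: always negates the first operand. */
static float step_normal(float a, float b)
{
    return -a * b + 2.0f;
}

/* Hypothetical AH flavour: a NaN first operand is passed through as is. */
static float step_ah(float a, float b)
{
    return (a != a ? a : -a) * b + 2.0f;
}

typedef void vec_fn(float *d, const float *n, const float *m, size_t count);

/* Wrap a scalar function in a per-element loop, in the spirit of DO_3OP. */
#define DO_3OP_SKETCH(NAME, FUNC)                                       \
    static void NAME(float *d, const float *n, const float *m,          \
                     size_t count)                                      \
    {                                                                   \
        for (size_t i = 0; i < count; i++) {                            \
            d[i] = FUNC(n[i], m[i]);                                    \
        }                                                               \
    }

DO_3OP_SKETCH(vec_recps_sketch, step_normal)
DO_3OP_SKETCH(vec_ah_recps_sketch, step_ah)

/* Choose the wrapper once, up front, as the translator does per insn. */
static vec_fn *pick_recps(bool fpcr_ah)
{
    return fpcr_ah ? vec_ah_recps_sketch : vec_recps_sketch;
}
```

Selecting the function before the loop runs keeps the per-element loop free of any AH test, which is the same trade-off the patch makes by adding separate gvec_ah_* entry points.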

-New patch
+[PULL 40/68] target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics in FMLS
+(indexed). We do this by creating 6 new helpers, which allow us to
+do the negation either by XOR (for AH=0) or by muladd flags
+(for AH=1).
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+[PMM: Mostly from RTH's patch; error in index order into fns[][]
+ fixed]
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/helper.h            | 14 ++++++++++++++
+ target/arm/tcg/translate-a64.c | 17 +++++++++++------
+ target/arm/tcg/translate-sve.c | 31 +++++++++++++++++--------------
+ target/arm/tcg/vec_helper.c    | 24 +++++++++++++++---------
+ 4 files changed, 57 insertions(+), 29 deletions(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_fmls_idx_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_fmls_idx_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_fmls_idx_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_5(gvec_uqadd_h, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ TRANS(FMULX_vi, do_fp3_vector_idx, a, f_vector_idx_fmulx)
+ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
+ {
+-    static gen_helper_gvec_4_ptr * const fns[3] = {
+-        gen_helper_gvec_fmla_idx_h,
+-        gen_helper_gvec_fmla_idx_s,
+-        gen_helper_gvec_fmla_idx_d,
++    static gen_helper_gvec_4_ptr * const fns[3][3] = {
++        { gen_helper_gvec_fmla_idx_h,
++          gen_helper_gvec_fmla_idx_s,
++          gen_helper_gvec_fmla_idx_d },
++        { gen_helper_gvec_fmls_idx_h,
++          gen_helper_gvec_fmls_idx_s,
++          gen_helper_gvec_fmls_idx_d },
++        { gen_helper_gvec_ah_fmls_idx_h,
++          gen_helper_gvec_ah_fmls_idx_s,
++          gen_helper_gvec_ah_fmls_idx_d },
+     };
+     MemOp esz = a->esz;
+     int check = fp_access_check_vector_hsd(s, a->q, esz);
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
+     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+                       esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+-                      (a->idx << 1) | neg,
+-                      fns[esz - 1]);
++                      a->idx, fns[neg ? 1 + s->fpcr_ah : 0][esz - 1]);
+     return true;
+ }
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ DO_SVE2_RRXR_ROT(CDOT_zzxw_d, gen_helper_sve2_cdot_idx_d)
+  *** SVE Floating Point Multiply-Add Indexed Group
+  */
+-static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub)
+-{
+-    static gen_helper_gvec_4_ptr * const fns[4] = {
+-        NULL,
+-        gen_helper_gvec_fmla_idx_h,
+-        gen_helper_gvec_fmla_idx_s,
+-        gen_helper_gvec_fmla_idx_d,
+-    };
+-    return gen_gvec_fpst_zzzz(s, fns[a->esz], a->rd, a->rn, a->rm, a->ra,
+-                              (a->index << 1) | sub,
+-                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+-}
++static gen_helper_gvec_4_ptr * const fmla_idx_fns[4] = {
++    NULL,                       gen_helper_gvec_fmla_idx_h,
++    gen_helper_gvec_fmla_idx_s, gen_helper_gvec_fmla_idx_d
++};
++TRANS_FEAT(FMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
++           fmla_idx_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->index,
++           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+-TRANS_FEAT(FMLA_zzxz, aa64_sve, do_FMLA_zzxz, a, false)
+-TRANS_FEAT(FMLS_zzxz, aa64_sve, do_FMLA_zzxz, a, true)
++static gen_helper_gvec_4_ptr * const fmls_idx_fns[4][2] = {
++    { NULL, NULL },
++    { gen_helper_gvec_fmls_idx_h, gen_helper_gvec_ah_fmls_idx_h },
++    { gen_helper_gvec_fmls_idx_s, gen_helper_gvec_ah_fmls_idx_s },
++    { gen_helper_gvec_fmls_idx_d, gen_helper_gvec_ah_fmls_idx_d },
++};
++TRANS_FEAT(FMLS_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
++           fmls_idx_fns[a->esz][s->fpcr_ah],
++           a->rd, a->rn, a->rm, a->ra, a->index,
++           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+ /*
+  *** SVE Floating Point Multiply Indexed Group
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_FMUL_IDX(gvec_fmls_nf_idx_s, float32_sub, float32_mul, float32, H4)
+ #undef DO_FMUL_IDX
+-#define DO_FMLA_IDX(NAME, TYPE, H)                                         \
++#define DO_FMLA_IDX(NAME, TYPE, H, NEGX, NEGF)                             \
+ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
+                   float_status *stat, uint32_t desc)                       \
+ {                                                                          \
+     intptr_t i, j, oprsz = simd_oprsz(desc);                               \
+     intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
+-    TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
+-    intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
++    intptr_t idx = simd_data(desc);                                        \
+     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
+-    op1_neg <<= (8 * sizeof(TYPE) - 1);                                    \
+     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
+         TYPE mm = m[H(i + idx)];                                           \
+         for (j = 0; j < segment; j++) {                                    \
+-            d[i + j] = TYPE##_muladd(n[i + j] ^ op1_neg,                   \
+-                                     mm, a[i + j], 0, stat);               \
++            d[i + j] = TYPE##_muladd(n[i + j] ^ NEGX, mm,                  \
++                                     a[i + j], NEGF, stat);                \
+         }                                                                  \
+     }                                                                      \
+     clear_tail(d, oprsz, simd_maxsz(desc));                                \
+ }
+-DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2)
+-DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
+-DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8)
++DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2, 0, 0)
++DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4, 0, 0)
++DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8, 0, 0)
++
++DO_FMLA_IDX(gvec_fmls_idx_h, float16, H2, INT16_MIN, 0)
++DO_FMLA_IDX(gvec_fmls_idx_s, float32, H4, INT32_MIN, 0)
++DO_FMLA_IDX(gvec_fmls_idx_d, float64, H8, INT64_MIN, 0)
++
++DO_FMLA_IDX(gvec_ah_fmls_idx_h, float16, H2, 0, float_muladd_negate_product)
++DO_FMLA_IDX(gvec_ah_fmls_idx_s, float32, H4, 0, float_muladd_negate_product)
++DO_FMLA_IDX(gvec_ah_fmls_idx_d, float64, H8, 0, float_muladd_negate_product)
+ #undef DO_FMLA_IDX
+--
+.34.1

-[PULL 02/35] qtest: factor out qtest_install_gpio_out_intercept
+[PULL 41/68] target/arm: Handle FPCR.AH in negation in FMLS (vector)
-From: Chris Laplante <chris@laplante.io>
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics
 in FMLS (vector), by implementing a new set of helpers for
 the AH=1 case.
-Signed-off-by: Chris Laplante <chris@laplante.io>
+The float_muladd_negate_product flag produces the same result
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+as negating either of the multiplication operands, assuming
-Message-id: 20230728160324.1159090-3-chris@laplante.io
+neither of the operands is a NaN.  But since FEAT_AFP does not
 negate NaNs, this behaviour is exactly what we need.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- softmmu/qtest.c | 16 ++++++++++------
+ target/arm/helper.h            |  4 ++++
-file changed, 10 insertions(+), 6 deletions(-)
+ target/arm/tcg/translate-a64.c |  7 ++++++-
  target/arm/tcg/vec_helper.c    | 22 ++++++++++++++++++++++
 files changed, 32 insertions(+), 1 deletion(-)
-diff --git a/softmmu/qtest.c b/softmmu/qtest.c
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/softmmu/qtest.c
+--- a/target/arm/helper.h
-+++ b/softmmu/qtest.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ void qtest_set_command_cb(bool (*pc_cb)(CharBackend *chr, gchar **words))
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
-     process_command_cb = pc_cb;
+ DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_5(gvec_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 +
  DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmls[3] = {
      gen_helper_gvec_vfms_s,
      gen_helper_gvec_vfms_d,
  };
 -TRANS(FMLS_v, do_fp3_vector, a, 0, f_vector_fmls)
 +static gen_helper_gvec_3_ptr * const f_vector_fmls_ah[3] = {
 +    gen_helper_gvec_ah_vfms_h,
 +    gen_helper_gvec_ah_vfms_s,
 +    gen_helper_gvec_ah_vfms_d,
 +};
 +TRANS(FMLS_v, do_fp3_vector_2fn, a, 0, f_vector_fmls, f_vector_fmls_ah)
  static gen_helper_gvec_3_ptr * const f_vector_fcmeq[3] = {
      gen_helper_gvec_fceq_h,
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static float64 float64_mulsub_f(float64 dest, float64 op1, float64 op2,
      return float64_muladd(float64_chs(op1), op2, dest, 0, stat);
  }
-+static void qtest_install_gpio_out_intercept(DeviceState *dev, const char *name, int n)
++static float16 float16_ah_mulsub_f(float16 dest, float16 op1, float16 op2,
 +                                 float_status *stat)
 +{
-+    qemu_irq *disconnected = g_new0(qemu_irq, 1);
++    return float16_muladd(op1, op2, dest, float_muladd_negate_product, stat);
 +    qemu_irq icpt = qemu_allocate_irq(qtest_irq_handler,
 +                                      disconnected, n);
 +
 +    *disconnected = qdev_intercept_gpio_out(dev, icpt, name, n);
 +}
 +
- static void qtest_process_command(CharBackend *chr, gchar **words)
++static float32 float32_ah_mulsub_f(float32 dest, float32 op1, float32 op2,
- {
++                                 float_status *stat)
-     const gchar *command;
++{
-@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
++    return float32_muladd(op1, op2, dest, float_muladd_negate_product, stat);
-             if (words[0][14] == 'o') {
++}
-                 int i;
++
-                 for (i = 0; i < ngl->num_out; ++i) {
++static float64 float64_ah_mulsub_f(float64 dest, float64 op1, float64 op2,
--                    qemu_irq *disconnected = g_new0(qemu_irq, 1);
++                                 float_status *stat)
--                    qemu_irq icpt = qemu_allocate_irq(qtest_irq_handler,
++{
--                                                      disconnected, i);
++    return float64_muladd(op1, op2, dest, float_muladd_negate_product, stat);
--
++}
--                    *disconnected = qdev_intercept_gpio_out(dev, icpt,
++
--                                                            ngl->name, i);
+ #define DO_MULADD(NAME, FUNC, TYPE)                                        \
-+                    qtest_install_gpio_out_intercept(dev, ngl->name, i);
+ void HELPER(NAME)(void *vd, void *vn, void *vm,                            \
-                 }
+                   float_status *stat, uint32_t desc)                       \
-             } else {
+@@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_vfms_h, float16_mulsub_f, float16)
-                 qemu_irq_intercept_in(ngl->in, qtest_irq_handler,
+ DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
  DO_MULADD(gvec_vfms_d, float64_mulsub_f, float64)
 +DO_MULADD(gvec_ah_vfms_h, float16_ah_mulsub_f, float16)
 +DO_MULADD(gvec_ah_vfms_s, float32_ah_mulsub_f, float32)
 +DO_MULADD(gvec_ah_vfms_d, float64_ah_mulsub_f, float64)
 +
  /* For the indexed ops, SVE applies the index per 128-bit vector segment.
   * For AdvSIMD, there is of course only one such vector segment.
   */
 --
 .34.1

-[PULL 13/35] target/arm/ptw: Don't set fi->s1ptw for UnsuppAtomicUpdate fault
+[PULL 42/68] target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
-For an Unsupported Atomic Update fault where the stage 1 translation
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics for the
-table descriptor update can't be done because it's to an unsupported
+SVE FMLS (vector) insns, by providing new helpers for the AH=1 case
-memory type, this is a stage 1 abort (per the Arm ARM R_VSXXT).  This
+which end up passing fpcr_ah = true to the do_fmla_zpzzz_* functions
-means we should not set fi->s1ptw, because this will cause the code
+that do the work.
-in the get_phys_addr_lpae() error-exit path to mark it as stage 2.
 The float*_muladd functions have a flags argument that can
 perform optional negation of various operands.  We don't use
 that for "normal" arm fmla, because the muladd flags are not
 applied when an input is a NaN.  But since FEAT_AFP does not
 negate NaNs, this behaviour is exactly what we need.
 The non-AH helpers pass in a zero flags argument and control the
 negation via the neg1 and neg3 arguments; the AH helpers always pass
 in neg1 and neg3 as zero and control the negation via the flags
 argument.  This allows us to avoid conditional branches within the
 inner loop.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-2-peter.maydell@linaro.org
 ---
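Note: the two negation strategies described in the commit message can be
sketched in plain C.  This is only an illustration under stated
assumptions, not the QEMU code: demo_muladd() is a hypothetical, unfused
stand-in for the softfloat float32_muladd(), the DEMO_NEGATE_* flags are
hypothetical stand-ins for float_muladd_negate_product and
float_muladd_negate_c, and the choice of which NaN input to return is
simplified.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag names, standing in for the softfloat muladd flags. */
enum { DEMO_NEGATE_PRODUCT = 1, DEMO_NEGATE_C = 2 };

/* Reinterpret a float as its IEEE binary32 bit pattern and back. */
uint32_t demo_f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

float demo_f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/*
 * Toy multiply-add: NaN inputs are returned before any flag is applied,
 * so the flags can never change the sign of a NaN.  Unlike a real fused
 * multiply-add this rounds twice; it only models the ordering.
 */
float demo_muladd(float a, float b, float c, int flags)
{
    if (a != a) {
        return a;
    }
    if (b != b) {
        return b;
    }
    if (c != c) {
        return c;
    }
    float prod = a * b;
    if (flags & DEMO_NEGATE_PRODUCT) {
        prod = -prod;
    }
    if (flags & DEMO_NEGATE_C) {
        c = -c;
    }
    return prod + c;
}

/*
 * FMLS-style operation: acc - n * m.
 * AH=0: flip the sign bit of n up front (this also flips a NaN's sign).
 * AH=1: leave n alone and ask the muladd to negate the product instead.
 */
float demo_fmls(float n, float m, float acc, bool fpcr_ah)
{
    if (fpcr_ah) {
        return demo_muladd(n, m, acc, DEMO_NEGATE_PRODUCT);
    }
    float neg_n = demo_f32_from_bits(demo_f32_bits(n) ^ 0x80000000u);
    return demo_muladd(neg_n, m, acc, 0);
}
```

With a quiet NaN (bit pattern 0x7fc00000) as the first operand, the AH=0
path hands back the NaN with its sign bit flipped, while the AH=1 path
hands it back unchanged.  For ordinary numbers both paths agree.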
- target/arm/ptw.c | 1 -
+ target/arm/tcg/helper-sve.h    | 21 ++++++++
-file changed, 1 deletion(-)
+ target/arm/tcg/sve_helper.c    | 99 +++++++++++++++++++++++++++-------
  target/arm/tcg/translate-sve.c | 18 ++++---
 files changed, 114 insertions(+), 24 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/helper-sve.h
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/helper-sve.h
-@@ -XXX,XX +XXX,XX @@ static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
-     if (unlikely(!host)) {
+                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-         fi->type = ARMFault_UnsuppAtomicUpdate;
--        fi->s1ptw = true;
++DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_h, TCG_CALL_NO_RWG,
-         return 0;
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-     }
++DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +
 +DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +
 +DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +
  DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/sve_helper.c
 +++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
  static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                              float_status *status, uint32_t desc,
 -                            uint16_t neg1, uint16_t neg3)
 +                            uint16_t neg1, uint16_t neg3, int flags)
  {
      intptr_t i = simd_oprsz(desc);
      uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                  e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
                  e2 = *(uint16_t *)(vm + H1_2(i));
                  e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
 -                r = float16_muladd(e1, e2, e3, 0, status);
 +                r = float16_muladd(e1, e2, e3, flags, status);
                  *(uint16_t *)(vd + H1_2(i)) = r;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
  void HELPER(sve_fmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0);
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
  }
  void HELPER(sve_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0);
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0, 0);
  }
  void HELPER(sve_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000);
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000, 0);
  }
  void HELPER(sve_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000);
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000, 0);
 +}
 +
 +void HELPER(sve_ah_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
 +                              void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product);
 +}
 +
 +void HELPER(sve_ah_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product | float_muladd_negate_c);
 +}
 +
 +void HELPER(sve_ah_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_c);
  }
  static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                              float_status *status, uint32_t desc,
 -                            uint32_t neg1, uint32_t neg3)
 +                            uint32_t neg1, uint32_t neg3, int flags)
  {
      intptr_t i = simd_oprsz(desc);
      uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                  e1 = *(uint32_t *)(vn + H1_4(i)) ^ neg1;
                  e2 = *(uint32_t *)(vm + H1_4(i));
                  e3 = *(uint32_t *)(va + H1_4(i)) ^ neg3;
 -                r = float32_muladd(e1, e2, e3, 0, status);
 +                r = float32_muladd(e1, e2, e3, flags, status);
                  *(uint32_t *)(vd + H1_4(i)) = r;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
  void HELPER(sve_fmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
  }
  void HELPER(sve_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0, 0);
  }
  void HELPER(sve_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000, 0);
  }
  void HELPER(sve_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000, 0);
 +}
 +
 +void HELPER(sve_ah_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
 +                              void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product);
 +}
 +
 +void HELPER(sve_ah_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product | float_muladd_negate_c);
 +}
 +
 +void HELPER(sve_ah_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_c);
  }
  static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                              float_status *status, uint32_t desc,
 -                            uint64_t neg1, uint64_t neg3)
 +                            uint64_t neg1, uint64_t neg3, int flags)
  {
      intptr_t i = simd_oprsz(desc);
      uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                  e1 = *(uint64_t *)(vn + i) ^ neg1;
                  e2 = *(uint64_t *)(vm + i);
                  e3 = *(uint64_t *)(va + i) ^ neg3;
 -                r = float64_muladd(e1, e2, e3, 0, status);
 +                r = float64_muladd(e1, e2, e3, flags, status);
                  *(uint64_t *)(vd + i) = r;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
  void HELPER(sve_fmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
  }
  void HELPER(sve_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0, 0);
  }
  void HELPER(sve_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN, 0);
  }
  void HELPER(sve_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN, 0);
 +}
 +
 +void HELPER(sve_ah_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
 +                              void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product);
 +}
 +
 +void HELPER(sve_ah_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product | float_muladd_negate_c);
 +}
 +
 +void HELPER(sve_ah_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_c);
  }
  /* Two operand floating-point comparison controlled by a predicate.
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
             a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
             a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 -#define DO_FMLA(NAME, name) \
 +#define DO_FMLA(NAME, name, ah_name)                                    \
      static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
          NULL, gen_helper_sve_##name##_h,                                \
          gen_helper_sve_##name##_s, gen_helper_sve_##name##_d            \
      };                                                                  \
 -    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp, name##_fns[a->esz], \
 +    static gen_helper_gvec_5_ptr * const name##_ah_fns[4] = {           \
 +        NULL, gen_helper_sve_##ah_name##_h,                             \
 +        gen_helper_sve_##ah_name##_s, gen_helper_sve_##ah_name##_d      \
 +    };                                                                  \
 +    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp,                     \
 +               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], \
                 a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
                 a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 -DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
 -DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
 -DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz)
 -DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz)
 +/* We don't need an ah_fmla_zpzzz because fmla doesn't negate anything */
 +DO_FMLA(FMLA_zpzzz, fmla_zpzzz, fmla_zpzzz)
 +DO_FMLA(FMLS_zpzzz, fmls_zpzzz, ah_fmls_zpzzz)
 +DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz, ah_fnmla_zpzzz)
 +DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz, ah_fnmls_zpzzz)
  #undef DO_FMLA
 --
 .34.1

-[PULL 23/35] target/arm/ptw: Drop S1Translate::out_secure
+[PULL 43/68] target/arm: Handle FPCR.AH in SVE FTSSEL
-We only use S1Translate::out_secure in two places, where we are
+The negation step in the SVE FTSSEL insn mustn't negate a NaN when
-setting up MemTxAttrs for a page table load. We can use
+FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
-arm_space_is_secure(ptw->out_space) instead, which guarantees
+and use that to determine whether to do the negation.
 that we're setting the MemTxAttrs secure and space fields
 consistently, and allows us to drop the out_secure field in
 S1Translate entirely.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-12-peter.maydell@linaro.org
 ---
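Note: the per-element behaviour this patch gives FTSSEL can be sketched
on raw single-precision bit patterns as below.  The body of
float32_maybe_ah_chs() is not shown in this section, so
demo_f32_maybe_ah_chs() is an assumption about what such a helper does
(flip the sign bit unless AH is set and the value is a NaN); all demo_*
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_F32_ONE 0x3f800000u   /* bit pattern of 1.0f */

/* True if the binary32 bit pattern encodes a NaN (quiet or signalling). */
bool demo_f32_is_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* Flip the sign bit, except that with AH set a NaN is left untouched. */
uint32_t demo_f32_maybe_ah_chs(uint32_t bits, bool fpcr_ah)
{
    if (fpcr_ah && demo_f32_is_nan(bits)) {
        return bits;
    }
    return bits ^ 0x80000000u;
}

/*
 * One FTSSEL element: bit 0 of mm replaces the input with 1.0,
 * bit 1 of mm requests negation of the (possibly replaced) value.
 */
uint32_t demo_ftssel_elem(uint32_t nn, uint32_t mm, bool fpcr_ah)
{
    if (mm & 1) {
        nn = DEMO_F32_ONE;
    }
    if (mm & 2) {
        nn = demo_f32_maybe_ah_chs(nn, fpcr_ah);
    }
    return nn;
}
```

The old one-line form, nn ^ (mm & 2) << 30, corresponds to calling this
sketch with fpcr_ah false: the sign bit is flipped regardless of whether
nn is a NaN.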
- target/arm/ptw.c | 7 ++-----
+ target/arm/tcg/sve_helper.c    | 18 +++++++++++++++---
-file changed, 2 insertions(+), 5 deletions(-)
+ target/arm/tcg/translate-sve.c |  4 ++--
 files changed, 17 insertions(+), 5 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
-      * Stage 2 is indicated by in_mmu_idx set to ARMMMUIdx_Stage2{,_S}.
+ void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
-      */
+ {
-     bool in_s1_is_el0;
+     intptr_t i, opr_sz = simd_oprsz(desc) / 2;
--    bool out_secure;
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
-     bool out_rw;
+     uint16_t *d = vd, *n = vn, *m = vm;
-     bool out_be;
+     for (i = 0; i < opr_sz; i += 1) {
-     ARMSecuritySpace out_space;
+         uint16_t nn = n[i];
-@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
-         pte_attrs = s2.cacheattrs.attrs;
+         if (mm & 1) {
-         ptw->out_host = NULL;
+             nn = float16_one;
-         ptw->out_rw = false;
+         }
--        ptw->out_secure = s2.f.attrs.secure;
+-        d[i] = nn ^ (mm & 2) << 14;
-         ptw->out_space = s2.f.attrs.space;
++        if (mm & 2) {
-     } else {
++            nn = float16_maybe_ah_chs(nn, fpcr_ah);
- #ifdef CONFIG_TCG
++        }
-@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
++        d[i] = nn;
-         ptw->out_phys = full->phys_addr | (addr & ~TARGET_PAGE_MASK);
+     }
-         ptw->out_rw = full->prot & PAGE_WRITE;
+ }
-         pte_attrs = full->pte_attrs;
--        ptw->out_secure = full->attrs.secure;
+ void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
-         ptw->out_space = full->attrs.space;
+ {
- #else
+     intptr_t i, opr_sz = simd_oprsz(desc) / 4;
-         g_assert_not_reached();
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
-@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw,
+     uint32_t *d = vd, *n = vn, *m = vm;
-     } else {
+     for (i = 0; i < opr_sz; i += 1) {
-         /* Page tables are in MMIO. */
+         uint32_t nn = n[i];
-         MemTxAttrs attrs = {
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
--            .secure = ptw->out_secure,
+         if (mm & 1) {
-             .space = ptw->out_space,
+             nn = float32_one;
-+            .secure = arm_space_is_secure(ptw->out_space),
+         }
-         };
+-        d[i] = nn ^ (mm & 2) << 30;
-         AddressSpace *as = arm_addressspace(cs, attrs);
++        if (mm & 2) {
-         MemTxResult result = MEMTX_OK;
++            nn = float32_maybe_ah_chs(nn, fpcr_ah);
-@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
++        }
-     } else {
++        d[i] = nn;
-         /* Page tables are in MMIO. */
+     }
-         MemTxAttrs attrs = {
+ }
--            .secure = ptw->out_secure,
-             .space = ptw->out_space,
+ void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
-+            .secure = arm_space_is_secure(ptw->out_space),
+ {
-         };
+     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
-         AddressSpace *as = arm_addressspace(cs, attrs);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
-         MemTxResult result = MEMTX_OK;
+     uint64_t *d = vd, *n = vn, *m = vm;
      for (i = 0; i < opr_sz; i += 1) {
          uint64_t nn = n[i];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
          if (mm & 1) {
              nn = float64_one;
          }
 -        d[i] = nn ^ (mm & 2) << 62;
 +        if (mm & 2) {
 +            nn = float64_maybe_ah_chs(nn, fpcr_ah);
 +        }
 +        d[i] = nn;
      }
  }
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2 * const fexpa_fns[4] = {
      gen_helper_sve_fexpa_s, gen_helper_sve_fexpa_d,
  };
  TRANS_FEAT_NONSTREAMING(FEXPA, aa64_sve, gen_gvec_ool_zz,
 -                        fexpa_fns[a->esz], a->rd, a->rn, 0)
 +                        fexpa_fns[a->esz], a->rd, a->rn, s->fpcr_ah)
  static gen_helper_gvec_3 * const ftssel_fns[4] = {
      NULL,                    gen_helper_sve_ftssel_h,
      gen_helper_sve_ftssel_s, gen_helper_sve_ftssel_d,
  };
  TRANS_FEAT_NONSTREAMING(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz,
 -                        ftssel_fns[a->esz], a, 0)
 +                        ftssel_fns[a->esz], a, s->fpcr_ah)
  /*
   *** SVE Predicate Logical Operations Group
 --
 .34.1

-[PULL 15/35] target/arm/ptw: Set s1ns bit in fault info more consistently
+[PULL 44/68] target/arm: Handle FPCR.AH in SVE FTMAD
-The s1ns bit in ARMMMUFaultInfo is documented as "true if
+The negation step in the SVE FTMAD insn mustn't negate a NaN when
-we faulted on a non-secure IPA while in secure state". Both the
+FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field,
-places which look at this bit only do so after having confirmed
+so we can select the correct behaviour.
 that this is a stage 2 fault and we're dealing with Secure EL2,
 which leaves the ptw.c code free to set the bit to any random
 value in the other cases.
-Instead of taking advantage of that freedom, consistently
+Because the operand is known to be negative, negating the operand
-make the bit be set to false for the "not a stage 2 fault
+is the same as taking the absolute value.  Defer this to the muladd
-for Secure EL2" cases. This removes some cases where we
+operation via flags, so that it happens after NaN detection, which
-were using an 'is_secure' boolean and leaving the reader
+is correct for FPCR.AH.
 guessing about whether that was the right thing for Realm
 and Root cases.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-4-peter.maydell@linaro.org
 ---
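Note: on the helper side this patch reads the coefficient index from the
low three bits of the SIMD data field and FPCR.AH from bit 3.  A minimal
sketch of that layout follows.  The translator-side change is not shown
in this section, so demo_pack_ftmad_data() is an assumed counterpart, and
demo_extract32() is a local stand-in for a bitfield-extract helper.  The
sketch works on the data field alone and ignores SIMD_DATA_SHIFT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return 'length' bits of 'value', starting at bit 'start'. */
uint32_t demo_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0u >> (32 - length));
}

/* Assumed packing: coefficient index in bits [2:0], AH flag in bit 3. */
uint32_t demo_pack_ftmad_data(uint32_t imm, bool fpcr_ah)
{
    return (imm & 7u) | ((uint32_t)fpcr_ah << 3);
}

/* Helper-side unpacking, mirroring the two extract calls in the patch. */
void demo_unpack_ftmad_data(uint32_t data, uint32_t *x, bool *fpcr_ah)
{
    *x = demo_extract32(data, 0, 3);
    *fpcr_ah = demo_extract32(data, 3, 1) != 0;
}
```

Keeping the index to a fixed three-bit field is what lets the extra flag
share the same data word without the two values overlapping.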
- target/arm/ptw.c | 19 +++++++++++++++----
+ target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++++--------
-file changed, 15 insertions(+), 4 deletions(-)
+ target/arm/tcg/translate-sve.c |  3 ++-
 files changed, 35 insertions(+), 10 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ static ARMSecuritySpace S2_security_space(ARMSecuritySpace s1_space,
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftmad_h)(void *vd, void *vn, void *vm,
          0x3c00, 0xb800, 0x293a, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
      };
      intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float16);
 -    intptr_t x = simd_data(desc);
 +    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
 +    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
      float16 *d = vd, *n = vn, *m = vm;
 +
      for (i = 0; i < opr_sz; i++) {
          float16 mm = m[i];
          intptr_t xx = x;
 +        int flags = 0;
 +
          if (float16_is_neg(mm)) {
 -            mm = float16_abs(mm);
 +            if (fpcr_ah) {
 +                flags = float_muladd_negate_product;
 +            } else {
 +                mm = float16_abs(mm);
 +            }
              xx += 8;
          }
 -        d[i] = float16_muladd(n[i], mm, coeff[xx], 0, s);
 +        d[i] = float16_muladd(n[i], mm, coeff[xx], flags, s);
      }
  }
-+static bool fault_s1ns(ARMSecuritySpace space, ARMMMUIdx s2_mmu_idx)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftmad_s)(void *vd, void *vn, void *vm,
-+{
+        0x37cd37cc, 0x00000000, 0x00000000, 0x00000000,
-+    /*
+     };
-+     * For stage 2 faults in Secure EL2, S1NS indicates
+     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float32);
-+     * whether the faulting IPA is in the Secure or NonSecure
+-    intptr_t x = simd_data(desc);
-+     * IPA space. For all other kinds of fault, it is false.
++    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
-+     */
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
-+    return space == ARMSS_Secure && regime_is_stage2(s2_mmu_idx)
+     float32 *d = vd, *n = vn, *m = vm;
 +        && s2_mmu_idx == ARMMMUIdx_Stage2_S;
 +}
 +
- /* Translate a S1 pagetable walk through S2 if needed.  */
+     for (i = 0; i < opr_sz; i++) {
- static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
+         float32 mm = m[i];
-                              hwaddr addr, ARMMMUFaultInfo *fi)
+         intptr_t xx = x;
-@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
++        int flags = 0;
-             fi->s2addr = addr;
++
-             fi->stage2 = true;
+         if (float32_is_neg(mm)) {
-             fi->s1ptw = true;
+-            mm = float32_abs(mm);
--            fi->s1ns = !is_secure;
++            if (fpcr_ah) {
-+            fi->s1ns = fault_s1ns(ptw->in_space, s2_mmu_idx);
++                flags = float_muladd_negate_product;
-             return false;
++            } else {
 +                mm = float32_abs(mm);
 +            }
              xx += 8;
          }
+-        d[i] = float32_muladd(n[i], mm, coeff[xx], 0, s);
++        d[i] = float32_muladd(n[i], mm, coeff[xx], flags, s);
      }
-@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
-     fi->s2addr = addr;
-     fi->stage2 = regime_is_stage2(s2_mmu_idx);
-     fi->s1ptw = fi->stage2;
--    fi->s1ns = !is_secure;
-+    fi->s1ns = fault_s1ns(ptw->in_space, s2_mmu_idx);
-     return false;
  }
-@@ -XXX,XX +XXX,XX @@ static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftmad_d)(void *vd, void *vn, void *vm,
-             fi->s2addr = ptw->out_virt;
+        0x3e21ee96d2641b13ull, 0xbda8f76380fbb401ull,
-             fi->stage2 = true;
+     };
-             fi->s1ptw = true;
+     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float64);
--            fi->s1ns = !ptw->in_secure;
+-    intptr_t x = simd_data(desc);
-+            fi->s1ns = fault_s1ns(ptw->in_space, ptw->in_ptw_idx);
++    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
-             return 0;
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
      float64 *d = vd, *n = vn, *m = vm;
 +
      for (i = 0; i < opr_sz; i++) {
          float64 mm = m[i];
          intptr_t xx = x;
 +        int flags = 0;
 +
          if (float64_is_neg(mm)) {
 -            mm = float64_abs(mm);
 +            if (fpcr_ah) {
 +                flags = float_muladd_negate_product;
 +            } else {
 +                mm = float64_abs(mm);
 +            }
              xx += 8;
          }
+-        d[i] = float64_muladd(n[i], mm, coeff[xx], 0, s);
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
++        d[i] = float64_muladd(n[i], mm, coeff[xx], flags, s);
-     fi->level = level;
+     }
      /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
      fi->stage2 = fi->s1ptw || regime_is_stage2(mmu_idx);
 -    fi->s1ns = mmu_idx == ARMMMUIdx_Stage2;
 +    fi->s1ns = fault_s1ns(ptw->in_space, mmu_idx);
      return true;
  }
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const ftmad_fns[4] = {
+     gen_helper_sve_ftmad_s, gen_helper_sve_ftmad_d,
+ };
+ TRANS_FEAT_NONSTREAMING(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
+-                        ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
++                        ftmad_fns[a->esz], a->rd, a->rn, a->rm,
++                        a->imm | (s->fpcr_ah << 3),
+                         a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+ /*
 --
 .34.1
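
Editorial sketch for the FTMAD change above: the translator ORs FPCR.AH
into bit 3 of the SIMD data value next to the 3-bit immediate, and the
helper extracts both fields again. The code below is a minimal
stand-alone illustration, not QEMU code. extract_field() is a
hypothetical stand-in for extract32(), the ftmad_* names are invented
for this example, and the SIMD_DATA_SHIFT offset within the descriptor
is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for QEMU's extract32(): return 'length' bits of
 * 'value' starting at bit 'start'. Assumes 0 < length < 32.
 */
static uint32_t extract_field(uint32_t value, unsigned start, unsigned length)
{
    return (value >> start) & ((1u << length) - 1u);
}

/* Translator side: immediate in bits [2:0], FPCR.AH in bit 3. */
static uint32_t ftmad_pack_data(uint32_t imm, bool fpcr_ah)
{
    return (imm & 7u) | ((uint32_t)fpcr_ah << 3);
}

/* Helper side: recover the coefficient index from bits [2:0]. */
static uint32_t ftmad_data_imm(uint32_t data)
{
    return extract_field(data, 0, 3);
}

/* Helper side: recover the FPCR.AH flag from bit 3. */
static bool ftmad_data_ah(uint32_t data)
{
    return extract_field(data, 3, 1) != 0;
}
```

Reading the immediate with a 3-bit extract rather than taking the whole
data value is what keeps the new AH bit from leaking into the
coefficient table index.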

-[PULL 08/35] accel/kvm: Specify default IPA size for arm64
+[PULL 45/68] target/arm: Handle FPCR.AH in vector FCMLA
-From: Akihiko Odaki <akihiko.odaki@daynix.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Before this change, the default KVM type, which is used for non-virt
+The negation step in FCMLA mustn't negate a NaN when FPCR.AH
-machine models, was 0.
+is set. Handle this by passing FPCR.AH to the helper via the
 SIMD data field, and use this to select whether to do the
 negation via XOR or via the muladd negate_product flag.
-The kernel documentation says:
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-> On arm64, the physical address size for a VM (IPA Size limit) is
+Message-id: 20250129013857.135256-26-richard.henderson@linaro.org
-> limited to 40bits by default. The limit can be configured if the host
+[PMM: Expanded commit message]
 > supports the extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
 > KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
 > identifier, where IPA_Bits is the maximum width of any physical
 > address used by the VM. The IPA_Bits is encoded in bits[7-0] of the
 > machine type identifier.
 >
 > e.g, to configure a guest to use 48bit physical address size::
 >
 >     vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48));
 >
 > The requested size (IPA_Bits) must be:
 >
 >  ==   =========================================================
 >   0   Implies default size, 40bits (for backward compatibility)
 >   N   Implies N bits, where N is a positive integer such that,
 >       32 <= N <= Host_IPA_Limit
 >  ==   =========================================================
 > Host_IPA_Limit is the maximum possible value for IPA_Bits on the host
 > and is dependent on the CPU capability and the kernel configuration.
 > The limit can be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the
 > KVM_CHECK_EXTENSION ioctl() at run-time.
 >
 > Creation of the VM will fail if the requested IPA size (whether it is
 > implicit or explicit) is unsupported on the host.
 https://docs.kernel.org/virt/kvm/api.html#kvm-create-vm
 So if Host_IPA_Limit < 40, specifying 0 as the type will fail. This
 actually confused libvirt, which uses "none" machine model to probe the
 KVM availability, on M2 MacBook Air.
 Fix this by using Host_IPA_Limit as the default type when
 KVM_CAP_ARM_VM_IPA_SIZE is available.
 Cc: qemu-stable@nongnu.org
 Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
 Message-id: 20230727073134.134102-3-akihiko.odaki@daynix.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/kvm.c | 4 +++-
+ target/arm/tcg/translate-a64.c |  2 +-
-1 file changed, 3 insertions(+), 1 deletion(-)
+ target/arm/tcg/vec_helper.c    | 66 ++++++++++++++++++++--------------
 2 files changed, 40 insertions(+), 28 deletions(-)
-diff --git a/target/arm/kvm.c b/target/arm/kvm.c
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/kvm.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa)
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
- int kvm_arch_get_default_type(MachineState *ms)
+     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
- {
+                       a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
--    return 0;
+-                      a->rot, fn[a->esz]);
-+    bool fixed_ipa;
++                      a->rot | (s->fpcr_ah << 2), fn[a->esz]);
-+    int size = kvm_arm_get_max_vm_ipa_size(ms, &fixed_ipa);
+     return true;
 +    return fixed_ipa ? 0 : size;
  }
- int kvm_arch_init(MachineState *ms, KVMState *s)
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm, void *va,
      uintptr_t opr_sz = simd_oprsz(desc);
      float16 *d = vd, *n = vn, *m = vm, *a = va;
      intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 -    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 -    uint32_t neg_real = flip ^ neg_imag;
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float16 negx_imag, negx_real;
      uintptr_t i;
 -    /* Shift boolean to the sign bit so we can xor to negate.  */
 -    neg_real <<= 15;
 -    neg_imag <<= 15;
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (negf_real & ~fpcr_ah) << 15;
 +    negx_imag = (negf_imag & ~fpcr_ah) << 15;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      for (i = 0; i < opr_sz / 2; i += 2) {
          float16 e2 = n[H2(i + flip)];
 -        float16 e1 = m[H2(i + flip)] ^ neg_real;
 +        float16 e1 = m[H2(i + flip)] ^ negx_real;
          float16 e4 = e2;
 -        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
 +        float16 e3 = m[H2(i + 1 - flip)] ^ negx_imag;
 -        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], 0, fpst);
 -        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], 0, fpst);
 +        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], negf_real, fpst);
 +        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], negf_imag, fpst);
      }
      clear_tail(d, opr_sz, simd_maxsz(desc));
  }
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm, void *va,
      uintptr_t opr_sz = simd_oprsz(desc);
      float32 *d = vd, *n = vn, *m = vm, *a = va;
      intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 -    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 -    uint32_t neg_real = flip ^ neg_imag;
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float32 negx_imag, negx_real;
      uintptr_t i;
 -    /* Shift boolean to the sign bit so we can xor to negate.  */
 -    neg_real <<= 31;
 -    neg_imag <<= 31;
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (negf_real & ~fpcr_ah) << 31;
 +    negx_imag = (negf_imag & ~fpcr_ah) << 31;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      for (i = 0; i < opr_sz / 4; i += 2) {
          float32 e2 = n[H4(i + flip)];
 -        float32 e1 = m[H4(i + flip)] ^ neg_real;
 +        float32 e1 = m[H4(i + flip)] ^ negx_real;
          float32 e4 = e2;
 -        float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
 +        float32 e3 = m[H4(i + 1 - flip)] ^ negx_imag;
 -        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], 0, fpst);
 -        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], 0, fpst);
 +        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], negf_real, fpst);
 +        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], negf_imag, fpst);
      }
      clear_tail(d, opr_sz, simd_maxsz(desc));
  }
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm, void *va,
      uintptr_t opr_sz = simd_oprsz(desc);
      float64 *d = vd, *n = vn, *m = vm, *a = va;
      intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 -    uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 -    uint64_t neg_real = flip ^ neg_imag;
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float64 negx_real, negx_imag;
      uintptr_t i;
 -    /* Shift boolean to the sign bit so we can xor to negate.  */
 -    neg_real <<= 63;
 -    neg_imag <<= 63;
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
 +    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      for (i = 0; i < opr_sz / 8; i += 2) {
          float64 e2 = n[i + flip];
 -        float64 e1 = m[i + flip] ^ neg_real;
 +        float64 e1 = m[i + flip] ^ negx_real;
          float64 e4 = e2;
 -        float64 e3 = m[i + 1 - flip] ^ neg_imag;
 +        float64 e3 = m[i + 1 - flip] ^ negx_imag;
 -        d[i] = float64_muladd(e2, e1, a[i], 0, fpst);
 -        d[i + 1] = float64_muladd(e4, e3, a[i + 1], 0, fpst);
 +        d[i] = float64_muladd(e2, e1, a[i], negf_real, fpst);
 +        d[i + 1] = float64_muladd(e4, e3, a[i + 1], negf_imag, fpst);
      }
      clear_tail(d, opr_sz, simd_maxsz(desc));
  }
 --
 .34.1
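
Editorial sketch of why the two negation strategies differ for a NaN
input. This is a toy single-precision model, not the softfloat
implementation. toy_muladd() and its helpers are hypothetical names,
the multiply-add is not fused, and NaN propagation is simplified to
"return the first NaN operand unchanged", which ignores the real Arm
rules for signalling NaNs and default-NaN mode.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define F32_SIGN 0x80000000u

/* A float32 is a NaN if the exponent is all ones and the fraction is nonzero. */
static bool f32_is_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

static float f32_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

static uint32_t f32_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/* Toy model of (+/-)(n * m) + a, operating on raw float32 bit patterns. */
static uint32_t toy_muladd(uint32_t n, uint32_t m, uint32_t a,
                           bool negate, bool fpcr_ah)
{
    float product;

    if (negate && !fpcr_ah) {
        /* AH=0: flip the sign bit up front, whether or not m is a NaN. */
        m ^= F32_SIGN;
    }

    /* Simplified NaN detection: pass the first NaN operand through. */
    if (f32_is_nan(n)) {
        return n;
    }
    if (f32_is_nan(m)) {
        return m;
    }
    if (f32_is_nan(a)) {
        return a;
    }

    product = f32_from_bits(n) * f32_from_bits(m);
    if (negate && fpcr_ah) {
        /* AH=1: negate the product only after NaN detection. */
        product = -product;
    }
    return f32_to_bits(product + f32_from_bits(a));
}
```

For ordinary numbers both paths give the same result. They differ only
when the negated operand is a NaN, where the XOR path flips the NaN's
sign bit and the flag path leaves it alone.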

-New patch
+[PULL 46/68] target/arm: Handle FPCR.AH in FCMLA by index
+From: Richard Henderson <richard.henderson@linaro.org>
+The negation step in FCMLA by index mustn't negate a NaN when
+FPCR.AH is set. Use the same approach as vector FCMLA of
+passing in FPCR.AH and using it to select whether to negate
+by XOR or by the muladd negate_product flag.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20250129013857.135256-27-richard.henderson@linaro.org
+[PMM: Expanded commit message]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/tcg/translate-a64.c |  2 +-
+ target/arm/tcg/vec_helper.c    | 44 ++++++++++++++++++++--------------
+2 files changed, 27 insertions(+), 19 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
+     if (fp_access_check(s)) {
+         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+                           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+-                          (a->idx << 2) | a->rot, fn);
++                          (s->fpcr_ah << 4) | (a->idx << 2) | a->rot, fn);
+     }
+     return true;
+ }
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, void *va,
+     uintptr_t opr_sz = simd_oprsz(desc);
+     float16 *d = vd, *n = vn, *m = vm, *a = va;
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
+-    uint32_t neg_real = flip ^ neg_imag;
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
++    uint32_t negf_real = flip ^ negf_imag;
+     intptr_t elements = opr_sz / sizeof(float16);
+     intptr_t eltspersegment = MIN(16 / sizeof(float16), elements);
++    float16 negx_imag, negx_real;
+     intptr_t i, j;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 15;
+-    neg_imag <<= 15;
++    /* With AH=0, use negx; with AH=1 use negf. */
++    negx_real = (negf_real & ~fpcr_ah) << 15;
++    negx_imag = (negf_imag & ~fpcr_ah) << 15;
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
+     for (i = 0; i < elements; i += eltspersegment) {
+         float16 mr = m[H2(i + 2 * index + 0)];
+         float16 mi = m[H2(i + 2 * index + 1)];
+-        float16 e1 = neg_real ^ (flip ? mi : mr);
+-        float16 e3 = neg_imag ^ (flip ? mr : mi);
++        float16 e1 = negx_real ^ (flip ? mi : mr);
++        float16 e3 = negx_imag ^ (flip ? mr : mi);
+         for (j = i; j < i + eltspersegment; j += 2) {
+             float16 e2 = n[H2(j + flip)];
+             float16 e4 = e2;
+-            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], 0, fpst);
+-            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], 0, fpst);
++            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], negf_real, fpst);
++            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], negf_imag, fpst);
+         }
+     }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, void *va,
+     uintptr_t opr_sz = simd_oprsz(desc);
+     float32 *d = vd, *n = vn, *m = vm, *a = va;
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
+-    uint32_t neg_real = flip ^ neg_imag;
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
++    uint32_t negf_real = flip ^ negf_imag;
+     intptr_t elements = opr_sz / sizeof(float32);
+     intptr_t eltspersegment = MIN(16 / sizeof(float32), elements);
++    float32 negx_imag, negx_real;
+     intptr_t i, j;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 31;
+-    neg_imag <<= 31;
++    /* With AH=0, use negx; with AH=1 use negf. */
++    negx_real = (negf_real & ~fpcr_ah) << 31;
++    negx_imag = (negf_imag & ~fpcr_ah) << 31;
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
+     for (i = 0; i < elements; i += eltspersegment) {
+         float32 mr = m[H4(i + 2 * index + 0)];
+         float32 mi = m[H4(i + 2 * index + 1)];
+-        float32 e1 = neg_real ^ (flip ? mi : mr);
+-        float32 e3 = neg_imag ^ (flip ? mr : mi);
++        float32 e1 = negx_real ^ (flip ? mi : mr);
++        float32 e3 = negx_imag ^ (flip ? mr : mi);
+         for (j = i; j < i + eltspersegment; j += 2) {
+             float32 e2 = n[H4(j + flip)];
+             float32 e4 = e2;
+-            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], 0, fpst);
+-            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], 0, fpst);
++            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], negf_real, fpst);
++            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], negf_imag, fpst);
+         }
+     }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+--
+.34.1
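
Editorial sketch of the branch-free negx/negf selection used by the
FCMLA helpers above, shown for the float16 case. The struct, the
function name and TOY_NEGATE_PRODUCT are hypothetical. The flag value
is an arbitrary stand-in for float_muladd_negate_product, whose real
value is not shown in this series.

```c
#include <stdint.h>

/* Hypothetical stand-in for float_muladd_negate_product. */
enum { TOY_NEGATE_PRODUCT = 2 };

typedef struct {
    uint16_t negx_real;  /* XOR mask for the "real" multiplicand (AH=0) */
    uint16_t negx_imag;  /* XOR mask for the "imaginary" multiplicand (AH=0) */
    int negf_real;       /* muladd flags for the "real" lane (AH=1) */
    int negf_imag;       /* muladd flags for the "imaginary" lane (AH=1) */
} FcmlaNeg16;

/* flip, neg_imag and fpcr_ah are single-bit values (0 or 1). */
static FcmlaNeg16 fcmla_neg16(uint32_t flip, uint32_t neg_imag,
                              uint32_t fpcr_ah)
{
    uint32_t neg_real = flip ^ neg_imag;
    FcmlaNeg16 r;

    /* With AH=0 the request becomes a sign-bit XOR mask. */
    r.negx_real = (uint16_t)((neg_real & ~fpcr_ah) << 15);
    r.negx_imag = (uint16_t)((neg_imag & ~fpcr_ah) << 15);

    /* With AH=1 the same request becomes a muladd flag instead. */
    r.negf_real = (neg_real & fpcr_ah) ? TOY_NEGATE_PRODUCT : 0;
    r.negf_imag = (neg_imag & fpcr_ah) ? TOY_NEGATE_PRODUCT : 0;
    return r;
}
```

For any combination of inputs at most one of the mask and the flag is
set per lane, so the inner loop can apply both unconditionally.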

-[PULL 11/35] accel/kvm: Free as when an error occurred
+[PULL 47/68] target/arm: Handle FPCR.AH in SVE FCMLA
-From: Akihiko Odaki <akihiko.odaki@daynix.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-An error may occur after s->as is allocated, for example if the
+The negation step in SVE FCMLA mustn't negate a NaN when FPCR.AH is
-KVM_CREATE_VM ioctl call fails.
+set.  Use the same approach as we did for A64 FCMLA of passing in
 FPCR.AH and using it to select whether to negate by XOR or by the
 muladd negate_product flag.
-Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230727073134.134102-6-akihiko.odaki@daynix.com
+Message-id: 20250129013857.135256-28-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-[PMM: tweaked commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- accel/kvm/kvm-all.c | 1 +
+ target/arm/tcg/sve_helper.c    | 69 +++++++++++++++++++++-------------
-1 file changed, 1 insertion(+)
+ target/arm/tcg/translate-sve.c |  2 +-
 2 files changed, 43 insertions(+), 28 deletions(-)
-diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/accel/kvm/kvm-all.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/accel/kvm/kvm-all.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ err:
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
-     if (s->fd != -1) {
+                                void *vg, float_status *status, uint32_t desc)
-         close(s->fd);
+ {
-     }
+     intptr_t j, i = simd_oprsz(desc);
-+    g_free(s->as);
+-    unsigned rot = simd_data(desc);
-     g_free(s->memory_listener.slots);
+-    bool flip = rot & 1;
+-    float16 neg_imag, neg_real;
-     return ret;
++    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float16 negx_imag, negx_real;
      uint64_t *g = vg;
 -    neg_imag = float16_set_sign(0, (rot & 2) != 0);
 -    neg_real = float16_set_sign(0, rot == 1 || rot == 2);
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (negf_real & ~fpcr_ah) << 15;
 +    negx_imag = (negf_imag & ~fpcr_ah) << 15;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
              mi = *(float16 *)(vm + H1_2(j));
              e2 = (flip ? ni : nr);
 -            e1 = (flip ? mi : mr) ^ neg_real;
 +            e1 = (flip ? mi : mr) ^ negx_real;
              e4 = e2;
 -            e3 = (flip ? mr : mi) ^ neg_imag;
 +            e3 = (flip ? mr : mi) ^ negx_imag;
              if (likely((pg >> (i & 63)) & 1)) {
                  d = *(float16 *)(va + H1_2(i));
 -                d = float16_muladd(e2, e1, d, 0, status);
 +                d = float16_muladd(e2, e1, d, negf_real, status);
                  *(float16 *)(vd + H1_2(i)) = d;
              }
              if (likely((pg >> (j & 63)) & 1)) {
                  d = *(float16 *)(va + H1_2(j));
 -                d = float16_muladd(e4, e3, d, 0, status);
 +                d = float16_muladd(e4, e3, d, negf_imag, status);
                  *(float16 *)(vd + H1_2(j)) = d;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
      intptr_t j, i = simd_oprsz(desc);
 -    unsigned rot = simd_data(desc);
 -    bool flip = rot & 1;
 -    float32 neg_imag, neg_real;
 +    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float32 negx_imag, negx_real;
      uint64_t *g = vg;
 -    neg_imag = float32_set_sign(0, (rot & 2) != 0);
 -    neg_real = float32_set_sign(0, rot == 1 || rot == 2);
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (negf_real & ~fpcr_ah) << 31;
 +    negx_imag = (negf_imag & ~fpcr_ah) << 31;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
              mi = *(float32 *)(vm + H1_2(j));
              e2 = (flip ? ni : nr);
 -            e1 = (flip ? mi : mr) ^ neg_real;
 +            e1 = (flip ? mi : mr) ^ negx_real;
              e4 = e2;
 -            e3 = (flip ? mr : mi) ^ neg_imag;
 +            e3 = (flip ? mr : mi) ^ negx_imag;
              if (likely((pg >> (i & 63)) & 1)) {
                  d = *(float32 *)(va + H1_2(i));
 -                d = float32_muladd(e2, e1, d, 0, status);
 +                d = float32_muladd(e2, e1, d, negf_real, status);
                  *(float32 *)(vd + H1_2(i)) = d;
              }
              if (likely((pg >> (j & 63)) & 1)) {
                  d = *(float32 *)(va + H1_2(j));
 -                d = float32_muladd(e4, e3, d, 0, status);
 +                d = float32_muladd(e4, e3, d, negf_imag, status);
                  *(float32 *)(vd + H1_2(j)) = d;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
      intptr_t j, i = simd_oprsz(desc);
 -    unsigned rot = simd_data(desc);
 -    bool flip = rot & 1;
 -    float64 neg_imag, neg_real;
 +    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float64 negx_imag, negx_real;
      uint64_t *g = vg;
 -    neg_imag = float64_set_sign(0, (rot & 2) != 0);
 -    neg_real = float64_set_sign(0, rot == 1 || rot == 2);
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
 +    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
              mi = *(float64 *)(vm + H1_2(j));
              e2 = (flip ? ni : nr);
 -            e1 = (flip ? mi : mr) ^ neg_real;
 +            e1 = (flip ? mi : mr) ^ negx_real;
              e4 = e2;
 -            e3 = (flip ? mr : mi) ^ neg_imag;
 +            e3 = (flip ? mr : mi) ^ negx_imag;
              if (likely((pg >> (i & 63)) & 1)) {
                  d = *(float64 *)(va + H1_2(i));
 -                d = float64_muladd(e2, e1, d, 0, status);
 +                d = float64_muladd(e2, e1, d, negf_real, status);
                  *(float64 *)(vd + H1_2(i)) = d;
              }
              if (likely((pg >> (j & 63)) & 1)) {
                  d = *(float64 *)(va + H1_2(j));
 -                d = float64_muladd(e4, e3, d, 0, status);
 +                d = float64_muladd(e4, e3, d, negf_imag, status);
                  *(float64 *)(vd + H1_2(j)) = d;
              }
          } while (i & 63);
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_5_ptr * const fcmla_fns[4] = {
      gen_helper_sve_fcmla_zpzzz_s, gen_helper_sve_fcmla_zpzzz_d,
  };
  TRANS_FEAT(FCMLA_zpzzz, aa64_sve, gen_gvec_fpst_zzzzp, fcmla_fns[a->esz],
 -           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot,
 +           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot | (s->fpcr_ah << 2),
             a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
  static gen_helper_gvec_4_ptr * const fcmla_idx_fns[4] = {
 --
 .34.1
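
Editorial sketch checking that the new decoding of the rotation field
in the SVE FCMLA helpers matches the old one. The old code derived
neg_real from "rot == 1 || rot == 2", while the new code uses
"flip ^ negf_imag". The function names below are hypothetical and exist
only for this comparison.

```c
#include <stdbool.h>

/* Old decoding, taken from the removed lines. */
static bool old_neg_imag(unsigned rot)
{
    return (rot & 2) != 0;
}

static bool old_neg_real(unsigned rot)
{
    return rot == 1 || rot == 2;
}

/* New decoding: bit 0 is flip, bit 1 is neg_imag, neg_real is their XOR. */
static bool new_neg_imag(unsigned rot)
{
    return (rot >> 1) & 1;
}

static bool new_neg_real(unsigned rot)
{
    bool flip = rot & 1;
    return flip ^ new_neg_imag(rot);
}

/* Returns true if both decodings agree for all four rotations. */
static bool rot_decodings_agree(void)
{
    for (unsigned rot = 0; rot < 4; rot++) {
        if (old_neg_imag(rot) != new_neg_imag(rot) ||
            old_neg_real(rot) != new_neg_real(rot)) {
            return false;
        }
    }
    return true;
}
```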

-[PULL 28/35] target/arm/ptw: Load stage-2 tables from realm physical space
+[PULL 48/68] target/arm: Handle FPCR.AH in FMLSL (by element and vector)
-From: Jean-Philippe Brucker <jean-philippe@linaro.org>
+From: Richard Henderson <richard.henderson@linaro.org>
-In realm state, stage-2 translation tables are fetched from the realm
+Handle FPCR.AH's requirement to not negate the sign of a NaN
-physical address space (R_PGRQD).
+in FMLSL by element and vector, using the usual trick of
 negating by XOR when AH=0 and by muladd flags when AH=1.
-Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
+Since we have the CPUARMState* in the helper anyway, we can
 look directly at env->vfp.fpcr and don't need to pass in the
 FPCR.AH value via the SIMD data word.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20250129013857.135256-31-richard.henderson@linaro.org
 [PMM: commit message tweaked]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20230809123706.1842548-2-jean-philippe@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/ptw.c | 26 ++++++++++++++++++--------
+ target/arm/tcg/vec_helper.c | 71 ++++++++++++++++++++++++-------------
-1 file changed, 18 insertions(+), 8 deletions(-)
+1 file changed, 46 insertions(+), 25 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static ARMMMUIdx ptw_idx_for_stage_2(CPUARMState *env, ARMMMUIdx stage2idx)
+@@ -XXX,XX +XXX,XX @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
+  */
-     /*
-      * We're OK to check the current state of the CPU here because
+ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
--     * (1) we always invalidate all TLBs when the SCR_EL3.NS bit changes
+-                     uint32_t desc, bool fz16)
-+     * (1) we always invalidate all TLBs when the SCR_EL3.NS or SCR_EL3.NSE bit
++                     uint64_t negx, int negf, uint32_t desc, bool fz16)
-+     * changes.
+ {
-      * (2) there's no way to do a lookup that cares about Stage 2 for a
+     intptr_t i, oprsz = simd_oprsz(desc);
-      * different security state to the current one for AArch64, and AArch32
+-    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-      * never has a secure EL2. (AArch32 ATS12NSO[UP][RW] allow EL3 to do
+     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-      * an NS stage 1+2 lookup while the NS bit is 0.)
+     int is_q = oprsz == 16;
-      */
+     uint64_t n_4, m_4;
--    if (!arm_is_secure_below_el3(env) || !arm_el_is_aa64(env, 3)) {
-+    if (!arm_el_is_aa64(env, 3)) {
+-    /* Pre-load all of the f16 data, avoiding overlap issues.  */
-         return ARMMMUIdx_Phys_NS;
+-    n_4 = load4_f16(vn, is_q, is_2);
 +    /*
 +     * Pre-load all of the f16 data, avoiding overlap issues.
 +     * Negate all inputs for AH=0 FMLSL at once.
 +     */
 +    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
      m_4 = load4_f16(vm, is_q, is_2);
 -    /* Negate all inputs for FMLSL at once.  */
 -    if (is_s) {
 -        n_4 ^= 0x8000800080008000ull;
 -    }
 -
      for (i = 0; i < oprsz / 4; i++) {
          float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
          float32 m_1 = float16_to_float32_by_bits(m_4 >> (i * 16), fz16);
 -        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
 +        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
      }
--    if (stage2idx == ARMMMUIdx_Stage2_S) {
+     clear_tail(d, oprsz, simd_maxsz(desc));
--        s2walk_secure = !(env->cp15.vstcr_el2 & VSTCR_SW);
+ }
--    } else {
+@@ -XXX,XX +XXX,XX @@ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
--        s2walk_secure = !(env->cp15.vtcr_el2 & VTCR_NSW);
+ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
                              CPUARMState *env, uint32_t desc)
  {
 -    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, desc,
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 +
 +    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
  void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
                              CPUARMState *env, uint32_t desc)
  {
 -    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, desc,
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = 0;
 +    int negf = 0;
 +
 +    if (is_s) {
 +        if (env->vfp.fpcr & FPCR_AH) {
 +            negf = float_muladd_negate_product;
 +        } else {
 +            negx = 0x8000800080008000ull;
 +        }
 +    }
 +    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
  }
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
  }
  static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
 -                         uint32_t desc, bool fz16)
 +                         uint64_t negx, int negf, uint32_t desc, bool fz16)
  {
      intptr_t i, oprsz = simd_oprsz(desc);
 -    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
      int is_q = oprsz == 16;
      uint64_t n_4;
      float32 m_1;
 -    /* Pre-load all of the f16 data, avoiding overlap issues.  */
 -    n_4 = load4_f16(vn, is_q, is_2);
 -
 -    /* Negate all inputs for FMLSL at once.  */
 -    if (is_s) {
 -        n_4 ^= 0x8000800080008000ull;
 -    }
--    return s2walk_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
+-
++    /*
-+    switch (arm_security_space_below_el3(env)) {
++     * Pre-load all of the f16 data, avoiding overlap issues.
-+    case ARMSS_NonSecure:
++     * Negate all inputs for AH=0 FMLSL at once.
-+        return ARMMMUIdx_Phys_NS;
++     */
-+    case ARMSS_Realm:
++    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
-+        return ARMMMUIdx_Phys_Realm;
+     m_1 = float16_to_float32_by_bits(((float16 *)vm)[H2(index)], fz16);
-+    case ARMSS_Secure:
-+        if (stage2idx == ARMMMUIdx_Stage2_S) {
+     for (i = 0; i < oprsz / 4; i++) {
-+            s2walk_secure = !(env->cp15.vstcr_el2 & VSTCR_SW);
+         float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
 -        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
 +        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
      }
      clear_tail(d, oprsz, simd_maxsz(desc));
  }
@@ -XXX,XX +XXX,XX @@ static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
  void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
                                  CPUARMState *env, uint32_t desc)
  {
 -    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, desc,
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 +
 +    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
                   get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
  void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
                                  CPUARMState *env, uint32_t desc)
  {
 -    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, desc,
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = 0;
 +    int negf = 0;
 +
 +    if (is_s) {
 +        if (env->vfp.fpcr & FPCR_AH) {
 +            negf = float_muladd_negate_product;
 +        } else {
-+            s2walk_secure = !(env->cp15.vtcr_el2 & VTCR_NSW);
++            negx = 0x8000800080008000ull;
 +        }
-+        return s2walk_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
-+    default:
-+        g_assert_not_reached();
 +    }
++    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
+                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
  }
- static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
 --
 .34.1

-[PULL 05/35] qtest: irq_intercept_[out/in]: return FAIL if no intercepts are installed
+[PULL 49/68] target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
-From: Chris Laplante <chris@laplante.io>
+From: Richard Henderson <richard.henderson@linaro.org>
-This is much better than just silently failing with OK.
+Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
 FMLSL (indexed), using the usual trick of negating by XOR when AH=0
 and by muladd flags when AH=1.
-Signed-off-by: Chris Laplante <chris@laplante.io>
+Since we have the CPUARMState* in the helper anyway, we can
-Message-id: 20230728160324.1159090-6-chris@laplante.io
+look directly at env->vfp.fpcr and don't need to pass in the
 FPCR.AH value via the SIMD data word.
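
A minimal sketch of the selection described above, for illustration only and not part of the patch. The names choose_negation, struct neg_choice, SKETCH_NEGATE_PRODUCT and f16_is_nan are hypothetical stand-ins; the real helpers use float_muladd_negate_product and read env->vfp.fpcr directly. It shows why the XOR path is unsuitable when AH=1: XORing the sign bit also flips the sign of a NaN input, whereas leaving the input alone and asking the multiply-add to negate the product does not touch the input bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_muladd_negate_product flag. */
#define SKETCH_NEGATE_PRODUCT 2

struct neg_choice {
    uint16_t negx;  /* XOR mask applied to each f16 input before widening */
    int negf;       /* flags handed to the fused multiply-add */
};

/* Pick exactly one negation mechanism for a subtracting (FMLSL) form. */
static struct neg_choice choose_negation(bool is_sub, bool fpcr_ah)
{
    struct neg_choice c = { 0, 0 };

    if (is_sub) {
        if (fpcr_ah) {
            /* AH=1: do not touch the input bits, negate inside muladd. */
            c.negf = SKETCH_NEGATE_PRODUCT;
        } else {
            /* AH=0: flip the f16 sign bit of the input up front. */
            c.negx = 0x8000;
        }
    }
    return c;
}

/* True if the f16 bit pattern is a NaN: exponent all ones, fraction non-zero. */
static bool f16_is_nan(uint16_t bits)
{
    return (bits & 0x7c00) == 0x7c00 && (bits & 0x03ff) != 0;
}
```

With the f16 quiet NaN pattern 0x7e00 as input, the AH=0 mask turns it into 0xfe00 (sign flipped), while the AH=1 choice leaves the mask at zero so the NaN reaches the multiply-add unchanged.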
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20250129013857.135256-32-richard.henderson@linaro.org
 [PMM: commit message tweaked]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- softmmu/qtest.c | 12 ++++++++++--
+ target/arm/tcg/vec_helper.c | 15 ++++++++++++---
-file changed, 10 insertions(+), 2 deletions(-)
+file changed, 12 insertions(+), 3 deletions(-)
-diff --git a/softmmu/qtest.c b/softmmu/qtest.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/softmmu/qtest.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/softmmu/qtest.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
-         NamedGPIOList *ngl;
+                                CPUARMState *env, uint32_t desc)
-         bool is_named;
+ {
-         bool is_outbound;
+     intptr_t i, j, oprsz = simd_oprsz(desc);
-+        bool interception_succeeded = false;
+-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
++    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-         g_assert(words[1]);
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-         is_named = words[2] != NULL;
+     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
-@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
+     float_status *status = &env->vfp.fp_status_a64;
-                     for (i = 0; i < ngl->num_out; ++i) {
+     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
-                         qtest_install_gpio_out_intercept(dev, ngl->name, i);
++    int negx = 0, negf = 0;
-                     }
++
-+                    interception_succeeded = true;
++    if (is_s) {
-                 }
++        if (env->vfp.fpcr & FPCR_AH) {
-             } else {
++            negf = float_muladd_negate_product;
-                 qemu_irq_intercept_in(ngl->in, qtest_irq_handler,
++        } else {
-                                       ngl->num_in);
++            negx = 0x8000;
-+                interception_succeeded = true;
++        }
-             }
++    }
      for (i = 0; i < oprsz; i += 16) {
          float16 mm_16 = *(float16 *)(vm + i + idx);
          float32 mm = float16_to_float32_by_bits(mm_16, fz16);
          for (j = 0; j < 16; j += sizeof(float32)) {
 -            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negn;
 +            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negx;
              float32 nn = float16_to_float32_by_bits(nn_16, fz16);
              float32 aa = *(float32 *)(va + H1_4(i + j));
              *(float32 *)(vd + H1_4(i + j)) =
 -                float32_muladd(nn, mm, aa, 0, status);
 +                float32_muladd(nn, mm, aa, negf, status);
          }
--        irq_intercept_dev = dev;
+     }
-+
+ }
          qtest_send_prefix(chr);
 -        qtest_send(chr, "OK\n");
 +        if (interception_succeeded) {
 +            irq_intercept_dev = dev;
 +            qtest_send(chr, "OK\n");
 +        } else {
 +            qtest_send(chr, "FAIL No intercepts installed\n");
 +        }
      } else if (strcmp(words[0], "set_irq_in") == 0) {
          DeviceState *dev;
          qemu_irq irq;
 --
 .34.1

-[PULL 03/35] qtest: implement named interception of out-GPIO
+[PULL 50/68] target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
-From: Chris Laplante <chris@laplante.io>
+From: Richard Henderson <richard.henderson@linaro.org>
-Adds qtest_irq_intercept_out_named method, which utilizes a new optional
+Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
-name parameter to the irq_intercept_out qtest command.
+FMLSLB, FMLSLT (vectors), using the usual trick of negating by XOR when AH=0
 and by muladd flags when AH=1.
-Signed-off-by: Chris Laplante <chris@laplante.io>
+Since we have the CPUARMState* in the helper anyway, we can
-Message-id: 20230728160324.1159090-4-chris@laplante.io
+look directly at env->vfp.fpcr and don't need to pass in the
 FPCR.AH value via the SIMD data word.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20250129013857.135256-33-richard.henderson@linaro.org
 [PMM: tweaked commit message]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- tests/qtest/libqtest.h | 11 +++++++++++
+ target/arm/tcg/vec_helper.c | 15 ++++++++++++---
- softmmu/qtest.c        | 18 ++++++++++--------
+file changed, 12 insertions(+), 3 deletions(-)
  tests/qtest/libqtest.c |  6 ++++++
 files changed, 27 insertions(+), 8 deletions(-)
-diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/libqtest.h
+--- a/target/arm/tcg/vec_helper.c
-+++ b/tests/qtest/libqtest.h
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ void qtest_irq_intercept_in(QTestState *s, const char *string);
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-  */
+                                CPUARMState *env, uint32_t desc)
- void qtest_irq_intercept_out(QTestState *s, const char *string);
+ {
+     intptr_t i, oprsz = simd_oprsz(desc);
-+/**
+-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
-+ * qtest_irq_intercept_out_named:
++    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-+ * @s: #QTestState instance to operate on.
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-+ * @qom_path: QOM path of a device.
+     float_status *status = &env->vfp.fp_status_a64;
-+ * @name: Name of the GPIO out pin
+     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
-+ *
++    int negx = 0, negf = 0;
 + * Associate a qtest irq with the named GPIO-out pin of the device
 + * whose path is specified by @string and whose name is @name.
 + */
 +void qtest_irq_intercept_out_named(QTestState *s, const char *qom_path, const char *name);
 +
- /**
++    if (is_s) {
-  * qtest_set_irq_in:
++        if (env->vfp.fpcr & FPCR_AH) {
-  * @s: QTestState instance to operate on.
++            negf = float_muladd_negate_product;
-diff --git a/softmmu/qtest.c b/softmmu/qtest.c
++        } else {
-index XXXXXXX..XXXXXXX 100644
++            negx = 0x8000;
---- a/softmmu/qtest.c
++        }
-+++ b/softmmu/qtest.c
++    }
-@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
-         || strcmp(words[0], "irq_intercept_in") == 0) {
+     for (i = 0; i < oprsz; i += sizeof(float32)) {
-         DeviceState *dev;
+-        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negn;
-         NamedGPIOList *ngl;
++        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negx;
-+        bool is_outbound;
+         float16 mm_16 = *(float16 *)(vm + H1_2(i + sel));
+         float32 nn = float16_to_float32_by_bits(nn_16, fz16);
-         g_assert(words[1]);
+         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
-+        is_outbound = words[0][14] == 'o';
+         float32 aa = *(float32 *)(va + H1_4(i));
-         dev = DEVICE(object_resolve_path(words[1], NULL));
-         if (!dev) {
+-        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, 0, status);
-             qtest_send_prefix(chr);
++        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, negf, status);
-@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
+     }
          }
          QLIST_FOREACH(ngl, &dev->gpios, node) {
 -            /* We don't support intercept of named GPIOs yet */
 -            if (ngl->name) {
 -                continue;
 -            }
 -            if (words[0][14] == 'o') {
 -                int i;
 -                for (i = 0; i < ngl->num_out; ++i) {
 -                    qtest_install_gpio_out_intercept(dev, ngl->name, i);
 +            /* We don't support inbound interception of named GPIOs yet */
 +            if (is_outbound) {
 +                /* NULL is valid and matchable, for "unnamed GPIO" */
 +                if (g_strcmp0(ngl->name, words[2]) == 0) {
 +                    int i;
 +                    for (i = 0; i < ngl->num_out; ++i) {
 +                        qtest_install_gpio_out_intercept(dev, ngl->name, i);
 +                    }
                  }
              } else {
                  qemu_irq_intercept_in(ngl->in, qtest_irq_handler,
 diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qtest/libqtest.c
 +++ b/tests/qtest/libqtest.c
@@ -XXX,XX +XXX,XX @@ void qtest_irq_intercept_out(QTestState *s, const char *qom_path)
      qtest_rsp(s);
  }
-+void qtest_irq_intercept_out_named(QTestState *s, const char *qom_path, const char *name)
-+{
-+    qtest_sendf(s, "irq_intercept_out %s %s\n", qom_path, name);
-+    qtest_rsp(s);
-+}
-+
- void qtest_irq_intercept_in(QTestState *s, const char *qom_path)
- {
-     qtest_sendf(s, "irq_intercept_in %s\n", qom_path);
 --
 .34.1

-[PULL 20/35] target/arm/ptw: Only fold in NSTable bit effects in Secure state
+[PULL 51/68] target/arm: Enable FEAT_AFP for '-cpu max'
-When we do a translation in Secure state, the NSTable bits in table
+Now that we have completed the handling for FPCR.{AH,FIZ,NEP}, we
-descriptors may downgrade us to NonSecure; we update ptw->in_secure
+can enable FEAT_AFP for '-cpu max', and document that we support it.
 and ptw->in_space accordingly.  We guard that check correctly with a
 conditional that means it's only applied for Secure stage 1
 translations.  However, later on in get_phys_addr_lpae() we fold the
 effects of the NSTable bits into the final descriptor attributes
 bits, and there we do it unconditionally regardless of the CPU state.
 That means that in Realm state (where in_secure is false) we will set
 bit 5 in attrs, and later use it to decide to output to non-secure
 space.
 We don't in fact need to do this folding in at all any more (since
 commit 2f1ff4e7b9f30c): if an NSTable bit was set then we have
 already set ptw->in_space to ARMSS_NonSecure, and in that situation
 we don't look at attrs bit 5.  The only thing we still need to deal
 with is the real NS bit in the final descriptor word, so we can just
 drop the code that ORed in the NSTable bit.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-9-peter.maydell@linaro.org
 ---
- target/arm/ptw.c | 3 +--
+ docs/system/arm/emulation.rst | 1 +
-file changed, 1 insertion(+), 2 deletions(-)
+ target/arm/tcg/cpu64.c        | 1 +
 files changed, 2 insertions(+)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/ptw.c
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
-      * Extract attributes from the (modified) descriptor, and apply
+ - FEAT_AA64EL3 (Support for AArch64 at EL3)
-      * table descriptors. Stage 2 table descriptors do not include
+ - FEAT_AdvSIMD (Advanced SIMD Extension)
-      * any attribute fields. HPD disables all the table attributes
+ - FEAT_AES (AESD and AESE instructions)
--     * except NSTable.
++- FEAT_AFP (Alternate floating-point behavior)
-+     * except NSTable (which we have already handled).
+ - FEAT_Armv9_Crypto (Armv9 Cryptographic Extension)
-      */
+ - FEAT_ASID16 (16 bit ASID)
-     attrs = new_descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
+ - FEAT_BBM at level 2 (Translation table break-before-make levels)
-     if (!regime_is_stage2(mmu_idx)) {
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
--        attrs |= !ptw->in_secure << 5; /* NS */
+index XXXXXXX..XXXXXXX 100644
-         if (!param.hpd) {
+--- a/target/arm/tcg/cpu64.c
-             attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
++++ b/target/arm/tcg/cpu64.c
-             /*
+@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
      t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1);      /* FEAT_XNX */
      t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2);      /* FEAT_ETS2 */
      t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);      /* FEAT_HCX */
 +    t = FIELD_DP64(t, ID_AA64MMFR1, AFP, 1);      /* FEAT_AFP */
      t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1);   /* FEAT_TIDCP1 */
      t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1);     /* FEAT_CMOW */
      cpu->isar.id_aa64mmfr1 = t;
 --
 .34.1

-[PULL 17/35] target/arm/ptw: Pass ARMSecurityState to regime_translation_disabled()
+[PULL 52/68] target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
-Plumb the ARMSecurityState through to regime_translation_disabled()
+FEAT_RPRES implements an "increased precision" variant of the single
-rather than just a bool is_secure.
+precision FRECPE and FRSQRTE instructions, whose estimate has a 12 bit
 mantissa instead of 8 bits. This applies only when FPCR.AH == 1. Note that the
 halfprec and double versions of these insns retain the 8 bit
 precision regardless.
 In this commit we add all the plumbing to make these instructions
 call a new helper function when increased precision is in
 effect. In the following commit we will provide the actual change
 in behaviour in the helpers.
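
The plumbing described above amounts to choosing between two per-size helper tables at translate time. The following is a minimal sketch of that idea only; the enum, the table names and pick_frecpe() are hypothetical and do not appear in the patch, which uses TRANS() with s->fpcr_ah and dc_isar_feature(aa64_rpres, s). The size index convention (0 = half, 1 = single, 2 = double) is also an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers standing in for the generated helper functions. */
enum sketch_helper_id {
    H_RECPE_F16,
    H_RECPE_F32,
    H_RECPE_RPRES_F32,
    H_RECPE_F64,
};

/* Default table: one helper per element size (half, single, double). */
static const enum sketch_helper_id frecpe_tbl[3] = {
    H_RECPE_F16, H_RECPE_F32, H_RECPE_F64,
};

/* RPRES table: only the single precision entry differs. */
static const enum sketch_helper_id frecpe_rpres_tbl[3] = {
    H_RECPE_F16, H_RECPE_RPRES_F32, H_RECPE_F64,
};

/*
 * Select the helper for one instruction. The increased precision table is
 * used only when FPCR.AH is set and the CPU advertises FEAT_RPRES.
 */
static enum sketch_helper_id pick_frecpe(bool fpcr_ah, bool has_rpres,
                                         int size_idx)
{
    const enum sketch_helper_id *tbl =
        (fpcr_ah && has_rpres) ? frecpe_rpres_tbl : frecpe_tbl;

    return tbl[size_idx];
}
```

Because the half and double entries are identical in both tables, only the single precision form changes behaviour, which matches the statement that the halfprec and double versions retain the 8 bit precision.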
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-6-peter.maydell@linaro.org
 ---
- target/arm/ptw.c | 15 ++++++++-------
+ target/arm/cpu-features.h      |  5 +++++
-file changed, 8 insertions(+), 7 deletions(-)
+ target/arm/helper.h            |  4 ++++
  target/arm/tcg/translate-a64.c | 34 ++++++++++++++++++++++++++++++----
  target/arm/tcg/translate-sve.c | 16 ++++++++++++++--
  target/arm/tcg/vec_helper.c    |  2 ++
  target/arm/vfp_helper.c        | 32 ++++++++++++++++++++++++++++++--
 files changed, 85 insertions(+), 8 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/cpu-features.h
-+++ b/target/arm/ptw.c
++++ b/target/arm/cpu-features.h
-@@ -XXX,XX +XXX,XX @@ static uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_mops(const ARMISARegisters *id)
+     return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, MOPS);
- /* Return true if the specified stage of address translation is disabled */
+ }
- static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
--                                        bool is_secure)
++static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id)
-+                                        ARMSecuritySpace space)
++{
- {
++    return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, RPRES);
-     uint64_t hcr_el2;
++}
-+    bool is_secure = arm_space_is_secure(space);
++
+ static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
-     if (arm_feature(env, ARM_FEATURE_M)) {
+ {
-         switch (env->v7m.mpu_ctrl[is_secure] &
+     /* We always set the AdvSIMD and FP fields identically.  */
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav5(CPUARMState *env,
+diff --git a/target/arm/helper.h b/target/arm/helper.h
-     uint32_t base;
+index XXXXXXX..XXXXXXX 100644
-     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
+--- a/target/arm/helper.h
-     bool is_user = regime_is_user(env, mmu_idx);
++++ b/target/arm/helper.h
--    bool is_secure = arm_space_is_secure(ptw->in_space);
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, fpst)
--    if (regime_translation_disabled(env, mmu_idx, is_secure)) {
+ DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
-+    if (regime_translation_disabled(env, mmu_idx, ptw->in_space)) {
+ DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
-         /* MPU disabled.  */
++DEF_HELPER_FLAGS_2(recpe_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
-         result->f.phys_addr = address;
+ DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
-         result->f.prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+ DEF_HELPER_FLAGS_2(rsqrte_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav7(CPUARMState *env,
+ DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
-     result->f.lg_page_size = TARGET_PAGE_BITS;
++DEF_HELPER_FLAGS_2(rsqrte_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
-     result->f.prot = 0;
+ DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
+ DEF_HELPER_FLAGS_1(recpe_u32, TCG_CALL_NO_RWG, i32, i32)
--    if (regime_translation_disabled(env, mmu_idx, secure) ||
+ DEF_HELPER_FLAGS_1(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32)
-+    if (regime_translation_disabled(env, mmu_idx, ptw->in_space) ||
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vrintx_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-         m_is_ppb_region(env, address)) {
-         /*
+ DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-          * MPU disabled or M profile PPB access: use default memory map.
+ DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
++DEF_HELPER_FLAGS_4(gvec_frecpe_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-      * are done in arm_v7m_load_vector(), which always does a direct
+ DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-      * read using address_space_ldl(), rather than going via this function.
-      */
+ DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
--    if (regime_translation_disabled(env, mmu_idx, secure)) { /* MPU disabled */
+ DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-+    if (regime_translation_disabled(env, mmu_idx, arm_secure_to_space(secure))) {
++DEF_HELPER_FLAGS_4(gvec_frsqrte_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-+        /* MPU disabled */
+ DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-         hit = true;
-     } else if (m_is_ppb_region(env, address)) {
+ DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
-         hit = true;
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
+index XXXXXXX..XXXXXXX 100644
-          */
+--- a/target/arm/tcg/translate-a64.c
-         ptw->in_mmu_idx = mmu_idx = s1_mmu_idx;
++++ b/target/arm/tcg/translate-a64.c
-         if (arm_feature(env, ARM_FEATURE_EL2) &&
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
--            !regime_translation_disabled(env, ARMMMUIdx_Stage2, is_secure)) {
+     gen_helper_recpe_f32,
-+            !regime_translation_disabled(env, ARMMMUIdx_Stage2, ptw->in_space)) {
+     gen_helper_recpe_f64,
-             return get_phys_addr_twostage(env, ptw, address, access_type,
+ };
-                                           result, fi);
+-TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
-         }
++static const FPScalar1 f_scalar_frecpe_rpres = {
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
++    gen_helper_recpe_f16,
++    gen_helper_recpe_rpres_f32,
-     /* Definitely a real MMU, not an MPU */
++    gen_helper_recpe_f64,
++};
--    if (regime_translation_disabled(env, mmu_idx, is_secure)) {
++TRANS(FRECPE_s, do_fp1_scalar_ah, a,
-+    if (regime_translation_disabled(env, mmu_idx, ptw->in_space)) {
++      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
-         return get_phys_addr_disabled(env, ptw, address, access_type,
++      &f_scalar_frecpe_rpres : &f_scalar_frecpe, -1)
-                                       result, fi);
-     }
+ static const FPScalar1 f_scalar_frecpx = {
      gen_helper_frecpx_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frsqrte = {
      gen_helper_rsqrte_f32,
      gen_helper_rsqrte_f64,
  };
 -TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
 +static const FPScalar1 f_scalar_frsqrte_rpres = {
 +    gen_helper_rsqrte_f16,
 +    gen_helper_rsqrte_rpres_f32,
 +    gen_helper_rsqrte_f64,
 +};
 +TRANS(FRSQRTE_s, do_fp1_scalar_ah, a,
 +      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
 +      &f_scalar_frsqrte_rpres : &f_scalar_frsqrte, -1)
  static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
  {
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
      gen_helper_gvec_frecpe_s,
      gen_helper_gvec_frecpe_d,
  };
 -TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
 +static gen_helper_gvec_2_ptr * const f_frecpe_rpres[] = {
 +    gen_helper_gvec_frecpe_h,
 +    gen_helper_gvec_frecpe_rpres_s,
 +    gen_helper_gvec_frecpe_d,
 +};
 +TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
 +      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frecpe_rpres : f_frecpe)
  static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
      gen_helper_gvec_frsqrte_h,
      gen_helper_gvec_frsqrte_s,
      gen_helper_gvec_frsqrte_d,
  };
 -TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
 +static gen_helper_gvec_2_ptr * const f_frsqrte_rpres[] = {
 +    gen_helper_gvec_frsqrte_h,
 +    gen_helper_gvec_frsqrte_rpres_s,
 +    gen_helper_gvec_frsqrte_d,
 +};
 +TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
 +      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frsqrte_rpres : f_frsqrte)
  static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
  {
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
      NULL,                     gen_helper_gvec_frecpe_h,
      gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
  };
 -TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
 +static gen_helper_gvec_2_ptr * const frecpe_rpres_fns[] = {
 +    NULL,                           gen_helper_gvec_frecpe_h,
 +    gen_helper_gvec_frecpe_rpres_s, gen_helper_gvec_frecpe_d,
 +};
 +TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
 +           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
 +           frecpe_rpres_fns[a->esz] : frecpe_fns[a->esz], a, 0)
  static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
      NULL,                      gen_helper_gvec_frsqrte_h,
      gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
  };
 -TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
 +static gen_helper_gvec_2_ptr * const frsqrte_rpres_fns[] = {
 +    NULL,                            gen_helper_gvec_frsqrte_h,
 +    gen_helper_gvec_frsqrte_rpres_s, gen_helper_gvec_frsqrte_d,
 +};
 +TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
 +           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
 +           frsqrte_rpres_fns[a->esz] : frsqrte_fns[a->esz], a, 0)
  /*
   *** SVE Floating Point Compare with Zero Group
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, float_status *stat, uint32_t desc)  \
  DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
  DO_2OP(gvec_frecpe_s, helper_recpe_f32, float32)
 +DO_2OP(gvec_frecpe_rpres_s, helper_recpe_rpres_f32, float32)
  DO_2OP(gvec_frecpe_d, helper_recpe_f64, float64)
  DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
  DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
 +DO_2OP(gvec_frsqrte_rpres_s, helper_rsqrte_rpres_f32, float32)
  DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
  DO_2OP(gvec_vrintx_h, float16_round_to_int, float16)
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
      return make_float16(f16_val);
  }
 -float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
 +/*
 + * FEAT_RPRES means the f32 FRECPE has an "increased precision" variant
 + * which is used when FPCR.AH == 1.
 + */
 +static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
  {
      float32 f32 = float32_squash_input_denormal(input, fpst);
      uint32_t f32_val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
      return make_float32(f32_val);
  }
 +float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
 +{
 +    return do_recpe_f32(input, fpst, false);
 +}
 +
 +float32 HELPER(recpe_rpres_f32)(float32 input, float_status *fpst)
 +{
 +    return do_recpe_f32(input, fpst, true);
 +}
 +
  float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
  {
      float64 f64 = float64_squash_input_denormal(input, fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
      return make_float16(val);
  }
 -float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
 +/*
 + * FEAT_RPRES means the f32 FRSQRTE has an "increased precision" variant
 + * which is used when FPCR.AH == 1.
 + */
 +static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
  {
      float32 f32 = float32_squash_input_denormal(input, s);
      uint32_t val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
      return make_float32(val);
  }
 +float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
 +{
 +    return do_rsqrte_f32(input, s, false);
 +}
 +
 +float32 HELPER(rsqrte_rpres_f32)(float32 input, float_status *s)
 +{
 +    return do_rsqrte_f32(input, s, true);
 +}
 +
  float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
  {
      float64 f64 = float64_squash_input_denormal(input, s);
 --
 .34.1

-[PULL 22/35] target/arm/ptw: Remove S1Translate::in_secure
+[PULL 53/68] target/arm: Implement increased precision FRECPE
-We no longer look at the in_secure field of the S1Translate struct
+Implement the increased precision variation of FRECPE.  In the
-anyway, so we can remove it and all the code which sets it.
+pseudocode this corresponds to the handling of the
 "increasedprecision" boolean in the FPRecipEstimate() and
 RecipEstimate() functions.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-11-peter.maydell@linaro.org
 ---
- target/arm/ptw.c | 13 -------------
+ target/arm/vfp_helper.c | 54 +++++++++++++++++++++++++++++++++++------
-file changed, 13 deletions(-)
+file changed, 46 insertions(+), 8 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
+@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
-      *    value being Stage2 vs Stage2_S distinguishes those.
+     return r;
-      */
+ }
-     ARMSecuritySpace in_space;
--    /*
++/*
--     * in_secure: whether the translation regime is a Secure one.
++ * Increased precision version:
--     * This is always equal to arm_space_is_secure(in_space).
++ * input is a 13 bit fixed point number
--     * If a Secure ptw is "downgraded" to NonSecure by an NSTable bit,
++ * input range 2048 .. 4095 for a number from 0.5 <= x < 1.0.
--     * this field is updated accordingly.
++ * result range 4096 .. 8191 for a number from 1.0 to 2.0
--     */
++ */
--    bool in_secure;
++static int recip_estimate_incprec(int input)
-     /*
++{
-      * in_debug: is this a QEMU debug access (gdbstub, etc)? Debug
++    int a, b, r;
-      * accesses will not update the guest page table access flags
++    assert(2048 <= input && input < 4096);
-@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
++    a = (input * 2) + 1;
-         S1Translate s2ptw = {
++    /*
-             .in_mmu_idx = s2_mmu_idx,
++     * The pseudocode expresses this as an operation on infinite
-             .in_ptw_idx = ptw_idx_for_stage_2(env, s2_mmu_idx),
++     * precision reals where it calculates 2^25 / a and then looks
--            .in_secure = arm_space_is_secure(s2_space),
++     * at the error between that and the rounded-down-to-integer
-             .in_space = s2_space,
++     * value to see if it should instead round up. We instead
-             .in_debug = true,
++     * follow the same approach as the pseudocode for the 8-bit
-         };
++     * precision version, and calculate (2 * (2^25 / a)) as an
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
++     * integer so we can do the "add one and halve" to round it.
-         QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_S + 1 != ARMMMUIdx_Phys_NS);
++     * So the 1 << 26 here is correct.
-         QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2_S + 1 != ARMMMUIdx_Stage2);
++     */
-         ptw->in_ptw_idx += 1;
++    b = (1 << 26) / a;
--        ptw->in_secure = false;
++    r = (b + 1) >> 1;
-         ptw->in_space = ARMSS_NonSecure;
++    assert(4096 <= r && r < 8192);
 +    return r;
 +}
 +
  /*
   * Common wrapper to call recip_estimate
   *
@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
   * callee.
   */
 -static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
 +static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac,
 +                                    bool increasedprecision)
  {
      uint32_t scaled, estimate;
      uint64_t result_frac;
@@ -XXX,XX +XXX,XX @@ static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
          }
      }
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
+-    /* scaled = UInt('1':fraction<51:44>) */
+-    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-     ptw->in_s1_is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
+-    estimate = recip_estimate(scaled);
-     ptw->in_mmu_idx = ipa_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
++    if (increasedprecision) {
--    ptw->in_secure = ipa_secure;
++        /* scaled = UInt('1':fraction<51:41>) */
-     ptw->in_space = ipa_space;
++        scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
-     ptw->in_ptw_idx = ptw_idx_for_stage_2(env, ptw->in_mmu_idx);
++        estimate = recip_estimate_incprec(scaled);
++    } else {
-@@ -XXX,XX +XXX,XX @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
++        /* scaled = UInt('1':fraction<51:44>) */
- {
++        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-     S1Translate ptw = {
++        estimate = recip_estimate(scaled);
-         .in_mmu_idx = mmu_idx,
++    }
--        .in_secure = is_secure,
-         .in_space = arm_secure_to_space(is_secure),
+     result_exp = exp_off - *exp;
-     };
+-    result_frac = deposit64(0, 44, 8, estimate);
-     return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
++    if (increasedprecision) {
-@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
++        result_frac = deposit64(0, 40, 12, estimate);
 +    } else {
 +        result_frac = deposit64(0, 44, 8, estimate);
 +    }
      if (result_exp == 0) {
          result_frac = deposit64(result_frac >> 1, 51, 1, 1);
      } else if (result_exp == -1) {
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
      }
-     ptw.in_space = ss;
+     f64_frac = call_recip_estimate(&f16_exp, 29,
--    ptw.in_secure = arm_space_is_secure(ss);
+-                                   ((uint64_t) f16_frac) << (52 - 10));
-     return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
++                                   ((uint64_t) f16_frac) << (52 - 10), false);
- }
+     /* result = sign : result_exp<4:0> : fraction<51:42> */
-@@ -XXX,XX +XXX,XX @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
+     f16_val = deposit32(0, 15, 1, f16_sign);
-     S1Translate ptw = {
+@@ -XXX,XX +XXX,XX @@ static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
-         .in_mmu_idx = mmu_idx,
+     }
-         .in_space = ss,
--        .in_secure = arm_space_is_secure(ss),
+     f64_frac = call_recip_estimate(&f32_exp, 253,
-         .in_debug = true,
+-                                   ((uint64_t) f32_frac) << (52 - 23));
-     };
++                                   ((uint64_t) f32_frac) << (52 - 23), rpres);
-     GetPhysAddrResult res = {};
      /* result = sign : result_exp<7:0> : fraction<51:29> */
      f32_val = deposit32(0, 31, 1, f32_sign);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
          return float64_set_sign(float64_zero, float64_is_neg(f64));
      }
 -    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
 +    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac, false);
      /* result = sign : result_exp<10:0> : fraction<51:0>; */
      f64_val = deposit64(0, 63, 1, f64_sign);
 --
 .34.1
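Note on the rounding in recip_estimate_incprec(): the patch comment says the "1 << 26" is deliberate, since computing 2 * (2^25 / a) as an integer allows an "add one and halve" step to do the rounding. A minimal standalone sketch of that arithmetic follows. It assumes that the intended result is 2^25 / a rounded to nearest; recip_estimate_reference() and recip_estimate_incprec_selfcheck() are hypothetical helpers written for this note and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as recip_estimate_incprec() in the patch:
 * input is 2048..4095 (0.5 <= x < 1.0), result is 4096..8191.
 */
static int recip_estimate_incprec(int input)
{
    int a, b;

    assert(2048 <= input && input < 4096);
    a = (input * 2) + 1;
    b = (1 << 26) / a;      /* floor(2 * (2^25 / a)) */
    return (b + 1) >> 1;    /* "add one and halve" rounds 2^25 / a */
}

/*
 * Hypothetical reference: floor(2^25 / a + 1/2) computed with a
 * single integer division, i.e. round to nearest.
 */
static int recip_estimate_reference(int input)
{
    int64_t a = (int64_t)input * 2 + 1;

    return (int)(((INT64_C(1) << 26) + a) / (2 * a));
}

/* Returns 1 if both forms agree and stay in range for every valid input. */
static int recip_estimate_incprec_selfcheck(void)
{
    for (int input = 2048; input < 4096; input++) {
        int r = recip_estimate_incprec(input);

        if (r != recip_estimate_reference(input) || r < 4096 || r >= 8192) {
            return 0;
        }
    }
    return 1;
}
```

Because a is always odd, 2^25 / a can never fall exactly halfway between two integers, so no tie-breaking rule is needed in this sketch.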

-[PULL 25/35] target/arm/ptw: Check for block descriptors at invalid levels
+[PULL 54/68] target/arm: Implement increased precision FRSQRTE
-The architecture doesn't permit block descriptors at any arbitrary
+Implement the increased precision variation of FRSQRTE.  In the
-level of the page table walk; it depends on the granule size which
+pseudocode this corresponds to the handling of the
-levels are permitted.  We implemented only a partial version of this
+"increasedprecision" boolean in the FPRSqrtEstimate() and
-check which assumes that block descriptors are valid at all levels
+RecipSqrtEstimate() functions.
 except level 3, which meant that we wouldn't deliver the Translation
 fault for all cases of this sort of guest page table error.
 Implement the logic corresponding to the pseudocode
 AArch64.DecodeDescriptorType() and AArch64.BlockDescSupported().
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-14-peter.maydell@linaro.org
 ---
- target/arm/ptw.c | 25 +++++++++++++++++++++++--
+ target/arm/vfp_helper.c | 77 ++++++++++++++++++++++++++++++++++-------
-file changed, 23 insertions(+), 2 deletions(-)
+file changed, 64 insertions(+), 13 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static int check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, uint64_t tcr,
+@@ -XXX,XX +XXX,XX @@ static int do_recip_sqrt_estimate(int a)
-     return INT_MIN;
+     return estimate;
  }
-+static bool lpae_block_desc_valid(ARMCPU *cpu, bool ds,
++static int do_recip_sqrt_estimate_incprec(int a)
 +                                  ARMGranuleSize gran, int level)
 +{
 +    /*
-+     * See pseudocode AArch64.BlockDescSupported(): block descriptors
++     * The Arm ARM describes the 12-bit precision version of RecipSqrtEstimate
-+     * are not valid at all levels, depending on the page size.
++     * in terms of an infinite-precision floating point calculation of a
 +     * square root. We implement this using the same kind of pure integer
 +     * algorithm as the 8-bit mantissa, to get the same bit-for-bit result.
 +     */
-+    switch (gran) {
++    int64_t b, estimate;
-+    case Gran4K:
-+        return (level == 0 && ds) || level == 1 || level == 2;
+-static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
-+    case Gran16K:
++    assert(1024 <= a && a < 4096);
-+        return (level == 1 && ds) || level == 2;
++    if (a < 2048) {
-+    case Gran64K:
++        a = a * 2 + 1;
-+        return (level == 1 && arm_pamax(cpu) == 52) || level == 2;
++    } else {
-+    default:
++        a = (a >> 1) << 1;
-+        g_assert_not_reached();
++        a = (a + 1) * 2;
 +    }
++    b = 8192;
++    while (a * (b + 1) * (b + 1) < (1ULL << 39)) {
++        b += 1;
++    }
++    estimate = (b + 1) / 2;
++
++    assert(4096 <= estimate && estimate < 8192);
++
++    return estimate;
 +}
 +
- /**
++static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac,
-  * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
++                                    bool increasedprecision)
-  *
+ {
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+     int estimate;
-     new_descriptor = descriptor;
+     uint32_t scaled;
+@@ -XXX,XX +XXX,XX @@ static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
-  restart_atomic_update:
+         frac = extract64(frac, 0, 51) << 1;
 -    if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
 -        /* Invalid, or the Reserved level 3 encoding */
 +    if (!(descriptor & 1) ||
 +        (!(descriptor & 2) &&
 +         !lpae_block_desc_valid(cpu, param.ds, param.gran, level))) {
 +        /* Invalid, or a block descriptor at an invalid level */
          goto do_translation_fault;
      }
+-    if (*exp & 1) {
+-        /* scaled = UInt('01':fraction<51:45>) */
+-        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
++    if (increasedprecision) {
++        if (*exp & 1) {
++            /* scaled = UInt('01':fraction<51:42>) */
++            scaled = deposit32(1 << 10, 0, 10, extract64(frac, 42, 10));
++        } else {
++            /* scaled = UInt('1':fraction<51:41>) */
++            scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
++        }
++        estimate = do_recip_sqrt_estimate_incprec(scaled);
+     } else {
+-        /* scaled = UInt('1':fraction<51:44>) */
+-        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
++        if (*exp & 1) {
++            /* scaled = UInt('01':fraction<51:45>) */
++            scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
++        } else {
++            /* scaled = UInt('1':fraction<51:44>) */
++            scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
++        }
++        estimate = do_recip_sqrt_estimate(scaled);
+     }
+-    estimate = do_recip_sqrt_estimate(scaled);
+     *exp = (exp_off - *exp) / 2;
+-    return extract64(estimate, 0, 8) << 44;
++    if (increasedprecision) {
++        return extract64(estimate, 0, 12) << 40;
++    } else {
++        return extract64(estimate, 0, 8) << 44;
++    }
+ }
+ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
+     f64_frac = ((uint64_t) f16_frac) << (52 - 10);
+-    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
++    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac, false);
+     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
+     val = deposit32(0, 15, 1, f16_sign);
+@@ -XXX,XX +XXX,XX @@ static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
+     f64_frac = ((uint64_t) f32_frac) << 29;
+-    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
++    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac, rpres);
+-    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(15) */
++    /*
++     * result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15)
++     * or for increased precision
++     * result = sign : result_exp<7:0> : estimate<11:0> : Zeros(11)
++     */
+     val = deposit32(0, 31, 1, f32_sign);
+     val = deposit32(val, 23, 8, f32_exp);
+-    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
++    if (rpres) {
++        val = deposit32(val, 11, 12, extract64(f64_frac, 52 - 12, 12));
++    } else {
++        val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
++    }
+     return make_float32(val);
+ }
+@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
+         return float64_zero;
+     }
+-    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
++    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac, false);
+     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(44) */
+     val = deposit64(0, 61, 1, f64_sign);
 --
 .34.1
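Note on the integer search in do_recip_sqrt_estimate_incprec(): the loop finds the smallest b for which a * (b + 1)^2 reaches 2^39, then halves with rounding. The sketch below lifts that loop into a standalone function so the boundary inputs can be tried by hand. The function name and the self-check helper are hypothetical and exist only for this note; the explicit uint64_t cast is an addition that makes the unsigned comparison visible, and is assumed not to change the result for these positive values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same steps as do_recip_sqrt_estimate_incprec() in the patch.
 * a is the scaled input, 1024..4095; the result is 4096..8191.
 */
static int recip_sqrt_estimate_incprec(int a)
{
    int64_t b, estimate;

    assert(1024 <= a && a < 4096);
    if (a < 2048) {
        a = a * 2 + 1;          /* lower half of the input range */
    } else {
        a = (a >> 1) << 1;      /* clear the low bit first */
        a = (a + 1) * 2;
    }
    b = 8192;
    /* Stop at the first b where a * (b + 1)^2 is at least 2^39. */
    while ((uint64_t)a * (b + 1) * (b + 1) < (UINT64_C(1) << 39)) {
        b += 1;
    }
    estimate = (b + 1) / 2;
    return (int)estimate;
}

/* Returns 1 if every valid input yields an estimate in 4096..8191. */
static int recip_sqrt_estimate_incprec_selfcheck(void)
{
    for (int a = 1024; a < 4096; a++) {
        int e = recip_sqrt_estimate_incprec(a);

        if (e < 4096 || e >= 8192) {
            return 0;
        }
    }
    return 1;
}
```

The smallest input (1024) gives the largest estimate and the largest input (4095) gives the smallest, which matches the range that the assert() at the end of the patched function enforces.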

-[PULL 14/35] target/arm/ptw: Don't report GPC faults on stage 1 ptw as stage2 faults
+[PULL 55/68] target/arm: Enable FEAT_RPRES for -cpu max
-In S1_ptw_translate() we set up the ARMMMUFaultInfo if the attempt to
+Now that the emulation is complete, we can enable FEAT_RPRES for the 'max'
-translate the page descriptor address into a physical address fails.
+CPU type.
 This used to only be possible if we are doing a stage 2 ptw for that
 descriptor address, and so the code always sets fi->stage2 and
 fi->s1ptw to true.  However, with FEAT_RME it is also possible for
 the lookup of the page descriptor address to fail because of a
 Granule Protection Check fault.  These should not be reported as
 stage 2, otherwise arm_deliver_fault() will incorrectly set
 HPFAR_EL2.  Similarly the s1ptw bit should only be set for stage 2
 faults on stage 1 translation table walks, i.e.  not for GPC faults.
 Add a comment to the other place where we might detect a
 stage2-fault-on-stage-1-ptw, in arm_casq_ptw(), noting why we know in
 that case that it must really be a stage 2 fault and not a GPC fault.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230807141514.19075-3-peter.maydell@linaro.org
 ---
- target/arm/ptw.c | 10 ++++++++--
+ docs/system/arm/emulation.rst | 1 +
-file changed, 8 insertions(+), 2 deletions(-)
+ target/arm/tcg/cpu64.c        | 1 +
 files changed, 2 insertions(+)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/ptw.c
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
-         fi->type = ARMFault_GPCFOnWalk;
+ - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
-     }
+ - FEAT_RME (Realm Management Extension) (NB: support status in QEMU is experimental)
-     fi->s2addr = addr;
+ - FEAT_RNG (Random number generator)
--    fi->stage2 = true;
++- FEAT_RPRES (Increased precision of FRECPE and FRSQRTE)
--    fi->s1ptw = true;
+ - FEAT_S2FWB (Stage 2 forced Write-Back)
-+    fi->stage2 = regime_is_stage2(s2_mmu_idx);
+ - FEAT_SB (Speculation Barrier)
-+    fi->s1ptw = fi->stage2;
+ - FEAT_SEL2 (Secure EL2)
-     fi->s1ns = !is_secure;
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
-     return false;
+index XXXXXXX..XXXXXXX 100644
- }
+--- a/target/arm/tcg/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
++++ b/target/arm/tcg/cpu64.c
-         env->tlb_fi = NULL;
+@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
+     cpu->isar.id_aa64isar1 = t;
-         if (unlikely(flags & TLB_INVALID_MASK)) {
-+            /*
+     t = cpu->isar.id_aa64isar2;
-+             * We know this must be a stage 2 fault because the granule
++    t = FIELD_DP64(t, ID_AA64ISAR2, RPRES, 1);    /* FEAT_RPRES */
-+             * protection table does not separately track read and write
+     t = FIELD_DP64(t, ID_AA64ISAR2, MOPS, 1);     /* FEAT_MOPS */
-+             * permission, so all GPC faults are caught in S1_ptw_translate():
+     t = FIELD_DP64(t, ID_AA64ISAR2, BC, 1);       /* FEAT_HBC */
-+             * we only get here for "readable but not writeable".
+     t = FIELD_DP64(t, ID_AA64ISAR2, WFXT, 2);     /* FEAT_WFxT */
 +             */
              assert(fi->type != ARMFault_None);
              fi->s2addr = ptw->out_virt;
              fi->stage2 = true;
 --
 .34.1
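Note on the ID register update in patch 55: FIELD_DP64(t, ID_AA64ISAR2, RPRES, 1) writes the value 1 into one field of the 64-bit register image and leaves the other fields alone. The sketch below shows that kind of field deposit with local helpers. It assumes RPRES occupies bits [7:4] and WFxT bits [3:0] of ID_AA64ISAR2_EL1, which should be checked against the Arm ARM; field_deposit64(), field_extract64(), enable_rpres() and the ISAR2_* macros are hypothetical names for this note, not QEMU's API.

```c
#include <assert.h>
#include <stdint.h>

/* Replace 'length' bits of 'value' starting at bit 'start' with 'fieldval'. */
static uint64_t field_deposit64(uint64_t value, int start, int length,
                                uint64_t fieldval)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~UINT64_C(0) >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Read back 'length' bits of 'value' starting at bit 'start'. */
static uint64_t field_extract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~UINT64_C(0) >> (64 - length));
}

/* Assumed field layout for this sketch: 4-bit fields, RPRES at bit 4. */
#define ISAR2_RPRES_SHIFT 4
#define ISAR2_FIELD_LEN   4

/* Set the RPRES field to 1 without disturbing neighbouring fields. */
static uint64_t enable_rpres(uint64_t isar2)
{
    return field_deposit64(isar2, ISAR2_RPRES_SHIFT, ISAR2_FIELD_LEN, 1);
}
```

For example, starting from an image whose low field already holds 2 (as the WFXT line in the patch context sets), enable_rpres() yields 0x12: the low field is preserved and only bits [7:4] change.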

-New patch
+[PULL 56/68] target/arm: Introduce CPUARMState.vfp.fp_status[]
+From: Richard Henderson <richard.henderson@linaro.org>
 Move ARMFPStatusFlavour to cpu.h, where it can be used to index
 this array.  For now, place the array in an anonymous
 union with the existing structures.  Adjust the order
 of the existing structures to match the enum.
 Simplify fpstatus_ptr() using the new array.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-7-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu.h           | 119 +++++++++++++++++++++----------------
  target/arm/tcg/translate.h |  64 +-------------------
 files changed, 70 insertions(+), 113 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
  typedef struct NVICState NVICState;
 +/*
 + * Enum for indexing vfp.fp_status[].
 + *
 + * FPST_A32: is the "normal" fp status for AArch32 insns
 + * FPST_A64: is the "normal" fp status for AArch64 insns
 + * FPST_A32_F16: used for AArch32 half-precision calculations
 + * FPST_A64_F16: used for AArch64 half-precision calculations
 + * FPST_STD: the ARM "Standard FPSCR Value"
 + * FPST_STD_F16: used for half-precision
 + *       calculations with the ARM "Standard FPSCR Value"
 + * FPST_AH: used for the A64 insns which change behaviour
 + *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 + *       and the reciprocal and square root estimate/step insns)
 + * FPST_AH_F16: used for the A64 insns which change behaviour
 + *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 + *       and the reciprocal and square root estimate/step insns);
 + *       for half-precision
 + *
 + * Half-precision operations are governed by a separate
 + * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
 + * status structure to control this.
 + *
 + * The "Standard FPSCR", ie default-NaN, flush-to-zero,
 + * round-to-nearest and is used by any operations (generally
 + * Neon) which the architecture defines as controlled by the
 + * standard FPSCR value rather than the FPSCR.
 + *
 + * The "standard FPSCR but for fp16 ops" is needed because
 + * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
 + * using a fixed value for it.
 + *
 + * The ah_fp_status is needed because some insns have different
 + * behaviour when FPCR.AH == 1: they don't update cumulative
 + * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
 + * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 + * which means we need an ah_fp_status_f16 as well.
 + *
 + * To avoid having to transfer exception bits around, we simply
 + * say that the FPSCR cumulative exception flags are the logical
 + * OR of the flags in the four fp statuses. This relies on the
 + * only thing which needs to read the exception flags being
 + * an explicit FPSCR read.
 + */
 +typedef enum ARMFPStatusFlavour {
 +    FPST_A32,
 +    FPST_A64,
 +    FPST_A32_F16,
 +    FPST_A64_F16,
 +    FPST_AH,
 +    FPST_AH_F16,
 +    FPST_STD,
 +    FPST_STD_F16,
 +} ARMFPStatusFlavour;
 +#define FPST_COUNT  8
 +
  typedef struct CPUArchState {
      /* Regs for current mode.  */
      uint32_t regs[16];
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          /* Scratch space for aa32 neon expansion.  */
          uint32_t scratch[8];
 -        /* There are a number of distinct float control structures:
 -         *
 -         *  fp_status_a32: is the "normal" fp status for AArch32 insns
 -         *  fp_status_a64: is the "normal" fp status for AArch64 insns
 -         *  fp_status_fp16_a32: used for AArch32 half-precision calculations
 -         *  fp_status_fp16_a64: used for AArch64 half-precision calculations
 -         *  standard_fp_status : the ARM "Standard FPSCR Value"
 -         *  standard_fp_status_fp16 : used for half-precision
 -         *       calculations with the ARM "Standard FPSCR Value"
 -         *  ah_fp_status: used for the A64 insns which change behaviour
 -         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 -         *       and the reciprocal and square root estimate/step insns)
 -         *  ah_fp_status_f16: used for the A64 insns which change behaviour
 -         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 -         *       and the reciprocal and square root estimate/step insns);
 -         *       for half-precision
 -         *
 -         * Half-precision operations are governed by a separate
 -         * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
 -         * status structure to control this.
 -         *
 -         * The "Standard FPSCR", ie default-NaN, flush-to-zero,
 -         * round-to-nearest and is used by any operations (generally
 -         * Neon) which the architecture defines as controlled by the
 -         * standard FPSCR value rather than the FPSCR.
 -         *
 -         * The "standard FPSCR but for fp16 ops" is needed because
 -         * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
 -         * using a fixed value for it.
 -         *
 -         * The ah_fp_status is needed because some insns have different
 -         * behaviour when FPCR.AH == 1: they don't update cumulative
 -         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
 -         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 -         * which means we need an ah_fp_status_f16 as well.
 -         *
 -         * To avoid having to transfer exception bits around, we simply
 -         * say that the FPSCR cumulative exception flags are the logical
 -         * OR of the flags in the four fp statuses. This relies on the
 -         * only thing which needs to read the exception flags being
 -         * an explicit FPSCR read.
 -         */
 -        float_status fp_status_a32;
 -        float_status fp_status_a64;
 -        float_status fp_status_f16_a32;
 -        float_status fp_status_f16_a64;
 -        float_status standard_fp_status;
 -        float_status standard_fp_status_f16;
 -        float_status ah_fp_status;
 -        float_status ah_fp_status_f16;
 +        /* There are a number of distinct float control structures. */
 +        union {
 +            float_status fp_status[FPST_COUNT];
 +            struct {
 +                float_status fp_status_a32;
 +                float_status fp_status_a64;
 +                float_status fp_status_f16_a32;
 +                float_status fp_status_f16_a64;
 +                float_status ah_fp_status;
 +                float_status ah_fp_status_f16;
 +                float_status standard_fp_status;
 +                float_status standard_fp_status_f16;
 +            };
 +        };
          uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
          uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
 diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate.h
 +++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
      return (CPUARMTBFlags){ tb->flags, tb->cs_base };
  }
 -/*
 - * Enum for argument to fpstatus_ptr().
 - */
 -typedef enum ARMFPStatusFlavour {
 -    FPST_A32,
 -    FPST_A64,
 -    FPST_A32_F16,
 -    FPST_A64_F16,
 -    FPST_AH,
 -    FPST_AH_F16,
 -    FPST_STD,
 -    FPST_STD_F16,
 -} ARMFPStatusFlavour;
 -
  /**
   * fpstatus_ptr: return TCGv_ptr to the specified fp_status field
   *
   * We have multiple softfloat float_status fields in the Arm CPU state struct
   * (see the comment in cpu.h for details). Return a TCGv_ptr which has
   * been set up to point to the requested field in the CPU state struct.
 - * The options are:
 - *
 - * FPST_A32
 - *   for AArch32 non-FP16 operations controlled by the FPCR
 - * FPST_A64
 - *   for AArch64 non-FP16 operations controlled by the FPCR
 - * FPST_A32_F16
 - *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
 - * FPST_A64_F16
 - *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
 - * FPST_AH:
 - *   for AArch64 operations which change behaviour when AH=1 (specifically,
 - *   bfloat16 conversions and multiplies, and the reciprocal and square root
 - *   estimate/step insns)
 - * FPST_AH_F16:
 - *   ditto, but for half-precision operations
 - * FPST_STD
 - *   for A32/T32 Neon operations using the "standard FPSCR value"
 - * FPST_STD_F16
 - *   as FPST_STD, but where FPCR.FZ16 is to be used
   */
  static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
  {
      TCGv_ptr statusptr = tcg_temp_new_ptr();
 -    int offset;
 +    int offset = offsetof(CPUARMState, vfp.fp_status[flavour]);
 -    switch (flavour) {
 -    case FPST_A32:
 -        offset = offsetof(CPUARMState, vfp.fp_status_a32);
 -        break;
 -    case FPST_A64:
 -        offset = offsetof(CPUARMState, vfp.fp_status_a64);
 -        break;
 -    case FPST_A32_F16:
 -        offset = offsetof(CPUARMState, vfp.fp_status_f16_a32);
 -        break;
 -    case FPST_A64_F16:
 -        offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
 -        break;
 -    case FPST_AH:
 -        offset = offsetof(CPUARMState, vfp.ah_fp_status);
 -        break;
 -    case FPST_AH_F16:
 -        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
 -        break;
 -    case FPST_STD:
 -        offset = offsetof(CPUARMState, vfp.standard_fp_status);
 -        break;
 -    case FPST_STD_F16:
 -        offset = offsetof(CPUARMState, vfp.standard_fp_status_f16);
 -        break;
 -    default:
 -        g_assert_not_reached();
 -    }
      tcg_gen_addi_ptr(statusptr, tcg_env, offset);
      return statusptr;
  }
 --
 .34.1
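Note on the layout trick in patch 56: placing the fp_status[] array in an anonymous union with the named fields only works if the named fields appear in the same order as the enum, which is why the patch reorders them. The sketch below reproduces the idea with a stand-in type and checks the aliasing at compile time. float_status_stub, VFPStateSketch and fp_status_offset() are hypothetical names for this note; the real float_status is larger and the real offset is taken relative to CPUARMState.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for softfloat's float_status; the contents do not matter here. */
typedef struct float_status_stub {
    int rounding_mode;
    int exception_flags;
} float_status_stub;

typedef enum FPStatusFlavour {
    FPST_A32,
    FPST_A64,
    FPST_A32_F16,
    FPST_A64_F16,
    FPST_AH,
    FPST_AH_F16,
    FPST_STD,
    FPST_STD_F16,
} FPStatusFlavour;
#define FPST_COUNT 8

typedef struct VFPStateSketch {
    union {
        float_status_stub fp_status[FPST_COUNT];
        struct {
            /* Order must match the enum above. */
            float_status_stub fp_status_a32;
            float_status_stub fp_status_a64;
            float_status_stub fp_status_f16_a32;
            float_status_stub fp_status_f16_a64;
            float_status_stub ah_fp_status;
            float_status_stub ah_fp_status_f16;
            float_status_stub standard_fp_status;
            float_status_stub standard_fp_status_f16;
        };
    };
} VFPStateSketch;

/* Compile-time check that a named field aliases its array slot. */
_Static_assert(offsetof(VFPStateSketch, standard_fp_status) ==
               offsetof(VFPStateSketch, fp_status) +
               FPST_STD * sizeof(float_status_stub),
               "named fields must follow enum order");

/* Offset of one flavour's status, replacing a switch over the enum. */
static size_t fp_status_offset(FPStatusFlavour flavour)
{
    assert((int)flavour >= 0 && (int)flavour < FPST_COUNT);
    return offsetof(VFPStateSketch, fp_status) +
           (size_t)flavour * sizeof(float_status_stub);
}
```

The sketch computes the offset as base plus index times element size instead of passing a runtime index to offsetof(), purely to stay within standard C; the simplification in the patch relies on the same layout property.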

-New patch
+[PULL 57/68] target/arm: Remove standard_fp_status_f16
+From: Richard Henderson <richard.henderson@linaro.org>
+Replace with fp_status[FPST_STD_F16].
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20250129013857.135256-8-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/cpu.h            |  1 -
+ target/arm/cpu.c            |  4 ++--
+ target/arm/tcg/mve_helper.c | 24 ++++++++++++------------
+ target/arm/vfp_helper.c     |  8 ++++----
+files changed, 18 insertions(+), 19 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
+                 float_status ah_fp_status;
+                 float_status ah_fp_status_f16;
+                 float_status standard_fp_status;
+-                float_status standard_fp_status_f16;
+             };
+         };
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
+     set_flush_to_zero(1, &env->vfp.standard_fp_status);
+     set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
+     set_default_nan_mode(1, &env->vfp.standard_fp_status);
+-    set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
++    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
+     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
+-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
++    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
+     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
+     set_flush_to_zero(1, &env->vfp.ah_fp_status);
+     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
+diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/mve_helper.c
++++ b/target/arm/tcg/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+                 &env->vfp.standard_fp_status;                           \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
+                 r[e] = 0;                                               \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+                 &env->vfp.standard_fp_status;                           \
+             if (!(tm & 1)) {                                            \
+                 /* We need the result but without updating flags */     \
+@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+                 &env->vfp.standard_fp_status;                           \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
+                 continue;                                               \
+             }                                                           \
+-            fpst0 = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :   \
++            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
+                 &env->vfp.standard_fp_status;                           \
+             fpst1 = fpst0;                                              \
+             if (!(mask & 1)) {                                          \
+@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+                 &env->vfp.standard_fp_status;                           \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+                 &env->vfp.standard_fp_status;                           \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
+         TYPE *m = vm;                                           \
+         TYPE ra = (TYPE)ra_in;                                  \
+         float_status *fpst = (ESIZE == 2) ?                     \
+-            &env->vfp.standard_fp_status_f16 :                  \
++            &env->vfp.fp_status[FPST_STD_F16] :                 \
+             &env->vfp.standard_fp_status;                       \
+         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
+             if (mask & 1) {                                     \
+@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
+             if ((mask & emask) == 0) {                                  \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+                 &env->vfp.standard_fp_status;                           \
+             if (!(mask & (1 << (e * ESIZE)))) {                         \
+                 /* We need the result but without updating flags */     \
+@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
+             if ((mask & emask) == 0) {                                  \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+                 &env->vfp.standard_fp_status;                           \
+             if (!(mask & (1 << (e * ESIZE)))) {                         \
+                 /* We need the result but without updating flags */     \
+@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+                 &env->vfp.standard_fp_status;                           \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
+         float_status *fpst;                                             \
+         float_status scratch_fpst;                                      \
+         float_status *base_fpst = (ESIZE == 2) ?                        \
+-            &env->vfp.standard_fp_status_f16 :                          \
++            &env->vfp.fp_status[FPST_STD_F16] :                         \
+             &env->vfp.standard_fp_status;                               \
+         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
+         set_float_rounding_mode(rmode, base_fpst);                      \
+@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+                 &env->vfp.standard_fp_status;                           \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     /* FZ16 does not generate an input denormal exception.  */
+     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
+           & ~float_flag_input_denormal_flushed);
+-    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
++    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
+           & ~float_flag_input_denormal_flushed);
+     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
+     set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
+     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
+     set_float_exception_flags(0, &env->vfp.standard_fp_status);
+-    set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
++    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
+     set_float_exception_flags(0, &env->vfp.ah_fp_status);
+     set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
+ }
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
+         bool ftz_enabled = val & FPCR_FZ16;
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
+-        set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
+         set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
+     }
+     if (changed & FPCR_FZ) {
+--
+.34.1
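
Note on the fp_status[] renames in this part of the series: the patches swap individually named float_status fields (standard_fp_status, standard_fp_status_f16, ah_fp_status_f16) for entries of one array indexed by an FPST_* constant. The following is a minimal sketch of that pattern, not the QEMU definitions: the float_status stand-in, the three-entry enum, the VFPState wrapper and both helper functions are hypothetical and exist only for illustration. The loop helper in particular does not come from the series; it only shows what an array makes possible.

```c
#include <stdbool.h>

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    bool flush_to_zero;
    bool default_nan_mode;
    unsigned exception_flags;
} float_status;

/* Hypothetical subset of the flavour constants; the real list is longer. */
typedef enum {
    FPST_STD,
    FPST_STD_F16,
    FPST_AH_F16,
    FPST_COUNT
} FPStFlavour;

/* Hypothetical wrapper: one array slot per flavour instead of one field each. */
typedef struct {
    float_status fp_status[FPST_COUNT];
} VFPState;

/*
 * Same selection the MVE helper macros make: 16-bit elements use the
 * half-precision "standard" status, everything else uses FPST_STD.
 */
static float_status *mve_fpst(VFPState *vfp, int esize)
{
    return esize == 2 ? &vfp->fp_status[FPST_STD_F16]
                      : &vfp->fp_status[FPST_STD];
}

/* Illustration only: with an array, every flavour can be visited in a loop. */
static void clear_all_exc_flags(VFPState *vfp)
{
    for (int i = 0; i < FPST_COUNT; i++) {
        vfp->fp_status[i].exception_flags = 0;
    }
}
```

Call sites that used to name a field directly, such as &env->vfp.standard_fp_status, then become an index expression, which is the mechanical change that each hunk in these patches makes.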

-[PULL 33/35] target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
+[PULL 58/68] target/arm: Remove standard_fp_status
-From: Jean-Philippe Brucker <jean-philippe@linaro.org>
+From: Richard Henderson <richard.henderson@linaro.org>
-When FEAT_RME is implemented, these bits override the value of
+Replace with fp_status[FPST_STD].
-CNT[VP]_CTL_EL0.IMASK in Realm and Root state. Move the IRQ state update
-into a new gt_update_irq() function and test those bits every time we
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-recompute the IRQ state.
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20250129013857.135256-9-richard.henderson@linaro.org
 Since we're removing the IRQ state from some trace events, add a new
 trace event for gt_update_irq().
 Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
 Message-id: 20230809123706.1842548-7-jean-philippe@linaro.org
 [PMM: only register change hook if not USER_ONLY and if TCG]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h        |  4 +++
+ target/arm/cpu.h            |  1 -
- target/arm/cpu.c        |  6 ++++
+ target/arm/cpu.c            |  8 ++++----
- target/arm/helper.c     | 65 ++++++++++++++++++++++++++++++++++-------
+ target/arm/tcg/mve_helper.c | 28 ++++++++++++++--------------
- target/arm/trace-events |  7 +++--
+ target/arm/tcg/vec_helper.c |  4 ++--
-files changed, 68 insertions(+), 14 deletions(-)
+ target/arm/vfp_helper.c     |  4 ++--
 files changed, 22 insertions(+), 23 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
- };
+                 float_status fp_status_f16_a64;
+                 float_status ah_fp_status;
- unsigned int gt_cntfrq_period_ns(ARMCPU *cpu);
+                 float_status ah_fp_status_f16;
-+void gt_rme_post_el_change(ARMCPU *cpu, void *opaque);
+-                float_status standard_fp_status;
+             };
- void arm_cpu_post_init(Object *obj);
+         };
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
  #define HSTR_TTEE (1 << 16)
  #define HSTR_TJDBX (1 << 17)
 +#define CNTHCTL_CNTVMASK      (1 << 18)
 +#define CNTHCTL_CNTPMASK      (1 << 19)
 +
  /* Return the current FPSCR value.  */
  uint32_t vfp_get_fpscr(CPUARMState *env);
  void vfp_set_fpscr(CPUARMState *env, uint32_t val);
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-         set_feature(env, ARM_FEATURE_VBAR);
+         env->sau.ctrl = 0;
      }
-+#ifndef CONFIG_USER_ONLY
+-    set_flush_to_zero(1, &env->vfp.standard_fp_status);
-+    if (tcg_enabled() && cpu_isar_feature(aa64_rme, cpu)) {
+-    set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
-+        arm_register_el_change_hook(cpu, &gt_rme_post_el_change, 0);
+-    set_default_nan_mode(1, &env->vfp.standard_fp_status);
-+    }
++    set_flush_to_zero(1, &env->vfp.fp_status[FPST_STD]);
-+#endif
++    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
-+
++    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
-     register_cp_regs_for_features(cpu);
+     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
-     arm_cpu_register_gdb_regs_for_features(cpu);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
-index XXXXXXX..XXXXXXX 100644
++    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
---- a/target/arm/helper.c
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
-+++ b/target/arm/helper.c
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
-@@ -XXX,XX +XXX,XX @@ static uint64_t gt_get_countervalue(CPUARMState *env)
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
-     return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) / gt_cntfrq_period_ns(cpu);
+diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/mve_helper.c
 +++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(tm & 1)) {                                            \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
                  continue;                                               \
              }                                                           \
              fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              fpst1 = fpst0;                                              \
              if (!(mask & 1)) {                                          \
                  scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
          TYPE ra = (TYPE)ra_in;                                  \
          float_status *fpst = (ESIZE == 2) ?                     \
              &env->vfp.fp_status[FPST_STD_F16] :                 \
 -            &env->vfp.standard_fp_status;                       \
 +            &env->vfp.fp_status[FPST_STD];                       \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
              if (mask & 1) {                                     \
                  TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
          float_status scratch_fpst;                                      \
          float_status *base_fpst = (ESIZE == 2) ?                        \
              &env->vfp.fp_status[FPST_STD_F16] :                         \
 -            &env->vfp.standard_fp_status;                               \
 +            &env->vfp.fp_status[FPST_STD];                               \
          uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
          set_float_rounding_mode(rmode, base_fpst);                      \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_sh(CPUARMState *env, void *vd, void *vm, int top)
      unsigned e;
      float_status *fpst;
      float_status scratch_fpst;
 -    float_status *base_fpst = &env->vfp.standard_fp_status;
 +    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
      bool old_fz = get_flush_to_zero(base_fpst);
      set_flush_to_zero(false, base_fpst);
      for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_hs(CPUARMState *env, void *vd, void *vm, int top)
      unsigned e;
      float_status *fpst;
      float_status scratch_fpst;
 -    float_status *base_fpst = &env->vfp.standard_fp_status;
 +    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
      bool old_fiz = get_flush_inputs_to_zero(base_fpst);
      set_flush_inputs_to_zero(false, base_fpst);
      for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 -    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
 +    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
-+static void gt_update_irq(ARMCPU *cpu, int timeridx)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
-+{
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-+    CPUARMState *env = &cpu->env;
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
-+    uint64_t cnthctl = env->cp15.cnthctl_el2;
-+    ARMSecuritySpace ss = arm_security_space(env);
+-    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
-+    /* ISTATUS && !IMASK */
++    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-+    int irqstate = (env->cp15.c14_timer[timeridx].ctl & 6) == 4;
+                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 +
 +    /*
 +     * If bit CNTHCTL_EL2.CNT[VP]MASK is set, it overrides IMASK.
 +     * It is RES0 in Secure and NonSecure state.
 +     */
 +    if ((ss == ARMSS_Root || ss == ARMSS_Realm) &&
 +        ((timeridx == GTIMER_VIRT && (cnthctl & CNTHCTL_CNTVMASK)) ||
 +         (timeridx == GTIMER_PHYS && (cnthctl & CNTHCTL_CNTPMASK)))) {
 +        irqstate = 0;
 +    }
 +
 +    qemu_set_irq(cpu->gt_timer_outputs[timeridx], irqstate);
 +    trace_arm_gt_update_irq(timeridx, irqstate);
 +}
 +
 +void gt_rme_post_el_change(ARMCPU *cpu, void *ignored)
 +{
 +    /*
 +     * Changing security state between Root and Secure/NonSecure, which may
 +     * happen when switching EL, can change the effective value of CNTHCTL_EL2
 +     * mask bits. Update the IRQ state accordingly.
 +     */
 +    gt_update_irq(cpu, GTIMER_VIRT);
 +    gt_update_irq(cpu, GTIMER_PHYS);
 +}
 +
  static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
  {
      ARMGenericTimer *gt = &cpu->env.cp15.c14_timer[timeridx];
@@ -XXX,XX +XXX,XX @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
          /* Note that this must be unsigned 64 bit arithmetic: */
          int istatus = count - offset >= gt->cval;
          uint64_t nexttick;
 -        int irqstate;
          gt->ctl = deposit32(gt->ctl, 2, 1, istatus);
 -        irqstate = (istatus && !(gt->ctl & 2));
 -        qemu_set_irq(cpu->gt_timer_outputs[timeridx], irqstate);
 -
          if (istatus) {
              /* Next transition is when count rolls back over to zero */
              nexttick = UINT64_MAX;
@@ -XXX,XX +XXX,XX @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
          } else {
              timer_mod(cpu->gt_timer[timeridx], nexttick);
          }
 -        trace_arm_gt_recalc(timeridx, irqstate, nexttick);
 +        trace_arm_gt_recalc(timeridx, nexttick);
      } else {
          /* Timer disabled: ISTATUS and timer output always clear */
          gt->ctl &= ~4;
 -        qemu_set_irq(cpu->gt_timer_outputs[timeridx], 0);
          timer_del(cpu->gt_timer[timeridx]);
          trace_arm_gt_recalc_disabled(timeridx);
      }
 +    gt_update_irq(cpu, timeridx);
  }
- static void gt_timer_reset(CPUARMState *env, const ARMCPRegInfo *ri,
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static void gt_ctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
+index XXXXXXX..XXXXXXX 100644
-          * IMASK toggled: don't need to recalculate,
+--- a/target/arm/vfp_helper.c
-          * just set the interrupt line based on ISTATUS
++++ b/target/arm/vfp_helper.c
-          */
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
--        int irqstate = (oldval & 4) && !(value & 2);
+     uint32_t a32_flags = 0, a64_flags = 0;
--
--        trace_arm_gt_imask_toggle(timeridx, irqstate);
+     a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
--        qemu_set_irq(cpu->gt_timer_outputs[timeridx], irqstate);
+-    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
-+        trace_arm_gt_imask_toggle(timeridx);
++    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
-+        gt_update_irq(cpu, timeridx);
+     /* FZ16 does not generate an input denormal exception.  */
-     }
+     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
- }
+           & ~float_flag_input_denormal_flushed);
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
-@@ -XXX,XX +XXX,XX @@ static void gt_virt_ctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
+     set_float_exception_flags(0, &env->vfp.fp_status_a64);
-     gt_ctl_write(env, ri, GTIMER_VIRT, value);
+     set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
- }
+     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
+-    set_float_exception_flags(0, &env->vfp.standard_fp_status);
-+static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
++    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
-+                             uint64_t value)
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
-+{
+     set_float_exception_flags(0, &env->vfp.ah_fp_status);
-+    ARMCPU *cpu = env_archcpu(env);
+     set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 +    uint32_t oldval = env->cp15.cnthctl_el2;
 +
 +    raw_write(env, ri, value);
 +
 +    if ((oldval ^ value) & CNTHCTL_CNTVMASK) {
 +        gt_update_irq(cpu, GTIMER_VIRT);
 +    } else if ((oldval ^ value) & CNTHCTL_CNTPMASK) {
 +        gt_update_irq(cpu, GTIMER_PHYS);
 +    }
 +}
 +
  static void gt_cntvoff_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                uint64_t value)
  {
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
         * reset values as IMPDEF. We choose to reset to 3 to comply with
         * both ARMv7 and ARMv8.
         */
 -      .access = PL2_RW, .resetvalue = 3,
 +      .access = PL2_RW, .type = ARM_CP_IO, .resetvalue = 3,
 +      .writefn = gt_cnthctl_write, .raw_writefn = raw_write,
        .fieldoffset = offsetof(CPUARMState, cp15.cnthctl_el2) },
      { .name = "CNTVOFF_EL2", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 4, .crn = 14, .crm = 0, .opc2 = 3,
 diff --git a/target/arm/trace-events b/target/arm/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/trace-events
 +++ b/target/arm/trace-events
@@ -XXX,XX +XXX,XX @@
  # See docs/devel/tracing.rst for syntax documentation.
  # helper.c
 -arm_gt_recalc(int timer, int irqstate, uint64_t nexttick) "gt recalc: timer %d irqstate %d next tick 0x%" PRIx64
 -arm_gt_recalc_disabled(int timer) "gt recalc: timer %d irqstate 0 timer disabled"
 +arm_gt_recalc(int timer, uint64_t nexttick) "gt recalc: timer %d next tick 0x%" PRIx64
 +arm_gt_recalc_disabled(int timer) "gt recalc: timer %d timer disabled"
  arm_gt_cval_write(int timer, uint64_t value) "gt_cval_write: timer %d value 0x%" PRIx64
  arm_gt_tval_write(int timer, uint64_t value) "gt_tval_write: timer %d value 0x%" PRIx64
  arm_gt_ctl_write(int timer, uint64_t value) "gt_ctl_write: timer %d value 0x%" PRIx64
 -arm_gt_imask_toggle(int timer, int irqstate) "gt_ctl_write: timer %d IMASK toggle, new irqstate %d"
 +arm_gt_imask_toggle(int timer) "gt_ctl_write: timer %d IMASK toggle"
  arm_gt_cntvoff_write(uint64_t value) "gt_cntvoff_write: value 0x%" PRIx64
 +arm_gt_update_irq(int timer, int irqstate) "gt_update_irq: timer %d irqstate %d"
  # kvm.c
  kvm_arm_fixup_msi_route(uint64_t iova, uint64_t gpa) "MSI iova = 0x%"PRIx64" is translated into 0x%"PRIx64
 --
 .34.1
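
Note on the older CNTHCTL_EL2.CNT[VP]MASK patch whose lines appear on the "-" side of the comparison above: its gt_update_irq() computes the timer output as ISTATUS && !IMASK, then forces it low when the matching CNTHCTL_EL2 mask bit is set and the CPU is in Root or Realm state. A standalone sketch of that decision follows. The two CNTHCTL_* bit positions are copied from the patch; the SecuritySpace enum, the TIMER_* indices and the function name are hypothetical stand-ins for the QEMU types.

```c
#include <stdbool.h>
#include <stdint.h>

#define CNTHCTL_CNTVMASK (1u << 18)
#define CNTHCTL_CNTPMASK (1u << 19)

/* Hypothetical stand-ins for GTIMER_PHYS / GTIMER_VIRT. */
enum { TIMER_PHYS, TIMER_VIRT };

/* Hypothetical stand-in for ARMSecuritySpace. */
typedef enum { SS_SECURE, SS_NONSECURE, SS_ROOT, SS_REALM } SecuritySpace;

static bool timer_irq_level(uint32_t ctl, uint64_t cnthctl,
                            SecuritySpace ss, int timeridx)
{
    /* CTL bit 2 is ISTATUS, bit 1 is IMASK: asserted only if ISTATUS && !IMASK. */
    bool level = (ctl & 6) == 4;

    /* The CNTHCTL_EL2 mask bits override IMASK, but only in Root and Realm. */
    if ((ss == SS_ROOT || ss == SS_REALM) &&
        ((timeridx == TIMER_VIRT && (cnthctl & CNTHCTL_CNTVMASK)) ||
         (timeridx == TIMER_PHYS && (cnthctl & CNTHCTL_CNTPMASK)))) {
        level = false;
    }
    return level;
}
```

Because the result depends on the security state as well as on the register values, the patch also re-evaluates both timers from an EL-change hook (gt_rme_post_el_change) rather than only on register writes.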

-New patch
+[PULL 59/68] target/arm: Remove ah_fp_status_f16
+From: Richard Henderson <richard.henderson@linaro.org>
+Replace with fp_status[FPST_AH_F16].
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20250129013857.135256-10-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/cpu.h        |  3 +--
+ target/arm/cpu.c        |  2 +-
+ target/arm/vfp_helper.c | 10 +++++-----
+files changed, 7 insertions(+), 8 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
+  * behaviour when FPCR.AH == 1: they don't update cumulative
+  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
+  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
+- * which means we need an ah_fp_status_f16 as well.
++ * which means we need an FPST_AH_F16 as well.
+  *
+  * To avoid having to transfer exception bits around, we simply
+  * say that the FPSCR cumulative exception flags are the logical
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
+                 float_status fp_status_f16_a32;
+                 float_status fp_status_f16_a64;
+                 float_status ah_fp_status;
+-                float_status ah_fp_status_f16;
+             };
+         };
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
+     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
+     set_flush_to_zero(1, &env->vfp.ah_fp_status);
+     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
+-    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
++    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
+ #ifndef CONFIG_USER_ONLY
+     if (kvm_enabled()) {
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
+     /*
+-     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
++     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
+      * they are used for insns that must not set the cumulative exception bits.
+      */
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
+     set_float_exception_flags(0, &env->vfp.ah_fp_status);
+-    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
++    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
+ }
+ static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
+-        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
+     }
+     if (changed & FPCR_FZ) {
+         bool ftz_enabled = val & FPCR_FZ;
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
+         set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
+-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
++        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
+     }
+     if (changed & FPCR_AH) {
+         bool ah_enabled = val & FPCR_AH;
+--
+.34.1
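
Note on the pattern used by these "Remove ..." patches: the cpu.h hunks show fp_status[FPST_COUNT] sharing storage with an anonymous struct of named float_status fields, so each patch can drop the trailing named field and switch its users to the indexed form. The following is a minimal sketch of that aliasing, not the QEMU definitions. The float_status stand-in, the VFPSketch type, the helper names and the exact enum ordering are assumptions made for illustration; only the FPST_* names and the field names come from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for softfloat's float_status (assumption: the real type is larger). */
typedef struct {
    bool flush_to_zero;
    unsigned exception_flags;
} float_status;

/* Assumed ordering; it must match the order of the named fields below. */
typedef enum {
    FPST_A32,
    FPST_A64,
    FPST_A32_F16,
    FPST_A64_F16,
    FPST_AH,
    FPST_AH_F16,
    FPST_STD,
    FPST_STD_F16,
    FPST_COUNT
} FPSTSketchIndex; /* hypothetical type name */

/* Hypothetical container: array and named fields overlay the same storage. */
typedef struct {
    union {
        float_status fp_status[FPST_COUNT];
        struct {
            float_status fp_status_a32;
            float_status fp_status_a64;
            /* Fields already converted to fp_status[FPST_*] are removed
             * from the end, so the remaining ones keep their offsets. */
        };
    };
} VFPSketch;

/* The named field must sit exactly where the indexed entry does. */
static_assert(offsetof(VFPSketch, fp_status_a64) ==
              FPST_A64 * sizeof(float_status),
              "named field and array entry must alias");

/* Write through the old name, read back through the new index. */
static bool named_field_aliases_array(void)
{
    VFPSketch v = {0};
    v.fp_status_a64.flush_to_zero = true;
    return v.fp_status[FPST_A64].flush_to_zero;
}
```

Because both spellings refer to the same storage, a caller can be converted one field at a time without changing behaviour, which is consistent with each patch in this run touching a single field.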

-New patch
+[PULL 60/68] target/arm: Remove ah_fp_status
+From: Richard Henderson <richard.henderson@linaro.org>
+Replace with fp_status[FPST_AH].
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20250129013857.135256-11-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/cpu.h        | 3 +--
+ target/arm/cpu.c        | 6 +++---
+ target/arm/vfp_helper.c | 6 +++---
+files changed, 7 insertions(+), 8 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
+  * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
+  * using a fixed value for it.
+  *
+- * The ah_fp_status is needed because some insns have different
++ * FPST_AH is needed because some insns have different
+  * behaviour when FPCR.AH == 1: they don't update cumulative
+  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
+  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
+                 float_status fp_status_a64;
+                 float_status fp_status_f16_a32;
+                 float_status fp_status_f16_a64;
+-                float_status ah_fp_status;
+             };
+         };
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
+-    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
+-    set_flush_to_zero(1, &env->vfp.ah_fp_status);
+-    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
++    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
++    set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]);
++    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_AH]);
+     arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
+ #ifndef CONFIG_USER_ONLY
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
+     /*
+-     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
++     * We do not merge in flags from FPST_AH or FPST_AH_F16, because
+      * they are used for insns that must not set the cumulative exception bits.
+      */
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
+     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
+-    set_float_exception_flags(0, &env->vfp.ah_fp_status);
++    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH]);
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
+ }
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
+-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
++        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
+     }
+     if (changed & FPCR_AH) {
+--
+.34.1

-[PULL 10/35] accel/kvm: Use negative KVM type for error propagation
+[PULL 61/68] target/arm: Remove fp_status_f16_a64
-From: Akihiko Odaki <akihiko.odaki@daynix.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-On MIPS, kvm_arch_get_default_type() returns a negative value when an
+Replace with fp_status[FPST_A64_F16].
-error occurred so handle the case. Also, let other machines return
-negative values when errors occur and declare returning a negative
-value as the correct way to propagate an error that happened when
-determining KVM type.
-Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230727073134.134102-5-akihiko.odaki@daynix.com
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20250129013857.135256-12-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 ---
- accel/kvm/kvm-all.c | 5 +++++
+ target/arm/cpu.h            |  1 -
- hw/arm/virt.c       | 2 +-
+ target/arm/cpu.c            |  2 +-
- hw/ppc/spapr.c      | 2 +-
+ target/arm/tcg/sme_helper.c |  2 +-
-files changed, 7 insertions(+), 2 deletions(-)
+ target/arm/tcg/vec_helper.c |  9 ++++-----
  target/arm/vfp_helper.c     | 16 ++++++++--------
 files changed, 14 insertions(+), 16 deletions(-)
-diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/accel/kvm/kvm-all.c
+--- a/target/arm/cpu.h
-+++ b/accel/kvm/kvm-all.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static int kvm_init(MachineState *ms)
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-         type = kvm_arch_get_default_type(ms);
+                 float_status fp_status_a32;
                  float_status fp_status_a64;
                  float_status fp_status_f16_a32;
 -                float_status fp_status_f16_a64;
              };
          };
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
 -    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
      arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
      set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]);
 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/sme_helper.c
 +++ b/target/arm/tcg/sme_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
       * produces default NaNs. We also need a second copy of fp_status with
       * round-to-odd -- see above.
       */
 -    fpst_f16 = env->vfp.fp_status_f16_a64;
 +    fpst_f16 = env->vfp.fp_status[FPST_A64_F16];
      fpst_std = env->vfp.fp_status_a64;
      set_default_nan_mode(true, &fpst_std);
      set_default_nan_mode(true, &fpst_f16);
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
          }
      }
+     do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
-+    if (type < 0) {
+-             get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
-+        ret = -EINVAL;
++             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
-+        goto err;
+ }
-+    }
-+
+ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-     do {
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-         ret = kvm_ioctl(s, KVM_CREATE_VM, type);
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-     } while (ret == -EINTR);
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-diff --git a/hw/arm/virt.c b/hw/arm/virt.c
+     float_status *status = &env->vfp.fp_status_a64;
 -    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
 +    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
      int negx = 0, negf = 0;
      if (is_s) {
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
          }
      }
      do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
 -                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 +                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
  }
  void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
      intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
      float_status *status = &env->vfp.fp_status_a64;
 -    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
 +    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
      int negx = 0, negf = 0;
      if (is_s) {
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
              negx = 0x8000;
          }
      }
 -
      for (i = 0; i < oprsz; i += 16) {
          float16 mm_16 = *(float16 *)(vm + i + idx);
          float32 mm = float16_to_float32_by_bits(mm_16, fz16);
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/virt.c
+--- a/target/arm/vfp_helper.c
-+++ b/hw/arm/virt.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
-                      "require an IPA range (%d bits) larger than "
+           & ~float_flag_input_denormal_flushed);
-                      "the one supported by the host (%d bits)",
-                      requested_pa_size, max_vm_pa_size);
+     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
--        exit(1);
+-    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
-+        return -1;
++    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A64_F16])
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
      /*
       * We do not merge in flags from FPST_AH or FPST_AH_F16, because
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_a32);
      set_float_exception_flags(0, &env->vfp.fp_status_a64);
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
 -    set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_float_rounding_mode(i, &env->vfp.fp_status_a32);
          set_float_rounding_mode(i, &env->vfp.fp_status_a64);
          set_float_rounding_mode(i, &env->vfp.fp_status_f16_a32);
 -        set_float_rounding_mode(i, &env->vfp.fp_status_f16_a64);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
      }
      if (changed & FPCR_FZ16) {
          bool ftz_enabled = val & FPCR_FZ16;
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
 -        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          if (ah_enabled) {
              /* Change behaviours for A64 FP operations */
              arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
 -            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +            arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          } else {
              arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 -            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +            arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          }
      }
      /*
-      * We return the requested PA log size, unless KVM only supports
-diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/ppc/spapr.c
-+++ b/hw/ppc/spapr.c
-@@ -XXX,XX +XXX,XX @@ static int spapr_kvm_type(MachineState *machine, const char *vm_type)
-     }
-     error_report("Unknown kvm-type specified '%s'", vm_type);
--    exit(1);
-+    return -1;
- }
- /*
 --
 .34.1
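
Note on the flag merge in vfp_get_fpsr_from_host(): per the comments in the hunks above, the A64 half-precision status contributes its exception flags with the input-denormal bits masked off, and the FPST_AH / FPST_AH_F16 entries are not merged at all because their insns must not set the cumulative exception bits. A minimal sketch of that rule follows. The flag values, the ST_* indices and merge_a64_flags() are hypothetical stand-ins, not softfloat or QEMU definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits; softfloat's real float_flag_* values differ. */
enum {
    FLAG_INVALID                = 1u << 0,
    FLAG_INEXACT                = 1u << 1,
    FLAG_INPUT_DENORMAL_FLUSHED = 1u << 2,
    FLAG_INPUT_DENORMAL_USED    = 1u << 3
};

/* Hypothetical indices mirroring the A64-side FPST_* entries. */
enum { ST_A64, ST_A64_F16, ST_AH, ST_AH_F16, ST_COUNT };

/* Merge per-status exception flags into the cumulative A64 view. */
static uint32_t merge_a64_flags(const uint32_t flags[ST_COUNT])
{
    uint32_t merged = flags[ST_A64];

    /* Half-precision status: drop the input-denormal bits before merging. */
    merged |= flags[ST_A64_F16] &
              ~(uint32_t)(FLAG_INPUT_DENORMAL_FLUSHED |
                          FLAG_INPUT_DENORMAL_USED);

    /* ST_AH and ST_AH_F16 are deliberately skipped: they back insns that
     * must not update the cumulative exception bits. */
    return merged;
}
```

With this shape, flags raised only on the AH entries never reach the merged result, while ordinary A64 flags pass through unchanged.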

-[PULL 12/35] accel/kvm: Make kvm_dirty_ring_reaper_init() void
+[PULL 62/68] target/arm: Remove fp_status_f16_a32
-From: Akihiko Odaki <akihiko.odaki@daynix.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-The returned value was always zero and had no meaning.
+Replace with fp_status[FPST_A32_F16].
-Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230727073134.134102-7-akihiko.odaki@daynix.com
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20250129013857.135256-13-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 ---
- accel/kvm/kvm-all.c | 9 ++-------
+ target/arm/cpu.h            |  1 -
-file changed, 2 insertions(+), 7 deletions(-)
+ target/arm/cpu.c            |  2 +-
  target/arm/tcg/vec_helper.c |  4 ++--
  target/arm/vfp_helper.c     | 14 +++++++-------
 files changed, 10 insertions(+), 11 deletions(-)
-diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/accel/kvm/kvm-all.c
+--- a/target/arm/cpu.h
-+++ b/accel/kvm/kvm-all.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static void *kvm_dirty_ring_reaper_thread(void *data)
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-     return NULL;
+             struct {
                  float_status fp_status_a32;
                  float_status fp_status_a64;
 -                float_status fp_status_f16_a32;
              };
          };
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
 -    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
      arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
      do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -             get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 +             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
  }
--static int kvm_dirty_ring_reaper_init(KVMState *s)
+ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
-+static void kvm_dirty_ring_reaper_init(KVMState *s)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
- {
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
-     struct KVMDirtyRingReaper *r = &s->reaper;
+     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-     qemu_thread_create(&r->reaper_thr, "kvm-reaper",
+-                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
-                        kvm_dirty_ring_reaper_thread,
++                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
-                        s, QEMU_THREAD_JOINABLE);
--
--    return 0;
  }
- static int kvm_dirty_ring_init(KVMState *s)
+ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-@@ -XXX,XX +XXX,XX @@ static int kvm_init(MachineState *ms)
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
      a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
      /* FZ16 does not generate an input denormal exception.  */
 -    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
 +    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A32_F16])
            & ~float_flag_input_denormal_flushed);
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
            & ~float_flag_input_denormal_flushed);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
       */
      set_float_exception_flags(0, &env->vfp.fp_status_a32);
      set_float_exception_flags(0, &env->vfp.fp_status_a64);
 -    set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          }
          set_float_rounding_mode(i, &env->vfp.fp_status_a32);
          set_float_rounding_mode(i, &env->vfp.fp_status_a64);
 -        set_float_rounding_mode(i, &env->vfp.fp_status_f16_a32);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
      }
+     if (changed & FPCR_FZ16) {
-     if (s->kvm_dirty_ring_size) {
+         bool ftz_enabled = val & FPCR_FZ16;
--        ret = kvm_dirty_ring_reaper_init(s);
+-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
--        if (ret) {
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32_F16]);
--            goto err;
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
--        }
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-+        kvm_dirty_ring_reaper_init(s);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-     }
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32_F16]);
-     if (kvm_check_extension(kvm_state, KVM_CAP_BINARY_STATS_FD)) {
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          bool dnan_enabled = val & FPCR_DN;
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
@@ -XXX,XX +XXX,XX @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
      softfloat_to_vfp_compare(env, \
          FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
  }
 -DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16_a32)
 +DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
  DO_VFP_cmp(s, float32, float32, fp_status_a32)
  DO_VFP_cmp(d, float64, float64, fp_status_a32)
  #undef DO_VFP_cmp
 --
 .34.1

-[PULL 34/35] target/arm: Fix SME ST1Q
+[PULL 63/68] target/arm: Remove fp_status_a64
 From: Richard Henderson <richard.henderson@linaro.org>
-A typo, noted in the bug report, resulting in an
+Replace with fp_status[FPST_A64].
-incorrect write offset.
-Cc: qemu-stable@nongnu.org
-Fixes: 7390e0e9ab8 ("target/arm: Implement SME LD1, ST1")
-Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1833
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Message-id: 20230818214255.146905-1-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-14-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/tcg/sme_helper.c | 2 +-
+ target/arm/cpu.h            |  1 -
-file changed, 1 insertion(+), 1 deletion(-)
+ target/arm/cpu.c            |  2 +-
  target/arm/tcg/sme_helper.c |  2 +-
  target/arm/tcg/vec_helper.c | 10 +++++-----
  target/arm/vfp_helper.c     | 16 ++++++++--------
 files changed, 15 insertions(+), 16 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
+             float_status fp_status[FPST_COUNT];
+             struct {
+                 float_status fp_status_a32;
+-                float_status fp_status_a64;
+             };
+         };
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
+     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
+     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
+-    arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
++    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/sme_helper.c
 +++ b/target/arm/tcg/sme_helper.c
-@@ -XXX,XX +XXX,XX @@ static inline void HNAME##_host(void *za, intptr_t off, void *host)         \
+@@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
- {                                                                           \
+      * round-to-odd -- see above.
-     uint64_t *ptr = za + off;                                               \
+      */
-     HOST(host, ptr[BE]);                                                    \
+     fpst_f16 = env->vfp.fp_status[FPST_A64_F16];
--    HOST(host + 1, ptr[!BE]);                                               \
+-    fpst_std = env->vfp.fp_status_a64;
-+    HOST(host + 8, ptr[!BE]);                                               \
++    fpst_std = env->vfp.fp_status[FPST_A64];
- }                                                                           \
+     set_default_nan_mode(true, &fpst_std);
- static inline void VNAME##_v_host(void *za, intptr_t off, void *host)       \
+     set_default_nan_mode(true, &fpst_f16);
- {                                                                           \
+     fpst_odd = fpst_std;
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
              negx = 0x8000800080008000ull;
          }
      }
 -    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
 +    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
  }
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
      intptr_t i, oprsz = simd_oprsz(desc);
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
 -    float_status *status = &env->vfp.fp_status_a64;
 +    float_status *status = &env->vfp.fp_status[FPST_A64];
      bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
      int negx = 0, negf = 0;
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
              negx = 0x8000800080008000ull;
          }
      }
 -    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
 +    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
                   get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
  }
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
      intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
 -    float_status *status = &env->vfp.fp_status_a64;
 +    float_status *status = &env->vfp.fp_status[FPST_A64];
      bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
      int negx = 0, negf = 0;
@@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
       */
      bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
 -    *statusp = is_a64(env) ? env->vfp.fp_status_a64 : env->vfp.fp_status_a32;
 +    *statusp = is_a64(env) ? env->vfp.fp_status[FPST_A64] : env->vfp.fp_status_a32;
      set_default_nan_mode(true, statusp);
      if (ebf) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
            & ~float_flag_input_denormal_flushed);
 -    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
 +    a64_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_A64]);
      a64_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A64_F16])
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
      /*
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
       * be the architecturally up-to-date exception flag information first.
       */
      set_float_exception_flags(0, &env->vfp.fp_status_a32);
 -    set_float_exception_flags(0, &env->vfp.fp_status_a64);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
              break;
          }
          set_float_rounding_mode(i, &env->vfp.fp_status_a32);
 -        set_float_rounding_mode(i, &env->vfp.fp_status_a64);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
      if (changed & FPCR_FZ) {
          bool ftz_enabled = val & FPCR_FZ;
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 -        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]);
          /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
           */
          bool fitz_enabled = (val & FPCR_FIZ) ||
              (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
 -        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
 +        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status[FPST_A64]);
      }
      if (changed & FPCR_DN) {
          bool dnan_enabled = val & FPCR_DN;
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          if (ah_enabled) {
              /* Change behaviours for A64 FP operations */
 -            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
 +            arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
              arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          } else {
 -            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 +            arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
              arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          }
      }
 --
 .34.1

-[PULL 09/35] mips: Report an error when KVM_VM_MIPS_VZ is unavailable
+[PULL 64/68] target/arm: Remove fp_status_a32
-From: Akihiko Odaki <akihiko.odaki@daynix.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-On MIPS, QEMU requires KVM_VM_MIPS_VZ type for KVM. Report an error in
+Replace with fp_status[FPST_A32].  As this was the last of the
-such a case as other architectures do when an error occurred during KVM
+old structures, we can remove the anonymous union and struct.
 type decision.
-Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20230727073134.134102-4-akihiko.odaki@daynix.com
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20250129013857.135256-15-richard.henderson@linaro.org
 [PMM: tweak to account for change to is_ebf()]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 ---
- target/mips/kvm.c | 1 +
+ target/arm/cpu.h            |  7 +------
-file changed, 1 insertion(+)
+ target/arm/cpu.c            |  2 +-
  target/arm/tcg/vec_helper.c |  2 +-
  target/arm/vfp_helper.c     | 18 +++++++++---------
 files changed, 12 insertions(+), 17 deletions(-)
-diff --git a/target/mips/kvm.c b/target/mips/kvm.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/mips/kvm.c
+--- a/target/arm/cpu.h
-+++ b/target/mips/kvm.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_get_default_type(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          uint32_t scratch[8];
          /* There are a number of distinct float control structures. */
 -        union {
 -            float_status fp_status[FPST_COUNT];
 -            struct {
 -                float_status fp_status_a32;
 -            };
 -        };
 +        float_status fp_status[FPST_COUNT];
          uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
          uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
      set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
      set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
 -    arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
       */
      bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
 -    *statusp = is_a64(env) ? env->vfp.fp_status[FPST_A64] : env->vfp.fp_status_a32;
 +    *statusp = env->vfp.fp_status[is_a64(env) ? FPST_A64 : FPST_A32];
      set_default_nan_mode(true, statusp);
      if (ebf) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
  {
      uint32_t a32_flags = 0, a64_flags = 0;
 -    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
 +    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_A32]);
      a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
      /* FZ16 does not generate an input denormal exception.  */
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A32_F16])
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
       * values. The caller should have arranged for env->vfp.fpsr to
       * be the architecturally up-to-date exception flag information first.
       */
 -    set_float_exception_flags(0, &env->vfp.fp_status_a32);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
              i = float_round_to_zero;
              break;
          }
 -        set_float_rounding_mode(i, &env->vfp.fp_status_a32);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
      }
- #endif
+     if (changed & FPCR_FZ) {
+         bool ftz_enabled = val & FPCR_FZ;
-+    error_report("KVM_VM_MIPS_VZ type is not available");
+-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
-     return -1;
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]);
          /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
      }
      if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
          /*
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
      }
      if (changed & FPCR_DN) {
          bool dnan_enabled = val & FPCR_DN;
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
          FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
  }
+ DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
+-DO_VFP_cmp(s, float32, float32, fp_status_a32)
+-DO_VFP_cmp(d, float64, float64, fp_status_a32)
++DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
++DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
+ #undef DO_VFP_cmp
+ /* Integer to float and float to integer conversions */
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(fjcvtzs)(float64 value, float_status *status)
+ uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
+ {
+-    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status_a32);
++    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status[FPST_A32]);
+     uint32_t result = pair;
+     uint32_t z = (pair >> 32) == 0;
 --
 .34.1
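
A minimal sketch of why the anonymous union in the cpu.h hunk above can go once nothing uses the named member any more. The types here are toy stand-ins, and the flavour order (FPST_A32 as index 0) is an assumption made for illustration, not something copied from the QEMU headers. The sketch only shows that the named member and the array slot shared storage, so replacing `fp_status_a32` with `fp_status[FPST_A32]` should not change layout or behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for softfloat's float_status; the fields are illustrative. */
typedef struct {
    int rounding_mode;
    int exception_flags;
} toy_float_status;

/* Hypothetical flavour order: the alias only works if FPST_A32 is index 0. */
typedef enum { FPST_A32, FPST_A64, FPST_STD, FPST_COUNT } toy_flavour;

/* Old layout: a named member overlaid on the first array slot (C11). */
typedef struct {
    union {
        toy_float_status fp_status[FPST_COUNT];
        struct {
            toy_float_status fp_status_a32;
        };
    };
} old_vfp;

/* New layout: the array alone. */
typedef struct {
    toy_float_status fp_status[FPST_COUNT];
} new_vfp;

/* Returns 1 if the old named member aliases fp_status[FPST_A32]. */
static int old_member_aliases_slot(old_vfp *v)
{
    v->fp_status_a32.exception_flags = 0x5;
    return (void *)&v->fp_status_a32 == (void *)&v->fp_status[FPST_A32]
        && v->fp_status[FPST_A32].exception_flags == 0x5;
}
```

Calling `old_member_aliases_slot()` on a zeroed `old_vfp` returns 1 in this toy model, and `sizeof(old_vfp)` equals `sizeof(new_vfp)`, which is the property the cleanup relies on.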

-New patch
+[PULL 65/68] target/arm: Simplify fp_status indexing in mve_helper.c
+From: Richard Henderson <richard.henderson@linaro.org>
+Select on index instead of pointer.
+No functional change.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20250129013857.135256-16-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/tcg/mve_helper.c | 40 +++++++++++++------------------------
+file changed, 14 insertions(+), 26 deletions(-)
+diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/mve_helper.c
++++ b/target/arm/tcg/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+                 scratch_fpst = *fpst;                                   \
+@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
+                 r[e] = 0;                                               \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             if (!(tm & 1)) {                                            \
+                 /* We need the result but without updating flags */     \
+                 scratch_fpst = *fpst;                                   \
+@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+                 scratch_fpst = *fpst;                                   \
+@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
+                 continue;                                               \
+             }                                                           \
+-            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst0 = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             fpst1 = fpst0;                                              \
+             if (!(mask & 1)) {                                          \
+                 scratch_fpst = *fpst0;                                  \
+@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+                 scratch_fpst = *fpst;                                   \
+@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+                 scratch_fpst = *fpst;                                   \
+@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
+         unsigned e;                                             \
+         TYPE *m = vm;                                           \
+         TYPE ra = (TYPE)ra_in;                                  \
+-        float_status *fpst = (ESIZE == 2) ?                     \
+-            &env->vfp.fp_status[FPST_STD_F16] :                 \
+-            &env->vfp.fp_status[FPST_STD];                       \
++        float_status *fpst =                                    \
++            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
+             if (mask & 1) {                                     \
+                 TYPE v = m[H##ESIZE(e)];                        \
+@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
+             if ((mask & emask) == 0) {                                  \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             if (!(mask & (1 << (e * ESIZE)))) {                         \
+                 /* We need the result but without updating flags */     \
+                 scratch_fpst = *fpst;                                   \
+@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
+             if ((mask & emask) == 0) {                                  \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             if (!(mask & (1 << (e * ESIZE)))) {                         \
+                 /* We need the result but without updating flags */     \
+                 scratch_fpst = *fpst;                                   \
+@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+                 scratch_fpst = *fpst;                                   \
+@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
+         unsigned e;                                                     \
+         float_status *fpst;                                             \
+         float_status scratch_fpst;                                      \
+-        float_status *base_fpst = (ESIZE == 2) ?                        \
+-            &env->vfp.fp_status[FPST_STD_F16] :                         \
+-            &env->vfp.fp_status[FPST_STD];                               \
++        float_status *base_fpst =                                       \
++            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD];  \
+         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
+         set_float_rounding_mode(rmode, base_fpst);                      \
+         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
+                 continue;                                               \
+             }                                                           \
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
+-                &env->vfp.fp_status[FPST_STD];                           \
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
+             if (!(mask & 1)) {                                          \
+                 /* We need the result but without updating flags */     \
+                 scratch_fpst = *fpst;                                   \
+--
+.34.1
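
The "select on index instead of pointer" rewrite in the patch above can be sketched outside the MVE macros as follows. The struct and enum are toy stand-ins with hypothetical names and ordering; only the shape of the two expressions mirrors the diff. Both functions are meant to return the same address for any element size.

```c
#include <assert.h>

/* Toy stand-ins; names and enum order are illustrative assumptions. */
typedef struct { int exception_flags; } toy_float_status;
typedef enum { FPST_STD, FPST_STD_F16, FPST_COUNT } toy_flavour;
typedef struct { toy_float_status fp_status[FPST_COUNT]; } toy_vfp;

/* Before: choose between two fully formed pointers. */
static toy_float_status *select_pointer(toy_vfp *vfp, int esize)
{
    return (esize == 2) ? &vfp->fp_status[FPST_STD_F16]
                        : &vfp->fp_status[FPST_STD];
}

/* After: choose the index, then take the address once. */
static toy_float_status *select_index(toy_vfp *vfp, int esize)
{
    return &vfp->fp_status[esize == 2 ? FPST_STD_F16 : FPST_STD];
}
```

The second form keeps each macro body to a single line, which is where the diffstat's net reduction comes from.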

-[PULL 35/35] target/arm: Fix 64-bit SSRA
+[PULL 66/68] target/arm: Simplify DO_VFP_cmp in vfp_helper.c
 From: Richard Henderson <richard.henderson@linaro.org>
-Typo applied byte-wise shift instead of double-word shift.
+Pass ARMFPStatusFlavour index instead of fp_status[FOO].
-Cc: qemu-stable@nongnu.org
-Fixes: 631e565450c ("target/arm: Create gen_gvec_[us]sra")
-Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1737
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Message-id: 20230821022025.397682-1-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-17-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/tcg/translate.c | 2 +-
+ target/arm/vfp_helper.c | 10 +++++-----
-file changed, 1 insertion(+), 1 deletion(-)
+file changed, 5 insertions(+), 5 deletions(-)
-diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/tcg/translate.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/tcg/translate.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ void gen_gvec_ssra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+@@ -XXX,XX +XXX,XX @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
-           .vece = MO_32 },
+ void VFP_HELPER(cmp, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env)  \
-         { .fni8 = gen_ssra64_i64,
+ { \
-           .fniv = gen_ssra_vec,
+     softfloat_to_vfp_compare(env, \
--          .fno = gen_helper_gvec_ssra_b,
+-        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.FPST)); \
-+          .fno = gen_helper_gvec_ssra_d,
++        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.fp_status[FPST])); \
-           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+ } \
-           .opt_opc = vecop_list,
+ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
-           .load_dest = true,
+ { \
      softfloat_to_vfp_compare(env, \
 -        FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
 +        FLOATTYPE ## _compare(a, b, &env->vfp.fp_status[FPST])); \
  }
 -DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
 -DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
 -DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
 +DO_VFP_cmp(h, float16, dh_ctype_f16, FPST_A32_F16)
 +DO_VFP_cmp(s, float32, float32, FPST_A32)
 +DO_VFP_cmp(d, float64, float64, FPST_A32)
  #undef DO_VFP_cmp
  /* Integer to float and float to integer conversions */
 --
 .34.1
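
As a rough illustration of the DO_VFP_cmp change above, the sketch below defines a comparison helper through a macro that receives only a flavour index and writes the array access once in the macro body. Everything here (`toy_compare`, `DO_TOY_CMP`, the integer operands, the enum order) is hypothetical and stands in for the softfloat compare functions and QEMU's helper macros.

```c
#include <assert.h>

/* Toy stand-ins; names and enum order are illustrative assumptions. */
typedef struct { int compare_calls; } toy_float_status;
typedef enum { FPST_A32, FPST_A32_F16, FPST_COUNT } toy_flavour;
typedef struct { toy_float_status fp_status[FPST_COUNT]; } toy_vfp;
typedef struct { toy_vfp vfp; } toy_env;

/* Three-way compare that records which status structure it was given. */
static int toy_compare(int a, int b, toy_float_status *st)
{
    st->compare_calls++;
    return (a > b) - (a < b);
}

/* The macro takes only the flavour index; the array access appears once. */
#define DO_TOY_CMP(P, FPST)                                      \
static int toy_cmp_##P(int a, int b, toy_env *env)               \
{                                                                \
    return toy_compare(a, b, &env->vfp.fp_status[FPST]);         \
}

DO_TOY_CMP(h, FPST_A32_F16)
DO_TOY_CMP(s, FPST_A32)
#undef DO_TOY_CMP
```

With this shape the instantiation sites name a flavour rather than a struct member expression, so they cannot accidentally refer to a field outside the `fp_status` array.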

-[PULL 01/35] hw/gpio/nrf51: implement DETECT signal
+[PULL 67/68] target/arm: Read fz16 from env->vfp.fpcr
-From: Chris Laplante <chris@laplante.io>
+From: Richard Henderson <richard.henderson@linaro.org>
-Implement nRF51 DETECT signal in the GPIO peripheral.
+Read the bit from the source, rather than from the proxy via
 get_flush_inputs_to_zero.  This makes it clear that it does
 not matter which of the float_status structures is used.
-The reference manual makes mention of a per-pin DETECT signal, but these
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-are not exposed to the user. See https://devzone.nordicsemi.com/f/nordic-q-a/39858/gpio-per-pin-detect-signal-available
+Message-id: 20250129013857.135256-34-richard.henderson@linaro.org
 for more information. Currently, I don't see a reason to model these.
 Signed-off-by: Chris Laplante <chris@laplante.io>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20230728160324.1159090-2-chris@laplante.io
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/gpio/nrf51_gpio.h |  1 +
+ target/arm/tcg/vec_helper.c | 12 ++++++------
- hw/gpio/nrf51_gpio.c         | 14 +++++++++++++-
+file changed, 6 insertions(+), 6 deletions(-)
 files changed, 14 insertions(+), 1 deletion(-)
-diff --git a/include/hw/gpio/nrf51_gpio.h b/include/hw/gpio/nrf51_gpio.h
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/gpio/nrf51_gpio.h
+--- a/target/arm/tcg/vec_helper.c
-+++ b/include/hw/gpio/nrf51_gpio.h
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ struct NRF51GPIOState {
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
-     uint32_t old_out_connected;
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
-     qemu_irq output[NRF51_GPIO_PINS];
+     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-+    qemu_irq detect;
+-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
- };
++             env->vfp.fpcr & FPCR_FZ16);
+ }
-diff --git a/hw/gpio/nrf51_gpio.c b/hw/gpio/nrf51_gpio.c
+ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
 --- a/hw/gpio/nrf51_gpio.c
 +++ b/hw/gpio/nrf51_gpio.c
@@ -XXX,XX +XXX,XX @@ static void update_state(NRF51GPIOState *s)
      int pull;
      size_t i;
      bool connected_out, dir, connected_in, out, in, input;
 +    bool assert_detect = false;
      for (i = 0; i < NRF51_GPIO_PINS; i++) {
          pull = pull_value(s->cnf[i]);
@@ -XXX,XX +XXX,XX @@ static void update_state(NRF51GPIOState *s)
                  qemu_log_mask(LOG_GUEST_ERROR,
                                "GPIO pin %zu short circuited\n", i);
              }
 -            if (!connected_in) {
 +            if (connected_in) {
 +                uint32_t detect_config = extract32(s->cnf[i], 16, 2);
 +                if ((detect_config == 2) && (in == 1)) {
 +                    assert_detect = true;
 +                }
 +                if ((detect_config == 3) && (in == 0)) {
 +                    assert_detect = true;
 +                }
 +            } else {
                  /*
                   * Floating input: the output stimulates IN if connected,
                   * otherwise pull-up/pull-down resistors put a value on both
@@ -XXX,XX +XXX,XX @@ static void update_state(NRF51GPIOState *s)
          }
-         update_output_irq(s, i, connected_out, out);
      }
-+
+     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-+    qemu_set_irq(s->detect, assert_detect);
+-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
 +             env->vfp.fpcr & FPCR_FZ16);
  }
- /*
+ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-@@ -XXX,XX +XXX,XX @@ static void nrf51_gpio_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-     qdev_init_gpio_in(DEVICE(s), nrf51_gpio_set, NRF51_GPIO_PINS);
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-     qdev_init_gpio_out(DEVICE(s), s->output, NRF51_GPIO_PINS);
+     float_status *status = &env->vfp.fp_status[FPST_A64];
-+    qdev_init_gpio_out_named(DEVICE(s), &s->detect, "detect", 1);
+-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
 +    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
      int negx = 0, negf = 0;
      if (is_s) {
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
      do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
 +                 env->vfp.fpcr & FPCR_FZ16);
  }
- static void nrf51_gpio_class_init(ObjectClass *klass, void *data)
+ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
          }
      }
      do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
 -                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
 +                 env->vfp.fpcr & FPCR_FZ16);
  }
  void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
      intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
      float_status *status = &env->vfp.fp_status[FPST_A64];
 -    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
 +    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
      int negx = 0, negf = 0;
      if (is_s) {
 --
 .34.1
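
The commit message above argues for reading the bit from its source rather than from a proxy. A toy model of that idea is sketched here: a control register value is fanned out to per-flavour cached flags on write, so a read from any cached copy and a read from the register agree. The bit position, the names, and the fan-out function are assumptions for illustration and are not taken from the QEMU sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only. */
#define TOY_FPCR_FZ16 (UINT32_C(1) << 19)

typedef struct { bool flush_inputs_to_zero; } toy_float_status;
typedef enum { FPST_A32_F16, FPST_A64_F16, FPST_COUNT } toy_flavour;

typedef struct {
    uint32_t fpcr;
    toy_float_status fp_status[FPST_COUNT];
} toy_vfp;

/* Writing the control register fans the bit out to the cached copies. */
static void toy_set_fpcr(toy_vfp *vfp, uint32_t val)
{
    bool fz16 = (val & TOY_FPCR_FZ16) != 0;

    vfp->fpcr = val;
    vfp->fp_status[FPST_A32_F16].flush_inputs_to_zero = fz16;
    vfp->fp_status[FPST_A64_F16].flush_inputs_to_zero = fz16;
}

/* Proxy read: the caller has to pick one of the cached copies. */
static bool toy_fz16_from_proxy(const toy_vfp *vfp, toy_flavour which)
{
    return vfp->fp_status[which].flush_inputs_to_zero;
}

/* Source read: no flavour choice is involved at all. */
static bool toy_fz16_from_fpcr(const toy_vfp *vfp)
{
    return (vfp->fpcr & TOY_FPCR_FZ16) != 0;
}
```

Because the source read needs no flavour argument, a helper shared between A32 and A64 callers no longer has to be told which cached copy to consult.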

-[PULL 29/35] target/arm/helper: Fix tlbmask and tlbbits for TLBI VAE2*
+[PULL 68/68] target/arm: Sink fp_status and fpcr access into do_fmlal*
-From: Jean-Philippe Brucker <jean-philippe@linaro.org>
+From: Richard Henderson <richard.henderson@linaro.org>
-When HCR_EL2.E2H is enabled, TLB entries are formed using the EL2&0
+Sink common code from the callers into do_fmlal
-translation regime, instead of the EL2 translation regime. The TLB VAE2*
+and do_fmlal_idx.  Reorder the arguments to minimize
-instructions invalidate the regime that corresponds to the current value
+the re-sorting from the caller's arguments.
 of HCR_EL2.E2H.
-At the moment we only invalidate the EL2 translation regime. This causes
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-problems with RMM, which issues TLBI VAE2IS instructions with
+Message-id: 20250129013857.135256-35-richard.henderson@linaro.org
 HCR_EL2.E2H enabled. Update vae2_tlbmask() to take HCR_EL2.E2H into
 account.
 Add vae2_tlbbits() as well, since the top-byte-ignore configuration is
 different between the EL2&0 and EL2 regime.
 Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20230809123706.1842548-3-jean-philippe@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 50 ++++++++++++++++++++++++++++++++++++---------
+ target/arm/tcg/vec_helper.c | 28 ++++++++++++++++------------
-file changed, 40 insertions(+), 10 deletions(-)
+file changed, 16 insertions(+), 12 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static int vae1_tlbmask(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
-     return mask;
+  * as there is not yet SVE versions that might use blocking.
   */
 -static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
 -                     uint64_t negx, int negf, uint32_t desc, bool fz16)
 +static void do_fmlal(float32 *d, void *vn, void *vm,
 +                     CPUARMState *env, uint32_t desc,
 +                     ARMFPStatusFlavour fpst_idx,
 +                     uint64_t negx, int negf)
  {
 +    float_status *fpst = &env->vfp.fp_status[fpst_idx];
 +    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
      intptr_t i, oprsz = simd_oprsz(desc);
      int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      int is_q = oprsz == 16;
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 -    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -             env->vfp.fpcr & FPCR_FZ16);
 +    do_fmlal(vd, vn, vm, env, desc, FPST_STD, negx, 0);
  }
-+static int vae2_tlbmask(CPUARMState *env)
+ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
-+{
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
-+    uint64_t hcr = arm_hcr_el2_eff(env);
+             negx = 0x8000800080008000ull;
-+    uint16_t mask;
+         }
-+
+     }
-+    if (hcr & HCR_E2H) {
+-    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-+        mask = ARMMMUIdxBit_E20_2 |
+-             env->vfp.fpcr & FPCR_FZ16);
-+               ARMMMUIdxBit_E20_2_PAN |
++    do_fmlal(vd, vn, vm, env, desc, FPST_A64, negx, negf);
 +               ARMMMUIdxBit_E20_0;
 +    } else {
 +        mask = ARMMMUIdxBit_E2;
 +    }
 +    return mask;
 +}
 +
  /* Return 56 if TBI is enabled, 64 otherwise. */
  static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx,
                                uint64_t addr)
@@ -XXX,XX +XXX,XX @@ static int vae1_tlbbits(CPUARMState *env, uint64_t addr)
      return tlbbits_for_regime(env, mmu_idx, addr);
  }
-+static int vae2_tlbbits(CPUARMState *env, uint64_t addr)
+ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-+{
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-+    uint64_t hcr = arm_hcr_el2_eff(env);
+     }
-+    ARMMMUIdx mmu_idx;
+ }
-+
-+    /*
+-static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
-+     * Only the regime of the mmu_idx below is significant.
+-                         uint64_t negx, int negf, uint32_t desc, bool fz16)
-+     * Regime EL2&0 has two ranges with separate TBI configuration, while EL2
++static void do_fmlal_idx(float32 *d, void *vn, void *vm,
-+     * only has one.
++                         CPUARMState *env, uint32_t desc,
-+     */
++                         ARMFPStatusFlavour fpst_idx,
-+    if (hcr & HCR_E2H) {
++                         uint64_t negx, int negf)
 +        mmu_idx = ARMMMUIdx_E20_2;
 +    } else {
 +        mmu_idx = ARMMMUIdx_E2;
 +    }
 +
 +    return tlbbits_for_regime(env, mmu_idx, addr);
 +}
 +
  static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                        uint64_t value)
  {
-@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri,
++    float_status *fpst = &env->vfp.fp_status[fpst_idx];
-      * flush-last-level-only.
++    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
-      */
+     intptr_t i, oprsz = simd_oprsz(desc);
-     CPUState *cs = env_cpu(env);
+     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
--    int mask = e2_tlbmask(env);
+     int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
-+    int mask = vae2_tlbmask(env);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
-     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-+    int bits = vae2_tlbbits(env, pageaddr);
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
--    tlb_flush_page_by_mmuidx(cs, pageaddr, mask);
+-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-+    tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits);
+-                 env->vfp.fpcr & FPCR_FZ16);
 +    do_fmlal_idx(vd, vn, vm, env, desc, FPST_STD, negx, 0);
  }
- static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-                                    uint64_t value)
+             negx = 0x8000800080008000ull;
- {
+         }
-     CPUState *cs = env_cpu(env);
+     }
-+    int mask = vae2_tlbmask(env);
+-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+-                 env->vfp.fpcr & FPCR_FZ16);
--    int bits = tlbbits_for_regime(env, ARMMMUIdx_E2, pageaddr);
++    do_fmlal_idx(vd, vn, vm, env, desc, FPST_A64, negx, negf);
 +    int bits = vae2_tlbbits(env, pageaddr);
 -    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
 -                                                  ARMMMUIdxBit_E2, bits);
 +    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
  }
- static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_rvae1is_write(CPUARMState *env,
      do_rvae_write(env, value, vae1_tlbmask(env), true);
  }
 -static int vae2_tlbmask(CPUARMState *env)
 -{
 -    return ARMMMUIdxBit_E2;
 -}
 -
  static void tlbi_aa64_rvae2_write(CPUARMState *env,
                                    const ARMCPRegInfo *ri,
                                    uint64_t value)
 --
 .34.1

Hi; here's the first arm pullreq for the 8.2 cycle. These are
pretty much all bug fixes (mostly for the experimental FEAT_RME),
rather than any major features.

-- PMM

The following changes since commit b0dd9a7d6dd15a6898e9c585b521e6bec79b25aa:

Open 8.2 development tree (2023-08-22 07:14:07 -0700)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230824

for you to fetch changes up to cd1e4db73646006039f25879af3bff55b2295ff3:

target/arm: Fix 64-bit SSRA (2023-08-22 17:31:14 +0100)

----------------------------------------------------------------
target-arm queue:
 * hw/gpio/nrf51: implement DETECT signal
 * accel/kvm: Specify default IPA size for arm64
 * ptw: refactor, fix some FEAT_RME bugs
 * target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
 * target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
 * Fix SME ST1Q
 * Fix 64-bit SSRA

----------------------------------------------------------------
Akihiko Odaki (6):
      kvm: Introduce kvm_arch_get_default_type hook
      accel/kvm: Specify default IPA size for arm64
      mips: Report an error when KVM_VM_MIPS_VZ is unavailable
      accel/kvm: Use negative KVM type for error propagation
      accel/kvm: Free as when an error occurred
      accel/kvm: Make kvm_dirty_ring_reaper_init() void

Chris Laplante (6):
      hw/gpio/nrf51: implement DETECT signal
      qtest: factor out qtest_install_gpio_out_intercept
      qtest: implement named interception of out-GPIO
      qtest: bail from irq_intercept_in if name is specified
      qtest: irq_intercept_[out/in]: return FAIL if no intercepts are installed
      qtest: microbit-test: add tests for nRF51 DETECT

Jean-Philippe Brucker (6):
      target/arm/ptw: Load stage-2 tables from realm physical space
      target/arm/helper: Fix tlbmask and tlbbits for TLBI VAE2*
      target/arm: Skip granule protection checks for AT instructions
      target/arm: Pass security space rather than flag for AT instructions
      target/arm/helper: Check SCR_EL3.{NSE, NS} encoding for AT instructions
      target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK

Peter Maydell (15):
      target/arm/ptw: Don't set fi->s1ptw for UnsuppAtomicUpdate fault
      target/arm/ptw: Don't report GPC faults on stage 1 ptw as stage2 faults
      target/arm/ptw: Set s1ns bit in fault info more consistently
      target/arm/ptw: Pass ptw into get_phys_addr_pmsa*() and get_phys_addr_disabled()
      target/arm/ptw: Pass ARMSecurityState to regime_translation_disabled()
      target/arm/ptw: Pass an ARMSecuritySpace to arm_hcr_el2_eff_secstate()
      target/arm: Pass an ARMSecuritySpace to arm_is_el2_enabled_secstate()
      target/arm/ptw: Only fold in NSTable bit effects in Secure state
      target/arm/ptw: Remove last uses of ptw->in_secure
      target/arm/ptw: Remove S1Translate::in_secure
      target/arm/ptw: Drop S1Translate::out_secure
      target/arm/ptw: Set attributes correctly for MMU disabled data accesses
      target/arm/ptw: Check for block descriptors at invalid levels
      target/arm/ptw: Report stage 2 fault level for stage 2 faults on stage 1 ptw
      target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types

Richard Henderson (2):
      target/arm: Fix SME ST1Q
      target/arm: Fix 64-bit SSRA

From: Chris Laplante <chris@laplante.io>

Implement nRF51 DETECT signal in the GPIO peripheral.

The reference manual mentions per-pin DETECT signals, but these are not
exposed to the user. See https://devzone.nordicsemi.com/f/nordic-q-a/39858/gpio-per-pin-detect-signal-available
for more information. Currently, I don't see a reason to model these.

Signed-off-by: Chris Laplante <chris@laplante.io>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230728160324.1159090-2-chris@laplante.io
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/gpio/nrf51_gpio.h |  1 +
 hw/gpio/nrf51_gpio.c         | 14 +++++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/hw/gpio/nrf51_gpio.h b/include/hw/gpio/nrf51_gpio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/gpio/nrf51_gpio.h
+++ b/include/hw/gpio/nrf51_gpio.h
@@ -XXX,XX +XXX,XX @@ struct NRF51GPIOState {
     uint32_t old_out_connected;
 
     qemu_irq output[NRF51_GPIO_PINS];
+    qemu_irq detect;
 };
 
 
diff --git a/hw/gpio/nrf51_gpio.c b/hw/gpio/nrf51_gpio.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/gpio/nrf51_gpio.c
+++ b/hw/gpio/nrf51_gpio.c
@@ -XXX,XX +XXX,XX @@ static void update_state(NRF51GPIOState *s)
     int pull;
     size_t i;
     bool connected_out, dir, connected_in, out, in, input;
+    bool assert_detect = false;
 
     for (i = 0; i < NRF51_GPIO_PINS; i++) {
         pull = pull_value(s->cnf[i]);
@@ -XXX,XX +XXX,XX @@ static void update_state(NRF51GPIOState *s)
                 qemu_log_mask(LOG_GUEST_ERROR,
                               "GPIO pin %zu short circuited\n", i);
             }
-            if (!connected_in) {
+            if (connected_in) {
+                uint32_t detect_config = extract32(s->cnf[i], 16, 2);
+                if ((detect_config == 2) && (in == 1)) {
+                    assert_detect = true;
+                }
+                if ((detect_config == 3) && (in == 0)) {
+                    assert_detect = true;
+                }
+            } else {
                 /*
                  * Floating input: the output stimulates IN if connected,
                  * otherwise pull-up/pull-down resistors put a value on both
@@ -XXX,XX +XXX,XX @@ static void update_state(NRF51GPIOState *s)
         }
         update_output_irq(s, i, connected_out, out);
     }
+
+    qemu_set_irq(s->detect, assert_detect);
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static void nrf51_gpio_init(Object *obj)
 
     qdev_init_gpio_in(DEVICE(s), nrf51_gpio_set, NRF51_GPIO_PINS);
     qdev_init_gpio_out(DEVICE(s), s->output, NRF51_GPIO_PINS);
+    qdev_init_gpio_out_named(DEVICE(s), &s->detect, "detect", 1);
 }
 
 static void nrf51_gpio_class_init(ObjectClass *klass, void *data)
-- 
2.34.1
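
As a reading aid for the patch above, here is a minimal standalone sketch of
the DETECT rule it adds: the single DETECT line is asserted if any pin has
SENSE set to 2 with a high input, or SENSE set to 3 with a low input. The
names sketch_sense(), sketch_detect() and SKETCH_GPIO_PINS are hypothetical
and are not part of QEMU. The sketch also ignores the connected_in check that
the real update_state() applies before looking at the SENSE field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NRF51_GPIO_PINS; assumed to be 32 here. */
#define SKETCH_GPIO_PINS 32

/* Extract the 2-bit SENSE field held in bits [17:16] of a PIN_CNF value. */
static uint32_t sketch_sense(uint32_t cnf)
{
    return (cnf >> 16) & 0x3;
}

/*
 * Return true if at least one pin's SENSE setting matches its input level:
 * SENSE == 2 fires on a high input, SENSE == 3 fires on a low input.
 * Assumption: every pin has its input buffer connected.
 */
static bool sketch_detect(const uint32_t cnf[SKETCH_GPIO_PINS], uint32_t in)
{
    assert(cnf != NULL);
    for (size_t i = 0; i < SKETCH_GPIO_PINS; i++) {
        uint32_t sense = sketch_sense(cnf[i]);
        bool level = (in >> i) & 1;

        if ((sense == 2 && level) || (sense == 3 && !level)) {
            return true;
        }
    }
    return false;
}
```

Because the result is an OR over all pins, DETECT stays high while any
configured pin still matches, which is the behaviour the microbit test later
in the series checks with pins 1, 2 and 3.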

From: Chris Laplante <chris@laplante.io>

Signed-off-by: Chris Laplante <chris@laplante.io>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230728160324.1159090-3-chris@laplante.io
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 softmmu/qtest.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/softmmu/qtest.c b/softmmu/qtest.c
index XXXXXXX..XXXXXXX 100644
--- a/softmmu/qtest.c
+++ b/softmmu/qtest.c
@@ -XXX,XX +XXX,XX @@ void qtest_set_command_cb(bool (*pc_cb)(CharBackend *chr, gchar **words))
     process_command_cb = pc_cb;
 }
 
+static void qtest_install_gpio_out_intercept(DeviceState *dev, const char *name, int n)
+{
+    qemu_irq *disconnected = g_new0(qemu_irq, 1);
+    qemu_irq icpt = qemu_allocate_irq(qtest_irq_handler,
+                                      disconnected, n);
+
+    *disconnected = qdev_intercept_gpio_out(dev, icpt, name, n);
+}
+
 static void qtest_process_command(CharBackend *chr, gchar **words)
 {
     const gchar *command;
@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
             if (words[0][14] == 'o') {
                 int i;
                 for (i = 0; i < ngl->num_out; ++i) {
-                    qemu_irq *disconnected = g_new0(qemu_irq, 1);
-                    qemu_irq icpt = qemu_allocate_irq(qtest_irq_handler,
-                                                      disconnected, i);
-
-                    *disconnected = qdev_intercept_gpio_out(dev, icpt,
-                                                            ngl->name, i);
+                    qtest_install_gpio_out_intercept(dev, ngl->name, i);
                 }
             } else {
                 qemu_irq_intercept_in(ngl->in, qtest_irq_handler,
-- 
2.34.1

From: Chris Laplante <chris@laplante.io>

Add the qtest_irq_intercept_out_named() function, which uses a new optional
name parameter of the irq_intercept_out qtest command.

Signed-off-by: Chris Laplante <chris@laplante.io>
Message-id: 20230728160324.1159090-4-chris@laplante.io
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/libqtest.h | 11 +++++++++++
 softmmu/qtest.c        | 18 ++++++++++--------
 tests/qtest/libqtest.c |  6 ++++++
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -XXX,XX +XXX,XX @@ void qtest_irq_intercept_in(QTestState *s, const char *string);
  */
 void qtest_irq_intercept_out(QTestState *s, const char *string);
 
+/**
+ * qtest_irq_intercept_out_named:
+ * @s: #QTestState instance to operate on.
+ * @qom_path: QOM path of a device.
+ * @name: Name of the GPIO out pin
+ *
+ * Associate a qtest irq with the named GPIO-out pin of the device
+ * whose path is specified by @qom_path and whose name is @name.
+ */
+void qtest_irq_intercept_out_named(QTestState *s, const char *qom_path, const char *name);
+
 /**
  * qtest_set_irq_in:
  * @s: QTestState instance to operate on.
diff --git a/softmmu/qtest.c b/softmmu/qtest.c
index XXXXXXX..XXXXXXX 100644
--- a/softmmu/qtest.c
+++ b/softmmu/qtest.c
@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
         || strcmp(words[0], "irq_intercept_in") == 0) {
         DeviceState *dev;
         NamedGPIOList *ngl;
+        bool is_outbound;
 
         g_assert(words[1]);
+        is_outbound = words[0][14] == 'o';
         dev = DEVICE(object_resolve_path(words[1], NULL));
         if (!dev) {
             qtest_send_prefix(chr);
@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
         }
 
         QLIST_FOREACH(ngl, &dev->gpios, node) {
-            /* We don't support intercept of named GPIOs yet */
-            if (ngl->name) {
-                continue;
-            }
-            if (words[0][14] == 'o') {
-                int i;
-                for (i = 0; i < ngl->num_out; ++i) {
-                    qtest_install_gpio_out_intercept(dev, ngl->name, i);
+            /* We don't support inbound interception of named GPIOs yet */
+            if (is_outbound) {
+                /* NULL is valid and matchable, for "unnamed GPIO" */
+                if (g_strcmp0(ngl->name, words[2]) == 0) {
+                    int i;
+                    for (i = 0; i < ngl->num_out; ++i) {
+                        qtest_install_gpio_out_intercept(dev, ngl->name, i);
+                    }
                 }
             } else {
                 qemu_irq_intercept_in(ngl->in, qtest_irq_handler,
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -XXX,XX +XXX,XX @@ void qtest_irq_intercept_out(QTestState *s, const char *qom_path)
     qtest_rsp(s);
 }
 
+void qtest_irq_intercept_out_named(QTestState *s, const char *qom_path, const char *name)
+{
+    qtest_sendf(s, "irq_intercept_out %s %s\n", qom_path, name);
+    qtest_rsp(s);
+}
+
 void qtest_irq_intercept_in(QTestState *s, const char *qom_path)
 {
     qtest_sendf(s, "irq_intercept_in %s\n", qom_path);
-- 
2.34.1
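
The patch above relies on g_strcmp0() treating NULL as a matchable value, so
that a command without a name argument selects the unnamed GPIO list. The
following sketch shows that matching rule without depending on GLib.
sketch_name_matches() is a hypothetical helper and does not exist in QEMU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * NULL-tolerant name comparison, modelled on how the patch uses
 * g_strcmp0(ngl->name, words[2]) == 0:
 *  - two NULLs match (unnamed list, no name requested);
 *  - NULL never matches a non-NULL name;
 *  - otherwise the strings must be equal.
 */
static bool sketch_name_matches(const char *list_name, const char *requested)
{
    if (list_name == NULL || requested == NULL) {
        return list_name == requested;
    }
    return strcmp(list_name, requested) == 0;
}
```

With this rule, "irq_intercept_out <path>" still intercepts only the unnamed
outputs, while "irq_intercept_out <path> detect" intercepts only the list
named "detect".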

From: Chris Laplante <chris@laplante.io>

Named interception of in-GPIOs is not supported yet.

Signed-off-by: Chris Laplante <chris@laplante.io>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230728160324.1159090-5-chris@laplante.io
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 softmmu/qtest.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/softmmu/qtest.c b/softmmu/qtest.c
index XXXXXXX..XXXXXXX 100644
--- a/softmmu/qtest.c
+++ b/softmmu/qtest.c
@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
         || strcmp(words[0], "irq_intercept_in") == 0) {
         DeviceState *dev;
         NamedGPIOList *ngl;
+        bool is_named;
         bool is_outbound;
 
         g_assert(words[1]);
+        is_named = words[2] != NULL;
         is_outbound = words[0][14] == 'o';
         dev = DEVICE(object_resolve_path(words[1], NULL));
         if (!dev) {
@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
             return;
         }
 
+        if (is_named && !is_outbound) {
+            qtest_send_prefix(chr);
+            qtest_send(chr, "FAIL Interception of named in-GPIOs not yet supported\n");
+            return;
+        }
+
         if (irq_intercept_dev) {
             qtest_send_prefix(chr);
             if (irq_intercept_dev != dev) {
-- 
2.34.1

From: Chris Laplante <chris@laplante.io>

This is much better than just silently failing with OK.

Signed-off-by: Chris Laplante <chris@laplante.io>
Message-id: 20230728160324.1159090-6-chris@laplante.io
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 softmmu/qtest.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/softmmu/qtest.c b/softmmu/qtest.c
index XXXXXXX..XXXXXXX 100644
--- a/softmmu/qtest.c
+++ b/softmmu/qtest.c
@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
         NamedGPIOList *ngl;
         bool is_named;
         bool is_outbound;
+        bool interception_succeeded = false;
 
         g_assert(words[1]);
         is_named = words[2] != NULL;
@@ -XXX,XX +XXX,XX @@ static void qtest_process_command(CharBackend *chr, gchar **words)
                     for (i = 0; i < ngl->num_out; ++i) {
                         qtest_install_gpio_out_intercept(dev, ngl->name, i);
                     }
+                    interception_succeeded = true;
                 }
             } else {
                 qemu_irq_intercept_in(ngl->in, qtest_irq_handler,
                                       ngl->num_in);
+                interception_succeeded = true;
             }
         }
-        irq_intercept_dev = dev;
+
         qtest_send_prefix(chr);
-        qtest_send(chr, "OK\n");
+        if (interception_succeeded) {
+            irq_intercept_dev = dev;
+            qtest_send(chr, "OK\n");
+        } else {
+            qtest_send(chr, "FAIL No intercepts installed\n");
+        }
     } else if (strcmp(words[0], "set_irq_in") == 0) {
         DeviceState *dev;
         qemu_irq irq;
-- 
2.34.1

From: Chris Laplante <chris@laplante.io>

Exercise the DETECT mechanism of the GPIO peripheral.

Signed-off-by: Chris Laplante <chris@laplante.io>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230728160324.1159090-7-chris@laplante.io
[PMM: fixed coding style nits]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/microbit-test.c | 44 +++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tests/qtest/microbit-test.c b/tests/qtest/microbit-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/microbit-test.c
+++ b/tests/qtest/microbit-test.c
@@ -XXX,XX +XXX,XX @@ static void test_nrf51_gpio(void)
     qtest_quit(qts);
 }
 
+static void test_nrf51_gpio_detect(void)
+{
+    QTestState *qts = qtest_init("-M microbit");
+    int i;
+
+    /* Connect input buffer on pins 1-7, configure SENSE for high level */
+    for (i = 1; i <= 7; i++) {
+        qtest_writel(qts, NRF51_GPIO_BASE + NRF51_GPIO_REG_CNF_START + i * 4,
+                     deposit32(0, 16, 2, 2));
+    }
+
+    qtest_irq_intercept_out_named(qts, "/machine/nrf51/gpio", "detect");
+
+    for (i = 1; i <= 7; i++) {
+        /* Set pin high */
+        qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", i, 1);
+        uint32_t actual = qtest_readl(qts, NRF51_GPIO_BASE + NRF51_GPIO_REG_IN);
+        g_assert_cmpuint(actual, ==, 1 << i);
+
+        /* Check that DETECT is high */
+        g_assert_true(qtest_get_irq(qts, 0));
+
+        /* Set pin low, check that DETECT goes low. */
+        qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", i, 0);
+        actual = qtest_readl(qts, NRF51_GPIO_BASE + NRF51_GPIO_REG_IN);
+        g_assert_cmpuint(actual, ==, 0x0);
+        g_assert_false(qtest_get_irq(qts, 0));
+    }
+
+    /* Set pin 0 high, check that DETECT doesn't fire */
+    qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", 0, 1);
+    g_assert_false(qtest_get_irq(qts, 0));
+    qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", 0, 0);
+
+    /* Set pins 1, 2, and 3 high, then set 3 low. Check DETECT is still high */
+    for (i = 1; i <= 3; i++) {
+        qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", i, 1);
+    }
+    g_assert_true(qtest_get_irq(qts, 0));
+    qtest_set_irq_in(qts, "/machine/nrf51", "unnamed-gpio-in", 3, 0);
+    g_assert_true(qtest_get_irq(qts, 0));
+}
+
 static void timer_task(QTestState *qts, hwaddr task)
 {
     qtest_writel(qts, NRF51_TIMER_BASE + task, NRF51_TRIGGER_TASK);
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
 
     qtest_add_func("/microbit/nrf51/uart", test_nrf51_uart);
     qtest_add_func("/microbit/nrf51/gpio", test_nrf51_gpio);
+    qtest_add_func("/microbit/nrf51/gpio_detect", test_nrf51_gpio_detect);
     qtest_add_func("/microbit/nrf51/nvmc", test_nrf51_nvmc);
     qtest_add_func("/microbit/nrf51/timer", test_nrf51_timer);
     qtest_add_func("/microbit/microbit/i2c", test_microbit_i2c);
-- 
2.34.1

From: Akihiko Odaki <akihiko.odaki@daynix.com>

kvm_arch_get_default_type() returns the default KVM type. This hook is
particularly useful to derive a KVM type that is valid for "none"
machine model, which is used by libvirt to probe the availability of
KVM.

For MIPS, the existing mips_kvm_type() is reused. This function ensures
the availability of VZ which is mandatory to use KVM on the current
QEMU.

Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-2-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: added doc comment for new function]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 include/sysemu/kvm.h     | 2 ++
 target/mips/kvm_mips.h   | 9 ---------
 accel/kvm/kvm-all.c      | 4 +++-
 hw/mips/loongson3_virt.c | 2 --
 target/arm/kvm.c         | 5 +++++
 target/i386/kvm/kvm.c    | 5 +++++
 target/mips/kvm.c        | 2 +-
 target/ppc/kvm.c         | 5 +++++
 target/riscv/kvm.c       | 5 +++++
 target/s390x/kvm/kvm.c   | 5 +++++
 10 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -XXX,XX +XXX,XX @@ int kvm_arch_get_registers(CPUState *cpu);
 
 int kvm_arch_put_registers(CPUState *cpu, int level);
 
+int kvm_arch_get_default_type(MachineState *ms);
+
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
 int kvm_arch_init_vcpu(CPUState *cpu);
diff --git a/target/mips/kvm_mips.h b/target/mips/kvm_mips.h
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/kvm_mips.h
+++ b/target/mips/kvm_mips.h
@@ -XXX,XX +XXX,XX @@ void kvm_mips_reset_vcpu(MIPSCPU *cpu);
 int kvm_mips_set_interrupt(MIPSCPU *cpu, int irq, int level);
 int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, int level);
 
-#ifdef CONFIG_KVM
-int mips_kvm_type(MachineState *machine, const char *vm_type);
-#else
-static inline int mips_kvm_type(MachineState *machine, const char *vm_type)
-{
-    return 0;
-}
-#endif
-
 #endif /* KVM_MIPS_H */
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ static int kvm_init(MachineState *ms)
     KVMState *s;
     const KVMCapabilityInfo *missing_cap;
     int ret;
-    int type = 0;
+    int type;
     uint64_t dirty_log_manual_caps;
 
     qemu_mutex_init(&kml_slots_lock);
@@ -XXX,XX +XXX,XX @@ static int kvm_init(MachineState *ms)
         type = mc->kvm_type(ms, kvm_type);
     } else if (mc->kvm_type) {
         type = mc->kvm_type(ms, NULL);
+    } else {
+        type = kvm_arch_get_default_type(ms);
     }
 
     do {
diff --git a/hw/mips/loongson3_virt.c b/hw/mips/loongson3_virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/mips/loongson3_virt.c
+++ b/hw/mips/loongson3_virt.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/datadir.h"
 #include "qapi/error.h"
 #include "elf.h"
-#include "kvm_mips.h"
 #include "hw/char/serial.h"
 #include "hw/intc/loongson_liointc.h"
 #include "hw/mips/mips.h"
@@ -XXX,XX +XXX,XX @@ static void loongson3v_machine_class_init(ObjectClass *oc, void *data)
     mc->max_cpus = LOONGSON_MAX_VCPUS;
     mc->default_ram_id = "loongson3.highram";
     mc->default_ram_size = 1600 * MiB;
-    mc->kvm_type = mips_kvm_type;
     mc->minimum_page_bits = 14;
     mc->default_nic = "virtio-net-pci";
 }
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa)
     return ret > 0 ? ret : 40;
 }
 
+int kvm_arch_get_default_type(MachineState *ms)
+{
+    return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     int ret = 0;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void register_smram_listener(Notifier *n, void *unused)
                                  &smram_address_space, 1, "kvm-smram");
 }
 
+int kvm_arch_get_default_type(MachineState *ms)
+{
+    return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     uint64_t identity_base = 0xfffbc000;
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
     abort();
 }
 
-int mips_kvm_type(MachineState *machine, const char *vm_type)
+int kvm_arch_get_default_type(MachineState *machine)
 {
 #if defined(KVM_CAP_MIPS_VZ)
     int r;
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -XXX,XX +XXX,XX @@ static int kvm_ppc_register_host_cpu_type(void);
 static void kvmppc_get_cpu_characteristics(KVMState *s);
 static int kvmppc_get_dec_bits(void);
 
+int kvm_arch_get_default_type(MachineState *ms)
+{
+    return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
     return 0;
 }
 
+int kvm_arch_get_default_type(MachineState *ms)
+{
+    return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     return 0;
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void ccw_machine_class_foreach(ObjectClass *oc, void *opaque)
     mc->default_cpu_type = S390_CPU_TYPE_NAME("host");
 }
 
+int kvm_arch_get_default_type(MachineState *ms)
+{
+    return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     object_class_foreach(ccw_machine_class_foreach, TYPE_S390_CCW_MACHINE,
-- 
2.34.1
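
The fallback that this patch adds to kvm_init() can be summarised with a
small sketch: a machine-specific kvm_type hook is used if one is set, and
the per-architecture default is used otherwise. All names below are
hypothetical stand-ins, the MachineState argument is dropped for brevity, and
the handling of the "kvm-type" property string in the real function is only
represented by the vm_type parameter.

```c
#include <stddef.h>

/* Hypothetical stand-in for the MachineClass::kvm_type callback. */
typedef int (*sketch_kvm_type_fn)(const char *vm_type);

/* Hypothetical stand-in for kvm_arch_get_default_type() on an
 * architecture with no special default, which simply returns 0. */
static int sketch_arch_default_type(void)
{
    return 0;
}

/* Example machine hook, present only to exercise the sketch. */
static int sketch_machine_hook(const char *vm_type)
{
    (void)vm_type;
    return 7;
}

/* Prefer the machine hook; fall back to the architecture default. */
static int sketch_pick_kvm_type(sketch_kvm_type_fn machine_hook,
                                const char *vm_type)
{
    if (machine_hook) {
        return machine_hook(vm_type);
    }
    return sketch_arch_default_type();
}
```

The "none" machine has no kvm_type hook, so it is the second branch that
decides which type libvirt's probe ends up using.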

From: Akihiko Odaki <akihiko.odaki@daynix.com>

Before this change, the default KVM type, which is used for non-virt
machine models, was 0.

The kernel documentation says:
> On arm64, the physical address size for a VM (IPA Size limit) is
> limited to 40bits by default. The limit can be configured if the host
> supports the extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
> KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
> identifier, where IPA_Bits is the maximum width of any physical
> address used by the VM. The IPA_Bits is encoded in bits[7-0] of the
> machine type identifier.
>
> e.g, to configure a guest to use 48bit physical address size::
>
>     vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48));
>
> The requested size (IPA_Bits) must be:
>
>  ==   =========================================================
>   0   Implies default size, 40bits (for backward compatibility)
>   N   Implies N bits, where N is a positive integer such that,
>       32 <= N <= Host_IPA_Limit
>  ==   =========================================================
>
> Host_IPA_Limit is the maximum possible value for IPA_Bits on the host
> and is dependent on the CPU capability and the kernel configuration.
> The limit can be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the
> KVM_CHECK_EXTENSION ioctl() at run-time.
>
> Creation of the VM will fail if the requested IPA size (whether it is
> implicit or explicit) is unsupported on the host.
https://docs.kernel.org/virt/kvm/api.html#kvm-create-vm

So if Host_IPA_Limit < 40, specifying 0 as the type will fail. This
confused libvirt on an M2 MacBook Air, because libvirt uses the "none"
machine model to probe for KVM availability.

Fix this by using Host_IPA_Limit as the default type when
KVM_CAP_ARM_VM_IPA_SIZE is available.

Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-3-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/kvm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa)
 
 int kvm_arch_get_default_type(MachineState *ms)
 {
-    return 0;
+    bool fixed_ipa;
+    int size = kvm_arm_get_max_vm_ipa_size(ms, &fixed_ipa);
+    return fixed_ipa ? 0 : size;
 }
 
 int kvm_arch_init(MachineState *ms, KVMState *s)
-- 
2.34.1

From: Akihiko Odaki <akihiko.odaki@daynix.com>

On MIPS, kvm_arch_get_default_type() returns a negative value when an
error occurs, so handle that case. Also, make the other machines return
a negative value on error instead of exiting, and declare returning a
negative value as the correct way to propagate an error that happened
while determining the KVM type.
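
The convention can be sketched in isolation as follows. This is only
an illustration under assumed names: example_kvm_type() and
example_kvm_init() are hypothetical stand-ins, not the real hooks, and
the "auto" type string is invented for the example.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical machine hook: map a type string to a KVM type, or -1. */
static int example_kvm_type(const char *type_str)
{
    if (strcmp(type_str, "auto") == 0) {
        return 0;
    }
    /* Unknown type: report failure to the caller instead of exit(1). */
    return -1;
}

/* Hypothetical caller: any negative type becomes -EINVAL. */
static int example_kvm_init(const char *type_str)
{
    int type = example_kvm_type(type_str);

    if (type < 0) {
        return -EINVAL;
    }
    /* The real code would go on to issue KVM_CREATE_VM with 'type'. */
    return 0;
}
```

Returning the error lets the caller run its normal cleanup path rather
than terminating the process from inside the hook.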

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-5-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 accel/kvm/kvm-all.c | 5 +++++
 hw/arm/virt.c       | 2 +-
 hw/ppc/spapr.c      | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ static int kvm_init(MachineState *ms)
         type = kvm_arch_get_default_type(ms);
     }
 
+    if (type < 0) {
+        ret = -EINVAL;
+        goto err;
+    }
+
     do {
         ret = kvm_ioctl(s, KVM_CREATE_VM, type);
     } while (ret == -EINTR);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
                      "require an IPA range (%d bits) larger than "
                      "the one supported by the host (%d bits)",
                      requested_pa_size, max_vm_pa_size);
-        exit(1);
+        return -1;
     }
     /*
      * We return the requested PA log size, unless KVM only supports
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -XXX,XX +XXX,XX @@ static int spapr_kvm_type(MachineState *machine, const char *vm_type)
     }
 
     error_report("Unknown kvm-type specified '%s'", vm_type);
-    exit(1);
+    return -1;
 }
 
 /*
-- 
2.34.1

From: Akihiko Odaki <akihiko.odaki@daynix.com>

The returned value was always zero and had no meaning.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-7-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 accel/kvm/kvm-all.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ static void *kvm_dirty_ring_reaper_thread(void *data)
     return NULL;
 }
 
-static int kvm_dirty_ring_reaper_init(KVMState *s)
+static void kvm_dirty_ring_reaper_init(KVMState *s)
 {
     struct KVMDirtyRingReaper *r = &s->reaper;
 
     qemu_thread_create(&r->reaper_thr, "kvm-reaper",
                        kvm_dirty_ring_reaper_thread,
                        s, QEMU_THREAD_JOINABLE);
-
-    return 0;
 }
 
 static int kvm_dirty_ring_init(KVMState *s)
@@ -XXX,XX +XXX,XX @@ static int kvm_init(MachineState *ms)
     }
 
     if (s->kvm_dirty_ring_size) {
-        ret = kvm_dirty_ring_reaper_init(s);
-        if (ret) {
-            goto err;
-        }
+        kvm_dirty_ring_reaper_init(s);
     }
 
     if (kvm_check_extension(kvm_state, KVM_CAP_BINARY_STATS_FD)) {
-- 
2.34.1

In S1_ptw_translate() we set up the ARMMMUFaultInfo if the attempt to
translate the page descriptor address into a physical address fails.
This used to only be possible if we are doing a stage 2 ptw for that
descriptor address, and so the code always sets fi->stage2 and
fi->s1ptw to true.  However, with FEAT_RME it is also possible for
the lookup of the page descriptor address to fail because of a
Granule Protection Check fault.  These should not be reported as
stage 2, otherwise arm_deliver_fault() will incorrectly set
HPFAR_EL2.  Similarly the s1ptw bit should only be set for stage 2
faults on stage 1 translation table walks, i.e.  not for GPC faults.

Add a comment to the other place where we might detect a
stage2-fault-on-stage-1-ptw, in arm_casq_ptw(), noting why we know in
that case that it must really be a stage 2 fault and not a GPC fault.
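
The intended flag behaviour can be modelled with a small standalone
sketch. FaultFlags and flags_for_failed_walk() are hypothetical names
used only for illustration; the real fields live in ARMMMUFaultInfo.

```c
#include <stdbool.h>

/* Simplified stand-in for the two ARMMMUFaultInfo fields touched here. */
typedef struct {
    bool stage2;
    bool s1ptw;
} FaultFlags;

/*
 * walk_via_stage2 is true when the descriptor address lookup went
 * through a stage 2 regime, and false when it used a physical index,
 * in which case a failure is not a stage 2 fault.
 */
static FaultFlags flags_for_failed_walk(bool walk_via_stage2)
{
    FaultFlags f;

    f.stage2 = walk_via_stage2;
    f.s1ptw = f.stage2;
    return f;
}
```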

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-3-peter.maydell@linaro.org
---
 target/arm/ptw.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
         fi->type = ARMFault_GPCFOnWalk;
     }
     fi->s2addr = addr;
-    fi->stage2 = true;
-    fi->s1ptw = true;
+    fi->stage2 = regime_is_stage2(s2_mmu_idx);
+    fi->s1ptw = fi->stage2;
     fi->s1ns = !is_secure;
     return false;
 }
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
         env->tlb_fi = NULL;
 
         if (unlikely(flags & TLB_INVALID_MASK)) {
+            /*
+             * We know this must be a stage 2 fault because the granule
+             * protection table does not separately track read and write
+             * permission, so all GPC faults are caught in S1_ptw_translate():
+             * we only get here for "readable but not writeable".
+             */
             assert(fi->type != ARMFault_None);
             fi->s2addr = ptw->out_virt;
             fi->stage2 = true;
-- 
2.34.1

The s1ns bit in ARMMMUFaultInfo is documented as "true if
we faulted on a non-secure IPA while in secure state". Both the
places which look at this bit only do so after having confirmed
that this is a stage 2 fault and we're dealing with Secure EL2,
which leaves the ptw.c code free to set the bit to any random
value in the other cases.

Instead of taking advantage of that freedom, consistently
make the bit be set to false for the "not a stage 2 fault
for Secure EL2" cases. This removes some cases where we
were using an 'is_secure' boolean and leaving the reader
guessing about whether that was the right thing for Realm
and Root cases.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-4-peter.maydell@linaro.org
---
 target/arm/ptw.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static ARMSecuritySpace S2_security_space(ARMSecuritySpace s1_space,
     }
 }
 
+static bool fault_s1ns(ARMSecuritySpace space, ARMMMUIdx s2_mmu_idx)
+{
+    /*
+     * For stage 2 faults in Secure EL2, S1NS indicates
+     * whether the faulting IPA is in the Secure or NonSecure
+     * IPA space. For all other kinds of fault, it is false.
+     */
+    return space == ARMSS_Secure && regime_is_stage2(s2_mmu_idx)
+        && s2_mmu_idx == ARMMMUIdx_Stage2_S;
+}
+
 /* Translate a S1 pagetable walk through S2 if needed.  */
 static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
                              hwaddr addr, ARMMMUFaultInfo *fi)
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
             fi->s2addr = addr;
             fi->stage2 = true;
             fi->s1ptw = true;
-            fi->s1ns = !is_secure;
+            fi->s1ns = fault_s1ns(ptw->in_space, s2_mmu_idx);
             return false;
         }
     }
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
     fi->s2addr = addr;
     fi->stage2 = regime_is_stage2(s2_mmu_idx);
     fi->s1ptw = fi->stage2;
-    fi->s1ns = !is_secure;
+    fi->s1ns = fault_s1ns(ptw->in_space, s2_mmu_idx);
     return false;
 }
 
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
             fi->s2addr = ptw->out_virt;
             fi->stage2 = true;
             fi->s1ptw = true;
-            fi->s1ns = !ptw->in_secure;
+            fi->s1ns = fault_s1ns(ptw->in_space, ptw->in_ptw_idx);
             return 0;
         }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     fi->level = level;
     /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
     fi->stage2 = fi->s1ptw || regime_is_stage2(mmu_idx);
-    fi->s1ns = mmu_idx == ARMMMUIdx_Stage2;
+    fi->s1ns = fault_s1ns(ptw->in_space, mmu_idx);
     return true;
 }
 
-- 
2.34.1

In commit 6d2654ffacea813916176 we created the S1Translate struct and
used it to plumb through various arguments that we were previously
passing one-at-a-time to get_phys_addr_v5(), get_phys_addr_v6(), and
get_phys_addr_lpae().  Extend that pattern to get_phys_addr_pmsav5(),
get_phys_addr_pmsav7(), get_phys_addr_pmsav8() and
get_phys_addr_disabled(), so that all the get_phys_addr_* functions
we call from get_phys_addr_nogpc() take the S1Translate struct rather
than the mmu_idx and is_secure bool.

(This refactoring is a prelude to having the called functions look
at ptw->in_space rather than using an is_secure boolean.)
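
The shape of the refactoring can be sketched outside QEMU as below.
All names here are hypothetical stand-ins: Translate is a cut-down
analogue of S1Translate, and space_is_secure() is assumed to mirror
arm_space_is_secure() in treating both Secure and Root as secure.

```c
#include <stdbool.h>

typedef enum { SS_SECURE, SS_NONSECURE, SS_ROOT, SS_REALM } SecuritySpace;

/* Hypothetical cut-down analogue of S1Translate: inputs bundled up. */
typedef struct {
    int in_mmu_idx;
    SecuritySpace in_space;
} Translate;

/* Assumed to mirror arm_space_is_secure(): Secure and Root are secure. */
static bool space_is_secure(SecuritySpace space)
{
    return space == SS_SECURE || space == SS_ROOT;
}

/* Before: each input is passed as a separate argument. */
static bool lookup_old(int mmu_idx, bool is_secure)
{
    return is_secure && mmu_idx >= 0;
}

/* After: the callee unpacks what it needs from the struct. */
static bool lookup_new(const Translate *ptw)
{
    int mmu_idx = ptw->in_mmu_idx;
    bool is_secure = space_is_secure(ptw->in_space);

    return lookup_old(mmu_idx, is_secure);
}
```

Deriving the locals at the top of the callee keeps the function bodies
unchanged, so later patches can replace the bool one use at a time.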

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-5-peter.maydell@linaro.org
---
 target/arm/ptw.c | 57 ++++++++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     return true;
 }
 
-static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
-                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                                 bool is_secure, GetPhysAddrResult *result,
+static bool get_phys_addr_pmsav5(CPUARMState *env,
+                                 S1Translate *ptw,
+                                 uint32_t address,
+                                 MMUAccessType access_type,
+                                 GetPhysAddrResult *result,
                                  ARMMMUFaultInfo *fi)
 {
     int n;
     uint32_t mask;
     uint32_t base;
+    ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
     bool is_user = regime_is_user(env, mmu_idx);
+    bool is_secure = arm_space_is_secure(ptw->in_space);
 
     if (regime_translation_disabled(env, mmu_idx, is_secure)) {
         /* MPU disabled.  */
@@ -XXX,XX +XXX,XX @@ static bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx,
     return regime_sctlr(env, mmu_idx) & SCTLR_BR;
 }
 
-static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
-                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                                 bool secure, GetPhysAddrResult *result,
+static bool get_phys_addr_pmsav7(CPUARMState *env,
+                                 S1Translate *ptw,
+                                 uint32_t address,
+                                 MMUAccessType access_type,
+                                 GetPhysAddrResult *result,
                                  ARMMMUFaultInfo *fi)
 {
     ARMCPU *cpu = env_archcpu(env);
     int n;
+    ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
     bool is_user = regime_is_user(env, mmu_idx);
+    bool secure = arm_space_is_secure(ptw->in_space);
 
     result->f.phys_addr = address;
     result->f.lg_page_size = TARGET_PAGE_BITS;
@@ -XXX,XX +XXX,XX @@ void v8m_security_lookup(CPUARMState *env, uint32_t address,
     }
 }
 
-static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
-                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                                 bool secure, GetPhysAddrResult *result,
+static bool get_phys_addr_pmsav8(CPUARMState *env,
+                                 S1Translate *ptw,
+                                 uint32_t address,
+                                 MMUAccessType access_type,
+                                 GetPhysAddrResult *result,
                                  ARMMMUFaultInfo *fi)
 {
     V8M_SAttributes sattrs = {};
+    ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
+    bool secure = arm_space_is_secure(ptw->in_space);
     bool ret;
 
     if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
@@ -XXX,XX +XXX,XX @@ static ARMCacheAttrs combine_cacheattrs(uint64_t hcr,
  * MMU disabled.  S1 addresses within aa64 translation regimes are
  * still checked for bounds -- see AArch64.S1DisabledOutput().
  */
-static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
+static bool get_phys_addr_disabled(CPUARMState *env,
+                                   S1Translate *ptw,
+                                   target_ulong address,
                                    MMUAccessType access_type,
-                                   ARMMMUIdx mmu_idx, bool is_secure,
                                    GetPhysAddrResult *result,
                                    ARMMMUFaultInfo *fi)
 {
+    ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
+    bool is_secure = arm_space_is_secure(ptw->in_space);
     uint8_t memattr = 0x00;    /* Device nGnRnE */
     uint8_t shareability = 0;  /* non-shareable */
     int r_el;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
     case ARMMMUIdx_Phys_Root:
     case ARMMMUIdx_Phys_Realm:
         /* Checking Phys early avoids special casing later vs regime_el. */
-        return get_phys_addr_disabled(env, address, access_type, mmu_idx,
-                                      is_secure, result, fi);
+        return get_phys_addr_disabled(env, ptw, address, access_type,
+                                      result, fi);
 
     case ARMMMUIdx_Stage1_E0:
     case ARMMMUIdx_Stage1_E1:
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
 
         if (arm_feature(env, ARM_FEATURE_V8)) {
             /* PMSAv8 */
-            ret = get_phys_addr_pmsav8(env, address, access_type, mmu_idx,
-                                       is_secure, result, fi);
+            ret = get_phys_addr_pmsav8(env, ptw, address, access_type,
+                                       result, fi);
         } else if (arm_feature(env, ARM_FEATURE_V7)) {
             /* PMSAv7 */
-            ret = get_phys_addr_pmsav7(env, address, access_type, mmu_idx,
-                                       is_secure, result, fi);
+            ret = get_phys_addr_pmsav7(env, ptw, address, access_type,
+                                       result, fi);
         } else {
             /* Pre-v7 MPU */
-            ret = get_phys_addr_pmsav5(env, address, access_type, mmu_idx,
-                                       is_secure, result, fi);
+            ret = get_phys_addr_pmsav5(env, ptw, address, access_type,
+                                       result, fi);
         }
         qemu_log_mask(CPU_LOG_MMU, "PMSA MPU lookup for %s at 0x%08" PRIx32
                       " mmu_idx %u -> %s (prot %c%c%c)\n",
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
     /* Definitely a real MMU, not an MPU */
 
     if (regime_translation_disabled(env, mmu_idx, is_secure)) {
-        return get_phys_addr_disabled(env, address, access_type, mmu_idx,
-                                      is_secure, result, fi);
+        return get_phys_addr_disabled(env, ptw, address, access_type,
+                                      result, fi);
     }
 
     if (regime_using_lpae_format(env, mmu_idx)) {
-- 
2.34.1

Plumb the ARMSecuritySpace through to regime_translation_disabled()
rather than just a bool is_secure.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-6-peter.maydell@linaro.org
---
 target/arm/ptw.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
 
 /* Return true if the specified stage of address translation is disabled */
 static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
-                                        bool is_secure)
+                                        ARMSecuritySpace space)
 {
     uint64_t hcr_el2;
+    bool is_secure = arm_space_is_secure(space);
 
     if (arm_feature(env, ARM_FEATURE_M)) {
         switch (env->v7m.mpu_ctrl[is_secure] &
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav5(CPUARMState *env,
     uint32_t base;
     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
     bool is_user = regime_is_user(env, mmu_idx);
-    bool is_secure = arm_space_is_secure(ptw->in_space);
 
-    if (regime_translation_disabled(env, mmu_idx, is_secure)) {
+    if (regime_translation_disabled(env, mmu_idx, ptw->in_space)) {
         /* MPU disabled.  */
         result->f.phys_addr = address;
         result->f.prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav7(CPUARMState *env,
     result->f.lg_page_size = TARGET_PAGE_BITS;
     result->f.prot = 0;
 
-    if (regime_translation_disabled(env, mmu_idx, secure) ||
+    if (regime_translation_disabled(env, mmu_idx, ptw->in_space) ||
         m_is_ppb_region(env, address)) {
         /*
          * MPU disabled or M profile PPB access: use default memory map.
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
      * are done in arm_v7m_load_vector(), which always does a direct
      * read using address_space_ldl(), rather than going via this function.
      */
-    if (regime_translation_disabled(env, mmu_idx, secure)) { /* MPU disabled */
+    if (regime_translation_disabled(env, mmu_idx, arm_secure_to_space(secure))) {
+        /* MPU disabled */
         hit = true;
     } else if (m_is_ppb_region(env, address)) {
         hit = true;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
          */
         ptw->in_mmu_idx = mmu_idx = s1_mmu_idx;
         if (arm_feature(env, ARM_FEATURE_EL2) &&
-            !regime_translation_disabled(env, ARMMMUIdx_Stage2, is_secure)) {
+            !regime_translation_disabled(env, ARMMMUIdx_Stage2, ptw->in_space)) {
             return get_phys_addr_twostage(env, ptw, address, access_type,
                                           result, fi);
         }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
 
     /* Definitely a real MMU, not an MPU */
 
-    if (regime_translation_disabled(env, mmu_idx, is_secure)) {
+    if (regime_translation_disabled(env, mmu_idx, ptw->in_space)) {
         return get_phys_addr_disabled(env, ptw, address, access_type,
                                       result, fi);
     }
-- 
2.34.1

arm_hcr_el2_eff_secstate() takes a bool secure, which it uses to
determine whether EL2 is enabled in the current security state.
With the advent of FEAT_RME this is no longer sufficient, because
EL2 can be enabled for Secure state but not for Root, and both
of those will pass 'secure == true' in the callsites in ptw.c.

As it happens, in all of our callsites in ptw.c we either avoid making
the call or else avoid using the returned value if we're doing a
translation for Root, so this is not a behaviour change even if the
experimental FEAT_RME is enabled.  But it is less confusing in the
ptw.c code if we avoid the use of a bool secure that duplicates some
of the information in the ARMSecuritySpace argument.

Make arm_hcr_el2_eff_secstate() take an ARMSecuritySpace argument
instead. Because we always want to know the HCR_EL2 for the
security state defined by the current effective value of
SCR_EL3.{NSE,NS}, it makes no sense to pass ARMSS_Root here,
and we assert that callers don't do that.

To avoid the assert(), we thus push the call to
arm_hcr_el2_eff_secstate() down into the cases in
regime_translation_disabled() that need it, rather than calling the
function and ignoring the result for the Root space translations.
All other calls to this function in ptw.c are already in places
where we have confirmed that the mmu_idx is a stage 2 translation
or that the regime EL is not 3.
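
A minimal standalone sketch of why the call has to move into the
individual cases. The names model_hcr_eff(), model_translation_disabled()
and MODEL_HCR_VM are hypothetical, and the two-entry Regime enum is a
drastic simplification of ARMMMUIdx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { SS_SECURE, SS_NONSECURE, SS_ROOT, SS_REALM } SecuritySpace;
typedef enum { IDX_E3, IDX_STAGE2 } Regime;

#define MODEL_HCR_VM 1u

/* Model of the accessor: asking for the Root space is a caller bug. */
static uint64_t model_hcr_eff(SecuritySpace space, uint64_t hcr)
{
    assert(space != SS_ROOT);
    return hcr;
}

/*
 * The accessor is called only in the cases that consume its result,
 * so an EL3 regime lookup for Root never reaches the assert.
 */
static bool model_translation_disabled(Regime idx, SecuritySpace space,
                                       uint64_t hcr, bool sctlr_m)
{
    switch (idx) {
    case IDX_STAGE2:
        return (model_hcr_eff(space, hcr) & MODEL_HCR_VM) == 0;
    default:
        return !sctlr_m;
    }
}
```

Had the accessor been called unconditionally before the switch, the
IDX_E3 lookup for SS_ROOT would abort even though its result is unused.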

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-7-peter.maydell@linaro.org
---
 target/arm/cpu.h    |  2 +-
 target/arm/helper.c |  8 +++++---
 target/arm/ptw.c    | 15 +++++++--------
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_el2_enabled(CPUARMState *env)
  * "for all purposes other than a direct read or write access of HCR_EL2."
  * Not included here is HCR_RW.
  */
-uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, bool secure);
+uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, ARMSecuritySpace space);
 uint64_t arm_hcr_el2_eff(CPUARMState *env);
 uint64_t arm_hcrx_el2_eff(CPUARMState *env);
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void hcr_writelow(CPUARMState *env, const ARMCPRegInfo *ri,
  * Bits that are not included here:
  * RW       (read from SCR_EL3.RW as needed)
  */
-uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, bool secure)
+uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, ARMSecuritySpace space)
 {
     uint64_t ret = env->cp15.hcr_el2;
 
-    if (!arm_is_el2_enabled_secstate(env, secure)) {
+    assert(space != ARMSS_Root);
+
+    if (!arm_is_el2_enabled_secstate(env, arm_space_is_secure(space))) {
         /*
          * "This register has no effect if EL2 is not enabled in the
          * current Security state".  This is ARMv8.4-SecEL2 speak for
@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
     if (arm_feature(env, ARM_FEATURE_M)) {
         return 0;
     }
-    return arm_hcr_el2_eff_secstate(env, arm_is_secure_below_el3(env));
+    return arm_hcr_el2_eff_secstate(env, arm_security_space_below_el3(env));
 }
 
 /*
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
                                         ARMSecuritySpace space)
 {
     uint64_t hcr_el2;
-    bool is_secure = arm_space_is_secure(space);
 
     if (arm_feature(env, ARM_FEATURE_M)) {
+        bool is_secure = arm_space_is_secure(space);
         switch (env->v7m.mpu_ctrl[is_secure] &
                 (R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK)) {
         case R_V7M_MPU_CTRL_ENABLE_MASK:
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
         }
     }
 
-    hcr_el2 = arm_hcr_el2_eff_secstate(env, is_secure);
 
     switch (mmu_idx) {
     case ARMMMUIdx_Stage2:
     case ARMMMUIdx_Stage2_S:
         /* HCR.DC means HCR.VM behaves as 1 */
+        hcr_el2 = arm_hcr_el2_eff_secstate(env, space);
         return (hcr_el2 & (HCR_DC | HCR_VM)) == 0;
 
     case ARMMMUIdx_E10_0:
     case ARMMMUIdx_E10_1:
     case ARMMMUIdx_E10_1_PAN:
         /* TGE means that EL0/1 act as if SCTLR_EL1.M is zero */
+        hcr_el2 = arm_hcr_el2_eff_secstate(env, space);
         if (hcr_el2 & HCR_TGE) {
             return true;
         }
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
     case ARMMMUIdx_Stage1_E1:
     case ARMMMUIdx_Stage1_E1_PAN:
         /* HCR.DC means SCTLR_EL1.M behaves as 0 */
+        hcr_el2 = arm_hcr_el2_eff_secstate(env, space);
         if (hcr_el2 & HCR_DC) {
             return true;
         }
@@ -XXX,XX +XXX,XX @@ static bool fault_s1ns(ARMSecuritySpace space, ARMMMUIdx s2_mmu_idx)
 static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
                              hwaddr addr, ARMMMUFaultInfo *fi)
 {
-    bool is_secure = ptw->in_secure;
     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
     ARMMMUIdx s2_mmu_idx = ptw->in_ptw_idx;
     uint8_t pte_attrs;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
     }
 
     if (regime_is_stage2(s2_mmu_idx)) {
-        uint64_t hcr = arm_hcr_el2_eff_secstate(env, is_secure);
+        uint64_t hcr = arm_hcr_el2_eff_secstate(env, ptw->in_space);
 
         if ((hcr & HCR_PTW) && S2_attrs_are_device(hcr, pte_attrs)) {
             /*
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env,
                                    ARMMMUFaultInfo *fi)
 {
     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
-    bool is_secure = arm_space_is_secure(ptw->in_space);
     uint8_t memattr = 0x00;    /* Device nGnRnE */
     uint8_t shareability = 0;  /* non-shareable */
     int r_el;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env,
 
         /* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
         if (r_el == 1) {
-            uint64_t hcr = arm_hcr_el2_eff_secstate(env, is_secure);
+            uint64_t hcr = arm_hcr_el2_eff_secstate(env, ptw->in_space);
             if (hcr & HCR_DC) {
                 if (hcr & HCR_DCT) {
                     memattr = 0xf0;  /* Tagged, Normal, WB, RWA */
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
 {
     hwaddr ipa;
     int s1_prot, s1_lgpgsz;
-    bool is_secure = ptw->in_secure;
     ARMSecuritySpace in_space = ptw->in_space;
     bool ret, ipa_secure;
     ARMCacheAttrs cacheattrs1;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
     }
 
     /* Combine the S1 and S2 cache attributes. */
-    hcr = arm_hcr_el2_eff_secstate(env, is_secure);
+    hcr = arm_hcr_el2_eff_secstate(env, in_space);
     if (hcr & HCR_DC) {
         /*
          * HCR.DC forces the first stage attributes to
-- 
2.34.1

Pass an ARMSecuritySpace instead of a bool secure to
arm_is_el2_enabled_secstate(). This doesn't change behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-8-peter.maydell@linaro.org
---
 target/arm/cpu.h    | 13 ++++++++-----
 target/arm/helper.c |  2 +-
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_secure(CPUARMState *env)
 
 /*
  * Return true if the current security state has AArch64 EL2 or AArch32 Hyp.
- * This corresponds to the pseudocode EL2Enabled()
+ * This corresponds to the pseudocode EL2Enabled().
  */
-static inline bool arm_is_el2_enabled_secstate(CPUARMState *env, bool secure)
+static inline bool arm_is_el2_enabled_secstate(CPUARMState *env,
+                                               ARMSecuritySpace space)
 {
+    assert(space != ARMSS_Root);
     return arm_feature(env, ARM_FEATURE_EL2)
-           && (!secure || (env->cp15.scr_el3 & SCR_EEL2));
+           && (space != ARMSS_Secure || (env->cp15.scr_el3 & SCR_EEL2));
 }
 
 static inline bool arm_is_el2_enabled(CPUARMState *env)
 {
-    return arm_is_el2_enabled_secstate(env, arm_is_secure_below_el3(env));
+    return arm_is_el2_enabled_secstate(env, arm_security_space_below_el3(env));
 }
 
 #else
@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_secure(CPUARMState *env)
     return false;
 }
 
-static inline bool arm_is_el2_enabled_secstate(CPUARMState *env, bool secure)
+static inline bool arm_is_el2_enabled_secstate(CPUARMState *env,
+                                               ARMSecuritySpace space)
 {
     return false;
 }
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, ARMSecuritySpace space)
 
     assert(space != ARMSS_Root);
 
-    if (!arm_is_el2_enabled_secstate(env, arm_space_is_secure(space))) {
+    if (!arm_is_el2_enabled_secstate(env, space)) {
         /*
          * "This register has no effect if EL2 is not enabled in the
          * current Security state".  This is ARMv8.4-SecEL2 speak for
-- 
2.34.1

When we do a translation in Secure state, the NSTable bits in table
descriptors may downgrade us to NonSecure; we update ptw->in_secure
and ptw->in_space accordingly.  We guard that check correctly with a
conditional that means it's only applied for Secure stage 1
translations.  However, later on in get_phys_addr_lpae() we fold the
effects of the NSTable bits into the final descriptor attributes
bits, and there we do it unconditionally regardless of the CPU state.
That means that in Realm state (where in_secure is false) we will set
bit 5 in attrs, and later use it to decide to output to non-secure
space.

We don't in fact need to do this folding in at all any more (since
commit 2f1ff4e7b9f30c): if an NSTable bit was set then we have
already set ptw->in_space to ARMSS_NonSecure, and in that situation
we don't look at attrs bit 5.  The only thing we still need to deal
with is the real NS bit in the final descriptor word, so we can just
drop the code that ORed in the NSTable bit.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-9-peter.maydell@linaro.org
---
 target/arm/ptw.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      * Extract attributes from the (modified) descriptor, and apply
      * table descriptors. Stage 2 table descriptors do not include
      * any attribute fields. HPD disables all the table attributes
-     * except NSTable.
+     * except NSTable (which we have already handled).
      */
     attrs = new_descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
     if (!regime_is_stage2(mmu_idx)) {
-        attrs |= !ptw->in_secure << 5; /* NS */
         if (!param.hpd) {
             attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
             /*
-- 
2.34.1

Replace the last uses of ptw->in_secure with appropriate
checks on ptw->in_space.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-10-peter.maydell@linaro.org
---
 target/arm/ptw.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
                                       ARMMMUFaultInfo *fi)
 {
     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
-    bool is_secure = ptw->in_secure;
     ARMMMUIdx s1_mmu_idx;
 
     /*
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
      * cannot upgrade a NonSecure translation regime's attributes
      * to Secure or Realm.
      */
-    result->f.attrs.secure = is_secure;
     result->f.attrs.space = ptw->in_space;
+    result->f.attrs.secure = arm_space_is_secure(ptw->in_space);
 
     switch (mmu_idx) {
     case ARMMMUIdx_Phys_S:
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
     case ARMMMUIdx_Stage1_E0:
     case ARMMMUIdx_Stage1_E1:
     case ARMMMUIdx_Stage1_E1_PAN:
-        /* First stage lookup uses second stage for ptw. */
-        ptw->in_ptw_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
+        /*
+         * First stage lookup uses second stage for ptw; only
+         * Secure has both S and NS IPA and starts with Stage2_S.
+         */
+        ptw->in_ptw_idx = (ptw->in_space == ARMSS_Secure) ?
+            ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
         break;
 
     case ARMMMUIdx_Stage2:
-- 
2.34.1

We no longer look at the in_secure field of the S1Translate struct
anyway, so we can remove it and all the code which sets it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-11-peter.maydell@linaro.org
---
 target/arm/ptw.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
      *    value being Stage2 vs Stage2_S distinguishes those.
      */
     ARMSecuritySpace in_space;
-    /*
-     * in_secure: whether the translation regime is a Secure one.
-     * This is always equal to arm_space_is_secure(in_space).
-     * If a Secure ptw is "downgraded" to NonSecure by an NSTable bit,
-     * this field is updated accordingly.
-     */
-    bool in_secure;
     /*
      * in_debug: is this a QEMU debug access (gdbstub, etc)? Debug
      * accesses will not update the guest page table access flags
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
         S1Translate s2ptw = {
             .in_mmu_idx = s2_mmu_idx,
             .in_ptw_idx = ptw_idx_for_stage_2(env, s2_mmu_idx),
-            .in_secure = arm_space_is_secure(s2_space),
             .in_space = s2_space,
             .in_debug = true,
         };
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_S + 1 != ARMMMUIdx_Phys_NS);
         QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2_S + 1 != ARMMMUIdx_Stage2);
         ptw->in_ptw_idx += 1;
-        ptw->in_secure = false;
         ptw->in_space = ARMSS_NonSecure;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
 
     ptw->in_s1_is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
     ptw->in_mmu_idx = ipa_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
-    ptw->in_secure = ipa_secure;
     ptw->in_space = ipa_space;
     ptw->in_ptw_idx = ptw_idx_for_stage_2(env, ptw->in_mmu_idx);
 
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
 {
     S1Translate ptw = {
         .in_mmu_idx = mmu_idx,
-        .in_secure = is_secure,
         .in_space = arm_secure_to_space(is_secure),
     };
     return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
     }
 
     ptw.in_space = ss;
-    ptw.in_secure = arm_space_is_secure(ss);
     return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
 }
 
@@ -XXX,XX +XXX,XX @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
     S1Translate ptw = {
         .in_mmu_idx = mmu_idx,
         .in_space = ss,
-        .in_secure = arm_space_is_secure(ss),
         .in_debug = true,
     };
     GetPhysAddrResult res = {};
-- 
2.34.1

We only use S1Translate::out_secure in two places, where we are
setting up MemTxAttrs for a page table load. We can use
arm_space_is_secure(ptw->out_space) instead, which guarantees
that we're setting the MemTxAttrs secure and space fields
consistently, and allows us to drop the out_secure field in
S1Translate entirely.
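
As a minimal sketch of why deriving the flag beats storing it, the
standalone C below builds the attributes from the space alone. The
Sketch* types and sketch_* helpers are illustrative stand-ins, not
QEMU's definitions; the enum values and the "Secure or Root counts as
secure" rule are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for ARMSecuritySpace; the values are assumed. */
typedef enum {
    SKETCH_SS_Secure,
    SKETCH_SS_NonSecure,
    SKETCH_SS_Root,
    SKETCH_SS_Realm,
} SketchSecuritySpace;

/* Illustrative stand-in for the two MemTxAttrs fields of interest. */
typedef struct {
    bool secure;
    SketchSecuritySpace space;
} SketchTxAttrs;

/* Assumption: Secure and Root are "secure", NonSecure and Realm are not. */
bool sketch_space_is_secure(SketchSecuritySpace space)
{
    return space == SKETCH_SS_Secure || space == SKETCH_SS_Root;
}

/* Derive both fields from one input so that they cannot disagree. */
SketchTxAttrs sketch_attrs_for_space(SketchSecuritySpace space)
{
    SketchTxAttrs attrs = {
        .space = space,
        .secure = sketch_space_is_secure(space),
    };

    assert(attrs.secure == sketch_space_is_secure(attrs.space));
    return attrs;
}
```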

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-12-peter.maydell@linaro.org
---
 target/arm/ptw.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
      * Stage 2 is indicated by in_mmu_idx set to ARMMMUIdx_Stage2{,_S}.
      */
     bool in_s1_is_el0;
-    bool out_secure;
     bool out_rw;
     bool out_be;
     ARMSecuritySpace out_space;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
         pte_attrs = s2.cacheattrs.attrs;
         ptw->out_host = NULL;
         ptw->out_rw = false;
-        ptw->out_secure = s2.f.attrs.secure;
         ptw->out_space = s2.f.attrs.space;
     } else {
 #ifdef CONFIG_TCG
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
         ptw->out_phys = full->phys_addr | (addr & ~TARGET_PAGE_MASK);
         ptw->out_rw = full->prot & PAGE_WRITE;
         pte_attrs = full->pte_attrs;
-        ptw->out_secure = full->attrs.secure;
         ptw->out_space = full->attrs.space;
 #else
         g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw,
     } else {
         /* Page tables are in MMIO. */
         MemTxAttrs attrs = {
-            .secure = ptw->out_secure,
             .space = ptw->out_space,
+            .secure = arm_space_is_secure(ptw->out_space),
         };
         AddressSpace *as = arm_addressspace(cs, attrs);
         MemTxResult result = MEMTX_OK;
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
     } else {
         /* Page tables are in MMIO. */
         MemTxAttrs attrs = {
-            .secure = ptw->out_secure,
             .space = ptw->out_space,
+            .secure = arm_space_is_secure(ptw->out_space),
         };
         AddressSpace *as = arm_addressspace(cs, attrs);
         MemTxResult result = MEMTX_OK;
-- 
2.34.1

When the MMU is disabled, data accesses should be Device nGnRnE,
Outer Shareable, Untagged.  We handle the other cases from
AArch64.S1DisabledOutput() correctly but missed this one.
Device nGnRnE is memattr == 0, so the only part we were missing
was that shareability should be set to 2 for both insn fetches
and data accesses.
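
The resulting rule can be sketched as standalone C. The struct, the
function name and the two bool parameters are hypothetical stand-ins
for the access_type and SCTLR.I checks in the patch, and the default
shareability of 0 for the other cases is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t memattr;
    uint8_t shareability;
} SketchDisabledAttrs;

/*
 * memattr_in is the attribute chosen by the earlier cases; 0 means
 * Device nGnRnE, i.e. none of those cases applied.
 */
SketchDisabledAttrs sketch_disabled_output(uint8_t memattr_in,
                                           bool is_insn_fetch,
                                           bool icache_enabled)
{
    SketchDisabledAttrs out = { .memattr = memattr_in, .shareability = 0 };

    if (out.memattr == 0) {
        if (is_insn_fetch) {
            /* Normal WT RA NT if the I-cache is on, else Normal NC */
            out.memattr = icache_enabled ? 0xee : 0x44;
        }
        /* Outer Shareable for insn fetches and data accesses alike */
        out.shareability = 2;
    }
    /* Device nGnRnE must never leave here without Outer Shareable. */
    assert(out.memattr != 0 || out.shareability == 2);
    return out;
}
```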

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-13-peter.maydell@linaro.org
---
 target/arm/ptw.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env,
                 }
             }
         }
-        if (memattr == 0 && access_type == MMU_INST_FETCH) {
-            if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
-                memattr = 0xee;  /* Normal, WT, RA, NT */
-            } else {
-                memattr = 0x44;  /* Normal, NC, No */
+        if (memattr == 0) {
+            if (access_type == MMU_INST_FETCH) {
+                if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
+                    memattr = 0xee;  /* Normal, WT, RA, NT */
+                } else {
+                    memattr = 0x44;  /* Normal, NC, No */
+                }
             }
             shareability = 2; /* outer shareable */
         }
-- 
2.34.1

The architecture doesn't permit block descriptors at any arbitrary
level of the page table walk; which levels are permitted depends on
the granule size.  We implemented only a partial version of this
check, which assumed that block descriptors are valid at all levels
except level 3, so we did not deliver the Translation fault for every
case of this sort of guest page table error.

Implement the logic corresponding to the pseudocode
AArch64.DecodeDescriptorType() and AArch64.BlockDescSupported().
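
For reference, the per-granule rule can be exercised in isolation with
a sketch like the one below. It mirrors the check added by this patch,
but the SketchGranule enum and the plain pa_bits parameter (standing in
for arm_pamax(cpu)) are assumptions made to keep it self-contained.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SKETCH_GRAN_4K,
    SKETCH_GRAN_16K,
    SKETCH_GRAN_64K,
} SketchGranule;

/*
 * Return true if a block descriptor is permitted at this level.
 * ds: the 52-bit descriptor format is in use (4K/16K granules).
 * pa_bits: implemented physical address size in bits.
 */
bool sketch_block_desc_valid(SketchGranule gran, int level,
                             bool ds, int pa_bits)
{
    switch (gran) {
    case SKETCH_GRAN_4K:
        return (level == 0 && ds) || level == 1 || level == 2;
    case SKETCH_GRAN_16K:
        return (level == 1 && ds) || level == 2;
    case SKETCH_GRAN_64K:
        return (level == 1 && pa_bits == 52) || level == 2;
    }
    assert(0 && "unknown granule size");
    return false;
}
```

Level 3 is never valid here for any granule, which matches the old
"Reserved level 3 encoding" check as a special case.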

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-14-peter.maydell@linaro.org
---
 target/arm/ptw.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static int check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, uint64_t tcr,
     return INT_MIN;
 }
 
+static bool lpae_block_desc_valid(ARMCPU *cpu, bool ds,
+                                  ARMGranuleSize gran, int level)
+{
+    /*
+     * See pseudocode AArch64.BlockDescSupported(): block descriptors
+     * are not valid at all levels, depending on the page size.
+     */
+    switch (gran) {
+    case Gran4K:
+        return (level == 0 && ds) || level == 1 || level == 2;
+    case Gran16K:
+        return (level == 1 && ds) || level == 2;
+    case Gran64K:
+        return (level == 1 && arm_pamax(cpu) == 52) || level == 2;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 /**
  * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
  *
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     new_descriptor = descriptor;
 
  restart_atomic_update:
-    if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
-        /* Invalid, or the Reserved level 3 encoding */
+    if (!(descriptor & 1) ||
+        (!(descriptor & 2) &&
+         !lpae_block_desc_valid(cpu, param.ds, param.gran, level))) {
+        /* Invalid, or a block descriptor at an invalid level */
         goto do_translation_fault;
     }
 
-- 
2.34.1

When we report faults due to stage 2 faults during a stage 1
page table walk, the 'level' parameter should be the level
of the walk in stage 2 that faulted, not the level of the
walk in stage 1. Correct the reporting of these faults.
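
A minimal sketch of the corrected bookkeeping, assuming a cut-down
fault record (SketchFaultInfo and sketch_record_fault are hypothetical
names, not the QEMU ones): when the nested stage 2 walk has already
filled in the record, the stage 1 fault path leaves it alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down stand-in for the fields of ARMMMUFaultInfo used here. */
typedef struct {
    int level;
    bool stage2;
    bool s1ptw;
} SketchFaultInfo;

/* Called on the fault path of a walk at the given level. */
void sketch_record_fault(SketchFaultInfo *fi, int level,
                         bool regime_is_stage2)
{
    if (fi->s1ptw) {
        /* The stage 2 walk already set level and stage2: keep them. */
        assert(fi->stage2);
    } else {
        fi->level = level;
        fi->stage2 = regime_is_stage2;
    }
}
```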

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-15-peter.maydell@linaro.org
---
 target/arm/ptw.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
  do_translation_fault:
     fi->type = ARMFault_Translation;
  do_fault:
-    fi->level = level;
-    /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
-    fi->stage2 = fi->s1ptw || regime_is_stage2(mmu_idx);
+    if (fi->s1ptw) {
+        /* Retain the existing stage 2 fi->level */
+        assert(fi->stage2);
+    } else {
+        fi->level = level;
+        fi->stage2 = regime_is_stage2(mmu_idx);
+    }
     fi->s1ns = fault_s1ns(ptw->in_space, mmu_idx);
     return true;
 }
-- 
2.34.1

The PAR_EL1.SH field documents that for the cases of:
 * Device memory
 * Normal memory with both Inner and Outer Non-Cacheable
the field should be 0b10 rather than whatever was in the
translation table descriptor field. (In the pseudocode this
is handled by PAREncodeShareability().) Perform this
adjustment when assembling a PAR value.
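
The adjustment can be tried out on its own with the sketch below. It
follows the same test on the MAIR-format attribute byte as the helper
added by this patch, but takes plain integers instead of a
GetPhysAddrResult; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * attrs: MAIR-format attribute byte (high nibble 0 means Device).
 * shareability: the 2-bit value taken from the descriptor.
 */
uint8_t sketch_par_shareability(uint8_t attrs, uint8_t shareability)
{
    assert(shareability <= 3);
    if ((attrs & 0xf0) == 0 || attrs == 0x44 || attrs == 0x40) {
        /* Device or Normal Non-cacheable: report 0b10 */
        return 2;
    }
    return shareability;
}
```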

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230807141514.19075-16-peter.maydell@linaro.org
---
 target/arm/helper.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult ats_access(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 
 #ifdef CONFIG_TCG
+static int par_el1_shareability(GetPhysAddrResult *res)
+{
+    /*
+     * The PAR_EL1.SH field must be 0b10 for Device or Normal-NC
+     * memory -- see pseudocode PAREncodeShareability().
+     */
+    if (((res->cacheattrs.attrs & 0xf0) == 0) ||
+        res->cacheattrs.attrs == 0x44 || res->cacheattrs.attrs == 0x40) {
+        return 2;
+    }
+    return res->cacheattrs.shareability;
+}
+
 static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
                              MMUAccessType access_type, ARMMMUIdx mmu_idx,
                              bool is_secure)
@@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
                 par64 |= (1 << 9); /* NS */
             }
             par64 |= (uint64_t)res.cacheattrs.attrs << 56; /* ATTR */
-            par64 |= res.cacheattrs.shareability << 7; /* SH */
+            par64 |= par_el1_shareability(&res) << 7; /* SH */
         } else {
             uint32_t fsr = arm_fi_to_lfsc(&fi);
 
-- 
2.34.1

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

In realm state, stage-2 translation tables are fetched from the realm
physical address space (R_PGRQD).
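
As a rough sketch of the selection, with hypothetical enums standing
in for ARMSecuritySpace and the ARMMMUIdx_Phys_* indexes, and with the
VSTCR_EL2.SW and VTCR_EL2.NSW bits passed in as plain bools:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SKETCH_SPACE_NonSecure,
    SKETCH_SPACE_Secure,
    SKETCH_SPACE_Realm,
} SketchSpace;

typedef enum {
    SKETCH_PHYS_NS,
    SKETCH_PHYS_S,
    SKETCH_PHYS_Realm,
} SketchPhysIdx;

/*
 * space: security space below EL3.
 * secure_ipa: the stage 2 regime is the Secure IPA one (Stage2_S).
 * sw, nsw: stand-ins for VSTCR_EL2.SW and VTCR_EL2.NSW.
 */
SketchPhysIdx sketch_s2_walk_phys(SketchSpace space, bool secure_ipa,
                                  bool sw, bool nsw)
{
    switch (space) {
    case SKETCH_SPACE_NonSecure:
        return SKETCH_PHYS_NS;
    case SKETCH_SPACE_Realm:
        return SKETCH_PHYS_Realm;
    case SKETCH_SPACE_Secure: {
        bool walk_secure = secure_ipa ? !sw : !nsw;
        return walk_secure ? SKETCH_PHYS_S : SKETCH_PHYS_NS;
    }
    }
    assert(0 && "unexpected security space");
    return SKETCH_PHYS_NS;
}
```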

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230809123706.1842548-2-jean-philippe@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static ARMMMUIdx ptw_idx_for_stage_2(CPUARMState *env, ARMMMUIdx stage2idx)
 
     /*
      * We're OK to check the current state of the CPU here because
-     * (1) we always invalidate all TLBs when the SCR_EL3.NS bit changes
+     * (1) we always invalidate all TLBs when the SCR_EL3.NS or SCR_EL3.NSE bit
+     * changes.
      * (2) there's no way to do a lookup that cares about Stage 2 for a
      * different security state to the current one for AArch64, and AArch32
      * never has a secure EL2. (AArch32 ATS12NSO[UP][RW] allow EL3 to do
      * an NS stage 1+2 lookup while the NS bit is 0.)
      */
-    if (!arm_is_secure_below_el3(env) || !arm_el_is_aa64(env, 3)) {
+    if (!arm_el_is_aa64(env, 3)) {
         return ARMMMUIdx_Phys_NS;
     }
-    if (stage2idx == ARMMMUIdx_Stage2_S) {
-        s2walk_secure = !(env->cp15.vstcr_el2 & VSTCR_SW);
-    } else {
-        s2walk_secure = !(env->cp15.vtcr_el2 & VTCR_NSW);
-    }
-    return s2walk_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
 
+    switch (arm_security_space_below_el3(env)) {
+    case ARMSS_NonSecure:
+        return ARMMMUIdx_Phys_NS;
+    case ARMSS_Realm:
+        return ARMMMUIdx_Phys_Realm;
+    case ARMSS_Secure:
+        if (stage2idx == ARMMMUIdx_Stage2_S) {
+            s2walk_secure = !(env->cp15.vstcr_el2 & VSTCR_SW);
+        } else {
+            s2walk_secure = !(env->cp15.vtcr_el2 & VTCR_NSW);
+        }
+        return s2walk_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
+    default:
+        g_assert_not_reached();
+    }
 }
 
 static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
-- 
2.34.1

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

When HCR_EL2.E2H is enabled, TLB entries are formed using the EL2&0
translation regime, instead of the EL2 translation regime. The TLB VAE2*
instructions invalidate the regime that corresponds to the current value
of HCR_EL2.E2H.

At the moment we only invalidate the EL2 translation regime. This causes
problems with RMM, which issues TLBI VAE2IS instructions with
HCR_EL2.E2H enabled. Update vae2_tlbmask() to take HCR_EL2.E2H into
account.

Add vae2_tlbbits() as well, since the top-byte-ignore configuration is
different between the EL2&0 and EL2 regime.
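
The mask selection amounts to the following sketch. The bit positions
of the SKETCH_IDXBIT_* macros are made up for the example and do not
match the real ARMMMUIdxBit_* values; e2h stands in for the effective
HCR_EL2.E2H bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit assignments, for illustration only. */
#define SKETCH_IDXBIT_E2         (1u << 0)
#define SKETCH_IDXBIT_E20_0      (1u << 1)
#define SKETCH_IDXBIT_E20_2      (1u << 2)
#define SKETCH_IDXBIT_E20_2_PAN  (1u << 3)

/* Pick the set of TLBs that a TLBI VAE2* operation must flush. */
unsigned sketch_vae2_tlbmask(bool e2h)
{
    unsigned mask;

    if (e2h) {
        /* EL2&0 regime: EL2, EL2 with PAN, and EL0 entries */
        mask = SKETCH_IDXBIT_E20_2 |
               SKETCH_IDXBIT_E20_2_PAN |
               SKETCH_IDXBIT_E20_0;
    } else {
        /* Plain EL2 regime */
        mask = SKETCH_IDXBIT_E2;
    }
    assert(mask != 0);
    return mask;
}
```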

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230809123706.1842548-3-jean-philippe@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 50 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static int vae1_tlbmask(CPUARMState *env)
     return mask;
 }
 
+static int vae2_tlbmask(CPUARMState *env)
+{
+    uint64_t hcr = arm_hcr_el2_eff(env);
+    uint16_t mask;
+
+    if (hcr & HCR_E2H) {
+        mask = ARMMMUIdxBit_E20_2 |
+               ARMMMUIdxBit_E20_2_PAN |
+               ARMMMUIdxBit_E20_0;
+    } else {
+        mask = ARMMMUIdxBit_E2;
+    }
+    return mask;
+}
+
 /* Return 56 if TBI is enabled, 64 otherwise. */
 static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx,
                               uint64_t addr)
@@ -XXX,XX +XXX,XX @@ static int vae1_tlbbits(CPUARMState *env, uint64_t addr)
     return tlbbits_for_regime(env, mmu_idx, addr);
 }
 
+static int vae2_tlbbits(CPUARMState *env, uint64_t addr)
+{
+    uint64_t hcr = arm_hcr_el2_eff(env);
+    ARMMMUIdx mmu_idx;
+
+    /*
+     * Only the regime of the mmu_idx below is significant.
+     * Regime EL2&0 has two ranges with separate TBI configuration, while EL2
+     * only has one.
+     */
+    if (hcr & HCR_E2H) {
+        mmu_idx = ARMMMUIdx_E20_2;
+    } else {
+        mmu_idx = ARMMMUIdx_E2;
+    }
+
+    return tlbbits_for_regime(env, mmu_idx, addr);
+}
+
 static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                       uint64_t value)
 {
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri,
      * flush-last-level-only.
      */
     CPUState *cs = env_cpu(env);
-    int mask = e2_tlbmask(env);
+    int mask = vae2_tlbmask(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+    int bits = vae2_tlbbits(env, pageaddr);
 
-    tlb_flush_page_by_mmuidx(cs, pageaddr, mask);
+    tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits);
 }
 
 static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                    uint64_t value)
 {
     CPUState *cs = env_cpu(env);
+    int mask = vae2_tlbmask(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
-    int bits = tlbbits_for_regime(env, ARMMMUIdx_E2, pageaddr);
+    int bits = vae2_tlbbits(env, pageaddr);
 
-    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
-                                                  ARMMMUIdxBit_E2, bits);
+    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
 }
 
 static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_rvae1is_write(CPUARMState *env,
     do_rvae_write(env, value, vae1_tlbmask(env), true);
 }
 
-static int vae2_tlbmask(CPUARMState *env)
-{
-    return ARMMMUIdxBit_E2;
-}
-
 static void tlbi_aa64_rvae2_write(CPUARMState *env,
                                   const ARMCPRegInfo *ri,
                                   uint64_t value)
-- 
2.34.1

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

GPC checks are not performed on the output address for AT instructions,
as stated by ARM DDI 0487J in D8.12.2:

  When populating PAR_EL1 with the result of an address translation
  instruction, granule protection checks are not performed on the final
  output address of a successful translation.

Rename get_phys_addr_with_secure(), since it's only used to handle AT
instructions.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230809123706.1842548-4-jean-philippe@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h | 25 ++++++++++++++-----------
 target/arm/helper.c    |  8 ++++++--
 target/arm/ptw.c       | 11 ++++++-----
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ typedef struct GetPhysAddrResult {
 } GetPhysAddrResult;
 
 /**
- * get_phys_addr_with_secure: get the physical address for a virtual address
+ * get_phys_addr: get the physical address for a virtual address
  * @env: CPUARMState
  * @address: virtual address to get physical address for
  * @access_type: 0 for read, 1 for write, 2 for execute
  * @mmu_idx: MMU index indicating required translation regime
- * @is_secure: security state for the access
  * @result: set on translation success.
  * @fi: set to fault info if the translation fails
  *
@@ -XXX,XX +XXX,XX @@ typedef struct GetPhysAddrResult {
  *  * for PSMAv5 based systems we don't bother to return a full FSR format
  *    value.
  */
-bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
-                               MMUAccessType access_type,
-                               ARMMMUIdx mmu_idx, bool is_secure,
-                               GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
+bool get_phys_addr(CPUARMState *env, target_ulong address,
+                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
     __attribute__((nonnull));
 
 /**
- * get_phys_addr: get the physical address for a virtual address
+ * get_phys_addr_with_secure_nogpc: get the physical address for a virtual
+ *                                  address
  * @env: CPUARMState
  * @address: virtual address to get physical address for
  * @access_type: 0 for read, 1 for write, 2 for execute
  * @mmu_idx: MMU index indicating required translation regime
+ * @is_secure: security state for the access
  * @result: set on translation success.
  * @fi: set to fault info if the translation fails
  *
- * Similarly, but use the security regime of @mmu_idx.
+ * Similar to get_phys_addr, but use the given security regime and don't perform
+ * a Granule Protection Check on the resulting address.
  */
-bool get_phys_addr(CPUARMState *env, target_ulong address,
-                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
+bool get_phys_addr_with_secure_nogpc(CPUARMState *env, target_ulong address,
+                                     MMUAccessType access_type,
+                                     ARMMMUIdx mmu_idx, bool is_secure,
+                                     GetPhysAddrResult *result,
+                                     ARMMMUFaultInfo *fi)
     __attribute__((nonnull));
 
 bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
     ARMMMUFaultInfo fi = {};
     GetPhysAddrResult res = {};
 
-    ret = get_phys_addr_with_secure(env, value, access_type, mmu_idx,
-                                    is_secure, &res, &fi);
+    /*
+     * I_MXTJT: Granule protection checks are not performed on the final address
+     * of a successful translation.
+     */
+    ret = get_phys_addr_with_secure_nogpc(env, value, access_type, mmu_idx,
+                                          is_secure, &res, &fi);
 
     /*
      * ATS operations only do S1 or S1+S2 translations, so we never
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_gpc(CPUARMState *env, S1Translate *ptw,
     return false;
 }
 
-bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
-                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                               bool is_secure, GetPhysAddrResult *result,
-                               ARMMMUFaultInfo *fi)
+bool get_phys_addr_with_secure_nogpc(CPUARMState *env, target_ulong address,
+                                     MMUAccessType access_type,
+                                     ARMMMUIdx mmu_idx, bool is_secure,
+                                     GetPhysAddrResult *result,
+                                     ARMMMUFaultInfo *fi)
 {
     S1Translate ptw = {
         .in_mmu_idx = mmu_idx,
         .in_space = arm_secure_to_space(is_secure),
     };
-    return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
+    return get_phys_addr_nogpc(env, &ptw, address, access_type, result, fi);
 }
 
 bool get_phys_addr(CPUARMState *env, target_ulong address,
-- 
2.34.1

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

At the moment we only handle Secure and NonSecure security spaces for
the AT instructions. Add support for Realm and Root.

For AArch64, arm_security_space() gives the desired space. ARM DDI0487J
says (R_NYXTL):

  If EL3 is implemented, then when an address translation instruction
  that applies to an Exception level lower than EL3 is executed, the
  Effective value of SCR_EL3.{NSE, NS} determines the target Security
  state that the instruction applies to.

For AArch32, some instructions can access NonSecure space from Secure,
so we still need to pass the state explicitly to do_ats_write().

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230809123706.1842548-5-jean-philippe@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h | 18 +++++++++---------
 target/arm/helper.c    | 27 ++++++++++++---------------
 target/arm/ptw.c       | 12 ++++++------
 3 files changed, 27 insertions(+), 30 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
     __attribute__((nonnull));
 
 /**
- * get_phys_addr_with_secure_nogpc: get the physical address for a virtual
- *                                  address
+ * get_phys_addr_with_space_nogpc: get the physical address for a virtual
+ *                                 address
  * @env: CPUARMState
  * @address: virtual address to get physical address for
  * @access_type: 0 for read, 1 for write, 2 for execute
  * @mmu_idx: MMU index indicating required translation regime
- * @is_secure: security state for the access
+ * @space: security space for the access
  * @result: set on translation success.
  * @fi: set to fault info if the translation fails
  *
- * Similar to get_phys_addr, but use the given security regime and don't perform
+ * Similar to get_phys_addr, but use the given security space and don't perform
  * a Granule Protection Check on the resulting address.
  */
-bool get_phys_addr_with_secure_nogpc(CPUARMState *env, target_ulong address,
-                                     MMUAccessType access_type,
-                                     ARMMMUIdx mmu_idx, bool is_secure,
-                                     GetPhysAddrResult *result,
-                                     ARMMMUFaultInfo *fi)
+bool get_phys_addr_with_space_nogpc(CPUARMState *env, target_ulong address,
+                                    MMUAccessType access_type,
+                                    ARMMMUIdx mmu_idx, ARMSecuritySpace space,
+                                    GetPhysAddrResult *result,
+                                    ARMMMUFaultInfo *fi)
     __attribute__((nonnull));
 
 bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static int par_el1_shareability(GetPhysAddrResult *res)
 
 static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
                              MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                             bool is_secure)
+                             ARMSecuritySpace ss)
 {
     bool ret;
     uint64_t par64;
@@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
      * I_MXTJT: Granule protection checks are not performed on the final address
      * of a successful translation.
      */
-    ret = get_phys_addr_with_secure_nogpc(env, value, access_type, mmu_idx,
-                                          is_secure, &res, &fi);
+    ret = get_phys_addr_with_space_nogpc(env, value, access_type, mmu_idx, ss,
+                                         &res, &fi);
 
     /*
      * ATS operations only do S1 or S1+S2 translations, so we never
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
     uint64_t par64;
     ARMMMUIdx mmu_idx;
     int el = arm_current_el(env);
-    bool secure = arm_is_secure_below_el3(env);
+    ARMSecuritySpace ss = arm_security_space(env);
 
     switch (ri->opc2 & 6) {
     case 0:
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
         switch (el) {
         case 3:
             mmu_idx = ARMMMUIdx_E3;
-            secure = true;
             break;
         case 2:
-            g_assert(!secure);  /* ARMv8.4-SecEL2 is 64-bit only */
+            g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
             /* fall through */
         case 1:
             if (ri->crm == 9 && (env->uncached_cpsr & CPSR_PAN)) {
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
         switch (el) {
         case 3:
             mmu_idx = ARMMMUIdx_E10_0;
-            secure = true;
             break;
         case 2:
-            g_assert(!secure);  /* ARMv8.4-SecEL2 is 64-bit only */
+            g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
             mmu_idx = ARMMMUIdx_Stage1_E0;
             break;
         case 1:
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
     case 4:
         /* stage 1+2 NonSecure PL1: ATS12NSOPR, ATS12NSOPW */
         mmu_idx = ARMMMUIdx_E10_1;
-        secure = false;
+        ss = ARMSS_NonSecure;
         break;
     case 6:
         /* stage 1+2 NonSecure PL0: ATS12NSOUR, ATS12NSOUW */
         mmu_idx = ARMMMUIdx_E10_0;
-        secure = false;
+        ss = ARMSS_NonSecure;
         break;
     default:
         g_assert_not_reached();
     }
 
-    par64 = do_ats_write(env, value, access_type, mmu_idx, secure);
+    par64 = do_ats_write(env, value, access_type, mmu_idx, ss);
 
     A32_BANKED_CURRENT_REG_SET(env, par, par64);
 #else
@@ -XXX,XX +XXX,XX @@ static void ats1h_write(CPUARMState *env, const ARMCPRegInfo *ri,
     uint64_t par64;
 
     /* There is no SecureEL2 for AArch32. */
-    par64 = do_ats_write(env, value, access_type, ARMMMUIdx_E2, false);
+    par64 = do_ats_write(env, value, access_type, ARMMMUIdx_E2,
+                         ARMSS_NonSecure);
 
     A32_BANKED_CURRENT_REG_SET(env, par, par64);
 #else
@@ -XXX,XX +XXX,XX @@ static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
 #ifdef CONFIG_TCG
     MMUAccessType access_type = ri->opc2 & 1 ? MMU_DATA_STORE : MMU_DATA_LOAD;
     ARMMMUIdx mmu_idx;
-    int secure = arm_is_secure_below_el3(env);
     uint64_t hcr_el2 = arm_hcr_el2_eff(env);
     bool regime_e20 = (hcr_el2 & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE);
 
@@ -XXX,XX +XXX,XX @@ static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
             break;
         case 6: /* AT S1E3R, AT S1E3W */
             mmu_idx = ARMMMUIdx_E3;
-            secure = true;
             break;
         default:
             g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
     }
 
     env->cp15.par_el[1] = do_ats_write(env, value, access_type,
-                                       mmu_idx, secure);
+                                       mmu_idx, arm_security_space(env));
 #else
     /* Handled by hardware accelerator. */
     g_assert_not_reached();
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_gpc(CPUARMState *env, S1Translate *ptw,
     return false;
 }
 
-bool get_phys_addr_with_secure_nogpc(CPUARMState *env, target_ulong address,
-                                     MMUAccessType access_type,
-                                     ARMMMUIdx mmu_idx, bool is_secure,
-                                     GetPhysAddrResult *result,
-                                     ARMMMUFaultInfo *fi)
+bool get_phys_addr_with_space_nogpc(CPUARMState *env, target_ulong address,
+                                    MMUAccessType access_type,
+                                    ARMMMUIdx mmu_idx, ARMSecuritySpace space,
+                                    GetPhysAddrResult *result,
+                                    ARMMMUFaultInfo *fi)
 {
     S1Translate ptw = {
         .in_mmu_idx = mmu_idx,
-        .in_space = arm_secure_to_space(is_secure),
+        .in_space = space,
     };
     return get_phys_addr_nogpc(env, &ptw, address, access_type, result, fi);
 }
-- 
2.34.1

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

The AT instruction is UNDEFINED if the {NSE,NS} configuration is
invalid. Add a function to check this on all AT instructions that apply
to an EL lower than 3.
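
For illustration only, the reserved-combination test can be sketched as
standalone C. The SKETCH_* names and the helper are hypothetical and are
not part of this patch; the bit positions (NS = bit 0, NSE = bit 62)
follow the SCR_EL3 layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCR_EL3 bit positions: NS is bit 0, NSE is bit 62. */
#define SKETCH_SCR_NS   (1ULL << 0)
#define SKETCH_SCR_NSE  (1ULL << 62)

/*
 * {NSE,NS} = {0,0} Secure, {0,1} NonSecure, {1,1} Realm.
 * {1,0} is the reserved combination.
 */
static bool sketch_nse_ns_reserved(uint64_t scr_el3)
{
    return (scr_el3 & (SKETCH_SCR_NSE | SKETCH_SCR_NS)) == SKETCH_SCR_NSE;
}
```

Only NSE set with NS clear is rejected; the other three encodings pass.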

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-id: 20230809123706.1842548-6-jean-philippe@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void ats1h_write(CPUARMState *env, const ARMCPRegInfo *ri,
 #endif /* CONFIG_TCG */
 }
 
+static CPAccessResult at_e012_access(CPUARMState *env, const ARMCPRegInfo *ri,
+                                     bool isread)
+{
+    /*
+     * R_NYXTL: instruction is UNDEFINED if it applies to an Exception level
+     * lower than EL3 and the combination SCR_EL3.{NSE,NS} is reserved. This can
+     * only happen when executing at EL3 because that combination also causes an
+     * illegal exception return. We don't need to check FEAT_RME either, because
+     * scr_write() ensures that the NSE bit is not set otherwise.
+     */
+    if ((env->cp15.scr_el3 & (SCR_NSE | SCR_NS)) == SCR_NSE) {
+        return CP_ACCESS_TRAP;
+    }
+    return CP_ACCESS_OK;
+}
+
 static CPAccessResult at_s1e2_access(CPUARMState *env, const ARMCPRegInfo *ri,
                                      bool isread)
 {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult at_s1e2_access(CPUARMState *env, const ARMCPRegInfo *ri,
         !(env->cp15.scr_el3 & (SCR_NS | SCR_EEL2))) {
         return CP_ACCESS_TRAP;
     }
-    return CP_ACCESS_OK;
+    return at_e012_access(env, ri, isread);
 }
 
 static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
       .fgt = FGT_ATS1E1R,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
     { .name = "AT_S1E1W", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 1,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
       .fgt = FGT_ATS1E1W,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
     { .name = "AT_S1E0R", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 2,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
       .fgt = FGT_ATS1E0R,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
     { .name = "AT_S1E0W", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 3,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
       .fgt = FGT_ATS1E0W,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
     { .name = "AT_S12E1R", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 4,
       .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
     { .name = "AT_S12E1W", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 5,
       .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
     { .name = "AT_S12E0R", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 6,
       .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
     { .name = "AT_S12E0W", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 7,
       .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
     /* AT S1E2* are elsewhere as they UNDEF from EL3 if EL2 is not present */
     { .name = "AT_S1E3R", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 6, .crn = 7, .crm = 8, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo ats1e1_reginfo[] = {
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
       .fgt = FGT_ATS1E1RP,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
     { .name = "AT_S1E1WP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 1,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
       .fgt = FGT_ATS1E1WP,
-      .writefn = ats_write64 },
+      .accessfn = at_e012_access, .writefn = ats_write64 },
 };
 
 static const ARMCPRegInfo ats1cp_reginfo[] = {
-- 
2.34.1

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

When FEAT_RME is implemented, the CNTHCTL_EL2.CNT[VP]MASK bits override
the value of CNT[VP]_CTL_EL0.IMASK in Realm and Root state. Move the IRQ
state update into a new gt_update_irq() function and test those bits
every time we recompute the IRQ state.

Since we're removing the IRQ state from some trace events, add a new
trace event for gt_update_irq().
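
As a rough standalone sketch of the rule (not the code in this patch),
the IRQ line state could be computed as below. The Sk* types and SK_*
macros are hypothetical stand-ins for the QEMU definitions, and only the
physical and virtual timers are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the QEMU types used by the patch. */
typedef enum { SK_SECURE, SK_NONSECURE, SK_ROOT, SK_REALM } SkSpace;
typedef enum { SK_TIMER_PHYS, SK_TIMER_VIRT } SkTimer;

#define SK_CTL_IMASK    (1u << 1)
#define SK_CTL_ISTATUS  (1u << 2)
#define SK_CNTVMASK     (1u << 18)   /* same bit as CNTHCTL_CNTVMASK */
#define SK_CNTPMASK     (1u << 19)   /* same bit as CNTHCTL_CNTPMASK */

static int sk_timer_irq(uint32_t ctl, uint32_t cnthctl, SkSpace ss, SkTimer t)
{
    /* Base rule: ISTATUS set and IMASK clear. */
    int irq = (ctl & (SK_CTL_ISTATUS | SK_CTL_IMASK)) == SK_CTL_ISTATUS;
    uint32_t mask = (t == SK_TIMER_VIRT) ? SK_CNTVMASK : SK_CNTPMASK;

    /* The mask bits only take effect in Root and Realm. */
    if ((ss == SK_ROOT || ss == SK_REALM) && (cnthctl & mask)) {
        irq = 0;
    }
    return irq;
}
```

Because the result depends on the security space, it has to be
recomputed when the space changes, not only when the timer registers
are written.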

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-id: 20230809123706.1842548-7-jean-philippe@linaro.org
[PMM: only register change hook if not USER_ONLY and if TCG]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h        |  4 +++
 target/arm/cpu.c        |  6 ++++
 target/arm/helper.c     | 65 ++++++++++++++++++++++++++++++++++-------
 target/arm/trace-events |  7 +++--
 4 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
 };
 
 unsigned int gt_cntfrq_period_ns(ARMCPU *cpu);
+void gt_rme_post_el_change(ARMCPU *cpu, void *opaque);
 
 void arm_cpu_post_init(Object *obj);
 
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 #define HSTR_TTEE (1 << 16)
 #define HSTR_TJDBX (1 << 17)
 
+#define CNTHCTL_CNTVMASK      (1 << 18)
+#define CNTHCTL_CNTPMASK      (1 << 19)
+
 /* Return the current FPSCR value.  */
 uint32_t vfp_get_fpscr(CPUARMState *env);
 void vfp_set_fpscr(CPUARMState *env, uint32_t val);
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         set_feature(env, ARM_FEATURE_VBAR);
     }
 
+#ifndef CONFIG_USER_ONLY
+    if (tcg_enabled() && cpu_isar_feature(aa64_rme, cpu)) {
+        arm_register_el_change_hook(cpu, &gt_rme_post_el_change, 0);
+    }
+#endif
+
     register_cp_regs_for_features(cpu);
     arm_cpu_register_gdb_regs_for_features(cpu);
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t gt_get_countervalue(CPUARMState *env)
     return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) / gt_cntfrq_period_ns(cpu);
 }
 
+static void gt_update_irq(ARMCPU *cpu, int timeridx)
+{
+    CPUARMState *env = &cpu->env;
+    uint64_t cnthctl = env->cp15.cnthctl_el2;
+    ARMSecuritySpace ss = arm_security_space(env);
+    /* ISTATUS && !IMASK */
+    int irqstate = (env->cp15.c14_timer[timeridx].ctl & 6) == 4;
+
+    /*
+     * If bit CNTHCTL_EL2.CNT[VP]MASK is set, it overrides IMASK.
+     * It is RES0 in Secure and NonSecure state.
+     */
+    if ((ss == ARMSS_Root || ss == ARMSS_Realm) &&
+        ((timeridx == GTIMER_VIRT && (cnthctl & CNTHCTL_CNTVMASK)) ||
+         (timeridx == GTIMER_PHYS && (cnthctl & CNTHCTL_CNTPMASK)))) {
+        irqstate = 0;
+    }
+
+    qemu_set_irq(cpu->gt_timer_outputs[timeridx], irqstate);
+    trace_arm_gt_update_irq(timeridx, irqstate);
+}
+
+void gt_rme_post_el_change(ARMCPU *cpu, void *ignored)
+{
+    /*
+     * Changing security state between Root and Secure/NonSecure, which may
+     * happen when switching EL, can change the effective value of CNTHCTL_EL2
+     * mask bits. Update the IRQ state accordingly.
+     */
+    gt_update_irq(cpu, GTIMER_VIRT);
+    gt_update_irq(cpu, GTIMER_PHYS);
+}
+
 static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
 {
     ARMGenericTimer *gt = &cpu->env.cp15.c14_timer[timeridx];
@@ -XXX,XX +XXX,XX @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
         /* Note that this must be unsigned 64 bit arithmetic: */
         int istatus = count - offset >= gt->cval;
         uint64_t nexttick;
-        int irqstate;
 
         gt->ctl = deposit32(gt->ctl, 2, 1, istatus);
 
-        irqstate = (istatus && !(gt->ctl & 2));
-        qemu_set_irq(cpu->gt_timer_outputs[timeridx], irqstate);
-
         if (istatus) {
             /* Next transition is when count rolls back over to zero */
             nexttick = UINT64_MAX;
@@ -XXX,XX +XXX,XX @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
         } else {
             timer_mod(cpu->gt_timer[timeridx], nexttick);
         }
-        trace_arm_gt_recalc(timeridx, irqstate, nexttick);
+        trace_arm_gt_recalc(timeridx, nexttick);
     } else {
         /* Timer disabled: ISTATUS and timer output always clear */
         gt->ctl &= ~4;
-        qemu_set_irq(cpu->gt_timer_outputs[timeridx], 0);
         timer_del(cpu->gt_timer[timeridx]);
         trace_arm_gt_recalc_disabled(timeridx);
     }
+    gt_update_irq(cpu, timeridx);
 }
 
 static void gt_timer_reset(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static void gt_ctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
          * IMASK toggled: don't need to recalculate,
          * just set the interrupt line based on ISTATUS
          */
-        int irqstate = (oldval & 4) && !(value & 2);
-
-        trace_arm_gt_imask_toggle(timeridx, irqstate);
-        qemu_set_irq(cpu->gt_timer_outputs[timeridx], irqstate);
+        trace_arm_gt_imask_toggle(timeridx);
+        gt_update_irq(cpu, timeridx);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void gt_virt_ctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
     gt_ctl_write(env, ri, GTIMER_VIRT, value);
 }
 
+static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                             uint64_t value)
+{
+    ARMCPU *cpu = env_archcpu(env);
+    uint32_t oldval = env->cp15.cnthctl_el2;
+
+    raw_write(env, ri, value);
+
+    if ((oldval ^ value) & CNTHCTL_CNTVMASK) {
+        gt_update_irq(cpu, GTIMER_VIRT);
+    } else if ((oldval ^ value) & CNTHCTL_CNTPMASK) {
+        gt_update_irq(cpu, GTIMER_PHYS);
+    }
+}
+
 static void gt_cntvoff_write(CPUARMState *env, const ARMCPRegInfo *ri,
                               uint64_t value)
 {
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
        * reset values as IMPDEF. We choose to reset to 3 to comply with
        * both ARMv7 and ARMv8.
        */
-      .access = PL2_RW, .resetvalue = 3,
+      .access = PL2_RW, .type = ARM_CP_IO, .resetvalue = 3,
+      .writefn = gt_cnthctl_write, .raw_writefn = raw_write,
       .fieldoffset = offsetof(CPUARMState, cp15.cnthctl_el2) },
     { .name = "CNTVOFF_EL2", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 4, .crn = 14, .crm = 0, .opc2 = 3,
diff --git a/target/arm/trace-events b/target/arm/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/trace-events
+++ b/target/arm/trace-events
@@ -XXX,XX +XXX,XX @@
 # See docs/devel/tracing.rst for syntax documentation.
 
 # helper.c
-arm_gt_recalc(int timer, int irqstate, uint64_t nexttick) "gt recalc: timer %d irqstate %d next tick 0x%" PRIx64
-arm_gt_recalc_disabled(int timer) "gt recalc: timer %d irqstate 0 timer disabled"
+arm_gt_recalc(int timer, uint64_t nexttick) "gt recalc: timer %d next tick 0x%" PRIx64
+arm_gt_recalc_disabled(int timer) "gt recalc: timer %d timer disabled"
 arm_gt_cval_write(int timer, uint64_t value) "gt_cval_write: timer %d value 0x%" PRIx64
 arm_gt_tval_write(int timer, uint64_t value) "gt_tval_write: timer %d value 0x%" PRIx64
 arm_gt_ctl_write(int timer, uint64_t value) "gt_ctl_write: timer %d value 0x%" PRIx64
-arm_gt_imask_toggle(int timer, int irqstate) "gt_ctl_write: timer %d IMASK toggle, new irqstate %d"
+arm_gt_imask_toggle(int timer) "gt_ctl_write: timer %d IMASK toggle"
 arm_gt_cntvoff_write(uint64_t value) "gt_cntvoff_write: value 0x%" PRIx64
+arm_gt_update_irq(int timer, int irqstate) "gt_update_irq: timer %d irqstate %d"
 
 # kvm.c
 kvm_arm_fixup_msi_route(uint64_t iova, uint64_t gpa) "MSI iova = 0x%"PRIx64" is translated into 0x%"PRIx64
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Fix a typo, noted in the bug report, that resulted in an
incorrect write offset.
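
A minimal sketch of why the offset matters, assuming a little-endian
store and byte-granular pointer arithmetic (in the helper, 'host' is a
'void *', which GNU C advances in bytes). The sk_* functions are
hypothetical and are not the macros touched by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 64-bit value in little-endian byte order. */
static void sk_store_u64_le(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Store a 128-bit element as two halves; the high half starts 8 bytes in. */
static void sk_store_q_le(uint8_t *host, uint64_t lo, uint64_t hi)
{
    sk_store_u64_le(host, lo);
    sk_store_u64_le(host + 8, hi);   /* host + 1 would overlap the low half */
}
```

With an offset of 1 the second store would overwrite bytes 1..7 of the
first half and leave bytes 9..15 unwritten.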

Cc: qemu-stable@nongnu.org
Fixes: 7390e0e9ab8 ("target/arm: Implement SME LD1, ST1")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1833
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230818214255.146905-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sme_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -XXX,XX +XXX,XX @@ static inline void HNAME##_host(void *za, intptr_t off, void *host)         \
 {                                                                           \
     uint64_t *ptr = za + off;                                               \
     HOST(host, ptr[BE]);                                                    \
-    HOST(host + 1, ptr[!BE]);                                               \
+    HOST(host + 8, ptr[!BE]);                                               \
 }                                                                           \
 static inline void VNAME##_v_host(void *za, intptr_t off, void *host)       \
 {                                                                           \
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

A typo applied a byte-wise shift instead of a double-word shift.
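
To show how the two element sizes diverge, here is a hedged sketch of
"signed shift right and accumulate" on one 64-bit lane, once as a single
double-word and once as eight independent bytes. The sk_* functions are
hypothetical illustrations, not the gvec helpers; they assume '>>' on a
negative signed value is an arithmetic shift (as with GCC and Clang) and
a shift count of 1..7 so that both variants are valid.

```c
#include <assert.h>
#include <stdint.h>

/* Double-word variant: one 64-bit signed element. */
static uint64_t sk_ssra_d(uint64_t d, uint64_t n, unsigned sh)
{
    return d + (uint64_t)((int64_t)n >> sh);
}

/* Byte variant: eight independent 8-bit signed elements. */
static uint64_t sk_ssra_b(uint64_t d, uint64_t n, unsigned sh)
{
    uint64_t r = 0;

    for (unsigned i = 0; i < 8; i++) {
        int8_t nb = (int8_t)(n >> (8 * i));
        uint8_t db = (uint8_t)(d >> (8 * i));
        uint8_t rb = (uint8_t)(db + (nb >> sh));

        r |= (uint64_t)rb << (8 * i);
    }
    return r;
}
```

For an input of 0x100 shifted by 4, the double-word form yields 0x10
while the byte form yields 0, since no bits cross a byte boundary.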

Cc: qemu-stable@nongnu.org
Fixes: 631e565450c ("target/arm: Create gen_gvec_[us]sra")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1737
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230821022025.397682-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -XXX,XX +XXX,XX @@ void gen_gvec_ssra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
           .vece = MO_32 },
         { .fni8 = gen_ssra64_i64,
           .fniv = gen_ssra_vec,
-          .fno = gen_helper_gvec_ssra_b,
+          .fno = gen_helper_gvec_ssra_d,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .opt_opc = vecop_list,
           .load_dest = true,
-- 
2.34.1

Hi; this pullreq contains only my FEAT_AFP/FEAT_RPRES patches
(plus a fix for a target/alpha latent bug that would otherwise
be revealed by the fpu changes), because 68 patches is already
longer than I prefer to send in at one time...

thanks
-- PMM

The following changes since commit ffaf7f0376f8040ce9068d71ae9ae8722505c42e:

Merge tag 'pull-10.0-testing-and-gdstub-updates-100225-1' of https://gitlab.com/stsquad/qemu into staging (2025-02-10 13:26:17 -0500)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250211

for you to fetch changes up to ca4c34e07d1388df8e396520b5e7d60883cd3690:

target/arm: Sink fp_status and fpcr access into do_fmlal* (2025-02-11 16:22:08 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/alpha: Don't corrupt error_code with unknown softfloat flags
 * target/arm: Implement FEAT_AFP and FEAT_RPRES

----------------------------------------------------------------
Peter Maydell (49):
      target/alpha: Don't corrupt error_code with unknown softfloat flags
      fpu: Add float_class_denormal
      fpu: Implement float_flag_input_denormal_used
      fpu: allow flushing of output denormals to be after rounding
      target/arm: Define FPCR AH, FIZ, NEP bits
      target/arm: Implement FPCR.FIZ handling
      target/arm: Adjust FP behaviour for FPCR.AH = 1
      target/arm: Adjust exception flag handling for AH = 1
      target/arm: Add FPCR.AH to tbflags
      target/arm: Set up float_status to use for FPCR.AH=1 behaviour
      target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
      target/arm: Use FPST_FPCR_AH for BFCVT* insns
      target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
      target/arm: Add FPCR.NEP to TBFLAGS
      target/arm: Define and use new write_fp_*reg_merging() functions
      target/arm: Handle FPCR.NEP for 3-input scalar operations
      target/arm: Handle FPCR.NEP for BFCVT scalar
      target/arm: Handle FPCR.NEP for 1-input scalar operations
      target/arm: Handle FPCR.NEP in do_cvtf_scalar()
      target/arm: Handle FPCR.NEP for scalar FABS and FNEG
      target/arm: Handle FPCR.NEP for FCVTXN (scalar)
      target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
      target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
      target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
      target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
      target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
      target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
      target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
      target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
      target/arm: Implement FPCR.AH handling of negation of NaN
      target/arm: Implement FPCR.AH handling for scalar FABS and FABD
      target/arm: Handle FPCR.AH in vector FABD
      target/arm: Handle FPCR.AH in SVE FNEG
      target/arm: Handle FPCR.AH in SVE FABS
      target/arm: Handle FPCR.AH in SVE FABD
      target/arm: Handle FPCR.AH in negation steps in SVE FCADD
      target/arm: Handle FPCR.AH in negation steps in FCADD
      target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
      target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
      target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
      target/arm: Handle FPCR.AH in negation in FMLS (vector)
      target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
      target/arm: Handle FPCR.AH in SVE FTSSEL
      target/arm: Handle FPCR.AH in SVE FTMAD
      target/arm: Enable FEAT_AFP for '-cpu max'
      target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
      target/arm: Implement increased precision FRECPE
      target/arm: Implement increased precision FRSQRTE
      target/arm: Enable FEAT_RPRES for -cpu max

Richard Henderson (19):
      target/arm: Handle FPCR.AH in vector FCMLA
      target/arm: Handle FPCR.AH in FCMLA by index
      target/arm: Handle FPCR.AH in SVE FCMLA
      target/arm: Handle FPCR.AH in FMLSL (by element and vector)
      target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
      target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
      target/arm: Introduce CPUARMState.vfp.fp_status[]
      target/arm: Remove standard_fp_status_f16
      target/arm: Remove standard_fp_status
      target/arm: Remove ah_fp_status_f16
      target/arm: Remove ah_fp_status
      target/arm: Remove fp_status_f16_a64
      target/arm: Remove fp_status_f16_a32
      target/arm: Remove fp_status_a64
      target/arm: Remove fp_status_a32
      target/arm: Simplify fp_status indexing in mve_helper.c
      target/arm: Simplify DO_VFP_cmp in vfp_helper.c
      target/arm: Read fz16 from env->vfp.fpcr
      target/arm: Sink fp_status and fpcr access into do_fmlal*

In do_cvttq() we set env->error_code with what is supposed to be a
set of FPCR exception bit values.  However, if the set of float
exception flags we get back from softfloat for the conversion
includes a flag which is not one of the three we expect here
(invalid_cvti, invalid, inexact) then we will fall through the
if-ladder and set env->error_code to the unconverted softfloat
exception_flag value.  This will then cause us to take a spurious
exception.

This is harmless now, but when we add new floating point exception
flags to softfloat it will cause problems.  Add an else clause to the
if-ladder to make it ignore any float exception flags it doesn't care
about.
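
The shape of the fix can be sketched as follows. All SK_* values are
hypothetical placeholders chosen for the sketch (they are not the real
softfloat or Alpha FPCR encodings), and the first branch is simplified
relative to do_cvttq().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
enum {
    SK_FLAG_INVALID      = 1 << 0,
    SK_FLAG_INEXACT      = 1 << 1,
    SK_FLAG_INVALID_CVTI = 1 << 2,
    SK_FLAG_NEW          = 1 << 3,  /* a flag the ladder does not know */
};
enum {
    SK_FPCR_INV = 1 << 8,
    SK_FPCR_INE = 1 << 9,
    SK_FPCR_IOV = 1 << 10,
};

static uint32_t sk_map_exc(uint32_t exc)
{
    if (exc) {
        if (exc & SK_FLAG_INVALID_CVTI) {
            exc = SK_FPCR_IOV | SK_FPCR_INE;
        } else if (exc & SK_FLAG_INVALID) {
            exc = SK_FPCR_INV;
        } else if (exc & SK_FLAG_INEXACT) {
            exc = SK_FPCR_INE;
        } else {
            exc = 0;    /* ignore flags we do not care about */
        }
    }
    return exc;
}
```

Without the final else, an input of SK_FLAG_NEW alone would be returned
unconverted and then be misread as FPCR bits.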

Specifically, without this fix, 'make check-tcg' will fail for Alpha
when the commit adding float_flag_input_denormal_used lands.

Fixes: aa3bad5b59e7 ("target/alpha: Use float64_to_int64_modulo for CVTTQ")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 target/alpha/fpu_helper.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/alpha/fpu_helper.c b/target/alpha/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/alpha/fpu_helper.c
+++ b/target/alpha/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t do_cvttq(CPUAlphaState *env, uint64_t a, int roundmode)
             exc = FPCR_INV;
         } else if (exc & float_flag_inexact) {
             exc = FPCR_INE;
+        } else {
+            exc = 0;
         }
     }
     env->error_code = exc;
-- 
2.34.1

Currently in softfloat we canonicalize input denormals and so the
code that implements floating point operations does not need to care
whether the input value was originally normal or denormal.  However,
both x86 and Arm FEAT_AFP require that an exception flag is set if:
 * an input is denormal
 * that input is not squashed to zero
 * that input is actually used in the calculation (e.g. we
   did not find the other input was a NaN)

So we need to track that the input was a non-squashed denormal.  To
do this we add a new value to the FloatClass enum.  In this commit we
add the value and adjust the code everywhere that looks at FloatClass
values so that the new float_class_denormal behaves identically to
float_class_normal.  We will add the code that does the "raise a new
float exception flag if an input was an unsquashed denormal and we
used it" in a subsequent commit.

There should be no behavioural change in this commit.
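
As a small standalone illustration of the class-mask idea (with
hypothetical SK_*/sk_* names and a reduced set of classes, so not the
softfloat definitions themselves):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SK_CLASS_ZERO,
    SK_CLASS_NORMAL,
    SK_CLASS_DENORMAL,  /* input was a non-squashed denormal */
    SK_CLASS_INF,
    SK_CLASS_QNAN,
} SkClass;

/* One bit per class, so the classes of several inputs can be ORed. */
#define SK_CMASK(c)  (1u << (c))

enum {
    SK_CMASK_ANYNORM = SK_CMASK(SK_CLASS_NORMAL) | SK_CMASK(SK_CLASS_DENORMAL),
};

/* True if no class other than normal/denormal is present in the mask. */
static bool sk_cmask_is_only_normals(unsigned cmask)
{
    return !(cmask & ~(unsigned)SK_CMASK_ANYNORM);
}

static bool sk_is_anynorm(SkClass c)
{
    return (SK_CMASK(c) & SK_CMASK_ANYNORM) != 0;
}
```

An equality test against the "normal" mask alone would no longer take
the fast path once one operand is classified as denormal, which is why
the checks become "no bits outside normal or denormal".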

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 32 ++++++++++++++++++++++++++++---
 fpu/softfloat-parts.c.inc | 40 ++++++++++++++++++++++++---------------
 2 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ float64_gen2(float64 xa, float64 xb, float_status *s,
 /*
  * Classify a floating point number. Everything above float_class_qnan
  * is a NaN so cls >= float_class_qnan is any NaN.
+ *
+ * Note that we canonicalize denormals, so most code should treat
+ * class_normal and class_denormal identically.
  */
 
 typedef enum __attribute__ ((__packed__)) {
     float_class_unclassified,
     float_class_zero,
     float_class_normal,
+    float_class_denormal, /* input was a non-squashed denormal */
     float_class_inf,
     float_class_qnan,  /* all NaNs from here */
     float_class_snan,
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__ ((__packed__)) {
 enum {
     float_cmask_zero    = float_cmask(float_class_zero),
     float_cmask_normal  = float_cmask(float_class_normal),
+    float_cmask_denormal = float_cmask(float_class_denormal),
     float_cmask_inf     = float_cmask(float_class_inf),
     float_cmask_qnan    = float_cmask(float_class_qnan),
     float_cmask_snan    = float_cmask(float_class_snan),
 
     float_cmask_infzero = float_cmask_zero | float_cmask_inf,
     float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
+    float_cmask_anynorm = float_cmask_normal | float_cmask_denormal,
 };
 
 /* Flags for parts_minmax. */
@@ -XXX,XX +XXX,XX @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
     return c == float_class_qnan;
 }
 
+/*
+ * Return true if the float_cmask has only normals in it
+ * (including input denormals that were canonicalized)
+ */
+static inline bool cmask_is_only_normals(int cmask)
+{
+    return !(cmask & ~float_cmask_anynorm);
+}
+
+static inline bool is_anynorm(FloatClass c)
+{
+    return float_cmask(c) & float_cmask_anynorm;
+}
+
 /*
  * Structure holding all of the decomposed parts of a float.
  * The exponent is unbiased and the fraction is normalized.
@@ -XXX,XX +XXX,XX @@ static float64 float64r32_round_pack_canonical(FloatParts64 *p,
      */
     switch (p->cls) {
     case float_class_normal:
+    case float_class_denormal:
         if (unlikely(p->exp == 0)) {
             /*
              * The result is denormal for float32, but can be represented
@@ -XXX,XX +XXX,XX @@ static floatx80 floatx80_round_pack_canonical(FloatParts128 *p,
 
     switch (p->cls) {
     case float_class_normal:
+    case float_class_denormal:
         if (s->floatx80_rounding_precision == floatx80_precision_x) {
             parts_uncanon_normal(p, s, fmt);
             frac = p->frac_hi;
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
         break;
 
     case float_class_normal:
+    case float_class_denormal:
     case float_class_zero:
         break;
 
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
     a->sign = b->sign;
     a->exp = b->exp;
 
-    if (a->cls == float_class_normal) {
+    if (is_anynorm(a->cls)) {
         frac_truncjam(a, b);
     } else if (is_nan(a->cls)) {
         /* Discard the low bits of the NaN. */
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_int128_scalbn(float128 a, FloatRoundMode rmode,
         return int128_zero();
 
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
             flags = float_flag_inexact;
         }
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_uint128_scalbn(float128 a, FloatRoundMode rmode,
         return int128_zero();
 
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
             flags = float_flag_inexact;
             if (p.cls == float_class_zero) {
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
     float32_unpack_canonical(&xp, a, status);
     if (unlikely(xp.cls != float_class_normal)) {
         switch (xp.cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(&xp, status);
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
         case float_class_zero:
             return float32_one;
         default:
-            break;
+            g_assert_not_reached();
         }
-        g_assert_not_reached();
     }
 
     float_raise(float_flag_inexact, status);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
             frac_clear(p);
         } else {
             int shift = frac_normalize(p);
-            p->cls = float_class_normal;
+            p->cls = float_class_denormal;
             p->exp = fmt->frac_shift - fmt->exp_bias
                    - shift + !fmt->m68k_denormal;
         }
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
 static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                             const FloatFmt *fmt)
 {
-    if (likely(p->cls == float_class_normal)) {
+    if (likely(is_anynorm(p->cls))) {
         parts_uncanon_normal(p, s, fmt);
     } else {
         switch (p->cls) {
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
 
     if (a->sign != b_sign) {
         /* Subtraction */
-        if (likely(ab_mask == float_cmask_normal)) {
+        if (likely(cmask_is_only_normals(ab_mask))) {
             if (parts_sub_normal(a, b)) {
                 return a;
             }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
         }
     } else {
         /* Addition */
-        if (likely(ab_mask == float_cmask_normal)) {
+        if (likely(cmask_is_only_normals(ab_mask))) {
             parts_add_normal(a, b);
             return a;
         }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
     }
 
     if (b->cls == float_class_zero) {
-        g_assert(a->cls == float_class_normal);
+        g_assert(is_anynorm(a->cls));
         return a;
     }
 
     g_assert(a->cls == float_class_zero);
-    g_assert(b->cls == float_class_normal);
+    g_assert(is_anynorm(b->cls));
  return_b:
     b->sign = b_sign;
     return b;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
     bool sign = a->sign ^ b->sign;
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         FloatPartsW tmp;
 
         frac_mulw(&tmp, a, b);
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
         a->sign ^= 1;
     }
 
-    if (unlikely(ab_mask != float_cmask_normal)) {
+    if (unlikely(!cmask_is_only_normals(ab_mask))) {
         if (unlikely(ab_mask == float_cmask_infzero)) {
             float_raise(float_flag_invalid | float_flag_invalid_imz, s);
             goto d_nan;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
         }
 
         g_assert(ab_mask & float_cmask_zero);
-        if (c->cls == float_class_normal) {
+        if (is_anynorm(c->cls)) {
             *a = *c;
             goto return_normal;
         }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
     bool sign = a->sign ^ b->sign;
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         a->sign = sign;
         a->exp -= b->exp + frac_div(a, b);
         return a;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
 {
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         frac_modrem(a, b, mod_quot);
         return a;
     }
@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
 
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(a, status);
@@ -XXX,XX +XXX,XX @@ static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
     case float_class_inf:
         break;
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(a, rmode, scale, fmt->frac_size)) {
             float_raise(float_flag_inexact, s);
         }
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
             flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static uint64_t partsN(float_to_uint)(FloatPartsN *p, FloatRoundMode rmode,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
             flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint_modulo)(FloatPartsN *p,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, 0, N - 2)) {
             flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
     a_exp = a->exp;
     b_exp = b->exp;
 
-    if (unlikely(ab_mask != float_cmask_normal)) {
+    if (unlikely(!cmask_is_only_normals(ab_mask))) {
         switch (a->cls) {
         case float_class_normal:
+        case float_class_denormal:
             break;
         case float_class_inf:
             a_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         }
         switch (b->cls) {
         case float_class_normal:
+        case float_class_denormal:
             break;
         case float_class_inf:
             b_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
 {
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         FloatRelation cmp;
 
         if (a->sign != b->sign) {
@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
     case float_class_inf:
         break;
     case float_class_normal:
+    case float_class_denormal:
         a->exp += MIN(MAX(n, -0x10000), 0x10000);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
 
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(a, s);
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
             }
             return;
         default:
-            break;
+            g_assert_not_reached();
         }
-        g_assert_not_reached();
     }
     if (unlikely(a->sign)) {
         goto d_nan;
-- 
2.34.1

For the x86 and the Arm FEAT_AFP semantics, we need to be able to
tell the target code that the FPU operation has used an input
denormal.  Implement this; when it happens we set the new
float_flag_input_denormal_used.

Note that we only set this when an input denormal is actually used by
the operation: if the operation results in Invalid Operation or
Divide By Zero or the result is a NaN because some other input was a
NaN then we never needed to look at the input denormal and do not set
float_flag_input_denormal_used.

We mostly do not need to adjust the hardfloat codepaths to deal with
this flag, because almost all hardfloat operations are already gated
on the input not being a denormal, and will fall back to softfloat
for a denormal input.  The only exception is the comparison
operations, where we need to add the check for input denormals, which
must now fall back to softfloat where they did not before.
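
As an illustration of the rule above, here is a minimal standalone
sketch (not the softfloat code itself) of when an input denormal
counts as "used" for addition/subtraction and for division.  The Cls
enum, the CMASK macros and all function names are hypothetical
stand-ins for softfloat's class masks, chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for softfloat's float classes and class masks. */
typedef enum {
    CLS_ZERO, CLS_NORMAL, CLS_DENORMAL, CLS_INF, CLS_QNAN, CLS_SNAN
} Cls;

#define CMASK(c)        (1u << (c))
#define CMASK_DENORMAL  CMASK(CLS_DENORMAL)
#define CMASK_ANYNAN    (CMASK(CLS_QNAN) | CMASK(CLS_SNAN))

/* Add/sub: a denormal input is consumed unless some input is a NaN. */
bool addsub_uses_input_denormal(Cls a, Cls b)
{
    unsigned ab_mask = CMASK(a) | CMASK(b);

    return (ab_mask & (CMASK_DENORMAL | CMASK_ANYNAN)) == CMASK_DENORMAL;
}

/* Division: also not consumed when the divisor is zero (Divide By Zero). */
bool div_uses_input_denormal(Cls a, Cls b)
{
    unsigned ab_mask = CMASK(a) | CMASK(b);

    if (ab_mask & CMASK_ANYNAN) {
        return false;               /* the NaN decides the result */
    }
    return (ab_mask & CMASK_DENORMAL) && b != CLS_ZERO;
}

/* A few spot checks of the sketch; call this from a test harness. */
void input_denormal_examples(void)
{
    assert(addsub_uses_input_denormal(CLS_DENORMAL, CLS_NORMAL));
    assert(!addsub_uses_input_denormal(CLS_DENORMAL, CLS_QNAN));
    assert(div_uses_input_denormal(CLS_ZERO, CLS_DENORMAL));
    assert(!div_uses_input_denormal(CLS_DENORMAL, CLS_ZERO));
}
```

In this model the flag depends only on the input classes, which is
why the checks can sit next to the existing class-mask tests.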

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-types.h |  7 ++++
 fpu/softfloat.c               | 38 +++++++++++++++++---
 fpu/softfloat-parts.c.inc     | 68 ++++++++++++++++++++++++++++++++++-
 3 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ enum {
     float_flag_invalid_sqrt    = 0x0800,  /* sqrt(-x) */
     float_flag_invalid_cvti    = 0x1000,  /* non-nan to integer */
     float_flag_invalid_snan    = 0x2000,  /* any operand was snan */
+    /*
+     * An input was denormal and we used it (without flushing it to zero).
+     * Not set if we do not actually use the denormal input (e.g.
+     * because some other input was a NaN, or because the operation
+     * wasn't actually carried out (divide-by-zero; invalid))
+     */
+    float_flag_input_denormal_used = 0x4000,
 };
 
 /*
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
                                   float16_params_ahp.frac_size + 1);
         break;
 
-    case float_class_normal:
     case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        break;
+    case float_class_normal:
     case float_class_zero:
         break;
 
@@ -XXX,XX +XXX,XX @@ static void parts64_float_to_float(FloatParts64 *a, float_status *s)
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 static void parts128_float_to_float(FloatParts128 *a, float_status *s)
@@ -XXX,XX +XXX,XX @@ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 #define parts_float_to_float(P, S) \
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
     a->sign = b->sign;
     a->exp = b->exp;
 
-    if (is_anynorm(a->cls)) {
+    switch (a->cls) {
+    case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        /* fall through */
+    case float_class_normal:
         frac_truncjam(a, b);
-    } else if (is_nan(a->cls)) {
+        break;
+    case float_class_snan:
+    case float_class_qnan:
         /* Discard the low bits of the NaN. */
         a->frac = b->frac_hi;
         parts_return_nan(a, s);
+        break;
+    default:
+        break;
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_widen(FloatParts128 *a, FloatParts64 *b,
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 float32 float16_to_float32(float16 a, bool ieee, float_status *s)
@@ -XXX,XX +XXX,XX @@ float32_hs_compare(float32 xa, float32 xb, float_status *s, bool is_quiet)
         goto soft;
     }
 
-    float32_input_flush2(&ua.s, &ub.s, s);
+    if (unlikely(float32_is_denormal(ua.s) || float32_is_denormal(ub.s))) {
+        /* We may need to set the input_denormal_used flag */
+        goto soft;
+    }
+
     if (isgreaterequal(ua.h, ub.h)) {
         if (isgreater(ua.h, ub.h)) {
             return float_relation_greater;
@@ -XXX,XX +XXX,XX @@ float64_hs_compare(float64 xa, float64 xb, float_status *s, bool is_quiet)
         goto soft;
     }
 
-    float64_input_flush2(&ua.s, &ub.s, s);
+    if (unlikely(float64_is_denormal(ua.s) || float64_is_denormal(ub.s))) {
+        /* We may need to set the input_denormal_used flag */
+        goto soft;
+    }
+
     if (isgreaterequal(ua.h, ub.h)) {
         if (isgreater(ua.h, ub.h)) {
             return float_relation_greater;
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
     bool b_sign = b->sign ^ subtract;
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
+    /*
+     * For addition and subtraction, we will consume an
+     * input denormal unless the other input is a NaN.
+     */
+    if ((ab_mask & (float_cmask_denormal | float_cmask_anynan)) ==
+        float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (a->sign != b_sign) {
         /* Subtraction */
         if (likely(cmask_is_only_normals(ab_mask))) {
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     if (likely(cmask_is_only_normals(ab_mask))) {
         FloatPartsW tmp;
 
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
+
         frac_mulw(&tmp, a, b);
         frac_truncjam(a, &tmp);
 
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     }
 
     /* Multiply by 0 or Inf */
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (ab_mask & float_cmask_inf) {
         a->cls = float_class_inf;
         a->sign = sign;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
     if (flags & float_muladd_negate_result) {
         a->sign ^= 1;
     }
+
+    /*
+     * All result types except for "return the default NaN
+     * because this is an Invalid Operation" go through here;
+     * this matches the set of cases where we consumed a
+     * denormal input.
+     */
+    if (abc_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
     return a;
 
  return_sub_zero:
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     bool sign = a->sign ^ b->sign;
 
     if (likely(cmask_is_only_normals(ab_mask))) {
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
         a->sign = sign;
         a->exp -= b->exp + frac_div(a, b);
         return a;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
         return parts_pick_nan(a, b, s);
     }
 
+    if ((ab_mask & float_cmask_denormal) && b->cls != float_class_zero) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     a->sign = sign;
 
     /* Inf / X */
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
     if (likely(cmask_is_only_normals(ab_mask))) {
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
         frac_modrem(a, b, mod_quot);
         return a;
     }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
         return a;
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     /* N % Inf; 0 % N */
     g_assert(b->cls == float_class_inf || a->cls == float_class_zero);
     return a;
@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
         case float_class_denormal:
+            if (!a->sign) {
+                /* -ve denormal will be InvalidOperation */
+                float_raise(float_flag_input_denormal_used, status);
+            }
             break;
         case float_class_snan:
         case float_class_qnan:
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         if ((flags & (minmax_isnum | minmax_isnumber))
             && !(ab_mask & float_cmask_snan)
             && (ab_mask & ~float_cmask_qnan)) {
+            if (ab_mask & float_cmask_denormal) {
+                float_raise(float_flag_input_denormal_used, s);
+            }
             return is_nan(a->cls) ? b : a;
         }
 
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         return parts_pick_nan(a, b, s);
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     a_exp = a->exp;
     b_exp = b->exp;
 
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
     if (likely(cmask_is_only_normals(ab_mask))) {
         FloatRelation cmp;
 
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
+
         if (a->sign != b->sign) {
             goto a_sign;
         }
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
         return float_relation_unordered;
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (ab_mask & float_cmask_zero) {
         if (ab_mask == float_cmask_zero) {
             return float_relation_equal;
@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
     case float_class_zero:
     case float_class_inf:
         break;
-    case float_class_normal:
     case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        /* fall through */
+    case float_class_normal:
         a->exp += MIN(MAX(n, -0x10000), 0x10000);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
         case float_class_denormal:
+            if (!a->sign) {
+                /* -ve denormal will be InvalidOperation */
+                float_raise(float_flag_input_denormal_used, s);
+            }
             break;
         case float_class_snan:
         case float_class_qnan:
-- 
2.34.1

Currently we handle flushing of output denormals in uncanon_normal
always before we deal with rounding.  This works for architectures
that detect tininess before rounding, but is usually not the right
place when the architecture detects tininess after rounding.  For
example, for x86 the SDM states that the MXCSR FTZ control bit causes
outputs to be flushed to zero "when it detects a floating-point
underflow condition".  This means that we mustn't flush to zero if
the input is such that after rounding it is no longer tiny.

At least one of our guest architectures does underflow detection
after rounding but flushing of denormals before rounding (MIPS MSA);
this means we need to have a config knob for this that is separate
from our existing tininess_before_rounding setting.

Add an ftz_detection flag.  For consistency with
tininess_before_rounding, we make it default to "detect ftz after
rounding"; this means that we need to explicitly set the flag to
"detect ftz before rounding" on every existing architecture that sets
flush_to_zero, so that this commit has no behaviour change.
(This means more code change here but for the long term a less
confusing API.)
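
To make the before/after distinction concrete, here is a small
standalone sketch using a toy format with a 4-bit significand and 8
guard bits.  Everything in it (the Toy* names, the bit widths, the
exponent convention) is an assumption made for illustration; it is
not the softfloat implementation.  It shows a result that is tiny
before rounding but rounds up into the normal range, so that only
"detect ftz before rounding" would flush it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIG_BITS   4    /* significand bits kept by the toy format */
#define GUARD_BITS 8    /* extra low bits that are rounded away */
#define TOY_EMIN   0    /* smallest normal exponent of the toy format */

typedef enum {
    TOY_FTZ_AFTER_ROUNDING,
    TOY_FTZ_BEFORE_ROUNDING,
} ToyFtzDetection;

/* Unrounded result: sig is normalized to [0x800, 0xFFF], exp is unbounded. */
typedef struct {
    uint32_t sig;
    int exp;
} ToyResult;

/* Exponent after round-to-nearest-even with an unbounded exponent range. */
int toy_rounded_exp(ToyResult r)
{
    uint32_t half = 1u << (GUARD_BITS - 1);
    uint32_t low = r.sig & ((1u << GUARD_BITS) - 1);
    uint32_t kept = r.sig >> GUARD_BITS;

    assert(r.sig >= 0x800 && r.sig <= 0xFFF);
    if (low > half || (low == half && (kept & 1))) {
        kept++;
    }
    /* A carry out of the significand bumps the exponent by one. */
    return kept == (1u << SIG_BITS) ? r.exp + 1 : r.exp;
}

/* Would flush_to_zero flush this result under the given detection mode? */
bool toy_should_flush(ToyResult r, ToyFtzDetection d)
{
    if (d == TOY_FTZ_BEFORE_ROUNDING) {
        return r.exp < TOY_EMIN;
    }
    return toy_rounded_exp(r) < TOY_EMIN;
}
```

With sig = 0xFFF and exp = -1 the unrounded value is just below the
smallest normal, so it is flushed in the "before rounding" mode;
rounding carries it up to exponent 0, so the "after rounding" mode
leaves it alone.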

For several architectures the current behaviour is either
definitely or possibly wrong; annotate those with TODO comments.
These architectures are definitely wrong (and should detect
ftz after rounding):
 * x86
 * Alpha

For these architectures the spec is unclear:
 * MIPS (for non-MSA)
 * RX
 * SH4

PA-RISC makes ftz detection IMPDEF, but we aren't setting the
"tininess before rounding" setting that we ought to.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-helpers.h | 11 +++++++++++
 include/fpu/softfloat-types.h   | 18 ++++++++++++++++++
 target/mips/fpu_helper.h        |  6 ++++++
 target/alpha/cpu.c              |  7 +++++++
 target/arm/cpu.c                |  1 +
 target/hppa/fpu_helper.c        | 11 +++++++++++
 target/i386/tcg/fpu_helper.c    |  8 ++++++++
 target/mips/msa.c               |  9 +++++++++
 target/ppc/cpu_init.c           |  3 +++
 target/rx/cpu.c                 |  8 ++++++++
 target/sh4/cpu.c                |  8 ++++++++
 target/tricore/helper.c         |  1 +
 tests/fp/fp-bench.c             |  1 +
 fpu/softfloat-parts.c.inc       | 21 +++++++++++++++------
 14 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-helpers.h
+++ b/include/fpu/softfloat-helpers.h
@@ -XXX,XX +XXX,XX @@ static inline void set_flush_inputs_to_zero(bool val, float_status *status)
     status->flush_inputs_to_zero = val;
 }
 
+static inline void set_float_ftz_detection(FloatFTZDetection d,
+                                           float_status *status)
+{
+    status->ftz_detection = d;
+}
+
 static inline void set_default_nan_mode(bool val, float_status *status)
 {
     status->default_nan_mode = val;
@@ -XXX,XX +XXX,XX @@ static inline bool get_default_nan_mode(const float_status *status)
     return status->default_nan_mode;
 }
 
+static inline FloatFTZDetection get_float_ftz_detection(const float_status *status)
+{
+    return status->ftz_detection;
+}
+
 #endif /* SOFTFLOAT_HELPERS_H */
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
     float_infzeronan_suppress_invalid = (1 << 7),
 } FloatInfZeroNaNRule;
 
+/*
+ * When flush_to_zero is set, should we detect denormal results to
+ * be flushed before or after rounding? For most architectures this
+ * should be set to match the tininess_before_rounding setting,
+ * but a few architectures, e.g. MIPS MSA, detect FTZ before
+ * rounding but tininess after rounding.
+ *
+ * This enum is arranged so that the default if the target doesn't
+ * configure it matches the default for tininess_before_rounding
+ * (i.e. "after rounding").
+ */
+typedef enum __attribute__((__packed__)) {
+    float_ftz_after_rounding = 0,
+    float_ftz_before_rounding = 1,
+} FloatFTZDetection;
+
 /*
  * Floating Point Status. Individual architectures may maintain
  * several versions of float_status for different functions. The
@@ -XXX,XX +XXX,XX @@ typedef struct float_status {
     bool tininess_before_rounding;
     /* should denormalised results go to zero and set output_denormal_flushed? */
     bool flush_to_zero;
+    /* do we detect and flush denormal results before or after rounding? */
+    FloatFTZDetection ftz_detection;
     /* should denormalised inputs go to zero and set input_denormal_flushed? */
     bool flush_inputs_to_zero;
     bool default_nan_mode;
diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/fpu_helper.h
+++ b/target/mips/fpu_helper.h
@@ -XXX,XX +XXX,XX @@ static inline void fp_reset(CPUMIPSState *env)
      */
     set_float_2nan_prop_rule(float_2nan_prop_s_ab,
                              &env->active_fpu.fp_status);
+    /*
+     * TODO: the spec doesn't say clearly whether FTZ happens before
+     * or after rounding for normal FPU operations.
+     */
+    set_float_ftz_detection(float_ftz_before_rounding,
+                            &env->active_fpu.fp_status);
 }
 
 /* MSA */
diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -XXX,XX +XXX,XX @@ static void alpha_cpu_initfn(Object *obj)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN: sign bit clear, msb frac bit set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
+     * section 4.7.7.11 says that we flush to zero for underflow cases, so
+     * this should be float_ftz_after_rounding to match the
+     * tininess_after_rounding (which is specified in section 4.7.5).
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 #if defined(CONFIG_USER_ONLY)
     env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
     cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
 static void arm_set_default_fp_behaviours(float_status *s)
 {
     set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_ftz_detection(float_ftz_before_rounding, s);
     set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/hppa/fpu_helper.c
+++ b/target/hppa/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
     set_float_infzeronan_rule(float_infzeronan_dnan_never, &env->fp_status);
     /* Default NaN: sign bit clear, msb-1 frac bit set */
     set_float_default_nan_pattern(0b00100000, &env->fp_status);
+    /*
+     * "PA-RISC 2.0 Architecture" says it is IMPDEF whether the flushing
+     * enabled by FPSR.D happens before or after rounding. We pick "before"
+     * for consistency with tininess detection.
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
+    /*
+     * TODO: "PA-RISC 2.0 Architecture" chapter 10 says that we should
+     * detect tininess before rounding, but we don't set that here so we
+     * get the default tininess after rounding.
+     */
 }
 
 void cpu_hppa_loaded_fr0(CPUHPPAState *env)
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_init_fp_statuses(CPUX86State *env)
     set_float_default_nan_pattern(0b11000000, &env->fp_status);
     set_float_default_nan_pattern(0b11000000, &env->mmx_status);
     set_float_default_nan_pattern(0b11000000, &env->sse_status);
+    /*
+     * TODO: x86 does flush-to-zero detection after rounding (the SDM
+     * section 10.2.3.3 on the FTZ bit of MXCSR says that we flush
+     * when we detect underflow, which x86 does after rounding).
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &env->mmx_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &env->sse_status);
 }
 
 static inline uint8_t save_exception_flags(CPUX86State *env)
diff --git a/target/mips/msa.c b/target/mips/msa.c
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/msa.c
+++ b/target/mips/msa.c
@@ -XXX,XX +XXX,XX @@ void msa_reset(CPUMIPSState *env)
     /* tininess detected after rounding.*/
     set_float_detect_tininess(float_tininess_after_rounding,
                               &env->active_tc.msa_fp_status);
+    /*
+     * MSACSR.FS detects tiny results to flush to zero before rounding
+     * (per "MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD
+     * Architecture Module, Revision 1.1" section 3.5.4), even though it
+     * detects tininess after rounding for underflow purposes (section 3.4.2
+     * table 3.3).
+     */
+    set_float_ftz_detection(float_ftz_before_rounding,
+                            &env->active_tc.msa_fp_status);
 
     /*
      * According to MIPS specifications, if one of the two operands is
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index XXXXXXX..XXXXXXX 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -XXX,XX +XXX,XX @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
     /* tininess for underflow is detected before rounding */
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->fp_status);
+    /* Similarly for flush-to-zero */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
+
     /*
      * PowerPC propagation rules:
      *  1. A if it sNaN or qNaN
diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -XXX,XX +XXX,XX @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN value: sign bit clear, set frac msb */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: "RX Family RXv1 Instruction Set Architecture" is not 100% clear
+     * on whether flush-to-zero should happen before or after rounding, but
+     * section 1.3.2 says that it happens when underflow is detected, and
+     * implies that underflow is detected after rounding. So this may not
+     * be the correct setting.
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 }
 
 static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -XXX,XX +XXX,XX @@ static void superh_cpu_reset_hold(Object *obj, ResetType type)
     set_default_nan_mode(1, &env->fp_status);
     /* sign bit clear, set all frac bits other than msb */
     set_float_default_nan_pattern(0b00111111, &env->fp_status);
+    /*
+     * TODO: "SH-4 CPU Core Architecture ADCS 7182230F" doesn't say whether
+     * it detects tininess before or after rounding. Section 6.4 is clear
+     * that flush-to-zero happens when the result underflows, though, so
+     * either this should be "detect ftz after rounding" or else we should
+     * be setting "detect tininess before rounding".
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 }
 
 static void superh_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
diff --git a/target/tricore/helper.c b/target/tricore/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/tricore/helper.c
+++ b/target/tricore/helper.c
@@ -XXX,XX +XXX,XX @@ void fpu_set_state(CPUTriCoreState *env)
     set_flush_inputs_to_zero(1, &env->fp_status);
     set_flush_to_zero(1, &env->fp_status);
     set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
     set_default_nan_mode(1, &env->fp_status);
     /* Default NaN pattern: sign bit clear, frac msb set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/fp/fp-bench.c
+++ b/tests/fp/fp-bench.c
@@ -XXX,XX +XXX,XX @@ static void run_bench(void)
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, &soft_status);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, &soft_status);
     set_float_default_nan_pattern(0b01000000, &soft_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &soft_status);
 
     f = bench_funcs[operation][precision];
     g_assert(f);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
             p->frac_lo &= ~round_mask;
         }
         frac_shr(p, frac_shift);
-    } else if (s->flush_to_zero) {
+    } else if (s->flush_to_zero &&
+               s->ftz_detection == float_ftz_before_rounding) {
         flags |= float_flag_output_denormal_flushed;
         p->cls = float_class_zero;
         exp = 0;
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
         exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) && !fmt->m68k_denormal;
         frac_shr(p, frac_shift);
 
-        if (is_tiny && (flags & float_flag_inexact)) {
-            flags |= float_flag_underflow;
-        }
-        if (exp == 0 && frac_eqz(p)) {
-            p->cls = float_class_zero;
+        if (is_tiny) {
+            if (s->flush_to_zero) {
+                assert(s->ftz_detection == float_ftz_after_rounding);
+                flags |= float_flag_output_denormal_flushed;
+                p->cls = float_class_zero;
+                exp = 0;
+                frac_clear(p);
+            } else if (flags & float_flag_inexact) {
+                flags |= float_flag_underflow;
+            }
+            if (exp == 0 && frac_eqz(p)) {
+                p->cls = float_class_zero;
+            }
         }
     }
     p->exp = exp;
-- 
2.34.1

The Armv8.7 FEAT_AFP feature defines three new control bits in
the FPCR:
 * FPCR.AH: "alternate floating point mode"; this changes floating
   point behaviour in a variety of ways, including:
    - the sign of a default NaN is 1, not 0
    - if FPCR.FZ is also 1, denormals detected after rounding
      with an unbounded exponent are flushed to zero
    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
    - miscellaneous other corner-case behaviour changes
 * FPCR.FIZ: flush denormalized numbers to zero on input for
   most instructions
 * FPCR.NEP: makes scalar SIMD operations merge the result with
   higher vector elements in one of the source registers, instead
   of zeroing the higher elements of the destination

This commit defines the new bits in the FPCR, and allows them to be
read or written when FEAT_AFP is implemented.  Actual behaviour
changes will be implemented in subsequent commits.

Note that these are the first FPCR bits which don't appear in the
AArch32 FPSCR view of the register, and which share bit positions
with FPSR bits.
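
As a rough illustration of the layout described above (a sketch only,
not the code in this series): FIZ, AH and NEP occupy FPCR bits 0, 1
and 2, which are the positions FPSR uses for IOC, DZC and OFC.  The
SKETCH_* names and sketch_fpcr_write() are hypothetical, and treating
the new bits as RES0 without FEAT_AFP is the assumption being shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the FPCR register layout. */
#define SKETCH_FPCR_FIZ (1u << 0)
#define SKETCH_FPCR_AH  (1u << 1)
#define SKETCH_FPCR_NEP (1u << 2)
#define SKETCH_FPCR_AFP_BITS \
    (SKETCH_FPCR_FIZ | SKETCH_FPCR_AH | SKETCH_FPCR_NEP)

/*
 * Return the value that ends up in FPCR for a guest write of 'val'.
 * Without FEAT_AFP the three new bits are dropped (treated as RES0).
 */
static uint32_t sketch_fpcr_write(uint32_t val, bool have_afp)
{
    if (!have_afp) {
        val &= ~SKETCH_FPCR_AFP_BITS;
    }
    return val;
}
```

With FEAT_AFP present a write of 0x7 reads back as 0x7; without it
the same write reads back as 0.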

Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
flushing of single and double precision denormal inputs to zero for
AArch64 floating point instructions.  (For half-precision, the
existing FPCR.FZ16 control remains the only one.)

FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
only because of FPCR.FIZ then we should *not* set the cumulative
exception bit FPSR.IDC.

FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
applies when FPCR.AH is 0.

We can implement this by setting the "flush inputs to zero" state
appropriately when FPCR is written, and by not reflecting the
float_flag_input_denormal status flag into FPSR reads when it is the
result only of FPCR.FIZ.
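
The two conditions can be written out as standalone predicates.  This
is a sketch under the assumption that FZ is FPCR bit 24, AH is bit 1
and FIZ is bit 0; the sketch_* and SKETCH_* names are hypothetical
and are not the helpers touched by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FPCR_FIZ (1u << 0)
#define SKETCH_FPCR_AH  (1u << 1)
#define SKETCH_FPCR_FZ  (1u << 24)

/* A64: inputs are flushed if FIZ == 1, or if AH == 0 and FZ == 1. */
static bool sketch_a64_flush_inputs(uint32_t fpcr)
{
    return (fpcr & SKETCH_FPCR_FIZ) ||
        (fpcr & (SKETCH_FPCR_FZ | SKETCH_FPCR_AH)) == SKETCH_FPCR_FZ;
}

/* A flushed input is reported in FPSR.IDC only if AH == 0 and FZ == 1. */
static bool sketch_a64_flush_sets_idc(uint32_t fpcr)
{
    return (fpcr & (SKETCH_FPCR_FZ | SKETCH_FPCR_AH)) == SKETCH_FPCR_FZ;
}
```

So FIZ alone flushes without setting IDC, FZ alone flushes and sets
IDC, and FZ with AH set does neither unless FIZ is also set.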

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vfp_helper.c | 60 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
 
 static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 {
-    uint32_t i = 0;
+    uint32_t a32_flags = 0, a64_flags = 0;
 
-    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
-    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
-    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
+    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
     /* FZ16 does not generate an input denormal exception.  */
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
+    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
+
+    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
+    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~float_flag_input_denormal_flushed);
-    return vfp_exceptbits_from_host(i);
+    /*
+     * Flushing an input denormal *only* because FPCR.FIZ == 1 does
+     * not set FPSR.IDC; if FPCR.FZ is also set then this takes
+     * precedence and IDC is set (see the FPUnpackBase pseudocode).
+     * So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
+     * We only do this for the a64 flags because FIZ has no effect
+     * on AArch32 even if it is set.
+     */
+    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
+        a64_flags &= ~float_flag_input_denormal_flushed;
+    }
+    return vfp_exceptbits_from_host(a32_flags | a64_flags);
 }
 
 static void vfp_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 }
 
+static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
+{
+    /*
+     * Synchronize any pending exception-flag information in the
+     * float_status values into env->vfp.fpsr, and then clear out
+     * the float_status data.
+     */
+    env->vfp.fpsr |= vfp_get_fpsr_from_host(env);
+    vfp_clear_float_status_exc_flags(env);
+}
+
 static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
 {
     uint64_t changed = env->vfp.fpcr;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
+        /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+    }
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        /*
+         * A64: Flush denormalized inputs to zero if FPCR.FIZ = 1, or
+         * both FPCR.AH = 0 and FPCR.FZ = 1.
+         */
+        bool fitz_enabled = (val & FPCR_FIZ) ||
+            (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
+        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
+    /*
+     * If any bits changed that we look at in vfp_get_fpsr_from_host(),
+     * we must sync the float_status flags into vfp.fpsr now (under the
+     * old regime) before we update vfp.fpcr.
+     */
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        vfp_sync_and_clear_float_status_exc_flags(env);
+    }
 }
 
 #else
-- 
2.34.1

When FPCR.AH is set, various behaviours of AArch64 floating point
operations which are controlled by softfloat config settings change:
 * tininess and ftz detection before/after rounding
 * NaN propagation order
 * result of 0 * Inf + NaN
 * default NaN value

When the guest changes the value of the AH bit, switch these config
settings on the fp_status_a64 and fp_status_f16_a64 float_status
fields.

This requires us to make the arm_set_default_fp_behaviours() function
global, since we now need to call it from cpu.c and vfp_helper.c; we
move it to vfp_helper.c so it can be next to the new
arm_set_ah_fp_behaviours().
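
To make the "default NaN value" item concrete, here is a sketch of
how the two 8-bit patterns used in this patch would expand to
single-precision values.  It assumes the pattern holds the sign in
bit 7 and the top seven fraction bits in bits 6..0;
sketch_float32_default_nan() is a hypothetical helper and not part of
softfloat.

```c
#include <stdint.h>

/*
 * Expand an 8-bit default-NaN pattern into a float32 bit pattern.
 * Assumed encoding: bit 7 is the sign, bits 6..0 are the most
 * significant fraction bits; the exponent is all ones.
 */
static uint32_t sketch_float32_default_nan(uint8_t pattern)
{
    uint32_t sign = (uint32_t)(pattern >> 7) << 31;
    uint32_t exp = UINT32_C(0xff) << 23;
    uint32_t frac = (uint32_t)(pattern & 0x7f) << 16;

    return sign | exp | frac;
}
```

Under that assumption the AH=0 pattern 0x40 gives 0x7fc00000 and the
AH=1 pattern 0xc0 gives 0xffc00000, i.e. the same quiet NaN with the
sign bit set.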

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/internals.h  |  4 +++
 target/arm/cpu.c        | 23 ----------------
 target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ uint64_t gt_virt_cnt_offset(CPUARMState *env);
  * all EL1" scope; this covers stage 1 and stage 2.
  */
 int alle1_tlbmask(CPUARMState *env);
+
+/* Set the float_status behaviour to match the Arm defaults */
+void arm_set_default_fp_behaviours(float_status *s);
+
 #endif
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
     QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
 }
 
-/*
- * Set the float_status behaviour to match the Arm defaults:
- *  * tininess-before-rounding
- *  * 2-input NaN propagation prefers SNaN over QNaN, and then
- *    operand A over operand B (see FPProcessNaNs() pseudocode)
- *  * 3-input NaN propagation prefers SNaN over QNaN, and then
- *    operand C over A over B (see FPProcessNaNs3() pseudocode,
- *    but note that for QEMU muladd is a * b + c, whereas for
- *    the pseudocode function the arguments are in the order c, a, b.
- *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
- *    and the input NaN if it is signalling
- *  * Default NaN has sign bit clear, msb frac bit set
- */
-static void arm_set_default_fp_behaviours(float_status *s)
-{
-    set_float_detect_tininess(float_tininess_before_rounding, s);
-    set_float_ftz_detection(float_ftz_before_rounding, s);
-    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
-    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
-    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
-    set_float_default_nan_pattern(0b01000000, s);
-}
-
 static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
 {
     /* Reset a single ARMCPRegInfo register */
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/helper-proto.h"
 #include "internals.h"
 #include "cpu-features.h"
+#include "fpu/softfloat.h"
 #ifdef CONFIG_TCG
 #include "qemu/log.h"
-#include "fpu/softfloat.h"
 #endif
 
 /* VFP support.  We follow the convention used for VFP instructions:
    Single precision routines have a "s" suffix, double precision a
    "d" suffix.  */
 
+/*
+ * Set the float_status behaviour to match the Arm defaults:
+ *  * tininess-before-rounding
+ *  * 2-input NaN propagation prefers SNaN over QNaN, and then
+ *    operand A over operand B (see FPProcessNaNs() pseudocode)
+ *  * 3-input NaN propagation prefers SNaN over QNaN, and then
+ *    operand C over A over B (see FPProcessNaNs3() pseudocode,
+ *    but note that for QEMU muladd is a * b + c, whereas for
+ *    the pseudocode function the arguments are in the order c, a, b.
+ *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
+ *    and the input NaN if it is signalling
+ *  * Default NaN has sign bit clear, msb frac bit set
+ */
+void arm_set_default_fp_behaviours(float_status *s)
+{
+    set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_ftz_detection(float_ftz_before_rounding, s);
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
+    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
+    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
+    set_float_default_nan_pattern(0b01000000, s);
+}
+
+/*
+ * Set the float_status behaviour to match the FEAT_AFP
+ * FPCR.AH=1 requirements:
+ *  * tininess-after-rounding
+ *  * 2-input NaN propagation prefers the first NaN
+ *  * 3-input NaN propagation prefers a over b over c
+ *  * 0 * Inf + NaN always returns the input NaN and doesn't
+ *    set Invalid for a QNaN
+ *  * default NaN has sign bit set, msb frac bit set
+ */
+static void arm_set_ah_fp_behaviours(float_status *s)
+{
+    set_float_detect_tininess(float_tininess_after_rounding, s);
+    set_float_ftz_detection(float_ftz_after_rounding, s);
+    set_float_2nan_prop_rule(float_2nan_prop_ab, s);
+    set_float_3nan_prop_rule(float_3nan_prop_abc, s);
+    set_float_infzeronan_rule(float_infzeronan_dnan_never |
+                              float_infzeronan_suppress_invalid, s);
+    set_float_default_nan_pattern(0b11000000, s);
+}
+
 #ifdef CONFIG_TCG
 
 /* Convert host exception flags to vfp form.  */
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
+    if (changed & FPCR_AH) {
+        bool ah_enabled = val & FPCR_AH;
+
+        if (ah_enabled) {
+            /* Change behaviours for A64 FP operations */
+            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
+            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
+        } else {
+            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
+            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
+        }
+    }
     /*
      * If any bits changed that we look at in vfp_get_fpsr_from_host(),
      * we must sync the float_status flags into vfp.fpsr now (under the
-- 
2.34.1

When FPCR.AH = 1, some of the cumulative exception flags in the FPSR
behave slightly differently for A64 operations:
 * IDC is set when a denormal input is used without flushing
 * IXC (Inexact) is set when an output denormal is flushed to zero

Update vfp_get_fpsr_from_host() to do this.

Note that because half-precision operations never set IDC, we now
need to add float_flag_input_denormal_used to the set we mask out of
fp_status_f16_a64.
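
The resulting mapping can be sketched in isolation as follows.  The
SK_FLAG_* values are hypothetical stand-ins for the softfloat flags,
and the sketch assumes that a flushed output denormal is already
reported as UFC regardless of AH; only the two "ah &&" cases are the
behaviour described in this commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host flag values, for this sketch only. */
enum {
    SK_FLAG_INEXACT                 = 1 << 0,
    SK_FLAG_UNDERFLOW               = 1 << 1,
    SK_FLAG_INPUT_DENORMAL_FLUSHED  = 1 << 2,
    SK_FLAG_INPUT_DENORMAL_USED     = 1 << 3,
    SK_FLAG_OUTPUT_DENORMAL_FLUSHED = 1 << 4,
};

/* FPSR cumulative exception bits used below. */
#define SK_FPSR_UFC (1u << 3)
#define SK_FPSR_IXC (1u << 4)
#define SK_FPSR_IDC (1u << 7)

static uint32_t sketch_exceptbits(int host_bits, bool ah)
{
    uint32_t target = 0;

    if (host_bits & SK_FLAG_INEXACT) {
        target |= SK_FPSR_IXC;
    }
    /* Assumption: a flushed output always counts as underflow. */
    if (host_bits & (SK_FLAG_UNDERFLOW | SK_FLAG_OUTPUT_DENORMAL_FLUSHED)) {
        target |= SK_FPSR_UFC;
    }
    if (host_bits & SK_FLAG_INPUT_DENORMAL_FLUSHED) {
        target |= SK_FPSR_IDC;
    }
    /* AH == 1: a denormal input that was used (not flushed) sets IDC. */
    if (ah && (host_bits & SK_FLAG_INPUT_DENORMAL_USED)) {
        target |= SK_FPSR_IDC;
    }
    /* AH == 1: a flushed output denormal additionally sets IXC. */
    if (ah && (host_bits & SK_FLAG_OUTPUT_DENORMAL_FLUSHED)) {
        target |= SK_FPSR_IXC;
    }
    return target;
}
```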

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vfp_helper.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static void arm_set_ah_fp_behaviours(float_status *s)
 #ifdef CONFIG_TCG
 
 /* Convert host exception flags to vfp form.  */
-static inline uint32_t vfp_exceptbits_from_host(int host_bits)
+static inline uint32_t vfp_exceptbits_from_host(int host_bits, bool ah)
 {
     uint32_t target_bits = 0;
 
@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
     if (host_bits & float_flag_input_denormal_flushed) {
         target_bits |= FPSR_IDC;
     }
+    /*
+     * With FPCR.AH, IDC is set when an input denormal is used,
+     * and flushing an output denormal to zero sets both IXC and UFC.
+     */
+    if (ah && (host_bits & float_flag_input_denormal_used)) {
+        target_bits |= FPSR_IDC;
+    }
+    if (ah && (host_bits & float_flag_output_denormal_flushed)) {
+        target_bits |= FPSR_IXC;
+    }
     return target_bits;
 }
 
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
-          & ~float_flag_input_denormal_flushed);
+          & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
         a64_flags &= ~float_flag_input_denormal_flushed;
     }
-    return vfp_exceptbits_from_host(a32_flags | a64_flags);
+    return vfp_exceptbits_from_host(a64_flags, env->vfp.fpcr & FPCR_AH) |
+        vfp_exceptbits_from_host(a32_flags, false);
 }
 
 static void vfp_clear_float_status_exc_flags(CPUARMState *env)
-- 
2.34.1

We are going to need to generate different code in some cases when
FPCR.AH is 1.  For example:
 * Floating point neg and abs must not flip the sign bit of NaNs
 * some insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, and various
   BFCVT and BFM bfloat16 ops) need to use a different float_status
   to the usual one

Encode FPCR.AH into the A64 tbflags, so we can refer to it at
translate time.

Because we now have a bit in FPCR that affects codegen, we can't mark
the AArch64 FPCR register as being SUPPRESS_TB_END any more; writes
to it will now end the TB and trigger a regeneration of hflags.
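
The round trip from FPCR to a translate-time boolean can be sketched
without the TBFLAG macros.  The shift value 37 matches the field added
in this patch; the sketch_* and SK_* names are hypothetical and the
plain uint64_t flags word is a simplification of CPUARMTBFlags.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_FPCR_AH         (1u << 1)
#define SK_TBFLAG_AH_SHIFT 37

/* Recompute the AH flag bit from the current FPCR value. */
static uint64_t sketch_rebuild_flags(uint64_t flags, uint32_t fpcr)
{
    flags &= ~(UINT64_C(1) << SK_TBFLAG_AH_SHIFT);
    if (fpcr & SK_FPCR_AH) {
        flags |= UINT64_C(1) << SK_TBFLAG_AH_SHIFT;
    }
    return flags;
}

/* What the translator would read back at the start of a TB. */
static bool sketch_tb_fpcr_ah(uint64_t tb_flags)
{
    return (tb_flags >> SK_TBFLAG_AH_SHIFT) & 1;
}
```

Because the translator only sees the flags captured when the TB was
generated, a write that changes FPCR.AH has to end the current TB so
that following code is translated with the recomputed flags.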

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               | 1 +
 target/arm/tcg/translate.h     | 2 ++
 target/arm/helper.c            | 2 +-
 target/arm/tcg/hflags.c        | 4 ++++
 target/arm/tcg/translate-a64.c | 1 +
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2, 34, 1)
 FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
 /* Set if FEAT_NV2 RAM accesses are big-endian */
 FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
+FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
 
 /*
  * Helpers for using the above. Note that only the A64 accessors use
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool nv2_mem_e20;
     /* True if NV2 enabled and NV2 RAM accesses are big-endian */
     bool nv2_mem_be;
+    /* True if FPCR.AH is 1 (alternate floating point handling) */
+    bool fpcr_ah;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = aa64_daif_write, .resetfn = arm_cp_reset_ignore },
     { .name = "FPCR", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 0, .crn = 4, .crm = 4,
-      .access = PL0_RW, .type = ARM_CP_FPU | ARM_CP_SUPPRESS_TB_END,
+      .access = PL0_RW, .type = ARM_CP_FPU,
       .readfn = aa64_fpcr_read, .writefn = aa64_fpcr_write },
     { .name = "FPSR", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 1, .crn = 4, .crm = 4,
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
         DP_TBFLAG_A64(flags, TCMA, aa64_va_parameter_tcma(tcr, mmu_idx));
     }
 
+    if (env->vfp.fpcr & FPCR_AH) {
+        DP_TBFLAG_A64(flags, AH, 1);
+    }
+
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->nv2 = EX_TBFLAG_A64(tb_flags, NV2);
     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
+    dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
     dc->vec_len = 0;
     dc->vec_stride = 0;
     dc->cp_regs = arm_cpu->cp_regs;
-- 
2.34.1

When FPCR.AH is 1, the behaviour of some instructions changes:
 * AdvSIMD BFCVT, BFCVTN, BFCVTN2, BFMLALB, BFMLALT
 * SVE BFCVT, BFCVTNT, BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
 * SME BFCVT, BFCVTN, BFMLAL, BFMLSL (these are all in SME2 which
   QEMU does not yet implement)
 * FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS

The behaviour change is:
 * the instructions do not update the FPSR cumulative exception flags
 * trapped floating point exceptions are disabled (a no-op for QEMU,
   which doesn't implement FPCR.{IDE,IXE,UFE,OFE,DZE,IOE})
 * rounding is always round-to-nearest-even regardless of FPCR.RMode
 * denormalized inputs and outputs are always flushed to zero, as if
   FPCR.{FZ,FIZ} is {1,1}
 * FPCR.FZ16 is still honoured for half-precision inputs

(See the Arm ARM DDI0487L.a section A1.5.9.)

We can provide all these behaviours with another pair of float_status fields
which we use only for these insns, when FPCR.AH is 1. These float_status
fields will always have:
 * flush_to_zero and flush_inputs_to_zero set for the non-F16 field
 * rounding mode set to round-to-nearest-even
and so the only FPCR fields they need to honour are DN and FZ16.

In this commit we only define the new fp_status fields and give them
the required behaviour when FPCR is updated.  In subsequent commits
we will arrange to use these new fp_status fields for the instructions
that should be affected by FPCR.AH in this way.
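
As a sketch of which FPCR fields feed the two new status fields: the
SketchStatus struct below is a hypothetical, much-reduced stand-in
for float_status, and the bit positions assumed are FZ16 = bit 19 and
DN = bit 25.  FZ, FIZ and RMode are deliberately never consulted.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_FPCR_FZ16 (1u << 19)
#define SK_FPCR_DN   (1u << 25)

/* Hypothetical, much-reduced stand-in for softfloat's float_status. */
typedef struct SketchStatus {
    bool flush_to_zero;
    bool flush_inputs_to_zero;
    bool default_nan_mode;
    bool round_nearest_even;
} SketchStatus;

static void sketch_sync_ah_status(uint32_t fpcr, SketchStatus *ah,
                                  SketchStatus *ah_f16)
{
    bool dn = fpcr & SK_FPCR_DN;
    bool fz16 = fpcr & SK_FPCR_FZ16;

    /* Fixed behaviour: always flush (non-F16), always round-to-nearest-even. */
    ah->flush_to_zero = true;
    ah->flush_inputs_to_zero = true;
    ah->round_nearest_even = true;
    ah_f16->round_nearest_even = true;

    /* Half-precision flushing still follows FPCR.FZ16. */
    ah_f16->flush_to_zero = fz16;
    ah_f16->flush_inputs_to_zero = fz16;

    /* Default-NaN mode follows FPCR.DN for both. */
    ah->default_nan_mode = dn;
    ah_f16->default_nan_mode = dn;
}
```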

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h           | 15 +++++++++++++++
 target/arm/internals.h     |  2 ++
 target/arm/tcg/translate.h | 14 ++++++++++++++
 target/arm/cpu.c           |  4 ++++
 target/arm/vfp_helper.c    | 13 ++++++++++++-
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          *  standard_fp_status : the ARM "Standard FPSCR Value"
          *  standard_fp_status_fp16 : used for half-precision
          *       calculations with the ARM "Standard FPSCR Value"
+         *  ah_fp_status: used for the A64 insns which change behaviour
+         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+         *       and the reciprocal and square root estimate/step insns)
+         *  ah_fp_status_f16: used for the A64 insns which change behaviour
+         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+         *       and the reciprocal and square root estimate/step insns);
+         *       for half-precision
          *
          * Half-precision operations are governed by a separate
          * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
          * using a fixed value for it.
          *
+         * The ah_fp_status is needed because some insns have different
+         * behaviour when FPCR.AH == 1: they don't update cumulative
+         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
+         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
+         * which means we need an ah_fp_status_f16 as well.
+         *
          * To avoid having to transfer exception bits around, we simply
          * say that the FPSCR cumulative exception flags are the logical
          * OR of the flags in the four fp statuses. This relies on the
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         float_status fp_status_f16_a64;
         float_status standard_fp_status;
         float_status standard_fp_status_f16;
+        float_status ah_fp_status;
+        float_status ah_fp_status_f16;
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ int alle1_tlbmask(CPUARMState *env);
 
 /* Set the float_status behaviour to match the Arm defaults */
 void arm_set_default_fp_behaviours(float_status *s);
+/* Set the float_status behaviour to match Arm FPCR.AH=1 behaviour */
+void arm_set_ah_fp_behaviours(float_status *s);
 
 #endif
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
     FPST_A64,
     FPST_A32_F16,
     FPST_A64_F16,
+    FPST_AH,
+    FPST_AH_F16,
     FPST_STD,
     FPST_STD_F16,
 } ARMFPStatusFlavour;
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
  *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
  * FPST_A64_F16
  *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
+ * FPST_AH:
+ *   for AArch64 operations which change behaviour when AH=1 (specifically,
+ *   bfloat16 conversions and multiplies, and the reciprocal and square root
+ *   estimate/step insns)
+ * FPST_AH_F16:
+ *   ditto, but for half-precision operations
  * FPST_STD
  *   for A32/T32 Neon operations using the "standard FPSCR value"
  * FPST_STD_F16
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
     case FPST_A64_F16:
         offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
         break;
+    case FPST_AH:
+        offset = offsetof(CPUARMState, vfp.ah_fp_status);
+        break;
+    case FPST_AH_F16:
+        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
+        break;
     case FPST_STD:
         offset = offsetof(CPUARMState, vfp.standard_fp_status);
         break;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
+    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
+    set_flush_to_zero(1, &env->vfp.ah_fp_status);
+    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
+    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
 
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void arm_set_default_fp_behaviours(float_status *s)
  *    set Invalid for a QNaN
  *  * default NaN has sign bit set, msb frac bit set
  */
-static void arm_set_ah_fp_behaviours(float_status *s)
+void arm_set_ah_fp_behaviours(float_status *s)
 {
     set_float_detect_tininess(float_tininess_after_rounding, s);
     set_float_ftz_detection(float_ftz_after_rounding, s);
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
+    /*
+     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
+     * they are used for insns that must not set the cumulative exception bits.
+     */
+
     /*
      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.ah_fp_status);
+    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 }
 
 static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
+        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
+        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_AH) {
         bool ah_enabled = val & FPCR_AH;
-- 
2.34.1

For the instructions FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, use
FPST_AH or FPST_AH_F16 when FPCR.AH is 1, so that they get
the required behaviour changes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.h |  13 ++++
 target/arm/tcg/translate-a64.c | 119 +++++++++++++++++++++++++--------
 target/arm/tcg/translate-sve.c |  30 ++++++---
 3 files changed, 127 insertions(+), 35 deletions(-)

diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr pred_full_reg_ptr(DisasContext *s, int regno)
     return ret;
 }
 
+/*
+ * Return the ARMFPStatusFlavour to use based on element size and
+ * whether FPCR.AH is set.
+ */
+static inline ARMFPStatusFlavour select_ah_fpst(DisasContext *s, MemOp esz)
+{
+    if (s->fpcr_ah) {
+        return esz == MO_16 ? FPST_AH_F16 : FPST_AH;
+    } else {
+        return esz == MO_16 ? FPST_A64_F16 : FPST_A64;
+    }
+}
+
 bool disas_sve(DisasContext *, uint32_t);
 bool disas_sme(DisasContext *, uint32_t);
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op3_ool(DisasContext *s, bool is_q, int rd,
  * an out-of-line helper.
  */
 static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
-                              int rm, bool is_fp16, int data,
+                              int rm, ARMFPStatusFlavour fpsttype, int data,
                               gen_helper_gvec_3_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
+    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm), fpst,
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 } FPScalar;
 
-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
+                                        const FPScalar *f,
+                                        ARMFPStatusFlavour fpsttype)
 {
     switch (a->esz) {
     case MO_64:
         if (fp_access_check(s)) {
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
-            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
+            f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_dreg(s, a->rd, t0);
         }
         break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
+            f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
-            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
+            f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
     return true;
 }
 
+static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+{
+    return do_fp3_scalar_with_fpsttype(s, a, f,
+                                       a->esz == MO_16 ?
+                                       FPST_A64_F16 : FPST_A64);
+}
+
+static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+{
+    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
+}
+
 static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_addh,
     gen_helper_vfp_adds,
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar, a, &f_scalar_frecps)
+TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
+TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
@@ -XXX,XX +XXX,XX @@ TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
 TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
 TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
 
-static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
-                          gen_helper_gvec_3_ptr * const fns[3])
+static bool do_fp3_vector_with_fpsttype(DisasContext *s, arg_qrrr_e *a,
+                                        int data,
+                                        gen_helper_gvec_3_ptr * const fns[3],
+                                        ARMFPStatusFlavour fpsttype)
 {
     MemOp esz = a->esz;
     int check = fp_access_check_vector_hsd(s, a->q, esz);
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
         return check == 0;
     }
 
-    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
-                      esz == MO_16, data, fns[esz - 1]);
+    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, fpsttype,
+                      data, fns[esz - 1]);
     return true;
 }
 
+static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+                          gen_helper_gvec_3_ptr * const fns[3])
+{
+    return do_fp3_vector_with_fpsttype(s, a, data, fns,
+                                       a->esz == MO_16 ?
+                                       FPST_A64_F16 : FPST_A64);
+}
+
+static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
+                             gen_helper_gvec_3_ptr * const f[3])
+{
+    return do_fp3_vector_with_fpsttype(s, a, data, f,
+                                       select_ah_fpst(s, a->esz));
+}
+
 static gen_helper_gvec_3_ptr * const f_vector_fadd[3] = {
     gen_helper_gvec_fadd_h,
     gen_helper_gvec_fadd_s,
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
     gen_helper_gvec_recps_s,
     gen_helper_gvec_recps_d,
 };
-TRANS(FRECPS_v, do_fp3_vector, a, 0, f_vector_frecps)
+TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
 
 static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
     gen_helper_gvec_rsqrts_h,
     gen_helper_gvec_rsqrts_s,
     gen_helper_gvec_rsqrts_d,
 };
-TRANS(FRSQRTS_v, do_fp3_vector, a, 0, f_vector_frsqrts)
+TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
 
 static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
     gen_helper_gvec_faddp_h,
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
     }
 
     gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
-                      esz == MO_16, a->idx, fns[esz - 1]);
+                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                      a->idx, fns[esz - 1]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1 {
     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_ptr);
 } FPScalar1;
 
-static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
-                          const FPScalar1 *f, int rmode)
+static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
+                                        const FPScalar1 *f, int rmode,
+                                        ARMFPStatusFlavour fpsttype)
 {
     TCGv_i32 tcg_rmode = NULL;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+    fpst = fpstatus_ptr(fpsttype);
     if (rmode >= 0) {
         tcg_rmode = gen_set_rmode(rmode, fpst);
     }
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
     return true;
 }
 
+static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+                          const FPScalar1 *f, int rmode)
+{
+    return do_fp1_scalar_with_fpsttype(s, a, f, rmode,
+                                       a->esz == MO_16 ?
+                                       FPST_A64_F16 : FPST_A64);
+}
+
+static bool do_fp1_scalar_ah(DisasContext *s, arg_rr_e *a,
+                             const FPScalar1 *f, int rmode)
+{
+    return do_fp1_scalar_with_fpsttype(s, a, f, rmode, select_ah_fpst(s, a->esz));
+}
+
 static const FPScalar1 f_scalar_fsqrt = {
     gen_helper_vfp_sqrth,
     gen_helper_vfp_sqrts,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
     gen_helper_recpe_f32,
     gen_helper_recpe_f64,
 };
-TRANS(FRECPE_s, do_fp1_scalar, a, &f_scalar_frecpe, -1)
+TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
 
 static const FPScalar1 f_scalar_frecpx = {
     gen_helper_frecpx_f16,
     gen_helper_frecpx_f32,
     gen_helper_frecpx_f64,
 };
-TRANS(FRECPX_s, do_fp1_scalar, a, &f_scalar_frecpx, -1)
+TRANS(FRECPX_s, do_fp1_scalar_ah, a, &f_scalar_frecpx, -1)
 
 static const FPScalar1 f_scalar_frsqrte = {
     gen_helper_rsqrte_f16,
     gen_helper_rsqrte_f32,
     gen_helper_rsqrte_f64,
 };
-TRANS(FRSQRTE_s, do_fp1_scalar, a, &f_scalar_frsqrte, -1)
+TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
 
 static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
 {
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINT64Z_v, aa64_frint, do_fp1_vector, a,
            &f_scalar_frint64, FPROUNDING_ZERO)
 TRANS_FEAT(FRINT64X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, -1)
 
-static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
-                             int rd, int rn, int data,
-                             gen_helper_gvec_2_ptr * const fns[3])
+static bool do_gvec_op2_fpst_with_fpsttype(DisasContext *s, MemOp esz,
+                                           bool is_q, int rd, int rn, int data,
+                                           gen_helper_gvec_2_ptr * const fns[3],
+                                           ARMFPStatusFlavour fpsttype)
 {
     int check = fp_access_check_vector_hsd(s, is_q, esz);
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+    fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn), fpst,
                        is_q ? 16 : 8, vec_full_reg_size(s),
@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
     return true;
 }
 
+static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+                             int rd, int rn, int data,
+                             gen_helper_gvec_2_ptr * const fns[3])
+{
+    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data, fns,
+                                          esz == MO_16 ? FPST_A64_F16 :
+                                          FPST_A64);
+}
+
+static bool do_gvec_op2_ah_fpst(DisasContext *s, MemOp esz, bool is_q,
+                                int rd, int rn, int data,
+                                gen_helper_gvec_2_ptr * const fns[3])
+{
+    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data,
+                                          fns, select_ah_fpst(s, esz));
+}
+
 static gen_helper_gvec_2_ptr * const f_scvtf_v[] = {
     gen_helper_gvec_vcvt_sh,
     gen_helper_gvec_vcvt_sf,
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
     gen_helper_gvec_frecpe_s,
     gen_helper_gvec_frecpe_d,
 };
-TRANS(FRECPE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
 
 static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
     gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s,
     gen_helper_gvec_frsqrte_d,
 };
-TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
 
 static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
     return true;
 }
 
-static bool gen_gvec_fpst_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
-                                 arg_rr_esz *a, int data)
+static bool gen_gvec_fpst_ah_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
+                                    arg_rr_esz *a, int data)
 {
     return gen_gvec_fpst_zz(s, fn, a->rd, a->rn, data,
-                            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+                            select_ah_fpst(s, a->esz));
 }
 
 /* Invoke an out-of-line helper on 3 Zregs. */
@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
 }
 
+static bool gen_gvec_fpst_ah_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
+                                     arg_rrr_esz *a, int data)
+{
+    return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
+                             select_ah_fpst(s, a->esz));
+}
+
 /* Invoke an out-of-line helper on 4 Zregs. */
 static bool gen_gvec_ool_zzzz(DisasContext *s, gen_helper_gvec_4 *fn,
                               int rd, int rn, int rm, int ra, int data)
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
     NULL,                     gen_helper_gvec_frecpe_h,
     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
 };
-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_arg_zz, frecpe_fns[a->esz], a, 0)
+TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
 
 static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
     NULL,                      gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
 };
-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_arg_zz, frsqrte_fns[a->esz], a, 0)
+TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
 
 /*
  *** SVE Floating Point Compare with Zero Group
@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
     };                                                              \
     TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_arg_zzz, name##_fns[a->esz], a, 0)
 
+#define DO_FP3_AH(NAME, name) \
+    static gen_helper_gvec_3_ptr * const name##_fns[4] = {          \
+        NULL, gen_helper_gvec_##name##_h,                           \
+        gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
+    };                                                              \
+    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
+
 DO_FP3(FADD_zzz, fadd)
 DO_FP3(FSUB_zzz, fsub)
 DO_FP3(FMUL_zzz, fmul)
-DO_FP3(FRECPS, recps)
-DO_FP3(FRSQRTS, rsqrts)
+DO_FP3_AH(FRECPS, recps)
+DO_FP3_AH(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const frecpx_fns[] = {
     gen_helper_sve_frecpx_s, gen_helper_sve_frecpx_d,
 };
 TRANS_FEAT(FRECPX, aa64_sve, gen_gvec_fpst_arg_zpz, frecpx_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+           a, 0, select_ah_fpst(s, a->esz))
 
 static gen_helper_gvec_3_ptr * const fsqrt_fns[] = {
     NULL,                   gen_helper_sve_fsqrt_h,
-- 
2.34.1

When FPCR.AH is 1, use FPST_AH for:
 * AdvSIMD BFCVT, BFCVTN, BFCVTN2
 * SVE BFCVT, BFCVTNT

so that they get the required behaviour changes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
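The selection pattern this patch uses can be modelled outside QEMU. The
following is a minimal standalone sketch, not the QEMU code: Ctx, Flavour,
GenFn, gen_table and pick_gen are hypothetical names invented for
illustration, and the two generator stubs only report which status flavour
they would use. It shows both forms that appear in the diff: an inline
ternary on the cached FPCR.AH bit, and a two-level table whose outer index
is that bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the FPST_A64 / FPST_AH status flavours. */
typedef enum { FLAVOUR_A64, FLAVOUR_AH } Flavour;

/* Model of the translator state: just the cached FPCR.AH bit. */
typedef struct { bool fpcr_ah; } Ctx;

/* Inline selection, in the style of the SVE BFCVT and BFCVTNT changes. */
static Flavour bf16_flavour(const Ctx *s)
{
    return s->fpcr_ah ? FLAVOUR_AH : FLAVOUR_A64;
}

/* Each generator variant has its flavour baked in. */
typedef Flavour GenFn(void);
static Flavour gen_std(void) { return FLAVOUR_A64; }
static Flavour gen_ah(void) { return FLAVOUR_AH; }

/* Outer index: FPCR.AH (0 or 1). Inner index: element-size slot. */
static GenFn * const gen_table[2][3] = {
    { NULL, gen_std, NULL },
    { NULL, gen_ah, NULL },
};

/* Table-driven selection, in the style of f_vector_bfcvtn[s->fpcr_ah]. */
static GenFn *pick_gen(const Ctx *s, int slot)
{
    assert(slot >= 0 && slot < 3);
    return gen_table[s->fpcr_ah][slot];
}
```

The table form suits callers that already take an array of generator
functions; the ternary suits callers that take the flavour as a plain
argument.
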
 target/arm/tcg/translate-a64.c | 27 +++++++++++++++++++++------
 target/arm/tcg/translate-sve.c |  6 ++++--
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
 static const FPScalar1 f_scalar_bfcvt = {
     .gen_s = gen_helper_bfcvt,
 };
-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar, a, &f_scalar_bfcvt, -1)
+TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
 
 static const FPScalar1 f_scalar_frint32 = {
     NULL,
@@ -XXX,XX +XXX,XX @@ static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n)
     tcg_gen_extu_i32_i64(d, tmp);
 }
 
-static ArithOneOp * const f_vector_bfcvtn[] = {
-    NULL,
-    gen_bfcvtn_hs,
-    NULL,
+static void gen_bfcvtn_ah_hs(TCGv_i64 d, TCGv_i64 n)
+{
+    TCGv_ptr fpst = fpstatus_ptr(FPST_AH);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    gen_helper_bfcvt_pair(tmp, n, fpst);
+    tcg_gen_extu_i32_i64(d, tmp);
+}
+
+static ArithOneOp * const f_vector_bfcvtn[2][3] = {
+    {
+        NULL,
+        gen_bfcvtn_hs,
+        NULL,
+    }, {
+        NULL,
+        gen_bfcvtn_ah_hs,
+        NULL,
+    }
 };
-TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a, f_vector_bfcvtn)
+TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a,
+           f_vector_bfcvtn[s->fpcr_ah])
 
 static bool trans_SHLL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_hs, a, 0, FPST_A64_F16)
 
 TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvt, a, 0, FPST_A64)
+           gen_helper_sve_bfcvt, a, 0,
+           s->fpcr_ah ? FPST_AH : FPST_A64)
 
 TRANS_FEAT(FCVT_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_dh, a, 0, FPST_A64)
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVTNT_ds, aa64_sve2, gen_gvec_fpst_arg_zpz,
            gen_helper_sve2_fcvtnt_ds, a, 0, FPST_A64)
 
 TRANS_FEAT(BFCVTNT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvtnt, a, 0, FPST_A64)
+           gen_helper_sve_bfcvtnt, a, 0,
+           s->fpcr_ah ? FPST_AH : FPST_A64)
 
 TRANS_FEAT(FCVTLT_hs, aa64_sve2, gen_gvec_fpst_arg_zpz,
            gen_helper_sve2_fcvtlt_hs, a, 0, FPST_A64)
-- 
2.34.1

When FPCR.AH is 1, use FPST_AH for:
 * AdvSIMD BFMLALB, BFMLALT
 * SVE BFMLALB, BFMLALT, BFMLSLB, BFMLSLT

so that they get the required behaviour changes.

We do this by making gen_gvec_op4_fpst() take an ARMFPStatusFlavour
rather than a bool is_fp16; existing callsites now select
FPST_A64_F16 vs FPST_A64 themselves rather than passing in
the boolean.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
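The shape of the refactor can be sketched in isolation. This is a
standalone model under stated assumptions, not the QEMU implementation:
Cpu, FpStatus, op4_status, fcmla_like and bfmlal_like are hypothetical
names. The point is that once the shared helper takes the flavour itself
rather than a boolean, each caller decides whether element size or
FPCR.AH drives the choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ARMFPStatusFlavour values. */
typedef enum { FL_A64, FL_A64_F16, FL_AH, FL_AH_F16, FL_COUNT } Flavour;

typedef struct { int exc_flags; } FpStatus;

/* Model CPU: the FPCR.AH bit plus one status block per flavour. */
typedef struct {
    bool fpcr_ah;
    FpStatus status[FL_COUNT];
} Cpu;

/* Shared helper after the refactor: the flavour is passed in directly. */
static FpStatus *op4_status(Cpu *cpu, Flavour fl)
{
    assert(fl < FL_COUNT);
    return &cpu->status[fl];
}

/* Caller whose choice depends only on element size (FCMLA-like). */
static FpStatus *fcmla_like(Cpu *cpu, bool is_fp16)
{
    return op4_status(cpu, is_fp16 ? FL_A64_F16 : FL_A64);
}

/* Caller whose choice depends on FPCR.AH (BFMLAL-like). */
static FpStatus *bfmlal_like(Cpu *cpu)
{
    return op4_status(cpu, cpu->fpcr_ah ? FL_AH : FL_A64);
}
```

With a boolean parameter the helper could only ever pick between two
flavours; taking the enum leaves room for the AH variants without adding
a second flag.
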
 target/arm/tcg/translate-a64.c | 20 +++++++++++++-------
 target/arm/tcg/translate-sve.c |  6 ++++--
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_env(DisasContext *s, bool is_q, int rd, int rn,
  * an out-of-line helper.
  */
 static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
-                              int rm, int ra, bool is_fp16, int data,
+                              int rm, int ra, ARMFPStatusFlavour fpsttype,
+                              int data,
                               gen_helper_gvec_4_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
+    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm),
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
     }
     if (fp_access_check(s)) {
         /* Q bit selects BFMLALB vs BFMLALT. */
-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, false, a->q,
+        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
+                          s->fpcr_ah ? FPST_AH : FPST_A64, a->q,
                           gen_helper_gvec_bfmlal);
     }
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
     }
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                      a->esz == MO_16, a->rot, fn[a->esz]);
+                      a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                      a->rot, fn[a->esz]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
     }
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                      esz == MO_16, (a->idx << 1) | neg,
+                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                      (a->idx << 1) | neg,
                       fns[esz - 1]);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
     }
     if (fp_access_check(s)) {
         /* Q bit selects BFMLALB vs BFMLALT. */
-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, 0,
+        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
+                          s->fpcr_ah ? FPST_AH : FPST_A64,
                           (a->idx << 1) | a->q,
                           gen_helper_gvec_bfmlal_idx);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
     }
     if (fp_access_check(s)) {
         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                          a->esz == MO_16, (a->idx << 2) | a->rot, fn);
+                          a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                          (a->idx << 2) | a->rot, fn);
     }
     return true;
 }
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_env_arg_zzzz,
 static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal,
-                              a->rd, a->rn, a->rm, a->ra, sel, FPST_A64);
+                              a->rd, a->rn, a->rm, a->ra, sel,
+                              s->fpcr_ah ? FPST_AH : FPST_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzzw, aa64_sve_bf16, do_BFMLAL_zzzw, a, false)
@@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal_idx,
                               a->rd, a->rn, a->rm, a->ra,
-                              (a->index << 1) | sel, FPST_A64);
+                              (a->index << 1) | sel,
+                              s->fpcr_ah ? FPST_AH : FPST_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
-- 
2.34.1

For FEAT_AFP, we want to emit different code when FPCR.NEP is set, so
that instead of zeroing the high elements of a vector register when
we write the output of a scalar operation to it, we instead merge in
those elements from one of the source registers.  Since this affects
the generated code, we need to put FPCR.NEP into the TBFLAGS.

FPCR.NEP is treated as 0 when in streaming SVE mode and FEAT_SME_FA64
is not implemented or not enabled; we can implement this logic in
rebuild_hflags_a64().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
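The flag computation described above can be sketched as a standalone
model. This is illustrative only and assumes a simplified state: FpState,
effective_nep and set_nep_flag are hypothetical names, and the three
booleans stand in for the FPCR.NEP bit, PSTATE.SM, and whether
FEAT_SME_FA64 is implemented and enabled. Only the bit position (38) is
taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit position taken from FIELD(TBFLAG_A64, NEP, 38, 1) in the patch. */
#define NEP_BIT 38

/* Hypothetical simplified CPU state for this sketch. */
typedef struct {
    bool fpcr_nep;      /* raw FPCR.NEP value */
    bool streaming_sve; /* PSTATE.SM is set */
    bool fa64_enabled;  /* FEAT_SME_FA64 implemented and enabled */
} FpState;

/* NEP is treated as 0 in streaming SVE mode unless FA64 is available. */
static bool effective_nep(const FpState *st)
{
    if (!st->fpcr_nep) {
        return false;
    }
    return !(st->streaming_sve && !st->fa64_enabled);
}

/* Fold the effective NEP value into a 64-bit flags word. */
static uint64_t set_nep_flag(uint64_t flags, const FpState *st)
{
    /* Precondition for this sketch: the NEP bit starts out clear. */
    assert(((flags >> NEP_BIT) & 1) == 0);
    if (effective_nep(st)) {
        flags |= UINT64_C(1) << NEP_BIT;
    }
    return flags;
}
```

Doing the masking when the flags are built means the translator only ever
sees the effective value and does not need to know about streaming mode.
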
 target/arm/cpu.h               | 1 +
 target/arm/tcg/translate.h     | 2 ++
 target/arm/tcg/hflags.c        | 9 +++++++++
 target/arm/tcg/translate-a64.c | 1 +
 4 files changed, 13 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
 /* Set if FEAT_NV2 RAM accesses are big-endian */
 FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
 FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
+FIELD(TBFLAG_A64, NEP, 38, 1)   /* FPCR.NEP */
 
 /*
  * Helpers for using the above. Note that only the A64 accessors use
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool nv2_mem_be;
     /* True if FPCR.AH is 1 (alternate floating point handling) */
     bool fpcr_ah;
+    /* True if FPCR.NEP is 1 (FEAT_AFP scalar upper-element result handling) */
+    bool fpcr_nep;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
     if (env->vfp.fpcr & FPCR_AH) {
         DP_TBFLAG_A64(flags, AH, 1);
     }
+    if (env->vfp.fpcr & FPCR_NEP) {
+        /*
+         * In streaming-SVE without FA64, NEP behaves as if zero;
+         * compare pseudocode IsMerging()
+         */
+        if (!(EX_TBFLAG_A64(flags, PSTATE_SM) && !sme_fa64(env, el))) {
+            DP_TBFLAG_A64(flags, NEP, 1);
+        }
+    }
 
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
     dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
+    dc->fpcr_nep = EX_TBFLAG_A64(tb_flags, NEP);
     dc->vec_len = 0;
     dc->vec_stride = 0;
     dc->cp_regs = arm_cpu->cp_regs;
-- 
2.34.1

For FEAT_AFP's FPCR.NEP bit, we need to programmatically change the
behaviour of the writeback of the result for most SIMD scalar
operations, so that instead of zeroing the upper part of the result
register it merges the upper elements from one of the input
registers.

Provide new functions write_fp_*reg_merging() which can be used
instead of the existing write_fp_*reg() functions when we want this
"merge the result with one of the input registers if FPCR.NEP is
enabled" handling, and use them in do_fp3_scalar_with_fpsttype().

Note that (as documented in the description of the FPCR.NEP bit)
which input register to use as the merge source varies by
instruction: for these 2-input scalar operations, the comparison
instructions take from Rm, not Rn.

We'll extend this to also provide the merging behaviour for
the remaining scalar insns in subsequent commits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
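The writeback behaviour can be modelled on plain byte arrays. This is a
standalone sketch, not the TCG code: VReg, VREG_BYTES, write_dreg and
write_dreg_merging_model are hypothetical names, and the 32-byte register
size is an arbitrary assumption standing in for the full SVE register. It
models the 64-bit case only, on a little-endian-style layout where the
scalar result occupies the lowest bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VREG_BYTES 32 /* hypothetical full register size for the sketch */

typedef struct { uint8_t b[VREG_BYTES]; } VReg;

/* Non-merging write: result in the low 8 bytes, everything else zeroed. */
static void write_dreg(VReg *rd, uint64_t v)
{
    assert(rd != NULL);
    memset(rd->b, 0, VREG_BYTES);
    memcpy(rd->b, &v, sizeof(v));
}

/*
 * Merging write: with nep clear this is the plain write. With nep set,
 * bytes 8..15 come from the merge register, the result goes in bytes
 * 0..7, and bytes 16 and up are zeroed.
 */
static void write_dreg_merging_model(VReg *rd, const VReg *merge,
                                     bool nep, uint64_t v)
{
    uint8_t low[16];

    assert(rd != NULL && merge != NULL);
    if (!nep) {
        write_dreg(rd, v);
        return;
    }
    /* Copy first, so that rd and merge may be the same register. */
    memcpy(low, merge->b, sizeof(low));
    memset(rd->b, 0, VREG_BYTES);
    memcpy(rd->b, low, sizeof(low));
    memcpy(rd->b, &v, sizeof(v));
}
```

A caller models the per-instruction choice of merge source by passing a
different register as merge: the first source for the arithmetic
operations, the second for the comparisons.
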
 target/arm/tcg/translate-a64.c | 117 +++++++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 26 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
     write_fp_dreg(s, reg, tmp);
 }
 
+/*
+ * Write a double result to 128 bit vector register reg, honouring FPCR.NEP:
+ * - if FPCR.NEP == 0, clear the high elements of reg
+ * - if FPCR.NEP == 1, set the high elements of reg from mergereg
+ *   (i.e. merge the result with those high elements)
+ * In either case, SVE register bits above 128 are zeroed (per R_WKYLB).
+ */
+static void write_fp_dreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i64 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_dreg(s, reg, v);
+        return;
+    }
+
+    /*
+     * Move from mergereg to reg; this sets the high elements and
+     * clears the bits above 128 as a side effect.
+     */
+    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
+                     vec_full_reg_offset(s, mergereg),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st_i64(v, tcg_env, vec_full_reg_offset(s, reg));
+}
+
+/*
+ * Write a single-prec result. If FPCR.NEP is 0, clear the higher elements
+ * of the destination register; otherwise copy them from mergereg.
+ */
+static void write_fp_sreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i32 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_sreg(s, reg, v);
+        return;
+    }
+
+    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
+                     vec_full_reg_offset(s, mergereg),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st_i32(v, tcg_env, fp_reg_offset(s, reg, MO_32));
+}
+
+/*
+ * Write a half-prec result. If FPCR.NEP is 0, clear the higher elements
+ * of the destination register; otherwise copy them from mergereg.
+ * The caller must ensure that the top 16 bits of v are zero.
+ */
+static void write_fp_hreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i32 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_sreg(s, reg, v);
+        return;
+    }
+
+    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
+                     vec_full_reg_offset(s, mergereg),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st16_i32(v, tcg_env, fp_reg_offset(s, reg, MO_16));
+}
+
 /* Expand a 2-operand AdvSIMD vector operation using an expander function.  */
 static void gen_gvec_fn2(DisasContext *s, bool is_q, int rd, int rn,
                          GVecGen2Fn *gvec_fn, int vece)
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
 } FPScalar;
 
 static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
-                                        const FPScalar *f,
+                                        const FPScalar *f, int mergereg,
                                         ARMFPStatusFlavour fpsttype)
 {
     switch (a->esz) {
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
             f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
             f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
             f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
     return true;
 }
 
-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                          int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f,
+    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
                                        a->esz == MO_16 ?
                                        FPST_A64_F16 : FPST_A64);
 }
 
-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                             int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
+    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
+                                       select_ah_fpst(s, a->esz));
 }
 
 static const FPScalar f_scalar_fadd = {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_adds,
     gen_helper_vfp_addd,
 };
-TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd)
+TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd, a->rn)
 
 static const FPScalar f_scalar_fsub = {
     gen_helper_vfp_subh,
     gen_helper_vfp_subs,
     gen_helper_vfp_subd,
 };
-TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub)
+TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub, a->rn)
 
 static const FPScalar f_scalar_fdiv = {
     gen_helper_vfp_divh,
     gen_helper_vfp_divs,
     gen_helper_vfp_divd,
 };
-TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv)
+TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv, a->rn)
 
 static const FPScalar f_scalar_fmul = {
     gen_helper_vfp_mulh,
     gen_helper_vfp_muls,
     gen_helper_vfp_muld,
 };
-TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul)
+TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul, a->rn)
 
 static const FPScalar f_scalar_fmax = {
     gen_helper_vfp_maxh,
     gen_helper_vfp_maxs,
     gen_helper_vfp_maxd,
 };
-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax)
+TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
 
 static const FPScalar f_scalar_fmin = {
     gen_helper_vfp_minh,
     gen_helper_vfp_mins,
     gen_helper_vfp_mind,
 };
-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin)
+TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
 
 static const FPScalar f_scalar_fmaxnm = {
     gen_helper_vfp_maxnumh,
     gen_helper_vfp_maxnums,
     gen_helper_vfp_maxnumd,
 };
-TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm)
+TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm, a->rn)
 
 static const FPScalar f_scalar_fminnm = {
     gen_helper_vfp_minnumh,
     gen_helper_vfp_minnums,
     gen_helper_vfp_minnumd,
 };
-TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm)
+TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm, a->rn)
 
 static const FPScalar f_scalar_fmulx = {
     gen_helper_advsimd_mulxh,
     gen_helper_vfp_mulxs,
     gen_helper_vfp_mulxd,
 };
-TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx)
+TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx, a->rn)
 
 static void gen_fnmul_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fnmul = {
     gen_fnmul_s,
     gen_fnmul_d,
 };
-TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul)
+TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
 
 static const FPScalar f_scalar_fcmeq = {
     gen_helper_advsimd_ceq_f16,
     gen_helper_neon_ceq_f32,
     gen_helper_neon_ceq_f64,
 };
-TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq)
+TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq, a->rm)
 
 static const FPScalar f_scalar_fcmge = {
     gen_helper_advsimd_cge_f16,
     gen_helper_neon_cge_f32,
     gen_helper_neon_cge_f64,
 };
-TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge)
+TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge, a->rm)
 
 static const FPScalar f_scalar_fcmgt = {
     gen_helper_advsimd_cgt_f16,
     gen_helper_neon_cgt_f32,
     gen_helper_neon_cgt_f64,
 };
-TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt)
+TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt, a->rm)
 
 static const FPScalar f_scalar_facge = {
     gen_helper_advsimd_acge_f16,
     gen_helper_neon_acge_f32,
     gen_helper_neon_acge_f64,
 };
-TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge)
+TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge, a->rm)
 
 static const FPScalar f_scalar_facgt = {
     gen_helper_advsimd_acgt_f16,
     gen_helper_neon_acgt_f32,
     gen_helper_neon_acgt_f64,
 };
-TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt)
+TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt, a->rm)
 
 static void gen_fabd_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fabd = {
     gen_fabd_s,
     gen_fabd_d,
 };
-TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd)
+TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
 
 static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f16,
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
+TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
+TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
-- 
2.34.1

Handle FPCR.NEP for the 3-input scalar operations which use
do_fmla_scalar_idx() and do_fmadd(), by making them call the
appropriate write_fp_*reg_merging() functions.
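
As a rough illustration of what a merging write does, the sketch below
models a 128-bit vector register as two 64-bit halves.  The VReg type
and write_scalar_merging() are hypothetical names for this example, not
QEMU code; the sketch assumes only the architectural behaviour that,
with FPCR.NEP == 1, the bits above the scalar result come from the
merge register instead of being zeroed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register: lo holds bits [63:0]. */
typedef struct VReg {
    uint64_t lo;
    uint64_t hi;
} VReg;

/*
 * Write a scalar result of 'bits' width (16, 32 or 64) to the low element.
 * Without NEP the rest of the register is zeroed; with NEP the remaining
 * bits are copied from the merge register.
 */
static VReg write_scalar_merging(VReg merge, uint64_t value,
                                 unsigned bits, bool fpcr_nep)
{
    uint64_t mask;
    VReg out;

    assert(bits == 16 || bits == 32 || bits == 64);
    mask = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);

    if (fpcr_nep) {
        /* Keep everything above the result element from 'merge'. */
        out.lo = (merge.lo & ~mask) | (value & mask);
        out.hi = merge.hi;
    } else {
        /* Non-merging behaviour: upper bits of the register are zero. */
        out.lo = value & mask;
        out.hi = 0;
    }
    return out;
}
```

In this patch the merge register is Rd itself for the by-element forms
in do_fmla_scalar_idx(), and Ra for do_fmadd().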

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
                 gen_vfp_negd(t1, t1);
             }
             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
                 gen_vfp_negs(t1, t1);
             }
             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
             }
             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                        fpstatus_ptr(FPST_A64_F16));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
-            write_fp_dreg(s, a->rd, ta);
+            write_fp_dreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
-            write_fp_sreg(s, a->rd, ta);
+            write_fp_sreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_A64_F16);
             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
-            write_fp_sreg(s, a->rd, ta);
+            write_fp_hreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
-- 
2.34.1

Currently we implement BFCVT scalar via do_fp1_scalar().  This works
even though BFCVT is a narrowing operation from 32 to 16 bits,
because we can use write_fp_sreg() for float16. However, FPCR.NEP
support requires that we use write_fp_hreg_merging() for float16
outputs, so we can't continue to borrow the non-narrowing
do_fp1_scalar() function for this. Split out trans_BFCVT_s()
into its own implementation that honours FPCR.NEP.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frintx = {
 };
 TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
 
-static const FPScalar1 f_scalar_bfcvt = {
-    .gen_s = gen_helper_bfcvt,
-};
-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
+static bool trans_BFCVT_s(DisasContext *s, arg_rr_e *a)
+{
+    ARMFPStatusFlavour fpsttype = s->fpcr_ah ? FPST_AH : FPST_A64;
+    TCGv_i32 t32;
+    int check;
+
+    if (!dc_isar_feature(aa64_bf16, s)) {
+        return false;
+    }
+
+    check = fp_access_check_scalar_hsd(s, a->esz);
+
+    if (check <= 0) {
+        return check == 0;
+    }
+
+    t32 = read_fp_sreg(s, a->rn);
+    gen_helper_bfcvt(t32, t32, fpstatus_ptr(fpsttype));
+    write_fp_hreg_merging(s, a->rd, a->rd, t32);
+    return true;
+}
 
 static const FPScalar1 f_scalar_frint32 = {
     NULL,
-- 
2.34.1

Handle FPCR.NEP for the 1-input scalar operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
     case MO_64:
         t64 = read_fp_dreg(s, a->rn);
         f->gen_d(t64, t64, fpst);
-        write_fp_dreg(s, a->rd, t64);
+        write_fp_dreg_merging(s, a->rd, a->rd, t64);
         break;
     case MO_32:
         t32 = read_fp_sreg(s, a->rn);
         f->gen_s(t32, t32, fpst);
-        write_fp_sreg(s, a->rd, t32);
+        write_fp_sreg_merging(s, a->rd, a->rd, t32);
         break;
     case MO_16:
         t32 = read_fp_hreg(s, a->rn);
         f->gen_h(t32, t32, fpst);
-        write_fp_sreg(s, a->rd, t32);
+        write_fp_hreg_merging(s, a->rd, a->rd, t32);
         break;
     default:
         g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, fpst);
-        write_fp_dreg(s, a->rd, tcg_rd);
+        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hs(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-        /* write_fp_sreg is OK here because top half of result is zero */
-        write_fp_sreg(s, a->rd, tmp);
+        /* write_fp_hreg_merging is OK here because top half of result is zero */
+        write_fp_hreg_merging(s, a->rd, a->rd, tmp);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sd(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, fpst);
-        write_fp_sreg(s, a->rd, tcg_rd);
+        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hd(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp);
-        /* write_fp_sreg is OK here because top half of tcg_rd is zero */
-        write_fp_sreg(s, a->rd, tcg_rd);
+        /* write_fp_hreg_merging is OK here because top half of tcg_rd is zero */
+        write_fp_hreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a)
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
-        write_fp_sreg(s, a->rd, tcg_rd);
+        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a)
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
-        write_fp_dreg(s, a->rd, tcg_rd);
+        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_fcvt_f(DisasContext *s, arg_fcvt *a,
     do_fcvt_scalar(s, a->esz | (is_signed ? MO_SIGN : 0),
                    a->esz, tcg_int, a->shift, a->rn, rmode);
 
-    clear_vec(s, a->rd);
+    if (!s->fpcr_nep) {
+        clear_vec(s, a->rd);
+    }
     write_vec_element(s, tcg_int, a->rd, 0, a->esz);
     return true;
 }
-- 
2.34.1

Handle FPCR.NEP in the operations handled by do_cvtf_scalar().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtod(tcg_double, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_dreg(s, rd, tcg_double);
+        write_fp_dreg_merging(s, rd, rd, tcg_double);
         break;
 
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtos(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_sreg(s, rd, tcg_single);
+        write_fp_sreg_merging(s, rd, rd, tcg_single);
         break;
 
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtoh(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_sreg(s, rd, tcg_single);
+        write_fp_hreg_merging(s, rd, rd, tcg_single);
         break;
 
     default:
-- 
2.34.1

Handle FPCR.NEP merging for scalar FABS and FNEG; this requires
an extra parameter to do_fp1_scalar_int(), since FMOV scalar
does not have the merging behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1Int {
 } FPScalar1Int;
 
 static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
-                              const FPScalar1Int *f)
+                              const FPScalar1Int *f,
+                              bool merging)
 {
     switch (a->esz) {
     case MO_64:
         if (fp_access_check(s)) {
             TCGv_i64 t = read_fp_dreg(s, a->rn);
             f->gen_d(t, t);
-            write_fp_dreg(s, a->rd, t);
+            if (merging) {
+                write_fp_dreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_dreg(s, a->rd, t);
+            }
         }
         break;
     case MO_32:
         if (fp_access_check(s)) {
             TCGv_i32 t = read_fp_sreg(s, a->rn);
             f->gen_s(t, t);
-            write_fp_sreg(s, a->rd, t);
+            if (merging) {
+                write_fp_sreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_sreg(s, a->rd, t);
+            }
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
         if (fp_access_check(s)) {
             TCGv_i32 t = read_fp_hreg(s, a->rn);
             f->gen_h(t, t);
-            write_fp_sreg(s, a->rd, t);
+            if (merging) {
+                write_fp_hreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_sreg(s, a->rd, t);
+            }
         }
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fmov = {
     tcg_gen_mov_i32,
     tcg_gen_mov_i64,
 };
-TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov)
+TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov, false)
 
 static const FPScalar1Int f_scalar_fabs = {
     gen_vfp_absh,
     gen_vfp_abss,
     gen_vfp_absd,
 };
-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs)
+TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
 
 static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negh,
     gen_vfp_negs,
     gen_vfp_negd,
 };
-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg)
+TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
 
 typedef struct FPScalar1 {
     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
-- 
2.34.1

Unlike the other users of do_2misc_narrow_scalar(), FCVTXN (scalar)
is always double-to-single and must honour FPCR.NEP.  Implement this
directly in a trans function rather than using
do_2misc_narrow_scalar().

We still need gen_fcvtxn_sd() and the f_scalar_fcvtxn[] array for
the FCVTXN (vector) insn, so we move those down in the file to
where they are used.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 43 ++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static ArithOneOp * const f_scalar_uqxtn[] = {
 };
 TRANS(UQXTN_s, do_2misc_narrow_scalar, a, f_scalar_uqxtn)
 
-static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
+static bool trans_FCVTXN_s(DisasContext *s, arg_rr_e *a)
 {
-    /*
-     * 64 bit to 32 bit float conversion
-     * with von Neumann rounding (round to odd)
-     */
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
-    tcg_gen_extu_i32_i64(d, tmp);
+    if (fp_access_check(s)) {
+        /*
+         * 64 bit to 32 bit float conversion
+         * with von Neumann rounding (round to odd)
+         */
+        TCGv_i64 src = read_fp_dreg(s, a->rn);
+        TCGv_i32 dst = tcg_temp_new_i32();
+        gen_helper_fcvtx_f64_to_f32(dst, src, fpstatus_ptr(FPST_A64));
+        write_fp_sreg_merging(s, a->rd, a->rd, dst);
+    }
+    return true;
 }
 
-static ArithOneOp * const f_scalar_fcvtxn[] = {
-    NULL,
-    NULL,
-    gen_fcvtxn_sd,
-};
-TRANS(FCVTXN_s, do_2misc_narrow_scalar, a, f_scalar_fcvtxn)
-
 #undef WRAP_ENV
 
 static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn)
@@ -XXX,XX +XXX,XX @@ static void gen_fcvtn_sd(TCGv_i64 d, TCGv_i64 n)
     tcg_gen_extu_i32_i64(d, tmp);
 }
 
+static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
+{
+    /*
+     * 64 bit to 32 bit float conversion
+     * with von Neumann rounding (round to odd)
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
+    tcg_gen_extu_i32_i64(d, tmp);
+}
+
 static ArithOneOp * const f_vector_fcvtn[] = {
     NULL,
     gen_fcvtn_hs,
     gen_fcvtn_sd,
 };
+static ArithOneOp * const f_scalar_fcvtxn[] = {
+    NULL,
+    NULL,
+    gen_fcvtxn_sd,
+};
 TRANS(FCVTN_v, do_2misc_narrow_vector, a, f_vector_fcvtn)
 TRANS(FCVTXN_v, do_2misc_narrow_vector, a, f_scalar_fcvtxn)
 
-- 
2.34.1

do_fp3_scalar_idx() is used only for the FMUL and FMULX scalar by
element instructions; these both need to merge the result with the Rn
register when FPCR.NEP is set.

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element(s, t1, a->rm, a->idx, MO_64);
             f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_32);
             f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_16);
             f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     default:
-- 
2.34.1

When FPCR.AH == 1, floating point FMIN and FMAX have some odd special
cases:

 * comparing two zeroes (even of different sign) or comparing a NaN
   with anything always returns the second argument (possibly
   squashed to zero)
 * denormal outputs are not squashed to zero regardless of FZ or FZ16

Implement these semantics in new helper functions and select them at
translate time if FPCR.AH is 1 for the scalar FMAX and FMIN insns.
(We will convert the other FMAX and FMIN insns in subsequent
commits.)

Note that FMINNM and FMAXNM are not affected.
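
The special cases listed above can be sketched on host floats as
follows.  This is only an illustration, not the helper added by this
patch: ah_min_f32() and ah_max_f32() are hypothetical names, and the
sketch deliberately leaves out input denormal squashing, the
flush-to-zero handling and the raising of exception flags.

```c
#include <math.h>

/*
 * Simplified model of FMIN with FPCR.AH == 1 on host floats.
 * Two zeroes (of any sign) or any NaN operand: return the second argument.
 */
static float ah_min_f32(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;               /* +0/-0 compare equal to 0.0f */
    }
    if (isnan(a) || isnan(b)) {
        return b;
    }
    return a < b ? a : b;
}

/* The same special cases apply to FMAX. */
static float ah_max_f32(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;
    }
    if (isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}
```

Under this model the minimum of (+0, -0) is -0 and the minimum of
(-0, +0) is +0, because the second operand always wins; likewise the
minimum of (NaN, 1.0) is 1.0 while the minimum of (1.0, NaN) is NaN.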

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-a64.h    |  7 +++++++
 target/arm/tcg/helper-a64.c    | 36 ++++++++++++++++++++++++++++++++++
 target/arm/tcg/translate-a64.c | 23 ++++++++++++++++++++--
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(advsimd_muladd2h, i32, i32, i32, i32, fpst)
 DEF_HELPER_2(advsimd_rinth_exact, f16, f16, fpst)
 DEF_HELPER_2(advsimd_rinth, f16, f16, fpst)
 
+DEF_HELPER_3(vfp_ah_minh, f16, f16, f16, fpst)
+DEF_HELPER_3(vfp_ah_mins, f32, f32, f32, fpst)
+DEF_HELPER_3(vfp_ah_mind, f64, f64, f64, fpst)
+DEF_HELPER_3(vfp_ah_maxh, f16, f16, f16, fpst)
+DEF_HELPER_3(vfp_ah_maxs, f32, f32, f32, fpst)
+DEF_HELPER_3(vfp_ah_maxd, f64, f64, f64, fpst)
+
 DEF_HELPER_2(exception_return, void, env, i64)
 DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64)
 
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -XXX,XX +XXX,XX @@ float32 HELPER(fcvtx_f64_to_f32)(float64 a, float_status *fpst)
     return r;
 }
 
+/*
+ * AH=1 min/max have some odd special cases:
+ * comparing two zeroes (regardless of sign), (NaN, anything),
+ * or (anything, NaN) should return the second argument (possibly
+ * squashed to zero).
+ * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
+ */
+#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        bool save;                                                      \
+        CTYPE r;                                                        \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \
+            return b;                                                   \
+        }                                                               \
+        if (FLOATTYPE ## _is_any_nan(a) ||                              \
+            FLOATTYPE ## _is_any_nan(b)) {                              \
+            float_raise(float_flag_invalid, fpst);                      \
+            return b;                                                   \
+        }                                                               \
+        save = get_flush_to_zero(fpst);                                 \
+        set_flush_to_zero(false, fpst);                                 \
+        r = FLOATTYPE ## _ ## MINMAX(a, b, fpst);                       \
+        set_flush_to_zero(save, fpst);                                  \
+        return r;                                                       \
+    }
+
+AH_MINMAX_HELPER(vfp_ah_minh, dh_ctype_f16, float16, min)
+AH_MINMAX_HELPER(vfp_ah_mins, float32, float32, min)
+AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min)
+AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max)
+AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max)
+AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max)
+
 /* 64-bit versions of the CRC helpers. Note that although the operation
  * (and the prototypes of crc32c() and crc32() mean that only the bottom
  * 32 bits of the accumulator and result are used, we pass and return
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                        select_ah_fpst(s, a->esz));
 }
 
+/* Some insns need to call different helpers when FPCR.AH == 1 */
+static bool do_fp3_scalar_2fn(DisasContext *s, arg_rrr_e *a,
+                              const FPScalar *fnormal,
+                              const FPScalar *fah,
+                              int mergereg)
+{
+    return do_fp3_scalar(s, a, s->fpcr_ah ? fah : fnormal, mergereg);
+}
+
 static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_addh,
     gen_helper_vfp_adds,
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fmax = {
     gen_helper_vfp_maxs,
     gen_helper_vfp_maxd,
 };
-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
+static const FPScalar f_scalar_fmax_ah = {
+    gen_helper_vfp_ah_maxh,
+    gen_helper_vfp_ah_maxs,
+    gen_helper_vfp_ah_maxd,
+};
+TRANS(FMAX_s, do_fp3_scalar_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah, a->rn)
 
 static const FPScalar f_scalar_fmin = {
     gen_helper_vfp_minh,
     gen_helper_vfp_mins,
     gen_helper_vfp_mind,
 };
-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
+static const FPScalar f_scalar_fmin_ah = {
+    gen_helper_vfp_ah_minh,
+    gen_helper_vfp_ah_mins,
+    gen_helper_vfp_ah_mind,
+};
+TRANS(FMIN_s, do_fp3_scalar_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah, a->rn)
 
 static const FPScalar f_scalar_fmaxnm = {
     gen_helper_vfp_maxnumh,
-- 
2.34.1

Implement the FPCR.AH == 1 semantics for vector FMIN/FMAX, by
creating new _ah_ versions of the gvec helpers which invoke the
scalar fmin_ah and fmax_ah helpers on each element.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 21 +++++++++++++++++++--
 target/arm/tcg/vec_helper.c    |  8 ++++++++
 3 files changed, 41 insertions(+), 2 deletions(-)

Implement the FPCR.AH semantics for FMAXV and FMINV.  These are the
"recursively reduce all lanes of a vector to a scalar result" insns;
we just need to use the _ah_ helper for the reduction step when
FPCR.AH == 1.
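
A minimal sketch of that idea on host floats is shown below.  All of
the names (reduce_lanes, plain_max, ah_max, vector_max) are
hypothetical and the halving order is an assumption of the sketch; the
point is only that the reduction logic is shared and just the combining
function is chosen by FPCR.AH.  The plain_max() stand-in does not model
the real FMAX NaN handling.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

typedef float (*reduce_fn)(float, float);

/* Reduce 'count' lanes (a power of two) starting at 'first' to one value. */
static float reduce_lanes(const float *v, int first, int count, reduce_fn fn)
{
    int half;

    assert(count >= 1 && (count & (count - 1)) == 0);
    if (count == 1) {
        return v[first];
    }
    half = count / 2;
    /* Reduce each half separately, then combine the two partial results. */
    return fn(reduce_lanes(v, first, half, fn),
              reduce_lanes(v, first + half, half, fn));
}

/* Simplified stand-in for the normal max step (no NaN propagation). */
static float plain_max(float a, float b)
{
    return a > b ? a : b;
}

/* Simplified stand-in for the AH == 1 max step. */
static float ah_max(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}

/* Pick the combining step once, based on the (modelled) FPCR.AH bit. */
static float vector_max(const float *v, int count, bool fpcr_ah)
{
    return reduce_lanes(v, 0, count, fpcr_ah ? ah_max : plain_max);
}
```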

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz,
 }
 
 static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
-                              NeonGenTwoSingleOpFn *fn)
+                            NeonGenTwoSingleOpFn *fnormal,
+                            NeonGenTwoSingleOpFn *fah)
 {
     if (fp_access_check(s)) {
         MemOp esz = a->esz;
         int elts = (a->q ? 16 : 8) >> esz;
         TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
-        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
+        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst,
+                                       s->fpcr_ah ? fah : fnormal);
         write_fp_sreg(s, a->rd, res);
     }
     return true;
 }
 
-TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxnumh)
-TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minnumh)
-TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxh)
-TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minh)
+TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_maxnumh, gen_helper_vfp_maxnumh)
+TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_minnumh, gen_helper_vfp_minnumh)
+TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_maxh, gen_helper_vfp_ah_maxh)
+TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_minh, gen_helper_vfp_ah_minh)
 
-TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
-TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
-TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
-TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
+TRANS(FMAXNMV_s, do_fp_reduction, a,
+      gen_helper_vfp_maxnums, gen_helper_vfp_maxnums)
+TRANS(FMINNMV_s, do_fp_reduction, a,
+      gen_helper_vfp_minnums, gen_helper_vfp_minnums)
+TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs, gen_helper_vfp_ah_maxs)
+TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins, gen_helper_vfp_ah_mins)
 
 /*
  * Floating-point Immediate
-- 
2.34.1

Implement the FPCR.AH semantics for the pairwise floating
point minimum/maximum insns FMINP and FMAXP.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
 target/arm/tcg/vec_helper.c    | 10 ++++++++++
 3 files changed, 45 insertions(+), 4 deletions(-)

Implement the FPCR.AH semantics for the SVE FMAXV and FMINV
vector-reduction-to-scalar max/min operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 +++++++++++
 target/arm/tcg/sve_helper.c    | 43 +++++++++++++++++++++-------------
 target/arm/tcg/translate-sve.c | 16 +++++++++++--
 3 files changed, 55 insertions(+), 18 deletions(-)

Implement the FPCR.AH semantics for the SVE FMAX and FMIN operations
that take an immediate as the second operand.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/sve_helper.c    |  8 ++++++++
 target/arm/tcg/translate-sve.c | 25 +++++++++++++++++++++++--
 3 files changed, 45 insertions(+), 2 deletions(-)

Implement the FPCR.AH semantics for the SVE FMAX and FMIN
operations that take two vector operands.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/sve_helper.c    |  8 ++++++++
 target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
 3 files changed, 37 insertions(+), 2 deletions(-)

FPCR.AH == 1 mandates that negation of a NaN value should not flip
its sign bit.  This means we can no longer use gen_vfp_neg*()
everywhere but must instead generate slightly more complex code when
FPCR.AH is set.

Make this change for the scalar FNEG and for those places in
translate-a64.c which were previously directly calling
gen_vfp_neg*().

This change in semantics also affects any other instruction whose
pseudocode calls FPNeg(); in following commits we extend this
change to the other affected instructions.
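
The bit-level rule can be sketched in plain C on raw IEEE encodings.
The function names below are hypothetical and operate on host integers
rather than TCG values; they mirror the idea of comparing the absolute
value against the infinity encoding to detect a NaN.

```c
#include <stdint.h>

/* float32: negate unless the value is a NaN (magnitude above +inf). */
static uint32_t ah_neg_f32_bits(uint32_t s)
{
    uint32_t sign = UINT32_C(1) << 31;
    uint32_t abs_s = s & ~sign;

    if (abs_s > UINT32_C(0x7f800000)) {
        return s;               /* NaN: leave the sign bit alone */
    }
    return s ^ sign;            /* everything else: flip the sign */
}

/* float16: same rule, with 0x7c00 as the infinity encoding. */
static uint16_t ah_neg_f16_bits(uint16_t s)
{
    uint16_t abs_s = s & 0x7fff;

    if (abs_s > 0x7c00) {
        return s;
    }
    return (uint16_t)(s ^ 0x8000);
}
```

With these definitions 1.0f (0x3f800000) becomes -1.0f (0xbf800000) and
+inf (0x7f800000) becomes -inf (0xff800000), while a quiet NaN such as
0x7fc00000 is returned unchanged.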

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 125 ++++++++++++++++++++++++++++++---
 1 file changed, 114 insertions(+), 11 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
                        is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
 }
 
+/*
+ * When FPCR.AH == 1, NEG and ABS do not flip the sign bit of a NaN.
+ * These functions implement
+ *   d = floatN_is_any_nan(s) ? s : floatN_chs(s)
+ * which for float32 is
+ *   d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s ^ (1 << 31))
+ * and similarly for the other float sizes.
+ */
+static void gen_vfp_ah_negh(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
+
+    gen_vfp_negh(chs_s, s);
+    gen_vfp_absh(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7c00),
+                        s, chs_s);
+}
+
+static void gen_vfp_ah_negs(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
+
+    gen_vfp_negs(chs_s, s);
+    gen_vfp_abss(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7f800000UL),
+                        s, chs_s);
+}
+
+static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
+{
+    TCGv_i64 abs_s = tcg_temp_new_i64(), chs_s = tcg_temp_new_i64();
+
+    gen_vfp_negd(chs_s, s);
+    gen_vfp_absd(abs_s, s);
+    tcg_gen_movcond_i64(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
+                        s, chs_s);
+}
+
+static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negh(d, s);
+    } else {
+        gen_vfp_negh(d, s);
+    }
+}
+
+static void gen_vfp_maybe_ah_negs(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negs(d, s);
+    } else {
+        gen_vfp_negs(d, s);
+    }
+}
+
+static void gen_vfp_maybe_ah_negd(DisasContext *dc, TCGv_i64 d, TCGv_i64 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negd(d, s);
+    } else {
+        gen_vfp_negd(d, s);
+    }
+}
+
 /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
  * than the 32 bit equivalent.
  */
@@ -XXX,XX +XXX,XX @@ static void gen_fnmul_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
     gen_vfp_negd(d, d);
 }
 
+static void gen_fnmul_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_mulh(d, n, m, s);
+    gen_vfp_ah_negh(d, d);
+}
+
+static void gen_fnmul_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_muls(d, n, m, s);
+    gen_vfp_ah_negs(d, d);
+}
+
+static void gen_fnmul_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+    gen_helper_vfp_muld(d, n, m, s);
+    gen_vfp_ah_negd(d, d);
+}
+
 static const FPScalar f_scalar_fnmul = {
     gen_fnmul_h,
     gen_fnmul_s,
     gen_fnmul_d,
 };
-TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
+static const FPScalar f_scalar_ah_fnmul = {
+    gen_fnmul_ah_h,
+    gen_fnmul_ah_s,
+    gen_fnmul_ah_d,
+};
+TRANS(FNMUL_s, do_fp3_scalar_2fn, a, &f_scalar_fnmul, &f_scalar_ah_fnmul, a->rn)
 
 static const FPScalar f_scalar_fcmeq = {
     gen_helper_advsimd_ceq_f16,
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element(s, t2, a->rm, a->idx, MO_64);
             if (neg) {
-                gen_vfp_negd(t1, t1);
+                gen_vfp_maybe_ah_negd(s, t1, t1);
             }
             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
             write_fp_dreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element_i32(s, t2, a->rm, a->idx, MO_32);
             if (neg) {
-                gen_vfp_negs(t1, t1);
+                gen_vfp_maybe_ah_negs(s, t1, t1);
             }
             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
             write_fp_sreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element_i32(s, t2, a->rm, a->idx, MO_16);
             if (neg) {
-                gen_vfp_negh(t1, t1);
+                gen_vfp_maybe_ah_negh(s, t1, t1);
             }
             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                        fpstatus_ptr(FPST_A64_F16));
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i64 ta = read_fp_dreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negd(ta, ta);
+                gen_vfp_maybe_ah_negd(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negd(tn, tn);
+                gen_vfp_maybe_ah_negd(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i32 ta = read_fp_sreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negs(ta, ta);
+                gen_vfp_maybe_ah_negs(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negs(tn, tn);
+                gen_vfp_maybe_ah_negs(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i32 ta = read_fp_hreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negh(ta, ta);
+                gen_vfp_maybe_ah_negh(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negh(tn, tn);
+                gen_vfp_maybe_ah_negh(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_A64_F16);
             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
     return true;
 }
 
+static bool do_fp1_scalar_int_2fn(DisasContext *s, arg_rr_e *a,
+                                  const FPScalar1Int *fnormal,
+                                  const FPScalar1Int *fah)
+{
+    return do_fp1_scalar_int(s, a, s->fpcr_ah ? fah : fnormal, true);
+}
+
 static const FPScalar1Int f_scalar_fmov = {
     tcg_gen_mov_i32,
     tcg_gen_mov_i32,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negs,
     gen_vfp_negd,
 };
-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
+static const FPScalar1Int f_scalar_ah_fneg = {
+    gen_vfp_ah_negh,
+    gen_vfp_ah_negs,
+    gen_vfp_ah_negd,
+};
+TRANS(FNEG_s, do_fp1_scalar_int_2fn, a, &f_scalar_fneg, &f_scalar_ah_fneg)
 
 typedef struct FPScalar1 {
     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
-- 
2.34.1

FPCR.AH == 1 mandates that taking the absolute value of a NaN should
not change its sign bit.  This means we can no longer use
gen_vfp_abs*() everywhere but must instead generate slightly more
complex code when FPCR.AH is set.

Implement these semantics for scalar FABS and FABD.  This change also
affects all other instructions whose pseudocode calls FPAbs(); we
will extend the change to those instructions in following commits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 69 +++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 2 deletions(-)
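
Note for readers: the "and similarly for the other float sizes" part
of the new comment comes down to changing the sign mask and the
infinity constant. This is a rough sketch on raw bit patterns only;
the fNN_bits_ah_abs names are hypothetical and do not exist in the
tree.

```c
#include <stdint.h>

/* float16: sign is bit 15, infinity is 0x7c00. */
uint16_t f16_bits_ah_abs(uint16_t s)
{
    uint16_t abs_s = s & 0x7fff;
    return abs_s > 0x7c00 ? s : abs_s;
}

/* float32: sign is bit 31, infinity is 0x7f800000. */
uint32_t f32_bits_ah_abs(uint32_t s)
{
    uint32_t abs_s = s & ~(UINT32_C(1) << 31);
    return abs_s > UINT32_C(0x7f800000) ? s : abs_s;
}

/* float64: sign is bit 63, infinity is 0x7ff0000000000000. */
uint64_t f64_bits_ah_abs(uint64_t s)
{
    uint64_t abs_s = s & ~(UINT64_C(1) << 63);
    return abs_s > UINT64_C(0x7ff0000000000000) ? s : abs_s;
}
```

In each case the cleared-sign value is computed once and reused both
for the NaN test and as the non-NaN result, which is the same shape
as the single movcond in the TCG versions.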

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
                         s, chs_s);
 }
 
+/*
+ * These functions implement
+ *  d = floatN_is_any_nan(s) ? s : floatN_abs(s)
+ * which for float32 is
+ *  d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s & ~(1 << 31))
+ * and similarly for the other float sizes.
+ */
+static void gen_vfp_ah_absh(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32();
+
+    gen_vfp_absh(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7c00),
+                        s, abs_s);
+}
+
+static void gen_vfp_ah_abss(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32();
+
+    gen_vfp_abss(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7f800000UL),
+                        s, abs_s);
+}
+
+static void gen_vfp_ah_absd(TCGv_i64 d, TCGv_i64 s)
+{
+    TCGv_i64 abs_s = tcg_temp_new_i64();
+
+    gen_vfp_absd(abs_s, s);
+    tcg_gen_movcond_i64(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
+                        s, abs_s);
+}
+
 static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
 {
     if (dc->fpcr_ah) {
@@ -XXX,XX +XXX,XX @@ static void gen_fabd_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
     gen_vfp_absd(d, d);
 }
 
+static void gen_fabd_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subh(d, n, m, s);
+    gen_vfp_ah_absh(d, d);
+}
+
+static void gen_fabd_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subs(d, n, m, s);
+    gen_vfp_ah_abss(d, d);
+}
+
+static void gen_fabd_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subd(d, n, m, s);
+    gen_vfp_ah_absd(d, d);
+}
+
 static const FPScalar f_scalar_fabd = {
     gen_fabd_h,
     gen_fabd_s,
     gen_fabd_d,
 };
-TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
+static const FPScalar f_scalar_ah_fabd = {
+    gen_fabd_ah_h,
+    gen_fabd_ah_s,
+    gen_fabd_ah_d,
+};
+TRANS(FABD_s, do_fp3_scalar_2fn, a, &f_scalar_fabd, &f_scalar_ah_fabd, a->rn)
 
 static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fabs = {
     gen_vfp_abss,
     gen_vfp_absd,
 };
-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
+static const FPScalar1Int f_scalar_ah_fabs = {
+    gen_vfp_ah_absh,
+    gen_vfp_ah_abss,
+    gen_vfp_ah_absd,
+};
+TRANS(FABS_s, do_fp1_scalar_int_2fn, a, &f_scalar_fabs, &f_scalar_ah_fabs)
 
 static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negh,
-- 
2.34.1

Split the handling of vector FABD so that it calls a different set
of helpers when FPCR.AH is 1, which implement the "no negation of
the sign of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    | 23 +++++++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

Make SVE FNEG honour the FPCR.AH "don't negate the sign of a NaN"
semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 4 ++++
 target/arm/tcg/sve_helper.c    | 8 ++++++++
 target/arm/tcg/translate-sve.c | 7 ++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

Make SVE FABS honour the FPCR.AH "don't negate the sign of a NaN"
semantics.

Make the SVE FABD insn honour the FPCR.AH "don't negate the sign
of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    |  7 +++++++
 target/arm/tcg/sve_helper.c    | 22 ++++++++++++++++++++++
 target/arm/tcg/translate-sve.c |  2 +-
 3 files changed, 30 insertions(+), 1 deletion(-)

The negation steps in FCADD must honour FPCR.AH's "don't change the
sign of a NaN" semantics.  Implement this in the same way we did for
the base ASIMD FCADD, by encoding FPCR.AH into the SIMD data field
passed to the helper and using that to decide whether to negate the
values.

The construction of neg_imag and neg_real was done to make it easy
to apply both in parallel with two simple logical operations.  This
changed with FPCR.AH, which is more complex than that. Switch to
an approach that follows the pseudocode more closely, by extracting
the 'rot=1' parameter from the SIMD data field and changing the
sign of the appropriate input value.

Note that there was a naming issue with neg_imag and neg_real.
They were named backward, with neg_imag being non-zero for rot=1,
and vice versa.  This was combined with reversed usage within the
loop, so that the negation in the end turned out correct.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/vec_internal.h  | 17 ++++++++++++++
 target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++----------
 target/arm/tcg/translate-sve.c |  2 +-
 3 files changed, 48 insertions(+), 13 deletions(-)
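
Note for readers: the data-field encoding and the per-element
negation described above can be modelled for a single complex
element as follows. This is only a sketch under simplifying
assumptions: it uses host float addition instead of softfloat, and
fcadd_pack(), fcadd_elem() and the f32 bit helpers are hypothetical
names, not functions from this patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packing that mirrors "a->rot | (s->fpcr_ah << 1)". */
unsigned fcadd_pack(bool rot, bool fpcr_ah)
{
    return (unsigned)rot | ((unsigned)fpcr_ah << 1);
}

uint32_t f32_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

float f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Flip the sign unless fpcr_ah is set and the value is a NaN. */
uint32_t f32_maybe_ah_chs(uint32_t a, bool fpcr_ah)
{
    bool is_nan = (a & UINT32_C(0x7fffffff)) > UINT32_C(0x7f800000);
    return (fpcr_ah && is_nan) ? a : (a ^ UINT32_C(0x80000000));
}

/* One complex element, stored as { real, imaginary }. */
void fcadd_elem(float d[2], const float n[2], const float m[2],
                unsigned data)
{
    bool rot = data & 1;
    bool fpcr_ah = (data >> 1) & 1;
    uint32_t e1 = f32_to_bits(m[1]); /* added to the real part */
    uint32_t e3 = f32_to_bits(m[0]); /* added to the imaginary part */

    if (rot) {
        e3 = f32_maybe_ah_chs(e3, fpcr_ah);
    } else {
        e1 = f32_maybe_ah_chs(e1, fpcr_ah);
    }
    d[0] = n[0] + f32_from_bits(e1);
    d[1] = n[1] + f32_from_bits(e3);
}
```

With n = 1 + 2i and m = 3 + 4i, rot=0 corresponds to adding i*m and
rot=1 to adding -i*m.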

diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -XXX,XX +XXX,XX @@
 #ifndef TARGET_ARM_VEC_INTERNAL_H
 #define TARGET_ARM_VEC_INTERNAL_H
 
+#include "fpu/softfloat.h"
+
 /*
  * Note that vector data is stored in host-endian 64-bit chunks,
  * so addressing units smaller than that needs a host-endian fixup.
@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
  */
 bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
 
+static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
+{
+    return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
+}
+
+static inline float32 float32_maybe_ah_chs(float32 a, bool fpcr_ah)
+{
+    return fpcr_ah && float32_is_any_nan(a) ? a : float32_chs(a);
+}
+
+static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah)
+{
+    return fpcr_ah && float64_is_any_nan(a) ? a : float64_chs(a);
+}
+
 #endif /* TARGET_ARM_VEC_INTERNAL_H */
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float16 neg_imag = float16_set_sign(0, simd_data(desc));
-    float16 neg_real = float16_chs(neg_imag);
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float16);
 
             e0 = *(float16 *)(vn + H1_2(i));
-            e1 = *(float16 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float16 *)(vm + H1_2(j));
             e2 = *(float16 *)(vn + H1_2(j));
-            e3 = *(float16 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float16 *)(vm + H1_2(i));
+
+            if (rot) {
+                e3 = float16_maybe_ah_chs(e3, fpcr_ah);
+            } else {
+                e1 = float16_maybe_ah_chs(e1, fpcr_ah);
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float16 *)(vd + H1_2(i)) = float16_add(e0, e1, s);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float32 neg_imag = float32_set_sign(0, simd_data(desc));
-    float32 neg_real = float32_chs(neg_imag);
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float32);
 
             e0 = *(float32 *)(vn + H1_2(i));
-            e1 = *(float32 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float32 *)(vm + H1_2(j));
             e2 = *(float32 *)(vn + H1_2(j));
-            e3 = *(float32 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float32 *)(vm + H1_2(i));
+
+            if (rot) {
+                e3 = float32_maybe_ah_chs(e3, fpcr_ah);
+            } else {
+                e1 = float32_maybe_ah_chs(e1, fpcr_ah);
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float32 *)(vd + H1_2(i)) = float32_add(e0, e1, s);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float64 neg_imag = float64_set_sign(0, simd_data(desc));
-    float64 neg_real = float64_chs(neg_imag);
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float64);
 
             e0 = *(float64 *)(vn + H1_2(i));
-            e1 = *(float64 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float64 *)(vm + H1_2(j));
             e2 = *(float64 *)(vn + H1_2(j));
-            e3 = *(float64 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float64 *)(vm + H1_2(i));
+
+            if (rot) {
+                e3 = float64_maybe_ah_chs(e3, fpcr_ah);
+            } else {
+                e1 = float64_maybe_ah_chs(e1, fpcr_ah);
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float64 *)(vd + H1_2(i)) = float64_add(e0, e1, s);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4_ptr * const fcadd_fns[] = {
     gen_helper_sve_fcadd_s, gen_helper_sve_fcadd_d,
 };
 TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
-           a->rd, a->rn, a->rm, a->pg, a->rot,
+           a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 #define DO_FMLA(NAME, name) \
-- 
2.34.1

The negation steps in FCADD must honour FPCR.AH's "don't change the
sign of a NaN" semantics.  Implement this by encoding FPCR.AH into
the SIMD data field passed to the helper and using that to decide
whether to negate the values.

The construction of neg_imag and neg_real was done to make it easy
to apply both in parallel with two simple logical operations.  This
changed with FPCR.AH, which is more complex than that. Switch to
an approach closer to the pseudocode, where we extract the rot
parameter from the SIMD data word and negate the appropriate
input value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 10 +++++--
 target/arm/tcg/vec_helper.c    | 54 +++++++++++++++++++---------------
 2 files changed, 38 insertions(+), 26 deletions(-)
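
Note for readers: the reason the precomputed XOR masks no longer
suffice can be seen by comparing the two schemes on float16 bit
patterns. This is a sketch only; the fcadd_signs_* functions are
hypothetical stand-ins for the sign handling inside the helper loop,
with e1 being the value added to the real part and e3 the value added
to the imaginary part.

```c
#include <stdbool.h>
#include <stdint.h>

bool f16_is_any_nan(uint16_t a)
{
    return (a & 0x7fff) > 0x7c00;
}

/* Old scheme: build sign masks from rot, then XOR unconditionally. */
void fcadd_signs_xor(uint16_t *e1, uint16_t *e3, bool rot)
{
    uint16_t neg_real = (uint16_t)(rot << 15);
    uint16_t neg_imag = (uint16_t)((rot ^ 1) << 15);

    *e1 ^= neg_imag;
    *e3 ^= neg_real;
}

/* Flip the sign unless fpcr_ah is set and the value is a NaN. */
uint16_t f16_maybe_ah_chs(uint16_t a, bool fpcr_ah)
{
    return (fpcr_ah && f16_is_any_nan(a)) ? a : (uint16_t)(a ^ 0x8000);
}

/* New scheme: pick the operand from rot, negate it conditionally. */
void fcadd_signs_ah(uint16_t *e1, uint16_t *e3, bool rot, bool fpcr_ah)
{
    if (rot) {
        *e3 = f16_maybe_ah_chs(*e3, fpcr_ah);
    } else {
        *e1 = f16_maybe_ah_chs(*e1, fpcr_ah);
    }
}
```

The two schemes agree whenever fpcr_ah is false; they differ only
when fpcr_ah is true and the operand selected for negation is a NaN,
which a fixed XOR mask cannot express.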

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
     gen_helper_gvec_fcadds,
     gen_helper_gvec_fcaddd,
 };
-TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
-TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
+/*
+ * Encode FPCR.AH into the data so the helper knows whether the
+ * negations it does should avoid flipping the sign bit on a NaN
+ */
+TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0 | (s->fpcr_ah << 1),
+           f_vector_fcadd)
+TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1 | (s->fpcr_ah << 1),
+           f_vector_fcadd)
 
 static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
 {
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
     float16 *d = vd;
     float16 *n = vn;
     float16 *m = vm;
-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = neg_real ^ 1;
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 15;
-    neg_imag <<= 15;
-
     for (i = 0; i < opr_sz / 2; i += 2) {
         float16 e0 = n[H2(i)];
-        float16 e1 = m[H2(i + 1)] ^ neg_imag;
+        float16 e1 = m[H2(i + 1)];
         float16 e2 = n[H2(i + 1)];
-        float16 e3 = m[H2(i)] ^ neg_real;
+        float16 e3 = m[H2(i)];
+
+        if (rot) {
+            e3 = float16_maybe_ah_chs(e3, fpcr_ah);
+        } else {
+            e1 = float16_maybe_ah_chs(e1, fpcr_ah);
+        }
 
         d[H2(i)] = float16_add(e0, e1, fpst);
         d[H2(i + 1)] = float16_add(e2, e3, fpst);
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
     float32 *d = vd;
     float32 *n = vn;
     float32 *m = vm;
-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = neg_real ^ 1;
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 31;
-    neg_imag <<= 31;
-
     for (i = 0; i < opr_sz / 4; i += 2) {
         float32 e0 = n[H4(i)];
-        float32 e1 = m[H4(i + 1)] ^ neg_imag;
+        float32 e1 = m[H4(i + 1)];
         float32 e2 = n[H4(i + 1)];
-        float32 e3 = m[H4(i)] ^ neg_real;
+        float32 e3 = m[H4(i)];
+
+        if (rot) {
+            e3 = float32_maybe_ah_chs(e3, fpcr_ah);
+        } else {
+            e1 = float32_maybe_ah_chs(e1, fpcr_ah);
+        }
 
         d[H4(i)] = float32_add(e0, e1, fpst);
         d[H4(i + 1)] = float32_add(e2, e3, fpst);
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
     float64 *d = vd;
     float64 *n = vn;
     float64 *m = vm;
-    uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
-    uint64_t neg_imag = neg_real ^ 1;
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 63;
-    neg_imag <<= 63;
-
     for (i = 0; i < opr_sz / 8; i += 2) {
         float64 e0 = n[i];
-        float64 e1 = m[i + 1] ^ neg_imag;
+        float64 e1 = m[i + 1];
         float64 e2 = n[i + 1];
-        float64 e3 = m[i] ^ neg_real;
+        float64 e3 = m[i];
+
+        if (rot) {
+            e3 = float64_maybe_ah_chs(e3, fpcr_ah);
+        } else {
+            e1 = float64_maybe_ah_chs(e1, fpcr_ah);
+        }
 
         d[i] = float64_add(e0, e1, fpst);
         d[i + 1] = float64_add(e2, e3, fpst);
-- 
2.34.1

Handle the FPCR.AH semantics that we do not change the sign of an
input NaN in the FRECPS and FRSQRTS scalar insns, by providing
new helper functions that do the CHS part of the operation
differently.

Since the extra helper functions would be very repetitive if written
out longhand, we condense them and the existing non-AH helpers into
macro-generated definitions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-a64.h    |   6 ++
 target/arm/tcg/vec_internal.h  |  18 ++++++
 target/arm/tcg/helper-a64.c    | 115 ++++++++++++---------------------
 target/arm/tcg/translate-a64.c |  25 +++++--
 4 files changed, 83 insertions(+), 81 deletions(-)
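
Note for readers: the macro technique used here, where the sign-change
function is a macro parameter pasted into the helper body, can be
illustrated in isolation. This is a simplified sketch and not the
code from this patch: it works on raw float32 bits, skips denormal
squashing, and uses an unfused host multiply and add rather than
softfloat's fused muladd. All names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t f32bits; /* raw IEEE single-precision bit pattern */

float f32_to_float(f32bits a)
{
    float f;
    memcpy(&f, &a, sizeof(f));
    return f;
}

f32bits f32_from_float(float f)
{
    f32bits a;
    memcpy(&a, &f, sizeof(a));
    return a;
}

int f32_is_any_nan(f32bits a) { return (a & 0x7fffffffu) > 0x7f800000u; }
int f32_is_infinity(f32bits a) { return (a & 0x7fffffffu) == 0x7f800000u; }
int f32_is_zero(f32bits a) { return (a & 0x7fffffffu) == 0; }

/* Always flip the sign bit. */
f32bits f32_chs(f32bits a) { return a ^ 0x80000000u; }

/* Flip the sign bit unless the value is a NaN. */
f32bits f32_ah_chs(f32bits a) { return f32_is_any_nan(a) ? a : f32_chs(a); }

/*
 * Generate one reciprocal-step function per sign-change variant.
 * CHSFN is pasted onto "f32_" to select f32_chs or f32_ah_chs.
 * The multiply-add below is unfused, a simplification of this sketch.
 */
#define DO_RECPS(NAME, CHSFN)                                           \
    f32bits NAME(f32bits a, f32bits b)                                  \
    {                                                                   \
        a = f32_ ## CHSFN(a);                                           \
        if ((f32_is_infinity(a) && f32_is_zero(b)) ||                   \
            (f32_is_infinity(b) && f32_is_zero(a))) {                   \
            return f32_from_float(2.0f);                                \
        }                                                               \
        return f32_from_float(f32_to_float(a) * f32_to_float(b) + 2.0f); \
    }

DO_RECPS(recps_f32, chs)
DO_RECPS(recps_ah_f32, ah_chs)
```

Both generated functions compute 2 - a*b for ordinary inputs; they
differ only in whether a NaN in the first operand has its sign bit
flipped before the multiply.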

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(neon_cgt_f64, TCG_CALL_NO_RWG, i64, i64, i64, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
  */
 bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
 
+/*
+ * Negate as for FPCR.AH=1 -- do not negate NaNs.
+ */
+static inline float16 float16_ah_chs(float16 a)
+{
+    return float16_is_any_nan(a) ? a : float16_chs(a);
+}
+
+static inline float32 float32_ah_chs(float32 a)
+{
+    return float32_is_any_nan(a) ? a : float32_chs(a);
+}
+
+static inline float64 float64_ah_chs(float64 a)
+{
+    return float64_is_any_nan(a) ? a : float64_chs(a);
+}
+
 static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
 {
     return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -XXX,XX +XXX,XX @@
 #ifdef CONFIG_USER_ONLY
 #include "user/page-protection.h"
 #endif
+#include "vec_internal.h"
 
 /* C2.4.7 Multiply and divide */
 /* special cases for 0 and LLONG_MIN are mandated by the standard */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, float_status *fpst)
     return -float64_lt(b, a, fpst);
 }
 
-/* Reciprocal step and sqrt step. Note that unlike the A32/T32
+/*
+ * Reciprocal step and sqrt step. Note that unlike the A32/T32
  * versions, these do a fully fused multiply-add or
  * multiply-add-and-halve.
+ * The FPCR.AH == 1 versions need to avoid flipping the sign of NaN.
  */
-
-uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
-{
-    a = float16_squash_input_denormal(a, fpst);
-    b = float16_squash_input_denormal(b, fpst);
-
-    a = float16_chs(a);
-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
-        (float16_is_infinity(b) && float16_is_zero(a))) {
-        return float16_two;
+#define DO_RECPS(NAME, CTYPE, FLOATTYPE, CHSFN)                         \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
+        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
+            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
+            return FLOATTYPE ## _two;                                   \
+        }                                                               \
+        return FLOATTYPE ## _muladd(a, b, FLOATTYPE ## _two, 0, fpst);  \
     }
-    return float16_muladd(a, b, float16_two, 0, fpst);
-}
 
-float32 HELPER(recpsf_f32)(float32 a, float32 b, float_status *fpst)
-{
-    a = float32_squash_input_denormal(a, fpst);
-    b = float32_squash_input_denormal(b, fpst);
+DO_RECPS(recpsf_f16, uint32_t, float16, chs)
+DO_RECPS(recpsf_f32, float32, float32, chs)
+DO_RECPS(recpsf_f64, float64, float64, chs)
+DO_RECPS(recpsf_ah_f16, uint32_t, float16, ah_chs)
+DO_RECPS(recpsf_ah_f32, float32, float32, ah_chs)
+DO_RECPS(recpsf_ah_f64, float64, float64, ah_chs)
 
-    a = float32_chs(a);
-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
-        (float32_is_infinity(b) && float32_is_zero(a))) {
-        return float32_two;
-    }
-    return float32_muladd(a, b, float32_two, 0, fpst);
-}
+#define DO_RSQRTSF(NAME, CTYPE, FLOATTYPE, CHSFN)                       \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
+        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
+            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
+            return FLOATTYPE ## _one_point_five;                        \
+        }                                                               \
+        return FLOATTYPE ## _muladd_scalbn(a, b, FLOATTYPE ## _three,   \
+                                           -1, 0, fpst);                \
+    }                                                                   \
 
-float64 HELPER(recpsf_f64)(float64 a, float64 b, float_status *fpst)
-{
-    a = float64_squash_input_denormal(a, fpst);
-    b = float64_squash_input_denormal(b, fpst);
-
-    a = float64_chs(a);
-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
-        (float64_is_infinity(b) && float64_is_zero(a))) {
-        return float64_two;
-    }
-    return float64_muladd(a, b, float64_two, 0, fpst);
-}
-
-uint32_t HELPER(rsqrtsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
-{
-    a = float16_squash_input_denormal(a, fpst);
-    b = float16_squash_input_denormal(b, fpst);
-
-    a = float16_chs(a);
-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
-        (float16_is_infinity(b) && float16_is_zero(a))) {
-        return float16_one_point_five;
-    }
-    return float16_muladd_scalbn(a, b, float16_three, -1, 0, fpst);
-}
-
-float32 HELPER(rsqrtsf_f32)(float32 a, float32 b, float_status *fpst)
-{
-    a = float32_squash_input_denormal(a, fpst);
-    b = float32_squash_input_denormal(b, fpst);
-
-    a = float32_chs(a);
-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
-        (float32_is_infinity(b) && float32_is_zero(a))) {
-        return float32_one_point_five;
-    }
-    return float32_muladd_scalbn(a, b, float32_three, -1, 0, fpst);
-}
-
-float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, float_status *fpst)
-{
-    a = float64_squash_input_denormal(a, fpst);
-    b = float64_squash_input_denormal(b, fpst);
-
-    a = float64_chs(a);
-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
-        (float64_is_infinity(b) && float64_is_zero(a))) {
-        return float64_one_point_five;
-    }
-    return float64_muladd_scalbn(a, b, float64_three, -1, 0, fpst);
-}
+DO_RSQRTSF(rsqrtsf_f16, uint32_t, float16, chs)
+DO_RSQRTSF(rsqrtsf_f32, float32, float32, chs)
+DO_RSQRTSF(rsqrtsf_f64, float64, float64, chs)
+DO_RSQRTSF(rsqrtsf_ah_f16, uint32_t, float16, ah_chs)
+DO_RSQRTSF(rsqrtsf_ah_f32, float32, float32, ah_chs)
+DO_RSQRTSF(rsqrtsf_ah_f64, float64, float64, ah_chs)
 
 /* Floating-point reciprocal exponent - see FPRecpX in ARM ARM */
 uint32_t HELPER(frecpx_f16)(uint32_t a, float_status *fpst)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                        FPST_A64_F16 : FPST_A64);
 }
 
-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
-                             int mergereg)
+static bool do_fp3_scalar_ah_2fn(DisasContext *s, arg_rrr_e *a,
+                                 const FPScalar *fnormal, const FPScalar *fah,
+                                 int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
-                                       select_ah_fpst(s, a->esz));
+    return do_fp3_scalar_with_fpsttype(s, a, s->fpcr_ah ? fah : fnormal,
+                                       mergereg, select_ah_fpst(s, a->esz));
 }
 
 /* Some insns need to call different helpers when FPCR.AH == 1 */
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
+static const FPScalar f_scalar_ah_frecps = {
+    gen_helper_recpsf_ah_f16,
+    gen_helper_recpsf_ah_f32,
+    gen_helper_recpsf_ah_f64,
+};
+TRANS(FRECPS_s, do_fp3_scalar_ah_2fn, a,
+      &f_scalar_frecps, &f_scalar_ah_frecps, a->rn)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
+static const FPScalar f_scalar_ah_frsqrts = {
+    gen_helper_rsqrtsf_ah_f16,
+    gen_helper_rsqrtsf_ah_f32,
+    gen_helper_rsqrtsf_ah_f64,
+};
+TRANS(FRSQRTS_s, do_fp3_scalar_ah_2fn, a,
+      &f_scalar_frsqrts, &f_scalar_ah_frsqrts, a->rn)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
-- 
2.34.1

Handle the FPCR.AH "don't negate the sign of a NaN" semantics
in the vector versions of FRECPS and FRSQRTS, by implementing
new vector wrappers that call the _ah_ scalar helpers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 21 ++++++++++++++++-----
 target/arm/tcg/translate-sve.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    |  8 ++++++++
 4 files changed, 44 insertions(+), 6 deletions(-)

Handle the FPCR.AH "don't negate the sign of a NaN" semantics in FMLS
(indexed). We do this by creating 6 new helpers, which allow us to
do the negation either by XOR (for AH=0) or by muladd flags
(for AH=1).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: Mostly from RTH's patch; error in index order into fns[][]
 fixed]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 17 +++++++++++------
 target/arm/tcg/translate-sve.c | 31 +++++++++++++++++--------------
 target/arm/tcg/vec_helper.c    | 24 +++++++++++++++---------
 4 files changed, 57 insertions(+), 29 deletions(-)
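
Illustration only, not part of the patch: a minimal sketch of why the
two negation styles differ for NaN inputs.  Everything named toy_* and
TOY_* is hypothetical; toy_muladd() is a simplified stand-in for a
softfloat muladd, assuming only that a NaN input is handled before the
negate-product flag is applied.  Real Arm NaN processing (quieting,
default NaN, operand priority) is ignored, and the multiply-add is not
fused.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical flag, standing in for float_muladd_negate_product. */
enum { TOY_NEGATE_PRODUCT = 1 };

static float toy_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

static uint32_t toy_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/*
 * Toy multiply-add on raw float32 bits.  A NaN input is returned
 * unchanged before the flag is looked at, so the flag can never
 * alter the sign of a NaN.
 */
static uint32_t toy_muladd(uint32_t a, uint32_t b, uint32_t c, int flags)
{
    float fa = toy_from_bits(a);
    float fb = toy_from_bits(b);
    float fc = toy_from_bits(c);

    if (isnan(fa)) {
        return a;
    }
    if (isnan(fb)) {
        return b;
    }
    if (isnan(fc)) {
        return c;
    }
    if (flags & TOY_NEGATE_PRODUCT) {
        fa = -fa;
    }
    return toy_to_bits(fa * fb + fc);
}

/* AH=0 style: flip the sign bit of the operand before the muladd. */
static uint32_t toy_fmls_xor(uint32_t n, uint32_t m, uint32_t a)
{
    return toy_muladd(n ^ 0x80000000u, m, a, 0);
}

/* AH=1 style: leave the operand alone and ask for a negated product. */
static uint32_t toy_fmls_flag(uint32_t n, uint32_t m, uint32_t a)
{
    return toy_muladd(n, m, a, TOY_NEGATE_PRODUCT);
}
```

For ordinary inputs both functions compute a - n * m.  For a NaN in n,
the XOR variant returns the NaN with its sign bit flipped, while the
flag variant returns it untouched.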

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(gvec_fmls_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_fmls_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_fmls_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_uqadd_h, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ TRANS(FMULX_vi, do_fp3_vector_idx, a, f_vector_idx_fmulx)
 
 static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
 {
-    static gen_helper_gvec_4_ptr * const fns[3] = {
-        gen_helper_gvec_fmla_idx_h,
-        gen_helper_gvec_fmla_idx_s,
-        gen_helper_gvec_fmla_idx_d,
+    static gen_helper_gvec_4_ptr * const fns[3][3] = {
+        { gen_helper_gvec_fmla_idx_h,
+          gen_helper_gvec_fmla_idx_s,
+          gen_helper_gvec_fmla_idx_d },
+        { gen_helper_gvec_fmls_idx_h,
+          gen_helper_gvec_fmls_idx_s,
+          gen_helper_gvec_fmls_idx_d },
+        { gen_helper_gvec_ah_fmls_idx_h,
+          gen_helper_gvec_ah_fmls_idx_s,
+          gen_helper_gvec_ah_fmls_idx_d },
     };
     MemOp esz = a->esz;
     int check = fp_access_check_vector_hsd(s, a->q, esz);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                       esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-                      (a->idx << 1) | neg,
-                      fns[esz - 1]);
+                      a->idx, fns[neg ? 1 + s->fpcr_ah : 0][esz - 1]);
     return true;
 }
 
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ DO_SVE2_RRXR_ROT(CDOT_zzxw_d, gen_helper_sve2_cdot_idx_d)
  *** SVE Floating Point Multiply-Add Indexed Group
  */
 
-static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub)
-{
-    static gen_helper_gvec_4_ptr * const fns[4] = {
-        NULL,
-        gen_helper_gvec_fmla_idx_h,
-        gen_helper_gvec_fmla_idx_s,
-        gen_helper_gvec_fmla_idx_d,
-    };
-    return gen_gvec_fpst_zzzz(s, fns[a->esz], a->rd, a->rn, a->rm, a->ra,
-                              (a->index << 1) | sub,
-                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
-}
+static gen_helper_gvec_4_ptr * const fmla_idx_fns[4] = {
+    NULL,                       gen_helper_gvec_fmla_idx_h,
+    gen_helper_gvec_fmla_idx_s, gen_helper_gvec_fmla_idx_d
+};
+TRANS_FEAT(FMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
+           fmla_idx_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->index,
+           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
-TRANS_FEAT(FMLA_zzxz, aa64_sve, do_FMLA_zzxz, a, false)
-TRANS_FEAT(FMLS_zzxz, aa64_sve, do_FMLA_zzxz, a, true)
+static gen_helper_gvec_4_ptr * const fmls_idx_fns[4][2] = {
+    { NULL, NULL },
+    { gen_helper_gvec_fmls_idx_h, gen_helper_gvec_ah_fmls_idx_h },
+    { gen_helper_gvec_fmls_idx_s, gen_helper_gvec_ah_fmls_idx_s },
+    { gen_helper_gvec_fmls_idx_d, gen_helper_gvec_ah_fmls_idx_d },
+};
+TRANS_FEAT(FMLS_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
+           fmls_idx_fns[a->esz][s->fpcr_ah],
+           a->rd, a->rn, a->rm, a->ra, a->index,
+           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 /*
  *** SVE Floating Point Multiply Indexed Group
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_FMUL_IDX(gvec_fmls_nf_idx_s, float32_sub, float32_mul, float32, H4)
 
 #undef DO_FMUL_IDX
 
-#define DO_FMLA_IDX(NAME, TYPE, H)                                         \
+#define DO_FMLA_IDX(NAME, TYPE, H, NEGX, NEGF)                             \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
                   float_status *stat, uint32_t desc)                       \
 {                                                                          \
     intptr_t i, j, oprsz = simd_oprsz(desc);                               \
     intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
-    TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
-    intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
+    intptr_t idx = simd_data(desc);                                        \
     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
-    op1_neg <<= (8 * sizeof(TYPE) - 1);                                    \
     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
         TYPE mm = m[H(i + idx)];                                           \
         for (j = 0; j < segment; j++) {                                    \
-            d[i + j] = TYPE##_muladd(n[i + j] ^ op1_neg,                   \
-                                     mm, a[i + j], 0, stat);               \
+            d[i + j] = TYPE##_muladd(n[i + j] ^ NEGX, mm,                  \
+                                     a[i + j], NEGF, stat);                \
         }                                                                  \
     }                                                                      \
     clear_tail(d, oprsz, simd_maxsz(desc));                                \
 }
 
-DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2)
-DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
-DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8)
+DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2, 0, 0)
+DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4, 0, 0)
+DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8, 0, 0)
+
+DO_FMLA_IDX(gvec_fmls_idx_h, float16, H2, INT16_MIN, 0)
+DO_FMLA_IDX(gvec_fmls_idx_s, float32, H4, INT32_MIN, 0)
+DO_FMLA_IDX(gvec_fmls_idx_d, float64, H8, INT64_MIN, 0)
+
+DO_FMLA_IDX(gvec_ah_fmls_idx_h, float16, H2, 0, float_muladd_negate_product)
+DO_FMLA_IDX(gvec_ah_fmls_idx_s, float32, H4, 0, float_muladd_negate_product)
+DO_FMLA_IDX(gvec_ah_fmls_idx_d, float64, H8, 0, float_muladd_negate_product)
 
 #undef DO_FMLA_IDX
 
-- 
2.34.1

Handle the FPCR.AH "don't negate the sign of a NaN" semantics
in FMLS (vector), by implementing a new set of helpers for
the AH=1 case.

The float_muladd_negate_product flag produces the same result
as negating either of the multiplication operands, assuming
neither operand is a NaN.  But since FEAT_AFP does not
negate NaNs, this behaviour is exactly what we need.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    | 22 ++++++++++++++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)
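
Illustration only, not part of the patch: a small sketch of the
equivalence claimed above for non-NaN inputs, using native float
arithmetic rather than softfloat.  toy_bits() and
toy_negation_sites_agree() are hypothetical names.  The check compares
raw bit patterns, so it also covers the sign of zero results; it is
meant for inputs whose product and sum are exactly representable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

static uint32_t toy_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/*
 * Return 1 if negating the first operand, negating the second operand,
 * and negating the product all give bit-identical results for
 * a * b + c.  Only meaningful when no input is a NaN.
 */
static int toy_negation_sites_agree(float a, float b, float c)
{
    uint32_t neg_a = toy_bits((-a) * b + c);
    uint32_t neg_b = toy_bits(a * (-b) + c);
    uint32_t neg_p = toy_bits(-(a * b) + c);

    return neg_a == neg_b && neg_b == neg_p;
}
```

Once a NaN is involved the three sites stop being interchangeable,
because negating an operand changes the NaN's sign bit while negating
the product (after NaN detection) does not.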

Handle the FPCR.AH "don't negate the sign of a NaN" semantics for the
SVE FMLS (vector) insns, by providing new helpers for the AH=1 case
which end up passing the appropriate muladd negation flags to the
do_fmla_zpzzz_* functions that do the work.

The float*_muladd functions have a flags argument that can
perform optional negation of various operands.  We don't use
that for "normal" arm fmla, because the muladd flags are not
applied when an input is a NaN.  But since FEAT_AFP does not
negate NaNs, this behaviour is exactly what we need.

The non-AH helpers pass in a zero flags argument and control the
negation via the neg1 and neg3 arguments; the AH helpers always pass
in neg1 and neg3 as zero and control the negation via the flags
argument.  This allows us to avoid conditional branches within the
inner loop.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 21 ++++++++
 target/arm/tcg/sve_helper.c    | 99 +++++++++++++++++++++++++++-------
 target/arm/tcg/translate-sve.c | 18 ++++---
 3 files changed, 114 insertions(+), 24 deletions(-)
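
Illustration only, not part of the patch: a sketch of the
(neg1, neg3, flags) scheme described above, with one shared loop and
one parameter triple per instruction variant.  All toy_* and TOY_*
names are hypothetical, and toy_muladd() is a simplified, unfused
stand-in that assumes NaN inputs are handled before any flag is
applied.  Predication is left out.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical stand-ins for the softfloat muladd flags. */
enum { TOY_NEG_PRODUCT = 1, TOY_NEG_C = 2 };
#define TOY_SIGN32 0x80000000u

typedef struct {
    uint32_t neg1;  /* XOR mask applied to the first multiplicand */
    uint32_t neg3;  /* XOR mask applied to the addend */
    int flags;      /* negation requested from the muladd itself */
} ToyFmlaVariant;

/* AH=0 variants negate by XOR; AH=1 variants negate by flags. */
static const ToyFmlaVariant toy_fmla     = { 0, 0, 0 };
static const ToyFmlaVariant toy_fmls     = { TOY_SIGN32, 0, 0 };
static const ToyFmlaVariant toy_fnmla    = { TOY_SIGN32, TOY_SIGN32, 0 };
static const ToyFmlaVariant toy_fnmls    = { 0, TOY_SIGN32, 0 };
static const ToyFmlaVariant toy_ah_fmls  = { 0, 0, TOY_NEG_PRODUCT };
static const ToyFmlaVariant toy_ah_fnmla = { 0, 0, TOY_NEG_PRODUCT | TOY_NEG_C };
static const ToyFmlaVariant toy_ah_fnmls = { 0, 0, TOY_NEG_C };

static float toy_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

static uint32_t toy_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* NaN inputs are returned unchanged before the flags are applied. */
static uint32_t toy_muladd(uint32_t a, uint32_t b, uint32_t c, int flags)
{
    float fa = toy_from_bits(a);
    float fb = toy_from_bits(b);
    float fc = toy_from_bits(c);
    float prod;

    if (isnan(fa)) {
        return a;
    }
    if (isnan(fb)) {
        return b;
    }
    if (isnan(fc)) {
        return c;
    }
    prod = fa * fb;
    if (flags & TOY_NEG_PRODUCT) {
        prod = -prod;
    }
    if (flags & TOY_NEG_C) {
        fc = -fc;
    }
    return toy_to_bits(prod + fc);
}

/* The loop body is identical for every variant: no test on AH inside. */
static void toy_fmla_loop(uint32_t *d, const uint32_t *n, const uint32_t *m,
                          const uint32_t *a, size_t count, ToyFmlaVariant v)
{
    for (size_t i = 0; i < count; i++) {
        d[i] = toy_muladd(n[i] ^ v.neg1, m[i], a[i] ^ v.neg3, v.flags);
    }
}
```

The choice between the XOR triple and the flags triple is made once,
when the variant is selected, so the per-element work stays the same
for AH=0 and AH=1.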

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
 
 static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint16_t neg1, uint16_t neg3)
+                            uint16_t neg1, uint16_t neg3, int flags)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                 e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
                 e2 = *(uint16_t *)(vm + H1_2(i));
                 e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
-                r = float16_muladd(e1, e2, e3, 0, status);
+                r = float16_muladd(e1, e2, e3, flags, status);
                 *(uint16_t *)(vd + H1_2(i)) = r;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
 }
 
 void HELPER(sve_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0, 0);
 }
 
 void HELPER(sve_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000, 0);
 }
 
 void HELPER(sve_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_c);
 }
 
 static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint32_t neg1, uint32_t neg3)
+                            uint32_t neg1, uint32_t neg3, int flags)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                 e1 = *(uint32_t *)(vn + H1_4(i)) ^ neg1;
                 e2 = *(uint32_t *)(vm + H1_4(i));
                 e3 = *(uint32_t *)(va + H1_4(i)) ^ neg3;
-                r = float32_muladd(e1, e2, e3, 0, status);
+                r = float32_muladd(e1, e2, e3, flags, status);
                 *(uint32_t *)(vd + H1_4(i)) = r;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
 }
 
 void HELPER(sve_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0, 0);
 }
 
 void HELPER(sve_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000, 0);
 }
 
 void HELPER(sve_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_c);
 }
 
 static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint64_t neg1, uint64_t neg3)
+                            uint64_t neg1, uint64_t neg3, int flags)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                 e1 = *(uint64_t *)(vn + i) ^ neg1;
                 e2 = *(uint64_t *)(vm + i);
                 e3 = *(uint64_t *)(va + i) ^ neg3;
-                r = float64_muladd(e1, e2, e3, 0, status);
+                r = float64_muladd(e1, e2, e3, flags, status);
                 *(uint64_t *)(vd + i) = r;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
 }
 
 void HELPER(sve_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0, 0);
 }
 
 void HELPER(sve_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN, 0);
 }
 
 void HELPER(sve_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_c);
 }
 
 /* Two operand floating-point comparison controlled by a predicate.
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
            a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
-#define DO_FMLA(NAME, name) \
+#define DO_FMLA(NAME, name, ah_name)                                    \
     static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
         NULL, gen_helper_sve_##name##_h,                                \
         gen_helper_sve_##name##_s, gen_helper_sve_##name##_d            \
     };                                                                  \
-    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp, name##_fns[a->esz], \
+    static gen_helper_gvec_5_ptr * const name##_ah_fns[4] = {           \
+        NULL, gen_helper_sve_##ah_name##_h,                             \
+        gen_helper_sve_##ah_name##_s, gen_helper_sve_##ah_name##_d      \
+    };                                                                  \
+    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp,                     \
+               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], \
                a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
                a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
-DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
-DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
-DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz)
-DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz)
+/* We don't need an ah_fmla_zpzzz because fmla doesn't negate anything */
+DO_FMLA(FMLA_zpzzz, fmla_zpzzz, fmla_zpzzz)
+DO_FMLA(FMLS_zpzzz, fmls_zpzzz, ah_fmls_zpzzz)
+DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz, ah_fnmla_zpzzz)
+DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz, ah_fnmls_zpzzz)
 
 #undef DO_FMLA
 
-- 
2.34.1

The negation step in the SVE FTSSEL insn mustn't negate a NaN when
FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
and use that to determine whether to do the negation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 18 +++++++++++++++---
 target/arm/tcg/translate-sve.c |  4 ++--
 2 files changed, 17 insertions(+), 5 deletions(-)
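
Illustration only, not part of the patch: a sketch of an FTSSEL-style
element operation with the AH-aware negation.  The names are
hypothetical and the code is not taken from the helper in
sve_helper.c.  It assumes, as my reading of the instruction, that
bit 0 of the second operand selects the constant 1.0 and bit 1
requests negation; only the float32 case is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SIGN32 0x80000000u
#define TOY_ONE32  0x3f800000u   /* 1.0f as raw bits */

/* A float32 is a NaN if the exponent is all ones and the fraction is nonzero. */
static bool toy_is_nan32(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* One element: optionally replace by 1.0, then optionally negate. */
static uint32_t toy_ftssel32(uint32_t nn, uint32_t mm, bool fpcr_ah)
{
    if (mm & 1) {
        nn = TOY_ONE32;
    }
    if (mm & 2) {
        /* With AH set, a NaN keeps its sign; everything else is negated. */
        if (!(fpcr_ah && toy_is_nan32(nn))) {
            nn ^= TOY_SIGN32;
        }
    }
    return nn;
}

/* Apply the element operation across a vector of raw float32 values. */
static void toy_ftssel32_vec(uint32_t *d, const uint32_t *n,
                             const uint32_t *m, size_t count, bool fpcr_ah)
{
    assert(count == 0 || (d && n && m));
    for (size_t i = 0; i < count; i++) {
        d[i] = toy_ftssel32(n[i], m[i], fpcr_ah);
    }
}
```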

The negation step in the SVE FTMAD insn mustn't negate a NaN when
FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field,
so we can select the correct behaviour.

Because the operand is known to be negative, negating the operand
is the same as taking the absolute value.  Defer this to the muladd
operation via flags, so that it happens after NaN detection, which
is correct for FPCR.AH.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++++--------
 target/arm/tcg/translate-sve.c |  3 ++-
 2 files changed, 35 insertions(+), 10 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

The negation step in FCMLA mustn't negate a NaN when FPCR.AH
is set. Handle this by passing FPCR.AH to the helper via the
SIMD data field, and use this to select whether to do the
negation via XOR or via the muladd negate_product flag.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-26-richard.henderson@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c |  2 +-
 target/arm/tcg/vec_helper.c    | 66 ++++++++++++++++++++--------------
 2 files changed, 40 insertions(+), 28 deletions(-)
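
Illustration only, not part of the patch: a sketch of how the rotation
and FPCR.AH bits in the data field turn into either XOR masks or
muladd flags, following the float32 helper below.  The toy_* names are
hypothetical, and the sketch works on the data field alone (the
SIMD_DATA_SHIFT offset within the descriptor is left out).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag, standing in for float_muladd_negate_product. */
enum { TOY_NEGATE_PRODUCT = 1 };

typedef struct {
    uint32_t flip;       /* bit 0 of the rotation */
    uint32_t negx_real;  /* XOR mask, nonzero only when AH=0 */
    uint32_t negx_imag;
    int negf_real;       /* muladd flag, nonzero only when AH=1 */
    int negf_imag;
} ToyFcmlaCtl;

/* Pack the 2-bit rotation and the AH bit as the translator does. */
static uint32_t toy_fcmla_data(unsigned rot, unsigned fpcr_ah)
{
    assert(rot < 4 && fpcr_ah < 2);
    return rot | (fpcr_ah << 2);
}

/* Decode the data field into per-lane negation controls (float32). */
static ToyFcmlaCtl toy_fcmla_decode(uint32_t data)
{
    uint32_t flip = data & 1;
    uint32_t neg_imag = (data >> 1) & 1;
    uint32_t fpcr_ah = (data >> 2) & 1;
    uint32_t neg_real = flip ^ neg_imag;
    ToyFcmlaCtl c;

    c.flip = flip;
    /* AH=0: negate by XOR on the sign bit. */
    c.negx_real = (neg_real & ~fpcr_ah) << 31;
    c.negx_imag = (neg_imag & ~fpcr_ah) << 31;
    /* AH=1: negate via the muladd flag instead. */
    c.negf_real = (neg_real & fpcr_ah) ? TOY_NEGATE_PRODUCT : 0;
    c.negf_imag = (neg_imag & fpcr_ah) ? TOY_NEGATE_PRODUCT : 0;
    return c;
}
```

For any given lane, at most one of the XOR mask and the muladd flag is
nonzero, so a requested negation is applied exactly once.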

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                       a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-                      a->rot, fn[a->esz]);
+                      a->rot | (s->fpcr_ah << 2), fn[a->esz]);
     return true;
 }
 
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float16 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float16 negx_imag, negx_real;
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 15;
-    neg_imag <<= 15;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 15;
+    negx_imag = (negf_imag & ~fpcr_ah) << 15;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < opr_sz / 2; i += 2) {
         float16 e2 = n[H2(i + flip)];
-        float16 e1 = m[H2(i + flip)] ^ neg_real;
+        float16 e1 = m[H2(i + flip)] ^ negx_real;
         float16 e4 = e2;
-        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
+        float16 e3 = m[H2(i + 1 - flip)] ^ negx_imag;
 
-        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], 0, fpst);
-        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], 0, fpst);
+        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], negf_real, fpst);
+        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], negf_imag, fpst);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float32 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float32 negx_imag, negx_real;
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 31;
-    neg_imag <<= 31;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 31;
+    negx_imag = (negf_imag & ~fpcr_ah) << 31;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < opr_sz / 4; i += 2) {
         float32 e2 = n[H4(i + flip)];
-        float32 e1 = m[H4(i + flip)] ^ neg_real;
+        float32 e1 = m[H4(i + flip)] ^ negx_real;
         float32 e4 = e2;
-        float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
+        float32 e3 = m[H4(i + 1 - flip)] ^ negx_imag;
 
-        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], 0, fpst);
-        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], 0, fpst);
+        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], negf_real, fpst);
+        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], negf_imag, fpst);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float64 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-    uint64_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float64 negx_real, negx_imag;
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 63;
-    neg_imag <<= 63;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
+    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < opr_sz / 8; i += 2) {
         float64 e2 = n[i + flip];
-        float64 e1 = m[i + flip] ^ neg_real;
+        float64 e1 = m[i + flip] ^ negx_real;
         float64 e4 = e2;
-        float64 e3 = m[i + 1 - flip] ^ neg_imag;
+        float64 e3 = m[i + 1 - flip] ^ negx_imag;
 
-        d[i] = float64_muladd(e2, e1, a[i], 0, fpst);
-        d[i + 1] = float64_muladd(e4, e3, a[i + 1], 0, fpst);
+        d[i] = float64_muladd(e2, e1, a[i], negf_real, fpst);
+        d[i + 1] = float64_muladd(e4, e3, a[i + 1], negf_imag, fpst);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

The negation step in FCMLA by index mustn't negate a NaN when
FPCR.AH is set. Use the same approach as vector FCMLA of
passing in FPCR.AH and using it to select whether to negate
by XOR or by the muladd negate_product flag.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-27-richard.henderson@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c |  2 +-
 target/arm/tcg/vec_helper.c    | 44 ++++++++++++++++++++--------------
 2 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
     if (fp_access_check(s)) {
         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-                          (a->idx << 2) | a->rot, fn);
+                          (s->fpcr_ah << 4) | (a->idx << 2) | a->rot, fn);
     }
     return true;
 }
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float16 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
+    uint32_t negf_real = flip ^ negf_imag;
     intptr_t elements = opr_sz / sizeof(float16);
     intptr_t eltspersegment = MIN(16 / sizeof(float16), elements);
+    float16 negx_imag, negx_real;
     intptr_t i, j;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 15;
-    neg_imag <<= 15;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 15;
+    negx_imag = (negf_imag & ~fpcr_ah) << 15;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < elements; i += eltspersegment) {
         float16 mr = m[H2(i + 2 * index + 0)];
         float16 mi = m[H2(i + 2 * index + 1)];
-        float16 e1 = neg_real ^ (flip ? mi : mr);
-        float16 e3 = neg_imag ^ (flip ? mr : mi);
+        float16 e1 = negx_real ^ (flip ? mi : mr);
+        float16 e3 = negx_imag ^ (flip ? mr : mi);
 
         for (j = i; j < i + eltspersegment; j += 2) {
             float16 e2 = n[H2(j + flip)];
             float16 e4 = e2;
 
-            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], 0, fpst);
-            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], 0, fpst);
+            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], negf_real, fpst);
+            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], negf_imag, fpst);
         }
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float32 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
+    uint32_t negf_real = flip ^ negf_imag;
     intptr_t elements = opr_sz / sizeof(float32);
     intptr_t eltspersegment = MIN(16 / sizeof(float32), elements);
+    float32 negx_imag, negx_real;
     intptr_t i, j;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 31;
-    neg_imag <<= 31;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 31;
+    negx_imag = (negf_imag & ~fpcr_ah) << 31;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < elements; i += eltspersegment) {
         float32 mr = m[H4(i + 2 * index + 0)];
         float32 mi = m[H4(i + 2 * index + 1)];
-        float32 e1 = neg_real ^ (flip ? mi : mr);
-        float32 e3 = neg_imag ^ (flip ? mr : mi);
+        float32 e1 = negx_real ^ (flip ? mi : mr);
+        float32 e3 = negx_imag ^ (flip ? mr : mi);
 
         for (j = i; j < i + eltspersegment; j += 2) {
             float32 e2 = n[H4(j + flip)];
             float32 e4 = e2;
 
-            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], 0, fpst);
-            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], 0, fpst);
+            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], negf_real, fpst);
+            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], negf_imag, fpst);
         }
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

The negation step in SVE FCMLA mustn't negate a NaN when FPCR.AH is
set.  Use the same approach as we did for A64 FCMLA of passing in
FPCR.AH and using it to select whether to negate by XOR or by the
muladd negate_product flag.
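
As a rough sketch of how the control bits travel in the SIMD data
word for this patch: the translator ORs FPCR.AH in above the two
rotation bits, and the helper pulls the fields back out. The code
below works on the data field alone (the real helpers extract from
desc at SIMD_DATA_SHIFT), and all demo_* names are assumptions made
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool flip;            /* bit 0 of rot */
    uint32_t negf_imag;   /* bit 1 of rot */
    uint32_t negf_real;   /* flip ^ negf_imag */
    uint32_t fpcr_ah;     /* bit 2, added by this patch */
} DemoFcmlaCtl;

/* Local equivalent of a bitfield extract; not QEMU's extract32(). */
static uint32_t demo_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0u >> (32 - length));
}

/* Translator side: rot in bits [1:0], FPCR.AH in bit 2. */
static uint32_t demo_pack(unsigned rot, bool fpcr_ah)
{
    assert(rot < 4);
    return rot | ((uint32_t)fpcr_ah << 2);
}

/* Helper side: recover the individual control fields. */
static DemoFcmlaCtl demo_unpack(uint32_t data)
{
    DemoFcmlaCtl c;

    c.flip = demo_extract32(data, 0, 1);
    c.negf_imag = demo_extract32(data, 1, 1);
    c.fpcr_ah = demo_extract32(data, 2, 1);
    c.negf_real = c.flip ^ c.negf_imag;
    return c;
}
```

For the four rot values, flip ^ negf_imag gives the same result as
the old "rot == 1 || rot == 2" test, and bit 1 matches the old
"(rot & 2) != 0" test.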

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-28-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 69 +++++++++++++++++++++-------------
 target/arm/tcg/translate-sve.c |  2 +-
 2 files changed, 43 insertions(+), 28 deletions(-)

diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
     intptr_t j, i = simd_oprsz(desc);
-    unsigned rot = simd_data(desc);
-    bool flip = rot & 1;
-    float16 neg_imag, neg_real;
+    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float16 negx_imag, negx_real;
     uint64_t *g = vg;
 
-    neg_imag = float16_set_sign(0, (rot & 2) != 0);
-    neg_real = float16_set_sign(0, rot == 1 || rot == 2);
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 15;
+    negx_imag = (negf_imag & ~fpcr_ah) << 15;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
             mi = *(float16 *)(vm + H1_2(j));
 
             e2 = (flip ? ni : nr);
-            e1 = (flip ? mi : mr) ^ neg_real;
+            e1 = (flip ? mi : mr) ^ negx_real;
             e4 = e2;
-            e3 = (flip ? mr : mi) ^ neg_imag;
+            e3 = (flip ? mr : mi) ^ negx_imag;
 
             if (likely((pg >> (i & 63)) & 1)) {
                 d = *(float16 *)(va + H1_2(i));
-                d = float16_muladd(e2, e1, d, 0, status);
+                d = float16_muladd(e2, e1, d, negf_real, status);
                 *(float16 *)(vd + H1_2(i)) = d;
             }
             if (likely((pg >> (j & 63)) & 1)) {
                 d = *(float16 *)(va + H1_2(j));
-                d = float16_muladd(e4, e3, d, 0, status);
+                d = float16_muladd(e4, e3, d, negf_imag, status);
                 *(float16 *)(vd + H1_2(j)) = d;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
     intptr_t j, i = simd_oprsz(desc);
-    unsigned rot = simd_data(desc);
-    bool flip = rot & 1;
-    float32 neg_imag, neg_real;
+    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float32 negx_imag, negx_real;
     uint64_t *g = vg;
 
-    neg_imag = float32_set_sign(0, (rot & 2) != 0);
-    neg_real = float32_set_sign(0, rot == 1 || rot == 2);
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 31;
+    negx_imag = (negf_imag & ~fpcr_ah) << 31;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
             mi = *(float32 *)(vm + H1_2(j));
 
             e2 = (flip ? ni : nr);
-            e1 = (flip ? mi : mr) ^ neg_real;
+            e1 = (flip ? mi : mr) ^ negx_real;
             e4 = e2;
-            e3 = (flip ? mr : mi) ^ neg_imag;
+            e3 = (flip ? mr : mi) ^ negx_imag;
 
             if (likely((pg >> (i & 63)) & 1)) {
                 d = *(float32 *)(va + H1_2(i));
-                d = float32_muladd(e2, e1, d, 0, status);
+                d = float32_muladd(e2, e1, d, negf_real, status);
                 *(float32 *)(vd + H1_2(i)) = d;
             }
             if (likely((pg >> (j & 63)) & 1)) {
                 d = *(float32 *)(va + H1_2(j));
-                d = float32_muladd(e4, e3, d, 0, status);
+                d = float32_muladd(e4, e3, d, negf_imag, status);
                 *(float32 *)(vd + H1_2(j)) = d;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
     intptr_t j, i = simd_oprsz(desc);
-    unsigned rot = simd_data(desc);
-    bool flip = rot & 1;
-    float64 neg_imag, neg_real;
+    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float64 negx_imag, negx_real;
     uint64_t *g = vg;
 
-    neg_imag = float64_set_sign(0, (rot & 2) != 0);
-    neg_real = float64_set_sign(0, rot == 1 || rot == 2);
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
+    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
             mi = *(float64 *)(vm + H1_2(j));
 
             e2 = (flip ? ni : nr);
-            e1 = (flip ? mi : mr) ^ neg_real;
+            e1 = (flip ? mi : mr) ^ negx_real;
             e4 = e2;
-            e3 = (flip ? mr : mi) ^ neg_imag;
+            e3 = (flip ? mr : mi) ^ negx_imag;
 
             if (likely((pg >> (i & 63)) & 1)) {
                 d = *(float64 *)(va + H1_2(i));
-                d = float64_muladd(e2, e1, d, 0, status);
+                d = float64_muladd(e2, e1, d, negf_real, status);
                 *(float64 *)(vd + H1_2(i)) = d;
             }
             if (likely((pg >> (j & 63)) & 1)) {
                 d = *(float64 *)(va + H1_2(j));
-                d = float64_muladd(e4, e3, d, 0, status);
+                d = float64_muladd(e4, e3, d, negf_imag, status);
                 *(float64 *)(vd + H1_2(j)) = d;
             }
         } while (i & 63);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_5_ptr * const fcmla_fns[4] = {
     gen_helper_sve_fcmla_zpzzz_s, gen_helper_sve_fcmla_zpzzz_d,
 };
 TRANS_FEAT(FCMLA_zpzzz, aa64_sve, gen_gvec_fpst_zzzzp, fcmla_fns[a->esz],
-           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot,
+           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot | (s->fpcr_ah << 2),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 static gen_helper_gvec_4_ptr * const fcmla_idx_fns[4] = {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle FPCR.AH's requirement to not negate the sign of a NaN
in FMLSL by element and vector, using the usual trick of
negating by XOR when AH=0 and by muladd flags when AH=1.

Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need to pass in the
FPCR.AH value via the SIMD data word.
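
A small sketch of the selection logic may help here. With AH=0 the
four packed float16 inputs are negated in one XOR with a mask that
has the sign bit set in each 16-bit lane; with AH=1 the mask is zero
and the negation is deferred to the muladd flag. The demo_* names and
DEMO_NEGATE_PRODUCT (a stand-in for float_muladd_negate_product) are
assumptions made for this example, not QEMU identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_muladd_negate_product. */
enum { DEMO_NEGATE_PRODUCT = 1 };

typedef struct {
    uint64_t negx;   /* XOR mask applied to four packed float16 values */
    int negf;        /* flags passed on to the muladd */
} DemoNeg;

/* Choose the negation method for FMLAL (is_s == false) or FMLSL. */
static DemoNeg demo_select_neg(bool is_s, bool ah)
{
    DemoNeg r = { 0, 0 };

    if (is_s) {
        if (ah) {
            r.negf = DEMO_NEGATE_PRODUCT;
        } else {
            r.negx = 0x8000800080008000ull;
        }
    }
    return r;
}

/* Pack four float16 bit patterns, lane 0 in the low 16 bits. */
static uint64_t demo_pack4(uint16_t h0, uint16_t h1, uint16_t h2, uint16_t h3)
{
    return (uint64_t)h0 | ((uint64_t)h1 << 16) |
           ((uint64_t)h2 << 32) | ((uint64_t)h3 << 48);
}

/* Extract lane i (0..3) from a packed value. */
static uint16_t demo_lane(uint64_t packed, int i)
{
    assert(i >= 0 && i < 4);
    return (uint16_t)(packed >> (i * 16));
}
```

Note that the XOR path flips the sign of every lane, including a lane
holding a NaN such as 0x7e00, which is the behaviour FPCR.AH=1 must
avoid.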

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-31-richard.henderson@linaro.org
[PMM: commit message tweaked]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 71 ++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 25 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
  */
 
 static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
-                     uint32_t desc, bool fz16)
+                     uint64_t negx, int negf, uint32_t desc, bool fz16)
 {
     intptr_t i, oprsz = simd_oprsz(desc);
-    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     int is_q = oprsz == 16;
     uint64_t n_4, m_4;
 
-    /* Pre-load all of the f16 data, avoiding overlap issues.  */
-    n_4 = load4_f16(vn, is_q, is_2);
+    /*
+     * Pre-load all of the f16 data, avoiding overlap issues.
+     * Negate all inputs for AH=0 FMLSL at once.
+     */
+    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
     m_4 = load4_f16(vm, is_q, is_2);
 
-    /* Negate all inputs for FMLSL at once.  */
-    if (is_s) {
-        n_4 ^= 0x8000800080008000ull;
-    }
-
     for (i = 0; i < oprsz / 4; i++) {
         float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
         float32 m_1 = float16_to_float32_by_bits(m_4 >> (i * 16), fz16);
-        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
+        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
     }
     clear_tail(d, oprsz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
 void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
                             CPUARMState *env, uint32_t desc)
 {
-    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
+
+    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
 void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
                             CPUARMState *env, uint32_t desc)
 {
-    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = 0;
+    int negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000800080008000ull;
+        }
+    }
+    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
 }
 
 static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
-                         uint32_t desc, bool fz16)
+                         uint64_t negx, int negf, uint32_t desc, bool fz16)
 {
     intptr_t i, oprsz = simd_oprsz(desc);
-    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
     int is_q = oprsz == 16;
     uint64_t n_4;
     float32 m_1;
 
-    /* Pre-load all of the f16 data, avoiding overlap issues.  */
-    n_4 = load4_f16(vn, is_q, is_2);
-
-    /* Negate all inputs for FMLSL at once.  */
-    if (is_s) {
-        n_4 ^= 0x8000800080008000ull;
-    }
-
+    /*
+     * Pre-load all of the f16 data, avoiding overlap issues.
+     * Negate all inputs for AH=0 FMLSL at once.
+     */
+    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
     m_1 = float16_to_float32_by_bits(((float16 *)vm)[H2(index)], fz16);
 
     for (i = 0; i < oprsz / 4; i++) {
         float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
-        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
+        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
     }
     clear_tail(d, oprsz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
 void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
                                 CPUARMState *env, uint32_t desc)
 {
-    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
+
+    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
 void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
                                 CPUARMState *env, uint32_t desc)
 {
-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = 0;
+    int negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000800080008000ull;
+        }
+    }
+    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 }
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
FMLSL (indexed), using the usual trick of negating by XOR when AH=0
and by muladd flags when AH=1.

Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need to pass in the
FPCR.AH value via the SIMD data word.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-32-richard.henderson@linaro.org
[PMM: commit message tweaked]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
                                CPUARMState *env, uint32_t desc)
 {
     intptr_t i, j, oprsz = simd_oprsz(desc);
-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
     float_status *status = &env->vfp.fp_status_a64;
     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
+    int negx = 0, negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000;
+        }
+    }
 
     for (i = 0; i < oprsz; i += 16) {
         float16 mm_16 = *(float16 *)(vm + i + idx);
         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
 
         for (j = 0; j < 16; j += sizeof(float32)) {
-            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negn;
+            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negx;
             float32 nn = float16_to_float32_by_bits(nn_16, fz16);
             float32 aa = *(float32 *)(va + H1_4(i + j));
 
             *(float32 *)(vd + H1_4(i + j)) =
-                float32_muladd(nn, mm, aa, 0, status);
+                float32_muladd(nn, mm, aa, negf, status);
         }
     }
 }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
FMLSL (vectors), using the usual trick of negating by XOR when AH=0
and by muladd flags when AH=1.

Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need to pass in the
FPCR.AH value via the SIMD data word.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-33-richard.henderson@linaro.org
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
                                CPUARMState *env, uint32_t desc)
 {
     intptr_t i, oprsz = simd_oprsz(desc);
-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     float_status *status = &env->vfp.fp_status_a64;
     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
+    int negx = 0, negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000;
+        }
+    }
 
     for (i = 0; i < oprsz; i += sizeof(float32)) {
-        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negn;
+        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negx;
         float16 mm_16 = *(float16 *)(vm + H1_2(i + sel));
         float32 nn = float16_to_float32_by_bits(nn_16, fz16);
         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
         float32 aa = *(float32 *)(va + H1_4(i));
 
-        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, 0, status);
+        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, negf, status);
     }
 }
 
-- 
2.34.1

Now that we have completed the handling for FPCR.{AH,FIZ,NEP}, we
can enable FEAT_AFP for '-cpu max', and document that we support it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/tcg/cpu64.c        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_AA64EL3 (Support for AArch64 at EL3)
 - FEAT_AdvSIMD (Advanced SIMD Extension)
 - FEAT_AES (AESD and AESE instructions)
+- FEAT_AFP (Alternate floating-point behavior)
 - FEAT_Armv9_Crypto (Armv9 Cryptographic Extension)
 - FEAT_ASID16 (16 bit ASID)
 - FEAT_BBM at level 2 (Translation table break-before-make levels)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1);      /* FEAT_XNX */
     t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2);      /* FEAT_ETS2 */
     t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);      /* FEAT_HCX */
+    t = FIELD_DP64(t, ID_AA64MMFR1, AFP, 1);      /* FEAT_AFP */
     t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1);   /* FEAT_TIDCP1 */
     t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1);     /* FEAT_CMOW */
     cpu->isar.id_aa64mmfr1 = t;
-- 
2.34.1

FEAT_RPRES implements an "increased precision" variant of the single
precision FRECPE and FRSQRTE instructions, which increases the result
from an 8 bit to a 12 bit mantissa. This applies only when
FPCR.AH == 1. Note that the halfprec and double versions of these
insns retain the 8 bit precision regardless.

In this commit we add all the plumbing to make these instructions
call a new helper function when increased precision is in effect.
In the following commit we will provide the actual change in
behaviour in the helpers.
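
The plumbing amounts to picking between two helper tables that differ
only in the single-precision entry. The following is a simplified
sketch under that assumption: the helpers are stubs that just return
an identifier, and all demo_* and DEMO_* names are invented for this
example rather than taken from QEMU.

```c
#include <stdbool.h>

enum {
    DEMO_RECPE_F16,
    DEMO_RECPE_F32,
    DEMO_RECPE_RPRES_F32,
    DEMO_RECPE_F64,
};

typedef int (*DemoHelper)(void);

/* Stub helpers: each reports which implementation was picked. */
static int demo_recpe_f16(void) { return DEMO_RECPE_F16; }
static int demo_recpe_f32(void) { return DEMO_RECPE_F32; }
static int demo_recpe_rpres_f32(void) { return DEMO_RECPE_RPRES_F32; }
static int demo_recpe_f64(void) { return DEMO_RECPE_F64; }

/* Tables indexed by element size: 0 = half, 1 = single, 2 = double. */
static const DemoHelper demo_frecpe[3] = {
    demo_recpe_f16,
    demo_recpe_f32,
    demo_recpe_f64,
};

/* Only the single-precision slot differs from the table above. */
static const DemoHelper demo_frecpe_rpres[3] = {
    demo_recpe_f16,
    demo_recpe_rpres_f32,
    demo_recpe_f64,
};

/* Use the RPRES table only when FPCR.AH is set and the CPU has FEAT_RPRES. */
static const DemoHelper *demo_select_frecpe(bool fpcr_ah, bool has_rpres)
{
    return fpcr_ah && has_rpres ? demo_frecpe_rpres : demo_frecpe;
}
```

Keeping the half and double entries identical in both tables means
the selection can ignore the element size entirely.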

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu-features.h      |  5 +++++
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c | 34 ++++++++++++++++++++++++++++++----
 target/arm/tcg/translate-sve.c | 16 ++++++++++++++--
 target/arm/tcg/vec_helper.c    |  2 ++
 target/arm/vfp_helper.c        | 32 ++++++++++++++++++++++++++++++--
 6 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_mops(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, MOPS);
 }
 
+static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, RPRES);
+}
+
 static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
 {
     /* We always set the AdvSIMD and FP fields identically.  */
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, fpst)
 
 DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
 DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+DEF_HELPER_FLAGS_2(recpe_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+DEF_HELPER_FLAGS_2(rsqrte_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_1(recpe_u32, TCG_CALL_NO_RWG, i32, i32)
 DEF_HELPER_FLAGS_1(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vrintx_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(gvec_frecpe_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(gvec_frsqrte_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
     gen_helper_recpe_f32,
     gen_helper_recpe_f64,
 };
-TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
+static const FPScalar1 f_scalar_frecpe_rpres = {
+    gen_helper_recpe_f16,
+    gen_helper_recpe_rpres_f32,
+    gen_helper_recpe_f64,
+};
+TRANS(FRECPE_s, do_fp1_scalar_ah, a,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+      &f_scalar_frecpe_rpres : &f_scalar_frecpe, -1)
 
 static const FPScalar1 f_scalar_frecpx = {
     gen_helper_frecpx_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frsqrte = {
     gen_helper_rsqrte_f32,
     gen_helper_rsqrte_f64,
 };
-TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
+static const FPScalar1 f_scalar_frsqrte_rpres = {
+    gen_helper_rsqrte_f16,
+    gen_helper_rsqrte_rpres_f32,
+    gen_helper_rsqrte_f64,
+};
+TRANS(FRSQRTE_s, do_fp1_scalar_ah, a,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+      &f_scalar_frsqrte_rpres : &f_scalar_frsqrte, -1)
 
 static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
 {
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
     gen_helper_gvec_frecpe_s,
     gen_helper_gvec_frecpe_d,
 };
-TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+static gen_helper_gvec_2_ptr * const f_frecpe_rpres[] = {
+    gen_helper_gvec_frecpe_h,
+    gen_helper_gvec_frecpe_rpres_s,
+    gen_helper_gvec_frecpe_d,
+};
+TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frecpe_rpres : f_frecpe)
 
 static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
     gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s,
     gen_helper_gvec_frsqrte_d,
 };
-TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+static gen_helper_gvec_2_ptr * const f_frsqrte_rpres[] = {
+    gen_helper_gvec_frsqrte_h,
+    gen_helper_gvec_frsqrte_rpres_s,
+    gen_helper_gvec_frsqrte_d,
+};
+TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frsqrte_rpres : f_frsqrte)
 
 static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
     NULL,                     gen_helper_gvec_frecpe_h,
     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
 };
-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
+static gen_helper_gvec_2_ptr * const frecpe_rpres_fns[] = {
+    NULL,                           gen_helper_gvec_frecpe_h,
+    gen_helper_gvec_frecpe_rpres_s, gen_helper_gvec_frecpe_d,
+};
+TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
+           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+           frecpe_rpres_fns[a->esz] : frecpe_fns[a->esz], a, 0)
 
 static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
     NULL,                      gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
 };
-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
+static gen_helper_gvec_2_ptr * const frsqrte_rpres_fns[] = {
+    NULL,                            gen_helper_gvec_frsqrte_h,
+    gen_helper_gvec_frsqrte_rpres_s, gen_helper_gvec_frsqrte_d,
+};
+TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
+           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+           frsqrte_rpres_fns[a->esz] : frsqrte_fns[a->esz], a, 0)
 
 /*
  *** SVE Floating Point Compare with Zero Group
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, float_status *stat, uint32_t desc)  \
 
 DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
 DO_2OP(gvec_frecpe_s, helper_recpe_f32, float32)
+DO_2OP(gvec_frecpe_rpres_s, helper_recpe_rpres_f32, float32)
 DO_2OP(gvec_frecpe_d, helper_recpe_f64, float64)
 
 DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
 DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
+DO_2OP(gvec_frsqrte_rpres_s, helper_rsqrte_rpres_f32, float32)
 DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
 
 DO_2OP(gvec_vrintx_h, float16_round_to_int, float16)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
     return make_float16(f16_val);
 }
 
-float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
+/*
+ * FEAT_RPRES means the f32 FRECPE has an "increased precision" variant
+ * which is used when FPCR.AH == 1.
+ */
+static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
 {
     float32 f32 = float32_squash_input_denormal(input, fpst);
     uint32_t f32_val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
     return make_float32(f32_val);
 }
 
+float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
+{
+    return do_recpe_f32(input, fpst, false);
+}
+
+float32 HELPER(recpe_rpres_f32)(float32 input, float_status *fpst)
+{
+    return do_recpe_f32(input, fpst, true);
+}
+
 float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
 {
     float64 f64 = float64_squash_input_denormal(input, fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
     return make_float16(val);
 }
 
-float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
+/*
+ * FEAT_RPRES means the f32 FRSQRTE has an "increased precision" variant
+ * which is used when FPCR.AH == 1.
+ */
+static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
 {
     float32 f32 = float32_squash_input_denormal(input, s);
     uint32_t val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
     return make_float32(val);
 }
 
+float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
+{
+    return do_rsqrte_f32(input, s, false);
+}
+
+float32 HELPER(rsqrte_rpres_f32)(float32 input, float_status *s)
+{
+    return do_rsqrte_f32(input, s, true);
+}
+
 float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
 {
     float64 f64 = float64_squash_input_denormal(input, s);
-- 
2.34.1

Implement the increased precision variation of FRECPE.  In the
pseudocode this corresponds to the handling of the
"increasedprecision" boolean in the FPRecipEstimate() and
RecipEstimate() functions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
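Note on the rounding trick used in recip_estimate_incprec() below: the
code comment says that computing 2 * (2^25 / a) as an integer and then
doing "add one and halve" gives the rounded value of 2^25 / a. The
following is a minimal standalone sketch that cross-checks this against
a remainder-based round-to-nearest. The function names here are
hypothetical and are not part of the patch, and the reference function
is an assumed reading of the rounding step, not a transcription of the
Arm ARM pseudocode.

```c
#include <assert.h>

/* Integer form mirroring the patch: b = floor(2^26 / a), r = (b + 1) >> 1 */
static int recip_estimate_incprec_sketch(int input)
{
    int a = input * 2 + 1;
    int b = (1 << 26) / a;
    return (b + 1) >> 1;
}

/*
 * Hypothetical reference: round 2^25 / a to the nearest integer by
 * comparing twice the remainder against the divisor.
 */
static int recip_estimate_incprec_ref(int input)
{
    int a = input * 2 + 1;
    int q = (1 << 25) / a;
    int rem = (1 << 25) % a;
    return (2 * rem >= a) ? q + 1 : q;
}

/* Returns 1 if both forms agree over the whole asserted input range. */
static int recip_estimate_forms_agree(void)
{
    for (int i = 2048; i < 4096; i++) {
        if (recip_estimate_incprec_sketch(i) != recip_estimate_incprec_ref(i)) {
            return 0;
        }
    }
    return 1;
}
```

Since a is always odd, 2^25 / a can never fall exactly halfway between
two integers, so the tie-breaking rule does not affect the result.
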
 target/arm/vfp_helper.c | 54 +++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
     return r;
 }
 
+/*
+ * Increased precision version:
+ * input is a 12 bit fixed point number
+ * input range 2048 .. 4095 for a number from 0.5 <= x < 1.0.
+ * result range 4096 .. 8191 for a number from 1.0 to 2.0
+ */
+static int recip_estimate_incprec(int input)
+{
+    int a, b, r;
+    assert(2048 <= input && input < 4096);
+    a = (input * 2) + 1;
+    /*
+     * The pseudocode expresses this as an operation on infinite
+     * precision reals where it calculates 2^25 / a and then looks
+     * at the error between that and the rounded-down-to-integer
+     * value to see if it should instead round up. We instead
+     * follow the same approach as the pseudocode for the 8-bit
+     * precision version, and calculate (2 * (2^25 / a)) as an
+     * integer so we can do the "add one and halve" to round it.
+     * So the 1 << 26 here is correct.
+     */
+    b = (1 << 26) / a;
+    r = (b + 1) >> 1;
+    assert(4096 <= r && r < 8192);
+    return r;
+}
+
 /*
  * Common wrapper to call recip_estimate
  *
@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
  * callee.
  */
 
-static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
+static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac,
+                                    bool increasedprecision)
 {
     uint32_t scaled, estimate;
     uint64_t result_frac;
@@ -XXX,XX +XXX,XX @@ static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
         }
     }
 
-    /* scaled = UInt('1':fraction<51:44>) */
-    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-    estimate = recip_estimate(scaled);
+    if (increasedprecision) {
+        /* scaled = UInt('1':fraction<51:41>) */
+        scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
+        estimate = recip_estimate_incprec(scaled);
+    } else {
+        /* scaled = UInt('1':fraction<51:44>) */
+        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        estimate = recip_estimate(scaled);
+    }
 
     result_exp = exp_off - *exp;
-    result_frac = deposit64(0, 44, 8, estimate);
+    if (increasedprecision) {
+        result_frac = deposit64(0, 40, 12, estimate);
+    } else {
+        result_frac = deposit64(0, 44, 8, estimate);
+    }
     if (result_exp == 0) {
         result_frac = deposit64(result_frac >> 1, 51, 1, 1);
     } else if (result_exp == -1) {
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
     }
 
     f64_frac = call_recip_estimate(&f16_exp, 29,
-                                   ((uint64_t) f16_frac) << (52 - 10));
+                                   ((uint64_t) f16_frac) << (52 - 10), false);
 
     /* result = sign : result_exp<4:0> : fraction<51:42> */
     f16_val = deposit32(0, 15, 1, f16_sign);
@@ -XXX,XX +XXX,XX @@ static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
     }
 
     f64_frac = call_recip_estimate(&f32_exp, 253,
-                                   ((uint64_t) f32_frac) << (52 - 23));
+                                   ((uint64_t) f32_frac) << (52 - 23), rpres);
 
     /* result = sign : result_exp<7:0> : fraction<51:29> */
     f32_val = deposit32(0, 31, 1, f32_sign);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
         return float64_set_sign(float64_zero, float64_is_neg(f64));
     }
 
-    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
+    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac, false);
 
     /* result = sign : result_exp<10:0> : fraction<51:0>; */
     f64_val = deposit64(0, 63, 1, f64_sign);
-- 
2.34.1

Implement the increased precision variation of FRSQRTE.  In the
pseudocode this corresponds to the handling of the
"increasedprecision" boolean in the FPRSqrtEstimate() and
RecipSqrtEstimate() functions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
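Note on the search loop in do_recip_sqrt_estimate_incprec() below: the
loop finds the smallest b >= 8192 for which a * (b + 1)^2 >= 2^39, one
step at a time. The sketch below restates that loop in standalone form
and compares it with a binary search over the same predicate. The
function names are hypothetical, and the binary search is an
illustrative alternative only; it is not what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Normalise the scaled input in the same way as the patch. */
static uint64_t rsqrt_normalise(int a)
{
    if (a < 2048) {
        return (uint64_t)a * 2 + 1;
    }
    a = (a >> 1) << 1;
    return ((uint64_t)a + 1) * 2;
}

/* Linear search, following the structure of the patch. */
static int rsqrt_estimate_linear(int a_in)
{
    uint64_t a = rsqrt_normalise(a_in);
    uint64_t b = 8192;
    while (a * (b + 1) * (b + 1) < (1ULL << 39)) {
        b += 1;
    }
    return (int)((b + 1) / 2);
}

/* Hypothetical alternative: binary search for the same smallest b. */
static int rsqrt_estimate_bsearch(int a_in)
{
    uint64_t a = rsqrt_normalise(a_in);
    uint64_t lo = 8192, hi = 16384;   /* the answer lies in [lo, hi] */
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (a * (mid + 1) * (mid + 1) < (1ULL << 39)) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return (int)((lo + 1) / 2);
}
```

The predicate is monotonic in b, which is what makes the binary search
valid. The upper bound of 16384 is safe because the smallest normalised
a is 2049, and 2049 * 16385^2 already exceeds 2^39.
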
 target/arm/vfp_helper.c | 77 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 64 insertions(+), 13 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static int do_recip_sqrt_estimate(int a)
     return estimate;
 }
 
+static int do_recip_sqrt_estimate_incprec(int a)
+{
+    /*
+     * The Arm ARM describes the 12-bit precision version of RecipSqrtEstimate
+     * in terms of an infinite-precision floating point calculation of a
+     * square root. We implement this using the same kind of pure integer
+     * algorithm as the 8-bit mantissa, to get the same bit-for-bit result.
+     */
+    int64_t b, estimate;
 
-static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
+    assert(1024 <= a && a < 4096);
+    if (a < 2048) {
+        a = a * 2 + 1;
+    } else {
+        a = (a >> 1) << 1;
+        a = (a + 1) * 2;
+    }
+    b = 8192;
+    while (a * (b + 1) * (b + 1) < (1ULL << 39)) {
+        b += 1;
+    }
+    estimate = (b + 1) / 2;
+
+    assert(4096 <= estimate && estimate < 8192);
+
+    return estimate;
+}
+
+static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac,
+                                    bool increasedprecision)
 {
     int estimate;
     uint32_t scaled;
@@ -XXX,XX +XXX,XX @@ static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
         frac = extract64(frac, 0, 51) << 1;
     }
 
-    if (*exp & 1) {
-        /* scaled = UInt('01':fraction<51:45>) */
-        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
+    if (increasedprecision) {
+        if (*exp & 1) {
+            /* scaled = UInt('01':fraction<51:42>) */
+            scaled = deposit32(1 << 10, 0, 10, extract64(frac, 42, 10));
+        } else {
+            /* scaled = UInt('1':fraction<51:41>) */
+            scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
+        }
+        estimate = do_recip_sqrt_estimate_incprec(scaled);
     } else {
-        /* scaled = UInt('1':fraction<51:44>) */
-        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        if (*exp & 1) {
+            /* scaled = UInt('01':fraction<51:45>) */
+            scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
+        } else {
+            /* scaled = UInt('1':fraction<51:44>) */
+            scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        }
+        estimate = do_recip_sqrt_estimate(scaled);
     }
-    estimate = do_recip_sqrt_estimate(scaled);
 
     *exp = (exp_off - *exp) / 2;
-    return extract64(estimate, 0, 8) << 44;
+    if (increasedprecision) {
+        return extract64(estimate, 0, 12) << 40;
+    } else {
+        return extract64(estimate, 0, 8) << 44;
+    }
 }
 
 uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
 
     f64_frac = ((uint64_t) f16_frac) << (52 - 10);
 
-    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac, false);
 
     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
     val = deposit32(0, 15, 1, f16_sign);
@@ -XXX,XX +XXX,XX @@ static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
 
     f64_frac = ((uint64_t) f32_frac) << 29;
 
-    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac, rpres);
 
-    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(15) */
+    /*
+     * result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15)
+     * or for increased precision
+     * result = sign : result_exp<7:0> : estimate<11:0> : Zeros(11)
+     */
     val = deposit32(0, 31, 1, f32_sign);
     val = deposit32(val, 23, 8, f32_exp);
-    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
+    if (rpres) {
+        val = deposit32(val, 11, 12, extract64(f64_frac, 52 - 12, 12));
+    } else {
+        val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
+    }
     return make_float32(val);
 }
 
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
         return float64_zero;
     }
 
-    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac, false);
 
     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(44) */
     val = deposit64(0, 61, 1, f64_sign);
-- 
2.34.1

Now the emulation is complete, we can enable FEAT_RPRES for the 'max'
CPU type.

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
 - FEAT_RME (Realm Management Extension) (NB: support status in QEMU is experimental)
 - FEAT_RNG (Random number generator)
+- FEAT_RPRES (Increased precision of FRECPE and FRSQRTE)
 - FEAT_S2FWB (Stage 2 forced Write-Back)
 - FEAT_SB (Speculation Barrier)
 - FEAT_SEL2 (Secure EL2)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
     cpu->isar.id_aa64isar1 = t;
 
     t = cpu->isar.id_aa64isar2;
+    t = FIELD_DP64(t, ID_AA64ISAR2, RPRES, 1);    /* FEAT_RPRES */
     t = FIELD_DP64(t, ID_AA64ISAR2, MOPS, 1);     /* FEAT_MOPS */
     t = FIELD_DP64(t, ID_AA64ISAR2, BC, 1);       /* FEAT_HBC */
     t = FIELD_DP64(t, ID_AA64ISAR2, WFXT, 2);     /* FEAT_WFxT */
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Add a vfp.fp_status[] array of float_status, and move
ARMFPStatusFlavour to cpu.h so that it can be used to index
the array.  For now, place the array in an anonymous union
with the existing structures.  Adjust the order of the
existing structures to match the enum.

Simplify fpstatus_ptr() using the new array.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
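Note on the layout this patch relies on: the array and the named
fields share storage through an anonymous union, so fp_status[flavour]
and the corresponding named field are the same object only as long as
the field order matches the enum order. The sketch below shows the
pattern with stand-in types. Every name in it is hypothetical, and it
assumes C11 for anonymous unions and _Static_assert.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for softfloat's float_status; fields are illustrative only. */
typedef struct {
    int exception_flags;
    int rounding_mode;
} StatusSketch;

typedef enum {
    FLAV_A32,
    FLAV_A64,
    FLAV_AH,
    FLAV_STD,
} FlavourSketch;
#define FLAV_COUNT 4

typedef struct {
    union {
        StatusSketch fp_status[FLAV_COUNT];
        struct {
            /* Order must match FlavourSketch. */
            StatusSketch fp_status_a32;
            StatusSketch fp_status_a64;
            StatusSketch fp_status_ah;
            StatusSketch fp_status_std;
        };
    };
} VfpSketch;

/* Guard the aliasing assumption at compile time. */
_Static_assert(offsetof(VfpSketch, fp_status_std) ==
               offsetof(VfpSketch, fp_status) +
               FLAV_STD * sizeof(StatusSketch),
               "named fields must line up with the array");

/* A table-free replacement for a switch over the flavour. */
static size_t status_offset(FlavourSketch flavour)
{
    return offsetof(VfpSketch, fp_status) +
           (size_t)flavour * sizeof(StatusSketch);
}
```

With the array in place, a per-flavour switch collapses to a single
offset computation, which is the simplification fpstatus_ptr() gets in
this patch.
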
 target/arm/cpu.h           | 119 +++++++++++++++++++++----------------
 target/arm/tcg/translate.h |  64 +-------------------
 2 files changed, 70 insertions(+), 113 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
 
 typedef struct NVICState NVICState;
 
+/*
+ * Enum for indexing vfp.fp_status[].
+ *
+ * FPST_A32: is the "normal" fp status for AArch32 insns
+ * FPST_A64: is the "normal" fp status for AArch64 insns
+ * FPST_A32_F16: used for AArch32 half-precision calculations
+ * FPST_A64_F16: used for AArch64 half-precision calculations
+ * FPST_STD: the ARM "Standard FPSCR Value"
+ * FPST_STD_F16: used for half-precision
+ *       calculations with the ARM "Standard FPSCR Value"
+ * FPST_AH: used for the A64 insns which change behaviour
+ *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+ *       and the reciprocal and square root estimate/step insns)
+ * FPST_AH_F16: used for the A64 insns which change behaviour
+ *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+ *       and the reciprocal and square root estimate/step insns);
+ *       for half-precision
+ *
+ * Half-precision operations are governed by a separate
+ * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
+ * status structure to control this.
+ *
+ * The "Standard FPSCR" is default-NaN, flush-to-zero and
+ * round-to-nearest, and is used by any operations (generally
+ * Neon) which the architecture defines as controlled by the
+ * standard FPSCR value rather than the FPSCR.
+ *
+ * The "standard FPSCR but for fp16 ops" is needed because
+ * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
+ * using a fixed value for it.
+ *
+ * The ah_fp_status is needed because some insns have different
+ * behaviour when FPCR.AH == 1: they don't update cumulative
+ * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
+ * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
+ * which means we need an ah_fp_status_f16 as well.
+ *
+ * To avoid having to transfer exception bits around, we simply
+ * say that the FPSCR cumulative exception flags are the logical
+ * OR of the flags in the four fp statuses. This relies on the
+ * only thing which needs to read the exception flags being
+ * an explicit FPSCR read.
+ */
+typedef enum ARMFPStatusFlavour {
+    FPST_A32,
+    FPST_A64,
+    FPST_A32_F16,
+    FPST_A64_F16,
+    FPST_AH,
+    FPST_AH_F16,
+    FPST_STD,
+    FPST_STD_F16,
+} ARMFPStatusFlavour;
+#define FPST_COUNT  8
+
 typedef struct CPUArchState {
     /* Regs for current mode.  */
     uint32_t regs[16];
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         /* Scratch space for aa32 neon expansion.  */
         uint32_t scratch[8];
 
-        /* There are a number of distinct float control structures:
-         *
-         *  fp_status_a32: is the "normal" fp status for AArch32 insns
-         *  fp_status_a64: is the "normal" fp status for AArch64 insns
-         *  fp_status_fp16_a32: used for AArch32 half-precision calculations
-         *  fp_status_fp16_a64: used for AArch64 half-precision calculations
-         *  standard_fp_status : the ARM "Standard FPSCR Value"
-         *  standard_fp_status_fp16 : used for half-precision
-         *       calculations with the ARM "Standard FPSCR Value"
-         *  ah_fp_status: used for the A64 insns which change behaviour
-         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
-         *       and the reciprocal and square root estimate/step insns)
-         *  ah_fp_status_f16: used for the A64 insns which change behaviour
-         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
-         *       and the reciprocal and square root estimate/step insns);
-         *       for half-precision
-         *
-         * Half-precision operations are governed by a separate
-         * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
-         * status structure to control this.
-         *
-         * The "Standard FPSCR", ie default-NaN, flush-to-zero,
-         * round-to-nearest and is used by any operations (generally
-         * Neon) which the architecture defines as controlled by the
-         * standard FPSCR value rather than the FPSCR.
-         *
-         * The "standard FPSCR but for fp16 ops" is needed because
-         * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
-         * using a fixed value for it.
-         *
-         * The ah_fp_status is needed because some insns have different
-         * behaviour when FPCR.AH == 1: they don't update cumulative
-         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
-         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
-         * which means we need an ah_fp_status_f16 as well.
-         *
-         * To avoid having to transfer exception bits around, we simply
-         * say that the FPSCR cumulative exception flags are the logical
-         * OR of the flags in the four fp statuses. This relies on the
-         * only thing which needs to read the exception flags being
-         * an explicit FPSCR read.
-         */
-        float_status fp_status_a32;
-        float_status fp_status_a64;
-        float_status fp_status_f16_a32;
-        float_status fp_status_f16_a64;
-        float_status standard_fp_status;
-        float_status standard_fp_status_f16;
-        float_status ah_fp_status;
-        float_status ah_fp_status_f16;
+        /* There are a number of distinct float control structures. */
+        union {
+            float_status fp_status[FPST_COUNT];
+            struct {
+                float_status fp_status_a32;
+                float_status fp_status_a64;
+                float_status fp_status_f16_a32;
+                float_status fp_status_f16_a64;
+                float_status ah_fp_status;
+                float_status ah_fp_status_f16;
+                float_status standard_fp_status;
+                float_status standard_fp_status_f16;
+            };
+        };
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
     return (CPUARMTBFlags){ tb->flags, tb->cs_base };
 }
 
-/*
- * Enum for argument to fpstatus_ptr().
- */
-typedef enum ARMFPStatusFlavour {
-    FPST_A32,
-    FPST_A64,
-    FPST_A32_F16,
-    FPST_A64_F16,
-    FPST_AH,
-    FPST_AH_F16,
-    FPST_STD,
-    FPST_STD_F16,
-} ARMFPStatusFlavour;
-
 /**
  * fpstatus_ptr: return TCGv_ptr to the specified fp_status field
  *
  * We have multiple softfloat float_status fields in the Arm CPU state struct
  * (see the comment in cpu.h for details). Return a TCGv_ptr which has
  * been set up to point to the requested field in the CPU state struct.
- * The options are:
- *
- * FPST_A32
- *   for AArch32 non-FP16 operations controlled by the FPCR
- * FPST_A64
- *   for AArch64 non-FP16 operations controlled by the FPCR
- * FPST_A32_F16
- *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
- * FPST_A64_F16
- *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
- * FPST_AH:
- *   for AArch64 operations which change behaviour when AH=1 (specifically,
- *   bfloat16 conversions and multiplies, and the reciprocal and square root
- *   estimate/step insns)
- * FPST_AH_F16:
- *   ditto, but for half-precision operations
- * FPST_STD
- *   for A32/T32 Neon operations using the "standard FPSCR value"
- * FPST_STD_F16
- *   as FPST_STD, but where FPCR.FZ16 is to be used
  */
 static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
 {
     TCGv_ptr statusptr = tcg_temp_new_ptr();
-    int offset;
+    int offset = offsetof(CPUARMState, vfp.fp_status[flavour]);
 
-    switch (flavour) {
-    case FPST_A32:
-        offset = offsetof(CPUARMState, vfp.fp_status_a32);
-        break;
-    case FPST_A64:
-        offset = offsetof(CPUARMState, vfp.fp_status_a64);
-        break;
-    case FPST_A32_F16:
-        offset = offsetof(CPUARMState, vfp.fp_status_f16_a32);
-        break;
-    case FPST_A64_F16:
-        offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
-        break;
-    case FPST_AH:
-        offset = offsetof(CPUARMState, vfp.ah_fp_status);
-        break;
-    case FPST_AH_F16:
-        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
-        break;
-    case FPST_STD:
-        offset = offsetof(CPUARMState, vfp.standard_fp_status);
-        break;
-    case FPST_STD_F16:
-        offset = offsetof(CPUARMState, vfp.standard_fp_status_f16);
-        break;
-    default:
-        g_assert_not_reached();
-    }
     tcg_gen_addi_ptr(statusptr, tcg_env, offset);
     return statusptr;
 }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Remove standard_fp_status_f16 and replace its uses with
fp_status[FPST_STD_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  4 ++--
 target/arm/tcg/mve_helper.c | 24 ++++++++++++------------
 target/arm/vfp_helper.c     |  8 ++++----
 4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status ah_fp_status;
                 float_status ah_fp_status_f16;
                 float_status standard_fp_status;
-                float_status standard_fp_status_f16;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     set_flush_to_zero(1, &env->vfp.standard_fp_status);
     set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
     set_default_nan_mode(1, &env->vfp.standard_fp_status);
-    set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
+    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
     set_flush_to_zero(1, &env->vfp.ah_fp_status);
     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                 r[e] = 0;                                               \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(tm & 1)) {                                            \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
                 continue;                                               \
             }                                                           \
-            fpst0 = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :   \
+            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
                 &env->vfp.standard_fp_status;                           \
             fpst1 = fpst0;                                              \
             if (!(mask & 1)) {                                          \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
         TYPE *m = vm;                                           \
         TYPE ra = (TYPE)ra_in;                                  \
         float_status *fpst = (ESIZE == 2) ?                     \
-            &env->vfp.standard_fp_status_f16 :                  \
+            &env->vfp.fp_status[FPST_STD_F16] :                 \
             &env->vfp.standard_fp_status;                       \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
             if (mask & 1) {                                     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
         float_status *fpst;                                             \
         float_status scratch_fpst;                                      \
         float_status *base_fpst = (ESIZE == 2) ?                        \
-            &env->vfp.standard_fp_status_f16 :                          \
+            &env->vfp.fp_status[FPST_STD_F16] :                         \
             &env->vfp.standard_fp_status;                               \
         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
         set_float_rounding_mode(rmode, base_fpst);                      \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     /* FZ16 does not generate an input denormal exception.  */
     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
-    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
+    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
           & ~float_flag_input_denormal_flushed);
 
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
-    set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
     set_float_exception_flags(0, &env->vfp.ah_fp_status);
     set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         bool ftz_enabled = val & FPCR_FZ16;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-        set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
         set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_FZ) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace standard_fp_status with fp_status[FPST_STD].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  8 ++++----
 target/arm/tcg/mve_helper.c | 28 ++++++++++++++--------------
 target/arm/tcg/vec_helper.c |  4 ++--
 target/arm/vfp_helper.c     |  4 ++--
 5 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status fp_status_f16_a64;
                 float_status ah_fp_status;
                 float_status ah_fp_status_f16;
-                float_status standard_fp_status;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
         env->sau.ctrl = 0;
     }
 
-    set_flush_to_zero(1, &env->vfp.standard_fp_status);
-    set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
-    set_default_nan_mode(1, &env->vfp.standard_fp_status);
+    set_flush_to_zero(1, &env->vfp.fp_status[FPST_STD]);
+    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
+    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(tm & 1)) {                                            \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
                 continue;                                               \
             }                                                           \
             fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             fpst1 = fpst0;                                              \
             if (!(mask & 1)) {                                          \
                 scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
         TYPE ra = (TYPE)ra_in;                                  \
         float_status *fpst = (ESIZE == 2) ?                     \
             &env->vfp.fp_status[FPST_STD_F16] :                 \
-            &env->vfp.standard_fp_status;                       \
+            &env->vfp.fp_status[FPST_STD];                       \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
             if (mask & 1) {                                     \
                 TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
         float_status scratch_fpst;                                      \
         float_status *base_fpst = (ESIZE == 2) ?                        \
             &env->vfp.fp_status[FPST_STD_F16] :                         \
-            &env->vfp.standard_fp_status;                               \
+            &env->vfp.fp_status[FPST_STD];                               \
         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
         set_float_rounding_mode(rmode, base_fpst);                      \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_sh(CPUARMState *env, void *vd, void *vm, int top)
     unsigned e;
     float_status *fpst;
     float_status scratch_fpst;
-    float_status *base_fpst = &env->vfp.standard_fp_status;
+    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
     bool old_fz = get_flush_to_zero(base_fpst);
     set_flush_to_zero(false, base_fpst);
     for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_hs(CPUARMState *env, void *vd, void *vm, int top)
     unsigned e;
     float_status *fpst;
     float_status scratch_fpst;
-    float_status *base_fpst = &env->vfp.standard_fp_status;
+    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
     bool old_fiz = get_flush_inputs_to_zero(base_fpst);
     set_flush_inputs_to_zero(false, base_fpst);
     for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
-    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
+    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
-    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
+    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     uint32_t a32_flags = 0, a64_flags = 0;
 
     a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
-    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
     /* FZ16 does not generate an input denormal exception.  */
     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_a64);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
-    set_float_exception_flags(0, &env->vfp.standard_fp_status);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
     set_float_exception_flags(0, &env->vfp.ah_fp_status);
     set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace ah_fp_status_f16 with fp_status[FPST_AH_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h        |  3 +--
 target/arm/cpu.c        |  2 +-
 target/arm/vfp_helper.c | 10 +++++-----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
  * behaviour when FPCR.AH == 1: they don't update cumulative
  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
- * which means we need an ah_fp_status_f16 as well.
+ * which means we need an FPST_AH_F16 as well.
  *
  * To avoid having to transfer exception bits around, we simply
  * say that the FPSCR cumulative exception flags are the logical
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status fp_status_f16_a32;
                 float_status fp_status_f16_a64;
                 float_status ah_fp_status;
-                float_status ah_fp_status_f16;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
     set_flush_to_zero(1, &env->vfp.ah_fp_status);
     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
-    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
+    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
 
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
-     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
+     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
      * they are used for insns that must not set the cumulative exception bits.
      */
 
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
     set_float_exception_flags(0, &env->vfp.ah_fp_status);
-    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
 }
 
 static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
     }
     if (changed & FPCR_AH) {
         bool ah_enabled = val & FPCR_AH;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace ah_fp_status with fp_status[FPST_AH].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h        | 3 +--
 target/arm/cpu.c        | 6 +++---
 target/arm/vfp_helper.c | 6 +++---
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
  * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
  * using a fixed value for it.
  *
- * The ah_fp_status is needed because some insns have different
+ * FPST_AH is needed because some insns have different
  * behaviour when FPCR.AH == 1: they don't update cumulative
  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status fp_status_a64;
                 float_status fp_status_f16_a32;
                 float_status fp_status_f16_a64;
-                float_status ah_fp_status;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
-    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
-    set_flush_to_zero(1, &env->vfp.ah_fp_status);
-    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
+    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
+    set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]);
+    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_AH]);
     arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
-     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
+     * We do not merge in flags from FPST_AH or FPST_AH_F16, because
      * they are used for insns that must not set the cumulative exception bits.
      */
 
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
-    set_float_exception_flags(0, &env->vfp.ah_fp_status);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
 }
 
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
     }
     if (changed & FPCR_AH) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace fp_status_f16_a64 with fp_status[FPST_A64_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/sme_helper.c |  2 +-
 target/arm/tcg/vec_helper.c |  9 ++++-----
 target/arm/vfp_helper.c     | 16 ++++++++--------
 5 files changed, 14 insertions(+), 16 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace fp_status_f16_a32 with fp_status[FPST_A32_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/vec_helper.c |  4 ++--
 target/arm/vfp_helper.c     | 14 +++++++-------
 4 files changed, 10 insertions(+), 11 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace fp_status_a64 with fp_status[FPST_A64].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/sme_helper.c |  2 +-
 target/arm/tcg/vec_helper.c | 10 +++++-----
 target/arm/vfp_helper.c     | 16 ++++++++--------
 5 files changed, 15 insertions(+), 16 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace fp_status_a32 with fp_status[FPST_A32].  As this was the last of the
old structures, we can remove the anonymous union and struct.
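
For readers following the series, here is a minimal sketch of the
transitional layout that this patch finally removes.  It assumes a
cut-down float_status and only two of the FPST_* indexes; the
VFPState wrapper and the helper function are hypothetical names used
for illustration and are not part of QEMU.  The point is that while
the anonymous union exists, each old field name and its fp_status[]
slot refer to the same storage, so callers could be converted one
field at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for softfloat's float_status (assumption). */
typedef struct float_status {
    int rounding_mode;
    int exception_flags;
} float_status;

/* Reduced index enum; order must match the named fields below. */
typedef enum {
    FPST_A32,
    FPST_A64,
    FPST_COUNT
} FPStatusIndex;

/* Hypothetical wrapper showing the transitional union layout. */
typedef struct VFPState {
    union {
        float_status fp_status[FPST_COUNT];
        struct {
            float_status fp_status_a32;
            float_status fp_status_a64;
        };
    };
} VFPState;

/* Each old name must sit exactly on top of its array slot. */
static_assert(offsetof(VFPState, fp_status_a32) ==
              offsetof(VFPState, fp_status) +
              FPST_A32 * sizeof(float_status),
              "fp_status_a32 must alias fp_status[FPST_A32]");
static_assert(offsetof(VFPState, fp_status_a64) ==
              offsetof(VFPState, fp_status) +
              FPST_A64 * sizeof(float_status),
              "fp_status_a64 must alias fp_status[FPST_A64]");

/* Write through the old name, read back through the new index. */
static int old_and_new_names_alias(void)
{
    VFPState vfp;

    memset(&vfp, 0, sizeof(vfp));
    vfp.fp_status_a64.exception_flags = 0x10;
    return vfp.fp_status[FPST_A64].exception_flags == 0x10;
}
```

Once no caller uses the old names, the struct arm of the union is
empty and the union can collapse to the plain array, which is what
the cpu.h hunk below does.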

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-15-richard.henderson@linaro.org
[PMM: tweak to account for change to is_ebf()]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  7 +------
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/vec_helper.c |  2 +-
 target/arm/vfp_helper.c     | 18 +++++++++---------
 4 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         uint32_t scratch[8];
 
         /* There are a number of distinct float control structures. */
-        union {
-            float_status fp_status[FPST_COUNT];
-            struct {
-                float_status fp_status_a32;
-            };
-        };
+        float_status fp_status[FPST_COUNT];
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
-    arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
      */
     bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
 
-    *statusp = is_a64(env) ? env->vfp.fp_status[FPST_A64] : env->vfp.fp_status_a32;
+    *statusp = env->vfp.fp_status[is_a64(env) ? FPST_A64 : FPST_A32];
     set_default_nan_mode(true, statusp);
 
     if (ebf) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 {
     uint32_t a32_flags = 0, a64_flags = 0;
 
-    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_A32]);
     a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
     /* FZ16 does not generate an input denormal exception.  */
     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A32_F16])
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      * values. The caller should have arranged for env->vfp.fpsr to
      * be the architecturally up-to-date exception flag information first.
      */
-    set_float_exception_flags(0, &env->vfp.fp_status_a32);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
             i = float_round_to_zero;
             break;
         }
-        set_float_rounding_mode(i, &env->vfp.fp_status_a32);
+        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32]);
         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]);
         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]);
         /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
     }
     if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
         /*
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
-        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
         FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
 }
 DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
-DO_VFP_cmp(s, float32, float32, fp_status_a32)
-DO_VFP_cmp(d, float64, float64, fp_status_a32)
+DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
+DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
 #undef DO_VFP_cmp
 
 /* Integer to float and float to integer conversions */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(fjcvtzs)(float64 value, float_status *status)
 
 uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
 {
-    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status_a32);
+    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status[FPST_A32]);
     uint32_t result = pair;
     uint32_t z = (pair >> 32) == 0;
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Select on index instead of pointer.
No functional change.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-16-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/mve_helper.c | 40 +++++++++++++------------------------
 1 file changed, 14 insertions(+), 26 deletions(-)
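
Reader's note (not part of the commit): the equivalence behind "select on
index instead of pointer" can be sketched in isolation. The types and
function names below are hypothetical stand-ins, not QEMU's definitions;
only the FPST_STD / FPST_STD_F16 names are taken from the diff.

```c
/* Hypothetical stand-ins for QEMU's float_status and flavour enum. */
typedef struct { int exception_flags; } float_status;

enum { FPST_STD, FPST_STD_F16, FPST_COUNT };

typedef struct { float_status fp_status[FPST_COUNT]; } vfp_state;

/* Old form: the conditional chooses between two element pointers. */
static float_status *select_by_pointer(vfp_state *vfp, int esize)
{
    return (esize == 2) ? &vfp->fp_status[FPST_STD_F16]
                        : &vfp->fp_status[FPST_STD];
}

/* New form: the conditional chooses the index; the array is named once. */
static float_status *select_by_index(vfp_state *vfp, int esize)
{
    return &vfp->fp_status[esize == 2 ? FPST_STD_F16 : FPST_STD];
}
```

Both functions return the same element address for any esize, which is
why the change fits on one macro line without altering behaviour.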

diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                 r[e] = 0;                                               \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(tm & 1)) {                                            \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
                 continue;                                               \
             }                                                           \
-            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst0 = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             fpst1 = fpst0;                                              \
             if (!(mask & 1)) {                                          \
                 scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
         unsigned e;                                             \
         TYPE *m = vm;                                           \
         TYPE ra = (TYPE)ra_in;                                  \
-        float_status *fpst = (ESIZE == 2) ?                     \
-            &env->vfp.fp_status[FPST_STD_F16] :                 \
-            &env->vfp.fp_status[FPST_STD];                       \
+        float_status *fpst =                                    \
+            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
             if (mask & 1) {                                     \
                 TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
         unsigned e;                                                     \
         float_status *fpst;                                             \
         float_status scratch_fpst;                                      \
-        float_status *base_fpst = (ESIZE == 2) ?                        \
-            &env->vfp.fp_status[FPST_STD_F16] :                         \
-            &env->vfp.fp_status[FPST_STD];                               \
+        float_status *base_fpst =                                       \
+            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD];  \
         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
         set_float_rounding_mode(rmode, base_fpst);                      \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Pass the ARMFPStatusFlavour index to DO_VFP_cmp instead of the
fp_status[FOO] member expression.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
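
Reader's note (not part of the commit): a minimal sketch of the macro
pattern, assuming simplified types. DEFINE_STATUS_GETTER, status_h/s/d
and cpu_env are hypothetical names for illustration; the point is that
the macro argument is a plain enumerator and the array access is written
once, in the macro body.

```c
/* Hypothetical stand-ins for the QEMU types. */
typedef struct { int flags; } float_status;

enum { FPST_A32, FPST_A32_F16, FPST_COUNT };

typedef struct {
    struct { float_status fp_status[FPST_COUNT]; } vfp;
} cpu_env;

/* The caller passes only the index; the body builds the lvalue. */
#define DEFINE_STATUS_GETTER(NAME, FPST)                 \
    static float_status *status_##NAME(cpu_env *env)     \
    {                                                    \
        return &env->vfp.fp_status[FPST];                \
    }

DEFINE_STATUS_GETTER(h, FPST_A32_F16)
DEFINE_STATUS_GETTER(s, FPST_A32)
DEFINE_STATUS_GETTER(d, FPST_A32)
#undef DEFINE_STATUS_GETTER
```

With this shape the instantiation lines no longer need to know that the
statuses live in an array member called fp_status.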

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
 void VFP_HELPER(cmp, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env)  \
 { \
     softfloat_to_vfp_compare(env, \
-        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.FPST)); \
+        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.fp_status[FPST])); \
 } \
 void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
 { \
     softfloat_to_vfp_compare(env, \
-        FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
+        FLOATTYPE ## _compare(a, b, &env->vfp.fp_status[FPST])); \
 }
-DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
-DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
-DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
+DO_VFP_cmp(h, float16, dh_ctype_f16, FPST_A32_F16)
+DO_VFP_cmp(s, float32, float32, FPST_A32)
+DO_VFP_cmp(d, float64, float64, FPST_A32)
 #undef DO_VFP_cmp
 
 /* Integer to float and float to integer conversions */
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Read the bit from the source, rather than from the proxy via
get_flush_inputs_to_zero.  This makes it clear that it does
not matter which of the float_status structures is used.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-34-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
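
Reader's note (not part of the commit): a sketch of why reading the bit
from FPCR is equivalent to reading it from either F16 float_status. It
assumes a simplified write path (set_fpcr, hypothetical) that mirrors
FZ16 into both proxies; the struct layout and helper names are
illustrative only. FPCR_FZ16 is taken to be bit 19, as in the Arm FPCR
layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPCR_FZ16 (1u << 19)   /* assumed: FZ16 is bit 19 of FPCR */

/* Hypothetical stand-ins for the QEMU types. */
typedef struct { bool flush_inputs_to_zero; } float_status;

enum { FPST_A32_F16, FPST_A64_F16, FPST_COUNT };

typedef struct {
    uint32_t fpcr;
    float_status fp_status[FPST_COUNT];
} vfp_state;

/* Simplified write path: the source bit is copied into both proxies. */
static void set_fpcr(vfp_state *vfp, uint32_t val)
{
    bool fz16 = (val & FPCR_FZ16) != 0;

    vfp->fpcr = val;
    vfp->fp_status[FPST_A32_F16].flush_inputs_to_zero = fz16;
    vfp->fp_status[FPST_A64_F16].flush_inputs_to_zero = fz16;
}

/* Old form: read the proxy held in one particular float_status. */
static bool fz16_from_proxy(const vfp_state *vfp, int idx)
{
    return vfp->fp_status[idx].flush_inputs_to_zero;
}

/* New form: read the source; conversion to bool maps nonzero to true. */
static bool fz16_from_fpcr(const vfp_state *vfp)
{
    return vfp->fpcr & FPCR_FZ16;
}
```

Note that the masked value is 0x80000, so the conversion to bool matters:
storing it in a narrower integer type would truncate it to zero.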

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
+             env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
         }
     }
     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
+             env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     float_status *status = &env->vfp.fp_status[FPST_A64];
-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
+    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
     int negx = 0, negf = 0;
 
     if (is_s) {
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
+                 env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
         }
     }
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
+                 env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
     float_status *status = &env->vfp.fp_status[FPST_A64];
-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
+    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
     int negx = 0, negf = 0;
 
     if (is_s) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Sink the code common to all callers into do_fmlal and
do_fmlal_idx.  Reorder the arguments so that they follow the
order of the callers' own arguments more closely.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-35-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)
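
Reader's note (not part of the commit): a toy sketch of what "sinking"
the status and FPCR access into a shared worker looks like. The names
worker_before, worker_after and the caller functions are hypothetical
and the signatures are not those of do_fmlal; the sketch only shows the
callers passing an index while the worker derives the pointer and the
FZ16 flag itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPCR_FZ16 (1u << 19)   /* assumed: FZ16 is bit 19 of FPCR */

/* Hypothetical stand-ins for the QEMU types. */
typedef struct { int uses; } float_status;

enum { FPST_STD, FPST_A64, FPST_COUNT };

typedef struct {
    uint32_t fpcr;
    float_status fp_status[FPST_COUNT];
} vfp_state;

/* Before: each caller computes the pointer and the flag. */
static int worker_before(float_status *fpst, bool fz16)
{
    fpst->uses++;
    return fz16 ? 1 : 0;
}

static int caller_before_a32(vfp_state *vfp)
{
    return worker_before(&vfp->fp_status[FPST_STD],
                         vfp->fpcr & FPCR_FZ16);
}

/* After: the worker takes the state and an index and does both itself. */
static int worker_after(vfp_state *vfp, int fpst_idx)
{
    float_status *fpst = &vfp->fp_status[fpst_idx];
    bool fz16 = vfp->fpcr & FPCR_FZ16;

    fpst->uses++;
    return fz16 ? 1 : 0;
}

static int caller_after_a32(vfp_state *vfp)
{
    return worker_after(vfp, FPST_STD);
}
```

Each caller shrinks to a single call that names only the status flavour
it wants.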