Series comparison

-[PULL 00/55] target-arm queue
+[PULL 00/68] target-arm queue
-The following changes since commit 6d940eff4734bcb40b1a25f62d7cec5a396f994a:
+Hi; this pullreq contains only my FEAT_AFP/FEAT_RPRES patches
 (plus a fix for a target/alpha latent bug that would otherwise
 be revealed by the fpu changes), because 68 patches is already
 longer than I prefer to send in at one time...
-  Merge tag 'pull-tpm-2022-06-07-1' of https://github.com/stefanberger/qemu-tpm into staging (2022-06-07 19:22:18 -0700)
+thanks
 -- PMM
 The following changes since commit ffaf7f0376f8040ce9068d71ae9ae8722505c42e:
   Merge tag 'pull-10.0-testing-and-gdstub-updates-100225-1' of https://gitlab.com/stsquad/qemu into staging (2025-02-10 13:26:17 -0500)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20220609
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250211
-for you to fetch changes up to 414c54d515dba16bfaef643a8acec200c05f229a:
+for you to fetch changes up to ca4c34e07d1388df8e396520b5e7d60883cd3690:
-  target/arm: Add ID_AA64SMFR0_EL1 (2022-06-08 19:38:59 +0100)
+  target/arm: Sink fp_status and fpcr access into do_fmlal* (2025-02-11 16:22:08 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * target/arm: Declare support for FEAT_RASv1p1
+ * target/alpha: Don't corrupt error_code with unknown softfloat flags
- * target/arm: Implement FEAT_DoubleFault
+ * target/arm: Implement FEAT_AFP and FEAT_RPRES
  * Fix 'writeable' typos
  * xlnx_dp: Implement vblank interrupt
  * target/arm: Move page-table-walk code to ptw.c
  * target/arm: Preparatory patches for SME support
 ----------------------------------------------------------------
-Frederic Konrad (2):
+Peter Maydell (49):
-      xlnx_dp: fix the wrong register size
+      target/alpha: Don't corrupt error_code with unknown softfloat flags
-      xlnx-zynqmp: fix the irq mapping for the display port and its dma
+      fpu: Add float_class_denormal
       fpu: Implement float_flag_input_denormal_used
       fpu: allow flushing of output denormals to be after rounding
       target/arm: Define FPCR AH, FIZ, NEP bits
       target/arm: Implement FPCR.FIZ handling
       target/arm: Adjust FP behaviour for FPCR.AH = 1
       target/arm: Adjust exception flag handling for AH = 1
       target/arm: Add FPCR.AH to tbflags
       target/arm: Set up float_status to use for FPCR.AH=1 behaviour
       target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
       target/arm: Use FPST_FPCR_AH for BFCVT* insns
       target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
       target/arm: Add FPCR.NEP to TBFLAGS
       target/arm: Define and use new write_fp_*reg_merging() functions
       target/arm: Handle FPCR.NEP for 3-input scalar operations
       target/arm: Handle FPCR.NEP for BFCVT scalar
       target/arm: Handle FPCR.NEP for 1-input scalar operations
       target/arm: Handle FPCR.NEP in do_cvtf_scalar()
       target/arm: Handle FPCR.NEP for scalar FABS and FNEG
       target/arm: Handle FPCR.NEP for FCVTXN (scalar)
       target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
       target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
       target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
       target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
       target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
       target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
       target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
       target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
       target/arm: Implement FPCR.AH handling of negation of NaN
       target/arm: Implement FPCR.AH handling for scalar FABS and FABD
       target/arm: Handle FPCR.AH in vector FABD
       target/arm: Handle FPCR.AH in SVE FNEG
       target/arm: Handle FPCR.AH in SVE FABS
       target/arm: Handle FPCR.AH in SVE FABD
       target/arm: Handle FPCR.AH in negation steps in SVE FCADD
       target/arm: Handle FPCR.AH in negation steps in FCADD
       target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
       target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
       target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
       target/arm: Handle FPCR.AH in negation in FMLS (vector)
       target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
       target/arm: Handle FPCR.AH in SVE FTSSEL
       target/arm: Handle FPCR.AH in SVE FTMAD
       target/arm: Enable FEAT_AFP for '-cpu max'
       target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
       target/arm: Implement increased precision FRECPE
       target/arm: Implement increased precision FRSQRTE
       target/arm: Enable FEAT_RPRES for -cpu max
-Peter Maydell (3):
+Richard Henderson (19):
-      target/arm: Declare support for FEAT_RASv1p1
+      target/arm: Handle FPCR.AH in vector FCMLA
-      target/arm: Implement FEAT_DoubleFault
+      target/arm: Handle FPCR.AH in FCMLA by index
-      Fix 'writeable' typos
+      target/arm: Handle FPCR.AH in SVE FCMLA
       target/arm: Handle FPCR.AH in FMLSL (by element and vector)
       target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
       target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
       target/arm: Introduce CPUARMState.vfp.fp_status[]
       target/arm: Remove standard_fp_status_f16
       target/arm: Remove standard_fp_status
       target/arm: Remove ah_fp_status_f16
       target/arm: Remove ah_fp_status
       target/arm: Remove fp_status_f16_a64
       target/arm: Remove fp_status_f16_a32
       target/arm: Remove fp_status_a64
       target/arm: Remove fp_status_a32
       target/arm: Simplify fp_status indexing in mve_helper.c
       target/arm: Simplify DO_VFP_cmp in vfp_helper.c
       target/arm: Read fz16 from env->vfp.fpcr
       target/arm: Sink fp_status and fpcr access into do_fmlal*
-Richard Henderson (48):
+ docs/system/arm/emulation.rst   |   2 +
-      target/arm: Move stage_1_mmu_idx decl to internals.h
+ include/fpu/softfloat-helpers.h |  11 +
-      target/arm: Move get_phys_addr to ptw.c
+ include/fpu/softfloat-types.h   |  25 ++
-      target/arm: Move get_phys_addr_v5 to ptw.c
+ target/arm/cpu-features.h       |  10 +
-      target/arm: Move get_phys_addr_v6 to ptw.c
+ target/arm/cpu.h                |  97 +++--
-      target/arm: Move get_phys_addr_pmsav5 to ptw.c
+ target/arm/helper.h             |  26 ++
-      target/arm: Move get_phys_addr_pmsav7_default to ptw.c
+ target/arm/internals.h          |   6 +
-      target/arm: Move get_phys_addr_pmsav7 to ptw.c
+ target/arm/tcg/helper-a64.h     |  13 +
-      target/arm: Move get_phys_addr_pmsav8 to ptw.c
+ target/arm/tcg/helper-sve.h     | 120 ++++++
-      target/arm: Move pmsav8_mpu_lookup to ptw.c
+ target/arm/tcg/translate-a64.h  |  13 +
-      target/arm: Move pmsav7_use_background_region to ptw.c
+ target/arm/tcg/translate.h      |  54 +--
-      target/arm: Move v8m_security_lookup to ptw.c
+ target/arm/tcg/vec_internal.h   |  35 ++
-      target/arm: Move m_is_{ppb,system}_region to ptw.c
+ target/mips/fpu_helper.h        |   6 +
-      target/arm: Move get_level1_table_address to ptw.c
+ fpu/softfloat.c                 |  66 +++-
-      target/arm: Move combine_cacheattrs and subroutines to ptw.c
+ target/alpha/cpu.c              |   7 +
-      target/arm: Move get_phys_addr_lpae to ptw.c
+ target/alpha/fpu_helper.c       |   2 +
-      target/arm: Move arm_{ldl,ldq}_ptw to ptw.c
+ target/arm/cpu.c                |  46 +--
-      target/arm: Move {arm_s1_, }regime_using_lpae_format to tlb_helper.c
+ target/arm/helper.c             |   2 +-
-      target/arm: Move arm_pamax, pamax_map into ptw.c
+ target/arm/tcg/cpu64.c          |   2 +
-      target/arm: Move get_S1prot, get_S2prot to ptw.c
+ target/arm/tcg/helper-a64.c     | 151 ++++----
-      target/arm: Move check_s2_mmu_setup to ptw.c
+ target/arm/tcg/hflags.c         |  13 +
-      target/arm: Move aa32_va_parameters to ptw.c
+ target/arm/tcg/mve_helper.c     |  44 +--
-      target/arm: Move ap_to_tw_prot etc to ptw.c
+ target/arm/tcg/sme_helper.c     |   4 +-
-      target/arm: Move regime_is_user to ptw.c
+ target/arm/tcg/sve_helper.c     | 367 ++++++++++++++-----
-      target/arm: Move regime_ttbr to ptw.c
+ target/arm/tcg/translate-a64.c  | 782 ++++++++++++++++++++++++++++++++--------
-      target/arm: Move regime_translation_disabled to ptw.c
+ target/arm/tcg/translate-sve.c  | 193 +++++++---
-      target/arm: Move arm_cpu_get_phys_page_attrs_debug to ptw.c
+ target/arm/tcg/vec_helper.c     | 387 ++++++++++++++------
-      target/arm: Move stage_1_mmu_idx, arm_stage1_mmu_idx to ptw.c
+ target/arm/vfp_helper.c         | 374 +++++++++++++++----
-      target/arm: Pass CPUARMState to arm_ld[lq]_ptw
+ target/hppa/fpu_helper.c        |  11 +
-      target/arm: Rename TBFLAG_A64 ZCR_LEN to VL
+ target/i386/tcg/fpu_helper.c    |   8 +
-      linux-user/aarch64: Introduce sve_vq
+ target/mips/msa.c               |   9 +
-      target/arm: Remove route_to_el2 check from sve_exception_el
+ target/ppc/cpu_init.c           |   3 +
-      target/arm: Remove fp checks from sve_exception_el
+ target/rx/cpu.c                 |   8 +
-      target/arm: Add el_is_in_host
+ target/sh4/cpu.c                |   8 +
-      target/arm: Use el_is_in_host for sve_zcr_len_for_el
+ target/tricore/helper.c         |   1 +
-      target/arm: Use el_is_in_host for sve_exception_el
+ tests/fp/fp-bench.c             |   1 +
-      target/arm: Hoist arm_is_el2_enabled check in sve_exception_el
+ fpu/softfloat-parts.c.inc       | 127 +++++--
-      target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset
+files changed, 2325 insertions(+), 709 deletions(-)
       target/arm: Merge aarch64_sve_zcr_get_valid_len into caller
       target/arm: Use uint32_t instead of bitmap for sve vq's
       target/arm: Rename sve_zcr_len_for_el to sve_vqm1_for_el
       target/arm: Split out load/store primitives to sve_ldst_internal.h
       target/arm: Export sve contiguous ldst support functions
       target/arm: Move expand_pred_b to vec_internal.h
       target/arm: Use expand_pred_b in mve_helper.c
       target/arm: Move expand_pred_h to vec_internal.h
       target/arm: Export bfdotadd from vec_helper.c
       target/arm: Add isar_feature_aa64_sme
       target/arm: Add ID_AA64SMFR0_EL1
 Sai Pavan Boddu (2):
       xlnx_dp: Introduce a vblank signal
       xlnx_dp: Fix the interrupt disable logic
  docs/interop/vhost-user.rst       |    2 +-
  docs/specs/vmgenid.txt            |    4 +-
  docs/system/arm/emulation.rst     |    2 +
  hw/scsi/mfi.h                     |    2 +-
  include/hw/display/xlnx_dp.h      |   12 +-
  linux-user/aarch64/target_prctl.h |   20 +-
  target/arm/cpu.h                  |   66 +-
  target/arm/internals.h            |   45 +-
  target/arm/kvm_arm.h              |    7 +-
  target/arm/sve_ldst_internal.h    |  221 +++
  target/arm/translate-a64.h        |    2 +-
  target/arm/translate.h            |    2 +-
  target/arm/vec_internal.h         |   28 +-
  target/i386/hvf/vmcs.h            |    2 +-
  target/i386/hvf/vmx.h             |    2 +-
  accel/hvf/hvf-accel-ops.c         |    4 +-
  accel/kvm/kvm-all.c               |    4 +-
  accel/tcg/user-exec.c             |    6 +-
  hw/acpi/ghes.c                    |    2 +-
  hw/arm/xlnx-zynqmp.c              |    4 +-
  hw/display/xlnx_dp.c              |   49 +-
  hw/intc/arm_gicv3_cpuif.c         |    2 +-
  hw/intc/arm_gicv3_dist.c          |    2 +-
  hw/intc/arm_gicv3_redist.c        |    4 +-
  hw/intc/riscv_aclint.c            |    2 +-
  hw/intc/riscv_aplic.c             |    2 +-
  hw/pci/shpc.c                     |    2 +-
  hw/sparc64/sun4u_iommu.c          |    2 +-
  hw/timer/sse-timer.c              |    2 +-
  linux-user/aarch64/signal.c       |    4 +-
  target/arm/arch_dump.c            |    2 +-
  target/arm/cpu.c                  |    5 +-
  target/arm/cpu64.c                |  120 +-
  target/arm/gdbstub.c              |    2 +-
  target/arm/gdbstub64.c            |    2 +-
  target/arm/helper.c               | 2742 ++-----------------------------------
  target/arm/hvf/hvf.c              |    4 +-
  target/arm/kvm64.c                |   47 +-
  target/arm/mve_helper.c           |    6 +-
  target/arm/ptw.c                  | 2540 ++++++++++++++++++++++++++++++++++
  target/arm/sve_helper.c           |  232 +---
  target/arm/tlb_helper.c           |   26 +
  target/arm/translate-a64.c        |    2 +-
  target/arm/translate-sve.c        |    2 +-
  target/arm/vec_helper.c           |   28 +-
  target/i386/cpu-sysemu.c          |    2 +-
  target/s390x/ioinst.c             |    2 +-
  python/qemu/machine/machine.py    |    2 +-
  target/arm/meson.build            |    1 +
  tests/tcg/x86_64/system/boot.S    |    2 +-
 files changed, 3240 insertions(+), 3037 deletions(-)
  create mode 100644 target/arm/sve_ldst_internal.h
  create mode 100644 target/arm/ptw.c

-New patch
+[PULL 01/68] target/alpha: Don't corrupt error_code with unknown softfloat flags
+In do_cvttq() we set env->error_code with what is supposed to be a
+set of FPCR exception bit values.  However, if the set of float
+exception flags we get back from softfloat for the conversion
+includes a flag which is not one of the three we expect here
+(invalid_cvti, invalid, inexact) then we will fall through the
+if-ladder and set env->error_code to the unconverted softfloat
+exception_flag value.  This will then cause us to take a spurious
+exception.
+This is harmless now, but when we add new floating point exception
+flags to softfloat it will cause problems.  Add an else clause to the
+if-ladder to make it ignore any float exception flags it doesn't care
+about.
+Specifically, without this fix, 'make check-tcg' will fail for Alpha
+when the commit adding float_flag_input_denormal_used lands.
+Fixes: aa3bad5b59e7 ("target/alpha: Use float64_to_int64_modulo for CVTTQ")
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+---
+ target/alpha/fpu_helper.c | 2 ++
+file changed, 2 insertions(+)
+diff --git a/target/alpha/fpu_helper.c b/target/alpha/fpu_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/alpha/fpu_helper.c
++++ b/target/alpha/fpu_helper.c
+@@ -XXX,XX +XXX,XX @@ static uint64_t do_cvttq(CPUAlphaState *env, uint64_t a, int roundmode)
+             exc = FPCR_INV;
+         } else if (exc & float_flag_inexact) {
+             exc = FPCR_INE;
++        } else {
++            exc = 0;
+         }
+     }
+     env->error_code = exc;
+--
+.34.1
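
The effect of the added else clause can be sketched in isolation. This is a minimal illustration, not the Alpha helper itself: the SF_* and HYP_FPCR_* constants are hypothetical placeholder values (not QEMU's real encodings), and map_exc_unfixed()/map_exc_fixed() are assumed names standing in for the tail of do_cvttq() before and after the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical softfloat-style flags; values chosen for illustration only. */
enum {
    SF_INVALID      = 1u << 0,
    SF_INVALID_CVTI = 1u << 1,
    SF_INEXACT      = 1u << 2,
    SF_NEW_FLAG     = 1u << 3, /* stands in for a flag added later */
};

/* Hypothetical guest-side exception bits; not the real FPCR layout. */
enum {
    HYP_FPCR_INV = 0x100,
    HYP_FPCR_IOV = 0x200,
    HYP_FPCR_INE = 0x400,
};

/* Ladder without a final else: an unrecognised flag leaks through raw. */
static uint32_t map_exc_unfixed(uint32_t exc)
{
    if (exc) {
        if (exc & SF_INVALID_CVTI) {
            exc = HYP_FPCR_IOV | HYP_FPCR_INE;
        } else if (exc & SF_INVALID) {
            exc = HYP_FPCR_INV;
        } else if (exc & SF_INEXACT) {
            exc = HYP_FPCR_INE;
        }
    }
    return exc;
}

/* Ladder with the final else: flags we do not care about map to 0. */
static uint32_t map_exc_fixed(uint32_t exc)
{
    if (exc) {
        if (exc & SF_INVALID_CVTI) {
            exc = HYP_FPCR_IOV | HYP_FPCR_INE;
        } else if (exc & SF_INVALID) {
            exc = HYP_FPCR_INV;
        } else if (exc & SF_INEXACT) {
            exc = HYP_FPCR_INE;
        } else {
            exc = 0;
        }
    }
    return exc;
}

/* Small self-check showing the difference between the two ladders. */
static void check_else_clause(void)
{
    /* Only the new flag set: unfixed returns the raw softfloat value. */
    assert(map_exc_unfixed(SF_NEW_FLAG) == SF_NEW_FLAG);
    assert(map_exc_fixed(SF_NEW_FLAG) == 0);
    /* A known flag still wins when the new flag is also present. */
    assert(map_exc_fixed(SF_INEXACT | SF_NEW_FLAG) == HYP_FPCR_INE);
}
```

In this sketch the unfixed ladder hands a raw softfloat bit to the caller as if it were a guest exception bit, which is the kind of value the commit message describes as causing a spurious exception.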

-[PULL 05/55] xlnx_dp: Introduce a vblank signal
+[PULL 02/68] fpu: Add float_class_denormal
-From: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
+Currently in softfloat we canonicalize input denormals and so the
 code that implements floating point operations does not need to care
 whether the input value was originally normal or denormal.  However,
 both x86 and Arm FEAT_AFP require that an exception flag is set if:
  * an input is denormal
  * that input is not squashed to zero
  * that input is actually used in the calculation (e.g. we
    did not find the other input was a NaN)
-Add a periodic timer which raises vblank at a frequency of 30Hz.
+So we need to track that the input was a non-squashed denormal.  To
 do this we add a new value to the FloatClass enum.  In this commit we
 add the value and adjust the code everywhere that looks at FloatClass
 values so that the new float_class_denormal behaves identically to
 float_class_normal.  We will add the code that does the "raise a new
 float exception flag if an input was an unsquashed denormal and we
 used it" in a subsequent commit.
-Note that this is a migration compatibility break for the
+There should be no behavioural change in this commit.
 xlnx-zcu102 board type.
-Signed-off-by: Sai Pavan Boddu <saipava@xilinx.com>
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-Signed-off-by: Frederic Konrad <fkonrad@amd.com>
-Acked-by: Alistair Francis <alistair.francis@wdc.com>
-Message-id: 20220601172353.3220232-3-fkonrad@xilinx.com
-Changes by fkonrad:
-  - Switched to transaction-based ptimer API.
-  - Added the DP_INT_VBLNK_START macro.
-Signed-off-by: Frederic Konrad <fkonrad@amd.com>
-[PMM: bump vmstate version, add commit message note about
- compat break]
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/display/xlnx_dp.h |  3 +++
+ fpu/softfloat.c           | 32 ++++++++++++++++++++++++++++---
- hw/display/xlnx_dp.c         | 30 ++++++++++++++++++++++++++----
+ fpu/softfloat-parts.c.inc | 40 ++++++++++++++++++++++++---------------
-files changed, 29 insertions(+), 4 deletions(-)
+files changed, 54 insertions(+), 18 deletions(-)
-diff --git a/include/hw/display/xlnx_dp.h b/include/hw/display/xlnx_dp.h
+diff --git a/fpu/softfloat.c b/fpu/softfloat.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/display/xlnx_dp.h
+--- a/fpu/softfloat.c
-+++ b/include/hw/display/xlnx_dp.h
++++ b/fpu/softfloat.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ float64_gen2(float64 xa, float64 xb, float_status *s,
- #include "hw/dma/xlnx_dpdma.h"
+ /*
- #include "audio/audio.h"
+  * Classify a floating point number. Everything above float_class_qnan
- #include "qom/object.h"
+  * is a NaN so cls >= float_class_qnan is any NaN.
-+#include "hw/ptimer.h"
++ *
++ * Note that we canonicalize denormals, so most code should treat
- #define AUD_CHBUF_MAX_DEPTH                 (32 * KiB)
++ * class_normal and class_denormal identically.
- #define MAX_QEMU_BUFFER_SIZE                (4 * KiB)
+  */
-@@ -XXX,XX +XXX,XX @@ struct XlnxDPState {
-      */
+ typedef enum __attribute__ ((__packed__)) {
-     DPCDState *dpcd;
+     float_class_unclassified,
-     I2CDDCState *edid;
+     float_class_zero,
-+
+     float_class_normal,
-+    ptimer_state *vblank;
++    float_class_denormal, /* input was a non-squashed denormal */
      float_class_inf,
      float_class_qnan,  /* all NaNs from here */
      float_class_snan,
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__ ((__packed__)) {
  enum {
      float_cmask_zero    = float_cmask(float_class_zero),
      float_cmask_normal  = float_cmask(float_class_normal),
 +    float_cmask_denormal = float_cmask(float_class_denormal),
      float_cmask_inf     = float_cmask(float_class_inf),
      float_cmask_qnan    = float_cmask(float_class_qnan),
      float_cmask_snan    = float_cmask(float_class_snan),
      float_cmask_infzero = float_cmask_zero | float_cmask_inf,
      float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
 +    float_cmask_anynorm = float_cmask_normal | float_cmask_denormal,
  };
- #define TYPE_XLNX_DP "xlnx.v-dp"
+ /* Flags for parts_minmax. */
-diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
+@@ -XXX,XX +XXX,XX @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
-index XXXXXXX..XXXXXXX 100644
+     return c == float_class_qnan;
 --- a/hw/display/xlnx_dp.c
 +++ b/hw/display/xlnx_dp.c
@@ -XXX,XX +XXX,XX @@
  #define DP_TX_N_AUD                         (0x032C >> 2)
  #define DP_TX_AUDIO_EXT_DATA(n)             ((0x0330 + 4 * n) >> 2)
  #define DP_INT_STATUS                       (0x03A0 >> 2)
 +#define DP_INT_VBLNK_START                  (1 << 13)
  #define DP_INT_MASK                         (0x03A4 >> 2)
  #define DP_INT_EN                           (0x03A8 >> 2)
  #define DP_INT_DS                           (0x03AC >> 2)
@@ -XXX,XX +XXX,XX @@ typedef enum DPVideoFmt DPVideoFmt;
  static const VMStateDescription vmstate_dp = {
      .name = TYPE_XLNX_DP,
 -    .version_id = 1,
 +    .version_id = 2,
      .fields = (VMStateField[]){
          VMSTATE_UINT32_ARRAY(core_registers, XlnxDPState,
                               DP_CORE_REG_ARRAY_SIZE),
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_dp = {
                               DP_VBLEND_REG_ARRAY_SIZE),
          VMSTATE_UINT32_ARRAY(audio_registers, XlnxDPState,
                               DP_AUDIO_REG_ARRAY_SIZE),
 +        VMSTATE_PTIMER(vblank, XlnxDPState),
          VMSTATE_END_OF_LIST()
      }
  };
 +#define DP_VBLANK_PTIMER_POLICY (PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | \
 +                                 PTIMER_POLICY_CONTINUOUS_TRIGGER |    \
 +                                 PTIMER_POLICY_NO_IMMEDIATE_TRIGGER)
 +
  static void xlnx_dp_update_irq(XlnxDPState *s);
  static uint64_t xlnx_dp_audio_read(void *opaque, hwaddr offset, unsigned size)
@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_write(void *opaque, hwaddr offset, uint64_t value,
          break;
      case DP_TRANSMITTER_ENABLE:
          s->core_registers[offset] = value & 0x01;
 +        ptimer_transaction_begin(s->vblank);
 +        if (value & 0x1) {
 +            ptimer_run(s->vblank, 0);
 +        } else {
 +            ptimer_stop(s->vblank);
 +        }
 +        ptimer_transaction_commit(s->vblank);
          break;
      case DP_FORCE_SCRAMBLER_RESET:
          /*
@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_update_display(void *opaque)
          return;
      }
 -    s->core_registers[DP_INT_STATUS] |= (1 << 13);
 -    xlnx_dp_update_irq(s);
 -
      xlnx_dpdma_trigger_vsync_irq(s->dpdma);
      /*
@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_finalize(Object *obj)
      fifo8_destroy(&s->rx_fifo);
  }
-+static void vblank_hit(void *opaque)
++/*
 + * Return true if the float_cmask has only normals in it
 + * (including input denormals that were canonicalized)
 + */
 +static inline bool cmask_is_only_normals(int cmask)
 +{
-+    XlnxDPState *s = XLNX_DP(opaque);
++    return !(cmask & ~float_cmask_anynorm);
 +
 +    s->core_registers[DP_INT_STATUS] |= DP_INT_VBLNK_START;
 +    xlnx_dp_update_irq(s);
 +}
 +
- static void xlnx_dp_realize(DeviceState *dev, Error **errp)
++static inline bool is_anynorm(FloatClass c)
 +{
 +    return float_cmask(c) & float_cmask_anynorm;
 +}
 +
  /*
   * Structure holding all of the decomposed parts of a float.
   * The exponent is unbiased and the fraction is normalized.
@@ -XXX,XX +XXX,XX @@ static float64 float64r32_round_pack_canonical(FloatParts64 *p,
       */
      switch (p->cls) {
      case float_class_normal:
 +    case float_class_denormal:
          if (unlikely(p->exp == 0)) {
              /*
               * The result is denormal for float32, but can be represented
@@ -XXX,XX +XXX,XX @@ static floatx80 floatx80_round_pack_canonical(FloatParts128 *p,
      switch (p->cls) {
      case float_class_normal:
 +    case float_class_denormal:
          if (s->floatx80_rounding_precision == floatx80_precision_x) {
              parts_uncanon_normal(p, s, fmt);
              frac = p->frac_hi;
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
          break;
      case float_class_normal:
 +    case float_class_denormal:
      case float_class_zero:
          break;
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
      a->sign = b->sign;
      a->exp = b->exp;
 -    if (a->cls == float_class_normal) {
 +    if (is_anynorm(a->cls)) {
          frac_truncjam(a, b);
      } else if (is_nan(a->cls)) {
          /* Discard the low bits of the NaN. */
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_int128_scalbn(float128 a, FloatRoundMode rmode,
          return int128_zero();
      case float_class_normal:
 +    case float_class_denormal:
          if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
              flags = float_flag_inexact;
          }
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_uint128_scalbn(float128 a, FloatRoundMode rmode,
          return int128_zero();
      case float_class_normal:
 +    case float_class_denormal:
          if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
              flags = float_flag_inexact;
              if (p.cls == float_class_zero) {
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
      float32_unpack_canonical(&xp, a, status);
      if (unlikely(xp.cls != float_class_normal)) {
          switch (xp.cls) {
 +        case float_class_denormal:
 +            break;
          case float_class_snan:
          case float_class_qnan:
              parts_return_nan(&xp, status);
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
          case float_class_zero:
              return float32_one;
          default:
 -            break;
 +            g_assert_not_reached();
          }
 -        g_assert_not_reached();
      }
      float_raise(float_flag_inexact, status);
 diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/fpu/softfloat-parts.c.inc
 +++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
              frac_clear(p);
          } else {
              int shift = frac_normalize(p);
 -            p->cls = float_class_normal;
 +            p->cls = float_class_denormal;
              p->exp = fmt->frac_shift - fmt->exp_bias
                     - shift + !fmt->m68k_denormal;
          }
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
  static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                              const FloatFmt *fmt)
  {
-     XlnxDPState *s = XLNX_DP(dev);
+-    if (likely(p->cls == float_class_normal)) {
-@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_realize(DeviceState *dev, Error **errp)
++    if (likely(is_anynorm(p->cls))) {
-                                            &as);
+         parts_uncanon_normal(p, s, fmt);
-     AUD_set_volume_out(s->amixer_output_stream, 0, 255, 255);
+     } else {
-     xlnx_dp_audio_activate(s);
+         switch (p->cls) {
-+    s->vblank = ptimer_init(vblank_hit, s, DP_VBLANK_PTIMER_POLICY);
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
-+    ptimer_transaction_begin(s->vblank);
-+    ptimer_set_freq(s->vblank, 30);
+     if (a->sign != b_sign) {
-+    ptimer_transaction_commit(s->vblank);
+         /* Subtraction */
- }
+-        if (likely(ab_mask == float_cmask_normal)) {
++        if (likely(cmask_is_only_normals(ab_mask))) {
- static void xlnx_dp_reset(DeviceState *dev)
+             if (parts_sub_normal(a, b)) {
                  return a;
              }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
          }
      } else {
          /* Addition */
 -        if (likely(ab_mask == float_cmask_normal)) {
 +        if (likely(cmask_is_only_normals(ab_mask))) {
              parts_add_normal(a, b);
              return a;
          }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
      }
      if (b->cls == float_class_zero) {
 -        g_assert(a->cls == float_class_normal);
 +        g_assert(is_anynorm(a->cls));
          return a;
      }
      g_assert(a->cls == float_class_zero);
 -    g_assert(b->cls == float_class_normal);
 +    g_assert(is_anynorm(b->cls));
   return_b:
      b->sign = b_sign;
      return b;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
      int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
      bool sign = a->sign ^ b->sign;
 -    if (likely(ab_mask == float_cmask_normal)) {
 +    if (likely(cmask_is_only_normals(ab_mask))) {
          FloatPartsW tmp;
          frac_mulw(&tmp, a, b);
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
          a->sign ^= 1;
      }
 -    if (unlikely(ab_mask != float_cmask_normal)) {
 +    if (unlikely(!cmask_is_only_normals(ab_mask))) {
          if (unlikely(ab_mask == float_cmask_infzero)) {
              float_raise(float_flag_invalid | float_flag_invalid_imz, s);
              goto d_nan;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
          }
          g_assert(ab_mask & float_cmask_zero);
 -        if (c->cls == float_class_normal) {
 +        if (is_anynorm(c->cls)) {
              *a = *c;
              goto return_normal;
          }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
      int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
      bool sign = a->sign ^ b->sign;
 -    if (likely(ab_mask == float_cmask_normal)) {
 +    if (likely(cmask_is_only_normals(ab_mask))) {
          a->sign = sign;
          a->exp -= b->exp + frac_div(a, b);
          return a;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
  {
      int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 -    if (likely(ab_mask == float_cmask_normal)) {
 +    if (likely(cmask_is_only_normals(ab_mask))) {
          frac_modrem(a, b, mod_quot);
          return a;
      }
@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
      if (unlikely(a->cls != float_class_normal)) {
          switch (a->cls) {
 +        case float_class_denormal:
 +            break;
          case float_class_snan:
          case float_class_qnan:
              parts_return_nan(a, status);
@@ -XXX,XX +XXX,XX @@ static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
      case float_class_inf:
          break;
      case float_class_normal:
 +    case float_class_denormal:
          if (parts_round_to_int_normal(a, rmode, scale, fmt->frac_size)) {
              float_raise(float_flag_inexact, s);
          }
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
          return 0;
      case float_class_normal:
 +    case float_class_denormal:
          /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
          if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
              flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static uint64_t partsN(float_to_uint)(FloatPartsN *p, FloatRoundMode rmode,
          return 0;
      case float_class_normal:
 +    case float_class_denormal:
          /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
          if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
              flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint_modulo)(FloatPartsN *p,
          return 0;
      case float_class_normal:
 +    case float_class_denormal:
          /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
          if (parts_round_to_int_normal(p, rmode, 0, N - 2)) {
              flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
      a_exp = a->exp;
      b_exp = b->exp;
 -    if (unlikely(ab_mask != float_cmask_normal)) {
 +    if (unlikely(!cmask_is_only_normals(ab_mask))) {
          switch (a->cls) {
          case float_class_normal:
 +        case float_class_denormal:
              break;
          case float_class_inf:
              a_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
          }
          switch (b->cls) {
          case float_class_normal:
 +        case float_class_denormal:
              break;
          case float_class_inf:
              b_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
  {
      int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 -    if (likely(ab_mask == float_cmask_normal)) {
 +    if (likely(cmask_is_only_normals(ab_mask))) {
          FloatRelation cmp;
          if (a->sign != b->sign) {
@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
      case float_class_inf:
          break;
      case float_class_normal:
 +    case float_class_denormal:
          a->exp += MIN(MAX(n, -0x10000), 0x10000);
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
      if (unlikely(a->cls != float_class_normal)) {
          switch (a->cls) {
 +        case float_class_denormal:
 +            break;
          case float_class_snan:
          case float_class_qnan:
              parts_return_nan(a, s);
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
              }
              return;
          default:
 -            break;
 +            g_assert_not_reached();
          }
 -        g_assert_not_reached();
      }
      if (unlikely(a->sign)) {
          goto d_nan;
 --
-.25.1
+.34.1

-[PULL 22/55] target/arm: Move get_phys_addr_lpae to ptw.c
+[PULL 03/68] fpu: Implement float_flag_input_denormal_used
-From: Richard Henderson <richard.henderson@linaro.org>
+For the x86 and the Arm FEAT_AFP semantics, we need to be able to
 tell the target code that the FPU operation has used an input
 denormal.  Implement this; when it happens we set the new
 float_flag_input_denormal_used.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Note that we only set this when an input denormal is actually used by
-Message-id: 20220604040607.269301-16-richard.henderson@linaro.org
+the operation: if the operation results in Invalid Operation or
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Divide By Zero or the result is a NaN because some other input was a
 NaN then we never needed to look at the input denormal and do not set
 input_denormal_used.
 We mostly do not need to adjust the hardfloat codepaths to deal with
 this flag, because almost all hardfloat operations are already gated
 on the input not being a denormal, and will fall back to softfloat
 for a denormal input.  The only exception is the comparison
 operations, where we need to add the check for input denormals, which
 must now fall back to softfloat where they did not before.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
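Reviewer note (not part of the commit): the "only set when the denormal
is actually used" rule from the commit message can be modelled for the
add/sub case as a single mask test. This is a minimal sketch, not the
softfloat code itself: the enum, the CMASK macros and the function
addsub_uses_input_denormal() are hypothetical stand-ins for the
float_class_*/float_cmask_* definitions, and it only mirrors the
"denormal present and no NaN present" condition used in partsN(addsub).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the softfloat class enum. */
enum cls { CLS_ZERO, CLS_NORMAL, CLS_DENORMAL, CLS_INF, CLS_QNAN, CLS_SNAN };

/* One bit per class, so the classes of several operands can be ORed. */
#define CMASK(c)        (1u << (c))
#define CMASK_DENORMAL  CMASK(CLS_DENORMAL)
#define CMASK_ANYNAN    (CMASK(CLS_QNAN) | CMASK(CLS_SNAN))

/*
 * Model of the add/sub rule: the flag is raised when at least one input
 * is a denormal and neither input is a NaN (a NaN operand means the
 * denormal is never looked at).
 */
static bool addsub_uses_input_denormal(enum cls a, enum cls b)
{
    unsigned ab_mask = CMASK(a) | CMASK(b);

    return (ab_mask & (CMASK_DENORMAL | CMASK_ANYNAN)) == CMASK_DENORMAL;
}

/* Small self-check covering the three interesting combinations. */
void addsub_selfcheck(void)
{
    assert(addsub_uses_input_denormal(CLS_DENORMAL, CLS_NORMAL));
    assert(!addsub_uses_input_denormal(CLS_DENORMAL, CLS_QNAN));
    assert(!addsub_uses_input_denormal(CLS_NORMAL, CLS_NORMAL));
}
```

Masking with both the denormal and the NaN bits and comparing against
the denormal bit alone folds the two checks into one comparison, which
is why a single test is enough in this model.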
- target/arm/ptw.h    |  10 ++
+ include/fpu/softfloat-types.h |  7 ++++
- target/arm/helper.c | 416 +-------------------------------------------
+ fpu/softfloat.c               | 38 +++++++++++++++++---
- target/arm/ptw.c    | 411 +++++++++++++++++++++++++++++++++++++++++++
+ fpu/softfloat-parts.c.inc     | 68 ++++++++++++++++++++++++++++++++++-
-files changed, 429 insertions(+), 408 deletions(-)
+files changed, 107 insertions(+), 6 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/include/fpu/softfloat-types.h
-+++ b/target/arm/ptw.h
++++ b/include/fpu/softfloat-types.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ enum {
+     float_flag_invalid_sqrt    = 0x0800,  /* sqrt(-x) */
- #ifndef CONFIG_USER_ONLY
+     float_flag_invalid_cvti    = 0x1000,  /* non-nan to integer */
+     float_flag_invalid_snan    = 0x2000,  /* any operand was snan */
-+extern const uint8_t pamax_map[7];
++    /*
-+
++     * An input was denormal and we used it (without flushing it to zero).
- uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
++     * Not set if we do not actually use the denormal input (e.g.
-                      ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi);
++     * because some other input was a NaN, or because the operation
- uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
++     * wasn't actually carried out (divide-by-zero; invalid))
-@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
++     */
-     return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
++    float_flag_input_denormal_used = 0x4000,
  };
  /*
 diff --git a/fpu/softfloat.c b/fpu/softfloat.c
 index XXXXXXX..XXXXXXX 100644
 --- a/fpu/softfloat.c
 +++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
                                    float16_params_ahp.frac_size + 1);
          break;
 -    case float_class_normal:
      case float_class_denormal:
 +        float_raise(float_flag_input_denormal_used, s);
 +        break;
 +    case float_class_normal:
      case float_class_zero:
          break;
@@ -XXX,XX +XXX,XX @@ static void parts64_float_to_float(FloatParts64 *a, float_status *s)
      if (is_nan(a->cls)) {
          parts_return_nan(a, s);
      }
 +    if (a->cls == float_class_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
  }
-+ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
+ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
-+                                   ARMMMUIdx mmu_idx);
+@@ -XXX,XX +XXX,XX @@ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
-+bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
+     if (is_nan(a->cls)) {
-+                        int inputsize, int stride, int outputsize);
+         parts_return_nan(a, s);
-+int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0);
+     }
-+int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
++    if (a->cls == float_class_denormal) {
-+               int ap, int ns, int xn, int pxn);
++        float_raise(float_flag_input_denormal_used, s);
-+
++    }
- bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+ }
-                         MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                         bool s1_is_el0,
+ #define parts_float_to_float(P, S) \
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
      a->sign = b->sign;
      a->exp = b->exp;
 -    if (is_anynorm(a->cls)) {
 +    switch (a->cls) {
 +    case float_class_denormal:
 +        float_raise(float_flag_input_denormal_used, s);
 +        /* fall through */
 +    case float_class_normal:
          frac_truncjam(a, b);
 -    } else if (is_nan(a->cls)) {
 +        break;
 +    case float_class_snan:
 +    case float_class_qnan:
          /* Discard the low bits of the NaN. */
          a->frac = b->frac_hi;
          parts_return_nan(a, s);
 +        break;
 +    default:
 +        break;
      }
  }
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_widen(FloatParts128 *a, FloatParts64 *b,
      if (is_nan(a->cls)) {
          parts_return_nan(a, s);
      }
 +    if (a->cls == float_class_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
  }
  float32 float16_to_float32(float16 a, bool ieee, float_status *s)
@@ -XXX,XX +XXX,XX @@ float32_hs_compare(float32 xa, float32 xb, float_status *s, bool is_quiet)
          goto soft;
      }
 -    float32_input_flush2(&ua.s, &ub.s, s);
 +    if (unlikely(float32_is_denormal(ua.s) || float32_is_denormal(ub.s))) {
 +        /* We may need to set the input_denormal_used flag */
 +        goto soft;
 +    }
 +
      if (isgreaterequal(ua.h, ub.h)) {
          if (isgreater(ua.h, ub.h)) {
              return float_relation_greater;
@@ -XXX,XX +XXX,XX @@ float64_hs_compare(float64 xa, float64 xb, float_status *s, bool is_quiet)
          goto soft;
      }
 -    float64_input_flush2(&ua.s, &ub.s, s);
 +    if (unlikely(float64_is_denormal(ua.s) || float64_is_denormal(ub.s))) {
 +        /* We may need to set the input_denormal_used flag */
 +        goto soft;
 +    }
 +
      if (isgreaterequal(ua.h, ub.h)) {
          if (isgreater(ua.h, ub.h)) {
              return float_relation_greater;
 diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/fpu/softfloat-parts.c.inc
-+++ b/target/arm/helper.c
++++ b/fpu/softfloat-parts.c.inc
-@@ -XXX,XX +XXX,XX @@ int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
-  * @xn:      XN (execute-never) bits
+     bool b_sign = b->sign ^ subtract;
-  * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
+     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
-  */
--static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
++    /*
-+int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
++     * For addition and subtraction, we will consume an
- {
++     * input denormal unless the other input is a NaN.
-     int prot = 0;
++     */
++    if ((ab_mask & (float_cmask_denormal | float_cmask_anynan)) ==
-@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
++        float_cmask_denormal) {
-  * @xn:      XN (execute-never) bit
++        float_raise(float_flag_input_denormal_used, s);
-  * @pxn:     PXN (privileged execute-never) bit
++    }
-  */
++
--static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
+     if (a->sign != b_sign) {
--                      int ap, int ns, int xn, int pxn)
+         /* Subtraction */
-+int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
+         if (likely(cmask_is_only_normals(ab_mask))) {
-+               int ap, int ns, int xn, int pxn)
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
- {
+     if (likely(cmask_is_only_normals(ab_mask))) {
-     bool is_user = regime_is_user(env, mmu_idx);
+         FloatPartsW tmp;
-     int prot_rw, user_rw;
-@@ -XXX,XX +XXX,XX @@ uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
++        if (ab_mask & float_cmask_denormal) {
-  * Returns true if the suggested S2 translation parameters are OK and
++            float_raise(float_flag_input_denormal_used, s);
   * false otherwise.
   */
 -static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
 -                               int inputsize, int stride, int outputsize)
 +bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
 +                        int inputsize, int stride, int outputsize)
  {
      const int grainsize = stride + 3;
      int startsizecheck;
@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
  #endif /* !CONFIG_USER_ONLY */
  /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
 -static const uint8_t pamax_map[] = {
 +const uint8_t pamax_map[] = {
      [0] = 32,
      [1] = 36,
      [2] = 40,
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
  }
  #ifndef CONFIG_USER_ONLY
 -static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
 -                                          ARMMMUIdx mmu_idx)
 +ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
 +                                   ARMMMUIdx mmu_idx)
  {
      uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
      uint32_t el = regime_el(env, mmu_idx);
@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
      };
  }
 -/**
 - * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
 - *
 - * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
 - * prot and page_size may not be filled in, and the populated fsr value provides
 - * information on why the translation aborted, in the format of a long-format
 - * DFSR/IFSR fault register, with the following caveats:
 - *  * the WnR bit is never set (the caller must do this).
 - *
 - * @env: CPUARMState
 - * @address: virtual address to get physical address for
 - * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
 - * @mmu_idx: MMU index indicating required translation regime
 - * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page table
 - *             walk), must be true if this is stage 2 of a stage 1+2 walk for an
 - *             EL0 access). If @mmu_idx is anything else, @s1_is_el0 is ignored.
 - * @phys_ptr: set to the physical address corresponding to the virtual address
 - * @attrs: set to the memory transaction attributes to use
 - * @prot: set to the permissions for the page containing phys_ptr
 - * @page_size_ptr: set to the size of the page containing phys_ptr
 - * @fi: set to fault info if the translation fails
 - * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
 - */
 -bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 -                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                        bool s1_is_el0,
 -                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
 -                        target_ulong *page_size_ptr,
 -                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 -{
 -    ARMCPU *cpu = env_archcpu(env);
 -    CPUState *cs = CPU(cpu);
 -    /* Read an LPAE long-descriptor translation table. */
 -    ARMFaultType fault_type = ARMFault_Translation;
 -    uint32_t level;
 -    ARMVAParameters param;
 -    uint64_t ttbr;
 -    hwaddr descaddr, indexmask, indexmask_grainsize;
 -    uint32_t tableattrs;
 -    target_ulong page_size;
 -    uint32_t attrs;
 -    int32_t stride;
 -    int addrsize, inputsize, outputsize;
 -    TCR *tcr = regime_tcr(env, mmu_idx);
 -    int ap, ns, xn, pxn;
 -    uint32_t el = regime_el(env, mmu_idx);
 -    uint64_t descaddrmask;
 -    bool aarch64 = arm_el_is_aa64(env, el);
 -    bool guarded = false;
 -
 -    /* TODO: This code does not support shareability levels. */
 -    if (aarch64) {
 -        int ps;
 -
 -        param = aa64_va_parameters(env, address, mmu_idx,
 -                                   access_type != MMU_INST_FETCH);
 -        level = 0;
 -
 -        /*
 -         * If TxSZ is programmed to a value larger than the maximum,
 -         * or smaller than the effective minimum, it is IMPLEMENTATION
 -         * DEFINED whether we behave as if the field were programmed
 -         * within bounds, or if a level 0 Translation fault is generated.
 -         *
 -         * With FEAT_LVA, fault on less than minimum becomes required,
 -         * so our choice is to always raise the fault.
 -         */
 -        if (param.tsz_oob) {
 -            fault_type = ARMFault_Translation;
 -            goto do_fault;
 -        }
 -
 -        addrsize = 64 - 8 * param.tbi;
 -        inputsize = 64 - param.tsz;
 -
 -        /*
 -         * Bound PS by PARANGE to find the effective output address size.
 -         * ID_AA64MMFR0 is a read-only register so values outside of the
 -         * supported mappings can be considered an implementation error.
 -         */
 -        ps = FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
 -        ps = MIN(ps, param.ps);
 -        assert(ps < ARRAY_SIZE(pamax_map));
 -        outputsize = pamax_map[ps];
 -    } else {
 -        param = aa32_va_parameters(env, address, mmu_idx);
 -        level = 1;
 -        addrsize = (mmu_idx == ARMMMUIdx_Stage2 ? 40 : 32);
 -        inputsize = addrsize - param.tsz;
 -        outputsize = 40;
 -    }
 -
 -    /*
 -     * We determined the region when collecting the parameters, but we
 -     * have not yet validated that the address is valid for the region.
 -     * Extract the top bits and verify that they all match select.
 -     *
 -     * For aa32, if inputsize == addrsize, then we have selected the
 -     * region by exclusion in aa32_va_parameters and there is no more
 -     * validation to do here.
 -     */
 -    if (inputsize < addrsize) {
 -        target_ulong top_bits = sextract64(address, inputsize,
 -                                           addrsize - inputsize);
 -        if (-top_bits != param.select) {
 -            /* The gap between the two regions is a Translation fault */
 -            fault_type = ARMFault_Translation;
 -            goto do_fault;
 -        }
 -    }
 -
 -    if (param.using64k) {
 -        stride = 13;
 -    } else if (param.using16k) {
 -        stride = 11;
 -    } else {
 -        stride = 9;
 -    }
 -
 -    /* Note that QEMU ignores shareability and cacheability attributes,
 -     * so we don't need to do anything with the SH, ORGN, IRGN fields
 -     * in the TTBCR.  Similarly, TTBCR:A1 selects whether we get the
 -     * ASID from TTBR0 or TTBR1, but QEMU's TLB doesn't currently
 -     * implement any ASID-like capability so we can ignore it (instead
 -     * we will always flush the TLB any time the ASID is changed).
 -     */
 -    ttbr = regime_ttbr(env, mmu_idx, param.select);
 -
 -    /* Here we should have set up all the parameters for the translation:
 -     * inputsize, ttbr, epd, stride, tbi
 -     */
 -
 -    if (param.epd) {
 -        /* Translation table walk disabled => Translation fault on TLB miss
 -         * Note: This is always 0 on 64-bit EL2 and EL3.
 -         */
 -        goto do_fault;
 -    }
 -
 -    if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
 -        /* The starting level depends on the virtual address size (which can
 -         * be up to 48 bits) and the translation granule size. It indicates
 -         * the number of strides (stride bits at a time) needed to
 -         * consume the bits of the input address. In the pseudocode this is:
 -         *  level = 4 - RoundUp((inputsize - grainsize) / stride)
 -         * where their 'inputsize' is our 'inputsize', 'grainsize' is
 -         * our 'stride + 3' and 'stride' is our 'stride'.
 -         * Applying the usual "rounded up m/n is (m+n-1)/n" and simplifying:
 -         * = 4 - (inputsize - stride - 3 + stride - 1) / stride
 -         * = 4 - (inputsize - 4) / stride;
 -         */
 -        level = 4 - (inputsize - 4) / stride;
 -    } else {
 -        /* For stage 2 translations the starting level is specified by the
 -         * VTCR_EL2.SL0 field (whose interpretation depends on the page size)
 -         */
 -        uint32_t sl0 = extract32(tcr->raw_tcr, 6, 2);
 -        uint32_t sl2 = extract64(tcr->raw_tcr, 33, 1);
 -        uint32_t startlevel;
 -        bool ok;
 -
 -        /* SL2 is RES0 unless DS=1 & 4kb granule. */
 -        if (param.ds && stride == 9 && sl2) {
 -            if (sl0 != 0) {
 -                level = 0;
 -                fault_type = ARMFault_Translation;
 -                goto do_fault;
 -            }
 -            startlevel = -1;
 -        } else if (!aarch64 || stride == 9) {
 -            /* AArch32 or 4KB pages */
 -            startlevel = 2 - sl0;
 -
 -            if (cpu_isar_feature(aa64_st, cpu)) {
 -                startlevel &= 3;
 -            }
 -        } else {
 -            /* 16KB or 64KB pages */
 -            startlevel = 3 - sl0;
 -        }
 -
 -        /* Check that the starting level is valid. */
 -        ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
 -                                inputsize, stride, outputsize);
 -        if (!ok) {
 -            fault_type = ARMFault_Translation;
 -            goto do_fault;
 -        }
 -        level = startlevel;
 -    }
 -
 -    indexmask_grainsize = MAKE_64BIT_MASK(0, stride + 3);
 -    indexmask = MAKE_64BIT_MASK(0, inputsize - (stride * (4 - level)));
 -
 -    /* Now we can extract the actual base address from the TTBR */
 -    descaddr = extract64(ttbr, 0, 48);
 -
 -    /*
 -     * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [5:2] of TTBR.
 -     *
 -     * Otherwise, if the base address is out of range, raise AddressSizeFault.
 -     * In the pseudocode, this is !IsZero(baseregister<47:outputsize>),
 -     * but we've just cleared the bits above 47, so simplify the test.
 -     */
 -    if (outputsize > 48) {
 -        descaddr |= extract64(ttbr, 2, 4) << 48;
 -    } else if (descaddr >> outputsize) {
 -        level = 0;
 -        fault_type = ARMFault_AddressSize;
 -        goto do_fault;
 -    }
 -
 -    /*
 -     * We rely on this masking to clear the RES0 bits at the bottom of the TTBR
 -     * and also to mask out CnP (bit 0) which could validly be non-zero.
 -     */
 -    descaddr &= ~indexmask;
 -
 -    /*
 -     * For AArch32, the address field in the descriptor goes up to bit 39
 -     * for both v7 and v8.  However, for v8 the SBZ bits [47:40] must be 0
 -     * or an AddressSize fault is raised.  So for v8 we extract those SBZ
 -     * bits as part of the address, which will be checked via outputsize.
 -     * For AArch64, the address field goes up to bit 47, or 49 with FEAT_LPA2;
 -     * the highest bits of a 52-bit output are placed elsewhere.
 -     */
 -    if (param.ds) {
 -        descaddrmask = MAKE_64BIT_MASK(0, 50);
 -    } else if (arm_feature(env, ARM_FEATURE_V8)) {
 -        descaddrmask = MAKE_64BIT_MASK(0, 48);
 -    } else {
 -        descaddrmask = MAKE_64BIT_MASK(0, 40);
 -    }
 -    descaddrmask &= ~indexmask_grainsize;
 -
 -    /* Secure accesses start with the page table in secure memory and
 -     * can be downgraded to non-secure at any step. Non-secure accesses
 -     * remain non-secure. We implement this by just ORing in the NSTable/NS
 -     * bits at each step.
 -     */
 -    tableattrs = regime_is_secure(env, mmu_idx) ? 0 : (1 << 4);
 -    for (;;) {
 -        uint64_t descriptor;
 -        bool nstable;
 -
 -        descaddr |= (address >> (stride * (4 - level))) & indexmask;
 -        descaddr &= ~7ULL;
 -        nstable = extract32(tableattrs, 4, 1);
 -        descriptor = arm_ldq_ptw(cs, descaddr, !nstable, mmu_idx, fi);
 -        if (fi->type != ARMFault_None) {
 -            goto do_fault;
 -        }
 -
 -        if (!(descriptor & 1) ||
 -            (!(descriptor & 2) && (level == 3))) {
 -            /* Invalid, or the Reserved level 3 encoding */
 -            goto do_fault;
 -        }
 -
 -        descaddr = descriptor & descaddrmask;
 -
 -        /*
 -         * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
 -         * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
 -         * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
 -         * raise AddressSizeFault.
 -         */
 -        if (outputsize > 48) {
 -            if (param.ds) {
 -                descaddr |= extract64(descriptor, 8, 2) << 50;
 -            } else {
 -                descaddr |= extract64(descriptor, 12, 4) << 48;
 -            }
 -        } else if (descaddr >> outputsize) {
 -            fault_type = ARMFault_AddressSize;
 -            goto do_fault;
 -        }
 -
 -        if ((descriptor & 2) && (level < 3)) {
 -            /* Table entry. The top five bits are attributes which may
 -             * propagate down through lower levels of the table (and
 -             * which are all arranged so that 0 means "no effect", so
 -             * we can gather them up by ORing in the bits at each level).
 -             */
 -            tableattrs |= extract64(descriptor, 59, 5);
 -            level++;
 -            indexmask = indexmask_grainsize;
 -            continue;
 -        }
 -        /*
 -         * Block entry at level 1 or 2, or page entry at level 3.
 -         * These are basically the same thing, although the number
 -         * of bits we pull in from the vaddr varies. Note that although
 -         * descaddrmask masks enough of the low bits of the descriptor
 -         * to give a correct page or table address, the address field
 -         * in a block descriptor is smaller; so we need to explicitly
 -         * clear the lower bits here before ORing in the low vaddr bits.
 -         */
 -        page_size = (1ULL << ((stride * (4 - level)) + 3));
 -        descaddr &= ~(page_size - 1);
 -        descaddr |= (address & (page_size - 1));
 -        /* Extract attributes from the descriptor */
 -        attrs = extract64(descriptor, 2, 10)
 -            | (extract64(descriptor, 52, 12) << 10);
 -
 -        if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 -            /* Stage 2 table descriptors do not include any attribute fields */
 -            break;
 -        }
 -        /* Merge in attributes from table descriptors */
 -        attrs |= nstable << 3; /* NS */
 -        guarded = extract64(descriptor, 50, 1);  /* GP */
 -        if (param.hpd) {
 -            /* HPD disables all the table attributes except NSTable.  */
 -            break;
 -        }
 -        attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
 -        /* The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
 -         * means "force PL1 access only", which means forcing AP[1] to 0.
 -         */
 -        attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
 -        attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
 -        break;
 -    }
 -    /* Here descaddr is the final physical address, and attributes
 -     * are all in attrs.
 -     */
 -    fault_type = ARMFault_AccessFlag;
 -    if ((attrs & (1 << 8)) == 0) {
 -        /* Access flag */
 -        goto do_fault;
 -    }
 -
 -    ap = extract32(attrs, 4, 2);
 -
 -    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 -        ns = mmu_idx == ARMMMUIdx_Stage2;
 -        xn = extract32(attrs, 11, 2);
 -        *prot = get_S2prot(env, ap, xn, s1_is_el0);
 -    } else {
 -        ns = extract32(attrs, 3, 1);
 -        xn = extract32(attrs, 12, 1);
 -        pxn = extract32(attrs, 11, 1);
 -        *prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
 -    }
 -
 -    fault_type = ARMFault_Permission;
 -    if (!(*prot & (1 << access_type))) {
 -        goto do_fault;
 -    }
 -
 -    if (ns) {
 -        /* The NS bit will (as required by the architecture) have no effect if
 -         * the CPU doesn't support TZ or this is a non-secure translation
 -         * regime, because the attribute will already be non-secure.
 -         */
 -        txattrs->secure = false;
 -    }
 -    /* When in aarch64 mode, and BTI is enabled, remember GP in the IOTLB.  */
 -    if (aarch64 && guarded && cpu_isar_feature(aa64_bti, cpu)) {
 -        arm_tlb_bti_gp(txattrs) = true;
 -    }
 -
 -    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 -        cacheattrs->is_s2_format = true;
 -        cacheattrs->attrs = extract32(attrs, 0, 4);
 -    } else {
 -        /* Index into MAIR registers for cache attributes */
 -        uint8_t attrindx = extract32(attrs, 0, 3);
 -        uint64_t mair = env->cp15.mair_el[regime_el(env, mmu_idx)];
 -        assert(attrindx <= 7);
 -        cacheattrs->is_s2_format = false;
 -        cacheattrs->attrs = extract64(mair, attrindx * 8, 8);
 -    }
 -
 -    /*
 -     * For FEAT_LPA2 and effective DS, the SH field in the attributes
 -     * was re-purposed for output address bits.  The SH attribute in
 -     * that case comes from TCR_ELx, which we extracted earlier.
 -     */
 -    if (param.ds) {
 -        cacheattrs->shareability = param.sh;
 -    } else {
 -        cacheattrs->shareability = extract32(attrs, 6, 2);
 -    }
 -
 -    *phys_ptr = descaddr;
 -    *page_size_ptr = page_size;
 -    return false;
 -
 -do_fault:
 -    fi->type = fault_type;
 -    fi->level = level;
 -    /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
 -    fi->stage2 = fi->s1ptw || (mmu_idx == ARMMMUIdx_Stage2 ||
 -                               mmu_idx == ARMMMUIdx_Stage2_S);
 -    fi->s1ns = mmu_idx == ARMMMUIdx_Stage2;
 -    return true;
 -}
 -
  hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
                                           MemTxAttrs *attrs)
  {
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ do_fault:
      return true;
  }
 +/**
 + * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
 + *
 + * Returns false if the translation was successful. Otherwise, phys_ptr,
 + * attrs, prot and page_size may not be filled in, and the populated fsr
 + * value provides information on why the translation aborted, in the format
 + * of a long-format DFSR/IFSR fault register, with the following caveat:
 + * the WnR bit is never set (the caller must do this).
 + *
 + * @env: CPUARMState
 + * @address: virtual address to get physical address for
 + * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
 + * @mmu_idx: MMU index indicating required translation regime
 + * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page
 + *             table walk), must be true if this is stage 2 of a stage 1+2
 + *             walk for an EL0 access. If @mmu_idx is anything else,
 + *             @s1_is_el0 is ignored.
 + * @phys_ptr: set to the physical address corresponding to the virtual address
 + * @attrs: set to the memory transaction attributes to use
 + * @prot: set to the permissions for the page containing phys_ptr
 + * @page_size_ptr: set to the size of the page containing phys_ptr
 + * @fi: set to fault info if the translation fails
 + * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
 + */
 +bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 +                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                        bool s1_is_el0,
 +                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
 +                        target_ulong *page_size_ptr,
 +                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 +{
 +    ARMCPU *cpu = env_archcpu(env);
 +    CPUState *cs = CPU(cpu);
 +    /* Read an LPAE long-descriptor translation table. */
 +    ARMFaultType fault_type = ARMFault_Translation;
 +    uint32_t level;
 +    ARMVAParameters param;
 +    uint64_t ttbr;
 +    hwaddr descaddr, indexmask, indexmask_grainsize;
 +    uint32_t tableattrs;
 +    target_ulong page_size;
 +    uint32_t attrs;
 +    int32_t stride;
 +    int addrsize, inputsize, outputsize;
 +    TCR *tcr = regime_tcr(env, mmu_idx);
 +    int ap, ns, xn, pxn;
 +    uint32_t el = regime_el(env, mmu_idx);
 +    uint64_t descaddrmask;
 +    bool aarch64 = arm_el_is_aa64(env, el);
 +    bool guarded = false;
 +
 +    /* TODO: This code does not support shareability levels. */
 +    if (aarch64) {
 +        int ps;
 +
 +        param = aa64_va_parameters(env, address, mmu_idx,
 +                                   access_type != MMU_INST_FETCH);
 +        level = 0;
 +
 +        /*
 +         * If TxSZ is programmed to a value larger than the maximum,
 +         * or smaller than the effective minimum, it is IMPLEMENTATION
 +         * DEFINED whether we behave as if the field were programmed
 +         * within bounds, or if a level 0 Translation fault is generated.
 +         *
 +         * With FEAT_LVA, fault on less than minimum becomes required,
 +         * so our choice is to always raise the fault.
 +         */
 +        if (param.tsz_oob) {
 +            fault_type = ARMFault_Translation;
 +            goto do_fault;
 +        }
 +
-+        addrsize = 64 - 8 * param.tbi;
+         frac_mulw(&tmp, a, b);
-+        inputsize = 64 - param.tsz;
+         frac_truncjam(a, &tmp);
-+
-+        /*
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
-+         * Bound PS by PARANGE to find the effective output address size.
+     }
-+         * ID_AA64MMFR0 is a read-only register so values outside of the
-+         * supported mappings can be considered an implementation error.
+     /* Multiply by 0 or Inf */
-+         */
++    if (ab_mask & float_cmask_denormal) {
-+        ps = FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
++        float_raise(float_flag_input_denormal_used, s);
-+        ps = MIN(ps, param.ps);
++    }
-+        assert(ps < ARRAY_SIZE(pamax_map));
++
-+        outputsize = pamax_map[ps];
+     if (ab_mask & float_cmask_inf) {
-+    } else {
+         a->cls = float_class_inf;
-+        param = aa32_va_parameters(env, address, mmu_idx);
+         a->sign = sign;
-+        level = 1;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
-+        addrsize = (mmu_idx == ARMMMUIdx_Stage2 ? 40 : 32);
+     if (flags & float_muladd_negate_result) {
-+        inputsize = addrsize - param.tsz;
+         a->sign ^= 1;
-+        outputsize = 40;
+     }
 +    }
 +
 +    /*
-+     * We determined the region when collecting the parameters, but we
++     * All result types except for "return the default NaN
-+     * have not yet validated that the address is valid for the region.
++     * because this is an Invalid Operation" go through here;
-+     * Extract the top bits and verify that they all match select.
++     * this matches the set of cases where we consumed a
-+     *
++     * denormal input.
 +     * For aa32, if inputsize == addrsize, then we have selected the
 +     * region by exclusion in aa32_va_parameters and there is no more
 +     * validation to do here.
 +     */
-+    if (inputsize < addrsize) {
++    if (abc_mask & float_cmask_denormal) {
-+        target_ulong top_bits = sextract64(address, inputsize,
++        float_raise(float_flag_input_denormal_used, s);
-+                                           addrsize - inputsize);
++    }
-+        if (-top_bits != param.select) {
+     return a;
-+            /* The gap between the two regions is a Translation fault */
-+            fault_type = ARMFault_Translation;
+  return_sub_zero:
-+            goto do_fault;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
      bool sign = a->sign ^ b->sign;
      if (likely(cmask_is_only_normals(ab_mask))) {
 +        if (ab_mask & float_cmask_denormal) {
 +            float_raise(float_flag_input_denormal_used, s);
 +        }
-+    }
+         a->sign = sign;
-+
+         a->exp -= b->exp + frac_div(a, b);
-+    if (param.using64k) {
+         return a;
-+        stride = 13;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
-+    } else if (param.using16k) {
+         return parts_pick_nan(a, b, s);
-+        stride = 11;
+     }
-+    } else {
-+        stride = 9;
++    if ((ab_mask & float_cmask_denormal) && b->cls != float_class_zero) {
-+    }
++        float_raise(float_flag_input_denormal_used, s);
-+
++    }
-+    /*
++
-+     * Note that QEMU ignores shareability and cacheability attributes,
+     a->sign = sign;
-+     * so we don't need to do anything with the SH, ORGN, IRGN fields
-+     * in the TTBCR.  Similarly, TTBCR:A1 selects whether we get the
+     /* Inf / X */
-+     * ASID from TTBR0 or TTBR1, but QEMU's TLB doesn't currently
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
-+     * implement any ASID-like capability so we can ignore it (instead
+     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
-+     * we will always flush the TLB any time the ASID is changed).
-+     */
+     if (likely(cmask_is_only_normals(ab_mask))) {
-+    ttbr = regime_ttbr(env, mmu_idx, param.select);
++        if (ab_mask & float_cmask_denormal) {
-+
++            float_raise(float_flag_input_denormal_used, s);
-+    /*
++        }
-+     * Here we should have set up all the parameters for the translation:
+         frac_modrem(a, b, mod_quot);
-+     * inputsize, ttbr, epd, stride, tbi
+         return a;
-+     */
+     }
-+
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
-+    if (param.epd) {
+         return a;
-+        /*
+     }
-+         * Translation table walk disabled => Translation fault on TLB miss
-+         * Note: This is always 0 on 64-bit EL2 and EL3.
++    if (ab_mask & float_cmask_denormal) {
-+         */
++        float_raise(float_flag_input_denormal_used, s);
-+        goto do_fault;
++    }
-+    }
++
-+
+     /* N % Inf; 0 % N */
-+    if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
+     g_assert(b->cls == float_class_inf || a->cls == float_class_zero);
-+        /*
+     return a;
-+         * The starting level depends on the virtual address size (which can
+@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
-+         * be up to 48 bits) and the translation granule size. It indicates
+     if (unlikely(a->cls != float_class_normal)) {
-+         * the number of strides (stride bits at a time) needed to
+         switch (a->cls) {
-+         * consume the bits of the input address. In the pseudocode this is:
+         case float_class_denormal:
-+         *  level = 4 - RoundUp((inputsize - grainsize) / stride)
++            if (!a->sign) {
-+         * where their 'inputsize' is our 'inputsize', 'grainsize' is
++                /* -ve denormal will be InvalidOperation */
-+         * our 'stride + 3' and 'stride' is our 'stride'.
++                float_raise(float_flag_input_denormal_used, status);
 +         * Applying the usual "rounded up m/n is (m+n-1)/n" and simplifying:
 +         * = 4 - (inputsize - stride - 3 + stride - 1) / stride
 +         * = 4 - (inputsize - 4) / stride;
 +         */
 +        level = 4 - (inputsize - 4) / stride;
 +    } else {
 +        /*
 +         * For stage 2 translations the starting level is specified by the
 +         * VTCR_EL2.SL0 field (whose interpretation depends on the page size)
 +         */
 +        uint32_t sl0 = extract32(tcr->raw_tcr, 6, 2);
 +        uint32_t sl2 = extract64(tcr->raw_tcr, 33, 1);
 +        uint32_t startlevel;
 +        bool ok;
 +
 +        /* SL2 is RES0 unless DS=1 & 4kb granule. */
 +        if (param.ds && stride == 9 && sl2) {
 +            if (sl0 != 0) {
 +                level = 0;
 +                fault_type = ARMFault_Translation;
 +                goto do_fault;
 +            }
-+            startlevel = -1;
+             break;
-+        } else if (!aarch64 || stride == 9) {
+         case float_class_snan:
-+            /* AArch32 or 4KB pages */
+         case float_class_qnan:
-+            startlevel = 2 - sl0;
+@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
-+
+         if ((flags & (minmax_isnum | minmax_isnumber))
-+            if (cpu_isar_feature(aa64_st, cpu)) {
+             && !(ab_mask & float_cmask_snan)
-+                startlevel &= 3;
+             && (ab_mask & ~float_cmask_qnan)) {
 +            if (ab_mask & float_cmask_denormal) {
 +                float_raise(float_flag_input_denormal_used, s);
 +            }
-+        } else {
+             return is_nan(a->cls) ? b : a;
-+            /* 16KB or 64KB pages */
+         }
-+            startlevel = 3 - sl0;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
          return parts_pick_nan(a, b, s);
      }
 +    if (ab_mask & float_cmask_denormal) {
 +        float_raise(float_flag_input_denormal_used, s);
 +    }
 +
      a_exp = a->exp;
      b_exp = b->exp;
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
      if (likely(cmask_is_only_normals(ab_mask))) {
          FloatRelation cmp;
 +        if (ab_mask & float_cmask_denormal) {
 +            float_raise(float_flag_input_denormal_used, s);
 +        }
 +
-+        /* Check that the starting level is valid. */
+         if (a->sign != b->sign) {
-+        ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
+             goto a_sign;
-+                                inputsize, stride, outputsize);
+         }
-+        if (!ok) {
+@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
-+            fault_type = ARMFault_Translation;
+         return float_relation_unordered;
-+            goto do_fault;
+     }
-+        }
-+        level = startlevel;
++    if (ab_mask & float_cmask_denormal) {
-+    }
++        float_raise(float_flag_input_denormal_used, s);
-+
++    }
-+    indexmask_grainsize = MAKE_64BIT_MASK(0, stride + 3);
++
-+    indexmask = MAKE_64BIT_MASK(0, inputsize - (stride * (4 - level)));
+     if (ab_mask & float_cmask_zero) {
-+
+         if (ab_mask == float_cmask_zero) {
-+    /* Now we can extract the actual base address from the TTBR */
+             return float_relation_equal;
-+    descaddr = extract64(ttbr, 0, 48);
+@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
-+
+     case float_class_zero:
-+    /*
+     case float_class_inf:
-+     * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [5:2] of TTBR.
+         break;
-+     *
+-    case float_class_normal:
-+     * Otherwise, if the base address is out of range, raise AddressSizeFault.
+     case float_class_denormal:
-+     * In the pseudocode, this is !IsZero(baseregister<47:outputsize>),
++        float_raise(float_flag_input_denormal_used, s);
-+     * but we've just cleared the bits above 47, so simplify the test.
++        /* fall through */
-+     */
++    case float_class_normal:
-+    if (outputsize > 48) {
+         a->exp += MIN(MAX(n, -0x10000), 0x10000);
-+        descaddr |= extract64(ttbr, 2, 4) << 48;
+         break;
-+    } else if (descaddr >> outputsize) {
+     default:
-+        level = 0;
+@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
-+        fault_type = ARMFault_AddressSize;
+     if (unlikely(a->cls != float_class_normal)) {
-+        goto do_fault;
+         switch (a->cls) {
-+    }
+         case float_class_denormal:
-+
++            if (!a->sign) {
-+    /*
++                /* -ve denormal will be InvalidOperation */
-+     * We rely on this masking to clear the RES0 bits at the bottom of the TTBR
++                float_raise(float_flag_input_denormal_used, s);
 +     * and also to mask out CnP (bit 0) which could validly be non-zero.
 +     */
 +    descaddr &= ~indexmask;
 +
 +    /*
 +     * For AArch32, the address field in the descriptor goes up to bit 39
 +     * for both v7 and v8.  However, for v8 the SBZ bits [47:40] must be 0
 +     * or an AddressSize fault is raised.  So for v8 we extract those SBZ
 +     * bits as part of the address, which will be checked via outputsize.
 +     * For AArch64, the address field goes up to bit 47, or 49 with FEAT_LPA2;
 +     * the highest bits of a 52-bit output are placed elsewhere.
 +     */
 +    if (param.ds) {
 +        descaddrmask = MAKE_64BIT_MASK(0, 50);
 +    } else if (arm_feature(env, ARM_FEATURE_V8)) {
 +        descaddrmask = MAKE_64BIT_MASK(0, 48);
 +    } else {
 +        descaddrmask = MAKE_64BIT_MASK(0, 40);
 +    }
 +    descaddrmask &= ~indexmask_grainsize;
 +
 +    /*
 +     * Secure accesses start with the page table in secure memory and
 +     * can be downgraded to non-secure at any step. Non-secure accesses
 +     * remain non-secure. We implement this by just ORing in the NSTable/NS
 +     * bits at each step.
 +     */
 +    tableattrs = regime_is_secure(env, mmu_idx) ? 0 : (1 << 4);
 +    for (;;) {
 +        uint64_t descriptor;
 +        bool nstable;
 +
 +        descaddr |= (address >> (stride * (4 - level))) & indexmask;
 +        descaddr &= ~7ULL;
 +        nstable = extract32(tableattrs, 4, 1);
 +        descriptor = arm_ldq_ptw(cs, descaddr, !nstable, mmu_idx, fi);
 +        if (fi->type != ARMFault_None) {
 +            goto do_fault;
 +        }
 +
 +        if (!(descriptor & 1) ||
 +            (!(descriptor & 2) && (level == 3))) {
 +            /* Invalid, or the Reserved level 3 encoding */
 +            goto do_fault;
 +        }
 +
 +        descaddr = descriptor & descaddrmask;
 +
 +        /*
 +         * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
 +         * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
 +         * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
 +         * raise AddressSizeFault.
 +         */
 +        if (outputsize > 48) {
 +            if (param.ds) {
 +                descaddr |= extract64(descriptor, 8, 2) << 50;
 +            } else {
 +                descaddr |= extract64(descriptor, 12, 4) << 48;
 +            }
-+        } else if (descaddr >> outputsize) {
+             break;
-+            fault_type = ARMFault_AddressSize;
+         case float_class_snan:
-+            goto do_fault;
+         case float_class_qnan:
 +        }
 +
 +        if ((descriptor & 2) && (level < 3)) {
 +            /*
 +             * Table entry. The top five bits are attributes which may
 +             * propagate down through lower levels of the table (and
 +             * which are all arranged so that 0 means "no effect", so
 +             * we can gather them up by ORing in the bits at each level).
 +             */
 +            tableattrs |= extract64(descriptor, 59, 5);
 +            level++;
 +            indexmask = indexmask_grainsize;
 +            continue;
 +        }
 +        /*
 +         * Block entry at level 1 or 2, or page entry at level 3.
 +         * These are basically the same thing, although the number
 +         * of bits we pull in from the vaddr varies. Note that although
 +         * descaddrmask masks enough of the low bits of the descriptor
 +         * to give a correct page or table address, the address field
 +         * in a block descriptor is smaller; so we need to explicitly
 +         * clear the lower bits here before ORing in the low vaddr bits.
 +         */
 +        page_size = (1ULL << ((stride * (4 - level)) + 3));
 +        descaddr &= ~(page_size - 1);
 +        descaddr |= (address & (page_size - 1));
 +        /* Extract attributes from the descriptor */
 +        attrs = extract64(descriptor, 2, 10)
 +            | (extract64(descriptor, 52, 12) << 10);
 +
 +        if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 +            /* Stage 2 table descriptors do not include any attribute fields */
 +            break;
 +        }
 +        /* Merge in attributes from table descriptors */
 +        attrs |= nstable << 3; /* NS */
 +        guarded = extract64(descriptor, 50, 1);  /* GP */
 +        if (param.hpd) {
 +            /* HPD disables all the table attributes except NSTable.  */
 +            break;
 +        }
 +        attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
 +        /*
 +         * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
 +         * means "force PL1 access only", which means forcing AP[1] to 0.
 +         */
 +        attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
 +        attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
 +        break;
 +    }
 +    /*
 +     * Here descaddr is the final physical address, and attributes
 +     * are all in attrs.
 +     */
 +    fault_type = ARMFault_AccessFlag;
 +    if ((attrs & (1 << 8)) == 0) {
 +        /* Access flag */
 +        goto do_fault;
 +    }
 +
 +    ap = extract32(attrs, 4, 2);
 +
 +    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 +        ns = mmu_idx == ARMMMUIdx_Stage2;
 +        xn = extract32(attrs, 11, 2);
 +        *prot = get_S2prot(env, ap, xn, s1_is_el0);
 +    } else {
 +        ns = extract32(attrs, 3, 1);
 +        xn = extract32(attrs, 12, 1);
 +        pxn = extract32(attrs, 11, 1);
 +        *prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
 +    }
 +
 +    fault_type = ARMFault_Permission;
 +    if (!(*prot & (1 << access_type))) {
 +        goto do_fault;
 +    }
 +
 +    if (ns) {
 +        /*
 +         * The NS bit will (as required by the architecture) have no effect if
 +         * the CPU doesn't support TZ or this is a non-secure translation
 +         * regime, because the attribute will already be non-secure.
 +         */
 +        txattrs->secure = false;
 +    }
 +    /* When in aarch64 mode, and BTI is enabled, remember GP in the IOTLB.  */
 +    if (aarch64 && guarded && cpu_isar_feature(aa64_bti, cpu)) {
 +        arm_tlb_bti_gp(txattrs) = true;
 +    }
 +
 +    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 +        cacheattrs->is_s2_format = true;
 +        cacheattrs->attrs = extract32(attrs, 0, 4);
 +    } else {
 +        /* Index into MAIR registers for cache attributes */
 +        uint8_t attrindx = extract32(attrs, 0, 3);
 +        uint64_t mair = env->cp15.mair_el[regime_el(env, mmu_idx)];
 +        assert(attrindx <= 7);
 +        cacheattrs->is_s2_format = false;
 +        cacheattrs->attrs = extract64(mair, attrindx * 8, 8);
 +    }
 +
 +    /*
 +     * For FEAT_LPA2 and effective DS, the SH field in the attributes
 +     * was re-purposed for output address bits.  The SH attribute in
 +     * that case comes from TCR_ELx, which we extracted earlier.
 +     */
 +    if (param.ds) {
 +        cacheattrs->shareability = param.sh;
 +    } else {
 +        cacheattrs->shareability = extract32(attrs, 6, 2);
 +    }
 +
 +    *phys_ptr = descaddr;
 +    *page_size_ptr = page_size;
 +    return false;
 +
 +do_fault:
 +    fi->type = fault_type;
 +    fi->level = level;
 +    /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
 +    fi->stage2 = fi->s1ptw || (mmu_idx == ARMMMUIdx_Stage2 ||
 +                               mmu_idx == ARMMMUIdx_Stage2_S);
 +    fi->s1ns = mmu_idx == ARMMMUIdx_Stage2;
 +    return true;
 +}
 +
  static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
                                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
                                   hwaddr *phys_ptr, int *prot,
 --
-.25.1
+.34.1
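
The stage 1 walk comment in the patch above simplifies the pseudocode's starting-level formula to `level = 4 - (inputsize - 4) / stride`. As a quick sanity check, the sketch below compares the two forms over a range of input sizes. The helper names (`start_level_simplified`, `start_level_pseudocode`, `start_level_demo`) and the tested range are assumptions made for illustration; they are not part of QEMU.

```c
#include <assert.h>

/* Hypothetical helper: the simplified form used in the patch. */
static int start_level_simplified(int inputsize, int stride)
{
    return 4 - (inputsize - 4) / stride;
}

/* Hypothetical helper: 4 - RoundUp((inputsize - grainsize) / stride). */
static int start_level_pseudocode(int inputsize, int stride)
{
    int grainsize = stride + 3;
    int n = inputsize - grainsize;
    return 4 - (n + stride - 1) / stride;   /* integer ceiling division */
}

static void start_level_demo(void)
{
    /* stride is 9, 11, 13 for the 4K, 16K and 64K granules */
    static const int strides[] = { 9, 11, 13 };

    for (int i = 0; i < 3; i++) {
        for (int inputsize = 25; inputsize <= 48; inputsize++) {
            assert(start_level_simplified(inputsize, strides[i]) ==
                   start_level_pseudocode(inputsize, strides[i]));
        }
    }
    /* 48-bit input: 4K granule starts at level 0, 64K at level 1 */
    assert(start_level_simplified(48, 9) == 0);
    assert(start_level_simplified(48, 13) == 1);
    /* 39-bit input with a 4K granule starts at level 1 */
    assert(start_level_simplified(39, 9) == 1);
}
```

The two forms agree because `grainsize` is `stride + 3`, so the ceiling-division numerator `inputsize - grainsize + stride - 1` collapses to `inputsize - 4`.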

-[PULL 03/55] Fix 'writeable' typos
+[PULL 04/68] fpu: allow flushing of output denormals to be after rounding
-We have about 30 instances of the typo/variant spelling 'writeable',
+Currently we handle flushing of output denormals in uncanon_normal
-and over 500 of the more common 'writable'.  Standardize on the
+always before we deal with rounding.  This works for architectures
-latter.
+that detect tininess before rounding, but is usually not the right
+place when the architecture detects tininess after rounding.  For
-Change produced with:
+example, for x86 the SDM states that the MXCSR FTZ control bit causes
+outputs to be flushed to zero "when it detects a floating-point
-  sed -i -e 's/\([Ww][Rr][Ii][Tt]\)[Ee]\([Aa][Bb][Ll][Ee]\)/\1\2/g' $(git grep -il writeable)
+underflow condition".  This means that we mustn't flush to zero if
+the input is such that after rounding it is no longer tiny.
-and then hand-undoing the instance in linux-headers/linux/kvm.h.
+At least one of our guest architectures does underflow detection
-Most of these changes are in comments or documentation; the
+after rounding but flushing of denormals before rounding (MIPS MSA);
-exceptions are:
+this means we need to have a config knob for this that is separate
- * a local variable in accel/hvf/hvf-accel-ops.c
+from our existing tininess_before_rounding setting.
- * a local variable in accel/kvm/kvm-all.c
- * the PMCR_WRITABLE_MASK macro in target/arm/internals.h
+Add an ftz_detection flag.  For consistency with
- * the EPT_VIOLATION_GPA_WRITABLE macro in target/i386/hvf/vmcs.h
+tininess_before_rounding, we make it default to "detect ftz after
-   (which is never used anywhere)
+rounding"; this means that we need to explicitly set the flag to
- * the AR_TYPE_WRITABLE_MASK macro in target/i386/hvf/vmx.h
+"detect ftz before rounding" on every existing architecture that sets
-   (which is never used anywhere)
+flush_to_zero, so that this commit has no behaviour change.
 (This means more code change here but for the long term a less
 confusing API.)
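
To make the before/after distinction concrete, here is a minimal sketch of a case where the two settings give different answers. It is a toy model, not the softfloat code: magnitudes are integers in units of 2^-152, and `FtzDetection`, `round_unbounded`, `ftz_result` and `ftz_demo` are hypothetical names standing in for the real enum and the logic in uncanon_normal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the FloatFTZDetection enum in this patch. */
typedef enum {
    FTZ_AFTER_ROUNDING = 0,
    FTZ_BEFORE_ROUNDING = 1,
} FtzDetection;

/* Toy float32 model: magnitudes are integer multiples of 2^-152. */
#define MIN_NORMAL (UINT64_C(1) << 26)  /* 2^-126, smallest normal float32 */
#define PRECISION  24                   /* float32 significand bits */

/*
 * Round to PRECISION significant bits, nearest-even, as if the exponent
 * range were unbounded. Intended for small inputs only.
 */
static uint64_t round_unbounded(uint64_t v)
{
    int bits = 0;
    for (uint64_t t = v; t != 0; t >>= 1) {
        bits++;
    }
    if (bits <= PRECISION) {
        return v;               /* already exactly representable */
    }
    int shift = bits - PRECISION;
    uint64_t half = UINT64_C(1) << (shift - 1);
    uint64_t q = v >> shift;
    uint64_t rem = v & ((UINT64_C(1) << shift) - 1);
    if (rem > half || (rem == half && (q & 1))) {
        q++;                    /* may carry into the next binade */
    }
    return q << shift;
}

/* Result of an operation with flush-to-zero enabled. */
static uint64_t ftz_result(uint64_t exact, FtzDetection mode)
{
    uint64_t rounded = round_unbounded(exact);
    bool tiny = (mode == FTZ_BEFORE_ROUNDING) ? exact < MIN_NORMAL
                                              : rounded < MIN_NORMAL;
    return tiny ? 0 : rounded;
}

static void ftz_demo(void)
{
    uint64_t just_below = MIN_NORMAL - 1;   /* 2^-126 - 2^-152 */

    /* Tiny before rounding, so it is flushed to zero. */
    assert(ftz_result(just_below, FTZ_BEFORE_ROUNDING) == 0);
    /* Rounds up to the minimum normal, so it is no longer tiny. */
    assert(ftz_result(just_below, FTZ_AFTER_ROUNDING) == MIN_NORMAL);
}
```

An exact result just below the minimum normal is the only kind of input where the setting matters; anything that is still tiny after rounding is flushed under both settings.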
 For several architectures the current behaviour is either
 definitely or possibly wrong; annotate those with TODO comments.
 These architectures are definitely wrong (and should detect
 ftz after rounding):
  * x86
  * Alpha
 For these architectures the spec is unclear:
  * MIPS (for non-MSA)
  * RX
  * SH4
 PA-RISC makes ftz detection IMPDEF, but we aren't setting the
 "tininess before rounding" setting that we ought to.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Stefan Weil <sw@weilnetz.de>
 Message-id: 20220505095015.2714666-1-peter.maydell@linaro.org
 ---
- docs/interop/vhost-user.rst    | 2 +-
+ include/fpu/softfloat-helpers.h | 11 +++++++++++
- docs/specs/vmgenid.txt         | 4 ++--
+ include/fpu/softfloat-types.h   | 18 ++++++++++++++++++
- hw/scsi/mfi.h                  | 2 +-
+ target/mips/fpu_helper.h        |  6 ++++++
- target/arm/internals.h         | 4 ++--
+ target/alpha/cpu.c              |  7 +++++++
- target/i386/hvf/vmcs.h         | 2 +-
+ target/arm/cpu.c                |  1 +
- target/i386/hvf/vmx.h          | 2 +-
+ target/hppa/fpu_helper.c        | 11 +++++++++++
- accel/hvf/hvf-accel-ops.c      | 4 ++--
+ target/i386/tcg/fpu_helper.c    |  8 ++++++++
- accel/kvm/kvm-all.c            | 4 ++--
+ target/mips/msa.c               |  9 +++++++++
- accel/tcg/user-exec.c          | 6 +++---
+ target/ppc/cpu_init.c           |  3 +++
- hw/acpi/ghes.c                 | 2 +-
+ target/rx/cpu.c                 |  8 ++++++++
- hw/intc/arm_gicv3_cpuif.c      | 2 +-
+ target/sh4/cpu.c                |  8 ++++++++
- hw/intc/arm_gicv3_dist.c       | 2 +-
+ target/tricore/helper.c         |  1 +
- hw/intc/arm_gicv3_redist.c     | 4 ++--
+ tests/fp/fp-bench.c             |  1 +
- hw/intc/riscv_aclint.c         | 2 +-
+ fpu/softfloat-parts.c.inc       | 21 +++++++++++++++------
- hw/intc/riscv_aplic.c          | 2 +-
+files changed, 107 insertions(+), 6 deletions(-)
- hw/pci/shpc.c                  | 2 +-
- hw/sparc64/sun4u_iommu.c       | 2 +-
+diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
- hw/timer/sse-timer.c           | 2 +-
+index XXXXXXX..XXXXXXX 100644
- target/arm/gdbstub.c           | 2 +-
+--- a/include/fpu/softfloat-helpers.h
- target/arm/helper.c            | 4 ++--
++++ b/include/fpu/softfloat-helpers.h
- target/arm/hvf/hvf.c           | 4 ++--
+@@ -XXX,XX +XXX,XX @@ static inline void set_flush_inputs_to_zero(bool val, float_status *status)
- target/i386/cpu-sysemu.c       | 2 +-
+     status->flush_inputs_to_zero = val;
- target/s390x/ioinst.c          | 2 +-
+ }
- python/qemu/machine/machine.py | 2 +-
- tests/tcg/x86_64/system/boot.S | 2 +-
++static inline void set_float_ftz_detection(FloatFTZDetection d,
-files changed, 34 insertions(+), 34 deletions(-)
++                                           float_status *status)
++{
-diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
++    status->ftz_detection = d;
-index XXXXXXX..XXXXXXX 100644
++}
---- a/docs/interop/vhost-user.rst
++
-+++ b/docs/interop/vhost-user.rst
+ static inline void set_default_nan_mode(bool val, float_status *status)
-@@ -XXX,XX +XXX,XX @@ Virtio device config space
+ {
- :size: a 32-bit configuration space access size in bytes
+     status->default_nan_mode = val;
+@@ -XXX,XX +XXX,XX @@ static inline bool get_default_nan_mode(const float_status *status)
- :flags: a 32-bit value:
+     return status->default_nan_mode;
--  - 0: Vhost front-end messages used for writeable fields
+ }
-+  - 0: Vhost front-end messages used for writable fields
-   - 1: Vhost front-end messages used for live migration
++static inline FloatFTZDetection get_float_ftz_detection(const float_status *status)
++{
- :payload: Size bytes array holding the contents of the virtio
++    return status->ftz_detection;
-diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
++}
-index XXXXXXX..XXXXXXX 100644
++
---- a/docs/specs/vmgenid.txt
+ #endif /* SOFTFLOAT_HELPERS_H */
-+++ b/docs/specs/vmgenid.txt
+diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
-@@ -XXX,XX +XXX,XX @@ change the contents of the memory at runtime, specifically when starting a
+index XXXXXXX..XXXXXXX 100644
- backed-up or snapshotted image.  In order to do this, QEMU must know the
+--- a/include/fpu/softfloat-types.h
- address that has been allocated.
++++ b/include/fpu/softfloat-types.h
+@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
--The mechanism chosen for this memory sharing is writeable fw_cfg blobs.
+     float_infzeronan_suppress_invalid = (1 << 7),
-+The mechanism chosen for this memory sharing is writable fw_cfg blobs.
+ } FloatInfZeroNaNRule;
- These are data objects that are visible to both QEMU and guests, and are
- addressable as sequential files.
++/*
++ * When flush_to_zero is set, should we detect denormal results to
-@@ -XXX,XX +XXX,XX @@ Two fw_cfg blobs are used in this case:
++ * be flushed before or after rounding? For most architectures this
- /etc/vmgenid_guid - contains the actual VM Generation ID GUID
++ * should be set to match the tininess_before_rounding setting,
-                   - read-only to the guest
++ * but a few architectures, e.g. MIPS MSA, detect FTZ before
- /etc/vmgenid_addr - contains the address of the downloaded vmgenid blob
++ * rounding but tininess after rounding.
--                  - writeable by the guest
++ *
-+                  - writable by the guest
++ * This enum is arranged so that the default if the target doesn't
++ * configure it matches the default for tininess_before_rounding
++ * (i.e. "after rounding").
- QEMU sends the following commands to the guest at startup:
++ */
-diff --git a/hw/scsi/mfi.h b/hw/scsi/mfi.h
++typedef enum __attribute__((__packed__)) {
-index XXXXXXX..XXXXXXX 100644
++    float_ftz_after_rounding = 0,
---- a/hw/scsi/mfi.h
++    float_ftz_before_rounding = 1,
-+++ b/hw/scsi/mfi.h
++} FloatFTZDetection;
-@@ -XXX,XX +XXX,XX @@ struct mfi_ctrl_props {
++
                                * metadata and user data
                                * 1=5%, 2=10%, 3=15% and so on
                                */
 -    uint8_t viewSpace;       /* snapshot writeable VIEWs
 +    uint8_t viewSpace;       /* snapshot writable VIEWs
                                * capacity as a % of source LD
                                * capacity. 0=READ only
                                * 1=5%, 2=10%, 3=15% and so on
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/internals.h
 +++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ enum MVEECIState {
  #define PMCRP   0x2
  #define PMCRE   0x1
  /*
-- * Mask of PMCR bits writeable by guest (not including WO bits like C, P,
+  * Floating Point Status. Individual architectures may maintain
-+ * Mask of PMCR bits writable by guest (not including WO bits like C, P,
+  * several versions of float_status for different functions. The
-  * which can be written as 1 to trigger behaviour but which stay RAZ).
+@@ -XXX,XX +XXX,XX @@ typedef struct float_status {
-  */
+     bool tininess_before_rounding;
--#define PMCR_WRITEABLE_MASK (PMCRLC | PMCRDP | PMCRX | PMCRD | PMCRE)
+     /* should denormalised results go to zero and set output_denormal_flushed? */
-+#define PMCR_WRITABLE_MASK (PMCRLC | PMCRDP | PMCRX | PMCRD | PMCRE)
+     bool flush_to_zero;
++    /* do we detect and flush denormal results before or after rounding? */
- #define PMXEVTYPER_P          0x80000000
++    FloatFTZDetection ftz_detection;
- #define PMXEVTYPER_U          0x40000000
+     /* should denormalised inputs go to zero and set input_denormal_flushed? */
-diff --git a/target/i386/hvf/vmcs.h b/target/i386/hvf/vmcs.h
+     bool flush_inputs_to_zero;
-index XXXXXXX..XXXXXXX 100644
+     bool default_nan_mode;
---- a/target/i386/hvf/vmcs.h
+diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
-+++ b/target/i386/hvf/vmcs.h
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@
+--- a/target/mips/fpu_helper.h
- #define EPT_VIOLATION_DATA_WRITE (1UL << 1)
++++ b/target/mips/fpu_helper.h
- #define EPT_VIOLATION_INST_FETCH (1UL << 2)
+@@ -XXX,XX +XXX,XX @@ static inline void fp_reset(CPUMIPSState *env)
- #define EPT_VIOLATION_GPA_READABLE (1UL << 3)
+      */
--#define EPT_VIOLATION_GPA_WRITEABLE (1UL << 4)
+     set_float_2nan_prop_rule(float_2nan_prop_s_ab,
-+#define EPT_VIOLATION_GPA_WRITABLE (1UL << 4)
+                              &env->active_fpu.fp_status);
- #define EPT_VIOLATION_GPA_EXECUTABLE (1UL << 5)
++    /*
- #define EPT_VIOLATION_GLA_VALID (1UL << 7)
++     * TODO: the spec doesn't say clearly whether FTZ happens before
- #define EPT_VIOLATION_XLAT_VALID (1UL << 8)
++     * or after rounding for normal FPU operations.
-diff --git a/target/i386/hvf/vmx.h b/target/i386/hvf/vmx.h
++     */
-index XXXXXXX..XXXXXXX 100644
++    set_float_ftz_detection(float_ftz_before_rounding,
---- a/target/i386/hvf/vmx.h
++                            &env->active_fpu.fp_status);
-+++ b/target/i386/hvf/vmx.h
+ }
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t cap2ctrl(uint64_t cap, uint64_t ctrl)
+ /* MSA */
- #define AR_TYPE_ACCESSES_MASK 1
+diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
- #define AR_TYPE_READABLE_MASK (1 << 1)
+index XXXXXXX..XXXXXXX 100644
--#define AR_TYPE_WRITEABLE_MASK (1 << 2)
+--- a/target/alpha/cpu.c
-+#define AR_TYPE_WRITABLE_MASK (1 << 2)
++++ b/target/alpha/cpu.c
- #define AR_TYPE_CODE_MASK (1 << 3)
+@@ -XXX,XX +XXX,XX @@ static void alpha_cpu_initfn(Object *obj)
- #define AR_TYPE_MASK 0x0f
+     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
- #define AR_TYPE_BUSY_64_TSS 11
+     /* Default NaN: sign bit clear, msb frac bit set */
-diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
+     set_float_default_nan_pattern(0b01000000, &env->fp_status);
-index XXXXXXX..XXXXXXX 100644
++    /*
---- a/accel/hvf/hvf-accel-ops.c
++     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
-+++ b/accel/hvf/hvf-accel-ops.c
++     * section 4.7.7.11 says that we flush to zero for underflow cases, so
-@@ -XXX,XX +XXX,XX @@ static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
++     * this should be float_ftz_after_rounding to match the
 +     * tininess_after_rounding (which is specified in section 4.7.5).
 +     */
 +    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
  #if defined(CONFIG_USER_ONLY)
      env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
      cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
  static void arm_set_default_fp_behaviours(float_status *s)
  {
-     hvf_slot *mem;
+     set_float_detect_tininess(float_tininess_before_rounding, s);
-     MemoryRegion *area = section->mr;
++    set_float_ftz_detection(float_ftz_before_rounding, s);
--    bool writeable = !area->readonly && !area->rom_device;
+     set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
-+    bool writable = !area->readonly && !area->rom_device;
+     set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
-     hv_memory_flags_t flags;
+     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
-     uint64_t page_size = qemu_real_host_page_size();
+diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
+index XXXXXXX..XXXXXXX 100644
-     if (!memory_region_is_ram(area)) {
+--- a/target/hppa/fpu_helper.c
--        if (writeable) {
++++ b/target/hppa/fpu_helper.c
-+        if (writable) {
+@@ -XXX,XX +XXX,XX @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
-             return;
+     set_float_infzeronan_rule(float_infzeronan_dnan_never, &env->fp_status);
-         } else if (!memory_region_is_romd(area)) {
+     /* Default NaN: sign bit clear, msb-1 frac bit set */
-             /*
+     set_float_default_nan_pattern(0b00100000, &env->fp_status);
-diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
++    /*
-index XXXXXXX..XXXXXXX 100644
++     * "PA-RISC 2.0 Architecture" says it is IMPDEF whether the flushing
---- a/accel/kvm/kvm-all.c
++     * enabled by FPSR.D happens before or after rounding. We pick "before"
-+++ b/accel/kvm/kvm-all.c
++     * for consistency with tininess detection.
-@@ -XXX,XX +XXX,XX @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
++     */
-     KVMSlot *mem;
++    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
-     int err;
++    /*
-     MemoryRegion *mr = section->mr;
++     * TODO: "PA-RISC 2.0 Architecture" chapter 10 says that we should
--    bool writeable = !mr->readonly && !mr->rom_device;
++     * detect tininess before rounding, but we don't set that here so we
-+    bool writable = !mr->readonly && !mr->rom_device;
++     * get the default tininess after rounding.
-     hwaddr start_addr, size, slot_size, mr_offset;
++     */
-     ram_addr_t ram_start_offset;
+ }
-     void *ram;
+ void cpu_hppa_loaded_fr0(CPUHPPAState *env)
-     if (!memory_region_is_ram(mr)) {
+diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
--        if (writeable || !kvm_readonly_mem_allowed) {
+index XXXXXXX..XXXXXXX 100644
-+        if (writable || !kvm_readonly_mem_allowed) {
+--- a/target/i386/tcg/fpu_helper.c
-             return;
++++ b/target/i386/tcg/fpu_helper.c
-         } else if (!mr->romd_mode) {
+@@ -XXX,XX +XXX,XX @@ void cpu_init_fp_statuses(CPUX86State *env)
-             /* If the memory device is not in romd_mode, then we actually want
+     set_float_default_nan_pattern(0b11000000, &env->fp_status);
-diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
+     set_float_default_nan_pattern(0b11000000, &env->mmx_status);
-index XXXXXXX..XXXXXXX 100644
+     set_float_default_nan_pattern(0b11000000, &env->sse_status);
---- a/accel/tcg/user-exec.c
++    /*
-+++ b/accel/tcg/user-exec.c
++     * TODO: x86 does flush-to-zero detection after rounding (the SDM
-@@ -XXX,XX +XXX,XX @@ MMUAccessType adjust_signal_pc(uintptr_t *pc, bool is_write)
++     * section 10.2.3.3 on the FTZ bit of MXCSR says that we flush
-  * Return true if the write fault has been handled, and should be re-tried.
++     * when we detect underflow, which x86 does after rounding).
-  *
++     */
-  * Note that it is important that we don't call page_unprotect() unless
++    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
-- * this is really a "write to nonwriteable page" fault, because
++    set_float_ftz_detection(float_ftz_before_rounding, &env->mmx_status);
-+ * this is really a "write to nonwritable page" fault, because
++    set_float_ftz_detection(float_ftz_before_rounding, &env->sse_status);
-  * page_unprotect() assumes that if it is called for an access to
+ }
-- * a page that's writeable this means we had two threads racing and
-- * another thread got there first and already made the page writeable;
+ static inline uint8_t save_exception_flags(CPUX86State *env)
-+ * a page that's writable this means we had two threads racing and
+diff --git a/target/mips/msa.c b/target/mips/msa.c
-+ * another thread got there first and already made the page writable;
+index XXXXXXX..XXXXXXX 100644
-  * so we will retry the access. If we were to call page_unprotect()
+--- a/target/mips/msa.c
-  * for some other kind of fault that should really be passed to the
++++ b/target/mips/msa.c
-  * guest, we'd end up in an infinite loop of retrying the faulting access.
+@@ -XXX,XX +XXX,XX @@ void msa_reset(CPUMIPSState *env)
-diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
+     /* tininess detected after rounding.*/
-index XXXXXXX..XXXXXXX 100644
+     set_float_detect_tininess(float_tininess_after_rounding,
---- a/hw/acpi/ghes.c
+                               &env->active_tc.msa_fp_status);
-+++ b/hw/acpi/ghes.c
++    /*
-@@ -XXX,XX +XXX,XX @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
++     * MSACSR.FS detects tiny results to flush to zero before rounding
-     for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
++     * (per "MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD
-         /*
++     * Architecture Module, Revision 1.1" section 3.5.4), even though it
-          * Initialize the value of read_ack_register to 1, so GHES can be
++     * detects tininess after rounding for underflow purposes (section 3.4.2
--         * writeable after (re)boot.
++     * table 3.3).
-+         * writable after (re)boot.
++     */
-          * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
++    set_float_ftz_detection(float_ftz_before_rounding,
-          * (GHESv2 - Type 10)
++                            &env->active_tc.msa_fp_status);
           */
 diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/arm_gicv3_cpuif.c
 +++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static void icc_ctlr_el3_write(CPUARMState *env, const ARMCPRegInfo *ri,
          cs->icc_ctlr_el1[GICV3_S] |= ICC_CTLR_EL1_CBPR;
      }
 -    /* The only bit stored in icc_ctlr_el3 which is writeable is EOIMODE_EL3: */
 +    /* The only bit stored in icc_ctlr_el3 which is writable is EOIMODE_EL3: */
      mask = ICC_CTLR_EL3_EOIMODE_EL3;
      cs->icc_ctlr_el3 &= ~mask;
 diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/arm_gicv3_dist.c
 +++ b/hw/intc/arm_gicv3_dist.c
@@ -XXX,XX +XXX,XX @@ static bool gicd_writel(GICv3State *s, hwaddr offset,
          if (value & mask & GICD_CTLR_DS) {
              /* We just set DS, so the ARE_NS and EnG1S bits are now RES0.
               * Note that this is a one-way transition because if DS is set
 -             * then it's not writeable, so it can only go back to 0 with a
 +             * then it's not writable, so it can only go back to 0 with a
               * hardware reset.
               */
              s->gicd_ctlr &= ~(GICD_CTLR_EN_GRP1S | GICD_CTLR_ARE_NS);
 diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/arm_gicv3_redist.c
 +++ b/hw/intc/arm_gicv3_redist.c
@@ -XXX,XX +XXX,XX @@ static void gicr_write_vpendbaser(GICv3CPUState *cs, uint64_t newval)
      /*
-      * The DIRTY bit is read-only and for us is always zero;
+      * According to MIPS specifications, if one of the two operands is
--     * other fields are writeable.
+diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
-+     * other fields are writable.
+index XXXXXXX..XXXXXXX 100644
-      */
+--- a/target/ppc/cpu_init.c
-     newval &= R_GICR_VPENDBASER_INNERCACHE_MASK |
++++ b/target/ppc/cpu_init.c
-         R_GICR_VPENDBASER_SHAREABILITY_MASK |
+@@ -XXX,XX +XXX,XX @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
-@@ -XXX,XX +XXX,XX @@ static MemTxResult gicr_writel(GICv3CPUState *cs, hwaddr offset,
+     /* tininess for underflow is detected before rounding */
-         /* RAZ/WI for our implementation */
+     set_float_detect_tininess(float_tininess_before_rounding,
-         return MEMTX_OK;
+                               &env->fp_status);
-     case GICR_WAKER:
++    /* Similarly for flush-to-zero */
--        /* Only the ProcessorSleep bit is writeable. When the guest sets
++    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
-+        /* Only the ProcessorSleep bit is writable. When the guest sets
++
-          * it it requests that we transition the channel between the
+     /*
-          * redistributor and the cpu interface to quiescent, and that
+      * PowerPC propagation rules:
-          * we set the ChildrenAsleep bit once the inteface has reached the
+      *  1. A if it sNaN or qNaN
-diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
+diff --git a/target/rx/cpu.c b/target/rx/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/riscv_aclint.c
+--- a/target/rx/cpu.c
-+++ b/hw/intc/riscv_aclint.c
++++ b/target/rx/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void riscv_aclint_swi_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
-     /* Claim software interrupt bits */
+     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
-     for (i = 0; i < swi->num_harts; i++) {
+     /* Default NaN value: sign bit clear, set frac msb */
-         RISCVCPU *cpu = RISCV_CPU(qemu_get_cpu(swi->hartid_base + i));
+     set_float_default_nan_pattern(0b01000000, &env->fp_status);
--        /* We don't claim mip.SSIP because it is writeable by software */
++    /*
-+        /* We don't claim mip.SSIP because it is writable by software */
++     * TODO: "RX Family RXv1 Instruction Set Architecture" is not 100% clear
-         if (riscv_cpu_claim_interrupts(cpu, swi->sswi ? 0 : MIP_MSIP) < 0) {
++     * on whether flush-to-zero should happen before or after rounding, but
-             error_report("MSIP already claimed");
++     * section 1.3.2 says that it happens when underflow is detected, and
-             exit(1);
++     * implies that underflow is detected after rounding. So this may not
-diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c
++     * be the correct setting.
-index XXXXXXX..XXXXXXX 100644
++     */
---- a/hw/intc/riscv_aplic.c
++    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
-+++ b/hw/intc/riscv_aplic.c
+ }
-@@ -XXX,XX +XXX,XX @@ static void riscv_aplic_write(void *opaque, hwaddr addr, uint64_t value,
-     }
+ static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
+diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
-     if (addr == APLIC_DOMAINCFG) {
+index XXXXXXX..XXXXXXX 100644
--        /* Only IE bit writeable at the moment */
+--- a/target/sh4/cpu.c
-+        /* Only IE bit writable at the moment */
++++ b/target/sh4/cpu.c
-         value &= APLIC_DOMAINCFG_IE;
+@@ -XXX,XX +XXX,XX @@ static void superh_cpu_reset_hold(Object *obj, ResetType type)
-         aplic->domaincfg = value;
+     set_default_nan_mode(1, &env->fp_status);
-     } else if ((APLIC_SOURCECFG_BASE <= addr) &&
+     /* sign bit clear, set all frac bits other than msb */
-diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
+     set_float_default_nan_pattern(0b00111111, &env->fp_status);
-index XXXXXXX..XXXXXXX 100644
++    /*
---- a/hw/pci/shpc.c
++     * TODO: "SH-4 CPU Core Architecture ADCS 7182230F" doesn't say whether
-+++ b/hw/pci/shpc.c
++     * it detects tininess before or after rounding. Section 6.4 is clear
-@@ -XXX,XX +XXX,XX @@ static int shpc_cap_add_config(PCIDevice *d, Error **errp)
++     * that flush-to-zero happens when the result underflows, though, so
-     pci_set_byte(config + SHPC_CAP_CxP, 0);
++     * either this should be "detect ftz after rounding" or else we should
-     pci_set_long(config + SHPC_CAP_DWORD_DATA, 0);
++     * be setting "detect tininess before rounding".
-     d->shpc->cap = config_offset;
++     */
--    /* Make dword select and data writeable. */
++    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
-+    /* Make dword select and data writable. */
+ }
-     pci_set_byte(d->wmask + config_offset + SHPC_CAP_DWORD_SELECT, 0xff);
-     pci_set_long(d->wmask + config_offset + SHPC_CAP_DWORD_DATA, 0xffffffff);
+ static void superh_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
-     return 0;
+diff --git a/target/tricore/helper.c b/target/tricore/helper.c
-diff --git a/hw/sparc64/sun4u_iommu.c b/hw/sparc64/sun4u_iommu.c
+index XXXXXXX..XXXXXXX 100644
-index XXXXXXX..XXXXXXX 100644
+--- a/target/tricore/helper.c
---- a/hw/sparc64/sun4u_iommu.c
++++ b/target/tricore/helper.c
-+++ b/hw/sparc64/sun4u_iommu.c
+@@ -XXX,XX +XXX,XX @@ void fpu_set_state(CPUTriCoreState *env)
-@@ -XXX,XX +XXX,XX @@ static IOMMUTLBEntry sun4u_translate_iommu(IOMMUMemoryRegion *iommu,
+     set_flush_inputs_to_zero(1, &env->fp_status);
-     }
+     set_flush_to_zero(1, &env->fp_status);
+     set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
-     if (tte & IOMMU_TTE_DATA_W) {
++    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
--        /* Writeable */
+     set_default_nan_mode(1, &env->fp_status);
-+        /* Writable */
+     /* Default NaN pattern: sign bit clear, frac msb set */
-         ret.perm = IOMMU_RW;
+     set_float_default_nan_pattern(0b01000000, &env->fp_status);
-     } else {
+diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
-         ret.perm = IOMMU_RO;
+index XXXXXXX..XXXXXXX 100644
-diff --git a/hw/timer/sse-timer.c b/hw/timer/sse-timer.c
+--- a/tests/fp/fp-bench.c
-index XXXXXXX..XXXXXXX 100644
++++ b/tests/fp/fp-bench.c
---- a/hw/timer/sse-timer.c
+@@ -XXX,XX +XXX,XX @@ static void run_bench(void)
-+++ b/hw/timer/sse-timer.c
+     set_float_3nan_prop_rule(float_3nan_prop_s_cab, &soft_status);
-@@ -XXX,XX +XXX,XX @@ static void sse_timer_write(void *opaque, hwaddr offset, uint64_t value,
+     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, &soft_status);
-     {
+     set_float_default_nan_pattern(0b01000000, &soft_status);
-         uint32_t old_ctl = s->cntp_aival_ctl;
++    set_float_ftz_detection(float_ftz_before_rounding, &soft_status);
--        /* EN bit is writeable; CLR bit is write-0-to-clear, write-1-ignored */
+     f = bench_funcs[operation][precision];
-+        /* EN bit is writable; CLR bit is write-0-to-clear, write-1-ignored */
+     g_assert(f);
-         s->cntp_aival_ctl &= ~R_CNTP_AIVAL_CTL_EN_MASK;
+diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
-         s->cntp_aival_ctl |= value & R_CNTP_AIVAL_CTL_EN_MASK;
+index XXXXXXX..XXXXXXX 100644
-         if (!(value & R_CNTP_AIVAL_CTL_CLR_MASK)) {
+--- a/fpu/softfloat-parts.c.inc
-diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
++++ b/fpu/softfloat-parts.c.inc
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
---- a/target/arm/gdbstub.c
+             p->frac_lo &= ~round_mask;
-+++ b/target/arm/gdbstub.c
+         }
-@@ -XXX,XX +XXX,XX @@ int arm_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
+         frac_shr(p, frac_shift);
-             /*
+-    } else if (s->flush_to_zero) {
-              * Don't allow writing to XPSR.Exception as it can cause
++    } else if (s->flush_to_zero &&
-              * a transition into or out of handler mode (it's not
++               s->ftz_detection == float_ftz_before_rounding) {
--             * writeable via the MSR insn so this is a reasonable
+         flags |= float_flag_output_denormal_flushed;
-+             * writable via the MSR insn so this is a reasonable
+         p->cls = float_class_zero;
-              * restriction). Other fields are safe to update.
+         exp = 0;
-              */
+@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
-             xpsr_write(env, tmp, ~XPSR_EXCP);
+         exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) && !fmt->m68k_denormal;
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+         frac_shr(p, frac_shift);
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+-        if (is_tiny && (flags & float_flag_inexact)) {
-+++ b/target/arm/helper.c
+-            flags |= float_flag_underflow;
-@@ -XXX,XX +XXX,XX @@ static void pmcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
+-        }
 -        if (exp == 0 && frac_eqz(p)) {
 -            p->cls = float_class_zero;
 +        if (is_tiny) {
 +            if (s->flush_to_zero) {
 +                assert(s->ftz_detection == float_ftz_after_rounding);
 +                flags |= float_flag_output_denormal_flushed;
 +                p->cls = float_class_zero;
 +                exp = 0;
 +                frac_clear(p);
 +            } else if (flags & float_flag_inexact) {
 +                flags |= float_flag_underflow;
 +            }
 +            if (exp == 0 && frac_eqz(p)) {
 +                p->cls = float_class_zero;
 +            }
          }
      }
+     p->exp = exp;
 -    env->cp15.c9_pmcr &= ~PMCR_WRITEABLE_MASK;
 -    env->cp15.c9_pmcr |= (value & PMCR_WRITEABLE_MASK);
 +    env->cp15.c9_pmcr &= ~PMCR_WRITABLE_MASK;
 +    env->cp15.c9_pmcr |= (value & PMCR_WRITABLE_MASK);
      pmu_op_finish(env);
  }
 diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/hvf/hvf.c
 +++ b/target/arm/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
              }
          }
 -        env->cp15.c9_pmcr &= ~PMCR_WRITEABLE_MASK;
 -        env->cp15.c9_pmcr |= (val & PMCR_WRITEABLE_MASK);
 +        env->cp15.c9_pmcr &= ~PMCR_WRITABLE_MASK;
 +        env->cp15.c9_pmcr |= (val & PMCR_WRITABLE_MASK);
          pmu_op_finish(env);
          break;
 diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/cpu-sysemu.c
 +++ b/target/i386/cpu-sysemu.c
@@ -XXX,XX +XXX,XX @@ static void x86_cpu_to_dict(X86CPU *cpu, QDict *props)
  /* Convert CPU model data from X86CPU object to a property dictionary
   * that can recreate exactly the same CPU model, including every
 - * writeable QOM property.
 + * writable QOM property.
   */
  static void x86_cpu_to_dict_full(X86CPU *cpu, QDict *props)
  {
 diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/s390x/ioinst.c
 +++ b/target/s390x/ioinst.c
@@ -XXX,XX +XXX,XX @@ void ioinst_handle_stsch(S390CPU *cpu, uint64_t reg1, uint32_t ipb,
          g_assert(!s390_is_pv());
          /*
           * As operand exceptions have a lower priority than access exceptions,
 -         * we check whether the memory area is writeable (injecting the
 +         * we check whether the memory area is writable (injecting the
           * access execption if it is not) first.
           */
          if (!s390_cpu_virt_mem_check_write(cpu, addr, ar, sizeof(schib))) {
 diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
 index XXXXXXX..XXXXXXX 100644
 --- a/python/qemu/machine/machine.py
 +++ b/python/qemu/machine/machine.py
@@ -XXX,XX +XXX,XX @@ def _early_cleanup(self) -> None:
          """
          # If we keep the console socket open, we may deadlock waiting
          # for QEMU to exit, while QEMU is waiting for the socket to
 -        # become writeable.
 +        # become writable.
          if self._console_socket is not None:
              self._console_socket.close()
              self._console_socket = None
 diff --git a/tests/tcg/x86_64/system/boot.S b/tests/tcg/x86_64/system/boot.S
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/tcg/x86_64/system/boot.S
 +++ b/tests/tcg/x86_64/system/boot.S
@@ -XXX,XX +XXX,XX @@
      *
      * - `ebx`: contains the physical memory address where the loader has placed
      *          the boot start info structure.
 -    * - `cr0`: bit 0 (PE) must be set. All the other writeable bits are cleared.
 +    * - `cr0`: bit 0 (PE) must be set. All the other writable bits are cleared.
      * - `cr4`: all bits are cleared.
      * - `cs `: must be a 32-bit read/execute code segment with a base of ‘0’
      *          and a limit of ‘0xFFFFFFFF’. The selector value is unspecified.
 --
-.25.1
+.34.1

-[PULL 48/55] target/arm: Split out load/store primitives to sve_ldst_internal.h
+[PULL 05/68] target/arm: Define FPCR AH, FIZ, NEP bits
-From: Richard Henderson <richard.henderson@linaro.org>
+The Armv8.7 FEAT_AFP feature defines three new control bits in
 the FPCR:
  * FPCR.AH: "alternate floating point mode"; this changes floating
    point behaviour in a variety of ways, including:
     - the sign of a default NaN is 1, not 0
     - if FPCR.FZ is also 1, denormals detected after rounding
       with an unbounded exponent has been applied are flushed to zero
     - FPCR.FZ does not cause denormalized inputs to be flushed to zero
     - miscellaneous other corner-case behaviour changes
  * FPCR.FIZ: flush denormalized numbers to zero on input for
    most instructions
  * FPCR.NEP: makes scalar SIMD operations merge the result with
    higher vector elements in one of the source registers, instead
    of zeroing the higher elements of the destination
-Begin creation of sve_ldst_internal.h by moving the primitives
+This commit defines the new bits in the FPCR, and allows them to be
-that access host and tlb memory.
+read or written when FEAT_AFP is implemented.  Actual behaviour
 changes will be implemented in subsequent commits.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Note that these are the first FPCR bits which don't appear in the
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+AArch32 FPSCR view of the register, and which share bit positions
-Message-id: 20220607203306.657998-14-richard.henderson@linaro.org
+with FPSR bits.
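
As a rough illustration of the read/write gating described above, the following standalone sketch masks the three new bits out of a guest write when FEAT_AFP is absent. The bit positions are the ones this patch adds to target/arm/cpu.h; the helper name fpcr_filter_afp_bits(), the FPCR_AFP_MASK macro and the have_afp flag are hypothetical stand-ins for the cpu_isar_feature(aa64_afp, cpu) check in vfp_set_fpcr_masked(), not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined by this patch in target/arm/cpu.h. */
#define FPCR_FIZ    (1u << 0)   /* Flush Inputs to Zero (FEAT_AFP) */
#define FPCR_AH     (1u << 1)   /* Alternate Handling (FEAT_AFP) */
#define FPCR_NEP    (1u << 2)   /* SIMD scalar ops preserve elts (FEAT_AFP) */

/* Hypothetical convenience mask, not present in the patch. */
#define FPCR_AFP_MASK (FPCR_FIZ | FPCR_AH | FPCR_NEP)

/* The three FEAT_AFP control bits occupy FPCR[2:0]. */
static_assert(FPCR_AFP_MASK == 0x7u, "AFP bits occupy FPCR[2:0]");

/*
 * Hypothetical helper: return the value that would be stored for a
 * guest write of @val. When FEAT_AFP is not implemented the new bits
 * are dropped, so they read back as zero; all other bits pass through.
 */
static uint32_t fpcr_filter_afp_bits(uint32_t val, bool have_afp)
{
    if (!have_afp) {
        val &= ~FPCR_AFP_MASK;
    }
    return val;
}
```

For example, a write of FPCR_AH | FPCR_FIZ | (1u << 24) (bit 24 being FPCR.FZ) keeps only bit 24 when have_afp is false, and is stored unchanged when have_afp is true.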
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/sve_ldst_internal.h | 127 +++++++++++++++++++++++++++++++++
+ target/arm/cpu-features.h |  5 +++++
- target/arm/sve_helper.c        | 107 +--------------------------
+ target/arm/cpu.h          |  3 +++
-files changed, 128 insertions(+), 106 deletions(-)
+ target/arm/vfp_helper.c   | 11 ++++++++---
- create mode 100644 target/arm/sve_ldst_internal.h
+files changed, 16 insertions(+), 3 deletions(-)
-diff --git a/target/arm/sve_ldst_internal.h b/target/arm/sve_ldst_internal.h
+diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
-new file mode 100644
+index XXXXXXX..XXXXXXX 100644
-index XXXXXXX..XXXXXXX
+--- a/target/arm/cpu-features.h
---- /dev/null
++++ b/target/arm/cpu-features.h
-+++ b/target/arm/sve_ldst_internal.h
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_hcx(const ARMISARegisters *id)
-@@ -XXX,XX +XXX,XX @@
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HCX) != 0;
-+/*
+ }
-+ * ARM SVE Load/Store Helpers
-+ *
++static inline bool isar_feature_aa64_afp(const ARMISARegisters *id)
-+ * Copyright (c) 2018-2022 Linaro
++{
-+ *
++    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, AFP) != 0;
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2.1 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#ifndef TARGET_ARM_SVE_LDST_INTERNAL_H
 +#define TARGET_ARM_SVE_LDST_INTERNAL_H
 +
 +#include "exec/cpu_ldst.h"
 +
 +/*
 + * Load one element into @vd + @reg_off from @host.
 + * The controlling predicate is known to be true.
 + */
 +typedef void sve_ldst1_host_fn(void *vd, intptr_t reg_off, void *host);
 +
 +/*
 + * Load one element into @vd + @reg_off from (@env, @vaddr, @ra).
 + * The controlling predicate is known to be true.
 + */
 +typedef void sve_ldst1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
 +                              target_ulong vaddr, uintptr_t retaddr);
 +
 +/*
 + * Generate the above primitives.
 + */
 +
 +#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST)                              \
 +static inline void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host) \
 +{ TYPEM val = HOST(host); *(TYPEE *)(vd + H(reg_off)) = val; }
 +
 +#define DO_ST_HOST(NAME, H, TYPEE, TYPEM, HOST)                              \
 +static inline void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host) \
 +{ TYPEM val = *(TYPEE *)(vd + H(reg_off)); HOST(host, val); }
 +
 +#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, TLB)                              \
 +static inline void sve_##NAME##_tlb(CPUARMState *env, void *vd,            \
 +                        intptr_t reg_off, target_ulong addr, uintptr_t ra) \
 +{                                                                          \
 +    TYPEM val = TLB(env, useronly_clean_ptr(addr), ra);                    \
 +    *(TYPEE *)(vd + H(reg_off)) = val;                                     \
 +}
 +
-+#define DO_ST_TLB(NAME, H, TYPEE, TYPEM, TLB)                              \
+ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
-+static inline void sve_##NAME##_tlb(CPUARMState *env, void *vd,            \
+ {
-+                        intptr_t reg_off, target_ulong addr, uintptr_t ra) \
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
-+{                                                                          \
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 +    TYPEM val = *(TYPEE *)(vd + H(reg_off));                               \
 +    TLB(env, useronly_clean_ptr(addr), val, ra);                           \
 +}
 +
 +#define DO_LD_PRIM_1(NAME, H, TE, TM)                   \
 +    DO_LD_HOST(NAME, H, TE, TM, ldub_p)                 \
 +    DO_LD_TLB(NAME, H, TE, TM, cpu_ldub_data_ra)
 +
 +DO_LD_PRIM_1(ld1bb,  H1,   uint8_t,  uint8_t)
 +DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
 +DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
 +DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
 +DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
 +DO_LD_PRIM_1(ld1bdu, H1_8, uint64_t, uint8_t)
 +DO_LD_PRIM_1(ld1bds, H1_8, uint64_t,  int8_t)
 +
 +#define DO_ST_PRIM_1(NAME, H, TE, TM)                   \
 +    DO_ST_HOST(st1##NAME, H, TE, TM, stb_p)             \
 +    DO_ST_TLB(st1##NAME, H, TE, TM, cpu_stb_data_ra)
 +
 +DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
 +DO_ST_PRIM_1(bh, H1_2, uint16_t, uint8_t)
 +DO_ST_PRIM_1(bs, H1_4, uint32_t, uint8_t)
 +DO_ST_PRIM_1(bd, H1_8, uint64_t, uint8_t)
 +
 +#define DO_LD_PRIM_2(NAME, H, TE, TM, LD) \
 +    DO_LD_HOST(ld1##NAME##_be, H, TE, TM, LD##_be_p)    \
 +    DO_LD_HOST(ld1##NAME##_le, H, TE, TM, LD##_le_p)    \
 +    DO_LD_TLB(ld1##NAME##_be, H, TE, TM, cpu_##LD##_be_data_ra) \
 +    DO_LD_TLB(ld1##NAME##_le, H, TE, TM, cpu_##LD##_le_data_ra)
 +
 +#define DO_ST_PRIM_2(NAME, H, TE, TM, ST) \
 +    DO_ST_HOST(st1##NAME##_be, H, TE, TM, ST##_be_p)    \
 +    DO_ST_HOST(st1##NAME##_le, H, TE, TM, ST##_le_p)    \
 +    DO_ST_TLB(st1##NAME##_be, H, TE, TM, cpu_##ST##_be_data_ra) \
 +    DO_ST_TLB(st1##NAME##_le, H, TE, TM, cpu_##ST##_le_data_ra)
 +
 +DO_LD_PRIM_2(hh,  H1_2, uint16_t, uint16_t, lduw)
 +DO_LD_PRIM_2(hsu, H1_4, uint32_t, uint16_t, lduw)
 +DO_LD_PRIM_2(hss, H1_4, uint32_t,  int16_t, lduw)
 +DO_LD_PRIM_2(hdu, H1_8, uint64_t, uint16_t, lduw)
 +DO_LD_PRIM_2(hds, H1_8, uint64_t,  int16_t, lduw)
 +
 +DO_ST_PRIM_2(hh, H1_2, uint16_t, uint16_t, stw)
 +DO_ST_PRIM_2(hs, H1_4, uint32_t, uint16_t, stw)
 +DO_ST_PRIM_2(hd, H1_8, uint64_t, uint16_t, stw)
 +
 +DO_LD_PRIM_2(ss,  H1_4, uint32_t, uint32_t, ldl)
 +DO_LD_PRIM_2(sdu, H1_8, uint64_t, uint32_t, ldl)
 +DO_LD_PRIM_2(sds, H1_8, uint64_t,  int32_t, ldl)
 +
 +DO_ST_PRIM_2(ss, H1_4, uint32_t, uint32_t, stl)
 +DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl)
 +
 +DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq)
 +DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
 +
 +#undef DO_LD_TLB
 +#undef DO_ST_TLB
 +#undef DO_LD_HOST
 +#undef DO_LD_PRIM_1
 +#undef DO_ST_PRIM_1
 +#undef DO_LD_PRIM_2
 +#undef DO_ST_PRIM_2
 +
 +#endif /* TARGET_ARM_SVE_LDST_INTERNAL_H */
 diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/sve_helper.c
+--- a/target/arm/cpu.h
-+++ b/target/arm/sve_helper.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
  #include "cpu.h"
  #include "internals.h"
  #include "exec/exec-all.h"
 -#include "exec/cpu_ldst.h"
  #include "exec/helper-proto.h"
  #include "tcg/tcg-gvec-desc.h"
  #include "fpu/softfloat.h"
  #include "tcg/tcg.h"
  #include "vec_internal.h"
 +#include "sve_ldst_internal.h"
  /* Return a value for NZCV as per the ARM PredTest pseudofunction.
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
   * Load contiguous data, protected by a governing predicate.
   */
--/*
+ /* FPCR bits */
-- * Load one element into @vd + @reg_off from @host.
++#define FPCR_FIZ    (1 << 0)    /* Flush Inputs to Zero (FEAT_AFP) */
-- * The controlling predicate is known to be true.
++#define FPCR_AH     (1 << 1)    /* Alternate Handling (FEAT_AFP) */
-- */
++#define FPCR_NEP    (1 << 2)    /* SIMD scalar ops preserve elts (FEAT_AFP) */
--typedef void sve_ldst1_host_fn(void *vd, intptr_t reg_off, void *host);
+ #define FPCR_IOE    (1 << 8)    /* Invalid Operation exception trap enable */
--
+ #define FPCR_DZE    (1 << 9)    /* Divide by Zero exception trap enable */
--/*
+ #define FPCR_OFE    (1 << 10)   /* Overflow exception trap enable */
-- * Load one element into @vd + @reg_off from (@env, @vaddr, @ra).
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
-- * The controlling predicate is known to be true.
+index XXXXXXX..XXXXXXX 100644
-- */
+--- a/target/arm/vfp_helper.c
--typedef void sve_ldst1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
++++ b/target/arm/vfp_helper.c
--                              target_ulong vaddr, uintptr_t retaddr);
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
--
+     if (!cpu_isar_feature(any_fp16, cpu)) {
--/*
+         val &= ~FPCR_FZ16;
-- * Generate the above primitives.
+     }
-- */
++    if (!cpu_isar_feature(aa64_afp, cpu)) {
--
++        val &= ~(FPCR_FIZ | FPCR_AH | FPCR_NEP);
--#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST) \
++    }
--static void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host)  \
--{                                                                      \
+     if (!cpu_isar_feature(aa64_ebf16, cpu)) {
--    TYPEM val = HOST(host);                                            \
+         val &= ~FPCR_EBF;
--    *(TYPEE *)(vd + H(reg_off)) = val;                                 \
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask)
--}
+      * We don't implement trapped exception handling, so the
--
+      * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
--#define DO_ST_HOST(NAME, H, TYPEE, TYPEM, HOST) \
+      *
--static void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host)  \
+-     * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode, EBF
--{ HOST(host, (TYPEM)*(TYPEE *)(vd + H(reg_off))); }
+-     * and FZ16. Len, Stride and LTPSIZE we just handled. Store those bits
--
++     * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode, EBF, FZ16,
--#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, TLB) \
++     * FIZ, AH, and NEP.
--static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
++     * Len, Stride and LTPSIZE we just handled. Store those bits
--                             target_ulong addr, uintptr_t ra)               \
+      * there, and zero any of the other FPCR bits and the RES0 and RAZ/WI
--{                                                                           \
+      * bits.
--    *(TYPEE *)(vd + H(reg_off)) =                                           \
+      */
--        (TYPEM)TLB(env, useronly_clean_ptr(addr), ra);                      \
+-    val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16 | FPCR_EBF;
--}
++    val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16 |
--
++        FPCR_EBF | FPCR_FIZ | FPCR_AH | FPCR_NEP;
--#define DO_ST_TLB(NAME, H, TYPEE, TYPEM, TLB) \
+     env->vfp.fpcr &= ~mask;
--static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
+     env->vfp.fpcr |= val;
--                             target_ulong addr, uintptr_t ra)               \
+ }
 -{                                                                           \
 -    TLB(env, useronly_clean_ptr(addr),                                      \
 -        (TYPEM)*(TYPEE *)(vd + H(reg_off)), ra);                            \
 -}
 -
 -#define DO_LD_PRIM_1(NAME, H, TE, TM)                   \
 -    DO_LD_HOST(NAME, H, TE, TM, ldub_p)                 \
 -    DO_LD_TLB(NAME, H, TE, TM, cpu_ldub_data_ra)
 -
 -DO_LD_PRIM_1(ld1bb,  H1,   uint8_t,  uint8_t)
 -DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
 -DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
 -DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
 -DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
 -DO_LD_PRIM_1(ld1bdu, H1_8, uint64_t, uint8_t)
 -DO_LD_PRIM_1(ld1bds, H1_8, uint64_t,  int8_t)
 -
 -#define DO_ST_PRIM_1(NAME, H, TE, TM)                   \
 -    DO_ST_HOST(st1##NAME, H, TE, TM, stb_p)             \
 -    DO_ST_TLB(st1##NAME, H, TE, TM, cpu_stb_data_ra)
 -
 -DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
 -DO_ST_PRIM_1(bh, H1_2, uint16_t, uint8_t)
 -DO_ST_PRIM_1(bs, H1_4, uint32_t, uint8_t)
 -DO_ST_PRIM_1(bd, H1_8, uint64_t, uint8_t)
 -
 -#define DO_LD_PRIM_2(NAME, H, TE, TM, LD) \
 -    DO_LD_HOST(ld1##NAME##_be, H, TE, TM, LD##_be_p)    \
 -    DO_LD_HOST(ld1##NAME##_le, H, TE, TM, LD##_le_p)    \
 -    DO_LD_TLB(ld1##NAME##_be, H, TE, TM, cpu_##LD##_be_data_ra) \
 -    DO_LD_TLB(ld1##NAME##_le, H, TE, TM, cpu_##LD##_le_data_ra)
 -
 -#define DO_ST_PRIM_2(NAME, H, TE, TM, ST) \
 -    DO_ST_HOST(st1##NAME##_be, H, TE, TM, ST##_be_p)    \
 -    DO_ST_HOST(st1##NAME##_le, H, TE, TM, ST##_le_p)    \
 -    DO_ST_TLB(st1##NAME##_be, H, TE, TM, cpu_##ST##_be_data_ra) \
 -    DO_ST_TLB(st1##NAME##_le, H, TE, TM, cpu_##ST##_le_data_ra)
 -
 -DO_LD_PRIM_2(hh,  H1_2, uint16_t, uint16_t, lduw)
 -DO_LD_PRIM_2(hsu, H1_4, uint32_t, uint16_t, lduw)
 -DO_LD_PRIM_2(hss, H1_4, uint32_t,  int16_t, lduw)
 -DO_LD_PRIM_2(hdu, H1_8, uint64_t, uint16_t, lduw)
 -DO_LD_PRIM_2(hds, H1_8, uint64_t,  int16_t, lduw)
 -
 -DO_ST_PRIM_2(hh, H1_2, uint16_t, uint16_t, stw)
 -DO_ST_PRIM_2(hs, H1_4, uint32_t, uint16_t, stw)
 -DO_ST_PRIM_2(hd, H1_8, uint64_t, uint16_t, stw)
 -
 -DO_LD_PRIM_2(ss,  H1_4, uint32_t, uint32_t, ldl)
 -DO_LD_PRIM_2(sdu, H1_8, uint64_t, uint32_t, ldl)
 -DO_LD_PRIM_2(sds, H1_8, uint64_t,  int32_t, ldl)
 -
 -DO_ST_PRIM_2(ss, H1_4, uint32_t, uint32_t, stl)
 -DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl)
 -
 -DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq)
 -DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
 -
 -#undef DO_LD_TLB
 -#undef DO_ST_TLB
 -#undef DO_LD_HOST
 -#undef DO_LD_PRIM_1
 -#undef DO_ST_PRIM_1
 -#undef DO_LD_PRIM_2
 -#undef DO_ST_PRIM_2
 -
  /*
   * Skip through a sequence of inactive elements in the guarding predicate @vg,
   * beginning at @reg_off bounded by @reg_max.  Return the offset of the active
 --
-.25.1
+.34.1

-[PULL 17/55] target/arm: Move pmsav7_use_background_region to ptw.c
+[PULL 06/68] target/arm: Implement FPCR.FIZ handling
-From: Richard Henderson <richard.henderson@linaro.org>
+Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
 flushing of single and double precision denormal inputs to zero for
 AArch64 floating point instructions.  (For half-precision, the
 existing FPCR.FZ16 control remains the only one.)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
-Message-id: 20220604040607.269301-11-richard.henderson@linaro.org
+only because of FPCR.FIZ then we should *not* set the cumulative
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+exception bit FPSR.IDC.
 FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
 applies when FPCR.AH is 0.
 We can implement this by setting the "flush inputs to zero" state
 appropriately when FPCR is written, and by not reflecting the
 float_flag_input_denormal_flushed status flag into FPSR reads when it
 is the result only of FPCR.FIZ.
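
The rules above can be sketched as two small predicates. This is only an
illustration, not code from the patch: the EX_* masks and ex_* function
names are hypothetical stand-ins for QEMU's FPCR_* definitions, with bit
positions assumed from the Arm architecture (FIZ bit 0, AH bit 1, FZ bit 24).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's FPCR_FIZ, FPCR_AH and FPCR_FZ masks. */
#define EX_FPCR_FIZ (1u << 0)
#define EX_FPCR_AH  (1u << 1)
#define EX_FPCR_FZ  (1u << 24)

/* A64: denormal inputs are flushed if FIZ = 1, or if AH = 0 and FZ = 1. */
static bool ex_a64_flush_inputs(uint32_t fpcr)
{
    return (fpcr & EX_FPCR_FIZ) ||
           (fpcr & (EX_FPCR_FZ | EX_FPCR_AH)) == EX_FPCR_FZ;
}

/*
 * A flushed input is reported in FPSR.IDC only when the flush is
 * attributable to FZ, that is when AH = 0 and FZ = 1.  A flush caused
 * by FIZ alone stays silent.
 */
static bool ex_a64_flush_sets_idc(uint32_t fpcr)
{
    return (fpcr & (EX_FPCR_FZ | EX_FPCR_AH)) == EX_FPCR_FZ;
}
```

With FIZ alone set, the first predicate is true and the second is false,
which is the "flush without IDC" case the commit message describes.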
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |  2 --
+ target/arm/vfp_helper.c | 60 ++++++++++++++++++++++++++++++++++-------
- target/arm/helper.c | 19 -------------------
+file changed, 50 insertions(+), 10 deletions(-)
  target/arm/ptw.c    | 21 +++++++++++++++++++++
 files changed, 21 insertions(+), 21 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
- bool m_is_ppb_region(CPUARMState *env, uint32_t address);
- bool m_is_system_region(CPUARMState *env, uint32_t address);
+ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+ {
--bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user);
+-    uint32_t i = 0;
--
++    uint32_t a32_flags = 0, a64_flags = 0;
- bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-                         MMUAccessType access_type, ARMMMUIdx mmu_idx,
+-    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
-                         bool s1_is_el0,
+-    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+-    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
-index XXXXXXX..XXXXXXX 100644
++    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
---- a/target/arm/helper.c
++    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
-+++ b/target/arm/helper.c
+     /* FZ16 does not generate an input denormal exception.  */
-@@ -XXX,XX +XXX,XX @@ do_fault:
+-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
-     return true;
++    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
            & ~float_flag_input_denormal_flushed);
 -    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
 +    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
            & ~float_flag_input_denormal_flushed);
 -    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
 +
 +    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
 +    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
            & ~float_flag_input_denormal_flushed);
 -    return vfp_exceptbits_from_host(i);
 +    /*
 +     * Flushing an input denormal *only* because FPCR.FIZ == 1 does
 +     * not set FPSR.IDC; if FPCR.FZ is also set then this takes
 +     * precedence and IDC is set (see the FPUnpackBase pseudocode).
 +     * So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
 +     * We only do this for the a64 flags because FIZ has no effect
 +     * on AArch32 even if it is set.
 +     */
 +    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
 +        a64_flags &= ~float_flag_input_denormal_flushed;
 +    }
 +    return vfp_exceptbits_from_host(a32_flags | a64_flags);
  }
--bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user)
+ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
--{
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
--    /* Return true if we should use the default memory map as a
+     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 -     * "background" region if there are no hits against any MPU regions.
 -     */
 -    CPUARMState *env = &cpu->env;
 -
 -    if (is_user) {
 -        return false;
 -    }
 -
 -    if (arm_feature(env, ARM_FEATURE_M)) {
 -        return env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)]
 -            & R_V7M_MPU_CTRL_PRIVDEFENA_MASK;
 -    } else {
 -        return regime_sctlr(env, mmu_idx) & SCTLR_BR;
 -    }
 -}
 -
  bool m_is_ppb_region(CPUARMState *env, uint32_t address)
  {
      /* True if address is in the M profile PPB region 0xe0000000 - 0xe00fffff */
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static void get_phys_addr_pmsav7_default(CPUARMState *env, ARMMMUIdx mmu_idx,
      }
  }
-+static bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx,
++static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
 +                                         bool is_user)
 +{
 +    /*
-+     * Return true if we should use the default memory map as a
++     * Synchronize any pending exception-flag information in the
-+     * "background" region if there are no hits against any MPU regions.
++     * float_status values into env->vfp.fpsr, and then clear out
 +     * the float_status data.
 +     */
-+    CPUARMState *env = &cpu->env;
++    env->vfp.fpsr |= vfp_get_fpsr_from_host(env);
-+
++    vfp_clear_float_status_exc_flags(env);
 +    if (is_user) {
 +        return false;
 +    }
 +
 +    if (arm_feature(env, ARM_FEATURE_M)) {
 +        return env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)]
 +            & R_V7M_MPU_CTRL_PRIVDEFENA_MASK;
 +    } else {
 +        return regime_sctlr(env, mmu_idx) & SCTLR_BR;
 +    }
 +}
 +
- static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
+ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-                                  MMUAccessType access_type, ARMMMUIdx mmu_idx,
+ {
-                                  hwaddr *phys_ptr, int *prot,
+     uint64_t changed = env->vfp.fpcr;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
      if (changed & FPCR_FZ) {
          bool ftz_enabled = val & FPCR_FZ;
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
 +        /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 +    }
 +    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
 +        /*
 +         * A64: Flush denormalized inputs to zero if FPCR.FIZ = 1, or
 +         * both FPCR.AH = 0 and FPCR.FZ = 1.
 +         */
 +        bool fitz_enabled = (val & FPCR_FIZ) ||
 +            (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
 +        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
      }
      if (changed & FPCR_DN) {
          bool dnan_enabled = val & FPCR_DN;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
      }
 +    /*
 +     * If any bits changed that we look at in vfp_get_fpsr_from_host(),
 +     * we must sync the float_status flags into vfp.fpsr now (under the
 +     * old regime) before we update vfp.fpcr.
 +     */
 +    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
 +        vfp_sync_and_clear_float_status_exc_flags(env);
 +    }
  }
  #else
 --
-.25.1
+.34.1

-[PULL 26/55] target/arm: Move get_S1prot, get_S2prot to ptw.c
+[PULL 07/68] target/arm: Adjust FP behaviour for FPCR.AH = 1
-From: Richard Henderson <richard.henderson@linaro.org>
+When FPCR.AH is set, various behaviours of AArch64 floating point
 operations which are controlled by softfloat config settings change:
  * tininess and ftz detection before/after rounding
  * NaN propagation order
  * result of 0 * Inf + NaN
  * default NaN value
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+When the guest changes the value of the AH bit, switch these config
-Message-id: 20220604040607.269301-20-richard.henderson@linaro.org
+settings on the fp_status_a64 and fp_status_f16_a64 float_status
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+fields.
 This requires us to make the arm_set_default_fp_behaviours() function
 global, since we now need to call it from cpu.c and vfp_helper.c; we
 move it to vfp_helper.c so it can be next to the new
 arm_set_ah_fp_behaviours().
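
As a worked example of the "default NaN value" item, the sketch below
builds the single-precision default NaN for both settings.  It assumes
only what the code comments in this patch state (msb fraction bit set in
both modes, sign bit set only when FPCR.AH = 1); ex_default_nan_f32() is
a hypothetical helper and does not model how softfloat stores the pattern.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): assemble the IEEE binary32
 * default NaN from the two properties described for each mode.
 */
static uint32_t ex_default_nan_f32(bool ah)
{
    uint32_t sign = ah ? 1u : 0u;         /* AH = 1: sign bit set */
    uint32_t exp_all_ones = 0xffu << 23;  /* exponent of any NaN */
    uint32_t frac_msb = 1u << 22;         /* msb frac bit, set in both modes */

    return (sign << 31) | exp_all_ones | frac_msb;
}
```

Under these assumptions the value is 0x7fc00000 with AH = 0 and
0xffc00000 with AH = 1.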
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |   3 --
+ target/arm/internals.h  |  4 +++
- target/arm/helper.c | 128 --------------------------------------------
+ target/arm/cpu.c        | 23 ----------------
- target/arm/ptw.c    | 128 ++++++++++++++++++++++++++++++++++++++++++++
+ target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++++++++-
-files changed, 128 insertions(+), 131 deletions(-)
+files changed, 61 insertions(+), 24 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/internals.h
-+++ b/target/arm/ptw.h
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
+@@ -XXX,XX +XXX,XX @@ uint64_t gt_virt_cnt_offset(CPUARMState *env);
-                                    ARMMMUIdx mmu_idx);
+  * all EL1" scope; this covers stage 1 and stage 2.
- bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
+  */
-                         int inputsize, int stride, int outputsize);
+ int alle1_tlbmask(CPUARMState *env);
--int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0);
++
--int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
++/* Set the float_status behaviour to match the Arm defaults */
--               int ap, int ns, int xn, int pxn);
++void arm_set_default_fp_behaviours(float_status *s);
++
- #endif /* !CONFIG_USER_ONLY */
+ #endif
- #endif /* TARGET_ARM_PTW_H */
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/cpu.c
-+++ b/target/arm/helper.c
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
+@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
-     }
+     QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
  }
--/* Translate S2 section/page access permissions to protection flags
+-/*
-- *
+- * Set the float_status behaviour to match the Arm defaults:
-- * @env:     CPUARMState
+- *  * tininess-before-rounding
-- * @s2ap:    The 2-bit stage2 access permissions (S2AP)
+- *  * 2-input NaN propagation prefers SNaN over QNaN, and then
-- * @xn:      XN (execute-never) bits
+- *    operand A over operand B (see FPProcessNaNs() pseudocode)
-- * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
+- *  * 3-input NaN propagation prefers SNaN over QNaN, and then
 - *    operand C over A over B (see FPProcessNaNs3() pseudocode,
 - *    but note that for QEMU muladd is a * b + c, whereas for
 - *    the pseudocode function the arguments are in the order c, a, b.
 - *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
 - *    and the input NaN if it is signalling
 - *  * Default NaN has sign bit clear, msb frac bit set
 - */
--int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
+-static void arm_set_default_fp_behaviours(float_status *s)
 -{
--    int prot = 0;
+-    set_float_detect_tininess(float_tininess_before_rounding, s);
--
+-    set_float_ftz_detection(float_ftz_before_rounding, s);
--    if (s2ap & 1) {
+-    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
--        prot |= PAGE_READ;
+-    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
--    }
+-    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
--    if (s2ap & 2) {
+-    set_float_default_nan_pattern(0b01000000, s);
 -        prot |= PAGE_WRITE;
 -    }
 -
 -    if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
 -        switch (xn) {
 -        case 0:
 -            prot |= PAGE_EXEC;
 -            break;
 -        case 1:
 -            if (s1_is_el0) {
 -                prot |= PAGE_EXEC;
 -            }
 -            break;
 -        case 2:
 -            break;
 -        case 3:
 -            if (!s1_is_el0) {
 -                prot |= PAGE_EXEC;
 -            }
 -            break;
 -        default:
 -            g_assert_not_reached();
 -        }
 -    } else {
 -        if (!extract32(xn, 1, 1)) {
 -            if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
 -                prot |= PAGE_EXEC;
 -            }
 -        }
 -    }
 -    return prot;
 -}
 -
--/* Translate section/page access permissions to protection flags
+ static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
-- *
+ {
-- * @env:     CPUARMState
+     /* Reset a single ARMCPRegInfo register */
-- * @mmu_idx: MMU index indicating required translation regime
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 - * @is_aa64: TRUE if AArch64
 - * @ap:      The 2-bit simple AP (AP[2:1])
 - * @ns:      NS (non-secure) bit
 - * @xn:      XN (execute-never) bit
 - * @pxn:     PXN (privileged execute-never) bit
 - */
 -int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
 -               int ap, int ns, int xn, int pxn)
 -{
 -    bool is_user = regime_is_user(env, mmu_idx);
 -    int prot_rw, user_rw;
 -    bool have_wxn;
 -    int wxn = 0;
 -
 -    assert(mmu_idx != ARMMMUIdx_Stage2);
 -    assert(mmu_idx != ARMMMUIdx_Stage2_S);
 -
 -    user_rw = simple_ap_to_rw_prot_is_user(ap, true);
 -    if (is_user) {
 -        prot_rw = user_rw;
 -    } else {
 -        if (user_rw && regime_is_pan(env, mmu_idx)) {
 -            /* PAN forbids data accesses but doesn't affect insn fetch */
 -            prot_rw = 0;
 -        } else {
 -            prot_rw = simple_ap_to_rw_prot_is_user(ap, false);
 -        }
 -    }
 -
 -    if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
 -        return prot_rw;
 -    }
 -
 -    /* TODO have_wxn should be replaced with
 -     *   ARM_FEATURE_V8 || (ARM_FEATURE_V7 && ARM_FEATURE_EL2)
 -     * when ARM_FEATURE_EL2 starts getting set. For now we assume all LPAE
 -     * compatible processors have EL2, which is required for [U]WXN.
 -     */
 -    have_wxn = arm_feature(env, ARM_FEATURE_LPAE);
 -
 -    if (have_wxn) {
 -        wxn = regime_sctlr(env, mmu_idx) & SCTLR_WXN;
 -    }
 -
 -    if (is_aa64) {
 -        if (regime_has_2_ranges(mmu_idx) && !is_user) {
 -            xn = pxn || (user_rw & PAGE_WRITE);
 -        }
 -    } else if (arm_feature(env, ARM_FEATURE_V7)) {
 -        switch (regime_el(env, mmu_idx)) {
 -        case 1:
 -        case 3:
 -            if (is_user) {
 -                xn = xn || !(user_rw & PAGE_READ);
 -            } else {
 -                int uwxn = 0;
 -                if (have_wxn) {
 -                    uwxn = regime_sctlr(env, mmu_idx) & SCTLR_UWXN;
 -                }
 -                xn = xn || !(prot_rw & PAGE_READ) || pxn ||
 -                     (uwxn && (user_rw & PAGE_WRITE));
 -            }
 -            break;
 -        case 2:
 -            break;
 -        }
 -    } else {
 -        xn = wxn = 0;
 -    }
 -
 -    if (xn || (wxn && (prot_rw & PAGE_WRITE))) {
 -        return prot_rw;
 -    }
 -    return prot_rw | PAGE_EXEC;
 -}
 -
  /*
   * check_s2_mmu_setup
   * @cpu:        ARMCPU
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ do_fault:
+@@ -XXX,XX +XXX,XX @@
-     return true;
+ #include "exec/helper-proto.h"
- }
+ #include "internals.h"
  #include "cpu-features.h"
 +#include "fpu/softfloat.h"
  #ifdef CONFIG_TCG
  #include "qemu/log.h"
 -#include "fpu/softfloat.h"
  #endif
  /* VFP support.  We follow the convention used for VFP instructions:
     Single precision routines have a "s" suffix, double precision a
     "d" suffix.  */
 +/*
-+ * Translate S2 section/page access permissions to protection flags
++ * Set the float_status behaviour to match the Arm defaults:
-+ * @env:     CPUARMState
++ *  * tininess-before-rounding
-+ * @s2ap:    The 2-bit stage2 access permissions (S2AP)
++ *  * 2-input NaN propagation prefers SNaN over QNaN, and then
-+ * @xn:      XN (execute-never) bits
++ *    operand A over operand B (see FPProcessNaNs() pseudocode)
-+ * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
++ *  * 3-input NaN propagation prefers SNaN over QNaN, and then
 + *    operand C over A over B (see FPProcessNaNs3() pseudocode,
 + *    but note that for QEMU muladd is a * b + c, whereas for
 + *    the pseudocode function the arguments are in the order c, a, b.
 + *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
 + *    and the input NaN if it is signalling
 + *  * Default NaN has sign bit clear, msb frac bit set
 + */
-+static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
++void arm_set_default_fp_behaviours(float_status *s)
 +{
-+    int prot = 0;
++    set_float_detect_tininess(float_tininess_before_rounding, s);
-+
++    set_float_ftz_detection(float_ftz_before_rounding, s);
-+    if (s2ap & 1) {
++    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
-+        prot |= PAGE_READ;
++    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
-+    }
++    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
-+    if (s2ap & 2) {
++    set_float_default_nan_pattern(0b01000000, s);
 +        prot |= PAGE_WRITE;
 +    }
 +
 +    if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
 +        switch (xn) {
 +        case 0:
 +            prot |= PAGE_EXEC;
 +            break;
 +        case 1:
 +            if (s1_is_el0) {
 +                prot |= PAGE_EXEC;
 +            }
 +            break;
 +        case 2:
 +            break;
 +        case 3:
 +            if (!s1_is_el0) {
 +                prot |= PAGE_EXEC;
 +            }
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +    } else {
 +        if (!extract32(xn, 1, 1)) {
 +            if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
 +                prot |= PAGE_EXEC;
 +            }
 +        }
 +    }
 +    return prot;
 +}
 +
 +/*
-+ * Translate section/page access permissions to protection flags
++ * Set the float_status behaviour to match the FEAT_AFP
-+ * @env:     CPUARMState
++ * FPCR.AH=1 requirements:
-+ * @mmu_idx: MMU index indicating required translation regime
++ *  * tininess-after-rounding
-+ * @is_aa64: TRUE if AArch64
++ *  * 2-input NaN propagation prefers the first NaN
-+ * @ap:      The 2-bit simple AP (AP[2:1])
++ *  * 3-input NaN propagation prefers a over b over c
-+ * @ns:      NS (non-secure) bit
++ *  * 0 * Inf + NaN always returns the input NaN and doesn't
-+ * @xn:      XN (execute-never) bit
++ *    set Invalid for a QNaN
-+ * @pxn:     PXN (privileged execute-never) bit
++ *  * default NaN has sign bit set, msb frac bit set
 + */
-+static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
++static void arm_set_ah_fp_behaviours(float_status *s)
 +                      int ap, int ns, int xn, int pxn)
 +{
-+    bool is_user = regime_is_user(env, mmu_idx);
++    set_float_detect_tininess(float_tininess_after_rounding, s);
-+    int prot_rw, user_rw;
++    set_float_ftz_detection(float_ftz_after_rounding, s);
-+    bool have_wxn;
++    set_float_2nan_prop_rule(float_2nan_prop_ab, s);
-+    int wxn = 0;
++    set_float_3nan_prop_rule(float_3nan_prop_abc, s);
 +    set_float_infzeronan_rule(float_infzeronan_dnan_never |
 +                              float_infzeronan_suppress_invalid, s);
 +    set_float_default_nan_pattern(0b11000000, s);
 +}
 +
-+    assert(mmu_idx != ARMMMUIdx_Stage2);
+ #ifdef CONFIG_TCG
-+    assert(mmu_idx != ARMMMUIdx_Stage2_S);
  /* Convert host exception flags to vfp form.  */
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
      }
 +    if (changed & FPCR_AH) {
 +        bool ah_enabled = val & FPCR_AH;
 +
-+    user_rw = simple_ap_to_rw_prot_is_user(ap, true);
++        if (ah_enabled) {
-+    if (is_user) {
++            /* Change behaviours for A64 FP operations */
-+        prot_rw = user_rw;
++            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
-+    } else {
++            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +        if (user_rw && regime_is_pan(env, mmu_idx)) {
 +            /* PAN forbids data accesses but doesn't affect insn fetch */
 +            prot_rw = 0;
 +        } else {
-+            prot_rw = simple_ap_to_rw_prot_is_user(ap, false);
++            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 +            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +        }
 +    }
-+
+     /*
-+    if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
+      * If any bits changed that we look at in vfp_get_fpsr_from_host(),
-+        return prot_rw;
+      * we must sync the float_status flags into vfp.fpsr now (under the
 +    }
 +
 +    /* TODO have_wxn should be replaced with
 +     *   ARM_FEATURE_V8 || (ARM_FEATURE_V7 && ARM_FEATURE_EL2)
 +     * when ARM_FEATURE_EL2 starts getting set. For now we assume all LPAE
 +     * compatible processors have EL2, which is required for [U]WXN.
 +     */
 +    have_wxn = arm_feature(env, ARM_FEATURE_LPAE);
 +
 +    if (have_wxn) {
 +        wxn = regime_sctlr(env, mmu_idx) & SCTLR_WXN;
 +    }
 +
 +    if (is_aa64) {
 +        if (regime_has_2_ranges(mmu_idx) && !is_user) {
 +            xn = pxn || (user_rw & PAGE_WRITE);
 +        }
 +    } else if (arm_feature(env, ARM_FEATURE_V7)) {
 +        switch (regime_el(env, mmu_idx)) {
 +        case 1:
 +        case 3:
 +            if (is_user) {
 +                xn = xn || !(user_rw & PAGE_READ);
 +            } else {
 +                int uwxn = 0;
 +                if (have_wxn) {
 +                    uwxn = regime_sctlr(env, mmu_idx) & SCTLR_UWXN;
 +                }
 +                xn = xn || !(prot_rw & PAGE_READ) || pxn ||
 +                     (uwxn && (user_rw & PAGE_WRITE));
 +            }
 +            break;
 +        case 2:
 +            break;
 +        }
 +    } else {
 +        xn = wxn = 0;
 +    }
 +
 +    if (xn || (wxn && (prot_rw & PAGE_WRITE))) {
 +        return prot_rw;
 +    }
 +    return prot_rw | PAGE_EXEC;
 +}
 +
  /**
   * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
   *
 --
-.25.1
+.34.1

-New patch
+[PULL 08/68] target/arm: Adjust exception flag handling for AH = 1
+When FPCR.AH = 1, some of the cumulative exception flags in the FPSR
+behave slightly differently for A64 operations:
+ * IDC is set when a denormal input is used without flushing
+ * IXC (Inexact) is set when an output denormal is flushed to zero
+Update vfp_get_fpsr_from_host() to do this.
+Note that because half-precision operations never set IDC, we now
+need to add float_flag_input_denormal_used to the set we mask out of
+fp_status_f16_a64.
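
A minimal sketch of the AH-dependent part of the flag conversion follows.
The EX_* host flag values are made up for illustration (softfloat's real
float_flag_* values may differ), the FPSR bit positions are assumed from
the Arm architecture, and the mapping of the other exception flags is
omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative host flag bits; not softfloat's actual encodings. */
#define EX_IN_DENORMAL_FLUSHED  (1u << 0)
#define EX_IN_DENORMAL_USED     (1u << 1)
#define EX_OUT_DENORMAL_FLUSHED (1u << 2)

/* FPSR cumulative exception bits used in this sketch. */
#define EX_FPSR_IXC (1u << 4)
#define EX_FPSR_IDC (1u << 7)

/* Convert the three denormal-related host flags to FPSR bits. */
static uint32_t ex_exceptbits(uint32_t host_bits, bool ah)
{
    uint32_t target_bits = 0;

    if (host_bits & EX_IN_DENORMAL_FLUSHED) {
        target_bits |= EX_FPSR_IDC;
    }
    /* Only with AH = 1: a denormal input that was used sets IDC. */
    if (ah && (host_bits & EX_IN_DENORMAL_USED)) {
        target_bits |= EX_FPSR_IDC;
    }
    /* Only with AH = 1: flushing an output denormal sets IXC. */
    if (ah && (host_bits & EX_OUT_DENORMAL_FLUSHED)) {
        target_bits |= EX_FPSR_IXC;
    }
    return target_bits;
}
```

Passing the AH state as a parameter is what lets the A64 flags be
converted with ah taken from FPCR while the A32 flags are always
converted with ah = false.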
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/vfp_helper.c | 17 ++++++++++++++---
+file changed, 14 insertions(+), 3 deletions(-)
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ static void arm_set_ah_fp_behaviours(float_status *s)
+ #ifdef CONFIG_TCG
+ /* Convert host exception flags to vfp form.  */
+-static inline uint32_t vfp_exceptbits_from_host(int host_bits)
++static inline uint32_t vfp_exceptbits_from_host(int host_bits, bool ah)
+ {
+     uint32_t target_bits = 0;
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
+     if (host_bits & float_flag_input_denormal_flushed) {
+         target_bits |= FPSR_IDC;
+     }
++    /*
++     * With FPCR.AH, IDC is set when an input denormal is used,
++     * and flushing an output denormal to zero sets both IXC and UFC.
++     */
++    if (ah && (host_bits & float_flag_input_denormal_used)) {
++        target_bits |= FPSR_IDC;
++    }
++    if (ah && (host_bits & float_flag_output_denormal_flushed)) {
++        target_bits |= FPSR_IXC;
++    }
+     return target_bits;
+ }
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
+     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+-          & ~float_flag_input_denormal_flushed);
++          & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
+     /*
+      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
+      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
+         a64_flags &= ~float_flag_input_denormal_flushed;
+     }
+-    return vfp_exceptbits_from_host(a32_flags | a64_flags);
++    return vfp_exceptbits_from_host(a64_flags, env->vfp.fpcr & FPCR_AH) |
++        vfp_exceptbits_from_host(a32_flags, false);
+ }
+ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
+--
+.34.1

-[PULL 46/55] target/arm: Use uint32_t instead of bitmap for sve vq's
+[PULL 09/68] target/arm: Add FPCR.AH to tbflags
-From: Richard Henderson <richard.henderson@linaro.org>
+We are going to need to generate different code in some cases when
 FPCR.AH is 1.  For example:
  * Floating point neg and abs must not flip the sign bit of NaNs
  * some insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, and various
    BFCVT and BFM bfloat16 ops) need to use a different float_status
    to the usual one
-The bitmap need only hold 15 bits; bitmap is over-complicated.
+Encode FPCR.AH into the A64 tbflags, so we can refer to it at
-We can simplify operations quite a bit with plain logical ops.
+translate time.
-The introduction of SVE_VQ_POW2_MAP eliminates the need for
+Because we now have a bit in FPCR that affects codegen, we can't mark
-looping in order to search for powers of two.  Simply perform
+the AArch64 FPCR register as being SUPPRESS_TB_END any more; writes
-the logical ops and use count leading or trailing zeros as
+to it will now end the TB and trigger a regeneration of hflags.
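
The idea of mirroring FPCR.AH into a flags word can be sketched as below.
This is a simplified illustration using plain shifts rather than QEMU's
FIELD()-generated accessors; the ex_* names are hypothetical, and bit 37
is taken from the FIELD(TBFLAG_A64, AH, 37, 1) definition in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_FPCR_AH             (1u << 1)  /* assumed FPCR.AH position */
#define EX_TBFLAG_A64_AH_SHIFT 37         /* from FIELD(TBFLAG_A64, AH, 37, 1) */

/* Recompute the AH bit of a 64-bit flags word from the current FPCR. */
static uint64_t ex_rebuild_flags(uint64_t flags, uint32_t fpcr)
{
    uint64_t ah = (fpcr & EX_FPCR_AH) ? 1 : 0;

    flags &= ~(UINT64_C(1) << EX_TBFLAG_A64_AH_SHIFT);
    return flags | (ah << EX_TBFLAG_A64_AH_SHIFT);
}

/* What translate-time code would read back into a field like fpcr_ah. */
static bool ex_flags_ah(uint64_t flags)
{
    return (flags >> EX_TBFLAG_A64_AH_SHIFT) & 1;
}
```

Because translated code is keyed on these flags, a guest write that
changes AH has to end the current TB so that the flags are recomputed
before any further code is generated.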
 required to find the result.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-12-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/cpu.h       |   6 +--
+ target/arm/cpu.h               | 1 +
- target/arm/internals.h |   5 ++
+ target/arm/tcg/translate.h     | 2 ++
- target/arm/kvm_arm.h   |   7 ++-
+ target/arm/helper.c            | 2 +-
- target/arm/cpu64.c     | 117 ++++++++++++++++++++---------------------
+ target/arm/tcg/hflags.c        | 4 ++++
- target/arm/helper.c    |   9 +---
+ target/arm/tcg/translate-a64.c | 1 +
- target/arm/kvm64.c     |  36 +++----------
+files changed, 9 insertions(+), 1 deletion(-)
 files changed, 75 insertions(+), 105 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
+@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2, 34, 1)
-      * Bits set in sve_vq_supported represent valid vector lengths for
+ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
-      * the CPU type.
+ /* Set if FEAT_NV2 RAM accesses are big-endian */
-      */
+ FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
--    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ);
++FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
--    DECLARE_BITMAP(sve_vq_init, ARM_MAX_VQ);
--    DECLARE_BITMAP(sve_vq_supported, ARM_MAX_VQ);
+ /*
-+    uint32_t sve_vq_map;
+  * Helpers for using the above. Note that only the A64 accessors use
-+    uint32_t sve_vq_init;
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
 +    uint32_t sve_vq_supported;
      /* Generic timer counter frequency, in Hz */
      uint64_t gt_cntfrq_hz;
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
+--- a/target/arm/tcg/translate.h
-+++ b/target/arm/internals.h
++++ b/target/arm/tcg/translate.h
-@@ -XXX,XX +XXX,XX @@ bool el_is_in_host(CPUARMState *env, int el);
+@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
+     bool nv2_mem_e20;
- void aa32_max_features(ARMCPU *cpu);
+     /* True if NV2 enabled and NV2 RAM accesses are big-endian */
+     bool nv2_mem_be;
-+/* Powers of 2 for sve_vq_map et al. */
++    /* True if FPCR.AH is 1 (alternate floating point handling) */
-+#define SVE_VQ_POW2_MAP                                 \
++    bool fpcr_ah;
 +    ((1 << (1 - 1)) | (1 << (2 - 1)) |                  \
 +     (1 << (4 - 1)) | (1 << (8 - 1)) | (1 << (16 - 1)))
 +
  #endif
 diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm_arm.h
 +++ b/target/arm/kvm_arm.h
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf);
  /**
   * kvm_arm_sve_get_vls:
   * @cs: CPUState
 - * @map: bitmap to fill in
   *
   * Get all the SVE vector lengths supported by the KVM host, setting
   * the bits corresponding to their length in quadwords minus one
 - * (vq - 1) in @map up to ARM_MAX_VQ.
 + * (vq - 1) up to ARM_MAX_VQ.  Return the resulting map.
   */
 -void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map);
 +uint32_t kvm_arm_sve_get_vls(CPUState *cs);
  /**
   * kvm_arm_set_cpu_features_from_host:
@@ -XXX,XX +XXX,XX @@ static inline void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error **errp)
      g_assert_not_reached();
  }
 -static inline void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
 +static inline uint32_t kvm_arm_sve_get_vls(CPUState *cs)
  {
      g_assert_not_reached();
  }
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
       * any of the above.  Finally, if SVE is not disabled, then at least one
       * vector length must be enabled.
       */
 -    DECLARE_BITMAP(tmp, ARM_MAX_VQ);
 -    uint32_t vq, max_vq = 0;
 +    uint32_t vq_map = cpu->sve_vq_map;
 +    uint32_t vq_init = cpu->sve_vq_init;
 +    uint32_t vq_supported;
 +    uint32_t vq_mask = 0;
 +    uint32_t tmp, vq, max_vq = 0;
      /*
-      * CPU models specify a set of supported vector lengths which are
+      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
-@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
+      *  < 0, set by the current instruction.
       * in the supported bitmap results in an error.  When KVM is enabled we
       * fetch the supported bitmap from the host.
       */
 -    if (kvm_enabled() && kvm_arm_sve_supported()) {
 -        kvm_arm_sve_get_vls(CPU(cpu), cpu->sve_vq_supported);
 -    } else if (kvm_enabled()) {
 -        assert(!cpu_isar_feature(aa64_sve, cpu));
 +    if (kvm_enabled()) {
 +        if (kvm_arm_sve_supported()) {
 +            cpu->sve_vq_supported = kvm_arm_sve_get_vls(CPU(cpu));
 +            vq_supported = cpu->sve_vq_supported;
 +        } else {
 +            assert(!cpu_isar_feature(aa64_sve, cpu));
 +            vq_supported = 0;
 +        }
 +    } else {
 +        vq_supported = cpu->sve_vq_supported;
      }
      /*
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
       * From the properties, sve_vq_map<N> implies sve_vq_init<N>.
       * Check first for any sve<N> enabled.
       */
 -    if (!bitmap_empty(cpu->sve_vq_map, ARM_MAX_VQ)) {
 -        max_vq = find_last_bit(cpu->sve_vq_map, ARM_MAX_VQ) + 1;
 +    if (vq_map != 0) {
 +        max_vq = 32 - clz32(vq_map);
 +        vq_mask = MAKE_64BIT_MASK(0, max_vq);
          if (cpu->sve_max_vq && max_vq > cpu->sve_max_vq) {
              error_setg(errp, "cannot enable sve%d", max_vq * 128);
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
               * For KVM we have to automatically enable all supported unitialized
               * lengths, even when the smaller lengths are not all powers-of-two.
               */
 -            bitmap_andnot(tmp, cpu->sve_vq_supported, cpu->sve_vq_init, max_vq);
 -            bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
 +            vq_map |= vq_supported & ~vq_init & vq_mask;
          } else {
              /* Propagate enabled bits down through required powers-of-two. */
 -            for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
 -                if (!test_bit(vq - 1, cpu->sve_vq_init)) {
 -                    set_bit(vq - 1, cpu->sve_vq_map);
 -                }
 -            }
 +            vq_map |= SVE_VQ_POW2_MAP & ~vq_init & vq_mask;
          }
      } else if (cpu->sve_max_vq == 0) {
          /*
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
          if (kvm_enabled()) {
              /* Disabling a supported length disables all larger lengths. */
 -            for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
 -                if (test_bit(vq - 1, cpu->sve_vq_init) &&
 -                    test_bit(vq - 1, cpu->sve_vq_supported)) {
 -                    break;
 -                }
 -            }
 +            tmp = vq_init & vq_supported;
          } else {
              /* Disabling a power-of-two disables all larger lengths. */
 -            for (vq = 1; vq <= ARM_MAX_VQ; vq <<= 1) {
 -                if (test_bit(vq - 1, cpu->sve_vq_init)) {
 -                    break;
 -                }
 -            }
 +            tmp = vq_init & SVE_VQ_POW2_MAP;
          }
 +        vq = ctz32(tmp) + 1;
          max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
 -        bitmap_andnot(cpu->sve_vq_map, cpu->sve_vq_supported,
 -                      cpu->sve_vq_init, max_vq);
 -        if (max_vq == 0 || bitmap_empty(cpu->sve_vq_map, max_vq)) {
 +        vq_mask = MAKE_64BIT_MASK(0, max_vq);
 +        vq_map = vq_supported & ~vq_init & vq_mask;
 +
 +        if (max_vq == 0 || vq_map == 0) {
              error_setg(errp, "cannot disable sve%d", vq * 128);
              error_append_hint(errp, "Disabling sve%d results in all "
                                "vector lengths being disabled.\n",
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
              return;
          }
 -        max_vq = find_last_bit(cpu->sve_vq_map, max_vq) + 1;
 +        max_vq = 32 - clz32(vq_map);
 +        vq_mask = MAKE_64BIT_MASK(0, max_vq);
      }
      /*
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
       */
      if (cpu->sve_max_vq != 0) {
          max_vq = cpu->sve_max_vq;
 +        vq_mask = MAKE_64BIT_MASK(0, max_vq);
 -        if (!test_bit(max_vq - 1, cpu->sve_vq_map) &&
 -            test_bit(max_vq - 1, cpu->sve_vq_init)) {
 +        if (vq_init & ~vq_map & (1 << (max_vq - 1))) {
              error_setg(errp, "cannot disable sve%d", max_vq * 128);
              error_append_hint(errp, "The maximum vector length must be "
                                "enabled, sve-max-vq=%d (%d bits)\n",
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
          }
          /* Set all bits not explicitly set within sve-max-vq. */
 -        bitmap_complement(tmp, cpu->sve_vq_init, max_vq);
 -        bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
 +        vq_map |= ~vq_init & vq_mask;
      }
      /*
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
       * are clear, just in case anybody looks.
       */
      assert(max_vq != 0);
 -    bitmap_clear(cpu->sve_vq_map, max_vq, ARM_MAX_VQ - max_vq);
 +    assert(vq_mask != 0);
 +    vq_map &= vq_mask;
      /* Ensure the set of lengths matches what is supported. */
 -    bitmap_xor(tmp, cpu->sve_vq_map, cpu->sve_vq_supported, max_vq);
 -    if (!bitmap_empty(tmp, max_vq)) {
 -        vq = find_last_bit(tmp, max_vq) + 1;
 -        if (test_bit(vq - 1, cpu->sve_vq_map)) {
 +    tmp = vq_map ^ (vq_supported & vq_mask);
 +    if (tmp) {
 +        vq = 32 - clz32(tmp);
 +        if (vq_map & (1 << (vq - 1))) {
              if (cpu->sve_max_vq) {
                  error_setg(errp, "cannot set sve-max-vq=%d", cpu->sve_max_vq);
                  error_append_hint(errp, "This CPU does not support "
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
                  return;
              } else {
                  /* Ensure all required powers-of-two are enabled. */
 -                for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
 -                    if (!test_bit(vq - 1, cpu->sve_vq_map)) {
 -                        error_setg(errp, "cannot disable sve%d", vq * 128);
 -                        error_append_hint(errp, "sve%d is required as it "
 -                                          "is a power-of-two length smaller "
 -                                          "than the maximum, sve%d\n",
 -                                          vq * 128, max_vq * 128);
 -                        return;
 -                    }
 +                tmp = SVE_VQ_POW2_MAP & vq_mask & ~vq_map;
 +                if (tmp) {
 +                    vq = 32 - clz32(tmp);
 +                    error_setg(errp, "cannot disable sve%d", vq * 128);
 +                    error_append_hint(errp, "sve%d is required as it "
 +                                      "is a power-of-two length smaller "
 +                                      "than the maximum, sve%d\n",
 +                                      vq * 128, max_vq * 128);
 +                    return;
                  }
              }
          }
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      /* From now on sve_max_vq is the actual maximum supported length. */
      cpu->sve_max_vq = max_vq;
 +    cpu->sve_vq_map = vq_map;
  }
  static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
      if (!cpu_isar_feature(aa64_sve, cpu)) {
          value = false;
      } else {
 -        value = test_bit(vq - 1, cpu->sve_vq_map);
 +        value = extract32(cpu->sve_vq_map, vq - 1, 1);
      }
      visit_type_bool(v, name, &value, errp);
  }
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
          return;
      }
 -    if (value) {
 -        set_bit(vq - 1, cpu->sve_vq_map);
 -    } else {
 -        clear_bit(vq - 1, cpu->sve_vq_map);
 -    }
 -    set_bit(vq - 1, cpu->sve_vq_init);
 +    cpu->sve_vq_map = deposit32(cpu->sve_vq_map, vq - 1, 1, value);
 +    cpu->sve_vq_init |= 1 << (vq - 1);
  }
  static bool cpu_arm_get_sve(Object *obj, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
      cpu->dcz_blocksize = 7; /*  512 bytes */
  #endif
 -    bitmap_fill(cpu->sve_vq_supported, ARM_MAX_VQ);
 +    cpu->sve_vq_supported = MAKE_64BIT_MASK(0, ARM_MAX_VQ);
      aarch64_add_pauth_properties(obj);
      aarch64_add_sve_properties(obj);
@@ -XXX,XX +XXX,XX @@ static void aarch64_a64fx_initfn(Object *obj)
      cpu->gic_vprebits = 5;
      cpu->gic_pribits = 5;
 -    /* Suppport of A64FX's vector length are 128,256 and 512bit only */
 +    /* The A64FX supports only 128, 256 and 512 bit vector lengths */
      aarch64_add_sve_properties(obj);
 -    bitmap_zero(cpu->sve_vq_supported, ARM_MAX_VQ);
 -    set_bit(0, cpu->sve_vq_supported); /* 128bit */
 -    set_bit(1, cpu->sve_vq_supported); /* 256bit */
 -    set_bit(3, cpu->sve_vq_supported); /* 512bit */
 +    cpu->sve_vq_supported = (1 << 0)  /* 128bit */
 +                          | (1 << 1)  /* 256bit */
 +                          | (1 << 3); /* 512bit */
      cpu->isar.reset_pmcr_el0 = 0x46014040;
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
- {
+       .writefn = aa64_daif_write, .resetfn = arm_cp_reset_ignore },
-     ARMCPU *cpu = env_archcpu(env);
+     { .name = "FPCR", .state = ARM_CP_STATE_AA64,
-     uint32_t len = cpu->sve_max_vq - 1;
+       .opc0 = 3, .opc1 = 3, .opc2 = 0, .crn = 4, .crm = 4,
--    uint32_t end_len;
+-      .access = PL0_RW, .type = ARM_CP_FPU | ARM_CP_SUPPRESS_TB_END,
++      .access = PL0_RW, .type = ARM_CP_FPU,
-     if (el <= 1 && !el_is_in_host(env, el)) {
+       .readfn = aa64_fpcr_read, .writefn = aa64_fpcr_write },
-         len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
+     { .name = "FPSR", .state = ARM_CP_STATE_AA64,
-@@ -XXX,XX +XXX,XX @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
+       .opc0 = 3, .opc1 = 3, .opc2 = 1, .crn = 4, .crm = 4,
-         len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
+diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/hflags.c
 +++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
          DP_TBFLAG_A64(flags, TCMA, aa64_va_parameter_tcma(tcr, mmu_idx));
      }
--    end_len = len;
++    if (env->vfp.fpcr & FPCR_AH) {
--    if (!test_bit(len, cpu->sve_vq_map)) {
++        DP_TBFLAG_A64(flags, AH, 1);
--        end_len = find_last_bit(cpu->sve_vq_map, len);
++    }
--        assert(end_len < len);
++
--    }
+     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 -    return end_len;
 +    len = 31 - clz32(cpu->sve_vq_map & MAKE_64BIT_MASK(0, len + 1));
 +    return len;
  }
- static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm64.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/kvm64.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_steal_time_supported(void)
+@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
+     dc->nv2 = EX_TBFLAG_A64(tb_flags, NV2);
- QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
+     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
+     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
--void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
++    dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
-+uint32_t kvm_arm_sve_get_vls(CPUState *cs)
+     dc->vec_len = 0;
- {
+     dc->vec_stride = 0;
-     /* Only call this function if kvm_arm_sve_supported() returns true. */
+     dc->cp_regs = arm_cpu->cp_regs;
      static uint64_t vls[KVM_ARM64_SVE_VLS_WORDS];
      static bool probed;
      uint32_t vq = 0;
 -    int i, j;
 -
 -    bitmap_zero(map, ARM_MAX_VQ);
 +    int i;
      /*
       * KVM ensures all host CPUs support the same set of vector lengths.
@@ -XXX,XX +XXX,XX @@ void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
          if (vq > ARM_MAX_VQ) {
              warn_report("KVM supports vector lengths larger than "
                          "QEMU can enable");
 +            vls[0] &= MAKE_64BIT_MASK(0, ARM_MAX_VQ);
          }
      }
 -    for (i = 0; i < KVM_ARM64_SVE_VLS_WORDS; ++i) {
 -        if (!vls[i]) {
 -            continue;
 -        }
 -        for (j = 1; j <= 64; ++j) {
 -            vq = j + i * 64;
 -            if (vq > ARM_MAX_VQ) {
 -                return;
 -            }
 -            if (vls[i] & (1UL << (j - 1))) {
 -                set_bit(vq - 1, map);
 -            }
 -        }
 -    }
 +    return vls[0];
  }
  static int kvm_arm_sve_set_vls(CPUState *cs)
  {
 -    uint64_t vls[KVM_ARM64_SVE_VLS_WORDS] = {0};
 +    ARMCPU *cpu = ARM_CPU(cs);
 +    uint64_t vls[KVM_ARM64_SVE_VLS_WORDS] = { cpu->sve_vq_map };
      struct kvm_one_reg reg = {
          .id = KVM_REG_ARM64_SVE_VLS,
          .addr = (uint64_t)&vls[0],
      };
 -    ARMCPU *cpu = ARM_CPU(cs);
 -    uint32_t vq;
 -    int i, j;
      assert(cpu->sve_max_vq <= KVM_ARM64_SVE_VQ_MAX);
 -    for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
 -        if (test_bit(vq - 1, cpu->sve_vq_map)) {
 -            i = (vq - 1) / 64;
 -            j = (vq - 1) % 64;
 -            vls[i] |= 1UL << j;
 -        }
 -    }
 -
      return kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
  }
 --
-.25.1
+.34.1

-[PULL 45/55] target/arm: Merge aarch64_sve_zcr_get_valid_len into caller
+[PULL 10/68] target/arm: Set up float_status to use for FPCR.AH=1 behaviour
-From: Richard Henderson <richard.henderson@linaro.org>
+When FPCR.AH is 1, the behaviour of some instructions changes:
+ * AdvSIMD BFCVT, BFCVTN, BFCVTN2, BFMLALB, BFMLALT
-This function is used only once, and will need modification
+ * SVE BFCVT, BFCVTNT, BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
-for Streaming SVE mode.
+ * SME BFCVT, BFCVTN, BFMLAL, BFMLSL (these are all in SME2 which
+   QEMU does not yet implement)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+ * FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-11-richard.henderson@linaro.org
+The behaviour change is:
  * the instructions do not update the FPSR cumulative exception flags
  * trapped floating point exceptions are disabled (a no-op for QEMU,
    which doesn't implement FPCR.{IDE,IXE,UFE,OFE,DZE,IOE})
  * rounding is always round-to-nearest-even regardless of FPCR.RMode
  * denormalized inputs and outputs are always flushed to zero, as if
    FPCR.{FZ,FIZ} is {1,1}
  * FPCR.FZ16 is still honoured for half-precision inputs
 (See the Arm ARM DDI0487L.a section A1.5.9.)
 We can provide all these behaviours with another pair of float_status fields
 which we use only for these insns, when FPCR.AH is 1. These float_status
 fields will always have:
  * flush_to_zero and flush_inputs_to_zero set for the non-F16 field
  * rounding mode set to round-to-nearest-even
 and so the only FPCR fields they need to honour are DN and FZ16.
 In this commit we only define the new fp_status fields and give them
 the required behaviour when FPSR is updated.  In subsequent commits
 we will arrange to use this new fp_status field for the instructions
 that should be affected by FPCR.AH in this way.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/internals.h | 11 -----------
+ target/arm/cpu.h           | 15 +++++++++++++++
- target/arm/helper.c    | 30 +++++++++++-------------------
+ target/arm/internals.h     |  2 ++
-files changed, 11 insertions(+), 30 deletions(-)
+ target/arm/tcg/translate.h | 14 ++++++++++++++
+ target/arm/cpu.c           |  4 ++++
  target/arm/vfp_helper.c    | 13 ++++++++++++-
 files changed, 47 insertions(+), 1 deletion(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
           *  standard_fp_status : the ARM "Standard FPSCR Value"
           *  standard_fp_status_fp16 : used for half-precision
           *       calculations with the ARM "Standard FPSCR Value"
 +         *  ah_fp_status: used for the A64 insns which change behaviour
 +         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 +         *       and the reciprocal and square root estimate/step insns)
 +         *  ah_fp_status_f16: used for the A64 insns which change behaviour
 +         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 +         *       and the reciprocal and square root estimate/step insns);
 +         *       for half-precision
           *
           * Half-precision operations are governed by a separate
           * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
           * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
           * using a fixed value for it.
           *
 +         * The ah_fp_status is needed because some insns have different
 +         * behaviour when FPCR.AH == 1: they don't update cumulative
 +         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
 +         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 +         * which means we need an ah_fp_status_f16 as well.
 +         *
           * To avoid having to transfer exception bits around, we simply
           * say that the FPSCR cumulative exception flags are the logical
           * OR of the flags in the four fp statuses. This relies on the
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          float_status fp_status_f16_a64;
          float_status standard_fp_status;
          float_status standard_fp_status_f16;
 +        float_status ah_fp_status;
 +        float_status ah_fp_status_f16;
          uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
          uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/internals.h
 +++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void);
+@@ -XXX,XX +XXX,XX @@ int alle1_tlbmask(CPUARMState *env);
- void arm_cpu_synchronize_from_tb(CPUState *cs, const TranslationBlock *tb);
- #endif /* CONFIG_TCG */
+ /* Set the float_status behaviour to match the Arm defaults */
+ void arm_set_default_fp_behaviours(float_status *s);
--/**
++/* Set the float_status behaviour to match Arm FPCR.AH=1 behaviour */
-- * aarch64_sve_zcr_get_valid_len:
++void arm_set_ah_fp_behaviours(float_status *s);
-- * @cpu: cpu context
-- * @start_len: maximum len to consider
+ #endif
-- *
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
-- * Return the maximum supported sve vector length <= @start_len.
+index XXXXXXX..XXXXXXX 100644
-- * Note that both @start_len and the return value are in units
+--- a/target/arm/tcg/translate.h
-- * of ZCR_ELx.LEN, so the vector bit length is (x + 1) * 128.
++++ b/target/arm/tcg/translate.h
-- */
+@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
--uint32_t aarch64_sve_zcr_get_valid_len(ARMCPU *cpu, uint32_t start_len);
+     FPST_A64,
--
+     FPST_A32_F16,
- enum arm_fprounding {
+     FPST_A64_F16,
-     FPROUNDING_TIEEVEN,
++    FPST_AH,
-     FPROUNDING_POSINF,
++    FPST_AH_F16,
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+     FPST_STD,
-index XXXXXXX..XXXXXXX 100644
+     FPST_STD_F16,
---- a/target/arm/helper.c
+ } ARMFPStatusFlavour;
-+++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
-@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
+  *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
-     return 0;
+  * FPST_A64_F16
   *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
 + * FPST_AH:
 + *   for AArch64 operations which change behaviour when AH=1 (specifically,
 + *   bfloat16 conversions and multiplies, and the reciprocal and square root
 + *   estimate/step insns)
 + * FPST_AH_F16:
 + *   ditto, but for half-precision operations
   * FPST_STD
   *   for A32/T32 Neon operations using the "standard FPSCR value"
   * FPST_STD_F16
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
      case FPST_A64_F16:
          offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
          break;
 +    case FPST_AH:
 +        offset = offsetof(CPUARMState, vfp.ah_fp_status);
 +        break;
 +    case FPST_AH_F16:
 +        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
 +        break;
      case FPST_STD:
          offset = offsetof(CPUARMState, vfp.standard_fp_status);
          break;
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
      arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
 +    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
 +    set_flush_to_zero(1, &env->vfp.ah_fp_status);
 +    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
 +    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
  #ifndef CONFIG_USER_ONLY
      if (kvm_enabled()) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void arm_set_default_fp_behaviours(float_status *s)
   *    set Invalid for a QNaN
   *  * default NaN has sign bit set, msb frac bit set
   */
 -static void arm_set_ah_fp_behaviours(float_status *s)
 +void arm_set_ah_fp_behaviours(float_status *s)
  {
      set_float_detect_tininess(float_tininess_after_rounding, s);
      set_float_ftz_detection(float_ftz_after_rounding, s);
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
      a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
 +    /*
 +     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
 +     * they are used for insns that must not set the cumulative exception bits.
 +     */
 +
      /*
       * Flushing an input denormal *only* because FPCR.FIZ == 1 does
       * not set FPSR.IDC; if FPCR.FZ is also set then this takes
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
      set_float_exception_flags(0, &env->vfp.standard_fp_status);
      set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 +    set_float_exception_flags(0, &env->vfp.ah_fp_status);
 +    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
  }
--uint32_t aarch64_sve_zcr_get_valid_len(ARMCPU *cpu, uint32_t start_len)
+ static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
--{
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
--    uint32_t end_len;
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
--
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
--    start_len = MIN(start_len, ARM_MAX_VQ - 1);
+         set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
--    end_len = start_len;
++        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
--
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
--    if (!test_bit(start_len, cpu->sve_vq_map)) {
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
--        end_len = find_last_bit(cpu->sve_vq_map, start_len);
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
--        assert(end_len < start_len);
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
 -    }
 -    return end_len;
 -}
 -
  /*
   * Given that SVE is enabled, return the vector length for EL.
   */
  uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
  {
      ARMCPU *cpu = env_archcpu(env);
 -    uint32_t zcr_len = cpu->sve_max_vq - 1;
 +    uint32_t len = cpu->sve_max_vq - 1;
 +    uint32_t end_len;
      if (el <= 1 && !el_is_in_host(env, el)) {
 -        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
 +        len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
      }
-     if (el <= 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+     if (changed & FPCR_FZ) {
--        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
+         bool ftz_enabled = val & FPCR_FZ;
-+        len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
      }
-     if (arm_feature(env, ARM_FEATURE_EL3)) {
+     if (changed & FPCR_AH) {
--        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
+         bool ah_enabled = val & FPCR_AH;
 +        len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
      }
 -    return aarch64_sve_zcr_get_valid_len(cpu, zcr_len);
 +    end_len = len;
 +    if (!test_bit(len, cpu->sve_vq_map)) {
 +        end_len = find_last_bit(cpu->sve_vq_map, len);
 +        assert(end_len < len);
 +    }
 +    return end_len;
  }
  static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 --
-.25.1
+.34.1

-[PULL 39/55] target/arm: Remove fp checks from sve_exception_el
+[PULL 11/68] target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
-From: Richard Henderson <richard.henderson@linaro.org>
+For the instructions FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, use
 FPST_FPCR_AH or FPST_FPCR_AH_F16 when FPCR.AH is 1, so that they get
 the required behaviour changes.
-Instead of checking these bits in fp_exception_el and
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-also in sve_exception_el, document that we must compare
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-the results.  The only place where we have not already
+---
-checked that FP EL is zero is in rebuild_hflags_a64.
+ target/arm/tcg/translate-a64.h |  13 ++++
  target/arm/tcg/translate-a64.c | 119 +++++++++++++++++++++++++--------
  target/arm/tcg/translate-sve.c |  30 ++++++---
 files changed, 127 insertions(+), 35 deletions(-)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20220607203306.657998-5-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.c | 58 +++++++++++++++------------------------------
 file changed, 19 insertions(+), 39 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/translate-a64.h
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/translate-a64.h
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo minimal_ras_reginfo[] = {
+@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr pred_full_reg_ptr(DisasContext *s, int regno)
-       .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.vsesr_el2) },
+     return ret;
- };
+ }
 -/* Return the exception level to which exceptions should be taken
 - * via SVEAccessTrap.  If an exception should be routed through
 - * AArch64.AdvSIMDFPAccessTrap, return 0; fp_exception_el should
 - * take care of raising that exception.
 - * C.f. the ARM pseudocode function CheckSVEEnabled.
 +/*
-+ * Return the exception level to which exceptions should be taken
++ * Return the ARMFPStatusFlavour to use based on element size and
-+ * via SVEAccessTrap.  This excludes the check for whether the exception
++ * whether FPCR.AH is set.
-+ * should be routed through AArch64.AdvSIMDFPAccessTrap.  That can easily
++ */
-+ * be found by testing 0 < fp_exception_el < sve_exception_el.
++static inline ARMFPStatusFlavour select_ah_fpst(DisasContext *s, MemOp esz)
-+ *
++{
-+ * C.f. the ARM pseudocode function CheckSVEEnabled.  Note that the
++    if (s->fpcr_ah) {
-+ * pseudocode does *not* separate out the FP trap checks, but has them
++        return esz == MO_16 ? FPST_AH_F16 : FPST_AH;
-+ * all in one function.
++    } else {
 +        return esz == MO_16 ? FPST_A64_F16 : FPST_A64;
 +    }
 +}
 +
  bool disas_sve(DisasContext *, uint32_t);
  bool disas_sme(DisasContext *, uint32_t);
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op3_ool(DisasContext *s, bool is_q, int rd,
   * an out-of-line helper.
   */
- int sve_exception_el(CPUARMState *env, int el)
+ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
- {
+-                              int rm, bool is_fp16, int data,
-@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
++                              int rm, ARMFPStatusFlavour fpsttype, int data,
-         case 2:
+                               gen_helper_gvec_3_ptr *fn)
-             return 1;
+ {
 -    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
 +    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
      tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
                         vec_full_reg_offset(s, rn),
                         vec_full_reg_offset(s, rm), fpst,
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
      void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
  } FPScalar;
 -static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
 +static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
 +                                        const FPScalar *f,
 +                                        ARMFPStatusFlavour fpsttype)
  {
      switch (a->esz) {
      case MO_64:
          if (fp_access_check(s)) {
              TCGv_i64 t0 = read_fp_dreg(s, a->rn);
              TCGv_i64 t1 = read_fp_dreg(s, a->rm);
 -            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
 +            f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
              write_fp_dreg(s, a->rd, t0);
          }
--
+         break;
--        /* Check CPACR.FPEN.  */
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
--        switch (FIELD_EX64(env->cp15.cpacr_el1, CPACR_EL1, FPEN)) {
+         if (fp_access_check(s)) {
--        case 1:
+             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
--            if (el != 0) {
+             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
--                break;
+-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
--            }
++            f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
--            /* fall through */
+             write_fp_sreg(s, a->rd, t0);
--        case 0:
+         }
--        case 2:
+         break;
--            return 0;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
--        }
+         if (fp_access_check(s)) {
              TCGv_i32 t0 = read_fp_hreg(s, a->rn);
              TCGv_i32 t1 = read_fp_hreg(s, a->rm);
 -            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
 +            f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
              write_fp_sreg(s, a->rd, t0);
          }
          break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
      return true;
  }
 +static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
 +{
 +    return do_fp3_scalar_with_fpsttype(s, a, f,
 +                                       a->esz == MO_16 ?
 +                                       FPST_A64_F16 : FPST_A64);
 +}
 +
 +static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
 +{
 +    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
 +}
 +
  static const FPScalar f_scalar_fadd = {
      gen_helper_vfp_addh,
      gen_helper_vfp_adds,
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
      gen_helper_recpsf_f32,
      gen_helper_recpsf_f64,
  };
 -TRANS(FRECPS_s, do_fp3_scalar, a, &f_scalar_frecps)
 +TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
  static const FPScalar f_scalar_frsqrts = {
      gen_helper_rsqrtsf_f16,
      gen_helper_rsqrtsf_f32,
      gen_helper_rsqrtsf_f64,
  };
 -TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
 +TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
  static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                         const FPScalar *f, bool swap)
@@ -XXX,XX +XXX,XX @@ TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
  TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
  TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
 -static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
 -                          gen_helper_gvec_3_ptr * const fns[3])
 +static bool do_fp3_vector_with_fpsttype(DisasContext *s, arg_qrrr_e *a,
 +                                        int data,
 +                                        gen_helper_gvec_3_ptr * const fns[3],
 +                                        ARMFPStatusFlavour fpsttype)
  {
      MemOp esz = a->esz;
      int check = fp_access_check_vector_hsd(s, a->q, esz);
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
          return check == 0;
      }
-     /*
+-    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
-@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
+-                      esz == MO_16, data, fns[esz - 1]);
-             case 2:
++    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, fpsttype,
-                 return 2;
++                      data, fns[esz - 1]);
-             }
+     return true;
--
+ }
--            switch (FIELD_EX32(env->cp15.cptr_el[2], CPTR_EL2, FPEN)) {
--            case 1:
++static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
--                if (el == 2 || !(hcr_el2 & HCR_TGE)) {
++                          gen_helper_gvec_3_ptr * const fns[3])
--                    break;
++{
--                }
++    return do_fp3_vector_with_fpsttype(s, a, data, fns,
--                /* fall through */
++                                       a->esz == MO_16 ?
--            case 0:
++                                       FPST_A64_F16 : FPST_A64);
--            case 2:
++}
--                return 0;
++
--            }
++static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
-         } else if (arm_is_el2_enabled(env)) {
++                             gen_helper_gvec_3_ptr * const f[3])
-             if (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, TZ)) {
++{
-                 return 2;
++    return do_fp3_vector_with_fpsttype(s, a, data, f,
-             }
++                                       select_ah_fpst(s, a->esz));
--            if (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, TFP)) {
++}
--                return 0;
++
--            }
+ static gen_helper_gvec_3_ptr * const f_vector_fadd[3] = {
-         }
+     gen_helper_gvec_fadd_h,
      gen_helper_gvec_fadd_s,
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
      gen_helper_gvec_recps_s,
      gen_helper_gvec_recps_d,
  };
 -TRANS(FRECPS_v, do_fp3_vector, a, 0, f_vector_frecps)
 +TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
  static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
      gen_helper_gvec_rsqrts_h,
      gen_helper_gvec_rsqrts_s,
      gen_helper_gvec_rsqrts_d,
  };
 -TRANS(FRSQRTS_v, do_fp3_vector, a, 0, f_vector_frsqrts)
 +TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
  static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
      gen_helper_gvec_faddp_h,
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
      }
-@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+     gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
+-                      esz == MO_16, a->idx, fns[esz - 1]);
-     if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
++                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-         int sve_el = sve_exception_el(env, el);
++                      a->idx, fns[esz - 1]);
--        uint32_t zcr_len;
+     return true;
+ }
-         /*
--         * If SVE is disabled, but FP is enabled,
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1 {
--         * then the effective len is 0.
+     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_ptr);
-+         * If either FP or SVE are disabled, translator does not need len.
+ } FPScalar1;
-+         * If SVE EL > FP EL, FP exception has precedence, and translator
-+         * does not need SVE EL.  Save potential re-translations by forcing
+-static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
-+         * the unneeded data to zero.
+-                          const FPScalar1 *f, int rmode)
-          */
++static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
--        if (sve_el != 0 && fp_el == 0) {
++                                        const FPScalar1 *f, int rmode,
--            zcr_len = 0;
++                                        ARMFPStatusFlavour fpsttype)
--        } else {
+ {
--            zcr_len = sve_zcr_len_for_el(env, el);
+     TCGv_i32 tcg_rmode = NULL;
-+        if (fp_el != 0) {
+     TCGv_ptr fpst;
-+            if (sve_el > fp_el) {
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
-+                sve_el = 0;
+         return check == 0;
 +            }
 +        } else if (sve_el == 0) {
 +            DP_TBFLAG_A64(flags, VL, sve_zcr_len_for_el(env, el));
          }
          DP_TBFLAG_A64(flags, SVEEXC_EL, sve_el);
 -        DP_TBFLAG_A64(flags, VL, zcr_len);
      }
-     sctlr = regime_sctlr(env, stage1);
+-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
 +    fpst = fpstatus_ptr(fpsttype);
      if (rmode >= 0) {
          tcg_rmode = gen_set_rmode(rmode, fpst);
      }
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
      return true;
  }
 +static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
 +                          const FPScalar1 *f, int rmode)
 +{
 +    return do_fp1_scalar_with_fpsttype(s, a, f, rmode,
 +                                       a->esz == MO_16 ?
 +                                       FPST_A64_F16 : FPST_A64);
 +}
 +
 +static bool do_fp1_scalar_ah(DisasContext *s, arg_rr_e *a,
 +                             const FPScalar1 *f, int rmode)
 +{
 +    return do_fp1_scalar_with_fpsttype(s, a, f, rmode, select_ah_fpst(s, a->esz));
 +}
 +
  static const FPScalar1 f_scalar_fsqrt = {
      gen_helper_vfp_sqrth,
      gen_helper_vfp_sqrts,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
      gen_helper_recpe_f32,
      gen_helper_recpe_f64,
  };
 -TRANS(FRECPE_s, do_fp1_scalar, a, &f_scalar_frecpe, -1)
 +TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
  static const FPScalar1 f_scalar_frecpx = {
      gen_helper_frecpx_f16,
      gen_helper_frecpx_f32,
      gen_helper_frecpx_f64,
  };
 -TRANS(FRECPX_s, do_fp1_scalar, a, &f_scalar_frecpx, -1)
 +TRANS(FRECPX_s, do_fp1_scalar_ah, a, &f_scalar_frecpx, -1)
  static const FPScalar1 f_scalar_frsqrte = {
      gen_helper_rsqrte_f16,
      gen_helper_rsqrte_f32,
      gen_helper_rsqrte_f64,
  };
 -TRANS(FRSQRTE_s, do_fp1_scalar, a, &f_scalar_frsqrte, -1)
 +TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
  static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
  {
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINT64Z_v, aa64_frint, do_fp1_vector, a,
             &f_scalar_frint64, FPROUNDING_ZERO)
  TRANS_FEAT(FRINT64X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, -1)
 -static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
 -                             int rd, int rn, int data,
 -                             gen_helper_gvec_2_ptr * const fns[3])
 +static bool do_gvec_op2_fpst_with_fpsttype(DisasContext *s, MemOp esz,
 +                                           bool is_q, int rd, int rn, int data,
 +                                           gen_helper_gvec_2_ptr * const fns[3],
 +                                           ARMFPStatusFlavour fpsttype)
  {
      int check = fp_access_check_vector_hsd(s, is_q, esz);
      TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
          return check == 0;
      }
 -    fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
 +    fpst = fpstatus_ptr(fpsttype);
      tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
                         vec_full_reg_offset(s, rn), fpst,
                         is_q ? 16 : 8, vec_full_reg_size(s),
@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
      return true;
  }
 +static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
 +                             int rd, int rn, int data,
 +                             gen_helper_gvec_2_ptr * const fns[3])
 +{
 +    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data, fns,
 +                                          esz == MO_16 ? FPST_A64_F16 :
 +                                          FPST_A64);
 +}
 +
 +static bool do_gvec_op2_ah_fpst(DisasContext *s, MemOp esz, bool is_q,
 +                                int rd, int rn, int data,
 +                                gen_helper_gvec_2_ptr * const fns[3])
 +{
 +    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data,
 +                                          fns, select_ah_fpst(s, esz));
 +}
 +
  static gen_helper_gvec_2_ptr * const f_scvtf_v[] = {
      gen_helper_gvec_vcvt_sh,
      gen_helper_gvec_vcvt_sf,
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
      gen_helper_gvec_frecpe_s,
      gen_helper_gvec_frecpe_d,
  };
 -TRANS(FRECPE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
 +TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
  static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
      gen_helper_gvec_frsqrte_h,
      gen_helper_gvec_frsqrte_s,
      gen_helper_gvec_frsqrte_d,
  };
 -TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
 +TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
  static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
  {
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
      return true;
  }
 -static bool gen_gvec_fpst_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
 -                                 arg_rr_esz *a, int data)
 +static bool gen_gvec_fpst_ah_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
 +                                    arg_rr_esz *a, int data)
  {
      return gen_gvec_fpst_zz(s, fn, a->rd, a->rn, data,
 -                            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
 +                            select_ah_fpst(s, a->esz));
  }
  /* Invoke an out-of-line helper on 3 Zregs. */
@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
                               a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
  }
 +static bool gen_gvec_fpst_ah_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
 +                                     arg_rrr_esz *a, int data)
 +{
 +    return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
 +                             select_ah_fpst(s, a->esz));
 +}
 +
  /* Invoke an out-of-line helper on 4 Zregs. */
  static bool gen_gvec_ool_zzzz(DisasContext *s, gen_helper_gvec_4 *fn,
                                int rd, int rn, int rm, int ra, int data)
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
      NULL,                     gen_helper_gvec_frecpe_h,
      gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
  };
 -TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_arg_zz, frecpe_fns[a->esz], a, 0)
 +TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
  static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
      NULL,                      gen_helper_gvec_frsqrte_h,
      gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
  };
 -TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_arg_zz, frsqrte_fns[a->esz], a, 0)
 +TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
  /*
   *** SVE Floating Point Compare with Zero Group
@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
      };                                                              \
      TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_arg_zzz, name##_fns[a->esz], a, 0)
 +#define DO_FP3_AH(NAME, name) \
 +    static gen_helper_gvec_3_ptr * const name##_fns[4] = {          \
 +        NULL, gen_helper_gvec_##name##_h,                           \
 +        gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
 +    };                                                              \
 +    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
 +
  DO_FP3(FADD_zzz, fadd)
  DO_FP3(FSUB_zzz, fsub)
  DO_FP3(FMUL_zzz, fmul)
 -DO_FP3(FRECPS, recps)
 -DO_FP3(FRSQRTS, rsqrts)
 +DO_FP3_AH(FRECPS, recps)
 +DO_FP3_AH(FRSQRTS, rsqrts)
  #undef DO_FP3
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const frecpx_fns[] = {
      gen_helper_sve_frecpx_s, gen_helper_sve_frecpx_d,
  };
  TRANS_FEAT(FRECPX, aa64_sve, gen_gvec_fpst_arg_zpz, frecpx_fns[a->esz],
 -           a, 0, a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 +           a, 0, select_ah_fpst(s, a->esz))
  static gen_helper_gvec_3_ptr * const fsqrt_fns[] = {
      NULL,                   gen_helper_sve_fsqrt_h,
 --
-.25.1
+.34.1
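
The selection rule that select_ah_fpst() adds in the patch above can be modelled outside QEMU roughly as follows. This is only a sketch: ElemSize, Flavour and both function names are hypothetical stand-ins for MemOp, ARMFPStatusFlavour and the translator helpers, and the FPCR.AH bit is passed in as a plain bool rather than read from a DisasContext.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's MemOp and ARMFPStatusFlavour. */
typedef enum { ESZ_16 = 1, ESZ_32 = 2, ESZ_64 = 3 } ElemSize;
typedef enum {
    FLAVOUR_A64,
    FLAVOUR_A64_F16,
    FLAVOUR_AH,
    FLAVOUR_AH_F16,
} Flavour;

/* Default choice: depends only on the element size. */
static Flavour select_default_flavour(ElemSize esz)
{
    assert(esz >= ESZ_16 && esz <= ESZ_64);
    return esz == ESZ_16 ? FLAVOUR_A64_F16 : FLAVOUR_A64;
}

/* AH-aware choice: also consults the (cached) FPCR.AH bit. */
static Flavour select_ah_flavour(bool fpcr_ah, ElemSize esz)
{
    if (!fpcr_ah) {
        return select_default_flavour(esz);
    }
    assert(esz >= ESZ_16 && esz <= ESZ_64);
    return esz == ESZ_16 ? FLAVOUR_AH_F16 : FLAVOUR_AH;
}
```

Only the instructions named in the commit message would call the AH-aware variant; every other caller keeps the default choice, which mirrors how the patch adds *_ah wrappers next to the existing entry points instead of changing them.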

-New patch
+[PULL 12/68] target/arm: Use FPST_FPCR_AH for BFCVT* insns
+When FPCR.AH is 1, use FPST_FPCR_AH for:
+ * AdvSIMD BFCVT, BFCVTN, BFCVTN2
+ * SVE BFCVT, BFCVTNT
+so that they get the required behaviour changes.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 27 +++++++++++++++++++++------
+ target/arm/tcg/translate-sve.c |  6 ++++--
+files changed, 25 insertions(+), 8 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
+ static const FPScalar1 f_scalar_bfcvt = {
+     .gen_s = gen_helper_bfcvt,
+ };
+-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar, a, &f_scalar_bfcvt, -1)
++TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
+ static const FPScalar1 f_scalar_frint32 = {
+     NULL,
+@@ -XXX,XX +XXX,XX @@ static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n)
+     tcg_gen_extu_i32_i64(d, tmp);
+ }
+-static ArithOneOp * const f_vector_bfcvtn[] = {
+-    NULL,
+-    gen_bfcvtn_hs,
+-    NULL,
++static void gen_bfcvtn_ah_hs(TCGv_i64 d, TCGv_i64 n)
++{
++    TCGv_ptr fpst = fpstatus_ptr(FPST_AH);
++    TCGv_i32 tmp = tcg_temp_new_i32();
++    gen_helper_bfcvt_pair(tmp, n, fpst);
++    tcg_gen_extu_i32_i64(d, tmp);
++}
++
++static ArithOneOp * const f_vector_bfcvtn[2][3] = {
++    {
++        NULL,
++        gen_bfcvtn_hs,
++        NULL,
++    }, {
++        NULL,
++        gen_bfcvtn_ah_hs,
++        NULL,
++    }
+ };
+-TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a, f_vector_bfcvtn)
++TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a,
++           f_vector_bfcvtn[s->fpcr_ah])
+ static bool trans_SHLL_v(DisasContext *s, arg_qrr_e *a)
+ {
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve_fcvt_hs, a, 0, FPST_A64_F16)
+ TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
+-           gen_helper_sve_bfcvt, a, 0, FPST_A64)
++           gen_helper_sve_bfcvt, a, 0,
++           s->fpcr_ah ? FPST_AH : FPST_A64)
+ TRANS_FEAT(FCVT_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve_fcvt_dh, a, 0, FPST_A64)
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVTNT_ds, aa64_sve2, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve2_fcvtnt_ds, a, 0, FPST_A64)
+ TRANS_FEAT(BFCVTNT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
+-           gen_helper_sve_bfcvtnt, a, 0, FPST_A64)
++           gen_helper_sve_bfcvtnt, a, 0,
++           s->fpcr_ah ? FPST_AH : FPST_A64)
+ TRANS_FEAT(FCVTLT_hs, aa64_sve2, gen_gvec_fpst_arg_zpz,
+            gen_helper_sve2_fcvtlt_hs, a, 0, FPST_A64)
+--
+.34.1
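
The BFCVTN change above turns a one-dimensional function table into a table with one row per FPCR.AH value. A minimal standalone sketch of that pattern is shown here; NarrowFn, the two narrow_* functions and pick_narrow() are hypothetical illustrations and do not correspond to real QEMU helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation type; the patch uses ArithOneOp. */
typedef int (*NarrowFn)(int);

/* Two illustrative implementations, distinguishable by their results. */
static int narrow_default(int x) { return x; }
static int narrow_ah(int x) { return x + 1000; }

/*
 * Row 0 is used when FPCR.AH is 0, row 1 when it is 1.
 * Only the middle element size has an implementation, as in the patch.
 */
static NarrowFn const narrow_fns[2][3] = {
    { NULL, narrow_default, NULL },
    { NULL, narrow_ah,      NULL },
};

/* Select an implementation; a bool indexes the row as 0 or 1. */
static NarrowFn pick_narrow(bool fpcr_ah, int esz)
{
    assert(esz >= 0 && esz < 3);
    return narrow_fns[fpcr_ah][esz];
}
```

A caller would treat a NULL entry as "no implementation for this element size". Choosing the row once per instruction keeps the per-element-size lookup unchanged.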

-[PULL 04/55] xlnx_dp: fix the wrong register size
+[PULL 13/68] target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
-From: Frederic Konrad <fkonrad@amd.com>
+When FPCR.AH is 1, use FPST_FPCR_AH for:
  * AdvSIMD BFMLALB, BFMLALT
  * SVE BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
-The core and the vblend registers size are wrong, they should respectively be
+so that they get the required behaviour changes.
 0x3B0 and 0x1E0 according to:
   https://www.xilinx.com/htmldocs/registers/ug1087/ug1087-zynq-ultrascale-registers.html.
-Let's fix that and use macros when creating the mmio region.
+We do this by making gen_gvec_op4_fpst() take an ARMFPStatusFlavour
 rather than a bool is_fp16; existing callsites now select
 FPST_FPCR_F16_A64 vs FPST_FPCR_A64 themselves rather than passing in
 the boolean.
-Fixes: 58ac482a66d ("introduce xlnx-dp")
-Signed-off-by: Frederic Konrad <fkonrad@amd.com>
-Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
-Acked-by: Alistair Francis <alistair.francis@wdc.com>
-Message-id: 20220601172353.3220232-2-fkonrad@xilinx.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/display/xlnx_dp.h |  9 +++++++--
+ target/arm/tcg/translate-a64.c | 20 +++++++++++++-------
- hw/display/xlnx_dp.c         | 17 ++++++++++-------
+ target/arm/tcg/translate-sve.c |  6 ++++--
 files changed, 17 insertions(+), 9 deletions(-)
-diff --git a/include/hw/display/xlnx_dp.h b/include/hw/display/xlnx_dp.h
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/display/xlnx_dp.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/include/hw/display/xlnx_dp.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_env(DisasContext *s, bool is_q, int rd, int rn,
- #define AUD_CHBUF_MAX_DEPTH                 (32 * KiB)
+  * an out-of-line helper.
- #define MAX_QEMU_BUFFER_SIZE                (4 * KiB)
+  */
+ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
--#define DP_CORE_REG_ARRAY_SIZE              (0x3AF >> 2)
+-                              int rm, int ra, bool is_fp16, int data,
-+#define DP_CORE_REG_OFFSET                  (0x0000)
++                              int rm, int ra, ARMFPStatusFlavour fpsttype,
-+#define DP_CORE_REG_ARRAY_SIZE              (0x3B0 >> 2)
++                              int data,
-+#define DP_AVBUF_REG_OFFSET                 (0xB000)
+                               gen_helper_gvec_4_ptr *fn)
- #define DP_AVBUF_REG_ARRAY_SIZE             (0x238 >> 2)
+ {
--#define DP_VBLEND_REG_ARRAY_SIZE            (0x1DF >> 2)
+-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
-+#define DP_VBLEND_REG_OFFSET                (0xA000)
++    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
-+#define DP_VBLEND_REG_ARRAY_SIZE            (0x1E0 >> 2)
+     tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
-+#define DP_AUDIO_REG_OFFSET                 (0xC000)
+                        vec_full_reg_offset(s, rn),
- #define DP_AUDIO_REG_ARRAY_SIZE             (0x50 >> 2)
+                        vec_full_reg_offset(s, rm),
-+#define DP_CONTAINER_SIZE                   (0xC050)
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
+     }
- struct PixmanPlane {
+     if (fp_access_check(s)) {
-     pixman_format_code_t format;
+         /* Q bit selects BFMLALB vs BFMLALT. */
-diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
+-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, false, a->q,
 +        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
 +                          s->fpcr_ah ? FPST_AH : FPST_A64, a->q,
                            gen_helper_gvec_bfmlal);
      }
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
      }
      gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
 -                      a->esz == MO_16, a->rot, fn[a->esz]);
 +                      a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
 +                      a->rot, fn[a->esz]);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
      }
      gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
 -                      esz == MO_16, (a->idx << 1) | neg,
 +                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
 +                      (a->idx << 1) | neg,
                        fns[esz - 1]);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
      }
      if (fp_access_check(s)) {
          /* Q bit selects BFMLALB vs BFMLALT. */
 -        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, 0,
 +        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
 +                          s->fpcr_ah ? FPST_AH : FPST_A64,
                            (a->idx << 1) | a->q,
                            gen_helper_gvec_bfmlal_idx);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
      }
      if (fp_access_check(s)) {
          gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
 -                          a->esz == MO_16, (a->idx << 2) | a->rot, fn);
 +                          a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
 +                          (a->idx << 2) | a->rot, fn);
      }
      return true;
  }
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/xlnx_dp.c
+--- a/target/arm/tcg/translate-sve.c
-+++ b/hw/display/xlnx_dp.c
++++ b/target/arm/tcg/translate-sve.c
-@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_env_arg_zzzz,
-     SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+ static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
-     XlnxDPState *s = XLNX_DP(obj);
+ {
+     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal,
--    memory_region_init(&s->container, obj, TYPE_XLNX_DP, 0xC050);
+-                              a->rd, a->rn, a->rm, a->ra, sel, FPST_A64);
-+    memory_region_init(&s->container, obj, TYPE_XLNX_DP, DP_CONTAINER_SIZE);
++                              a->rd, a->rn, a->rm, a->ra, sel,
++                              s->fpcr_ah ? FPST_AH : FPST_A64);
-     memory_region_init_io(&s->core_iomem, obj, &dp_ops, s, TYPE_XLNX_DP
+ }
--                          ".core", 0x3AF);
--    memory_region_add_subregion(&s->container, 0x0000, &s->core_iomem);
+ TRANS_FEAT(BFMLALB_zzzw, aa64_sve_bf16, do_BFMLAL_zzzw, a, false)
-+                          ".core", sizeof(s->core_registers));
+@@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
-+    memory_region_add_subregion(&s->container, DP_CORE_REG_OFFSET,
+ {
-+                                &s->core_iomem);
+     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal_idx,
+                               a->rd, a->rn, a->rm, a->ra,
-     memory_region_init_io(&s->vblend_iomem, obj, &vblend_ops, s, TYPE_XLNX_DP
+-                              (a->index << 1) | sel, FPST_A64);
--                          ".v_blend", 0x1DF);
++                              (a->index << 1) | sel,
--    memory_region_add_subregion(&s->container, 0xA000, &s->vblend_iomem);
++                              s->fpcr_ah ? FPST_AH : FPST_A64);
-+                          ".v_blend", sizeof(s->vblend_registers));
+ }
-+    memory_region_add_subregion(&s->container, DP_VBLEND_REG_OFFSET,
-+                                &s->vblend_iomem);
+ TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
      memory_region_init_io(&s->avbufm_iomem, obj, &avbufm_ops, s, TYPE_XLNX_DP
 -                          ".av_buffer_manager", 0x238);
 -    memory_region_add_subregion(&s->container, 0xB000, &s->avbufm_iomem);
 +                          ".av_buffer_manager", sizeof(s->avbufm_registers));
 +    memory_region_add_subregion(&s->container, DP_AVBUF_REG_OFFSET,
 +                                &s->avbufm_iomem);
      memory_region_init_io(&s->audio_iomem, obj, &audio_ops, s, TYPE_XLNX_DP
                            ".audio", sizeof(s->audio_registers));
 --
-.25.1
+.34.1

-New patch
+[PULL 14/68] target/arm: Add FPCR.NEP to TBFLAGS
+For FEAT_AFP, we want to emit different code when FPCR.NEP is set, so
+that instead of zeroing the high elements of a vector register when
+we write the output of a scalar operation to it, we instead merge in
+those elements from one of the source registers.  Since this affects
+the generated code, we need to put FPCR.NEP into the TBFLAGS.
+FPCR.NEP is treated as 0 when in streaming SVE mode and FEAT_SME_FA64
+is not implemented or not enabled; we can implement this logic in
+rebuild_hflags_a64().
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/cpu.h               | 1 +
+ target/arm/tcg/translate.h     | 2 ++
+ target/arm/tcg/hflags.c        | 9 +++++++++
+ target/arm/tcg/translate-a64.c | 1 +
+files changed, 13 insertions(+)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
+ /* Set if FEAT_NV2 RAM accesses are big-endian */
+ FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
+ FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
++FIELD(TBFLAG_A64, NEP, 38, 1)   /* FPCR.NEP */
+ /*
+  * Helpers for using the above. Note that only the A64 accessors use
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate.h
++++ b/target/arm/tcg/translate.h
+@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
+     bool nv2_mem_be;
+     /* True if FPCR.AH is 1 (alternate floating point handling) */
+     bool fpcr_ah;
++    /* True if FPCR.NEP is 1 (FEAT_AFP scalar upper-element result handling) */
++    bool fpcr_nep;
+     /*
+      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
+      *  < 0, set by the current instruction.
+diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/hflags.c
++++ b/target/arm/tcg/hflags.c
+@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+     if (env->vfp.fpcr & FPCR_AH) {
+         DP_TBFLAG_A64(flags, AH, 1);
+     }
++    if (env->vfp.fpcr & FPCR_NEP) {
++        /*
++         * In streaming-SVE without FA64, NEP behaves as if zero;
++         * compare pseudocode IsMerging()
++         */
++        if (!(EX_TBFLAG_A64(flags, PSTATE_SM) && !sme_fa64(env, el))) {
++            DP_TBFLAG_A64(flags, NEP, 1);
++        }
++    }
+     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+ }
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
+     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
+     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
+     dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
++    dc->fpcr_nep = EX_TBFLAG_A64(tb_flags, NEP);
+     dc->vec_len = 0;
+     dc->vec_stride = 0;
+     dc->cp_regs = arm_cpu->cp_regs;
+--
+.34.1
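
Note on [PULL 14/68]: the condition added to rebuild_hflags_a64() can be read as a small truth table. The following is a minimal standalone sketch of that table, not the QEMU code itself. ToyFPState and toy_effective_nep() are hypothetical names invented for illustration; the real code reads env->vfp.fpcr, the PSTATE_SM TB flag and sme_fa64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state consulted by rebuild_hflags_a64(). */
typedef struct {
    bool fpcr_nep;      /* raw FPCR.NEP bit */
    bool pstate_sm;     /* streaming SVE mode is active */
    bool fa64_enabled;  /* FEAT_SME_FA64 is implemented and enabled */
} ToyFPState;

/* Value that would be cached in the NEP TB flag for this state. */
static bool toy_effective_nep(const ToyFPState *st)
{
    assert(st != NULL);
    if (!st->fpcr_nep) {
        return false;
    }
    /* In streaming SVE mode without FA64, NEP behaves as if it were zero. */
    return !(st->pstate_sm && !st->fa64_enabled);
}
```

Under these assumptions the flag is set only when FPCR.NEP is 1 and the CPU is either outside streaming SVE mode or has FA64 available, which is why the translator can test dc->fpcr_nep without rechecking the streaming state.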

-[PULL 18/55] target/arm: Move v8m_security_lookup to ptw.c
+[PULL 15/68] target/arm: Define and use new write_fp_*reg_merging() functions
-From: Richard Henderson <richard.henderson@linaro.org>
+For FEAT_AFP's FPCR.NEP bit, we need to programmatically change the
+behaviour of the writeback of the result for most SIMD scalar
-This function has one private helper, v8m_is_sau_exempt,
+operations, so that instead of zeroing the upper part of the result
-so move that at the same time.
+register it merges the upper elements from one of the input
+registers.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-12-richard.henderson@linaro.org
+Provide new functions write_fp_*reg_merging() which can be used
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+instead of the existing write_fp_*reg() functions when we want this
 "merge the result with one of the input registers if FPCR.NEP is
 enabled" handling, and use them in do_fp3_scalar_with_fpsttype().
 Note that (as documented in the description of the FPCR.NEP bit)
 which input register to use as the merge source varies by
 instruction: for these 2-input scalar operations, the comparison
 instructions take from Rm, not Rn.
 We'll extend this to also provide the merging behaviour for
 the remaining scalar insns in subsequent commits.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/helper.c | 123 ------------------------------------------
+ target/arm/tcg/translate-a64.c | 117 +++++++++++++++++++++++++--------
- target/arm/ptw.c    | 126 ++++++++++++++++++++++++++++++++++++++++++++
+file changed, 91 insertions(+), 26 deletions(-)
-files changed, 126 insertions(+), 123 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
- #include "qemu/osdep.h"
+     write_fp_dreg(s, reg, tmp);
  #include "qemu/units.h"
  #include "qemu/log.h"
 -#include "target/arm/idau.h"
  #include "trace.h"
  #include "cpu.h"
  #include "internals.h"
@@ -XXX,XX +XXX,XX @@ bool m_is_system_region(CPUARMState *env, uint32_t address)
      return arm_feature(env, ARM_FEATURE_M) && extract32(address, 29, 3) == 0x7;
  }
--static bool v8m_is_sau_exempt(CPUARMState *env,
++/*
--                              uint32_t address, MMUAccessType access_type)
++ * Write a double result to 128 bit vector register reg, honouring FPCR.NEP:
--{
++ * - if FPCR.NEP == 0, clear the high elements of reg
--    /* The architecture specifies that certain address ranges are
++ * - if FPCR.NEP == 1, set the high elements of reg from mergereg
--     * exempt from v8M SAU/IDAU checks.
++ *   (i.e. merge the result with those high elements)
--     */
++ * In either case, SVE register bits above 128 are zeroed (per R_WKYLB).
--    return
++ */
--        (access_type == MMU_INST_FETCH && m_is_system_region(env, address)) ||
++static void write_fp_dreg_merging(DisasContext *s, int reg, int mergereg,
--        (address >= 0xe0000000 && address <= 0xe0002fff) ||
++                                  TCGv_i64 v)
 -        (address >= 0xe000e000 && address <= 0xe000efff) ||
 -        (address >= 0xe002e000 && address <= 0xe002efff) ||
 -        (address >= 0xe0040000 && address <= 0xe0041fff) ||
 -        (address >= 0xe00ff000 && address <= 0xe00fffff);
 -}
 -
 -void v8m_security_lookup(CPUARMState *env, uint32_t address,
 -                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                                V8M_SAttributes *sattrs)
 -{
 -    /* Look up the security attributes for this address. Compare the
 -     * pseudocode SecurityCheck() function.
 -     * We assume the caller has zero-initialized *sattrs.
 -     */
 -    ARMCPU *cpu = env_archcpu(env);
 -    int r;
 -    bool idau_exempt = false, idau_ns = true, idau_nsc = true;
 -    int idau_region = IREGION_NOTVALID;
 -    uint32_t addr_page_base = address & TARGET_PAGE_MASK;
 -    uint32_t addr_page_limit = addr_page_base + (TARGET_PAGE_SIZE - 1);
 -
 -    if (cpu->idau) {
 -        IDAUInterfaceClass *iic = IDAU_INTERFACE_GET_CLASS(cpu->idau);
 -        IDAUInterface *ii = IDAU_INTERFACE(cpu->idau);
 -
 -        iic->check(ii, address, &idau_region, &idau_exempt, &idau_ns,
 -                   &idau_nsc);
 -    }
 -
 -    if (access_type == MMU_INST_FETCH && extract32(address, 28, 4) == 0xf) {
 -        /* 0xf0000000..0xffffffff is always S for insn fetches */
 -        return;
 -    }
 -
 -    if (idau_exempt || v8m_is_sau_exempt(env, address, access_type)) {
 -        sattrs->ns = !regime_is_secure(env, mmu_idx);
 -        return;
 -    }
 -
 -    if (idau_region != IREGION_NOTVALID) {
 -        sattrs->irvalid = true;
 -        sattrs->iregion = idau_region;
 -    }
 -
 -    switch (env->sau.ctrl & 3) {
 -    case 0: /* SAU.ENABLE == 0, SAU.ALLNS == 0 */
 -        break;
 -    case 2: /* SAU.ENABLE == 0, SAU.ALLNS == 1 */
 -        sattrs->ns = true;
 -        break;
 -    default: /* SAU.ENABLE == 1 */
 -        for (r = 0; r < cpu->sau_sregion; r++) {
 -            if (env->sau.rlar[r] & 1) {
 -                uint32_t base = env->sau.rbar[r] & ~0x1f;
 -                uint32_t limit = env->sau.rlar[r] | 0x1f;
 -
 -                if (base <= address && limit >= address) {
 -                    if (base > addr_page_base || limit < addr_page_limit) {
 -                        sattrs->subpage = true;
 -                    }
 -                    if (sattrs->srvalid) {
 -                        /* If we hit in more than one region then we must report
 -                         * as Secure, not NS-Callable, with no valid region
 -                         * number info.
 -                         */
 -                        sattrs->ns = false;
 -                        sattrs->nsc = false;
 -                        sattrs->sregion = 0;
 -                        sattrs->srvalid = false;
 -                        break;
 -                    } else {
 -                        if (env->sau.rlar[r] & 2) {
 -                            sattrs->nsc = true;
 -                        } else {
 -                            sattrs->ns = true;
 -                        }
 -                        sattrs->srvalid = true;
 -                        sattrs->sregion = r;
 -                    }
 -                } else {
 -                    /*
 -                     * Address not in this region. We must check whether the
 -                     * region covers addresses in the same page as our address.
 -                     * In that case we must not report a size that covers the
 -                     * whole page for a subsequent hit against a different MPU
 -                     * region or the background region, because it would result
 -                     * in incorrect TLB hits for subsequent accesses to
 -                     * addresses that are in this MPU region.
 -                     */
 -                    if (limit >= base &&
 -                        ranges_overlap(base, limit - base + 1,
 -                                       addr_page_base,
 -                                       TARGET_PAGE_SIZE)) {
 -                        sattrs->subpage = true;
 -                    }
 -                }
 -            }
 -        }
 -        break;
 -    }
 -
 -    /*
 -     * The IDAU will override the SAU lookup results if it specifies
 -     * higher security than the SAU does.
 -     */
 -    if (!idau_ns) {
 -        if (sattrs->ns || (!idau_nsc && sattrs->nsc)) {
 -            sattrs->ns = false;
 -            sattrs->nsc = idau_nsc;
 -        }
 -    }
 -}
 -
  /* Combine either inner or outer cacheability attributes for normal
   * memory, according to table D4-42 and pseudocode procedure
   * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/range.h"
  #include "cpu.h"
  #include "internals.h"
 +#include "idau.h"
  #include "ptw.h"
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
      return !(*prot & (1 << access_type));
  }
 +static bool v8m_is_sau_exempt(CPUARMState *env,
 +                              uint32_t address, MMUAccessType access_type)
 +{
-+    /*
++    if (!s->fpcr_nep) {
-+     * The architecture specifies that certain address ranges are
++        write_fp_dreg(s, reg, v);
 +     * exempt from v8M SAU/IDAU checks.
 +     */
 +    return
 +        (access_type == MMU_INST_FETCH && m_is_system_region(env, address)) ||
 +        (address >= 0xe0000000 && address <= 0xe0002fff) ||
 +        (address >= 0xe000e000 && address <= 0xe000efff) ||
 +        (address >= 0xe002e000 && address <= 0xe002efff) ||
 +        (address >= 0xe0040000 && address <= 0xe0041fff) ||
 +        (address >= 0xe00ff000 && address <= 0xe00fffff);
 +}
 +
 +void v8m_security_lookup(CPUARMState *env, uint32_t address,
 +                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                                V8M_SAttributes *sattrs)
 +{
 +    /*
 +     * Look up the security attributes for this address. Compare the
 +     * pseudocode SecurityCheck() function.
 +     * We assume the caller has zero-initialized *sattrs.
 +     */
 +    ARMCPU *cpu = env_archcpu(env);
 +    int r;
 +    bool idau_exempt = false, idau_ns = true, idau_nsc = true;
 +    int idau_region = IREGION_NOTVALID;
 +    uint32_t addr_page_base = address & TARGET_PAGE_MASK;
 +    uint32_t addr_page_limit = addr_page_base + (TARGET_PAGE_SIZE - 1);
 +
 +    if (cpu->idau) {
 +        IDAUInterfaceClass *iic = IDAU_INTERFACE_GET_CLASS(cpu->idau);
 +        IDAUInterface *ii = IDAU_INTERFACE(cpu->idau);
 +
 +        iic->check(ii, address, &idau_region, &idau_exempt, &idau_ns,
 +                   &idau_nsc);
 +    }
 +
 +    if (access_type == MMU_INST_FETCH && extract32(address, 28, 4) == 0xf) {
 +        /* 0xf0000000..0xffffffff is always S for insn fetches */
 +        return;
 +    }
 +
-+    if (idau_exempt || v8m_is_sau_exempt(env, address, access_type)) {
++    /*
-+        sattrs->ns = !regime_is_secure(env, mmu_idx);
++     * Move from mergereg to reg; this sets the high elements and
 +     * clears the bits above 128 as a side effect.
 +     */
 +    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
 +                     vec_full_reg_offset(s, mergereg),
 +                     16, vec_full_reg_size(s));
 +    tcg_gen_st_i64(v, tcg_env, vec_full_reg_offset(s, reg));
 +}
 +
 +/*
 + * Write a single-prec result, but only clear the higher elements
 + * of the destination register if FPCR.NEP is 0; otherwise preserve them.
 + */
 +static void write_fp_sreg_merging(DisasContext *s, int reg, int mergereg,
 +                                  TCGv_i32 v)
 +{
 +    if (!s->fpcr_nep) {
 +        write_fp_sreg(s, reg, v);
 +        return;
 +    }
 +
-+    if (idau_region != IREGION_NOTVALID) {
++    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
-+        sattrs->irvalid = true;
++                     vec_full_reg_offset(s, mergereg),
-+        sattrs->iregion = idau_region;
++                     16, vec_full_reg_size(s));
 +    tcg_gen_st_i32(v, tcg_env, fp_reg_offset(s, reg, MO_32));
 +}
 +
 +/*
 + * Write a half-prec result, but only clear the higher elements
 + * of the destination register if FPCR.NEP is 0; otherwise preserve them.
 + * The caller must ensure that the top 16 bits of v are zero.
 + */
 +static void write_fp_hreg_merging(DisasContext *s, int reg, int mergereg,
 +                                  TCGv_i32 v)
 +{
 +    if (!s->fpcr_nep) {
 +        write_fp_sreg(s, reg, v);
 +        return;
 +    }
 +
-+    switch (env->sau.ctrl & 3) {
++    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
-+    case 0: /* SAU.ENABLE == 0, SAU.ALLNS == 0 */
++                     vec_full_reg_offset(s, mergereg),
-+        break;
++                     16, vec_full_reg_size(s));
-+    case 2: /* SAU.ENABLE == 0, SAU.ALLNS == 1 */
++    tcg_gen_st16_i32(v, tcg_env, fp_reg_offset(s, reg, MO_16));
 +        sattrs->ns = true;
 +        break;
 +    default: /* SAU.ENABLE == 1 */
 +        for (r = 0; r < cpu->sau_sregion; r++) {
 +            if (env->sau.rlar[r] & 1) {
 +                uint32_t base = env->sau.rbar[r] & ~0x1f;
 +                uint32_t limit = env->sau.rlar[r] | 0x1f;
 +
 +                if (base <= address && limit >= address) {
 +                    if (base > addr_page_base || limit < addr_page_limit) {
 +                        sattrs->subpage = true;
 +                    }
 +                    if (sattrs->srvalid) {
 +                        /*
 +                         * If we hit in more than one region then we must report
 +                         * as Secure, not NS-Callable, with no valid region
 +                         * number info.
 +                         */
 +                        sattrs->ns = false;
 +                        sattrs->nsc = false;
 +                        sattrs->sregion = 0;
 +                        sattrs->srvalid = false;
 +                        break;
 +                    } else {
 +                        if (env->sau.rlar[r] & 2) {
 +                            sattrs->nsc = true;
 +                        } else {
 +                            sattrs->ns = true;
 +                        }
 +                        sattrs->srvalid = true;
 +                        sattrs->sregion = r;
 +                    }
 +                } else {
 +                    /*
 +                     * Address not in this region. We must check whether the
 +                     * region covers addresses in the same page as our address.
 +                     * In that case we must not report a size that covers the
 +                     * whole page for a subsequent hit against a different MPU
 +                     * region or the background region, because it would result
 +                     * in incorrect TLB hits for subsequent accesses to
 +                     * addresses that are in this MPU region.
 +                     */
 +                    if (limit >= base &&
 +                        ranges_overlap(base, limit - base + 1,
 +                                       addr_page_base,
 +                                       TARGET_PAGE_SIZE)) {
 +                        sattrs->subpage = true;
 +                    }
 +                }
 +            }
 +        }
 +        break;
 +    }
 +
 +    /*
 +     * The IDAU will override the SAU lookup results if it specifies
 +     * higher security than the SAU does.
 +     */
 +    if (!idau_ns) {
 +        if (sattrs->ns || (!idau_nsc && sattrs->nsc)) {
 +            sattrs->ns = false;
 +            sattrs->nsc = idau_nsc;
 +        }
 +    }
 +}
 +
- static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
+ /* Expand a 2-operand AdvSIMD vector operation using an expander function.  */
-                                  MMUAccessType access_type, ARMMMUIdx mmu_idx,
+ static void gen_gvec_fn2(DisasContext *s, bool is_q, int rd, int rn,
-                                  hwaddr *phys_ptr, MemTxAttrs *txattrs,
+                          GVecGen2Fn *gvec_fn, int vece)
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
  } FPScalar;
  static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
 -                                        const FPScalar *f,
 +                                        const FPScalar *f, int mergereg,
                                          ARMFPStatusFlavour fpsttype)
  {
      switch (a->esz) {
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
              TCGv_i64 t0 = read_fp_dreg(s, a->rn);
              TCGv_i64 t1 = read_fp_dreg(s, a->rm);
              f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
 -            write_fp_dreg(s, a->rd, t0);
 +            write_fp_dreg_merging(s, a->rd, mergereg, t0);
          }
          break;
      case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
              TCGv_i32 t0 = read_fp_sreg(s, a->rn);
              TCGv_i32 t1 = read_fp_sreg(s, a->rm);
              f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
 -            write_fp_sreg(s, a->rd, t0);
 +            write_fp_sreg_merging(s, a->rd, mergereg, t0);
          }
          break;
      case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
              TCGv_i32 t0 = read_fp_hreg(s, a->rn);
              TCGv_i32 t1 = read_fp_hreg(s, a->rm);
              f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
 -            write_fp_sreg(s, a->rd, t0);
 +            write_fp_hreg_merging(s, a->rd, mergereg, t0);
          }
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
      return true;
  }
 -static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
 +static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
 +                          int mergereg)
  {
 -    return do_fp3_scalar_with_fpsttype(s, a, f,
 +    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
                                         a->esz == MO_16 ?
                                         FPST_A64_F16 : FPST_A64);
  }
 -static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
 +static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
 +                             int mergereg)
  {
 -    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
 +    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
 +                                       select_ah_fpst(s, a->esz));
  }
  static const FPScalar f_scalar_fadd = {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fadd = {
      gen_helper_vfp_adds,
      gen_helper_vfp_addd,
  };
 -TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd)
 +TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd, a->rn)
  static const FPScalar f_scalar_fsub = {
      gen_helper_vfp_subh,
      gen_helper_vfp_subs,
      gen_helper_vfp_subd,
  };
 -TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub)
 +TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub, a->rn)
  static const FPScalar f_scalar_fdiv = {
      gen_helper_vfp_divh,
      gen_helper_vfp_divs,
      gen_helper_vfp_divd,
  };
 -TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv)
 +TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv, a->rn)
  static const FPScalar f_scalar_fmul = {
      gen_helper_vfp_mulh,
      gen_helper_vfp_muls,
      gen_helper_vfp_muld,
  };
 -TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul)
 +TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul, a->rn)
  static const FPScalar f_scalar_fmax = {
      gen_helper_vfp_maxh,
      gen_helper_vfp_maxs,
      gen_helper_vfp_maxd,
  };
 -TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax)
 +TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
  static const FPScalar f_scalar_fmin = {
      gen_helper_vfp_minh,
      gen_helper_vfp_mins,
      gen_helper_vfp_mind,
  };
 -TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin)
 +TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
  static const FPScalar f_scalar_fmaxnm = {
      gen_helper_vfp_maxnumh,
      gen_helper_vfp_maxnums,
      gen_helper_vfp_maxnumd,
  };
 -TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm)
 +TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm, a->rn)
  static const FPScalar f_scalar_fminnm = {
      gen_helper_vfp_minnumh,
      gen_helper_vfp_minnums,
      gen_helper_vfp_minnumd,
  };
 -TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm)
 +TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm, a->rn)
  static const FPScalar f_scalar_fmulx = {
      gen_helper_advsimd_mulxh,
      gen_helper_vfp_mulxs,
      gen_helper_vfp_mulxd,
  };
 -TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx)
 +TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx, a->rn)
  static void gen_fnmul_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
  {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fnmul = {
      gen_fnmul_s,
      gen_fnmul_d,
  };
 -TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul)
 +TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
  static const FPScalar f_scalar_fcmeq = {
      gen_helper_advsimd_ceq_f16,
      gen_helper_neon_ceq_f32,
      gen_helper_neon_ceq_f64,
  };
 -TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq)
 +TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq, a->rm)
  static const FPScalar f_scalar_fcmge = {
      gen_helper_advsimd_cge_f16,
      gen_helper_neon_cge_f32,
      gen_helper_neon_cge_f64,
  };
 -TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge)
 +TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge, a->rm)
  static const FPScalar f_scalar_fcmgt = {
      gen_helper_advsimd_cgt_f16,
      gen_helper_neon_cgt_f32,
      gen_helper_neon_cgt_f64,
  };
 -TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt)
 +TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt, a->rm)
  static const FPScalar f_scalar_facge = {
      gen_helper_advsimd_acge_f16,
      gen_helper_neon_acge_f32,
      gen_helper_neon_acge_f64,
  };
 -TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge)
 +TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge, a->rm)
  static const FPScalar f_scalar_facgt = {
      gen_helper_advsimd_acgt_f16,
      gen_helper_neon_acgt_f32,
      gen_helper_neon_acgt_f64,
  };
 -TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt)
 +TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt, a->rm)
  static void gen_fabd_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
  {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fabd = {
      gen_fabd_s,
      gen_fabd_d,
  };
 -TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd)
 +TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
  static const FPScalar f_scalar_frecps = {
      gen_helper_recpsf_f16,
      gen_helper_recpsf_f32,
      gen_helper_recpsf_f64,
  };
 -TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
 +TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
  static const FPScalar f_scalar_frsqrts = {
      gen_helper_rsqrtsf_f16,
      gen_helper_rsqrtsf_f32,
      gen_helper_rsqrtsf_f64,
  };
 -TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
 +TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
  static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                         const FPScalar *f, bool swap)
 --
-.25.1
+.34.1
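
Note on [PULL 15/68]: the register-level effect of the write_fp_*reg_merging() functions can be modelled outside TCG. The sketch below is an illustration under stated assumptions, not the patch's implementation: ToyVReg, TOY_VREG_BYTES (an assumed 256-bit vector length) and toy_write_scalar_merging() are hypothetical names, and the register is modelled as a plain byte array with element 0 at the lowest index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes: a 256-bit SVE register whose low 128 bits are the Q view. */
enum { TOY_VREG_BYTES = 32, TOY_Q_BYTES = 16 };

typedef struct {
    uint8_t b[TOY_VREG_BYTES];
} ToyVReg;

/*
 * Store the low esize bytes of result into regs[rd].
 *  - nep == false: all other bytes of rd are zeroed.
 *  - nep == true:  bytes esize..15 are taken from regs[mergereg];
 *                  bytes at and above 16 are still zeroed.
 */
static void toy_write_scalar_merging(ToyVReg *regs, int rd, int mergereg,
                                     uint64_t result, size_t esize, bool nep)
{
    ToyVReg out;
    size_t i;

    assert(esize == 2 || esize == 4 || esize == 8);

    /* Build the result in a temporary so rd == mergereg is handled. */
    memset(&out, 0, sizeof(out));
    if (nep) {
        memcpy(out.b, regs[mergereg].b, TOY_Q_BYTES);
    }
    for (i = 0; i < esize; i++) {
        out.b[i] = (uint8_t)(result >> (8 * i));
    }
    regs[rd] = out;
}
```

The order mirrors the patch: first the low 128 bits are copied from the merge source with everything above cleared, then the scalar result overwrites element 0.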

-New patch
+[PULL 16/68] target/arm: Handle FPCR.NEP for 3-input scalar operations
+Handle FPCR.NEP for the 3-input scalar operations which use
+do_fmla_scalar_idx() and do_fmadd(), by making them call the
+appropriate write_fp_*reg_merging() functions.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 12 ++++++------
+file changed, 6 insertions(+), 6 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+                 gen_vfp_negd(t1, t1);
+             }
+             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
+-            write_fp_dreg(s, a->rd, t0);
++            write_fp_dreg_merging(s, a->rd, a->rd, t0);
+         }
+         break;
+     case MO_32:
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+                 gen_vfp_negs(t1, t1);
+             }
+             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_sreg_merging(s, a->rd, a->rd, t0);
+         }
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+             }
+             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
+                                        fpstatus_ptr(FPST_A64_F16));
+-            write_fp_sreg(s, a->rd, t0);
++            write_fp_hreg_merging(s, a->rd, a->rd, t0);
+         }
+         break;
+     default:
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+             }
+             fpst = fpstatus_ptr(FPST_A64);
+             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
+-            write_fp_dreg(s, a->rd, ta);
++            write_fp_dreg_merging(s, a->rd, a->ra, ta);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+             }
+             fpst = fpstatus_ptr(FPST_A64);
+             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
+-            write_fp_sreg(s, a->rd, ta);
++            write_fp_sreg_merging(s, a->rd, a->ra, ta);
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+             }
+             fpst = fpstatus_ptr(FPST_A64_F16);
+             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
+-            write_fp_sreg(s, a->rd, ta);
++            write_fp_hreg_merging(s, a->rd, a->ra, ta);
+         }
+         break;
+--
+.34.1
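
Note on [PULL 15/68] and [PULL 16/68]: taken together, the two patches choose a different merge source per instruction class. The sketch below only summarises the choices visible in the diffs above; ToyOpClass, ToyOperands and toy_merge_source() are hypothetical names, and the real translator passes the register number directly at each call site rather than going through a lookup like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grouping of the scalar operations touched so far. */
typedef enum {
    TOY_OP_ARITH2,    /* FADD, FSUB, FMUL, FABD, FRECPS, ... */
    TOY_OP_COMPARE2,  /* FCMEQ, FCMGE, FCMGT, FACGE, FACGT */
    TOY_OP_FMLA_IDX,  /* do_fmla_scalar_idx() */
    TOY_OP_FMADD      /* do_fmadd() */
} ToyOpClass;

typedef struct {
    int rd, rn, rm, ra;
} ToyOperands;

/* Register whose upper elements are merged when FPCR.NEP is 1. */
static int toy_merge_source(ToyOpClass cls, const ToyOperands *a)
{
    assert(a != NULL);
    switch (cls) {
    case TOY_OP_ARITH2:
        return a->rn;
    case TOY_OP_COMPARE2:
        return a->rm;   /* comparisons take from Rm, not Rn */
    case TOY_OP_FMLA_IDX:
        return a->rd;   /* the accumulator is the destination itself */
    case TOY_OP_FMADD:
        return a->ra;   /* the separate addend register */
    }
    return a->rd;       /* not reached for valid ToyOpClass values */
}
```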

-[PULL 27/55] target/arm: Move check_s2_mmu_setup to ptw.c
+[PULL 17/68] target/arm: Handle FPCR.NEP for BFCVT scalar
-From: Richard Henderson <richard.henderson@linaro.org>
+Currently we implement BFCVT scalar via do_fp1_scalar().  This works
 even though BFCVT is a narrowing operation from 32 to 16 bits,
 because we can use write_fp_sreg() for float16. However, FPCR.NEP
 support requires that we use write_fp_hreg_merging() for float16
 outputs, so we can't continue to borrow the non-narrowing
 do_fp1_scalar() function for this. Split out trans_BFCVT_s()
 into its own implementation that honours FPCR.NEP.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-21-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |  2 --
+ target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
- target/arm/helper.c | 70 ---------------------------------------------
+file changed, 21 insertions(+), 4 deletions(-)
  target/arm/ptw.c    | 70 +++++++++++++++++++++++++++++++++++++++++++++
 files changed, 70 insertions(+), 72 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frintx = {
+ };
- ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
+ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
-                                    ARMMMUIdx mmu_idx);
--bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
+-static const FPScalar1 f_scalar_bfcvt = {
--                        int inputsize, int stride, int outputsize);
+-    .gen_s = gen_helper_bfcvt,
+-};
- #endif /* !CONFIG_USER_ONLY */
+-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
- #endif /* TARGET_ARM_PTW_H */
++static bool trans_BFCVT_s(DisasContext *s, arg_rr_e *a)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
          g_assert_not_reached();
      }
  }
 -
 -/*
 - * check_s2_mmu_setup
 - * @cpu:        ARMCPU
 - * @is_aa64:    True if the translation regime is in AArch64 state
 - * @startlevel: Suggested starting level
 - * @inputsize:  Bitsize of IPAs
 - * @stride:     Page-table stride (See the ARM ARM)
 - *
 - * Returns true if the suggested S2 translation parameters are OK and
 - * false otherwise.
 - */
 -bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
 -                        int inputsize, int stride, int outputsize)
 -{
 -    const int grainsize = stride + 3;
 -    int startsizecheck;
 -
 -    /*
 -     * Negative levels are usually not allowed...
 -     * Except for FEAT_LPA2, 4k page table, 52-bit address space, which
 -     * begins with level -1.  Note that previous feature tests will have
 -     * eliminated this combination if it is not enabled.
 -     */
 -    if (level < (inputsize == 52 && stride == 9 ? -1 : 0)) {
 -        return false;
 -    }
 -
 -    startsizecheck = inputsize - ((3 - level) * stride + grainsize);
 -    if (startsizecheck < 1 || startsizecheck > stride + 4) {
 -        return false;
 -    }
 -
 -    if (is_aa64) {
 -        switch (stride) {
 -        case 13: /* 64KB Pages.  */
 -            if (level == 0 || (level == 1 && outputsize <= 42)) {
 -                return false;
 -            }
 -            break;
 -        case 11: /* 16KB Pages.  */
 -            if (level == 0 || (level == 1 && outputsize <= 40)) {
 -                return false;
 -            }
 -            break;
 -        case 9: /* 4KB Pages.  */
 -            if (level == 0 && outputsize <= 42) {
 -                return false;
 -            }
 -            break;
 -        default:
 -            g_assert_not_reached();
 -        }
 -
 -        /* Inputsize checks.  */
 -        if (inputsize > outputsize &&
 -            (arm_el_is_aa64(&cpu->env, 1) || inputsize > 40)) {
 -            /* This is CONSTRAINED UNPREDICTABLE and we choose to fault.  */
 -            return false;
 -        }
 -    } else {
 -        /* AArch32 only supports 4KB pages. Assert on that.  */
 -        assert(stride == 9);
 -
 -        if (level == 0) {
 -            return false;
 -        }
 -    }
 -    return true;
 -}
  #endif /* !CONFIG_USER_ONLY */
  int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
      return prot_rw | PAGE_EXEC;
  }
 +/*
 + * check_s2_mmu_setup
 + * @cpu:        ARMCPU
 + * @is_aa64:    True if the translation regime is in AArch64 state
 + * @startlevel: Suggested starting level
 + * @inputsize:  Bitsize of IPAs
 + * @stride:     Page-table stride (See the ARM ARM)
 + *
 + * Returns true if the suggested S2 translation parameters are OK and
 + * false otherwise.
 + */
 +static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
 +                               int inputsize, int stride, int outputsize)
 +{
-+    const int grainsize = stride + 3;
++    ARMFPStatusFlavour fpsttype = s->fpcr_ah ? FPST_AH : FPST_A64;
-+    int startsizecheck;
++    TCGv_i32 t32;
 +    int check;
 +
-+    /*
++    if (!dc_isar_feature(aa64_bf16, s)) {
 +     * Negative levels are usually not allowed...
 +     * Except for FEAT_LPA2, 4k page table, 52-bit address space, which
 +     * begins with level -1.  Note that previous feature tests will have
 +     * eliminated this combination if it is not enabled.
 +     */
 +    if (level < (inputsize == 52 && stride == 9 ? -1 : 0)) {
 +        return false;
 +    }
 +
-+    startsizecheck = inputsize - ((3 - level) * stride + grainsize);
++    check = fp_access_check_scalar_hsd(s, a->esz);
-+    if (startsizecheck < 1 || startsizecheck > stride + 4) {
++
-+        return false;
++    if (check <= 0) {
 +        return check == 0;
 +    }
 +
-+    if (is_aa64) {
++    t32 = read_fp_sreg(s, a->rn);
-+        switch (stride) {
++    gen_helper_bfcvt(t32, t32, fpstatus_ptr(fpsttype));
-+        case 13: /* 64KB Pages.  */
++    write_fp_hreg_merging(s, a->rd, a->rd, t32);
 +            if (level == 0 || (level == 1 && outputsize <= 42)) {
 +                return false;
 +            }
 +            break;
 +        case 11: /* 16KB Pages.  */
 +            if (level == 0 || (level == 1 && outputsize <= 40)) {
 +                return false;
 +            }
 +            break;
 +        case 9: /* 4KB Pages.  */
 +            if (level == 0 && outputsize <= 42) {
 +                return false;
 +            }
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +
 +        /* Inputsize checks.  */
 +        if (inputsize > outputsize &&
 +            (arm_el_is_aa64(&cpu->env, 1) || inputsize > 40)) {
 +            /* This is CONSTRAINED UNPREDICTABLE and we choose to fault.  */
 +            return false;
 +        }
 +    } else {
 +        /* AArch32 only supports 4KB pages. Assert on that.  */
 +        assert(stride == 9);
 +
 +        if (level == 0) {
 +            return false;
 +        }
 +    }
 +    return true;
 +}
-+
- /**
+ static const FPScalar1 f_scalar_frint32 = {
-  * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
+     NULL,
   *
 --
-.25.1
+.34.1

-New patch
+[PULL 18/68] target/arm: Handle FPCR.NEP for 1-input scalar operations
+Handle FPCR.NEP for the 1-input scalar operations.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 26 ++++++++++++++------------
+file changed, 14 insertions(+), 12 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
+     case MO_64:
+         t64 = read_fp_dreg(s, a->rn);
+         f->gen_d(t64, t64, fpst);
+-        write_fp_dreg(s, a->rd, t64);
++        write_fp_dreg_merging(s, a->rd, a->rd, t64);
+         break;
+     case MO_32:
+         t32 = read_fp_sreg(s, a->rn);
+         f->gen_s(t32, t32, fpst);
+-        write_fp_sreg(s, a->rd, t32);
++        write_fp_sreg_merging(s, a->rd, a->rd, t32);
+         break;
+     case MO_16:
+         t32 = read_fp_hreg(s, a->rn);
+         f->gen_h(t32, t32, fpst);
+-        write_fp_sreg(s, a->rd, t32);
++        write_fp_hreg_merging(s, a->rd, a->rd, t32);
+         break;
+     default:
+         g_assert_not_reached();
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
+         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
+         gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, fpst);
+-        write_fp_dreg(s, a->rd, tcg_rd);
++        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hs(DisasContext *s, arg_rr *a)
+         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
+         gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
+-        /* write_fp_sreg is OK here because top half of result is zero */
+-        write_fp_sreg(s, a->rd, tmp);
++        /* write_fp_hreg_merging is OK here because top half of result is zero */
++        write_fp_hreg_merging(s, a->rd, a->rd, tmp);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sd(DisasContext *s, arg_rr *a)
+         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
+         gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, fpst);
+-        write_fp_sreg(s, a->rd, tcg_rd);
++        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hd(DisasContext *s, arg_rr *a)
+         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
+         gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp);
+-        /* write_fp_sreg is OK here because top half of tcg_rd is zero */
+-        write_fp_sreg(s, a->rd, tcg_rd);
++        /* write_fp_hreg_merging is OK here because top half of tcg_rd is zero */
++        write_fp_hreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a)
+         TCGv_i32 tcg_ahp = get_ahp_flag();
+         gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
+-        write_fp_sreg(s, a->rd, tcg_rd);
++        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a)
+         TCGv_i32 tcg_ahp = get_ahp_flag();
+         gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
+-        write_fp_dreg(s, a->rd, tcg_rd);
++        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
+     }
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool do_fcvt_f(DisasContext *s, arg_fcvt *a,
+     do_fcvt_scalar(s, a->esz | (is_signed ? MO_SIGN : 0),
+                    a->esz, tcg_int, a->shift, a->rn, rmode);
+-    clear_vec(s, a->rd);
++    if (!s->fpcr_nep) {
++        clear_vec(s, a->rd);
++    }
+     write_vec_element(s, tcg_int, a->rd, 0, a->esz);
+     return true;
+ }
+--
+.34.1
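
The merging behaviour this patch switches to can be pictured with a small stand-alone model. This is only a sketch, not QEMU code: the VecReg type and write_sreg_model() are hypothetical names invented for illustration, and the model assumes a 128-bit register holding a 32-bit scalar result in its lowest element. With NEP clear, the bits above the scalar are zeroed (the old write_fp_sreg() behaviour); with NEP set, they are taken from the merge source register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} VecReg;

/*
 * Model of writing a 32-bit scalar result (illustrative only).
 *   nep == false: every bit above the scalar is zeroed.
 *   nep == true:  every bit above the scalar is copied from merge_src.
 */
static VecReg write_sreg_model(VecReg merge_src, uint32_t value, bool nep)
{
    VecReg r;

    if (nep) {
        r.lo = (merge_src.lo & 0xffffffff00000000ull) | value;
        r.hi = merge_src.hi;
    } else {
        r.lo = value;
        r.hi = 0;
    }
    return r;
}
```

In the 1-input operations above the merge source is the destination register itself (a->rd is passed twice), so under this model the upper elements of Rd are simply left unchanged when NEP is set.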

-New patch
+[PULL 19/68] target/arm: Handle FPCR.NEP in do_cvtf_scalar()
+Handle FPCR.NEP in the operations handled by do_cvtf_scalar().
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 6 +++---
+file changed, 3 insertions(+), 3 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
+         } else {
+             gen_helper_vfp_uqtod(tcg_double, tcg_int, tcg_shift, tcg_fpstatus);
+         }
+-        write_fp_dreg(s, rd, tcg_double);
++        write_fp_dreg_merging(s, rd, rd, tcg_double);
+         break;
+     case MO_32:
+@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
+         } else {
+             gen_helper_vfp_uqtos(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
+         }
+-        write_fp_sreg(s, rd, tcg_single);
++        write_fp_sreg_merging(s, rd, rd, tcg_single);
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
+         } else {
+             gen_helper_vfp_uqtoh(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
+         }
+-        write_fp_sreg(s, rd, tcg_single);
++        write_fp_hreg_merging(s, rd, rd, tcg_single);
+         break;
+     default:
+--
+.34.1

-New patch
+[PULL 20/68] target/arm: Handle FPCR.NEP for scalar FABS and FNEG
+Handle FPCR.NEP merging for scalar FABS and FNEG; this requires
+an extra parameter to do_fp1_scalar_int(), since FMOV scalar
+does not have the merging behaviour.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 27 ++++++++++++++++++++-------
+file changed, 20 insertions(+), 7 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1Int {
+ } FPScalar1Int;
+ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
+-                              const FPScalar1Int *f)
++                              const FPScalar1Int *f,
++                              bool merging)
+ {
+     switch (a->esz) {
+     case MO_64:
+         if (fp_access_check(s)) {
+             TCGv_i64 t = read_fp_dreg(s, a->rn);
+             f->gen_d(t, t);
+-            write_fp_dreg(s, a->rd, t);
++            if (merging) {
++                write_fp_dreg_merging(s, a->rd, a->rd, t);
++            } else {
++                write_fp_dreg(s, a->rd, t);
++            }
+         }
+         break;
+     case MO_32:
+         if (fp_access_check(s)) {
+             TCGv_i32 t = read_fp_sreg(s, a->rn);
+             f->gen_s(t, t);
+-            write_fp_sreg(s, a->rd, t);
++            if (merging) {
++                write_fp_sreg_merging(s, a->rd, a->rd, t);
++            } else {
++                write_fp_sreg(s, a->rd, t);
++            }
+         }
+         break;
+     case MO_16:
+@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
+         if (fp_access_check(s)) {
+             TCGv_i32 t = read_fp_hreg(s, a->rn);
+             f->gen_h(t, t);
+-            write_fp_sreg(s, a->rd, t);
++            if (merging) {
++                write_fp_hreg_merging(s, a->rd, a->rd, t);
++            } else {
++                write_fp_sreg(s, a->rd, t);
++            }
+         }
+         break;
+     default:
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fmov = {
+     tcg_gen_mov_i32,
+     tcg_gen_mov_i64,
+ };
+-TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov)
++TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov, false)
+ static const FPScalar1Int f_scalar_fabs = {
+     gen_vfp_absh,
+     gen_vfp_abss,
+     gen_vfp_absd,
+ };
+-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs)
++TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
+ static const FPScalar1Int f_scalar_fneg = {
+     gen_vfp_negh,
+     gen_vfp_negs,
+     gen_vfp_negd,
+ };
+-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg)
++TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
+ typedef struct FPScalar1 {
+     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
+--
+.34.1

-[PULL 15/55] target/arm: Move get_phys_addr_pmsav8 to ptw.c
+[PULL 21/68] target/arm: Handle FPCR.NEP for FCVTXN (scalar)
-From: Richard Henderson <richard.henderson@linaro.org>
+Unlike the other users of do_2misc_narrow_scalar(), FCVTXN (scalar)
 is always double-to-single and must honour FPCR.NEP.  Implement this
 directly in a trans function rather than using
 do_2misc_narrow_scalar().
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+We still need gen_fcvtxn_sd() and the f_scalar_fcvtxn[] array for
-Message-id: 20220604040607.269301-9-richard.henderson@linaro.org
+the FCVTXN (vector) insn, so we move those down in the file to
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+where they are used.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |  5 ---
+ target/arm/tcg/translate-a64.c | 43 ++++++++++++++++++++++------------
- target/arm/helper.c | 75 -------------------------------------------
+file changed, 28 insertions(+), 15 deletions(-)
  target/arm/ptw.c    | 77 +++++++++++++++++++++++++++++++++++++++++++++
 files changed, 77 insertions(+), 80 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ void get_phys_addr_pmsav7_default(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ static ArithOneOp * const f_scalar_uqxtn[] = {
-                                   int32_t address, int *prot);
+ };
- bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user);
+ TRANS(UQXTN_s, do_2misc_narrow_scalar, a, f_scalar_uqxtn)
--bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
+-static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
--                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
++static bool trans_FCVTXN_s(DisasContext *s, arg_rr_e *a)
--                          hwaddr *phys_ptr, MemTxAttrs *txattrs,
+ {
--                          int *prot, target_ulong *page_size,
+-    /*
--                          ARMMMUFaultInfo *fi);
+-     * 64 bit to 32 bit float conversion
- bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+-     * with von Neumann rounding (round to odd)
-                         MMUAccessType access_type, ARMMMUIdx mmu_idx,
+-     */
-                         bool s1_is_el0,
+-    TCGv_i32 tmp = tcg_temp_new_i32();
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+-    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
-index XXXXXXX..XXXXXXX 100644
+-    tcg_gen_extu_i32_i64(d, tmp);
---- a/target/arm/helper.c
++    if (fp_access_check(s)) {
-+++ b/target/arm/helper.c
++        /*
-@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
++         * 64 bit to 32 bit float conversion
-     return !(*prot & (1 << access_type));
++         * with von Neumann rounding (round to odd)
 +         */
 +        TCGv_i64 src = read_fp_dreg(s, a->rn);
 +        TCGv_i32 dst = tcg_temp_new_i32();
 +        gen_helper_fcvtx_f64_to_f32(dst, src, fpstatus_ptr(FPST_A64));
 +        write_fp_sreg_merging(s, a->rd, a->rd, dst);
 +    }
 +    return true;
  }
+-static ArithOneOp * const f_scalar_fcvtxn[] = {
+-    NULL,
+-    NULL,
+-    gen_fcvtxn_sd,
+-};
+-TRANS(FCVTXN_s, do_2misc_narrow_scalar, a, f_scalar_fcvtxn)
 -
--bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
+ #undef WRAP_ENV
--                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
--                          hwaddr *phys_ptr, MemTxAttrs *txattrs,
+ static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn)
--                          int *prot, target_ulong *page_size,
+@@ -XXX,XX +XXX,XX @@ static void gen_fcvtn_sd(TCGv_i64 d, TCGv_i64 n)
--                          ARMMMUFaultInfo *fi)
+     tcg_gen_extu_i32_i64(d, tmp);
 -{
 -    uint32_t secure = regime_is_secure(env, mmu_idx);
 -    V8M_SAttributes sattrs = {};
 -    bool ret;
 -    bool mpu_is_subpage;
 -
 -    if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
 -        v8m_security_lookup(env, address, access_type, mmu_idx, &sattrs);
 -        if (access_type == MMU_INST_FETCH) {
 -            /* Instruction fetches always use the MMU bank and the
 -             * transaction attribute determined by the fetch address,
 -             * regardless of CPU state. This is painful for QEMU
 -             * to handle, because it would mean we need to encode
 -             * into the mmu_idx not just the (user, negpri) information
 -             * for the current security state but also that for the
 -             * other security state, which would balloon the number
 -             * of mmu_idx values needed alarmingly.
 -             * Fortunately we can avoid this because it's not actually
 -             * possible to arbitrarily execute code from memory with
 -             * the wrong security attribute: it will always generate
 -             * an exception of some kind or another, apart from the
 -             * special case of an NS CPU executing an SG instruction
 -             * in S&NSC memory. So we always just fail the translation
 -             * here and sort things out in the exception handler
 -             * (including possibly emulating an SG instruction).
 -             */
 -            if (sattrs.ns != !secure) {
 -                if (sattrs.nsc) {
 -                    fi->type = ARMFault_QEMU_NSCExec;
 -                } else {
 -                    fi->type = ARMFault_QEMU_SFault;
 -                }
 -                *page_size = sattrs.subpage ? 1 : TARGET_PAGE_SIZE;
 -                *phys_ptr = address;
 -                *prot = 0;
 -                return true;
 -            }
 -        } else {
 -            /* For data accesses we always use the MMU bank indicated
 -             * by the current CPU state, but the security attributes
 -             * might downgrade a secure access to nonsecure.
 -             */
 -            if (sattrs.ns) {
 -                txattrs->secure = false;
 -            } else if (!secure) {
 -                /* NS access to S memory must fault.
 -                 * Architecturally we should first check whether the
 -                 * MPU information for this address indicates that we
 -                 * are doing an unaligned access to Device memory, which
 -                 * should generate a UsageFault instead. QEMU does not
 -                 * currently check for that kind of unaligned access though.
 -                 * If we added it we would need to do so as a special case
 -                 * for M_FAKE_FSR_SFAULT in arm_v7m_cpu_do_interrupt().
 -                 */
 -                fi->type = ARMFault_QEMU_SFault;
 -                *page_size = sattrs.subpage ? 1 : TARGET_PAGE_SIZE;
 -                *phys_ptr = address;
 -                *prot = 0;
 -                return true;
 -            }
 -        }
 -    }
 -
 -    ret = pmsav8_mpu_lookup(env, address, access_type, mmu_idx, phys_ptr,
 -                            txattrs, prot, &mpu_is_subpage, fi, NULL);
 -    *page_size = sattrs.subpage || mpu_is_subpage ? 1 : TARGET_PAGE_SIZE;
 -    return ret;
 -}
 -
  /* Combine either inner or outer cacheability attributes for normal
   * memory, according to table D4-42 and pseudocode procedure
   * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
      return !(*prot & (1 << access_type));
  }
-+static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
++static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
 +                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                                 hwaddr *phys_ptr, MemTxAttrs *txattrs,
 +                                 int *prot, target_ulong *page_size,
 +                                 ARMMMUFaultInfo *fi)
 +{
-+    uint32_t secure = regime_is_secure(env, mmu_idx);
++    /*
-+    V8M_SAttributes sattrs = {};
++     * 64 bit to 32 bit float conversion
-+    bool ret;
++     * with von Neumann rounding (round to odd)
-+    bool mpu_is_subpage;
++     */
-+
++    TCGv_i32 tmp = tcg_temp_new_i32();
-+    if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
++    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
-+        v8m_security_lookup(env, address, access_type, mmu_idx, &sattrs);
++    tcg_gen_extu_i32_i64(d, tmp);
 +        if (access_type == MMU_INST_FETCH) {
 +            /*
 +             * Instruction fetches always use the MMU bank and the
 +             * transaction attribute determined by the fetch address,
 +             * regardless of CPU state. This is painful for QEMU
 +             * to handle, because it would mean we need to encode
 +             * into the mmu_idx not just the (user, negpri) information
 +             * for the current security state but also that for the
 +             * other security state, which would balloon the number
 +             * of mmu_idx values needed alarmingly.
 +             * Fortunately we can avoid this because it's not actually
 +             * possible to arbitrarily execute code from memory with
 +             * the wrong security attribute: it will always generate
 +             * an exception of some kind or another, apart from the
 +             * special case of an NS CPU executing an SG instruction
 +             * in S&NSC memory. So we always just fail the translation
 +             * here and sort things out in the exception handler
 +             * (including possibly emulating an SG instruction).
 +             */
 +            if (sattrs.ns != !secure) {
 +                if (sattrs.nsc) {
 +                    fi->type = ARMFault_QEMU_NSCExec;
 +                } else {
 +                    fi->type = ARMFault_QEMU_SFault;
 +                }
 +                *page_size = sattrs.subpage ? 1 : TARGET_PAGE_SIZE;
 +                *phys_ptr = address;
 +                *prot = 0;
 +                return true;
 +            }
 +        } else {
 +            /*
 +             * For data accesses we always use the MMU bank indicated
 +             * by the current CPU state, but the security attributes
 +             * might downgrade a secure access to nonsecure.
 +             */
 +            if (sattrs.ns) {
 +                txattrs->secure = false;
 +            } else if (!secure) {
 +                /*
 +                 * NS access to S memory must fault.
 +                 * Architecturally we should first check whether the
 +                 * MPU information for this address indicates that we
 +                 * are doing an unaligned access to Device memory, which
 +                 * should generate a UsageFault instead. QEMU does not
 +                 * currently check for that kind of unaligned access though.
 +                 * If we added it we would need to do so as a special case
 +                 * for M_FAKE_FSR_SFAULT in arm_v7m_cpu_do_interrupt().
 +                 */
 +                fi->type = ARMFault_QEMU_SFault;
 +                *page_size = sattrs.subpage ? 1 : TARGET_PAGE_SIZE;
 +                *phys_ptr = address;
 +                *prot = 0;
 +                return true;
 +            }
 +        }
 +    }
 +
 +    ret = pmsav8_mpu_lookup(env, address, access_type, mmu_idx, phys_ptr,
 +                            txattrs, prot, &mpu_is_subpage, fi, NULL);
 +    *page_size = sattrs.subpage || mpu_is_subpage ? 1 : TARGET_PAGE_SIZE;
 +    return ret;
 +}
 +
- /**
+ static ArithOneOp * const f_vector_fcvtn[] = {
-  * get_phys_addr - get the physical address for this virtual address
+     NULL,
-  *
+     gen_fcvtn_hs,
      gen_fcvtn_sd,
  };
 +static ArithOneOp * const f_scalar_fcvtxn[] = {
 +    NULL,
 +    NULL,
 +    gen_fcvtxn_sd,
 +};
  TRANS(FCVTN_v, do_2misc_narrow_vector, a, f_vector_fcvtn)
  TRANS(FCVTXN_v, do_2misc_narrow_vector, a, f_scalar_fcvtxn)
 --
-.25.1
+.34.1

-[PULL 06/55] xlnx_dp: Fix the interrupt disable logic
+[PULL 22/68] target/arm: Handle FPCR.NEP for FMUL, FMULX scalar by element
-From: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
+do_fp3_scalar_idx() is used only for the FMUL and FMULX scalar by
 element instructions; these both need to merge the result with the Rn
 register when FPCR.NEP is set.
-Fix interrupt disable logic. Mask value 1 indicates that interrupts are
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-disabled.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/tcg/translate-a64.c | 6 +++---
 file changed, 3 insertions(+), 3 deletions(-)
-Signed-off-by: Sai Pavan Boddu <saipava@xilinx.com>
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 Signed-off-by: Frederic Konrad <fkonrad@amd.com>
 Acked-by: Alistair Francis <alistair.francis@wdc.com>
 Message-id: 20220601172353.3220232-4-fkonrad@xilinx.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/display/xlnx_dp.c | 2 +-
 file changed, 1 insertion(+), 1 deletion(-)
 diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/xlnx_dp.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/hw/display/xlnx_dp.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_write(void *opaque, hwaddr offset, uint64_t value,
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
-         xlnx_dp_update_irq(s);
              read_vec_element(s, t1, a->rm, a->idx, MO_64);
              f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
 -            write_fp_dreg(s, a->rd, t0);
 +            write_fp_dreg_merging(s, a->rd, a->rn, t0);
          }
          break;
-     case DP_INT_DS:
+     case MO_32:
--        s->core_registers[DP_INT_MASK] |= ~value;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
-+        s->core_registers[DP_INT_MASK] |= value;
-         xlnx_dp_update_irq(s);
+             read_vec_element_i32(s, t1, a->rm, a->idx, MO_32);
              f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
 -            write_fp_sreg(s, a->rd, t0);
 +            write_fp_sreg_merging(s, a->rd, a->rn, t0);
          }
          break;
      case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
              read_vec_element_i32(s, t1, a->rm, a->idx, MO_16);
              f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
 -            write_fp_sreg(s, a->rd, t0);
 +            write_fp_hreg_merging(s, a->rd, a->rn, t0);
          }
          break;
      default:
 --
-.25.1
+.34.1

-[PULL 40/55] target/arm: Add el_is_in_host
+[PULL 23/68] target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
-From: Richard Henderson <richard.henderson@linaro.org>
+When FPCR.AH == 1, floating point FMIN and FMAX have some odd special
 cases:
-This (newish) ARM pseudocode function is easier to work with
+ * comparing two zeroes (even of different sign) or comparing a NaN
-than open-coded tests for HCR_E2H etc.  Use of the function
+   with anything always returns the second argument (possibly
-will be staged into the code base in parts.
+   squashed to zero)
  * denormal outputs are not squashed to zero regardless of FZ or FZ16
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Implement these semantics in new helper functions and select them at
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+translate time if FPCR.AH is 1 for the scalar FMAX and FMIN insns.
-Message-id: 20220607203306.657998-6-richard.henderson@linaro.org
+(We will convert the other FMAX and FMIN insns in subsequent
 commits.)
 Note that FMINNM and FMAXNM are not affected.
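
As a rough illustration of the special cases listed above, here is a stand-alone sketch using host floats. It is not the helper added by this patch: ah_min_model() and ah_max_model() are hypothetical names, the invalid flag stands in for raising the Invalid Operation exception, and input denormal squashing and the flush-to-zero handling are deliberately left out.

```c
#include <math.h>
#include <stdbool.h>

/*
 * Simplified model of FMIN with FPCR.AH == 1 (illustrative only).
 * Two zeroes of any sign, or a NaN in either operand, return the
 * second argument; a NaN also sets *invalid.
 */
static float ah_min_model(float a, float b, bool *invalid)
{
    if (a == 0.0f && b == 0.0f) {
        return b;               /* +0/-0 in any combination */
    }
    if (isnan(a) || isnan(b)) {
        *invalid = true;
        return b;
    }
    return a < b ? a : b;
}

/* Same special cases for FMAX; only the ordinary comparison differs. */
static float ah_max_model(float a, float b, bool *invalid)
{
    if (a == 0.0f && b == 0.0f) {
        return b;
    }
    if (isnan(a) || isnan(b)) {
        *invalid = true;
        return b;
    }
    return a > b ? a : b;
}
```

Under this model the result for two zeroes depends on operand order: ah_min_model(0.0f, -0.0f, ...) gives -0, while ah_min_model(-0.0f, 0.0f, ...) gives +0.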
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/internals.h |  2 ++
+ target/arm/tcg/helper-a64.h    |  7 +++++++
- target/arm/helper.c    | 28 ++++++++++++++++++++++++++++
+ target/arm/tcg/helper-a64.c    | 36 ++++++++++++++++++++++++++++++++++
-files changed, 30 insertions(+)
+ target/arm/tcg/translate-a64.c | 23 ++++++++++++++++++++--
 files changed, 64 insertions(+), 2 deletions(-)
-diff --git a/target/arm/internals.h b/target/arm/internals.h
+diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
+--- a/target/arm/tcg/helper-a64.h
-+++ b/target/arm/internals.h
++++ b/target/arm/tcg/helper-a64.h
-@@ -XXX,XX +XXX,XX @@ static inline void define_cortex_a72_a57_a53_cp_reginfo(ARMCPU *cpu) { }
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(advsimd_muladd2h, i32, i32, i32, i32, fpst)
- void define_cortex_a72_a57_a53_cp_reginfo(ARMCPU *cpu);
+ DEF_HELPER_2(advsimd_rinth_exact, f16, f16, fpst)
- #endif
+ DEF_HELPER_2(advsimd_rinth, f16, f16, fpst)
-+bool el_is_in_host(CPUARMState *env, int el);
++DEF_HELPER_3(vfp_ah_minh, f16, f16, f16, fpst)
 +DEF_HELPER_3(vfp_ah_mins, f32, f32, f32, fpst)
 +DEF_HELPER_3(vfp_ah_mind, f64, f64, f64, fpst)
 +DEF_HELPER_3(vfp_ah_maxh, f16, f16, f16, fpst)
 +DEF_HELPER_3(vfp_ah_maxs, f32, f32, f32, fpst)
 +DEF_HELPER_3(vfp_ah_maxd, f64, f64, f64, fpst)
 +
- void aa32_max_features(ARMCPU *cpu);
+ DEF_HELPER_2(exception_return, void, env, i64)
+ DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64)
- #endif
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/helper-a64.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/helper-a64.c
-@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(fcvtx_f64_to_f32)(float64 a, float_status *fpst)
-     return ret;
+     return r;
  }
 +/*
-+ * Corresponds to ARM pseudocode function ELIsInHost().
++ * AH=1 min/max have some odd special cases:
 + * comparing two zeroes (regardless of sign), (NaN, anything),
 + * or (anything, NaN) should return the second argument (possibly
 + * squashed to zero).
 + * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
 + */
-+bool el_is_in_host(CPUARMState *env, int el)
++#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
-+{
++    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
-+    uint64_t mask;
++    {                                                                   \
-+
++        bool save;                                                      \
-+    /*
++        CTYPE r;                                                        \
-+     * Since we only care about E2H and TGE, we can skip arm_hcr_el2_eff().
++        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
-+     * Perform the simplest bit tests first, and validate EL2 afterward.
++        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
-+     */
++        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \
-+    if (el & 1) {
++            return b;                                                   \
-+        return false; /* EL1 or EL3 */
++        }                                                               \
 +        if (FLOATTYPE ## _is_any_nan(a) ||                              \
 +            FLOATTYPE ## _is_any_nan(b)) {                              \
 +            float_raise(float_flag_invalid, fpst);                      \
 +            return b;                                                   \
 +        }                                                               \
 +        save = get_flush_to_zero(fpst);                                 \
 +        set_flush_to_zero(false, fpst);                                 \
 +        r = FLOATTYPE ## _ ## MINMAX(a, b, fpst);                       \
 +        set_flush_to_zero(save, fpst);                                  \
 +        return r;                                                       \
 +    }
 +
-+    /*
++AH_MINMAX_HELPER(vfp_ah_minh, dh_ctype_f16, float16, min)
-+     * Note that hcr_write() checks isar_feature_aa64_vh(),
++AH_MINMAX_HELPER(vfp_ah_mins, float32, float32, min)
-+     * aka HaveVirtHostExt(), in allowing HCR_E2H to be set.
++AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min)
-+     */
++AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max)
-+    mask = el ? HCR_E2H : HCR_E2H | HCR_TGE;
++AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max)
-+    if ((env->cp15.hcr_el2 & mask) != mask) {
++AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max)
 +        return false;
 +    }
 +
-+    /* TGE and/or E2H set: double check those bits are currently legal. */
+ /* 64-bit versions of the CRC helpers. Note that although the operation
-+    return arm_is_el2_enabled(env) && arm_el_is_aa64(env, 2);
+  * (and the prototypes of crc32c() and crc32() mean that only the bottom
   * 32 bits of the accumulator and result are used, we pass and return
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                         select_ah_fpst(s, a->esz));
  }
 +/* Some insns need to call different helpers when FPCR.AH == 1 */
 +static bool do_fp3_scalar_2fn(DisasContext *s, arg_rrr_e *a,
 +                              const FPScalar *fnormal,
 +                              const FPScalar *fah,
 +                              int mergereg)
 +{
 +    return do_fp3_scalar(s, a, s->fpcr_ah ? fah : fnormal, mergereg);
 +}
 +
- static void hcrx_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ static const FPScalar f_scalar_fadd = {
-                        uint64_t value)
+     gen_helper_vfp_addh,
- {
+     gen_helper_vfp_adds,
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fmax = {
      gen_helper_vfp_maxs,
      gen_helper_vfp_maxd,
  };
 -TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
 +static const FPScalar f_scalar_fmax_ah = {
 +    gen_helper_vfp_ah_maxh,
 +    gen_helper_vfp_ah_maxs,
 +    gen_helper_vfp_ah_maxd,
 +};
 +TRANS(FMAX_s, do_fp3_scalar_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah, a->rn)
  static const FPScalar f_scalar_fmin = {
      gen_helper_vfp_minh,
      gen_helper_vfp_mins,
      gen_helper_vfp_mind,
  };
 -TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
 +static const FPScalar f_scalar_fmin_ah = {
 +    gen_helper_vfp_ah_minh,
 +    gen_helper_vfp_ah_mins,
 +    gen_helper_vfp_ah_mind,
 +};
 +TRANS(FMIN_s, do_fp3_scalar_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah, a->rn)
  static const FPScalar f_scalar_fmaxnm = {
      gen_helper_vfp_maxnumh,
 --
-.25.1
+.34.1
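
The special cases listed in the AH=1 min/max comment above can be tried out on host floats. This is only a sketch under stated assumptions: ah_minf, ah_maxf and ah_minmax_selfcheck are hypothetical names, not QEMU helpers, and the input-denormal squashing and exception-flag handling done by AH_MINMAX_HELPER are deliberately left out.

```c
#include <assert.h>
#include <math.h>

/*
 * Hypothetical host-float stand-ins for the AH=1 min/max rule.
 * Assumption: denormal squashing and exception flags are omitted.
 */
float ah_minf(float a, float b)
{
    /* Two zeroes, whatever their signs: the second operand is returned. */
    if (a == 0.0f && b == 0.0f) {
        return b;
    }
    /* A NaN on either side: again the second operand is returned. */
    if (isnan(a) || isnan(b)) {
        return b;
    }
    return a < b ? a : b;
}

float ah_maxf(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;
    }
    if (isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}

void ah_minmax_selfcheck(void)
{
    assert(signbit(ah_minf(0.0f, -0.0f)));   /* second operand is -0 */
    assert(!signbit(ah_minf(-0.0f, 0.0f)));  /* second operand is +0 */
    assert(ah_maxf(NAN, 1.0f) == 1.0f);      /* NaN first: 1.0 comes back */
    assert(isnan(ah_maxf(1.0f, NAN)));       /* NaN second: NaN comes back */
}
```

Note that the result depends on operand order in these cases, which is why the patch cannot simply reuse the existing min/max helpers.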

-[PULL 32/55] target/arm: Move regime_translation_disabled to ptw.c
+[PULL 24/68] target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the FPCR.AH == 1 semantics for vector FMIN/FMAX, by
 creating new _ah_ versions of the gvec helpers which invoke the
 scalar fmin_ah and fmax_ah helpers on each element.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-26-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    | 17 ----------------
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
- target/arm/helper.c | 47 ---------------------------------------------
+ target/arm/tcg/translate-a64.c | 21 +++++++++++++++++++--
- target/arm/ptw.c    | 47 ++++++++++++++++++++++++++++++++++++++++++++-
+ target/arm/tcg/vec_helper.c    |  8 ++++++++
-files changed, 46 insertions(+), 65 deletions(-)
+files changed, 41 insertions(+), 2 deletions(-)
  delete mode 100644 target/arm/ptw.h
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
 deleted file mode 100644
 index XXXXXXX..XXXXXXX
 --- a/target/arm/ptw.h
 +++ /dev/null
@@ -XXX,XX +XXX,XX @@
 -/*
 - * ARM page table walking.
 - *
 - * This code is licensed under the GNU GPL v2 or later.
 - *
 - * SPDX-License-Identifier: GPL-2.0-or-later
 - */
 -
 -#ifndef TARGET_ARM_PTW_H
 -#define TARGET_ARM_PTW_H
 -
 -#ifndef CONFIG_USER_ONLY
 -
 -bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
 -
 -#endif /* !CONFIG_USER_ONLY */
 -#endif /* TARGET_ARM_PTW_H */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/helper-sve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/helper-sve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
- #include "semihosting/common-semi.h"
+ DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
- #endif
+                    void, ptr, ptr, ptr, fpst, i32)
- #include "cpregs.h"
--#include "ptw.h"
++DEF_HELPER_FLAGS_5(gvec_ah_fmax_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
- #define ARM_CPU_FREQ 1000000000 /* FIXME: 1 GHz, should be configurable */
++DEF_HELPER_FLAGS_5(gvec_ah_fmax_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, fpst, i32)
-@@ -XXX,XX +XXX,XX @@ uint64_t arm_sctlr(CPUARMState *env, int el)
++DEF_HELPER_FLAGS_5(gvec_ah_fmax_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +
 +DEF_HELPER_FLAGS_5(gvec_ah_fmin_h, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_fmin_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_fmin_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +
  DEF_HELPER_FLAGS_4(sve_faddv_h, TCG_CALL_NO_RWG,
                     i64, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
                                         FPST_A64_F16 : FPST_A64);
  }
- #ifndef CONFIG_USER_ONLY
++static bool do_fp3_vector_2fn(DisasContext *s, arg_qrrr_e *a, int data,
--
++                              gen_helper_gvec_3_ptr * const fnormal[3],
--/* Return true if the specified stage of address translation is disabled */
++                              gen_helper_gvec_3_ptr * const fah[3])
 -bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
 -{
 -    uint64_t hcr_el2;
 -
 -    if (arm_feature(env, ARM_FEATURE_M)) {
 -        switch (env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)] &
 -                (R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK)) {
 -        case R_V7M_MPU_CTRL_ENABLE_MASK:
 -            /* Enabled, but not for HardFault and NMI */
 -            return mmu_idx & ARM_MMU_IDX_M_NEGPRI;
 -        case R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK:
 -            /* Enabled for all cases */
 -            return false;
 -        case 0:
 -        default:
 -            /* HFNMIENA set and ENABLE clear is UNPREDICTABLE, but
 -             * we warned about that in armv7m_nvic.c when the guest set it.
 -             */
 -            return true;
 -        }
 -    }
 -
 -    hcr_el2 = arm_hcr_el2_eff(env);
 -
 -    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 -        /* HCR.DC means HCR.VM behaves as 1 */
 -        return (hcr_el2 & (HCR_DC | HCR_VM)) == 0;
 -    }
 -
 -    if (hcr_el2 & HCR_TGE) {
 -        /* TGE means that NS EL0/1 act as if SCTLR_EL1.M is zero */
 -        if (!regime_is_secure(env, mmu_idx) && regime_el(env, mmu_idx) == 1) {
 -            return true;
 -        }
 -    }
 -
 -    if ((hcr_el2 & HCR_DC) && arm_mmu_idx_is_stage1_of_2(mmu_idx)) {
 -        /* HCR.DC means SCTLR_EL1.M behaves as 0 */
 -        return true;
 -    }
 -
 -    return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
 -}
 -
  /* Convert a possible stage1+2 MMU index into the appropriate
   * stage 1 MMU index
   */
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
  #include "cpu.h"
  #include "internals.h"
  #include "idau.h"
 -#include "ptw.h"
  static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
@@ -XXX,XX +XXX,XX @@ static uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
      }
  }
 +/* Return true if the specified stage of address translation is disabled */
 +static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
 +{
-+    uint64_t hcr_el2;
++    return do_fp3_vector(s, a, data, s->fpcr_ah ? fah : fnormal);
 +
 +    if (arm_feature(env, ARM_FEATURE_M)) {
 +        switch (env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)] &
 +                (R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK)) {
 +        case R_V7M_MPU_CTRL_ENABLE_MASK:
 +            /* Enabled, but not for HardFault and NMI */
 +            return mmu_idx & ARM_MMU_IDX_M_NEGPRI;
 +        case R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK:
 +            /* Enabled for all cases */
 +            return false;
 +        case 0:
 +        default:
 +            /*
 +             * HFNMIENA set and ENABLE clear is UNPREDICTABLE, but
 +             * we warned about that in armv7m_nvic.c when the guest set it.
 +             */
 +            return true;
 +        }
 +    }
 +
 +    hcr_el2 = arm_hcr_el2_eff(env);
 +
 +    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 +        /* HCR.DC means HCR.VM behaves as 1 */
 +        return (hcr_el2 & (HCR_DC | HCR_VM)) == 0;
 +    }
 +
 +    if (hcr_el2 & HCR_TGE) {
 +        /* TGE means that NS EL0/1 act as if SCTLR_EL1.M is zero */
 +        if (!regime_is_secure(env, mmu_idx) && regime_el(env, mmu_idx) == 1) {
 +            return true;
 +        }
 +    }
 +
 +    if ((hcr_el2 & HCR_DC) && arm_mmu_idx_is_stage1_of_2(mmu_idx)) {
 +        /* HCR.DC means SCTLR_EL1.M behaves as 0 */
 +        return true;
 +    }
 +
 +    return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
 +}
 +
- static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
+ static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
                               gen_helper_gvec_3_ptr * const f[3])
  {
-     /*
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmax[3] = {
      gen_helper_gvec_fmax_s,
      gen_helper_gvec_fmax_d,
  };
 -TRANS(FMAX_v, do_fp3_vector, a, 0, f_vector_fmax)
 +static gen_helper_gvec_3_ptr * const f_vector_fmax_ah[3] = {
 +    gen_helper_gvec_ah_fmax_h,
 +    gen_helper_gvec_ah_fmax_s,
 +    gen_helper_gvec_ah_fmax_d,
 +};
 +TRANS(FMAX_v, do_fp3_vector_2fn, a, 0, f_vector_fmax, f_vector_fmax_ah)
  static gen_helper_gvec_3_ptr * const f_vector_fmin[3] = {
      gen_helper_gvec_fmin_h,
      gen_helper_gvec_fmin_s,
      gen_helper_gvec_fmin_d,
  };
 -TRANS(FMIN_v, do_fp3_vector, a, 0, f_vector_fmin)
 +static gen_helper_gvec_3_ptr * const f_vector_fmin_ah[3] = {
 +    gen_helper_gvec_ah_fmin_h,
 +    gen_helper_gvec_ah_fmin_s,
 +    gen_helper_gvec_ah_fmin_d,
 +};
 +TRANS(FMIN_v, do_fp3_vector_2fn, a, 0, f_vector_fmin, f_vector_fmin_ah)
  static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[3] = {
      gen_helper_gvec_fmaxnum_h,
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
  DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
  DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
 +DO_3OP(gvec_ah_fmax_h, helper_vfp_ah_maxh, float16)
 +DO_3OP(gvec_ah_fmax_s, helper_vfp_ah_maxs, float32)
 +DO_3OP(gvec_ah_fmax_d, helper_vfp_ah_maxd, float64)
 +
 +DO_3OP(gvec_ah_fmin_h, helper_vfp_ah_minh, float16)
 +DO_3OP(gvec_ah_fmin_s, helper_vfp_ah_mins, float32)
 +DO_3OP(gvec_ah_fmin_d, helper_vfp_ah_mind, float64)
 +
  #endif
  #undef DO_3OP
 --
-.25.1
+.34.1
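
The shape of the vector FMIN/FMAX patch above (a per-element loop around a scalar helper, with the helper chosen by FPCR.AH) might be sketched as follows. All names here are hypothetical, plain_maxf is a simplified stand-in for the non-AH behaviour, and the choice is made at run time for brevity, whereas the patch picks the helper table at translate time via s->fpcr_ah.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

typedef float (*scalar_op)(float, float);

/* Simplified non-AH max: any NaN gives NaN, and +0 beats -0. */
float plain_maxf(float a, float b)
{
    if (isnan(a) || isnan(b)) {
        return NAN;
    }
    if (a == 0.0f && b == 0.0f) {
        return signbit(a) ? b : a;
    }
    return a > b ? a : b;
}

/* AH=1 max: two zeroes or any NaN return the second operand. */
float ah_maxf(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}

/* Apply one scalar operation to every element pair. */
void vec_apply(float *d, const float *n, const float *m, size_t len,
               scalar_op op)
{
    for (size_t i = 0; i < len; i++) {
        d[i] = op(n[i], m[i]);
    }
}

/* Pick the scalar helper from the (hypothetical) fpcr_ah flag. */
void vec_fmax(float *d, const float *n, const float *m, size_t len,
              bool fpcr_ah)
{
    vec_apply(d, n, m, len, fpcr_ah ? ah_maxf : plain_maxf);
}

void vec_fmax_selfcheck(void)
{
    const float n[2] = { 0.0f, 1.0f };
    const float m[2] = { -0.0f, 5.0f };
    float d[2];

    vec_fmax(d, n, m, 2, true);
    assert(signbit(d[0]) && d[1] == 5.0f);   /* AH: lane 0 is m[0], i.e. -0 */
    vec_fmax(d, n, m, 2, false);
    assert(!signbit(d[0]) && d[1] == 5.0f);  /* non-AH: lane 0 is +0 */
}
```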

-[PULL 13/55] target/arm: Move get_phys_addr_pmsav7_default to ptw.c
+[PULL 25/68] target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the FPCR.AH semantics for FMAXV and FMINV.  These are the
 "recursively reduce all lanes of a vector to a scalar result" insns;
 we just need to use the _ah_ helper for the reduction step when
 FPCR.AH == 1.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-7-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |  3 +++
+ target/arm/tcg/translate-a64.c | 28 ++++++++++++++++++----------
- target/arm/helper.c | 41 -----------------------------------------
+file changed, 18 insertions(+), 10 deletions(-)
  target/arm/ptw.c    | 41 +++++++++++++++++++++++++++++++++++++++++
 files changed, 44 insertions(+), 41 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+@@ -XXX,XX +XXX,XX @@ static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz,
      return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
  }
-+void get_phys_addr_pmsav7_default(CPUARMState *env,
+ static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
-+                                  ARMMMUIdx mmu_idx,
+-                              NeonGenTwoSingleOpFn *fn)
-+                                  int32_t address, int *prot);
++                            NeonGenTwoSingleOpFn *fnormal,
- bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
++                            NeonGenTwoSingleOpFn *fah)
-                           MMUAccessType access_type, ARMMMUIdx mmu_idx,
+ {
-                           hwaddr *phys_ptr, int *prot,
+     if (fp_access_check(s)) {
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+         MemOp esz = a->esz;
-index XXXXXXX..XXXXXXX 100644
+         int elts = (a->q ? 16 : 8) >> esz;
---- a/target/arm/helper.c
+         TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
-+++ b/target/arm/helper.c
+-        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
-@@ -XXX,XX +XXX,XX @@ do_fault:
++        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst,
 +                                       s->fpcr_ah ? fah : fnormal);
          write_fp_sreg(s, a->rd, res);
      }
      return true;
  }
--static inline void get_phys_addr_pmsav7_default(CPUARMState *env,
+-TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxnumh)
--                                                ARMMMUIdx mmu_idx,
+-TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minnumh)
--                                                int32_t address, int *prot)
+-TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxh)
--{
+-TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minh)
--    if (!arm_feature(env, ARM_FEATURE_M)) {
++TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a,
--        *prot = PAGE_READ | PAGE_WRITE;
++           gen_helper_vfp_maxnumh, gen_helper_vfp_maxnumh)
--        switch (address) {
++TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a,
--        case 0xF0000000 ... 0xFFFFFFFF:
++           gen_helper_vfp_minnumh, gen_helper_vfp_minnumh)
--            if (regime_sctlr(env, mmu_idx) & SCTLR_V) {
++TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a,
--                /* hivecs execing is ok */
++           gen_helper_vfp_maxh, gen_helper_vfp_ah_maxh)
--                *prot |= PAGE_EXEC;
++TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a,
--            }
++           gen_helper_vfp_minh, gen_helper_vfp_ah_minh)
--            break;
--        case 0x00000000 ... 0x7FFFFFFF:
+-TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
--            *prot |= PAGE_EXEC;
+-TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
--            break;
+-TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
--        }
+-TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
--    } else {
++TRANS(FMAXNMV_s, do_fp_reduction, a,
--        /* Default system address map for M profile cores.
++      gen_helper_vfp_maxnums, gen_helper_vfp_maxnums)
--         * The architecture specifies which regions are execute-never;
++TRANS(FMINNMV_s, do_fp_reduction, a,
--         * at the MPU level no other checks are defined.
++      gen_helper_vfp_minnums, gen_helper_vfp_minnums)
--         */
++TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs, gen_helper_vfp_ah_maxs)
--        switch (address) {
++TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins, gen_helper_vfp_ah_mins)
--        case 0x00000000 ... 0x1fffffff: /* ROM */
--        case 0x20000000 ... 0x3fffffff: /* SRAM */
+ /*
--        case 0x60000000 ... 0x7fffffff: /* RAM */
+  * Floating-point Immediate
 -        case 0x80000000 ... 0x9fffffff: /* RAM */
 -            *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 -            break;
 -        case 0x40000000 ... 0x5fffffff: /* Peripheral */
 -        case 0xa0000000 ... 0xbfffffff: /* Device */
 -        case 0xc0000000 ... 0xdfffffff: /* Device */
 -        case 0xe0000000 ... 0xffffffff: /* System */
 -            *prot = PAGE_READ | PAGE_WRITE;
 -            break;
 -        default:
 -            g_assert_not_reached();
 -        }
 -    }
 -}
 -
  static bool pmsav7_use_background_region(ARMCPU *cpu,
                                           ARMMMUIdx mmu_idx, bool is_user)
  {
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
      return false;
  }
 +void get_phys_addr_pmsav7_default(CPUARMState *env,
 +                                  ARMMMUIdx mmu_idx,
 +                                  int32_t address, int *prot)
 +{
 +    if (!arm_feature(env, ARM_FEATURE_M)) {
 +        *prot = PAGE_READ | PAGE_WRITE;
 +        switch (address) {
 +        case 0xF0000000 ... 0xFFFFFFFF:
 +            if (regime_sctlr(env, mmu_idx) & SCTLR_V) {
 +                /* hivecs execing is ok */
 +                *prot |= PAGE_EXEC;
 +            }
 +            break;
 +        case 0x00000000 ... 0x7FFFFFFF:
 +            *prot |= PAGE_EXEC;
 +            break;
 +        }
 +    } else {
 +        /* Default system address map for M profile cores.
 +         * The architecture specifies which regions are execute-never;
 +         * at the MPU level no other checks are defined.
 +         */
 +        switch (address) {
 +        case 0x00000000 ... 0x1fffffff: /* ROM */
 +        case 0x20000000 ... 0x3fffffff: /* SRAM */
 +        case 0x60000000 ... 0x7fffffff: /* RAM */
 +        case 0x80000000 ... 0x9fffffff: /* RAM */
 +            *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 +            break;
 +        case 0x40000000 ... 0x5fffffff: /* Peripheral */
 +        case 0xa0000000 ... 0xbfffffff: /* Device */
 +        case 0xc0000000 ... 0xdfffffff: /* Device */
 +        case 0xe0000000 ... 0xffffffff: /* System */
 +            *prot = PAGE_READ | PAGE_WRITE;
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +    }
 +}
 +
  /**
   * get_phys_addr - get the physical address for this virtual address
   *
 --
-.25.1
+.34.1
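
The "recursively reduce all lanes of a vector to a scalar result" idea from the FMAXV/FMINV patch above can be illustrated on host floats. This is a sketch only: reduce_lanes and the two min functions are hypothetical names, the lane count is assumed to be a power of two, and the split into halves is one plausible reduction order rather than a statement about what do_reduction_op emits.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

typedef float (*scalar_op)(float, float);

/* Simplified non-AH min: any NaN operand makes the result NaN. */
float plain_minf(float a, float b)
{
    if (isnan(a) || isnan(b)) {
        return NAN;
    }
    return a < b ? a : b;
}

/* AH=1 min: two zeroes or any NaN return the second operand. */
float ah_minf(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a < b ? a : b;
}

/* Reduce n lanes (n a power of two) by combining the two halves. */
float reduce_lanes(const float *v, size_t n, scalar_op op)
{
    if (n == 1) {
        return v[0];
    }
    size_t half = n / 2;
    float lo = reduce_lanes(v, half, op);
    float hi = reduce_lanes(v + half, half, op);
    return op(lo, hi);
}

void reduce_selfcheck(void)
{
    const float v[4] = { 1.0f, NAN, 3.0f, 2.0f };

    /* Non-AH: the NaN propagates to the final result. */
    assert(isnan(reduce_lanes(v, 4, plain_minf)));
    /* AH: (1, NaN) gives NaN, (3, 2) gives 2, then (NaN, 2) gives 2. */
    assert(reduce_lanes(v, 4, ah_minf) == 2.0f);
}
```

Because the AH rule returns the second operand for NaN inputs, the final value depends on where the NaN sits in the vector, so only the combining step needs to change, not the reduction structure.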

-[PULL 19/55] target/arm: Move m_is_{ppb,system}_region to ptw.c
+[PULL 26/68] target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the FPCR.AH semantics for the pairwise floating
 point minimum/maximum insns FMINP and FMAXP.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-13-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |  3 ---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
- target/arm/helper.c | 15 ---------------
+ target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
- target/arm/ptw.c    | 16 ++++++++++++++++
+ target/arm/tcg/vec_helper.c    | 10 ++++++++++
-files changed, 16 insertions(+), 18 deletions(-)
+files changed, 45 insertions(+), 4 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/helper-sve.h
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/helper-sve.h
-@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ah_fmin_s, TCG_CALL_NO_RWG,
-     return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
+ DEF_HELPER_FLAGS_5(gvec_ah_fmin_d, TCG_CALL_NO_RWG,
- }
+                    void, ptr, ptr, ptr, fpst, i32)
--bool m_is_ppb_region(CPUARMState *env, uint32_t address);
++DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_h, TCG_CALL_NO_RWG,
--bool m_is_system_region(CPUARMState *env, uint32_t address);
++                   void, ptr, ptr, ptr, fpst, i32)
--
++DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_s, TCG_CALL_NO_RWG,
- bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
++                   void, ptr, ptr, ptr, fpst, i32)
-                         MMUAccessType access_type, ARMMMUIdx mmu_idx,
++DEF_HELPER_FLAGS_5(gvec_ah_fmaxp_d, TCG_CALL_NO_RWG,
-                         bool s1_is_el0,
++                   void, ptr, ptr, ptr, fpst, i32)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++
 +DEF_HELPER_FLAGS_5(gvec_ah_fminp_h, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_fminp_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_fminp_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +
  DEF_HELPER_FLAGS_4(sve_faddv_h, TCG_CALL_NO_RWG,
                     i64, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ do_fault:
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmaxp[3] = {
      gen_helper_gvec_fmaxp_s,
      gen_helper_gvec_fmaxp_d,
  };
 -TRANS(FMAXP_v, do_fp3_vector, a, 0, f_vector_fmaxp)
 +static gen_helper_gvec_3_ptr * const f_vector_ah_fmaxp[3] = {
 +    gen_helper_gvec_ah_fmaxp_h,
 +    gen_helper_gvec_ah_fmaxp_s,
 +    gen_helper_gvec_ah_fmaxp_d,
 +};
 +TRANS(FMAXP_v, do_fp3_vector_2fn, a, 0, f_vector_fmaxp, f_vector_ah_fmaxp)
  static gen_helper_gvec_3_ptr * const f_vector_fminp[3] = {
      gen_helper_gvec_fminp_h,
      gen_helper_gvec_fminp_s,
      gen_helper_gvec_fminp_d,
  };
 -TRANS(FMINP_v, do_fp3_vector, a, 0, f_vector_fminp)
 +static gen_helper_gvec_3_ptr * const f_vector_ah_fminp[3] = {
 +    gen_helper_gvec_ah_fminp_h,
 +    gen_helper_gvec_ah_fminp_s,
 +    gen_helper_gvec_ah_fminp_d,
 +};
 +TRANS(FMINP_v, do_fp3_vector_2fn, a, 0, f_vector_fminp, f_vector_ah_fminp)
  static gen_helper_gvec_3_ptr * const f_vector_fmaxnmp[3] = {
      gen_helper_gvec_fmaxnump_h,
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_pair(DisasContext *s, arg_rr_e *a, const FPScalar *f)
      return true;
  }
--bool m_is_ppb_region(CPUARMState *env, uint32_t address)
++static bool do_fp3_scalar_pair_2fn(DisasContext *s, arg_rr_e *a,
--{
++                                   const FPScalar *fnormal,
--    /* True if address is in the M profile PPB region 0xe0000000 - 0xe00fffff */
++                                   const FPScalar *fah)
 -    return arm_feature(env, ARM_FEATURE_M) &&
 -        extract32(address, 20, 12) == 0xe00;
 -}
 -
 -bool m_is_system_region(CPUARMState *env, uint32_t address)
 -{
 -    /* True if address is in the M profile system region
 -     * 0xe0000000 - 0xffffffff
 -     */
 -    return arm_feature(env, ARM_FEATURE_M) && extract32(address, 29, 3) == 0x7;
 -}
 -
  /* Combine either inner or outer cacheability attributes for normal
   * memory, according to table D4-42 and pseudocode procedure
   * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static void get_phys_addr_pmsav7_default(CPUARMState *env, ARMMMUIdx mmu_idx,
      }
  }
 +static bool m_is_ppb_region(CPUARMState *env, uint32_t address)
 +{
-+    /* True if address is in the M profile PPB region 0xe0000000 - 0xe00fffff */
++    return do_fp3_scalar_pair(s, a, s->fpcr_ah ? fah : fnormal);
 +    return arm_feature(env, ARM_FEATURE_M) &&
 +        extract32(address, 20, 12) == 0xe00;
 +}
 +
-+static bool m_is_system_region(CPUARMState *env, uint32_t address)
+ TRANS(FADDP_s, do_fp3_scalar_pair, a, &f_scalar_fadd)
-+{
+-TRANS(FMAXP_s, do_fp3_scalar_pair, a, &f_scalar_fmax)
-+    /*
+-TRANS(FMINP_s, do_fp3_scalar_pair, a, &f_scalar_fmin)
-+     * True if address is in the M profile system region
++TRANS(FMAXP_s, do_fp3_scalar_pair_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah)
-+     * 0xe0000000 - 0xffffffff
++TRANS(FMINP_s, do_fp3_scalar_pair_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah)
-+     */
+ TRANS(FMAXNMP_s, do_fp3_scalar_pair, a, &f_scalar_fmaxnm)
-+    return arm_feature(env, ARM_FEATURE_M) && extract32(address, 29, 3) == 0x7;
+ TRANS(FMINNMP_s, do_fp3_scalar_pair, a, &f_scalar_fminnm)
-+}
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_3OP_PAIR(gvec_fminnump_h, float16_minnum, float16, H2)
  DO_3OP_PAIR(gvec_fminnump_s, float32_minnum, float32, H4)
  DO_3OP_PAIR(gvec_fminnump_d, float64_minnum, float64, )
 +#ifdef TARGET_AARCH64
 +DO_3OP_PAIR(gvec_ah_fmaxp_h, helper_vfp_ah_maxh, float16, H2)
 +DO_3OP_PAIR(gvec_ah_fmaxp_s, helper_vfp_ah_maxs, float32, H4)
 +DO_3OP_PAIR(gvec_ah_fmaxp_d, helper_vfp_ah_maxd, float64, )
 +
- static bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx,
++DO_3OP_PAIR(gvec_ah_fminp_h, helper_vfp_ah_minh, float16, H2)
-                                          bool is_user)
++DO_3OP_PAIR(gvec_ah_fminp_s, helper_vfp_ah_mins, float32, H4)
- {
++DO_3OP_PAIR(gvec_ah_fminp_d, helper_vfp_ah_mind, float64, )
 +#endif
 +
  #undef DO_3OP_PAIR
  #define DO_3OP_PAIR(NAME, FUNC, TYPE, H) \
 --
-.25.1
+.34.1
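
For the pairwise FMINP/FMAXP patch above, the same scalar AH helper is applied to adjacent element pairs instead of to matching lanes. A rough sketch, assuming the usual pairwise layout (low half of the result from pairs of the first source, high half from pairs of the second) and assuming the destination does not alias either source; vec_pairwise and the two max functions are hypothetical names, not the DO_3OP_PAIR expansion itself.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

typedef float (*scalar_op)(float, float);

/* Simplified non-AH max: any NaN gives NaN, and +0 beats -0. */
float plain_maxf(float a, float b)
{
    if (isnan(a) || isnan(b)) {
        return NAN;
    }
    if (a == 0.0f && b == 0.0f) {
        return signbit(a) ? b : a;
    }
    return a > b ? a : b;
}

/* AH=1 max: two zeroes or any NaN return the second operand. */
float ah_maxf(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}

/*
 * Pairwise operation over len elements (len even).
 * d must not overlap n or m in this sketch.
 */
void vec_pairwise(float *d, const float *n, const float *m, size_t len,
                  scalar_op op)
{
    size_t half = len / 2;

    for (size_t i = 0; i < half; i++) {
        d[i] = op(n[2 * i], n[2 * i + 1]);
        d[half + i] = op(m[2 * i], m[2 * i + 1]);
    }
}

void pairwise_selfcheck(void)
{
    const float n[4] = { 1.0f, 2.0f, 0.0f, -0.0f };
    const float m[4] = { 5.0f, 6.0f, NAN, 8.0f };
    float d[4];

    vec_pairwise(d, n, m, 4, ah_maxf);
    assert(d[0] == 2.0f && signbit(d[1]));   /* (0, -0) gives -0 under AH */
    assert(d[2] == 6.0f && d[3] == 8.0f);    /* (NaN, 8) gives 8 under AH */

    vec_pairwise(d, n, m, 4, plain_maxf);
    assert(!signbit(d[1]) && isnan(d[3]));   /* non-AH: +0 and NaN */
}
```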

-[PULL 50/55] target/arm: Move expand_pred_b to vec_internal.h
+[PULL 27/68] target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the FPCR.AH semantics for the SVE FMAXV and FMINV
 vector-reduction-to-scalar max/min operations.
-Put the inline function near the array declaration.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/tcg/helper-sve.h    | 14 +++++++++++
  target/arm/tcg/sve_helper.c    | 43 +++++++++++++++++++++-------------
  target/arm/tcg/translate-sve.c | 16 +++++++++++--
 files changed, 55 insertions(+), 18 deletions(-)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20220607203306.657998-16-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/vec_internal.h | 8 +++++++-
  target/arm/sve_helper.c   | 9 ---------
 files changed, 7 insertions(+), 10 deletions(-)
 diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_internal.h
+--- a/target/arm/tcg/helper-sve.h
-+++ b/target/arm/vec_internal.h
++++ b/target/arm/tcg/helper-sve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_fminv_s, TCG_CALL_NO_RWG,
- #define H8(x)   (x)
+ DEF_HELPER_FLAGS_4(sve_fminv_d, TCG_CALL_NO_RWG,
- #define H1_8(x) (x)
+                    i64, ptr, ptr, fpst, i32)
--/* Data for expanding active predicate bits to bytes, for byte elements. */
++DEF_HELPER_FLAGS_4(sve_ah_fmaxv_h, TCG_CALL_NO_RWG,
-+/*
++                   i64, ptr, ptr, fpst, i32)
-+ * Expand active predicate bits to bytes, for byte elements.
++DEF_HELPER_FLAGS_4(sve_ah_fmaxv_s, TCG_CALL_NO_RWG,
-+ */
++                   i64, ptr, ptr, fpst, i32)
- extern const uint64_t expand_pred_b_data[256];
++DEF_HELPER_FLAGS_4(sve_ah_fmaxv_d, TCG_CALL_NO_RWG,
-+static inline uint64_t expand_pred_b(uint8_t byte)
++                   i64, ptr, ptr, fpst, i32)
-+{
++
-+    return expand_pred_b_data[byte];
++DEF_HELPER_FLAGS_4(sve_ah_fminv_h, TCG_CALL_NO_RWG,
-+}
++                   i64, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fminv_s, TCG_CALL_NO_RWG,
- static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
++                   i64, ptr, ptr, fpst, i32)
- {
++DEF_HELPER_FLAGS_4(sve_ah_fminv_d, TCG_CALL_NO_RWG,
-diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
++                   i64, ptr, ptr, fpst, i32)
 +
  DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG,
                     i64, i64, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/sve_helper.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/target/arm/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
+@@ -XXX,XX +XXX,XX @@ static TYPE NAME##_reduce(TYPE *data, float_status *status, uintptr_t n) \
-     return flags;
+         uintptr_t half = n / 2;                                       \
          TYPE lo = NAME##_reduce(data, status, half);                  \
          TYPE hi = NAME##_reduce(data + half, status, half);           \
 -        return TYPE##_##FUNC(lo, hi, status);                         \
 +        return FUNC(lo, hi, status);                                  \
      }                                                                 \
  }                                                                     \
  uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \
      return NAME##_reduce(data, s, maxsz / sizeof(TYPE));              \
  }
--/*
+-DO_REDUCE(sve_faddv_h, float16, H1_2, add, float16_zero)
-- * Expand active predicate bits to bytes, for byte elements.
+-DO_REDUCE(sve_faddv_s, float32, H1_4, add, float32_zero)
-- * (The data table itself is in vec_helper.c as MVE also needs it.)
+-DO_REDUCE(sve_faddv_d, float64, H1_8, add, float64_zero)
-- */
++DO_REDUCE(sve_faddv_h, float16, H1_2, float16_add, float16_zero)
--static inline uint64_t expand_pred_b(uint8_t byte)
++DO_REDUCE(sve_faddv_s, float32, H1_4, float32_add, float32_zero)
--{
++DO_REDUCE(sve_faddv_d, float64, H1_8, float64_add, float64_zero)
--    return expand_pred_b_data[byte];
--}
+ /* Identity is floatN_default_nan, without the function call.  */
--
+-DO_REDUCE(sve_fminnmv_h, float16, H1_2, minnum, 0x7E00)
- /* Similarly for half-word elements.
+-DO_REDUCE(sve_fminnmv_s, float32, H1_4, minnum, 0x7FC00000)
-  *  for (i = 0; i < 256; ++i) {
+-DO_REDUCE(sve_fminnmv_d, float64, H1_8, minnum, 0x7FF8000000000000ULL)
-  *      unsigned long m = 0;
++DO_REDUCE(sve_fminnmv_h, float16, H1_2, float16_minnum, 0x7E00)
 +DO_REDUCE(sve_fminnmv_s, float32, H1_4, float32_minnum, 0x7FC00000)
 +DO_REDUCE(sve_fminnmv_d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL)
 -DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, maxnum, 0x7E00)
 -DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, maxnum, 0x7FC00000)
 -DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, maxnum, 0x7FF8000000000000ULL)
 +DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, float16_maxnum, 0x7E00)
 +DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, float32_maxnum, 0x7FC00000)
 +DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL)
 -DO_REDUCE(sve_fminv_h, float16, H1_2, min, float16_infinity)
 -DO_REDUCE(sve_fminv_s, float32, H1_4, min, float32_infinity)
 -DO_REDUCE(sve_fminv_d, float64, H1_8, min, float64_infinity)
 +DO_REDUCE(sve_fminv_h, float16, H1_2, float16_min, float16_infinity)
 +DO_REDUCE(sve_fminv_s, float32, H1_4, float32_min, float32_infinity)
 +DO_REDUCE(sve_fminv_d, float64, H1_8, float64_min, float64_infinity)
 -DO_REDUCE(sve_fmaxv_h, float16, H1_2, max, float16_chs(float16_infinity))
 -DO_REDUCE(sve_fmaxv_s, float32, H1_4, max, float32_chs(float32_infinity))
 -DO_REDUCE(sve_fmaxv_d, float64, H1_8, max, float64_chs(float64_infinity))
 +DO_REDUCE(sve_fmaxv_h, float16, H1_2, float16_max, float16_chs(float16_infinity))
 +DO_REDUCE(sve_fmaxv_s, float32, H1_4, float32_max, float32_chs(float32_infinity))
 +DO_REDUCE(sve_fmaxv_d, float64, H1_8, float64_max, float64_chs(float64_infinity))
 +
 +DO_REDUCE(sve_ah_fminv_h, float16, H1_2, helper_vfp_ah_minh, float16_infinity)
 +DO_REDUCE(sve_ah_fminv_s, float32, H1_4, helper_vfp_ah_mins, float32_infinity)
 +DO_REDUCE(sve_ah_fminv_d, float64, H1_8, helper_vfp_ah_mind, float64_infinity)
 +
 +DO_REDUCE(sve_ah_fmaxv_h, float16, H1_2, helper_vfp_ah_maxh,
 +          float16_chs(float16_infinity))
 +DO_REDUCE(sve_ah_fmaxv_s, float32, H1_4, helper_vfp_ah_maxs,
 +          float32_chs(float32_infinity))
 +DO_REDUCE(sve_ah_fmaxv_d, float64, H1_8, helper_vfp_ah_maxd,
 +          float64_chs(float64_infinity))
  #undef DO_REDUCE
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool do_reduce(DisasContext *s, arg_rpr_esz *a,
      };                                                                   \
      TRANS_FEAT(NAME, aa64_sve, do_reduce, a, name##_fns[a->esz])
 +#define DO_VPZ_AH(NAME, name)                                            \
 +    static gen_helper_fp_reduce * const name##_fns[4] = {                \
 +        NULL,                      gen_helper_sve_##name##_h,            \
 +        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,            \
 +    };                                                                   \
 +    static gen_helper_fp_reduce * const name##_ah_fns[4] = {             \
 +        NULL,                      gen_helper_sve_ah_##name##_h,         \
 +        gen_helper_sve_ah_##name##_s, gen_helper_sve_ah_##name##_d,      \
 +    };                                                                   \
 +    TRANS_FEAT(NAME, aa64_sve, do_reduce, a,                             \
 +               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz])
 +
  DO_VPZ(FADDV, faddv)
  DO_VPZ(FMINNMV, fminnmv)
  DO_VPZ(FMAXNMV, fmaxnmv)
 -DO_VPZ(FMINV, fminv)
 -DO_VPZ(FMAXV, fmaxv)
 +DO_VPZ_AH(FMINV, fminv)
 +DO_VPZ_AH(FMAXV, fmaxv)
  #undef DO_VPZ
 --
-.25.1
+.34.1
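
The reduction shape that DO_REDUCE expands to can be sketched in plain C. This is a minimal illustration, not QEMU code: it assumes host `float` values in place of the softfloat types, and `binop_fn`, `reduce_pairwise()`, `demo_min()` and `demo_reduce_min()` are hypothetical names. Passing the combining operation as a complete function pointer mirrors the patch's change from a pasted suffix (`TYPE##_##FUNC`) to a full function name, which is what allows the FPCR.AH helpers to be plugged in. Inactive lanes are assumed to already hold the identity value (+infinity for a min reduction), as the macro's last argument suggests.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for the FUNC macro argument: any two-operand op. */
typedef float (*binop_fn)(float a, float b);

/*
 * Pairwise (tree) reduction over n elements, n a power of two.
 * The n == 1 base case is an assumption; the quoted hunk only shows
 * the recursive step.
 */
static float reduce_pairwise(const float *data, size_t n, binop_fn fn)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n == 1) {
        return data[0];
    }
    size_t half = n / 2;
    float lo = reduce_pairwise(data, half, fn);
    float hi = reduce_pairwise(data + half, half, fn);
    return fn(lo, hi);
}

/* Plain host minimum, standing in for float32_min (no NaN handling). */
static float demo_min(float a, float b)
{
    return a < b ? a : b;
}

/* Example: lanes 1 and 3 are inactive and hold the min identity. */
static float demo_reduce_min(void)
{
    const float lanes[4] = { 3.0f, INFINITY, 2.0f, INFINITY };
    return reduce_pairwise(lanes, 4, demo_min);
}
```

Because the operation is an ordinary function pointer here, swapping in a different minimum (for example one with the FPCR.AH rules) requires no change to the reduction itself.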

-New patch
+[PULL 28/68] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
+Implement the FPCR.AH semantics for the SVE FMAX and FMIN operations
+that take an immediate as the second operand.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/sve_helper.c    |  8 ++++++++
+ target/arm/tcg/translate-sve.c | 25 +++++++++++++++++++++++--
+files changed, 45 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_fmins_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(sve_fmins_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmaxs_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmaxs_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmaxs_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++
++DEF_HELPER_FLAGS_6(sve_ah_fmins_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmins_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmins_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, i64, fpst, i32)
++
+ DEF_HELPER_FLAGS_5(sve_fcvt_sh, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(sve_fcvt_dh, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZS_FP(sve_fmins_h, float16, H1_2, float16_min)
+ DO_ZPZS_FP(sve_fmins_s, float32, H1_4, float32_min)
+ DO_ZPZS_FP(sve_fmins_d, float64, H1_8, float64_min)
++DO_ZPZS_FP(sve_ah_fmaxs_h, float16, H1_2, helper_vfp_ah_maxh)
++DO_ZPZS_FP(sve_ah_fmaxs_s, float32, H1_4, helper_vfp_ah_maxs)
++DO_ZPZS_FP(sve_ah_fmaxs_d, float64, H1_8, helper_vfp_ah_maxd)
++
++DO_ZPZS_FP(sve_ah_fmins_h, float16, H1_2, helper_vfp_ah_minh)
++DO_ZPZS_FP(sve_ah_fmins_s, float32, H1_4, helper_vfp_ah_mins)
++DO_ZPZS_FP(sve_ah_fmins_d, float64, H1_8, helper_vfp_ah_mind)
++
+ /* Fully general two-operand expander, controlled by a predicate,
+  * With the extra float_status parameter.
+  */
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool do_fp_imm(DisasContext *s, arg_rpri_esz *a, uint64_t imm,
+     TRANS_FEAT(NAME##_zpzi, aa64_sve, do_fp_imm, a,                     \
+                name##_const[a->esz][a->imm], name##_fns[a->esz])
++#define DO_FP_AH_IMM(NAME, name, const0, const1)                        \
++    static gen_helper_sve_fp2scalar * const name##_fns[4] = {           \
++        NULL, gen_helper_sve_##name##_h,                                \
++        gen_helper_sve_##name##_s,                                      \
++        gen_helper_sve_##name##_d                                       \
++    };                                                                  \
++    static gen_helper_sve_fp2scalar * const name##_ah_fns[4] = {        \
++        NULL, gen_helper_sve_ah_##name##_h,                             \
++        gen_helper_sve_ah_##name##_s,                                   \
++        gen_helper_sve_ah_##name##_d                                    \
++    };                                                                  \
++    static uint64_t const name##_const[4][2] = {                        \
++        { -1, -1 },                                                     \
++        { float16_##const0, float16_##const1 },                         \
++        { float32_##const0, float32_##const1 },                         \
++        { float64_##const0, float64_##const1 },                         \
++    };                                                                  \
++    TRANS_FEAT(NAME##_zpzi, aa64_sve, do_fp_imm, a,                     \
++               name##_const[a->esz][a->imm],                            \
++               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz])
++
+ DO_FP_IMM(FADD, fadds, half, one)
+ DO_FP_IMM(FSUB, fsubs, half, one)
+ DO_FP_IMM(FMUL, fmuls, half, two)
+ DO_FP_IMM(FSUBR, fsubrs, half, one)
+ DO_FP_IMM(FMAXNM, fmaxnms, zero, one)
+ DO_FP_IMM(FMINNM, fminnms, zero, one)
+-DO_FP_IMM(FMAX, fmaxs, zero, one)
+-DO_FP_IMM(FMIN, fmins, zero, one)
++DO_FP_AH_IMM(FMAX, fmaxs, zero, one)
++DO_FP_AH_IMM(FMIN, fmins, zero, one)
+ #undef DO_FP_IMM
+--
+.34.1
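
The selection pattern used by DO_FP_AH_IMM, two helper tables indexed by element size with the choice made from `s->fpcr_ah`, can be sketched on its own. This is a simplified model under stated assumptions: the `demo_helper` type, the string-returning stub functions and `pick_fmaxs()` are hypothetical stand-ins for the generated `gen_helper_sve_*` entry points, and the constant table holds the raw IEEE bit patterns for 0.0 and 1.0 that `float16_zero`/`float16_one` and friends are assumed to represent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: each stub just reports which helper it models. */
typedef const char *(*demo_helper)(void);

static const char *fmaxs_h(void)    { return "sve_fmaxs_h"; }
static const char *fmaxs_s(void)    { return "sve_fmaxs_s"; }
static const char *fmaxs_d(void)    { return "sve_fmaxs_d"; }
static const char *ah_fmaxs_h(void) { return "sve_ah_fmaxs_h"; }
static const char *ah_fmaxs_s(void) { return "sve_ah_fmaxs_s"; }
static const char *ah_fmaxs_d(void) { return "sve_ah_fmaxs_d"; }

/* Index 0 (byte elements) has no floating-point helper. */
static demo_helper const fmaxs_fns[4] = {
    NULL, fmaxs_h, fmaxs_s, fmaxs_d
};
static demo_helper const fmaxs_ah_fns[4] = {
    NULL, ah_fmaxs_h, ah_fmaxs_s, ah_fmaxs_d
};

/* Immediate operands per element size: { 0.0, 1.0 } as raw bit patterns. */
static const uint64_t fmaxs_const[4][2] = {
    { UINT64_MAX, UINT64_MAX },            /* unused slot, like { -1, -1 } */
    { 0x0000, 0x3c00 },                    /* half precision */
    { 0x00000000, 0x3f800000 },            /* single precision */
    { 0x0, 0x3ff0000000000000ULL },        /* double precision */
};

/* Pick the helper at translate time from the FPCR.AH state. */
static demo_helper pick_fmaxs(bool fpcr_ah, unsigned esz)
{
    assert(esz < 4);
    return fpcr_ah ? fmaxs_ah_fns[esz] : fmaxs_fns[esz];
}
```

In this model the immediate table is shared between the two variants; only the helper that consumes the constant differs when FPCR.AH is set.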

-New patch
+[PULL 29/68] target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
+Implement the FPCR.AH semantics for the SVE FMAX and FMIN
+operations that take two vector operands.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
+ target/arm/tcg/sve_helper.c    |  8 ++++++++
+ target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
+files changed, 37 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_fmax_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(sve_fmax_d, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmin_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmin_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmin_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
++DEF_HELPER_FLAGS_6(sve_ah_fmax_h, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmax_s, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_6(sve_ah_fmax_d, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
++
+ DEF_HELPER_FLAGS_6(sve_fminnum_h, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_6(sve_fminnum_s, TCG_CALL_NO_RWG,
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_FP(sve_fmax_h, uint16_t, H1_2, float16_max)
+ DO_ZPZZ_FP(sve_fmax_s, uint32_t, H1_4, float32_max)
+ DO_ZPZZ_FP(sve_fmax_d, uint64_t, H1_8, float64_max)
++DO_ZPZZ_FP(sve_ah_fmin_h, uint16_t, H1_2, helper_vfp_ah_minh)
++DO_ZPZZ_FP(sve_ah_fmin_s, uint32_t, H1_4, helper_vfp_ah_mins)
++DO_ZPZZ_FP(sve_ah_fmin_d, uint64_t, H1_8, helper_vfp_ah_mind)
++
++DO_ZPZZ_FP(sve_ah_fmax_h, uint16_t, H1_2, helper_vfp_ah_maxh)
++DO_ZPZZ_FP(sve_ah_fmax_s, uint32_t, H1_4, helper_vfp_ah_maxs)
++DO_ZPZZ_FP(sve_ah_fmax_d, uint64_t, H1_8, helper_vfp_ah_maxd)
++
+ DO_ZPZZ_FP(sve_fminnum_h, uint16_t, H1_2, float16_minnum)
+ DO_ZPZZ_FP(sve_fminnum_s, uint32_t, H1_4, float32_minnum)
+ DO_ZPZZ_FP(sve_fminnum_d, uint64_t, H1_8, float64_minnum)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(FTSMUL, aa64_sve, gen_gvec_fpst_arg_zzz,
+     };                                                          \
+     TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a)
++#define DO_ZPZZ_AH_FP(NAME, FEAT, name, ah_name)                        \
++    static gen_helper_gvec_4_ptr * const name##_zpzz_fns[4] = {         \
++        NULL,                  gen_helper_##name##_h,                   \
++        gen_helper_##name##_s, gen_helper_##name##_d                    \
++    };                                                                  \
++    static gen_helper_gvec_4_ptr * const name##_ah_zpzz_fns[4] = {      \
++        NULL,                  gen_helper_##ah_name##_h,                \
++        gen_helper_##ah_name##_s, gen_helper_##ah_name##_d              \
++    };                                                                  \
++    TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz,                      \
++               s->fpcr_ah ? name##_ah_zpzz_fns[a->esz] :                \
++               name##_zpzz_fns[a->esz], a)
++
+ DO_ZPZZ_FP(FADD_zpzz, aa64_sve, sve_fadd)
+ DO_ZPZZ_FP(FSUB_zpzz, aa64_sve, sve_fsub)
+ DO_ZPZZ_FP(FMUL_zpzz, aa64_sve, sve_fmul)
+-DO_ZPZZ_FP(FMIN_zpzz, aa64_sve, sve_fmin)
+-DO_ZPZZ_FP(FMAX_zpzz, aa64_sve, sve_fmax)
++DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
++DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
+ DO_ZPZZ_FP(FMINNM_zpzz, aa64_sve, sve_fminnum)
+ DO_ZPZZ_FP(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
+ DO_ZPZZ_FP(FABD, aa64_sve, sve_fabd)
+--
+.34.1

-[PULL 12/55] target/arm: Move get_phys_addr_pmsav5 to ptw.c
+[PULL 30/68] target/arm: Implement FPCR.AH handling of negation of NaN
-From: Richard Henderson <richard.henderson@linaro.org>
+FPCR.AH == 1 mandates that negation of a NaN value should not flip
+its sign bit.  This means we can no longer use gen_vfp_neg*()
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+everywhere but must instead generate slightly more complex code when
-Message-id: 20220604040607.269301-6-richard.henderson@linaro.org
+FPCR.AH is set.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Make this change for the scalar FNEG and for those places in
 translate-a64.c which were previously directly calling
 gen_vfp_neg*().
 This change in semantics also affects any other instruction whose
 pseudocode calls FPNeg(); in following commits we extend this
 change to the other affected instructions.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |  4 ---
+ target/arm/tcg/translate-a64.c | 125 ++++++++++++++++++++++++++++++---
- target/arm/helper.c | 85 ---------------------------------------------
+file changed, 114 insertions(+), 11 deletions(-)
- target/arm/ptw.c    | 85 +++++++++++++++++++++++++++++++++++++++++++++
-files changed, 85 insertions(+), 89 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 diff --git a/target/arm/ptw.h b/target/arm/ptw.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
-     return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
+                        is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
  }
--bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
++/*
--                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
++ * When FPCR.AH == 1, NEG and ABS do not flip the sign bit of a NaN.
--                          hwaddr *phys_ptr, int *prot,
++ * These functions implement
--                          ARMMMUFaultInfo *fi);
++ *   d = floatN_is_any_nan(s) ? s : floatN_chs(s)
- bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
++ * which for float32 is
-                           MMUAccessType access_type, ARMMMUIdx mmu_idx,
++ *   d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s ^ (1 << 31))
-                           hwaddr *phys_ptr, int *prot,
++ * and similarly for the other float sizes.
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++ */
-index XXXXXXX..XXXXXXX 100644
++static void gen_vfp_ah_negh(TCGv_i32 d, TCGv_i32 s)
---- a/target/arm/helper.c
++{
-+++ b/target/arm/helper.c
++    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
-@@ -XXX,XX +XXX,XX @@ bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
++
-     return ret;
++    gen_vfp_negh(chs_s, s);
 +    gen_vfp_absh(abs_s, s);
 +    tcg_gen_movcond_i32(TCG_COND_GTU, d,
 +                        abs_s, tcg_constant_i32(0x7c00),
 +                        s, chs_s);
 +}
 +
 +static void gen_vfp_ah_negs(TCGv_i32 d, TCGv_i32 s)
 +{
 +    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
 +
 +    gen_vfp_negs(chs_s, s);
 +    gen_vfp_abss(abs_s, s);
 +    tcg_gen_movcond_i32(TCG_COND_GTU, d,
 +                        abs_s, tcg_constant_i32(0x7f800000UL),
 +                        s, chs_s);
 +}
 +
 +static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
 +{
 +    TCGv_i64 abs_s = tcg_temp_new_i64(), chs_s = tcg_temp_new_i64();
 +
 +    gen_vfp_negd(chs_s, s);
 +    gen_vfp_absd(abs_s, s);
 +    tcg_gen_movcond_i64(TCG_COND_GTU, d,
 +                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
 +                        s, chs_s);
 +}
 +
 +static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
 +{
 +    if (dc->fpcr_ah) {
 +        gen_vfp_ah_negh(d, s);
 +    } else {
 +        gen_vfp_negh(d, s);
 +    }
 +}
 +
 +static void gen_vfp_maybe_ah_negs(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
 +{
 +    if (dc->fpcr_ah) {
 +        gen_vfp_ah_negs(d, s);
 +    } else {
 +        gen_vfp_negs(d, s);
 +    }
 +}
 +
 +static void gen_vfp_maybe_ah_negd(DisasContext *dc, TCGv_i64 d, TCGv_i64 s)
 +{
 +    if (dc->fpcr_ah) {
 +        gen_vfp_ah_negd(d, s);
 +    } else {
 +        gen_vfp_negd(d, s);
 +    }
 +}
 +
  /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
   * than the 32 bit equivalent.
   */
@@ -XXX,XX +XXX,XX @@ static void gen_fnmul_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
      gen_vfp_negd(d, d);
  }
--bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
++static void gen_fnmul_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
--                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
++{
--                          hwaddr *phys_ptr, int *prot,
++    gen_helper_vfp_mulh(d, n, m, s);
--                          ARMMMUFaultInfo *fi)
++    gen_vfp_ah_negh(d, d);
--{
++}
--    int n;
++
--    uint32_t mask;
++static void gen_fnmul_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
--    uint32_t base;
++{
--    bool is_user = regime_is_user(env, mmu_idx);
++    gen_helper_vfp_muls(d, n, m, s);
--
++    gen_vfp_ah_negs(d, d);
--    if (regime_translation_disabled(env, mmu_idx)) {
++}
--        /* MPU disabled.  */
++
--        *phys_ptr = address;
++static void gen_fnmul_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
--        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
++{
--        return false;
++    gen_helper_vfp_muld(d, n, m, s);
--    }
++    gen_vfp_ah_negd(d, d);
--
++}
--    *phys_ptr = address;
++
--    for (n = 7; n >= 0; n--) {
+ static const FPScalar f_scalar_fnmul = {
--        base = env->cp15.c6_region[n];
+     gen_fnmul_h,
--        if ((base & 1) == 0) {
+     gen_fnmul_s,
--            continue;
+     gen_fnmul_d,
--        }
+ };
--        mask = 1 << ((base >> 1) & 0x1f);
+-TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
--        /* Keep this shift separate from the above to avoid an
++static const FPScalar f_scalar_ah_fnmul = {
--           (undefined) << 32.  */
++    gen_fnmul_ah_h,
--        mask = (mask << 1) - 1;
++    gen_fnmul_ah_s,
--        if (((base ^ address) & ~mask) == 0) {
++    gen_fnmul_ah_d,
--            break;
++};
--        }
++TRANS(FNMUL_s, do_fp3_scalar_2fn, a, &f_scalar_fnmul, &f_scalar_ah_fnmul, a->rn)
--    }
--    if (n < 0) {
+ static const FPScalar f_scalar_fcmeq = {
--        fi->type = ARMFault_Background;
+     gen_helper_advsimd_ceq_f16,
--        return true;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
--    }
--
+             read_vec_element(s, t2, a->rm, a->idx, MO_64);
--    if (access_type == MMU_INST_FETCH) {
+             if (neg) {
--        mask = env->cp15.pmsav5_insn_ap;
+-                gen_vfp_negd(t1, t1);
--    } else {
++                gen_vfp_maybe_ah_negd(s, t1, t1);
--        mask = env->cp15.pmsav5_data_ap;
+             }
--    }
+             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
--    mask = (mask >> (n * 4)) & 0xf;
+             write_fp_dreg_merging(s, a->rd, a->rd, t0);
--    switch (mask) {
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
--    case 0:
--        fi->type = ARMFault_Permission;
+             read_vec_element_i32(s, t2, a->rm, a->idx, MO_32);
--        fi->level = 1;
+             if (neg) {
--        return true;
+-                gen_vfp_negs(t1, t1);
--    case 1:
++                gen_vfp_maybe_ah_negs(s, t1, t1);
--        if (is_user) {
+             }
--            fi->type = ARMFault_Permission;
+             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
--            fi->level = 1;
+             write_fp_sreg_merging(s, a->rd, a->rd, t0);
--            return true;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
--        }
--        *prot = PAGE_READ | PAGE_WRITE;
+             read_vec_element_i32(s, t2, a->rm, a->idx, MO_16);
--        break;
+             if (neg) {
--    case 2:
+-                gen_vfp_negh(t1, t1);
--        *prot = PAGE_READ;
++                gen_vfp_maybe_ah_negh(s, t1, t1);
--        if (!is_user) {
+             }
--            *prot |= PAGE_WRITE;
+             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
--        }
+                                        fpstatus_ptr(FPST_A64_F16));
--        break;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
--    case 3:
+             TCGv_i64 ta = read_fp_dreg(s, a->ra);
--        *prot = PAGE_READ | PAGE_WRITE;
--        break;
+             if (neg_a) {
--    case 5:
+-                gen_vfp_negd(ta, ta);
--        if (is_user) {
++                gen_vfp_maybe_ah_negd(s, ta, ta);
--            fi->type = ARMFault_Permission;
+             }
--            fi->level = 1;
+             if (neg_n) {
--            return true;
+-                gen_vfp_negd(tn, tn);
--        }
++                gen_vfp_maybe_ah_negd(s, tn, tn);
--        *prot = PAGE_READ;
+             }
--        break;
+             fpst = fpstatus_ptr(FPST_A64);
--    case 6:
+             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
--        *prot = PAGE_READ;
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
--        break;
+             TCGv_i32 ta = read_fp_sreg(s, a->ra);
--    default:
--        /* Bad permission.  */
+             if (neg_a) {
--        fi->type = ARMFault_Permission;
+-                gen_vfp_negs(ta, ta);
--        fi->level = 1;
++                gen_vfp_maybe_ah_negs(s, ta, ta);
--        return true;
+             }
--    }
+             if (neg_n) {
--    *prot |= PAGE_EXEC;
+-                gen_vfp_negs(tn, tn);
--    return false;
++                gen_vfp_maybe_ah_negs(s, tn, tn);
--}
+             }
--
+             fpst = fpstatus_ptr(FPST_A64);
- /* Combine either inner or outer cacheability attributes for normal
+             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
-  * memory, according to table D4-42 and pseudocode procedure
+@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
-  * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
+             TCGv_i32 ta = read_fp_hreg(s, a->ra);
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
-index XXXXXXX..XXXXXXX 100644
+             if (neg_a) {
---- a/target/arm/ptw.c
+-                gen_vfp_negh(ta, ta);
-+++ b/target/arm/ptw.c
++                gen_vfp_maybe_ah_negh(s, ta, ta);
-@@ -XXX,XX +XXX,XX @@ do_fault:
+             }
              if (neg_n) {
 -                gen_vfp_negh(tn, tn);
 +                gen_vfp_maybe_ah_negh(s, tn, tn);
              }
              fpst = fpstatus_ptr(FPST_A64_F16);
              gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
      return true;
  }
-+static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
++static bool do_fp1_scalar_int_2fn(DisasContext *s, arg_rr_e *a,
-+                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
++                                  const FPScalar1Int *fnormal,
-+                                 hwaddr *phys_ptr, int *prot,
++                                  const FPScalar1Int *fah)
-+                                 ARMMMUFaultInfo *fi)
++{
-+{
++    return do_fp1_scalar_int(s, a, s->fpcr_ah ? fah : fnormal, true);
-+    int n;
++}
-+    uint32_t mask;
++
-+    uint32_t base;
+ static const FPScalar1Int f_scalar_fmov = {
-+    bool is_user = regime_is_user(env, mmu_idx);
+     tcg_gen_mov_i32,
-+
+     tcg_gen_mov_i32,
-+    if (regime_translation_disabled(env, mmu_idx)) {
+@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fneg = {
-+        /* MPU disabled.  */
+     gen_vfp_negs,
-+        *phys_ptr = address;
+     gen_vfp_negd,
-+        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+ };
-+        return false;
+-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
-+    }
++static const FPScalar1Int f_scalar_ah_fneg = {
-+
++    gen_vfp_ah_negh,
-+    *phys_ptr = address;
++    gen_vfp_ah_negs,
-+    for (n = 7; n >= 0; n--) {
++    gen_vfp_ah_negd,
-+        base = env->cp15.c6_region[n];
++};
-+        if ((base & 1) == 0) {
++TRANS(FNEG_s, do_fp1_scalar_int_2fn, a, &f_scalar_fneg, &f_scalar_ah_fneg)
-+            continue;
-+        }
+ typedef struct FPScalar1 {
-+        mask = 1 << ((base >> 1) & 0x1f);
+     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
 +        /* Keep this shift separate from the above to avoid an
 +           (undefined) << 32.  */
 +        mask = (mask << 1) - 1;
 +        if (((base ^ address) & ~mask) == 0) {
 +            break;
 +        }
 +    }
 +    if (n < 0) {
 +        fi->type = ARMFault_Background;
 +        return true;
 +    }
 +
 +    if (access_type == MMU_INST_FETCH) {
 +        mask = env->cp15.pmsav5_insn_ap;
 +    } else {
 +        mask = env->cp15.pmsav5_data_ap;
 +    }
 +    mask = (mask >> (n * 4)) & 0xf;
 +    switch (mask) {
 +    case 0:
 +        fi->type = ARMFault_Permission;
 +        fi->level = 1;
 +        return true;
 +    case 1:
 +        if (is_user) {
 +            fi->type = ARMFault_Permission;
 +            fi->level = 1;
 +            return true;
 +        }
 +        *prot = PAGE_READ | PAGE_WRITE;
 +        break;
 +    case 2:
 +        *prot = PAGE_READ;
 +        if (!is_user) {
 +            *prot |= PAGE_WRITE;
 +        }
 +        break;
 +    case 3:
 +        *prot = PAGE_READ | PAGE_WRITE;
 +        break;
 +    case 5:
 +        if (is_user) {
 +            fi->type = ARMFault_Permission;
 +            fi->level = 1;
 +            return true;
 +        }
 +        *prot = PAGE_READ;
 +        break;
 +    case 6:
 +        *prot = PAGE_READ;
 +        break;
 +    default:
 +        /* Bad permission.  */
 +        fi->type = ARMFault_Permission;
 +        fi->level = 1;
 +        return true;
 +    }
 +    *prot |= PAGE_EXEC;
 +    return false;
 +}
 +
  /**
   * get_phys_addr - get the physical address for this virtual address
   *
 --
-.25.1
+.34.1
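
The FPCR.AH negation rule described in patch 30 (a NaN keeps its sign bit, everything else has it flipped) can be written directly on the raw bit patterns. The following is a minimal host-side sketch of the same comparison that the TCG `movcond` sequence performs; the function names `ah_neg16()`, `ah_neg32()` and `ah_neg64()` are hypothetical and operate on plain integers, not on TCG values.

```c
#include <stdint.h>

/*
 * A value is a NaN when its magnitude (sign bit cleared) is strictly
 * greater than the bit pattern of infinity. In that case return the
 * input unchanged; otherwise flip the sign bit.
 */
static uint16_t ah_neg16(uint16_t s)
{
    uint16_t abs_s = s & 0x7fffu;
    uint16_t chs_s = (uint16_t)(s ^ 0x8000u);
    return abs_s > 0x7c00u ? s : chs_s;
}

static uint32_t ah_neg32(uint32_t s)
{
    uint32_t abs_s = s & 0x7fffffffu;
    uint32_t chs_s = s ^ 0x80000000u;
    return abs_s > 0x7f800000u ? s : chs_s;
}

static uint64_t ah_neg64(uint64_t s)
{
    uint64_t abs_s = s & 0x7fffffffffffffffULL;
    uint64_t chs_s = s ^ 0x8000000000000000ULL;
    return abs_s > 0x7ff0000000000000ULL ? s : chs_s;
}
```

Note that infinity itself is not a NaN under this test (the comparison is strict), so it is still negated as usual.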

-[PULL 21/55] target/arm: Move combine_cacheattrs and subroutines to ptw.c
+[PULL 31/68] target/arm: Implement FPCR.AH handling for scalar FABS and FABD
-From: Richard Henderson <richard.henderson@linaro.org>
+FPCR.AH == 1 mandates that taking the absolute value of a NaN should
 not change its sign bit.  This means we can no longer use
 gen_vfp_abs*() everywhere but must instead generate slightly more
 complex code when FPCR.AH is set.
-There are a handful of helpers for combine_cacheattrs
+Implement these semantics for scalar FABS and FABD.  This change also
-that we can move at the same time as the main entry point.
+affects all other instructions whose pseudocode calls FPAbs(); we
 will extend the change to those instructions in following commits.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-15-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |   3 -
+ target/arm/tcg/translate-a64.c | 69 +++++++++++++++++++++++++++++++++-
- target/arm/helper.c | 218 -------------------------------------------
+file changed, 67 insertions(+), 2 deletions(-)
  target/arm/ptw.c    | 221 ++++++++++++++++++++++++++++++++++++++++++++
 files changed, 221 insertions(+), 221 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
+@@ -XXX,XX +XXX,XX @@ static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
- bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
+                         s, chs_s);
  uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
 -ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 -                                 ARMCacheAttrs s1, ARMCacheAttrs s2);
 -
  int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
                    int ap, int domain_prot);
  int simple_ap_to_rw_prot_is_user(int ap, bool is_user);
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
      }
      return true;
  }
--
--/* Translate from the 4-bit stage 2 representation of
-- * memory attributes (without cache-allocation hints) to
-- * the 8-bit representation of the stage 1 MAIR registers
-- * (which includes allocation hints).
-- *
-- * ref: shared/translation/attrs/S2AttrDecode()
-- *      .../S2ConvertAttrsHints()
-- */
--static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
--{
--    uint8_t hiattr = extract32(s2attrs, 2, 2);
--    uint8_t loattr = extract32(s2attrs, 0, 2);
--    uint8_t hihint = 0, lohint = 0;
--
--    if (hiattr != 0) { /* normal memory */
--        if (arm_hcr_el2_eff(env) & HCR_CD) { /* cache disabled */
--            hiattr = loattr = 1; /* non-cacheable */
--        } else {
--            if (hiattr != 1) { /* Write-through or write-back */
--                hihint = 3; /* RW allocate */
--            }
--            if (loattr != 1) { /* Write-through or write-back */
--                lohint = 3; /* RW allocate */
--            }
--        }
--    }
--
--    return (hiattr << 6) | (hihint << 4) | (loattr << 2) | lohint;
--}
- #endif /* !CONFIG_USER_ONLY */
- /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
-@@ -XXX,XX +XXX,XX @@ do_fault:
-     return true;
- }
--/* Combine either inner or outer cacheability attributes for normal
-- * memory, according to table D4-42 and pseudocode procedure
-- * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
-- *
-- * NB: only stage 1 includes allocation hints (RW bits), leading to
-- * some asymmetry.
-- */
--static uint8_t combine_cacheattr_nibble(uint8_t s1, uint8_t s2)
--{
--    if (s1 == 4 || s2 == 4) {
--        /* non-cacheable has precedence */
--        return 4;
--    } else if (extract32(s1, 2, 2) == 0 || extract32(s1, 2, 2) == 2) {
--        /* stage 1 write-through takes precedence */
--        return s1;
--    } else if (extract32(s2, 2, 2) == 2) {
--        /* stage 2 write-through takes precedence, but the allocation hint
--         * is still taken from stage 1
--         */
--        return (2 << 2) | extract32(s1, 0, 2);
--    } else { /* write-back */
--        return s1;
--    }
--}
--
--/*
-- * Combine the memory type and cacheability attributes of
-- * s1 and s2 for the HCR_EL2.FWB == 0 case, returning the
-- * combined attributes in MAIR_EL1 format.
-- */
--static uint8_t combined_attrs_nofwb(CPUARMState *env,
--                                    ARMCacheAttrs s1, ARMCacheAttrs s2)
--{
--    uint8_t s1lo, s2lo, s1hi, s2hi, s2_mair_attrs, ret_attrs;
--
--    s2_mair_attrs = convert_stage2_attrs(env, s2.attrs);
--
--    s1lo = extract32(s1.attrs, 0, 4);
--    s2lo = extract32(s2_mair_attrs, 0, 4);
--    s1hi = extract32(s1.attrs, 4, 4);
--    s2hi = extract32(s2_mair_attrs, 4, 4);
--
--    /* Combine memory type and cacheability attributes */
--    if (s1hi == 0 || s2hi == 0) {
--        /* Device has precedence over normal */
--        if (s1lo == 0 || s2lo == 0) {
--            /* nGnRnE has precedence over anything */
--            ret_attrs = 0;
--        } else if (s1lo == 4 || s2lo == 4) {
--            /* non-Reordering has precedence over Reordering */
--            ret_attrs = 4;  /* nGnRE */
--        } else if (s1lo == 8 || s2lo == 8) {
--            /* non-Gathering has precedence over Gathering */
--            ret_attrs = 8;  /* nGRE */
--        } else {
--            ret_attrs = 0xc; /* GRE */
--        }
--    } else { /* Normal memory */
--        /* Outer/inner cacheability combine independently */
--        ret_attrs = combine_cacheattr_nibble(s1hi, s2hi) << 4
--                  | combine_cacheattr_nibble(s1lo, s2lo);
--    }
--    return ret_attrs;
--}
--
--static uint8_t force_cacheattr_nibble_wb(uint8_t attr)
--{
--    /*
--     * Given the 4 bits specifying the outer or inner cacheability
--     * in MAIR format, return a value specifying Normal Write-Back,
--     * with the allocation and transient hints taken from the input
--     * if the input specified some kind of cacheable attribute.
--     */
--    if (attr == 0 || attr == 4) {
--        /*
--         * 0 == an UNPREDICTABLE encoding
--         * 4 == Non-cacheable
--         * Either way, force Write-Back RW allocate non-transient
--         */
--        return 0xf;
--    }
--    /* Change WriteThrough to WriteBack, keep allocation and transient hints */
--    return attr | 4;
--}
--
--/*
-- * Combine the memory type and cacheability attributes of
-- * s1 and s2 for the HCR_EL2.FWB == 1 case, returning the
-- * combined attributes in MAIR_EL1 format.
-- */
--static uint8_t combined_attrs_fwb(CPUARMState *env,
--                                  ARMCacheAttrs s1, ARMCacheAttrs s2)
--{
--    switch (s2.attrs) {
--    case 7:
--        /* Use stage 1 attributes */
--        return s1.attrs;
--    case 6:
--        /*
--         * Force Normal Write-Back. Note that if S1 is Normal cacheable
--         * then we take the allocation hints from it; otherwise it is
--         * RW allocate, non-transient.
--         */
--        if ((s1.attrs & 0xf0) == 0) {
--            /* S1 is Device */
--            return 0xff;
--        }
--        /* Need to check the Inner and Outer nibbles separately */
--        return force_cacheattr_nibble_wb(s1.attrs & 0xf) |
--            force_cacheattr_nibble_wb(s1.attrs >> 4) << 4;
--    case 5:
--        /* If S1 attrs are Device, use them; otherwise Normal Non-cacheable */
--        if ((s1.attrs & 0xf0) == 0) {
--            return s1.attrs;
--        }
--        return 0x44;
--    case 0 ... 3:
--        /* Force Device, of subtype specified by S2 */
--        return s2.attrs << 2;
--    default:
--        /*
--         * RESERVED values (including RES0 descriptor bit [5] being nonzero);
--         * arbitrarily force Device.
--         */
--        return 0;
--    }
--}
--
--/* Combine S1 and S2 cacheability/shareability attributes, per D4.5.4
-- * and CombineS1S2Desc()
-- *
-- * @env:     CPUARMState
-- * @s1:      Attributes from stage 1 walk
-- * @s2:      Attributes from stage 2 walk
-- */
--ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
--                                 ARMCacheAttrs s1, ARMCacheAttrs s2)
--{
--    ARMCacheAttrs ret;
--    bool tagged = false;
--
--    assert(s2.is_s2_format && !s1.is_s2_format);
--    ret.is_s2_format = false;
--
--    if (s1.attrs == 0xf0) {
--        tagged = true;
--        s1.attrs = 0xff;
--    }
--
--    /* Combine shareability attributes (table D4-43) */
--    if (s1.shareability == 2 || s2.shareability == 2) {
--        /* if either are outer-shareable, the result is outer-shareable */
--        ret.shareability = 2;
--    } else if (s1.shareability == 3 || s2.shareability == 3) {
--        /* if either are inner-shareable, the result is inner-shareable */
--        ret.shareability = 3;
--    } else {
--        /* both non-shareable */
--        ret.shareability = 0;
--    }
--
--    /* Combine memory type and cacheability attributes */
--    if (arm_hcr_el2_eff(env) & HCR_FWB) {
--        ret.attrs = combined_attrs_fwb(env, s1, s2);
--    } else {
--        ret.attrs = combined_attrs_nofwb(env, s1, s2);
--    }
--
--    /*
--     * Any location for which the resultant memory type is any
--     * type of Device memory is always treated as Outer Shareable.
--     * Any location for which the resultant memory type is Normal
--     * Inner Non-cacheable, Outer Non-cacheable is always treated
--     * as Outer Shareable.
--     * TODO: FEAT_XS adds another value (0x40) also meaning iNCoNC
--     */
--    if ((ret.attrs & 0xf0) == 0 || ret.attrs == 0x44) {
--        ret.shareability = 2;
--    }
--
--    /* TODO: CombineS1S2Desc does not consider transient, only WB, RWA. */
--    if (tagged && ret.attrs == 0xff) {
--        ret.attrs = 0xf0;
--    }
--
--    return ret;
--}
--
- hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
-                                          MemTxAttrs *attrs)
- {
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
-+++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
-     return ret;
- }
 +/*
-+ * Translate from the 4-bit stage 2 representation of
++ * These functions implement
-+ * memory attributes (without cache-allocation hints) to
++ *  d = floatN_is_any_nan(s) ? s : floatN_abs(s)
-+ * the 8-bit representation of the stage 1 MAIR registers
++ * which for float32 is
-+ * (which includes allocation hints).
++ *  d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s & ~(1 << 31))
-+ *
++ * and similarly for the other float sizes.
 + * ref: shared/translation/attrs/S2AttrDecode()
 + *      .../S2ConvertAttrsHints()
 + */
-+static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
++static void gen_vfp_ah_absh(TCGv_i32 d, TCGv_i32 s)
 +{
-+    uint8_t hiattr = extract32(s2attrs, 2, 2);
++    TCGv_i32 abs_s = tcg_temp_new_i32();
 +    uint8_t loattr = extract32(s2attrs, 0, 2);
 +    uint8_t hihint = 0, lohint = 0;
 +
-+    if (hiattr != 0) { /* normal memory */
++    gen_vfp_absh(abs_s, s);
-+        if (arm_hcr_el2_eff(env) & HCR_CD) { /* cache disabled */
++    tcg_gen_movcond_i32(TCG_COND_GTU, d,
-+            hiattr = loattr = 1; /* non-cacheable */
++                        abs_s, tcg_constant_i32(0x7c00),
-+        } else {
++                        s, abs_s);
 +            if (hiattr != 1) { /* Write-through or write-back */
 +                hihint = 3; /* RW allocate */
 +            }
 +            if (loattr != 1) { /* Write-through or write-back */
 +                lohint = 3; /* RW allocate */
 +            }
 +        }
 +    }
 +
 +    return (hiattr << 6) | (hihint << 4) | (loattr << 2) | lohint;
 +}
 +
-+/*
++static void gen_vfp_ah_abss(TCGv_i32 d, TCGv_i32 s)
 + * Combine either inner or outer cacheability attributes for normal
 + * memory, according to table D4-42 and pseudocode procedure
 + * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
 + *
 + * NB: only stage 1 includes allocation hints (RW bits), leading to
 + * some asymmetry.
 + */
 +static uint8_t combine_cacheattr_nibble(uint8_t s1, uint8_t s2)
 +{
-+    if (s1 == 4 || s2 == 4) {
++    TCGv_i32 abs_s = tcg_temp_new_i32();
-+        /* non-cacheable has precedence */
++
-+        return 4;
++    gen_vfp_abss(abs_s, s);
-+    } else if (extract32(s1, 2, 2) == 0 || extract32(s1, 2, 2) == 2) {
++    tcg_gen_movcond_i32(TCG_COND_GTU, d,
-+        /* stage 1 write-through takes precedence */
++                        abs_s, tcg_constant_i32(0x7f800000UL),
-+        return s1;
++                        s, abs_s);
 +    } else if (extract32(s2, 2, 2) == 2) {
 +        /* stage 2 write-through takes precedence, but the allocation hint
 +         * is still taken from stage 1
 +         */
 +        return (2 << 2) | extract32(s1, 0, 2);
 +    } else { /* write-back */
 +        return s1;
 +    }
 +}
 +
-+/*
++static void gen_vfp_ah_absd(TCGv_i64 d, TCGv_i64 s)
 + * Combine the memory type and cacheability attributes of
 + * s1 and s2 for the HCR_EL2.FWB == 0 case, returning the
 + * combined attributes in MAIR_EL1 format.
 + */
 +static uint8_t combined_attrs_nofwb(CPUARMState *env,
 +                                    ARMCacheAttrs s1, ARMCacheAttrs s2)
 +{
-+    uint8_t s1lo, s2lo, s1hi, s2hi, s2_mair_attrs, ret_attrs;
++    TCGv_i64 abs_s = tcg_temp_new_i64();
 +
-+    s2_mair_attrs = convert_stage2_attrs(env, s2.attrs);
++    gen_vfp_absd(abs_s, s);
-+
++    tcg_gen_movcond_i64(TCG_COND_GTU, d,
-+    s1lo = extract32(s1.attrs, 0, 4);
++                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
-+    s2lo = extract32(s2_mair_attrs, 0, 4);
++                        s, abs_s);
 +    s1hi = extract32(s1.attrs, 4, 4);
 +    s2hi = extract32(s2_mair_attrs, 4, 4);
 +
 +    /* Combine memory type and cacheability attributes */
 +    if (s1hi == 0 || s2hi == 0) {
 +        /* Device has precedence over normal */
 +        if (s1lo == 0 || s2lo == 0) {
 +            /* nGnRnE has precedence over anything */
 +            ret_attrs = 0;
 +        } else if (s1lo == 4 || s2lo == 4) {
 +            /* non-Reordering has precedence over Reordering */
 +            ret_attrs = 4;  /* nGnRE */
 +        } else if (s1lo == 8 || s2lo == 8) {
 +            /* non-Gathering has precedence over Gathering */
 +            ret_attrs = 8;  /* nGRE */
 +        } else {
 +            ret_attrs = 0xc; /* GRE */
 +        }
 +    } else { /* Normal memory */
 +        /* Outer/inner cacheability combine independently */
 +        ret_attrs = combine_cacheattr_nibble(s1hi, s2hi) << 4
 +                  | combine_cacheattr_nibble(s1lo, s2lo);
 +    }
 +    return ret_attrs;
 +}
 +
-+static uint8_t force_cacheattr_nibble_wb(uint8_t attr)
+ static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
  {
      if (dc->fpcr_ah) {
@@ -XXX,XX +XXX,XX @@ static void gen_fabd_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
      gen_vfp_absd(d, d);
  }
 +static void gen_fabd_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 +{
-+    /*
++    gen_helper_vfp_subh(d, n, m, s);
-+     * Given the 4 bits specifying the outer or inner cacheability
++    gen_vfp_ah_absh(d, d);
 +     * in MAIR format, return a value specifying Normal Write-Back,
 +     * with the allocation and transient hints taken from the input
 +     * if the input specified some kind of cacheable attribute.
 +     */
 +    if (attr == 0 || attr == 4) {
 +        /*
 +         * 0 == an UNPREDICTABLE encoding
 +         * 4 == Non-cacheable
 +         * Either way, force Write-Back RW allocate non-transient
 +         */
 +        return 0xf;
 +    }
 +    /* Change WriteThrough to WriteBack, keep allocation and transient hints */
 +    return attr | 4;
 +}
 +
-+/*
++static void gen_fabd_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 + * Combine the memory type and cacheability attributes of
 + * s1 and s2 for the HCR_EL2.FWB == 1 case, returning the
 + * combined attributes in MAIR_EL1 format.
 + */
 +static uint8_t combined_attrs_fwb(CPUARMState *env,
 +                                  ARMCacheAttrs s1, ARMCacheAttrs s2)
 +{
-+    switch (s2.attrs) {
++    gen_helper_vfp_subs(d, n, m, s);
-+    case 7:
++    gen_vfp_ah_abss(d, d);
 +        /* Use stage 1 attributes */
 +        return s1.attrs;
 +    case 6:
 +        /*
 +         * Force Normal Write-Back. Note that if S1 is Normal cacheable
 +         * then we take the allocation hints from it; otherwise it is
 +         * RW allocate, non-transient.
 +         */
 +        if ((s1.attrs & 0xf0) == 0) {
 +            /* S1 is Device */
 +            return 0xff;
 +        }
 +        /* Need to check the Inner and Outer nibbles separately */
 +        return force_cacheattr_nibble_wb(s1.attrs & 0xf) |
 +            force_cacheattr_nibble_wb(s1.attrs >> 4) << 4;
 +    case 5:
 +        /* If S1 attrs are Device, use them; otherwise Normal Non-cacheable */
 +        if ((s1.attrs & 0xf0) == 0) {
 +            return s1.attrs;
 +        }
 +        return 0x44;
 +    case 0 ... 3:
 +        /* Force Device, of subtype specified by S2 */
 +        return s2.attrs << 2;
 +    default:
 +        /*
 +         * RESERVED values (including RES0 descriptor bit [5] being nonzero);
 +         * arbitrarily force Device.
 +         */
 +        return 0;
 +    }
 +}
 +
-+/*
++static void gen_fabd_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
 + * Combine S1 and S2 cacheability/shareability attributes, per D4.5.4
 + * and CombineS1S2Desc()
 + *
 + * @env:     CPUARMState
 + * @s1:      Attributes from stage 1 walk
 + * @s2:      Attributes from stage 2 walk
 + */
 +static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 +                                        ARMCacheAttrs s1, ARMCacheAttrs s2)
 +{
-+    ARMCacheAttrs ret;
++    gen_helper_vfp_subd(d, n, m, s);
-+    bool tagged = false;
++    gen_vfp_ah_absd(d, d);
 +
 +    assert(s2.is_s2_format && !s1.is_s2_format);
 +    ret.is_s2_format = false;
 +
 +    if (s1.attrs == 0xf0) {
 +        tagged = true;
 +        s1.attrs = 0xff;
 +    }
 +
 +    /* Combine shareability attributes (table D4-43) */
 +    if (s1.shareability == 2 || s2.shareability == 2) {
 +        /* if either are outer-shareable, the result is outer-shareable */
 +        ret.shareability = 2;
 +    } else if (s1.shareability == 3 || s2.shareability == 3) {
 +        /* if either are inner-shareable, the result is inner-shareable */
 +        ret.shareability = 3;
 +    } else {
 +        /* both non-shareable */
 +        ret.shareability = 0;
 +    }
 +
 +    /* Combine memory type and cacheability attributes */
 +    if (arm_hcr_el2_eff(env) & HCR_FWB) {
 +        ret.attrs = combined_attrs_fwb(env, s1, s2);
 +    } else {
 +        ret.attrs = combined_attrs_nofwb(env, s1, s2);
 +    }
 +
 +    /*
 +     * Any location for which the resultant memory type is any
 +     * type of Device memory is always treated as Outer Shareable.
 +     * Any location for which the resultant memory type is Normal
 +     * Inner Non-cacheable, Outer Non-cacheable is always treated
 +     * as Outer Shareable.
 +     * TODO: FEAT_XS adds another value (0x40) also meaning iNCoNC
 +     */
 +    if ((ret.attrs & 0xf0) == 0 || ret.attrs == 0x44) {
 +        ret.shareability = 2;
 +    }
 +
 +    /* TODO: CombineS1S2Desc does not consider transient, only WB, RWA. */
 +    if (tagged && ret.attrs == 0xff) {
 +        ret.attrs = 0xf0;
 +    }
 +
 +    return ret;
 +}
 +
- /**
+ static const FPScalar f_scalar_fabd = {
-  * get_phys_addr - get the physical address for this virtual address
+     gen_fabd_h,
-  *
+     gen_fabd_s,
      gen_fabd_d,
  };
 -TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
 +static const FPScalar f_scalar_ah_fabd = {
 +    gen_fabd_ah_h,
 +    gen_fabd_ah_s,
 +    gen_fabd_ah_d,
 +};
 +TRANS(FABD_s, do_fp3_scalar_2fn, a, &f_scalar_fabd, &f_scalar_ah_fabd, a->rn)
  static const FPScalar f_scalar_frecps = {
      gen_helper_recpsf_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fabs = {
      gen_vfp_abss,
      gen_vfp_absd,
  };
 -TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
 +static const FPScalar1Int f_scalar_ah_fabs = {
 +    gen_vfp_ah_absh,
 +    gen_vfp_ah_abss,
 +    gen_vfp_ah_absd,
 +};
 +TRANS(FABS_s, do_fp1_scalar_int_2fn, a, &f_scalar_fabs, &f_scalar_ah_fabs)
  static const FPScalar1Int f_scalar_fneg = {
      gen_vfp_negh,
 --
-.25.1
+.34.1

-[PULL 28/55] target/arm: Move aa32_va_parameters to ptw.c
+[PULL 32/68] target/arm: Handle FPCR.AH in vector FABD
-From: Richard Henderson <richard.henderson@linaro.org>
+Split the handling of vector FABD so that it calls a different set
 of helpers when FPCR.AH is 1, which implement the "no negation of
 the sign of a NaN" semantics.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-22-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
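Note: the "no negation of the sign of a NaN" rule described above can be
sketched at the bit level for float32. This is only an illustration, not
the code in this patch: f32_bits(), ah_abs32() and ah_abd32() are
hypothetical names, and the subtraction uses host float arithmetic rather
than softfloat with a float_status.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float as its IEEE 754 binary32 bits. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Hypothetical helper: absolute value that leaves any NaN untouched. */
static uint32_t ah_abs32(uint32_t s)
{
    uint32_t abs_s = s & ~(UINT32_C(1) << 31);
    /* A magnitude above the infinity encoding means the input is a NaN. */
    return abs_s > UINT32_C(0x7f800000) ? s : abs_s;
}

/* Hypothetical helper: absolute difference as with FPCR.AH == 1. */
static uint32_t ah_abd32(float n, float m)
{
    return ah_abs32(f32_bits(n - m));
}
```

With these definitions, ah_abs32() clears the sign of -1.0 and of -Inf but
returns a negative quiet NaN such as 0xffc00000 unchanged.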
- target/arm/ptw.h    |  3 ---
+ target/arm/helper.h            |  4 ++++
- target/arm/helper.c | 64 ---------------------------------------------
+ target/arm/tcg/translate-a64.c |  7 ++++++-
- target/arm/ptw.c    | 64 +++++++++++++++++++++++++++++++++++++++++++++
+ target/arm/tcg/vec_helper.c    | 23 +++++++++++++++++++++++
-files changed, 64 insertions(+), 67 deletions(-)
+files changed, 33 insertions(+), 1 deletion(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/helper.h
-+++ b/target/arm/ptw.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
-     return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
+ DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_5(gvec_fabd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_fabd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 +
  DEF_HELPER_FLAGS_5(gvec_fceq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_5(gvec_fceq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_5(gvec_fceq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fabd[3] = {
      gen_helper_gvec_fabd_s,
      gen_helper_gvec_fabd_d,
  };
 -TRANS(FABD_v, do_fp3_vector, a, 0, f_vector_fabd)
 +static gen_helper_gvec_3_ptr * const f_vector_ah_fabd[3] = {
 +    gen_helper_gvec_ah_fabd_h,
 +    gen_helper_gvec_ah_fabd_s,
 +    gen_helper_gvec_ah_fabd_d,
 +};
 +TRANS(FABD_v, do_fp3_vector_2fn, a, 0, f_vector_fabd, f_vector_ah_fabd)
  static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
      gen_helper_gvec_recps_h,
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static float64 float64_abd(float64 op1, float64 op2, float_status *stat)
      return float64_abs(float64_sub(op1, op2, stat));
  }
--ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
++/* ABD when FPCR.AH = 1: avoid flipping sign bit of a NaN result */
--                                   ARMMMUIdx mmu_idx);
++static float16 float16_ah_abd(float16 op1, float16 op2, float_status *stat)
 -
  #endif /* !CONFIG_USER_ONLY */
  #endif /* TARGET_ARM_PTW_H */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
  }
  #ifndef CONFIG_USER_ONLY
 -ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
 -                                   ARMMMUIdx mmu_idx)
 -{
 -    uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
 -    uint32_t el = regime_el(env, mmu_idx);
 -    int select, tsz;
 -    bool epd, hpd;
 -
 -    assert(mmu_idx != ARMMMUIdx_Stage2_S);
 -
 -    if (mmu_idx == ARMMMUIdx_Stage2) {
 -        /* VTCR */
 -        bool sext = extract32(tcr, 4, 1);
 -        bool sign = extract32(tcr, 3, 1);
 -
 -        /*
 -         * If the sign-extend bit is not the same as t0sz[3], the result
 -         * is unpredictable. Flag this as a guest error.
 -         */
 -        if (sign != sext) {
 -            qemu_log_mask(LOG_GUEST_ERROR,
 -                          "AArch32: VTCR.S / VTCR.T0SZ[3] mismatch\n");
 -        }
 -        tsz = sextract32(tcr, 0, 4) + 8;
 -        select = 0;
 -        hpd = false;
 -        epd = false;
 -    } else if (el == 2) {
 -        /* HTCR */
 -        tsz = extract32(tcr, 0, 3);
 -        select = 0;
 -        hpd = extract64(tcr, 24, 1);
 -        epd = false;
 -    } else {
 -        int t0sz = extract32(tcr, 0, 3);
 -        int t1sz = extract32(tcr, 16, 3);
 -
 -        if (t1sz == 0) {
 -            select = va > (0xffffffffu >> t0sz);
 -        } else {
 -            /* Note that we will detect errors later.  */
 -            select = va >= ~(0xffffffffu >> t1sz);
 -        }
 -        if (!select) {
 -            tsz = t0sz;
 -            epd = extract32(tcr, 7, 1);
 -            hpd = extract64(tcr, 41, 1);
 -        } else {
 -            tsz = t1sz;
 -            epd = extract32(tcr, 23, 1);
 -            hpd = extract64(tcr, 42, 1);
 -        }
 -        /* For aarch32, hpd0 is not enabled without t2e as well.  */
 -        hpd &= extract32(tcr, 6, 1);
 -    }
 -
 -    return (ARMVAParameters) {
 -        .tsz = tsz,
 -        .select = select,
 -        .epd = epd,
 -        .hpd = hpd,
 -    };
 -}
 -
  hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
                                           MemTxAttrs *attrs)
  {
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
      return prot_rw | PAGE_EXEC;
  }
 +static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
 +                                          ARMMMUIdx mmu_idx)
 +{
-+    uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
++    float16 r = float16_sub(op1, op2, stat);
-+    uint32_t el = regime_el(env, mmu_idx);
++    return float16_is_any_nan(r) ? r : float16_abs(r);
-+    int select, tsz;
++}
 +    bool epd, hpd;
 +
-+    assert(mmu_idx != ARMMMUIdx_Stage2_S);
++static float32 float32_ah_abd(float32 op1, float32 op2, float_status *stat)
 +{
 +    float32 r = float32_sub(op1, op2, stat);
 +    return float32_is_any_nan(r) ? r : float32_abs(r);
 +}
 +
-+    if (mmu_idx == ARMMMUIdx_Stage2) {
++static float64 float64_ah_abd(float64 op1, float64 op2, float_status *stat)
-+        /* VTCR */
++{
-+        bool sext = extract32(tcr, 4, 1);
++    float64 r = float64_sub(op1, op2, stat);
-+        bool sign = extract32(tcr, 3, 1);
++    return float64_is_any_nan(r) ? r : float64_abs(r);
 +
 +        /*
 +         * If the sign-extend bit is not the same as t0sz[3], the result
 +         * is unpredictable. Flag this as a guest error.
 +         */
 +        if (sign != sext) {
 +            qemu_log_mask(LOG_GUEST_ERROR,
 +                          "AArch32: VTCR.S / VTCR.T0SZ[3] mismatch\n");
 +        }
 +        tsz = sextract32(tcr, 0, 4) + 8;
 +        select = 0;
 +        hpd = false;
 +        epd = false;
 +    } else if (el == 2) {
 +        /* HTCR */
 +        tsz = extract32(tcr, 0, 3);
 +        select = 0;
 +        hpd = extract64(tcr, 24, 1);
 +        epd = false;
 +    } else {
 +        int t0sz = extract32(tcr, 0, 3);
 +        int t1sz = extract32(tcr, 16, 3);
 +
 +        if (t1sz == 0) {
 +            select = va > (0xffffffffu >> t0sz);
 +        } else {
 +            /* Note that we will detect errors later.  */
 +            select = va >= ~(0xffffffffu >> t1sz);
 +        }
 +        if (!select) {
 +            tsz = t0sz;
 +            epd = extract32(tcr, 7, 1);
 +            hpd = extract64(tcr, 41, 1);
 +        } else {
 +            tsz = t1sz;
 +            epd = extract32(tcr, 23, 1);
 +            hpd = extract64(tcr, 42, 1);
 +        }
 +        /* For aarch32, hpd0 is not enabled without t2e as well.  */
 +        hpd &= extract32(tcr, 6, 1);
 +    }
 +
 +    return (ARMVAParameters) {
 +        .tsz = tsz,
 +        .select = select,
 +        .epd = epd,
 +        .hpd = hpd,
 +    };
 +}
 +
  /*
-  * check_s2_mmu_setup
+  * Reciprocal step. These are the AArch32 version which uses a
-  * @cpu:        ARMCPU
+  * non-fused multiply-and-subtract.
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fabd_h, float16_abd, float16)
  DO_3OP(gvec_fabd_s, float32_abd, float32)
  DO_3OP(gvec_fabd_d, float64_abd, float64)
 +DO_3OP(gvec_ah_fabd_h, float16_ah_abd, float16)
 +DO_3OP(gvec_ah_fabd_s, float32_ah_abd, float32)
 +DO_3OP(gvec_ah_fabd_d, float64_ah_abd, float64)
 +
  DO_3OP(gvec_fceq_h, float16_ceq, float16)
  DO_3OP(gvec_fceq_s, float32_ceq, float32)
  DO_3OP(gvec_fceq_d, float64_ceq, float64)
 --
-.25.1
+.34.1

-New patch
+[PULL 33/68] target/arm: Handle FPCR.AH in SVE FNEG
+Make SVE FNEG honour the FPCR.AH "don't negate the sign of a NaN"
+semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
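Note: a minimal sketch of the FNEG behaviour described above, written for
float16 bit patterns. The names f16_is_any_nan(), fneg16() and ah_fneg16()
are hypothetical stand-ins chosen for this illustration; the patch itself
builds the equivalent logic from float16_is_any_nan() and DO_FNEG.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a float16 NaN test: a NaN has an
 * all-ones exponent and a nonzero fraction, so its magnitude
 * is strictly above the infinity encoding 0x7c00.
 */
static bool f16_is_any_nan(uint16_t n)
{
    return (n & 0x7fff) > 0x7c00;
}

/* Plain FNEG: flip the sign bit unconditionally. */
static uint16_t fneg16(uint16_t n)
{
    return n ^ 0x8000;
}

/* FNEG as with FPCR.AH == 1: a NaN input is passed through unchanged. */
static uint16_t ah_fneg16(uint16_t n)
{
    return f16_is_any_nan(n) ? n : fneg16(n);
}
```

The two variants differ only for NaN inputs: fneg16(0x7e00) yields 0xfe00,
while ah_fneg16(0x7e00) leaves 0x7e00 as it is. Infinities and zeroes are
still negated in both.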
+ target/arm/tcg/helper-sve.h    | 4 ++++
+ target/arm/tcg/sve_helper.c    | 8 ++++++++
+ target/arm/tcg/translate-sve.c | 7 ++++++-
+files changed, 18 insertions(+), 1 deletion(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++
+ DEF_HELPER_FLAGS_4(sve_not_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_not_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_not_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
+ DO_ZPZ(sve_fneg_s, uint32_t, H1_4, DO_FNEG)
+ DO_ZPZ_D(sve_fneg_d, uint64_t, DO_FNEG)
++#define DO_AH_FNEG_H(N) (float16_is_any_nan(N) ? (N) : DO_FNEG(N))
++#define DO_AH_FNEG_S(N) (float32_is_any_nan(N) ? (N) : DO_FNEG(N))
++#define DO_AH_FNEG_D(N) (float64_is_any_nan(N) ? (N) : DO_FNEG(N))
++
++DO_ZPZ(sve_ah_fneg_h, uint16_t, H1_2, DO_AH_FNEG_H)
++DO_ZPZ(sve_ah_fneg_s, uint32_t, H1_4, DO_AH_FNEG_S)
++DO_ZPZ_D(sve_ah_fneg_d, uint64_t, DO_AH_FNEG_D)
++
+ #define DO_NOT(N)    (~N)
+ DO_ZPZ(sve_not_zpz_b, uint8_t, H1, DO_NOT)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const fneg_fns[4] = {
+     NULL,                  gen_helper_sve_fneg_h,
+     gen_helper_sve_fneg_s, gen_helper_sve_fneg_d,
+ };
+-TRANS_FEAT(FNEG, aa64_sve, gen_gvec_ool_arg_zpz, fneg_fns[a->esz], a, 0)
++static gen_helper_gvec_3 * const fneg_ah_fns[4] = {
++    NULL,                  gen_helper_sve_ah_fneg_h,
++    gen_helper_sve_ah_fneg_s, gen_helper_sve_ah_fneg_d,
++};
++TRANS_FEAT(FNEG, aa64_sve, gen_gvec_ool_arg_zpz,
++           s->fpcr_ah ? fneg_ah_fns[a->esz] : fneg_fns[a->esz], a, 0)
+ static gen_helper_gvec_3 * const sxtb_fns[4] = {
+     NULL,                  gen_helper_sve_sxtb_h,
+--
+.34.1

-New patch
+[PULL 34/68] target/arm: Handle FPCR.AH in SVE FABS
+Make SVE FABS honour the FPCR.AH "don't negate the sign of a NaN"
+semantics.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
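Note: the translate-time choice between the two helper tables can be
modelled outside QEMU roughly as follows. This is a sketch under stated
assumptions: unary_fn, select_fabs() and the fabs_*/ah_fabs_* functions
are hypothetical, each element is passed in the low bits of a uint64_t,
and a NULL entry is taken to mean "no such element size".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t unary_fn(uint64_t);

/* Plain FABS: clear the sign bit of a 16, 32 or 64 bit element. */
static uint64_t fabs_h(uint64_t n) { return n & UINT64_C(0x7fff); }
static uint64_t fabs_s(uint64_t n) { return n & UINT64_C(0x7fffffff); }
static uint64_t fabs_d(uint64_t n) { return n & UINT64_C(0x7fffffffffffffff); }

/* AH variants: a magnitude above the infinity encoding is a NaN; keep it. */
static uint64_t ah_fabs_h(uint64_t n)
{
    return fabs_h(n) > UINT64_C(0x7c00) ? n : fabs_h(n);
}

static uint64_t ah_fabs_s(uint64_t n)
{
    return fabs_s(n) > UINT64_C(0x7f800000) ? n : fabs_s(n);
}

static uint64_t ah_fabs_d(uint64_t n)
{
    return fabs_d(n) > UINT64_C(0x7ff0000000000000) ? n : fabs_d(n);
}

/* Index 0 (byte elements) has no floating-point form, hence NULL. */
static unary_fn * const fabs_fns[4] = { NULL, fabs_h, fabs_s, fabs_d };
static unary_fn * const fabs_ah_fns[4] = {
    NULL, ah_fabs_h, ah_fabs_s, ah_fabs_d
};

/* Pick the helper once, from the AH flag and the element size. */
static unary_fn *select_fabs(bool fpcr_ah, unsigned esz)
{
    return fpcr_ah ? fabs_ah_fns[esz & 3] : fabs_fns[esz & 3];
}
```

Selecting the function up front keeps the per-element code free of any
run-time test of the AH flag; only the NaN check remains in the AH variant.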
+ target/arm/tcg/helper-sve.h    | 4 ++++
+ target/arm/tcg/sve_helper.c    | 8 ++++++++
+ target/arm/tcg/translate-sve.c | 7 ++++++-
+files changed, 18 insertions(+), 1 deletion(-)
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/helper-sve.h
++++ b/target/arm/tcg/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(sve_ah_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++
+ DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_ZPZ(sve_fabs_h, uint16_t, H1_2, DO_FABS)
+ DO_ZPZ(sve_fabs_s, uint32_t, H1_4, DO_FABS)
+ DO_ZPZ_D(sve_fabs_d, uint64_t, DO_FABS)
++#define DO_AH_FABS_H(N) (float16_is_any_nan(N) ? (N) : DO_FABS(N))
++#define DO_AH_FABS_S(N) (float32_is_any_nan(N) ? (N) : DO_FABS(N))
++#define DO_AH_FABS_D(N) (float64_is_any_nan(N) ? (N) : DO_FABS(N))
++
++DO_ZPZ(sve_ah_fabs_h, uint16_t, H1_2, DO_AH_FABS_H)
++DO_ZPZ(sve_ah_fabs_s, uint32_t, H1_4, DO_AH_FABS_S)
++DO_ZPZ_D(sve_ah_fabs_d, uint64_t, DO_AH_FABS_D)
++
+ #define DO_FNEG(N)    (N ^ ~((__typeof(N))-1 >> 1))
+ DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const fabs_fns[4] = {
+     NULL,                  gen_helper_sve_fabs_h,
+     gen_helper_sve_fabs_s, gen_helper_sve_fabs_d,
+ };
+-TRANS_FEAT(FABS, aa64_sve, gen_gvec_ool_arg_zpz, fabs_fns[a->esz], a, 0)
++static gen_helper_gvec_3 * const fabs_ah_fns[4] = {
++    NULL,                  gen_helper_sve_ah_fabs_h,
++    gen_helper_sve_ah_fabs_s, gen_helper_sve_ah_fabs_d,
++};
++TRANS_FEAT(FABS, aa64_sve, gen_gvec_ool_arg_zpz,
++           s->fpcr_ah ? fabs_ah_fns[a->esz] : fabs_fns[a->esz], a, 0)
+ static gen_helper_gvec_3 * const fneg_fns[4] = {
+     NULL,                  gen_helper_sve_fneg_h,
+--
+.34.1

-[PULL 34/55] target/arm: Move stage_1_mmu_idx, arm_stage1_mmu_idx to ptw.c
+[PULL 35/68] target/arm: Handle FPCR.AH in SVE FABD
-From: Richard Henderson <richard.henderson@linaro.org>
+Make the SVE FABD insn honour the FPCR.AH "don't negate the sign
 of a NaN" semantics.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-28-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/helper.c | 32 --------------------------------
+ target/arm/tcg/helper-sve.h    |  7 +++++++
- target/arm/ptw.c    | 28 ++++++++++++++++++++++++++++
+ target/arm/tcg/sve_helper.c    | 22 ++++++++++++++++++++++
-files changed, 28 insertions(+), 32 deletions(-)
+ target/arm/tcg/translate-sve.c |  2 +-
 files changed, 30 insertions(+), 1 deletion(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/helper-sve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/helper-sve.h
-@@ -XXX,XX +XXX,XX @@ uint64_t arm_sctlr(CPUARMState *env, int el)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_fabd_s, TCG_CALL_NO_RWG,
-     return env->cp15.sctlr_el[el];
+ DEF_HELPER_FLAGS_6(sve_fabd_d, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_6(sve_ah_fabd_h, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_6(sve_ah_fabd_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_6(sve_ah_fabd_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, fpst, i32)
 +
  DEF_HELPER_FLAGS_6(sve_fscalbn_h, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_6(sve_fscalbn_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/sve_helper.c
 +++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline float64 abd_d(float64 a, float64 b, float_status *s)
      return float64_abs(float64_sub(a, b, s));
  }
--#ifndef CONFIG_USER_ONLY
++/* ABD when FPCR.AH = 1: avoid flipping sign bit of a NaN result */
--/* Convert a possible stage1+2 MMU index into the appropriate
++static float16 ah_abd_h(float16 op1, float16 op2, float_status *stat)
 - * stage 1 MMU index
 - */
 -ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
 -{
 -    switch (mmu_idx) {
 -    case ARMMMUIdx_SE10_0:
 -        return ARMMMUIdx_Stage1_SE0;
 -    case ARMMMUIdx_SE10_1:
 -        return ARMMMUIdx_Stage1_SE1;
 -    case ARMMMUIdx_SE10_1_PAN:
 -        return ARMMMUIdx_Stage1_SE1_PAN;
 -    case ARMMMUIdx_E10_0:
 -        return ARMMMUIdx_Stage1_E0;
 -    case ARMMMUIdx_E10_1:
 -        return ARMMMUIdx_Stage1_E1;
 -    case ARMMMUIdx_E10_1_PAN:
 -        return ARMMMUIdx_Stage1_E1_PAN;
 -    default:
 -        return mmu_idx;
 -    }
 -}
 -#endif /* !CONFIG_USER_ONLY */
 -
  int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
  {
      if (regime_has_2_ranges(mmu_idx)) {
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
      return arm_mmu_idx_el(env, arm_current_el(env));
  }
 -#ifndef CONFIG_USER_ONLY
 -ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
 -{
 -    return stage_1_mmu_idx(arm_mmu_idx(env));
 -}
 -#endif
 -
  static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el,
                                             ARMMMUIdx mmu_idx,
                                             CPUARMTBFlags flags)
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ unsigned int arm_pamax(ARMCPU *cpu)
      return pamax_map[parange];
  }
 +/*
 + * Convert a possible stage1+2 MMU index into the appropriate stage 1 MMU index
 + */
 +ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
 +{
-+    switch (mmu_idx) {
++    float16 r = float16_sub(op1, op2, stat);
-+    case ARMMMUIdx_SE10_0:
++    return float16_is_any_nan(r) ? r : float16_abs(r);
 +        return ARMMMUIdx_Stage1_SE0;
 +    case ARMMMUIdx_SE10_1:
 +        return ARMMMUIdx_Stage1_SE1;
 +    case ARMMMUIdx_SE10_1_PAN:
 +        return ARMMMUIdx_Stage1_SE1_PAN;
 +    case ARMMMUIdx_E10_0:
 +        return ARMMMUIdx_Stage1_E0;
 +    case ARMMMUIdx_E10_1:
 +        return ARMMMUIdx_Stage1_E1;
 +    case ARMMMUIdx_E10_1_PAN:
 +        return ARMMMUIdx_Stage1_E1_PAN;
 +    default:
 +        return mmu_idx;
 +    }
 +}
 +
-+ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
++static float32 ah_abd_s(float32 op1, float32 op2, float_status *stat)
 +{
-+    return stage_1_mmu_idx(arm_mmu_idx(env));
++    float32 r = float32_sub(op1, op2, stat);
 +    return float32_is_any_nan(r) ? r : float32_abs(r);
 +}
 +
- static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
++static float64 ah_abd_d(float64 op1, float64 op2, float_status *stat)
 +{
 +    float64 r = float64_sub(op1, op2, stat);
 +    return float64_is_any_nan(r) ? r : float64_abs(r);
 +}
 +
  DO_ZPZZ_FP(sve_fabd_h, uint16_t, H1_2, abd_h)
  DO_ZPZZ_FP(sve_fabd_s, uint32_t, H1_4, abd_s)
  DO_ZPZZ_FP(sve_fabd_d, uint64_t, H1_8, abd_d)
 +DO_ZPZZ_FP(sve_ah_fabd_h, uint16_t, H1_2, ah_abd_h)
 +DO_ZPZZ_FP(sve_ah_fabd_s, uint32_t, H1_4, ah_abd_s)
 +DO_ZPZZ_FP(sve_ah_fabd_d, uint64_t, H1_8, ah_abd_d)
  static inline float64 scalbn_d(float64 a, int64_t b, float_status *s)
  {
-     return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
  DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
  DO_ZPZZ_FP(FMINNM_zpzz, aa64_sve, sve_fminnum)
  DO_ZPZZ_FP(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
 -DO_ZPZZ_FP(FABD, aa64_sve, sve_fabd)
 +DO_ZPZZ_AH_FP(FABD, aa64_sve, sve_fabd, sve_ah_fabd)
  DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
  DO_ZPZZ_FP(FDIV, aa64_sve, sve_fdiv)
  DO_ZPZZ_FP(FMULX, aa64_sve, sve_fmulx)
 --
-.25.1
+.34.1

-[PULL 53/55] target/arm: Export bfdotadd from vec_helper.c
+[PULL 36/68] target/arm: Handle FPCR.AH in negation steps in SVE FCADD
-From: Richard Henderson <richard.henderson@linaro.org>
+The negation steps in FCADD must honour FPCR.AH's "don't change the
 sign of a NaN" semantics.  Implement this in the same way we did for
 the base ASIMD FCADD, by encoding FPCR.AH into the SIMD data field
 passed to the helper and using that to decide whether to negate the
 values.
-We will need this over in sme_helper.c.
+The construction of neg_imag and neg_real was done to make it easy
 to apply both in parallel with two simple logical operations.  This
 changed with FPCR.AH, which is more complex than that. Switch to
 an approach that follows the pseudocode more closely, by extracting
 the 'rot=1' parameter from the SIMD data field and changing the
 sign of the appropriate input value.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Note that there was a naming issue with neg_imag and neg_real.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+They were named backward, with neg_imag being non-zero for rot=1,
-Message-id: 20220607203306.657998-19-richard.henderson@linaro.org
+and vice versa.  This was combined with reversed usage within the
 loop, so that the negation in the end turned out correct.
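
The per-element logic described above can be sketched for one complex pair as follows. This is a simplified illustration using host float arithmetic rather than softfloat, and every demo_* name is hypothetical; only the rot/fpcr_ah selection mirrors what the patch does.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static inline uint32_t demo_f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static inline float demo_f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Flip the sign bit, except for a NaN when fpcr_ah is set. */
static inline float demo_maybe_ah_chs(float a, bool fpcr_ah)
{
    uint32_t bits = demo_f32_bits(a);
    bool is_nan = (bits & 0x7fffffffu) > 0x7f800000u;

    return fpcr_ah && is_nan ? a : demo_f32_from_bits(bits ^ 0x80000000u);
}

/*
 * One (real, imag) pair: element 0 is the real part, element 1 the
 * imaginary part.  rot == 0 negates the imaginary input of m,
 * rot == 1 negates the real input of m.
 */
static void demo_fcadd_pair(float d[2], const float n[2], const float m[2],
                            bool rot, bool fpcr_ah)
{
    float e0 = n[0];
    float e1 = m[1];
    float e2 = n[1];
    float e3 = m[0];

    if (rot) {
        e3 = demo_maybe_ah_chs(e3, fpcr_ah);
    } else {
        e1 = demo_maybe_ah_chs(e1, fpcr_ah);
    }

    d[0] = e0 + e1;
    d[1] = e2 + e3;
}
```

With n = (1, 2) and m = (3, 4), rot == 0 gives (1 - 4, 2 + 3) and rot == 1 gives (1 + 4, 2 - 3).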
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/vec_internal.h | 13 +++++++++++++
+ target/arm/tcg/vec_internal.h  | 17 ++++++++++++++
- target/arm/vec_helper.c   |  2 +-
+ target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++----------
-files changed, 14 insertions(+), 1 deletion(-)
+ target/arm/tcg/translate-sve.c |  2 +-
 files changed, 48 insertions(+), 13 deletions(-)
-diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
+diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_internal.h
+--- a/target/arm/tcg/vec_internal.h
-+++ b/target/arm/vec_internal.h
++++ b/target/arm/tcg/vec_internal.h
-@@ -XXX,XX +XXX,XX @@ uint64_t pmull_h(uint64_t op1, uint64_t op2);
+@@ -XXX,XX +XXX,XX @@
  #ifndef TARGET_ARM_VEC_INTERNAL_H
  #define TARGET_ARM_VEC_INTERNAL_H
 +#include "fpu/softfloat.h"
 +
  /*
   * Note that vector data is stored in host-endian 64-bit chunks,
   * so addressing units smaller than that needs a host-endian fixup.
@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
   */
- uint64_t pmull_w(uint64_t op1, uint64_t op2);
+ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
-+/**
++static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
-+ * bfdotadd:
++{
-+ * @sum: addend
++    return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
-+ * @e1, @e2: multiplicand vectors
++}
-+ *
++
-+ * BFloat16 2-way dot product of @e1 & @e2, accumulating with @sum.
++static inline float32 float32_maybe_ah_chs(float32 a, bool fpcr_ah)
-+ * The @e1 and @e2 operands correspond to the 32-bit source vector
++{
-+ * slots and contain two Bfloat16 values each.
++    return fpcr_ah && float32_is_any_nan(a) ? a : float32_chs(a);
-+ *
++}
-+ * Corresponds to the ARM pseudocode function BFDotAdd.
++
-+ */
++static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah)
-+float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2);
++{
 +    return fpcr_ah && float64_is_any_nan(a) ? a : float64_chs(a);
 +}
 +
  #endif /* TARGET_ARM_VEC_INTERNAL_H */
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/target/arm/vec_helper.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
   * BFloat16 Dot Product
   */
 -static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
 +float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
  {
-     /* FPCR is ignored for BFDOT and BFMMLA. */
+     intptr_t j, i = simd_oprsz(desc);
-     float_status bf_status = {
+     uint64_t *g = vg;
 -    float16 neg_imag = float16_set_sign(0, simd_data(desc));
 -    float16 neg_real = float16_chs(neg_imag);
 +    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
              i -= 2 * sizeof(float16);
              e0 = *(float16 *)(vn + H1_2(i));
 -            e1 = *(float16 *)(vm + H1_2(j)) ^ neg_real;
 +            e1 = *(float16 *)(vm + H1_2(j));
              e2 = *(float16 *)(vn + H1_2(j));
 -            e3 = *(float16 *)(vm + H1_2(i)) ^ neg_imag;
 +            e3 = *(float16 *)(vm + H1_2(i));
 +
 +            if (rot) {
 +                e3 = float16_maybe_ah_chs(e3, fpcr_ah);
 +            } else {
 +                e1 = float16_maybe_ah_chs(e1, fpcr_ah);
 +            }
              if (likely((pg >> (i & 63)) & 1)) {
                  *(float16 *)(vd + H1_2(i)) = float16_add(e0, e1, s);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
  {
      intptr_t j, i = simd_oprsz(desc);
      uint64_t *g = vg;
 -    float32 neg_imag = float32_set_sign(0, simd_data(desc));
 -    float32 neg_real = float32_chs(neg_imag);
 +    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
              i -= 2 * sizeof(float32);
              e0 = *(float32 *)(vn + H1_2(i));
 -            e1 = *(float32 *)(vm + H1_2(j)) ^ neg_real;
 +            e1 = *(float32 *)(vm + H1_2(j));
              e2 = *(float32 *)(vn + H1_2(j));
 -            e3 = *(float32 *)(vm + H1_2(i)) ^ neg_imag;
 +            e3 = *(float32 *)(vm + H1_2(i));
 +
 +            if (rot) {
 +                e3 = float32_maybe_ah_chs(e3, fpcr_ah);
 +            } else {
 +                e1 = float32_maybe_ah_chs(e1, fpcr_ah);
 +            }
              if (likely((pg >> (i & 63)) & 1)) {
                  *(float32 *)(vd + H1_2(i)) = float32_add(e0, e1, s);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
  {
      intptr_t j, i = simd_oprsz(desc);
      uint64_t *g = vg;
 -    float64 neg_imag = float64_set_sign(0, simd_data(desc));
 -    float64 neg_real = float64_chs(neg_imag);
 +    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
              i -= 2 * sizeof(float64);
              e0 = *(float64 *)(vn + H1_2(i));
 -            e1 = *(float64 *)(vm + H1_2(j)) ^ neg_real;
 +            e1 = *(float64 *)(vm + H1_2(j));
              e2 = *(float64 *)(vn + H1_2(j));
 -            e3 = *(float64 *)(vm + H1_2(i)) ^ neg_imag;
 +            e3 = *(float64 *)(vm + H1_2(i));
 +
 +            if (rot) {
 +                e3 = float64_maybe_ah_chs(e3, fpcr_ah);
 +            } else {
 +                e1 = float64_maybe_ah_chs(e1, fpcr_ah);
 +            }
              if (likely((pg >> (i & 63)) & 1)) {
                  *(float64 *)(vd + H1_2(i)) = float64_add(e0, e1, s);
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4_ptr * const fcadd_fns[] = {
      gen_helper_sve_fcadd_s, gen_helper_sve_fcadd_d,
  };
  TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
 -           a->rd, a->rn, a->rm, a->pg, a->rot,
 +           a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
             a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
  #define DO_FMLA(NAME, name) \
 --
-.25.1
+.34.1

-New patch
+[PULL 37/68] target/arm: Handle FPCR.AH in negation steps in FCADD
+The negation steps in FCADD must honour FPCR.AH's "don't change the
+sign of a NaN" semantics.  Implement this by encoding FPCR.AH into
+the SIMD data field passed to the helper and using that to decide
+whether to negate the values.
+The construction of neg_imag and neg_real was done to make it easy
+to apply both in parallel with two simple logical operations.  This
+changed with FPCR.AH, which is more complex than that. Switch to
+an approach closer to the pseudocode, where we extract the rot
+parameter from the SIMD data word and negate the appropriate
+input value.
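
The data-field encoding can be pictured with the following standalone sketch. DEMO_DATA_SHIFT, demo_pack_fcadd_data() and demo_extract32() are hypothetical stand-ins for illustration only; the patch itself relies on QEMU's SIMD_DATA_SHIFT and extract32()/extract64(), whose definitions are not shown in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the data field inside the descriptor word. */
#define DEMO_DATA_SHIFT 10

/* Extract 'length' bits (1..32) starting at bit 'start'. */
static inline uint32_t demo_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* Translator side: bit 0 carries rot, bit 1 carries FPCR.AH. */
static inline uint32_t demo_pack_fcadd_data(bool rot, bool fpcr_ah)
{
    uint32_t data = (uint32_t)rot | ((uint32_t)fpcr_ah << 1);

    return data << DEMO_DATA_SHIFT;
}

/* Helper side: recover the two flags from the descriptor. */
static inline bool demo_desc_rot(uint32_t desc)
{
    return demo_extract32(desc, DEMO_DATA_SHIFT, 1);
}

static inline bool demo_desc_fpcr_ah(uint32_t desc)
{
    return demo_extract32(desc, DEMO_DATA_SHIFT + 1, 1);
}
```

Because FPCR.AH travels in the descriptor, the helper does not need to read CPU state to decide how to negate.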
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/tcg/translate-a64.c | 10 +++++--
+ target/arm/tcg/vec_helper.c    | 54 +++++++++++++++++++---------------
+files changed, 38 insertions(+), 26 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.c
++++ b/target/arm/tcg/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
+     gen_helper_gvec_fcadds,
+     gen_helper_gvec_fcaddd,
+ };
+-TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
+-TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
++/*
++ * Encode FPCR.AH into the data so the helper knows whether the
++ * negations it does should avoid flipping the sign bit on a NaN
++ */
++TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0 | (s->fpcr_ah << 1),
++           f_vector_fcadd)
++TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1 | (s->fpcr_ah << 1),
++           f_vector_fcadd)
+ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
+ {
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/vec_helper.c
++++ b/target/arm/tcg/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
+     float16 *d = vd;
+     float16 *n = vn;
+     float16 *m = vm;
+-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = neg_real ^ 1;
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 15;
+-    neg_imag <<= 15;
+-
+     for (i = 0; i < opr_sz / 2; i += 2) {
+         float16 e0 = n[H2(i)];
+-        float16 e1 = m[H2(i + 1)] ^ neg_imag;
++        float16 e1 = m[H2(i + 1)];
+         float16 e2 = n[H2(i + 1)];
+-        float16 e3 = m[H2(i)] ^ neg_real;
++        float16 e3 = m[H2(i)];
++
++        if (rot) {
++            e3 = float16_maybe_ah_chs(e3, fpcr_ah);
++        } else {
++            e1 = float16_maybe_ah_chs(e1, fpcr_ah);
++        }
+         d[H2(i)] = float16_add(e0, e1, fpst);
+         d[H2(i + 1)] = float16_add(e2, e3, fpst);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
+     float32 *d = vd;
+     float32 *n = vn;
+     float32 *m = vm;
+-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
+-    uint32_t neg_imag = neg_real ^ 1;
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 31;
+-    neg_imag <<= 31;
+-
+     for (i = 0; i < opr_sz / 4; i += 2) {
+         float32 e0 = n[H4(i)];
+-        float32 e1 = m[H4(i + 1)] ^ neg_imag;
++        float32 e1 = m[H4(i + 1)];
+         float32 e2 = n[H4(i + 1)];
+-        float32 e3 = m[H4(i)] ^ neg_real;
++        float32 e3 = m[H4(i)];
++
++        if (rot) {
++            e3 = float32_maybe_ah_chs(e3, fpcr_ah);
++        } else {
++            e1 = float32_maybe_ah_chs(e1, fpcr_ah);
++        }
+         d[H4(i)] = float32_add(e0, e1, fpst);
+         d[H4(i + 1)] = float32_add(e2, e3, fpst);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
+     float64 *d = vd;
+     float64 *n = vn;
+     float64 *m = vm;
+-    uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
+-    uint64_t neg_imag = neg_real ^ 1;
++    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
++    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
+     uintptr_t i;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
+-    neg_real <<= 63;
+-    neg_imag <<= 63;
+-
+     for (i = 0; i < opr_sz / 8; i += 2) {
+         float64 e0 = n[i];
+-        float64 e1 = m[i + 1] ^ neg_imag;
++        float64 e1 = m[i + 1];
+         float64 e2 = n[i + 1];
+-        float64 e3 = m[i] ^ neg_real;
++        float64 e3 = m[i];
++
++        if (rot) {
++            e3 = float64_maybe_ah_chs(e3, fpcr_ah);
++        } else {
++            e1 = float64_maybe_ah_chs(e1, fpcr_ah);
++        }
+         d[i] = float64_add(e0, e1, fpst);
+         d[i + 1] = float64_add(e2, e3, fpst);
+--
+.34.1

-[PULL 29/55] target/arm: Move ap_to_tw_prot etc to ptw.c
+[PULL 38/68] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
-From: Richard Henderson <richard.henderson@linaro.org>
+Handle the FPCR.AH semantics that we do not change the sign of an
 input NaN in the FRECPS and FRSQRTS scalar insns, by providing
 new helper functions that do the CHS part of the operation
 differently.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Since the extra helper functions would be very repetitive if written
-Message-id: 20220604040607.269301-23-richard.henderson@linaro.org
+out longhand, we condense them and the existing non-AH helpers into
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+being emitted via macros.
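
The macro approach can be illustrated with a much reduced standalone sketch. Everything named demo_* or DEMO_* is hypothetical; unlike the real helpers it uses host float arithmetic, performs an unfused multiply and add, and omits the denormal squashing and the infinity-times-zero special case.

```c
#include <stdint.h>
#include <string.h>

static inline uint32_t demo_f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static inline float demo_f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Unconditional sign flip. */
static inline float demo_f32_chs(float a)
{
    return demo_f32_from_bits(demo_f32_bits(a) ^ 0x80000000u);
}

/* FPCR.AH == 1 flavour: do not negate NaNs. */
static inline float demo_f32_ah_chs(float a)
{
    uint32_t bits = demo_f32_bits(a);

    return (bits & 0x7fffffffu) > 0x7f800000u ? a : demo_f32_chs(a);
}

/*
 * One macro emits both variants; CHSFN selects the negation function
 * by token pasting, in the same spirit as DO_RECPS in the patch.
 */
#define DEMO_RECPS(NAME, CHSFN)                                 \
    static float NAME(float a, float b)                         \
    {                                                           \
        a = demo_f32_ ## CHSFN(a);                              \
        return a * b + 2.0f; /* 2 - a*b, not fused here */      \
    }

DEMO_RECPS(demo_recps_f32, chs)
DEMO_RECPS(demo_recps_ah_f32, ah_chs)
```

Both generated functions agree on non-NaN inputs; they differ only in whether the sign of a NaN first operand is flipped before the multiply.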
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    | 10 ------
+ target/arm/tcg/helper-a64.h    |   6 ++
- target/arm/helper.c | 77 ------------------------------------------
+ target/arm/tcg/vec_internal.h  |  18 ++++++
- target/arm/ptw.c    | 81 +++++++++++++++++++++++++++++++++++++++++++++
+ target/arm/tcg/helper-a64.c    | 115 ++++++++++++---------------------
-files changed, 81 insertions(+), 87 deletions(-)
+ target/arm/tcg/translate-a64.c |  25 +++++--
 files changed, 83 insertions(+), 81 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/helper-a64.h
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/helper-a64.h
-@@ -XXX,XX +XXX,XX @@ bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(neon_cgt_f64, TCG_CALL_NO_RWG, i64, i64, i64, fpst)
- bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
+ DEF_HELPER_FLAGS_3(recpsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
- uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
+ DEF_HELPER_FLAGS_3(recpsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+ DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
--int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
++DEF_HELPER_FLAGS_3(recpsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
--                  int ap, int domain_prot);
++DEF_HELPER_FLAGS_3(recpsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
--int simple_ap_to_rw_prot_is_user(int ap, bool is_user);
++DEF_HELPER_FLAGS_3(recpsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
--
+ DEF_HELPER_FLAGS_3(rsqrtsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
--static inline int
+ DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
--simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+ DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
--{
++DEF_HELPER_FLAGS_3(rsqrtsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
--    return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
++DEF_HELPER_FLAGS_3(rsqrtsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
--}
++DEF_HELPER_FLAGS_3(rsqrtsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
--
+ DEF_HELPER_FLAGS_2(frecpx_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
- #endif /* !CONFIG_USER_ONLY */
+ DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
- #endif /* TARGET_ARM_PTW_H */
+ DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/vec_internal.h
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/vec_internal.h
-@@ -XXX,XX +XXX,XX @@ bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
-         g_assert_not_reached();
+  */
-     }
+ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
  }
 -
 -/* Translate section/page access permissions to page
 - * R/W protection flags
 - *
 - * @env:         CPUARMState
 - * @mmu_idx:     MMU index indicating required translation regime
 - * @ap:          The 3-bit access permissions (AP[2:0])
 - * @domain_prot: The 2-bit domain access permissions
 - */
 -int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap, int domain_prot)
 -{
 -    bool is_user = regime_is_user(env, mmu_idx);
 -
 -    if (domain_prot == 3) {
 -        return PAGE_READ | PAGE_WRITE;
 -    }
 -
 -    switch (ap) {
 -    case 0:
 -        if (arm_feature(env, ARM_FEATURE_V7)) {
 -            return 0;
 -        }
 -        switch (regime_sctlr(env, mmu_idx) & (SCTLR_S | SCTLR_R)) {
 -        case SCTLR_S:
 -            return is_user ? 0 : PAGE_READ;
 -        case SCTLR_R:
 -            return PAGE_READ;
 -        default:
 -            return 0;
 -        }
 -    case 1:
 -        return is_user ? 0 : PAGE_READ | PAGE_WRITE;
 -    case 2:
 -        if (is_user) {
 -            return PAGE_READ;
 -        } else {
 -            return PAGE_READ | PAGE_WRITE;
 -        }
 -    case 3:
 -        return PAGE_READ | PAGE_WRITE;
 -    case 4: /* Reserved.  */
 -        return 0;
 -    case 5:
 -        return is_user ? 0 : PAGE_READ;
 -    case 6:
 -        return PAGE_READ;
 -    case 7:
 -        if (!arm_feature(env, ARM_FEATURE_V6K)) {
 -            return 0;
 -        }
 -        return PAGE_READ;
 -    default:
 -        g_assert_not_reached();
 -    }
 -}
 -
 -/* Translate section/page access permissions to page
 - * R/W protection flags.
 - *
 - * @ap:      The 2-bit simple AP (AP[2:1])
 - * @is_user: TRUE if accessing from PL0
 - */
 -int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
 -{
 -    switch (ap) {
 -    case 0:
 -        return is_user ? 0 : PAGE_READ | PAGE_WRITE;
 -    case 1:
 -        return PAGE_READ | PAGE_WRITE;
 -    case 2:
 -        return is_user ? 0 : PAGE_READ;
 -    case 3:
 -        return PAGE_READ;
 -    default:
 -        g_assert_not_reached();
 -    }
 -}
  #endif /* !CONFIG_USER_ONLY */
  int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
      return true;
  }
 +/*
-+ * Translate section/page access permissions to page R/W protection flags
++ * Negate as for FPCR.AH=1 -- do not negate NaNs.
 + * @env:         CPUARMState
 + * @mmu_idx:     MMU index indicating required translation regime
 + * @ap:          The 3-bit access permissions (AP[2:0])
 + * @domain_prot: The 2-bit domain access permissions
 + */
-+static int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
++static inline float16 float16_ah_chs(float16 a)
 +                         int ap, int domain_prot)
 +{
-+    bool is_user = regime_is_user(env, mmu_idx);
++    return float16_is_any_nan(a) ? a : float16_chs(a);
 +
 +    if (domain_prot == 3) {
 +        return PAGE_READ | PAGE_WRITE;
 +    }
 +
 +    switch (ap) {
 +    case 0:
 +        if (arm_feature(env, ARM_FEATURE_V7)) {
 +            return 0;
 +        }
 +        switch (regime_sctlr(env, mmu_idx) & (SCTLR_S | SCTLR_R)) {
 +        case SCTLR_S:
 +            return is_user ? 0 : PAGE_READ;
 +        case SCTLR_R:
 +            return PAGE_READ;
 +        default:
 +            return 0;
 +        }
 +    case 1:
 +        return is_user ? 0 : PAGE_READ | PAGE_WRITE;
 +    case 2:
 +        if (is_user) {
 +            return PAGE_READ;
 +        } else {
 +            return PAGE_READ | PAGE_WRITE;
 +        }
 +    case 3:
 +        return PAGE_READ | PAGE_WRITE;
 +    case 4: /* Reserved.  */
 +        return 0;
 +    case 5:
 +        return is_user ? 0 : PAGE_READ;
 +    case 6:
 +        return PAGE_READ;
 +    case 7:
 +        if (!arm_feature(env, ARM_FEATURE_V6K)) {
 +            return 0;
 +        }
 +        return PAGE_READ;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
-+/*
++static inline float32 float32_ah_chs(float32 a)
 + * Translate section/page access permissions to page R/W protection flags.
 + * @ap:      The 2-bit simple AP (AP[2:1])
 + * @is_user: TRUE if accessing from PL0
 + */
 +static int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
 +{
-+    switch (ap) {
++    return float32_is_any_nan(a) ? a : float32_chs(a);
 +    case 0:
 +        return is_user ? 0 : PAGE_READ | PAGE_WRITE;
 +    case 1:
 +        return PAGE_READ | PAGE_WRITE;
 +    case 2:
 +        return is_user ? 0 : PAGE_READ;
 +    case 3:
 +        return PAGE_READ;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
-+static int simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
++static inline float64 float64_ah_chs(float64 a)
 +{
-+    return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
++    return float64_is_any_nan(a) ? a : float64_chs(a);
 +}
 +
- static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
+ static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
-                              MMUAccessType access_type, ARMMMUIdx mmu_idx,
+ {
-                              hwaddr *phys_ptr, int *prot,
+     return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
 diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/helper-a64.c
 +++ b/target/arm/tcg/helper-a64.c
@@ -XXX,XX +XXX,XX @@
  #ifdef CONFIG_USER_ONLY
  #include "user/page-protection.h"
  #endif
 +#include "vec_internal.h"
  /* C2.4.7 Multiply and divide */
  /* special cases for 0 and LLONG_MIN are mandated by the standard */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, float_status *fpst)
      return -float64_lt(b, a, fpst);
  }
 -/* Reciprocal step and sqrt step. Note that unlike the A32/T32
 +/*
 + * Reciprocal step and sqrt step. Note that unlike the A32/T32
   * versions, these do a fully fused multiply-add or
   * multiply-add-and-halve.
 + * The FPCR.AH == 1 versions need to avoid flipping the sign of NaN.
   */
 -
 -uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
 -{
 -    a = float16_squash_input_denormal(a, fpst);
 -    b = float16_squash_input_denormal(b, fpst);
 -
 -    a = float16_chs(a);
 -    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
 -        (float16_is_infinity(b) && float16_is_zero(a))) {
 -        return float16_two;
 +#define DO_RECPS(NAME, CTYPE, FLOATTYPE, CHSFN)                         \
 +    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
 +    {                                                                   \
 +        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
 +        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
 +        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
 +        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
 +            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
 +            return FLOATTYPE ## _two;                                   \
 +        }                                                               \
 +        return FLOATTYPE ## _muladd(a, b, FLOATTYPE ## _two, 0, fpst);  \
      }
 -    return float16_muladd(a, b, float16_two, 0, fpst);
 -}
 -float32 HELPER(recpsf_f32)(float32 a, float32 b, float_status *fpst)
 -{
 -    a = float32_squash_input_denormal(a, fpst);
 -    b = float32_squash_input_denormal(b, fpst);
 +DO_RECPS(recpsf_f16, uint32_t, float16, chs)
 +DO_RECPS(recpsf_f32, float32, float32, chs)
 +DO_RECPS(recpsf_f64, float64, float64, chs)
 +DO_RECPS(recpsf_ah_f16, uint32_t, float16, ah_chs)
 +DO_RECPS(recpsf_ah_f32, float32, float32, ah_chs)
 +DO_RECPS(recpsf_ah_f64, float64, float64, ah_chs)
 -    a = float32_chs(a);
 -    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
 -        (float32_is_infinity(b) && float32_is_zero(a))) {
 -        return float32_two;
 -    }
 -    return float32_muladd(a, b, float32_two, 0, fpst);
 -}
 +#define DO_RSQRTSF(NAME, CTYPE, FLOATTYPE, CHSFN)                       \
 +    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
 +    {                                                                   \
 +        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
 +        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
 +        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
 +        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
 +            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
 +            return FLOATTYPE ## _one_point_five;                        \
 +        }                                                               \
 +        return FLOATTYPE ## _muladd_scalbn(a, b, FLOATTYPE ## _three,   \
 +                                           -1, 0, fpst);                \
 +    }                                                                   \
 -float64 HELPER(recpsf_f64)(float64 a, float64 b, float_status *fpst)
 -{
 -    a = float64_squash_input_denormal(a, fpst);
 -    b = float64_squash_input_denormal(b, fpst);
 -
 -    a = float64_chs(a);
 -    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
 -        (float64_is_infinity(b) && float64_is_zero(a))) {
 -        return float64_two;
 -    }
 -    return float64_muladd(a, b, float64_two, 0, fpst);
 -}
 -
 -uint32_t HELPER(rsqrtsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
 -{
 -    a = float16_squash_input_denormal(a, fpst);
 -    b = float16_squash_input_denormal(b, fpst);
 -
 -    a = float16_chs(a);
 -    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
 -        (float16_is_infinity(b) && float16_is_zero(a))) {
 -        return float16_one_point_five;
 -    }
 -    return float16_muladd_scalbn(a, b, float16_three, -1, 0, fpst);
 -}
 -
 -float32 HELPER(rsqrtsf_f32)(float32 a, float32 b, float_status *fpst)
 -{
 -    a = float32_squash_input_denormal(a, fpst);
 -    b = float32_squash_input_denormal(b, fpst);
 -
 -    a = float32_chs(a);
 -    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
 -        (float32_is_infinity(b) && float32_is_zero(a))) {
 -        return float32_one_point_five;
 -    }
 -    return float32_muladd_scalbn(a, b, float32_three, -1, 0, fpst);
 -}
 -
 -float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, float_status *fpst)
 -{
 -    a = float64_squash_input_denormal(a, fpst);
 -    b = float64_squash_input_denormal(b, fpst);
 -
 -    a = float64_chs(a);
 -    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
 -        (float64_is_infinity(b) && float64_is_zero(a))) {
 -        return float64_one_point_five;
 -    }
 -    return float64_muladd_scalbn(a, b, float64_three, -1, 0, fpst);
 -}
 +DO_RSQRTSF(rsqrtsf_f16, uint32_t, float16, chs)
 +DO_RSQRTSF(rsqrtsf_f32, float32, float32, chs)
 +DO_RSQRTSF(rsqrtsf_f64, float64, float64, chs)
 +DO_RSQRTSF(rsqrtsf_ah_f16, uint32_t, float16, ah_chs)
 +DO_RSQRTSF(rsqrtsf_ah_f32, float32, float32, ah_chs)
 +DO_RSQRTSF(rsqrtsf_ah_f64, float64, float64, ah_chs)
  /* Floating-point reciprocal exponent - see FPRecpX in ARM ARM */
  uint32_t HELPER(frecpx_f16)(uint32_t a, float_status *fpst)
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                         FPST_A64_F16 : FPST_A64);
  }
 -static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
 -                             int mergereg)
 +static bool do_fp3_scalar_ah_2fn(DisasContext *s, arg_rrr_e *a,
 +                                 const FPScalar *fnormal, const FPScalar *fah,
 +                                 int mergereg)
  {
 -    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
 -                                       select_ah_fpst(s, a->esz));
 +    return do_fp3_scalar_with_fpsttype(s, a, s->fpcr_ah ? fah : fnormal,
 +                                       mergereg, select_ah_fpst(s, a->esz));
  }
  /* Some insns need to call different helpers when FPCR.AH == 1 */
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
      gen_helper_recpsf_f32,
      gen_helper_recpsf_f64,
  };
 -TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
 +static const FPScalar f_scalar_ah_frecps = {
 +    gen_helper_recpsf_ah_f16,
 +    gen_helper_recpsf_ah_f32,
 +    gen_helper_recpsf_ah_f64,
 +};
 +TRANS(FRECPS_s, do_fp3_scalar_ah_2fn, a,
 +      &f_scalar_frecps, &f_scalar_ah_frecps, a->rn)
  static const FPScalar f_scalar_frsqrts = {
      gen_helper_rsqrtsf_f16,
      gen_helper_rsqrtsf_f32,
      gen_helper_rsqrtsf_f64,
  };
 -TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
 +static const FPScalar f_scalar_ah_frsqrts = {
 +    gen_helper_rsqrtsf_ah_f16,
 +    gen_helper_rsqrtsf_ah_f32,
 +    gen_helper_rsqrtsf_ah_f64,
 +};
 +TRANS(FRSQRTS_s, do_fp3_scalar_ah_2fn, a,
 +      &f_scalar_frsqrts, &f_scalar_ah_frsqrts, a->rn)
  static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                         const FPScalar *f, bool swap)
 --
-.25.1
+.34.1

-[PULL 20/55] target/arm: Move get_level1_table_address to ptw.c
+[PULL 39/68] target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
-From: Richard Henderson <richard.henderson@linaro.org>
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics
 in the vector versions of FRECPS and FRSQRTS, by implementing
 new vector wrappers that call the _ah_ scalar helpers.
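
 As an illustration only (not part of the patch), the "don't negate the
 sign of a NaN" rule can be sketched on raw IEEE binary16 bit patterns.
 The names f16_is_any_nan, f16_chs and f16_ah_chs are hypothetical
 stand-ins for the softfloat-based helpers, and the sketch assumes the
 FPCR.AH == 1 variant is always the one being called:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins operating on raw binary16 bit patterns:
 * 1 sign bit, 5 exponent bits, 10 fraction bits.
 */
static bool f16_is_any_nan(uint16_t a)
{
    /* NaN: exponent all ones and a non-zero fraction. */
    return (a & 0x7c00) == 0x7c00 && (a & 0x03ff) != 0;
}

/* Plain sign change: flip the sign bit unconditionally. */
static uint16_t f16_chs(uint16_t a)
{
    return a ^ 0x8000;
}

/* AH-style sign change: leave NaNs untouched, negate everything else. */
static uint16_t f16_ah_chs(uint16_t a)
{
    return f16_is_any_nan(a) ? a : f16_chs(a);
}
```

 With these definitions the two variants differ only for NaN inputs:
 infinities, zeroes and ordinary numbers are negated by both.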
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-14-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |  4 ++--
+ target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
- target/arm/helper.c | 26 +-------------------------
+ target/arm/tcg/translate-a64.c | 21 ++++++++++++++++-----
- target/arm/ptw.c    | 23 +++++++++++++++++++++++
+ target/arm/tcg/translate-sve.c |  7 ++++++-
-files changed, 26 insertions(+), 27 deletions(-)
+ target/arm/tcg/vec_helper.c    |  8 ++++++++
 files changed, 44 insertions(+), 6 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/helper-sve.h
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/helper-sve.h
-@@ -XXX,XX +XXX,XX @@ uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
- bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
+                    void, ptr, ptr, ptr, fpst, i32)
- bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
-+uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
++DEF_HELPER_FLAGS_5(gvec_ah_recps_h, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_recps_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_5(gvec_ah_recps_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, fpst, i32)
 +
- ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
++DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_h, TCG_CALL_NO_RWG,
-                                  ARMCacheAttrs s1, ARMCacheAttrs s2);
++                   void, ptr, ptr, ptr, fpst, i32)
++DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_s, TCG_CALL_NO_RWG,
--bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
++                   void, ptr, ptr, ptr, fpst, i32)
--                              uint32_t *table, uint32_t address);
++DEF_HELPER_FLAGS_5(gvec_ah_rsqrts_d, TCG_CALL_NO_RWG,
- int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
++                   void, ptr, ptr, ptr, fpst, i32)
-                   int ap, int domain_prot);
++
- int simple_ap_to_rw_prot_is_user(int ap, bool is_user);
+ DEF_HELPER_FLAGS_5(gvec_ah_fmax_h, TCG_CALL_NO_RWG,
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+                    void, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_5(gvec_ah_fmax_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static inline bool regime_translation_big_endian(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_2fn(DisasContext *s, arg_qrrr_e *a, int data,
      return do_fp3_vector(s, a, data, s->fpcr_ah ? fah : fnormal);
  }
- /* Return the TTBR associated with this translation regime */
+-static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
--static inline uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx,
+-                             gen_helper_gvec_3_ptr * const f[3])
--                                   int ttbrn)
++static bool do_fp3_vector_ah_2fn(DisasContext *s, arg_qrrr_e *a, int data,
-+uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
++                                 gen_helper_gvec_3_ptr * const fnormal[3],
 +                                 gen_helper_gvec_3_ptr * const fah[3])
  {
-     if (mmu_idx == ARMMMUIdx_Stage2) {
+-    return do_fp3_vector_with_fpsttype(s, a, data, f,
-         return env->cp15.vttbr_el2;
++    return do_fp3_vector_with_fpsttype(s, a, data, s->fpcr_ah ? fah : fnormal,
-@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
+                                        select_ah_fpst(s, a->esz));
      return prot_rw | PAGE_EXEC;
  }
--bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
--                              uint32_t *table, uint32_t address)
+     gen_helper_gvec_recps_s,
--{
+     gen_helper_gvec_recps_d,
--    /* Note that we can only get here for an AArch32 PL0/PL1 lookup */
+ };
--    TCR *tcr = regime_tcr(env, mmu_idx);
+-TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
--
++static gen_helper_gvec_3_ptr * const f_vector_ah_frecps[3] = {
--    if (address & tcr->mask) {
++    gen_helper_gvec_ah_recps_h,
--        if (tcr->raw_tcr & TTBCR_PD1) {
++    gen_helper_gvec_ah_recps_s,
--            /* Translation table walk disabled for TTBR1 */
++    gen_helper_gvec_ah_recps_d,
--            return false;
++};
--        }
++TRANS(FRECPS_v, do_fp3_vector_ah_2fn, a, 0, f_vector_frecps, f_vector_ah_frecps)
--        *table = regime_ttbr(env, mmu_idx, 1) & 0xffffc000;
--    } else {
+ static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
--        if (tcr->raw_tcr & TTBCR_PD0) {
+     gen_helper_gvec_rsqrts_h,
--            /* Translation table walk disabled for TTBR0 */
+     gen_helper_gvec_rsqrts_s,
--            return false;
+     gen_helper_gvec_rsqrts_d,
--        }
+ };
--        *table = regime_ttbr(env, mmu_idx, 0) & tcr->base_mask;
+-TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
--    }
++static gen_helper_gvec_3_ptr * const f_vector_ah_frsqrts[3] = {
--    *table |= (address >> 18) & 0x3ffc;
++    gen_helper_gvec_ah_rsqrts_h,
--    return true;
++    gen_helper_gvec_ah_rsqrts_s,
--}
++    gen_helper_gvec_ah_rsqrts_d,
--
++};
- static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
++TRANS(FRSQRTS_v, do_fp3_vector_ah_2fn, a, 0, f_vector_frsqrts, f_vector_ah_frsqrts)
- {
-     /*
+ static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+     gen_helper_gvec_faddp_h,
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/tcg/translate-sve.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/tcg/translate-sve.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
- #include "ptw.h"
+         NULL, gen_helper_gvec_##name##_h,                           \
+         gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
+     };                                                              \
-+static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
+-    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
-+                                     uint32_t *table, uint32_t address)
++    static gen_helper_gvec_3_ptr * const name##_ah_fns[4] = {       \
-+{
++        NULL, gen_helper_gvec_ah_##name##_h,                        \
-+    /* Note that we can only get here for an AArch32 PL0/PL1 lookup */
++        gen_helper_gvec_ah_##name##_s, gen_helper_gvec_ah_##name##_d    \
-+    TCR *tcr = regime_tcr(env, mmu_idx);
++    };                                                              \
 +    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz,            \
 +               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], a, 0)
  DO_FP3(FADD_zzz, fadd)
  DO_FP3(FSUB_zzz, fsub)
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
  DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
  DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
 +DO_3OP(gvec_ah_recps_h, helper_recpsf_ah_f16, float16)
 +DO_3OP(gvec_ah_recps_s, helper_recpsf_ah_f32, float32)
 +DO_3OP(gvec_ah_recps_d, helper_recpsf_ah_f64, float64)
 +
-+    if (address & tcr->mask) {
++DO_3OP(gvec_ah_rsqrts_h, helper_rsqrtsf_ah_f16, float16)
-+        if (tcr->raw_tcr & TTBCR_PD1) {
++DO_3OP(gvec_ah_rsqrts_s, helper_rsqrtsf_ah_f32, float32)
-+            /* Translation table walk disabled for TTBR1 */
++DO_3OP(gvec_ah_rsqrts_d, helper_rsqrtsf_ah_f64, float64)
 +            return false;
 +        }
 +        *table = regime_ttbr(env, mmu_idx, 1) & 0xffffc000;
 +    } else {
 +        if (tcr->raw_tcr & TTBCR_PD0) {
 +            /* Translation table walk disabled for TTBR0 */
 +            return false;
 +        }
 +        *table = regime_ttbr(env, mmu_idx, 0) & tcr->base_mask;
 +    }
 +    *table |= (address >> 18) & 0x3ffc;
 +    return true;
 +}
 +
- static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
+ DO_3OP(gvec_ah_fmax_h, helper_vfp_ah_maxh, float16)
-                              MMUAccessType access_type, ARMMMUIdx mmu_idx,
+ DO_3OP(gvec_ah_fmax_s, helper_vfp_ah_maxs, float32)
-                              hwaddr *phys_ptr, int *prot,
+ DO_3OP(gvec_ah_fmax_d, helper_vfp_ah_maxd, float64)
 --
-.25.1
+.34.1

-[PULL 49/55] target/arm: Export sve contiguous ldst support functions
+[PULL 40/68] target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
-From: Richard Henderson <richard.henderson@linaro.org>
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics in FMLS
 (indexed). We do this by creating 6 new helpers, which allow us to
 do the negation either by XOR (for AH=0) or by muladd flags
 (for AH=1).
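
 As an illustration only (not part of the patch), the helper selection
 for the A64 indexed forms can be sketched with a table of names in
 place of the real gen_helper_* function pointers. pick_helper and
 helper_names are hypothetical; the sketch assumes esz values 1, 2
 and 3 select the h, s and d helpers respectively:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Rows: 0 = FMLA, 1 = FMLS with FPCR.AH == 0, 2 = FMLS with FPCR.AH == 1.
 * Columns: element size (h, s, d).
 * Strings stand in for the real helper function pointers.
 */
static const char *const helper_names[3][3] = {
    { "gvec_fmla_idx_h",    "gvec_fmla_idx_s",    "gvec_fmla_idx_d"    },
    { "gvec_fmls_idx_h",    "gvec_fmls_idx_s",    "gvec_fmls_idx_d"    },
    { "gvec_ah_fmls_idx_h", "gvec_ah_fmls_idx_s", "gvec_ah_fmls_idx_d" },
};

/* Hypothetical selector: the operation row comes first, then the size. */
static const char *pick_helper(bool neg, bool fpcr_ah, int esz)
{
    assert(esz >= 1 && esz <= 3);
    /* FPCR.AH only matters when a negation is requested. */
    return helper_names[neg ? 1 + fpcr_ah : 0][esz - 1];
}
```

 Indexing by operation first and element size second matters here:
 swapping the two subscripts would still compile for a 3x3 table but
 would pick the wrong helper for most combinations.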
-Export all of the support functions for performing bulk
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-fault analysis on a set of elements at contiguous addresses
+[PMM: Mostly from RTH's patch; error in index order into fns[][]
-controlled by a predicate.
+ fixed]
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/helper.h            | 14 ++++++++++++++
  target/arm/tcg/translate-a64.c | 17 +++++++++++------
  target/arm/tcg/translate-sve.c | 31 +++++++++++++++++--------------
  target/arm/tcg/vec_helper.c    | 24 +++++++++++++++---------
 files changed, 57 insertions(+), 29 deletions(-)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20220607203306.657998-15-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/sve_ldst_internal.h | 94 ++++++++++++++++++++++++++++++++++
  target/arm/sve_helper.c        | 87 ++++++-------------------------
 files changed, 111 insertions(+), 70 deletions(-)
 diff --git a/target/arm/sve_ldst_internal.h b/target/arm/sve_ldst_internal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/sve_ldst_internal.h
+--- a/target/arm/helper.h
-+++ b/target/arm/sve_ldst_internal.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
- #undef DO_LD_PRIM_2
+ DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
- #undef DO_ST_PRIM_2
+                    void, ptr, ptr, ptr, ptr, fpst, i32)
-+/*
++DEF_HELPER_FLAGS_6(gvec_fmls_idx_h, TCG_CALL_NO_RWG,
-+ * Resolve the guest virtual address to info->host and info->flags.
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
-+ * If @nofault, return false if the page is invalid, otherwise
++DEF_HELPER_FLAGS_6(gvec_fmls_idx_s, TCG_CALL_NO_RWG,
-+ * exit via page fault exception.
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
-+ */
++DEF_HELPER_FLAGS_6(gvec_fmls_idx_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, fpst, i32)
 +
-+typedef struct {
++DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_h, TCG_CALL_NO_RWG,
-+    void *host;
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
-+    int flags;
++DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_s, TCG_CALL_NO_RWG,
-+    MemTxAttrs attrs;
++                   void, ptr, ptr, ptr, ptr, fpst, i32)
-+} SVEHostPage;
++DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, fpst, i32)
 +
-+bool sve_probe_page(SVEHostPage *info, bool nofault, CPUARMState *env,
+ DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG,
-+                    target_ulong addr, int mem_off, MMUAccessType access_type,
+                    void, ptr, ptr, ptr, ptr, i32)
-+                    int mmu_idx, uintptr_t retaddr);
+ DEF_HELPER_FLAGS_5(gvec_uqadd_h, TCG_CALL_NO_RWG,
-+
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 +/*
 + * Analyse contiguous data, protected by a governing predicate.
 + */
 +
 +typedef enum {
 +    FAULT_NO,
 +    FAULT_FIRST,
 +    FAULT_ALL,
 +} SVEContFault;
 +
 +typedef struct {
 +    /*
 +     * First and last element wholly contained within the two pages.
 +     * mem_off_first[0] and reg_off_first[0] are always set >= 0.
 +     * reg_off_last[0] may be < 0 if the first element crosses pages.
 +     * All of mem_off_first[1], reg_off_first[1] and reg_off_last[1]
 +     * are set >= 0 only if there are complete elements on a second page.
 +     *
 +     * The reg_off_* offsets are relative to the internal vector register.
 +     * The mem_off_first offset is relative to the memory address; the
 +     * two offsets are different when a load operation extends, a store
 +     * operation truncates, or for multi-register operations.
 +     */
 +    int16_t mem_off_first[2];
 +    int16_t reg_off_first[2];
 +    int16_t reg_off_last[2];
 +
 +    /*
 +     * One element that is misaligned and spans both pages,
 +     * or -1 if there is no such active element.
 +     */
 +    int16_t mem_off_split;
 +    int16_t reg_off_split;
 +
 +    /*
 +     * The byte offset at which the entire operation crosses a page boundary.
 +     * Set >= 0 if and only if the entire operation spans two pages.
 +     */
 +    int16_t page_split;
 +
 +    /* TLB data for the two pages. */
 +    SVEHostPage page[2];
 +} SVEContLdSt;
 +
 +/*
 + * Find first active element on each page, and a loose bound for the
 + * final element on each page.  Identify any single element that spans
 + * the page boundary.  Return true if there are any active elements.
 + */
 +bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr, uint64_t *vg,
 +                            intptr_t reg_max, int esz, int msize);
 +
 +/*
 + * Resolve the guest virtual addresses to info->page[].
 + * Control the generation of page faults with @fault.  Return false if
 + * there is no work to do, which can only happen with @fault == FAULT_NO.
 + */
 +bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
 +                         CPUARMState *env, target_ulong addr,
 +                         MMUAccessType access_type, uintptr_t retaddr);
 +
 +#ifdef CONFIG_USER_ONLY
 +static inline void
 +sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env, uint64_t *vg,
 +                          target_ulong addr, int esize, int msize,
 +                          int wp_access, uintptr_t retaddr)
 +{ }
 +#else
 +void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
 +                               uint64_t *vg, target_ulong addr,
 +                               int esize, int msize, int wp_access,
 +                               uintptr_t retaddr);
 +#endif
 +
 +void sve_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env, uint64_t *vg,
 +                             target_ulong addr, int esize, int msize,
 +                             uint32_t mtedesc, uintptr_t ra);
 +
  #endif /* TARGET_ARM_SVE_LDST_INTERNAL_H */
 diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/sve_helper.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/sve_helper.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static intptr_t find_next_active(uint64_t *vg, intptr_t reg_off,
+@@ -XXX,XX +XXX,XX @@ TRANS(FMULX_vi, do_fp3_vector_idx, a, f_vector_idx_fmulx)
-  * exit via page fault exception.
-  */
+ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
 -typedef struct {
 -    void *host;
 -    int flags;
 -    MemTxAttrs attrs;
 -} SVEHostPage;
 -
 -static bool sve_probe_page(SVEHostPage *info, bool nofault,
 -                           CPUARMState *env, target_ulong addr,
 -                           int mem_off, MMUAccessType access_type,
 -                           int mmu_idx, uintptr_t retaddr)
 +bool sve_probe_page(SVEHostPage *info, bool nofault, CPUARMState *env,
 +                    target_ulong addr, int mem_off, MMUAccessType access_type,
 +                    int mmu_idx, uintptr_t retaddr)
  {
-     int flags;
+-    static gen_helper_gvec_4_ptr * const fns[3] = {
+-        gen_helper_gvec_fmla_idx_h,
-@@ -XXX,XX +XXX,XX @@ static bool sve_probe_page(SVEHostPage *info, bool nofault,
+-        gen_helper_gvec_fmla_idx_s,
 -        gen_helper_gvec_fmla_idx_d,
 +    static gen_helper_gvec_4_ptr * const fns[3][3] = {
 +        { gen_helper_gvec_fmla_idx_h,
 +          gen_helper_gvec_fmla_idx_s,
 +          gen_helper_gvec_fmla_idx_d },
 +        { gen_helper_gvec_fmls_idx_h,
 +          gen_helper_gvec_fmls_idx_s,
 +          gen_helper_gvec_fmls_idx_d },
 +        { gen_helper_gvec_ah_fmls_idx_h,
 +          gen_helper_gvec_ah_fmls_idx_s,
 +          gen_helper_gvec_ah_fmls_idx_d },
      };
      MemOp esz = a->esz;
      int check = fp_access_check_vector_hsd(s, a->q, esz);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
      gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                        esz == MO_16 ? FPST_A64_F16 : FPST_A64,
 -                      (a->idx << 1) | neg,
 -                      fns[esz - 1]);
 +                      a->idx, fns[neg ? 1 + s->fpcr_ah : 0][esz - 1]);
      return true;
  }
--
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
--/*
+index XXXXXXX..XXXXXXX 100644
-- * Analyse contiguous data, protected by a governing predicate.
+--- a/target/arm/tcg/translate-sve.c
-- */
++++ b/target/arm/tcg/translate-sve.c
--
+@@ -XXX,XX +XXX,XX @@ DO_SVE2_RRXR_ROT(CDOT_zzxw_d, gen_helper_sve2_cdot_idx_d)
--typedef enum {
+  *** SVE Floating Point Multiply-Add Indexed Group
--    FAULT_NO,
+  */
--    FAULT_FIRST,
--    FAULT_ALL,
+-static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub)
--} SVEContFault;
+-{
--
+-    static gen_helper_gvec_4_ptr * const fns[4] = {
--typedef struct {
+-        NULL,
--    /*
+-        gen_helper_gvec_fmla_idx_h,
--     * First and last element wholly contained within the two pages.
+-        gen_helper_gvec_fmla_idx_s,
--     * mem_off_first[0] and reg_off_first[0] are always set >= 0.
+-        gen_helper_gvec_fmla_idx_d,
--     * reg_off_last[0] may be < 0 if the first element crosses pages.
+-    };
--     * All of mem_off_first[1], reg_off_first[1] and reg_off_last[1]
+-    return gen_gvec_fpst_zzzz(s, fns[a->esz], a->rd, a->rn, a->rm, a->ra,
--     * are set >= 0 only if there are complete elements on a second page.
+-                              (a->index << 1) | sub,
--     *
+-                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
--     * The reg_off_* offsets are relative to the internal vector register.
+-}
--     * The mem_off_first offset is relative to the memory address; the
++static gen_helper_gvec_4_ptr * const fmla_idx_fns[4] = {
--     * two offsets are different when a load operation extends, a store
++    NULL,                       gen_helper_gvec_fmla_idx_h,
--     * operation truncates, or for multi-register operations.
++    gen_helper_gvec_fmla_idx_s, gen_helper_gvec_fmla_idx_d
--     */
++};
--    int16_t mem_off_first[2];
++TRANS_FEAT(FMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
--    int16_t reg_off_first[2];
++           fmla_idx_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->index,
--    int16_t reg_off_last[2];
++           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
--
--    /*
+-TRANS_FEAT(FMLA_zzxz, aa64_sve, do_FMLA_zzxz, a, false)
--     * One element that is misaligned and spans both pages,
+-TRANS_FEAT(FMLS_zzxz, aa64_sve, do_FMLA_zzxz, a, true)
--     * or -1 if there is no such active element.
++static gen_helper_gvec_4_ptr * const fmls_idx_fns[4][2] = {
--     */
++    { NULL, NULL },
--    int16_t mem_off_split;
++    { gen_helper_gvec_fmls_idx_h, gen_helper_gvec_ah_fmls_idx_h },
--    int16_t reg_off_split;
++    { gen_helper_gvec_fmls_idx_s, gen_helper_gvec_ah_fmls_idx_s },
--
++    { gen_helper_gvec_fmls_idx_d, gen_helper_gvec_ah_fmls_idx_d },
--    /*
++};
--     * The byte offset at which the entire operation crosses a page boundary.
++TRANS_FEAT(FMLS_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
--     * Set >= 0 if and only if the entire operation spans two pages.
++           fmls_idx_fns[a->esz][s->fpcr_ah],
--     */
++           a->rd, a->rn, a->rm, a->ra, a->index,
--    int16_t page_split;
++           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
--
 -    /* TLB data for the two pages. */
 -    SVEHostPage page[2];
 -} SVEContLdSt;
 -
  /*
-  * Find first active element on each page, and a loose bound for the
+  *** SVE Floating Point Multiply Indexed Group
-  * final element on each page.  Identify any single element that spans
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
-  * the page boundary.  Return true if there are any active elements.
+index XXXXXXX..XXXXXXX 100644
-  */
+--- a/target/arm/tcg/vec_helper.c
--static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
++++ b/target/arm/tcg/vec_helper.c
--                                   uint64_t *vg, intptr_t reg_max,
+@@ -XXX,XX +XXX,XX @@ DO_FMUL_IDX(gvec_fmls_nf_idx_s, float32_sub, float32_mul, float32, H4)
--                                   int esz, int msize)
-+bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr, uint64_t *vg,
+ #undef DO_FMUL_IDX
-+                            intptr_t reg_max, int esz, int msize)
- {
+-#define DO_FMLA_IDX(NAME, TYPE, H)                                         \
-     const int esize = 1 << esz;
++#define DO_FMLA_IDX(NAME, TYPE, H, NEGX, NEGF)                             \
-     const uint64_t pg_mask = pred_esz_masks[esz];
+ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
-@@ -XXX,XX +XXX,XX @@ static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
+                   float_status *stat, uint32_t desc)                       \
-  * Control the generation of page faults with @fault.  Return false if
+ {                                                                          \
-  * there is no work to do, which can only happen with @fault == FAULT_NO.
+     intptr_t i, j, oprsz = simd_oprsz(desc);                               \
-  */
+     intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
--static bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
+-    TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
--                                CPUARMState *env, target_ulong addr,
+-    intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
--                                MMUAccessType access_type, uintptr_t retaddr)
++    intptr_t idx = simd_data(desc);                                        \
-+bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
+     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
-+                         CPUARMState *env, target_ulong addr,
+-    op1_neg <<= (8 * sizeof(TYPE) - 1);                                    \
-+                         MMUAccessType access_type, uintptr_t retaddr)
+     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
- {
+         TYPE mm = m[H(i + idx)];                                           \
-     int mmu_idx = cpu_mmu_index(env, false);
+         for (j = 0; j < segment; j++) {                                    \
-     int mem_off = info->mem_off_first[0];
+-            d[i + j] = TYPE##_muladd(n[i + j] ^ op1_neg,                   \
-@@ -XXX,XX +XXX,XX @@ static bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
+-                                     mm, a[i + j], 0, stat);               \
-     return have_work;
++            d[i + j] = TYPE##_muladd(n[i + j] ^ NEGX, mm,                  \
 +                                     a[i + j], NEGF, stat);                \
          }                                                                  \
      }                                                                      \
      clear_tail(d, oprsz, simd_maxsz(desc));                                \
  }
--static void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
+-DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2)
--                                      uint64_t *vg, target_ulong addr,
+-DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
--                                      int esize, int msize, int wp_access,
+-DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8)
--                                      uintptr_t retaddr)
++DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2, 0, 0)
--{
++DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4, 0, 0)
- #ifndef CONFIG_USER_ONLY
++DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8, 0, 0)
-+void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
++
-+                               uint64_t *vg, target_ulong addr,
++DO_FMLA_IDX(gvec_fmls_idx_h, float16, H2, INT16_MIN, 0)
-+                               int esize, int msize, int wp_access,
++DO_FMLA_IDX(gvec_fmls_idx_s, float32, H4, INT32_MIN, 0)
-+                               uintptr_t retaddr)
++DO_FMLA_IDX(gvec_fmls_idx_d, float64, H8, INT64_MIN, 0)
-+{
++
-     intptr_t mem_off, reg_off, reg_last;
++DO_FMLA_IDX(gvec_ah_fmls_idx_h, float16, H2, 0, float_muladd_negate_product)
-     int flags0 = info->page[0].flags;
++DO_FMLA_IDX(gvec_ah_fmls_idx_s, float32, H4, 0, float_muladd_negate_product)
-     int flags1 = info->page[1].flags;
++DO_FMLA_IDX(gvec_ah_fmls_idx_d, float64, H8, 0, float_muladd_negate_product)
-@@ -XXX,XX +XXX,XX @@ static void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
-             } while (reg_off & 63);
+ #undef DO_FMLA_IDX
          } while (reg_off <= reg_last);
      }
 -#endif
  }
 +#endif
 -static void sve_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env,
 -                                    uint64_t *vg, target_ulong addr, int esize,
 -                                    int msize, uint32_t mtedesc, uintptr_t ra)
 +void sve_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env,
 +                             uint64_t *vg, target_ulong addr, int esize,
 +                             int msize, uint32_t mtedesc, uintptr_t ra)
  {
      intptr_t mem_off, reg_off, reg_last;
 --
-.25.1
+.34.1

-[PULL 23/55] target/arm: Move arm_{ldl,ldq}_ptw to ptw.c
+[PULL 41/68] target/arm: Handle FPCR.AH in negation in FMLS (vector)
-From: Richard Henderson <richard.henderson@linaro.org>
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics
 in FMLS (vector), by implementing a new set of helpers for
 the AH=1 case.
-Move the ptw load functions, plus 3 common subroutines:
+The float_muladd_negate_product flag produces the same result
-S1_ptw_translate, ptw_attrs_are_device, and regime_translation_big_endian.
+as negating either of the multiplication operands, assuming
-This also allows get_phys_addr_lpae to become static again.
+neither of the operands is a NaN.  But since FEAT_AFP does not
 negate NaNs, this behaviour is exactly what we need.
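
As an illustration of the distinction drawn above, here is a minimal sketch on raw float32 bit patterns. It is not QEMU code: negate_plain, negate_ah_model and f32_is_nan_bits are hypothetical names, and the "AH" function only models the "don't negate the sign of a NaN" rule that the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F32_SIGN 0x80000000u

/* True if the raw float32 pattern is a NaN: exponent all ones, fraction nonzero. */
static bool f32_is_nan_bits(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* Operand negation by XOR: flips the sign bit of every input, NaNs included. */
static uint32_t negate_plain(uint32_t bits)
{
    return bits ^ F32_SIGN;
}

/* Hypothetical model of the AH=1 rule: NaNs pass through with their sign intact. */
static uint32_t negate_ah_model(uint32_t bits)
{
    return f32_is_nan_bits(bits) ? bits : (bits ^ F32_SIGN);
}

/* Small usage example: the two forms differ only for NaN inputs. */
static void negate_demo(void)
{
    const uint32_t one = 0x3f800000u;   /* 1.0f */
    const uint32_t qnan = 0x7fc00000u;  /* default quiet NaN, sign clear */

    assert(negate_plain(one) == negate_ah_model(one));
    assert(negate_plain(qnan) != negate_ah_model(qnan));
}
```

For ordinary numbers and infinities both functions agree, which is why a flag that negates the product and skips NaN inputs can stand in for operand negation when AH=1.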
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-17-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/ptw.h    |  13 ----
+ target/arm/helper.h            |  4 ++++
- target/arm/helper.c | 141 --------------------------------------
+ target/arm/tcg/translate-a64.c |  7 ++++++-
- target/arm/ptw.c    | 160 ++++++++++++++++++++++++++++++++++++++++++--
+ target/arm/tcg/vec_helper.c    | 22 ++++++++++++++++++++++
-files changed, 154 insertions(+), 160 deletions(-)
+files changed, 32 insertions(+), 1 deletion(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/helper.h
-+++ b/target/arm/ptw.h
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
+ DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
- extern const uint8_t pamax_map[7];
+ DEF_HELPER_FLAGS_5(gvec_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
--uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
++DEF_HELPER_FLAGS_5(gvec_ah_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
--                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi);
++DEF_HELPER_FLAGS_5(gvec_ah_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
--uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
++DEF_HELPER_FLAGS_5(gvec_ah_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
--                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi);
++
--
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
- bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
+                    void, ptr, ptr, ptr, fpst, i32)
- bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
+ DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
- uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0);
  int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
                 int ap, int ns, int xn, int pxn);
 -bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 -                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                        bool s1_is_el0,
 -                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
 -                        target_ulong *page_size_ptr,
 -                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 -    __attribute__((nonnull));
 -
  #endif /* !CONFIG_USER_ONLY */
  #endif /* TARGET_ARM_PTW_H */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmls[3] = {
-     return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
+     gen_helper_gvec_vfms_s,
      gen_helper_gvec_vfms_d,
  };
 -TRANS(FMLS_v, do_fp3_vector, a, 0, f_vector_fmls)
 +static gen_helper_gvec_3_ptr * const f_vector_fmls_ah[3] = {
 +    gen_helper_gvec_ah_vfms_h,
 +    gen_helper_gvec_ah_vfms_s,
 +    gen_helper_gvec_ah_vfms_d,
 +};
 +TRANS(FMLS_v, do_fp3_vector_2fn, a, 0, f_vector_fmls, f_vector_fmls_ah)
  static gen_helper_gvec_3_ptr * const f_vector_fcmeq[3] = {
      gen_helper_gvec_fceq_h,
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static float64 float64_mulsub_f(float64 dest, float64 op1, float64 op2,
      return float64_muladd(float64_chs(op1), op2, dest, 0, stat);
  }
--static inline bool regime_translation_big_endian(CPUARMState *env,
++static float16 float16_ah_mulsub_f(float16 dest, float16 op1, float16 op2,
--                                                 ARMMMUIdx mmu_idx)
++                                 float_status *stat)
 -{
 -    return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
 -}
 -
  /* Return the TTBR associated with this translation regime */
  uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
  {
@@ -XXX,XX +XXX,XX @@ int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
      return prot_rw | PAGE_EXEC;
  }
 -static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
 -{
 -    /*
 -     * For an S1 page table walk, the stage 1 attributes are always
 -     * some form of "this is Normal memory". The combined S1+S2
 -     * attributes are therefore only Device if stage 2 specifies Device.
 -     * With HCR_EL2.FWB == 0 this is when descriptor bits [5:4] are 0b00,
 -     * ie when cacheattrs.attrs bits [3:2] are 0b00.
 -     * With HCR_EL2.FWB == 1 this is when descriptor bit [4] is 0, ie
 -     * when cacheattrs.attrs bit [2] is 0.
 -     */
 -    assert(cacheattrs.is_s2_format);
 -    if (arm_hcr_el2_eff(env) & HCR_FWB) {
 -        return (cacheattrs.attrs & 0x4) == 0;
 -    } else {
 -        return (cacheattrs.attrs & 0xc) == 0;
 -    }
 -}
 -
 -/* Translate a S1 pagetable walk through S2 if needed.  */
 -static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 -                               hwaddr addr, bool *is_secure,
 -                               ARMMMUFaultInfo *fi)
 -{
 -    if (arm_mmu_idx_is_stage1_of_2(mmu_idx) &&
 -        !regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
 -        target_ulong s2size;
 -        hwaddr s2pa;
 -        int s2prot;
 -        int ret;
 -        ARMMMUIdx s2_mmu_idx = *is_secure ? ARMMMUIdx_Stage2_S
 -                                          : ARMMMUIdx_Stage2;
 -        ARMCacheAttrs cacheattrs = {};
 -        MemTxAttrs txattrs = {};
 -
 -        ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx, false,
 -                                 &s2pa, &txattrs, &s2prot, &s2size, fi,
 -                                 &cacheattrs);
 -        if (ret) {
 -            assert(fi->type != ARMFault_None);
 -            fi->s2addr = addr;
 -            fi->stage2 = true;
 -            fi->s1ptw = true;
 -            fi->s1ns = !*is_secure;
 -            return ~0;
 -        }
 -        if ((arm_hcr_el2_eff(env) & HCR_PTW) &&
 -            ptw_attrs_are_device(env, cacheattrs)) {
 -            /*
 -             * PTW set and S1 walk touched S2 Device memory:
 -             * generate Permission fault.
 -             */
 -            fi->type = ARMFault_Permission;
 -            fi->s2addr = addr;
 -            fi->stage2 = true;
 -            fi->s1ptw = true;
 -            fi->s1ns = !*is_secure;
 -            return ~0;
 -        }
 -
 -        if (arm_is_secure_below_el3(env)) {
 -            /* Check if page table walk is to secure or non-secure PA space. */
 -            if (*is_secure) {
 -                *is_secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
 -            } else {
 -                *is_secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
 -            }
 -        } else {
 -            assert(!*is_secure);
 -        }
 -
 -        addr = s2pa;
 -    }
 -    return addr;
 -}
 -
 -/* All loads done in the course of a page table walk go through here. */
 -uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
 -                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
 -{
 -    ARMCPU *cpu = ARM_CPU(cs);
 -    CPUARMState *env = &cpu->env;
 -    MemTxAttrs attrs = {};
 -    MemTxResult result = MEMTX_OK;
 -    AddressSpace *as;
 -    uint32_t data;
 -
 -    addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
 -    attrs.secure = is_secure;
 -    as = arm_addressspace(cs, attrs);
 -    if (fi->s1ptw) {
 -        return 0;
 -    }
 -    if (regime_translation_big_endian(env, mmu_idx)) {
 -        data = address_space_ldl_be(as, addr, attrs, &result);
 -    } else {
 -        data = address_space_ldl_le(as, addr, attrs, &result);
 -    }
 -    if (result == MEMTX_OK) {
 -        return data;
 -    }
 -    fi->type = ARMFault_SyncExternalOnWalk;
 -    fi->ea = arm_extabort_type(result);
 -    return 0;
 -}
 -
 -uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
 -                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
 -{
 -    ARMCPU *cpu = ARM_CPU(cs);
 -    CPUARMState *env = &cpu->env;
 -    MemTxAttrs attrs = {};
 -    MemTxResult result = MEMTX_OK;
 -    AddressSpace *as;
 -    uint64_t data;
 -
 -    addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
 -    attrs.secure = is_secure;
 -    as = arm_addressspace(cs, attrs);
 -    if (fi->s1ptw) {
 -        return 0;
 -    }
 -    if (regime_translation_big_endian(env, mmu_idx)) {
 -        data = address_space_ldq_be(as, addr, attrs, &result);
 -    } else {
 -        data = address_space_ldq_le(as, addr, attrs, &result);
 -    }
 -    if (result == MEMTX_OK) {
 -        return data;
 -    }
 -    fi->type = ARMFault_SyncExternalOnWalk;
 -    fi->ea = arm_extabort_type(result);
 -    return 0;
 -}
 -
  /*
   * check_s2_mmu_setup
   * @cpu:        ARMCPU
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
  #include "ptw.h"
 +static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 +                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                               bool s1_is_el0, hwaddr *phys_ptr,
 +                               MemTxAttrs *txattrs, int *prot,
 +                               target_ulong *page_size_ptr,
 +                               ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 +    __attribute__((nonnull));
 +
 +static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
 +{
-+    return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
++    return float16_muladd(op1, op2, dest, float_muladd_negate_product, stat);
 +}
 +
-+static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
++static float32 float32_ah_mulsub_f(float32 dest, float32 op1, float32 op2,
 +                                 float_status *stat)
 +{
-+    /*
++    return float32_muladd(op1, op2, dest, float_muladd_negate_product, stat);
 +     * For an S1 page table walk, the stage 1 attributes are always
 +     * some form of "this is Normal memory". The combined S1+S2
 +     * attributes are therefore only Device if stage 2 specifies Device.
 +     * With HCR_EL2.FWB == 0 this is when descriptor bits [5:4] are 0b00,
 +     * ie when cacheattrs.attrs bits [3:2] are 0b00.
 +     * With HCR_EL2.FWB == 1 this is when descriptor bit [4] is 0, ie
 +     * when cacheattrs.attrs bit [2] is 0.
 +     */
 +    assert(cacheattrs.is_s2_format);
 +    if (arm_hcr_el2_eff(env) & HCR_FWB) {
 +        return (cacheattrs.attrs & 0x4) == 0;
 +    } else {
 +        return (cacheattrs.attrs & 0xc) == 0;
 +    }
 +}
 +
-+/* Translate a S1 pagetable walk through S2 if needed.  */
++static float64 float64_ah_mulsub_f(float64 dest, float64 op1, float64 op2,
-+static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
++                                 float_status *stat)
 +                               hwaddr addr, bool *is_secure,
 +                               ARMMMUFaultInfo *fi)
 +{
-+    if (arm_mmu_idx_is_stage1_of_2(mmu_idx) &&
++    return float64_muladd(op1, op2, dest, float_muladd_negate_product, stat);
 +        !regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
 +        target_ulong s2size;
 +        hwaddr s2pa;
 +        int s2prot;
 +        int ret;
 +        ARMMMUIdx s2_mmu_idx = *is_secure ? ARMMMUIdx_Stage2_S
 +                                          : ARMMMUIdx_Stage2;
 +        ARMCacheAttrs cacheattrs = {};
 +        MemTxAttrs txattrs = {};
 +
 +        ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx, false,
 +                                 &s2pa, &txattrs, &s2prot, &s2size, fi,
 +                                 &cacheattrs);
 +        if (ret) {
 +            assert(fi->type != ARMFault_None);
 +            fi->s2addr = addr;
 +            fi->stage2 = true;
 +            fi->s1ptw = true;
 +            fi->s1ns = !*is_secure;
 +            return ~0;
 +        }
 +        if ((arm_hcr_el2_eff(env) & HCR_PTW) &&
 +            ptw_attrs_are_device(env, cacheattrs)) {
 +            /*
 +             * PTW set and S1 walk touched S2 Device memory:
 +             * generate Permission fault.
 +             */
 +            fi->type = ARMFault_Permission;
 +            fi->s2addr = addr;
 +            fi->stage2 = true;
 +            fi->s1ptw = true;
 +            fi->s1ns = !*is_secure;
 +            return ~0;
 +        }
 +
 +        if (arm_is_secure_below_el3(env)) {
 +            /* Check if page table walk is to secure or non-secure PA space. */
 +            if (*is_secure) {
 +                *is_secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
 +            } else {
 +                *is_secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
 +            }
 +        } else {
 +            assert(!*is_secure);
 +        }
 +
 +        addr = s2pa;
 +    }
 +    return addr;
 +}
 +
-+/* All loads done in the course of a page table walk go through here. */
+ #define DO_MULADD(NAME, FUNC, TYPE)                                        \
-+static uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+ void HELPER(NAME)(void *vd, void *vn, void *vm,                            \
-+                            ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+                   float_status *stat, uint32_t desc)                       \
-+{
+@@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_vfms_h, float16_mulsub_f, float16)
-+    ARMCPU *cpu = ARM_CPU(cs);
+ DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
-+    CPUARMState *env = &cpu->env;
+ DO_MULADD(gvec_vfms_d, float64_mulsub_f, float64)
-+    MemTxAttrs attrs = {};
-+    MemTxResult result = MEMTX_OK;
++DO_MULADD(gvec_ah_vfms_h, float16_ah_mulsub_f, float16)
-+    AddressSpace *as;
++DO_MULADD(gvec_ah_vfms_s, float32_ah_mulsub_f, float32)
-+    uint32_t data;
++DO_MULADD(gvec_ah_vfms_d, float64_ah_mulsub_f, float64)
 +
-+    addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
+ /* For the indexed ops, SVE applies the index per 128-bit vector segment.
-+    attrs.secure = is_secure;
+  * For AdvSIMD, there is of course only one such vector segment.
 +    as = arm_addressspace(cs, attrs);
 +    if (fi->s1ptw) {
 +        return 0;
 +    }
 +    if (regime_translation_big_endian(env, mmu_idx)) {
 +        data = address_space_ldl_be(as, addr, attrs, &result);
 +    } else {
 +        data = address_space_ldl_le(as, addr, attrs, &result);
 +    }
 +    if (result == MEMTX_OK) {
 +        return data;
 +    }
 +    fi->type = ARMFault_SyncExternalOnWalk;
 +    fi->ea = arm_extabort_type(result);
 +    return 0;
 +}
 +
 +static uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
 +                            ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
 +{
 +    ARMCPU *cpu = ARM_CPU(cs);
 +    CPUARMState *env = &cpu->env;
 +    MemTxAttrs attrs = {};
 +    MemTxResult result = MEMTX_OK;
 +    AddressSpace *as;
 +    uint64_t data;
 +
 +    addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
 +    attrs.secure = is_secure;
 +    as = arm_addressspace(cs, attrs);
 +    if (fi->s1ptw) {
 +        return 0;
 +    }
 +    if (regime_translation_big_endian(env, mmu_idx)) {
 +        data = address_space_ldq_be(as, addr, attrs, &result);
 +    } else {
 +        data = address_space_ldq_le(as, addr, attrs, &result);
 +    }
 +    if (result == MEMTX_OK) {
 +        return data;
 +    }
 +    fi->type = ARMFault_SyncExternalOnWalk;
 +    fi->ea = arm_extabort_type(result);
 +    return 0;
 +}
 +
  static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
                                       uint32_t *table, uint32_t address)
  {
@@ -XXX,XX +XXX,XX @@ do_fault:
   * @fi: set to fault info if the translation fails
   * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
   */
--bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
--                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
--                        bool s1_is_el0,
--                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
--                        target_ulong *page_size_ptr,
--                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
-+static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-+                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
-+                               bool s1_is_el0, hwaddr *phys_ptr,
-+                               MemTxAttrs *txattrs, int *prot,
-+                               target_ulong *page_size_ptr,
-+                               ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
- {
-     ARMCPU *cpu = env_archcpu(env);
-     CPUState *cs = CPU(cpu);
 --
-.25.1
+.34.1

-[PULL 42/55] target/arm: Use el_is_in_host for sve_exception_el
+[PULL 42/68] target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
-From: Richard Henderson <richard.henderson@linaro.org>
+Handle the FPCR.AH "don't negate the sign of a NaN" semantics for the
 SVE FMLS (vector) insns, by providing new helpers for the AH=1 case
 which end up passing fpcr_ah = true to the do_fmla_zpzzz_* functions
 that do the work.
-The ARM pseudocode function CheckNormalSVEEnabled uses this
+The float*_muladd functions have a flags argument that can
-predicate now, and I think it's a bit clearer.
+perform optional negation of various operands.  We don't use
 that for "normal" arm fmla, because the muladd flags are not
 applied when an input is a NaN.  But since FEAT_AFP does not
 negate NaNs, this behaviour is exactly what we need.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+The non-AH helpers pass in a zero flags argument and control the
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+negation via the neg1 and neg3 arguments; the AH helpers always pass
-Message-id: 20220607203306.657998-8-richard.henderson@linaro.org
+in neg1 and neg3 as zero and control the negation via the flags
 argument.  This allows us to avoid conditional branches within the
 inner loop.
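
The calling convention described above can be sketched as follows. This is a simplified model, not the patch's code: the MODEL_* flag values, muladd_model, fmla_worker, fmls_plain and fmls_ah are hypothetical names, the multiply-add is not fused, and NaN handling is reduced to "return the first NaN operand unchanged", which is only an approximation of what softfloat does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values for this sketch; not QEMU's definitions. */
enum { MODEL_NEGATE_C = 1, MODEL_NEGATE_PRODUCT = 2 };

static int is_nan_bits(uint32_t b) { return (b & 0x7fffffffu) > 0x7f800000u; }
static float to_f32(uint32_t b) { float f; memcpy(&f, &b, sizeof f); return f; }
static uint32_t to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }

/* Model muladd: negation flags are ignored when any input is a NaN. */
static uint32_t muladd_model(uint32_t a, uint32_t b, uint32_t c, int flags)
{
    if (is_nan_bits(a)) return a;
    if (is_nan_bits(b)) return b;
    if (is_nan_bits(c)) return c;
    float prod = to_f32(a) * to_f32(b);
    float addend = to_f32(c);
    if (flags & MODEL_NEGATE_PRODUCT) prod = -prod;
    if (flags & MODEL_NEGATE_C) addend = -addend;
    return to_bits(prod + addend);
}

/* One worker serves both modes; the loop body has no AH-dependent branch. */
static void fmla_worker(uint32_t *d, const uint32_t *n, const uint32_t *m,
                        const uint32_t *a, size_t count,
                        uint32_t neg1, uint32_t neg3, int flags)
{
    for (size_t i = 0; i < count; i++) {
        d[i] = muladd_model(n[i] ^ neg1, m[i], a[i] ^ neg3, flags);
    }
}

/* AH=0: negate via the XOR mask, flags zero. */
static void fmls_plain(uint32_t *d, const uint32_t *n, const uint32_t *m,
                       const uint32_t *a, size_t count)
{
    fmla_worker(d, n, m, a, count, 0x80000000u, 0, 0);
}

/* AH=1: masks zero, negate via the flags argument. */
static void fmls_ah(uint32_t *d, const uint32_t *n, const uint32_t *m,
                    const uint32_t *a, size_t count)
{
    fmla_worker(d, n, m, a, count, 0, 0, MODEL_NEGATE_PRODUCT);
}

/* Usage example: 10 - 2 * 3 gives the same answer in both modes. */
static void fmls_demo(void)
{
    const uint32_t n = 0x40000000u, m = 0x40400000u, a = 0x41200000u;
    uint32_t d0, d1;
    fmls_plain(&d0, &n, &m, &a, 1);
    fmls_ah(&d1, &n, &m, &a, 1);
    assert(d0 == d1);
}
```

The design point is that the mode is chosen once, by the wrapper, so the worker's loop stays identical for both cases; the two wrappers only differ in their results when an input is a NaN.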
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/helper.c | 5 ++---
+ target/arm/tcg/helper-sve.h    | 21 ++++++++
-file changed, 2 insertions(+), 3 deletions(-)
+ target/arm/tcg/sve_helper.c    | 99 +++++++++++++++++++++++++++-------
  target/arm/tcg/translate-sve.c | 18 ++++---
 files changed, 114 insertions(+), 24 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/helper-sve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/helper-sve.h
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo minimal_ras_reginfo[] = {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
- int sve_exception_el(CPUARMState *env, int el)
+ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
- {
+                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
- #ifndef CONFIG_USER_ONLY
--    uint64_t hcr_el2 = arm_hcr_el2_eff(env);
++DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_h, TCG_CALL_NO_RWG,
--
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
--    if (el <= 1 && (hcr_el2 & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
++DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_s, TCG_CALL_NO_RWG,
-+    if (el <= 1 && !el_is_in_host(env, el)) {
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-         switch (FIELD_EX64(env->cp15.cpacr_el1, CPACR_EL1, ZEN)) {
++DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_d, TCG_CALL_NO_RWG,
-         case 1:
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-             if (el != 0) {
++
-@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
++DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
-      * CPTR_EL2 changes format with HCR_EL2.E2H (regardless of TGE).
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-      */
++DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
-     if (el <= 2) {
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-+        uint64_t hcr_el2 = arm_hcr_el2_eff(env);
++DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
-         if (hcr_el2 & HCR_E2H) {
++                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
-             switch (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, ZEN)) {
++
-             case 1:
++DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 +
  DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG,
 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/sve_helper.c
 +++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
  static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                              float_status *status, uint32_t desc,
 -                            uint16_t neg1, uint16_t neg3)
 +                            uint16_t neg1, uint16_t neg3, int flags)
  {
      intptr_t i = simd_oprsz(desc);
      uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                  e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
                  e2 = *(uint16_t *)(vm + H1_2(i));
                  e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
 -                r = float16_muladd(e1, e2, e3, 0, status);
 +                r = float16_muladd(e1, e2, e3, flags, status);
                  *(uint16_t *)(vd + H1_2(i)) = r;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
  void HELPER(sve_fmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0);
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
  }
  void HELPER(sve_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0);
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0, 0);
  }
  void HELPER(sve_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000);
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000, 0);
  }
  void HELPER(sve_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000);
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000, 0);
 +}
 +
 +void HELPER(sve_ah_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
 +                              void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product);
 +}
 +
 +void HELPER(sve_ah_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product | float_muladd_negate_c);
 +}
 +
 +void HELPER(sve_ah_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_c);
  }
  static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                              float_status *status, uint32_t desc,
 -                            uint32_t neg1, uint32_t neg3)
 +                            uint32_t neg1, uint32_t neg3, int flags)
  {
      intptr_t i = simd_oprsz(desc);
      uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                  e1 = *(uint32_t *)(vn + H1_4(i)) ^ neg1;
                  e2 = *(uint32_t *)(vm + H1_4(i));
                  e3 = *(uint32_t *)(va + H1_4(i)) ^ neg3;
 -                r = float32_muladd(e1, e2, e3, 0, status);
 +                r = float32_muladd(e1, e2, e3, flags, status);
                  *(uint32_t *)(vd + H1_4(i)) = r;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
  void HELPER(sve_fmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
  }
  void HELPER(sve_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0, 0);
  }
  void HELPER(sve_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000, 0);
  }
  void HELPER(sve_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000);
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000, 0);
 +}
 +
 +void HELPER(sve_ah_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
 +                              void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product);
 +}
 +
 +void HELPER(sve_ah_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product | float_muladd_negate_c);
 +}
 +
 +void HELPER(sve_ah_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_c);
  }
  static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                              float_status *status, uint32_t desc,
 -                            uint64_t neg1, uint64_t neg3)
 +                            uint64_t neg1, uint64_t neg3, int flags)
  {
      intptr_t i = simd_oprsz(desc);
      uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                  e1 = *(uint64_t *)(vn + i) ^ neg1;
                  e2 = *(uint64_t *)(vm + i);
                  e3 = *(uint64_t *)(va + i) ^ neg3;
 -                r = float64_muladd(e1, e2, e3, 0, status);
 +                r = float64_muladd(e1, e2, e3, flags, status);
                  *(uint64_t *)(vd + i) = r;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
  void HELPER(sve_fmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
  }
  void HELPER(sve_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0, 0);
  }
  void HELPER(sve_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN, 0);
  }
  void HELPER(sve_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
 -    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN);
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN, 0);
 +}
 +
 +void HELPER(sve_ah_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
 +                              void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product);
 +}
 +
 +void HELPER(sve_ah_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_product | float_muladd_negate_c);
 +}
 +
 +void HELPER(sve_ah_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
 +                               void *vg, float_status *status, uint32_t desc)
 +{
 +    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
 +                    float_muladd_negate_c);
  }
  /* Two operand floating-point comparison controlled by a predicate.
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
             a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
             a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 -#define DO_FMLA(NAME, name) \
 +#define DO_FMLA(NAME, name, ah_name)                                    \
      static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
          NULL, gen_helper_sve_##name##_h,                                \
          gen_helper_sve_##name##_s, gen_helper_sve_##name##_d            \
      };                                                                  \
 -    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp, name##_fns[a->esz], \
 +    static gen_helper_gvec_5_ptr * const name##_ah_fns[4] = {           \
 +        NULL, gen_helper_sve_##ah_name##_h,                             \
 +        gen_helper_sve_##ah_name##_s, gen_helper_sve_##ah_name##_d      \
 +    };                                                                  \
 +    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp,                     \
 +               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], \
                 a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
                 a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 -DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
 -DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
 -DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz)
 -DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz)
 +/* We don't need an ah_fmla_zpzzz because fmla doesn't negate anything */
 +DO_FMLA(FMLA_zpzzz, fmla_zpzzz, fmla_zpzzz)
 +DO_FMLA(FMLS_zpzzz, fmls_zpzzz, ah_fmls_zpzzz)
 +DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz, ah_fnmla_zpzzz)
 +DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz, ah_fnmls_zpzzz)
  #undef DO_FMLA
 --
-.25.1
+.34.1
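
Note on the hunk above: the non-AH helpers negate by XORing the sign bit into an input (neg1/neg3), while the new sve_ah_* helpers pass float_muladd_negate_product / float_muladd_negate_c so the negation happens inside the muladd. The following is a minimal standalone sketch of why the two differ for NaN inputs. It is not QEMU's softfloat: toy_muladd(), TOY_NEGATE_PRODUCT, TOY_NEGATE_C, fmls_xor() and fmls_ah() are hypothetical names, NaN propagation is reduced to "return the first NaN operand unchanged", and the arithmetic is an unfused multiply then add.

```c
#include <assert.h>   /* for the checks in the usage note */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for float_muladd_negate_product / _negate_c. */
#define TOY_NEGATE_PRODUCT 1
#define TOY_NEGATE_C       2

#define SIGN64 0x8000000000000000ull

static double from_bits(uint64_t b) { double d; memcpy(&d, &b, sizeof d); return d; }
static uint64_t to_bits(double d) { uint64_t b; memcpy(&b, &d, sizeof b); return b; }

static bool is_nan_bits(uint64_t b)
{
    return (b & 0x7ff0000000000000ull) == 0x7ff0000000000000ull
        && (b & 0x000fffffffffffffull) != 0;
}

/*
 * Toy muladd: NaN detection comes first and returns the operand
 * bit-for-bit; the requested negations are applied only afterwards.
 * Unfused (a * b + c), so rounding is not that of a real fused
 * multiply-add; that does not matter for the sign question here.
 */
static uint64_t toy_muladd(uint64_t a, uint64_t b, uint64_t c, int flags)
{
    if (is_nan_bits(a)) return a;
    if (is_nan_bits(b)) return b;
    if (is_nan_bits(c)) return c;
    if (flags & TOY_NEGATE_PRODUCT) a ^= SIGN64;
    if (flags & TOY_NEGATE_C)       c ^= SIGN64;
    return to_bits(from_bits(a) * from_bits(b) + from_bits(c));
}

/* Non-AH style: flip the sign of the input before the muladd sees it. */
static uint64_t fmls_xor(uint64_t n, uint64_t m, uint64_t a)
{
    return toy_muladd(n ^ SIGN64, m, a, 0);
}

/* AH style: leave the input alone and ask the muladd to negate. */
static uint64_t fmls_ah(uint64_t n, uint64_t m, uint64_t a)
{
    return toy_muladd(n, m, a, TOY_NEGATE_PRODUCT);
}
```

For ordinary numbers both paths agree (10 - 2*3 gives 4 either way). For a NaN in the first operand, fmls_xor() hands back the NaN with its sign bit flipped, whereas fmls_ah() returns it untouched, which is the behaviour the AH helpers are after.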

-[PULL 51/55] target/arm: Use expand_pred_b in mve_helper.c
+[PULL 43/68] target/arm: Handle FPCR.AH in SVE FTSSEL
-From: Richard Henderson <richard.henderson@linaro.org>
+The negation step in the SVE FTSSEL insn mustn't negate a NaN when
 FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
 and use that to determine whether to do the negation.
-Use the function instead of the array directly.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/tcg/sve_helper.c    | 18 +++++++++++++++---
  target/arm/tcg/translate-sve.c |  4 ++--
 files changed, 17 insertions(+), 5 deletions(-)
-Because the function performs its own masking, via the uint8_t
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 parameter, we need to do nothing extra within the users: the bits
 above the first 2 (_uh) or 4 (_uw) will be discarded by assignment
 to the local bmask variables, and of course _uq uses the entire
 uint64_t result.
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20220607203306.657998-17-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/mve_helper.c | 6 +++---
 file changed, 3 insertions(+), 3 deletions(-)
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/target/arm/mve_helper.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void mergemask_sb(int8_t *d, int8_t r, uint16_t mask)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
+ void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
  static void mergemask_uh(uint16_t *d, uint16_t r, uint16_t mask)
  {
--    uint16_t bmask = expand_pred_b_data[mask & 3];
+     intptr_t i, opr_sz = simd_oprsz(desc) / 2;
-+    uint16_t bmask = expand_pred_b(mask);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
-     *d = (*d & ~bmask) | (r & bmask);
+     uint16_t *d = vd, *n = vn, *m = vm;
      for (i = 0; i < opr_sz; i += 1) {
          uint16_t nn = n[i];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
          if (mm & 1) {
              nn = float16_one;
          }
 -        d[i] = nn ^ (mm & 2) << 14;
 +        if (mm & 2) {
 +            nn = float16_maybe_ah_chs(nn, fpcr_ah);
 +        }
 +        d[i] = nn;
      }
  }
-@@ -XXX,XX +XXX,XX @@ static void mergemask_sh(int16_t *d, int16_t r, uint16_t mask)
+ void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
  static void mergemask_uw(uint32_t *d, uint32_t r, uint16_t mask)
  {
--    uint32_t bmask = expand_pred_b_data[mask & 0xf];
+     intptr_t i, opr_sz = simd_oprsz(desc) / 4;
-+    uint32_t bmask = expand_pred_b(mask);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
-     *d = (*d & ~bmask) | (r & bmask);
+     uint32_t *d = vd, *n = vn, *m = vm;
      for (i = 0; i < opr_sz; i += 1) {
          uint32_t nn = n[i];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
          if (mm & 1) {
              nn = float32_one;
          }
 -        d[i] = nn ^ (mm & 2) << 30;
 +        if (mm & 2) {
 +            nn = float32_maybe_ah_chs(nn, fpcr_ah);
 +        }
 +        d[i] = nn;
      }
  }
-@@ -XXX,XX +XXX,XX @@ static void mergemask_sw(int32_t *d, int32_t r, uint16_t mask)
+ void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
  static void mergemask_uq(uint64_t *d, uint64_t r, uint16_t mask)
  {
--    uint64_t bmask = expand_pred_b_data[mask & 0xff];
+     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
-+    uint64_t bmask = expand_pred_b(mask);
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT, 1);
-     *d = (*d & ~bmask) | (r & bmask);
+     uint64_t *d = vd, *n = vn, *m = vm;
      for (i = 0; i < opr_sz; i += 1) {
          uint64_t nn = n[i];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
          if (mm & 1) {
              nn = float64_one;
          }
 -        d[i] = nn ^ (mm & 2) << 62;
 +        if (mm & 2) {
 +            nn = float64_maybe_ah_chs(nn, fpcr_ah);
 +        }
 +        d[i] = nn;
      }
  }
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2 * const fexpa_fns[4] = {
+     gen_helper_sve_fexpa_s, gen_helper_sve_fexpa_d,
+ };
+ TRANS_FEAT_NONSTREAMING(FEXPA, aa64_sve, gen_gvec_ool_zz,
+-                        fexpa_fns[a->esz], a->rd, a->rn, 0)
++                        fexpa_fns[a->esz], a->rd, a->rn, s->fpcr_ah)
+ static gen_helper_gvec_3 * const ftssel_fns[4] = {
+     NULL,                    gen_helper_sve_ftssel_h,
+     gen_helper_sve_ftssel_s, gen_helper_sve_ftssel_d,
+ };
+ TRANS_FEAT_NONSTREAMING(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz,
+-                        ftssel_fns[a->esz], a, 0)
++                        ftssel_fns[a->esz], a, s->fpcr_ah)
+ /*
+  *** SVE Predicate Logical Operations Group
 --
-.25.1
+.34.1
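
Note on the FTSSEL change above: the old code XORed the sign bit unconditionally ("nn ^ (mm & 2) << 30" for the 32-bit case), and the new code routes the negation through floatN_maybe_ah_chs(). As a rough sketch of the per-element logic, assuming the helper simply skips the sign change for NaNs when FPCR.AH is set (that is what the commit message describes; the function bodies below are not copied from QEMU, and toy_f32_is_nan(), toy_f32_maybe_ah_chs() and toy_ftssel_s() are hypothetical names):

```c
#include <assert.h>   /* for the checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

#define F32_SIGN 0x80000000u
#define F32_ONE  0x3f800000u   /* bit pattern of 1.0f */

static bool toy_f32_is_nan(uint32_t v)
{
    return (v & 0x7f800000u) == 0x7f800000u && (v & 0x007fffffu) != 0;
}

/* Change sign, except that with FPCR.AH set a NaN is left as it is. */
static uint32_t toy_f32_maybe_ah_chs(uint32_t v, bool fpcr_ah)
{
    if (fpcr_ah && toy_f32_is_nan(v)) {
        return v;
    }
    return v ^ F32_SIGN;
}

/* One element of FTSSEL, following the loop body in the patch. */
static uint32_t toy_ftssel_s(uint32_t nn, uint32_t mm, bool fpcr_ah)
{
    if (mm & 1) {
        nn = F32_ONE;          /* bit 0 of mm selects the constant 1.0 */
    }
    if (mm & 2) {
        nn = toy_f32_maybe_ah_chs(nn, fpcr_ah);  /* bit 1 requests negation */
    }
    return nn;
}
```

With fpcr_ah false this reproduces the old XOR behaviour for every input, so assert(toy_ftssel_s(0x7fc00000u, 2, false) == 0xffc00000u) holds; with fpcr_ah true the same NaN input comes back as 0x7fc00000.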

-[PULL 09/55] target/arm: Move get_phys_addr to ptw.c
+[PULL 44/68] target/arm: Handle FPCR.AH in SVE FTMAD
-From: Richard Henderson <richard.henderson@linaro.org>
+The negation step in the SVE FTMAD insn mustn't negate a NaN when
 FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field,
 so we can select the correct behaviour.
-Begin moving all of the page table walking functions
+Because the operand is known to be negative, negating the operand
-out of helper.c, starting with get_phys_addr().
+is the same as taking the absolute value.  Defer this to the muladd
 operation via flags, so that it happens after NaN detection, which
 is correct for FPCR.AH.
-Create a temporary header file, "ptw.h", in which to
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-share declarations between the two C files while we
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-are moving functions.
+---
  target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++++--------
  target/arm/tcg/translate-sve.c |  3 ++-
 files changed, 35 insertions(+), 10 deletions(-)
-Move a few declarations to "internals.h", which will
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 remain used by multiple C files.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20220604040607.269301-3-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/internals.h |  18 ++-
  target/arm/ptw.h       |  51 ++++++
  target/arm/helper.c    | 344 +++++------------------------------------
  target/arm/ptw.c       | 267 ++++++++++++++++++++++++++++++++
  target/arm/meson.build |   1 +
 files changed, 372 insertions(+), 309 deletions(-)
  create mode 100644 target/arm/ptw.h
  create mode 100644 target/arm/ptw.c
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
+--- a/target/arm/tcg/sve_helper.c
-+++ b/target/arm/internals.h
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftmad_h)(void *vd, void *vn, void *vm,
- /* Return the MMU index for a v7M CPU in the specified security state */
+         0x3c00, 0xb800, 0x293a, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
- ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate);
+     };
+     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float16);
--/* Return true if the stage 1 translation regime is using LPAE format page
+-    intptr_t x = simd_data(desc);
-- * tables */
++    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
-+/* Return true if the translation regime is using LPAE format page tables */
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
-+bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx);
+     float16 *d = vd, *n = vn, *m = vm;
 +
-+/*
+     for (i = 0; i < opr_sz; i++) {
-+ * Return true if the stage 1 translation regime is using LPAE
+         float16 mm = m[i];
-+ * format page tables
+         intptr_t xx = x;
-+ */
++        int flags = 0;
- bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx);
++
+         if (float16_is_neg(mm)) {
- /* Raise a data fault alignment exception for the specified virtual address */
+-            mm = float16_abs(mm);
-@@ -XXX,XX +XXX,XX @@ static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
++            if (fpcr_ah) {
 +                flags = float_muladd_negate_product;
 +            } else {
 +                mm = float16_abs(mm);
 +            }
              xx += 8;
          }
 -        d[i] = float16_muladd(n[i], mm, coeff[xx], 0, s);
 +        d[i] = float16_muladd(n[i], mm, coeff[xx], flags, s);
      }
  }
-+/* Return the SCTLR value which controls this address translation regime */
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftmad_s)(void *vd, void *vn, void *vm,
-+static inline uint64_t regime_sctlr(CPUARMState *env, ARMMMUIdx mmu_idx)
+         0x37cd37cc, 0x00000000, 0x00000000, 0x00000000,
-+{
+     };
-+    return env->cp15.sctlr_el[regime_el(env, mmu_idx)];
+     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float32);
-+}
+-    intptr_t x = simd_data(desc);
 +    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
 +    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
      float32 *d = vd, *n = vn, *m = vm;
 +
- /* Return the TCR controlling this translation regime */
+     for (i = 0; i < opr_sz; i++) {
- static inline TCR *regime_tcr(CPUARMState *env, ARMMMUIdx mmu_idx)
+         float32 mm = m[i];
- {
+         intptr_t xx = x;
-@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
++        int flags = 0;
  ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
                                     ARMMMUIdx mmu_idx, bool data);
 +int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx);
 +int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx);
 +
- static inline int exception_target_el(CPUARMState *env)
+         if (float32_is_neg(mm)) {
- {
+-            mm = float32_abs(mm);
-     int target_el = MAX(1, arm_current_el(env));
++            if (fpcr_ah) {
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
++                flags = float_muladd_negate_product;
-new file mode 100644
++            } else {
-index XXXXXXX..XXXXXXX
++                mm = float32_abs(mm);
---- /dev/null
++            }
-+++ b/target/arm/ptw.h
+             xx += 8;
-@@ -XXX,XX +XXX,XX @@
+         }
-+/*
+-        d[i] = float32_muladd(n[i], mm, coeff[xx], 0, s);
-+ * ARM page table walking.
++        d[i] = float32_muladd(n[i], mm, coeff[xx], flags, s);
 + *
 + * This code is licensed under the GNU GPL v2 or later.
 + *
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + */
 +
 +#ifndef TARGET_ARM_PTW_H
 +#define TARGET_ARM_PTW_H
 +
 +#ifndef CONFIG_USER_ONLY
 +
 +bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
 +bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
 +ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 +                                 ARMCacheAttrs s1, ARMCacheAttrs s2);
 +
 +bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
 +                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                      hwaddr *phys_ptr, int *prot,
 +                      target_ulong *page_size,
 +                      ARMMMUFaultInfo *fi);
 +bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
 +                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                          hwaddr *phys_ptr, int *prot,
 +                          ARMMMUFaultInfo *fi);
 +bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
 +                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                      hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
 +                      target_ulong *page_size, ARMMMUFaultInfo *fi);
 +bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
 +                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                          hwaddr *phys_ptr, int *prot,
 +                          target_ulong *page_size,
 +                          ARMMMUFaultInfo *fi);
 +bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
 +                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                          hwaddr *phys_ptr, MemTxAttrs *txattrs,
 +                          int *prot, target_ulong *page_size,
 +                          ARMMMUFaultInfo *fi);
 +bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 +                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                        bool s1_is_el0,
 +                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
 +                        target_ulong *page_size_ptr,
 +                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 +    __attribute__((nonnull));
 +
 +#endif /* !CONFIG_USER_ONLY */
 +#endif /* TARGET_ARM_PTW_H */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
  #include "semihosting/common-semi.h"
  #endif
  #include "cpregs.h"
 +#include "ptw.h"
  #define ARM_CPU_FREQ 1000000000 /* FIXME: 1 GHz, should be configurable */
 -#ifndef CONFIG_USER_ONLY
 -
 -static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 -                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                               bool s1_is_el0,
 -                               hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
 -                               target_ulong *page_size_ptr,
 -                               ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 -    __attribute__((nonnull));
 -#endif
 -
  static void switch_mode(CPUARMState *env, int mode);
 -static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx);
  static uint64_t raw_read(CPUARMState *env, const ARMCPRegInfo *ri)
  {
@@ -XXX,XX +XXX,XX @@ uint64_t arm_sctlr(CPUARMState *env, int el)
      return env->cp15.sctlr_el[el];
  }
 -/* Return the SCTLR value which controls this address translation regime */
 -static inline uint64_t regime_sctlr(CPUARMState *env, ARMMMUIdx mmu_idx)
 -{
 -    return env->cp15.sctlr_el[regime_el(env, mmu_idx)];
 -}
 -
  #ifndef CONFIG_USER_ONLY
  /* Return true if the specified stage of address translation is disabled */
 -static inline bool regime_translation_disabled(CPUARMState *env,
 -                                               ARMMMUIdx mmu_idx)
 +bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
  {
      uint64_t hcr_el2;
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
  #endif /* !CONFIG_USER_ONLY */
  /* Return true if the translation regime is using LPAE format page tables */
 -static inline bool regime_using_lpae_format(CPUARMState *env,
 -                                            ARMMMUIdx mmu_idx)
 +bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
  {
      int el = regime_el(env, mmu_idx);
      if (el == 2 || arm_el_is_aa64(env, el)) {
@@ -XXX,XX +XXX,XX @@ bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
  }
  #ifndef CONFIG_USER_ONLY
 -static inline bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
 +bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
  {
      switch (mmu_idx) {
      case ARMMMUIdx_SE10_0:
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
      return 0;
  }
 -static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
 -                             MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                             hwaddr *phys_ptr, int *prot,
 -                             target_ulong *page_size,
 -                             ARMMMUFaultInfo *fi)
 +bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
 +                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                      hwaddr *phys_ptr, int *prot,
 +                      target_ulong *page_size,
 +                      ARMMMUFaultInfo *fi)
  {
      CPUState *cs = env_cpu(env);
      int level = 1;
@@ -XXX,XX +XXX,XX @@ do_fault:
      return true;
  }
 -static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
 -                             MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                             hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
 -                             target_ulong *page_size, ARMMMUFaultInfo *fi)
 +bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
 +                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                      hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
 +                      target_ulong *page_size, ARMMMUFaultInfo *fi)
  {
      CPUState *cs = env_cpu(env);
      ARMCPU *cpu = env_archcpu(env);
@@ -XXX,XX +XXX,XX @@ unsigned int arm_pamax(ARMCPU *cpu)
      return pamax_map[parange];
  }
 -static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 +int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
  {
      if (regime_has_2_ranges(mmu_idx)) {
          return extract64(tcr, 37, 2);
@@ -XXX,XX +XXX,XX @@ static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
      }
  }
--static int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftmad_d)(void *vd, void *vn, void *vm,
-+int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
+         0x3e21ee96d2641b13ull, 0xbda8f76380fbb401ull,
- {
+     };
-     if (regime_has_2_ranges(mmu_idx)) {
+     intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float64);
-         return extract64(tcr, 51, 2);
+-    intptr_t x = simd_data(desc);
-@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
++    intptr_t x = extract32(desc, SIMD_DATA_SHIFT, 3);
-  * @fi: set to fault info if the translation fails
++    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 3, 1);
-  * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
+     float64 *d = vd, *n = vn, *m = vm;
-  */
++
--static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+     for (i = 0; i < opr_sz; i++) {
--                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
+         float64 mm = m[i];
--                               bool s1_is_el0,
+         intptr_t xx = x;
--                               hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
++        int flags = 0;
--                               target_ulong *page_size_ptr,
++
--                               ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
+         if (float64_is_neg(mm)) {
-+bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+-            mm = float64_abs(mm);
-+                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
++            if (fpcr_ah) {
-+                        bool s1_is_el0,
++                flags = float_muladd_negate_product;
-+                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
++            } else {
-+                        target_ulong *page_size_ptr,
++                mm = float64_abs(mm);
-+                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
++            }
- {
+             xx += 8;
-     ARMCPU *cpu = env_archcpu(env);
+         }
-     CPUState *cs = CPU(cpu);
+-        d[i] = float64_muladd(n[i], mm, coeff[xx], 0, s);
-@@ -XXX,XX +XXX,XX @@ static inline bool m_is_system_region(CPUARMState *env, uint32_t address)
++        d[i] = float64_muladd(n[i], mm, coeff[xx], flags, s);
-     return arm_feature(env, ARM_FEATURE_M) && extract32(address, 29, 3) == 0x7;
+     }
  }
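
Note on the FTMAD helpers above: the SIMD data field now carries two things, the 3-bit immediate in bits [2:0] and FPCR.AH in bit 3, and the per-element code either clears the sign of a negative operand itself (non-AH) or leaves the operand alone and sets float_muladd_negate_product (AH). A small standalone sketch of that decode-and-select step follows. Everything named toy_* is hypothetical, TOY_DATA_SHIFT is an illustrative value and not necessarily QEMU's SIMD_DATA_SHIFT, and toy_extract32() only mimics the (value, start, length) shape of extract32().

```c
#include <assert.h>   /* for the checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

#define TOY_DATA_SHIFT     22   /* hypothetical position of the data field */
#define TOY_NEGATE_PRODUCT 1    /* stand-in for float_muladd_negate_product */
#define F64_SIGN           0x8000000000000000ull

/* Extract 'length' bits (1..32) starting at bit 'start'. */
static uint32_t toy_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* Translator side: pack the immediate and FPCR.AH into the data field. */
static uint32_t toy_pack_data(uint32_t imm3, bool fpcr_ah)
{
    return ((imm3 & 7u) | ((uint32_t)fpcr_ah << 3)) << TOY_DATA_SHIFT;
}

struct toy_ftmad_step {
    uint64_t mm;       /* operand to hand to the muladd */
    int coeff_index;   /* index into the coefficient table */
    int flags;         /* muladd flags */
};

/* Helper side: decode the data field and prepare one element. */
static struct toy_ftmad_step toy_ftmad_prepare(uint64_t mm, uint32_t desc)
{
    int x = (int)toy_extract32(desc, TOY_DATA_SHIFT, 3);
    bool fpcr_ah = toy_extract32(desc, TOY_DATA_SHIFT + 3, 1);
    struct toy_ftmad_step s = { mm, x, 0 };

    if (mm & F64_SIGN) {
        if (fpcr_ah) {
            /* Defer the negation to the muladd, after NaN detection. */
            s.flags = TOY_NEGATE_PRODUCT;
        } else {
            /* Operand is negative, so abs and negate coincide. */
            s.mm = mm & ~F64_SIGN;
        }
        s.coeff_index += 8;
    }
    return s;
}
```

For a negative NaN such as 0xfff8000000000000 the non-AH path clears the sign bit before the muladd ever sees the value, while the AH path passes the NaN through unchanged together with the negate-product flag. Both paths select the second half of the coefficient table (index + 8).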
--static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 -                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                                 hwaddr *phys_ptr, int *prot,
 -                                 target_ulong *page_size,
 -                                 ARMMMUFaultInfo *fi)
 +bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
 +                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                          hwaddr *phys_ptr, int *prot,
 +                          target_ulong *page_size,
 +                          ARMMMUFaultInfo *fi)
  {
      ARMCPU *cpu = env_archcpu(env);
      int n;
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
  }
 -static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
 -                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                                 hwaddr *phys_ptr, MemTxAttrs *txattrs,
 -                                 int *prot, target_ulong *page_size,
 -                                 ARMMMUFaultInfo *fi)
 +bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
 +                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                          hwaddr *phys_ptr, MemTxAttrs *txattrs,
 +                          int *prot, target_ulong *page_size,
 +                          ARMMMUFaultInfo *fi)
  {
      uint32_t secure = regime_is_secure(env, mmu_idx);
      V8M_SAttributes sattrs = {};
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
      return ret;
  }
 -static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
 -                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                                 hwaddr *phys_ptr, int *prot,
 -                                 ARMMMUFaultInfo *fi)
 +bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
 +                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                          hwaddr *phys_ptr, int *prot,
 +                          ARMMMUFaultInfo *fi)
  {
      int n;
      uint32_t mask;
@@ -XXX,XX +XXX,XX @@ static uint8_t combined_attrs_fwb(CPUARMState *env,
   * @s1:      Attributes from stage 1 walk
   * @s2:      Attributes from stage 2 walk
   */
 -static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 -                                        ARMCacheAttrs s1, ARMCacheAttrs s2)
 +ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 +                                 ARMCacheAttrs s1, ARMCacheAttrs s2)
  {
      ARMCacheAttrs ret;
      bool tagged = false;
@@ -XXX,XX +XXX,XX @@ static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
      return ret;
  }
 -
 -/* get_phys_addr - get the physical address for this virtual address
 - *
 - * Find the physical address corresponding to the given virtual address,
 - * by doing a translation table walk on MMU based systems or using the
 - * MPU state on MPU based systems.
 - *
 - * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
 - * prot and page_size may not be filled in, and the populated fsr value provides
 - * information on why the translation aborted, in the format of a
 - * DFSR/IFSR fault register, with the following caveats:
 - *  * we honour the short vs long DFSR format differences.
 - *  * the WnR bit is never set (the caller must do this).
 - *  * for PSMAv5 based systems we don't bother to return a full FSR format
 - *    value.
 - *
 - * @env: CPUARMState
 - * @address: virtual address to get physical address for
 - * @access_type: 0 for read, 1 for write, 2 for execute
 - * @mmu_idx: MMU index indicating required translation regime
 - * @phys_ptr: set to the physical address corresponding to the virtual address
 - * @attrs: set to the memory transaction attributes to use
 - * @prot: set to the permissions for the page containing phys_ptr
 - * @page_size: set to the size of the page containing phys_ptr
 - * @fi: set to fault info if the translation fails
 - * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
 - */
 -bool get_phys_addr(CPUARMState *env, target_ulong address,
 -                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                   hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
 -                   target_ulong *page_size,
 -                   ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 -{
 -    ARMMMUIdx s1_mmu_idx = stage_1_mmu_idx(mmu_idx);
 -
 -    if (mmu_idx != s1_mmu_idx) {
 -        /* Call ourselves recursively to do the stage 1 and then stage 2
 -         * translations if mmu_idx is a two-stage regime.
 -         */
 -        if (arm_feature(env, ARM_FEATURE_EL2)) {
 -            hwaddr ipa;
 -            int s2_prot;
 -            int ret;
 -            bool ipa_secure;
 -            ARMCacheAttrs cacheattrs2 = {};
 -            ARMMMUIdx s2_mmu_idx;
 -            bool is_el0;
 -
 -            ret = get_phys_addr(env, address, access_type, s1_mmu_idx, &ipa,
 -                                attrs, prot, page_size, fi, cacheattrs);
 -
 -            /* If S1 fails or S2 is disabled, return early.  */
 -            if (ret || regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
 -                *phys_ptr = ipa;
 -                return ret;
 -            }
 -
 -            ipa_secure = attrs->secure;
 -            if (arm_is_secure_below_el3(env)) {
 -                if (ipa_secure) {
 -                    attrs->secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
 -                } else {
 -                    attrs->secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
 -                }
 -            } else {
 -                assert(!ipa_secure);
 -            }
 -
 -            s2_mmu_idx = attrs->secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 -            is_el0 = mmu_idx == ARMMMUIdx_E10_0 || mmu_idx == ARMMMUIdx_SE10_0;
 -
 -            /* S1 is done. Now do S2 translation.  */
 -            ret = get_phys_addr_lpae(env, ipa, access_type, s2_mmu_idx, is_el0,
 -                                     phys_ptr, attrs, &s2_prot,
 -                                     page_size, fi, &cacheattrs2);
 -            fi->s2addr = ipa;
 -            /* Combine the S1 and S2 perms.  */
 -            *prot &= s2_prot;
 -
 -            /* If S2 fails, return early.  */
 -            if (ret) {
 -                return ret;
 -            }
 -
 -            /* Combine the S1 and S2 cache attributes. */
 -            if (arm_hcr_el2_eff(env) & HCR_DC) {
 -                /*
 -                 * HCR.DC forces the first stage attributes to
 -                 *  Normal Non-Shareable,
 -                 *  Inner Write-Back Read-Allocate Write-Allocate,
 -                 *  Outer Write-Back Read-Allocate Write-Allocate.
 -                 * Do not overwrite Tagged within attrs.
 -                 */
 -                if (cacheattrs->attrs != 0xf0) {
 -                    cacheattrs->attrs = 0xff;
 -                }
 -                cacheattrs->shareability = 0;
 -            }
 -            *cacheattrs = combine_cacheattrs(env, *cacheattrs, cacheattrs2);
 -
 -            /* Check if IPA translates to secure or non-secure PA space. */
 -            if (arm_is_secure_below_el3(env)) {
 -                if (ipa_secure) {
 -                    attrs->secure =
 -                        !(env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | VSTCR_SW));
 -                } else {
 -                    attrs->secure =
 -                        !((env->cp15.vtcr_el2.raw_tcr & (VTCR_NSA | VTCR_NSW))
 -                        || (env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | VSTCR_SW)));
 -                }
 -            }
 -            return 0;
 -        } else {
 -            /*
 -             * For non-EL2 CPUs a stage1+stage2 translation is just stage 1.
 -             */
 -            mmu_idx = stage_1_mmu_idx(mmu_idx);
 -        }
 -    }
 -
 -    /* The page table entries may downgrade secure to non-secure, but
 -     * cannot upgrade an non-secure translation regime's attributes
 -     * to secure.
 -     */
 -    attrs->secure = regime_is_secure(env, mmu_idx);
 -    attrs->user = regime_is_user(env, mmu_idx);
 -
 -    /* Fast Context Switch Extension. This doesn't exist at all in v8.
 -     * In v7 and earlier it affects all stage 1 translations.
 -     */
 -    if (address < 0x02000000 && mmu_idx != ARMMMUIdx_Stage2
 -        && !arm_feature(env, ARM_FEATURE_V8)) {
 -        if (regime_el(env, mmu_idx) == 3) {
 -            address += env->cp15.fcseidr_s;
 -        } else {
 -            address += env->cp15.fcseidr_ns;
 -        }
 -    }
 -
 -    if (arm_feature(env, ARM_FEATURE_PMSA)) {
 -        bool ret;
 -        *page_size = TARGET_PAGE_SIZE;
 -
 -        if (arm_feature(env, ARM_FEATURE_V8)) {
 -            /* PMSAv8 */
 -            ret = get_phys_addr_pmsav8(env, address, access_type, mmu_idx,
 -                                       phys_ptr, attrs, prot, page_size, fi);
 -        } else if (arm_feature(env, ARM_FEATURE_V7)) {
 -            /* PMSAv7 */
 -            ret = get_phys_addr_pmsav7(env, address, access_type, mmu_idx,
 -                                       phys_ptr, prot, page_size, fi);
 -        } else {
 -            /* Pre-v7 MPU */
 -            ret = get_phys_addr_pmsav5(env, address, access_type, mmu_idx,
 -                                       phys_ptr, prot, fi);
 -        }
 -        qemu_log_mask(CPU_LOG_MMU, "PMSA MPU lookup for %s at 0x%08" PRIx32
 -                      " mmu_idx %u -> %s (prot %c%c%c)\n",
 -                      access_type == MMU_DATA_LOAD ? "reading" :
 -                      (access_type == MMU_DATA_STORE ? "writing" : "execute"),
 -                      (uint32_t)address, mmu_idx,
 -                      ret ? "Miss" : "Hit",
 -                      *prot & PAGE_READ ? 'r' : '-',
 -                      *prot & PAGE_WRITE ? 'w' : '-',
 -                      *prot & PAGE_EXEC ? 'x' : '-');
 -
 -        return ret;
 -    }
 -
 -    /* Definitely a real MMU, not an MPU */
 -
 -    if (regime_translation_disabled(env, mmu_idx)) {
 -        uint64_t hcr;
 -        uint8_t memattr;
 -
 -        /*
 -         * MMU disabled.  S1 addresses within aa64 translation regimes are
 -         * still checked for bounds -- see AArch64.TranslateAddressS1Off.
 -         */
 -        if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
 -            int r_el = regime_el(env, mmu_idx);
 -            if (arm_el_is_aa64(env, r_el)) {
 -                int pamax = arm_pamax(env_archcpu(env));
 -                uint64_t tcr = env->cp15.tcr_el[r_el].raw_tcr;
 -                int addrtop, tbi;
 -
 -                tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
 -                if (access_type == MMU_INST_FETCH) {
 -                    tbi &= ~aa64_va_parameter_tbid(tcr, mmu_idx);
 -                }
 -                tbi = (tbi >> extract64(address, 55, 1)) & 1;
 -                addrtop = (tbi ? 55 : 63);
 -
 -                if (extract64(address, pamax, addrtop - pamax + 1) != 0) {
 -                    fi->type = ARMFault_AddressSize;
 -                    fi->level = 0;
 -                    fi->stage2 = false;
 -                    return 1;
 -                }
 -
 -                /*
 -                 * When TBI is disabled, we've just validated that all of the
 -                 * bits above PAMax are zero, so logically we only need to
 -                 * clear the top byte for TBI.  But it's clearer to follow
 -                 * the pseudocode set of addrdesc.paddress.
 -                 */
 -                address = extract64(address, 0, 52);
 -            }
 -        }
 -        *phys_ptr = address;
 -        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 -        *page_size = TARGET_PAGE_SIZE;
 -
 -        /* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
 -        hcr = arm_hcr_el2_eff(env);
 -        cacheattrs->shareability = 0;
 -        cacheattrs->is_s2_format = false;
 -        if (hcr & HCR_DC) {
 -            if (hcr & HCR_DCT) {
 -                memattr = 0xf0;  /* Tagged, Normal, WB, RWA */
 -            } else {
 -                memattr = 0xff;  /* Normal, WB, RWA */
 -            }
 -        } else if (access_type == MMU_INST_FETCH) {
 -            if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
 -                memattr = 0xee;  /* Normal, WT, RA, NT */
 -            } else {
 -                memattr = 0x44;  /* Normal, NC, No */
 -            }
 -            cacheattrs->shareability = 2; /* outer sharable */
 -        } else {
 -            memattr = 0x00;      /* Device, nGnRnE */
 -        }
 -        cacheattrs->attrs = memattr;
 -        return 0;
 -    }
 -
 -    if (regime_using_lpae_format(env, mmu_idx)) {
 -        return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
 -                                  phys_ptr, attrs, prot, page_size,
 -                                  fi, cacheattrs);
 -    } else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
 -        return get_phys_addr_v6(env, address, access_type, mmu_idx,
 -                                phys_ptr, attrs, prot, page_size, fi);
 -    } else {
 -        return get_phys_addr_v5(env, address, access_type, mmu_idx,
 -                                    phys_ptr, prot, page_size, fi);
 -    }
 -}
 -
  hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
                                           MemTxAttrs *attrs)
  {
@@ -XXX,XX +XXX,XX @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
      }
      return phys_addr;
  }
 -
  #endif
  /* Note that signed overflow is undefined in C.  The following routines are
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM page table walking.
 + *
 + * This code is licensed under the GNU GPL v2 or later.
 + *
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "cpu.h"
 +#include "internals.h"
 +#include "ptw.h"
 +
 +
 +/**
 + * get_phys_addr - get the physical address for this virtual address
 + *
 + * Find the physical address corresponding to the given virtual address,
 + * by doing a translation table walk on MMU based systems or using the
 + * MPU state on MPU based systems.
 + *
 + * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
 + * prot and page_size may not be filled in, and the populated fsr value provides
 + * information on why the translation aborted, in the format of a
 + * DFSR/IFSR fault register, with the following caveats:
 + *  * we honour the short vs long DFSR format differences.
 + *  * the WnR bit is never set (the caller must do this).
 + *  * for PMSAv5 based systems we don't bother to return a full FSR format
 + *    value.
 + *
 + * @env: CPUARMState
 + * @address: virtual address to get physical address for
 + * @access_type: 0 for read, 1 for write, 2 for execute
 + * @mmu_idx: MMU index indicating required translation regime
 + * @phys_ptr: set to the physical address corresponding to the virtual address
 + * @attrs: set to the memory transaction attributes to use
 + * @prot: set to the permissions for the page containing phys_ptr
 + * @page_size: set to the size of the page containing phys_ptr
 + * @fi: set to fault info if the translation fails
 + * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
 + */
 +bool get_phys_addr(CPUARMState *env, target_ulong address,
 +                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                   hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
 +                   target_ulong *page_size,
 +                   ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 +{
 +    ARMMMUIdx s1_mmu_idx = stage_1_mmu_idx(mmu_idx);
 +
 +    if (mmu_idx != s1_mmu_idx) {
 +        /*
 +         * Call ourselves recursively to do the stage 1 and then stage 2
 +         * translations if mmu_idx is a two-stage regime.
 +         */
 +        if (arm_feature(env, ARM_FEATURE_EL2)) {
 +            hwaddr ipa;
 +            int s2_prot;
 +            int ret;
 +            bool ipa_secure;
 +            ARMCacheAttrs cacheattrs2 = {};
 +            ARMMMUIdx s2_mmu_idx;
 +            bool is_el0;
 +
 +            ret = get_phys_addr(env, address, access_type, s1_mmu_idx, &ipa,
 +                                attrs, prot, page_size, fi, cacheattrs);
 +
 +            /* If S1 fails or S2 is disabled, return early.  */
 +            if (ret || regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
 +                *phys_ptr = ipa;
 +                return ret;
 +            }
 +
 +            ipa_secure = attrs->secure;
 +            if (arm_is_secure_below_el3(env)) {
 +                if (ipa_secure) {
 +                    attrs->secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
 +                } else {
 +                    attrs->secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
 +                }
 +            } else {
 +                assert(!ipa_secure);
 +            }
 +
 +            s2_mmu_idx = attrs->secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 +            is_el0 = mmu_idx == ARMMMUIdx_E10_0 || mmu_idx == ARMMMUIdx_SE10_0;
 +
 +            /* S1 is done. Now do S2 translation.  */
 +            ret = get_phys_addr_lpae(env, ipa, access_type, s2_mmu_idx, is_el0,
 +                                     phys_ptr, attrs, &s2_prot,
 +                                     page_size, fi, &cacheattrs2);
 +            fi->s2addr = ipa;
 +            /* Combine the S1 and S2 perms.  */
 +            *prot &= s2_prot;
 +
 +            /* If S2 fails, return early.  */
 +            if (ret) {
 +                return ret;
 +            }
 +
 +            /* Combine the S1 and S2 cache attributes. */
 +            if (arm_hcr_el2_eff(env) & HCR_DC) {
 +                /*
 +                 * HCR.DC forces the first stage attributes to
 +                 *  Normal Non-Shareable,
 +                 *  Inner Write-Back Read-Allocate Write-Allocate,
 +                 *  Outer Write-Back Read-Allocate Write-Allocate.
 +                 * Do not overwrite Tagged within attrs.
 +                 */
 +                if (cacheattrs->attrs != 0xf0) {
 +                    cacheattrs->attrs = 0xff;
 +                }
 +                cacheattrs->shareability = 0;
 +            }
 +            *cacheattrs = combine_cacheattrs(env, *cacheattrs, cacheattrs2);
 +
 +            /* Check if IPA translates to secure or non-secure PA space. */
 +            if (arm_is_secure_below_el3(env)) {
 +                if (ipa_secure) {
 +                    attrs->secure =
 +                        !(env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | VSTCR_SW));
 +                } else {
 +                    attrs->secure =
 +                        !((env->cp15.vtcr_el2.raw_tcr & (VTCR_NSA | VTCR_NSW))
 +                        || (env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | VSTCR_SW)));
 +                }
 +            }
 +            return 0;
 +        } else {
 +            /*
 +             * For non-EL2 CPUs a stage1+stage2 translation is just stage 1.
 +             */
 +            mmu_idx = stage_1_mmu_idx(mmu_idx);
 +        }
 +    }
 +
 +    /*
 +     * The page table entries may downgrade secure to non-secure, but
 +     * cannot upgrade a non-secure translation regime's attributes
 +     * to secure.
 +     */
 +    attrs->secure = regime_is_secure(env, mmu_idx);
 +    attrs->user = regime_is_user(env, mmu_idx);
 +
 +    /*
 +     * Fast Context Switch Extension. This doesn't exist at all in v8.
 +     * In v7 and earlier it affects all stage 1 translations.
 +     */
 +    if (address < 0x02000000 && mmu_idx != ARMMMUIdx_Stage2
 +        && !arm_feature(env, ARM_FEATURE_V8)) {
 +        if (regime_el(env, mmu_idx) == 3) {
 +            address += env->cp15.fcseidr_s;
 +        } else {
 +            address += env->cp15.fcseidr_ns;
 +        }
 +    }
 +
 +    if (arm_feature(env, ARM_FEATURE_PMSA)) {
 +        bool ret;
 +        *page_size = TARGET_PAGE_SIZE;
 +
 +        if (arm_feature(env, ARM_FEATURE_V8)) {
 +            /* PMSAv8 */
 +            ret = get_phys_addr_pmsav8(env, address, access_type, mmu_idx,
 +                                       phys_ptr, attrs, prot, page_size, fi);
 +        } else if (arm_feature(env, ARM_FEATURE_V7)) {
 +            /* PMSAv7 */
 +            ret = get_phys_addr_pmsav7(env, address, access_type, mmu_idx,
 +                                       phys_ptr, prot, page_size, fi);
 +        } else {
 +            /* Pre-v7 MPU */
 +            ret = get_phys_addr_pmsav5(env, address, access_type, mmu_idx,
 +                                       phys_ptr, prot, fi);
 +        }
 +        qemu_log_mask(CPU_LOG_MMU, "PMSA MPU lookup for %s at 0x%08" PRIx32
 +                      " mmu_idx %u -> %s (prot %c%c%c)\n",
 +                      access_type == MMU_DATA_LOAD ? "reading" :
 +                      (access_type == MMU_DATA_STORE ? "writing" : "execute"),
 +                      (uint32_t)address, mmu_idx,
 +                      ret ? "Miss" : "Hit",
 +                      *prot & PAGE_READ ? 'r' : '-',
 +                      *prot & PAGE_WRITE ? 'w' : '-',
 +                      *prot & PAGE_EXEC ? 'x' : '-');
 +
 +        return ret;
 +    }
 +
 +    /* Definitely a real MMU, not an MPU */
 +
 +    if (regime_translation_disabled(env, mmu_idx)) {
 +        uint64_t hcr;
 +        uint8_t memattr;
 +
 +        /*
 +         * MMU disabled.  S1 addresses within aa64 translation regimes are
 +         * still checked for bounds -- see AArch64.TranslateAddressS1Off.
 +         */
 +        if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
 +            int r_el = regime_el(env, mmu_idx);
 +            if (arm_el_is_aa64(env, r_el)) {
 +                int pamax = arm_pamax(env_archcpu(env));
 +                uint64_t tcr = env->cp15.tcr_el[r_el].raw_tcr;
 +                int addrtop, tbi;
 +
 +                tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
 +                if (access_type == MMU_INST_FETCH) {
 +                    tbi &= ~aa64_va_parameter_tbid(tcr, mmu_idx);
 +                }
 +                tbi = (tbi >> extract64(address, 55, 1)) & 1;
 +                addrtop = (tbi ? 55 : 63);
 +
 +                if (extract64(address, pamax, addrtop - pamax + 1) != 0) {
 +                    fi->type = ARMFault_AddressSize;
 +                    fi->level = 0;
 +                    fi->stage2 = false;
 +                    return 1;
 +                }
 +
 +                /*
 +                 * When TBI is disabled, we've just validated that all of the
 +                 * bits above PAMax are zero, so logically we only need to
 +                 * clear the top byte for TBI.  But it's clearer to follow
 +                 * the pseudocode set of addrdesc.paddress.
 +                 */
 +                address = extract64(address, 0, 52);
 +            }
 +        }
 +        *phys_ptr = address;
 +        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 +        *page_size = TARGET_PAGE_SIZE;
 +
 +        /* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
 +        hcr = arm_hcr_el2_eff(env);
 +        cacheattrs->shareability = 0;
 +        cacheattrs->is_s2_format = false;
 +        if (hcr & HCR_DC) {
 +            if (hcr & HCR_DCT) {
 +                memattr = 0xf0;  /* Tagged, Normal, WB, RWA */
 +            } else {
 +                memattr = 0xff;  /* Normal, WB, RWA */
 +            }
 +        } else if (access_type == MMU_INST_FETCH) {
 +            if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
 +                memattr = 0xee;  /* Normal, WT, RA, NT */
 +            } else {
 +                memattr = 0x44;  /* Normal, NC, No */
 +            }
 +            cacheattrs->shareability = 2; /* outer shareable */
 +        } else {
 +            memattr = 0x00;      /* Device, nGnRnE */
 +        }
 +        cacheattrs->attrs = memattr;
 +        return 0;
 +    }
 +
 +    if (regime_using_lpae_format(env, mmu_idx)) {
 +        return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
 +                                  phys_ptr, attrs, prot, page_size,
 +                                  fi, cacheattrs);
 +    } else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
 +        return get_phys_addr_v6(env, address, access_type, mmu_idx,
 +                                phys_ptr, attrs, prot, page_size, fi);
 +    } else {
 +        return get_phys_addr_v5(env, address, access_type, mmu_idx,
 +                                    phys_ptr, prot, page_size, fi);
 +    }
 +}
 diff --git a/target/arm/meson.build b/target/arm/meson.build
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/meson.build
+--- a/target/arm/tcg/translate-sve.c
-+++ b/target/arm/meson.build
++++ b/target/arm/tcg/translate-sve.c
-@@ -XXX,XX +XXX,XX @@ arm_softmmu_ss.add(files(
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const ftmad_fns[4] = {
-   'machine.c',
+     gen_helper_sve_ftmad_s, gen_helper_sve_ftmad_d,
-   'monitor.c',
+ };
-   'psci.c',
+ TRANS_FEAT_NONSTREAMING(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
-+  'ptw.c',
+-                        ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
- ))
++                        ftmad_fns[a->esz], a->rd, a->rn, a->rm,
++                        a->imm | (s->fpcr_ah << 3),
- subdir('hvf')
+                         a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
  /*
 --
-.25.1
+.34.1

-[PULL 11/55] target/arm: Move get_phys_addr_v6 to ptw.c
+[PULL 45/68] target/arm: Handle FPCR.AH in vector FCMLA
 From: Richard Henderson <richard.henderson@linaro.org>
+The negation step in FCMLA mustn't negate a NaN when FPCR.AH
+is set. Handle this by passing FPCR.AH to the helper via the
+SIMD data field, and use this to select whether to do the
+negation via XOR or via the muladd negate_product flag.
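
As a rough illustration of the selection described above, here is a
minimal standalone sketch of the bit packing and decoding for the
float16 case. It is not the helper code itself: fcmla_pack_data(),
fcmla_decode16(), FcmlaNeg16 and SKETCH_NEGATE_PRODUCT are hypothetical
names for this example, the data field is assumed to start at bit 0
(the real helper reads it at SIMD_DATA_SHIFT), and
SKETCH_NEGATE_PRODUCT only stands in for float_muladd_negate_product.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for float_muladd_negate_product; the value is illustrative. */
#define SKETCH_NEGATE_PRODUCT 2u

/* Decoded negation controls for one float16 FCMLA operation. */
typedef struct {
    uint32_t flip;       /* bit 0 of the data field */
    uint16_t negx_real;  /* sign-bit XOR mask, used when AH=0 */
    uint16_t negx_imag;
    uint32_t negf_real;  /* muladd flags, used when AH=1 */
    uint32_t negf_imag;
} FcmlaNeg16;

/* Translator side: rot occupies bits [1:0], FPCR.AH goes in bit 2. */
static uint32_t fcmla_pack_data(uint32_t rot, bool fpcr_ah)
{
    assert(rot < 4);
    return rot | ((uint32_t)fpcr_ah << 2);
}

/* Helper side: choose XOR negation (AH=0) or flag negation (AH=1). */
static FcmlaNeg16 fcmla_decode16(uint32_t data)
{
    uint32_t flip = data & 1;
    uint32_t negf_imag = (data >> 1) & 1;
    uint32_t fpcr_ah = (data >> 2) & 1;
    uint32_t negf_real = flip ^ negf_imag;
    FcmlaNeg16 r;

    r.flip = flip;
    /* With AH=0 the masks carry the negation and the flags stay zero. */
    r.negx_real = (uint16_t)((negf_real & ~fpcr_ah) << 15);
    r.negx_imag = (uint16_t)((negf_imag & ~fpcr_ah) << 15);
    /* With AH=1 the masks are zero and muladd negates the product. */
    r.negf_real = (negf_real & fpcr_ah) ? SKETCH_NEGATE_PRODUCT : 0;
    r.negf_imag = (negf_imag & fpcr_ah) ? SKETCH_NEGATE_PRODUCT : 0;
    return r;
}
```

For any rotation, at most one of the two mechanisms is active per
component, so with AH=1 a NaN input never has its sign bit flipped
before it reaches the multiply-add.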
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-5-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-26-richard.henderson@linaro.org
 [PMM: Expanded commit message]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/ptw.h    |  11 +--
+ target/arm/tcg/translate-a64.c |  2 +-
- target/arm/helper.c | 161 +-------------------------------------------
+ target/arm/tcg/vec_helper.c    | 66 ++++++++++++++++++++--------------
- target/arm/ptw.c    | 153 +++++++++++++++++++++++++++++++++++++++++
+files changed, 40 insertions(+), 28 deletions(-)
 files changed, 161 insertions(+), 164 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
-                               uint32_t *table, uint32_t address);
- int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
+     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                   int ap, int domain_prot);
+                       a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-+int simple_ap_to_rw_prot_is_user(int ap, bool is_user);
+-                      a->rot, fn[a->esz]);
-+
++                      a->rot | (s->fpcr_ah << 2), fn[a->esz]);
 +static inline int
 +simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
 +{
 +    return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
 +}
  bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
                            MMUAccessType access_type, ARMMMUIdx mmu_idx,
                            hwaddr *phys_ptr, int *prot,
                            ARMMMUFaultInfo *fi);
 -bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
 -                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                      hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
 -                      target_ulong *page_size, ARMMMUFaultInfo *fi);
  bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
                            MMUAccessType access_type, ARMMMUIdx mmu_idx,
                            hwaddr *phys_ptr, int *prot,
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap, int domain_prot)
   * @ap:      The 2-bit simple AP (AP[2:1])
   * @is_user: TRUE if accessing from PL0
   */
 -static inline int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
 +int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
  {
      switch (ap) {
      case 0:
@@ -XXX,XX +XXX,XX @@ static inline int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
      }
  }
 -static inline int
 -simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
 -{
 -    return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
 -}
 -
  /* Translate S2 section/page access permissions to protection flags
   *
   * @env:     CPUARMState
@@ -XXX,XX +XXX,XX @@ uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
      return 0;
  }
 -bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
 -                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                      hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
 -                      target_ulong *page_size, ARMMMUFaultInfo *fi)
 -{
 -    CPUState *cs = env_cpu(env);
 -    ARMCPU *cpu = env_archcpu(env);
 -    int level = 1;
 -    uint32_t table;
 -    uint32_t desc;
 -    uint32_t xn;
 -    uint32_t pxn = 0;
 -    int type;
 -    int ap;
 -    int domain = 0;
 -    int domain_prot;
 -    hwaddr phys_addr;
 -    uint32_t dacr;
 -    bool ns;
 -
 -    /* Pagetable walk.  */
 -    /* Lookup l1 descriptor.  */
 -    if (!get_level1_table_address(env, mmu_idx, &table, address)) {
 -        /* Section translation fault if page walk is disabled by PD0 or PD1 */
 -        fi->type = ARMFault_Translation;
 -        goto do_fault;
 -    }
 -    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
 -                       mmu_idx, fi);
 -    if (fi->type != ARMFault_None) {
 -        goto do_fault;
 -    }
 -    type = (desc & 3);
 -    if (type == 0 || (type == 3 && !cpu_isar_feature(aa32_pxn, cpu))) {
 -        /* Section translation fault, or attempt to use the encoding
 -         * which is Reserved on implementations without PXN.
 -         */
 -        fi->type = ARMFault_Translation;
 -        goto do_fault;
 -    }
 -    if ((type == 1) || !(desc & (1 << 18))) {
 -        /* Page or Section.  */
 -        domain = (desc >> 5) & 0x0f;
 -    }
 -    if (regime_el(env, mmu_idx) == 1) {
 -        dacr = env->cp15.dacr_ns;
 -    } else {
 -        dacr = env->cp15.dacr_s;
 -    }
 -    if (type == 1) {
 -        level = 2;
 -    }
 -    domain_prot = (dacr >> (domain * 2)) & 3;
 -    if (domain_prot == 0 || domain_prot == 2) {
 -        /* Section or Page domain fault */
 -        fi->type = ARMFault_Domain;
 -        goto do_fault;
 -    }
 -    if (type != 1) {
 -        if (desc & (1 << 18)) {
 -            /* Supersection.  */
 -            phys_addr = (desc & 0xff000000) | (address & 0x00ffffff);
 -            phys_addr |= (uint64_t)extract32(desc, 20, 4) << 32;
 -            phys_addr |= (uint64_t)extract32(desc, 5, 4) << 36;
 -            *page_size = 0x1000000;
 -        } else {
 -            /* Section.  */
 -            phys_addr = (desc & 0xfff00000) | (address & 0x000fffff);
 -            *page_size = 0x100000;
 -        }
 -        ap = ((desc >> 10) & 3) | ((desc >> 13) & 4);
 -        xn = desc & (1 << 4);
 -        pxn = desc & 1;
 -        ns = extract32(desc, 19, 1);
 -    } else {
 -        if (cpu_isar_feature(aa32_pxn, cpu)) {
 -            pxn = (desc >> 2) & 1;
 -        }
 -        ns = extract32(desc, 3, 1);
 -        /* Lookup l2 entry.  */
 -        table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
 -        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
 -                           mmu_idx, fi);
 -        if (fi->type != ARMFault_None) {
 -            goto do_fault;
 -        }
 -        ap = ((desc >> 4) & 3) | ((desc >> 7) & 4);
 -        switch (desc & 3) {
 -        case 0: /* Page translation fault.  */
 -            fi->type = ARMFault_Translation;
 -            goto do_fault;
 -        case 1: /* 64k page.  */
 -            phys_addr = (desc & 0xffff0000) | (address & 0xffff);
 -            xn = desc & (1 << 15);
 -            *page_size = 0x10000;
 -            break;
 -        case 2: case 3: /* 4k page.  */
 -            phys_addr = (desc & 0xfffff000) | (address & 0xfff);
 -            xn = desc & 1;
 -            *page_size = 0x1000;
 -            break;
 -        default:
 -            /* Never happens, but compiler isn't smart enough to tell.  */
 -            g_assert_not_reached();
 -        }
 -    }
 -    if (domain_prot == 3) {
 -        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 -    } else {
 -        if (pxn && !regime_is_user(env, mmu_idx)) {
 -            xn = 1;
 -        }
 -        if (xn && access_type == MMU_INST_FETCH) {
 -            fi->type = ARMFault_Permission;
 -            goto do_fault;
 -        }
 -
 -        if (arm_feature(env, ARM_FEATURE_V6K) &&
 -                (regime_sctlr(env, mmu_idx) & SCTLR_AFE)) {
 -            /* The simplified model uses AP[0] as an access control bit.  */
 -            if ((ap & 1) == 0) {
 -                /* Access flag fault.  */
 -                fi->type = ARMFault_AccessFlag;
 -                goto do_fault;
 -            }
 -            *prot = simple_ap_to_rw_prot(env, mmu_idx, ap >> 1);
 -        } else {
 -            *prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 -        }
 -        if (*prot && !xn) {
 -            *prot |= PAGE_EXEC;
 -        }
 -        if (!(*prot & (1 << access_type))) {
 -            /* Access permission fault.  */
 -            fi->type = ARMFault_Permission;
 -            goto do_fault;
 -        }
 -    }
 -    if (ns) {
 -        /* The NS bit will (as required by the architecture) have no effect if
 -         * the CPU doesn't support TZ or this is a non-secure translation
 -         * regime, because the attribute will already be non-secure.
 -         */
 -        attrs->secure = false;
 -    }
 -    *phys_ptr = phys_addr;
 -    return false;
 -do_fault:
 -    fi->domain = domain;
 -    fi->level = level;
 -    return true;
 -}
 -
  /*
   * check_s2_mmu_setup
   * @cpu:        ARMCPU
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ do_fault:
      return true;
  }
-+static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
-+                             MMUAccessType access_type, ARMMMUIdx mmu_idx,
+index XXXXXXX..XXXXXXX 100644
-+                             hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
+--- a/target/arm/tcg/vec_helper.c
-+                             target_ulong *page_size, ARMMMUFaultInfo *fi)
++++ b/target/arm/tcg/vec_helper.c
-+{
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm, void *va,
-+    CPUState *cs = env_cpu(env);
+     uintptr_t opr_sz = simd_oprsz(desc);
-+    ARMCPU *cpu = env_archcpu(env);
+     float16 *d = vd, *n = vn, *m = vm, *a = va;
-+    int level = 1;
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-+    uint32_t table;
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-+    uint32_t desc;
+-    uint32_t neg_real = flip ^ neg_imag;
-+    uint32_t xn;
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
-+    uint32_t pxn = 0;
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-+    int type;
++    uint32_t negf_real = flip ^ negf_imag;
-+    int ap;
++    float16 negx_imag, negx_real;
-+    int domain = 0;
+     uintptr_t i;
-+    int domain_prot;
-+    hwaddr phys_addr;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
-+    uint32_t dacr;
+-    neg_real <<= 15;
-+    bool ns;
+-    neg_imag <<= 15;
-+
++    /* With AH=0, use negx; with AH=1 use negf. */
-+    /* Pagetable walk.  */
++    negx_real = (negf_real & ~fpcr_ah) << 15;
-+    /* Lookup l1 descriptor.  */
++    negx_imag = (negf_imag & ~fpcr_ah) << 15;
-+    if (!get_level1_table_address(env, mmu_idx, &table, address)) {
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
-+        /* Section translation fault if page walk is disabled by PD0 or PD1 */
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
-+        fi->type = ARMFault_Translation;
-+        goto do_fault;
+     for (i = 0; i < opr_sz / 2; i += 2) {
-+    }
+         float16 e2 = n[H2(i + flip)];
-+    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+-        float16 e1 = m[H2(i + flip)] ^ neg_real;
-+                       mmu_idx, fi);
++        float16 e1 = m[H2(i + flip)] ^ negx_real;
-+    if (fi->type != ARMFault_None) {
+         float16 e4 = e2;
-+        goto do_fault;
+-        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
-+    }
++        float16 e3 = m[H2(i + 1 - flip)] ^ negx_imag;
-+    type = (desc & 3);
-+    if (type == 0 || (type == 3 && !cpu_isar_feature(aa32_pxn, cpu))) {
+-        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], 0, fpst);
-+        /* Section translation fault, or attempt to use the encoding
+-        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], 0, fpst);
-+         * which is Reserved on implementations without PXN.
++        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], negf_real, fpst);
-+         */
++        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], negf_imag, fpst);
-+        fi->type = ARMFault_Translation;
+     }
-+        goto do_fault;
+     clear_tail(d, opr_sz, simd_maxsz(desc));
-+    }
+ }
-+    if ((type == 1) || !(desc & (1 << 18))) {
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm, void *va,
-+        /* Page or Section.  */
+     uintptr_t opr_sz = simd_oprsz(desc);
-+        domain = (desc >> 5) & 0x0f;
+     float32 *d = vd, *n = vn, *m = vm, *a = va;
-+    }
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-+    if (regime_el(env, mmu_idx) == 1) {
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-+        dacr = env->cp15.dacr_ns;
+-    uint32_t neg_real = flip ^ neg_imag;
-+    } else {
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
-+        dacr = env->cp15.dacr_s;
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-+    }
++    uint32_t negf_real = flip ^ negf_imag;
-+    if (type == 1) {
++    float32 negx_imag, negx_real;
-+        level = 2;
+     uintptr_t i;
-+    }
-+    domain_prot = (dacr >> (domain * 2)) & 3;
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
-+    if (domain_prot == 0 || domain_prot == 2) {
+-    neg_real <<= 31;
-+        /* Section or Page domain fault */
+-    neg_imag <<= 31;
-+        fi->type = ARMFault_Domain;
++    /* With AH=0, use negx; with AH=1 use negf. */
-+        goto do_fault;
++    negx_real = (negf_real & ~fpcr_ah) << 31;
-+    }
++    negx_imag = (negf_imag & ~fpcr_ah) << 31;
-+    if (type != 1) {
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
-+        if (desc & (1 << 18)) {
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
-+            /* Supersection.  */
-+            phys_addr = (desc & 0xff000000) | (address & 0x00ffffff);
+     for (i = 0; i < opr_sz / 4; i += 2) {
-+            phys_addr |= (uint64_t)extract32(desc, 20, 4) << 32;
+         float32 e2 = n[H4(i + flip)];
-+            phys_addr |= (uint64_t)extract32(desc, 5, 4) << 36;
+-        float32 e1 = m[H4(i + flip)] ^ neg_real;
-+            *page_size = 0x1000000;
++        float32 e1 = m[H4(i + flip)] ^ negx_real;
-+        } else {
+         float32 e4 = e2;
-+            /* Section.  */
+-        float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
-+            phys_addr = (desc & 0xfff00000) | (address & 0x000fffff);
++        float32 e3 = m[H4(i + 1 - flip)] ^ negx_imag;
-+            *page_size = 0x100000;
-+        }
+-        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], 0, fpst);
-+        ap = ((desc >> 10) & 3) | ((desc >> 13) & 4);
+-        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], 0, fpst);
-+        xn = desc & (1 << 4);
++        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], negf_real, fpst);
-+        pxn = desc & 1;
++        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], negf_imag, fpst);
-+        ns = extract32(desc, 19, 1);
+     }
-+    } else {
+     clear_tail(d, opr_sz, simd_maxsz(desc));
-+        if (cpu_isar_feature(aa32_pxn, cpu)) {
+ }
-+            pxn = (desc >> 2) & 1;
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm, void *va,
-+        }
+     uintptr_t opr_sz = simd_oprsz(desc);
-+        ns = extract32(desc, 3, 1);
+     float64 *d = vd, *n = vn, *m = vm, *a = va;
-+        /* Lookup l2 entry.  */
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-+        table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
+-    uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-+        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+-    uint64_t neg_real = flip ^ neg_imag;
-+                           mmu_idx, fi);
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
-+        if (fi->type != ARMFault_None) {
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-+            goto do_fault;
++    uint32_t negf_real = flip ^ negf_imag;
-+        }
++    float64 negx_real, negx_imag;
-+        ap = ((desc >> 4) & 3) | ((desc >> 7) & 4);
+     uintptr_t i;
-+        switch (desc & 3) {
-+        case 0: /* Page translation fault.  */
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
-+            fi->type = ARMFault_Translation;
+-    neg_real <<= 63;
-+            goto do_fault;
+-    neg_imag <<= 63;
-+        case 1: /* 64k page.  */
++    /* With AH=0, use negx; with AH=1 use negf. */
-+            phys_addr = (desc & 0xffff0000) | (address & 0xffff);
++    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
-+            xn = desc & (1 << 15);
++    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
-+            *page_size = 0x10000;
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
-+            break;
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
-+        case 2: case 3: /* 4k page.  */
-+            phys_addr = (desc & 0xfffff000) | (address & 0xfff);
+     for (i = 0; i < opr_sz / 8; i += 2) {
-+            xn = desc & 1;
+         float64 e2 = n[i + flip];
-+            *page_size = 0x1000;
+-        float64 e1 = m[i + flip] ^ neg_real;
-+            break;
++        float64 e1 = m[i + flip] ^ negx_real;
-+        default:
+         float64 e4 = e2;
-+            /* Never happens, but compiler isn't smart enough to tell.  */
+-        float64 e3 = m[i + 1 - flip] ^ neg_imag;
-+            g_assert_not_reached();
++        float64 e3 = m[i + 1 - flip] ^ negx_imag;
-+        }
-+    }
+-        d[i] = float64_muladd(e2, e1, a[i], 0, fpst);
-+    if (domain_prot == 3) {
+-        d[i + 1] = float64_muladd(e4, e3, a[i + 1], 0, fpst);
-+        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
++        d[i] = float64_muladd(e2, e1, a[i], negf_real, fpst);
-+    } else {
++        d[i + 1] = float64_muladd(e4, e3, a[i + 1], negf_imag, fpst);
-+        if (pxn && !regime_is_user(env, mmu_idx)) {
+     }
-+            xn = 1;
+     clear_tail(d, opr_sz, simd_maxsz(desc));
-+        }
+ }
 +        if (xn && access_type == MMU_INST_FETCH) {
 +            fi->type = ARMFault_Permission;
 +            goto do_fault;
 +        }
 +
 +        if (arm_feature(env, ARM_FEATURE_V6K) &&
 +                (regime_sctlr(env, mmu_idx) & SCTLR_AFE)) {
 +            /* The simplified model uses AP[0] as an access control bit.  */
 +            if ((ap & 1) == 0) {
 +                /* Access flag fault.  */
 +                fi->type = ARMFault_AccessFlag;
 +                goto do_fault;
 +            }
 +            *prot = simple_ap_to_rw_prot(env, mmu_idx, ap >> 1);
 +        } else {
 +            *prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 +        }
 +        if (*prot && !xn) {
 +            *prot |= PAGE_EXEC;
 +        }
 +        if (!(*prot & (1 << access_type))) {
 +            /* Access permission fault.  */
 +            fi->type = ARMFault_Permission;
 +            goto do_fault;
 +        }
 +    }
 +    if (ns) {
 +        /* The NS bit will (as required by the architecture) have no effect if
 +         * the CPU doesn't support TZ or this is a non-secure translation
 +         * regime, because the attribute will already be non-secure.
 +         */
 +        attrs->secure = false;
 +    }
 +    *phys_ptr = phys_addr;
 +    return false;
 +do_fault:
 +    fi->domain = domain;
 +    fi->level = level;
 +    return true;
 +}
 +
  /**
   * get_phys_addr - get the physical address for this virtual address
   *
 --
-.25.1
+.34.1

-[PULL 25/55] target/arm: Move arm_pamax, pamax_map into ptw.c
+[PULL 46/68] target/arm: Handle FPCR.AH in FCMLA by index
 From: Richard Henderson <richard.henderson@linaro.org>
+The negation step in FCMLA by index mustn't negate a NaN when
+FPCR.AH is set. Use the same approach as vector FCMLA of
+passing in FPCR.AH and using it to select whether to negate
+by XOR or by the muladd negate_product flag.
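The selection described above can be sketched in isolation. This is an illustrative sketch, not the QEMU code: it assumes a float32 element, models FPCR.AH as a plain 0/1 value, and uses a hypothetical `SKETCH_NEGATE_PRODUCT` constant and `select_negation` helper in place of softfloat's `float_muladd_negate_product` flag and the in-helper arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_muladd_negate_product flag. */
enum { SKETCH_NEGATE_PRODUCT = 1 };

typedef struct {
    uint32_t negx; /* XOR mask for the float32 multiplicand (sign bit or 0) */
    int negf;      /* flags to hand to the fused multiply-add (or 0) */
} NegSelect;

/*
 * negate:  1 if this product must be negated, else 0.
 * fpcr_ah: 1 if FPCR.AH is set, else 0.
 */
static NegSelect select_negation(uint32_t negate, uint32_t fpcr_ah)
{
    NegSelect r;

    assert(negate <= 1 && fpcr_ah <= 1);
    /* AH=0: flip the sign bit by XOR, which also flips the sign of a NaN. */
    r.negx = (negate & ~fpcr_ah) << 31;
    /* AH=1: leave the input alone and ask muladd to negate the product. */
    r.negf = (negate & fpcr_ah) ? SKETCH_NEGATE_PRODUCT : 0;
    return r;
}
```

At most one of the two outputs is non-zero for any input, so a caller can apply both unconditionally: XOR the multiplicand with `negx` and pass `negf` as the muladd flags.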
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-19-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-27-richard.henderson@linaro.org
 [PMM: Expanded commit message]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/ptw.h    |  2 --
+ target/arm/tcg/translate-a64.c |  2 +-
- target/arm/helper.c | 25 -------------------------
+ target/arm/tcg/vec_helper.c    | 44 ++++++++++++++++++++--------------
- target/arm/ptw.c    | 25 +++++++++++++++++++++++++
+files changed, 27 insertions(+), 19 deletions(-)
 files changed, 25 insertions(+), 27 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
+     if (fp_access_check(s)) {
- #ifndef CONFIG_USER_ONLY
+         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+                           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
--extern const uint8_t pamax_map[7];
+-                          (a->idx << 2) | a->rot, fn);
--
++                          (s->fpcr_ah << 4) | (a->idx << 2) | a->rot, fn);
- bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
+     }
- bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
+     return true;
- uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
+ }
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, void *va,
- }
+     uintptr_t opr_sz = simd_oprsz(desc);
- #endif /* !CONFIG_USER_ONLY */
+     float16 *d = vd, *n = vn, *m = vm, *a = va;
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
--/* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
--const uint8_t pamax_map[] = {
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
--    [0] = 32,
+     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
--    [1] = 36,
+-    uint32_t neg_real = flip ^ neg_imag;
--    [2] = 40,
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
--    [3] = 42,
++    uint32_t negf_real = flip ^ negf_imag;
--    [4] = 44,
+     intptr_t elements = opr_sz / sizeof(float16);
--    [5] = 48,
+     intptr_t eltspersegment = MIN(16 / sizeof(float16), elements);
--    [6] = 52,
++    float16 negx_imag, negx_real;
--};
+     intptr_t i, j;
--
--/* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
--unsigned int arm_pamax(ARMCPU *cpu)
+-    neg_real <<= 15;
--{
+-    neg_imag <<= 15;
--    unsigned int parange =
++    /* With AH=0, use negx; with AH=1 use negf. */
--        FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
++    negx_real = (negf_real & ~fpcr_ah) << 15;
--
++    negx_imag = (negf_imag & ~fpcr_ah) << 15;
--    /*
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
--     * id_aa64mmfr0 is a read-only register so values outside of the
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
--     * supported mappings can be considered an implementation error.
--     */
+     for (i = 0; i < elements; i += eltspersegment) {
--    assert(parange < ARRAY_SIZE(pamax_map));
+         float16 mr = m[H2(i + 2 * index + 0)];
--    return pamax_map[parange];
+         float16 mi = m[H2(i + 2 * index + 1)];
--}
+-        float16 e1 = neg_real ^ (flip ? mi : mr);
--
+-        float16 e3 = neg_imag ^ (flip ? mr : mi);
- int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
++        float16 e1 = negx_real ^ (flip ? mi : mr);
- {
++        float16 e3 = negx_imag ^ (flip ? mr : mi);
-     if (regime_has_2_ranges(mmu_idx)) {
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+         for (j = i; j < i + eltspersegment; j += 2) {
-index XXXXXXX..XXXXXXX 100644
+             float16 e2 = n[H2(j + flip)];
---- a/target/arm/ptw.c
+             float16 e4 = e2;
-+++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+-            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], 0, fpst);
-                                ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
+-            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], 0, fpst);
-     __attribute__((nonnull));
++            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], negf_real, fpst);
++            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], negf_imag, fpst);
-+/* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
+         }
-+static const uint8_t pamax_map[] = {
+     }
-+    [0] = 32,
+     clear_tail(d, opr_sz, simd_maxsz(desc));
-+    [1] = 36,
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, void *va,
-+    [2] = 40,
+     uintptr_t opr_sz = simd_oprsz(desc);
-+    [3] = 42,
+     float32 *d = vd, *n = vn, *m = vm, *a = va;
-+    [4] = 44,
+     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-+    [5] = 48,
+-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-+    [6] = 52,
++    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-+};
+     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
-+
+-    uint32_t neg_real = flip ^ neg_imag;
-+/* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
++    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
-+unsigned int arm_pamax(ARMCPU *cpu)
++    uint32_t negf_real = flip ^ negf_imag;
-+{
+     intptr_t elements = opr_sz / sizeof(float32);
-+    unsigned int parange =
+     intptr_t eltspersegment = MIN(16 / sizeof(float32), elements);
-+        FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
++    float32 negx_imag, negx_real;
-+
+     intptr_t i, j;
-+    /*
-+     * id_aa64mmfr0 is a read-only register so values outside of the
+-    /* Shift boolean to the sign bit so we can xor to negate.  */
-+     * supported mappings can be considered an implementation error.
+-    neg_real <<= 31;
-+     */
+-    neg_imag <<= 31;
-+    assert(parange < ARRAY_SIZE(pamax_map));
++    /* With AH=0, use negx; with AH=1 use negf. */
-+    return pamax_map[parange];
++    negx_real = (negf_real & ~fpcr_ah) << 31;
-+}
++    negx_imag = (negf_imag & ~fpcr_ah) << 31;
-+
++    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
- static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
++    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
- {
-     return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
+     for (i = 0; i < elements; i += eltspersegment) {
          float32 mr = m[H4(i + 2 * index + 0)];
          float32 mi = m[H4(i + 2 * index + 1)];
 -        float32 e1 = neg_real ^ (flip ? mi : mr);
 -        float32 e3 = neg_imag ^ (flip ? mr : mi);
 +        float32 e1 = negx_real ^ (flip ? mi : mr);
 +        float32 e3 = negx_imag ^ (flip ? mr : mi);
          for (j = i; j < i + eltspersegment; j += 2) {
              float32 e2 = n[H4(j + flip)];
              float32 e4 = e2;
 -            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], 0, fpst);
 -            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], 0, fpst);
 +            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], negf_real, fpst);
 +            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], negf_imag, fpst);
          }
      }
      clear_tail(d, opr_sz, simd_maxsz(desc));
 --
-.25.1
+.34.1

-[PULL 24/55] target/arm: Move {arm_s1_, }regime_using_lpae_format to tlb_helper.c
+[PULL 47/68] target/arm: Handle FPCR.AH in SVE FCMLA
 From: Richard Henderson <richard.henderson@linaro.org>
-These functions are used for both page table walking and for
+The negation step in SVE FCMLA mustn't negate a NaN when FPCR.AH is
-deciding what format in which to deliver exception results.
+set.  Use the same approach as we did for A64 FCMLA of passing in
-Since ptw.c is only present for system mode, put the functions
+FPCR.AH and using it to select whether to negate by XOR or by the
-into tlb_helper.c.
+muladd negate_product flag.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-18-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-28-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c     | 24 ------------------------
+ target/arm/tcg/sve_helper.c    | 69 +++++++++++++++++++++-------------
- target/arm/tlb_helper.c | 26 ++++++++++++++++++++++++++
+ target/arm/tcg/translate-sve.c |  2 +-
-files changed, 26 insertions(+), 24 deletions(-)
+files changed, 43 insertions(+), 28 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/sve_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
- }
+                                void *vg, float_status *status, uint32_t desc)
  #endif /* !CONFIG_USER_ONLY */
 -/* Return true if the translation regime is using LPAE format page tables */
 -bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
 -{
 -    int el = regime_el(env, mmu_idx);
 -    if (el == 2 || arm_el_is_aa64(env, el)) {
 -        return true;
 -    }
 -    if (arm_feature(env, ARM_FEATURE_LPAE)
 -        && (regime_tcr(env, mmu_idx)->raw_tcr & TTBCR_EAE)) {
 -        return true;
 -    }
 -    return false;
 -}
 -
 -/* Returns true if the stage 1 translation regime is using LPAE format page
 - * tables. Used when raising alignment exceptions, whose FSR changes depending
 - * on whether the long or short descriptor format is in use. */
 -bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
 -{
 -    mmu_idx = stage_1_mmu_idx(mmu_idx);
 -
 -    return regime_using_lpae_format(env, mmu_idx);
 -}
 -
  #ifndef CONFIG_USER_ONLY
  bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
  {
-diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
+     intptr_t j, i = simd_oprsz(desc);
 -    unsigned rot = simd_data(desc);
 -    bool flip = rot & 1;
 -    float16 neg_imag, neg_real;
 +    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float16 negx_imag, negx_real;
      uint64_t *g = vg;
 -    neg_imag = float16_set_sign(0, (rot & 2) != 0);
 -    neg_real = float16_set_sign(0, rot == 1 || rot == 2);
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (negf_real & ~fpcr_ah) << 15;
 +    negx_imag = (negf_imag & ~fpcr_ah) << 15;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
              mi = *(float16 *)(vm + H1_2(j));
              e2 = (flip ? ni : nr);
 -            e1 = (flip ? mi : mr) ^ neg_real;
 +            e1 = (flip ? mi : mr) ^ negx_real;
              e4 = e2;
 -            e3 = (flip ? mr : mi) ^ neg_imag;
 +            e3 = (flip ? mr : mi) ^ negx_imag;
              if (likely((pg >> (i & 63)) & 1)) {
                  d = *(float16 *)(va + H1_2(i));
 -                d = float16_muladd(e2, e1, d, 0, status);
 +                d = float16_muladd(e2, e1, d, negf_real, status);
                  *(float16 *)(vd + H1_2(i)) = d;
              }
              if (likely((pg >> (j & 63)) & 1)) {
                  d = *(float16 *)(va + H1_2(j));
 -                d = float16_muladd(e4, e3, d, 0, status);
 +                d = float16_muladd(e4, e3, d, negf_imag, status);
                  *(float16 *)(vd + H1_2(j)) = d;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
      intptr_t j, i = simd_oprsz(desc);
 -    unsigned rot = simd_data(desc);
 -    bool flip = rot & 1;
 -    float32 neg_imag, neg_real;
 +    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float32 negx_imag, negx_real;
      uint64_t *g = vg;
 -    neg_imag = float32_set_sign(0, (rot & 2) != 0);
 -    neg_real = float32_set_sign(0, rot == 1 || rot == 2);
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (negf_real & ~fpcr_ah) << 31;
 +    negx_imag = (negf_imag & ~fpcr_ah) << 31;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
              mi = *(float32 *)(vm + H1_2(j));
              e2 = (flip ? ni : nr);
 -            e1 = (flip ? mi : mr) ^ neg_real;
 +            e1 = (flip ? mi : mr) ^ negx_real;
              e4 = e2;
 -            e3 = (flip ? mr : mi) ^ neg_imag;
 +            e3 = (flip ? mr : mi) ^ negx_imag;
              if (likely((pg >> (i & 63)) & 1)) {
                  d = *(float32 *)(va + H1_2(i));
 -                d = float32_muladd(e2, e1, d, 0, status);
 +                d = float32_muladd(e2, e1, d, negf_real, status);
                  *(float32 *)(vd + H1_2(i)) = d;
              }
              if (likely((pg >> (j & 63)) & 1)) {
                  d = *(float32 *)(va + H1_2(j));
 -                d = float32_muladd(e4, e3, d, 0, status);
 +                d = float32_muladd(e4, e3, d, negf_imag, status);
                  *(float32 *)(vd + H1_2(j)) = d;
              }
          } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                 void *vg, float_status *status, uint32_t desc)
  {
      intptr_t j, i = simd_oprsz(desc);
 -    unsigned rot = simd_data(desc);
 -    bool flip = rot & 1;
 -    float64 neg_imag, neg_real;
 +    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
 +    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 +    uint32_t negf_real = flip ^ negf_imag;
 +    float64 negx_imag, negx_real;
      uint64_t *g = vg;
 -    neg_imag = float64_set_sign(0, (rot & 2) != 0);
 -    neg_real = float64_set_sign(0, rot == 1 || rot == 2);
 +    /* With AH=0, use negx; with AH=1 use negf. */
 +    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
 +    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
 +    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
 +    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
      do {
          uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
              mi = *(float64 *)(vm + H1_2(j));
              e2 = (flip ? ni : nr);
 -            e1 = (flip ? mi : mr) ^ neg_real;
 +            e1 = (flip ? mi : mr) ^ negx_real;
              e4 = e2;
 -            e3 = (flip ? mr : mi) ^ neg_imag;
 +            e3 = (flip ? mr : mi) ^ negx_imag;
              if (likely((pg >> (i & 63)) & 1)) {
                  d = *(float64 *)(va + H1_2(i));
 -                d = float64_muladd(e2, e1, d, 0, status);
 +                d = float64_muladd(e2, e1, d, negf_real, status);
                  *(float64 *)(vd + H1_2(i)) = d;
              }
              if (likely((pg >> (j & 63)) & 1)) {
                  d = *(float64 *)(va + H1_2(j));
 -                d = float64_muladd(e4, e3, d, 0, status);
 +                d = float64_muladd(e4, e3, d, negf_imag, status);
                  *(float64 *)(vd + H1_2(j)) = d;
              }
          } while (i & 63);
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/tlb_helper.c
+--- a/target/arm/tcg/translate-sve.c
-+++ b/target/arm/tlb_helper.c
++++ b/target/arm/tcg/translate-sve.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_5_ptr * const fcmla_fns[4] = {
- #include "exec/exec-all.h"
+     gen_helper_sve_fcmla_zpzzz_s, gen_helper_sve_fcmla_zpzzz_d,
- #include "exec/helper-proto.h"
+ };
+ TRANS_FEAT(FCMLA_zpzzz, aa64_sve, gen_gvec_fpst_zzzzp, fcmla_fns[a->esz],
-+
+-           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot,
-+/* Return true if the translation regime is using LPAE format page tables */
++           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot | (s->fpcr_ah << 2),
-+bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
+            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
-+{
-+    int el = regime_el(env, mmu_idx);
+ static gen_helper_gvec_4_ptr * const fcmla_idx_fns[4] = {
 +    if (el == 2 || arm_el_is_aa64(env, el)) {
 +        return true;
 +    }
 +    if (arm_feature(env, ARM_FEATURE_LPAE)
 +        && (regime_tcr(env, mmu_idx)->raw_tcr & TTBCR_EAE)) {
 +        return true;
 +    }
 +    return false;
 +}
 +
 +/*
 + * Returns true if the stage 1 translation regime is using LPAE format page
 + * tables. Used when raising alignment exceptions, whose FSR changes depending
 + * on whether the long or short descriptor format is in use.
 + */
 +bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
 +{
 +    mmu_idx = stage_1_mmu_idx(mmu_idx);
 +    return regime_using_lpae_format(env, mmu_idx);
 +}
 +
  static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
                                              unsigned int target_el,
                                              bool same_el, bool ea,
 --
-.25.1
+.34.1

-[PULL 16/55] target/arm: Move pmsav8_mpu_lookup to ptw.c
+[PULL 48/68] target/arm: Handle FPCR.AH in FMLSL (by element and vector)
 From: Richard Henderson <richard.henderson@linaro.org>
-This is the final user of get_phys_addr_pmsav7_default
+Handle FPCR.AH's requirement to not negate the sign of a NaN
-within helper.c, so make it static within ptw.c.
+in FMLSL by element and vector, using the usual trick of
 negating by XOR when AH=0 and by muladd flags when AH=1.
 Since we have the CPUARMState* in the helper anyway, we can
 look directly at env->vfp.fpcr and don't need to pass in the
 FPCR.AH value via the SIMD data word.
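As a rough illustration of the XOR half of this trick, the sketch below negates four packed float16 values with a single 64-bit mask and shows why that is unsuitable when AH=1: the mask flips the sign of a NaN as well. The names `fmlsl_negation`, `apply_negx`, and `SKETCH_NEGATE_PRODUCT` are hypothetical and exist only for this example; the `ah` argument stands in for testing the AH bit of env->vfp.fpcr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_muladd_negate_product flag. */
enum { SKETCH_NEGATE_PRODUCT = 1 };

typedef struct {
    uint64_t negx; /* XOR mask over four packed float16 inputs */
    int negf;      /* flags for the widening fused multiply-add */
} FmlslNeg;

/* is_s: true for FMLSL (subtract), false for FMLAL. ah: FPCR.AH state. */
static FmlslNeg fmlsl_negation(bool is_s, bool ah)
{
    FmlslNeg r = { 0, 0 };

    if (is_s) {
        if (ah) {
            /* AH=1: negate the product inside muladd; NaN inputs stay intact. */
            r.negf = SKETCH_NEGATE_PRODUCT;
        } else {
            /* AH=0: flip the sign bit of all four float16 lanes at once. */
            r.negx = 0x8000800080008000ull;
        }
    }
    return r;
}

/* Apply the mask to four float16 values packed into one 64-bit word. */
static uint64_t apply_negx(uint64_t n_4, uint64_t negx)
{
    return n_4 ^ negx;
}
```

With AH=0, four packed 1.0 values (0x3C00) become four -1.0 values (0xBC00), and a default NaN (0x7E00) would become 0xFE00, which is the sign change that AH=1 forbids.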
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-10-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-31-richard.henderson@linaro.org
 [PMM: commit message tweaked]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/ptw.h    |   3 -
+ target/arm/tcg/vec_helper.c | 71 ++++++++++++++++++++++++-------------
- target/arm/helper.c | 136 -----------------------------------------
+file changed, 46 insertions(+), 25 deletions(-)
  target/arm/ptw.c    | 146 +++++++++++++++++++++++++++++++++++++++++++-
 files changed, 143 insertions(+), 142 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+@@ -XXX,XX +XXX,XX @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
- bool m_is_ppb_region(CPUARMState *env, uint32_t address);
+  */
- bool m_is_system_region(CPUARMState *env, uint32_t address);
+ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
--void get_phys_addr_pmsav7_default(CPUARMState *env,
+-                     uint32_t desc, bool fz16)
--                                  ARMMMUIdx mmu_idx,
++                     uint64_t negx, int negf, uint32_t desc, bool fz16)
--                                  int32_t address, int *prot);
+ {
- bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user);
+     intptr_t i, oprsz = simd_oprsz(desc);
+-    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
- bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+     int is_q = oprsz == 16;
-index XXXXXXX..XXXXXXX 100644
+     uint64_t n_4, m_4;
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
+-    /* Pre-load all of the f16 data, avoiding overlap issues.  */
-@@ -XXX,XX +XXX,XX @@ void v8m_security_lookup(CPUARMState *env, uint32_t address,
+-    n_4 = load4_f16(vn, is_q, is_2);
-     }
++    /*
- }
++     * Pre-load all of the f16 data, avoiding overlap issues.
++     * Negate all inputs for AH=0 FMLSL at once.
--bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
++     */
--                              MMUAccessType access_type, ARMMMUIdx mmu_idx,
++    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
--                              hwaddr *phys_ptr, MemTxAttrs *txattrs,
+     m_4 = load4_f16(vm, is_q, is_2);
--                              int *prot, bool *is_subpage,
--                              ARMMMUFaultInfo *fi, uint32_t *mregion)
+-    /* Negate all inputs for FMLSL at once.  */
--{
+-    if (is_s) {
--    /* Perform a PMSAv8 MPU lookup (without also doing the SAU check
+-        n_4 ^= 0x8000800080008000ull;
 -     * that a full phys-to-virt translation does).
 -     * mregion is (if not NULL) set to the region number which matched,
 -     * or -1 if no region number is returned (MPU off, address did not
 -     * hit a region, address hit in multiple regions).
 -     * We set is_subpage to true if the region hit doesn't cover the
 -     * entire TARGET_PAGE the address is within.
 -     */
 -    ARMCPU *cpu = env_archcpu(env);
 -    bool is_user = regime_is_user(env, mmu_idx);
 -    uint32_t secure = regime_is_secure(env, mmu_idx);
 -    int n;
 -    int matchregion = -1;
 -    bool hit = false;
 -    uint32_t addr_page_base = address & TARGET_PAGE_MASK;
 -    uint32_t addr_page_limit = addr_page_base + (TARGET_PAGE_SIZE - 1);
 -
 -    *is_subpage = false;
 -    *phys_ptr = address;
 -    *prot = 0;
 -    if (mregion) {
 -        *mregion = -1;
 -    }
 -
--    /* Unlike the ARM ARM pseudocode, we don't need to check whether this
+     for (i = 0; i < oprsz / 4; i++) {
--     * was an exception vector read from the vector table (which is always
+         float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
--     * done using the default system address map), because those accesses
+         float32 m_1 = float16_to_float32_by_bits(m_4 >> (i * 16), fz16);
--     * are done in arm_v7m_load_vector(), which always does a direct
+-        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
--     * read using address_space_ldl(), rather than going via this function.
++        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
--     */
+     }
--    if (regime_translation_disabled(env, mmu_idx)) { /* MPU disabled */
+     clear_tail(d, oprsz, simd_maxsz(desc));
--        hit = true;
+ }
--    } else if (m_is_ppb_region(env, address)) {
+@@ -XXX,XX +XXX,XX @@ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
--        hit = true;
+ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
--    } else {
+                             CPUARMState *env, uint32_t desc)
--        if (pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
+ {
--            hit = true;
+-    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, desc,
--        }
++    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 +
 +    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
  void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
                              CPUARMState *env, uint32_t desc)
  {
 -    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, desc,
 +    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    uint64_t negx = 0;
 +    int negf = 0;
 +
 +    if (is_s) {
 +        if (env->vfp.fpcr & FPCR_AH) {
 +            negf = float_muladd_negate_product;
 +        } else {
 +            negx = 0x8000800080008000ull;
 +        }
 +    }
 +    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
  }
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
  }
  static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
 -                         uint32_t desc, bool fz16)
 +                         uint64_t negx, int negf, uint32_t desc, bool fz16)
  {
      intptr_t i, oprsz = simd_oprsz(desc);
 -    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
      int is_q = oprsz == 16;
      uint64_t n_4;
      float32 m_1;
 -    /* Pre-load all of the f16 data, avoiding overlap issues.  */
 -    n_4 = load4_f16(vn, is_q, is_2);
 -
--        for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
+-    /* Negate all inputs for FMLSL at once.  */
--            /* region search */
+-    if (is_s) {
--            /* Note that the base address is bits [31:5] from the register
+-        n_4 ^= 0x8000800080008000ull;
 -             * with bits [4:0] all zeroes, but the limit address is bits
 -             * [31:5] from the register with bits [4:0] all ones.
 -             */
 -            uint32_t base = env->pmsav8.rbar[secure][n] & ~0x1f;
 -            uint32_t limit = env->pmsav8.rlar[secure][n] | 0x1f;
 -
 -            if (!(env->pmsav8.rlar[secure][n] & 0x1)) {
 -                /* Region disabled */
 -                continue;
 -            }
 -
 -            if (address < base || address > limit) {
 -                /*
 -                 * Address not in this region. We must check whether the
 -                 * region covers addresses in the same page as our address.
 -                 * In that case we must not report a size that covers the
 -                 * whole page for a subsequent hit against a different MPU
 -                 * region or the background region, because it would result in
 -                 * incorrect TLB hits for subsequent accesses to addresses that
 -                 * are in this MPU region.
 -                 */
 -                if (limit >= base &&
 -                    ranges_overlap(base, limit - base + 1,
 -                                   addr_page_base,
 -                                   TARGET_PAGE_SIZE)) {
 -                    *is_subpage = true;
 -                }
 -                continue;
 -            }
 -
 -            if (base > addr_page_base || limit < addr_page_limit) {
 -                *is_subpage = true;
 -            }
 -
 -            if (matchregion != -1) {
 -                /* Multiple regions match -- always a failure (unlike
 -                 * PMSAv7 where highest-numbered-region wins)
 -                 */
 -                fi->type = ARMFault_Permission;
 -                fi->level = 1;
 -                return true;
 -            }
 -
 -            matchregion = n;
 -            hit = true;
 -        }
 -    }
 -
--    if (!hit) {
++    /*
--        /* background fault */
++     * Pre-load all of the f16 data, avoiding overlap issues.
--        fi->type = ARMFault_Background;
++     * Negate all inputs for AH=0 FMLSL at once.
--        return true;
++     */
--    }
++    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
--
+     m_1 = float16_to_float32_by_bits(((float16 *)vm)[H2(index)], fz16);
--    if (matchregion == -1) {
--        /* hit using the background region */
+     for (i = 0; i < oprsz / 4; i++) {
--        get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
+         float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
--    } else {
+-        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
--        uint32_t ap = extract32(env->pmsav8.rbar[secure][matchregion], 1, 2);
++        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
--        uint32_t xn = extract32(env->pmsav8.rbar[secure][matchregion], 0, 1);
+     }
--        bool pxn = false;
+     clear_tail(d, oprsz, simd_maxsz(desc));
 -
 -        if (arm_feature(env, ARM_FEATURE_V8_1M)) {
 -            pxn = extract32(env->pmsav8.rlar[secure][matchregion], 4, 1);
 -        }
 -
 -        if (m_is_system_region(env, address)) {
 -            /* System space is always execute never */
 -            xn = 1;
 -        }
 -
 -        *prot = simple_ap_to_rw_prot(env, mmu_idx, ap);
 -        if (*prot && !xn && !(pxn && !is_user)) {
 -            *prot |= PAGE_EXEC;
 -        }
 -        /* We don't need to look the attribute up in the MAIR0/MAIR1
 -         * registers because that only tells us about cacheability.
 -         */
 -        if (mregion) {
 -            *mregion = matchregion;
 -        }
 -    }
 -
 -    fi->type = ARMFault_Permission;
 -    fi->level = 1;
 -    return !(*prot & (1 << access_type));
 -}
 -
  /* Combine either inner or outer cacheability attributes for normal
   * memory, according to table D4-42 and pseudocode procedure
   * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
      return false;
  }
+@@ -XXX,XX +XXX,XX @@ static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
--void get_phys_addr_pmsav7_default(CPUARMState *env,
+ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
--                                  ARMMMUIdx mmu_idx,
+                                 CPUARMState *env, uint32_t desc)
 -                                  int32_t address, int *prot)
 +static void get_phys_addr_pmsav7_default(CPUARMState *env, ARMMMUIdx mmu_idx,
 +                                         int32_t address, int *prot)
  {
-     if (!arm_feature(env, ARM_FEATURE_M)) {
+-    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, desc,
-         *prot = PAGE_READ | PAGE_WRITE;
++    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
++    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
-     return !(*prot & (1 << access_type));
++
 +    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
                   get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
-+bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
+ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-+                       MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                                 CPUARMState *env, uint32_t desc)
-+                       hwaddr *phys_ptr, MemTxAttrs *txattrs,
+ {
-+                       int *prot, bool *is_subpage,
+-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, desc,
-+                       ARMMMUFaultInfo *fi, uint32_t *mregion)
++    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-+{
++    uint64_t negx = 0;
-+    /*
++    int negf = 0;
 +     * Perform a PMSAv8 MPU lookup (without also doing the SAU check
 +     * that a full phys-to-virt translation does).
 +     * mregion is (if not NULL) set to the region number which matched,
 +     * or -1 if no region number is returned (MPU off, address did not
 +     * hit a region, address hit in multiple regions).
 +     * We set is_subpage to true if the region hit doesn't cover the
 +     * entire TARGET_PAGE the address is within.
 +     */
 +    ARMCPU *cpu = env_archcpu(env);
 +    bool is_user = regime_is_user(env, mmu_idx);
 +    uint32_t secure = regime_is_secure(env, mmu_idx);
 +    int n;
 +    int matchregion = -1;
 +    bool hit = false;
 +    uint32_t addr_page_base = address & TARGET_PAGE_MASK;
 +    uint32_t addr_page_limit = addr_page_base + (TARGET_PAGE_SIZE - 1);
 +
-+    *is_subpage = false;
++    if (is_s) {
-+    *phys_ptr = address;
++        if (env->vfp.fpcr & FPCR_AH) {
-+    *prot = 0;
++            negf = float_muladd_negate_product;
-+    if (mregion) {
++        } else {
-+        *mregion = -1;
++            negx = 0x8000800080008000ull;
 +    }
 +
 +    /*
 +     * Unlike the ARM ARM pseudocode, we don't need to check whether this
 +     * was an exception vector read from the vector table (which is always
 +     * done using the default system address map), because those accesses
 +     * are done in arm_v7m_load_vector(), which always does a direct
 +     * read using address_space_ldl(), rather than going via this function.
 +     */
 +    if (regime_translation_disabled(env, mmu_idx)) { /* MPU disabled */
 +        hit = true;
 +    } else if (m_is_ppb_region(env, address)) {
 +        hit = true;
 +    } else {
 +        if (pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
 +            hit = true;
 +        }
 +
 +        for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
 +            /* region search */
 +            /*
 +             * Note that the base address is bits [31:5] from the register
 +             * with bits [4:0] all zeroes, but the limit address is bits
 +             * [31:5] from the register with bits [4:0] all ones.
 +             */
 +            uint32_t base = env->pmsav8.rbar[secure][n] & ~0x1f;
 +            uint32_t limit = env->pmsav8.rlar[secure][n] | 0x1f;
 +
 +            if (!(env->pmsav8.rlar[secure][n] & 0x1)) {
 +                /* Region disabled */
 +                continue;
 +            }
 +
 +            if (address < base || address > limit) {
 +                /*
 +                 * Address not in this region. We must check whether the
 +                 * region covers addresses in the same page as our address.
 +                 * In that case we must not report a size that covers the
 +                 * whole page for a subsequent hit against a different MPU
 +                 * region or the background region, because it would result in
 +                 * incorrect TLB hits for subsequent accesses to addresses that
 +                 * are in this MPU region.
 +                 */
 +                if (limit >= base &&
 +                    ranges_overlap(base, limit - base + 1,
 +                                   addr_page_base,
 +                                   TARGET_PAGE_SIZE)) {
 +                    *is_subpage = true;
 +                }
 +                continue;
 +            }
 +
 +            if (base > addr_page_base || limit < addr_page_limit) {
 +                *is_subpage = true;
 +            }
 +
 +            if (matchregion != -1) {
 +                /*
 +                 * Multiple regions match -- always a failure (unlike
 +                 * PMSAv7 where highest-numbered-region wins)
 +                 */
 +                fi->type = ARMFault_Permission;
 +                fi->level = 1;
 +                return true;
 +            }
 +
 +            matchregion = n;
 +            hit = true;
 +        }
 +    }
-+
++    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
-+    if (!hit) {
+                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
-+        /* background fault */
+ }
-+        fi->type = ARMFault_Background;
 +        return true;
 +    }
 +
 +    if (matchregion == -1) {
 +        /* hit using the background region */
 +        get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
 +    } else {
 +        uint32_t ap = extract32(env->pmsav8.rbar[secure][matchregion], 1, 2);
 +        uint32_t xn = extract32(env->pmsav8.rbar[secure][matchregion], 0, 1);
 +        bool pxn = false;
 +
 +        if (arm_feature(env, ARM_FEATURE_V8_1M)) {
 +            pxn = extract32(env->pmsav8.rlar[secure][matchregion], 4, 1);
 +        }
 +
 +        if (m_is_system_region(env, address)) {
 +            /* System space is always execute never */
 +            xn = 1;
 +        }
 +
 +        *prot = simple_ap_to_rw_prot(env, mmu_idx, ap);
 +        if (*prot && !xn && !(pxn && !is_user)) {
 +            *prot |= PAGE_EXEC;
 +        }
 +        /*
 +         * We don't need to look the attribute up in the MAIR0/MAIR1
 +         * registers because that only tells us about cacheability.
 +         */
 +        if (mregion) {
 +            *mregion = matchregion;
 +        }
 +    }
 +
 +    fi->type = ARMFault_Permission;
 +    fi->level = 1;
 +    return !(*prot & (1 << access_type));
 +}
 +
  static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
                                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
                                   hwaddr *phys_ptr, MemTxAttrs *txattrs,
 --
-.25.1
+.34.1

-[PULL 33/55] target/arm: Move arm_cpu_get_phys_page_attrs_debug to ptw.c
+[PULL 49/68] target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
 From: Richard Henderson <richard.henderson@linaro.org>
+Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
+FMLSL (indexed), using the usual trick of negating by XOR when AH=0
+and by muladd flags when AH=1.
+Since we have the CPUARMState* in the helper anyway, we can
+look directly at env->vfp.fpcr and don't need to pass in the
+FPCR.AH value via the SIMD data word.
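
As a rough illustration of why the two negation strategies differ, the sketch below works on raw half-precision bit patterns. It is not QEMU code: the function names are hypothetical, and the "AH=1" variant only models the visible effect on a NaN input (its sign is left alone), whereas the real helpers get that effect by passing float_muladd_negate_product to float32_muladd().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the IEEE half-precision bit pattern encodes a NaN. */
static bool f16_is_nan(uint16_t bits)
{
    return (bits & 0x7c00) == 0x7c00 && (bits & 0x03ff) != 0;
}

/* AH=0 style: flip the sign bit unconditionally, NaN or not. */
static uint16_t f16_negate_xor(uint16_t bits)
{
    return bits ^ 0x8000;
}

/*
 * AH=1 style (illustration only, hypothetical helper): ordinary
 * values are negated, but a NaN input keeps its original sign.
 */
static uint16_t f16_negate_keep_nan(uint16_t bits)
{
    return f16_is_nan(bits) ? bits : (uint16_t)(bits ^ 0x8000);
}

/* Small self-check showing where the two strategies diverge. */
void fmlsl_negate_demo(void)
{
    const uint16_t one = 0x3c00;   /* +1.0 */
    const uint16_t qnan = 0x7e00;  /* default quiet NaN, sign clear */

    /* Both strategies agree on ordinary numbers. */
    assert(f16_negate_xor(one) == 0xbc00);
    assert(f16_negate_keep_nan(one) == 0xbc00);

    /* They differ on NaNs: XOR flips the sign, the AH=1 model does not. */
    assert(f16_negate_xor(qnan) == 0xfe00);
    assert(f16_negate_keep_nan(qnan) == 0x7e00);
}
```

The XOR form is cheap because it can be applied to the packed inputs before conversion, which is why it remains the AH=0 path.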
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-27-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-32-richard.henderson@linaro.org
 [PMM: commit message tweaked]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 26 --------------------------
+ target/arm/tcg/vec_helper.c | 15 ++++++++++++---
- target/arm/ptw.c    | 24 ++++++++++++++++++++++++
+file changed, 12 insertions(+), 3 deletions(-)
 files changed, 24 insertions(+), 26 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
-     };
+                                CPUARMState *env, uint32_t desc)
- }
+ {
+     intptr_t i, j, oprsz = simd_oprsz(desc);
--#ifndef CONFIG_USER_ONLY
+-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
--hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
++    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
--                                         MemTxAttrs *attrs)
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
--{
+     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
--    ARMCPU *cpu = ARM_CPU(cs);
+     float_status *status = &env->vfp.fp_status_a64;
--    CPUARMState *env = &cpu->env;
+     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
--    hwaddr phys_addr;
++    int negx = 0, negf = 0;
--    target_ulong page_size;
++
--    int prot;
++    if (is_s) {
--    bool ret;
++        if (env->vfp.fpcr & FPCR_AH) {
--    ARMMMUFaultInfo fi = {};
++            negf = float_muladd_negate_product;
--    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
++        } else {
--    ARMCacheAttrs cacheattrs = {};
++            negx = 0x8000;
--
++        }
--    *attrs = (MemTxAttrs) {};
++    }
--
--    ret = get_phys_addr(env, addr, MMU_DATA_LOAD, mmu_idx, &phys_addr,
+     for (i = 0; i < oprsz; i += 16) {
--                        attrs, &prot, &page_size, &fi, &cacheattrs);
+         float16 mm_16 = *(float16 *)(vm + i + idx);
--
+         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
--    if (ret) {
--        return -1;
+         for (j = 0; j < 16; j += sizeof(float32)) {
--    }
+-            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negn;
--    return phys_addr;
++            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negx;
--}
+             float32 nn = float16_to_float32_by_bits(nn_16, fz16);
--#endif
+             float32 aa = *(float32 *)(va + H1_4(i + j));
--
- /* Note that signed overflow is undefined in C.  The following routines are
+             *(float32 *)(vd + H1_4(i + j)) =
-    careful to use unsigned types where modulo arithmetic is required.
+-                float32_muladd(nn, mm, aa, 0, status);
-    Failure to do so _will_ break on newer gcc.  */
++                float32_muladd(nn, mm, aa, negf, status);
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+         }
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
                                      phys_ptr, prot, page_size, fi);
      }
  }
-+
-+hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
-+                                         MemTxAttrs *attrs)
-+{
-+    ARMCPU *cpu = ARM_CPU(cs);
-+    CPUARMState *env = &cpu->env;
-+    hwaddr phys_addr;
-+    target_ulong page_size;
-+    int prot;
-+    bool ret;
-+    ARMMMUFaultInfo fi = {};
-+    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
-+    ARMCacheAttrs cacheattrs = {};
-+
-+    *attrs = (MemTxAttrs) {};
-+
-+    ret = get_phys_addr(env, addr, MMU_DATA_LOAD, mmu_idx, &phys_addr,
-+                        attrs, &prot, &page_size, &fi, &cacheattrs);
-+
-+    if (ret) {
-+        return -1;
-+    }
-+    return phys_addr;
-+}
 --
-.25.1
+.34.1

-[PULL 08/55] target/arm: Move stage_1_mmu_idx decl to internals.h
+[PULL 50/68] target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
 From: Richard Henderson <richard.henderson@linaro.org>
-Move the decl from ptw.h to internals.h.  Provide an inline
+Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
-version for user-only, just as we do for arm_stage1_mmu_idx.
+FMLSL (indexed), using the usual trick of negating by XOR when AH=0
-Move an endif down to make the definition in helper.c be
+and by muladd flags when AH=1.
-system only.
 Since we have the CPUARMState* in the helper anyway, we can
 look directly at env->vfp.fpcr and don't need to pass in the
 FPCR.AH value via the SIMD data word.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-2-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-33-richard.henderson@linaro.org
 [PMM: tweaked commit message]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/internals.h | 5 +++++
+ target/arm/tcg/vec_helper.c | 15 ++++++++++++---
- target/arm/helper.c    | 5 ++---
+file changed, 12 insertions(+), 3 deletions(-)
 files changed, 7 insertions(+), 3 deletions(-)
-diff --git a/target/arm/internals.h b/target/arm/internals.h
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/internals.h
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env);
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-  * Return the ARMMMUIdx for the stage1 traversal for the current regime.
+                                CPUARMState *env, uint32_t desc)
   */
  #ifdef CONFIG_USER_ONLY
 +static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
 +{
 +    return ARMMMUIdx_Stage1_E0;
 +}
  static inline ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
  {
-     return ARMMMUIdx_Stage1_E0;
+     intptr_t i, oprsz = simd_oprsz(desc);
- }
+-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
- #else
++    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-+ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx);
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
- ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env);
+     float_status *status = &env->vfp.fp_status_a64;
- #endif
+     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
++    int negx = 0, negf = 0;
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++
-index XXXXXXX..XXXXXXX 100644
++    if (is_s) {
---- a/target/arm/helper.c
++        if (env->vfp.fpcr & FPCR_AH) {
-+++ b/target/arm/helper.c
++            negf = float_muladd_negate_product;
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx,
++        } else {
 +            negx = 0x8000;
 +        }
 +    }
      for (i = 0; i < oprsz; i += sizeof(float32)) {
 -        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negn;
 +        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negx;
          float16 mm_16 = *(float16 *)(vm + H1_2(i + sel));
          float32 nn = float16_to_float32_by_bits(nn_16, fz16);
          float32 mm = float16_to_float32_by_bits(mm_16, fz16);
          float32 aa = *(float32 *)(va + H1_4(i));
 -        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, 0, status);
 +        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, negf, status);
      }
  }
--#endif /* !CONFIG_USER_ONLY */
--
- /* Convert a possible stage1+2 MMU index into the appropriate
-  * stage 1 MMU index
-  */
--static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
-+ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
- {
-     switch (mmu_idx) {
-     case ARMMMUIdx_SE10_0:
-@@ -XXX,XX +XXX,XX @@ static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
-         return mmu_idx;
-     }
- }
-+#endif /* !CONFIG_USER_ONLY */
- /* Return true if the translation regime is using LPAE format page tables */
- static inline bool regime_using_lpae_format(CPUARMState *env,
 --
-.25.1
+.34.1

-[PULL 01/55] target/arm: Declare support for FEAT_RASv1p1
+[PULL 51/68] target/arm: Enable FEAT_AFP for '-cpu max'
-The architectural feature RASv1p1 introduces the following new
+Now that we have completed the handling for FPCR.{AH,FIZ,NEP}, we
-features:
+can enable FEAT_AFP for '-cpu max', and document that we support it.
  * new registers ERXPFGCDN_EL1, ERXPFGCTL_EL1 and ERXPFGF_EL1
  * new bits in the fine-grained trap registers that control traps
    for these new registers
  * new trap bits HCR_EL2.FIEN and SCR_EL3.FIEN that control traps
    for ERXPFGCDN_EL1, ERXPFGCTL_EL1, ERXPFGP_EL1
  * a larger number of the ERXMISC<n>_EL1 registers
  * the format of ERR<n>STATUS registers changes
 The architecture permits that if ERRIDR_EL1.NUM is 0 (as it is for
 QEMU) then all these new registers may UNDEF, and the HCR_EL2.FIEN
 and SCR_EL3.FIEN bits may be RES0.  We don't have any ERR<n>STATUS
 registers (again, because ERRIDR_EL1.NUM is 0).  QEMU does not yet
 implement the fine-grained-trap extension.  So there is nothing we
 need to implement to be compliant with the feature spec.  Make the
 'max' CPU report the feature in its ID registers, and document it.
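
Reporting a feature in an ID register amounts to depositing a small value into a bit field, which is what the FIELD_DP64() calls in the cpu64.c hunk of this patch do. The sketch below is a standalone approximation, not the QEMU macro: deposit_field() and extract_field() are hypothetical helpers, and the AFP field position (bits [47:44] of ID_AA64MMFR1_EL1) is an assumption that should be checked against the Arm ARM.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the AFP field in ID_AA64MMFR1_EL1. */
#define AFP_SHIFT  44u
#define AFP_LENGTH 4u

/* Replace 'length' bits of 'reg' starting at 'shift' (1 <= length <= 64). */
static uint64_t deposit_field(uint64_t reg, unsigned shift,
                              unsigned length, uint64_t value)
{
    uint64_t mask = (~0ull >> (64 - length)) << shift;

    return (reg & ~mask) | ((value << shift) & mask);
}

/* Read back 'length' bits of 'reg' starting at 'shift'. */
static uint64_t extract_field(uint64_t reg, unsigned shift, unsigned length)
{
    return (reg >> shift) & (~0ull >> (64 - length));
}

/* Set the AFP field to 1 without disturbing neighbouring fields. */
uint64_t advertise_afp(uint64_t id_aa64mmfr1)
{
    return deposit_field(id_aa64mmfr1, AFP_SHIFT, AFP_LENGTH, 1);
}
```

A guest that reads the register and extracts the same field would then see a nonzero value and may enable use of the FPCR.{AH,FIZ,NEP} controls.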
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220531114258.855804-1-peter.maydell@linaro.org
 ---
  docs/system/arm/emulation.rst | 1 +
- target/arm/cpu64.c            | 1 +
+ target/arm/tcg/cpu64.c        | 1 +
 files changed, 2 insertions(+)
 diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/emulation.rst
 +++ b/docs/system/arm/emulation.rst
 @@ -XXX,XX +XXX,XX @@ the following architecture extensions:
- - FEAT_PMUv3p1 (PMU Extensions v3.1)
+ - FEAT_AA64EL3 (Support for AArch64 at EL3)
- - FEAT_PMUv3p4 (PMU Extensions v3.4)
+ - FEAT_AdvSIMD (Advanced SIMD Extension)
- - FEAT_RAS (Reliability, availability, and serviceability)
+ - FEAT_AES (AESD and AESE instructions)
-+- FEAT_RASv1p1 (RAS Extension v1.1)
++- FEAT_AFP (Alternate floating-point behavior)
- - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
+ - FEAT_Armv9_Crypto (Armv9 Cryptographic Extension)
- - FEAT_RNG (Random number generator)
+ - FEAT_ASID16 (16 bit ASID)
- - FEAT_S2FWB (Stage 2 forced Write-Back)
+ - FEAT_BBM at level 2 (Translation table break-before-make levels)
-diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu64.c
+--- a/target/arm/tcg/cpu64.c
-+++ b/target/arm/cpu64.c
++++ b/target/arm/tcg/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
-      * we do for EL2 with the virtualization=on property.
+     t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1);      /* FEAT_XNX */
-      */
+     t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2);      /* FEAT_ETS2 */
-     t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3);       /* FEAT_MTE3 */
+     t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);      /* FEAT_HCX */
-+    t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 1);  /* FEAT_RASv1p1 */
++    t = FIELD_DP64(t, ID_AA64MMFR1, AFP, 1);      /* FEAT_AFP */
-     t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_2 */
+     t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1);   /* FEAT_TIDCP1 */
-     cpu->isar.id_aa64pfr1 = t;
+     t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1);     /* FEAT_CMOW */
+     cpu->isar.id_aa64mmfr1 = t;
 --
-.25.1
+.34.1

-[PULL 07/55] xlnx-zynqmp: fix the irq mapping for the display port and its dma
+[PULL 52/68] target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
-From: Frederic Konrad <fkonrad@amd.com>
+FEAT_RPRES implements an "increased precision" variant of the single
 precision FRECPE and FRSQRTE instructions from an 8 bit to a 12
 bit mantissa. This applies only when FPCR.AH == 1. Note that the
 halfprec and double versions of these insns retain the 8 bit
 precision regardless.
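
The condition described above can be sketched as a small predicate. This is an illustration only: the enum and function names are hypothetical, and the real translator makes the equivalent choice by selecting a different helper table when s->fpcr_ah and the aa64_rpres feature test are both true.

```c
#include <assert.h>
#include <stdbool.h>

/* Element sizes; hypothetical names for this sketch. */
enum elem_size { ELEM_F16, ELEM_F32, ELEM_F64 };

/*
 * Number of mantissa bits in the FRECPE/FRSQRTE estimate.
 * Only single precision is affected, and only when the CPU has
 * FEAT_RPRES and FPCR.AH is set; everything else stays at 8 bits.
 */
static int estimate_mantissa_bits(enum elem_size esz, bool fpcr_ah,
                                  bool have_rpres)
{
    if (esz == ELEM_F32 && fpcr_ah && have_rpres) {
        return 12;
    }
    return 8;
}

/* Self-check covering each combination that matters. */
void rpres_selection_demo(void)
{
    assert(estimate_mantissa_bits(ELEM_F32, true, true) == 12);
    assert(estimate_mantissa_bits(ELEM_F32, false, true) == 8);
    assert(estimate_mantissa_bits(ELEM_F32, true, false) == 8);
    assert(estimate_mantissa_bits(ELEM_F16, true, true) == 8);
    assert(estimate_mantissa_bits(ELEM_F64, true, true) == 8);
}
```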
-When the display port has been initially implemented the device
+In this commit we add all the plumbing to make these instructions
-driver wasn't using interrupts.  Now that the display port driver
+call a new helper function when the increased-precision is in
-waits for vblank interrupt it has been noticed that the irq mapping
+effect. In the following commit we will provide the actual change
-is wrong.  So use the value from the linux device tree and the
+in behaviour in the helpers.
 ultrascale+ reference manual.
-Signed-off-by: Frederic Konrad <fkonrad@amd.com>
-Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
-Acked-by: Alistair Francis <alistair.francis@wdc.com>
-Message-id: 20220601172353.3220232-5-fkonrad@xilinx.com
-[PMM: refold lines in commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- hw/arm/xlnx-zynqmp.c | 4 ++--
+ target/arm/cpu-features.h      |  5 +++++
-file changed, 2 insertions(+), 2 deletions(-)
+ target/arm/helper.h            |  4 ++++
  target/arm/tcg/translate-a64.c | 34 ++++++++++++++++++++++++++++++----
  target/arm/tcg/translate-sve.c | 16 ++++++++++++++--
  target/arm/tcg/vec_helper.c    |  2 ++
  target/arm/vfp_helper.c        | 32 ++++++++++++++++++++++++++++++--
 files changed, 85 insertions(+), 8 deletions(-)
-diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
+diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-zynqmp.c
+--- a/target/arm/cpu-features.h
-+++ b/hw/arm/xlnx-zynqmp.c
++++ b/target/arm/cpu-features.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_mops(const ARMISARegisters *id)
- #define SERDES_SIZE         0x20000
+     return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, MOPS);
+ }
- #define DP_ADDR             0xfd4a0000
--#define DP_IRQ              113
++static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id)
-+#define DP_IRQ              0x77
++{
++    return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, RPRES);
- #define DPDMA_ADDR          0xfd4c0000
++}
--#define DPDMA_IRQ           116
++
-+#define DPDMA_IRQ           0x7a
+ static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
+ {
- #define APU_ADDR            0xfd5c0000
+     /* We always set the AdvSIMD and FP fields identically.  */
- #define APU_IRQ             153
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, fpst)
  DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
  DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 +DEF_HELPER_FLAGS_2(recpe_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
  DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
  DEF_HELPER_FLAGS_2(rsqrte_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
  DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 +DEF_HELPER_FLAGS_2(rsqrte_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
  DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
  DEF_HELPER_FLAGS_1(recpe_u32, TCG_CALL_NO_RWG, i32, i32)
  DEF_HELPER_FLAGS_1(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vrintx_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_4(gvec_frecpe_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 +DEF_HELPER_FLAGS_4(gvec_frsqrte_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
  DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-a64.c
 +++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
      gen_helper_recpe_f32,
      gen_helper_recpe_f64,
  };
 -TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
 +static const FPScalar1 f_scalar_frecpe_rpres = {
 +    gen_helper_recpe_f16,
 +    gen_helper_recpe_rpres_f32,
 +    gen_helper_recpe_f64,
 +};
 +TRANS(FRECPE_s, do_fp1_scalar_ah, a,
 +      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
 +      &f_scalar_frecpe_rpres : &f_scalar_frecpe, -1)
  static const FPScalar1 f_scalar_frecpx = {
      gen_helper_frecpx_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frsqrte = {
      gen_helper_rsqrte_f32,
      gen_helper_rsqrte_f64,
  };
 -TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
 +static const FPScalar1 f_scalar_frsqrte_rpres = {
 +    gen_helper_rsqrte_f16,
 +    gen_helper_rsqrte_rpres_f32,
 +    gen_helper_rsqrte_f64,
 +};
 +TRANS(FRSQRTE_s, do_fp1_scalar_ah, a,
 +      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
 +      &f_scalar_frsqrte_rpres : &f_scalar_frsqrte, -1)
  static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
  {
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
      gen_helper_gvec_frecpe_s,
      gen_helper_gvec_frecpe_d,
  };
 -TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
 +static gen_helper_gvec_2_ptr * const f_frecpe_rpres[] = {
 +    gen_helper_gvec_frecpe_h,
 +    gen_helper_gvec_frecpe_rpres_s,
 +    gen_helper_gvec_frecpe_d,
 +};
 +TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
 +      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frecpe_rpres : f_frecpe)
  static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
      gen_helper_gvec_frsqrte_h,
      gen_helper_gvec_frsqrte_s,
      gen_helper_gvec_frsqrte_d,
  };
 -TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
 +static gen_helper_gvec_2_ptr * const f_frsqrte_rpres[] = {
 +    gen_helper_gvec_frsqrte_h,
 +    gen_helper_gvec_frsqrte_rpres_s,
 +    gen_helper_gvec_frsqrte_d,
 +};
 +TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
 +      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frsqrte_rpres : f_frsqrte)
  static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
  {
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
      NULL,                     gen_helper_gvec_frecpe_h,
      gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
  };
 -TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
 +static gen_helper_gvec_2_ptr * const frecpe_rpres_fns[] = {
 +    NULL,                           gen_helper_gvec_frecpe_h,
 +    gen_helper_gvec_frecpe_rpres_s, gen_helper_gvec_frecpe_d,
 +};
 +TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
 +           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
 +           frecpe_rpres_fns[a->esz] : frecpe_fns[a->esz], a, 0)
  static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
      NULL,                      gen_helper_gvec_frsqrte_h,
      gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
  };
 -TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
 +static gen_helper_gvec_2_ptr * const frsqrte_rpres_fns[] = {
 +    NULL,                            gen_helper_gvec_frsqrte_h,
 +    gen_helper_gvec_frsqrte_rpres_s, gen_helper_gvec_frsqrte_d,
 +};
 +TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
 +           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
 +           frsqrte_rpres_fns[a->esz] : frsqrte_fns[a->esz], a, 0)
  /*
   *** SVE Floating Point Compare with Zero Group
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, float_status *stat, uint32_t desc)  \
  DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
  DO_2OP(gvec_frecpe_s, helper_recpe_f32, float32)
 +DO_2OP(gvec_frecpe_rpres_s, helper_recpe_rpres_f32, float32)
  DO_2OP(gvec_frecpe_d, helper_recpe_f64, float64)
  DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
  DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
 +DO_2OP(gvec_frsqrte_rpres_s, helper_rsqrte_rpres_f32, float32)
  DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
  DO_2OP(gvec_vrintx_h, float16_round_to_int, float16)
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
      return make_float16(f16_val);
  }
 -float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
 +/*
 + * FEAT_RPRES means the f32 FRECPE has an "increased precision" variant
 + * which is used when FPCR.AH == 1.
 + */
 +static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
  {
      float32 f32 = float32_squash_input_denormal(input, fpst);
      uint32_t f32_val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
      return make_float32(f32_val);
  }
 +float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
 +{
 +    return do_recpe_f32(input, fpst, false);
 +}
 +
 +float32 HELPER(recpe_rpres_f32)(float32 input, float_status *fpst)
 +{
 +    return do_recpe_f32(input, fpst, true);
 +}
 +
  float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
  {
      float64 f64 = float64_squash_input_denormal(input, fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
      return make_float16(val);
  }
 -float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
 +/*
 + * FEAT_RPRES means the f32 FRSQRTE has an "increased precision" variant
 + * which is used when FPCR.AH == 1.
 + */
 +static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
  {
      float32 f32 = float32_squash_input_denormal(input, s);
      uint32_t val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
      return make_float32(val);
  }
 +float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
 +{
 +    return do_rsqrte_f32(input, s, false);
 +}
 +
 +float32 HELPER(rsqrte_rpres_f32)(float32 input, float_status *s)
 +{
 +    return do_rsqrte_f32(input, s, true);
 +}
 +
  float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
  {
      float64 f64 = float64_squash_input_denormal(input, s);
 --
-.25.1
+.34.1
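
Reviewer note: the translate-a64.c and translate-sve.c hunks above all follow one pattern. A second helper table is defined that differs from the base table only in its f32 entry, and the table is chosen at translate time from FPCR.AH and the FEAT_RPRES ID bit. The following is a minimal standalone sketch of that pattern, not QEMU code: estimate_fn, the est_* functions and select_table() are hypothetical names invented for illustration, and the helper bodies are placeholders.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-element-size helper function. */
typedef int (*estimate_fn)(int);

/* Placeholder bodies; only the identity of each function matters here. */
static int est_f16(int x)       { return x; }
static int est_f32(int x)       { return x + 1; }
static int est_f32_rpres(int x) { return x + 2; }
static int est_f64(int x)       { return x + 3; }

/* Tables indexed by element size: 0 = f16, 1 = f32, 2 = f64. */
static const estimate_fn base_fns[3]  = { est_f16, est_f32,       est_f64 };
static const estimate_fn rpres_fns[3] = { est_f16, est_f32_rpres, est_f64 };

/*
 * Pick the increased-precision table only when both conditions hold:
 * the FPCR.AH bit is set and the CPU implements the feature.
 */
static const estimate_fn *select_table(bool fpcr_ah, bool have_rpres)
{
    return (fpcr_ah && have_rpres) ? rpres_fns : base_fns;
}
```

Because only the f32 slot differs, the f16 and f64 paths behave the same whichever table is selected, which matches the patch leaving those helpers untouched.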

-[PULL 43/55] target/arm: Hoist arm_is_el2_enabled check in sve_exception_el
+[PULL 53/68] target/arm: Implement increased precision FRECPE
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the increased precision variation of FRECPE.  In the
 pseudocode this corresponds to the handling of the
 "increasedprecision" boolean in the FPRecipEstimate() and
 RecipEstimate() functions.
-This check is buried within arm_hcr_el2_eff(), but since we
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-have to have the explicit check for CPTR_EL2.TZ, we might as
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-well just check it once at the beginning of the block.
+---
  target/arm/vfp_helper.c | 54 +++++++++++++++++++++++++++++++++++------
 file changed, 46 insertions(+), 8 deletions(-)
-Once this is done, we can test HCR_EL2.{E2H,TGE} directly,
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 rather than going through arm_hcr_el2_eff().
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20220607203306.657998-9-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.c | 13 +++++--------
 file changed, 5 insertions(+), 8 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
+@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
      return r;
  }
 +/*
 + * Increased precision version:
 + * input is a 13 bit fixed point number
 + * input range 2048 .. 4095 for a number from 0.5 <= x < 1.0.
 + * result range 4096 .. 8191 for a number from 1.0 to 2.0
 + */
 +static int recip_estimate_incprec(int input)
 +{
 +    int a, b, r;
 +    assert(2048 <= input && input < 4096);
 +    a = (input * 2) + 1;
 +    /*
 +     * The pseudocode expresses this as an operation on infinite
 +     * precision reals where it calculates 2^25 / a and then looks
 +     * at the error between that and the rounded-down-to-integer
 +     * value to see if it should instead round up. We instead
 +     * follow the same approach as the pseudocode for the 8-bit
 +     * precision version, and calculate (2 * (2^25 / a)) as an
 +     * integer so we can do the "add one and halve" to round it.
 +     * So the 1 << 26 here is correct.
 +     */
 +    b = (1 << 26) / a;
 +    r = (b + 1) >> 1;
 +    assert(4096 <= r && r < 8192);
 +    return r;
 +}
 +
  /*
   * Common wrapper to call recip_estimate
   *
@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
   * callee.
   */
 -static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
 +static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac,
 +                                    bool increasedprecision)
  {
      uint32_t scaled, estimate;
      uint64_t result_frac;
@@ -XXX,XX +XXX,XX @@ static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
          }
      }
--    /*
+-    /* scaled = UInt('1':fraction<51:44>) */
--     * CPTR_EL2 changes format with HCR_EL2.E2H (regardless of TGE).
+-    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
--     */
+-    estimate = recip_estimate(scaled);
--    if (el <= 2) {
++    if (increasedprecision) {
--        uint64_t hcr_el2 = arm_hcr_el2_eff(env);
++        /* scaled = UInt('1':fraction<51:41>) */
--        if (hcr_el2 & HCR_E2H) {
++        scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
-+    if (el <= 2 && arm_is_el2_enabled(env)) {
++        estimate = recip_estimate_incprec(scaled);
-+        /* CPTR_EL2 changes format with HCR_EL2.E2H (regardless of TGE). */
++    } else {
-+        if (env->cp15.hcr_el2 & HCR_E2H) {
++        /* scaled = UInt('1':fraction<51:44>) */
-             switch (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, ZEN)) {
++        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-             case 1:
++        estimate = recip_estimate(scaled);
--                if (el != 0 || !(hcr_el2 & HCR_TGE)) {
++    }
-+                if (el != 0 || !(env->cp15.hcr_el2 & HCR_TGE)) {
-                     break;
+     result_exp = exp_off - *exp;
-                 }
+-    result_frac = deposit64(0, 44, 8, estimate);
-                 /* fall through */
++    if (increasedprecision) {
-@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
++        result_frac = deposit64(0, 40, 12, estimate);
-             case 2:
++    } else {
-                 return 2;
++        result_frac = deposit64(0, 44, 8, estimate);
-             }
++    }
--        } else if (arm_is_el2_enabled(env)) {
+     if (result_exp == 0) {
-+        } else {
+         result_frac = deposit64(result_frac >> 1, 51, 1, 1);
-             if (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, TZ)) {
+     } else if (result_exp == -1) {
-                 return 2;
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
-             }
+     }
      f64_frac = call_recip_estimate(&f16_exp, 29,
 -                                   ((uint64_t) f16_frac) << (52 - 10));
 +                                   ((uint64_t) f16_frac) << (52 - 10), false);
      /* result = sign : result_exp<4:0> : fraction<51:42> */
      f16_val = deposit32(0, 15, 1, f16_sign);
@@ -XXX,XX +XXX,XX @@ static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
      }
      f64_frac = call_recip_estimate(&f32_exp, 253,
 -                                   ((uint64_t) f32_frac) << (52 - 23));
 +                                   ((uint64_t) f32_frac) << (52 - 23), rpres);
      /* result = sign : result_exp<7:0> : fraction<51:29> */
      f32_val = deposit32(0, 31, 1, f32_sign);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
          return float64_set_sign(float64_zero, float64_is_neg(f64));
      }
 -    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
 +    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac, false);
      /* result = sign : result_exp<10:0> : fraction<51:0>; */
      f64_val = deposit64(0, 63, 1, f64_sign);
 --
-.25.1
+.34.1
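
Reviewer note: the rounding in recip_estimate_incprec() can be checked in isolation. The sketch below restates the function from the patch as a standalone unit; the only addition is the comment spelling out why the dividend is 1 << 26. Dividing 2^26 by a gives twice the 12-bit estimate truncated to an integer, so adding one and shifting right by one rounds to nearest.

```c
#include <assert.h>

/*
 * Increased precision reciprocal estimate, restated from the patch.
 * input: 2048 .. 4095, representing 0.5 <= x < 1.0 in 1/4096 steps.
 * result: 4096 .. 8191, representing 1.0 <= 1/x < 2.0 in 1/4096 steps.
 */
static int recip_estimate_incprec(int input)
{
    int a, b, r;

    assert(2048 <= input && input < 4096);
    /* Midpoint of the input interval, scaled by 2: represents a / 8192. */
    a = (input * 2) + 1;
    /* b = floor(2 * 4096 / (a / 8192)) = floor(2^26 / a). */
    b = (1 << 26) / a;
    /* Add one and halve: round-to-nearest of 2^25 / a. */
    r = (b + 1) >> 1;
    assert(4096 <= r && r < 8192);
    return r;
}
```

Worked by hand, the smallest input 2048 gives a = 4097, b = 16380 and r = 8190, and the largest input 4095 gives a = 8191, b = 8193 and r = 4097, so both ends of the range stay inside the asserted result bounds.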

-[PULL 52/55] target/arm: Move expand_pred_h to vec_internal.h
+[PULL 54/68] target/arm: Implement increased precision FRSQRTE
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the increased precision variation of FRSQRTE.  In the
 pseudocode this corresponds to the handling of the
 "increasedprecision" boolean in the FPRSqrtEstimate() and
 RecipSqrtEstimate() functions.
-Move the data to vec_helper.c and the inline to vec_internal.h.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/vfp_helper.c | 77 ++++++++++++++++++++++++++++++++++-------
 file changed, 64 insertions(+), 13 deletions(-)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20220607203306.657998-18-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/vec_internal.h |  7 +++++++
  target/arm/sve_helper.c   | 29 -----------------------------
  target/arm/vec_helper.c   | 26 ++++++++++++++++++++++++++
 files changed, 33 insertions(+), 29 deletions(-)
 diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_internal.h
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/vec_internal.h
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t expand_pred_b(uint8_t byte)
+@@ -XXX,XX +XXX,XX @@ static int do_recip_sqrt_estimate(int a)
-     return expand_pred_b_data[byte];
+     return estimate;
  }
-+/* Similarly for half-word elements. */
++static int do_recip_sqrt_estimate_incprec(int a)
 +extern const uint64_t expand_pred_h_data[0x55 + 1];
 +static inline uint64_t expand_pred_h(uint8_t byte)
 +{
-+    return expand_pred_h_data[byte & 0x55];
++    /*
 +     * The Arm ARM describes the 12-bit precision version of RecipSqrtEstimate
 +     * in terms of an infinite-precision floating point calculation of a
 +     * square root. We implement this using the same kind of pure integer
 +     * algorithm as the 8-bit mantissa, to get the same bit-for-bit result.
 +     */
 +    int64_t b, estimate;
 -static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
 +    assert(1024 <= a && a < 4096);
 +    if (a < 2048) {
 +        a = a * 2 + 1;
 +    } else {
 +        a = (a >> 1) << 1;
 +        a = (a + 1) * 2;
 +    }
 +    b = 8192;
 +    while (a * (b + 1) * (b + 1) < (1ULL << 39)) {
 +        b += 1;
 +    }
 +    estimate = (b + 1) / 2;
 +
 +    assert(4096 <= estimate && estimate < 8192);
 +
 +    return estimate;
 +}
 +
- static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
++static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac,
 +                                    bool increasedprecision)
  {
-     uint64_t *d = vd + opr_sz;
+     int estimate;
-diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
+     uint32_t scaled;
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
---- a/target/arm/sve_helper.c
+         frac = extract64(frac, 0, 51) << 1;
-+++ b/target/arm/sve_helper.c
+     }
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
-     return flags;
+-    if (*exp & 1) {
 -        /* scaled = UInt('01':fraction<51:45>) */
 -        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
 +    if (increasedprecision) {
 +        if (*exp & 1) {
 +            /* scaled = UInt('01':fraction<51:42>) */
 +            scaled = deposit32(1 << 10, 0, 10, extract64(frac, 42, 10));
 +        } else {
 +            /* scaled = UInt('1':fraction<51:41>) */
 +            scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
 +        }
 +        estimate = do_recip_sqrt_estimate_incprec(scaled);
      } else {
 -        /* scaled = UInt('1':fraction<51:44>) */
 -        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
 +        if (*exp & 1) {
 +            /* scaled = UInt('01':fraction<51:45>) */
 +            scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
 +        } else {
 +            /* scaled = UInt('1':fraction<51:44>) */
 +            scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
 +        }
 +        estimate = do_recip_sqrt_estimate(scaled);
      }
 -    estimate = do_recip_sqrt_estimate(scaled);
      *exp = (exp_off - *exp) / 2;
 -    return extract64(estimate, 0, 8) << 44;
 +    if (increasedprecision) {
 +        return extract64(estimate, 0, 12) << 40;
 +    } else {
 +        return extract64(estimate, 0, 8) << 44;
 +    }
  }
--/* Similarly for half-word elements.
+ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
-- *  for (i = 0; i < 256; ++i) {
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
-- *      unsigned long m = 0;
-- *      if (i & 0xaa) {
+     f64_frac = ((uint64_t) f16_frac) << (52 - 10);
-- *          continue;
-- *      }
+-    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
-- *      for (j = 0; j < 8; j += 2) {
++    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac, false);
-- *          if ((i >> j) & 1) {
-- *              m |= 0xfffful << (j << 3);
+     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
-- *          }
+     val = deposit32(0, 15, 1, f16_sign);
-- *      }
+@@ -XXX,XX +XXX,XX @@ static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
-- *      printf("[0x%x] = 0x%016lx,\n", i, m);
-- *  }
+     f64_frac = ((uint64_t) f32_frac) << 29;
-- */
--static inline uint64_t expand_pred_h(uint8_t byte)
+-    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
--{
++    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac, rpres);
--    static const uint64_t word[] = {
--        [0x01] = 0x000000000000ffff, [0x04] = 0x00000000ffff0000,
+-    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(15) */
--        [0x05] = 0x00000000ffffffff, [0x10] = 0x0000ffff00000000,
++    /*
--        [0x11] = 0x0000ffff0000ffff, [0x14] = 0x0000ffffffff0000,
++     * result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15)
--        [0x15] = 0x0000ffffffffffff, [0x40] = 0xffff000000000000,
++     * or for increased precision
--        [0x41] = 0xffff00000000ffff, [0x44] = 0xffff0000ffff0000,
++     * result = sign : result_exp<7:0> : estimate<11:0> : Zeros(11)
--        [0x45] = 0xffff0000ffffffff, [0x50] = 0xffffffff00000000,
++     */
--        [0x51] = 0xffffffff0000ffff, [0x54] = 0xffffffffffff0000,
+     val = deposit32(0, 31, 1, f32_sign);
--        [0x55] = 0xffffffffffffffff,
+     val = deposit32(val, 23, 8, f32_exp);
--    };
+-    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
--    return word[byte & 0x55];
++    if (rpres) {
--}
++        val = deposit32(val, 11, 12, extract64(f64_frac, 52 - 12, 12));
--
++    } else {
- /* Similarly for single word elements.  */
++        val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
- static inline uint64_t expand_pred_s(uint8_t byte)
++    }
- {
+     return make_float32(val);
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+ }
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
-+++ b/target/arm/vec_helper.c
+         return float64_zero;
-@@ -XXX,XX +XXX,XX @@ const uint64_t expand_pred_b_data[256] = {
+     }
-xffffffffffffffff,
- };
+-    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
++    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac, false);
-+/*
-+ * Similarly for half-word elements.
+     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(44) */
-+ *  for (i = 0; i < 256; ++i) {
+     val = deposit64(0, 61, 1, f64_sign);
 + *      unsigned long m = 0;
 + *      if (i & 0xaa) {
 + *          continue;
 + *      }
 + *      for (j = 0; j < 8; j += 2) {
 + *          if ((i >> j) & 1) {
 + *              m |= 0xfffful << (j << 3);
 + *          }
 + *      }
 + *      printf("[0x%x] = 0x%016lx,\n", i, m);
 + *  }
 + */
 +const uint64_t expand_pred_h_data[0x55 + 1] = {
 +    [0x01] = 0x000000000000ffff, [0x04] = 0x00000000ffff0000,
 +    [0x05] = 0x00000000ffffffff, [0x10] = 0x0000ffff00000000,
 +    [0x11] = 0x0000ffff0000ffff, [0x14] = 0x0000ffffffff0000,
 +    [0x15] = 0x0000ffffffffffff, [0x40] = 0xffff000000000000,
 +    [0x41] = 0xffff00000000ffff, [0x44] = 0xffff0000ffff0000,
 +    [0x45] = 0xffff0000ffffffff, [0x50] = 0xffffffff00000000,
 +    [0x51] = 0xffffffff0000ffff, [0x54] = 0xffffffffffff0000,
 +    [0x55] = 0xffffffffffffffff,
 +};
 +
  /* Signed saturating rounding doubling multiply-accumulate high half, 8-bit */
  int8_t do_sqrdmlah_b(int8_t src1, int8_t src2, int8_t src3,
                       bool neg, bool round)
 --
-.25.1
+.34.1
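
Reviewer note: the integer search in do_recip_sqrt_estimate_incprec() can likewise be tried standalone. The sketch below restates the function from the patch with int64_t used throughout for the intermediate product; the name recip_sqrt_estimate_incprec and the widened local a64 are choices made for this illustration, not part of the patch. The loop leaves b + 1 as the smallest integer whose square times a reaches 2^39, and the final halving rounds that to the 12-bit estimate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Increased precision reciprocal square root estimate, restated from
 * the patch. Input is 1024 .. 4095; result is 4096 .. 8191.
 */
static int recip_sqrt_estimate_incprec(int a)
{
    int64_t a64, b, estimate;

    assert(1024 <= a && a < 4096);
    if (a < 2048) {
        /* Odd-exponent case: take the interval midpoint directly. */
        a = a * 2 + 1;
    } else {
        /* Even-exponent case: clear the low bit, then take the midpoint. */
        a = (a >> 1) << 1;
        a = (a + 1) * 2;
    }
    a64 = a;
    b = 8192;
    /* Linear search; at most about 8200 iterations for valid input. */
    while (a64 * (b + 1) * (b + 1) < (INT64_C(1) << 39)) {
        b += 1;
    }
    estimate = (b + 1) / 2;

    assert(4096 <= estimate && estimate < 8192);
    return (int)estimate;
}
```

The product a64 * (b + 1) * (b + 1) stays below roughly 2^40 for every valid input, so it fits comfortably in a signed 64-bit integer.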

-[PULL 02/55] target/arm: Implement FEAT_DoubleFault
+[PULL 55/68] target/arm: Enable FEAT_RPRES for -cpu max
-The FEAT_DoubleFault extension adds the following:
+Now the emulation is complete, we can enable FEAT_RPRES for the 'max'
+CPU type.
  * All external aborts on instruction fetches and translation table
    walks for instruction fetches must be synchronous.  For QEMU this
    is already true.
  * SCR_EL3 has a new bit NMEA which disables the masking of SError
    interrupts by PSTATE.A when the SError interrupt is taken to EL3.
    For QEMU we only need to make the bit writable, because we have no
    sources of SError interrupts.
  * SCR_EL3 has a new bit EASE which causes synchronous external
    aborts taken to EL3 to be taken at the same entry point as SError.
    (Note that this does not mean that they are SErrors for purposes
    of PSTATE.A masking or that the syndrome register reports them as
    SErrors: it just means that the vector offset is different.)
  * The existing SCTLR_EL3.IESB has an effective value of 1 when
    SCR_EL3.NMEA is 1.  For QEMU this is a no-op because we don't need
    different behaviour based on IESB (we don't need to do anything to
    ensure that error exceptions are synchronized).
 So for QEMU the things we need to change are:
  * Make SCR_EL3.{NMEA,EASE} writable
  * When taking a synchronous external abort at EL3, adjust the
    vector entry point if SCR_EL3.EASE is set
  * Advertise the feature in the ID registers
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220531151431.949322-1-peter.maydell@linaro.org
 ---
- docs/system/arm/emulation.rst |  1 +
+ docs/system/arm/emulation.rst | 1 +
- target/arm/cpu.h              |  5 +++++
+ target/arm/tcg/cpu64.c        | 1 +
- target/arm/cpu64.c            |  4 ++--
+files changed, 2 insertions(+)
  target/arm/helper.c           | 36 +++++++++++++++++++++++++++++++++++
 files changed, 44 insertions(+), 2 deletions(-)
 diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/emulation.rst
 +++ b/docs/system/arm/emulation.rst
 @@ -XXX,XX +XXX,XX @@ the following architecture extensions:
- - FEAT_Debugv8p2 (Debug changes for v8.2)
+ - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
- - FEAT_Debugv8p4 (Debug changes for v8.4)
+ - FEAT_RME (Realm Management Extension) (NB: support status in QEMU is experimental)
- - FEAT_DotProd (Advanced SIMD dot product instructions)
+ - FEAT_RNG (Random number generator)
-+- FEAT_DoubleFault (Double Fault Extension)
++- FEAT_RPRES (Increased precision of FRECPE and FRSQRTE)
- - FEAT_FCMA (Floating-point complex number instructions)
+ - FEAT_S2FWB (Stage 2 forced Write-Back)
- - FEAT_FHM (Floating-point half-precision multiplication instructions)
+ - FEAT_SB (Speculation Barrier)
- - FEAT_FP16 (Half-precision floating-point data processing)
+ - FEAT_SEL2 (Secure EL2)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/tcg/cpu64.c
-+++ b/target/arm/cpu.h
++++ b/target/arm/tcg/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ras(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
-     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RAS) != 0;
+     cpu->isar.id_aa64isar1 = t;
- }
+     t = cpu->isar.id_aa64isar2;
-+static inline bool isar_feature_aa64_doublefault(const ARMISARegisters *id)
++    t = FIELD_DP64(t, ID_AA64ISAR2, RPRES, 1);    /* FEAT_RPRES */
-+{
+     t = FIELD_DP64(t, ID_AA64ISAR2, MOPS, 1);     /* FEAT_MOPS */
-+    return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RAS) >= 2;
+     t = FIELD_DP64(t, ID_AA64ISAR2, BC, 1);       /* FEAT_HBC */
-+}
+     t = FIELD_DP64(t, ID_AA64ISAR2, WFXT, 2);     /* FEAT_WFxT */
 +
  static inline bool isar_feature_aa64_sve(const ARMISARegisters *id)
  {
      return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, SVE) != 0;
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
      t = cpu->isar.id_aa64pfr0;
      t = FIELD_DP64(t, ID_AA64PFR0, FP, 1);        /* FEAT_FP16 */
      t = FIELD_DP64(t, ID_AA64PFR0, ADVSIMD, 1);   /* FEAT_FP16 */
 -    t = FIELD_DP64(t, ID_AA64PFR0, RAS, 1);       /* FEAT_RAS */
 +    t = FIELD_DP64(t, ID_AA64PFR0, RAS, 2);       /* FEAT_RASv1p1 + FEAT_DoubleFault */
      t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
      t = FIELD_DP64(t, ID_AA64PFR0, SEL2, 1);      /* FEAT_SEL2 */
      t = FIELD_DP64(t, ID_AA64PFR0, DIT, 1);       /* FEAT_DIT */
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
       * we do for EL2 with the virtualization=on property.
       */
      t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3);       /* FEAT_MTE3 */
 -    t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 1);  /* FEAT_RASv1p1 */
 +    t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 0);  /* FEAT_RASv1p1 + FEAT_DoubleFault */
      t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_2 */
      cpu->isar.id_aa64pfr1 = t;
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
          if (cpu_isar_feature(aa64_scxtnum, cpu)) {
              valid_mask |= SCR_ENSCXT;
          }
 +        if (cpu_isar_feature(aa64_doublefault, cpu)) {
 +            valid_mask |= SCR_EASE | SCR_NMEA;
 +        }
      } else {
          valid_mask &= ~(SCR_RW | SCR_ST);
          if (cpu_isar_feature(aa32_ras, cpu)) {
@@ -XXX,XX +XXX,XX @@ static uint32_t cpsr_read_for_spsr_elx(CPUARMState *env)
      return ret;
  }
 +static bool syndrome_is_sync_extabt(uint32_t syndrome)
 +{
 +    /* Return true if this syndrome value is a synchronous external abort */
 +    switch (syn_get_ec(syndrome)) {
 +    case EC_INSNABORT:
 +    case EC_INSNABORT_SAME_EL:
 +    case EC_DATAABORT:
 +    case EC_DATAABORT_SAME_EL:
 +        /* Look at fault status code for all the synchronous ext abort cases */
 +        switch (syndrome & 0x3f) {
 +        case 0x10:
 +        case 0x13:
 +        case 0x14:
 +        case 0x15:
 +        case 0x16:
 +        case 0x17:
 +            return true;
 +        default:
 +            return false;
 +        }
 +    default:
 +        return false;
 +    }
 +}
 +
  /* Handle exception entry to a target EL which is using AArch64 */
  static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
  {
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
      switch (cs->exception_index) {
      case EXCP_PREFETCH_ABORT:
      case EXCP_DATA_ABORT:
 +        /*
 +         * FEAT_DoubleFault allows synchronous external aborts taken to EL3
 +         * to be taken to the SError vector entrypoint.
 +         */
 +        if (new_el == 3 && (env->cp15.scr_el3 & SCR_EASE) &&
 +            syndrome_is_sync_extabt(env->exception.syndrome)) {
 +            addr += 0x180;
 +        }
          env->cp15.far_el[new_el] = env->exception.vaddress;
          qemu_log_mask(CPU_LOG_INT, "...with FAR 0x%" PRIx64 "\n",
                        env->cp15.far_el[new_el]);
 --
-.25.1
+.34.1

-[PULL 30/55] target/arm: Move regime_is_user to ptw.c
+[PULL 56/68] target/arm: Introduce CPUARMState.vfp.fp_status[]
 From: Richard Henderson <richard.henderson@linaro.org>
+Move ARMFPStatusFlavour to cpu.h with which to index
+this array.  For now, place the array in an anonymous
+union with the existing structures.  Adjust the order
+of the existing structures to match the enum.
+Simplify fpstatus_ptr() using the new array.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-24-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20250129013857.135256-7-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/ptw.h    |  1 -
+ target/arm/cpu.h           | 119 +++++++++++++++++++++----------------
- target/arm/helper.c | 24 ------------------------
+ target/arm/tcg/translate.h |  64 +-------------------
- target/arm/ptw.c    | 22 ++++++++++++++++++++++
+files changed, 70 insertions(+), 113 deletions(-)
-files changed, 22 insertions(+), 25 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 diff --git a/target/arm/ptw.h b/target/arm/ptw.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/cpu.h
-+++ b/target/arm/ptw.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
- #ifndef CONFIG_USER_ONLY
+ typedef struct NVICState NVICState;
--bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
++/*
- bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
++ * Enum for indexing vfp.fp_status[].
- uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
++ *
++ * FPST_A32: is the "normal" fp status for AArch32 insns
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++ * FPST_A64: is the "normal" fp status for AArch64 insns
 + * FPST_A32_F16: used for AArch32 half-precision calculations
 + * FPST_A64_F16: used for AArch64 half-precision calculations
 + * FPST_STD: the ARM "Standard FPSCR Value"
 + * FPST_STD_F16: used for half-precision
 + *       calculations with the ARM "Standard FPSCR Value"
 + * FPST_AH: used for the A64 insns which change behaviour
 + *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 + *       and the reciprocal and square root estimate/step insns)
 + * FPST_AH_F16: used for the A64 insns which change behaviour
 + *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 + *       and the reciprocal and square root estimate/step insns);
 + *       for half-precision
 + *
 + * Half-precision operations are governed by a separate
 + * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
 + * status structure to control this.
 + *
 + * The "Standard FPSCR", ie default-NaN, flush-to-zero,
 + * round-to-nearest and is used by any operations (generally
 + * Neon) which the architecture defines as controlled by the
 + * standard FPSCR value rather than the FPSCR.
 + *
 + * The "standard FPSCR but for fp16 ops" is needed because
 + * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
 + * using a fixed value for it.
 + *
 + * The ah_fp_status is needed because some insns have different
 + * behaviour when FPCR.AH == 1: they don't update cumulative
 + * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
 + * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 + * which means we need an ah_fp_status_f16 as well.
 + *
 + * To avoid having to transfer exception bits around, we simply
 + * say that the FPSCR cumulative exception flags are the logical
 + * OR of the flags in the four fp statuses. This relies on the
 + * only thing which needs to read the exception flags being
 + * an explicit FPSCR read.
 + */
 +typedef enum ARMFPStatusFlavour {
 +    FPST_A32,
 +    FPST_A64,
 +    FPST_A32_F16,
 +    FPST_A64_F16,
 +    FPST_AH,
 +    FPST_AH_F16,
 +    FPST_STD,
 +    FPST_STD_F16,
 +} ARMFPStatusFlavour;
 +#define FPST_COUNT  8
 +
  typedef struct CPUArchState {
      /* Regs for current mode.  */
      uint32_t regs[16];
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          /* Scratch space for aa32 neon expansion.  */
          uint32_t scratch[8];
 -        /* There are a number of distinct float control structures:
 -         *
 -         *  fp_status_a32: is the "normal" fp status for AArch32 insns
 -         *  fp_status_a64: is the "normal" fp status for AArch64 insns
 -         *  fp_status_fp16_a32: used for AArch32 half-precision calculations
 -         *  fp_status_fp16_a64: used for AArch64 half-precision calculations
 -         *  standard_fp_status : the ARM "Standard FPSCR Value"
 -         *  standard_fp_status_fp16 : used for half-precision
 -         *       calculations with the ARM "Standard FPSCR Value"
 -         *  ah_fp_status: used for the A64 insns which change behaviour
 -         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 -         *       and the reciprocal and square root estimate/step insns)
 -         *  ah_fp_status_f16: used for the A64 insns which change behaviour
 -         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
 -         *       and the reciprocal and square root estimate/step insns);
 -         *       for half-precision
 -         *
 -         * Half-precision operations are governed by a separate
 -         * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
 -         * status structure to control this.
 -         *
 -         * The "Standard FPSCR", ie default-NaN, flush-to-zero,
 -         * round-to-nearest and is used by any operations (generally
 -         * Neon) which the architecture defines as controlled by the
 -         * standard FPSCR value rather than the FPSCR.
 -         *
 -         * The "standard FPSCR but for fp16 ops" is needed because
 -         * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
 -         * using a fixed value for it.
 -         *
 -         * The ah_fp_status is needed because some insns have different
 -         * behaviour when FPCR.AH == 1: they don't update cumulative
 -         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
 -         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
 -         * which means we need an ah_fp_status_f16 as well.
 -         *
 -         * To avoid having to transfer exception bits around, we simply
 -         * say that the FPSCR cumulative exception flags are the logical
 -         * OR of the flags in the four fp statuses. This relies on the
 -         * only thing which needs to read the exception flags being
 -         * an explicit FPSCR read.
 -         */
 -        float_status fp_status_a32;
 -        float_status fp_status_a64;
 -        float_status fp_status_f16_a32;
 -        float_status fp_status_f16_a64;
 -        float_status standard_fp_status;
 -        float_status standard_fp_status_f16;
 -        float_status ah_fp_status;
 -        float_status ah_fp_status_f16;
 +        /* There are a number of distinct float control structures. */
 +        union {
 +            float_status fp_status[FPST_COUNT];
 +            struct {
 +                float_status fp_status_a32;
 +                float_status fp_status_a64;
 +                float_status fp_status_f16_a32;
 +                float_status fp_status_f16_a64;
 +                float_status ah_fp_status;
 +                float_status ah_fp_status_f16;
 +                float_status standard_fp_status;
 +                float_status standard_fp_status_f16;
 +            };
 +        };
          uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
          uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
 diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/translate.h
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/translate.h
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
      return (CPUARMTBFlags){ tb->flags, tb->cs_base };
  }
- #endif /* !CONFIG_USER_ONLY */
+-/*
--#ifndef CONFIG_USER_ONLY
+- * Enum for argument to fpstatus_ptr().
--bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
+- */
--{
+-typedef enum ARMFPStatusFlavour {
--    switch (mmu_idx) {
+-    FPST_A32,
--    case ARMMMUIdx_SE10_0:
+-    FPST_A64,
--    case ARMMMUIdx_E20_0:
+-    FPST_A32_F16,
--    case ARMMMUIdx_SE20_0:
+-    FPST_A64_F16,
--    case ARMMMUIdx_Stage1_E0:
+-    FPST_AH,
--    case ARMMMUIdx_Stage1_SE0:
+-    FPST_AH_F16,
--    case ARMMMUIdx_MUser:
+-    FPST_STD,
--    case ARMMMUIdx_MSUser:
+-    FPST_STD_F16,
--    case ARMMMUIdx_MUserNegPri:
+-} ARMFPStatusFlavour;
--    case ARMMMUIdx_MSUserNegPri:
+-
--        return true;
+ /**
   * fpstatus_ptr: return TCGv_ptr to the specified fp_status field
   *
   * We have multiple softfloat float_status fields in the Arm CPU state struct
   * (see the comment in cpu.h for details). Return a TCGv_ptr which has
   * been set up to point to the requested field in the CPU state struct.
 - * The options are:
 - *
 - * FPST_A32
 - *   for AArch32 non-FP16 operations controlled by the FPCR
 - * FPST_A64
 - *   for AArch64 non-FP16 operations controlled by the FPCR
 - * FPST_A32_F16
 - *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
 - * FPST_A64_F16
 - *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
 - * FPST_AH:
 - *   for AArch64 operations which change behaviour when AH=1 (specifically,
 - *   bfloat16 conversions and multiplies, and the reciprocal and square root
 - *   estimate/step insns)
 - * FPST_AH_F16:
 - *   ditto, but for half-precision operations
 - * FPST_STD
 - *   for A32/T32 Neon operations using the "standard FPSCR value"
 - * FPST_STD_F16
 - *   as FPST_STD, but where FPCR.FZ16 is to be used
   */
  static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
  {
      TCGv_ptr statusptr = tcg_temp_new_ptr();
 -    int offset;
 +    int offset = offsetof(CPUARMState, vfp.fp_status[flavour]);
 -    switch (flavour) {
 -    case FPST_A32:
 -        offset = offsetof(CPUARMState, vfp.fp_status_a32);
 -        break;
 -    case FPST_A64:
 -        offset = offsetof(CPUARMState, vfp.fp_status_a64);
 -        break;
 -    case FPST_A32_F16:
 -        offset = offsetof(CPUARMState, vfp.fp_status_f16_a32);
 -        break;
 -    case FPST_A64_F16:
 -        offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
 -        break;
 -    case FPST_AH:
 -        offset = offsetof(CPUARMState, vfp.ah_fp_status);
 -        break;
 -    case FPST_AH_F16:
 -        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
 -        break;
 -    case FPST_STD:
 -        offset = offsetof(CPUARMState, vfp.standard_fp_status);
 -        break;
 -    case FPST_STD_F16:
 -        offset = offsetof(CPUARMState, vfp.standard_fp_status_f16);
 -        break;
 -    default:
--        return false;
--    case ARMMMUIdx_E10_0:
--    case ARMMMUIdx_E10_1:
--    case ARMMMUIdx_E10_1_PAN:
 -        g_assert_not_reached();
 -    }
--}
+     tcg_gen_addi_ptr(statusptr, tcg_env, offset);
--#endif /* !CONFIG_USER_ONLY */
+     return statusptr;
--
- int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
- {
-     if (regime_has_2_ranges(mmu_idx)) {
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
-+++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
-     return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
- }
-+static bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
-+{
-+    switch (mmu_idx) {
-+    case ARMMMUIdx_SE10_0:
-+    case ARMMMUIdx_E20_0:
-+    case ARMMMUIdx_SE20_0:
-+    case ARMMMUIdx_Stage1_E0:
-+    case ARMMMUIdx_Stage1_SE0:
-+    case ARMMMUIdx_MUser:
-+    case ARMMMUIdx_MSUser:
-+    case ARMMMUIdx_MUserNegPri:
-+    case ARMMMUIdx_MSUserNegPri:
-+        return true;
-+    default:
-+        return false;
-+    case ARMMMUIdx_E10_0:
-+    case ARMMMUIdx_E10_1:
-+    case ARMMMUIdx_E10_1_PAN:
-+        g_assert_not_reached();
-+    }
-+}
-+
- static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
- {
-     /*
 --
-.25.1
+.34.1
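
The union added to CPUArchState in the patch above gives every float_status two names: an array slot indexed by ARMFPStatusFlavour, and the older field name. What follows is a minimal standalone sketch of that layout, not QEMU code. The float_status type here is a two-field stand-in for the softfloat type, and DemoVFP, demo_fpst() and demo_fpst_offset() are hypothetical names used only for illustration. The static assertions spell out the assumption the union relies on: the field order inside the anonymous struct must match the enum order.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for softfloat's float_status; the real type has more fields. */
typedef struct float_status {
    unsigned char exception_flags;
    unsigned char flush_to_zero;
} float_status;

/* Same order as the enum in the patch. */
typedef enum ARMFPStatusFlavour {
    FPST_A32,
    FPST_A64,
    FPST_A32_F16,
    FPST_A64_F16,
    FPST_AH,
    FPST_AH_F16,
    FPST_STD,
    FPST_STD_F16,
} ARMFPStatusFlavour;
#define FPST_COUNT  8

/* Hypothetical container mirroring the union in the patch. */
typedef struct DemoVFP {
    union {
        float_status fp_status[FPST_COUNT];
        struct {
            float_status fp_status_a32;
            float_status fp_status_a64;
            float_status fp_status_f16_a32;
            float_status fp_status_f16_a64;
            float_status ah_fp_status;
            float_status ah_fp_status_f16;
            float_status standard_fp_status;
            float_status standard_fp_status_f16;
        };
    };
} DemoVFP;

/* Offset of one slot: base of the array plus index times element size. */
static size_t demo_fpst_offset(ARMFPStatusFlavour flavour)
{
    return offsetof(DemoVFP, fp_status)
        + (size_t)flavour * sizeof(float_status);
}

/* Pointer to one slot, selected by flavour instead of by a switch. */
static float_status *demo_fpst(DemoVFP *vfp, ARMFPStatusFlavour flavour)
{
    return &vfp->fp_status[flavour];
}

/* The named fields must line up with their enum index. */
static_assert(offsetof(DemoVFP, fp_status_a32)
              == FPST_A32 * sizeof(float_status), "FPST_A32 slot mismatch");
static_assert(offsetof(DemoVFP, ah_fp_status)
              == FPST_AH * sizeof(float_status), "FPST_AH slot mismatch");
static_assert(offsetof(DemoVFP, standard_fp_status_f16)
              == FPST_STD_F16 * sizeof(float_status),
              "FPST_STD_F16 slot mismatch");
```

The sketch computes the offset by hand so that it stays within standard C. The patch instead writes offsetof(CPUARMState, vfp.fp_status[flavour]) with a non-constant index, which depends on the compiler accepting that form (GCC and Clang do). Once both views agree, later patches in the series can drop the named fields one at a time, as 57/68 and 58/68 do.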

-[PULL 47/55] target/arm: Rename sve_zcr_len_for_el to sve_vqm1_for_el
+[PULL 57/68] target/arm: Remove standard_fp_status_f16
 From: Richard Henderson <richard.henderson@linaro.org>
-This will be used for both Normal and Streaming SVE, and the value
+Replace with fp_status[FPST_STD_F16].
-does not necessarily come from ZCR_ELx.  While we're at it, emphasize
-the units in which the value is returned.
-Patch produced by
-    git grep -l sve_zcr_len_for_el | \
-    xargs -n1 sed -i 's/sve_zcr_len_for_el/sve_vqm1_for_el/g'
-and then adding a function comment.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-13-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-8-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h       | 11 ++++++++++-
+ target/arm/cpu.h            |  1 -
- target/arm/arch_dump.c |  2 +-
+ target/arm/cpu.c            |  4 ++--
- target/arm/cpu.c       |  2 +-
+ target/arm/tcg/mve_helper.c | 24 ++++++++++++------------
- target/arm/gdbstub64.c |  2 +-
+ target/arm/vfp_helper.c     |  8 ++++----
- target/arm/helper.c    | 12 ++++++------
+files changed, 18 insertions(+), 19 deletions(-)
-files changed, 19 insertions(+), 10 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ void aarch64_sync_64_to_32(CPUARMState *env);
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
+                 float_status ah_fp_status;
- int fp_exception_el(CPUARMState *env, int cur_el);
+                 float_status ah_fp_status_f16;
- int sve_exception_el(CPUARMState *env, int cur_el);
+                 float_status standard_fp_status;
--uint32_t sve_zcr_len_for_el(CPUARMState *env, int el);
+-                float_status standard_fp_status_f16;
-+
+             };
-+/**
+         };
-+ * sve_vqm1_for_el:
-+ * @env: CPUARMState
-+ * @el: exception level
-+ *
-+ * Compute the current SVE vector length for @el, in units of
-+ * Quadwords Minus 1 -- the same scale used for ZCR_ELx.LEN.
-+ */
-+uint32_t sve_vqm1_for_el(CPUARMState *env, int el);
- static inline bool is_a64(CPUARMState *env)
- {
-diff --git a/target/arm/arch_dump.c b/target/arm/arch_dump.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/arch_dump.c
-+++ b/target/arm/arch_dump.c
-@@ -XXX,XX +XXX,XX @@ static off_t sve_fpcr_offset(uint32_t vq)
- static uint32_t sve_current_vq(CPUARMState *env)
- {
--    return sve_zcr_len_for_el(env, arm_current_el(env)) + 1;
-+    return sve_vqm1_for_el(env, arm_current_el(env)) + 1;
- }
- static size_t sve_size_vq(uint32_t vq)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-                  vfp_get_fpcr(env), vfp_get_fpsr(env));
+     set_flush_to_zero(1, &env->vfp.standard_fp_status);
+     set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
-     if (cpu_isar_feature(aa64_sve, cpu) && sve_exception_el(env, el) == 0) {
+     set_default_nan_mode(1, &env->vfp.standard_fp_status);
--        int j, zcr_len = sve_zcr_len_for_el(env, el);
+-    set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
-+        int j, zcr_len = sve_vqm1_for_el(env, el);
++    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
-         for (i = 0; i <= FFR_PRED_NUM; i++) {
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
-             bool eol;
+     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
-diff --git a/target/arm/gdbstub64.c b/target/arm/gdbstub64.c
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
 -    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
      arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
      set_flush_to_zero(1, &env->vfp.ah_fp_status);
      set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
 diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/gdbstub64.c
+--- a/target/arm/tcg/mve_helper.c
-+++ b/target/arm/gdbstub64.c
++++ b/target/arm/tcg/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ int arm_gdb_get_svereg(CPUARMState *env, GByteArray *buf, int reg)
+@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
-          * We report in Vector Granules (VG) which is 64bit in a Z reg
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
-          * while the ZCR works in Vector Quads (VQ) which is 128bit chunks.
+                 continue;                                               \
-          */
+             }                                                           \
--        int vq = sve_zcr_len_for_el(env, arm_current_el(env)) + 1;
+-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
-+        int vq = sve_vqm1_for_el(env, arm_current_el(env)) + 1;
++            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-         return gdb_get_reg64(buf, vq * 2);
+                 &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                  r[e] = 0;                                               \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(tm & 1)) {                                            \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
                  continue;                                               \
              }                                                           \
 -            fpst0 = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :   \
 +            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
                  &env->vfp.standard_fp_status;                           \
              fpst1 = fpst0;                                              \
              if (!(mask & 1)) {                                          \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
          TYPE *m = vm;                                           \
          TYPE ra = (TYPE)ra_in;                                  \
          float_status *fpst = (ESIZE == 2) ?                     \
 -            &env->vfp.standard_fp_status_f16 :                  \
 +            &env->vfp.fp_status[FPST_STD_F16] :                 \
              &env->vfp.standard_fp_status;                       \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
              if (mask & 1) {                                     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
              if ((mask & emask) == 0) {                                  \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
              if ((mask & emask) == 0) {                                  \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
          float_status *fpst;                                             \
          float_status scratch_fpst;                                      \
          float_status *base_fpst = (ESIZE == 2) ?                        \
 -            &env->vfp.standard_fp_status_f16 :                          \
 +            &env->vfp.fp_status[FPST_STD_F16] :                         \
              &env->vfp.standard_fp_status;                               \
          uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
          set_float_rounding_mode(rmode, base_fpst);                      \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
 +            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                  &env->vfp.standard_fp_status;                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      /* FZ16 does not generate an input denormal exception.  */
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
            & ~float_flag_input_denormal_flushed);
 -    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
 +    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
            & ~float_flag_input_denormal_flushed);
      a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
      set_float_exception_flags(0, &env->vfp.standard_fp_status);
 -    set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
      set_float_exception_flags(0, &env->vfp.ah_fp_status);
      set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
  }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          bool ftz_enabled = val & FPCR_FZ16;
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
 -        set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
          set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
      }
-     default:
+     if (changed & FPCR_FZ) {
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
- /*
-  * Given that SVE is enabled, return the vector length for EL.
-  */
--uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
-+uint32_t sve_vqm1_for_el(CPUARMState *env, int el)
- {
-     ARMCPU *cpu = env_archcpu(env);
-     uint32_t len = cpu->sve_max_vq - 1;
-@@ -XXX,XX +XXX,XX @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
-                       uint64_t value)
- {
-     int cur_el = arm_current_el(env);
--    int old_len = sve_zcr_len_for_el(env, cur_el);
-+    int old_len = sve_vqm1_for_el(env, cur_el);
-     int new_len;
-     /* Bits other than [3:0] are RAZ/WI.  */
-@@ -XXX,XX +XXX,XX @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
-      * Because we arrived here, we know both FP and SVE are enabled;
-      * otherwise we would have trapped access to the ZCR_ELn register.
-      */
--    new_len = sve_zcr_len_for_el(env, cur_el);
-+    new_len = sve_vqm1_for_el(env, cur_el);
-     if (new_len < old_len) {
-         aarch64_sve_narrow_vq(env, new_len + 1);
-     }
-@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
-                 sve_el = 0;
-             }
-         } else if (sve_el == 0) {
--            DP_TBFLAG_A64(flags, VL, sve_zcr_len_for_el(env, el));
-+            DP_TBFLAG_A64(flags, VL, sve_vqm1_for_el(env, el));
-         }
-         DP_TBFLAG_A64(flags, SVEEXC_EL, sve_el);
-     }
-@@ -XXX,XX +XXX,XX @@ void aarch64_sve_change_el(CPUARMState *env, int old_el,
-      */
-     old_a64 = old_el ? arm_el_is_aa64(env, old_el) : el0_a64;
-     old_len = (old_a64 && !sve_exception_el(env, old_el)
--               ? sve_zcr_len_for_el(env, old_el) : 0);
-+               ? sve_vqm1_for_el(env, old_el) : 0);
-     new_a64 = new_el ? arm_el_is_aa64(env, new_el) : el0_a64;
-     new_len = (new_a64 && !sve_exception_el(env, new_el)
--               ? sve_zcr_len_for_el(env, new_el) : 0);
-+               ? sve_vqm1_for_el(env, new_el) : 0);
-     /* When changing vector length, clear inaccessible state.  */
-     if (new_len < old_len) {
 --
-.25.1
+.34.1
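
The cpu.h comment carried by this series says that the cumulative exception flags are never copied between statuses; an FPSCR read simply ORs them together. With the statuses held in one array, that read can be pictured as a loop. The sketch below is a simplified illustration under several assumptions, not the code in vfp_helper.c: float_status is a one-field stand-in, the DEMO_FLAG_* bit values are invented, demo_cumulative_flags() and demo_clear_flags() are hypothetical names, and the A32 and A64 flags are merged into a single result for brevity. It skips the AH statuses because the comment states that those insns do not update the cumulative flags, and it masks the input-denormal flag for the F16 statuses, following the "FZ16 does not generate an input denormal exception" note in the patch context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for softfloat's float_status; only the flags matter here. */
typedef struct float_status {
    uint8_t exception_flags;
} float_status;

typedef enum ARMFPStatusFlavour {
    FPST_A32,
    FPST_A64,
    FPST_A32_F16,
    FPST_A64_F16,
    FPST_AH,
    FPST_AH_F16,
    FPST_STD,
    FPST_STD_F16,
} ARMFPStatusFlavour;
#define FPST_COUNT  8

/* Invented bit values, for illustration only. */
#define DEMO_FLAG_INEXACT         0x01
#define DEMO_FLAG_INPUT_DENORMAL  0x02

static bool demo_is_ah(int f)
{
    return f == FPST_AH || f == FPST_AH_F16;
}

static bool demo_is_f16(int f)
{
    return f == FPST_A32_F16 || f == FPST_A64_F16 ||
           f == FPST_AH_F16 || f == FPST_STD_F16;
}

/* OR together the flags of every status that contributes to the read. */
static uint8_t demo_cumulative_flags(const float_status *st)
{
    uint8_t flags = 0;

    assert(st != NULL);
    for (int i = 0; i < FPST_COUNT; i++) {
        uint8_t f = st[i].exception_flags;

        if (demo_is_ah(i)) {
            continue;   /* AH statuses do not feed the cumulative flags */
        }
        if (demo_is_f16(i)) {
            f &= (uint8_t)~DEMO_FLAG_INPUT_DENORMAL;
        }
        flags |= f;
    }
    return flags;
}

/* Clearing, by contrast, touches every status, AH ones included. */
static void demo_clear_flags(float_status *st)
{
    assert(st != NULL);
    for (int i = 0; i < FPST_COUNT; i++) {
        st[i].exception_flags = 0;
    }
}
```

This only works because, as the comment notes, nothing other than an explicit FPSCR read needs to see the combined flags.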

-[PULL 36/55] target/arm: Rename TBFLAG_A64 ZCR_LEN to VL
+[PULL 58/68] target/arm: Remove standard_fp_status
 From: Richard Henderson <richard.henderson@linaro.org>
-With SME, the vector length does not only come from ZCR_ELx.
+Replace with fp_status[FPST_STD].
-Comment that this is either NVL or SVL, like the pseudocode.
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-2-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-9-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h           | 3 ++-
+ target/arm/cpu.h            |  1 -
- target/arm/translate-a64.h | 2 +-
+ target/arm/cpu.c            |  8 ++++----
- target/arm/translate.h     | 2 +-
+ target/arm/tcg/mve_helper.c | 28 ++++++++++++++--------------
- target/arm/helper.c        | 2 +-
+ target/arm/tcg/vec_helper.c |  4 ++--
- target/arm/translate-a64.c | 2 +-
+ target/arm/vfp_helper.c     |  4 ++--
- target/arm/translate-sve.c | 2 +-
+files changed, 22 insertions(+), 23 deletions(-)
-files changed, 7 insertions(+), 6 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_M32, MVE_NO_PRED, 5, 1)            /* Not cached. */
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-  */
+                 float_status fp_status_f16_a64;
- FIELD(TBFLAG_A64, TBII, 0, 2)
+                 float_status ah_fp_status;
- FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
+                 float_status ah_fp_status_f16;
--FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
+-                float_status standard_fp_status;
-+/* The current vector length, either NVL or SVL. */
+             };
-+FIELD(TBFLAG_A64, VL, 4, 4)
+         };
- FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
- FIELD(TBFLAG_A64, BT, 9, 1)
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
- FIELD(TBFLAG_A64, BTYPE, 10, 2)         /* Not cached. */
+index XXXXXXX..XXXXXXX 100644
-diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
+--- a/target/arm/cpu.c
-index XXXXXXX..XXXXXXX 100644
++++ b/target/arm/cpu.c
---- a/target/arm/translate-a64.h
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-+++ b/target/arm/translate-a64.h
+         env->sau.ctrl = 0;
-@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
+     }
- /* Return the byte size of the "whole" vector register, VL / 8.  */
- static inline int vec_full_reg_size(DisasContext *s)
+-    set_flush_to_zero(1, &env->vfp.standard_fp_status);
- {
+-    set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
--    return s->sve_len;
+-    set_default_nan_mode(1, &env->vfp.standard_fp_status);
-+    return s->vl;
++    set_flush_to_zero(1, &env->vfp.fp_status[FPST_STD]);
 +    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
 +    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
      set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 -    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
 diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/mve_helper.c
 +++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(tm & 1)) {                                            \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
                  continue;                                               \
              }                                                           \
              fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              fpst1 = fpst0;                                              \
              if (!(mask & 1)) {                                          \
                  scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
          TYPE ra = (TYPE)ra_in;                                  \
          float_status *fpst = (ESIZE == 2) ?                     \
              &env->vfp.fp_status[FPST_STD_F16] :                 \
 -            &env->vfp.standard_fp_status;                       \
 +            &env->vfp.fp_status[FPST_STD];                       \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
              if (mask & 1) {                                     \
                  TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
          float_status scratch_fpst;                                      \
          float_status *base_fpst = (ESIZE == 2) ?                        \
              &env->vfp.fp_status[FPST_STD_F16] :                         \
 -            &env->vfp.standard_fp_status;                               \
 +            &env->vfp.fp_status[FPST_STD];                               \
          uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
          set_float_rounding_mode(rmode, base_fpst);                      \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_sh(CPUARMState *env, void *vd, void *vm, int top)
      unsigned e;
      float_status *fpst;
      float_status scratch_fpst;
 -    float_status *base_fpst = &env->vfp.standard_fp_status;
 +    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
      bool old_fz = get_flush_to_zero(base_fpst);
      set_flush_to_zero(false, base_fpst);
      for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_hs(CPUARMState *env, void *vd, void *vm, int top)
      unsigned e;
      float_status *fpst;
      float_status scratch_fpst;
 -    float_status *base_fpst = &env->vfp.standard_fp_status;
 +    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
      bool old_fiz = get_flush_inputs_to_zero(base_fpst);
      set_flush_inputs_to_zero(false, base_fpst);
      for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
                  continue;                                               \
              }                                                           \
              fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.standard_fp_status;                           \
 +                &env->vfp.fp_status[FPST_STD];                           \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 -    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
 +    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
  }
- bool disas_sve(DisasContext *, uint32_t);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
-diff --git a/target/arm/translate.h b/target/arm/translate.h
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-index XXXXXXX..XXXXXXX 100644
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
---- a/target/arm/translate.h
-+++ b/target/arm/translate.h
+-    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
-@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
++    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-     bool ns;        /* Use non-secure CPREG bank on access */
+                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
      int fp_excp_el; /* FP exception EL or 0 if enabled */
      int sve_excp_el; /* SVE exception EL or 0 if enabled */
 -    int sve_len;     /* SVE vector length in bytes */
 +    int vl;          /* current vector length in bytes */
      /* Flag indicating that exceptions from secure mode are routed to EL3. */
      bool secure_routed_to_el3;
      bool vfp_enabled; /* FP enabled via FPSCR.EN */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
              zcr_len = sve_zcr_len_for_el(env, el);
          }
          DP_TBFLAG_A64(flags, SVEEXC_EL, sve_el);
 -        DP_TBFLAG_A64(flags, ZCR_LEN, zcr_len);
 +        DP_TBFLAG_A64(flags, VL, zcr_len);
      }
      sctlr = regime_sctlr(env, stage1);
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
      dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
      dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
      dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL);
 -    dc->sve_len = (EX_TBFLAG_A64(tb_flags, ZCR_LEN) + 1) * 16;
 +    dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16;
      dc->pauth_active = EX_TBFLAG_A64(tb_flags, PAUTH_ACTIVE);
      dc->bt = EX_TBFLAG_A64(tb_flags, BT);
      dc->btype = EX_TBFLAG_A64(tb_flags, BTYPE);
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static inline int pred_full_reg_offset(DisasContext *s, int regno)
  /* Return the byte size of the whole predicate register, VL / 64.  */
  static inline int pred_full_reg_size(DisasContext *s)
  {
 -    return s->sve_len >> 3;
 +    return s->vl >> 3;
  }
- /* Round up the size of a register to a size allowed by
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      uint32_t a32_flags = 0, a64_flags = 0;
      a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
 -    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
 +    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
      /* FZ16 does not generate an input denormal exception.  */
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
            & ~float_flag_input_denormal_flushed);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_a64);
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
 -    set_float_exception_flags(0, &env->vfp.standard_fp_status);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
      set_float_exception_flags(0, &env->vfp.ah_fp_status);
      set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 --
-.25.1
+.34.1
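
Note on the scratch_fpst pattern seen in the MVE hunks above: the
helpers copy the live float_status into a local scratch copy when the
low mask bit for an element is clear, so the operation can still be
evaluated without its exception flags reaching the guest-visible state.
A minimal sketch of that idea follows. The float_status struct, toy_op()
and lane_op() are hypothetical stand-ins invented for illustration and
are not QEMU code; the redirect of fpst to the scratch copy is an
assumption about what follows the lines shown in the hunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    uint32_t exception_flags;
} float_status;

/* Hypothetical operation: always raises flag bit 0 and returns a + b. */
static int32_t toy_op(int32_t a, int32_t b, float_status *st)
{
    st->exception_flags |= 1u;
    return a + b;
}

/*
 * Compute one element. If flags must not be updated for this element,
 * run the operation against a scratch copy so that any flags it raises
 * are discarded when the function returns.
 */
static int32_t lane_op(int32_t a, int32_t b, int update_flags,
                       float_status *fpst)
{
    float_status scratch_fpst;

    assert(fpst != NULL);
    if (!update_flags) {
        /* We need the result but without updating flags */
        scratch_fpst = *fpst;
        fpst = &scratch_fpst;
    }
    return toy_op(a, b, fpst);
}
```

Because the scratch copy starts as a snapshot of the live status, the
operation still sees the current rounding and flush settings; only the
flag side effects are dropped.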

-[PULL 55/55] target/arm: Add ID_AA64SMFR0_EL1
+[PULL 59/68] target/arm: Remove ah_fp_status_f16
 From: Richard Henderson <richard.henderson@linaro.org>
-This register is allocated from the existing block of id registers,
+Replace with fp_status[FPST_AH_F16].
 so it is already RES0 for cpus that do not implement SME.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-21-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-10-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h    | 25 +++++++++++++++++++++++++
+ target/arm/cpu.h        |  3 +--
- target/arm/helper.c |  4 ++--
+ target/arm/cpu.c        |  2 +-
- target/arm/kvm64.c  | 11 +++++++----
+ target/arm/vfp_helper.c | 10 +++++-----
-files changed, 34 insertions(+), 6 deletions(-)
+files changed, 7 insertions(+), 8 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
+@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
-         uint64_t id_aa64dfr0;
+  * behaviour when FPCR.AH == 1: they don't update cumulative
-         uint64_t id_aa64dfr1;
+  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
-         uint64_t id_aa64zfr0;
+  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
-+        uint64_t id_aa64smfr0;
+- * which means we need an ah_fp_status_f16 as well.
-         uint64_t reset_pmcr_el0;
++ * which means we need an FPST_AH_F16 as well.
-     } isar;
+  *
-     uint64_t midr;
+  * To avoid having to transfer exception bits around, we simply
-@@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64ZFR0, I8MM, 44, 4)
+  * say that the FPSCR cumulative exception flags are the logical
- FIELD(ID_AA64ZFR0, F32MM, 52, 4)
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
- FIELD(ID_AA64ZFR0, F64MM, 56, 4)
+                 float_status fp_status_f16_a32;
+                 float_status fp_status_f16_a64;
-+FIELD(ID_AA64SMFR0, F32F32, 32, 1)
+                 float_status ah_fp_status;
-+FIELD(ID_AA64SMFR0, B16F32, 34, 1)
+-                float_status ah_fp_status_f16;
-+FIELD(ID_AA64SMFR0, F16F32, 35, 1)
+             };
-+FIELD(ID_AA64SMFR0, I8I32, 36, 4)
+         };
-+FIELD(ID_AA64SMFR0, F64F64, 48, 1)
-+FIELD(ID_AA64SMFR0, I16I64, 52, 4)
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-+FIELD(ID_AA64SMFR0, SMEVER, 56, 4)
+index XXXXXXX..XXXXXXX 100644
-+FIELD(ID_AA64SMFR0, FA64, 63, 1)
+--- a/target/arm/cpu.c
-+
++++ b/target/arm/cpu.c
- FIELD(ID_DFR0, COPDBG, 0, 4)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
- FIELD(ID_DFR0, COPSDBG, 4, 4)
+     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
- FIELD(ID_DFR0, MMAPDBG, 8, 4)
+     set_flush_to_zero(1, &env->vfp.ah_fp_status);
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve_f64mm(const ARMISARegisters *id)
+     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
-     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, F64MM) != 0;
+-    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
 +    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
  #ifndef CONFIG_USER_ONLY
      if (kvm_enabled()) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
      /*
 -     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
 +     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
       * they are used for insns that must not set the cumulative exception bits.
       */
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
      set_float_exception_flags(0, &env->vfp.ah_fp_status);
 -    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
  }
-+static inline bool isar_feature_aa64_sme_f64f64(const ARMISARegisters *id)
+ static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
-+{
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-+    return FIELD_EX64(id->id_aa64smfr0, ID_AA64SMFR0, F64F64);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-+}
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-+
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-+static inline bool isar_feature_aa64_sme_i16i64(const ARMISARegisters *id)
+-        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
-+{
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-+    return FIELD_EX64(id->id_aa64smfr0, ID_AA64SMFR0, I16I64) == 0xf;
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-+}
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-+
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-+static inline bool isar_feature_aa64_sme_fa64(const ARMISARegisters *id)
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
-+{
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-+    return FIELD_EX64(id->id_aa64smfr0, ID_AA64SMFR0, FA64);
+     }
-+}
+     if (changed & FPCR_FZ) {
-+
+         bool ftz_enabled = val & FPCR_FZ;
- /*
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-  * Feature tests for "does this exist in either 32-bit or 64-bit?"
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
-  */
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+         set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
-index XXXXXXX..XXXXXXX 100644
+-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
---- a/target/arm/helper.c
++        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-+++ b/target/arm/helper.c
+     }
-@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+     if (changed & FPCR_AH) {
-               .access = PL1_R, .type = ARM_CP_CONST,
+         bool ah_enabled = val & FPCR_AH;
                .accessfn = access_aa64_tid3,
                .resetvalue = cpu->isar.id_aa64zfr0 },
 -            { .name = "ID_AA64PFR5_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
 +            { .name = "ID_AA64SMFR0_EL1", .state = ARM_CP_STATE_AA64,
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 5,
                .access = PL1_R, .type = ARM_CP_CONST,
                .accessfn = access_aa64_tid3,
 -              .resetvalue = 0 },
 +              .resetvalue = cpu->isar.id_aa64smfr0 },
              { .name = "ID_AA64PFR6_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 6,
                .access = PL1_R, .type = ARM_CP_CONST,
 diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm64.c
 +++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
      } else {
          err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64pfr1,
                                ARM64_SYS_REG(3, 0, 0, 4, 1));
 +        err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64smfr0,
 +                              ARM64_SYS_REG(3, 0, 0, 4, 5));
          err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64dfr0,
                                ARM64_SYS_REG(3, 0, 0, 5, 0));
          err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64dfr1,
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
          ahcf->isar.id_aa64pfr0 = t;
          /*
 -         * Before v5.1, KVM did not support SVE and did not expose
 -         * ID_AA64ZFR0_EL1 even as RAZ.  After v5.1, KVM still does
 -         * not expose the register to "user" requests like this
 -         * unless the host supports SVE.
 +         * There is a range of kernels between kernel commit 73433762fcae
 +         * and f81cb2c3ad41 which have a bug where the kernel doesn't expose
 +         * SYS_ID_AA64ZFR0_EL1 via the ONE_REG API unless the VM has enabled
 +         * SVE support, so we only read it here, rather than together with all
 +         * the other ID registers earlier.
           */
          err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64zfr0,
                                ARM64_SYS_REG(3, 0, 0, 4, 4));
 --
-.25.1
+.34.1
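
Note on the refactoring pattern in this part of the series: each named
float_status field (ah_fp_status_f16, ah_fp_status, fp_status_f16_a64,
and so on) is replaced by a slot in a single fp_status[] array indexed
by an FPST_* constant. The sketch below shows the shape of that change
under stated assumptions. The index names are the ones that appear in
the patches, but their order, FPST_COUNT, the VFPState wrapper, and both
helper functions are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    uint8_t exception_flags;
} float_status;

/* Names from the patches; ordering and FPST_COUNT are assumptions. */
typedef enum {
    FPST_A64_F16,
    FPST_AH,
    FPST_AH_F16,
    FPST_STD,
    FPST_STD_F16,
    FPST_COUNT
} FPStatusIndex;

/* Hypothetical container standing in for the env->vfp sub-structure. */
typedef struct {
    float_status fp_status[FPST_COUNT];
} VFPState;

/* Select a status block by index instead of by field name. */
static float_status *fpst_ptr(VFPState *vfp, FPStatusIndex idx)
{
    assert((unsigned)idx < FPST_COUNT);
    return &vfp->fp_status[idx];
}

/* With an array, "apply to every status" can be written as a loop. */
static void clear_all_exception_flags(VFPState *vfp)
{
    for (int i = 0; i < FPST_COUNT; i++) {
        vfp->fp_status[i].exception_flags = 0;
    }
}
```

The patches themselves keep the explicit per-status calls; the loop here
only illustrates one thing the array layout makes possible.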

-[PULL 54/55] target/arm: Add isar_feature_aa64_sme
+[PULL 60/68] target/arm: Remove ah_fp_status
 From: Richard Henderson <richard.henderson@linaro.org>
-This will be used for implementing FEAT_SME.
+Replace with fp_status[FPST_AH].
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-20-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-11-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h | 5 +++++
+ target/arm/cpu.h        | 3 +--
-file changed, 5 insertions(+)
+ target/arm/cpu.c        | 6 +++---
  target/arm/vfp_helper.c | 6 +++---
 files changed, 7 insertions(+), 8 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_mte(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
-     return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, MTE) >= 2;
+  * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
   * using a fixed value for it.
   *
 - * The ah_fp_status is needed because some insns have different
 + * FPST_AH is needed because some insns have different
   * behaviour when FPCR.AH == 1: they don't update cumulative
   * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
   * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                  float_status fp_status_a64;
                  float_status fp_status_f16_a32;
                  float_status fp_status_f16_a64;
 -                float_status ah_fp_status;
              };
          };
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
 -    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
 -    set_flush_to_zero(1, &env->vfp.ah_fp_status);
 -    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
 +    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
 +    set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]);
 +    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_AH]);
      arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
  #ifndef CONFIG_USER_ONLY
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
      a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
      /*
 -     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
 +     * We do not merge in flags from FPST_AH or FPST_AH_F16, because
       * they are used for insns that must not set the cumulative exception bits.
       */
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
 -    set_float_exception_flags(0, &env->vfp.ah_fp_status);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
  }
-+static inline bool isar_feature_aa64_sme(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-+{
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
-+    return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, SME) != 0;
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
-+}
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
-+
+-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
- static inline bool isar_feature_aa64_pmu_8_1(const ARMISARegisters *id)
++        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
- {
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-     return FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) >= 4 &&
+     }
      if (changed & FPCR_AH) {
 --
-.25.1
+.34.1
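
Note on the comment touched in vfp_get_fpsr_from_host: the cumulative
flags are built by OR-ing together several status blocks, the F16 block
has its input-denormal flags masked out, and the FPST_AH / FPST_AH_F16
blocks are skipped because they serve insns that must not set the
cumulative exception bits. A simplified sketch of the AArch64 side
follows. The flag bit values, the FPST_A64 index name, the index order
and merge_a64_flags() are assumptions made for this example, not the
QEMU definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real softfloat values may differ. */
enum {
    FLAG_INEXACT                = 1u << 0,
    FLAG_INPUT_DENORMAL_FLUSHED = 1u << 1,
    FLAG_INPUT_DENORMAL_USED    = 1u << 2,
};

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    uint32_t exception_flags;
} float_status;

/* FPST_A64 and the ordering are assumptions for this sketch. */
enum { FPST_A64, FPST_A64_F16, FPST_AH, FPST_AH_F16, FPST_COUNT };

static uint32_t merge_a64_flags(const float_status st[FPST_COUNT])
{
    uint32_t flags = st[FPST_A64].exception_flags;

    assert(st != 0);
    /* The F16 status does not contribute input-denormal flags. */
    flags |= st[FPST_A64_F16].exception_flags
             & ~(uint32_t)(FLAG_INPUT_DENORMAL_FLUSHED
                           | FLAG_INPUT_DENORMAL_USED);
    /* FPST_AH and FPST_AH_F16 are deliberately not merged in. */
    return flags;
}
```

Skipping the AH blocks at read time is what lets those insns share the
ordinary softfloat code paths without extra flag bookkeeping.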

-[PULL 31/55] target/arm: Move regime_ttbr to ptw.c
+[PULL 61/68] target/arm: Remove fp_status_f16_a64
 From: Richard Henderson <richard.henderson@linaro.org>
+Replace with fp_status[FPST_A64_F16].
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-25-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20250129013857.135256-12-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/ptw.h    |  1 -
+ target/arm/cpu.h            |  1 -
- target/arm/helper.c | 16 ----------------
+ target/arm/cpu.c            |  2 +-
- target/arm/ptw.c    | 16 ++++++++++++++++
+ target/arm/tcg/sme_helper.c |  2 +-
-files changed, 16 insertions(+), 17 deletions(-)
+ target/arm/tcg/vec_helper.c |  9 ++++-----
  target/arm/vfp_helper.c     | 16 ++++++++--------
 files changed, 14 insertions(+), 16 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/cpu.h
-+++ b/target/arm/ptw.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
- #ifndef CONFIG_USER_ONLY
+                 float_status fp_status_a32;
+                 float_status fp_status_a64;
- bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
+                 float_status fp_status_f16_a32;
--uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
+-                float_status fp_status_f16_a64;
+             };
- #endif /* !CONFIG_USER_ONLY */
+         };
- #endif /* TARGET_ARM_PTW_H */
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/cpu.c
-+++ b/target/arm/helper.c
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-     return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
+     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
 -    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
      arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
      set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]);
 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/sme_helper.c
 +++ b/target/arm/tcg/sme_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
       * produces default NaNs. We also need a second copy of fp_status with
       * round-to-odd -- see above.
       */
 -    fpst_f16 = env->vfp.fp_status_f16_a64;
 +    fpst_f16 = env->vfp.fp_status[FPST_A64_F16];
      fpst_std = env->vfp.fp_status_a64;
      set_default_nan_mode(true, &fpst_std);
      set_default_nan_mode(true, &fpst_f16);
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
          }
      }
      do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
 -             get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 +             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
  }
--/* Return the TTBR associated with this translation regime */
+ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
--uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
--{
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
--    if (mmu_idx == ARMMMUIdx_Stage2) {
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
--        return env->cp15.vttbr_el2;
+     float_status *status = &env->vfp.fp_status_a64;
--    }
+-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
--    if (mmu_idx == ARMMMUIdx_Stage2_S) {
++    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
--        return env->cp15.vsttbr_el2;
+     int negx = 0, negf = 0;
--    }
--    if (ttbrn == 0) {
+     if (is_s) {
--        return env->cp15.ttbr0_el[regime_el(env, mmu_idx)];
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
--    } else {
+         }
--        return env->cp15.ttbr1_el[regime_el(env, mmu_idx)];
+     }
--    }
+     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
--}
+-                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 +                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
  }
  void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
      intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
      float_status *status = &env->vfp.fp_status_a64;
 -    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
 +    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
      int negx = 0, negf = 0;
      if (is_s) {
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
              negx = 0x8000;
          }
      }
 -
- /* Convert a possible stage1+2 MMU index into the appropriate
+     for (i = 0; i < oprsz; i += 16) {
-  * stage 1 MMU index
+         float16 mm_16 = *(float16 *)(vm + i + idx);
-  */
+         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/ptw.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
            & ~float_flag_input_denormal_flushed);
      a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
 -    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
 +    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A64_F16])
            & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
      /*
       * We do not merge in flags from FPST_AH or FPST_AH_F16, because
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      set_float_exception_flags(0, &env->vfp.fp_status_a32);
      set_float_exception_flags(0, &env->vfp.fp_status_a64);
      set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
 -    set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          set_float_rounding_mode(i, &env->vfp.fp_status_a32);
          set_float_rounding_mode(i, &env->vfp.fp_status_a64);
          set_float_rounding_mode(i, &env->vfp.fp_status_f16_a32);
 -        set_float_rounding_mode(i, &env->vfp.fp_status_f16_a64);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
      }
- }
+     if (changed & FPCR_FZ16) {
+         bool ftz_enabled = val & FPCR_FZ16;
-+/* Return the TTBR associated with this translation regime */
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-+static uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
+-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-+{
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
-+    if (mmu_idx == ARMMMUIdx_Stage2) {
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-+        return env->cp15.vttbr_el2;
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-+    }
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-+    if (mmu_idx == ARMMMUIdx_Stage2_S) {
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-+        return env->cp15.vsttbr_el2;
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
-+    }
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-+    if (ttbrn == 0) {
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-+        return env->cp15.ttbr0_el[regime_el(env, mmu_idx)];
+     }
-+    } else {
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-+        return env->cp15.ttbr1_el[regime_el(env, mmu_idx)];
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
-+    }
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
-+}
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
-+
+-        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
- static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
++        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
- {
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          if (ah_enabled) {
              /* Change behaviours for A64 FP operations */
              arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
 -            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +            arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          } else {
              arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 -            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
 +            arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          }
      }
      /*
 --
-.25.1
+.34.1

-[PULL 35/55] target/arm: Pass CPUARMState to arm_ld[lq]_ptw
+[PULL 62/68] target/arm: Remove fp_status_f16_a32
 From: Richard Henderson <richard.henderson@linaro.org>
-The use of ARM_CPU to recover env from cs calls
+Replace with fp_status[FPST_A32_F16].
 object_class_dynamic_cast, which shows up on the profile.
 This is pointless, because all callers already have env, and
 the reverse operation, env_cpu, is only pointer arithmetic.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-29-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20250129013857.135256-13-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/ptw.c | 23 +++++++++--------------
+ target/arm/cpu.h            |  1 -
-file changed, 9 insertions(+), 14 deletions(-)
+ target/arm/cpu.c            |  2 +-
  target/arm/tcg/vec_helper.c |  4 ++--
  target/arm/vfp_helper.c     | 14 +++++++-------
 files changed, 10 insertions(+), 11 deletions(-)
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
+--- a/target/arm/cpu.h
-+++ b/target/arm/ptw.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
              struct {
                  float_status fp_status_a32;
                  float_status fp_status_a64;
 -                float_status fp_status_f16_a32;
              };
          };
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
 -    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
      arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
      do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -             get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 +             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
  }
- /* All loads done in the course of a page table walk go through here. */
+ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
--static uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
-+static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
-                             ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
- {
+     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
--    ARMCPU *cpu = ARM_CPU(cs);
+-                 get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
--    CPUARMState *env = &cpu->env;
++                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
 +    CPUState *cs = env_cpu(env);
      MemTxAttrs attrs = {};
      MemTxResult result = MEMTX_OK;
      AddressSpace *as;
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
      return 0;
  }
--static uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-+static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
-                             ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+index XXXXXXX..XXXXXXX 100644
- {
+--- a/target/arm/vfp_helper.c
--    ARMCPU *cpu = ARM_CPU(cs);
++++ b/target/arm/vfp_helper.c
--    CPUARMState *env = &cpu->env;
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
-+    CPUState *cs = env_cpu(env);
+     a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
-     MemTxAttrs attrs = {};
+     a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
-     MemTxResult result = MEMTX_OK;
+     /* FZ16 does not generate an input denormal exception.  */
-     AddressSpace *as;
+-    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
++    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A32_F16])
-                              target_ulong *page_size,
+           & ~float_flag_input_denormal_flushed);
-                              ARMMMUFaultInfo *fi)
+     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
- {
+           & ~float_flag_input_denormal_flushed);
--    CPUState *cs = env_cpu(env);
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
-     int level = 1;
+      */
-     uint32_t table;
+     set_float_exception_flags(0, &env->vfp.fp_status_a32);
-     uint32_t desc;
+     set_float_exception_flags(0, &env->vfp.fp_status_a64);
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
+-    set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
-         fi->type = ARMFault_Translation;
++    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
-         goto do_fault;
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          }
          set_float_rounding_mode(i, &env->vfp.fp_status_a32);
          set_float_rounding_mode(i, &env->vfp.fp_status_a64);
 -        set_float_rounding_mode(i, &env->vfp.fp_status_f16_a32);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
      }
--    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+     if (changed & FPCR_FZ16) {
-+    desc = arm_ldl_ptw(env, table, regime_is_secure(env, mmu_idx),
+         bool ftz_enabled = val & FPCR_FZ16;
-                        mmu_idx, fi);
+-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-     if (fi->type != ARMFault_None) {
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32_F16]);
-         goto do_fault;
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-             /* Fine pagetable.  */
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-             table = (desc & 0xfffff000) | ((address >> 8) & 0xffc);
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
-         }
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32_F16]);
--        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]);
-+        desc = arm_ldl_ptw(env, table, regime_is_secure(env, mmu_idx),
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-                            mmu_idx, fi);
+         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-         if (fi->type != ARMFault_None) {
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
-             goto do_fault;
+         bool dnan_enabled = val & FPCR_DN;
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
-                              hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
-                              target_ulong *page_size, ARMMMUFaultInfo *fi)
+-        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
- {
++        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
--    CPUState *cs = env_cpu(env);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
-     ARMCPU *cpu = env_archcpu(env);
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
-     int level = 1;
+         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
-     uint32_t table;
+@@ -XXX,XX +XXX,XX @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
+     softfloat_to_vfp_compare(env, \
-         fi->type = ARMFault_Translation;
+         FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
-         goto do_fault;
+ }
-     }
+-DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16_a32)
--    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
++DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
-+    desc = arm_ldl_ptw(env, table, regime_is_secure(env, mmu_idx),
+ DO_VFP_cmp(s, float32, float32, fp_status_a32)
-                        mmu_idx, fi);
+ DO_VFP_cmp(d, float64, float64, fp_status_a32)
-     if (fi->type != ARMFault_None) {
+ #undef DO_VFP_cmp
          goto do_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
          ns = extract32(desc, 3, 1);
          /* Lookup l2 entry.  */
          table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
 -        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
 +        desc = arm_ldl_ptw(env, table, regime_is_secure(env, mmu_idx),
                             mmu_idx, fi);
          if (fi->type != ARMFault_None) {
              goto do_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
                                 ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
  {
      ARMCPU *cpu = env_archcpu(env);
 -    CPUState *cs = CPU(cpu);
      /* Read an LPAE long-descriptor translation table. */
      ARMFaultType fault_type = ARMFault_Translation;
      uint32_t level;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
          descaddr |= (address >> (stride * (4 - level))) & indexmask;
          descaddr &= ~7ULL;
          nstable = extract32(tableattrs, 4, 1);
 -        descriptor = arm_ldq_ptw(cs, descaddr, !nstable, mmu_idx, fi);
 +        descriptor = arm_ldq_ptw(env, descaddr, !nstable, mmu_idx, fi);
          if (fi->type != ARMFault_None) {
              goto do_fault;
          }
 --
-.25.1
+.34.1

-[PULL 37/55] linux-user/aarch64: Introduce sve_vq
+[PULL 63/68] target/arm: Remove fp_status_a64
 From: Richard Henderson <richard.henderson@linaro.org>
-Add an interface function to extract the digested vector length
+Replace with fp_status[FPST_A64].
 rather than the raw zcr_el[1] value.  This fixes an incorrect
 return from do_prctl_set_vl where we didn't take into account
 the set of vector lengths supported by the cpu.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-3-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-14-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- linux-user/aarch64/target_prctl.h | 20 +++++++++++++-------
+ target/arm/cpu.h            |  1 -
- target/arm/cpu.h                  | 11 +++++++++++
+ target/arm/cpu.c            |  2 +-
- linux-user/aarch64/signal.c       |  4 ++--
+ target/arm/tcg/sme_helper.c |  2 +-
-files changed, 26 insertions(+), 9 deletions(-)
+ target/arm/tcg/vec_helper.c | 10 +++++-----
  target/arm/vfp_helper.c     | 16 ++++++++--------
 files changed, 15 insertions(+), 16 deletions(-)
-diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
-index XXXXXXX..XXXXXXX 100644
---- a/linux-user/aarch64/target_prctl.h
-+++ b/linux-user/aarch64/target_prctl.h
-@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_get_vl(CPUArchState *env)
- {
-     ARMCPU *cpu = env_archcpu(env);
-     if (cpu_isar_feature(aa64_sve, cpu)) {
--        return ((cpu->env.vfp.zcr_el[1] & 0xf) + 1) * 16;
-+        return sve_vq(env) * 16;
-     }
-     return -TARGET_EINVAL;
- }
-@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_set_vl(CPUArchState *env, abi_long arg2)
-      */
-     if (cpu_isar_feature(aa64_sve, env_archcpu(env))
-         && arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) {
--        ARMCPU *cpu = env_archcpu(env);
-         uint32_t vq, old_vq;
--        old_vq = (env->vfp.zcr_el[1] & 0xf) + 1;
--        vq = MAX(arg2 / 16, 1);
--        vq = MIN(vq, cpu->sve_max_vq);
-+        old_vq = sve_vq(env);
-+        /*
-+         * Bound the value of arg2, so that we know that it fits into
-+         * the 4-bit field in ZCR_EL1.  Rely on the hflags rebuild to
-+         * sort out the length supported by the cpu.
-+         */
-+        vq = MAX(arg2 / 16, 1);
-+        vq = MIN(vq, ARM_MAX_VQ);
-+        env->vfp.zcr_el[1] = vq - 1;
-+        arm_rebuild_hflags(env);
-+
-+        vq = sve_vq(env);
-         if (vq < old_vq) {
-             aarch64_sve_narrow_vq(env, vq);
-         }
--        env->vfp.zcr_el[1] = vq - 1;
--        arm_rebuild_hflags(env);
-         return vq * 16;
-     }
-     return -TARGET_EINVAL;
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline int cpu_mmu_index(CPUARMState *env, bool ifetch)
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
-     return EX_TBFLAG_ANY(env->hflags, MMUIDX);
+             float_status fp_status[FPST_COUNT];
              struct {
                  float_status fp_status_a32;
 -                float_status fp_status_a64;
              };
          };
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
      set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
      set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
 -    arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 +    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/sme_helper.c
 +++ b/target/arm/tcg/sme_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
       * round-to-odd -- see above.
       */
      fpst_f16 = env->vfp.fp_status[FPST_A64_F16];
 -    fpst_std = env->vfp.fp_status_a64;
 +    fpst_std = env->vfp.fp_status[FPST_A64];
      set_default_nan_mode(true, &fpst_std);
      set_default_nan_mode(true, &fpst_f16);
      fpst_odd = fpst_std;
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
              negx = 0x8000800080008000ull;
          }
      }
 -    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
 +    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
               get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
  }
-+/**
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
-+ * sve_vq
+     intptr_t i, oprsz = simd_oprsz(desc);
-+ * @env: the cpu context
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-+ *
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-+ * Return the VL cached within env->hflags, in units of quadwords.
+-    float_status *status = &env->vfp.fp_status_a64;
-+ */
++    float_status *status = &env->vfp.fp_status[FPST_A64];
-+static inline int sve_vq(CPUARMState *env)
+     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
-+{
+     int negx = 0, negf = 0;
-+    return EX_TBFLAG_A64(env->hflags, VL) + 1;
-+}
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-+
+             negx = 0x8000800080008000ull;
- static inline bool bswap_code(bool sctlr_b)
+         }
- {
+     }
- #ifdef CONFIG_USER_ONLY
+-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
-diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
++    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
                   get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
  }
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
      intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
 -    float_status *status = &env->vfp.fp_status_a64;
 +    float_status *status = &env->vfp.fp_status[FPST_A64];
      bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
      int negx = 0, negf = 0;
@@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
       */
      bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
 -    *statusp = is_a64(env) ? env->vfp.fp_status_a64 : env->vfp.fp_status_a32;
 +    *statusp = is_a64(env) ? env->vfp.fp_status[FPST_A64] : env->vfp.fp_status_a32;
      set_default_nan_mode(true, statusp);
      if (ebf) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/aarch64/signal.c
+--- a/target/arm/vfp_helper.c
-+++ b/linux-user/aarch64/signal.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
+     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
-         case TARGET_SVE_MAGIC:
+           & ~float_flag_input_denormal_flushed);
-             if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
--                vq = (env->vfp.zcr_el[1] & 0xf) + 1;
+-    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
-+                vq = sve_vq(env);
++    a64_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_A64]);
-                 sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
+     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A64_F16])
-                 if (!sve && size == sve_size) {
+           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
-                     sve = (struct target_sve_context *)ctx;
+     /*
-@@ -XXX,XX +XXX,XX @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
+@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
+      * be the architecturally up-to-date exception flag information first.
-     /* SVE state needs saving only if it exists.  */
+      */
-     if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
+     set_float_exception_flags(0, &env->vfp.fp_status_a32);
--        vq = (env->vfp.zcr_el[1] & 0xf) + 1;
+-    set_float_exception_flags(0, &env->vfp.fp_status_a64);
-+        vq = sve_vq(env);
++    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64]);
-         sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
-         sve_ofs = alloc_sigframe_space(sve_size, &layout);
+     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
              break;
          }
          set_float_rounding_mode(i, &env->vfp.fp_status_a32);
 -        set_float_rounding_mode(i, &env->vfp.fp_status_a64);
 +        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
          set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
      if (changed & FPCR_FZ) {
          bool ftz_enabled = val & FPCR_FZ;
          set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
 -        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]);
          /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
          set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
      }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
           */
          bool fitz_enabled = (val & FPCR_FIZ) ||
              (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
 -        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
 +        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status[FPST_A64]);
      }
      if (changed & FPCR_DN) {
          bool dnan_enabled = val & FPCR_DN;
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
          if (ah_enabled) {
              /* Change behaviours for A64 FP operations */
 -            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
 +            arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
              arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          } else {
 -            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
 +            arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
              arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]);
          }
      }
 --
-.25.1
+.34.1
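
The cpu.h hunks in the patches above remove the named fp_status_* fields one at a time from an anonymous struct that shares a union with fp_status[FPST_COUNT]. The following is a minimal sketch of why that works, not the QEMU definition: float_status here is a two-field stand-in, the FPST_* values are limited to four, and their order is an assumption inferred from the order of the named fields in the diff. The type name vfp_transitional and the function write_named_read_indexed are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for QEMU's float_status; the real type has more fields. */
typedef struct {
    int rounding_mode;
    int exception_flags;
} float_status;

/* Assumed ordering, inferred from the named fields shown in cpu.h. */
enum {
    FPST_A32,
    FPST_A64,
    FPST_A32_F16,
    FPST_A64_F16,
    FPST_COUNT
};

/* Transitional layout: each named field overlays one array slot. */
typedef struct {
    union {
        float_status fp_status[FPST_COUNT];
        struct {
            float_status fp_status_a32;
            float_status fp_status_a64;
            float_status fp_status_f16_a32;
            float_status fp_status_f16_a64;
        };
    };
} vfp_transitional;

/* The aliasing only holds if the offsets agree, so check at compile time. */
_Static_assert(offsetof(vfp_transitional, fp_status_f16_a64) ==
               offsetof(vfp_transitional, fp_status[FPST_A64_F16]),
               "named field must alias its array slot");

/* Write through the old name, read back through the array index. */
static int write_named_read_indexed(void)
{
    vfp_transitional v = {0};
    v.fp_status_f16_a64.rounding_mode = 3;
    return v.fp_status[FPST_A64_F16].rounding_mode;
}
```

Because both spellings name the same storage, callers can be converted one field per patch without changing behaviour. Once the last named field is gone, the union and struct wrappers carry no information and can be dropped.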

-[PULL 44/55] target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset
+[PULL 64/68] target/arm: Remove fp_status_a32
 From: Richard Henderson <richard.henderson@linaro.org>
-We don't need to constrain the value set in zcr_el[1],
+Replace with fp_status[FPST_A32].  As this was the last of the
-because it will be done by sve_zcr_len_for_el.
+old structures, we can remove the anonymous union and struct.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-10-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-15-richard.henderson@linaro.org
 [PMM: tweak to account for change to is_ebf()]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.c | 3 +--
+ target/arm/cpu.h            |  7 +------
-file changed, 1 insertion(+), 2 deletions(-)
+ target/arm/cpu.c            |  2 +-
  target/arm/tcg/vec_helper.c |  2 +-
  target/arm/vfp_helper.c     | 18 +++++++++---------
 files changed, 12 insertions(+), 17 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
+         uint32_t scratch[8];
+         /* There are a number of distinct float control structures. */
+-        union {
+-            float_status fp_status[FPST_COUNT];
+-            struct {
+-                float_status fp_status_a32;
+-            };
+-        };
++        float_status fp_status[FPST_COUNT];
+         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
+         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-                                          CPACR_EL1, ZEN, 3);
+     set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
-         /* with reasonable vector length */
+     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
-         if (cpu_isar_feature(aa64_sve, cpu)) {
+     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
--            env->vfp.zcr_el[1] =
+-    arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
--                aarch64_sve_zcr_get_valid_len(cpu, cpu->sve_default_vq - 1);
++    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32]);
-+            env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
+     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
      arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
 diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/vec_helper.c
 +++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
       */
      bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
 -    *statusp = is_a64(env) ? env->vfp.fp_status[FPST_A64] : env->vfp.fp_status_a32;
 +    *statusp = env->vfp.fp_status[is_a64(env) ? FPST_A64 : FPST_A32];
      set_default_nan_mode(true, statusp);
      if (ebf) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
  {
      uint32_t a32_flags = 0, a64_flags = 0;
 -    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
 +    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_A32]);
      a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
      /* FZ16 does not generate an input denormal exception.  */
      a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A32_F16])
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
       * values. The caller should have arranged for env->vfp.fpsr to
       * be the architecturally up-to-date exception flag information first.
       */
 -    set_float_exception_flags(0, &env->vfp.fp_status_a32);
 +    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
      set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
              i = float_round_to_zero;
              break;
          }
+-        set_float_rounding_mode(i, &env->vfp.fp_status_a32);
++        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32]);
+         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]);
+         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
+         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
+     }
+     if (changed & FPCR_FZ) {
+         bool ftz_enabled = val & FPCR_FZ;
+-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
++        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
+         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]);
+         /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
+-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
++        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
+     }
+     if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
          /*
-          * Enable 48-bit address space (TODO: take reserved_va into account).
+@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
      }
      if (changed & FPCR_DN) {
          bool dnan_enabled = val & FPCR_DN;
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
          set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
          FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
  }
  DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
 -DO_VFP_cmp(s, float32, float32, fp_status_a32)
 -DO_VFP_cmp(d, float64, float64, fp_status_a32)
 +DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
 +DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
  #undef DO_VFP_cmp
  /* Integer to float and float to integer conversions */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(fjcvtzs)(float64 value, float_status *status)
  uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
  {
 -    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status_a32);
 +    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status[FPST_A32]);
      uint32_t result = pair;
      uint32_t z = (pair >> 32) == 0;
 --
-.25.1
+.34.1

-[PULL 41/55] target/arm: Use el_is_in_host for sve_zcr_len_for_el
+[PULL 65/68] target/arm: Simplify fp_status indexing in mve_helper.c
 From: Richard Henderson <richard.henderson@linaro.org>
-The ARM pseudocode function NVL uses this predicate now,
+Select on index instead of pointer.
-and I think it's a bit clearer.  Simplify the pseudocode
+No functional change.
 condition by noting that IsInHost is always false for EL1.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-7-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-16-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 3 +--
+ target/arm/tcg/mve_helper.c | 40 +++++++++++++------------------------
-file changed, 1 insertion(+), 2 deletions(-)
+file changed, 14 insertions(+), 26 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/tcg/mve_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/tcg/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
+@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
-     ARMCPU *cpu = env_archcpu(env);
+             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
-     uint32_t zcr_len = cpu->sve_max_vq - 1;
+                 continue;                                               \
+             }                                                           \
--    if (el <= 1 &&
+-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
--        (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+-                &env->vfp.fp_status[FPST_STD];                           \
-+    if (el <= 1 && !el_is_in_host(env, el)) {
++            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
-         zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
+             if (!(mask & 1)) {                                          \
-     }
+                 /* We need the result but without updating flags */     \
-     if (el <= 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                  r[e] = 0;                                               \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(tm & 1)) {                                            \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
                  continue;                                               \
              }                                                           \
 -            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst0 = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              fpst1 = fpst0;                                              \
              if (!(mask & 1)) {                                          \
                  scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
          unsigned e;                                             \
          TYPE *m = vm;                                           \
          TYPE ra = (TYPE)ra_in;                                  \
 -        float_status *fpst = (ESIZE == 2) ?                     \
 -            &env->vfp.fp_status[FPST_STD_F16] :                 \
 -            &env->vfp.fp_status[FPST_STD];                       \
 +        float_status *fpst =                                    \
 +            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
              if (mask & 1) {                                     \
                  TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
              if ((mask & emask) == 0) {                                  \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
              if ((mask & emask) == 0) {                                  \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & (1 << (e * ESIZE)))) {                         \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
          unsigned e;                                                     \
          float_status *fpst;                                             \
          float_status scratch_fpst;                                      \
 -        float_status *base_fpst = (ESIZE == 2) ?                        \
 -            &env->vfp.fp_status[FPST_STD_F16] :                         \
 -            &env->vfp.fp_status[FPST_STD];                               \
 +        float_status *base_fpst =                                       \
 +            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD];  \
          uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
          set_float_rounding_mode(rmode, base_fpst);                      \
          for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
              if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                  continue;                                               \
              }                                                           \
 -            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
 -                &env->vfp.fp_status[FPST_STD];                           \
 +            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
              if (!(mask & 1)) {                                          \
                  /* We need the result but without updating flags */     \
                  scratch_fpst = *fpst;                                   \
 --
-.25.1
+.34.1
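
Note on 65/68: the "select on index instead of pointer" rewrite can be checked in isolation. The sketch below is not QEMU code. The struct layouts, the two-entry enum, and the function names select_by_pointer and select_by_index are hypothetical stand-ins chosen for illustration. It only shows that choosing the array index first and taking one address yields the same pointer as choosing between two addresses.

```c
#include <assert.h>

/* Hypothetical stand-in for QEMU's float_status (the real one has more fields). */
typedef struct {
    int rounding_mode;
    int exception_flags;
} float_status;

/* Illustrative subset of flavours; the real enum has more entries. */
typedef enum { FPST_STD, FPST_STD_F16, FPST_COUNT } fpst_flavour;

/* Hypothetical container mirroring the env->vfp.fp_status[] array. */
typedef struct {
    float_status fp_status[FPST_COUNT];
} vfp_state;

/* Old form: pick between two element addresses. */
static float_status *select_by_pointer(vfp_state *vfp, int esize)
{
    return (esize == 2) ? &vfp->fp_status[FPST_STD_F16]
                        : &vfp->fp_status[FPST_STD];
}

/* New form: pick the index, then take a single address. */
static float_status *select_by_index(vfp_state *vfp, int esize)
{
    return &vfp->fp_status[esize == 2 ? FPST_STD_F16 : FPST_STD];
}
```

Both functions return the same element for any esize, which is consistent with the "No functional change" statement in the commit message.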

-[PULL 38/55] target/arm: Remove route_to_el2 check from sve_exception_el
+[PULL 66/68] target/arm: Simplify DO_VFP_cmp in vfp_helper.c
 From: Richard Henderson <richard.henderson@linaro.org>
-We handle this routing in raise_exception.  Promoting the value early
+Pass ARMFPStatusFlavour index instead of fp_status[FOO].
 means that we can't directly compare FPEXC_EL and SVEEXC_EL.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220607203306.657998-4-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20250129013857.135256-17-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 3 +--
+ target/arm/vfp_helper.c | 10 +++++-----
-file changed, 1 insertion(+), 2 deletions(-)
+file changed, 5 insertions(+), 5 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
+@@ -XXX,XX +XXX,XX @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
-             /* fall through */
+ void VFP_HELPER(cmp, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env)  \
-         case 0:
+ { \
-         case 2:
+     softfloat_to_vfp_compare(env, \
--            /* route_to_el2 */
+-        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.FPST)); \
--            return hcr_el2 & HCR_TGE ? 2 : 1;
++        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.fp_status[FPST])); \
-+            return 1;
+ } \
-         }
+ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
+ { \
-         /* Check CPACR.FPEN.  */
+     softfloat_to_vfp_compare(env, \
 -        FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
 +        FLOATTYPE ## _compare(a, b, &env->vfp.fp_status[FPST])); \
  }
 -DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
 -DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
 -DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
 +DO_VFP_cmp(h, float16, dh_ctype_f16, FPST_A32_F16)
 +DO_VFP_cmp(s, float32, float32, FPST_A32)
 +DO_VFP_cmp(d, float64, float64, FPST_A32)
  #undef DO_VFP_cmp
  /* Integer to float and float to integer conversions */
 --
-.25.1
+.34.1
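
Note on 66/68: the macro change passes a flavour index and lets the macro body spell out fp_status[...], instead of passing the whole member expression. A minimal sketch of the two styles follows. The types cpu_env and vfp_state and the macros DEFINE_GET_OLD and DEFINE_GET_NEW are hypothetical and exist only for this illustration; the real DO_VFP_cmp generates comparison helpers, not accessors.

```c
#include <assert.h>

typedef struct { int flags; } float_status;      /* hypothetical stand-in */
enum { FPST_A32, FPST_A32_F16, FPST_COUNT };      /* illustrative subset */
typedef struct { float_status fp_status[FPST_COUNT]; } vfp_state;
typedef struct { vfp_state vfp; } cpu_env;

/* Old style: FPST is a member expression spliced in after "vfp.". */
#define DEFINE_GET_OLD(NAME, FPST)              \
    static float_status *NAME(cpu_env *env)     \
    {                                           \
        return &env->vfp.FPST;                  \
    }

/* New style: FPST is only an index; the macro names the array itself. */
#define DEFINE_GET_NEW(NAME, FPST)              \
    static float_status *NAME(cpu_env *env)     \
    {                                           \
        return &env->vfp.fp_status[FPST];       \
    }

DEFINE_GET_OLD(get_s_old, fp_status[FPST_A32])
DEFINE_GET_NEW(get_s_new, FPST_A32)
DEFINE_GET_OLD(get_h_old, fp_status[FPST_A32_F16])
DEFINE_GET_NEW(get_h_new, FPST_A32_F16)
```

With the new style, the macro body is the only place that spells out fp_status.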

-[PULL 10/55] target/arm: Move get_phys_addr_v5 to ptw.c
+[PULL 67/68] target/arm: Read fz16 from env->vfp.fpcr
 From: Richard Henderson <richard.henderson@linaro.org>
+Read the bit from the source, rather than from the proxy via
+get_flush_inputs_to_zero.  This makes it clear that it does
+not matter which of the float_status structures is used.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-4-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-34-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/ptw.h    |  15 +++--
+ target/arm/tcg/vec_helper.c | 12 ++++++------
- target/arm/helper.c | 137 +++-----------------------------------------
+file changed, 6 insertions(+), 6 deletions(-)
  target/arm/ptw.c    | 123 +++++++++++++++++++++++++++++++++++++++
 files changed, 140 insertions(+), 135 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
+     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
- #ifndef CONFIG_USER_ONLY
+     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-+uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
-+                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi);
++             env->vfp.fpcr & FPCR_FZ16);
 +uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
 +                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi);
 +
  bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
  bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
  ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
                                   ARMCacheAttrs s1, ARMCacheAttrs s2);
 -bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
 -                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                      hwaddr *phys_ptr, int *prot,
 -                      target_ulong *page_size,
 -                      ARMMMUFaultInfo *fi);
 +bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
 +                              uint32_t *table, uint32_t address);
 +int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
 +                  int ap, int domain_prot);
 +
  bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
                            MMUAccessType access_type, ARMMMUIdx mmu_idx,
                            hwaddr *phys_ptr, int *prot,
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
   * @ap:          The 3-bit access permissions (AP[2:0])
   * @domain_prot: The 2-bit domain access permissions
   */
 -static inline int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
 -                                int ap, int domain_prot)
 +int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap, int domain_prot)
  {
      bool is_user = regime_is_user(env, mmu_idx);
@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
      return prot_rw | PAGE_EXEC;
  }
--static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
+ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
--                                     uint32_t *table, uint32_t address)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
-+bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
+         }
-+                              uint32_t *table, uint32_t address)
+     }
- {
+     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-     /* Note that we can only get here for an AArch32 PL0/PL1 lookup */
+-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
-     TCR *tcr = regime_tcr(env, mmu_idx);
++             env->vfp.fpcr & FPCR_FZ16);
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
  }
- /* All loads done in the course of a page table walk go through here. */
+ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
--static uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
--                            ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
-+uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
-+                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+     float_status *status = &env->vfp.fp_status[FPST_A64];
- {
+-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
-     ARMCPU *cpu = ARM_CPU(cs);
++    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
-     CPUARMState *env = &cpu->env;
+     int negx = 0, negf = 0;
-@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
-     return 0;
+     if (is_s) {
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
      do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
 +                 env->vfp.fpcr & FPCR_FZ16);
  }
--static uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
--                            ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-+uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+         }
-+                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+     }
- {
+     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-     ARMCPU *cpu = ARM_CPU(cs);
+-                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
-     CPUARMState *env = &cpu->env;
++                 env->vfp.fpcr & FPCR_FZ16);
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
      return 0;
  }
--bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
+ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
--                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
--                      hwaddr *phys_ptr, int *prot,
+     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
--                      target_ulong *page_size,
+     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
--                      ARMMMUFaultInfo *fi)
+     float_status *status = &env->vfp.fp_status[FPST_A64];
--{
+-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
--    CPUState *cs = env_cpu(env);
++    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
--    int level = 1;
+     int negx = 0, negf = 0;
--    uint32_t table;
--    uint32_t desc;
+     if (is_s) {
 -    int type;
 -    int ap;
 -    int domain = 0;
 -    int domain_prot;
 -    hwaddr phys_addr;
 -    uint32_t dacr;
 -
 -    /* Pagetable walk.  */
 -    /* Lookup l1 descriptor.  */
 -    if (!get_level1_table_address(env, mmu_idx, &table, address)) {
 -        /* Section translation fault if page walk is disabled by PD0 or PD1 */
 -        fi->type = ARMFault_Translation;
 -        goto do_fault;
 -    }
 -    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
 -                       mmu_idx, fi);
 -    if (fi->type != ARMFault_None) {
 -        goto do_fault;
 -    }
 -    type = (desc & 3);
 -    domain = (desc >> 5) & 0x0f;
 -    if (regime_el(env, mmu_idx) == 1) {
 -        dacr = env->cp15.dacr_ns;
 -    } else {
 -        dacr = env->cp15.dacr_s;
 -    }
 -    domain_prot = (dacr >> (domain * 2)) & 3;
 -    if (type == 0) {
 -        /* Section translation fault.  */
 -        fi->type = ARMFault_Translation;
 -        goto do_fault;
 -    }
 -    if (type != 2) {
 -        level = 2;
 -    }
 -    if (domain_prot == 0 || domain_prot == 2) {
 -        fi->type = ARMFault_Domain;
 -        goto do_fault;
 -    }
 -    if (type == 2) {
 -        /* 1Mb section.  */
 -        phys_addr = (desc & 0xfff00000) | (address & 0x000fffff);
 -        ap = (desc >> 10) & 3;
 -        *page_size = 1024 * 1024;
 -    } else {
 -        /* Lookup l2 entry.  */
 -        if (type == 1) {
 -            /* Coarse pagetable.  */
 -            table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
 -        } else {
 -            /* Fine pagetable.  */
 -            table = (desc & 0xfffff000) | ((address >> 8) & 0xffc);
 -        }
 -        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
 -                           mmu_idx, fi);
 -        if (fi->type != ARMFault_None) {
 -            goto do_fault;
 -        }
 -        switch (desc & 3) {
 -        case 0: /* Page translation fault.  */
 -            fi->type = ARMFault_Translation;
 -            goto do_fault;
 -        case 1: /* 64k page.  */
 -            phys_addr = (desc & 0xffff0000) | (address & 0xffff);
 -            ap = (desc >> (4 + ((address >> 13) & 6))) & 3;
 -            *page_size = 0x10000;
 -            break;
 -        case 2: /* 4k page.  */
 -            phys_addr = (desc & 0xfffff000) | (address & 0xfff);
 -            ap = (desc >> (4 + ((address >> 9) & 6))) & 3;
 -            *page_size = 0x1000;
 -            break;
 -        case 3: /* 1k page, or ARMv6/XScale "extended small (4k) page" */
 -            if (type == 1) {
 -                /* ARMv6/XScale extended small page format */
 -                if (arm_feature(env, ARM_FEATURE_XSCALE)
 -                    || arm_feature(env, ARM_FEATURE_V6)) {
 -                    phys_addr = (desc & 0xfffff000) | (address & 0xfff);
 -                    *page_size = 0x1000;
 -                } else {
 -                    /* UNPREDICTABLE in ARMv5; we choose to take a
 -                     * page translation fault.
 -                     */
 -                    fi->type = ARMFault_Translation;
 -                    goto do_fault;
 -                }
 -            } else {
 -                phys_addr = (desc & 0xfffffc00) | (address & 0x3ff);
 -                *page_size = 0x400;
 -            }
 -            ap = (desc >> 4) & 3;
 -            break;
 -        default:
 -            /* Never happens, but compiler isn't smart enough to tell.  */
 -            g_assert_not_reached();
 -        }
 -    }
 -    *prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 -    *prot |= *prot ? PAGE_EXEC : 0;
 -    if (!(*prot & (1 << access_type))) {
 -        /* Access permission fault.  */
 -        fi->type = ARMFault_Permission;
 -        goto do_fault;
 -    }
 -    *phys_ptr = phys_addr;
 -    return false;
 -do_fault:
 -    fi->domain = domain;
 -    fi->level = level;
 -    return true;
 -}
 -
  bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
                        hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
  #include "ptw.h"
 +static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
 +                             MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                             hwaddr *phys_ptr, int *prot,
 +                             target_ulong *page_size,
 +                             ARMMMUFaultInfo *fi)
 +{
 +    CPUState *cs = env_cpu(env);
 +    int level = 1;
 +    uint32_t table;
 +    uint32_t desc;
 +    int type;
 +    int ap;
 +    int domain = 0;
 +    int domain_prot;
 +    hwaddr phys_addr;
 +    uint32_t dacr;
 +
 +    /* Pagetable walk.  */
 +    /* Lookup l1 descriptor.  */
 +    if (!get_level1_table_address(env, mmu_idx, &table, address)) {
 +        /* Section translation fault if page walk is disabled by PD0 or PD1 */
 +        fi->type = ARMFault_Translation;
 +        goto do_fault;
 +    }
 +    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
 +                       mmu_idx, fi);
 +    if (fi->type != ARMFault_None) {
 +        goto do_fault;
 +    }
 +    type = (desc & 3);
 +    domain = (desc >> 5) & 0x0f;
 +    if (regime_el(env, mmu_idx) == 1) {
 +        dacr = env->cp15.dacr_ns;
 +    } else {
 +        dacr = env->cp15.dacr_s;
 +    }
 +    domain_prot = (dacr >> (domain * 2)) & 3;
 +    if (type == 0) {
 +        /* Section translation fault.  */
 +        fi->type = ARMFault_Translation;
 +        goto do_fault;
 +    }
 +    if (type != 2) {
 +        level = 2;
 +    }
 +    if (domain_prot == 0 || domain_prot == 2) {
 +        fi->type = ARMFault_Domain;
 +        goto do_fault;
 +    }
 +    if (type == 2) {
 +        /* 1Mb section.  */
 +        phys_addr = (desc & 0xfff00000) | (address & 0x000fffff);
 +        ap = (desc >> 10) & 3;
 +        *page_size = 1024 * 1024;
 +    } else {
 +        /* Lookup l2 entry.  */
 +        if (type == 1) {
 +            /* Coarse pagetable.  */
 +            table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
 +        } else {
 +            /* Fine pagetable.  */
 +            table = (desc & 0xfffff000) | ((address >> 8) & 0xffc);
 +        }
 +        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
 +                           mmu_idx, fi);
 +        if (fi->type != ARMFault_None) {
 +            goto do_fault;
 +        }
 +        switch (desc & 3) {
 +        case 0: /* Page translation fault.  */
 +            fi->type = ARMFault_Translation;
 +            goto do_fault;
 +        case 1: /* 64k page.  */
 +            phys_addr = (desc & 0xffff0000) | (address & 0xffff);
 +            ap = (desc >> (4 + ((address >> 13) & 6))) & 3;
 +            *page_size = 0x10000;
 +            break;
 +        case 2: /* 4k page.  */
 +            phys_addr = (desc & 0xfffff000) | (address & 0xfff);
 +            ap = (desc >> (4 + ((address >> 9) & 6))) & 3;
 +            *page_size = 0x1000;
 +            break;
 +        case 3: /* 1k page, or ARMv6/XScale "extended small (4k) page" */
 +            if (type == 1) {
 +                /* ARMv6/XScale extended small page format */
 +                if (arm_feature(env, ARM_FEATURE_XSCALE)
 +                    || arm_feature(env, ARM_FEATURE_V6)) {
 +                    phys_addr = (desc & 0xfffff000) | (address & 0xfff);
 +                    *page_size = 0x1000;
 +                } else {
 +                    /*
 +                     * UNPREDICTABLE in ARMv5; we choose to take a
 +                     * page translation fault.
 +                     */
 +                    fi->type = ARMFault_Translation;
 +                    goto do_fault;
 +                }
 +            } else {
 +                phys_addr = (desc & 0xfffffc00) | (address & 0x3ff);
 +                *page_size = 0x400;
 +            }
 +            ap = (desc >> 4) & 3;
 +            break;
 +        default:
 +            /* Never happens, but compiler isn't smart enough to tell.  */
 +            g_assert_not_reached();
 +        }
 +    }
 +    *prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 +    *prot |= *prot ? PAGE_EXEC : 0;
 +    if (!(*prot & (1 << access_type))) {
 +        /* Access permission fault.  */
 +        fi->type = ARMFault_Permission;
 +        goto do_fault;
 +    }
 +    *phys_ptr = phys_addr;
 +    return false;
 +do_fault:
 +    fi->domain = domain;
 +    fi->level = level;
 +    return true;
 +}
 +
  /**
   * get_phys_addr - get the physical address for this virtual address
   *
 --
-.25.1
+.34.1

-[PULL 14/55] target/arm: Move get_phys_addr_pmsav7 to ptw.c
+[PULL 68/68] target/arm: Sink fp_status and fpcr access into do_fmlal*
 From: Richard Henderson <richard.henderson@linaro.org>
+Sink common code from the callers into do_fmlal
+and do_fmlal_idx.  Reorder the arguments to minimize
+the re-sorting from the caller's arguments.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20220604040607.269301-8-richard.henderson@linaro.org
+Message-id: 20250129013857.135256-35-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/ptw.h    |  10 +--
+ target/arm/tcg/vec_helper.c | 28 ++++++++++++++++------------
- target/arm/helper.c | 194 +-------------------------------------------
+file changed, 16 insertions(+), 12 deletions(-)
  target/arm/ptw.c    | 190 +++++++++++++++++++++++++++++++++++++++++++
 files changed, 198 insertions(+), 196 deletions(-)
-diff --git a/target/arm/ptw.h b/target/arm/ptw.h
+diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.h
+--- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/ptw.h
++++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+@@ -XXX,XX +XXX,XX @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
-     return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
+  * as there is not yet SVE versions that might use blocking.
   */
 -static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
 -                     uint64_t negx, int negf, uint32_t desc, bool fz16)
 +static void do_fmlal(float32 *d, void *vn, void *vm,
 +                     CPUARMState *env, uint32_t desc,
 +                     ARMFPStatusFlavour fpst_idx,
 +                     uint64_t negx, int negf)
  {
 +    float_status *fpst = &env->vfp.fp_status[fpst_idx];
 +    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
      intptr_t i, oprsz = simd_oprsz(desc);
      int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      int is_q = oprsz == 16;
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 -    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -             env->vfp.fpcr & FPCR_FZ16);
 +    do_fmlal(vd, vn, vm, env, desc, FPST_STD, negx, 0);
  }
-+bool m_is_ppb_region(CPUARMState *env, uint32_t address);
+ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
-+bool m_is_system_region(CPUARMState *env, uint32_t address);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
-+
+             negx = 0x8000800080008000ull;
- void get_phys_addr_pmsav7_default(CPUARMState *env,
+         }
-                                   ARMMMUIdx mmu_idx,
+     }
-                                   int32_t address, int *prot);
+-    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
--bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
+-             env->vfp.fpcr & FPCR_FZ16);
--                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
++    do_fmlal(vd, vn, vm, env, desc, FPST_A64, negx, negf);
 -                          hwaddr *phys_ptr, int *prot,
 -                          target_ulong *page_size,
 -                          ARMMMUFaultInfo *fi);
 +bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user);
 +
  bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
                            MMUAccessType access_type, ARMMMUIdx mmu_idx,
                            hwaddr *phys_ptr, MemTxAttrs *txattrs,
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ do_fault:
      return true;
  }
--static bool pmsav7_use_background_region(ARMCPU *cpu,
+ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
--                                         ARMMMUIdx mmu_idx, bool is_user)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
 +bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user)
  {
      /* Return true if we should use the default memory map as a
       * "background" region if there are no hits against any MPU regions.
@@ -XXX,XX +XXX,XX @@ static bool pmsav7_use_background_region(ARMCPU *cpu,
      }
  }
--static inline bool m_is_ppb_region(CPUARMState *env, uint32_t address)
+-static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
-+bool m_is_ppb_region(CPUARMState *env, uint32_t address)
+-                         uint64_t negx, int negf, uint32_t desc, bool fz16)
 +static void do_fmlal_idx(float32 *d, void *vn, void *vm,
 +                         CPUARMState *env, uint32_t desc,
 +                         ARMFPStatusFlavour fpst_idx,
 +                         uint64_t negx, int negf)
  {
-     /* True if address is in the M profile PPB region 0xe0000000 - 0xe00fffff */
++    float_status *fpst = &env->vfp.fp_status[fpst_idx];
-     return arm_feature(env, ARM_FEATURE_M) &&
++    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
-         extract32(address, 20, 12) == 0xe00;
+     intptr_t i, oprsz = simd_oprsz(desc);
      int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
      int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
      bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
      uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 -    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
 -                 env->vfp.fpcr & FPCR_FZ16);
 +    do_fmlal_idx(vd, vn, vm, env, desc, FPST_STD, negx, 0);
  }
--static inline bool m_is_system_region(CPUARMState *env, uint32_t address)
+ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
-+bool m_is_system_region(CPUARMState *env, uint32_t address)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
- {
+             negx = 0x8000800080008000ull;
-     /* True if address is in the M profile system region
+         }
-      * 0xe0000000 - 0xffffffff
+     }
-@@ -XXX,XX +XXX,XX @@ static inline bool m_is_system_region(CPUARMState *env, uint32_t address)
+-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-     return arm_feature(env, ARM_FEATURE_M) && extract32(address, 29, 3) == 0x7;
+-                 env->vfp.fpcr & FPCR_FZ16);
 +    do_fmlal_idx(vd, vn, vm, env, desc, FPST_A64, negx, negf);
  }
--bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
+ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
 -                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
 -                          hwaddr *phys_ptr, int *prot,
 -                          target_ulong *page_size,
 -                          ARMMMUFaultInfo *fi)
 -{
 -    ARMCPU *cpu = env_archcpu(env);
 -    int n;
 -    bool is_user = regime_is_user(env, mmu_idx);
 -
 -    *phys_ptr = address;
 -    *page_size = TARGET_PAGE_SIZE;
 -    *prot = 0;
 -
 -    if (regime_translation_disabled(env, mmu_idx) ||
 -        m_is_ppb_region(env, address)) {
 -        /* MPU disabled or M profile PPB access: use default memory map.
 -         * The other case which uses the default memory map in the
 -         * v7M ARM ARM pseudocode is exception vector reads from the vector
 -         * table. In QEMU those accesses are done in arm_v7m_load_vector(),
 -         * which always does a direct read using address_space_ldl(), rather
 -         * than going via this function, so we don't need to check that here.
 -         */
 -        get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
 -    } else { /* MPU enabled */
 -        for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
 -            /* region search */
 -            uint32_t base = env->pmsav7.drbar[n];
 -            uint32_t rsize = extract32(env->pmsav7.drsr[n], 1, 5);
 -            uint32_t rmask;
 -            bool srdis = false;
 -
 -            if (!(env->pmsav7.drsr[n] & 0x1)) {
 -                continue;
 -            }
 -
 -            if (!rsize) {
 -                qemu_log_mask(LOG_GUEST_ERROR,
 -                              "DRSR[%d]: Rsize field cannot be 0\n", n);
 -                continue;
 -            }
 -            rsize++;
 -            rmask = (1ull << rsize) - 1;
 -
 -            if (base & rmask) {
 -                qemu_log_mask(LOG_GUEST_ERROR,
 -                              "DRBAR[%d]: 0x%" PRIx32 " misaligned "
 -                              "to DRSR region size, mask = 0x%" PRIx32 "\n",
 -                              n, base, rmask);
 -                continue;
 -            }
 -
 -            if (address < base || address > base + rmask) {
 -                /*
 -                 * Address not in this region. We must check whether the
 -                 * region covers addresses in the same page as our address.
 -                 * In that case we must not report a size that covers the
 -                 * whole page for a subsequent hit against a different MPU
 -                 * region or the background region, because it would result in
 -                 * incorrect TLB hits for subsequent accesses to addresses that
 -                 * are in this MPU region.
 -                 */
 -                if (ranges_overlap(base, rmask,
 -                                   address & TARGET_PAGE_MASK,
 -                                   TARGET_PAGE_SIZE)) {
 -                    *page_size = 1;
 -                }
 -                continue;
 -            }
 -
 -            /* Region matched */
 -
 -            if (rsize >= 8) { /* no subregions for regions < 256 bytes */
 -                int i, snd;
 -                uint32_t srdis_mask;
 -
 -                rsize -= 3; /* sub region size (power of 2) */
 -                snd = ((address - base) >> rsize) & 0x7;
 -                srdis = extract32(env->pmsav7.drsr[n], snd + 8, 1);
 -
 -                srdis_mask = srdis ? 0x3 : 0x0;
 -                for (i = 2; i <= 8 && rsize < TARGET_PAGE_BITS; i *= 2) {
 -                    /* This will check in groups of 2, 4 and then 8, whether
 -                     * the subregion bits are consistent. rsize is incremented
 -                     * back up to give the region size, considering consistent
 -                     * adjacent subregions as one region. Stop testing if rsize
 -                     * is already big enough for an entire QEMU page.
 -                     */
 -                    int snd_rounded = snd & ~(i - 1);
 -                    uint32_t srdis_multi = extract32(env->pmsav7.drsr[n],
 -                                                     snd_rounded + 8, i);
 -                    if (srdis_mask ^ srdis_multi) {
 -                        break;
 -                    }
 -                    srdis_mask = (srdis_mask << i) | srdis_mask;
 -                    rsize++;
 -                }
 -            }
 -            if (srdis) {
 -                continue;
 -            }
 -            if (rsize < TARGET_PAGE_BITS) {
 -                *page_size = 1 << rsize;
 -            }
 -            break;
 -        }
 -
 -        if (n == -1) { /* no hits */
 -            if (!pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
 -                /* background fault */
 -                fi->type = ARMFault_Background;
 -                return true;
 -            }
 -            get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
 -        } else { /* a MPU hit! */
 -            uint32_t ap = extract32(env->pmsav7.dracr[n], 8, 3);
 -            uint32_t xn = extract32(env->pmsav7.dracr[n], 12, 1);
 -
 -            if (m_is_system_region(env, address)) {
 -                /* System space is always execute never */
 -                xn = 1;
 -            }
 -
 -            if (is_user) { /* User mode AP bit decoding */
 -                switch (ap) {
 -                case 0:
 -                case 1:
 -                case 5:
 -                    break; /* no access */
 -                case 3:
 -                    *prot |= PAGE_WRITE;
 -                    /* fall through */
 -                case 2:
 -                case 6:
 -                    *prot |= PAGE_READ | PAGE_EXEC;
 -                    break;
 -                case 7:
 -                    /* for v7M, same as 6; for R profile a reserved value */
 -                    if (arm_feature(env, ARM_FEATURE_M)) {
 -                        *prot |= PAGE_READ | PAGE_EXEC;
 -                        break;
 -                    }
 -                    /* fall through */
 -                default:
 -                    qemu_log_mask(LOG_GUEST_ERROR,
 -                                  "DRACR[%d]: Bad value for AP bits: 0x%"
 -                                  PRIx32 "\n", n, ap);
 -                }
 -            } else { /* Priv. mode AP bits decoding */
 -                switch (ap) {
 -                case 0:
 -                    break; /* no access */
 -                case 1:
 -                case 2:
 -                case 3:
 -                    *prot |= PAGE_WRITE;
 -                    /* fall through */
 -                case 5:
 -                case 6:
 -                    *prot |= PAGE_READ | PAGE_EXEC;
 -                    break;
 -                case 7:
 -                    /* for v7M, same as 6; for R profile a reserved value */
 -                    if (arm_feature(env, ARM_FEATURE_M)) {
 -                        *prot |= PAGE_READ | PAGE_EXEC;
 -                        break;
 -                    }
 -                    /* fall through */
 -                default:
 -                    qemu_log_mask(LOG_GUEST_ERROR,
 -                                  "DRACR[%d]: Bad value for AP bits: 0x%"
 -                                  PRIx32 "\n", n, ap);
 -                }
 -            }
 -
 -            /* execute never */
 -            if (xn) {
 -                *prot &= ~PAGE_EXEC;
 -            }
 -        }
 -    }
 -
 -    fi->type = ARMFault_Permission;
 -    fi->level = 1;
 -    return !(*prot & (1 << access_type));
 -}
 -
  static bool v8m_is_sau_exempt(CPUARMState *env,
                                uint32_t address, MMUAccessType access_type)
  {
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
  #include "qemu/log.h"
 +#include "qemu/range.h"
  #include "cpu.h"
  #include "internals.h"
  #include "ptw.h"
@@ -XXX,XX +XXX,XX @@ void get_phys_addr_pmsav7_default(CPUARMState *env,
      }
  }
 +static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
 +                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                                 hwaddr *phys_ptr, int *prot,
 +                                 target_ulong *page_size,
 +                                 ARMMMUFaultInfo *fi)
 +{
 +    ARMCPU *cpu = env_archcpu(env);
 +    int n;
 +    bool is_user = regime_is_user(env, mmu_idx);
 +
 +    *phys_ptr = address;
 +    *page_size = TARGET_PAGE_SIZE;
 +    *prot = 0;
 +
 +    if (regime_translation_disabled(env, mmu_idx) ||
 +        m_is_ppb_region(env, address)) {
 +        /*
 +         * MPU disabled or M profile PPB access: use default memory map.
 +         * The other case which uses the default memory map in the
 +         * v7M ARM ARM pseudocode is exception vector reads from the vector
 +         * table. In QEMU those accesses are done in arm_v7m_load_vector(),
 +         * which always does a direct read using address_space_ldl(), rather
 +         * than going via this function, so we don't need to check that here.
 +         */
 +        get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
 +    } else { /* MPU enabled */
 +        for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
 +            /* region search */
 +            uint32_t base = env->pmsav7.drbar[n];
 +            uint32_t rsize = extract32(env->pmsav7.drsr[n], 1, 5);
 +            uint32_t rmask;
 +            bool srdis = false;
 +
 +            if (!(env->pmsav7.drsr[n] & 0x1)) {
 +                continue;
 +            }
 +
 +            if (!rsize) {
 +                qemu_log_mask(LOG_GUEST_ERROR,
 +                              "DRSR[%d]: Rsize field cannot be 0\n", n);
 +                continue;
 +            }
 +            rsize++;
 +            rmask = (1ull << rsize) - 1;
 +
 +            if (base & rmask) {
 +                qemu_log_mask(LOG_GUEST_ERROR,
 +                              "DRBAR[%d]: 0x%" PRIx32 " misaligned "
 +                              "to DRSR region size, mask = 0x%" PRIx32 "\n",
 +                              n, base, rmask);
 +                continue;
 +            }
 +
 +            if (address < base || address > base + rmask) {
 +                /*
 +                 * Address not in this region. We must check whether the
 +                 * region covers addresses in the same page as our address.
 +                 * In that case we must not report a size that covers the
 +                 * whole page for a subsequent hit against a different MPU
 +                 * region or the background region, because it would result in
 +                 * incorrect TLB hits for subsequent accesses to addresses that
 +                 * are in this MPU region.
 +                 */
 +                if (ranges_overlap(base, rmask,
 +                                   address & TARGET_PAGE_MASK,
 +                                   TARGET_PAGE_SIZE)) {
 +                    *page_size = 1;
 +                }
 +                continue;
 +            }
 +
 +            /* Region matched */
 +
 +            if (rsize >= 8) { /* no subregions for regions < 256 bytes */
 +                int i, snd;
 +                uint32_t srdis_mask;
 +
 +                rsize -= 3; /* sub region size (power of 2) */
 +                snd = ((address - base) >> rsize) & 0x7;
 +                srdis = extract32(env->pmsav7.drsr[n], snd + 8, 1);
 +
 +                srdis_mask = srdis ? 0x3 : 0x0;
 +                for (i = 2; i <= 8 && rsize < TARGET_PAGE_BITS; i *= 2) {
 +                    /*
 +                     * This will check in groups of 2, 4 and then 8, whether
 +                     * the subregion bits are consistent. rsize is incremented
 +                     * back up to give the region size, considering consistent
 +                     * adjacent subregions as one region. Stop testing if rsize
 +                     * is already big enough for an entire QEMU page.
 +                     */
 +                    int snd_rounded = snd & ~(i - 1);
 +                    uint32_t srdis_multi = extract32(env->pmsav7.drsr[n],
 +                                                     snd_rounded + 8, i);
 +                    if (srdis_mask ^ srdis_multi) {
 +                        break;
 +                    }
 +                    srdis_mask = (srdis_mask << i) | srdis_mask;
 +                    rsize++;
 +                }
 +            }
 +            if (srdis) {
 +                continue;
 +            }
 +            if (rsize < TARGET_PAGE_BITS) {
 +                *page_size = 1 << rsize;
 +            }
 +            break;
 +        }
 +
 +        if (n == -1) { /* no hits */
 +            if (!pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
 +                /* background fault */
 +                fi->type = ARMFault_Background;
 +                return true;
 +            }
 +            get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
 +        } else { /* a MPU hit! */
 +            uint32_t ap = extract32(env->pmsav7.dracr[n], 8, 3);
 +            uint32_t xn = extract32(env->pmsav7.dracr[n], 12, 1);
 +
 +            if (m_is_system_region(env, address)) {
 +                /* System space is always execute never */
 +                xn = 1;
 +            }
 +
 +            if (is_user) { /* User mode AP bit decoding */
 +                switch (ap) {
 +                case 0:
 +                case 1:
 +                case 5:
 +                    break; /* no access */
 +                case 3:
 +                    *prot |= PAGE_WRITE;
 +                    /* fall through */
 +                case 2:
 +                case 6:
 +                    *prot |= PAGE_READ | PAGE_EXEC;
 +                    break;
 +                case 7:
 +                    /* for v7M, same as 6; for R profile a reserved value */
 +                    if (arm_feature(env, ARM_FEATURE_M)) {
 +                        *prot |= PAGE_READ | PAGE_EXEC;
 +                        break;
 +                    }
 +                    /* fall through */
 +                default:
 +                    qemu_log_mask(LOG_GUEST_ERROR,
 +                                  "DRACR[%d]: Bad value for AP bits: 0x%"
 +                                  PRIx32 "\n", n, ap);
 +                }
 +            } else { /* Priv. mode AP bits decoding */
 +                switch (ap) {
 +                case 0:
 +                    break; /* no access */
 +                case 1:
 +                case 2:
 +                case 3:
 +                    *prot |= PAGE_WRITE;
 +                    /* fall through */
 +                case 5:
 +                case 6:
 +                    *prot |= PAGE_READ | PAGE_EXEC;
 +                    break;
 +                case 7:
 +                    /* for v7M, same as 6; for R profile a reserved value */
 +                    if (arm_feature(env, ARM_FEATURE_M)) {
 +                        *prot |= PAGE_READ | PAGE_EXEC;
 +                        break;
 +                    }
 +                    /* fall through */
 +                default:
 +                    qemu_log_mask(LOG_GUEST_ERROR,
 +                                  "DRACR[%d]: Bad value for AP bits: 0x%"
 +                                  PRIx32 "\n", n, ap);
 +                }
 +            }
 +
 +            /* execute never */
 +            if (xn) {
 +                *prot &= ~PAGE_EXEC;
 +            }
 +        }
 +    }
 +
 +    fi->type = ARMFault_Permission;
 +    fi->level = 1;
 +    return !(*prot & (1 << access_type));
 +}
 +
  /**
   * get_phys_addr - get the physical address for this virtual address
   *
 --
-.25.1
+.34.1

The following changes since commit 6d940eff4734bcb40b1a25f62d7cec5a396f994a:

Merge tag 'pull-tpm-2022-06-07-1' of https://github.com/stefanberger/qemu-tpm into staging (2022-06-07 19:22:18 -0700)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20220609

for you to fetch changes up to 414c54d515dba16bfaef643a8acec200c05f229a:

target/arm: Add ID_AA64SMFR0_EL1 (2022-06-08 19:38:59 +0100)

----------------------------------------------------------------
target-arm queue:
 * target/arm: Declare support for FEAT_RASv1p1
 * target/arm: Implement FEAT_DoubleFault
 * Fix 'writeable' typos
 * xlnx_dp: Implement vblank interrupt
 * target/arm: Move page-table-walk code to ptw.c
 * target/arm: Preparatory patches for SME support

----------------------------------------------------------------
Frederic Konrad (2):
      xlnx_dp: fix the wrong register size
      xlnx-zynqmp: fix the irq mapping for the display port and its dma

Peter Maydell (3):
      target/arm: Declare support for FEAT_RASv1p1
      target/arm: Implement FEAT_DoubleFault
      Fix 'writeable' typos

Richard Henderson (48):
      target/arm: Move stage_1_mmu_idx decl to internals.h
      target/arm: Move get_phys_addr to ptw.c
      target/arm: Move get_phys_addr_v5 to ptw.c
      target/arm: Move get_phys_addr_v6 to ptw.c
      target/arm: Move get_phys_addr_pmsav5 to ptw.c
      target/arm: Move get_phys_addr_pmsav7_default to ptw.c
      target/arm: Move get_phys_addr_pmsav7 to ptw.c
      target/arm: Move get_phys_addr_pmsav8 to ptw.c
      target/arm: Move pmsav8_mpu_lookup to ptw.c
      target/arm: Move pmsav7_use_background_region to ptw.c
      target/arm: Move v8m_security_lookup to ptw.c
      target/arm: Move m_is_{ppb,system}_region to ptw.c
      target/arm: Move get_level1_table_address to ptw.c
      target/arm: Move combine_cacheattrs and subroutines to ptw.c
      target/arm: Move get_phys_addr_lpae to ptw.c
      target/arm: Move arm_{ldl,ldq}_ptw to ptw.c
      target/arm: Move {arm_s1_, }regime_using_lpae_format to tlb_helper.c
      target/arm: Move arm_pamax, pamax_map into ptw.c
      target/arm: Move get_S1prot, get_S2prot to ptw.c
      target/arm: Move check_s2_mmu_setup to ptw.c
      target/arm: Move aa32_va_parameters to ptw.c
      target/arm: Move ap_to_tw_prot etc to ptw.c
      target/arm: Move regime_is_user to ptw.c
      target/arm: Move regime_ttbr to ptw.c
      target/arm: Move regime_translation_disabled to ptw.c
      target/arm: Move arm_cpu_get_phys_page_attrs_debug to ptw.c
      target/arm: Move stage_1_mmu_idx, arm_stage1_mmu_idx to ptw.c
      target/arm: Pass CPUARMState to arm_ld[lq]_ptw
      target/arm: Rename TBFLAG_A64 ZCR_LEN to VL
      linux-user/aarch64: Introduce sve_vq
      target/arm: Remove route_to_el2 check from sve_exception_el
      target/arm: Remove fp checks from sve_exception_el
      target/arm: Add el_is_in_host
      target/arm: Use el_is_in_host for sve_zcr_len_for_el
      target/arm: Use el_is_in_host for sve_exception_el
      target/arm: Hoist arm_is_el2_enabled check in sve_exception_el
      target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset
      target/arm: Merge aarch64_sve_zcr_get_valid_len into caller
      target/arm: Use uint32_t instead of bitmap for sve vq's
      target/arm: Rename sve_zcr_len_for_el to sve_vqm1_for_el
      target/arm: Split out load/store primitives to sve_ldst_internal.h
      target/arm: Export sve contiguous ldst support functions
      target/arm: Move expand_pred_b to vec_internal.h
      target/arm: Use expand_pred_b in mve_helper.c
      target/arm: Move expand_pred_h to vec_internal.h
      target/arm: Export bfdotadd from vec_helper.c
      target/arm: Add isar_feature_aa64_sme
      target/arm: Add ID_AA64SMFR0_EL1

Sai Pavan Boddu (2):
      xlnx_dp: Introduce a vblank signal
      xlnx_dp: Fix the interrupt disable logic

The architectural feature RASv1p1 introduces the following new
features:
 * new registers ERXPFGCDN_EL1, ERXPFGCTL_EL1 and ERXPFGF_EL1
 * new bits in the fine-grained trap registers that control traps
   for these new registers
 * new trap bits HCR_EL2.FIEN and SCR_EL3.FIEN that control traps
   for ERXPFGCDN_EL1, ERXPFGCTL_EL1, ERXPFGF_EL1
 * a larger number of the ERXMISC<n>_EL1 registers
 * a changed format for the ERR<n>STATUS registers

The architecture permits that if ERRIDR_EL1.NUM is 0 (as it is for
QEMU) then all these new registers may UNDEF, and the HCR_EL2.FIEN
and SCR_EL3.FIEN bits may be RES0.  We don't have any ERR<n>STATUS
registers (again, because ERRIDR_EL1.NUM is 0).  QEMU does not yet
implement the fine-grained-trap extension.  So there is nothing we
need to implement to be compliant with the feature spec.  Make the
'max' CPU report the feature in its ID registers, and document it.
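
As a rough illustration of what "report the feature in its ID registers"
means from the guest's side, the sketch below decodes the two relevant
fields.  The field positions (ID_AA64PFR0_EL1.RAS at bits [31:28] and
ID_AA64PFR1_EL1.RAS_frac at bits [15:12]) and their encodings are stated
here as assumptions taken from the architecture documentation, and the
helper names id_field() and has_rasv1p1() are hypothetical; they are not
QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: extract the 4-bit ID register field at 'shift'. */
static inline unsigned id_field(uint64_t reg, unsigned shift)
{
    return (unsigned)((reg >> shift) & 0xf);
}

/*
 * Assumed encodings (not taken from the patch itself):
 *   ID_AA64PFR0_EL1.RAS      [31:28]: 0 = no RAS, 1 = FEAT_RAS,
 *                                     2 = FEAT_RASv1p1
 *   ID_AA64PFR1_EL1.RAS_frac [15:12]: 1 = FEAT_RASv1p1, meaningful
 *                                     only when RAS == 1
 */
static bool has_rasv1p1(uint64_t id_aa64pfr0, uint64_t id_aa64pfr1)
{
    unsigned ras = id_field(id_aa64pfr0, 28);
    unsigned ras_frac = id_field(id_aa64pfr1, 12);

    return ras >= 2 || (ras == 1 && ras_frac == 1);
}
```

Under these assumptions the feature can be advertised either by RAS == 2
or by RAS == 1 together with RAS_frac == 1, so a check like this has to
accept both forms.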

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220531114258.855804-1-peter.maydell@linaro.org
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/cpu64.c            | 1 +
 2 files changed, 2 insertions(+)

The FEAT_DoubleFault extension adds the following:

* All external aborts on instruction fetches and translation table
   walks for instruction fetches must be synchronous.  For QEMU this
   is already true.

* SCR_EL3 has a new bit NMEA which disables the masking of SError
   interrupts by PSTATE.A when the SError interrupt is taken to EL3.
   For QEMU we only need to make the bit writable, because we have no
   sources of SError interrupts.

* SCR_EL3 has a new bit EASE which causes synchronous external
   aborts taken to EL3 to be taken at the same entry point as SError.
   (Note that this does not mean that they are SErrors for purposes
   of PSTATE.A masking or that the syndrome register reports them as
   SErrors: it just means that the vector offset is different.)

* The existing SCTLR_EL3.IESB has an effective value of 1 when
   SCR_EL3.NMEA is 1.  For QEMU this is a no-op because we don't need
   different behaviour based on IESB (we don't need to do anything to
   ensure that error exceptions are synchronized).

So for QEMU the things we need to change are:
 * Make SCR_EL3.{NMEA,EASE} writable
 * When taking a synchronous external abort at EL3, adjust the
   vector entry point if SCR_EL3.EASE is set
 * Advertise the feature in the ID registers
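
The vector entry point adjustment in the second item can be sketched in
isolation as follows.  This is a minimal standalone illustration, not the
code from this patch: sync_vector_offset() and SKETCH_SCR_EASE are
hypothetical names, the EASE bit position (bit 19) is an assumption, and
the caller is assumed to have already decided from the syndrome whether
the exception is a synchronous external abort.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of SCR_EL3.EASE, for illustration only. */
#define SKETCH_SCR_EASE (UINT64_C(1) << 19)

/* Offsets of the four entries within one group of an AArch64 vector table. */
enum {
    VEC_SYNC   = 0x000,
    VEC_IRQ    = 0x080,
    VEC_FIQ    = 0x100,
    VEC_SERROR = 0x180,
};

/*
 * Hypothetical helper: choose the entry offset for a synchronous
 * exception.  Only a synchronous external abort taken to EL3 with
 * SCR_EL3.EASE set is redirected to the SError entry; everything else
 * keeps the normal synchronous entry.
 */
static uint32_t sync_vector_offset(unsigned target_el, uint64_t scr_el3,
                                   bool is_sync_external_abort)
{
    if (target_el == 3 && (scr_el3 & SKETCH_SCR_EASE) &&
        is_sync_external_abort) {
        return VEC_SERROR;
    }
    return VEC_SYNC;
}
```

Only the offset changes here; as noted above, the exception is still
reported as a synchronous abort and is not subject to PSTATE.A masking.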

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220531151431.949322-1-peter.maydell@linaro.org
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/cpu.h              |  5 +++++
 target/arm/cpu64.c            |  4 ++--
 target/arm/helper.c           | 36 +++++++++++++++++++++++++++++++++++
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_Debugv8p2 (Debug changes for v8.2)
 - FEAT_Debugv8p4 (Debug changes for v8.4)
 - FEAT_DotProd (Advanced SIMD dot product instructions)
+- FEAT_DoubleFault (Double Fault Extension)
 - FEAT_FCMA (Floating-point complex number instructions)
 - FEAT_FHM (Floating-point half-precision multiplication instructions)
 - FEAT_FP16 (Half-precision floating-point data processing)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ras(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RAS) != 0;
 }
 
+static inline bool isar_feature_aa64_doublefault(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RAS) >= 2;
+}
+
 static inline bool isar_feature_aa64_sve(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, SVE) != 0;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
     t = cpu->isar.id_aa64pfr0;
     t = FIELD_DP64(t, ID_AA64PFR0, FP, 1);        /* FEAT_FP16 */
     t = FIELD_DP64(t, ID_AA64PFR0, ADVSIMD, 1);   /* FEAT_FP16 */
-    t = FIELD_DP64(t, ID_AA64PFR0, RAS, 1);       /* FEAT_RAS */
+    t = FIELD_DP64(t, ID_AA64PFR0, RAS, 2);       /* FEAT_RASv1p1 + FEAT_DoubleFault */
     t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
     t = FIELD_DP64(t, ID_AA64PFR0, SEL2, 1);      /* FEAT_SEL2 */
     t = FIELD_DP64(t, ID_AA64PFR0, DIT, 1);       /* FEAT_DIT */
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
      * we do for EL2 with the virtualization=on property.
      */
     t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3);       /* FEAT_MTE3 */
-    t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 1);  /* FEAT_RASv1p1 */
+    t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 0);  /* FEAT_RASv1p1 + FEAT_DoubleFault */
     t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_2 */
     cpu->isar.id_aa64pfr1 = t;
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
         if (cpu_isar_feature(aa64_scxtnum, cpu)) {
             valid_mask |= SCR_ENSCXT;
         }
+        if (cpu_isar_feature(aa64_doublefault, cpu)) {
+            valid_mask |= SCR_EASE | SCR_NMEA;
+        }
     } else {
         valid_mask &= ~(SCR_RW | SCR_ST);
         if (cpu_isar_feature(aa32_ras, cpu)) {
@@ -XXX,XX +XXX,XX @@ static uint32_t cpsr_read_for_spsr_elx(CPUARMState *env)
     return ret;
 }
 
+static bool syndrome_is_sync_extabt(uint32_t syndrome)
+{
+    /* Return true if this syndrome value is a synchronous external abort */
+    switch (syn_get_ec(syndrome)) {
+    case EC_INSNABORT:
+    case EC_INSNABORT_SAME_EL:
+    case EC_DATAABORT:
+    case EC_DATAABORT_SAME_EL:
+        /* Look at fault status code for all the synchronous ext abort cases */
+        switch (syndrome & 0x3f) {
+        case 0x10:
+        case 0x13:
+        case 0x14:
+        case 0x15:
+        case 0x16:
+        case 0x17:
+            return true;
+        default:
+            return false;
+        }
+    default:
+        return false;
+    }
+}
+
 /* Handle exception entry to a target EL which is using AArch64 */
 static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
 {
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
     switch (cs->exception_index) {
     case EXCP_PREFETCH_ABORT:
     case EXCP_DATA_ABORT:
+        /*
+         * FEAT_DoubleFault allows synchronous external aborts taken to EL3
+         * to be taken to the SError vector entrypoint.
+         */
+        if (new_el == 3 && (env->cp15.scr_el3 & SCR_EASE) &&
+            syndrome_is_sync_extabt(env->exception.syndrome)) {
+            addr += 0x180;
+        }
         env->cp15.far_el[new_el] = env->exception.vaddress;
         qemu_log_mask(CPU_LOG_INT, "...with FAR 0x%" PRIx64 "\n",
                       env->cp15.far_el[new_el]);
-- 
2.25.1

We have about 30 instances of the typo/variant spelling 'writeable',
and over 500 of the more common 'writable'.  Standardize on the
latter.

Change produced with:

sed -i -e 's/\([Ww][Rr][Ii][Tt]\)[Ee]\([Aa][Bb][Ll][Ee]\)/\1\2/g' $(git grep -il writeable)

and then hand-undoing the instance in linux-headers/linux/kvm.h.
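
For readers who want to see what the substitution does without
parsing the regular expression, here is a rough C equivalent.  It is
only a sketch: sketch_fix_writeable and sketch_match_ci are
hypothetical names, and the function works on a single in-memory
string rather than on files.  Like the sed expression, it matches
"writeable" case-insensitively and drops only the middle 'e', so the
case of the surrounding letters is preserved.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if s starts with the lowercase pattern, ignoring case. */
static int sketch_match_ci(const char *s, const char *lower)
{
    for (; *lower; s++, lower++) {
        /* A NUL in s never equals a pattern character, so this
         * cannot read past the end of the string. */
        if (tolower((unsigned char)*s) != *lower) {
            return 0;
        }
    }
    return 1;
}

/* Rewrite buf in place, turning every "writeable" into "writable". */
static void sketch_fix_writeable(char *buf)
{
    char *dst = buf;
    const char *src = buf;

    while (*src) {
        if (sketch_match_ci(src, "writeable")) {
            memmove(dst, src, 4);       /* "writ", original case */
            dst += 4;
            memmove(dst, src + 5, 4);   /* "able", skipping the 'e' */
            dst += 4;
            src += 9;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Applied to "PMCR_WRITEABLE_MASK", this yields "PMCR_WRITABLE_MASK",
which is why macro and variable names changed along with comments.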

Most of these changes are in comments or documentation; the
exceptions are:
 * a local variable in accel/hvf/hvf-accel-ops.c
 * a local variable in accel/kvm/kvm-all.c
 * the PMCR_WRITABLE_MASK macro in target/arm/internals.h
 * the EPT_VIOLATION_GPA_WRITABLE macro in target/i386/hvf/vmcs.h
   (which is never used anywhere)
 * the AR_TYPE_WRITABLE_MASK macro in target/i386/hvf/vmx.h
   (which is never used anywhere)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -XXX,XX +XXX,XX @@ Virtio device config space
 :size: a 32-bit configuration space access size in bytes
 
 :flags: a 32-bit value:
-  - 0: Vhost front-end messages used for writeable fields
+  - 0: Vhost front-end messages used for writable fields
   - 1: Vhost front-end messages used for live migration
 
 :payload: Size bytes array holding the contents of the virtio
diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
index XXXXXXX..XXXXXXX 100644
--- a/docs/specs/vmgenid.txt
+++ b/docs/specs/vmgenid.txt
@@ -XXX,XX +XXX,XX @@ change the contents of the memory at runtime, specifically when starting a
 backed-up or snapshotted image.  In order to do this, QEMU must know the
 address that has been allocated.
 
-The mechanism chosen for this memory sharing is writeable fw_cfg blobs.
+The mechanism chosen for this memory sharing is writable fw_cfg blobs.
 These are data object that are visible to both QEMU and guests, and are
 addressable as sequential files.
 
@@ -XXX,XX +XXX,XX @@ Two fw_cfg blobs are used in this case:
 /etc/vmgenid_guid - contains the actual VM Generation ID GUID
                   - read-only to the guest
 /etc/vmgenid_addr - contains the address of the downloaded vmgenid blob
-                  - writeable by the guest
+                  - writable by the guest
 
 
 QEMU sends the following commands to the guest at startup:
diff --git a/hw/scsi/mfi.h b/hw/scsi/mfi.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/scsi/mfi.h
+++ b/hw/scsi/mfi.h
@@ -XXX,XX +XXX,XX @@ struct mfi_ctrl_props {
                               * metadata and user data
                               * 1=5%, 2=10%, 3=15% and so on
                               */
-    uint8_t viewSpace;       /* snapshot writeable VIEWs
+    uint8_t viewSpace;       /* snapshot writable VIEWs
                               * capacity as a % of source LD
                               * capacity. 0=READ only
                               * 1=5%, 2=10%, 3=15% and so on
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ enum MVEECIState {
 #define PMCRP   0x2
 #define PMCRE   0x1
 /*
- * Mask of PMCR bits writeable by guest (not including WO bits like C, P,
+ * Mask of PMCR bits writable by guest (not including WO bits like C, P,
  * which can be written as 1 to trigger behaviour but which stay RAZ).
  */
-#define PMCR_WRITEABLE_MASK (PMCRLC | PMCRDP | PMCRX | PMCRD | PMCRE)
+#define PMCR_WRITABLE_MASK (PMCRLC | PMCRDP | PMCRX | PMCRD | PMCRE)
 
 #define PMXEVTYPER_P          0x80000000
 #define PMXEVTYPER_U          0x40000000
diff --git a/target/i386/hvf/vmcs.h b/target/i386/hvf/vmcs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/vmcs.h
+++ b/target/i386/hvf/vmcs.h
@@ -XXX,XX +XXX,XX @@
 #define EPT_VIOLATION_DATA_WRITE (1UL << 1)
 #define EPT_VIOLATION_INST_FETCH (1UL << 2)
 #define EPT_VIOLATION_GPA_READABLE (1UL << 3)
-#define EPT_VIOLATION_GPA_WRITEABLE (1UL << 4)
+#define EPT_VIOLATION_GPA_WRITABLE (1UL << 4)
 #define EPT_VIOLATION_GPA_EXECUTABLE (1UL << 5)
 #define EPT_VIOLATION_GLA_VALID (1UL << 7)
 #define EPT_VIOLATION_XLAT_VALID (1UL << 8)
diff --git a/target/i386/hvf/vmx.h b/target/i386/hvf/vmx.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/vmx.h
+++ b/target/i386/hvf/vmx.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t cap2ctrl(uint64_t cap, uint64_t ctrl)
 
 #define AR_TYPE_ACCESSES_MASK 1
 #define AR_TYPE_READABLE_MASK (1 << 1)
-#define AR_TYPE_WRITEABLE_MASK (1 << 2)
+#define AR_TYPE_WRITABLE_MASK (1 << 2)
 #define AR_TYPE_CODE_MASK (1 << 3)
 #define AR_TYPE_MASK 0x0f
 #define AR_TYPE_BUSY_64_TSS 11
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
 {
     hvf_slot *mem;
     MemoryRegion *area = section->mr;
-    bool writeable = !area->readonly && !area->rom_device;
+    bool writable = !area->readonly && !area->rom_device;
     hv_memory_flags_t flags;
     uint64_t page_size = qemu_real_host_page_size();
 
     if (!memory_region_is_ram(area)) {
-        if (writeable) {
+        if (writable) {
             return;
         } else if (!memory_region_is_romd(area)) {
             /*
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
     KVMSlot *mem;
     int err;
     MemoryRegion *mr = section->mr;
-    bool writeable = !mr->readonly && !mr->rom_device;
+    bool writable = !mr->readonly && !mr->rom_device;
     hwaddr start_addr, size, slot_size, mr_offset;
     ram_addr_t ram_start_offset;
     void *ram;
 
     if (!memory_region_is_ram(mr)) {
-        if (writeable || !kvm_readonly_mem_allowed) {
+        if (writable || !kvm_readonly_mem_allowed) {
             return;
         } else if (!mr->romd_mode) {
             /* If the memory device is not in romd_mode, then we actually want
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -XXX,XX +XXX,XX @@ MMUAccessType adjust_signal_pc(uintptr_t *pc, bool is_write)
  * Return true if the write fault has been handled, and should be re-tried.
  *
  * Note that it is important that we don't call page_unprotect() unless
- * this is really a "write to nonwriteable page" fault, because
+ * this is really a "write to nonwritable page" fault, because
  * page_unprotect() assumes that if it is called for an access to
- * a page that's writeable this means we had two threads racing and
- * another thread got there first and already made the page writeable;
+ * a page that's writable this means we had two threads racing and
+ * another thread got there first and already made the page writable;
  * so we will retry the access. If we were to call page_unprotect()
  * for some other kind of fault that should really be passed to the
  * guest, we'd end up in an infinite loop of retrying the faulting access.
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -XXX,XX +XXX,XX @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
     for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
         /*
          * Initialize the value of read_ack_register to 1, so GHES can be
-         * writeable after (re)boot.
+         * writable after (re)boot.
          * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
          * (GHESv2 - Type 10)
          */
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static void icc_ctlr_el3_write(CPUARMState *env, const ARMCPRegInfo *ri,
         cs->icc_ctlr_el1[GICV3_S] |= ICC_CTLR_EL1_CBPR;
     }
 
-    /* The only bit stored in icc_ctlr_el3 which is writeable is EOIMODE_EL3: */
+    /* The only bit stored in icc_ctlr_el3 which is writable is EOIMODE_EL3: */
     mask = ICC_CTLR_EL3_EOIMODE_EL3;
 
     cs->icc_ctlr_el3 &= ~mask;
diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -XXX,XX +XXX,XX @@ static bool gicd_writel(GICv3State *s, hwaddr offset,
         if (value & mask & GICD_CTLR_DS) {
             /* We just set DS, so the ARE_NS and EnG1S bits are now RES0.
              * Note that this is a one-way transition because if DS is set
-             * then it's not writeable, so it can only go back to 0 with a
+             * then it's not writable, so it can only go back to 0 with a
              * hardware reset.
              */
             s->gicd_ctlr &= ~(GICD_CTLR_EN_GRP1S | GICD_CTLR_ARE_NS);
diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_redist.c
+++ b/hw/intc/arm_gicv3_redist.c
@@ -XXX,XX +XXX,XX @@ static void gicr_write_vpendbaser(GICv3CPUState *cs, uint64_t newval)
 
     /*
      * The DIRTY bit is read-only and for us is always zero;
-     * other fields are writeable.
+     * other fields are writable.
      */
     newval &= R_GICR_VPENDBASER_INNERCACHE_MASK |
         R_GICR_VPENDBASER_SHAREABILITY_MASK |
@@ -XXX,XX +XXX,XX @@ static MemTxResult gicr_writel(GICv3CPUState *cs, hwaddr offset,
         /* RAZ/WI for our implementation */
         return MEMTX_OK;
     case GICR_WAKER:
-        /* Only the ProcessorSleep bit is writeable. When the guest sets
+        /* Only the ProcessorSleep bit is writable. When the guest sets
          * it it requests that we transition the channel between the
          * redistributor and the cpu interface to quiescent, and that
          * we set the ChildrenAsleep bit once the inteface has reached the
diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/riscv_aclint.c
+++ b/hw/intc/riscv_aclint.c
@@ -XXX,XX +XXX,XX @@ static void riscv_aclint_swi_realize(DeviceState *dev, Error **errp)
     /* Claim software interrupt bits */
     for (i = 0; i < swi->num_harts; i++) {
         RISCVCPU *cpu = RISCV_CPU(qemu_get_cpu(swi->hartid_base + i));
-        /* We don't claim mip.SSIP because it is writeable by software */
+        /* We don't claim mip.SSIP because it is writable by software */
         if (riscv_cpu_claim_interrupts(cpu, swi->sswi ? 0 : MIP_MSIP) < 0) {
             error_report("MSIP already claimed");
             exit(1);
diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/riscv_aplic.c
+++ b/hw/intc/riscv_aplic.c
@@ -XXX,XX +XXX,XX @@ static void riscv_aplic_write(void *opaque, hwaddr addr, uint64_t value,
     }
 
     if (addr == APLIC_DOMAINCFG) {
-        /* Only IE bit writeable at the moment */
+        /* Only IE bit writable at the moment */
         value &= APLIC_DOMAINCFG_IE;
         aplic->domaincfg = value;
     } else if ((APLIC_SOURCECFG_BASE <= addr) &&
diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -XXX,XX +XXX,XX @@ static int shpc_cap_add_config(PCIDevice *d, Error **errp)
     pci_set_byte(config + SHPC_CAP_CxP, 0);
     pci_set_long(config + SHPC_CAP_DWORD_DATA, 0);
     d->shpc->cap = config_offset;
-    /* Make dword select and data writeable. */
+    /* Make dword select and data writable. */
     pci_set_byte(d->wmask + config_offset + SHPC_CAP_DWORD_SELECT, 0xff);
     pci_set_long(d->wmask + config_offset + SHPC_CAP_DWORD_DATA, 0xffffffff);
     return 0;
diff --git a/hw/sparc64/sun4u_iommu.c b/hw/sparc64/sun4u_iommu.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sparc64/sun4u_iommu.c
+++ b/hw/sparc64/sun4u_iommu.c
@@ -XXX,XX +XXX,XX @@ static IOMMUTLBEntry sun4u_translate_iommu(IOMMUMemoryRegion *iommu,
     }
 
     if (tte & IOMMU_TTE_DATA_W) {
-        /* Writeable */
+        /* Writable */
         ret.perm = IOMMU_RW;
     } else {
         ret.perm = IOMMU_RO;
diff --git a/hw/timer/sse-timer.c b/hw/timer/sse-timer.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/sse-timer.c
+++ b/hw/timer/sse-timer.c
@@ -XXX,XX +XXX,XX @@ static void sse_timer_write(void *opaque, hwaddr offset, uint64_t value,
     {
         uint32_t old_ctl = s->cntp_aival_ctl;
 
-        /* EN bit is writeable; CLR bit is write-0-to-clear, write-1-ignored */
+        /* EN bit is writable; CLR bit is write-0-to-clear, write-1-ignored */
         s->cntp_aival_ctl &= ~R_CNTP_AIVAL_CTL_EN_MASK;
         s->cntp_aival_ctl |= value & R_CNTP_AIVAL_CTL_EN_MASK;
         if (!(value & R_CNTP_AIVAL_CTL_CLR_MASK)) {
diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -XXX,XX +XXX,XX @@ int arm_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
             /*
              * Don't allow writing to XPSR.Exception as it can cause
              * a transition into or out of handler mode (it's not
-             * writeable via the MSR insn so this is a reasonable
+             * writable via the MSR insn so this is a reasonable
              * restriction). Other fields are safe to update.
              */
             xpsr_write(env, tmp, ~XPSR_EXCP);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void pmcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
         }
     }
 
-    env->cp15.c9_pmcr &= ~PMCR_WRITEABLE_MASK;
-    env->cp15.c9_pmcr |= (value & PMCR_WRITEABLE_MASK);
+    env->cp15.c9_pmcr &= ~PMCR_WRITABLE_MASK;
+    env->cp15.c9_pmcr |= (value & PMCR_WRITABLE_MASK);
 
     pmu_op_finish(env);
 }
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
             }
         }
 
-        env->cp15.c9_pmcr &= ~PMCR_WRITEABLE_MASK;
-        env->cp15.c9_pmcr |= (val & PMCR_WRITEABLE_MASK);
+        env->cp15.c9_pmcr &= ~PMCR_WRITABLE_MASK;
+        env->cp15.c9_pmcr |= (val & PMCR_WRITABLE_MASK);
 
         pmu_op_finish(env);
         break;
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -XXX,XX +XXX,XX @@ static void x86_cpu_to_dict(X86CPU *cpu, QDict *props)
 
 /* Convert CPU model data from X86CPU object to a property dictionary
  * that can recreate exactly the same CPU model, including every
- * writeable QOM property.
+ * writable QOM property.
  */
 static void x86_cpu_to_dict_full(X86CPU *cpu, QDict *props)
 {
diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
index XXXXXXX..XXXXXXX 100644
--- a/target/s390x/ioinst.c
+++ b/target/s390x/ioinst.c
@@ -XXX,XX +XXX,XX @@ void ioinst_handle_stsch(S390CPU *cpu, uint64_t reg1, uint32_t ipb,
         g_assert(!s390_is_pv());
         /*
          * As operand exceptions have a lower priority than access exceptions,
-         * we check whether the memory area is writeable (injecting the
+         * we check whether the memory area is writable (injecting the
          * access execption if it is not) first.
          */
         if (!s390_cpu_virt_mem_check_write(cpu, addr, ar, sizeof(schib))) {
diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index XXXXXXX..XXXXXXX 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -XXX,XX +XXX,XX @@ def _early_cleanup(self) -> None:
         """
         # If we keep the console socket open, we may deadlock waiting
         # for QEMU to exit, while QEMU is waiting for the socket to
-        # become writeable.
+        # become writable.
         if self._console_socket is not None:
             self._console_socket.close()
             self._console_socket = None
diff --git a/tests/tcg/x86_64/system/boot.S b/tests/tcg/x86_64/system/boot.S
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/x86_64/system/boot.S
+++ b/tests/tcg/x86_64/system/boot.S
@@ -XXX,XX +XXX,XX @@
 	*
 	* - `ebx`: contains the physical memory address where the loader has placed
 	*          the boot start info structure.
-	* - `cr0`: bit 0 (PE) must be set. All the other writeable bits are cleared.
+	* - `cr0`: bit 0 (PE) must be set. All the other writable bits are cleared.
 	* - `cr4`: all bits are cleared.
 	* - `cs `: must be a 32-bit read/execute code segment with a base of ‘0’
 	*          and a limit of ‘0xFFFFFFFF’. The selector value is unspecified.
-- 
2.25.1

From: Frederic Konrad <fkonrad@amd.com>

The sizes of the core and vblend register regions are wrong; they should be
0x3B0 and 0x1E0 respectively, according to:
  https://www.xilinx.com/htmldocs/registers/ug1087/ug1087-zynq-ultrascale-registers.html.

Let's fix that and use macros when creating the mmio region.
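
The effect of the off-by-one on the register arrays can be checked
with a little arithmetic.  The sketch below is illustrative only: the
sketch_* names are hypothetical, and 0x03AC is the DP_INT_DS offset
as it appears in xlnx_dp.c elsewhere in this series (whether it is
the highest core register is an assumption).  The array size macros
divide the region size in bytes by four, so a size of 0x3AF rounds
down to one 32-bit word fewer than 0x3B0.

```c
#include <stddef.h>

/* Byte offset of DP_INT_DS, taken from hw/display/xlnx_dp.c. */
#define SKETCH_DP_INT_DS_OFFSET 0x03AC

/* Number of 32-bit registers in a region of the given byte size. */
static size_t sketch_words(size_t region_bytes)
{
    return region_bytes >> 2;
}

/*
 * Return 1 if the register at byte_offset has a valid index in an
 * array sized from region_bytes, 0 otherwise.
 */
static int sketch_index_fits(size_t byte_offset, size_t region_bytes)
{
    return (byte_offset >> 2) < sketch_words(region_bytes);
}
```

With the old size, sketch_words(0x3AF) is 235, so index 235 (the
register at 0x3AC) does not fit; with 0x3B0 the array has 236 entries
and it does.  The same reasoning gives 119 versus 120 entries for the
vblend region.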

Fixes: 58ac482a66d ("introduce xlnx-dp")
Signed-off-by: Frederic Konrad <fkonrad@amd.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20220601172353.3220232-2-fkonrad@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/display/xlnx_dp.h |  9 +++++++--
 hw/display/xlnx_dp.c         | 17 ++++++++++-------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/hw/display/xlnx_dp.h b/include/hw/display/xlnx_dp.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/display/xlnx_dp.h
+++ b/include/hw/display/xlnx_dp.h
@@ -XXX,XX +XXX,XX @@
 #define AUD_CHBUF_MAX_DEPTH                 (32 * KiB)
 #define MAX_QEMU_BUFFER_SIZE                (4 * KiB)
 
-#define DP_CORE_REG_ARRAY_SIZE              (0x3AF >> 2)
+#define DP_CORE_REG_OFFSET                  (0x0000)
+#define DP_CORE_REG_ARRAY_SIZE              (0x3B0 >> 2)
+#define DP_AVBUF_REG_OFFSET                 (0xB000)
 #define DP_AVBUF_REG_ARRAY_SIZE             (0x238 >> 2)
-#define DP_VBLEND_REG_ARRAY_SIZE            (0x1DF >> 2)
+#define DP_VBLEND_REG_OFFSET                (0xA000)
+#define DP_VBLEND_REG_ARRAY_SIZE            (0x1E0 >> 2)
+#define DP_AUDIO_REG_OFFSET                 (0xC000)
 #define DP_AUDIO_REG_ARRAY_SIZE             (0x50 >> 2)
+#define DP_CONTAINER_SIZE                   (0xC050)
 
 struct PixmanPlane {
     pixman_format_code_t format;
diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_init(Object *obj)
     SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
     XlnxDPState *s = XLNX_DP(obj);
 
-    memory_region_init(&s->container, obj, TYPE_XLNX_DP, 0xC050);
+    memory_region_init(&s->container, obj, TYPE_XLNX_DP, DP_CONTAINER_SIZE);
 
     memory_region_init_io(&s->core_iomem, obj, &dp_ops, s, TYPE_XLNX_DP
-                          ".core", 0x3AF);
-    memory_region_add_subregion(&s->container, 0x0000, &s->core_iomem);
+                          ".core", sizeof(s->core_registers));
+    memory_region_add_subregion(&s->container, DP_CORE_REG_OFFSET,
+                                &s->core_iomem);
 
     memory_region_init_io(&s->vblend_iomem, obj, &vblend_ops, s, TYPE_XLNX_DP
-                          ".v_blend", 0x1DF);
-    memory_region_add_subregion(&s->container, 0xA000, &s->vblend_iomem);
+                          ".v_blend", sizeof(s->vblend_registers));
+    memory_region_add_subregion(&s->container, DP_VBLEND_REG_OFFSET,
+                                &s->vblend_iomem);
 
     memory_region_init_io(&s->avbufm_iomem, obj, &avbufm_ops, s, TYPE_XLNX_DP
-                          ".av_buffer_manager", 0x238);
-    memory_region_add_subregion(&s->container, 0xB000, &s->avbufm_iomem);
+                          ".av_buffer_manager", sizeof(s->avbufm_registers));
+    memory_region_add_subregion(&s->container, DP_AVBUF_REG_OFFSET,
+                                &s->avbufm_iomem);
 
     memory_region_init_io(&s->audio_iomem, obj, &audio_ops, s, TYPE_XLNX_DP
                           ".audio", sizeof(s->audio_registers));
-- 
2.25.1

From: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>

Add a periodic timer which raises vblank at a frequency of 30Hz.

Note that this is a migration compatibility break for the
xlnx-zcu102 board type.

Signed-off-by: Sai Pavan Boddu <saipava@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Frederic Konrad <fkonrad@amd.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20220601172353.3220232-3-fkonrad@xilinx.com
Changes by fkonrad:
  - Switched to transaction-based ptimer API.
  - Added the DP_INT_VBLNK_START macro.
Signed-off-by: Frederic Konrad <fkonrad@amd.com>
[PMM: bump vmstate version, add commit message note about
 compat break]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/display/xlnx_dp.h |  3 +++
 hw/display/xlnx_dp.c         | 30 ++++++++++++++++++++++++++----
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/include/hw/display/xlnx_dp.h b/include/hw/display/xlnx_dp.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/display/xlnx_dp.h
+++ b/include/hw/display/xlnx_dp.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/dma/xlnx_dpdma.h"
 #include "audio/audio.h"
 #include "qom/object.h"
+#include "hw/ptimer.h"
 
 #define AUD_CHBUF_MAX_DEPTH                 (32 * KiB)
 #define MAX_QEMU_BUFFER_SIZE                (4 * KiB)
@@ -XXX,XX +XXX,XX @@ struct XlnxDPState {
      */
     DPCDState *dpcd;
     I2CDDCState *edid;
+
+    ptimer_state *vblank;
 };
 
 #define TYPE_XLNX_DP "xlnx.v-dp"
diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -XXX,XX +XXX,XX @@
 #define DP_TX_N_AUD                         (0x032C >> 2)
 #define DP_TX_AUDIO_EXT_DATA(n)             ((0x0330 + 4 * n) >> 2)
 #define DP_INT_STATUS                       (0x03A0 >> 2)
+#define DP_INT_VBLNK_START                  (1 << 13)
 #define DP_INT_MASK                         (0x03A4 >> 2)
 #define DP_INT_EN                           (0x03A8 >> 2)
 #define DP_INT_DS                           (0x03AC >> 2)
@@ -XXX,XX +XXX,XX @@ typedef enum DPVideoFmt DPVideoFmt;
 
 static const VMStateDescription vmstate_dp = {
     .name = TYPE_XLNX_DP,
-    .version_id = 1,
+    .version_id = 2,
     .fields = (VMStateField[]){
         VMSTATE_UINT32_ARRAY(core_registers, XlnxDPState,
                              DP_CORE_REG_ARRAY_SIZE),
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_dp = {
                              DP_VBLEND_REG_ARRAY_SIZE),
         VMSTATE_UINT32_ARRAY(audio_registers, XlnxDPState,
                              DP_AUDIO_REG_ARRAY_SIZE),
+        VMSTATE_PTIMER(vblank, XlnxDPState),
         VMSTATE_END_OF_LIST()
     }
 };
 
+#define DP_VBLANK_PTIMER_POLICY (PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | \
+                                 PTIMER_POLICY_CONTINUOUS_TRIGGER |    \
+                                 PTIMER_POLICY_NO_IMMEDIATE_TRIGGER)
+
 static void xlnx_dp_update_irq(XlnxDPState *s);
 
 static uint64_t xlnx_dp_audio_read(void *opaque, hwaddr offset, unsigned size)
@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_write(void *opaque, hwaddr offset, uint64_t value,
         break;
     case DP_TRANSMITTER_ENABLE:
         s->core_registers[offset] = value & 0x01;
+        ptimer_transaction_begin(s->vblank);
+        if (value & 0x1) {
+            ptimer_run(s->vblank, 0);
+        } else {
+            ptimer_stop(s->vblank);
+        }
+        ptimer_transaction_commit(s->vblank);
         break;
     case DP_FORCE_SCRAMBLER_RESET:
         /*
@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_update_display(void *opaque)
         return;
     }
 
-    s->core_registers[DP_INT_STATUS] |= (1 << 13);
-    xlnx_dp_update_irq(s);
-
     xlnx_dpdma_trigger_vsync_irq(s->dpdma);
 
     /*
@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_finalize(Object *obj)
     fifo8_destroy(&s->rx_fifo);
 }
 
+static void vblank_hit(void *opaque)
+{
+    XlnxDPState *s = XLNX_DP(opaque);
+
+    s->core_registers[DP_INT_STATUS] |= DP_INT_VBLNK_START;
+    xlnx_dp_update_irq(s);
+}
+
 static void xlnx_dp_realize(DeviceState *dev, Error **errp)
 {
     XlnxDPState *s = XLNX_DP(dev);
@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_realize(DeviceState *dev, Error **errp)
                                            &as);
     AUD_set_volume_out(s->amixer_output_stream, 0, 255, 255);
     xlnx_dp_audio_activate(s);
+    s->vblank = ptimer_init(vblank_hit, s, DP_VBLANK_PTIMER_POLICY);
+    ptimer_transaction_begin(s->vblank);
+    ptimer_set_freq(s->vblank, 30);
+    ptimer_transaction_commit(s->vblank);
 }
 
 static void xlnx_dp_reset(DeviceState *dev)
-- 
2.25.1

From: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>

Fix the interrupt disable logic: a mask bit value of 1 indicates that
the corresponding interrupt is disabled.
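
The difference between the old and new behaviour can be shown with a
small model of the mask register.  This is a sketch under stated
assumptions, not the device code: SketchIntRegs and the sketch_*
functions are hypothetical, and the enable path (clearing mask bits)
is assumed to mirror the disable path rather than being taken from
this patch.

```c
#include <stdint.h>

/* Minimal model: one mask register, where a set bit means "disabled". */
typedef struct {
    uint32_t mask;
} SketchIntRegs;

/* Assumed enable behaviour: clear the mask bits that were written. */
static void sketch_int_enable(SketchIntRegs *r, uint32_t value)
{
    r->mask &= ~value;
}

/* Fixed disable behaviour: set exactly the mask bits that were written. */
static void sketch_int_disable(SketchIntRegs *r, uint32_t value)
{
    r->mask |= value;
}

/* Old disable behaviour: sets every mask bit except the ones written. */
static void sketch_int_disable_old(SketchIntRegs *r, uint32_t value)
{
    r->mask |= ~value;
}
```

Starting from a mask of zero, writing bit 13 through the old path
leaves bit 13 clear and masks every other interrupt, which is the
opposite of what the guest asked for.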

Signed-off-by: Sai Pavan Boddu <saipava@xilinx.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Frederic Konrad <fkonrad@amd.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20220601172353.3220232-4-fkonrad@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/xlnx_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -XXX,XX +XXX,XX @@ static void xlnx_dp_write(void *opaque, hwaddr offset, uint64_t value,
         xlnx_dp_update_irq(s);
         break;
     case DP_INT_DS:
-        s->core_registers[DP_INT_MASK] |= ~value;
+        s->core_registers[DP_INT_MASK] |= value;
         xlnx_dp_update_irq(s);
         break;
     default:
-- 
2.25.1

From: Frederic Konrad <fkonrad@amd.com>

When the display port was initially implemented, the device
driver wasn't using interrupts.  Now that the display port driver
waits for the vblank interrupt, it turns out that the IRQ mapping
is wrong.  Use the values from the Linux device tree and the
UltraScale+ reference manual.

Signed-off-by: Frederic Konrad <fkonrad@amd.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20220601172353.3220232-5-fkonrad@xilinx.com
[PMM: refold lines in commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-zynqmp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -XXX,XX +XXX,XX @@
 #define SERDES_SIZE         0x20000
 
 #define DP_ADDR             0xfd4a0000
-#define DP_IRQ              113
+#define DP_IRQ              0x77
 
 #define DPDMA_ADDR          0xfd4c0000
-#define DPDMA_IRQ           116
+#define DPDMA_IRQ           0x7a
 
 #define APU_ADDR            0xfd5c0000
 #define APU_IRQ             153
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Move the decl from ptw.h to internals.h.  Provide an inline
version for user-only, just as we do for arm_stage1_mmu_idx.
Move an endif down to make the definition in helper.c be
system only.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h | 5 +++++
 target/arm/helper.c    | 5 ++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env);
  * Return the ARMMMUIdx for the stage1 traversal for the current regime.
  */
 #ifdef CONFIG_USER_ONLY
+static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
+{
+    return ARMMMUIdx_Stage1_E0;
+}
 static inline ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
 {
     return ARMMMUIdx_Stage1_E0;
 }
 #else
+ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx);
 ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env);
 #endif
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx,
     }
 }
 
-#endif /* !CONFIG_USER_ONLY */
-
 /* Convert a possible stage1+2 MMU index into the appropriate
  * stage 1 MMU index
  */
-static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
+ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
 {
     switch (mmu_idx) {
     case ARMMMUIdx_SE10_0:
@@ -XXX,XX +XXX,XX @@ static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
         return mmu_idx;
     }
 }
+#endif /* !CONFIG_USER_ONLY */
 
 /* Return true if the translation regime is using LPAE format page tables */
 static inline bool regime_using_lpae_format(CPUARMState *env,
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Begin moving all of the page table walking functions
out of helper.c, starting with get_phys_addr().

Create a temporary header file, "ptw.h", in which to
share declarations between the two C files while we
are moving functions.

Move a few declarations to "internals.h", which will remain
in use by multiple C files.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
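Editorial note, not part of the commit: the get_phys_addr() body moved
by this patch contains a bounds check for stage 1 addresses when the
MMU is disabled. The sketch below restates that check in isolation so
the arithmetic is easier to follow. The function names are
hypothetical, sketch_extract64() is a local stand-in written for this
note rather than QEMU's own extract64(), and the tbi/tbid arguments are
assumed to be the two-bit values that aa64_va_parameter_tbi() and
aa64_va_parameter_tbid() would return.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-in for a bitfield extract: return `length` bits of
 * `value` starting at bit `start`.
 * Assumes 1 <= length <= 64 - start.
 */
static uint64_t sketch_extract64(uint64_t value, int start, int length)
{
    return (value >> start) & (~UINT64_C(0) >> (64 - length));
}

/*
 * Return true if `address` would take an address size fault with
 * stage 1 translation disabled.
 *   pamax:    number of implemented physical address bits
 *   tbi:      top-byte-ignore bits, one per address range
 *   tbid:     bits that disable TBI for instruction fetches
 *   is_fetch: true for an instruction fetch
 */
static bool sketch_s1off_address_fault(uint64_t address, int pamax,
                                       int tbi, int tbid, bool is_fetch)
{
    int addrtop;

    if (is_fetch) {
        tbi &= ~tbid;
    }
    /* Bit 55 of the address selects which range's TBI bit applies. */
    tbi = (tbi >> (int)sketch_extract64(address, 55, 1)) & 1;
    addrtop = tbi ? 55 : 63;

    /* Every bit from pamax up to addrtop must be zero. */
    return sketch_extract64(address, pamax, addrtop - pamax + 1) != 0;
}
```

With pamax = 48, for example, an address with only bits 56..63 set
faults when TBI is off for its range, and passes when TBI is on,
because the top byte is then excluded from the comparison.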
 target/arm/internals.h |  18 ++-
 target/arm/ptw.h       |  51 ++++++
 target/arm/helper.c    | 344 +++++------------------------------------
 target/arm/ptw.c       | 267 ++++++++++++++++++++++++++++++++
 target/arm/meson.build |   1 +
 5 files changed, 372 insertions(+), 309 deletions(-)
 create mode 100644 target/arm/ptw.h
 create mode 100644 target/arm/ptw.c

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate);
 
-/* Return true if the stage 1 translation regime is using LPAE format page
- * tables */
+/* Return true if the translation regime is using LPAE format page tables */
+bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx);
+
+/*
+ * Return true if the stage 1 translation regime is using LPAE
+ * format page tables
+ */
 bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx);
 
 /* Raise a data fault alignment exception for the specified virtual address */
@@ -XXX,XX +XXX,XX @@ static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
     }
 }
 
+/* Return the SCTLR value which controls this address translation regime */
+static inline uint64_t regime_sctlr(CPUARMState *env, ARMMMUIdx mmu_idx)
+{
+    return env->cp15.sctlr_el[regime_el(env, mmu_idx)];
+}
+
 /* Return the TCR controlling this translation regime */
 static inline TCR *regime_tcr(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
 ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
                                    ARMMMUIdx mmu_idx, bool data);
 
+int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx);
+int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx);
+
 static inline int exception_target_el(CPUARMState *env)
 {
     int target_el = MAX(1, arm_current_el(env));
diff --git a/target/arm/ptw.h b/target/arm/ptw.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM page table walking.
+ *
+ * This code is licensed under the GNU GPL v2 or later.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef TARGET_ARM_PTW_H
+#define TARGET_ARM_PTW_H
+
+#ifndef CONFIG_USER_ONLY
+
+bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
+bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
+ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
+                                 ARMCacheAttrs s1, ARMCacheAttrs s2);
+
+bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
+                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                      hwaddr *phys_ptr, int *prot,
+                      target_ulong *page_size,
+                      ARMMMUFaultInfo *fi);
+bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
+                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                          hwaddr *phys_ptr, int *prot,
+                          ARMMMUFaultInfo *fi);
+bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
+                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                      hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
+                      target_ulong *page_size, ARMMMUFaultInfo *fi);
+bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
+                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                          hwaddr *phys_ptr, int *prot,
+                          target_ulong *page_size,
+                          ARMMMUFaultInfo *fi);
+bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
+                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                          hwaddr *phys_ptr, MemTxAttrs *txattrs,
+                          int *prot, target_ulong *page_size,
+                          ARMMMUFaultInfo *fi);
+bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                        bool s1_is_el0,
+                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
+                        target_ulong *page_size_ptr,
+                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
+    __attribute__((nonnull));
+
+#endif /* !CONFIG_USER_ONLY */
+#endif /* TARGET_ARM_PTW_H */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
 #include "semihosting/common-semi.h"
 #endif
 #include "cpregs.h"
+#include "ptw.h"
 
 #define ARM_CPU_FREQ 1000000000 /* FIXME: 1 GHz, should be configurable */
 
-#ifndef CONFIG_USER_ONLY
-
-static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                               bool s1_is_el0,
-                               hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
-                               target_ulong *page_size_ptr,
-                               ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
-    __attribute__((nonnull));
-#endif
-
 static void switch_mode(CPUARMState *env, int mode);
-static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx);
 
 static uint64_t raw_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
@@ -XXX,XX +XXX,XX @@ uint64_t arm_sctlr(CPUARMState *env, int el)
     return env->cp15.sctlr_el[el];
 }
 
-/* Return the SCTLR value which controls this address translation regime */
-static inline uint64_t regime_sctlr(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-    return env->cp15.sctlr_el[regime_el(env, mmu_idx)];
-}
-
 #ifndef CONFIG_USER_ONLY
 
 /* Return true if the specified stage of address translation is disabled */
-static inline bool regime_translation_disabled(CPUARMState *env,
-                                               ARMMMUIdx mmu_idx)
+bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
     uint64_t hcr_el2;
 
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
 #endif /* !CONFIG_USER_ONLY */
 
 /* Return true if the translation regime is using LPAE format page tables */
-static inline bool regime_using_lpae_format(CPUARMState *env,
-                                            ARMMMUIdx mmu_idx)
+bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
     int el = regime_el(env, mmu_idx);
     if (el == 2 || arm_el_is_aa64(env, el)) {
@@ -XXX,XX +XXX,XX @@ bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
 }
 
 #ifndef CONFIG_USER_ONLY
-static inline bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
+bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
     switch (mmu_idx) {
     case ARMMMUIdx_SE10_0:
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
     return 0;
 }
 
-static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
-                             MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                             hwaddr *phys_ptr, int *prot,
-                             target_ulong *page_size,
-                             ARMMMUFaultInfo *fi)
+bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
+                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                      hwaddr *phys_ptr, int *prot,
+                      target_ulong *page_size,
+                      ARMMMUFaultInfo *fi)
 {
     CPUState *cs = env_cpu(env);
     int level = 1;
@@ -XXX,XX +XXX,XX @@ do_fault:
     return true;
 }
 
-static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
-                             MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                             hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
-                             target_ulong *page_size, ARMMMUFaultInfo *fi)
+bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
+                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                      hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
+                      target_ulong *page_size, ARMMMUFaultInfo *fi)
 {
     CPUState *cs = env_cpu(env);
     ARMCPU *cpu = env_archcpu(env);
@@ -XXX,XX +XXX,XX @@ unsigned int arm_pamax(ARMCPU *cpu)
     return pamax_map[parange];
 }
 
-static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
+int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 {
     if (regime_has_2_ranges(mmu_idx)) {
         return extract64(tcr, 37, 2);
@@ -XXX,XX +XXX,XX @@ static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
     }
 }
 
-static int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
+int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
 {
     if (regime_has_2_ranges(mmu_idx)) {
         return extract64(tcr, 51, 2);
@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
  * @fi: set to fault info if the translation fails
  * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
  */
-static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                               bool s1_is_el0,
-                               hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
-                               target_ulong *page_size_ptr,
-                               ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
+bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                        bool s1_is_el0,
+                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
+                        target_ulong *page_size_ptr,
+                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 {
     ARMCPU *cpu = env_archcpu(env);
     CPUState *cs = CPU(cpu);
@@ -XXX,XX +XXX,XX @@ static inline bool m_is_system_region(CPUARMState *env, uint32_t address)
     return arm_feature(env, ARM_FEATURE_M) && extract32(address, 29, 3) == 0x7;
 }
 
-static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
-                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                                 hwaddr *phys_ptr, int *prot,
-                                 target_ulong *page_size,
-                                 ARMMMUFaultInfo *fi)
+bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
+                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                          hwaddr *phys_ptr, int *prot,
+                          target_ulong *page_size,
+                          ARMMMUFaultInfo *fi)
 {
     ARMCPU *cpu = env_archcpu(env);
     int n;
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
 }
 
 
-static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
-                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                                 hwaddr *phys_ptr, MemTxAttrs *txattrs,
-                                 int *prot, target_ulong *page_size,
-                                 ARMMMUFaultInfo *fi)
+bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
+                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                          hwaddr *phys_ptr, MemTxAttrs *txattrs,
+                          int *prot, target_ulong *page_size,
+                          ARMMMUFaultInfo *fi)
 {
     uint32_t secure = regime_is_secure(env, mmu_idx);
     V8M_SAttributes sattrs = {};
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
     return ret;
 }
 
-static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
-                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                                 hwaddr *phys_ptr, int *prot,
-                                 ARMMMUFaultInfo *fi)
+bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
+                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                          hwaddr *phys_ptr, int *prot,
+                          ARMMMUFaultInfo *fi)
 {
     int n;
     uint32_t mask;
@@ -XXX,XX +XXX,XX @@ static uint8_t combined_attrs_fwb(CPUARMState *env,
  * @s1:      Attributes from stage 1 walk
  * @s2:      Attributes from stage 2 walk
  */
-static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
-                                        ARMCacheAttrs s1, ARMCacheAttrs s2)
+ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
+                                 ARMCacheAttrs s1, ARMCacheAttrs s2)
 {
     ARMCacheAttrs ret;
     bool tagged = false;
@@ -XXX,XX +XXX,XX @@ static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
     return ret;
 }
 
-
-/* get_phys_addr - get the physical address for this virtual address
- *
- * Find the physical address corresponding to the given virtual address,
- * by doing a translation table walk on MMU based systems or using the
- * MPU state on MPU based systems.
- *
- * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
- * prot and page_size may not be filled in, and the populated fsr value provides
- * information on why the translation aborted, in the format of a
- * DFSR/IFSR fault register, with the following caveats:
- *  * we honour the short vs long DFSR format differences.
- *  * the WnR bit is never set (the caller must do this).
- *  * for PSMAv5 based systems we don't bother to return a full FSR format
- *    value.
- *
- * @env: CPUARMState
- * @address: virtual address to get physical address for
- * @access_type: 0 for read, 1 for write, 2 for execute
- * @mmu_idx: MMU index indicating required translation regime
- * @phys_ptr: set to the physical address corresponding to the virtual address
- * @attrs: set to the memory transaction attributes to use
- * @prot: set to the permissions for the page containing phys_ptr
- * @page_size: set to the size of the page containing phys_ptr
- * @fi: set to fault info if the translation fails
- * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
- */
-bool get_phys_addr(CPUARMState *env, target_ulong address,
-                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                   hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
-                   target_ulong *page_size,
-                   ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
-{
-    ARMMMUIdx s1_mmu_idx = stage_1_mmu_idx(mmu_idx);
-
-    if (mmu_idx != s1_mmu_idx) {
-        /* Call ourselves recursively to do the stage 1 and then stage 2
-         * translations if mmu_idx is a two-stage regime.
-         */
-        if (arm_feature(env, ARM_FEATURE_EL2)) {
-            hwaddr ipa;
-            int s2_prot;
-            int ret;
-            bool ipa_secure;
-            ARMCacheAttrs cacheattrs2 = {};
-            ARMMMUIdx s2_mmu_idx;
-            bool is_el0;
-
-            ret = get_phys_addr(env, address, access_type, s1_mmu_idx, &ipa,
-                                attrs, prot, page_size, fi, cacheattrs);
-
-            /* If S1 fails or S2 is disabled, return early.  */
-            if (ret || regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
-                *phys_ptr = ipa;
-                return ret;
-            }
-
-            ipa_secure = attrs->secure;
-            if (arm_is_secure_below_el3(env)) {
-                if (ipa_secure) {
-                    attrs->secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
-                } else {
-                    attrs->secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
-                }
-            } else {
-                assert(!ipa_secure);
-            }
-
-            s2_mmu_idx = attrs->secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
-            is_el0 = mmu_idx == ARMMMUIdx_E10_0 || mmu_idx == ARMMMUIdx_SE10_0;
-
-            /* S1 is done. Now do S2 translation.  */
-            ret = get_phys_addr_lpae(env, ipa, access_type, s2_mmu_idx, is_el0,
-                                     phys_ptr, attrs, &s2_prot,
-                                     page_size, fi, &cacheattrs2);
-            fi->s2addr = ipa;
-            /* Combine the S1 and S2 perms.  */
-            *prot &= s2_prot;
-
-            /* If S2 fails, return early.  */
-            if (ret) {
-                return ret;
-            }
-
-            /* Combine the S1 and S2 cache attributes. */
-            if (arm_hcr_el2_eff(env) & HCR_DC) {
-                /*
-                 * HCR.DC forces the first stage attributes to
-                 *  Normal Non-Shareable,
-                 *  Inner Write-Back Read-Allocate Write-Allocate,
-                 *  Outer Write-Back Read-Allocate Write-Allocate.
-                 * Do not overwrite Tagged within attrs.
-                 */
-                if (cacheattrs->attrs != 0xf0) {
-                    cacheattrs->attrs = 0xff;
-                }
-                cacheattrs->shareability = 0;
-            }
-            *cacheattrs = combine_cacheattrs(env, *cacheattrs, cacheattrs2);
-
-            /* Check if IPA translates to secure or non-secure PA space. */
-            if (arm_is_secure_below_el3(env)) {
-                if (ipa_secure) {
-                    attrs->secure =
-                        !(env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | VSTCR_SW));
-                } else {
-                    attrs->secure =
-                        !((env->cp15.vtcr_el2.raw_tcr & (VTCR_NSA | VTCR_NSW))
-                        || (env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | VSTCR_SW)));
-                }
-            }
-            return 0;
-        } else {
-            /*
-             * For non-EL2 CPUs a stage1+stage2 translation is just stage 1.
-             */
-            mmu_idx = stage_1_mmu_idx(mmu_idx);
-        }
-    }
-
-    /* The page table entries may downgrade secure to non-secure, but
-     * cannot upgrade an non-secure translation regime's attributes
-     * to secure.
-     */
-    attrs->secure = regime_is_secure(env, mmu_idx);
-    attrs->user = regime_is_user(env, mmu_idx);
-
-    /* Fast Context Switch Extension. This doesn't exist at all in v8.
-     * In v7 and earlier it affects all stage 1 translations.
-     */
-    if (address < 0x02000000 && mmu_idx != ARMMMUIdx_Stage2
-        && !arm_feature(env, ARM_FEATURE_V8)) {
-        if (regime_el(env, mmu_idx) == 3) {
-            address += env->cp15.fcseidr_s;
-        } else {
-            address += env->cp15.fcseidr_ns;
-        }
-    }
-
-    if (arm_feature(env, ARM_FEATURE_PMSA)) {
-        bool ret;
-        *page_size = TARGET_PAGE_SIZE;
-
-        if (arm_feature(env, ARM_FEATURE_V8)) {
-            /* PMSAv8 */
-            ret = get_phys_addr_pmsav8(env, address, access_type, mmu_idx,
-                                       phys_ptr, attrs, prot, page_size, fi);
-        } else if (arm_feature(env, ARM_FEATURE_V7)) {
-            /* PMSAv7 */
-            ret = get_phys_addr_pmsav7(env, address, access_type, mmu_idx,
-                                       phys_ptr, prot, page_size, fi);
-        } else {
-            /* Pre-v7 MPU */
-            ret = get_phys_addr_pmsav5(env, address, access_type, mmu_idx,
-                                       phys_ptr, prot, fi);
-        }
-        qemu_log_mask(CPU_LOG_MMU, "PMSA MPU lookup for %s at 0x%08" PRIx32
-                      " mmu_idx %u -> %s (prot %c%c%c)\n",
-                      access_type == MMU_DATA_LOAD ? "reading" :
-                      (access_type == MMU_DATA_STORE ? "writing" : "execute"),
-                      (uint32_t)address, mmu_idx,
-                      ret ? "Miss" : "Hit",
-                      *prot & PAGE_READ ? 'r' : '-',
-                      *prot & PAGE_WRITE ? 'w' : '-',
-                      *prot & PAGE_EXEC ? 'x' : '-');
-
-        return ret;
-    }
-
-    /* Definitely a real MMU, not an MPU */
-
-    if (regime_translation_disabled(env, mmu_idx)) {
-        uint64_t hcr;
-        uint8_t memattr;
-
-        /*
-         * MMU disabled.  S1 addresses within aa64 translation regimes are
-         * still checked for bounds -- see AArch64.TranslateAddressS1Off.
-         */
-        if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
-            int r_el = regime_el(env, mmu_idx);
-            if (arm_el_is_aa64(env, r_el)) {
-                int pamax = arm_pamax(env_archcpu(env));
-                uint64_t tcr = env->cp15.tcr_el[r_el].raw_tcr;
-                int addrtop, tbi;
-
-                tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
-                if (access_type == MMU_INST_FETCH) {
-                    tbi &= ~aa64_va_parameter_tbid(tcr, mmu_idx);
-                }
-                tbi = (tbi >> extract64(address, 55, 1)) & 1;
-                addrtop = (tbi ? 55 : 63);
-
-                if (extract64(address, pamax, addrtop - pamax + 1) != 0) {
-                    fi->type = ARMFault_AddressSize;
-                    fi->level = 0;
-                    fi->stage2 = false;
-                    return 1;
-                }
-
-                /*
-                 * When TBI is disabled, we've just validated that all of the
-                 * bits above PAMax are zero, so logically we only need to
-                 * clear the top byte for TBI.  But it's clearer to follow
-                 * the pseudocode set of addrdesc.paddress.
-                 */
-                address = extract64(address, 0, 52);
-            }
-        }
-        *phys_ptr = address;
-        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
-        *page_size = TARGET_PAGE_SIZE;
-
-        /* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
-        hcr = arm_hcr_el2_eff(env);
-        cacheattrs->shareability = 0;
-        cacheattrs->is_s2_format = false;
-        if (hcr & HCR_DC) {
-            if (hcr & HCR_DCT) {
-                memattr = 0xf0;  /* Tagged, Normal, WB, RWA */
-            } else {
-                memattr = 0xff;  /* Normal, WB, RWA */
-            }
-        } else if (access_type == MMU_INST_FETCH) {
-            if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
-                memattr = 0xee;  /* Normal, WT, RA, NT */
-            } else {
-                memattr = 0x44;  /* Normal, NC, No */
-            }
-            cacheattrs->shareability = 2; /* outer sharable */
-        } else {
-            memattr = 0x00;      /* Device, nGnRnE */
-        }
-        cacheattrs->attrs = memattr;
-        return 0;
-    }
-
-    if (regime_using_lpae_format(env, mmu_idx)) {
-        return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
-                                  phys_ptr, attrs, prot, page_size,
-                                  fi, cacheattrs);
-    } else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
-        return get_phys_addr_v6(env, address, access_type, mmu_idx,
-                                phys_ptr, attrs, prot, page_size, fi);
-    } else {
-        return get_phys_addr_v5(env, address, access_type, mmu_idx,
-                                    phys_ptr, prot, page_size, fi);
-    }
-}
-
 hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
                                          MemTxAttrs *attrs)
 {
@@ -XXX,XX +XXX,XX @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
     }
     return phys_addr;
 }
-
 #endif
 
 /* Note that signed overflow is undefined in C.  The following routines are
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM page table walking.
+ *
+ * This code is licensed under the GNU GPL v2 or later.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "cpu.h"
+#include "internals.h"
+#include "ptw.h"
+
+
+/**
+ * get_phys_addr - get the physical address for this virtual address
+ *
+ * Find the physical address corresponding to the given virtual address,
+ * by doing a translation table walk on MMU based systems or using the
+ * MPU state on MPU based systems.
+ *
+ * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
+ * prot and page_size may not be filled in, and the populated fsr value provides
+ * information on why the translation aborted, in the format of a
+ * DFSR/IFSR fault register, with the following caveats:
+ *  * we honour the short vs long DFSR format differences.
+ *  * the WnR bit is never set (the caller must do this).
+ *  * for PMSAv5 based systems we don't bother to return a full FSR format
+ *    value.
+ *
+ * @env: CPUARMState
+ * @address: virtual address to get physical address for
+ * @access_type: 0 for read, 1 for write, 2 for execute
+ * @mmu_idx: MMU index indicating required translation regime
+ * @phys_ptr: set to the physical address corresponding to the virtual address
+ * @attrs: set to the memory transaction attributes to use
+ * @prot: set to the permissions for the page containing phys_ptr
+ * @page_size: set to the size of the page containing phys_ptr
+ * @fi: set to fault info if the translation fails
+ * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
+ */
+bool get_phys_addr(CPUARMState *env, target_ulong address,
+                   MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                   hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
+                   target_ulong *page_size,
+                   ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
+{
+    ARMMMUIdx s1_mmu_idx = stage_1_mmu_idx(mmu_idx);
+
+    if (mmu_idx != s1_mmu_idx) {
+        /*
+         * Call ourselves recursively to do the stage 1 and then stage 2
+         * translations if mmu_idx is a two-stage regime.
+         */
+        if (arm_feature(env, ARM_FEATURE_EL2)) {
+            hwaddr ipa;
+            int s2_prot;
+            int ret;
+            bool ipa_secure;
+            ARMCacheAttrs cacheattrs2 = {};
+            ARMMMUIdx s2_mmu_idx;
+            bool is_el0;
+
+            ret = get_phys_addr(env, address, access_type, s1_mmu_idx, &ipa,
+                                attrs, prot, page_size, fi, cacheattrs);
+
+            /* If S1 fails or S2 is disabled, return early.  */
+            if (ret || regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
+                *phys_ptr = ipa;
+                return ret;
+            }
+
+            ipa_secure = attrs->secure;
+            if (arm_is_secure_below_el3(env)) {
+                if (ipa_secure) {
+                    attrs->secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
+                } else {
+                    attrs->secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
+                }
+            } else {
+                assert(!ipa_secure);
+            }
+
+            s2_mmu_idx = attrs->secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
+            is_el0 = mmu_idx == ARMMMUIdx_E10_0 || mmu_idx == ARMMMUIdx_SE10_0;
+
+            /* S1 is done. Now do S2 translation.  */
+            ret = get_phys_addr_lpae(env, ipa, access_type, s2_mmu_idx, is_el0,
+                                     phys_ptr, attrs, &s2_prot,
+                                     page_size, fi, &cacheattrs2);
+            fi->s2addr = ipa;
+            /* Combine the S1 and S2 perms.  */
+            *prot &= s2_prot;
+
+            /* If S2 fails, return early.  */
+            if (ret) {
+                return ret;
+            }
+
+            /* Combine the S1 and S2 cache attributes. */
+            if (arm_hcr_el2_eff(env) & HCR_DC) {
+                /*
+                 * HCR.DC forces the first stage attributes to
+                 *  Normal Non-Shareable,
+                 *  Inner Write-Back Read-Allocate Write-Allocate,
+                 *  Outer Write-Back Read-Allocate Write-Allocate.
+                 * Do not overwrite Tagged within attrs.
+                 */
+                if (cacheattrs->attrs != 0xf0) {
+                    cacheattrs->attrs = 0xff;
+                }
+                cacheattrs->shareability = 0;
+            }
+            *cacheattrs = combine_cacheattrs(env, *cacheattrs, cacheattrs2);
+
+            /* Check if IPA translates to secure or non-secure PA space. */
+            if (arm_is_secure_below_el3(env)) {
+                if (ipa_secure) {
+                    attrs->secure =
+                        !(env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | VSTCR_SW));
+                } else {
+                    attrs->secure =
+                        !((env->cp15.vtcr_el2.raw_tcr & (VTCR_NSA | VTCR_NSW))
+                        || (env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | VSTCR_SW)));
+                }
+            }
+            return 0;
+        } else {
+            /*
+             * For non-EL2 CPUs a stage1+stage2 translation is just stage 1.
+             */
+            mmu_idx = stage_1_mmu_idx(mmu_idx);
+        }
+    }
+
+    /*
+     * The page table entries may downgrade secure to non-secure, but
+     * cannot upgrade a non-secure translation regime's attributes
+     * to secure.
+     */
+    attrs->secure = regime_is_secure(env, mmu_idx);
+    attrs->user = regime_is_user(env, mmu_idx);
+
+    /*
+     * Fast Context Switch Extension. This doesn't exist at all in v8.
+     * In v7 and earlier it affects all stage 1 translations.
+     */
+    if (address < 0x02000000 && mmu_idx != ARMMMUIdx_Stage2
+        && !arm_feature(env, ARM_FEATURE_V8)) {
+        if (regime_el(env, mmu_idx) == 3) {
+            address += env->cp15.fcseidr_s;
+        } else {
+            address += env->cp15.fcseidr_ns;
+        }
+    }
+
+    if (arm_feature(env, ARM_FEATURE_PMSA)) {
+        bool ret;
+        *page_size = TARGET_PAGE_SIZE;
+
+        if (arm_feature(env, ARM_FEATURE_V8)) {
+            /* PMSAv8 */
+            ret = get_phys_addr_pmsav8(env, address, access_type, mmu_idx,
+                                       phys_ptr, attrs, prot, page_size, fi);
+        } else if (arm_feature(env, ARM_FEATURE_V7)) {
+            /* PMSAv7 */
+            ret = get_phys_addr_pmsav7(env, address, access_type, mmu_idx,
+                                       phys_ptr, prot, page_size, fi);
+        } else {
+            /* Pre-v7 MPU */
+            ret = get_phys_addr_pmsav5(env, address, access_type, mmu_idx,
+                                       phys_ptr, prot, fi);
+        }
+        qemu_log_mask(CPU_LOG_MMU, "PMSA MPU lookup for %s at 0x%08" PRIx32
+                      " mmu_idx %u -> %s (prot %c%c%c)\n",
+                      access_type == MMU_DATA_LOAD ? "reading" :
+                      (access_type == MMU_DATA_STORE ? "writing" : "execute"),
+                      (uint32_t)address, mmu_idx,
+                      ret ? "Miss" : "Hit",
+                      *prot & PAGE_READ ? 'r' : '-',
+                      *prot & PAGE_WRITE ? 'w' : '-',
+                      *prot & PAGE_EXEC ? 'x' : '-');
+
+        return ret;
+    }
+
+    /* Definitely a real MMU, not an MPU */
+
+    if (regime_translation_disabled(env, mmu_idx)) {
+        uint64_t hcr;
+        uint8_t memattr;
+
+        /*
+         * MMU disabled.  S1 addresses within aa64 translation regimes are
+         * still checked for bounds -- see AArch64.TranslateAddressS1Off.
+         */
+        if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
+            int r_el = regime_el(env, mmu_idx);
+            if (arm_el_is_aa64(env, r_el)) {
+                int pamax = arm_pamax(env_archcpu(env));
+                uint64_t tcr = env->cp15.tcr_el[r_el].raw_tcr;
+                int addrtop, tbi;
+
+                tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
+                if (access_type == MMU_INST_FETCH) {
+                    tbi &= ~aa64_va_parameter_tbid(tcr, mmu_idx);
+                }
+                tbi = (tbi >> extract64(address, 55, 1)) & 1;
+                addrtop = (tbi ? 55 : 63);
+
+                if (extract64(address, pamax, addrtop - pamax + 1) != 0) {
+                    fi->type = ARMFault_AddressSize;
+                    fi->level = 0;
+                    fi->stage2 = false;
+                    return 1;
+                }
+
+                /*
+                 * When TBI is disabled, we've just validated that all of the
+                 * bits above PAMax are zero, so logically we only need to
+                 * clear the top byte for TBI.  But it's clearer to follow
+                 * the pseudocode set of addrdesc.paddress.
+                 */
+                address = extract64(address, 0, 52);
+            }
+        }
+        *phys_ptr = address;
+        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+        *page_size = TARGET_PAGE_SIZE;
+
+        /* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
+        hcr = arm_hcr_el2_eff(env);
+        cacheattrs->shareability = 0;
+        cacheattrs->is_s2_format = false;
+        if (hcr & HCR_DC) {
+            if (hcr & HCR_DCT) {
+                memattr = 0xf0;  /* Tagged, Normal, WB, RWA */
+            } else {
+                memattr = 0xff;  /* Normal, WB, RWA */
+            }
+        } else if (access_type == MMU_INST_FETCH) {
+            if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
+                memattr = 0xee;  /* Normal, WT, RA, NT */
+            } else {
+                memattr = 0x44;  /* Normal, NC, No */
+            }
+            cacheattrs->shareability = 2; /* outer shareable */
+        } else {
+            memattr = 0x00;      /* Device, nGnRnE */
+        }
+        cacheattrs->attrs = memattr;
+        return 0;
+    }
+
+    if (regime_using_lpae_format(env, mmu_idx)) {
+        return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
+                                  phys_ptr, attrs, prot, page_size,
+                                  fi, cacheattrs);
+    } else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
+        return get_phys_addr_v6(env, address, access_type, mmu_idx,
+                                phys_ptr, attrs, prot, page_size, fi);
+    } else {
+        return get_phys_addr_v5(env, address, access_type, mmu_idx,
+                                phys_ptr, prot, page_size, fi);
+    }
+}
diff --git a/target/arm/meson.build b/target/arm/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -XXX,XX +XXX,XX @@ arm_softmmu_ss.add(files(
   'machine.c',
   'monitor.c',
   'psci.c',
+  'ptw.c',
 ))
 
 subdir('hvf')
-- 
2.25.1
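
Reviewer note: the "MMU disabled" branch of get_phys_addr() above picks a
default memory attribute byte from HCR_EL2.DC, HCR_EL2.DCT, the access type
and SCTLR.I. The following is a minimal standalone sketch of that selection
only, assuming the four inputs have already been reduced to booleans. The
S1OffAttrs type and the s1_off_attrs() name are hypothetical and are not
part of QEMU; the attribute values are the ones used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a QEMU structure. */
typedef struct {
    uint8_t attrs;        /* MAIR-style attribute byte */
    uint8_t shareability; /* 0 = non-shareable, 2 = outer shareable */
} S1OffAttrs;

/*
 * Default attributes when stage 1 translation is disabled, mirroring
 * the if/else chain in the patch (assumed inputs, see note above).
 */
static S1OffAttrs s1_off_attrs(bool hcr_dc, bool hcr_dct,
                               bool is_inst_fetch, bool sctlr_i)
{
    /* Data access without HCR.DC: Device-nGnRnE. */
    S1OffAttrs a = { .attrs = 0x00, .shareability = 0 };

    if (hcr_dc) {
        /* Default cacheable: Normal WB RWA, tagged if DCT is set. */
        a.attrs = hcr_dct ? 0xf0 : 0xff;
    } else if (is_inst_fetch) {
        /* Instruction fetch: cacheability follows SCTLR.I. */
        a.attrs = sctlr_i ? 0xee : 0x44;
        a.shareability = 2;
    }
    return a;
}
```

Note that HCR.DC takes priority over the instruction-fetch case, so an
instruction fetch with DC set keeps shareability 0.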

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  15 +++--
 target/arm/helper.c | 137 +++-----------------------------------------
 target/arm/ptw.c    | 123 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 140 insertions(+), 135 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  11 +--
 target/arm/helper.c | 161 +-------------------------------------------
 target/arm/ptw.c    | 153 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+), 164 deletions(-)
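
Reviewer note: this patch moves get_phys_addr_v6() unchanged. For readers
following the move, here is a minimal standalone sketch of how the function
forms the output address for a first-level section or supersection
descriptor. The SectionMapping type and the bits32() and decode_l1_section()
names are hypothetical helpers for illustration, not QEMU APIs; bits32() is
assumed to behave like QEMU's extract32() for lengths 1 to 32. Fault
checks, domains and permissions are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a QEMU structure. */
typedef struct {
    uint64_t phys_addr;
    uint32_t page_size;
    bool is_supersection;
} SectionMapping;

/* Extract 'length' bits of 'value' starting at bit 'start' (1 <= length <= 32). */
static uint32_t bits32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* Decode a level 1 descriptor already known to be a (super)section. */
static SectionMapping decode_l1_section(uint32_t desc, uint32_t address)
{
    SectionMapping m;

    if (desc & (1u << 18)) {
        /* Supersection: 16MB, with PA[39:32] taken from the descriptor. */
        m.phys_addr = (desc & 0xff000000u) | (address & 0x00ffffffu);
        m.phys_addr |= (uint64_t)bits32(desc, 20, 4) << 32;
        m.phys_addr |= (uint64_t)bits32(desc, 5, 4) << 36;
        m.page_size = 0x1000000;
        m.is_supersection = true;
    } else {
        /* Section: 1MB, 32-bit output address. */
        m.phys_addr = (desc & 0xfff00000u) | (address & 0x000fffffu);
        m.page_size = 0x100000;
        m.is_supersection = false;
    }
    return m;
}
```

For example, a section descriptor 0x12300002 with virtual address
0xabc45678 yields physical address 0x12345678.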

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@ bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
                               uint32_t *table, uint32_t address);
 int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
                   int ap, int domain_prot);
+int simple_ap_to_rw_prot_is_user(int ap, bool is_user);
+
+static inline int
+simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+{
+    return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
+}
 
 bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
                           MMUAccessType access_type, ARMMMUIdx mmu_idx,
                           hwaddr *phys_ptr, int *prot,
                           ARMMMUFaultInfo *fi);
-bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
-                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                      hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
-                      target_ulong *page_size, ARMMMUFaultInfo *fi);
 bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
                           MMUAccessType access_type, ARMMMUIdx mmu_idx,
                           hwaddr *phys_ptr, int *prot,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap, int domain_prot)
  * @ap:      The 2-bit simple AP (AP[2:1])
  * @is_user: TRUE if accessing from PL0
  */
-static inline int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
+int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
 {
     switch (ap) {
     case 0:
@@ -XXX,XX +XXX,XX @@ static inline int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
     }
 }
 
-static inline int
-simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
-{
-    return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
-}
-
 /* Translate S2 section/page access permissions to protection flags
  *
  * @env:     CPUARMState
@@ -XXX,XX +XXX,XX @@ uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
     return 0;
 }
 
-bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
-                      MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                      hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
-                      target_ulong *page_size, ARMMMUFaultInfo *fi)
-{
-    CPUState *cs = env_cpu(env);
-    ARMCPU *cpu = env_archcpu(env);
-    int level = 1;
-    uint32_t table;
-    uint32_t desc;
-    uint32_t xn;
-    uint32_t pxn = 0;
-    int type;
-    int ap;
-    int domain = 0;
-    int domain_prot;
-    hwaddr phys_addr;
-    uint32_t dacr;
-    bool ns;
-
-    /* Pagetable walk.  */
-    /* Lookup l1 descriptor.  */
-    if (!get_level1_table_address(env, mmu_idx, &table, address)) {
-        /* Section translation fault if page walk is disabled by PD0 or PD1 */
-        fi->type = ARMFault_Translation;
-        goto do_fault;
-    }
-    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
-                       mmu_idx, fi);
-    if (fi->type != ARMFault_None) {
-        goto do_fault;
-    }
-    type = (desc & 3);
-    if (type == 0 || (type == 3 && !cpu_isar_feature(aa32_pxn, cpu))) {
-        /* Section translation fault, or attempt to use the encoding
-         * which is Reserved on implementations without PXN.
-         */
-        fi->type = ARMFault_Translation;
-        goto do_fault;
-    }
-    if ((type == 1) || !(desc & (1 << 18))) {
-        /* Page or Section.  */
-        domain = (desc >> 5) & 0x0f;
-    }
-    if (regime_el(env, mmu_idx) == 1) {
-        dacr = env->cp15.dacr_ns;
-    } else {
-        dacr = env->cp15.dacr_s;
-    }
-    if (type == 1) {
-        level = 2;
-    }
-    domain_prot = (dacr >> (domain * 2)) & 3;
-    if (domain_prot == 0 || domain_prot == 2) {
-        /* Section or Page domain fault */
-        fi->type = ARMFault_Domain;
-        goto do_fault;
-    }
-    if (type != 1) {
-        if (desc & (1 << 18)) {
-            /* Supersection.  */
-            phys_addr = (desc & 0xff000000) | (address & 0x00ffffff);
-            phys_addr |= (uint64_t)extract32(desc, 20, 4) << 32;
-            phys_addr |= (uint64_t)extract32(desc, 5, 4) << 36;
-            *page_size = 0x1000000;
-        } else {
-            /* Section.  */
-            phys_addr = (desc & 0xfff00000) | (address & 0x000fffff);
-            *page_size = 0x100000;
-        }
-        ap = ((desc >> 10) & 3) | ((desc >> 13) & 4);
-        xn = desc & (1 << 4);
-        pxn = desc & 1;
-        ns = extract32(desc, 19, 1);
-    } else {
-        if (cpu_isar_feature(aa32_pxn, cpu)) {
-            pxn = (desc >> 2) & 1;
-        }
-        ns = extract32(desc, 3, 1);
-        /* Lookup l2 entry.  */
-        table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
-        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
-                           mmu_idx, fi);
-        if (fi->type != ARMFault_None) {
-            goto do_fault;
-        }
-        ap = ((desc >> 4) & 3) | ((desc >> 7) & 4);
-        switch (desc & 3) {
-        case 0: /* Page translation fault.  */
-            fi->type = ARMFault_Translation;
-            goto do_fault;
-        case 1: /* 64k page.  */
-            phys_addr = (desc & 0xffff0000) | (address & 0xffff);
-            xn = desc & (1 << 15);
-            *page_size = 0x10000;
-            break;
-        case 2: case 3: /* 4k page.  */
-            phys_addr = (desc & 0xfffff000) | (address & 0xfff);
-            xn = desc & 1;
-            *page_size = 0x1000;
-            break;
-        default:
-            /* Never happens, but compiler isn't smart enough to tell.  */
-            g_assert_not_reached();
-        }
-    }
-    if (domain_prot == 3) {
-        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
-    } else {
-        if (pxn && !regime_is_user(env, mmu_idx)) {
-            xn = 1;
-        }
-        if (xn && access_type == MMU_INST_FETCH) {
-            fi->type = ARMFault_Permission;
-            goto do_fault;
-        }
-
-        if (arm_feature(env, ARM_FEATURE_V6K) &&
-                (regime_sctlr(env, mmu_idx) & SCTLR_AFE)) {
-            /* The simplified model uses AP[0] as an access control bit.  */
-            if ((ap & 1) == 0) {
-                /* Access flag fault.  */
-                fi->type = ARMFault_AccessFlag;
-                goto do_fault;
-            }
-            *prot = simple_ap_to_rw_prot(env, mmu_idx, ap >> 1);
-        } else {
-            *prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
-        }
-        if (*prot && !xn) {
-            *prot |= PAGE_EXEC;
-        }
-        if (!(*prot & (1 << access_type))) {
-            /* Access permission fault.  */
-            fi->type = ARMFault_Permission;
-            goto do_fault;
-        }
-    }
-    if (ns) {
-        /* The NS bit will (as required by the architecture) have no effect if
-         * the CPU doesn't support TZ or this is a non-secure translation
-         * regime, because the attribute will already be non-secure.
-         */
-        attrs->secure = false;
-    }
-    *phys_ptr = phys_addr;
-    return false;
-do_fault:
-    fi->domain = domain;
-    fi->level = level;
-    return true;
-}
-
 /*
  * check_s2_mmu_setup
  * @cpu:        ARMCPU
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ do_fault:
     return true;
 }
 
+static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
+                             MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                             hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
+                             target_ulong *page_size, ARMMMUFaultInfo *fi)
+{
+    CPUState *cs = env_cpu(env);
+    ARMCPU *cpu = env_archcpu(env);
+    int level = 1;
+    uint32_t table;
+    uint32_t desc;
+    uint32_t xn;
+    uint32_t pxn = 0;
+    int type;
+    int ap;
+    int domain = 0;
+    int domain_prot;
+    hwaddr phys_addr;
+    uint32_t dacr;
+    bool ns;
+
+    /* Pagetable walk.  */
+    /* Lookup l1 descriptor.  */
+    if (!get_level1_table_address(env, mmu_idx, &table, address)) {
+        /* Section translation fault if page walk is disabled by PD0 or PD1 */
+        fi->type = ARMFault_Translation;
+        goto do_fault;
+    }
+    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+                       mmu_idx, fi);
+    if (fi->type != ARMFault_None) {
+        goto do_fault;
+    }
+    type = (desc & 3);
+    if (type == 0 || (type == 3 && !cpu_isar_feature(aa32_pxn, cpu))) {
+        /* Section translation fault, or attempt to use the encoding
+         * which is Reserved on implementations without PXN.
+         */
+        fi->type = ARMFault_Translation;
+        goto do_fault;
+    }
+    if ((type == 1) || !(desc & (1 << 18))) {
+        /* Page or Section.  */
+        domain = (desc >> 5) & 0x0f;
+    }
+    if (regime_el(env, mmu_idx) == 1) {
+        dacr = env->cp15.dacr_ns;
+    } else {
+        dacr = env->cp15.dacr_s;
+    }
+    if (type == 1) {
+        level = 2;
+    }
+    domain_prot = (dacr >> (domain * 2)) & 3;
+    if (domain_prot == 0 || domain_prot == 2) {
+        /* Section or Page domain fault */
+        fi->type = ARMFault_Domain;
+        goto do_fault;
+    }
+    if (type != 1) {
+        if (desc & (1 << 18)) {
+            /* Supersection.  */
+            phys_addr = (desc & 0xff000000) | (address & 0x00ffffff);
+            phys_addr |= (uint64_t)extract32(desc, 20, 4) << 32;
+            phys_addr |= (uint64_t)extract32(desc, 5, 4) << 36;
+            *page_size = 0x1000000;
+        } else {
+            /* Section.  */
+            phys_addr = (desc & 0xfff00000) | (address & 0x000fffff);
+            *page_size = 0x100000;
+        }
+        ap = ((desc >> 10) & 3) | ((desc >> 13) & 4);
+        xn = desc & (1 << 4);
+        pxn = desc & 1;
+        ns = extract32(desc, 19, 1);
+    } else {
+        if (cpu_isar_feature(aa32_pxn, cpu)) {
+            pxn = (desc >> 2) & 1;
+        }
+        ns = extract32(desc, 3, 1);
+        /* Lookup l2 entry.  */
+        table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
+        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+                           mmu_idx, fi);
+        if (fi->type != ARMFault_None) {
+            goto do_fault;
+        }
+        ap = ((desc >> 4) & 3) | ((desc >> 7) & 4);
+        switch (desc & 3) {
+        case 0: /* Page translation fault.  */
+            fi->type = ARMFault_Translation;
+            goto do_fault;
+        case 1: /* 64k page.  */
+            phys_addr = (desc & 0xffff0000) | (address & 0xffff);
+            xn = desc & (1 << 15);
+            *page_size = 0x10000;
+            break;
+        case 2: case 3: /* 4k page.  */
+            phys_addr = (desc & 0xfffff000) | (address & 0xfff);
+            xn = desc & 1;
+            *page_size = 0x1000;
+            break;
+        default:
+            /* Never happens, but compiler isn't smart enough to tell.  */
+            g_assert_not_reached();
+        }
+    }
+    if (domain_prot == 3) {
+        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+    } else {
+        if (pxn && !regime_is_user(env, mmu_idx)) {
+            xn = 1;
+        }
+        if (xn && access_type == MMU_INST_FETCH) {
+            fi->type = ARMFault_Permission;
+            goto do_fault;
+        }
+
+        if (arm_feature(env, ARM_FEATURE_V6K) &&
+                (regime_sctlr(env, mmu_idx) & SCTLR_AFE)) {
+            /* The simplified model uses AP[0] as an access control bit.  */
+            if ((ap & 1) == 0) {
+                /* Access flag fault.  */
+                fi->type = ARMFault_AccessFlag;
+                goto do_fault;
+            }
+            *prot = simple_ap_to_rw_prot(env, mmu_idx, ap >> 1);
+        } else {
+            *prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
+        }
+        if (*prot && !xn) {
+            *prot |= PAGE_EXEC;
+        }
+        if (!(*prot & (1 << access_type))) {
+            /* Access permission fault.  */
+            fi->type = ARMFault_Permission;
+            goto do_fault;
+        }
+    }
+    if (ns) {
+        /* The NS bit will (as required by the architecture) have no effect if
+         * the CPU doesn't support TZ or this is a non-secure translation
+         * regime, because the attribute will already be non-secure.
+         */
+        attrs->secure = false;
+    }
+    *phys_ptr = phys_addr;
+    return false;
+do_fault:
+    fi->domain = domain;
+    fi->level = level;
+    return true;
+}
+
 /**
  * get_phys_addr - get the physical address for this virtual address
  *
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  4 ---
 target/arm/helper.c | 85 ---------------------------------------------
 target/arm/ptw.c    | 85 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+), 89 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  3 +++
 target/arm/helper.c | 41 -----------------------------------------
 target/arm/ptw.c    | 41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 41 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  10 +--
 target/arm/helper.c | 194 +-------------------------------------------
 target/arm/ptw.c    | 190 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 198 insertions(+), 196 deletions(-)
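
Reviewer note: this patch moves get_phys_addr_pmsav7() unchanged. The
region search in that function can be summarised by the following minimal
standalone sketch for a single region. The RegionResult enum and the
pmsav7_match() name are hypothetical and not part of QEMU. The sketch
assumes the DRSR layout used by the moved code (enable in bit 0, Rsize in
bits [5:1], subregion disable bits in [15:8]) and omits the page_size
adjustments and the access permission decoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of testing one MPU region; not a QEMU type. */
typedef enum {
    REGION_DISABLED,           /* enable bit clear */
    REGION_INVALID,            /* Rsize == 0 or misaligned base */
    REGION_MISS,               /* address outside the region */
    REGION_SUBREGION_DISABLED, /* inside, but the subregion is disabled */
    REGION_HIT
} RegionResult;

static RegionResult pmsav7_match(uint32_t drbar, uint32_t drsr,
                                 uint32_t address)
{
    uint32_t rsize = (drsr >> 1) & 0x1f;
    uint32_t rmask;

    if (!(drsr & 1)) {
        return REGION_DISABLED;
    }
    if (rsize == 0) {
        return REGION_INVALID;
    }
    rsize++; /* region covers 2^rsize bytes */
    rmask = (uint32_t)((1ull << rsize) - 1);

    if (drbar & rmask) {
        return REGION_INVALID; /* base not aligned to region size */
    }
    if (address < drbar || address > drbar + rmask) {
        return REGION_MISS;
    }
    if (rsize >= 8) {
        /* Regions of 256 bytes or more have 8 equal subregions. */
        uint32_t snd = ((address - drbar) >> (rsize - 3)) & 0x7;
        if ((drsr >> (snd + 8)) & 1) {
            return REGION_SUBREGION_DISABLED;
        }
    }
    return REGION_HIT;
}
```

In the moved function, every outcome other than a hit makes the loop
continue with the next lower-numbered region, and the background region is
considered only when no region hits.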

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
     return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
 }
 
+bool m_is_ppb_region(CPUARMState *env, uint32_t address);
+bool m_is_system_region(CPUARMState *env, uint32_t address);
+
 void get_phys_addr_pmsav7_default(CPUARMState *env,
                                   ARMMMUIdx mmu_idx,
                                   int32_t address, int *prot);
-bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
-                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                          hwaddr *phys_ptr, int *prot,
-                          target_ulong *page_size,
-                          ARMMMUFaultInfo *fi);
+bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user);
+
 bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
                           MMUAccessType access_type, ARMMMUIdx mmu_idx,
                           hwaddr *phys_ptr, MemTxAttrs *txattrs,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ do_fault:
     return true;
 }
 
-static bool pmsav7_use_background_region(ARMCPU *cpu,
-                                         ARMMMUIdx mmu_idx, bool is_user)
+bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user)
 {
     /* Return true if we should use the default memory map as a
      * "background" region if there are no hits against any MPU regions.
@@ -XXX,XX +XXX,XX @@ static bool pmsav7_use_background_region(ARMCPU *cpu,
     }
 }
 
-static inline bool m_is_ppb_region(CPUARMState *env, uint32_t address)
+bool m_is_ppb_region(CPUARMState *env, uint32_t address)
 {
     /* True if address is in the M profile PPB region 0xe0000000 - 0xe00fffff */
     return arm_feature(env, ARM_FEATURE_M) &&
         extract32(address, 20, 12) == 0xe00;
 }
 
-static inline bool m_is_system_region(CPUARMState *env, uint32_t address)
+bool m_is_system_region(CPUARMState *env, uint32_t address)
 {
     /* True if address is in the M profile system region
      * 0xe0000000 - 0xffffffff
@@ -XXX,XX +XXX,XX @@ static inline bool m_is_system_region(CPUARMState *env, uint32_t address)
     return arm_feature(env, ARM_FEATURE_M) && extract32(address, 29, 3) == 0x7;
 }
 
-bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
-                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                          hwaddr *phys_ptr, int *prot,
-                          target_ulong *page_size,
-                          ARMMMUFaultInfo *fi)
-{
-    ARMCPU *cpu = env_archcpu(env);
-    int n;
-    bool is_user = regime_is_user(env, mmu_idx);
-
-    *phys_ptr = address;
-    *page_size = TARGET_PAGE_SIZE;
-    *prot = 0;
-
-    if (regime_translation_disabled(env, mmu_idx) ||
-        m_is_ppb_region(env, address)) {
-        /* MPU disabled or M profile PPB access: use default memory map.
-         * The other case which uses the default memory map in the
-         * v7M ARM ARM pseudocode is exception vector reads from the vector
-         * table. In QEMU those accesses are done in arm_v7m_load_vector(),
-         * which always does a direct read using address_space_ldl(), rather
-         * than going via this function, so we don't need to check that here.
-         */
-        get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
-    } else { /* MPU enabled */
-        for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
-            /* region search */
-            uint32_t base = env->pmsav7.drbar[n];
-            uint32_t rsize = extract32(env->pmsav7.drsr[n], 1, 5);
-            uint32_t rmask;
-            bool srdis = false;
-
-            if (!(env->pmsav7.drsr[n] & 0x1)) {
-                continue;
-            }
-
-            if (!rsize) {
-                qemu_log_mask(LOG_GUEST_ERROR,
-                              "DRSR[%d]: Rsize field cannot be 0\n", n);
-                continue;
-            }
-            rsize++;
-            rmask = (1ull << rsize) - 1;
-
-            if (base & rmask) {
-                qemu_log_mask(LOG_GUEST_ERROR,
-                              "DRBAR[%d]: 0x%" PRIx32 " misaligned "
-                              "to DRSR region size, mask = 0x%" PRIx32 "\n",
-                              n, base, rmask);
-                continue;
-            }
-
-            if (address < base || address > base + rmask) {
-                /*
-                 * Address not in this region. We must check whether the
-                 * region covers addresses in the same page as our address.
-                 * In that case we must not report a size that covers the
-                 * whole page for a subsequent hit against a different MPU
-                 * region or the background region, because it would result in
-                 * incorrect TLB hits for subsequent accesses to addresses that
-                 * are in this MPU region.
-                 */
-                if (ranges_overlap(base, rmask,
-                                   address & TARGET_PAGE_MASK,
-                                   TARGET_PAGE_SIZE)) {
-                    *page_size = 1;
-                }
-                continue;
-            }
-
-            /* Region matched */
-
-            if (rsize >= 8) { /* no subregions for regions < 256 bytes */
-                int i, snd;
-                uint32_t srdis_mask;
-
-                rsize -= 3; /* sub region size (power of 2) */
-                snd = ((address - base) >> rsize) & 0x7;
-                srdis = extract32(env->pmsav7.drsr[n], snd + 8, 1);
-
-                srdis_mask = srdis ? 0x3 : 0x0;
-                for (i = 2; i <= 8 && rsize < TARGET_PAGE_BITS; i *= 2) {
-                    /* This will check in groups of 2, 4 and then 8, whether
-                     * the subregion bits are consistent. rsize is incremented
-                     * back up to give the region size, considering consistent
-                     * adjacent subregions as one region. Stop testing if rsize
-                     * is already big enough for an entire QEMU page.
-                     */
-                    int snd_rounded = snd & ~(i - 1);
-                    uint32_t srdis_multi = extract32(env->pmsav7.drsr[n],
-                                                     snd_rounded + 8, i);
-                    if (srdis_mask ^ srdis_multi) {
-                        break;
-                    }
-                    srdis_mask = (srdis_mask << i) | srdis_mask;
-                    rsize++;
-                }
-            }
-            if (srdis) {
-                continue;
-            }
-            if (rsize < TARGET_PAGE_BITS) {
-                *page_size = 1 << rsize;
-            }
-            break;
-        }
-
-        if (n == -1) { /* no hits */
-            if (!pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
-                /* background fault */
-                fi->type = ARMFault_Background;
-                return true;
-            }
-            get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
-        } else { /* a MPU hit! */
-            uint32_t ap = extract32(env->pmsav7.dracr[n], 8, 3);
-            uint32_t xn = extract32(env->pmsav7.dracr[n], 12, 1);
-
-            if (m_is_system_region(env, address)) {
-                /* System space is always execute never */
-                xn = 1;
-            }
-
-            if (is_user) { /* User mode AP bit decoding */
-                switch (ap) {
-                case 0:
-                case 1:
-                case 5:
-                    break; /* no access */
-                case 3:
-                    *prot |= PAGE_WRITE;
-                    /* fall through */
-                case 2:
-                case 6:
-                    *prot |= PAGE_READ | PAGE_EXEC;
-                    break;
-                case 7:
-                    /* for v7M, same as 6; for R profile a reserved value */
-                    if (arm_feature(env, ARM_FEATURE_M)) {
-                        *prot |= PAGE_READ | PAGE_EXEC;
-                        break;
-                    }
-                    /* fall through */
-                default:
-                    qemu_log_mask(LOG_GUEST_ERROR,
-                                  "DRACR[%d]: Bad value for AP bits: 0x%"
-                                  PRIx32 "\n", n, ap);
-                }
-            } else { /* Priv. mode AP bits decoding */
-                switch (ap) {
-                case 0:
-                    break; /* no access */
-                case 1:
-                case 2:
-                case 3:
-                    *prot |= PAGE_WRITE;
-                    /* fall through */
-                case 5:
-                case 6:
-                    *prot |= PAGE_READ | PAGE_EXEC;
-                    break;
-                case 7:
-                    /* for v7M, same as 6; for R profile a reserved value */
-                    if (arm_feature(env, ARM_FEATURE_M)) {
-                        *prot |= PAGE_READ | PAGE_EXEC;
-                        break;
-                    }
-                    /* fall through */
-                default:
-                    qemu_log_mask(LOG_GUEST_ERROR,
-                                  "DRACR[%d]: Bad value for AP bits: 0x%"
-                                  PRIx32 "\n", n, ap);
-                }
-            }
-
-            /* execute never */
-            if (xn) {
-                *prot &= ~PAGE_EXEC;
-            }
-        }
-    }
-
-    fi->type = ARMFault_Permission;
-    fi->level = 1;
-    return !(*prot & (1 << access_type));
-}
-
 static bool v8m_is_sau_exempt(CPUARMState *env,
                               uint32_t address, MMUAccessType access_type)
 {
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qemu/range.h"
 #include "cpu.h"
 #include "internals.h"
 #include "ptw.h"
@@ -XXX,XX +XXX,XX @@ void get_phys_addr_pmsav7_default(CPUARMState *env,
     }
 }
 
+static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
+                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                                 hwaddr *phys_ptr, int *prot,
+                                 target_ulong *page_size,
+                                 ARMMMUFaultInfo *fi)
+{
+    ARMCPU *cpu = env_archcpu(env);
+    int n;
+    bool is_user = regime_is_user(env, mmu_idx);
+
+    *phys_ptr = address;
+    *page_size = TARGET_PAGE_SIZE;
+    *prot = 0;
+
+    if (regime_translation_disabled(env, mmu_idx) ||
+        m_is_ppb_region(env, address)) {
+        /*
+         * MPU disabled or M profile PPB access: use default memory map.
+         * The other case which uses the default memory map in the
+         * v7M ARM ARM pseudocode is exception vector reads from the vector
+         * table. In QEMU those accesses are done in arm_v7m_load_vector(),
+         * which always does a direct read using address_space_ldl(), rather
+         * than going via this function, so we don't need to check that here.
+         */
+        get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
+    } else { /* MPU enabled */
+        for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
+            /* region search */
+            uint32_t base = env->pmsav7.drbar[n];
+            uint32_t rsize = extract32(env->pmsav7.drsr[n], 1, 5);
+            uint32_t rmask;
+            bool srdis = false;
+
+            if (!(env->pmsav7.drsr[n] & 0x1)) {
+                continue;
+            }
+
+            if (!rsize) {
+                qemu_log_mask(LOG_GUEST_ERROR,
+                              "DRSR[%d]: Rsize field cannot be 0\n", n);
+                continue;
+            }
+            rsize++;
+            rmask = (1ull << rsize) - 1;
+
+            if (base & rmask) {
+                qemu_log_mask(LOG_GUEST_ERROR,
+                              "DRBAR[%d]: 0x%" PRIx32 " misaligned "
+                              "to DRSR region size, mask = 0x%" PRIx32 "\n",
+                              n, base, rmask);
+                continue;
+            }
+
+            if (address < base || address > base + rmask) {
+                /*
+                 * Address not in this region. We must check whether the
+                 * region covers addresses in the same page as our address.
+                 * In that case we must not report a size that covers the
+                 * whole page for a subsequent hit against a different MPU
+                 * region or the background region, because it would result in
+                 * incorrect TLB hits for subsequent accesses to addresses that
+                 * are in this MPU region.
+                 */
+                if (ranges_overlap(base, rmask,
+                                   address & TARGET_PAGE_MASK,
+                                   TARGET_PAGE_SIZE)) {
+                    *page_size = 1;
+                }
+                continue;
+            }
+
+            /* Region matched */
+
+            if (rsize >= 8) { /* no subregions for regions < 256 bytes */
+                int i, snd;
+                uint32_t srdis_mask;
+
+                rsize -= 3; /* sub region size (power of 2) */
+                snd = ((address - base) >> rsize) & 0x7;
+                srdis = extract32(env->pmsav7.drsr[n], snd + 8, 1);
+
+                srdis_mask = srdis ? 0x3 : 0x0;
+                for (i = 2; i <= 8 && rsize < TARGET_PAGE_BITS; i *= 2) {
+                    /*
+                     * This will check in groups of 2, 4 and then 8, whether
+                     * the subregion bits are consistent. rsize is incremented
+                     * back up to give the region size, considering consistent
+                     * adjacent subregions as one region. Stop testing if rsize
+                     * is already big enough for an entire QEMU page.
+                     */
+                    int snd_rounded = snd & ~(i - 1);
+                    uint32_t srdis_multi = extract32(env->pmsav7.drsr[n],
+                                                     snd_rounded + 8, i);
+                    if (srdis_mask ^ srdis_multi) {
+                        break;
+                    }
+                    srdis_mask = (srdis_mask << i) | srdis_mask;
+                    rsize++;
+                }
+            }
+            if (srdis) {
+                continue;
+            }
+            if (rsize < TARGET_PAGE_BITS) {
+                *page_size = 1 << rsize;
+            }
+            break;
+        }
+
+        if (n == -1) { /* no hits */
+            if (!pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
+                /* background fault */
+                fi->type = ARMFault_Background;
+                return true;
+            }
+            get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
+        } else { /* a MPU hit! */
+            uint32_t ap = extract32(env->pmsav7.dracr[n], 8, 3);
+            uint32_t xn = extract32(env->pmsav7.dracr[n], 12, 1);
+
+            if (m_is_system_region(env, address)) {
+                /* System space is always execute never */
+                xn = 1;
+            }
+
+            if (is_user) { /* User mode AP bit decoding */
+                switch (ap) {
+                case 0:
+                case 1:
+                case 5:
+                    break; /* no access */
+                case 3:
+                    *prot |= PAGE_WRITE;
+                    /* fall through */
+                case 2:
+                case 6:
+                    *prot |= PAGE_READ | PAGE_EXEC;
+                    break;
+                case 7:
+                    /* for v7M, same as 6; for R profile a reserved value */
+                    if (arm_feature(env, ARM_FEATURE_M)) {
+                        *prot |= PAGE_READ | PAGE_EXEC;
+                        break;
+                    }
+                    /* fall through */
+                default:
+                    qemu_log_mask(LOG_GUEST_ERROR,
+                                  "DRACR[%d]: Bad value for AP bits: 0x%"
+                                  PRIx32 "\n", n, ap);
+                }
+            } else { /* Priv. mode AP bits decoding */
+                switch (ap) {
+                case 0:
+                    break; /* no access */
+                case 1:
+                case 2:
+                case 3:
+                    *prot |= PAGE_WRITE;
+                    /* fall through */
+                case 5:
+                case 6:
+                    *prot |= PAGE_READ | PAGE_EXEC;
+                    break;
+                case 7:
+                    /* for v7M, same as 6; for R profile a reserved value */
+                    if (arm_feature(env, ARM_FEATURE_M)) {
+                        *prot |= PAGE_READ | PAGE_EXEC;
+                        break;
+                    }
+                    /* fall through */
+                default:
+                    qemu_log_mask(LOG_GUEST_ERROR,
+                                  "DRACR[%d]: Bad value for AP bits: 0x%"
+                                  PRIx32 "\n", n, ap);
+                }
+            }
+
+            /* execute never */
+            if (xn) {
+                *prot &= ~PAGE_EXEC;
+            }
+        }
+    }
+
+    fi->type = ARMFault_Permission;
+    fi->level = 1;
+    return !(*prot & (1 << access_type));
+}
+
 /**
  * get_phys_addr - get the physical address for this virtual address
  *
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  5 ---
 target/arm/helper.c | 75 -------------------------------------------
 target/arm/ptw.c    | 77 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 80 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@ void get_phys_addr_pmsav7_default(CPUARMState *env,
                                   int32_t address, int *prot);
 bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user);
 
-bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
-                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                          hwaddr *phys_ptr, MemTxAttrs *txattrs,
-                          int *prot, target_ulong *page_size,
-                          ARMMMUFaultInfo *fi);
 bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
                         MMUAccessType access_type, ARMMMUIdx mmu_idx,
                         bool s1_is_el0,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
     return !(*prot & (1 << access_type));
 }
 
-
-bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
-                          MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                          hwaddr *phys_ptr, MemTxAttrs *txattrs,
-                          int *prot, target_ulong *page_size,
-                          ARMMMUFaultInfo *fi)
-{
-    uint32_t secure = regime_is_secure(env, mmu_idx);
-    V8M_SAttributes sattrs = {};
-    bool ret;
-    bool mpu_is_subpage;
-
-    if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
-        v8m_security_lookup(env, address, access_type, mmu_idx, &sattrs);
-        if (access_type == MMU_INST_FETCH) {
-            /* Instruction fetches always use the MMU bank and the
-             * transaction attribute determined by the fetch address,
-             * regardless of CPU state. This is painful for QEMU
-             * to handle, because it would mean we need to encode
-             * into the mmu_idx not just the (user, negpri) information
-             * for the current security state but also that for the
-             * other security state, which would balloon the number
-             * of mmu_idx values needed alarmingly.
-             * Fortunately we can avoid this because it's not actually
-             * possible to arbitrarily execute code from memory with
-             * the wrong security attribute: it will always generate
-             * an exception of some kind or another, apart from the
-             * special case of an NS CPU executing an SG instruction
-             * in S&NSC memory. So we always just fail the translation
-             * here and sort things out in the exception handler
-             * (including possibly emulating an SG instruction).
-             */
-            if (sattrs.ns != !secure) {
-                if (sattrs.nsc) {
-                    fi->type = ARMFault_QEMU_NSCExec;
-                } else {
-                    fi->type = ARMFault_QEMU_SFault;
-                }
-                *page_size = sattrs.subpage ? 1 : TARGET_PAGE_SIZE;
-                *phys_ptr = address;
-                *prot = 0;
-                return true;
-            }
-        } else {
-            /* For data accesses we always use the MMU bank indicated
-             * by the current CPU state, but the security attributes
-             * might downgrade a secure access to nonsecure.
-             */
-            if (sattrs.ns) {
-                txattrs->secure = false;
-            } else if (!secure) {
-                /* NS access to S memory must fault.
-                 * Architecturally we should first check whether the
-                 * MPU information for this address indicates that we
-                 * are doing an unaligned access to Device memory, which
-                 * should generate a UsageFault instead. QEMU does not
-                 * currently check for that kind of unaligned access though.
-                 * If we added it we would need to do so as a special case
-                 * for M_FAKE_FSR_SFAULT in arm_v7m_cpu_do_interrupt().
-                 */
-                fi->type = ARMFault_QEMU_SFault;
-                *page_size = sattrs.subpage ? 1 : TARGET_PAGE_SIZE;
-                *phys_ptr = address;
-                *prot = 0;
-                return true;
-            }
-        }
-    }
-
-    ret = pmsav8_mpu_lookup(env, address, access_type, mmu_idx, phys_ptr,
-                            txattrs, prot, &mpu_is_subpage, fi, NULL);
-    *page_size = sattrs.subpage || mpu_is_subpage ? 1 : TARGET_PAGE_SIZE;
-    return ret;
-}
-
 /* Combine either inner or outer cacheability attributes for normal
  * memory, according to table D4-42 and pseudocode procedure
  * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
     return !(*prot & (1 << access_type));
 }
 
+static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
+                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                                 hwaddr *phys_ptr, MemTxAttrs *txattrs,
+                                 int *prot, target_ulong *page_size,
+                                 ARMMMUFaultInfo *fi)
+{
+    uint32_t secure = regime_is_secure(env, mmu_idx);
+    V8M_SAttributes sattrs = {};
+    bool ret;
+    bool mpu_is_subpage;
+
+    if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
+        v8m_security_lookup(env, address, access_type, mmu_idx, &sattrs);
+        if (access_type == MMU_INST_FETCH) {
+            /*
+             * Instruction fetches always use the MMU bank and the
+             * transaction attribute determined by the fetch address,
+             * regardless of CPU state. This is painful for QEMU
+             * to handle, because it would mean we need to encode
+             * into the mmu_idx not just the (user, negpri) information
+             * for the current security state but also that for the
+             * other security state, which would balloon the number
+             * of mmu_idx values needed alarmingly.
+             * Fortunately we can avoid this because it's not actually
+             * possible to arbitrarily execute code from memory with
+             * the wrong security attribute: it will always generate
+             * an exception of some kind or another, apart from the
+             * special case of an NS CPU executing an SG instruction
+             * in S&NSC memory. So we always just fail the translation
+             * here and sort things out in the exception handler
+             * (including possibly emulating an SG instruction).
+             */
+            if (sattrs.ns != !secure) {
+                if (sattrs.nsc) {
+                    fi->type = ARMFault_QEMU_NSCExec;
+                } else {
+                    fi->type = ARMFault_QEMU_SFault;
+                }
+                *page_size = sattrs.subpage ? 1 : TARGET_PAGE_SIZE;
+                *phys_ptr = address;
+                *prot = 0;
+                return true;
+            }
+        } else {
+            /*
+             * For data accesses we always use the MMU bank indicated
+             * by the current CPU state, but the security attributes
+             * might downgrade a secure access to nonsecure.
+             */
+            if (sattrs.ns) {
+                txattrs->secure = false;
+            } else if (!secure) {
+                /*
+                 * NS access to S memory must fault.
+                 * Architecturally we should first check whether the
+                 * MPU information for this address indicates that we
+                 * are doing an unaligned access to Device memory, which
+                 * should generate a UsageFault instead. QEMU does not
+                 * currently check for that kind of unaligned access though.
+                 * If we added it we would need to do so as a special case
+                 * for M_FAKE_FSR_SFAULT in arm_v7m_cpu_do_interrupt().
+                 */
+                fi->type = ARMFault_QEMU_SFault;
+                *page_size = sattrs.subpage ? 1 : TARGET_PAGE_SIZE;
+                *phys_ptr = address;
+                *prot = 0;
+                return true;
+            }
+        }
+    }
+
+    ret = pmsav8_mpu_lookup(env, address, access_type, mmu_idx, phys_ptr,
+                            txattrs, prot, &mpu_is_subpage, fi, NULL);
+    *page_size = sattrs.subpage || mpu_is_subpage ? 1 : TARGET_PAGE_SIZE;
+    return ret;
+}
+
 /**
  * get_phys_addr - get the physical address for this virtual address
  *
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

pmsav8_mpu_lookup is the final user of get_phys_addr_pmsav7_default
within helper.c, so moving it to ptw.c lets us make
get_phys_addr_pmsav7_default static within ptw.c.
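
For readers who do not have the moved function in their head, the
per-region test inside pmsav8_mpu_lookup can be sketched standalone
as below. This is only an illustration, not the code being moved:
the sketch_* names and the fixed 1 KiB page size are assumptions made
for the example, and the real function additionally handles the
background region, multiple-hit faults and permission decoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch only: a fixed 1 KiB target page. */
#define SKETCH_PAGE_BITS 10
#define SKETCH_PAGE_SIZE (1u << SKETCH_PAGE_BITS)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Hypothetical helper: test one PMSAv8 region (RBAR/RLAR pair) against
 * an address. Returns true on a hit. *is_subpage is set when the region
 * covers only part of the page containing the address, whether or not
 * the address itself hit the region.
 */
static bool sketch_region_hit(uint32_t rbar, uint32_t rlar,
                              uint32_t address, bool *is_subpage)
{
    uint32_t base = rbar & ~0x1fu;  /* bits [4:0] treated as zeroes */
    uint32_t limit = rlar | 0x1fu;  /* bits [4:0] treated as ones */
    uint32_t page_base = address & SKETCH_PAGE_MASK;
    uint32_t page_limit = page_base + (SKETCH_PAGE_SIZE - 1);

    *is_subpage = false;

    if (!(rlar & 0x1)) {
        return false;               /* region disabled */
    }

    if (address < base || address > limit) {
        /* Miss, but the region may still share the page with us. */
        if (limit >= base && base <= page_limit && limit >= page_base) {
            *is_subpage = true;
        }
        return false;
    }

    if (base > page_base || limit < page_limit) {
        *is_subpage = true;         /* hit, region smaller than the page */
    }
    return true;
}
```

The subpage flag matters because a hit that is reported as covering
the whole page would let later accesses to the same page bypass the
lookup via the TLB, even when they fall in a different region.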

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |   3 -
 target/arm/helper.c | 136 -----------------------------------------
 target/arm/ptw.c    | 146 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 143 insertions(+), 142 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
 bool m_is_ppb_region(CPUARMState *env, uint32_t address);
 bool m_is_system_region(CPUARMState *env, uint32_t address);
 
-void get_phys_addr_pmsav7_default(CPUARMState *env,
-                                  ARMMMUIdx mmu_idx,
-                                  int32_t address, int *prot);
 bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user);
 
 bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void v8m_security_lookup(CPUARMState *env, uint32_t address,
     }
 }
 
-bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
-                              MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                              hwaddr *phys_ptr, MemTxAttrs *txattrs,
-                              int *prot, bool *is_subpage,
-                              ARMMMUFaultInfo *fi, uint32_t *mregion)
-{
-    /* Perform a PMSAv8 MPU lookup (without also doing the SAU check
-     * that a full phys-to-virt translation does).
-     * mregion is (if not NULL) set to the region number which matched,
-     * or -1 if no region number is returned (MPU off, address did not
-     * hit a region, address hit in multiple regions).
-     * We set is_subpage to true if the region hit doesn't cover the
-     * entire TARGET_PAGE the address is within.
-     */
-    ARMCPU *cpu = env_archcpu(env);
-    bool is_user = regime_is_user(env, mmu_idx);
-    uint32_t secure = regime_is_secure(env, mmu_idx);
-    int n;
-    int matchregion = -1;
-    bool hit = false;
-    uint32_t addr_page_base = address & TARGET_PAGE_MASK;
-    uint32_t addr_page_limit = addr_page_base + (TARGET_PAGE_SIZE - 1);
-
-    *is_subpage = false;
-    *phys_ptr = address;
-    *prot = 0;
-    if (mregion) {
-        *mregion = -1;
-    }
-
-    /* Unlike the ARM ARM pseudocode, we don't need to check whether this
-     * was an exception vector read from the vector table (which is always
-     * done using the default system address map), because those accesses
-     * are done in arm_v7m_load_vector(), which always does a direct
-     * read using address_space_ldl(), rather than going via this function.
-     */
-    if (regime_translation_disabled(env, mmu_idx)) { /* MPU disabled */
-        hit = true;
-    } else if (m_is_ppb_region(env, address)) {
-        hit = true;
-    } else {
-        if (pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
-            hit = true;
-        }
-
-        for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
-            /* region search */
-            /* Note that the base address is bits [31:5] from the register
-             * with bits [4:0] all zeroes, but the limit address is bits
-             * [31:5] from the register with bits [4:0] all ones.
-             */
-            uint32_t base = env->pmsav8.rbar[secure][n] & ~0x1f;
-            uint32_t limit = env->pmsav8.rlar[secure][n] | 0x1f;
-
-            if (!(env->pmsav8.rlar[secure][n] & 0x1)) {
-                /* Region disabled */
-                continue;
-            }
-
-            if (address < base || address > limit) {
-                /*
-                 * Address not in this region. We must check whether the
-                 * region covers addresses in the same page as our address.
-                 * In that case we must not report a size that covers the
-                 * whole page for a subsequent hit against a different MPU
-                 * region or the background region, because it would result in
-                 * incorrect TLB hits for subsequent accesses to addresses that
-                 * are in this MPU region.
-                 */
-                if (limit >= base &&
-                    ranges_overlap(base, limit - base + 1,
-                                   addr_page_base,
-                                   TARGET_PAGE_SIZE)) {
-                    *is_subpage = true;
-                }
-                continue;
-            }
-
-            if (base > addr_page_base || limit < addr_page_limit) {
-                *is_subpage = true;
-            }
-
-            if (matchregion != -1) {
-                /* Multiple regions match -- always a failure (unlike
-                 * PMSAv7 where highest-numbered-region wins)
-                 */
-                fi->type = ARMFault_Permission;
-                fi->level = 1;
-                return true;
-            }
-
-            matchregion = n;
-            hit = true;
-        }
-    }
-
-    if (!hit) {
-        /* background fault */
-        fi->type = ARMFault_Background;
-        return true;
-    }
-
-    if (matchregion == -1) {
-        /* hit using the background region */
-        get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
-    } else {
-        uint32_t ap = extract32(env->pmsav8.rbar[secure][matchregion], 1, 2);
-        uint32_t xn = extract32(env->pmsav8.rbar[secure][matchregion], 0, 1);
-        bool pxn = false;
-
-        if (arm_feature(env, ARM_FEATURE_V8_1M)) {
-            pxn = extract32(env->pmsav8.rlar[secure][matchregion], 4, 1);
-        }
-
-        if (m_is_system_region(env, address)) {
-            /* System space is always execute never */
-            xn = 1;
-        }
-
-        *prot = simple_ap_to_rw_prot(env, mmu_idx, ap);
-        if (*prot && !xn && !(pxn && !is_user)) {
-            *prot |= PAGE_EXEC;
-        }
-        /* We don't need to look the attribute up in the MAIR0/MAIR1
-         * registers because that only tells us about cacheability.
-         */
-        if (mregion) {
-            *mregion = matchregion;
-        }
-    }
-
-    fi->type = ARMFault_Permission;
-    fi->level = 1;
-    return !(*prot & (1 << access_type));
-}
-
 /* Combine either inner or outer cacheability attributes for normal
  * memory, according to table D4-42 and pseudocode procedure
  * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
     return false;
 }
 
-void get_phys_addr_pmsav7_default(CPUARMState *env,
-                                  ARMMMUIdx mmu_idx,
-                                  int32_t address, int *prot)
+static void get_phys_addr_pmsav7_default(CPUARMState *env, ARMMMUIdx mmu_idx,
+                                         int32_t address, int *prot)
 {
     if (!arm_feature(env, ARM_FEATURE_M)) {
         *prot = PAGE_READ | PAGE_WRITE;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
     return !(*prot & (1 << access_type));
 }
 
+bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
+                       MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                       hwaddr *phys_ptr, MemTxAttrs *txattrs,
+                       int *prot, bool *is_subpage,
+                       ARMMMUFaultInfo *fi, uint32_t *mregion)
+{
+    /*
+     * Perform a PMSAv8 MPU lookup (without also doing the SAU check
+     * that a full virt-to-phys translation does).
+     * mregion is (if not NULL) set to the region number which matched,
+     * or -1 if no region number is returned (MPU off, address did not
+     * hit a region, address hit in multiple regions).
+     * We set is_subpage to true if the region hit doesn't cover the
+     * entire TARGET_PAGE the address is within.
+     */
+    ARMCPU *cpu = env_archcpu(env);
+    bool is_user = regime_is_user(env, mmu_idx);
+    uint32_t secure = regime_is_secure(env, mmu_idx);
+    int n;
+    int matchregion = -1;
+    bool hit = false;
+    uint32_t addr_page_base = address & TARGET_PAGE_MASK;
+    uint32_t addr_page_limit = addr_page_base + (TARGET_PAGE_SIZE - 1);
+
+    *is_subpage = false;
+    *phys_ptr = address;
+    *prot = 0;
+    if (mregion) {
+        *mregion = -1;
+    }
+
+    /*
+     * Unlike the ARM ARM pseudocode, we don't need to check whether this
+     * was an exception vector read from the vector table (which is always
+     * done using the default system address map), because those accesses
+     * are done in arm_v7m_load_vector(), which always does a direct
+     * read using address_space_ldl(), rather than going via this function.
+     */
+    if (regime_translation_disabled(env, mmu_idx)) { /* MPU disabled */
+        hit = true;
+    } else if (m_is_ppb_region(env, address)) {
+        hit = true;
+    } else {
+        if (pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
+            hit = true;
+        }
+
+        for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
+            /* region search */
+            /*
+             * Note that the base address is bits [31:5] from the register
+             * with bits [4:0] all zeroes, but the limit address is bits
+             * [31:5] from the register with bits [4:0] all ones.
+             */
+            uint32_t base = env->pmsav8.rbar[secure][n] & ~0x1f;
+            uint32_t limit = env->pmsav8.rlar[secure][n] | 0x1f;
+
+            if (!(env->pmsav8.rlar[secure][n] & 0x1)) {
+                /* Region disabled */
+                continue;
+            }
+
+            if (address < base || address > limit) {
+                /*
+                 * Address not in this region. We must check whether the
+                 * region covers addresses in the same page as our address.
+                 * In that case we must not report a size that covers the
+                 * whole page for a subsequent hit against a different MPU
+                 * region or the background region, because it would result in
+                 * incorrect TLB hits for subsequent accesses to addresses that
+                 * are in this MPU region.
+                 */
+                if (limit >= base &&
+                    ranges_overlap(base, limit - base + 1,
+                                   addr_page_base,
+                                   TARGET_PAGE_SIZE)) {
+                    *is_subpage = true;
+                }
+                continue;
+            }
+
+            if (base > addr_page_base || limit < addr_page_limit) {
+                *is_subpage = true;
+            }
+
+            if (matchregion != -1) {
+                /*
+                 * Multiple regions match -- always a failure (unlike
+                 * PMSAv7 where highest-numbered-region wins)
+                 */
+                fi->type = ARMFault_Permission;
+                fi->level = 1;
+                return true;
+            }
+
+            matchregion = n;
+            hit = true;
+        }
+    }
+
+    if (!hit) {
+        /* background fault */
+        fi->type = ARMFault_Background;
+        return true;
+    }
+
+    if (matchregion == -1) {
+        /* hit using the background region */
+        get_phys_addr_pmsav7_default(env, mmu_idx, address, prot);
+    } else {
+        uint32_t ap = extract32(env->pmsav8.rbar[secure][matchregion], 1, 2);
+        uint32_t xn = extract32(env->pmsav8.rbar[secure][matchregion], 0, 1);
+        bool pxn = false;
+
+        if (arm_feature(env, ARM_FEATURE_V8_1M)) {
+            pxn = extract32(env->pmsav8.rlar[secure][matchregion], 4, 1);
+        }
+
+        if (m_is_system_region(env, address)) {
+            /* System space is always execute never */
+            xn = 1;
+        }
+
+        *prot = simple_ap_to_rw_prot(env, mmu_idx, ap);
+        if (*prot && !xn && !(pxn && !is_user)) {
+            *prot |= PAGE_EXEC;
+        }
+        /*
+         * We don't need to look the attribute up in the MAIR0/MAIR1
+         * registers because that only tells us about cacheability.
+         */
+        if (mregion) {
+            *mregion = matchregion;
+        }
+    }
+
+    fi->type = ARMFault_Permission;
+    fi->level = 1;
+    return !(*prot & (1 << access_type));
+}
+
 static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
                                  MMUAccessType access_type, ARMMMUIdx mmu_idx,
                                  hwaddr *phys_ptr, MemTxAttrs *txattrs,
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  2 --
 target/arm/helper.c | 19 -------------------
 target/arm/ptw.c    | 21 +++++++++++++++++++++
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
 bool m_is_ppb_region(CPUARMState *env, uint32_t address);
 bool m_is_system_region(CPUARMState *env, uint32_t address);
 
-bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user);
-
 bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
                         MMUAccessType access_type, ARMMMUIdx mmu_idx,
                         bool s1_is_el0,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ do_fault:
     return true;
 }
 
-bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool is_user)
-{
-    /* Return true if we should use the default memory map as a
-     * "background" region if there are no hits against any MPU regions.
-     */
-    CPUARMState *env = &cpu->env;
-
-    if (is_user) {
-        return false;
-    }
-
-    if (arm_feature(env, ARM_FEATURE_M)) {
-        return env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)]
-            & R_V7M_MPU_CTRL_PRIVDEFENA_MASK;
-    } else {
-        return regime_sctlr(env, mmu_idx) & SCTLR_BR;
-    }
-}
-
 bool m_is_ppb_region(CPUARMState *env, uint32_t address)
 {
     /* True if address is in the M profile PPB region 0xe0000000 - 0xe00fffff */
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static void get_phys_addr_pmsav7_default(CPUARMState *env, ARMMMUIdx mmu_idx,
     }
 }
 
+static bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx,
+                                         bool is_user)
+{
+    /*
+     * Return true if we should use the default memory map as a
+     * "background" region if there are no hits against any MPU regions.
+     */
+    CPUARMState *env = &cpu->env;
+
+    if (is_user) {
+        return false;
+    }
+
+    if (arm_feature(env, ARM_FEATURE_M)) {
+        return env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)]
+            & R_V7M_MPU_CTRL_PRIVDEFENA_MASK;
+    } else {
+        return regime_sctlr(env, mmu_idx) & SCTLR_BR;
+    }
+}
+
 static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
                                  MMUAccessType access_type, ARMMMUIdx mmu_idx,
                                  hwaddr *phys_ptr, int *prot,
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

v8m_security_lookup has one private helper, v8m_is_sau_exempt,
so move that to ptw.c at the same time.
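
As a reminder of what the helper does, the exemption test can be
sketched standalone as below. This is an illustration rather than the
moved code: the sketch_* names are hypothetical, the access type is
reduced to a bool, and the sketch assumes an M-profile CPU (the real
m_is_system_region also checks ARM_FEATURE_M).

```c
#include <stdbool.h>
#include <stdint.h>

/* System region: address bits [31:29] all set, i.e. 0xe0000000 and up. */
static bool sketch_is_system_region(uint32_t address)
{
    return (address >> 29) == 0x7;
}

/*
 * Hypothetical stand-in for v8m_is_sau_exempt: true if the address is
 * in one of the ranges exempt from v8M SAU/IDAU checks. Instruction
 * fetches from anywhere in the system region are also exempt.
 */
static bool sketch_is_sau_exempt(uint32_t address, bool is_inst_fetch)
{
    return (is_inst_fetch && sketch_is_system_region(address)) ||
           (address >= 0xe0000000u && address <= 0xe0002fffu) ||
           (address >= 0xe000e000u && address <= 0xe000efffu) ||
           (address >= 0xe002e000u && address <= 0xe002efffu) ||
           (address >= 0xe0040000u && address <= 0xe0041fffu) ||
           (address >= 0xe00ff000u && address <= 0xe00fffffu);
}
```

Note that a data access to 0xe0003000 is not exempt, while an
instruction fetch from the same address is, because of the first
clause.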

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 123 ------------------------------------------
 target/arm/ptw.c    | 126 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 123 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/osdep.h"
 #include "qemu/units.h"
 #include "qemu/log.h"
-#include "target/arm/idau.h"
 #include "trace.h"
 #include "cpu.h"
 #include "internals.h"
@@ -XXX,XX +XXX,XX @@ bool m_is_system_region(CPUARMState *env, uint32_t address)
     return arm_feature(env, ARM_FEATURE_M) && extract32(address, 29, 3) == 0x7;
 }
 
-static bool v8m_is_sau_exempt(CPUARMState *env,
-                              uint32_t address, MMUAccessType access_type)
-{
-    /* The architecture specifies that certain address ranges are
-     * exempt from v8M SAU/IDAU checks.
-     */
-    return
-        (access_type == MMU_INST_FETCH && m_is_system_region(env, address)) ||
-        (address >= 0xe0000000 && address <= 0xe0002fff) ||
-        (address >= 0xe000e000 && address <= 0xe000efff) ||
-        (address >= 0xe002e000 && address <= 0xe002efff) ||
-        (address >= 0xe0040000 && address <= 0xe0041fff) ||
-        (address >= 0xe00ff000 && address <= 0xe00fffff);
-}
-
-void v8m_security_lookup(CPUARMState *env, uint32_t address,
-                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                                V8M_SAttributes *sattrs)
-{
-    /* Look up the security attributes for this address. Compare the
-     * pseudocode SecurityCheck() function.
-     * We assume the caller has zero-initialized *sattrs.
-     */
-    ARMCPU *cpu = env_archcpu(env);
-    int r;
-    bool idau_exempt = false, idau_ns = true, idau_nsc = true;
-    int idau_region = IREGION_NOTVALID;
-    uint32_t addr_page_base = address & TARGET_PAGE_MASK;
-    uint32_t addr_page_limit = addr_page_base + (TARGET_PAGE_SIZE - 1);
-
-    if (cpu->idau) {
-        IDAUInterfaceClass *iic = IDAU_INTERFACE_GET_CLASS(cpu->idau);
-        IDAUInterface *ii = IDAU_INTERFACE(cpu->idau);
-
-        iic->check(ii, address, &idau_region, &idau_exempt, &idau_ns,
-                   &idau_nsc);
-    }
-
-    if (access_type == MMU_INST_FETCH && extract32(address, 28, 4) == 0xf) {
-        /* 0xf0000000..0xffffffff is always S for insn fetches */
-        return;
-    }
-
-    if (idau_exempt || v8m_is_sau_exempt(env, address, access_type)) {
-        sattrs->ns = !regime_is_secure(env, mmu_idx);
-        return;
-    }
-
-    if (idau_region != IREGION_NOTVALID) {
-        sattrs->irvalid = true;
-        sattrs->iregion = idau_region;
-    }
-
-    switch (env->sau.ctrl & 3) {
-    case 0: /* SAU.ENABLE == 0, SAU.ALLNS == 0 */
-        break;
-    case 2: /* SAU.ENABLE == 0, SAU.ALLNS == 1 */
-        sattrs->ns = true;
-        break;
-    default: /* SAU.ENABLE == 1 */
-        for (r = 0; r < cpu->sau_sregion; r++) {
-            if (env->sau.rlar[r] & 1) {
-                uint32_t base = env->sau.rbar[r] & ~0x1f;
-                uint32_t limit = env->sau.rlar[r] | 0x1f;
-
-                if (base <= address && limit >= address) {
-                    if (base > addr_page_base || limit < addr_page_limit) {
-                        sattrs->subpage = true;
-                    }
-                    if (sattrs->srvalid) {
-                        /* If we hit in more than one region then we must report
-                         * as Secure, not NS-Callable, with no valid region
-                         * number info.
-                         */
-                        sattrs->ns = false;
-                        sattrs->nsc = false;
-                        sattrs->sregion = 0;
-                        sattrs->srvalid = false;
-                        break;
-                    } else {
-                        if (env->sau.rlar[r] & 2) {
-                            sattrs->nsc = true;
-                        } else {
-                            sattrs->ns = true;
-                        }
-                        sattrs->srvalid = true;
-                        sattrs->sregion = r;
-                    }
-                } else {
-                    /*
-                     * Address not in this region. We must check whether the
-                     * region covers addresses in the same page as our address.
-                     * In that case we must not report a size that covers the
-                     * whole page for a subsequent hit against a different MPU
-                     * region or the background region, because it would result
-                     * in incorrect TLB hits for subsequent accesses to
-                     * addresses that are in this MPU region.
-                     */
-                    if (limit >= base &&
-                        ranges_overlap(base, limit - base + 1,
-                                       addr_page_base,
-                                       TARGET_PAGE_SIZE)) {
-                        sattrs->subpage = true;
-                    }
-                }
-            }
-        }
-        break;
-    }
-
-    /*
-     * The IDAU will override the SAU lookup results if it specifies
-     * higher security than the SAU does.
-     */
-    if (!idau_ns) {
-        if (sattrs->ns || (!idau_nsc && sattrs->nsc)) {
-            sattrs->ns = false;
-            sattrs->nsc = idau_nsc;
-        }
-    }
-}
-
 /* Combine either inner or outer cacheability attributes for normal
  * memory, according to table D4-42 and pseudocode procedure
  * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/range.h"
 #include "cpu.h"
 #include "internals.h"
+#include "idau.h"
 #include "ptw.h"
 
 
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
     return !(*prot & (1 << access_type));
 }
 
+static bool v8m_is_sau_exempt(CPUARMState *env,
+                              uint32_t address, MMUAccessType access_type)
+{
+    /*
+     * The architecture specifies that certain address ranges are
+     * exempt from v8M SAU/IDAU checks.
+     */
+    return
+        (access_type == MMU_INST_FETCH && m_is_system_region(env, address)) ||
+        (address >= 0xe0000000 && address <= 0xe0002fff) ||
+        (address >= 0xe000e000 && address <= 0xe000efff) ||
+        (address >= 0xe002e000 && address <= 0xe002efff) ||
+        (address >= 0xe0040000 && address <= 0xe0041fff) ||
+        (address >= 0xe00ff000 && address <= 0xe00fffff);
+}
+
+void v8m_security_lookup(CPUARMState *env, uint32_t address,
+                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                                V8M_SAttributes *sattrs)
+{
+    /*
+     * Look up the security attributes for this address. Compare the
+     * pseudocode SecurityCheck() function.
+     * We assume the caller has zero-initialized *sattrs.
+     */
+    ARMCPU *cpu = env_archcpu(env);
+    int r;
+    bool idau_exempt = false, idau_ns = true, idau_nsc = true;
+    int idau_region = IREGION_NOTVALID;
+    uint32_t addr_page_base = address & TARGET_PAGE_MASK;
+    uint32_t addr_page_limit = addr_page_base + (TARGET_PAGE_SIZE - 1);
+
+    if (cpu->idau) {
+        IDAUInterfaceClass *iic = IDAU_INTERFACE_GET_CLASS(cpu->idau);
+        IDAUInterface *ii = IDAU_INTERFACE(cpu->idau);
+
+        iic->check(ii, address, &idau_region, &idau_exempt, &idau_ns,
+                   &idau_nsc);
+    }
+
+    if (access_type == MMU_INST_FETCH && extract32(address, 28, 4) == 0xf) {
+        /* 0xf0000000..0xffffffff is always S for insn fetches */
+        return;
+    }
+
+    if (idau_exempt || v8m_is_sau_exempt(env, address, access_type)) {
+        sattrs->ns = !regime_is_secure(env, mmu_idx);
+        return;
+    }
+
+    if (idau_region != IREGION_NOTVALID) {
+        sattrs->irvalid = true;
+        sattrs->iregion = idau_region;
+    }
+
+    switch (env->sau.ctrl & 3) {
+    case 0: /* SAU.ENABLE == 0, SAU.ALLNS == 0 */
+        break;
+    case 2: /* SAU.ENABLE == 0, SAU.ALLNS == 1 */
+        sattrs->ns = true;
+        break;
+    default: /* SAU.ENABLE == 1 */
+        for (r = 0; r < cpu->sau_sregion; r++) {
+            if (env->sau.rlar[r] & 1) {
+                uint32_t base = env->sau.rbar[r] & ~0x1f;
+                uint32_t limit = env->sau.rlar[r] | 0x1f;
+
+                if (base <= address && limit >= address) {
+                    if (base > addr_page_base || limit < addr_page_limit) {
+                        sattrs->subpage = true;
+                    }
+                    if (sattrs->srvalid) {
+                        /*
+                         * If we hit in more than one region then we must report
+                         * as Secure, not NS-Callable, with no valid region
+                         * number info.
+                         */
+                        sattrs->ns = false;
+                        sattrs->nsc = false;
+                        sattrs->sregion = 0;
+                        sattrs->srvalid = false;
+                        break;
+                    } else {
+                        if (env->sau.rlar[r] & 2) {
+                            sattrs->nsc = true;
+                        } else {
+                            sattrs->ns = true;
+                        }
+                        sattrs->srvalid = true;
+                        sattrs->sregion = r;
+                    }
+                } else {
+                    /*
+                     * Address not in this region. We must check whether the
+                     * region covers addresses in the same page as our address.
+                     * In that case we must not report a size that covers the
+                     * whole page for a subsequent hit against a different MPU
+                     * region or the background region, because it would result
+                     * in incorrect TLB hits for subsequent accesses to
+                     * addresses that are in this MPU region.
+                     */
+                    if (limit >= base &&
+                        ranges_overlap(base, limit - base + 1,
+                                       addr_page_base,
+                                       TARGET_PAGE_SIZE)) {
+                        sattrs->subpage = true;
+                    }
+                }
+            }
+        }
+        break;
+    }
+
+    /*
+     * The IDAU will override the SAU lookup results if it specifies
+     * higher security than the SAU does.
+     */
+    if (!idau_ns) {
+        if (sattrs->ns || (!idau_nsc && sattrs->nsc)) {
+            sattrs->ns = false;
+            sattrs->nsc = idau_nsc;
+        }
+    }
+}
+
 static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
                                  MMUAccessType access_type, ARMMMUIdx mmu_idx,
                                  hwaddr *phys_ptr, MemTxAttrs *txattrs,
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-13-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  3 ---
 target/arm/helper.c | 15 ---------------
 target/arm/ptw.c    | 16 ++++++++++++++++
 3 files changed, 16 insertions(+), 18 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-14-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
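The address arithmetic at the end of get_level1_table_address can be
sketched standalone as below. demo_level1_desc_addr() is a
hypothetical name, and the sketch assumes the table base has already
been masked and is 16KB aligned (as with the 0xffffc000 mask used for
TTBR1). The PD0/PD1 checks are omitted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the descriptor address
 * computation. ttbr_base is assumed to be already masked and
 * 16KB aligned.
 */
static uint32_t demo_level1_desc_addr(uint32_t ttbr_base, uint32_t vaddr)
{
    assert((ttbr_base & 0x3fff) == 0);
    /*
     * vaddr[31:20] indexes one of 4096 four-byte descriptors, so the
     * byte offset is (vaddr >> 20) * 4, i.e. (vaddr >> 18) & 0x3ffc.
     */
    return ttbr_base | ((vaddr >> 18) & 0x3ffc);
}
```

With a base of 0x80004000, virtual address 0x00100000 (the second 1MB
section) selects the descriptor at 0x80004004.
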
 target/arm/ptw.h    |  4 ++--
 target/arm/helper.c | 26 +-------------------------
 target/arm/ptw.c    | 23 +++++++++++++++++++++++
 3 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@ uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
 
 bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
 bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
+uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
+
 ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
                                  ARMCacheAttrs s1, ARMCacheAttrs s2);
 
-bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
-                              uint32_t *table, uint32_t address);
 int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
                   int ap, int domain_prot);
 int simple_ap_to_rw_prot_is_user(int ap, bool is_user);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static inline bool regime_translation_big_endian(CPUARMState *env,
 }
 
 /* Return the TTBR associated with this translation regime */
-static inline uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx,
-                                   int ttbrn)
+uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
 {
     if (mmu_idx == ARMMMUIdx_Stage2) {
         return env->cp15.vttbr_el2;
@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
     return prot_rw | PAGE_EXEC;
 }
 
-bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
-                              uint32_t *table, uint32_t address)
-{
-    /* Note that we can only get here for an AArch32 PL0/PL1 lookup */
-    TCR *tcr = regime_tcr(env, mmu_idx);
-
-    if (address & tcr->mask) {
-        if (tcr->raw_tcr & TTBCR_PD1) {
-            /* Translation table walk disabled for TTBR1 */
-            return false;
-        }
-        *table = regime_ttbr(env, mmu_idx, 1) & 0xffffc000;
-    } else {
-        if (tcr->raw_tcr & TTBCR_PD0) {
-            /* Translation table walk disabled for TTBR0 */
-            return false;
-        }
-        *table = regime_ttbr(env, mmu_idx, 0) & tcr->base_mask;
-    }
-    *table |= (address >> 18) & 0x3ffc;
-    return true;
-}
-
 static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
 {
     /*
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
 #include "ptw.h"
 
 
+static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
+                                     uint32_t *table, uint32_t address)
+{
+    /* Note that we can only get here for an AArch32 PL0/PL1 lookup */
+    TCR *tcr = regime_tcr(env, mmu_idx);
+
+    if (address & tcr->mask) {
+        if (tcr->raw_tcr & TTBCR_PD1) {
+            /* Translation table walk disabled for TTBR1 */
+            return false;
+        }
+        *table = regime_ttbr(env, mmu_idx, 1) & 0xffffc000;
+    } else {
+        if (tcr->raw_tcr & TTBCR_PD0) {
+            /* Translation table walk disabled for TTBR0 */
+            return false;
+        }
+        *table = regime_ttbr(env, mmu_idx, 0) & tcr->base_mask;
+    }
+    *table |= (address >> 18) & 0x3ffc;
+    return true;
+}
+
 static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
                              MMUAccessType access_type, ARMMMUIdx mmu_idx,
                              hwaddr *phys_ptr, int *prot,
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

There are a handful of helpers for combine_cacheattrs
that we can move at the same time as the main entry point.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-15-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
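The per-nibble rule in combine_cacheattr_nibble, one of the helpers
being moved, can be exercised standalone as below. demo_extract() is
a hypothetical stand-in for extract32, and demo_combine_nibble()
mirrors the logic of the moved function. Nibble values use the MAIR
encoding: 0x4 is Non-cacheable, 0xb is Write-Through RW-allocate, and
0xf is Write-Back RW-allocate.

```c
#include <assert.h>
#include <stdint.h>

/* Extract len bits of v starting at bit start (stand-in for extract32). */
static uint32_t demo_extract(uint32_t v, int start, int len)
{
    assert(start >= 0 && len > 0 && start + len <= 32);
    return (v >> start) & (~0u >> (32 - len));
}

/* Combine one stage 1 and one stage 2 cacheability nibble. */
static uint8_t demo_combine_nibble(uint8_t s1, uint8_t s2)
{
    if (s1 == 4 || s2 == 4) {
        /* Non-cacheable on either stage wins. */
        return 4;
    } else if (demo_extract(s1, 2, 2) == 0 || demo_extract(s1, 2, 2) == 2) {
        /* Stage 1 write-through wins and keeps its own hints. */
        return s1;
    } else if (demo_extract(s2, 2, 2) == 2) {
        /* Stage 2 write-through wins; hints still come from stage 1. */
        return (2 << 2) | demo_extract(s1, 0, 2);
    } else {
        /* Both write-back. */
        return s1;
    }
}
```

For example, stage 1 Write-Back (0xf) combined with stage 2
Write-Through (0xb) gives Write-Through with the stage 1 allocation
hints, 0xb.
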
 target/arm/ptw.h    |   3 -
 target/arm/helper.c | 218 -------------------------------------------
 target/arm/ptw.c    | 221 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 221 insertions(+), 221 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@ bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
 bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
 uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
 
-ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
-                                 ARMCacheAttrs s1, ARMCacheAttrs s2);
-
 int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
                   int ap, int domain_prot);
 int simple_ap_to_rw_prot_is_user(int ap, bool is_user);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
     }
     return true;
 }
-
-/* Translate from the 4-bit stage 2 representation of
- * memory attributes (without cache-allocation hints) to
- * the 8-bit representation of the stage 1 MAIR registers
- * (which includes allocation hints).
- *
- * ref: shared/translation/attrs/S2AttrDecode()
- *      .../S2ConvertAttrsHints()
- */
-static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
-{
-    uint8_t hiattr = extract32(s2attrs, 2, 2);
-    uint8_t loattr = extract32(s2attrs, 0, 2);
-    uint8_t hihint = 0, lohint = 0;
-
-    if (hiattr != 0) { /* normal memory */
-        if (arm_hcr_el2_eff(env) & HCR_CD) { /* cache disabled */
-            hiattr = loattr = 1; /* non-cacheable */
-        } else {
-            if (hiattr != 1) { /* Write-through or write-back */
-                hihint = 3; /* RW allocate */
-            }
-            if (loattr != 1) { /* Write-through or write-back */
-                lohint = 3; /* RW allocate */
-            }
-        }
-    }
-
-    return (hiattr << 6) | (hihint << 4) | (loattr << 2) | lohint;
-}
 #endif /* !CONFIG_USER_ONLY */
 
 /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
@@ -XXX,XX +XXX,XX @@ do_fault:
     return true;
 }
 
-/* Combine either inner or outer cacheability attributes for normal
- * memory, according to table D4-42 and pseudocode procedure
- * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
- *
- * NB: only stage 1 includes allocation hints (RW bits), leading to
- * some asymmetry.
- */
-static uint8_t combine_cacheattr_nibble(uint8_t s1, uint8_t s2)
-{
-    if (s1 == 4 || s2 == 4) {
-        /* non-cacheable has precedence */
-        return 4;
-    } else if (extract32(s1, 2, 2) == 0 || extract32(s1, 2, 2) == 2) {
-        /* stage 1 write-through takes precedence */
-        return s1;
-    } else if (extract32(s2, 2, 2) == 2) {
-        /* stage 2 write-through takes precedence, but the allocation hint
-         * is still taken from stage 1
-         */
-        return (2 << 2) | extract32(s1, 0, 2);
-    } else { /* write-back */
-        return s1;
-    }
-}
-
-/*
- * Combine the memory type and cacheability attributes of
- * s1 and s2 for the HCR_EL2.FWB == 0 case, returning the
- * combined attributes in MAIR_EL1 format.
- */
-static uint8_t combined_attrs_nofwb(CPUARMState *env,
-                                    ARMCacheAttrs s1, ARMCacheAttrs s2)
-{
-    uint8_t s1lo, s2lo, s1hi, s2hi, s2_mair_attrs, ret_attrs;
-
-    s2_mair_attrs = convert_stage2_attrs(env, s2.attrs);
-
-    s1lo = extract32(s1.attrs, 0, 4);
-    s2lo = extract32(s2_mair_attrs, 0, 4);
-    s1hi = extract32(s1.attrs, 4, 4);
-    s2hi = extract32(s2_mair_attrs, 4, 4);
-
-    /* Combine memory type and cacheability attributes */
-    if (s1hi == 0 || s2hi == 0) {
-        /* Device has precedence over normal */
-        if (s1lo == 0 || s2lo == 0) {
-            /* nGnRnE has precedence over anything */
-            ret_attrs = 0;
-        } else if (s1lo == 4 || s2lo == 4) {
-            /* non-Reordering has precedence over Reordering */
-            ret_attrs = 4;  /* nGnRE */
-        } else if (s1lo == 8 || s2lo == 8) {
-            /* non-Gathering has precedence over Gathering */
-            ret_attrs = 8;  /* nGRE */
-        } else {
-            ret_attrs = 0xc; /* GRE */
-        }
-    } else { /* Normal memory */
-        /* Outer/inner cacheability combine independently */
-        ret_attrs = combine_cacheattr_nibble(s1hi, s2hi) << 4
-                  | combine_cacheattr_nibble(s1lo, s2lo);
-    }
-    return ret_attrs;
-}
-
-static uint8_t force_cacheattr_nibble_wb(uint8_t attr)
-{
-    /*
-     * Given the 4 bits specifying the outer or inner cacheability
-     * in MAIR format, return a value specifying Normal Write-Back,
-     * with the allocation and transient hints taken from the input
-     * if the input specified some kind of cacheable attribute.
-     */
-    if (attr == 0 || attr == 4) {
-        /*
-         * 0 == an UNPREDICTABLE encoding
-         * 4 == Non-cacheable
-         * Either way, force Write-Back RW allocate non-transient
-         */
-        return 0xf;
-    }
-    /* Change WriteThrough to WriteBack, keep allocation and transient hints */
-    return attr | 4;
-}
-
-/*
- * Combine the memory type and cacheability attributes of
- * s1 and s2 for the HCR_EL2.FWB == 1 case, returning the
- * combined attributes in MAIR_EL1 format.
- */
-static uint8_t combined_attrs_fwb(CPUARMState *env,
-                                  ARMCacheAttrs s1, ARMCacheAttrs s2)
-{
-    switch (s2.attrs) {
-    case 7:
-        /* Use stage 1 attributes */
-        return s1.attrs;
-    case 6:
-        /*
-         * Force Normal Write-Back. Note that if S1 is Normal cacheable
-         * then we take the allocation hints from it; otherwise it is
-         * RW allocate, non-transient.
-         */
-        if ((s1.attrs & 0xf0) == 0) {
-            /* S1 is Device */
-            return 0xff;
-        }
-        /* Need to check the Inner and Outer nibbles separately */
-        return force_cacheattr_nibble_wb(s1.attrs & 0xf) |
-            force_cacheattr_nibble_wb(s1.attrs >> 4) << 4;
-    case 5:
-        /* If S1 attrs are Device, use them; otherwise Normal Non-cacheable */
-        if ((s1.attrs & 0xf0) == 0) {
-            return s1.attrs;
-        }
-        return 0x44;
-    case 0 ... 3:
-        /* Force Device, of subtype specified by S2 */
-        return s2.attrs << 2;
-    default:
-        /*
-         * RESERVED values (including RES0 descriptor bit [5] being nonzero);
-         * arbitrarily force Device.
-         */
-        return 0;
-    }
-}
-
-/* Combine S1 and S2 cacheability/shareability attributes, per D4.5.4
- * and CombineS1S2Desc()
- *
- * @env:     CPUARMState
- * @s1:      Attributes from stage 1 walk
- * @s2:      Attributes from stage 2 walk
- */
-ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
-                                 ARMCacheAttrs s1, ARMCacheAttrs s2)
-{
-    ARMCacheAttrs ret;
-    bool tagged = false;
-
-    assert(s2.is_s2_format && !s1.is_s2_format);
-    ret.is_s2_format = false;
-
-    if (s1.attrs == 0xf0) {
-        tagged = true;
-        s1.attrs = 0xff;
-    }
-
-    /* Combine shareability attributes (table D4-43) */
-    if (s1.shareability == 2 || s2.shareability == 2) {
-        /* if either are outer-shareable, the result is outer-shareable */
-        ret.shareability = 2;
-    } else if (s1.shareability == 3 || s2.shareability == 3) {
-        /* if either are inner-shareable, the result is inner-shareable */
-        ret.shareability = 3;
-    } else {
-        /* both non-shareable */
-        ret.shareability = 0;
-    }
-
-    /* Combine memory type and cacheability attributes */
-    if (arm_hcr_el2_eff(env) & HCR_FWB) {
-        ret.attrs = combined_attrs_fwb(env, s1, s2);
-    } else {
-        ret.attrs = combined_attrs_nofwb(env, s1, s2);
-    }
-
-    /*
-     * Any location for which the resultant memory type is any
-     * type of Device memory is always treated as Outer Shareable.
-     * Any location for which the resultant memory type is Normal
-     * Inner Non-cacheable, Outer Non-cacheable is always treated
-     * as Outer Shareable.
-     * TODO: FEAT_XS adds another value (0x40) also meaning iNCoNC
-     */
-    if ((ret.attrs & 0xf0) == 0 || ret.attrs == 0x44) {
-        ret.shareability = 2;
-    }
-
-    /* TODO: CombineS1S2Desc does not consider transient, only WB, RWA. */
-    if (tagged && ret.attrs == 0xff) {
-        ret.attrs = 0xf0;
-    }
-
-    return ret;
-}
-
 hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
                                          MemTxAttrs *attrs)
 {
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
     return ret;
 }
 
+/*
+ * Translate from the 4-bit stage 2 representation of
+ * memory attributes (without cache-allocation hints) to
+ * the 8-bit representation of the stage 1 MAIR registers
+ * (which includes allocation hints).
+ *
+ * ref: shared/translation/attrs/S2AttrDecode()
+ *      .../S2ConvertAttrsHints()
+ */
+static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
+{
+    uint8_t hiattr = extract32(s2attrs, 2, 2);
+    uint8_t loattr = extract32(s2attrs, 0, 2);
+    uint8_t hihint = 0, lohint = 0;
+
+    if (hiattr != 0) { /* normal memory */
+        if (arm_hcr_el2_eff(env) & HCR_CD) { /* cache disabled */
+            hiattr = loattr = 1; /* non-cacheable */
+        } else {
+            if (hiattr != 1) { /* Write-through or write-back */
+                hihint = 3; /* RW allocate */
+            }
+            if (loattr != 1) { /* Write-through or write-back */
+                lohint = 3; /* RW allocate */
+            }
+        }
+    }
+
+    return (hiattr << 6) | (hihint << 4) | (loattr << 2) | lohint;
+}
+
+/*
+ * Combine either inner or outer cacheability attributes for normal
+ * memory, according to table D4-42 and pseudocode procedure
+ * CombineS1S2AttrHints() of ARM DDI 0487B.b (the ARMv8 ARM).
+ *
+ * NB: only stage 1 includes allocation hints (RW bits), leading to
+ * some asymmetry.
+ */
+static uint8_t combine_cacheattr_nibble(uint8_t s1, uint8_t s2)
+{
+    if (s1 == 4 || s2 == 4) {
+        /* non-cacheable has precedence */
+        return 4;
+    } else if (extract32(s1, 2, 2) == 0 || extract32(s1, 2, 2) == 2) {
+        /* stage 1 write-through takes precedence */
+        return s1;
+    } else if (extract32(s2, 2, 2) == 2) {
+        /* stage 2 write-through takes precedence, but the allocation hint
+         * is still taken from stage 1
+         */
+        return (2 << 2) | extract32(s1, 0, 2);
+    } else { /* write-back */
+        return s1;
+    }
+}
+
+/*
+ * Combine the memory type and cacheability attributes of
+ * s1 and s2 for the HCR_EL2.FWB == 0 case, returning the
+ * combined attributes in MAIR_EL1 format.
+ */
+static uint8_t combined_attrs_nofwb(CPUARMState *env,
+                                    ARMCacheAttrs s1, ARMCacheAttrs s2)
+{
+    uint8_t s1lo, s2lo, s1hi, s2hi, s2_mair_attrs, ret_attrs;
+
+    s2_mair_attrs = convert_stage2_attrs(env, s2.attrs);
+
+    s1lo = extract32(s1.attrs, 0, 4);
+    s2lo = extract32(s2_mair_attrs, 0, 4);
+    s1hi = extract32(s1.attrs, 4, 4);
+    s2hi = extract32(s2_mair_attrs, 4, 4);
+
+    /* Combine memory type and cacheability attributes */
+    if (s1hi == 0 || s2hi == 0) {
+        /* Device has precedence over normal */
+        if (s1lo == 0 || s2lo == 0) {
+            /* nGnRnE has precedence over anything */
+            ret_attrs = 0;
+        } else if (s1lo == 4 || s2lo == 4) {
+            /* non-Reordering has precedence over Reordering */
+            ret_attrs = 4;  /* nGnRE */
+        } else if (s1lo == 8 || s2lo == 8) {
+            /* non-Gathering has precedence over Gathering */
+            ret_attrs = 8;  /* nGRE */
+        } else {
+            ret_attrs = 0xc; /* GRE */
+        }
+    } else { /* Normal memory */
+        /* Outer/inner cacheability combine independently */
+        ret_attrs = combine_cacheattr_nibble(s1hi, s2hi) << 4
+                  | combine_cacheattr_nibble(s1lo, s2lo);
+    }
+    return ret_attrs;
+}
+
+static uint8_t force_cacheattr_nibble_wb(uint8_t attr)
+{
+    /*
+     * Given the 4 bits specifying the outer or inner cacheability
+     * in MAIR format, return a value specifying Normal Write-Back,
+     * with the allocation and transient hints taken from the input
+     * if the input specified some kind of cacheable attribute.
+     */
+    if (attr == 0 || attr == 4) {
+        /*
+         * 0 == an UNPREDICTABLE encoding
+         * 4 == Non-cacheable
+         * Either way, force Write-Back RW allocate non-transient
+         */
+        return 0xf;
+    }
+    /* Change WriteThrough to WriteBack, keep allocation and transient hints */
+    return attr | 4;
+}
+
+/*
+ * Combine the memory type and cacheability attributes of
+ * s1 and s2 for the HCR_EL2.FWB == 1 case, returning the
+ * combined attributes in MAIR_EL1 format.
+ */
+static uint8_t combined_attrs_fwb(CPUARMState *env,
+                                  ARMCacheAttrs s1, ARMCacheAttrs s2)
+{
+    switch (s2.attrs) {
+    case 7:
+        /* Use stage 1 attributes */
+        return s1.attrs;
+    case 6:
+        /*
+         * Force Normal Write-Back. Note that if S1 is Normal cacheable
+         * then we take the allocation hints from it; otherwise it is
+         * RW allocate, non-transient.
+         */
+        if ((s1.attrs & 0xf0) == 0) {
+            /* S1 is Device */
+            return 0xff;
+        }
+        /* Need to check the Inner and Outer nibbles separately */
+        return force_cacheattr_nibble_wb(s1.attrs & 0xf) |
+            force_cacheattr_nibble_wb(s1.attrs >> 4) << 4;
+    case 5:
+        /* If S1 attrs are Device, use them; otherwise Normal Non-cacheable */
+        if ((s1.attrs & 0xf0) == 0) {
+            return s1.attrs;
+        }
+        return 0x44;
+    case 0 ... 3:
+        /* Force Device, of subtype specified by S2 */
+        return s2.attrs << 2;
+    default:
+        /*
+         * RESERVED values (including RES0 descriptor bit [5] being nonzero);
+         * arbitrarily force Device.
+         */
+        return 0;
+    }
+}
+
+/*
+ * Combine S1 and S2 cacheability/shareability attributes, per D4.5.4
+ * and CombineS1S2Desc()
+ *
+ * @env:     CPUARMState
+ * @s1:      Attributes from stage 1 walk
+ * @s2:      Attributes from stage 2 walk
+ */
+static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
+                                        ARMCacheAttrs s1, ARMCacheAttrs s2)
+{
+    ARMCacheAttrs ret;
+    bool tagged = false;
+
+    assert(s2.is_s2_format && !s1.is_s2_format);
+    ret.is_s2_format = false;
+
+    if (s1.attrs == 0xf0) {
+        tagged = true;
+        s1.attrs = 0xff;
+    }
+
+    /* Combine shareability attributes (table D4-43) */
+    if (s1.shareability == 2 || s2.shareability == 2) {
+        /* if either are outer-shareable, the result is outer-shareable */
+        ret.shareability = 2;
+    } else if (s1.shareability == 3 || s2.shareability == 3) {
+        /* if either are inner-shareable, the result is inner-shareable */
+        ret.shareability = 3;
+    } else {
+        /* both non-shareable */
+        ret.shareability = 0;
+    }
+
+    /* Combine memory type and cacheability attributes */
+    if (arm_hcr_el2_eff(env) & HCR_FWB) {
+        ret.attrs = combined_attrs_fwb(env, s1, s2);
+    } else {
+        ret.attrs = combined_attrs_nofwb(env, s1, s2);
+    }
+
+    /*
+     * Any location for which the resultant memory type is any
+     * type of Device memory is always treated as Outer Shareable.
+     * Any location for which the resultant memory type is Normal
+     * Inner Non-cacheable, Outer Non-cacheable is always treated
+     * as Outer Shareable.
+     * TODO: FEAT_XS adds another value (0x40) also meaning iNCoNC
+     */
+    if ((ret.attrs & 0xf0) == 0 || ret.attrs == 0x44) {
+        ret.shareability = 2;
+    }
+
+    /* TODO: CombineS1S2Desc does not consider transient, only WB, RWA. */
+    if (tagged && ret.attrs == 0xff) {
+        ret.attrs = 0xf0;
+    }
+
+    return ret;
+}
+
 /**
  * get_phys_addr - get the physical address for this virtual address
  *
-- 
2.25.1
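
Reviewer note on the combine_cacheattrs() hunk above: the shareability rule can be exercised in isolation. The following is a minimal standalone sketch, not QEMU code. The names combine_shareability(), fixup_shareability() and shareability_examples() are hypothetical; only the encodings visible in the patch (0 = non-shareable, 2 = outer-shareable, 3 = inner-shareable) and the two attribute tests from the hunk are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Shareability encodings as they appear in the patch. */
enum { SH_NONE = 0, SH_OUTER = 2, SH_INNER = 3 };

/* Hypothetical helper: the stronger of the two stages' shareability wins. */
static int combine_shareability(int s1, int s2)
{
    if (s1 == SH_OUTER || s2 == SH_OUTER) {
        return SH_OUTER;
    }
    if (s1 == SH_INNER || s2 == SH_INNER) {
        return SH_INNER;
    }
    return SH_NONE;
}

/*
 * Hypothetical helper: Device memory (high nibble of attrs is 0) and
 * Normal Inner/Outer Non-cacheable (0x44) are always Outer Shareable,
 * whatever the combined shareability was.
 */
static int fixup_shareability(uint8_t attrs, int sh)
{
    if ((attrs & 0xf0) == 0 || attrs == 0x44) {
        return SH_OUTER;
    }
    return sh;
}

/* A few spot checks of the rules above. */
void shareability_examples(void)
{
    assert(combine_shareability(SH_INNER, SH_NONE) == SH_INNER);
    assert(combine_shareability(SH_INNER, SH_OUTER) == SH_OUTER);
    assert(fixup_shareability(0x04, SH_NONE) == SH_OUTER);  /* Device */
    assert(fixup_shareability(0xff, SH_INNER) == SH_INNER); /* Normal WB */
}
```

The override runs after the combination step, which is why it is modelled as a separate function here: a Device or Non-cacheable result discards whatever the two stages requested.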

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-16-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  10 ++
 target/arm/helper.c | 416 +-------------------------------------------
 target/arm/ptw.c    | 411 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 429 insertions(+), 408 deletions(-)
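
Reviewer note (not part of the commit): the moved get_phys_addr_lpae() relies on two small pieces of arithmetic, the stage 1 starting level and the size of the block or page mapped at a given level. The sketch below restates both as standalone functions so the simplification described in the code comment can be checked. The function names are hypothetical and do not exist in QEMU; "stride" is 9, 11 or 13 for 4KB, 16KB and 64KB granules, as in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified form used by the patch: 4 - (inputsize - 4) / stride. */
static int s1_start_level(int inputsize, int stride)
{
    assert(stride == 9 || stride == 11 || stride == 13);
    return 4 - (inputsize - 4) / stride;
}

/*
 * Pseudocode form: 4 - RoundUp((inputsize - grainsize) / stride),
 * with grainsize = stride + 3 and RoundUp(m / n) = (m + n - 1) / n.
 */
static int s1_start_level_pseudocode(int inputsize, int stride)
{
    int grainsize = stride + 3;
    int m = inputsize - grainsize;

    return 4 - (m + stride - 1) / stride;
}

/* Bytes mapped by a block (level 1 or 2) or page (level 3) descriptor. */
static uint64_t lpae_block_size(int level, int stride)
{
    return 1ULL << (stride * (4 - level) + 3);
}
```

With a 4KB granule (stride 9), a 48-bit input address starts the walk at level 0 and a 39-bit one at level 1; a level 3 entry maps 4KB and a level 2 block maps 2MB.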

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@
 
 #ifndef CONFIG_USER_ONLY
 
+extern const uint8_t pamax_map[7];
+
 uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
                      ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi);
 uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
     return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
 }
 
+ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
+                                   ARMMMUIdx mmu_idx);
+bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
+                        int inputsize, int stride, int outputsize);
+int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0);
+int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
+               int ap, int ns, int xn, int pxn);
+
 bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
                         MMUAccessType access_type, ARMMMUIdx mmu_idx,
                         bool s1_is_el0,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
  * @xn:      XN (execute-never) bits
  * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
  */
-static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
+int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
 {
     int prot = 0;
 
@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
  * @xn:      XN (execute-never) bit
  * @pxn:     PXN (privileged execute-never) bit
  */
-static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
-                      int ap, int ns, int xn, int pxn)
+int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
+               int ap, int ns, int xn, int pxn)
 {
     bool is_user = regime_is_user(env, mmu_idx);
     int prot_rw, user_rw;
@@ -XXX,XX +XXX,XX @@ uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
  * Returns true if the suggested S2 translation parameters are OK and
  * false otherwise.
  */
-static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
-                               int inputsize, int stride, int outputsize)
+bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
+                        int inputsize, int stride, int outputsize)
 {
     const int grainsize = stride + 3;
     int startsizecheck;
@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
 #endif /* !CONFIG_USER_ONLY */
 
 /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
-static const uint8_t pamax_map[] = {
+const uint8_t pamax_map[] = {
     [0] = 32,
     [1] = 36,
     [2] = 40,
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
 }
 
 #ifndef CONFIG_USER_ONLY
-static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
-                                          ARMMMUIdx mmu_idx)
+ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
+                                   ARMMMUIdx mmu_idx)
 {
     uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
     uint32_t el = regime_el(env, mmu_idx);
@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
     };
 }
 
-/**
- * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
- *
- * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
- * prot and page_size may not be filled in, and the populated fsr value provides
- * information on why the translation aborted, in the format of a long-format
- * DFSR/IFSR fault register, with the following caveats:
- *  * the WnR bit is never set (the caller must do this).
- *
- * @env: CPUARMState
- * @address: virtual address to get physical address for
- * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
- * @mmu_idx: MMU index indicating required translation regime
- * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page table
- *             walk), must be true if this is stage 2 of a stage 1+2 walk for an
- *             EL0 access). If @mmu_idx is anything else, @s1_is_el0 is ignored.
- * @phys_ptr: set to the physical address corresponding to the virtual address
- * @attrs: set to the memory transaction attributes to use
- * @prot: set to the permissions for the page containing phys_ptr
- * @page_size_ptr: set to the size of the page containing phys_ptr
- * @fi: set to fault info if the translation fails
- * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
- */
-bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                        bool s1_is_el0,
-                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
-                        target_ulong *page_size_ptr,
-                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
-{
-    ARMCPU *cpu = env_archcpu(env);
-    CPUState *cs = CPU(cpu);
-    /* Read an LPAE long-descriptor translation table. */
-    ARMFaultType fault_type = ARMFault_Translation;
-    uint32_t level;
-    ARMVAParameters param;
-    uint64_t ttbr;
-    hwaddr descaddr, indexmask, indexmask_grainsize;
-    uint32_t tableattrs;
-    target_ulong page_size;
-    uint32_t attrs;
-    int32_t stride;
-    int addrsize, inputsize, outputsize;
-    TCR *tcr = regime_tcr(env, mmu_idx);
-    int ap, ns, xn, pxn;
-    uint32_t el = regime_el(env, mmu_idx);
-    uint64_t descaddrmask;
-    bool aarch64 = arm_el_is_aa64(env, el);
-    bool guarded = false;
-
-    /* TODO: This code does not support shareability levels. */
-    if (aarch64) {
-        int ps;
-
-        param = aa64_va_parameters(env, address, mmu_idx,
-                                   access_type != MMU_INST_FETCH);
-        level = 0;
-
-        /*
-         * If TxSZ is programmed to a value larger than the maximum,
-         * or smaller than the effective minimum, it is IMPLEMENTATION
-         * DEFINED whether we behave as if the field were programmed
-         * within bounds, or if a level 0 Translation fault is generated.
-         *
-         * With FEAT_LVA, fault on less than minimum becomes required,
-         * so our choice is to always raise the fault.
-         */
-        if (param.tsz_oob) {
-            fault_type = ARMFault_Translation;
-            goto do_fault;
-        }
-
-        addrsize = 64 - 8 * param.tbi;
-        inputsize = 64 - param.tsz;
-
-        /*
-         * Bound PS by PARANGE to find the effective output address size.
-         * ID_AA64MMFR0 is a read-only register so values outside of the
-         * supported mappings can be considered an implementation error.
-         */
-        ps = FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
-        ps = MIN(ps, param.ps);
-        assert(ps < ARRAY_SIZE(pamax_map));
-        outputsize = pamax_map[ps];
-    } else {
-        param = aa32_va_parameters(env, address, mmu_idx);
-        level = 1;
-        addrsize = (mmu_idx == ARMMMUIdx_Stage2 ? 40 : 32);
-        inputsize = addrsize - param.tsz;
-        outputsize = 40;
-    }
-
-    /*
-     * We determined the region when collecting the parameters, but we
-     * have not yet validated that the address is valid for the region.
-     * Extract the top bits and verify that they all match select.
-     *
-     * For aa32, if inputsize == addrsize, then we have selected the
-     * region by exclusion in aa32_va_parameters and there is no more
-     * validation to do here.
-     */
-    if (inputsize < addrsize) {
-        target_ulong top_bits = sextract64(address, inputsize,
-                                           addrsize - inputsize);
-        if (-top_bits != param.select) {
-            /* The gap between the two regions is a Translation fault */
-            fault_type = ARMFault_Translation;
-            goto do_fault;
-        }
-    }
-
-    if (param.using64k) {
-        stride = 13;
-    } else if (param.using16k) {
-        stride = 11;
-    } else {
-        stride = 9;
-    }
-
-    /* Note that QEMU ignores shareability and cacheability attributes,
-     * so we don't need to do anything with the SH, ORGN, IRGN fields
-     * in the TTBCR.  Similarly, TTBCR:A1 selects whether we get the
-     * ASID from TTBR0 or TTBR1, but QEMU's TLB doesn't currently
-     * implement any ASID-like capability so we can ignore it (instead
-     * we will always flush the TLB any time the ASID is changed).
-     */
-    ttbr = regime_ttbr(env, mmu_idx, param.select);
-
-    /* Here we should have set up all the parameters for the translation:
-     * inputsize, ttbr, epd, stride, tbi
-     */
-
-    if (param.epd) {
-        /* Translation table walk disabled => Translation fault on TLB miss
-         * Note: This is always 0 on 64-bit EL2 and EL3.
-         */
-        goto do_fault;
-    }
-
-    if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
-        /* The starting level depends on the virtual address size (which can
-         * be up to 48 bits) and the translation granule size. It indicates
-         * the number of strides (stride bits at a time) needed to
-         * consume the bits of the input address. In the pseudocode this is:
-         *  level = 4 - RoundUp((inputsize - grainsize) / stride)
-         * where their 'inputsize' is our 'inputsize', 'grainsize' is
-         * our 'stride + 3' and 'stride' is our 'stride'.
-         * Applying the usual "rounded up m/n is (m+n-1)/n" and simplifying:
-         * = 4 - (inputsize - stride - 3 + stride - 1) / stride
-         * = 4 - (inputsize - 4) / stride;
-         */
-        level = 4 - (inputsize - 4) / stride;
-    } else {
-        /* For stage 2 translations the starting level is specified by the
-         * VTCR_EL2.SL0 field (whose interpretation depends on the page size)
-         */
-        uint32_t sl0 = extract32(tcr->raw_tcr, 6, 2);
-        uint32_t sl2 = extract64(tcr->raw_tcr, 33, 1);
-        uint32_t startlevel;
-        bool ok;
-
-        /* SL2 is RES0 unless DS=1 & 4kb granule. */
-        if (param.ds && stride == 9 && sl2) {
-            if (sl0 != 0) {
-                level = 0;
-                fault_type = ARMFault_Translation;
-                goto do_fault;
-            }
-            startlevel = -1;
-        } else if (!aarch64 || stride == 9) {
-            /* AArch32 or 4KB pages */
-            startlevel = 2 - sl0;
-
-            if (cpu_isar_feature(aa64_st, cpu)) {
-                startlevel &= 3;
-            }
-        } else {
-            /* 16KB or 64KB pages */
-            startlevel = 3 - sl0;
-        }
-
-        /* Check that the starting level is valid. */
-        ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
-                                inputsize, stride, outputsize);
-        if (!ok) {
-            fault_type = ARMFault_Translation;
-            goto do_fault;
-        }
-        level = startlevel;
-    }
-
-    indexmask_grainsize = MAKE_64BIT_MASK(0, stride + 3);
-    indexmask = MAKE_64BIT_MASK(0, inputsize - (stride * (4 - level)));
-
-    /* Now we can extract the actual base address from the TTBR */
-    descaddr = extract64(ttbr, 0, 48);
-
-    /*
-     * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [5:2] of TTBR.
-     *
-     * Otherwise, if the base address is out of range, raise AddressSizeFault.
-     * In the pseudocode, this is !IsZero(baseregister<47:outputsize>),
-     * but we've just cleared the bits above 47, so simplify the test.
-     */
-    if (outputsize > 48) {
-        descaddr |= extract64(ttbr, 2, 4) << 48;
-    } else if (descaddr >> outputsize) {
-        level = 0;
-        fault_type = ARMFault_AddressSize;
-        goto do_fault;
-    }
-
-    /*
-     * We rely on this masking to clear the RES0 bits at the bottom of the TTBR
-     * and also to mask out CnP (bit 0) which could validly be non-zero.
-     */
-    descaddr &= ~indexmask;
-
-    /*
-     * For AArch32, the address field in the descriptor goes up to bit 39
-     * for both v7 and v8.  However, for v8 the SBZ bits [47:40] must be 0
-     * or an AddressSize fault is raised.  So for v8 we extract those SBZ
-     * bits as part of the address, which will be checked via outputsize.
-     * For AArch64, the address field goes up to bit 47, or 49 with FEAT_LPA2;
-     * the highest bits of a 52-bit output are placed elsewhere.
-     */
-    if (param.ds) {
-        descaddrmask = MAKE_64BIT_MASK(0, 50);
-    } else if (arm_feature(env, ARM_FEATURE_V8)) {
-        descaddrmask = MAKE_64BIT_MASK(0, 48);
-    } else {
-        descaddrmask = MAKE_64BIT_MASK(0, 40);
-    }
-    descaddrmask &= ~indexmask_grainsize;
-
-    /* Secure accesses start with the page table in secure memory and
-     * can be downgraded to non-secure at any step. Non-secure accesses
-     * remain non-secure. We implement this by just ORing in the NSTable/NS
-     * bits at each step.
-     */
-    tableattrs = regime_is_secure(env, mmu_idx) ? 0 : (1 << 4);
-    for (;;) {
-        uint64_t descriptor;
-        bool nstable;
-
-        descaddr |= (address >> (stride * (4 - level))) & indexmask;
-        descaddr &= ~7ULL;
-        nstable = extract32(tableattrs, 4, 1);
-        descriptor = arm_ldq_ptw(cs, descaddr, !nstable, mmu_idx, fi);
-        if (fi->type != ARMFault_None) {
-            goto do_fault;
-        }
-
-        if (!(descriptor & 1) ||
-            (!(descriptor & 2) && (level == 3))) {
-            /* Invalid, or the Reserved level 3 encoding */
-            goto do_fault;
-        }
-
-        descaddr = descriptor & descaddrmask;
-
-        /*
-         * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
-         * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
-         * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
-         * raise AddressSizeFault.
-         */
-        if (outputsize > 48) {
-            if (param.ds) {
-                descaddr |= extract64(descriptor, 8, 2) << 50;
-            } else {
-                descaddr |= extract64(descriptor, 12, 4) << 48;
-            }
-        } else if (descaddr >> outputsize) {
-            fault_type = ARMFault_AddressSize;
-            goto do_fault;
-        }
-
-        if ((descriptor & 2) && (level < 3)) {
-            /* Table entry. The top five bits are attributes which may
-             * propagate down through lower levels of the table (and
-             * which are all arranged so that 0 means "no effect", so
-             * we can gather them up by ORing in the bits at each level).
-             */
-            tableattrs |= extract64(descriptor, 59, 5);
-            level++;
-            indexmask = indexmask_grainsize;
-            continue;
-        }
-        /*
-         * Block entry at level 1 or 2, or page entry at level 3.
-         * These are basically the same thing, although the number
-         * of bits we pull in from the vaddr varies. Note that although
-         * descaddrmask masks enough of the low bits of the descriptor
-         * to give a correct page or table address, the address field
-         * in a block descriptor is smaller; so we need to explicitly
-         * clear the lower bits here before ORing in the low vaddr bits.
-         */
-        page_size = (1ULL << ((stride * (4 - level)) + 3));
-        descaddr &= ~(page_size - 1);
-        descaddr |= (address & (page_size - 1));
-        /* Extract attributes from the descriptor */
-        attrs = extract64(descriptor, 2, 10)
-            | (extract64(descriptor, 52, 12) << 10);
-
-        if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
-            /* Stage 2 table descriptors do not include any attribute fields */
-            break;
-        }
-        /* Merge in attributes from table descriptors */
-        attrs |= nstable << 3; /* NS */
-        guarded = extract64(descriptor, 50, 1);  /* GP */
-        if (param.hpd) {
-            /* HPD disables all the table attributes except NSTable.  */
-            break;
-        }
-        attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
-        /* The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
-         * means "force PL1 access only", which means forcing AP[1] to 0.
-         */
-        attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
-        attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
-        break;
-    }
-    /* Here descaddr is the final physical address, and attributes
-     * are all in attrs.
-     */
-    fault_type = ARMFault_AccessFlag;
-    if ((attrs & (1 << 8)) == 0) {
-        /* Access flag */
-        goto do_fault;
-    }
-
-    ap = extract32(attrs, 4, 2);
-
-    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
-        ns = mmu_idx == ARMMMUIdx_Stage2;
-        xn = extract32(attrs, 11, 2);
-        *prot = get_S2prot(env, ap, xn, s1_is_el0);
-    } else {
-        ns = extract32(attrs, 3, 1);
-        xn = extract32(attrs, 12, 1);
-        pxn = extract32(attrs, 11, 1);
-        *prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
-    }
-
-    fault_type = ARMFault_Permission;
-    if (!(*prot & (1 << access_type))) {
-        goto do_fault;
-    }
-
-    if (ns) {
-        /* The NS bit will (as required by the architecture) have no effect if
-         * the CPU doesn't support TZ or this is a non-secure translation
-         * regime, because the attribute will already be non-secure.
-         */
-        txattrs->secure = false;
-    }
-    /* When in aarch64 mode, and BTI is enabled, remember GP in the IOTLB.  */
-    if (aarch64 && guarded && cpu_isar_feature(aa64_bti, cpu)) {
-        arm_tlb_bti_gp(txattrs) = true;
-    }
-
-    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
-        cacheattrs->is_s2_format = true;
-        cacheattrs->attrs = extract32(attrs, 0, 4);
-    } else {
-        /* Index into MAIR registers for cache attributes */
-        uint8_t attrindx = extract32(attrs, 0, 3);
-        uint64_t mair = env->cp15.mair_el[regime_el(env, mmu_idx)];
-        assert(attrindx <= 7);
-        cacheattrs->is_s2_format = false;
-        cacheattrs->attrs = extract64(mair, attrindx * 8, 8);
-    }
-
-    /*
-     * For FEAT_LPA2 and effective DS, the SH field in the attributes
-     * was re-purposed for output address bits.  The SH attribute in
-     * that case comes from TCR_ELx, which we extracted earlier.
-     */
-    if (param.ds) {
-        cacheattrs->shareability = param.sh;
-    } else {
-        cacheattrs->shareability = extract32(attrs, 6, 2);
-    }
-
-    *phys_ptr = descaddr;
-    *page_size_ptr = page_size;
-    return false;
-
-do_fault:
-    fi->type = fault_type;
-    fi->level = level;
-    /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
-    fi->stage2 = fi->s1ptw || (mmu_idx == ARMMMUIdx_Stage2 ||
-                               mmu_idx == ARMMMUIdx_Stage2_S);
-    fi->s1ns = mmu_idx == ARMMMUIdx_Stage2;
-    return true;
-}
-
 hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
                                          MemTxAttrs *attrs)
 {
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ do_fault:
     return true;
 }
 
+/**
+ * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
+ *
+ * Returns false if the translation was successful. Otherwise, phys_ptr,
+ * txattrs, prot and page_size_ptr may not be filled in, and the populated fsr
+ * value provides information on why the translation aborted, in the format
+ * of a long-format DFSR/IFSR fault register, with the following caveat:
+ * the WnR bit is never set (the caller must do this).
+ *
+ * @env: CPUARMState
+ * @address: virtual address to get physical address for
+ * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
+ * @mmu_idx: MMU index indicating required translation regime
+ * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page
+ *             table walk), must be true if this is stage 2 of a stage 1+2
+ *             walk for an EL0 access. If @mmu_idx is anything else,
+ *             @s1_is_el0 is ignored.
+ * @phys_ptr: set to the physical address corresponding to the virtual address
+ * @txattrs: set to the memory transaction attributes to use
+ * @prot: set to the permissions for the page containing phys_ptr
+ * @page_size_ptr: set to the size of the page containing phys_ptr
+ * @fi: set to fault info if the translation fails
+ * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
+ */
+bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                        bool s1_is_el0,
+                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
+                        target_ulong *page_size_ptr,
+                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
+{
+    ARMCPU *cpu = env_archcpu(env);
+    CPUState *cs = CPU(cpu);
+    /* Read an LPAE long-descriptor translation table. */
+    ARMFaultType fault_type = ARMFault_Translation;
+    uint32_t level;
+    ARMVAParameters param;
+    uint64_t ttbr;
+    hwaddr descaddr, indexmask, indexmask_grainsize;
+    uint32_t tableattrs;
+    target_ulong page_size;
+    uint32_t attrs;
+    int32_t stride;
+    int addrsize, inputsize, outputsize;
+    TCR *tcr = regime_tcr(env, mmu_idx);
+    int ap, ns, xn, pxn;
+    uint32_t el = regime_el(env, mmu_idx);
+    uint64_t descaddrmask;
+    bool aarch64 = arm_el_is_aa64(env, el);
+    bool guarded = false;
+
+    /* TODO: This code does not support shareability levels. */
+    if (aarch64) {
+        int ps;
+
+        param = aa64_va_parameters(env, address, mmu_idx,
+                                   access_type != MMU_INST_FETCH);
+        level = 0;
+
+        /*
+         * If TxSZ is programmed to a value larger than the maximum,
+         * or smaller than the effective minimum, it is IMPLEMENTATION
+         * DEFINED whether we behave as if the field were programmed
+         * within bounds, or if a level 0 Translation fault is generated.
+         *
+         * With FEAT_LVA, fault on less than minimum becomes required,
+         * so our choice is to always raise the fault.
+         */
+        if (param.tsz_oob) {
+            fault_type = ARMFault_Translation;
+            goto do_fault;
+        }
+
+        addrsize = 64 - 8 * param.tbi;
+        inputsize = 64 - param.tsz;
+
+        /*
+         * Bound PS by PARANGE to find the effective output address size.
+         * ID_AA64MMFR0 is a read-only register so values outside of the
+         * supported mappings can be considered an implementation error.
+         */
+        ps = FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
+        ps = MIN(ps, param.ps);
+        assert(ps < ARRAY_SIZE(pamax_map));
+        outputsize = pamax_map[ps];
+    } else {
+        param = aa32_va_parameters(env, address, mmu_idx);
+        level = 1;
+        addrsize = (mmu_idx == ARMMMUIdx_Stage2 ? 40 : 32);
+        inputsize = addrsize - param.tsz;
+        outputsize = 40;
+    }
+
+    /*
+     * We determined the region when collecting the parameters, but we
+     * have not yet validated that the address is valid for the region.
+     * Extract the top bits and verify that they all match select.
+     *
+     * For aa32, if inputsize == addrsize, then we have selected the
+     * region by exclusion in aa32_va_parameters and there is no more
+     * validation to do here.
+     */
+    if (inputsize < addrsize) {
+        target_ulong top_bits = sextract64(address, inputsize,
+                                           addrsize - inputsize);
+        if (-top_bits != param.select) {
+            /* The gap between the two regions is a Translation fault */
+            fault_type = ARMFault_Translation;
+            goto do_fault;
+        }
+    }
+
+    if (param.using64k) {
+        stride = 13;
+    } else if (param.using16k) {
+        stride = 11;
+    } else {
+        stride = 9;
+    }
+
+    /*
+     * Note that QEMU ignores shareability and cacheability attributes,
+     * so we don't need to do anything with the SH, ORGN, IRGN fields
+     * in the TTBCR.  Similarly, TTBCR:A1 selects whether we get the
+     * ASID from TTBR0 or TTBR1, but QEMU's TLB doesn't currently
+     * implement any ASID-like capability so we can ignore it (instead
+     * we will always flush the TLB any time the ASID is changed).
+     */
+    ttbr = regime_ttbr(env, mmu_idx, param.select);
+
+    /*
+     * Here we should have set up all the parameters for the translation:
+     * inputsize, ttbr, epd, stride, tbi
+     */
+
+    if (param.epd) {
+        /*
+         * Translation table walk disabled => Translation fault on TLB miss
+         * Note: This is always 0 on 64-bit EL2 and EL3.
+         */
+        goto do_fault;
+    }
+
+    if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
+        /*
+         * The starting level depends on the virtual address size (which can
+         * be up to 48 bits) and the translation granule size. It indicates
+         * the number of strides (stride bits at a time) needed to
+         * consume the bits of the input address. In the pseudocode this is:
+         *  level = 4 - RoundUp((inputsize - grainsize) / stride)
+         * where their 'inputsize' is our 'inputsize', 'grainsize' is
+         * our 'stride + 3' and 'stride' is our 'stride'.
+         * Applying the usual "rounded up m/n is (m+n-1)/n" and simplifying:
+         * = 4 - (inputsize - stride - 3 + stride - 1) / stride
+         * = 4 - (inputsize - 4) / stride;
+         */
+        level = 4 - (inputsize - 4) / stride;
+    } else {
+        /*
+         * For stage 2 translations the starting level is specified by the
+         * VTCR_EL2.SL0 field (whose interpretation depends on the page size)
+         */
+        uint32_t sl0 = extract32(tcr->raw_tcr, 6, 2);
+        uint32_t sl2 = extract64(tcr->raw_tcr, 33, 1);
+        uint32_t startlevel;
+        bool ok;
+
+        /* SL2 is RES0 unless DS=1 and 4KB granule. */
+        if (param.ds && stride == 9 && sl2) {
+            if (sl0 != 0) {
+                level = 0;
+                fault_type = ARMFault_Translation;
+                goto do_fault;
+            }
+            startlevel = -1;
+        } else if (!aarch64 || stride == 9) {
+            /* AArch32 or 4KB pages */
+            startlevel = 2 - sl0;
+
+            if (cpu_isar_feature(aa64_st, cpu)) {
+                startlevel &= 3;
+            }
+        } else {
+            /* 16KB or 64KB pages */
+            startlevel = 3 - sl0;
+        }
+
+        /* Check that the starting level is valid. */
+        ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
+                                inputsize, stride, outputsize);
+        if (!ok) {
+            fault_type = ARMFault_Translation;
+            goto do_fault;
+        }
+        level = startlevel;
+    }
+
+    indexmask_grainsize = MAKE_64BIT_MASK(0, stride + 3);
+    indexmask = MAKE_64BIT_MASK(0, inputsize - (stride * (4 - level)));
+
+    /* Now we can extract the actual base address from the TTBR */
+    descaddr = extract64(ttbr, 0, 48);
+
+    /*
+     * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [5:2] of TTBR.
+     *
+     * Otherwise, if the base address is out of range, raise AddressSizeFault.
+     * In the pseudocode, this is !IsZero(baseregister<47:outputsize>),
+     * but we've just cleared the bits above 47, so simplify the test.
+     */
+    if (outputsize > 48) {
+        descaddr |= extract64(ttbr, 2, 4) << 48;
+    } else if (descaddr >> outputsize) {
+        level = 0;
+        fault_type = ARMFault_AddressSize;
+        goto do_fault;
+    }
+
+    /*
+     * We rely on this masking to clear the RES0 bits at the bottom of the TTBR
+     * and also to mask out CnP (bit 0) which could validly be non-zero.
+     */
+    descaddr &= ~indexmask;
+
+    /*
+     * For AArch32, the address field in the descriptor goes up to bit 39
+     * for both v7 and v8.  However, for v8 the SBZ bits [47:40] must be 0
+     * or an AddressSize fault is raised.  So for v8 we extract those SBZ
+     * bits as part of the address, which will be checked via outputsize.
+     * For AArch64, the address field goes up to bit 47, or 49 with FEAT_LPA2;
+     * the highest bits of a 52-bit output are placed elsewhere.
+     */
+    if (param.ds) {
+        descaddrmask = MAKE_64BIT_MASK(0, 50);
+    } else if (arm_feature(env, ARM_FEATURE_V8)) {
+        descaddrmask = MAKE_64BIT_MASK(0, 48);
+    } else {
+        descaddrmask = MAKE_64BIT_MASK(0, 40);
+    }
+    descaddrmask &= ~indexmask_grainsize;
+
+    /*
+     * Secure accesses start with the page table in secure memory and
+     * can be downgraded to non-secure at any step. Non-secure accesses
+     * remain non-secure. We implement this by just ORing in the NSTable/NS
+     * bits at each step.
+     */
+    tableattrs = regime_is_secure(env, mmu_idx) ? 0 : (1 << 4);
+    for (;;) {
+        uint64_t descriptor;
+        bool nstable;
+
+        descaddr |= (address >> (stride * (4 - level))) & indexmask;
+        descaddr &= ~7ULL;
+        nstable = extract32(tableattrs, 4, 1);
+        descriptor = arm_ldq_ptw(cs, descaddr, !nstable, mmu_idx, fi);
+        if (fi->type != ARMFault_None) {
+            goto do_fault;
+        }
+
+        if (!(descriptor & 1) ||
+            (!(descriptor & 2) && (level == 3))) {
+            /* Invalid, or the Reserved level 3 encoding */
+            goto do_fault;
+        }
+
+        descaddr = descriptor & descaddrmask;
+
+        /*
+         * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
+         * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
+         * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
+         * raise AddressSizeFault.
+         */
+        if (outputsize > 48) {
+            if (param.ds) {
+                descaddr |= extract64(descriptor, 8, 2) << 50;
+            } else {
+                descaddr |= extract64(descriptor, 12, 4) << 48;
+            }
+        } else if (descaddr >> outputsize) {
+            fault_type = ARMFault_AddressSize;
+            goto do_fault;
+        }
+
+        if ((descriptor & 2) && (level < 3)) {
+            /*
+             * Table entry. The top five bits are attributes which may
+             * propagate down through lower levels of the table (and
+             * which are all arranged so that 0 means "no effect", so
+             * we can gather them up by ORing in the bits at each level).
+             */
+            tableattrs |= extract64(descriptor, 59, 5);
+            level++;
+            indexmask = indexmask_grainsize;
+            continue;
+        }
+        /*
+         * Block entry at level 1 or 2, or page entry at level 3.
+         * These are basically the same thing, although the number
+         * of bits we pull in from the vaddr varies. Note that although
+         * descaddrmask masks enough of the low bits of the descriptor
+         * to give a correct page or table address, the address field
+         * in a block descriptor is smaller; so we need to explicitly
+         * clear the lower bits here before ORing in the low vaddr bits.
+         */
+        page_size = (1ULL << ((stride * (4 - level)) + 3));
+        descaddr &= ~(page_size - 1);
+        descaddr |= (address & (page_size - 1));
+        /* Extract attributes from the descriptor */
+        attrs = extract64(descriptor, 2, 10)
+            | (extract64(descriptor, 52, 12) << 10);
+
+        if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+            /* Stage 2 table descriptors do not include any attribute fields */
+            break;
+        }
+        /* Merge in attributes from table descriptors */
+        attrs |= nstable << 3; /* NS */
+        guarded = extract64(descriptor, 50, 1);  /* GP */
+        if (param.hpd) {
+            /* HPD disables all the table attributes except NSTable.  */
+            break;
+        }
+        attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
+        /*
+         * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
+         * means "force PL1 access only", which means forcing AP[1] to 0.
+         */
+        attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
+        attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
+        break;
+    }
+    /*
+     * Here descaddr is the final physical address, and attributes
+     * are all in attrs.
+     */
+    fault_type = ARMFault_AccessFlag;
+    if ((attrs & (1 << 8)) == 0) {
+        /* Access flag */
+        goto do_fault;
+    }
+
+    ap = extract32(attrs, 4, 2);
+
+    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+        ns = mmu_idx == ARMMMUIdx_Stage2;
+        xn = extract32(attrs, 11, 2);
+        *prot = get_S2prot(env, ap, xn, s1_is_el0);
+    } else {
+        ns = extract32(attrs, 3, 1);
+        xn = extract32(attrs, 12, 1);
+        pxn = extract32(attrs, 11, 1);
+        *prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
+    }
+
+    fault_type = ARMFault_Permission;
+    if (!(*prot & (1 << access_type))) {
+        goto do_fault;
+    }
+
+    if (ns) {
+        /*
+         * The NS bit will (as required by the architecture) have no effect if
+         * the CPU doesn't support TZ or this is a non-secure translation
+         * regime, because the attribute will already be non-secure.
+         */
+        txattrs->secure = false;
+    }
+    /* When in aarch64 mode, and BTI is enabled, remember GP in the IOTLB.  */
+    if (aarch64 && guarded && cpu_isar_feature(aa64_bti, cpu)) {
+        arm_tlb_bti_gp(txattrs) = true;
+    }
+
+    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+        cacheattrs->is_s2_format = true;
+        cacheattrs->attrs = extract32(attrs, 0, 4);
+    } else {
+        /* Index into MAIR registers for cache attributes */
+        uint8_t attrindx = extract32(attrs, 0, 3);
+        uint64_t mair = env->cp15.mair_el[regime_el(env, mmu_idx)];
+        assert(attrindx <= 7);
+        cacheattrs->is_s2_format = false;
+        cacheattrs->attrs = extract64(mair, attrindx * 8, 8);
+    }
+
+    /*
+     * For FEAT_LPA2 and effective DS, the SH field in the attributes
+     * was re-purposed for output address bits.  The SH attribute in
+     * that case comes from TCR_ELx, which we extracted earlier.
+     */
+    if (param.ds) {
+        cacheattrs->shareability = param.sh;
+    } else {
+        cacheattrs->shareability = extract32(attrs, 6, 2);
+    }
+
+    *phys_ptr = descaddr;
+    *page_size_ptr = page_size;
+    return false;
+
+do_fault:
+    fi->type = fault_type;
+    fi->level = level;
+    /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
+    fi->stage2 = fi->s1ptw || (mmu_idx == ARMMMUIdx_Stage2 ||
+                               mmu_idx == ARMMMUIdx_Stage2_S);
+    fi->s1ns = mmu_idx == ARMMMUIdx_Stage2;
+    return true;
+}
+
 static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
                                  MMUAccessType access_type, ARMMMUIdx mmu_idx,
                                  hwaddr *phys_ptr, int *prot,
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Move the ptw load functions, plus 3 common subroutines:
S1_ptw_translate, ptw_attrs_are_device, and regime_translation_big_endian.
This also allows get_phys_addr_lpae to become static again.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-17-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
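Note for readers (not part of the commit): the two load functions moved
here differ only in access width; both choose a big-endian or
little-endian load according to regime_translation_big_endian(). The
sketch below illustrates that selection on a plain byte buffer. The name
load_desc64 and the buffer-based interface are hypothetical; the real
code goes through address_space_ldq_be/address_space_ldq_le and also
records a fault on failure, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: assemble a 64-bit descriptor from 8 bytes,
 * treating byte 0 as most significant when big_endian is true and
 * as least significant otherwise.
 */
static uint64_t load_desc64(const uint8_t *p, bool big_endian)
{
    uint64_t v = 0;

    assert(p != NULL);
    for (int i = 0; i < 8; i++) {
        /* Walk the bytes from most significant to least significant. */
        int idx = big_endian ? i : 7 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}
```

The same eight bytes therefore decode to two different descriptors
depending on the endianness flag, which is why every walk load has to
consult the regime's setting first.
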
 target/arm/ptw.h    |  13 ----
 target/arm/helper.c | 141 --------------------------------------
 target/arm/ptw.c    | 160 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 154 insertions(+), 160 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@
 
 extern const uint8_t pamax_map[7];
 
-uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
-                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi);
-uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
-                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi);
-
 bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
 bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
 uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
@@ -XXX,XX +XXX,XX @@ int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0);
 int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
                int ap, int ns, int xn, int pxn);
 
-bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                        bool s1_is_el0,
-                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
-                        target_ulong *page_size_ptr,
-                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
-    __attribute__((nonnull));
-
 #endif /* !CONFIG_USER_ONLY */
 #endif /* TARGET_ARM_PTW_H */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
     return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
 }
 
-static inline bool regime_translation_big_endian(CPUARMState *env,
-                                                 ARMMMUIdx mmu_idx)
-{
-    return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
-}
-
 /* Return the TTBR associated with this translation regime */
 uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
 {
@@ -XXX,XX +XXX,XX @@ int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
     return prot_rw | PAGE_EXEC;
 }
 
-static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
-{
-    /*
-     * For an S1 page table walk, the stage 1 attributes are always
-     * some form of "this is Normal memory". The combined S1+S2
-     * attributes are therefore only Device if stage 2 specifies Device.
-     * With HCR_EL2.FWB == 0 this is when descriptor bits [5:4] are 0b00,
-     * ie when cacheattrs.attrs bits [3:2] are 0b00.
-     * With HCR_EL2.FWB == 1 this is when descriptor bit [4] is 0, ie
-     * when cacheattrs.attrs bit [2] is 0.
-     */
-    assert(cacheattrs.is_s2_format);
-    if (arm_hcr_el2_eff(env) & HCR_FWB) {
-        return (cacheattrs.attrs & 0x4) == 0;
-    } else {
-        return (cacheattrs.attrs & 0xc) == 0;
-    }
-}
-
-/* Translate a S1 pagetable walk through S2 if needed.  */
-static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
-                               hwaddr addr, bool *is_secure,
-                               ARMMMUFaultInfo *fi)
-{
-    if (arm_mmu_idx_is_stage1_of_2(mmu_idx) &&
-        !regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
-        target_ulong s2size;
-        hwaddr s2pa;
-        int s2prot;
-        int ret;
-        ARMMMUIdx s2_mmu_idx = *is_secure ? ARMMMUIdx_Stage2_S
-                                          : ARMMMUIdx_Stage2;
-        ARMCacheAttrs cacheattrs = {};
-        MemTxAttrs txattrs = {};
-
-        ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx, false,
-                                 &s2pa, &txattrs, &s2prot, &s2size, fi,
-                                 &cacheattrs);
-        if (ret) {
-            assert(fi->type != ARMFault_None);
-            fi->s2addr = addr;
-            fi->stage2 = true;
-            fi->s1ptw = true;
-            fi->s1ns = !*is_secure;
-            return ~0;
-        }
-        if ((arm_hcr_el2_eff(env) & HCR_PTW) &&
-            ptw_attrs_are_device(env, cacheattrs)) {
-            /*
-             * PTW set and S1 walk touched S2 Device memory:
-             * generate Permission fault.
-             */
-            fi->type = ARMFault_Permission;
-            fi->s2addr = addr;
-            fi->stage2 = true;
-            fi->s1ptw = true;
-            fi->s1ns = !*is_secure;
-            return ~0;
-        }
-
-        if (arm_is_secure_below_el3(env)) {
-            /* Check if page table walk is to secure or non-secure PA space. */
-            if (*is_secure) {
-                *is_secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
-            } else {
-                *is_secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
-            }
-        } else {
-            assert(!*is_secure);
-        }
-
-        addr = s2pa;
-    }
-    return addr;
-}
-
-/* All loads done in the course of a page table walk go through here. */
-uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
-                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
-{
-    ARMCPU *cpu = ARM_CPU(cs);
-    CPUARMState *env = &cpu->env;
-    MemTxAttrs attrs = {};
-    MemTxResult result = MEMTX_OK;
-    AddressSpace *as;
-    uint32_t data;
-
-    addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
-    attrs.secure = is_secure;
-    as = arm_addressspace(cs, attrs);
-    if (fi->s1ptw) {
-        return 0;
-    }
-    if (regime_translation_big_endian(env, mmu_idx)) {
-        data = address_space_ldl_be(as, addr, attrs, &result);
-    } else {
-        data = address_space_ldl_le(as, addr, attrs, &result);
-    }
-    if (result == MEMTX_OK) {
-        return data;
-    }
-    fi->type = ARMFault_SyncExternalOnWalk;
-    fi->ea = arm_extabort_type(result);
-    return 0;
-}
-
-uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
-                     ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
-{
-    ARMCPU *cpu = ARM_CPU(cs);
-    CPUARMState *env = &cpu->env;
-    MemTxAttrs attrs = {};
-    MemTxResult result = MEMTX_OK;
-    AddressSpace *as;
-    uint64_t data;
-
-    addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
-    attrs.secure = is_secure;
-    as = arm_addressspace(cs, attrs);
-    if (fi->s1ptw) {
-        return 0;
-    }
-    if (regime_translation_big_endian(env, mmu_idx)) {
-        data = address_space_ldq_be(as, addr, attrs, &result);
-    } else {
-        data = address_space_ldq_le(as, addr, attrs, &result);
-    }
-    if (result == MEMTX_OK) {
-        return data;
-    }
-    fi->type = ARMFault_SyncExternalOnWalk;
-    fi->ea = arm_extabort_type(result);
-    return 0;
-}
-
 /*
  * check_s2_mmu_setup
  * @cpu:        ARMCPU
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
 #include "ptw.h"
 
 
+static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                               bool s1_is_el0, hwaddr *phys_ptr,
+                               MemTxAttrs *txattrs, int *prot,
+                               target_ulong *page_size_ptr,
+                               ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
+    __attribute__((nonnull));
+
+static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
+{
+    return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
+}
+
+static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
+{
+    /*
+     * For an S1 page table walk, the stage 1 attributes are always
+     * some form of "this is Normal memory". The combined S1+S2
+     * attributes are therefore only Device if stage 2 specifies Device.
+     * With HCR_EL2.FWB == 0 this is when descriptor bits [5:4] are 0b00,
+     * ie when cacheattrs.attrs bits [3:2] are 0b00.
+     * With HCR_EL2.FWB == 1 this is when descriptor bit [4] is 0, ie
+     * when cacheattrs.attrs bit [2] is 0.
+     */
+    assert(cacheattrs.is_s2_format);
+    if (arm_hcr_el2_eff(env) & HCR_FWB) {
+        return (cacheattrs.attrs & 0x4) == 0;
+    } else {
+        return (cacheattrs.attrs & 0xc) == 0;
+    }
+}
+
+/* Translate a S1 pagetable walk through S2 if needed.  */
+static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
+                               hwaddr addr, bool *is_secure,
+                               ARMMMUFaultInfo *fi)
+{
+    if (arm_mmu_idx_is_stage1_of_2(mmu_idx) &&
+        !regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
+        target_ulong s2size;
+        hwaddr s2pa;
+        int s2prot;
+        int ret;
+        ARMMMUIdx s2_mmu_idx = *is_secure ? ARMMMUIdx_Stage2_S
+                                          : ARMMMUIdx_Stage2;
+        ARMCacheAttrs cacheattrs = {};
+        MemTxAttrs txattrs = {};
+
+        ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx, false,
+                                 &s2pa, &txattrs, &s2prot, &s2size, fi,
+                                 &cacheattrs);
+        if (ret) {
+            assert(fi->type != ARMFault_None);
+            fi->s2addr = addr;
+            fi->stage2 = true;
+            fi->s1ptw = true;
+            fi->s1ns = !*is_secure;
+            return ~0;
+        }
+        if ((arm_hcr_el2_eff(env) & HCR_PTW) &&
+            ptw_attrs_are_device(env, cacheattrs)) {
+            /*
+             * PTW set and S1 walk touched S2 Device memory:
+             * generate Permission fault.
+             */
+            fi->type = ARMFault_Permission;
+            fi->s2addr = addr;
+            fi->stage2 = true;
+            fi->s1ptw = true;
+            fi->s1ns = !*is_secure;
+            return ~0;
+        }
+
+        if (arm_is_secure_below_el3(env)) {
+            /* Check if page table walk is to secure or non-secure PA space. */
+            if (*is_secure) {
+                *is_secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
+            } else {
+                *is_secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
+            }
+        } else {
+            assert(!*is_secure);
+        }
+
+        addr = s2pa;
+    }
+    return addr;
+}
+
+/* All loads done in the course of a page table walk go through here. */
+static uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+                            ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    CPUARMState *env = &cpu->env;
+    MemTxAttrs attrs = {};
+    MemTxResult result = MEMTX_OK;
+    AddressSpace *as;
+    uint32_t data;
+
+    addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
+    attrs.secure = is_secure;
+    as = arm_addressspace(cs, attrs);
+    if (fi->s1ptw) {
+        return 0;
+    }
+    if (regime_translation_big_endian(env, mmu_idx)) {
+        data = address_space_ldl_be(as, addr, attrs, &result);
+    } else {
+        data = address_space_ldl_le(as, addr, attrs, &result);
+    }
+    if (result == MEMTX_OK) {
+        return data;
+    }
+    fi->type = ARMFault_SyncExternalOnWalk;
+    fi->ea = arm_extabort_type(result);
+    return 0;
+}
+
+static uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+                            ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    CPUARMState *env = &cpu->env;
+    MemTxAttrs attrs = {};
+    MemTxResult result = MEMTX_OK;
+    AddressSpace *as;
+    uint64_t data;
+
+    addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
+    attrs.secure = is_secure;
+    as = arm_addressspace(cs, attrs);
+    if (fi->s1ptw) {
+        return 0;
+    }
+    if (regime_translation_big_endian(env, mmu_idx)) {
+        data = address_space_ldq_be(as, addr, attrs, &result);
+    } else {
+        data = address_space_ldq_le(as, addr, attrs, &result);
+    }
+    if (result == MEMTX_OK) {
+        return data;
+    }
+    fi->type = ARMFault_SyncExternalOnWalk;
+    fi->ea = arm_extabort_type(result);
+    return 0;
+}
+
 static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
                                      uint32_t *table, uint32_t address)
 {
@@ -XXX,XX +XXX,XX @@ do_fault:
  * @fi: set to fault info if the translation fails
  * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
  */
-bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-                        MMUAccessType access_type, ARMMMUIdx mmu_idx,
-                        bool s1_is_el0,
-                        hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
-                        target_ulong *page_size_ptr,
-                        ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
+static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
+                               MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                               bool s1_is_el0, hwaddr *phys_ptr,
+                               MemTxAttrs *txattrs, int *prot,
+                               target_ulong *page_size_ptr,
+                               ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 {
     ARMCPU *cpu = env_archcpu(env);
     CPUState *cs = CPU(cpu);
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

These functions are used both for page table walking and for
deciding which format to use when delivering exception results.
Since ptw.c is only present for system mode, put the functions
into tlb_helper.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-18-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
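Note for readers (not part of the commit): the decision made by
regime_using_lpae_format() can be summarised as a small predicate. The
sketch below is a simplified model only; struct regime_sketch and its
fields are hypothetical stand-ins for values the real function reads
from CPUARMState via regime_el(), arm_el_is_aa64(), arm_feature() and
regime_tcr().

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the inputs to the decision. */
struct regime_sketch {
    int el;           /* exception level that controls the regime */
    bool el_is_aa64;  /* that exception level is in AArch64 state */
    bool has_lpae;    /* the CPU implements LPAE */
    bool ttbcr_eae;   /* TTBCR.EAE is set for the regime */
};

/* True if the regime uses the long-descriptor (LPAE) format. */
static bool uses_lpae_format(const struct regime_sketch *r)
{
    /* EL2 regimes and AArch64 regimes always use long descriptors. */
    if (r->el == 2 || r->el_is_aa64) {
        return true;
    }
    /* Otherwise the format is long only if LPAE exists and EAE is set. */
    return r->has_lpae && r->ttbcr_eae;
}
```
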
 target/arm/helper.c     | 24 ------------------------
 target/arm/tlb_helper.c | 26 ++++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
 }
 #endif /* !CONFIG_USER_ONLY */
 
-/* Return true if the translation regime is using LPAE format page tables */
-bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-    int el = regime_el(env, mmu_idx);
-    if (el == 2 || arm_el_is_aa64(env, el)) {
-        return true;
-    }
-    if (arm_feature(env, ARM_FEATURE_LPAE)
-        && (regime_tcr(env, mmu_idx)->raw_tcr & TTBCR_EAE)) {
-        return true;
-    }
-    return false;
-}
-
-/* Returns true if the stage 1 translation regime is using LPAE format page
- * tables. Used when raising alignment exceptions, whose FSR changes depending
- * on whether the long or short descriptor format is in use. */
-bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-    mmu_idx = stage_1_mmu_idx(mmu_idx);
-
-    return regime_using_lpae_format(env, mmu_idx);
-}
-
 #ifndef CONFIG_USER_ONLY
 bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tlb_helper.c
+++ b/target/arm/tlb_helper.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
 
+
+/* Return true if the translation regime is using LPAE format page tables */
+bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
+{
+    int el = regime_el(env, mmu_idx);
+    if (el == 2 || arm_el_is_aa64(env, el)) {
+        return true;
+    }
+    if (arm_feature(env, ARM_FEATURE_LPAE)
+        && (regime_tcr(env, mmu_idx)->raw_tcr & TTBCR_EAE)) {
+        return true;
+    }
+    return false;
+}
+
+/*
+ * Returns true if the stage 1 translation regime is using LPAE format page
+ * tables. Used when raising alignment exceptions, whose FSR changes depending
+ * on whether the long or short descriptor format is in use.
+ */
+bool arm_s1_regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
+{
+    mmu_idx = stage_1_mmu_idx(mmu_idx);
+    return regime_using_lpae_format(env, mmu_idx);
+}
+
 static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
                                             unsigned int target_el,
                                             bool same_el, bool ea,
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-19-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
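Note for readers (not part of the commit): the table moved by this
patch maps the encoded PARANGE field to a physical address width in
bits. The sketch below reproduces that lookup in isolation; the names
pa_bits_map and pa_bits_from_parange are hypothetical, and the field
value is passed in directly rather than extracted from ID_AA64MMFR0.

```c
#include <assert.h>
#include <stdint.h>

/* Encoded field value (array index) to physical address bits. */
static const uint8_t pa_bits_map[] = { 32, 36, 40, 42, 44, 48, 52 };

static unsigned int pa_bits_from_parange(unsigned int parange)
{
    /* Encodings outside the table are treated as a programming error. */
    assert(parange < sizeof(pa_bits_map) / sizeof(pa_bits_map[0]));
    return pa_bits_map[parange];
}
```

For example, an encoding of 5 corresponds to a 48-bit physical address
space, and 6 to a 52-bit one.
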
 target/arm/ptw.h    |  2 --
 target/arm/helper.c | 25 -------------------------
 target/arm/ptw.c    | 25 +++++++++++++++++++++++++
 3 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@
 
 #ifndef CONFIG_USER_ONLY
 
-extern const uint8_t pamax_map[7];
-
 bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
 bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
 uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
 }
 #endif /* !CONFIG_USER_ONLY */
 
-/* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
-const uint8_t pamax_map[] = {
-    [0] = 32,
-    [1] = 36,
-    [2] = 40,
-    [3] = 42,
-    [4] = 44,
-    [5] = 48,
-    [6] = 52,
-};
-
-/* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
-unsigned int arm_pamax(ARMCPU *cpu)
-{
-    unsigned int parange =
-        FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
-
-    /*
-     * id_aa64mmfr0 is a read-only register so values outside of the
-     * supported mappings can be considered an implementation error.
-     */
-    assert(parange < ARRAY_SIZE(pamax_map));
-    return pamax_map[parange];
-}
-
 int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 {
     if (regime_has_2_ranges(mmu_idx)) {
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
                                ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
     __attribute__((nonnull));
 
+/* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
+static const uint8_t pamax_map[] = {
+    [0] = 32,
+    [1] = 36,
+    [2] = 40,
+    [3] = 42,
+    [4] = 44,
+    [5] = 48,
+    [6] = 52,
+};
+
+/* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
+unsigned int arm_pamax(ARMCPU *cpu)
+{
+    unsigned int parange =
+        FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
+
+    /*
+     * id_aa64mmfr0 is a read-only register so values outside of the
+     * supported mappings can be considered an implementation error.
+     */
+    assert(parange < ARRAY_SIZE(pamax_map));
+    return pamax_map[parange];
+}
+
 static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
     return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-20-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
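Note for readers (not part of the commit): in get_S2prot(), when the
any_tts2uxn feature check passes, the 2-bit stage 2 XN field selects
whether execution is permitted depending on whether the stage 1 regime
is EL0. The sketch below isolates that switch; s2_exec_allowed is a
hypothetical name, and the fallback path used when the feature is
absent is deliberately not modelled.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the 2-bit XN decode:
 *   0 -> executable for both EL0 and non-EL0 stage 1 regimes
 *   1 -> executable only when stage 1 is EL0
 *   2 -> never executable
 *   3 -> executable only when stage 1 is not EL0
 */
static bool s2_exec_allowed(unsigned int xn, bool s1_is_el0)
{
    switch (xn & 3) {
    case 0:
        return true;
    case 1:
        return s1_is_el0;
    case 3:
        return !s1_is_el0;
    default: /* case 2 */
        return false;
    }
}
```
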
 target/arm/ptw.h    |   3 --
 target/arm/helper.c | 128 --------------------------------------------
 target/arm/ptw.c    | 128 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 128 insertions(+), 131 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
                                    ARMMMUIdx mmu_idx);
 bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
                         int inputsize, int stride, int outputsize);
-int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0);
-int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
-               int ap, int ns, int xn, int pxn);
 
 #endif /* !CONFIG_USER_ONLY */
 #endif /* TARGET_ARM_PTW_H */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
     }
 }
 
-/* Translate S2 section/page access permissions to protection flags
- *
- * @env:     CPUARMState
- * @s2ap:    The 2-bit stage2 access permissions (S2AP)
- * @xn:      XN (execute-never) bits
- * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
- */
-int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
-{
-    int prot = 0;
-
-    if (s2ap & 1) {
-        prot |= PAGE_READ;
-    }
-    if (s2ap & 2) {
-        prot |= PAGE_WRITE;
-    }
-
-    if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
-        switch (xn) {
-        case 0:
-            prot |= PAGE_EXEC;
-            break;
-        case 1:
-            if (s1_is_el0) {
-                prot |= PAGE_EXEC;
-            }
-            break;
-        case 2:
-            break;
-        case 3:
-            if (!s1_is_el0) {
-                prot |= PAGE_EXEC;
-            }
-            break;
-        default:
-            g_assert_not_reached();
-        }
-    } else {
-        if (!extract32(xn, 1, 1)) {
-            if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
-                prot |= PAGE_EXEC;
-            }
-        }
-    }
-    return prot;
-}
-
-/* Translate section/page access permissions to protection flags
- *
- * @env:     CPUARMState
- * @mmu_idx: MMU index indicating required translation regime
- * @is_aa64: TRUE if AArch64
- * @ap:      The 2-bit simple AP (AP[2:1])
- * @ns:      NS (non-secure) bit
- * @xn:      XN (execute-never) bit
- * @pxn:     PXN (privileged execute-never) bit
- */
-int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
-               int ap, int ns, int xn, int pxn)
-{
-    bool is_user = regime_is_user(env, mmu_idx);
-    int prot_rw, user_rw;
-    bool have_wxn;
-    int wxn = 0;
-
-    assert(mmu_idx != ARMMMUIdx_Stage2);
-    assert(mmu_idx != ARMMMUIdx_Stage2_S);
-
-    user_rw = simple_ap_to_rw_prot_is_user(ap, true);
-    if (is_user) {
-        prot_rw = user_rw;
-    } else {
-        if (user_rw && regime_is_pan(env, mmu_idx)) {
-            /* PAN forbids data accesses but doesn't affect insn fetch */
-            prot_rw = 0;
-        } else {
-            prot_rw = simple_ap_to_rw_prot_is_user(ap, false);
-        }
-    }
-
-    if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
-        return prot_rw;
-    }
-
-    /* TODO have_wxn should be replaced with
-     *   ARM_FEATURE_V8 || (ARM_FEATURE_V7 && ARM_FEATURE_EL2)
-     * when ARM_FEATURE_EL2 starts getting set. For now we assume all LPAE
-     * compatible processors have EL2, which is required for [U]WXN.
-     */
-    have_wxn = arm_feature(env, ARM_FEATURE_LPAE);
-
-    if (have_wxn) {
-        wxn = regime_sctlr(env, mmu_idx) & SCTLR_WXN;
-    }
-
-    if (is_aa64) {
-        if (regime_has_2_ranges(mmu_idx) && !is_user) {
-            xn = pxn || (user_rw & PAGE_WRITE);
-        }
-    } else if (arm_feature(env, ARM_FEATURE_V7)) {
-        switch (regime_el(env, mmu_idx)) {
-        case 1:
-        case 3:
-            if (is_user) {
-                xn = xn || !(user_rw & PAGE_READ);
-            } else {
-                int uwxn = 0;
-                if (have_wxn) {
-                    uwxn = regime_sctlr(env, mmu_idx) & SCTLR_UWXN;
-                }
-                xn = xn || !(prot_rw & PAGE_READ) || pxn ||
-                     (uwxn && (user_rw & PAGE_WRITE));
-            }
-            break;
-        case 2:
-            break;
-        }
-    } else {
-        xn = wxn = 0;
-    }
-
-    if (xn || (wxn && (prot_rw & PAGE_WRITE))) {
-        return prot_rw;
-    }
-    return prot_rw | PAGE_EXEC;
-}
-
 /*
  * check_s2_mmu_setup
  * @cpu:        ARMCPU
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ do_fault:
     return true;
 }
 
+/*
+ * Translate S2 section/page access permissions to protection flags
+ * @env:     CPUARMState
+ * @s2ap:    The 2-bit stage2 access permissions (S2AP)
+ * @xn:      XN (execute-never) bits
+ * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
+ */
+static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
+{
+    int prot = 0;
+
+    if (s2ap & 1) {
+        prot |= PAGE_READ;
+    }
+    if (s2ap & 2) {
+        prot |= PAGE_WRITE;
+    }
+
+    if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
+        switch (xn) {
+        case 0:
+            prot |= PAGE_EXEC;
+            break;
+        case 1:
+            if (s1_is_el0) {
+                prot |= PAGE_EXEC;
+            }
+            break;
+        case 2:
+            break;
+        case 3:
+            if (!s1_is_el0) {
+                prot |= PAGE_EXEC;
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    } else {
+        if (!extract32(xn, 1, 1)) {
+            if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
+                prot |= PAGE_EXEC;
+            }
+        }
+    }
+    return prot;
+}
+
+/*
+ * Translate section/page access permissions to protection flags
+ * @env:     CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @is_aa64: TRUE if AArch64
+ * @ap:      The 2-bit simple AP (AP[2:1])
+ * @ns:      NS (non-secure) bit
+ * @xn:      XN (execute-never) bit
+ * @pxn:     PXN (privileged execute-never) bit
+ */
+static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
+                      int ap, int ns, int xn, int pxn)
+{
+    bool is_user = regime_is_user(env, mmu_idx);
+    int prot_rw, user_rw;
+    bool have_wxn;
+    int wxn = 0;
+
+    assert(mmu_idx != ARMMMUIdx_Stage2);
+    assert(mmu_idx != ARMMMUIdx_Stage2_S);
+
+    user_rw = simple_ap_to_rw_prot_is_user(ap, true);
+    if (is_user) {
+        prot_rw = user_rw;
+    } else {
+        if (user_rw && regime_is_pan(env, mmu_idx)) {
+            /* PAN forbids data accesses but doesn't affect insn fetch */
+            prot_rw = 0;
+        } else {
+            prot_rw = simple_ap_to_rw_prot_is_user(ap, false);
+        }
+    }
+
+    if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
+        return prot_rw;
+    }
+
+    /* TODO have_wxn should be replaced with
+     *   ARM_FEATURE_V8 || (ARM_FEATURE_V7 && ARM_FEATURE_EL2)
+     * when ARM_FEATURE_EL2 starts getting set. For now we assume all LPAE
+     * compatible processors have EL2, which is required for [U]WXN.
+     */
+    have_wxn = arm_feature(env, ARM_FEATURE_LPAE);
+
+    if (have_wxn) {
+        wxn = regime_sctlr(env, mmu_idx) & SCTLR_WXN;
+    }
+
+    if (is_aa64) {
+        if (regime_has_2_ranges(mmu_idx) && !is_user) {
+            xn = pxn || (user_rw & PAGE_WRITE);
+        }
+    } else if (arm_feature(env, ARM_FEATURE_V7)) {
+        switch (regime_el(env, mmu_idx)) {
+        case 1:
+        case 3:
+            if (is_user) {
+                xn = xn || !(user_rw & PAGE_READ);
+            } else {
+                int uwxn = 0;
+                if (have_wxn) {
+                    uwxn = regime_sctlr(env, mmu_idx) & SCTLR_UWXN;
+                }
+                xn = xn || !(prot_rw & PAGE_READ) || pxn ||
+                     (uwxn && (user_rw & PAGE_WRITE));
+            }
+            break;
+        case 2:
+            break;
+        }
+    } else {
+        xn = wxn = 0;
+    }
+
+    if (xn || (wxn && (prot_rw & PAGE_WRITE))) {
+        return prot_rw;
+    }
+    return prot_rw | PAGE_EXEC;
+}
+
 /**
  * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
  *
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-21-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  2 --
 target/arm/helper.c | 70 ---------------------------------------------
 target/arm/ptw.c    | 70 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 72 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
 
 ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
                                    ARMMMUIdx mmu_idx);
-bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
-                        int inputsize, int stride, int outputsize);
 
 #endif /* !CONFIG_USER_ONLY */
 #endif /* TARGET_ARM_PTW_H */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
         g_assert_not_reached();
     }
 }
-
-/*
- * check_s2_mmu_setup
- * @cpu:        ARMCPU
- * @is_aa64:    True if the translation regime is in AArch64 state
- * @startlevel: Suggested starting level
- * @inputsize:  Bitsize of IPAs
- * @stride:     Page-table stride (See the ARM ARM)
- *
- * Returns true if the suggested S2 translation parameters are OK and
- * false otherwise.
- */
-bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
-                        int inputsize, int stride, int outputsize)
-{
-    const int grainsize = stride + 3;
-    int startsizecheck;
-
-    /*
-     * Negative levels are usually not allowed...
-     * Except for FEAT_LPA2, 4k page table, 52-bit address space, which
-     * begins with level -1.  Note that previous feature tests will have
-     * eliminated this combination if it is not enabled.
-     */
-    if (level < (inputsize == 52 && stride == 9 ? -1 : 0)) {
-        return false;
-    }
-
-    startsizecheck = inputsize - ((3 - level) * stride + grainsize);
-    if (startsizecheck < 1 || startsizecheck > stride + 4) {
-        return false;
-    }
-
-    if (is_aa64) {
-        switch (stride) {
-        case 13: /* 64KB Pages.  */
-            if (level == 0 || (level == 1 && outputsize <= 42)) {
-                return false;
-            }
-            break;
-        case 11: /* 16KB Pages.  */
-            if (level == 0 || (level == 1 && outputsize <= 40)) {
-                return false;
-            }
-            break;
-        case 9: /* 4KB Pages.  */
-            if (level == 0 && outputsize <= 42) {
-                return false;
-            }
-            break;
-        default:
-            g_assert_not_reached();
-        }
-
-        /* Inputsize checks.  */
-        if (inputsize > outputsize &&
-            (arm_el_is_aa64(&cpu->env, 1) || inputsize > 40)) {
-            /* This is CONSTRAINED UNPREDICTABLE and we choose to fault.  */
-            return false;
-        }
-    } else {
-        /* AArch32 only supports 4KB pages. Assert on that.  */
-        assert(stride == 9);
-
-        if (level == 0) {
-            return false;
-        }
-    }
-    return true;
-}
 #endif /* !CONFIG_USER_ONLY */
 
 int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
     return prot_rw | PAGE_EXEC;
 }
 
+/*
+ * check_s2_mmu_setup
+ * @cpu:        ARMCPU
+ * @is_aa64:    True if the translation regime is in AArch64 state
+ * @startlevel: Suggested starting level
+ * @inputsize:  Bitsize of IPAs
+ * @stride:     Page-table stride (See the ARM ARM)
+ *
+ * Returns true if the suggested S2 translation parameters are OK and
+ * false otherwise.
+ */
+static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
+                               int inputsize, int stride, int outputsize)
+{
+    const int grainsize = stride + 3;
+    int startsizecheck;
+
+    /*
+     * Negative levels are usually not allowed...
+     * Except for FEAT_LPA2, 4k page table, 52-bit address space, which
+     * begins with level -1.  Note that previous feature tests will have
+     * eliminated this combination if it is not enabled.
+     */
+    if (level < (inputsize == 52 && stride == 9 ? -1 : 0)) {
+        return false;
+    }
+
+    startsizecheck = inputsize - ((3 - level) * stride + grainsize);
+    if (startsizecheck < 1 || startsizecheck > stride + 4) {
+        return false;
+    }
+
+    if (is_aa64) {
+        switch (stride) {
+        case 13: /* 64KB Pages.  */
+            if (level == 0 || (level == 1 && outputsize <= 42)) {
+                return false;
+            }
+            break;
+        case 11: /* 16KB Pages.  */
+            if (level == 0 || (level == 1 && outputsize <= 40)) {
+                return false;
+            }
+            break;
+        case 9: /* 4KB Pages.  */
+            if (level == 0 && outputsize <= 42) {
+                return false;
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        /* Inputsize checks.  */
+        if (inputsize > outputsize &&
+            (arm_el_is_aa64(&cpu->env, 1) || inputsize > 40)) {
+            /* This is CONSTRAINED UNPREDICTABLE and we choose to fault.  */
+            return false;
+        }
+    } else {
+        /* AArch32 only supports 4KB pages. Assert on that.  */
+        assert(stride == 9);
+
+        if (level == 0) {
+            return false;
+        }
+    }
+    return true;
+}
+
 /**
  * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
  *
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-22-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  3 ---
 target/arm/helper.c | 64 ---------------------------------------------
 target/arm/ptw.c    | 64 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 67 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-23-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    | 10 ------
 target/arm/helper.c | 77 ------------------------------------------
 target/arm/ptw.c    | 81 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 87 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-24-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  1 -
 target/arm/helper.c | 24 ------------------------
 target/arm/ptw.c    | 22 ++++++++++++++++++++++
 3 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@
 
 #ifndef CONFIG_USER_ONLY
 
-bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx);
 bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
 uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
 }
 #endif /* !CONFIG_USER_ONLY */
 
-#ifndef CONFIG_USER_ONLY
-bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-    switch (mmu_idx) {
-    case ARMMMUIdx_SE10_0:
-    case ARMMMUIdx_E20_0:
-    case ARMMMUIdx_SE20_0:
-    case ARMMMUIdx_Stage1_E0:
-    case ARMMMUIdx_Stage1_SE0:
-    case ARMMMUIdx_MUser:
-    case ARMMMUIdx_MSUser:
-    case ARMMMUIdx_MUserNegPri:
-    case ARMMMUIdx_MSUserNegPri:
-        return true;
-    default:
-        return false;
-    case ARMMMUIdx_E10_0:
-    case ARMMMUIdx_E10_1:
-    case ARMMMUIdx_E10_1_PAN:
-        g_assert_not_reached();
-    }
-}
-#endif /* !CONFIG_USER_ONLY */
-
 int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 {
     if (regime_has_2_ranges(mmu_idx)) {
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
     return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
 }
 
+static bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
+{
+    switch (mmu_idx) {
+    case ARMMMUIdx_SE10_0:
+    case ARMMMUIdx_E20_0:
+    case ARMMMUIdx_SE20_0:
+    case ARMMMUIdx_Stage1_E0:
+    case ARMMMUIdx_Stage1_SE0:
+    case ARMMMUIdx_MUser:
+    case ARMMMUIdx_MSUser:
+    case ARMMMUIdx_MUserNegPri:
+    case ARMMMUIdx_MSUserNegPri:
+        return true;
+    default:
+        return false;
+    case ARMMMUIdx_E10_0:
+    case ARMMMUIdx_E10_1:
+    case ARMMMUIdx_E10_1_PAN:
+        g_assert_not_reached();
+    }
+}
+
 static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
 {
     /*
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-25-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    |  1 -
 target/arm/helper.c | 16 ----------------
 target/arm/ptw.c    | 16 ++++++++++++++++
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.h
+++ b/target/arm/ptw.h
@@ -XXX,XX +XXX,XX @@
 #ifndef CONFIG_USER_ONLY
 
 bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
-uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn);
 
 #endif /* !CONFIG_USER_ONLY */
 #endif /* TARGET_ARM_PTW_H */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
     return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
 }
 
-/* Return the TTBR associated with this translation regime */
-uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
-{
-    if (mmu_idx == ARMMMUIdx_Stage2) {
-        return env->cp15.vttbr_el2;
-    }
-    if (mmu_idx == ARMMMUIdx_Stage2_S) {
-        return env->cp15.vsttbr_el2;
-    }
-    if (ttbrn == 0) {
-        return env->cp15.ttbr0_el[regime_el(env, mmu_idx)];
-    } else {
-        return env->cp15.ttbr1_el[regime_el(env, mmu_idx)];
-    }
-}
-
 /* Convert a possible stage1+2 MMU index into the appropriate
  * stage 1 MMU index
  */
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
     }
 }
 
+/* Return the TTBR associated with this translation regime */
+static uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
+{
+    if (mmu_idx == ARMMMUIdx_Stage2) {
+        return env->cp15.vttbr_el2;
+    }
+    if (mmu_idx == ARMMMUIdx_Stage2_S) {
+        return env->cp15.vsttbr_el2;
+    }
+    if (ttbrn == 0) {
+        return env->cp15.ttbr0_el[regime_el(env, mmu_idx)];
+    } else {
+        return env->cp15.ttbr1_el[regime_el(env, mmu_idx)];
+    }
+}
+
 static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
 {
     /*
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-26-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.h    | 17 ----------------
 target/arm/helper.c | 47 ---------------------------------------------
 target/arm/ptw.c    | 47 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 46 insertions(+), 65 deletions(-)
 delete mode 100644 target/arm/ptw.h

diff --git a/target/arm/ptw.h b/target/arm/ptw.h
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/target/arm/ptw.h
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-/*
- * ARM page table walking.
- *
- * This code is licensed under the GNU GPL v2 or later.
- *
- * SPDX-License-Identifier: GPL-2.0-or-later
- */
-
-#ifndef TARGET_ARM_PTW_H
-#define TARGET_ARM_PTW_H
-
-#ifndef CONFIG_USER_ONLY
-
-bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx);
-
-#endif /* !CONFIG_USER_ONLY */
-#endif /* TARGET_ARM_PTW_H */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
 #include "semihosting/common-semi.h"
 #endif
 #include "cpregs.h"
-#include "ptw.h"
 
 #define ARM_CPU_FREQ 1000000000 /* FIXME: 1 GHz, should be configurable */
 
@@ -XXX,XX +XXX,XX @@ uint64_t arm_sctlr(CPUARMState *env, int el)
 }
 
 #ifndef CONFIG_USER_ONLY
-
-/* Return true if the specified stage of address translation is disabled */
-bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-    uint64_t hcr_el2;
-
-    if (arm_feature(env, ARM_FEATURE_M)) {
-        switch (env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)] &
-                (R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK)) {
-        case R_V7M_MPU_CTRL_ENABLE_MASK:
-            /* Enabled, but not for HardFault and NMI */
-            return mmu_idx & ARM_MMU_IDX_M_NEGPRI;
-        case R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK:
-            /* Enabled for all cases */
-            return false;
-        case 0:
-        default:
-            /* HFNMIENA set and ENABLE clear is UNPREDICTABLE, but
-             * we warned about that in armv7m_nvic.c when the guest set it.
-             */
-            return true;
-        }
-    }
-
-    hcr_el2 = arm_hcr_el2_eff(env);
-
-    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
-        /* HCR.DC means HCR.VM behaves as 1 */
-        return (hcr_el2 & (HCR_DC | HCR_VM)) == 0;
-    }
-
-    if (hcr_el2 & HCR_TGE) {
-        /* TGE means that NS EL0/1 act as if SCTLR_EL1.M is zero */
-        if (!regime_is_secure(env, mmu_idx) && regime_el(env, mmu_idx) == 1) {
-            return true;
-        }
-    }
-
-    if ((hcr_el2 & HCR_DC) && arm_mmu_idx_is_stage1_of_2(mmu_idx)) {
-        /* HCR.DC means SCTLR_EL1.M behaves as 0 */
-        return true;
-    }
-
-    return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
-}
-
 /* Convert a possible stage1+2 MMU index into the appropriate
  * stage 1 MMU index
  */
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
 #include "cpu.h"
 #include "internals.h"
 #include "idau.h"
-#include "ptw.h"
 
 
 static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
@@ -XXX,XX +XXX,XX @@ static uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
     }
 }
 
+/* Return true if the specified stage of address translation is disabled */
+static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
+{
+    uint64_t hcr_el2;
+
+    if (arm_feature(env, ARM_FEATURE_M)) {
+        switch (env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)] &
+                (R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK)) {
+        case R_V7M_MPU_CTRL_ENABLE_MASK:
+            /* Enabled, but not for HardFault and NMI */
+            return mmu_idx & ARM_MMU_IDX_M_NEGPRI;
+        case R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK:
+            /* Enabled for all cases */
+            return false;
+        case 0:
+        default:
+            /*
+             * HFNMIENA set and ENABLE clear is UNPREDICTABLE, but
+             * we warned about that in armv7m_nvic.c when the guest set it.
+             */
+            return true;
+        }
+    }
+
+    hcr_el2 = arm_hcr_el2_eff(env);
+
+    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+        /* HCR.DC means HCR.VM behaves as 1 */
+        return (hcr_el2 & (HCR_DC | HCR_VM)) == 0;
+    }
+
+    if (hcr_el2 & HCR_TGE) {
+        /* TGE means that NS EL0/1 act as if SCTLR_EL1.M is zero */
+        if (!regime_is_secure(env, mmu_idx) && regime_el(env, mmu_idx) == 1) {
+            return true;
+        }
+    }
+
+    if ((hcr_el2 & HCR_DC) && arm_mmu_idx_is_stage1_of_2(mmu_idx)) {
+        /* HCR.DC means SCTLR_EL1.M behaves as 0 */
+        return true;
+    }
+
+    return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
+}
+
 static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
 {
     /*
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-27-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 26 --------------------------
 target/arm/ptw.c    | 24 ++++++++++++++++++++++++
 2 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
     };
 }
 
-#ifndef CONFIG_USER_ONLY
-hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
-                                         MemTxAttrs *attrs)
-{
-    ARMCPU *cpu = ARM_CPU(cs);
-    CPUARMState *env = &cpu->env;
-    hwaddr phys_addr;
-    target_ulong page_size;
-    int prot;
-    bool ret;
-    ARMMMUFaultInfo fi = {};
-    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
-    ARMCacheAttrs cacheattrs = {};
-
-    *attrs = (MemTxAttrs) {};
-
-    ret = get_phys_addr(env, addr, MMU_DATA_LOAD, mmu_idx, &phys_addr,
-                        attrs, &prot, &page_size, &fi, &cacheattrs);
-
-    if (ret) {
-        return -1;
-    }
-    return phys_addr;
-}
-#endif
-
 /* Note that signed overflow is undefined in C.  The following routines are
    careful to use unsigned types where modulo arithmetic is required.
    Failure to do so _will_ break on newer gcc.  */
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
                                     phys_ptr, prot, page_size, fi);
     }
 }
+
+hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
+                                         MemTxAttrs *attrs)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    CPUARMState *env = &cpu->env;
+    hwaddr phys_addr;
+    target_ulong page_size;
+    int prot;
+    bool ret;
+    ARMMMUFaultInfo fi = {};
+    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+    ARMCacheAttrs cacheattrs = {};
+
+    *attrs = (MemTxAttrs) {};
+
+    ret = get_phys_addr(env, addr, MMU_DATA_LOAD, mmu_idx, &phys_addr,
+                        attrs, &prot, &page_size, &fi, &cacheattrs);
+
+    if (ret) {
+        return -1;
+    }
+    return phys_addr;
+}
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-28-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 32 --------------------------------
 target/arm/ptw.c    | 28 ++++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 32 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint64_t arm_sctlr(CPUARMState *env, int el)
     return env->cp15.sctlr_el[el];
 }
 
-#ifndef CONFIG_USER_ONLY
-/* Convert a possible stage1+2 MMU index into the appropriate
- * stage 1 MMU index
- */
-ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
-{
-    switch (mmu_idx) {
-    case ARMMMUIdx_SE10_0:
-        return ARMMMUIdx_Stage1_SE0;
-    case ARMMMUIdx_SE10_1:
-        return ARMMMUIdx_Stage1_SE1;
-    case ARMMMUIdx_SE10_1_PAN:
-        return ARMMMUIdx_Stage1_SE1_PAN;
-    case ARMMMUIdx_E10_0:
-        return ARMMMUIdx_Stage1_E0;
-    case ARMMMUIdx_E10_1:
-        return ARMMMUIdx_Stage1_E1;
-    case ARMMMUIdx_E10_1_PAN:
-        return ARMMMUIdx_Stage1_E1_PAN;
-    default:
-        return mmu_idx;
-    }
-}
-#endif /* !CONFIG_USER_ONLY */
-
 int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 {
     if (regime_has_2_ranges(mmu_idx)) {
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
     return arm_mmu_idx_el(env, arm_current_el(env));
 }
 
-#ifndef CONFIG_USER_ONLY
-ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
-{
-    return stage_1_mmu_idx(arm_mmu_idx(env));
-}
-#endif
-
 static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el,
                                            ARMMMUIdx mmu_idx,
                                            CPUARMTBFlags flags)
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ unsigned int arm_pamax(ARMCPU *cpu)
     return pamax_map[parange];
 }
 
+/*
+ * Convert a possible stage1+2 MMU index into the appropriate stage 1 MMU index
+ */
+ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx mmu_idx)
+{
+    switch (mmu_idx) {
+    case ARMMMUIdx_SE10_0:
+        return ARMMMUIdx_Stage1_SE0;
+    case ARMMMUIdx_SE10_1:
+        return ARMMMUIdx_Stage1_SE1;
+    case ARMMMUIdx_SE10_1_PAN:
+        return ARMMMUIdx_Stage1_SE1_PAN;
+    case ARMMMUIdx_E10_0:
+        return ARMMMUIdx_Stage1_E0;
+    case ARMMMUIdx_E10_1:
+        return ARMMMUIdx_Stage1_E1;
+    case ARMMMUIdx_E10_1_PAN:
+        return ARMMMUIdx_Stage1_E1_PAN;
+    default:
+        return mmu_idx;
+    }
+}
+
+ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
+{
+    return stage_1_mmu_idx(arm_mmu_idx(env));
+}
+
 static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
     return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Using ARM_CPU to recover env from cs calls
object_class_dynamic_cast, which shows up in profiles.
The cast is pointless: every caller already has env, and
the reverse operation, env_cpu, is only pointer arithmetic.
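
As a rough illustration of why going from env to the generic CPU state
is cheap, here is a minimal sketch. The Demo* types and demo_env_cpu()
are hypothetical stand-ins, not QEMU's real definitions; the sketch only
assumes that the architecture state is embedded in the same object as
the generic state, so the conversion is a constant offset with no
run-time type check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the generic and per-architecture state. */
typedef struct DemoCPUState {
    int cpu_index;
} DemoCPUState;

typedef struct DemoEnv {
    unsigned long regs[4];
} DemoEnv;

/* One object holds both: generic state first, arch state embedded after. */
typedef struct DemoArchCPU {
    DemoCPUState parent;
    DemoEnv env;
} DemoArchCPU;

/*
 * Recover the generic state from a pointer to the embedded env.
 * This is plain pointer arithmetic: subtract the (compile-time)
 * offset of 'env' inside the containing object.
 */
static inline DemoCPUState *demo_env_cpu(DemoEnv *env)
{
    DemoArchCPU *cpu =
        (DemoArchCPU *)((char *)env - offsetof(DemoArchCPU, env));
    return &cpu->parent;
}
```

The opposite direction, as described above, goes through a checked
dynamic cast, which is the cost this patch avoids on the page table
walk path.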

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220604040607.269301-29-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 }
 
 /* All loads done in the course of a page table walk go through here. */
-static uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
                             ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
 {
-    ARMCPU *cpu = ARM_CPU(cs);
-    CPUARMState *env = &cpu->env;
+    CPUState *cs = env_cpu(env);
     MemTxAttrs attrs = {};
     MemTxResult result = MEMTX_OK;
     AddressSpace *as;
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUState *cs, hwaddr addr, bool is_secure,
     return 0;
 }
 
-static uint64_t arm_ldq_ptw(CPUState *cs, hwaddr addr, bool is_secure,
+static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
                             ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
 {
-    ARMCPU *cpu = ARM_CPU(cs);
-    CPUARMState *env = &cpu->env;
+    CPUState *cs = env_cpu(env);
     MemTxAttrs attrs = {};
     MemTxResult result = MEMTX_OK;
     AddressSpace *as;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
                              target_ulong *page_size,
                              ARMMMUFaultInfo *fi)
 {
-    CPUState *cs = env_cpu(env);
     int level = 1;
     uint32_t table;
     uint32_t desc;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
         fi->type = ARMFault_Translation;
         goto do_fault;
     }
-    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+    desc = arm_ldl_ptw(env, table, regime_is_secure(env, mmu_idx),
                        mmu_idx, fi);
     if (fi->type != ARMFault_None) {
         goto do_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
             /* Fine pagetable.  */
             table = (desc & 0xfffff000) | ((address >> 8) & 0xffc);
         }
-        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+        desc = arm_ldl_ptw(env, table, regime_is_secure(env, mmu_idx),
                            mmu_idx, fi);
         if (fi->type != ARMFault_None) {
             goto do_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
                              hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot,
                              target_ulong *page_size, ARMMMUFaultInfo *fi)
 {
-    CPUState *cs = env_cpu(env);
     ARMCPU *cpu = env_archcpu(env);
     int level = 1;
     uint32_t table;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
         fi->type = ARMFault_Translation;
         goto do_fault;
     }
-    desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+    desc = arm_ldl_ptw(env, table, regime_is_secure(env, mmu_idx),
                        mmu_idx, fi);
     if (fi->type != ARMFault_None) {
         goto do_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
         ns = extract32(desc, 3, 1);
         /* Lookup l2 entry.  */
         table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
-        desc = arm_ldl_ptw(cs, table, regime_is_secure(env, mmu_idx),
+        desc = arm_ldl_ptw(env, table, regime_is_secure(env, mmu_idx),
                            mmu_idx, fi);
         if (fi->type != ARMFault_None) {
             goto do_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
                                ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
 {
     ARMCPU *cpu = env_archcpu(env);
-    CPUState *cs = CPU(cpu);
     /* Read an LPAE long-descriptor translation table. */
     ARMFaultType fault_type = ARMFault_Translation;
     uint32_t level;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
         descaddr |= (address >> (stride * (4 - level))) & indexmask;
         descaddr &= ~7ULL;
         nstable = extract32(tableattrs, 4, 1);
-        descriptor = arm_ldq_ptw(cs, descaddr, !nstable, mmu_idx, fi);
+        descriptor = arm_ldq_ptw(env, descaddr, !nstable, mmu_idx, fi);
         if (fi->type != ARMFault_None) {
             goto do_fault;
         }
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

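With SME, the vector length no longer comes only from ZCR_ELx.
Rename the TBFLAG_A64 field from ZCR_LEN to VL and the DisasContext
member from sve_len to vl, and comment that the value is either
NVL or SVL, like the pseudocode.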
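
For reference, the renamed field is 4 bits wide at bit 4 and the
translator decodes it as (field + 1) * 16 bytes. The sketch below shows
that encoding in isolation. The DEMO_* macros and demo_* helpers are
hypothetical names for illustration, not the FIELD/DP_TBFLAG/EX_TBFLAG
machinery itself, and the deposit direction is an assumption inferred
from the decode expression.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of a 4-bit field starting at bit 4. */
#define DEMO_VL_SHIFT 4
#define DEMO_VL_WIDTH 4
#define DEMO_VL_MASK  (((1u << DEMO_VL_WIDTH) - 1) << DEMO_VL_SHIFT)

/*
 * Store a vector length given in bytes (a multiple of 16, from 16 to
 * 256) as "number of 16-byte quadwords minus one", leaving the other
 * flag bits untouched.
 */
static inline uint32_t demo_deposit_vl(uint32_t flags, unsigned vl_bytes)
{
    uint32_t field = vl_bytes / 16 - 1;
    return (flags & ~DEMO_VL_MASK) |
           ((field << DEMO_VL_SHIFT) & DEMO_VL_MASK);
}

/* Recover the vector length in bytes: (field + 1) * 16. */
static inline unsigned demo_extract_vl(uint32_t flags)
{
    return (((flags & DEMO_VL_MASK) >> DEMO_VL_SHIFT) + 1) * 16;
}
```

A 4-bit field therefore covers vector lengths from 16 to 256 bytes,
whichever of NVL or SVL is current.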

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           | 3 ++-
 target/arm/translate-a64.h | 2 +-
 target/arm/translate.h     | 2 +-
 target/arm/helper.c        | 2 +-
 target/arm/translate-a64.c | 2 +-
 target/arm/translate-sve.c | 2 +-
 6 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_M32, MVE_NO_PRED, 5, 1)            /* Not cached. */
  */
 FIELD(TBFLAG_A64, TBII, 0, 2)
 FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
-FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
+/* The current vector length, either NVL or SVL. */
+FIELD(TBFLAG_A64, VL, 4, 4)
 FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
 FIELD(TBFLAG_A64, BT, 9, 1)
 FIELD(TBFLAG_A64, BTYPE, 10, 2)         /* Not cached. */
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
 /* Return the byte size of the "whole" vector register, VL / 8.  */
 static inline int vec_full_reg_size(DisasContext *s)
 {
-    return s->sve_len;
+    return s->vl;
 }
 
 bool disas_sve(DisasContext *, uint32_t);
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool ns;        /* Use non-secure CPREG bank on access */
     int fp_excp_el; /* FP exception EL or 0 if enabled */
     int sve_excp_el; /* SVE exception EL or 0 if enabled */
-    int sve_len;     /* SVE vector length in bytes */
+    int vl;          /* current vector length in bytes */
     /* Flag indicating that exceptions from secure mode are routed to EL3. */
     bool secure_routed_to_el3;
     bool vfp_enabled; /* FP enabled via FPSCR.EN */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
             zcr_len = sve_zcr_len_for_el(env, el);
         }
         DP_TBFLAG_A64(flags, SVEEXC_EL, sve_el);
-        DP_TBFLAG_A64(flags, ZCR_LEN, zcr_len);
+        DP_TBFLAG_A64(flags, VL, zcr_len);
     }
 
     sctlr = regime_sctlr(env, stage1);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
     dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL);
-    dc->sve_len = (EX_TBFLAG_A64(tb_flags, ZCR_LEN) + 1) * 16;
+    dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16;
     dc->pauth_active = EX_TBFLAG_A64(tb_flags, PAUTH_ACTIVE);
     dc->bt = EX_TBFLAG_A64(tb_flags, BT);
     dc->btype = EX_TBFLAG_A64(tb_flags, BTYPE);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static inline int pred_full_reg_offset(DisasContext *s, int regno)
 /* Return the byte size of the whole predicate register, VL / 64.  */
 static inline int pred_full_reg_size(DisasContext *s)
 {
-    return s->sve_len >> 3;
+    return s->vl >> 3;
 }
 
 /* Round up the size of a register to a size allowed by
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Add an interface function to extract the digested vector length
rather than the raw zcr_el[1] value.  This fixes an incorrect
return from do_prctl_set_vl where we didn't take into account
the set of vector lengths supported by the cpu.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
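A minimal sketch of why the raw register value and the digested value
can differ, assuming a map in which bit (vq - 1) marks a supported
length.  The function names are hypothetical and only model the idea;
the real digested value is the one cached in env->hflags.

```c
#include <stdint.h>

/* Raw interpretation: low 4 bits of ZCR_EL1 hold vq - 1. */
static unsigned raw_vq(uint64_t zcr_el1)
{
    return (unsigned)(zcr_el1 & 0xf) + 1;
}

/*
 * Hypothetical stand-in for the digested value: the largest supported
 * vq that does not exceed the requested one, or 0 if there is none.
 */
static unsigned effective_vq(uint32_t supported_map, unsigned requested_vq)
{
    uint32_t mask = requested_vq >= 32 ? UINT32_MAX
                                       : ((1u << requested_vq) - 1);
    uint32_t candidates = supported_map & mask;
    unsigned vq = 0;

    /* Position of the highest set bit, counted from 1. */
    while (candidates) {
        vq++;
        candidates >>= 1;
    }
    return vq;
}
```

With a map of 128, 256 and 512 bit lengths only (bits 0, 1 and 3), a
request for vq 3 reads back as 3 from the raw register but should be
reported as 2.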
 linux-user/aarch64/target_prctl.h | 20 +++++++++++++-------
 target/arm/cpu.h                  | 11 +++++++++++
 linux-user/aarch64/signal.c       |  4 ++--
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/aarch64/target_prctl.h
+++ b/linux-user/aarch64/target_prctl.h
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_get_vl(CPUArchState *env)
 {
     ARMCPU *cpu = env_archcpu(env);
     if (cpu_isar_feature(aa64_sve, cpu)) {
-        return ((cpu->env.vfp.zcr_el[1] & 0xf) + 1) * 16;
+        return sve_vq(env) * 16;
     }
     return -TARGET_EINVAL;
 }
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_set_vl(CPUArchState *env, abi_long arg2)
      */
     if (cpu_isar_feature(aa64_sve, env_archcpu(env))
         && arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) {
-        ARMCPU *cpu = env_archcpu(env);
         uint32_t vq, old_vq;
 
-        old_vq = (env->vfp.zcr_el[1] & 0xf) + 1;
-        vq = MAX(arg2 / 16, 1);
-        vq = MIN(vq, cpu->sve_max_vq);
+        old_vq = sve_vq(env);
 
+        /*
+         * Bound the value of arg2, so that we know that it fits into
+         * the 4-bit field in ZCR_EL1.  Rely on the hflags rebuild to
+         * sort out the length supported by the cpu.
+         */
+        vq = MAX(arg2 / 16, 1);
+        vq = MIN(vq, ARM_MAX_VQ);
+        env->vfp.zcr_el[1] = vq - 1;
+        arm_rebuild_hflags(env);
+
+        vq = sve_vq(env);
         if (vq < old_vq) {
             aarch64_sve_narrow_vq(env, vq);
         }
-        env->vfp.zcr_el[1] = vq - 1;
-        arm_rebuild_hflags(env);
         return vq * 16;
     }
     return -TARGET_EINVAL;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline int cpu_mmu_index(CPUARMState *env, bool ifetch)
     return EX_TBFLAG_ANY(env->hflags, MMUIDX);
 }
 
+/**
+ * sve_vq
+ * @env: the cpu context
+ *
+ * Return the VL cached within env->hflags, in units of quadwords.
+ */
+static inline int sve_vq(CPUARMState *env)
+{
+    return EX_TBFLAG_A64(env->hflags, VL) + 1;
+}
+
 static inline bool bswap_code(bool sctlr_b)
 {
 #ifdef CONFIG_USER_ONLY
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
 
         case TARGET_SVE_MAGIC:
             if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
-                vq = (env->vfp.zcr_el[1] & 0xf) + 1;
+                vq = sve_vq(env);
                 sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
                 if (!sve && size == sve_size) {
                     sve = (struct target_sve_context *)ctx;
@@ -XXX,XX +XXX,XX @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
 
     /* SVE state needs saving only if it exists.  */
     if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
-        vq = (env->vfp.zcr_el[1] & 0xf) + 1;
+        vq = sve_vq(env);
         sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
         sve_ofs = alloc_sigframe_space(sve_size, &layout);
     }
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Instead of checking the FP enable bits (CPACR_EL1.FPEN, CPTR_EL2.FPEN
and CPTR_EL2.TFP) in fp_exception_el and also in sve_exception_el,
document that we must compare the results.  The only place where we
have not already checked that FP EL is zero is in rebuild_hflags_a64.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
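The comparison of the two results in rebuild_hflags_a64 can be
sketched in isolation roughly as follows.  The struct and function
names are hypothetical, and len stands in for whatever
sve_zcr_len_for_el would return.

```c
/* Hypothetical container for the two values stored in the TB flags. */
struct sve_flags {
    int sve_excp_el;    /* SVE exception EL, or 0 */
    unsigned vl_field;  /* vector length field, or 0 when unneeded */
};

static struct sve_flags pick_sve_flags(int fp_el, int sve_el, unsigned len)
{
    struct sve_flags f = { 0, 0 };

    if (fp_el != 0) {
        /* FP is disabled: the FP trap wins if it targets a lower EL. */
        if (sve_el > fp_el) {
            sve_el = 0;
        }
    } else if (sve_el == 0) {
        /* Both FP and SVE are enabled: the length is actually needed. */
        f.vl_field = len;
    }
    f.sve_excp_el = sve_el;
    return f;
}
```

Forcing the unused value to zero keeps otherwise identical states from
producing distinct flag values.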
 target/arm/helper.c | 58 +++++++++++++++------------------------------
 1 file changed, 19 insertions(+), 39 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo minimal_ras_reginfo[] = {
       .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.vsesr_el2) },
 };
 
-/* Return the exception level to which exceptions should be taken
- * via SVEAccessTrap.  If an exception should be routed through
- * AArch64.AdvSIMDFPAccessTrap, return 0; fp_exception_el should
- * take care of raising that exception.
- * C.f. the ARM pseudocode function CheckSVEEnabled.
+/*
+ * Return the exception level to which exceptions should be taken
+ * via SVEAccessTrap.  This excludes the check for whether the exception
+ * should be routed through AArch64.AdvSIMDFPAccessTrap.  That can easily
+ * be found by testing 0 < fp_exception_el < sve_exception_el.
+ *
+ * C.f. the ARM pseudocode function CheckSVEEnabled.  Note that the
+ * pseudocode does *not* separate out the FP trap checks, but has them
+ * all in one function.
  */
 int sve_exception_el(CPUARMState *env, int el)
 {
@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
         case 2:
             return 1;
         }
-
-        /* Check CPACR.FPEN.  */
-        switch (FIELD_EX64(env->cp15.cpacr_el1, CPACR_EL1, FPEN)) {
-        case 1:
-            if (el != 0) {
-                break;
-            }
-            /* fall through */
-        case 0:
-        case 2:
-            return 0;
-        }
     }
 
     /*
@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
             case 2:
                 return 2;
             }
-
-            switch (FIELD_EX32(env->cp15.cptr_el[2], CPTR_EL2, FPEN)) {
-            case 1:
-                if (el == 2 || !(hcr_el2 & HCR_TGE)) {
-                    break;
-                }
-                /* fall through */
-            case 0:
-            case 2:
-                return 0;
-            }
         } else if (arm_is_el2_enabled(env)) {
             if (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, TZ)) {
                 return 2;
             }
-            if (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, TFP)) {
-                return 0;
-            }
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
 
     if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
         int sve_el = sve_exception_el(env, el);
-        uint32_t zcr_len;
 
         /*
-         * If SVE is disabled, but FP is enabled,
-         * then the effective len is 0.
+         * If either FP or SVE are disabled, translator does not need len.
+         * If SVE EL > FP EL, FP exception has precedence, and translator
+         * does not need SVE EL.  Save potential re-translations by forcing
+         * the unneeded data to zero.
          */
-        if (sve_el != 0 && fp_el == 0) {
-            zcr_len = 0;
-        } else {
-            zcr_len = sve_zcr_len_for_el(env, el);
+        if (fp_el != 0) {
+            if (sve_el > fp_el) {
+                sve_el = 0;
+            }
+        } else if (sve_el == 0) {
+            DP_TBFLAG_A64(flags, VL, sve_zcr_len_for_el(env, el));
         }
         DP_TBFLAG_A64(flags, SVEEXC_EL, sve_el);
-        DP_TBFLAG_A64(flags, VL, zcr_len);
     }
 
     sctlr = regime_sctlr(env, stage1);
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

ELIsInHost is a (newish) ARM pseudocode function that is easier to
work with than open-coded tests for HCR_E2H etc.  Use of the function
will be staged into the code base in parts.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
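For readers who want to experiment with the predicate outside QEMU,
here is a standalone sketch.  The SK_ constants mirror the
architectural bit positions of HCR_EL2.TGE and HCR_EL2.E2H, and
el2_usable is an assumed stand-in for the combined
arm_is_el2_enabled() and arm_el_is_aa64(env, 2) check; none of these
names exist in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_HCR_TGE (1ULL << 27)
#define SK_HCR_E2H (1ULL << 34)

static bool sketch_el_is_in_host(uint64_t hcr_el2, int el, bool el2_usable)
{
    uint64_t mask;

    if (el & 1) {
        return false; /* EL1 or EL3 */
    }

    /* EL2 needs only E2H; EL0 needs both E2H and TGE. */
    mask = el ? SK_HCR_E2H : (SK_HCR_E2H | SK_HCR_TGE);
    if ((hcr_el2 & mask) != mask) {
        return false;
    }

    return el2_usable;
}
```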
 target/arm/internals.h |  2 ++
 target/arm/helper.c    | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline void define_cortex_a72_a57_a53_cp_reginfo(ARMCPU *cpu) { }
 void define_cortex_a72_a57_a53_cp_reginfo(ARMCPU *cpu);
 #endif
 
+bool el_is_in_host(CPUARMState *env, int el);
+
 void aa32_max_features(ARMCPU *cpu);
 
 #endif
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
     return ret;
 }
 
+/*
+ * Corresponds to ARM pseudocode function ELIsInHost().
+ */
+bool el_is_in_host(CPUARMState *env, int el)
+{
+    uint64_t mask;
+
+    /*
+     * Since we only care about E2H and TGE, we can skip arm_hcr_el2_eff().
+     * Perform the simplest bit tests first, and validate EL2 afterward.
+     */
+    if (el & 1) {
+        return false; /* EL1 or EL3 */
+    }
+
+    /*
+     * Note that hcr_write() checks isar_feature_aa64_vh(),
+     * aka HaveVirtHostExt(), in allowing HCR_E2H to be set.
+     */
+    mask = el ? HCR_E2H : HCR_E2H | HCR_TGE;
+    if ((env->cp15.hcr_el2 & mask) != mask) {
+        return false;
+    }
+
+    /* TGE and/or E2H set: double check those bits are currently legal. */
+    return arm_is_el2_enabled(env) && arm_el_is_aa64(env, 2);
+}
+
 static void hcrx_write(CPUARMState *env, const ARMCPRegInfo *ri,
                        uint64_t value)
 {
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The ARM pseudocode function NVL uses the ELIsInHost predicate now,
and I think it's a bit clearer.  Simplify the pseudocode
condition by noting that IsInHost is always false for EL1.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
     ARMCPU *cpu = env_archcpu(env);
     uint32_t zcr_len = cpu->sve_max_vq - 1;
 
-    if (el <= 1 &&
-        (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+    if (el <= 1 && !el_is_in_host(env, el)) {
         zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
     }
     if (el <= 2 && arm_feature(env, ARM_FEATURE_EL2)) {
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The ARM pseudocode function CheckNormalSVEEnabled uses the
ELIsInHost predicate now, and I think it's a bit clearer.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo minimal_ras_reginfo[] = {
 int sve_exception_el(CPUARMState *env, int el)
 {
 #ifndef CONFIG_USER_ONLY
-    uint64_t hcr_el2 = arm_hcr_el2_eff(env);
-
-    if (el <= 1 && (hcr_el2 & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+    if (el <= 1 && !el_is_in_host(env, el)) {
         switch (FIELD_EX64(env->cp15.cpacr_el1, CPACR_EL1, ZEN)) {
         case 1:
             if (el != 0) {
@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
      * CPTR_EL2 changes format with HCR_EL2.E2H (regardless of TGE).
      */
     if (el <= 2) {
+        uint64_t hcr_el2 = arm_hcr_el2_eff(env);
         if (hcr_el2 & HCR_E2H) {
             switch (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, ZEN)) {
             case 1:
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The arm_is_el2_enabled() check is buried within arm_hcr_el2_eff(),
but since we have to have the explicit check for CPTR_EL2.TZ, we
might as well just check it once at the beginning of the block.

Once this is done, we can test HCR_EL2.{E2H,TGE} directly,
rather than going through arm_hcr_el2_eff().

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
         }
     }
 
-    /*
-     * CPTR_EL2 changes format with HCR_EL2.E2H (regardless of TGE).
-     */
-    if (el <= 2) {
-        uint64_t hcr_el2 = arm_hcr_el2_eff(env);
-        if (hcr_el2 & HCR_E2H) {
+    if (el <= 2 && arm_is_el2_enabled(env)) {
+        /* CPTR_EL2 changes format with HCR_EL2.E2H (regardless of TGE). */
+        if (env->cp15.hcr_el2 & HCR_E2H) {
             switch (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, ZEN)) {
             case 1:
-                if (el != 0 || !(hcr_el2 & HCR_TGE)) {
+                if (el != 0 || !(env->cp15.hcr_el2 & HCR_TGE)) {
                     break;
                 }
                 /* fall through */
@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
             case 2:
                 return 2;
             }
-        } else if (arm_is_el2_enabled(env)) {
+        } else {
             if (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, TZ)) {
                 return 2;
             }
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

We don't need to constrain the value set in zcr_el[1],
because it will be done by sve_zcr_len_for_el.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
                                          CPACR_EL1, ZEN, 3);
         /* with reasonable vector length */
         if (cpu_isar_feature(aa64_sve, cpu)) {
-            env->vfp.zcr_el[1] =
-                aarch64_sve_zcr_get_valid_len(cpu, cpu->sve_default_vq - 1);
+            env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
         }
         /*
          * Enable 48-bit address space (TODO: take reserved_va into account).
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

aarch64_sve_zcr_get_valid_len is used only once, and will need
modification for Streaming SVE mode.  Inline it into its sole
caller, sve_zcr_len_for_el.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h | 11 -----------
 target/arm/helper.c    | 30 +++++++++++-------------------
 2 files changed, 11 insertions(+), 30 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void);
 void arm_cpu_synchronize_from_tb(CPUState *cs, const TranslationBlock *tb);
 #endif /* CONFIG_TCG */
 
-/**
- * aarch64_sve_zcr_get_valid_len:
- * @cpu: cpu context
- * @start_len: maximum len to consider
- *
- * Return the maximum supported sve vector length <= @start_len.
- * Note that both @start_len and the return value are in units
- * of ZCR_ELx.LEN, so the vector bit length is (x + 1) * 128.
- */
-uint32_t aarch64_sve_zcr_get_valid_len(ARMCPU *cpu, uint32_t start_len);
-
 enum arm_fprounding {
     FPROUNDING_TIEEVEN,
     FPROUNDING_POSINF,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
     return 0;
 }
 
-uint32_t aarch64_sve_zcr_get_valid_len(ARMCPU *cpu, uint32_t start_len)
-{
-    uint32_t end_len;
-
-    start_len = MIN(start_len, ARM_MAX_VQ - 1);
-    end_len = start_len;
-
-    if (!test_bit(start_len, cpu->sve_vq_map)) {
-        end_len = find_last_bit(cpu->sve_vq_map, start_len);
-        assert(end_len < start_len);
-    }
-    return end_len;
-}
-
 /*
  * Given that SVE is enabled, return the vector length for EL.
  */
 uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
 {
     ARMCPU *cpu = env_archcpu(env);
-    uint32_t zcr_len = cpu->sve_max_vq - 1;
+    uint32_t len = cpu->sve_max_vq - 1;
+    uint32_t end_len;
 
     if (el <= 1 && !el_is_in_host(env, el)) {
-        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
+        len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
     }
     if (el <= 2 && arm_feature(env, ARM_FEATURE_EL2)) {
-        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
+        len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
     }
     if (arm_feature(env, ARM_FEATURE_EL3)) {
-        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
+        len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
     }
 
-    return aarch64_sve_zcr_get_valid_len(cpu, zcr_len);
+    end_len = len;
+    if (!test_bit(len, cpu->sve_vq_map)) {
+        end_len = find_last_bit(cpu->sve_vq_map, len);
+        assert(end_len < len);
+    }
+    return end_len;
 }
 
 static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The map need only hold 16 bits, one per vq from 1 to ARM_MAX_VQ, so
a generic bitmap is over-complicated.
We can simplify operations quite a bit with plain logical ops.

The introduction of SVE_VQ_POW2_MAP eliminates the need for
looping in order to search for powers of two.  Simply perform
the logical ops and use count leading or trailing zeros as
required to find the result.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
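A small standalone sketch of the idea, under the assumption that bit
(vq - 1) of a uint32_t represents vector length vq.  The sk_ helpers
are hypothetical portable stand-ins for the clz32/ctz32 used in the
patch, written as loops so that a zero input is well defined and
returns 32.

```c
#include <stdint.h>

/* Same shape as SVE_VQ_POW2_MAP: vq 1, 2, 4, 8 and 16. */
#define SK_VQ_POW2_MAP \
    ((1u << 0) | (1u << 1) | (1u << 3) | (1u << 7) | (1u << 15))

static unsigned sk_clz32(uint32_t v)
{
    unsigned n = 0;

    if (v == 0) {
        return 32;
    }
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

static unsigned sk_ctz32(uint32_t v)
{
    unsigned n = 0;

    if (v == 0) {
        return 32;
    }
    while (!(v & 1u)) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Largest enabled vq, or 0 for an empty map. */
static unsigned sk_max_vq(uint32_t vq_map)
{
    return 32 - sk_clz32(vq_map);
}

/* Enable every power-of-two length below the maximum that the user
 * has not explicitly initialised, without looping over lengths. */
static uint32_t sk_fill_pow2(uint32_t vq_map, uint32_t vq_init)
{
    unsigned max_vq = sk_max_vq(vq_map);
    uint32_t vq_mask = max_vq >= 32 ? UINT32_MAX : ((1u << max_vq) - 1);

    return vq_map | (SK_VQ_POW2_MAP & ~vq_init & vq_mask);
}
```

For example, a map with only vq 4 enabled (bit 3) and only that bit
initialised gains bits 0 and 1, giving 128, 256 and 512 bit lengths.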
 target/arm/cpu.h       |   6 +--
 target/arm/internals.h |   5 ++
 target/arm/kvm_arm.h   |   7 ++-
 target/arm/cpu64.c     | 117 ++++++++++++++++++++---------------------
 target/arm/helper.c    |   9 +---
 target/arm/kvm64.c     |  36 +++----------
 6 files changed, 75 insertions(+), 105 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
      * Bits set in sve_vq_supported represent valid vector lengths for
      * the CPU type.
      */
-    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ);
-    DECLARE_BITMAP(sve_vq_init, ARM_MAX_VQ);
-    DECLARE_BITMAP(sve_vq_supported, ARM_MAX_VQ);
+    uint32_t sve_vq_map;
+    uint32_t sve_vq_init;
+    uint32_t sve_vq_supported;
 
     /* Generic timer counter frequency, in Hz */
     uint64_t gt_cntfrq_hz;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ bool el_is_in_host(CPUARMState *env, int el);
 
 void aa32_max_features(ARMCPU *cpu);
 
+/* Powers of 2 for sve_vq_map et al. */
+#define SVE_VQ_POW2_MAP                                 \
+    ((1 << (1 - 1)) | (1 << (2 - 1)) |                  \
+     (1 << (4 - 1)) | (1 << (8 - 1)) | (1 << (16 - 1)))
+
 #endif
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf);
 /**
  * kvm_arm_sve_get_vls:
  * @cs: CPUState
- * @map: bitmap to fill in
  *
  * Get all the SVE vector lengths supported by the KVM host, setting
  * the bits corresponding to their length in quadwords minus one
- * (vq - 1) in @map up to ARM_MAX_VQ.
+ * (vq - 1) up to ARM_MAX_VQ.  Return the resulting map.
  */
-void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map);
+uint32_t kvm_arm_sve_get_vls(CPUState *cs);
 
 /**
  * kvm_arm_set_cpu_features_from_host:
@@ -XXX,XX +XXX,XX @@ static inline void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error **errp)
     g_assert_not_reached();
 }
 
-static inline void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
+static inline uint32_t kvm_arm_sve_get_vls(CPUState *cs)
 {
     g_assert_not_reached();
 }
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      * any of the above.  Finally, if SVE is not disabled, then at least one
      * vector length must be enabled.
      */
-    DECLARE_BITMAP(tmp, ARM_MAX_VQ);
-    uint32_t vq, max_vq = 0;
+    uint32_t vq_map = cpu->sve_vq_map;
+    uint32_t vq_init = cpu->sve_vq_init;
+    uint32_t vq_supported;
+    uint32_t vq_mask = 0;
+    uint32_t tmp, vq, max_vq = 0;
 
     /*
      * CPU models specify a set of supported vector lengths which are
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      * in the supported bitmap results in an error.  When KVM is enabled we
      * fetch the supported bitmap from the host.
      */
-    if (kvm_enabled() && kvm_arm_sve_supported()) {
-        kvm_arm_sve_get_vls(CPU(cpu), cpu->sve_vq_supported);
-    } else if (kvm_enabled()) {
-        assert(!cpu_isar_feature(aa64_sve, cpu));
+    if (kvm_enabled()) {
+        if (kvm_arm_sve_supported()) {
+            cpu->sve_vq_supported = kvm_arm_sve_get_vls(CPU(cpu));
+            vq_supported = cpu->sve_vq_supported;
+        } else {
+            assert(!cpu_isar_feature(aa64_sve, cpu));
+            vq_supported = 0;
+        }
+    } else {
+        vq_supported = cpu->sve_vq_supported;
     }
 
     /*
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      * From the properties, sve_vq_map<N> implies sve_vq_init<N>.
      * Check first for any sve<N> enabled.
      */
-    if (!bitmap_empty(cpu->sve_vq_map, ARM_MAX_VQ)) {
-        max_vq = find_last_bit(cpu->sve_vq_map, ARM_MAX_VQ) + 1;
+    if (vq_map != 0) {
+        max_vq = 32 - clz32(vq_map);
+        vq_mask = MAKE_64BIT_MASK(0, max_vq);
 
         if (cpu->sve_max_vq && max_vq > cpu->sve_max_vq) {
             error_setg(errp, "cannot enable sve%d", max_vq * 128);
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
              * For KVM we have to automatically enable all supported unitialized
              * lengths, even when the smaller lengths are not all powers-of-two.
              */
-            bitmap_andnot(tmp, cpu->sve_vq_supported, cpu->sve_vq_init, max_vq);
-            bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
+            vq_map |= vq_supported & ~vq_init & vq_mask;
         } else {
             /* Propagate enabled bits down through required powers-of-two. */
-            for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
-                if (!test_bit(vq - 1, cpu->sve_vq_init)) {
-                    set_bit(vq - 1, cpu->sve_vq_map);
-                }
-            }
+            vq_map |= SVE_VQ_POW2_MAP & ~vq_init & vq_mask;
         }
     } else if (cpu->sve_max_vq == 0) {
         /*
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 
         if (kvm_enabled()) {
             /* Disabling a supported length disables all larger lengths. */
-            for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
-                if (test_bit(vq - 1, cpu->sve_vq_init) &&
-                    test_bit(vq - 1, cpu->sve_vq_supported)) {
-                    break;
-                }
-            }
+            tmp = vq_init & vq_supported;
         } else {
             /* Disabling a power-of-two disables all larger lengths. */
-            for (vq = 1; vq <= ARM_MAX_VQ; vq <<= 1) {
-                if (test_bit(vq - 1, cpu->sve_vq_init)) {
-                    break;
-                }
-            }
+            tmp = vq_init & SVE_VQ_POW2_MAP;
         }
+        vq = ctz32(tmp) + 1;
 
         max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
-        bitmap_andnot(cpu->sve_vq_map, cpu->sve_vq_supported,
-                      cpu->sve_vq_init, max_vq);
-        if (max_vq == 0 || bitmap_empty(cpu->sve_vq_map, max_vq)) {
+        vq_mask = MAKE_64BIT_MASK(0, max_vq);
+        vq_map = vq_supported & ~vq_init & vq_mask;
+
+        if (max_vq == 0 || vq_map == 0) {
             error_setg(errp, "cannot disable sve%d", vq * 128);
             error_append_hint(errp, "Disabling sve%d results in all "
                               "vector lengths being disabled.\n",
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
             return;
         }
 
-        max_vq = find_last_bit(cpu->sve_vq_map, max_vq) + 1;
+        max_vq = 32 - clz32(vq_map);
+        vq_mask = MAKE_64BIT_MASK(0, max_vq);
     }
 
     /*
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      */
     if (cpu->sve_max_vq != 0) {
         max_vq = cpu->sve_max_vq;
+        vq_mask = MAKE_64BIT_MASK(0, max_vq);
 
-        if (!test_bit(max_vq - 1, cpu->sve_vq_map) &&
-            test_bit(max_vq - 1, cpu->sve_vq_init)) {
+        if (vq_init & ~vq_map & (1 << (max_vq - 1))) {
             error_setg(errp, "cannot disable sve%d", max_vq * 128);
             error_append_hint(errp, "The maximum vector length must be "
                               "enabled, sve-max-vq=%d (%d bits)\n",
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
         }
 
         /* Set all bits not explicitly set within sve-max-vq. */
-        bitmap_complement(tmp, cpu->sve_vq_init, max_vq);
-        bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
+        vq_map |= ~vq_init & vq_mask;
     }
 
     /*
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      * are clear, just in case anybody looks.
      */
     assert(max_vq != 0);
-    bitmap_clear(cpu->sve_vq_map, max_vq, ARM_MAX_VQ - max_vq);
+    assert(vq_mask != 0);
+    vq_map &= vq_mask;
 
     /* Ensure the set of lengths matches what is supported. */
-    bitmap_xor(tmp, cpu->sve_vq_map, cpu->sve_vq_supported, max_vq);
-    if (!bitmap_empty(tmp, max_vq)) {
-        vq = find_last_bit(tmp, max_vq) + 1;
-        if (test_bit(vq - 1, cpu->sve_vq_map)) {
+    tmp = vq_map ^ (vq_supported & vq_mask);
+    if (tmp) {
+        vq = 32 - clz32(tmp);
+        if (vq_map & (1 << (vq - 1))) {
             if (cpu->sve_max_vq) {
                 error_setg(errp, "cannot set sve-max-vq=%d", cpu->sve_max_vq);
                 error_append_hint(errp, "This CPU does not support "
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
                 return;
             } else {
                 /* Ensure all required powers-of-two are enabled. */
-                for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
-                    if (!test_bit(vq - 1, cpu->sve_vq_map)) {
-                        error_setg(errp, "cannot disable sve%d", vq * 128);
-                        error_append_hint(errp, "sve%d is required as it "
-                                          "is a power-of-two length smaller "
-                                          "than the maximum, sve%d\n",
-                                          vq * 128, max_vq * 128);
-                        return;
-                    }
+                tmp = SVE_VQ_POW2_MAP & vq_mask & ~vq_map;
+                if (tmp) {
+                    vq = 32 - clz32(tmp);
+                    error_setg(errp, "cannot disable sve%d", vq * 128);
+                    error_append_hint(errp, "sve%d is required as it "
+                                      "is a power-of-two length smaller "
+                                      "than the maximum, sve%d\n",
+                                      vq * 128, max_vq * 128);
+                    return;
                 }
             }
         }
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 
     /* From now on sve_max_vq is the actual maximum supported length. */
     cpu->sve_max_vq = max_vq;
+    cpu->sve_vq_map = vq_map;
 }
 
 static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
     if (!cpu_isar_feature(aa64_sve, cpu)) {
         value = false;
     } else {
-        value = test_bit(vq - 1, cpu->sve_vq_map);
+        value = extract32(cpu->sve_vq_map, vq - 1, 1);
     }
     visit_type_bool(v, name, &value, errp);
 }
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
         return;
     }
 
-    if (value) {
-        set_bit(vq - 1, cpu->sve_vq_map);
-    } else {
-        clear_bit(vq - 1, cpu->sve_vq_map);
-    }
-    set_bit(vq - 1, cpu->sve_vq_init);
+    cpu->sve_vq_map = deposit32(cpu->sve_vq_map, vq - 1, 1, value);
+    cpu->sve_vq_init |= 1 << (vq - 1);
 }
 
 static bool cpu_arm_get_sve(Object *obj, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
     cpu->dcz_blocksize = 7; /*  512 bytes */
 #endif
 
-    bitmap_fill(cpu->sve_vq_supported, ARM_MAX_VQ);
+    cpu->sve_vq_supported = MAKE_64BIT_MASK(0, ARM_MAX_VQ);
 
     aarch64_add_pauth_properties(obj);
     aarch64_add_sve_properties(obj);
@@ -XXX,XX +XXX,XX @@ static void aarch64_a64fx_initfn(Object *obj)
     cpu->gic_vprebits = 5;
     cpu->gic_pribits = 5;
 
-    /* Suppport of A64FX's vector length are 128,256 and 512bit only */
+    /* The A64FX supports only 128, 256 and 512 bit vector lengths */
     aarch64_add_sve_properties(obj);
-    bitmap_zero(cpu->sve_vq_supported, ARM_MAX_VQ);
-    set_bit(0, cpu->sve_vq_supported); /* 128bit */
-    set_bit(1, cpu->sve_vq_supported); /* 256bit */
-    set_bit(3, cpu->sve_vq_supported); /* 512bit */
+    cpu->sve_vq_supported = (1 << 0)  /* 128bit */
+                          | (1 << 1)  /* 256bit */
+                          | (1 << 3); /* 512bit */
 
     cpu->isar.reset_pmcr_el0 = 0x46014040;
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
 {
     ARMCPU *cpu = env_archcpu(env);
     uint32_t len = cpu->sve_max_vq - 1;
-    uint32_t end_len;
 
     if (el <= 1 && !el_is_in_host(env, el)) {
         len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
@@ -XXX,XX +XXX,XX @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
         len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
     }
 
-    end_len = len;
-    if (!test_bit(len, cpu->sve_vq_map)) {
-        end_len = find_last_bit(cpu->sve_vq_map, len);
-        assert(end_len < len);
-    }
-    return end_len;
+    len = 31 - clz32(cpu->sve_vq_map & MAKE_64BIT_MASK(0, len + 1));
+    return len;
 }
 
 static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_steal_time_supported(void)
 
 QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 
-void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
+uint32_t kvm_arm_sve_get_vls(CPUState *cs)
 {
     /* Only call this function if kvm_arm_sve_supported() returns true. */
     static uint64_t vls[KVM_ARM64_SVE_VLS_WORDS];
     static bool probed;
     uint32_t vq = 0;
-    int i, j;
-
-    bitmap_zero(map, ARM_MAX_VQ);
+    int i;
 
     /*
      * KVM ensures all host CPUs support the same set of vector lengths.
@@ -XXX,XX +XXX,XX @@ void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
         if (vq > ARM_MAX_VQ) {
             warn_report("KVM supports vector lengths larger than "
                         "QEMU can enable");
+            vls[0] &= MAKE_64BIT_MASK(0, ARM_MAX_VQ);
         }
     }
 
-    for (i = 0; i < KVM_ARM64_SVE_VLS_WORDS; ++i) {
-        if (!vls[i]) {
-            continue;
-        }
-        for (j = 1; j <= 64; ++j) {
-            vq = j + i * 64;
-            if (vq > ARM_MAX_VQ) {
-                return;
-            }
-            if (vls[i] & (1UL << (j - 1))) {
-                set_bit(vq - 1, map);
-            }
-        }
-    }
+    return vls[0];
 }
 
 static int kvm_arm_sve_set_vls(CPUState *cs)
 {
-    uint64_t vls[KVM_ARM64_SVE_VLS_WORDS] = {0};
+    ARMCPU *cpu = ARM_CPU(cs);
+    uint64_t vls[KVM_ARM64_SVE_VLS_WORDS] = { cpu->sve_vq_map };
     struct kvm_one_reg reg = {
         .id = KVM_REG_ARM64_SVE_VLS,
         .addr = (uint64_t)&vls[0],
     };
-    ARMCPU *cpu = ARM_CPU(cs);
-    uint32_t vq;
-    int i, j;
 
     assert(cpu->sve_max_vq <= KVM_ARM64_SVE_VQ_MAX);
 
-    for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
-        if (test_bit(vq - 1, cpu->sve_vq_map)) {
-            i = (vq - 1) / 64;
-            j = (vq - 1) % 64;
-            vls[i] |= 1UL << j;
-        }
-    }
-
     return kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
 }
 
-- 
2.25.1
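
Note on the bit arithmetic in the patch above: once sve_vq_map is a
plain uint32_t, bit (vq - 1) being set means the length vq * 128 bits
is enabled, so "highest enabled length" and "round a requested length
down to an enabled one" both reduce to a count-leading-zeros
operation.  The following is a minimal standalone sketch of that idea,
not the QEMU code itself: every demo_* name is hypothetical,
demo_clz32() stands in for QEMU's clz32(), and the sketch assumes that
at least one enabled length lies at or below the request.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of a 32-bit value; returns 32 for zero. */
static int demo_clz32(uint32_t x)
{
    int n = 0;

    if (x == 0) {
        return 32;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Mask with the low @n bits set, for 0 <= n <= 32. */
static uint32_t demo_low_mask(unsigned n)
{
    assert(n <= 32);
    return n == 32 ? UINT32_MAX : (1u << n) - 1;
}

/* Largest enabled vq in @vq_map, or 0 if the map is empty. */
static unsigned demo_highest_vq(uint32_t vq_map)
{
    return 32 - demo_clz32(vq_map);
}

/* Round the requested @len (in units of vq - 1) down to an enabled length. */
static unsigned demo_round_down_vqm1(uint32_t vq_map, unsigned len)
{
    uint32_t usable = vq_map & demo_low_mask(len + 1);

    /* Assumption of this sketch: some length <= the request is enabled. */
    assert(usable != 0);
    return 31 - demo_clz32(usable);
}
```

With the A64FX map from the patch (bits 0, 1 and 3 set, i.e. 0xb),
demo_round_down_vqm1(0xb, 2) returns 1: a request for 384 bits falls
back to 256 bits, the next enabled length below it.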

From: Richard Henderson <richard.henderson@linaro.org>

This will be used for both Normal and Streaming SVE, and the value
does not necessarily come from ZCR_ELx.  While we're at it, emphasize
the units in which the value is returned.

Patch produced by
    git grep -l sve_zcr_len_for_el | \
    xargs -n1 sed -i 's/sve_zcr_len_for_el/sve_vqm1_for_el/g'

and then adding a function comment.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h       | 11 ++++++++++-
 target/arm/arch_dump.c |  2 +-
 target/arm/cpu.c       |  2 +-
 target/arm/gdbstub64.c |  2 +-
 target/arm/helper.c    | 12 ++++++------
 5 files changed, 19 insertions(+), 10 deletions(-)
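
Note on units: the new name stresses that the return value is
"quadwords minus 1", so callers add 1 before scaling.  The sketch
below only spells out those conversions; the demo_* helpers are
hypothetical and are not part of this patch.  The factors come from
the surrounding code: a quadword is 128 bits, and the gdbstub reports
vector granules (VG) of 64 bits, hence vq * 2.

```c
#include <assert.h>
#include <stdint.h>

/* Convert "vq minus 1" (the ZCR_ELx.LEN scale) to quadwords. */
static uint32_t demo_vq_from_vqm1(uint32_t vqm1)
{
    assert(vqm1 <= 15);     /* LEN is a 4-bit field */
    return vqm1 + 1;
}

/* Vector length in bits: one quadword is 128 bits. */
static uint32_t demo_vl_bits(uint32_t vqm1)
{
    return demo_vq_from_vqm1(vqm1) * 128;
}

/* Vector length in bytes: one quadword is 16 bytes. */
static uint32_t demo_vl_bytes(uint32_t vqm1)
{
    return demo_vq_from_vqm1(vqm1) * 16;
}

/* Vector length in 64-bit granules (VG), as reported to gdb. */
static uint32_t demo_vg(uint32_t vqm1)
{
    return demo_vq_from_vqm1(vqm1) * 2;
}
```

For example, a return value of 3 corresponds to vq = 4, that is
512 bits, 64 bytes, or VG = 8.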

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ void aarch64_sync_64_to_32(CPUARMState *env);
 
 int fp_exception_el(CPUARMState *env, int cur_el);
 int sve_exception_el(CPUARMState *env, int cur_el);
-uint32_t sve_zcr_len_for_el(CPUARMState *env, int el);
+
+/**
+ * sve_vqm1_for_el:
+ * @env: CPUARMState
+ * @el: exception level
+ *
+ * Compute the current SVE vector length for @el, in units of
+ * Quadwords Minus 1 -- the same scale used for ZCR_ELx.LEN.
+ */
+uint32_t sve_vqm1_for_el(CPUARMState *env, int el);
 
 static inline bool is_a64(CPUARMState *env)
 {
diff --git a/target/arm/arch_dump.c b/target/arm/arch_dump.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/arch_dump.c
+++ b/target/arm/arch_dump.c
@@ -XXX,XX +XXX,XX @@ static off_t sve_fpcr_offset(uint32_t vq)
 
 static uint32_t sve_current_vq(CPUARMState *env)
 {
-    return sve_zcr_len_for_el(env, arm_current_el(env)) + 1;
+    return sve_vqm1_for_el(env, arm_current_el(env)) + 1;
 }
 
 static size_t sve_size_vq(uint32_t vq)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
                  vfp_get_fpcr(env), vfp_get_fpsr(env));
 
     if (cpu_isar_feature(aa64_sve, cpu) && sve_exception_el(env, el) == 0) {
-        int j, zcr_len = sve_zcr_len_for_el(env, el);
+        int j, zcr_len = sve_vqm1_for_el(env, el);
 
         for (i = 0; i <= FFR_PRED_NUM; i++) {
             bool eol;
diff --git a/target/arm/gdbstub64.c b/target/arm/gdbstub64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/gdbstub64.c
+++ b/target/arm/gdbstub64.c
@@ -XXX,XX +XXX,XX @@ int arm_gdb_get_svereg(CPUARMState *env, GByteArray *buf, int reg)
          * We report in Vector Granules (VG) which is 64bit in a Z reg
          * while the ZCR works in Vector Quads (VQ) which is 128bit chunks.
          */
-        int vq = sve_zcr_len_for_el(env, arm_current_el(env)) + 1;
+        int vq = sve_vqm1_for_el(env, arm_current_el(env)) + 1;
         return gdb_get_reg64(buf, vq * 2);
     }
     default:
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
 /*
  * Given that SVE is enabled, return the vector length for EL.
  */
-uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
+uint32_t sve_vqm1_for_el(CPUARMState *env, int el)
 {
     ARMCPU *cpu = env_archcpu(env);
     uint32_t len = cpu->sve_max_vq - 1;
@@ -XXX,XX +XXX,XX @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
                       uint64_t value)
 {
     int cur_el = arm_current_el(env);
-    int old_len = sve_zcr_len_for_el(env, cur_el);
+    int old_len = sve_vqm1_for_el(env, cur_el);
     int new_len;
 
     /* Bits other than [3:0] are RAZ/WI.  */
@@ -XXX,XX +XXX,XX @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
      * Because we arrived here, we know both FP and SVE are enabled;
      * otherwise we would have trapped access to the ZCR_ELn register.
      */
-    new_len = sve_zcr_len_for_el(env, cur_el);
+    new_len = sve_vqm1_for_el(env, cur_el);
     if (new_len < old_len) {
         aarch64_sve_narrow_vq(env, new_len + 1);
     }
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
                 sve_el = 0;
             }
         } else if (sve_el == 0) {
-            DP_TBFLAG_A64(flags, VL, sve_zcr_len_for_el(env, el));
+            DP_TBFLAG_A64(flags, VL, sve_vqm1_for_el(env, el));
         }
         DP_TBFLAG_A64(flags, SVEEXC_EL, sve_el);
     }
@@ -XXX,XX +XXX,XX @@ void aarch64_sve_change_el(CPUARMState *env, int old_el,
      */
     old_a64 = old_el ? arm_el_is_aa64(env, old_el) : el0_a64;
     old_len = (old_a64 && !sve_exception_el(env, old_el)
-               ? sve_zcr_len_for_el(env, old_el) : 0);
+               ? sve_vqm1_for_el(env, old_el) : 0);
     new_a64 = new_el ? arm_el_is_aa64(env, new_el) : el0_a64;
     new_len = (new_a64 && !sve_exception_el(env, new_el)
-               ? sve_zcr_len_for_el(env, new_el) : 0);
+               ? sve_vqm1_for_el(env, new_el) : 0);
 
     /* When changing vector length, clear inaccessible state.  */
     if (new_len < old_len) {
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Begin creation of sve_ldst_internal.h by moving the primitives
that access host and tlb memory.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/sve_ldst_internal.h | 127 +++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c        | 107 +--------------------------
 2 files changed, 128 insertions(+), 106 deletions(-)
 create mode 100644 target/arm/sve_ldst_internal.h
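
Note on the generated primitives: each DO_LD_* instance pairs a memory
type (TYPEM) with an element type (TYPEE), and the assignment between
the two is what sign- or zero-extends the loaded value.  The sketch
below is a simplified standalone stand-in, not the macro from this
patch: DEMO_LD_HOST and the demo_* functions are hypothetical, the H()
offset adjustment is left out, and memcpy() replaces the ld*_p
accessors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified stand-in for DO_LD_HOST.  TYPEM is the type read from
 * memory; TYPEE is the type of the vector element that receives it.
 */
#define DEMO_LD_HOST(NAME, TYPEE, TYPEM)                                  \
static inline void demo_##NAME##_host(void *vd, intptr_t reg_off,        \
                                      const void *host)                   \
{                                                                         \
    TYPEM val;                                                            \
    TYPEE elt;                                                            \
    assert(reg_off >= 0);                                                 \
    memcpy(&val, host, sizeof(val));                                      \
    elt = (TYPEE)val;   /* int8_t sign-extends, uint8_t zero-extends */   \
    memcpy((char *)vd + reg_off, &elt, sizeof(elt));                      \
}

/* Byte load into a halfword element, unsigned and signed variants. */
DEMO_LD_HOST(ld1bhu, uint16_t, uint8_t)
DEMO_LD_HOST(ld1bhs, uint16_t, int8_t)
```

Loading the byte 0x80 with the signed variant leaves 0xff80 in the
halfword element, while the unsigned variant leaves 0x0080.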

diff --git a/target/arm/sve_ldst_internal.h b/target/arm/sve_ldst_internal.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/sve_ldst_internal.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM SVE Load/Store Helpers
+ *
+ * Copyright (c) 2018-2022 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TARGET_ARM_SVE_LDST_INTERNAL_H
+#define TARGET_ARM_SVE_LDST_INTERNAL_H
+
+#include "exec/cpu_ldst.h"
+
+/*
+ * Load/store one element at @vd + @reg_off from/to @host.
+ * The controlling predicate is known to be true.
+ */
+typedef void sve_ldst1_host_fn(void *vd, intptr_t reg_off, void *host);
+
+/*
+ * Load/store one element at @vd + @reg_off via (@env, @vaddr, @retaddr).
+ * The controlling predicate is known to be true.
+ */
+typedef void sve_ldst1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
+                              target_ulong vaddr, uintptr_t retaddr);
+
+/*
+ * Generate the above primitives.
+ */
+
+#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST)                              \
+static inline void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host) \
+{ TYPEM val = HOST(host); *(TYPEE *)(vd + H(reg_off)) = val; }
+
+#define DO_ST_HOST(NAME, H, TYPEE, TYPEM, HOST)                              \
+static inline void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host) \
+{ TYPEM val = *(TYPEE *)(vd + H(reg_off)); HOST(host, val); }
+
+#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, TLB)                              \
+static inline void sve_##NAME##_tlb(CPUARMState *env, void *vd,            \
+                        intptr_t reg_off, target_ulong addr, uintptr_t ra) \
+{                                                                          \
+    TYPEM val = TLB(env, useronly_clean_ptr(addr), ra);                    \
+    *(TYPEE *)(vd + H(reg_off)) = val;                                     \
+}
+
+#define DO_ST_TLB(NAME, H, TYPEE, TYPEM, TLB)                              \
+static inline void sve_##NAME##_tlb(CPUARMState *env, void *vd,            \
+                        intptr_t reg_off, target_ulong addr, uintptr_t ra) \
+{                                                                          \
+    TYPEM val = *(TYPEE *)(vd + H(reg_off));                               \
+    TLB(env, useronly_clean_ptr(addr), val, ra);                           \
+}
+
+#define DO_LD_PRIM_1(NAME, H, TE, TM)                   \
+    DO_LD_HOST(NAME, H, TE, TM, ldub_p)                 \
+    DO_LD_TLB(NAME, H, TE, TM, cpu_ldub_data_ra)
+
+DO_LD_PRIM_1(ld1bb,  H1,   uint8_t,  uint8_t)
+DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
+DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
+DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
+DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
+DO_LD_PRIM_1(ld1bdu, H1_8, uint64_t, uint8_t)
+DO_LD_PRIM_1(ld1bds, H1_8, uint64_t,  int8_t)
+
+#define DO_ST_PRIM_1(NAME, H, TE, TM)                   \
+    DO_ST_HOST(st1##NAME, H, TE, TM, stb_p)             \
+    DO_ST_TLB(st1##NAME, H, TE, TM, cpu_stb_data_ra)
+
+DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
+DO_ST_PRIM_1(bh, H1_2, uint16_t, uint8_t)
+DO_ST_PRIM_1(bs, H1_4, uint32_t, uint8_t)
+DO_ST_PRIM_1(bd, H1_8, uint64_t, uint8_t)
+
+#define DO_LD_PRIM_2(NAME, H, TE, TM, LD) \
+    DO_LD_HOST(ld1##NAME##_be, H, TE, TM, LD##_be_p)    \
+    DO_LD_HOST(ld1##NAME##_le, H, TE, TM, LD##_le_p)    \
+    DO_LD_TLB(ld1##NAME##_be, H, TE, TM, cpu_##LD##_be_data_ra) \
+    DO_LD_TLB(ld1##NAME##_le, H, TE, TM, cpu_##LD##_le_data_ra)
+
+#define DO_ST_PRIM_2(NAME, H, TE, TM, ST) \
+    DO_ST_HOST(st1##NAME##_be, H, TE, TM, ST##_be_p)    \
+    DO_ST_HOST(st1##NAME##_le, H, TE, TM, ST##_le_p)    \
+    DO_ST_TLB(st1##NAME##_be, H, TE, TM, cpu_##ST##_be_data_ra) \
+    DO_ST_TLB(st1##NAME##_le, H, TE, TM, cpu_##ST##_le_data_ra)
+
+DO_LD_PRIM_2(hh,  H1_2, uint16_t, uint16_t, lduw)
+DO_LD_PRIM_2(hsu, H1_4, uint32_t, uint16_t, lduw)
+DO_LD_PRIM_2(hss, H1_4, uint32_t,  int16_t, lduw)
+DO_LD_PRIM_2(hdu, H1_8, uint64_t, uint16_t, lduw)
+DO_LD_PRIM_2(hds, H1_8, uint64_t,  int16_t, lduw)
+
+DO_ST_PRIM_2(hh, H1_2, uint16_t, uint16_t, stw)
+DO_ST_PRIM_2(hs, H1_4, uint32_t, uint16_t, stw)
+DO_ST_PRIM_2(hd, H1_8, uint64_t, uint16_t, stw)
+
+DO_LD_PRIM_2(ss,  H1_4, uint32_t, uint32_t, ldl)
+DO_LD_PRIM_2(sdu, H1_8, uint64_t, uint32_t, ldl)
+DO_LD_PRIM_2(sds, H1_8, uint64_t,  int32_t, ldl)
+
+DO_ST_PRIM_2(ss, H1_4, uint32_t, uint32_t, stl)
+DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl)
+
+DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq)
+DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
+
+#undef DO_LD_TLB
+#undef DO_ST_TLB
+#undef DO_LD_HOST
+#undef DO_LD_PRIM_1
+#undef DO_ST_PRIM_1
+#undef DO_LD_PRIM_2
+#undef DO_ST_PRIM_2
+
+#endif /* TARGET_ARM_SVE_LDST_INTERNAL_H */
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@
 #include "cpu.h"
 #include "internals.h"
 #include "exec/exec-all.h"
-#include "exec/cpu_ldst.h"
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
 #include "fpu/softfloat.h"
 #include "tcg/tcg.h"
 #include "vec_internal.h"
+#include "sve_ldst_internal.h"
 
 
 /* Return a value for NZCV as per the ARM PredTest pseudofunction.
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
  * Load contiguous data, protected by a governing predicate.
  */
 
-/*
- * Load one element into @vd + @reg_off from @host.
- * The controlling predicate is known to be true.
- */
-typedef void sve_ldst1_host_fn(void *vd, intptr_t reg_off, void *host);
-
-/*
- * Load one element into @vd + @reg_off from (@env, @vaddr, @ra).
- * The controlling predicate is known to be true.
- */
-typedef void sve_ldst1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
-                              target_ulong vaddr, uintptr_t retaddr);
-
-/*
- * Generate the above primitives.
- */
-
-#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST) \
-static void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host)  \
-{                                                                      \
-    TYPEM val = HOST(host);                                            \
-    *(TYPEE *)(vd + H(reg_off)) = val;                                 \
-}
-
-#define DO_ST_HOST(NAME, H, TYPEE, TYPEM, HOST) \
-static void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host)  \
-{ HOST(host, (TYPEM)*(TYPEE *)(vd + H(reg_off))); }
-
-#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, TLB) \
-static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
-                             target_ulong addr, uintptr_t ra)               \
-{                                                                           \
-    *(TYPEE *)(vd + H(reg_off)) =                                           \
-        (TYPEM)TLB(env, useronly_clean_ptr(addr), ra);                      \
-}
-
-#define DO_ST_TLB(NAME, H, TYPEE, TYPEM, TLB) \
-static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
-                             target_ulong addr, uintptr_t ra)               \
-{                                                                           \
-    TLB(env, useronly_clean_ptr(addr),                                      \
-        (TYPEM)*(TYPEE *)(vd + H(reg_off)), ra);                            \
-}
-
-#define DO_LD_PRIM_1(NAME, H, TE, TM)                   \
-    DO_LD_HOST(NAME, H, TE, TM, ldub_p)                 \
-    DO_LD_TLB(NAME, H, TE, TM, cpu_ldub_data_ra)
-
-DO_LD_PRIM_1(ld1bb,  H1,   uint8_t,  uint8_t)
-DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
-DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
-DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
-DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
-DO_LD_PRIM_1(ld1bdu, H1_8, uint64_t, uint8_t)
-DO_LD_PRIM_1(ld1bds, H1_8, uint64_t,  int8_t)
-
-#define DO_ST_PRIM_1(NAME, H, TE, TM)                   \
-    DO_ST_HOST(st1##NAME, H, TE, TM, stb_p)             \
-    DO_ST_TLB(st1##NAME, H, TE, TM, cpu_stb_data_ra)
-
-DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
-DO_ST_PRIM_1(bh, H1_2, uint16_t, uint8_t)
-DO_ST_PRIM_1(bs, H1_4, uint32_t, uint8_t)
-DO_ST_PRIM_1(bd, H1_8, uint64_t, uint8_t)
-
-#define DO_LD_PRIM_2(NAME, H, TE, TM, LD) \
-    DO_LD_HOST(ld1##NAME##_be, H, TE, TM, LD##_be_p)    \
-    DO_LD_HOST(ld1##NAME##_le, H, TE, TM, LD##_le_p)    \
-    DO_LD_TLB(ld1##NAME##_be, H, TE, TM, cpu_##LD##_be_data_ra) \
-    DO_LD_TLB(ld1##NAME##_le, H, TE, TM, cpu_##LD##_le_data_ra)
-
-#define DO_ST_PRIM_2(NAME, H, TE, TM, ST) \
-    DO_ST_HOST(st1##NAME##_be, H, TE, TM, ST##_be_p)    \
-    DO_ST_HOST(st1##NAME##_le, H, TE, TM, ST##_le_p)    \
-    DO_ST_TLB(st1##NAME##_be, H, TE, TM, cpu_##ST##_be_data_ra) \
-    DO_ST_TLB(st1##NAME##_le, H, TE, TM, cpu_##ST##_le_data_ra)
-
-DO_LD_PRIM_2(hh,  H1_2, uint16_t, uint16_t, lduw)
-DO_LD_PRIM_2(hsu, H1_4, uint32_t, uint16_t, lduw)
-DO_LD_PRIM_2(hss, H1_4, uint32_t,  int16_t, lduw)
-DO_LD_PRIM_2(hdu, H1_8, uint64_t, uint16_t, lduw)
-DO_LD_PRIM_2(hds, H1_8, uint64_t,  int16_t, lduw)
-
-DO_ST_PRIM_2(hh, H1_2, uint16_t, uint16_t, stw)
-DO_ST_PRIM_2(hs, H1_4, uint32_t, uint16_t, stw)
-DO_ST_PRIM_2(hd, H1_8, uint64_t, uint16_t, stw)
-
-DO_LD_PRIM_2(ss,  H1_4, uint32_t, uint32_t, ldl)
-DO_LD_PRIM_2(sdu, H1_8, uint64_t, uint32_t, ldl)
-DO_LD_PRIM_2(sds, H1_8, uint64_t,  int32_t, ldl)
-
-DO_ST_PRIM_2(ss, H1_4, uint32_t, uint32_t, stl)
-DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl)
-
-DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq)
-DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
-
-#undef DO_LD_TLB
-#undef DO_ST_TLB
-#undef DO_LD_HOST
-#undef DO_LD_PRIM_1
-#undef DO_ST_PRIM_1
-#undef DO_LD_PRIM_2
-#undef DO_ST_PRIM_2
-
 /*
  * Skip through a sequence of inactive elements in the guarding predicate @vg,
  * beginning at @reg_off bounded by @reg_max.  Return the offset of the active
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Export all of the support functions for performing bulk
fault analysis on a set of elements at contiguous addresses
controlled by a predicate.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-15-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/sve_ldst_internal.h | 94 ++++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c        | 87 ++++++-------------------------
 2 files changed, 111 insertions(+), 70 deletions(-)
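
Note on SVEContLdSt.page_split: the field is documented as the byte
offset at which the whole operation crosses a page boundary, and as
being >= 0 only when the operation spans two pages.  The sketch below
illustrates that convention in isolation.  It is not the code from
sve_helper.c: demo_page_split() is hypothetical and a 4 KiB page size
is assumed, where QEMU uses the target page size.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u    /* assumption: 4 KiB pages */

/*
 * Return the offset from @addr at which an access of @nbytes bytes
 * enters the next page, or -1 if it fits within a single page.
 */
static int demo_page_split(uint64_t addr, unsigned nbytes)
{
    unsigned to_page_end;

    assert(nbytes > 0 && nbytes <= DEMO_PAGE_SIZE);
    to_page_end = DEMO_PAGE_SIZE - (unsigned)(addr % DEMO_PAGE_SIZE);
    return nbytes > to_page_end ? (int)to_page_end : -1;
}
```

For a 32-byte access at 0x1ff0 the split is at offset 16; the same
access at 0x1fe0 ends exactly at the page boundary and reports -1.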

diff --git a/target/arm/sve_ldst_internal.h b/target/arm/sve_ldst_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_ldst_internal.h
+++ b/target/arm/sve_ldst_internal.h
@@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
 #undef DO_LD_PRIM_2
 #undef DO_ST_PRIM_2
 
+/*
+ * Resolve the guest virtual address to info->host and info->flags.
+ * If @nofault, return false if the page is invalid, otherwise
+ * exit via page fault exception.
+ */
+
+typedef struct {
+    void *host;
+    int flags;
+    MemTxAttrs attrs;
+} SVEHostPage;
+
+bool sve_probe_page(SVEHostPage *info, bool nofault, CPUARMState *env,
+                    target_ulong addr, int mem_off, MMUAccessType access_type,
+                    int mmu_idx, uintptr_t retaddr);
+
+/*
+ * Analyse contiguous data, protected by a governing predicate.
+ */
+
+typedef enum {
+    FAULT_NO,
+    FAULT_FIRST,
+    FAULT_ALL,
+} SVEContFault;
+
+typedef struct {
+    /*
+     * First and last element wholly contained within the two pages.
+     * mem_off_first[0] and reg_off_first[0] are always set >= 0.
+     * reg_off_last[0] may be < 0 if the first element crosses pages.
+     * All of mem_off_first[1], reg_off_first[1] and reg_off_last[1]
+     * are set >= 0 only if there are complete elements on a second page.
+     *
+     * The reg_off_* offsets are relative to the internal vector register.
+     * The mem_off_first offset is relative to the memory address; the
+     * two offsets are different when a load operation extends, a store
+     * operation truncates, or for multi-register operations.
+     */
+    int16_t mem_off_first[2];
+    int16_t reg_off_first[2];
+    int16_t reg_off_last[2];
+
+    /*
+     * One element that is misaligned and spans both pages,
+     * or -1 if there is no such active element.
+     */
+    int16_t mem_off_split;
+    int16_t reg_off_split;
+
+    /*
+     * The byte offset at which the entire operation crosses a page boundary.
+     * Set >= 0 if and only if the entire operation spans two pages.
+     */
+    int16_t page_split;
+
+    /* TLB data for the two pages. */
+    SVEHostPage page[2];
+} SVEContLdSt;
+
+/*
+ * Find first active element on each page, and a loose bound for the
+ * final element on each page.  Identify any single element that spans
+ * the page boundary.  Return true if there are any active elements.
+ */
+bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr, uint64_t *vg,
+                            intptr_t reg_max, int esz, int msize);
+
+/*
+ * Resolve the guest virtual addresses to info->page[].
+ * Control the generation of page faults with @fault.  Return false if
+ * there is no work to do, which can only happen with @fault == FAULT_NO.
+ */
+bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
+                         CPUARMState *env, target_ulong addr,
+                         MMUAccessType access_type, uintptr_t retaddr);
+
+#ifdef CONFIG_USER_ONLY
+static inline void
+sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env, uint64_t *vg,
+                          target_ulong addr, int esize, int msize,
+                          int wp_access, uintptr_t retaddr)
+{ }
+#else
+void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
+                               uint64_t *vg, target_ulong addr,
+                               int esize, int msize, int wp_access,
+                               uintptr_t retaddr);
+#endif
+
+void sve_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env, uint64_t *vg,
+                             target_ulong addr, int esize, int msize,
+                             uint32_t mtedesc, uintptr_t ra);
+
 #endif /* TARGET_ARM_SVE_LDST_INTERNAL_H */
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ static intptr_t find_next_active(uint64_t *vg, intptr_t reg_off,
  * exit via page fault exception.
  */
 
-typedef struct {
-    void *host;
-    int flags;
-    MemTxAttrs attrs;
-} SVEHostPage;
-
-static bool sve_probe_page(SVEHostPage *info, bool nofault,
-                           CPUARMState *env, target_ulong addr,
-                           int mem_off, MMUAccessType access_type,
-                           int mmu_idx, uintptr_t retaddr)
+bool sve_probe_page(SVEHostPage *info, bool nofault, CPUARMState *env,
+                    target_ulong addr, int mem_off, MMUAccessType access_type,
+                    int mmu_idx, uintptr_t retaddr)
 {
     int flags;
 
@@ -XXX,XX +XXX,XX @@ static bool sve_probe_page(SVEHostPage *info, bool nofault,
     return true;
 }
 
-
-/*
- * Analyse contiguous data, protected by a governing predicate.
- */
-
-typedef enum {
-    FAULT_NO,
-    FAULT_FIRST,
-    FAULT_ALL,
-} SVEContFault;
-
-typedef struct {
-    /*
-     * First and last element wholly contained within the two pages.
-     * mem_off_first[0] and reg_off_first[0] are always set >= 0.
-     * reg_off_last[0] may be < 0 if the first element crosses pages.
-     * All of mem_off_first[1], reg_off_first[1] and reg_off_last[1]
-     * are set >= 0 only if there are complete elements on a second page.
-     *
-     * The reg_off_* offsets are relative to the internal vector register.
-     * The mem_off_first offset is relative to the memory address; the
-     * two offsets are different when a load operation extends, a store
-     * operation truncates, or for multi-register operations.
-     */
-    int16_t mem_off_first[2];
-    int16_t reg_off_first[2];
-    int16_t reg_off_last[2];
-
-    /*
-     * One element that is misaligned and spans both pages,
-     * or -1 if there is no such active element.
-     */
-    int16_t mem_off_split;
-    int16_t reg_off_split;
-
-    /*
-     * The byte offset at which the entire operation crosses a page boundary.
-     * Set >= 0 if and only if the entire operation spans two pages.
-     */
-    int16_t page_split;
-
-    /* TLB data for the two pages. */
-    SVEHostPage page[2];
-} SVEContLdSt;
-
 /*
  * Find first active element on each page, and a loose bound for the
  * final element on each page.  Identify any single element that spans
  * the page boundary.  Return true if there are any active elements.
  */
-static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
-                                   uint64_t *vg, intptr_t reg_max,
-                                   int esz, int msize)
+bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr, uint64_t *vg,
+                            intptr_t reg_max, int esz, int msize)
 {
     const int esize = 1 << esz;
     const uint64_t pg_mask = pred_esz_masks[esz];
@@ -XXX,XX +XXX,XX @@ static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
  * Control the generation of page faults with @fault.  Return false if
  * there is no work to do, which can only happen with @fault == FAULT_NO.
  */
-static bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
-                                CPUARMState *env, target_ulong addr,
-                                MMUAccessType access_type, uintptr_t retaddr)
+bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
+                         CPUARMState *env, target_ulong addr,
+                         MMUAccessType access_type, uintptr_t retaddr)
 {
     int mmu_idx = cpu_mmu_index(env, false);
     int mem_off = info->mem_off_first[0];
@@ -XXX,XX +XXX,XX @@ static bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
     return have_work;
 }
 
-static void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
-                                      uint64_t *vg, target_ulong addr,
-                                      int esize, int msize, int wp_access,
-                                      uintptr_t retaddr)
-{
 #ifndef CONFIG_USER_ONLY
+void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
+                               uint64_t *vg, target_ulong addr,
+                               int esize, int msize, int wp_access,
+                               uintptr_t retaddr)
+{
     intptr_t mem_off, reg_off, reg_last;
     int flags0 = info->page[0].flags;
     int flags1 = info->page[1].flags;
@@ -XXX,XX +XXX,XX @@ static void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
             } while (reg_off & 63);
         } while (reg_off <= reg_last);
     }
-#endif
 }
+#endif
 
-static void sve_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env,
-                                    uint64_t *vg, target_ulong addr, int esize,
-                                    int msize, uint32_t mtedesc, uintptr_t ra)
+void sve_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env,
+                             uint64_t *vg, target_ulong addr, int esize,
+                             int msize, uint32_t mtedesc, uintptr_t ra)
 {
     intptr_t mem_off, reg_off, reg_last;
 
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Put the inline function near the array declaration.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-16-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vec_internal.h | 8 +++++++-
 target/arm/sve_helper.c   | 9 ---------
 2 files changed, 7 insertions(+), 10 deletions(-)
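
Note on the table contents: expand_pred_b() maps each of the 8
predicate bits in its argument to a whole byte of the result, 0xff if
the bit is set and 0x00 otherwise.  The sketch below rebuilds an
equivalent table at run time to make that mapping concrete; the demo_*
names are hypothetical, and the real expand_pred_b_data[] is a
constant array defined elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill @table so that bit j of the index becomes byte j (0x00 or 0xff). */
static void demo_build_expand_pred_b(uint64_t table[256])
{
    assert(table != NULL);
    for (unsigned i = 0; i < 256; i++) {
        uint64_t m = 0;

        for (unsigned j = 0; j < 8; j++) {
            if (i & (1u << j)) {
                m |= (uint64_t)0xff << (j * 8);
            }
        }
        table[i] = m;
    }
}

/* Look up the byte mask for one predicate byte. */
static uint64_t demo_expand_pred_b(const uint64_t table[256], uint8_t byte)
{
    return table[byte];
}
```

For instance, the predicate byte 0x05 (bits 0 and 2 set) expands to
0x0000000000ff00ff.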

diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -XXX,XX +XXX,XX @@
 #define H8(x)   (x)
 #define H1_8(x) (x)
 
-/* Data for expanding active predicate bits to bytes, for byte elements. */
+/*
+ * Expand active predicate bits to bytes, for byte elements.
+ */
 extern const uint64_t expand_pred_b_data[256];
+static inline uint64_t expand_pred_b(uint8_t byte)
+{
+    return expand_pred_b_data[byte];
+}
 
 static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
 {
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
     return flags;
 }
 
-/*
- * Expand active predicate bits to bytes, for byte elements.
- * (The data table itself is in vec_helper.c as MVE also needs it.)
- */
-static inline uint64_t expand_pred_b(uint8_t byte)
-{
-    return expand_pred_b_data[byte];
-}
-
 /* Similarly for half-word elements.
  *  for (i = 0; i < 256; ++i) {
  *      unsigned long m = 0;
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Use the function instead of the array directly.

Because the function performs its own masking, via the uint8_t
parameter, we need to do nothing extra within the users: the bits
above the first 2 (_uh) or 4 (_uw) will be discarded by assignment
to the local bmask variables, and of course _uq uses the entire
uint64_t result.
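
The truncation argument above can be sketched in isolation. This is only an illustration: demo_expand_pred_b() and demo_mergemask_uh() are hypothetical stand-ins, and the loop replaces the real expand_pred_b_data[] lookup table.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for expand_pred_b(): bit i of the predicate
 * byte becomes byte i (0xff or 0x00) of the 64-bit result.  The real
 * code reads a 256-entry table; a loop is used here to stay short.
 */
static uint64_t demo_expand_pred_b(uint8_t byte)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        if (byte & (1u << i)) {
            r |= (uint64_t)0xff << (i * 8);
        }
    }
    return r;
}

/* Same shape as mergemask_uh(), but returning the merged value. */
static uint16_t demo_mergemask_uh(uint16_t d, uint16_t r, uint16_t mask)
{
    /*
     * Two implicit truncations happen here: mask is narrowed to
     * uint8_t by the call, and the 64-bit result is narrowed to the
     * low 16 bits (two expanded bytes) by the assignment.
     */
    uint16_t bmask = demo_expand_pred_b(mask);

    return (d & ~bmask) | (r & bmask);
}
```

With mask = 0xfffd only predicate bit 0 of the low two bits is set, so only the low byte of the destination is replaced, exactly as it would be with an explicit "mask & 3".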

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/mve_helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static void mergemask_sb(int8_t *d, int8_t r, uint16_t mask)
 
 static void mergemask_uh(uint16_t *d, uint16_t r, uint16_t mask)
 {
-    uint16_t bmask = expand_pred_b_data[mask & 3];
+    uint16_t bmask = expand_pred_b(mask);
     *d = (*d & ~bmask) | (r & bmask);
 }
 
@@ -XXX,XX +XXX,XX @@ static void mergemask_sh(int16_t *d, int16_t r, uint16_t mask)
 
 static void mergemask_uw(uint32_t *d, uint32_t r, uint16_t mask)
 {
-    uint32_t bmask = expand_pred_b_data[mask & 0xf];
+    uint32_t bmask = expand_pred_b(mask);
     *d = (*d & ~bmask) | (r & bmask);
 }
 
@@ -XXX,XX +XXX,XX @@ static void mergemask_sw(int32_t *d, int32_t r, uint16_t mask)
 
 static void mergemask_uq(uint64_t *d, uint64_t r, uint16_t mask)
 {
-    uint64_t bmask = expand_pred_b_data[mask & 0xff];
+    uint64_t bmask = expand_pred_b(mask);
     *d = (*d & ~bmask) | (r & bmask);
 }
 
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Move the data to vec_helper.c and the inline to vec_internal.h.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-18-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vec_internal.h |  7 +++++++
 target/arm/sve_helper.c   | 29 -----------------------------
 target/arm/vec_helper.c   | 26 ++++++++++++++++++++++++++
 3 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t expand_pred_b(uint8_t byte)
     return expand_pred_b_data[byte];
 }
 
+/* Similarly for half-word elements. */
+extern const uint64_t expand_pred_h_data[0x55 + 1];
+static inline uint64_t expand_pred_h(uint8_t byte)
+{
+    return expand_pred_h_data[byte & 0x55];
+}
+
 static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
 {
     uint64_t *d = vd + opr_sz;
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
     return flags;
 }
 
-/* Similarly for half-word elements.
- *  for (i = 0; i < 256; ++i) {
- *      unsigned long m = 0;
- *      if (i & 0xaa) {
- *          continue;
- *      }
- *      for (j = 0; j < 8; j += 2) {
- *          if ((i >> j) & 1) {
- *              m |= 0xfffful << (j << 3);
- *          }
- *      }
- *      printf("[0x%x] = 0x%016lx,\n", i, m);
- *  }
- */
-static inline uint64_t expand_pred_h(uint8_t byte)
-{
-    static const uint64_t word[] = {
-        [0x01] = 0x000000000000ffff, [0x04] = 0x00000000ffff0000,
-        [0x05] = 0x00000000ffffffff, [0x10] = 0x0000ffff00000000,
-        [0x11] = 0x0000ffff0000ffff, [0x14] = 0x0000ffffffff0000,
-        [0x15] = 0x0000ffffffffffff, [0x40] = 0xffff000000000000,
-        [0x41] = 0xffff00000000ffff, [0x44] = 0xffff0000ffff0000,
-        [0x45] = 0xffff0000ffffffff, [0x50] = 0xffffffff00000000,
-        [0x51] = 0xffffffff0000ffff, [0x54] = 0xffffffffffff0000,
-        [0x55] = 0xffffffffffffffff,
-    };
-    return word[byte & 0x55];
-}
-
 /* Similarly for single word elements.  */
 static inline uint64_t expand_pred_s(uint8_t byte)
 {
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ const uint64_t expand_pred_b_data[256] = {
     0xffffffffffffffff,
 };
 
+/*
+ * Similarly for half-word elements.
+ *  for (i = 0; i < 256; ++i) {
+ *      unsigned long m = 0;
+ *      if (i & 0xaa) {
+ *          continue;
+ *      }
+ *      for (j = 0; j < 8; j += 2) {
+ *          if ((i >> j) & 1) {
+ *              m |= 0xfffful << (j << 3);
+ *          }
+ *      }
+ *      printf("[0x%x] = 0x%016lx,\n", i, m);
+ *  }
+ */
+const uint64_t expand_pred_h_data[0x55 + 1] = {
+    [0x01] = 0x000000000000ffff, [0x04] = 0x00000000ffff0000,
+    [0x05] = 0x00000000ffffffff, [0x10] = 0x0000ffff00000000,
+    [0x11] = 0x0000ffff0000ffff, [0x14] = 0x0000ffffffff0000,
+    [0x15] = 0x0000ffffffffffff, [0x40] = 0xffff000000000000,
+    [0x41] = 0xffff00000000ffff, [0x44] = 0xffff0000ffff0000,
+    [0x45] = 0xffff0000ffffffff, [0x50] = 0xffffffff00000000,
+    [0x51] = 0xffffffff0000ffff, [0x54] = 0xffffffffffff0000,
+    [0x55] = 0xffffffffffffffff,
+};
+
 /* Signed saturating rounding doubling multiply-accumulate high half, 8-bit */
 int8_t do_sqrdmlah_b(int8_t src1, int8_t src2, int8_t src3,
                      bool neg, bool round)
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

We will need this over in sme_helper.c.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-19-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vec_internal.h | 13 +++++++++++++
 target/arm/vec_helper.c   |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -XXX,XX +XXX,XX @@ uint64_t pmull_h(uint64_t op1, uint64_t op2);
  */
 uint64_t pmull_w(uint64_t op1, uint64_t op2);
 
+/**
+ * bfdotadd:
+ * @sum: addend
+ * @e1, @e2: multiplicand vectors
+ *
+ * BFloat16 2-way dot product of @e1 & @e2, accumulating with @sum.
+ * The @e1 and @e2 operands correspond to the 32-bit source vector
+ * slots and contain two Bfloat16 values each.
+ *
+ * Corresponds to the ARM pseudocode function BFDotAdd.
+ */
+float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2);
+
 #endif /* TARGET_ARM_VEC_INTERNAL_H */
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
  * BFloat16 Dot Product
  */
 
-static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
+float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
 {
     /* FPCR is ignored for BFDOT and BFMMLA. */
     float_status bf_status = {
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

This register is allocated from the existing block of id registers,
so it is already RES0 for cpus that do not implement SME.
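
As a rough illustration of how the new feature tests read this register, here is a self-contained sketch. demo_extract64() and the demo_sme_*() helpers are hypothetical names; the field positions are the ones given by the FIELD() definitions in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract 'length' bits starting at bit 'start'; assumes 1 <= length <= 64. */
static uint64_t demo_extract64(uint64_t value, unsigned start, unsigned length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/* F64F64 is a 1-bit field at bit 48. */
static bool demo_sme_f64f64(uint64_t smfr0)
{
    return demo_extract64(smfr0, 48, 1);
}

/* I16I64 is a 4-bit field at bit 52; the feature test requires 0xf. */
static bool demo_sme_i16i64(uint64_t smfr0)
{
    return demo_extract64(smfr0, 52, 4) == 0xf;
}

/* FA64 is a 1-bit field at bit 63. */
static bool demo_sme_fa64(uint64_t smfr0)
{
    return demo_extract64(smfr0, 63, 1);
}
```

A register value of zero, which is what a CPU without SME reports, makes every one of these tests false.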

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220607203306.657998-21-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    | 25 +++++++++++++++++++++++++
 target/arm/helper.c |  4 ++--
 target/arm/kvm64.c  | 11 +++++++----
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
         uint64_t id_aa64dfr0;
         uint64_t id_aa64dfr1;
         uint64_t id_aa64zfr0;
+        uint64_t id_aa64smfr0;
         uint64_t reset_pmcr_el0;
     } isar;
     uint64_t midr;
@@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64ZFR0, I8MM, 44, 4)
 FIELD(ID_AA64ZFR0, F32MM, 52, 4)
 FIELD(ID_AA64ZFR0, F64MM, 56, 4)
 
+FIELD(ID_AA64SMFR0, F32F32, 32, 1)
+FIELD(ID_AA64SMFR0, B16F32, 34, 1)
+FIELD(ID_AA64SMFR0, F16F32, 35, 1)
+FIELD(ID_AA64SMFR0, I8I32, 36, 4)
+FIELD(ID_AA64SMFR0, F64F64, 48, 1)
+FIELD(ID_AA64SMFR0, I16I64, 52, 4)
+FIELD(ID_AA64SMFR0, SMEVER, 56, 4)
+FIELD(ID_AA64SMFR0, FA64, 63, 1)
+
 FIELD(ID_DFR0, COPDBG, 0, 4)
 FIELD(ID_DFR0, COPSDBG, 4, 4)
 FIELD(ID_DFR0, MMAPDBG, 8, 4)
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve_f64mm(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, F64MM) != 0;
 }
 
+static inline bool isar_feature_aa64_sme_f64f64(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64smfr0, ID_AA64SMFR0, F64F64);
+}
+
+static inline bool isar_feature_aa64_sme_i16i64(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64smfr0, ID_AA64SMFR0, I16I64) == 0xf;
+}
+
+static inline bool isar_feature_aa64_sme_fa64(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64smfr0, ID_AA64SMFR0, FA64);
+}
+
 /*
  * Feature tests for "does this exist in either 32-bit or 64-bit?"
  */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa64_tid3,
               .resetvalue = cpu->isar.id_aa64zfr0 },
-            { .name = "ID_AA64PFR5_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+            { .name = "ID_AA64SMFR0_EL1", .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 5,
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa64_tid3,
-              .resetvalue = 0 },
+              .resetvalue = cpu->isar.id_aa64smfr0 },
             { .name = "ID_AA64PFR6_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 6,
               .access = PL1_R, .type = ARM_CP_CONST,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
     } else {
         err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64pfr1,
                               ARM64_SYS_REG(3, 0, 0, 4, 1));
+        err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64smfr0,
+                              ARM64_SYS_REG(3, 0, 0, 4, 5));
         err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64dfr0,
                               ARM64_SYS_REG(3, 0, 0, 5, 0));
         err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64dfr1,
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
         ahcf->isar.id_aa64pfr0 = t;
 
         /*
-         * Before v5.1, KVM did not support SVE and did not expose
-         * ID_AA64ZFR0_EL1 even as RAZ.  After v5.1, KVM still does
-         * not expose the register to "user" requests like this
-         * unless the host supports SVE.
+         * There is a range of kernels between kernel commit 73433762fcae
+         * and f81cb2c3ad41 which have a bug where the kernel doesn't expose
+         * SYS_ID_AA64ZFR0_EL1 via the ONE_REG API unless the VM has enabled
+         * SVE support, so we only read it here, rather than together with all
+         * the other ID registers earlier.
          */
         err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64zfr0,
                               ARM64_SYS_REG(3, 0, 0, 4, 4));
-- 
2.25.1

Hi; this pullreq contains only my FEAT_AFP/FEAT_RPRES patches
(plus a fix for a target/alpha latent bug that would otherwise
be revealed by the fpu changes), because 68 patches is already
longer than I prefer to send in at one time...

thanks
-- PMM

The following changes since commit ffaf7f0376f8040ce9068d71ae9ae8722505c42e:

Merge tag 'pull-10.0-testing-and-gdstub-updates-100225-1' of https://gitlab.com/stsquad/qemu into staging (2025-02-10 13:26:17 -0500)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250211

for you to fetch changes up to ca4c34e07d1388df8e396520b5e7d60883cd3690:

target/arm: Sink fp_status and fpcr access into do_fmlal* (2025-02-11 16:22:08 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/alpha: Don't corrupt error_code with unknown softfloat flags
 * target/arm: Implement FEAT_AFP and FEAT_RPRES

----------------------------------------------------------------
Peter Maydell (49):
      target/alpha: Don't corrupt error_code with unknown softfloat flags
      fpu: Add float_class_denormal
      fpu: Implement float_flag_input_denormal_used
      fpu: allow flushing of output denormals to be after rounding
      target/arm: Define FPCR AH, FIZ, NEP bits
      target/arm: Implement FPCR.FIZ handling
      target/arm: Adjust FP behaviour for FPCR.AH = 1
      target/arm: Adjust exception flag handling for AH = 1
      target/arm: Add FPCR.AH to tbflags
      target/arm: Set up float_status to use for FPCR.AH=1 behaviour
      target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
      target/arm: Use FPST_FPCR_AH for BFCVT* insns
      target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
      target/arm: Add FPCR.NEP to TBFLAGS
      target/arm: Define and use new write_fp_*reg_merging() functions
      target/arm: Handle FPCR.NEP for 3-input scalar operations
      target/arm: Handle FPCR.NEP for BFCVT scalar
      target/arm: Handle FPCR.NEP for 1-input scalar operations
      target/arm: Handle FPCR.NEP in do_cvtf_scalar()
      target/arm: Handle FPCR.NEP for scalar FABS and FNEG
      target/arm: Handle FPCR.NEP for FCVTXN (scalar)
      target/arm: Handle FPCR.NEP for FMUL, FMULX scalar by element
      target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
      target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
      target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
      target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
      target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
      target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
      target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
      target/arm: Implement FPCR.AH handling of negation of NaN
      target/arm: Implement FPCR.AH handling for scalar FABS and FABD
      target/arm: Handle FPCR.AH in vector FABD
      target/arm: Handle FPCR.AH in SVE FNEG
      target/arm: Handle FPCR.AH in SVE FABS
      target/arm: Handle FPCR.AH in SVE FABD
      target/arm: Handle FPCR.AH in negation steps in SVE FCADD
      target/arm: Handle FPCR.AH in negation steps in FCADD
      target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
      target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
      target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
      target/arm: Handle FPCR.AH in negation in FMLS (vector)
      target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
      target/arm: Handle FPCR.AH in SVE FTSSEL
      target/arm: Handle FPCR.AH in SVE FTMAD
      target/arm: Enable FEAT_AFP for '-cpu max'
      target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
      target/arm: Implement increased precision FRECPE
      target/arm: Implement increased precision FRSQRTE
      target/arm: Enable FEAT_RPRES for -cpu max

Richard Henderson (19):
      target/arm: Handle FPCR.AH in vector FCMLA
      target/arm: Handle FPCR.AH in FCMLA by index
      target/arm: Handle FPCR.AH in SVE FCMLA
      target/arm: Handle FPCR.AH in FMLSL (by element and vector)
      target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
      target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
      target/arm: Introduce CPUARMState.vfp.fp_status[]
      target/arm: Remove standard_fp_status_f16
      target/arm: Remove standard_fp_status
      target/arm: Remove ah_fp_status_f16
      target/arm: Remove ah_fp_status
      target/arm: Remove fp_status_f16_a64
      target/arm: Remove fp_status_f16_a32
      target/arm: Remove fp_status_a64
      target/arm: Remove fp_status_a32
      target/arm: Simplify fp_status indexing in mve_helper.c
      target/arm: Simplify DO_VFP_cmp in vfp_helper.c
      target/arm: Read fz16 from env->vfp.fpcr
      target/arm: Sink fp_status and fpcr access into do_fmlal*

In do_cvttq() we set env->error_code with what is supposed to be a
set of FPCR exception bit values.  However, if the set of float
exception flags we get back from softfloat for the conversion
includes a flag which is not one of the three we expect here
(invalid_cvti, invalid, inexact) then we will fall through the
if-ladder and set env->error_code to the unconverted softfloat
exception_flag value.  This will then cause us to take a spurious
exception.

This is harmless now, but when we add new floating point exception
flags to softfloat it will cause problems.  Add an else clause to the
if-ladder to make it ignore any float exception flags it doesn't care
about.

Specifically, without this fix, 'make check-tcg' will fail for Alpha
when the commit adding float_flag_input_denormal_used lands.
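
The failure mode can be sketched outside QEMU. All flag and FPCR values below are made up for illustration, and demo_convert_flags() is a hypothetical reduction of the if-ladder in do_cvttq(), not the function itself.

```c
#include <stdint.h>

/* Hypothetical softfloat-style flags; the values are not QEMU's. */
enum {
    DEMO_FLAG_INVALID      = 0x01,
    DEMO_FLAG_INEXACT      = 0x02,
    DEMO_FLAG_INVALID_CVTI = 0x04,
    DEMO_FLAG_NEW_UNKNOWN  = 0x08,   /* stands in for a flag added later */
};

/* Hypothetical FPCR exception bits; the values are not Alpha's. */
enum {
    DEMO_FPCR_INV = 0x100,
    DEMO_FPCR_IOV = 0x200,
    DEMO_FPCR_INE = 0x400,
};

/* Convert softfloat-style flags to FPCR-style bits. */
static uint32_t demo_convert_flags(uint32_t exc)
{
    if (exc) {
        if (exc & DEMO_FLAG_INVALID_CVTI) {
            exc = DEMO_FPCR_IOV | DEMO_FPCR_INE;
        } else if (exc & DEMO_FLAG_INVALID) {
            exc = DEMO_FPCR_INV;
        } else if (exc & DEMO_FLAG_INEXACT) {
            exc = DEMO_FPCR_INE;
        } else {
            /*
             * Without this clause the raw flag value would be returned
             * unconverted and later misread as FPCR bits.
             */
            exc = 0;
        }
    }
    return exc;
}
```

With the else clause in place, a conversion that raises only an unrecognised flag reports no exception, while recognised flags are still converted as before.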

Fixes: aa3bad5b59e7 ("target/alpha: Use float64_to_int64_modulo for CVTTQ")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 target/alpha/fpu_helper.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/alpha/fpu_helper.c b/target/alpha/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/alpha/fpu_helper.c
+++ b/target/alpha/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t do_cvttq(CPUAlphaState *env, uint64_t a, int roundmode)
             exc = FPCR_INV;
         } else if (exc & float_flag_inexact) {
             exc = FPCR_INE;
+        } else {
+            exc = 0;
         }
     }
     env->error_code = exc;
-- 
2.34.1

Currently in softfloat we canonicalize input denormals and so the
code that implements floating point operations does not need to care
whether the input value was originally normal or denormal.  However,
both x86 and Arm FEAT_AFP require that an exception flag is set if:
 * an input is denormal
 * that input is not squashed to zero
 * that input is actually used in the calculation (e.g. we
   did not find the other input was a NaN)

So we need to track that the input was a non-squashed denormal.  To
do this we add a new value to the FloatClass enum.  In this commit we
add the value and adjust the code everywhere that looks at FloatClass
values so that the new float_class_denormal behaves identically to
float_class_normal.  We will add the code that does the "raise a new
float exception flag if an input was an unsquashed denormal and we
used it" in a subsequent commit.

There should be no behavioural change in this commit.
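
A minimal sketch of why the fast-path tests change from an equality comparison to a mask test. The DEMO_* names are hypothetical; the logic follows the cmask_is_only_normals() helper added in this patch.

```c
#include <stdbool.h>

/* Hypothetical reduced class enum, in the same order as FloatClass. */
typedef enum {
    DEMO_CLASS_ZERO,
    DEMO_CLASS_NORMAL,
    DEMO_CLASS_DENORMAL,   /* input was a non-squashed denormal */
    DEMO_CLASS_INF,
    DEMO_CLASS_QNAN,
} DemoClass;

#define DEMO_CMASK(c) (1 << (c))

enum {
    DEMO_CMASK_NORMAL  = DEMO_CMASK(DEMO_CLASS_NORMAL),
    DEMO_CMASK_ANYNORM = DEMO_CMASK(DEMO_CLASS_NORMAL)
                       | DEMO_CMASK(DEMO_CLASS_DENORMAL),
};

/* Combined class mask for a two-operand operation. */
static int demo_ab_mask(DemoClass a, DemoClass b)
{
    return DEMO_CMASK(a) | DEMO_CMASK(b);
}

/* True if the mask contains only normal and/or denormal classes. */
static bool demo_only_normals(int cmask)
{
    return !(cmask & ~DEMO_CMASK_ANYNORM);
}
```

For a normal operand paired with a denormal one, the old test (ab_mask == DEMO_CMASK_NORMAL) would be false and the operation would leave the fast path, whereas demo_only_normals() is true, which preserves the previous behaviour.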

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 32 ++++++++++++++++++++++++++++---
 fpu/softfloat-parts.c.inc | 40 ++++++++++++++++++++++++---------------
 2 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ float64_gen2(float64 xa, float64 xb, float_status *s,
 /*
  * Classify a floating point number. Everything above float_class_qnan
  * is a NaN so cls >= float_class_qnan is any NaN.
+ *
+ * Note that we canonicalize denormals, so most code should treat
+ * class_normal and class_denormal identically.
  */
 
 typedef enum __attribute__ ((__packed__)) {
     float_class_unclassified,
     float_class_zero,
     float_class_normal,
+    float_class_denormal, /* input was a non-squashed denormal */
     float_class_inf,
     float_class_qnan,  /* all NaNs from here */
     float_class_snan,
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__ ((__packed__)) {
 enum {
     float_cmask_zero    = float_cmask(float_class_zero),
     float_cmask_normal  = float_cmask(float_class_normal),
+    float_cmask_denormal = float_cmask(float_class_denormal),
     float_cmask_inf     = float_cmask(float_class_inf),
     float_cmask_qnan    = float_cmask(float_class_qnan),
     float_cmask_snan    = float_cmask(float_class_snan),
 
     float_cmask_infzero = float_cmask_zero | float_cmask_inf,
     float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
+    float_cmask_anynorm = float_cmask_normal | float_cmask_denormal,
 };
 
 /* Flags for parts_minmax. */
@@ -XXX,XX +XXX,XX @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
     return c == float_class_qnan;
 }
 
+/*
+ * Return true if the float_cmask has only normals in it
+ * (including input denormals that were canonicalized)
+ */
+static inline bool cmask_is_only_normals(int cmask)
+{
+    return !(cmask & ~float_cmask_anynorm);
+}
+
+static inline bool is_anynorm(FloatClass c)
+{
+    return float_cmask(c) & float_cmask_anynorm;
+}
+
 /*
  * Structure holding all of the decomposed parts of a float.
  * The exponent is unbiased and the fraction is normalized.
@@ -XXX,XX +XXX,XX @@ static float64 float64r32_round_pack_canonical(FloatParts64 *p,
      */
     switch (p->cls) {
     case float_class_normal:
+    case float_class_denormal:
         if (unlikely(p->exp == 0)) {
             /*
              * The result is denormal for float32, but can be represented
@@ -XXX,XX +XXX,XX @@ static floatx80 floatx80_round_pack_canonical(FloatParts128 *p,
 
     switch (p->cls) {
     case float_class_normal:
+    case float_class_denormal:
         if (s->floatx80_rounding_precision == floatx80_precision_x) {
             parts_uncanon_normal(p, s, fmt);
             frac = p->frac_hi;
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
         break;
 
     case float_class_normal:
+    case float_class_denormal:
     case float_class_zero:
         break;
 
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
     a->sign = b->sign;
     a->exp = b->exp;
 
-    if (a->cls == float_class_normal) {
+    if (is_anynorm(a->cls)) {
         frac_truncjam(a, b);
     } else if (is_nan(a->cls)) {
         /* Discard the low bits of the NaN. */
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_int128_scalbn(float128 a, FloatRoundMode rmode,
         return int128_zero();
 
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
             flags = float_flag_inexact;
         }
@@ -XXX,XX +XXX,XX @@ static Int128 float128_to_uint128_scalbn(float128 a, FloatRoundMode rmode,
         return int128_zero();
 
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(&p, rmode, scale, 128 - 2)) {
             flags = float_flag_inexact;
             if (p.cls == float_class_zero) {
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
     float32_unpack_canonical(&xp, a, status);
     if (unlikely(xp.cls != float_class_normal)) {
         switch (xp.cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(&xp, status);
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
         case float_class_zero:
             return float32_one;
         default:
-            break;
+            g_assert_not_reached();
         }
-        g_assert_not_reached();
     }
 
     float_raise(float_flag_inexact, status);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
             frac_clear(p);
         } else {
             int shift = frac_normalize(p);
-            p->cls = float_class_normal;
+            p->cls = float_class_denormal;
             p->exp = fmt->frac_shift - fmt->exp_bias
                    - shift + !fmt->m68k_denormal;
         }
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
 static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                             const FloatFmt *fmt)
 {
-    if (likely(p->cls == float_class_normal)) {
+    if (likely(is_anynorm(p->cls))) {
         parts_uncanon_normal(p, s, fmt);
     } else {
         switch (p->cls) {
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
 
     if (a->sign != b_sign) {
         /* Subtraction */
-        if (likely(ab_mask == float_cmask_normal)) {
+        if (likely(cmask_is_only_normals(ab_mask))) {
             if (parts_sub_normal(a, b)) {
                 return a;
             }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
         }
     } else {
         /* Addition */
-        if (likely(ab_mask == float_cmask_normal)) {
+        if (likely(cmask_is_only_normals(ab_mask))) {
             parts_add_normal(a, b);
             return a;
         }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
     }
 
     if (b->cls == float_class_zero) {
-        g_assert(a->cls == float_class_normal);
+        g_assert(is_anynorm(a->cls));
         return a;
     }
 
     g_assert(a->cls == float_class_zero);
-    g_assert(b->cls == float_class_normal);
+    g_assert(is_anynorm(b->cls));
  return_b:
     b->sign = b_sign;
     return b;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
     bool sign = a->sign ^ b->sign;
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         FloatPartsW tmp;
 
         frac_mulw(&tmp, a, b);
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
         a->sign ^= 1;
     }
 
-    if (unlikely(ab_mask != float_cmask_normal)) {
+    if (unlikely(!cmask_is_only_normals(ab_mask))) {
         if (unlikely(ab_mask == float_cmask_infzero)) {
             float_raise(float_flag_invalid | float_flag_invalid_imz, s);
             goto d_nan;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
         }
 
         g_assert(ab_mask & float_cmask_zero);
-        if (c->cls == float_class_normal) {
+        if (is_anynorm(c->cls)) {
             *a = *c;
             goto return_normal;
         }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
     bool sign = a->sign ^ b->sign;
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         a->sign = sign;
         a->exp -= b->exp + frac_div(a, b);
         return a;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
 {
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         frac_modrem(a, b, mod_quot);
         return a;
     }
@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
 
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(a, status);
@@ -XXX,XX +XXX,XX @@ static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
     case float_class_inf:
         break;
     case float_class_normal:
+    case float_class_denormal:
         if (parts_round_to_int_normal(a, rmode, scale, fmt->frac_size)) {
             float_raise(float_flag_inexact, s);
         }
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
             flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static uint64_t partsN(float_to_uint)(FloatPartsN *p, FloatRoundMode rmode,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
             flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static int64_t partsN(float_to_sint_modulo)(FloatPartsN *p,
         return 0;
 
     case float_class_normal:
+    case float_class_denormal:
         /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
         if (parts_round_to_int_normal(p, rmode, 0, N - 2)) {
             flags = float_flag_inexact;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
     a_exp = a->exp;
     b_exp = b->exp;
 
-    if (unlikely(ab_mask != float_cmask_normal)) {
+    if (unlikely(!cmask_is_only_normals(ab_mask))) {
         switch (a->cls) {
         case float_class_normal:
+        case float_class_denormal:
             break;
         case float_class_inf:
             a_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         }
         switch (b->cls) {
         case float_class_normal:
+        case float_class_denormal:
             break;
         case float_class_inf:
             b_exp = INT16_MAX;
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
 {
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
-    if (likely(ab_mask == float_cmask_normal)) {
+    if (likely(cmask_is_only_normals(ab_mask))) {
         FloatRelation cmp;
 
         if (a->sign != b->sign) {
@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
     case float_class_inf:
         break;
     case float_class_normal:
+    case float_class_denormal:
         a->exp += MIN(MAX(n, -0x10000), 0x10000);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
 
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
+        case float_class_denormal:
+            break;
         case float_class_snan:
         case float_class_qnan:
             parts_return_nan(a, s);
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
             }
             return;
         default:
-            break;
+            g_assert_not_reached();
         }
-        g_assert_not_reached();
     }
     if (unlikely(a->sign)) {
         goto d_nan;
-- 
2.34.1

For the x86 and the Arm FEAT_AFP semantics, we need to be able to
tell the target code that the FPU operation has used an input
denormal.  Implement this; when it happens we set the new
float_flag_input_denormal_used.

Note that we only set this when an input denormal is actually used by
the operation: if the operation results in Invalid Operation or
Divide By Zero or the result is a NaN because some other input was a
NaN then we never needed to look at the input denormal and do not set
float_flag_input_denormal_used.

We mostly do not need to adjust the hardfloat codepaths to deal with
this flag, because almost all hardfloat operations are already gated
on the input not being a denormal, and will fall back to softfloat
for a denormal input.  The only exception is the comparison
operations, where we need to add the check for input denormals, which
must now fall back to softfloat where they did not before.
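
As a rough illustration of the "only when actually used" rule above,
here is a minimal standalone sketch for add/sub and divide.  The
CMASK_* names and their values are hypothetical stand-ins for the
softfloat class masks, and the two predicates are not functions in
the patch; they only mirror the conditions it adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical class masks; names and values are for illustration only. */
enum {
    CMASK_ZERO     = 1 << 0,
    CMASK_NORMAL   = 1 << 1,
    CMASK_DENORMAL = 1 << 2,
    CMASK_INF      = 1 << 3,
    CMASK_QNAN     = 1 << 4,
    CMASK_SNAN     = 1 << 5,
    CMASK_ANYNAN   = CMASK_QNAN | CMASK_SNAN,
};

/* add/sub: an input denormal is consumed unless some input is a NaN. */
bool addsub_uses_input_denormal(int a_mask, int b_mask)
{
    int ab_mask = a_mask | b_mask;

    return (ab_mask & (CMASK_DENORMAL | CMASK_ANYNAN)) == CMASK_DENORMAL;
}

/*
 * divide: an input denormal is consumed unless some input is a NaN
 * or the divisor is zero (which gives Divide By Zero instead).
 */
bool div_uses_input_denormal(int a_mask, int b_mask)
{
    int ab_mask = a_mask | b_mask;

    if (ab_mask & CMASK_ANYNAN) {
        return false;
    }
    return (ab_mask & CMASK_DENORMAL) && b_mask != CMASK_ZERO;
}
```

With these definitions, denormal + normal reports the flag, denormal +
qNaN does not, and denormal / 0 does not.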

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-types.h |  7 ++++
 fpu/softfloat.c               | 38 +++++++++++++++++---
 fpu/softfloat-parts.c.inc     | 68 ++++++++++++++++++++++++++++++++++-
 3 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ enum {
     float_flag_invalid_sqrt    = 0x0800,  /* sqrt(-x) */
     float_flag_invalid_cvti    = 0x1000,  /* non-nan to integer */
     float_flag_invalid_snan    = 0x2000,  /* any operand was snan */
+    /*
+     * An input was denormal and we used it (without flushing it to zero).
+     * Not set if we do not actually use the denormal input (e.g.
+     * because some other input was a NaN, or because the operation
+     * wasn't actually carried out (divide-by-zero; invalid))
+     */
+    float_flag_input_denormal_used = 0x4000,
 };
 
 /*
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
                                   float16_params_ahp.frac_size + 1);
         break;
 
-    case float_class_normal:
     case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        break;
+    case float_class_normal:
     case float_class_zero:
         break;
 
@@ -XXX,XX +XXX,XX @@ static void parts64_float_to_float(FloatParts64 *a, float_status *s)
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 static void parts128_float_to_float(FloatParts128 *a, float_status *s)
@@ -XXX,XX +XXX,XX @@ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 #define parts_float_to_float(P, S) \
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
     a->sign = b->sign;
     a->exp = b->exp;
 
-    if (is_anynorm(a->cls)) {
+    switch (a->cls) {
+    case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        /* fall through */
+    case float_class_normal:
         frac_truncjam(a, b);
-    } else if (is_nan(a->cls)) {
+        break;
+    case float_class_snan:
+    case float_class_qnan:
         /* Discard the low bits of the NaN. */
         a->frac = b->frac_hi;
         parts_return_nan(a, s);
+        break;
+    default:
+        break;
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void parts_float_to_float_widen(FloatParts128 *a, FloatParts64 *b,
     if (is_nan(a->cls)) {
         parts_return_nan(a, s);
     }
+    if (a->cls == float_class_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
 }
 
 float32 float16_to_float32(float16 a, bool ieee, float_status *s)
@@ -XXX,XX +XXX,XX @@ float32_hs_compare(float32 xa, float32 xb, float_status *s, bool is_quiet)
         goto soft;
     }
 
-    float32_input_flush2(&ua.s, &ub.s, s);
+    if (unlikely(float32_is_denormal(ua.s) || float32_is_denormal(ub.s))) {
+        /* We may need to set the input_denormal_used flag */
+        goto soft;
+    }
+
     if (isgreaterequal(ua.h, ub.h)) {
         if (isgreater(ua.h, ub.h)) {
             return float_relation_greater;
@@ -XXX,XX +XXX,XX @@ float64_hs_compare(float64 xa, float64 xb, float_status *s, bool is_quiet)
         goto soft;
     }
 
-    float64_input_flush2(&ua.s, &ub.s, s);
+    if (unlikely(float64_is_denormal(ua.s) || float64_is_denormal(ub.s))) {
+        /* We may need to set the input_denormal_used flag */
+        goto soft;
+    }
+
     if (isgreaterequal(ua.h, ub.h)) {
         if (isgreater(ua.h, ub.h)) {
             return float_relation_greater;
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
     bool b_sign = b->sign ^ subtract;
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
+    /*
+     * For addition and subtraction, we will consume an
+     * input denormal unless the other input is a NaN.
+     */
+    if ((ab_mask & (float_cmask_denormal | float_cmask_anynan)) ==
+        float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (a->sign != b_sign) {
         /* Subtraction */
         if (likely(cmask_is_only_normals(ab_mask))) {
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     if (likely(cmask_is_only_normals(ab_mask))) {
         FloatPartsW tmp;
 
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
+
         frac_mulw(&tmp, a, b);
         frac_truncjam(a, &tmp);
 
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     }
 
     /* Multiply by 0 or Inf */
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (ab_mask & float_cmask_inf) {
         a->cls = float_class_inf;
         a->sign = sign;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(muladd_scalbn)(FloatPartsN *a, FloatPartsN *b,
     if (flags & float_muladd_negate_result) {
         a->sign ^= 1;
     }
+
+    /*
+     * All result types except for "return the default NaN
+     * because this is an Invalid Operation" go through here;
+     * this matches the set of cases where we consumed a
+     * denormal input.
+     */
+    if (abc_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
     return a;
 
  return_sub_zero:
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     bool sign = a->sign ^ b->sign;
 
     if (likely(cmask_is_only_normals(ab_mask))) {
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
         a->sign = sign;
         a->exp -= b->exp + frac_div(a, b);
         return a;
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
         return parts_pick_nan(a, b, s);
     }
 
+    if ((ab_mask & float_cmask_denormal) && b->cls != float_class_zero) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     a->sign = sign;
 
     /* Inf / X */
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
     int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
 
     if (likely(cmask_is_only_normals(ab_mask))) {
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
         frac_modrem(a, b, mod_quot);
         return a;
     }
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
         return a;
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     /* N % Inf; 0 % N */
     g_assert(b->cls == float_class_inf || a->cls == float_class_zero);
     return a;
@@ -XXX,XX +XXX,XX @@ static void partsN(sqrt)(FloatPartsN *a, float_status *status,
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
         case float_class_denormal:
+            if (!a->sign) {
+                /* -ve denormal will be InvalidOperation */
+                float_raise(float_flag_input_denormal_used, status);
+            }
             break;
         case float_class_snan:
         case float_class_qnan:
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         if ((flags & (minmax_isnum | minmax_isnumber))
             && !(ab_mask & float_cmask_snan)
             && (ab_mask & ~float_cmask_qnan)) {
+            if (ab_mask & float_cmask_denormal) {
+                float_raise(float_flag_input_denormal_used, s);
+            }
             return is_nan(a->cls) ? b : a;
         }
 
@@ -XXX,XX +XXX,XX @@ static FloatPartsN *partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
         return parts_pick_nan(a, b, s);
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     a_exp = a->exp;
     b_exp = b->exp;
 
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
     if (likely(cmask_is_only_normals(ab_mask))) {
         FloatRelation cmp;
 
+        if (ab_mask & float_cmask_denormal) {
+            float_raise(float_flag_input_denormal_used, s);
+        }
+
         if (a->sign != b->sign) {
             goto a_sign;
         }
@@ -XXX,XX +XXX,XX @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
         return float_relation_unordered;
     }
 
+    if (ab_mask & float_cmask_denormal) {
+        float_raise(float_flag_input_denormal_used, s);
+    }
+
     if (ab_mask & float_cmask_zero) {
         if (ab_mask == float_cmask_zero) {
             return float_relation_equal;
@@ -XXX,XX +XXX,XX @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
     case float_class_zero:
     case float_class_inf:
         break;
-    case float_class_normal:
     case float_class_denormal:
+        float_raise(float_flag_input_denormal_used, s);
+        /* fall through */
+    case float_class_normal:
         a->exp += MIN(MAX(n, -0x10000), 0x10000);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
     if (unlikely(a->cls != float_class_normal)) {
         switch (a->cls) {
         case float_class_denormal:
+            if (!a->sign) {
+                /* -ve denormal will be InvalidOperation */
+                float_raise(float_flag_input_denormal_used, s);
+            }
             break;
         case float_class_snan:
         case float_class_qnan:
-- 
2.34.1

Currently we handle flushing of output denormals in uncanon_normal
always before we deal with rounding.  This works for architectures
that detect tininess before rounding, but is usually not the right
place when the architecture detects tininess after rounding.  For
example, for x86 the SDM states that the MXCSR FTZ control bit causes
outputs to be flushed to zero "when it detects a floating-point
underflow condition".  This means that we mustn't flush to zero if
the input is such that after rounding it is no longer tiny.
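
To make the distinction concrete, here is a small standalone sketch
that narrows a double to a float with flush-to-zero applied either
before or after rounding.  It does not use the softfloat code; the
FtzDetection type and the function names are hypothetical, it assumes
the host rounds to nearest-even, and it is only meant for inputs near
FLT_MIN.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical knob, loosely modelled on the setting this patch adds. */
typedef enum {
    FTZ_AFTER_ROUNDING = 0,
    FTZ_BEFORE_ROUNDING = 1,
} FtzDetection;

static double mag(double x)
{
    return x < 0 ? -x : x;
}

/* Tiny before rounding: the exact value is nonzero and below FLT_MIN. */
bool tiny_before_rounding(double exact)
{
    return exact != 0.0 && mag(exact) < (double)FLT_MIN;
}

/*
 * Tiny after rounding: round to float precision as if the exponent
 * were unbounded, then compare.  Scaling by 2^64 (exact in double)
 * moves values near FLT_MIN into the normal float range, so the cast
 * rounds at full 24-bit precision.
 */
bool tiny_after_rounding(double exact)
{
    float scaled = (float)(exact * 0x1p64);

    return exact != 0.0 && mag(scaled) < (double)FLT_MIN * 0x1p64;
}

/* Narrow to float, flushing tiny results to a correctly signed zero. */
float narrow_with_ftz(double exact, FtzDetection d)
{
    bool tiny = (d == FTZ_BEFORE_ROUNDING) ? tiny_before_rounding(exact)
                                           : tiny_after_rounding(exact);

    if (tiny) {
        return exact < 0 ? -0.0f : 0.0f;
    }
    return (float)exact;
}
```

For the input 0x1p-126 - 0x1p-156, which is just below FLT_MIN but
rounds up to it, "before rounding" flushes the result to zero while
"after rounding" returns FLT_MIN.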

At least one of our guest architectures does underflow detection
after rounding but flushing of denormals before rounding (MIPS MSA);
this means we need to have a config knob for this that is separate
from our existing tininess_before_rounding setting.

Add an ftz_detection flag.  For consistency with
tininess_before_rounding, we make it default to "detect ftz after
rounding"; this means that we need to explicitly set the flag to
"detect ftz before rounding" on every existing architecture that sets
flush_to_zero, so that this commit has no behaviour change.
(This means more code change here but for the long term a less
confusing API.)

For several architectures the current behaviour is either
definitely or possibly wrong; annotate those with TODO comments.
These architectures are definitely wrong (and should detect
ftz after rounding):
 * x86
 * Alpha

For these architectures the spec is unclear:
 * MIPS (for non-MSA)
 * RX
 * SH4

PA-RISC makes ftz detection IMPDEF, but we aren't setting the
"tininess before rounding" setting that we ought to.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-helpers.h | 11 +++++++++++
 include/fpu/softfloat-types.h   | 18 ++++++++++++++++++
 target/mips/fpu_helper.h        |  6 ++++++
 target/alpha/cpu.c              |  7 +++++++
 target/arm/cpu.c                |  1 +
 target/hppa/fpu_helper.c        | 11 +++++++++++
 target/i386/tcg/fpu_helper.c    |  8 ++++++++
 target/mips/msa.c               |  9 +++++++++
 target/ppc/cpu_init.c           |  3 +++
 target/rx/cpu.c                 |  8 ++++++++
 target/sh4/cpu.c                |  8 ++++++++
 target/tricore/helper.c         |  1 +
 tests/fp/fp-bench.c             |  1 +
 fpu/softfloat-parts.c.inc       | 21 +++++++++++++++------
 14 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-helpers.h
+++ b/include/fpu/softfloat-helpers.h
@@ -XXX,XX +XXX,XX @@ static inline void set_flush_inputs_to_zero(bool val, float_status *status)
     status->flush_inputs_to_zero = val;
 }
 
+static inline void set_float_ftz_detection(FloatFTZDetection d,
+                                           float_status *status)
+{
+    status->ftz_detection = d;
+}
+
 static inline void set_default_nan_mode(bool val, float_status *status)
 {
     status->default_nan_mode = val;
@@ -XXX,XX +XXX,XX @@ static inline bool get_default_nan_mode(const float_status *status)
     return status->default_nan_mode;
 }
 
+static inline FloatFTZDetection get_float_ftz_detection(const float_status *status)
+{
+    return status->ftz_detection;
+}
+
 #endif /* SOFTFLOAT_HELPERS_H */
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
     float_infzeronan_suppress_invalid = (1 << 7),
 } FloatInfZeroNaNRule;
 
+/*
+ * When flush_to_zero is set, should we detect denormal results to
+ * be flushed before or after rounding? For most architectures this
+ * should be set to match the tininess_before_rounding setting,
+ * but a few architectures, e.g. MIPS MSA, detect FTZ before
+ * rounding but tininess after rounding.
+ *
+ * This enum is arranged so that the default if the target doesn't
+ * configure it matches the default for tininess_before_rounding
+ * (i.e. "after rounding").
+ */
+typedef enum __attribute__((__packed__)) {
+    float_ftz_after_rounding = 0,
+    float_ftz_before_rounding = 1,
+} FloatFTZDetection;
+
 /*
  * Floating Point Status. Individual architectures may maintain
  * several versions of float_status for different functions. The
@@ -XXX,XX +XXX,XX @@ typedef struct float_status {
     bool tininess_before_rounding;
     /* should denormalised results go to zero and set output_denormal_flushed? */
     bool flush_to_zero;
+    /* do we detect and flush denormal results before or after rounding? */
+    FloatFTZDetection ftz_detection;
     /* should denormalised inputs go to zero and set input_denormal_flushed? */
     bool flush_inputs_to_zero;
     bool default_nan_mode;
diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/fpu_helper.h
+++ b/target/mips/fpu_helper.h
@@ -XXX,XX +XXX,XX @@ static inline void fp_reset(CPUMIPSState *env)
      */
     set_float_2nan_prop_rule(float_2nan_prop_s_ab,
                              &env->active_fpu.fp_status);
+    /*
+     * TODO: the spec doesn't say clearly whether FTZ happens before
+     * or after rounding for normal FPU operations.
+     */
+    set_float_ftz_detection(float_ftz_before_rounding,
+                            &env->active_fpu.fp_status);
 }
 
 /* MSA */
diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -XXX,XX +XXX,XX @@ static void alpha_cpu_initfn(Object *obj)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN: sign bit clear, msb frac bit set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
+     * section 4.7.7.11 says that we flush to zero for underflow cases, so
+     * this should be float_ftz_after_rounding to match the
+     * tininess_after_rounding (which is specified in section 4.7.5).
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 #if defined(CONFIG_USER_ONLY)
     env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
     cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
 static void arm_set_default_fp_behaviours(float_status *s)
 {
     set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_ftz_detection(float_ftz_before_rounding, s);
     set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/hppa/fpu_helper.c
+++ b/target/hppa/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
     set_float_infzeronan_rule(float_infzeronan_dnan_never, &env->fp_status);
     /* Default NaN: sign bit clear, msb-1 frac bit set */
     set_float_default_nan_pattern(0b00100000, &env->fp_status);
+    /*
+     * "PA-RISC 2.0 Architecture" says it is IMPDEF whether the flushing
+     * enabled by FPSR.D happens before or after rounding. We pick "before"
+     * for consistency with tininess detection.
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
+    /*
+     * TODO: "PA-RISC 2.0 Architecture" chapter 10 says that we should
+     * detect tininess before rounding, but we don't set that here so we
+     * get the default tininess after rounding.
+     */
 }
 
 void cpu_hppa_loaded_fr0(CPUHPPAState *env)
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_init_fp_statuses(CPUX86State *env)
     set_float_default_nan_pattern(0b11000000, &env->fp_status);
     set_float_default_nan_pattern(0b11000000, &env->mmx_status);
     set_float_default_nan_pattern(0b11000000, &env->sse_status);
+    /*
+     * TODO: x86 does flush-to-zero detection after rounding (the SDM
+     * section 10.2.3.3 on the FTZ bit of MXCSR says that we flush
+     * when we detect underflow, which x86 does after rounding).
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &env->mmx_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &env->sse_status);
 }
 
 static inline uint8_t save_exception_flags(CPUX86State *env)
diff --git a/target/mips/msa.c b/target/mips/msa.c
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/msa.c
+++ b/target/mips/msa.c
@@ -XXX,XX +XXX,XX @@ void msa_reset(CPUMIPSState *env)
     /* tininess detected after rounding.*/
     set_float_detect_tininess(float_tininess_after_rounding,
                               &env->active_tc.msa_fp_status);
+    /*
+     * MSACSR.FS detects tiny results to flush to zero before rounding
+     * (per "MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD
+     * Architecture Module, Revision 1.1" section 3.5.4), even though it
+     * detects tininess after rounding for underflow purposes (section 3.4.2
+     * table 3.3).
+     */
+    set_float_ftz_detection(float_ftz_before_rounding,
+                            &env->active_tc.msa_fp_status);
 
     /*
      * According to MIPS specifications, if one of the two operands is
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index XXXXXXX..XXXXXXX 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -XXX,XX +XXX,XX @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
     /* tininess for underflow is detected before rounding */
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->fp_status);
+    /* Similarly for flush-to-zero */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
+
     /*
      * PowerPC propagation rules:
      *  1. A if it sNaN or qNaN
diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -XXX,XX +XXX,XX @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN value: sign bit clear, set frac msb */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: "RX Family RXv1 Instruction Set Architecture" is not 100% clear
+     * on whether flush-to-zero should happen before or after rounding, but
+     * section 1.3.2 says that it happens when underflow is detected, and
+     * implies that underflow is detected after rounding. So this may not
+     * be the correct setting.
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 }
 
 static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -XXX,XX +XXX,XX @@ static void superh_cpu_reset_hold(Object *obj, ResetType type)
     set_default_nan_mode(1, &env->fp_status);
     /* sign bit clear, set all frac bits other than msb */
     set_float_default_nan_pattern(0b00111111, &env->fp_status);
+    /*
+     * TODO: "SH-4 CPU Core Architecture ADCS 7182230F" doesn't say whether
+     * it detects tininess before or after rounding. Section 6.4 is clear
+     * that flush-to-zero happens when the result underflows, though, so
+     * either this should be "detect ftz after rounding" or else we should
+     * be setting "detect tininess before rounding".
+     */
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
 }
 
 static void superh_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
diff --git a/target/tricore/helper.c b/target/tricore/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/tricore/helper.c
+++ b/target/tricore/helper.c
@@ -XXX,XX +XXX,XX @@ void fpu_set_state(CPUTriCoreState *env)
     set_flush_inputs_to_zero(1, &env->fp_status);
     set_flush_to_zero(1, &env->fp_status);
     set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &env->fp_status);
     set_default_nan_mode(1, &env->fp_status);
     /* Default NaN pattern: sign bit clear, frac msb set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/fp/fp-bench.c
+++ b/tests/fp/fp-bench.c
@@ -XXX,XX +XXX,XX @@ static void run_bench(void)
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, &soft_status);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, &soft_status);
     set_float_default_nan_pattern(0b01000000, &soft_status);
+    set_float_ftz_detection(float_ftz_before_rounding, &soft_status);
 
     f = bench_funcs[operation][precision];
     g_assert(f);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
             p->frac_lo &= ~round_mask;
         }
         frac_shr(p, frac_shift);
-    } else if (s->flush_to_zero) {
+    } else if (s->flush_to_zero &&
+               s->ftz_detection == float_ftz_before_rounding) {
         flags |= float_flag_output_denormal_flushed;
         p->cls = float_class_zero;
         exp = 0;
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
         exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) && !fmt->m68k_denormal;
         frac_shr(p, frac_shift);
 
-        if (is_tiny && (flags & float_flag_inexact)) {
-            flags |= float_flag_underflow;
-        }
-        if (exp == 0 && frac_eqz(p)) {
-            p->cls = float_class_zero;
+        if (is_tiny) {
+            if (s->flush_to_zero) {
+                assert(s->ftz_detection == float_ftz_after_rounding);
+                flags |= float_flag_output_denormal_flushed;
+                p->cls = float_class_zero;
+                exp = 0;
+                frac_clear(p);
+            } else if (flags & float_flag_inexact) {
+                flags |= float_flag_underflow;
+            }
+            if (exp == 0 && frac_eqz(p)) {
+                p->cls = float_class_zero;
+            }
         }
     }
     p->exp = exp;
-- 
2.34.1

The Armv8.7 FEAT_AFP feature defines three new control bits in
the FPCR:
 * FPCR.AH: "alternate floating point mode"; this changes floating
   point behaviour in a variety of ways, including:
    - the sign of a default NaN is 1, not 0
    - if FPCR.FZ is also 1, results that are still denormal after
      rounding with an unbounded exponent are flushed to zero
    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
    - miscellaneous other corner-case behaviour changes
 * FPCR.FIZ: flush denormalized numbers to zero on input for
   most instructions
 * FPCR.NEP: makes scalar SIMD operations merge the result with
   higher vector elements in one of the source registers, instead
   of zeroing the higher elements of the destination

This commit defines the new bits in the FPCR, and allows them to be
read or written when FEAT_AFP is implemented.  Actual behaviour
changes will be implemented in subsequent commits.

Note that these are the first FPCR bits which don't appear in the
AArch32 FPSCR view of the register, and which share bit positions
with FPSR bits.

Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
flushing of single and double precision denormal inputs to zero for
AArch64 floating point instructions.  (For half-precision, the
existing FPCR.FZ16 control remains the only one.)

FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
only because of FPCR.FIZ then we should *not* set the cumulative
exception bit FPSR.IDC.

FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
applies when FPCR.AH is 0.

We can implement this by setting the "flush inputs to zero" state
appropriately when FPCR is written, and by not reflecting the
float_flag_input_denormal_flushed status flag into FPSR reads when it
is the result only of FPCR.FIZ.
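
The two conditions described above can be sketched in isolation as
follows.  This is an illustration rather than the code in the patch:
the FPCR_* bit positions are written out here on the assumption that
they match the architectural layout, and the two helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed FPCR bit positions, defined locally for this sketch only. */
#define FPCR_FIZ (1u << 0)
#define FPCR_AH  (1u << 1)
#define FPCR_FZ  (1u << 24)

/*
 * A64: flush denormal inputs to zero if FPCR.FIZ == 1, or if
 * FPCR.FZ == 1 while FPCR.AH == 0.
 */
bool a64_flush_inputs_to_zero(uint32_t fpcr)
{
    return (fpcr & FPCR_FIZ) ||
           (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
}

/*
 * A64: a flushed input denormal is reported in FPSR.IDC only when the
 * flush is due to FPCR.FZ (with AH == 0), not when it is due to FIZ
 * alone.
 */
bool a64_flushed_input_sets_idc(uint32_t fpcr)
{
    return (fpcr & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
}
```

So with only FIZ set, inputs are flushed but IDC stays clear; with FZ
and AH both set and FIZ clear, inputs are not flushed at all.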

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vfp_helper.c | 60 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
 
 static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 {
-    uint32_t i = 0;
+    uint32_t a32_flags = 0, a64_flags = 0;
 
-    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
-    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
-    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
+    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
     /* FZ16 does not generate an input denormal exception.  */
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
+    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
+
+    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
+    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~float_flag_input_denormal_flushed);
-    return vfp_exceptbits_from_host(i);
+    /*
+     * Flushing an input denormal *only* because FPCR.FIZ == 1 does
+     * not set FPSR.IDC; if FPCR.FZ is also set then this takes
+     * precedence and IDC is set (see the FPUnpackBase pseudocode).
+     * So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
+     * We only do this for the a64 flags because FIZ has no effect
+     * on AArch32 even if it is set.
+     */
+    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
+        a64_flags &= ~float_flag_input_denormal_flushed;
+    }
+    return vfp_exceptbits_from_host(a32_flags | a64_flags);
 }
 
 static void vfp_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 }
 
+static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
+{
+    /*
+     * Synchronize any pending exception-flag information in the
+     * float_status values into env->vfp.fpsr, and then clear out
+     * the float_status data.
+     */
+    env->vfp.fpsr |= vfp_get_fpsr_from_host(env);
+    vfp_clear_float_status_exc_flags(env);
+}
+
 static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
 {
     uint64_t changed = env->vfp.fpcr;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
+        /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+    }
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        /*
+         * A64: Flush denormalized inputs to zero if FPCR.FIZ = 1, or
+         * both FPCR.AH = 0 and FPCR.FZ = 1.
+         */
+        bool fitz_enabled = (val & FPCR_FIZ) ||
+            (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
+        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
+    /*
+     * If any bits changed that we look at in vfp_get_fpsr_from_host(),
+     * we must sync the float_status flags into vfp.fpsr now (under the
+     * old regime) before we update vfp.fpcr.
+     */
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        vfp_sync_and_clear_float_status_exc_flags(env);
+    }
 }
 
 #else
-- 
2.34.1

When FPCR.AH is set, various behaviours of AArch64 floating point
operations which are controlled by softfloat config settings change:
 * tininess and ftz detection before/after rounding
 * NaN propagation order
 * result of 0 * Inf + NaN
 * default NaN value

When the guest changes the value of the AH bit, switch these config
settings on the fp_status_a64 and fp_status_f16_a64 float_status
fields.

This requires us to make the arm_set_default_fp_behaviours() function
global, since we now need to call it from cpu.c and vfp_helper.c; we
move it to vfp_helper.c so it can be next to the new
arm_set_ah_fp_behaviours().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
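Illustration only (placed below the '---', so not part of the commit
message): the default NaN change listed above can be pictured with a
minimal sketch. It assumes the 8-bit value passed to
set_float_default_nan_pattern() is the sign bit followed by the top
seven fraction bits, and it models single precision only. The
sketch_* names are hypothetical and do not exist in QEMU.

```c
#include <stdint.h>

/*
 * Illustrative only: expand an 8-bit default-NaN pattern into a
 * single-precision bit pattern. Assumes bit 7 is the sign and
 * bits 6..0 are the top seven fraction bits; the remaining
 * fraction bits are left as zero in this sketch.
 */
static inline uint32_t sketch_default_nan32(uint8_t pattern)
{
    uint32_t sign = (uint32_t)(pattern >> 7) << 31;
    uint32_t exp = 0xffu << 23;                 /* all-ones exponent */
    uint32_t frac = (uint32_t)(pattern & 0x7f) << 16;

    return sign | exp | frac;
}

/* Pick the pattern the way the FPCR.AH switch in this patch does. */
static inline uint32_t sketch_default_nan32_for_ah(int ah)
{
    /* 0xc0 == 0b11000000 (AH=1), 0x40 == 0b01000000 (AH=0) */
    return sketch_default_nan32(ah ? 0xc0 : 0x40);
}
```

Under these assumptions AH=0 yields 0x7fc00000 and AH=1 yields
0xffc00000, so the two settings differ only in the sign bit.
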
 target/arm/internals.h  |  4 +++
 target/arm/cpu.c        | 23 ----------------
 target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ uint64_t gt_virt_cnt_offset(CPUARMState *env);
  * all EL1" scope; this covers stage 1 and stage 2.
  */
 int alle1_tlbmask(CPUARMState *env);
+
+/* Set the float_status behaviour to match the Arm defaults */
+void arm_set_default_fp_behaviours(float_status *s);
+
 #endif
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
     QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
 }
 
-/*
- * Set the float_status behaviour to match the Arm defaults:
- *  * tininess-before-rounding
- *  * 2-input NaN propagation prefers SNaN over QNaN, and then
- *    operand A over operand B (see FPProcessNaNs() pseudocode)
- *  * 3-input NaN propagation prefers SNaN over QNaN, and then
- *    operand C over A over B (see FPProcessNaNs3() pseudocode,
- *    but note that for QEMU muladd is a * b + c, whereas for
- *    the pseudocode function the arguments are in the order c, a, b.
- *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
- *    and the input NaN if it is signalling
- *  * Default NaN has sign bit clear, msb frac bit set
- */
-static void arm_set_default_fp_behaviours(float_status *s)
-{
-    set_float_detect_tininess(float_tininess_before_rounding, s);
-    set_float_ftz_detection(float_ftz_before_rounding, s);
-    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
-    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
-    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
-    set_float_default_nan_pattern(0b01000000, s);
-}
-
 static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
 {
     /* Reset a single ARMCPRegInfo register */
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/helper-proto.h"
 #include "internals.h"
 #include "cpu-features.h"
+#include "fpu/softfloat.h"
 #ifdef CONFIG_TCG
 #include "qemu/log.h"
-#include "fpu/softfloat.h"
 #endif
 
 /* VFP support.  We follow the convention used for VFP instructions:
    Single precision routines have a "s" suffix, double precision a
    "d" suffix.  */
 
+/*
+ * Set the float_status behaviour to match the Arm defaults:
+ *  * tininess-before-rounding
+ *  * 2-input NaN propagation prefers SNaN over QNaN, and then
+ *    operand A over operand B (see FPProcessNaNs() pseudocode)
+ *  * 3-input NaN propagation prefers SNaN over QNaN, and then
+ *    operand C over A over B (see FPProcessNaNs3() pseudocode,
+ *    but note that for QEMU muladd is a * b + c, whereas for
+ *    the pseudocode function the arguments are in the order c, a, b.
+ *  * 0 * Inf + NaN returns the default NaN if the input NaN is quiet,
+ *    and the input NaN if it is signalling
+ *  * Default NaN has sign bit clear, msb frac bit set
+ */
+void arm_set_default_fp_behaviours(float_status *s)
+{
+    set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_ftz_detection(float_ftz_before_rounding, s);
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
+    set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
+    set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
+    set_float_default_nan_pattern(0b01000000, s);
+}
+
+/*
+ * Set the float_status behaviour to match the FEAT_AFP
+ * FPCR.AH=1 requirements:
+ *  * tininess-after-rounding
+ *  * 2-input NaN propagation prefers the first NaN
+ *  * 3-input NaN propagation prefers a over b over c
+ *  * 0 * Inf + NaN always returns the input NaN and doesn't
+ *    set Invalid for a QNaN
+ *  * default NaN has sign bit set, msb frac bit set
+ */
+static void arm_set_ah_fp_behaviours(float_status *s)
+{
+    set_float_detect_tininess(float_tininess_after_rounding, s);
+    set_float_ftz_detection(float_ftz_after_rounding, s);
+    set_float_2nan_prop_rule(float_2nan_prop_ab, s);
+    set_float_3nan_prop_rule(float_3nan_prop_abc, s);
+    set_float_infzeronan_rule(float_infzeronan_dnan_never |
+                              float_infzeronan_suppress_invalid, s);
+    set_float_default_nan_pattern(0b11000000, s);
+}
+
 #ifdef CONFIG_TCG
 
 /* Convert host exception flags to vfp form.  */
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
+    if (changed & FPCR_AH) {
+        bool ah_enabled = val & FPCR_AH;
+
+        if (ah_enabled) {
+            /* Change behaviours for A64 FP operations */
+            arm_set_ah_fp_behaviours(&env->vfp.fp_status_a64);
+            arm_set_ah_fp_behaviours(&env->vfp.fp_status_f16_a64);
+        } else {
+            arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
+            arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
+        }
+    }
     /*
      * If any bits changed that we look at in vfp_get_fpsr_from_host(),
      * we must sync the float_status flags into vfp.fpsr now (under the
-- 
2.34.1

When FPCR.AH = 1, some of the cumulative exception flags in the FPSR
behave slightly differently for A64 operations:
 * IDC is set when a denormal input is used without flushing
 * IXC (Inexact) is set when an output denormal is flushed to zero

Update vfp_get_fpsr_from_host() to do this.

Note that because half-precision operations never set IDC, we now
need to add float_flag_input_denormal_used to the set we mask out of
fp_status_f16_a64.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
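Illustration only (below the '---', so not part of the commit
message): a minimal standalone model of the flag conversion described
above. The SK_* flag values and sketch_* functions are hypothetical
stand-ins for the softfloat flags and the vfp_helper.c functions; only
the three host flags discussed in this patch are modelled, and all
other exception bits are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side flag bits, named for this sketch only. */
enum {
    SK_INPUT_DENORMAL_FLUSHED  = 1 << 0,
    SK_INPUT_DENORMAL_USED     = 1 << 1,
    SK_OUTPUT_DENORMAL_FLUSHED = 1 << 2,
};

/* FPSR cumulative bits touched here (values chosen for the sketch). */
#define SK_FPSR_IXC (1u << 4)
#define SK_FPSR_IDC (1u << 7)

static inline uint32_t sketch_exceptbits(int host_bits, bool ah)
{
    uint32_t target_bits = 0;

    if (host_bits & SK_INPUT_DENORMAL_FLUSHED) {
        target_bits |= SK_FPSR_IDC;
    }
    /* The next two mappings apply only when FPCR.AH is 1. */
    if (ah && (host_bits & SK_INPUT_DENORMAL_USED)) {
        target_bits |= SK_FPSR_IDC;
    }
    if (ah && (host_bits & SK_OUTPUT_DENORMAL_FLUSHED)) {
        target_bits |= SK_FPSR_IXC;
    }
    return target_bits;
}

/* Convert A64 and A32 flags separately so AH affects only A64. */
static inline uint32_t sketch_fpsr(int a64_flags, int a32_flags, bool ah)
{
    return sketch_exceptbits(a64_flags, ah) |
        sketch_exceptbits(a32_flags, false);
}
```

Converting the two flag sets separately is what keeps an A32
"input denormal used" flag from leaking into IDC when AH is 1.
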
 target/arm/vfp_helper.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static void arm_set_ah_fp_behaviours(float_status *s)
 #ifdef CONFIG_TCG
 
 /* Convert host exception flags to vfp form.  */
-static inline uint32_t vfp_exceptbits_from_host(int host_bits)
+static inline uint32_t vfp_exceptbits_from_host(int host_bits, bool ah)
 {
     uint32_t target_bits = 0;
 
@@ -XXX,XX +XXX,XX @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
     if (host_bits & float_flag_input_denormal_flushed) {
         target_bits |= FPSR_IDC;
     }
+    /*
+     * With FPCR.AH, IDC is set when an input denormal is used,
+     * and flushing an output denormal to zero sets both IXC and UFC.
+     */
+    if (ah && (host_bits & float_flag_input_denormal_used)) {
+        target_bits |= FPSR_IDC;
+    }
+    if (ah && (host_bits & float_flag_output_denormal_flushed)) {
+        target_bits |= FPSR_IXC;
+    }
     return target_bits;
 }
 
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
-          & ~float_flag_input_denormal_flushed);
+          & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
         a64_flags &= ~float_flag_input_denormal_flushed;
     }
-    return vfp_exceptbits_from_host(a32_flags | a64_flags);
+    return vfp_exceptbits_from_host(a64_flags, env->vfp.fpcr & FPCR_AH) |
+        vfp_exceptbits_from_host(a32_flags, false);
 }
 
 static void vfp_clear_float_status_exc_flags(CPUARMState *env)
-- 
2.34.1

We are going to need to generate different code in some cases when
FPCR.AH is 1.  For example:
 * floating point neg and abs must not flip the sign bit of NaNs
 * some insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, and various
   BFCVT and BFM bfloat16 ops) need to use a different float_status
   to the usual one

Encode FPCR.AH into the A64 tbflags, so we can refer to it at
translate time.

Because we now have a bit in FPCR that affects codegen, we can't mark
the AArch64 FPCR register as being SUPPRESS_TB_END any more; writes
to it will now end the TB and trigger a regeneration of hflags.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
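Illustration only (below the '---', so not part of the commit
message): a minimal sketch of why a codegen-affecting FPCR bit has to
be mirrored into the TB flags. It assumes cached translations are
reused only when their flags match the current ones; the SK_* masks
and sketch_* helpers are hypothetical and are not the QEMU macros.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_FPCR_AH   (UINT64_C(1) << 1)   /* assumed FPCR.AH position */
#define SK_TBFLAG_AH (UINT64_C(1) << 37)  /* bit 37, as in the FIELD() line */

/* Model of the hflags rebuild: copy FPCR.AH into the flags word. */
static inline uint64_t sketch_rebuild_flags(uint64_t fpcr)
{
    uint64_t flags = 0;

    if (fpcr & SK_FPCR_AH) {
        flags |= SK_TBFLAG_AH;
    }
    return flags;
}

/* Model of the translate-time read into a DisasContext-like bool. */
static inline bool sketch_translate_sees_ah(uint64_t tb_flags)
{
    return (tb_flags & SK_TBFLAG_AH) != 0;
}

/* Assumption: a cached block is reusable only if the flags match. */
static inline bool sketch_tb_reusable(uint64_t cached, uint64_t current)
{
    return cached == current;
}
```

In this model a write that flips AH changes the flags word, so code
translated under the old value is not picked up again by mistake.
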
 target/arm/cpu.h               | 1 +
 target/arm/tcg/translate.h     | 2 ++
 target/arm/helper.c            | 2 +-
 target/arm/tcg/hflags.c        | 4 ++++
 target/arm/tcg/translate-a64.c | 1 +
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2, 34, 1)
 FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
 /* Set if FEAT_NV2 RAM accesses are big-endian */
 FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
+FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
 
 /*
  * Helpers for using the above. Note that only the A64 accessors use
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool nv2_mem_e20;
     /* True if NV2 enabled and NV2 RAM accesses are big-endian */
     bool nv2_mem_be;
+    /* True if FPCR.AH is 1 (alternate floating point handling) */
+    bool fpcr_ah;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = aa64_daif_write, .resetfn = arm_cp_reset_ignore },
     { .name = "FPCR", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 0, .crn = 4, .crm = 4,
-      .access = PL0_RW, .type = ARM_CP_FPU | ARM_CP_SUPPRESS_TB_END,
+      .access = PL0_RW, .type = ARM_CP_FPU,
       .readfn = aa64_fpcr_read, .writefn = aa64_fpcr_write },
     { .name = "FPSR", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 1, .crn = 4, .crm = 4,
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
         DP_TBFLAG_A64(flags, TCMA, aa64_va_parameter_tcma(tcr, mmu_idx));
     }
 
+    if (env->vfp.fpcr & FPCR_AH) {
+        DP_TBFLAG_A64(flags, AH, 1);
+    }
+
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->nv2 = EX_TBFLAG_A64(tb_flags, NV2);
     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
+    dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
     dc->vec_len = 0;
     dc->vec_stride = 0;
     dc->cp_regs = arm_cpu->cp_regs;
-- 
2.34.1

When FPCR.AH is 1, the behaviour of some instructions changes:
 * AdvSIMD BFCVT, BFCVTN, BFCVTN2, BFMLALB, BFMLALT
 * SVE BFCVT, BFCVTNT, BFMLALB, BFMLALT, BFMLSLB, BFMLSLT
 * SME BFCVT, BFCVTN, BFMLAL, BFMLSL (these are all in SME2 which
   QEMU does not yet implement)
 * FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS

The behaviour change is:
 * the instructions do not update the FPSR cumulative exception flags
 * trapped floating point exceptions are disabled (a no-op for QEMU,
   which doesn't implement FPCR.{IDE,IXE,UFE,OFE,DZE,IOE})
 * rounding is always round-to-nearest-even regardless of FPCR.RMode
 * denormalized inputs and outputs are always flushed to zero, as if
   FPCR.{FZ,FIZ} is {1,1}
 * FPCR.FZ16 is still honoured for half-precision inputs

(See the Arm ARM DDI0487L.a section A1.5.9.)

We can provide all these behaviours with another pair of float_status fields
which we use only for these insns, when FPCR.AH is 1. These float_status
fields will always have:
 * flush_to_zero and flush_inputs_to_zero set for the non-F16 field
 * rounding mode set to round-to-nearest-even
and so the only FPCR fields they need to honour are DN and FZ16.

In this commit we only define the new fp_status fields and give them
the required behaviour when FPCR is updated.  In subsequent commits
we will arrange to use these new fp_status fields for the instructions
that should be affected by FPCR.AH in this way.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
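Illustration only (below the '---', so not part of the commit
message): a minimal model of which FPCR fields the two new
float_status fields end up tracking, following the description above.
The SketchStatus type, the SK_* masks and sketch_ah_status() are
hypothetical; the bit positions are assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_FPCR_FZ16       (1u << 19)
#define SK_FPCR_RMODE_MASK (3u << 22)
#define SK_FPCR_FZ         (1u << 24)
#define SK_FPCR_DN         (1u << 25)

enum { SK_ROUND_NEAREST_EVEN = 0 };

typedef struct {
    bool flush_to_zero;
    bool flush_inputs_to_zero;
    bool default_nan_mode;
    int rounding_mode;
} SketchStatus;

/*
 * Effective settings for the AH-specific status, given an FPCR value.
 * FPCR.FZ and FPCR.RMode are deliberately ignored: flushing is always
 * on for the non-F16 field and rounding is always nearest-even.
 */
static inline SketchStatus sketch_ah_status(uint32_t fpcr, bool is_f16)
{
    SketchStatus st;
    bool flush = is_f16 ? (fpcr & SK_FPCR_FZ16) != 0 : true;

    st.flush_to_zero = flush;
    st.flush_inputs_to_zero = flush;
    st.default_nan_mode = (fpcr & SK_FPCR_DN) != 0;
    st.rounding_mode = SK_ROUND_NEAREST_EVEN;
    return st;
}
```

Only DN and FZ16 influence the result, which matches the claim that
these are the only FPCR fields the new statuses need to honour.
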
 target/arm/cpu.h           | 15 +++++++++++++++
 target/arm/internals.h     |  2 ++
 target/arm/tcg/translate.h | 14 ++++++++++++++
 target/arm/cpu.c           |  4 ++++
 target/arm/vfp_helper.c    | 13 ++++++++++++-
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          *  standard_fp_status : the ARM "Standard FPSCR Value"
          *  standard_fp_status_fp16 : used for half-precision
          *       calculations with the ARM "Standard FPSCR Value"
+         *  ah_fp_status: used for the A64 insns which change behaviour
+         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+         *       and the reciprocal and square root estimate/step insns)
+         *  ah_fp_status_f16: used for the A64 insns which change behaviour
+         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+         *       and the reciprocal and square root estimate/step insns);
+         *       for half-precision
          *
          * Half-precision operations are governed by a separate
          * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
          * using a fixed value for it.
          *
+         * The ah_fp_status is needed because some insns have different
+         * behaviour when FPCR.AH == 1: they don't update cumulative
+         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
+         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
+         * which means we need an ah_fp_status_f16 as well.
+         *
          * To avoid having to transfer exception bits around, we simply
          * say that the FPSCR cumulative exception flags are the logical
          * OR of the flags in the four fp statuses. This relies on the
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         float_status fp_status_f16_a64;
         float_status standard_fp_status;
         float_status standard_fp_status_f16;
+        float_status ah_fp_status;
+        float_status ah_fp_status_f16;
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ int alle1_tlbmask(CPUARMState *env);
 
 /* Set the float_status behaviour to match the Arm defaults */
 void arm_set_default_fp_behaviours(float_status *s);
+/* Set the float_status behaviour to match Arm FPCR.AH=1 behaviour */
+void arm_set_ah_fp_behaviours(float_status *s);
 
 #endif
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
     FPST_A64,
     FPST_A32_F16,
     FPST_A64_F16,
+    FPST_AH,
+    FPST_AH_F16,
     FPST_STD,
     FPST_STD_F16,
 } ARMFPStatusFlavour;
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour {
  *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
  * FPST_A64_F16
  *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
+ * FPST_AH:
+ *   for AArch64 operations which change behaviour when AH=1 (specifically,
+ *   bfloat16 conversions and multiplies, and the reciprocal and square root
+ *   estimate/step insns)
+ * FPST_AH_F16:
+ *   ditto, but for half-precision operations
  * FPST_STD
  *   for A32/T32 Neon operations using the "standard FPSCR value"
  * FPST_STD_F16
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
     case FPST_A64_F16:
         offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
         break;
+    case FPST_AH:
+        offset = offsetof(CPUARMState, vfp.ah_fp_status);
+        break;
+    case FPST_AH_F16:
+        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
+        break;
     case FPST_STD:
         offset = offsetof(CPUARMState, vfp.standard_fp_status);
         break;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
+    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
+    set_flush_to_zero(1, &env->vfp.ah_fp_status);
+    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
+    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
 
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void arm_set_default_fp_behaviours(float_status *s)
  *    set Invalid for a QNaN
  *  * default NaN has sign bit set, msb frac bit set
  */
-static void arm_set_ah_fp_behaviours(float_status *s)
+void arm_set_ah_fp_behaviours(float_status *s)
 {
     set_float_detect_tininess(float_tininess_after_rounding, s);
     set_float_ftz_detection(float_ftz_after_rounding, s);
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
+    /*
+     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
+     * they are used for insns that must not set the cumulative exception bits.
+     */
+
     /*
      * Flushing an input denormal *only* because FPCR.FIZ == 1 does
      * not set FPSR.IDC; if FPCR.FZ is also set then this takes
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.ah_fp_status);
+    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 }
 
 static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
+        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
+        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_AH) {
         bool ah_enabled = val & FPCR_AH;
-- 
2.34.1

For the instructions FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, use
FPST_AH or FPST_AH_F16 when FPCR.AH is 1, so that they get
the required behaviour changes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.h |  13 ++++
 target/arm/tcg/translate-a64.c | 119 +++++++++++++++++++++++++--------
 target/arm/tcg/translate-sve.c |  30 ++++++---
 3 files changed, 127 insertions(+), 35 deletions(-)

diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr pred_full_reg_ptr(DisasContext *s, int regno)
     return ret;
 }
 
+/*
+ * Return the ARMFPStatusFlavour to use based on element size and
+ * whether FPCR.AH is set.
+ */
+static inline ARMFPStatusFlavour select_ah_fpst(DisasContext *s, MemOp esz)
+{
+    if (s->fpcr_ah) {
+        return esz == MO_16 ? FPST_AH_F16 : FPST_AH;
+    } else {
+        return esz == MO_16 ? FPST_A64_F16 : FPST_A64;
+    }
+}
+
 bool disas_sve(DisasContext *, uint32_t);
 bool disas_sme(DisasContext *, uint32_t);
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op3_ool(DisasContext *s, bool is_q, int rd,
  * an out-of-line helper.
  */
 static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
-                              int rm, bool is_fp16, int data,
+                              int rm, ARMFPStatusFlavour fpsttype, int data,
                               gen_helper_gvec_3_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
+    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm), fpst,
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 } FPScalar;
 
-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
+                                        const FPScalar *f,
+                                        ARMFPStatusFlavour fpsttype)
 {
     switch (a->esz) {
     case MO_64:
         if (fp_access_check(s)) {
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
-            f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
+            f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_dreg(s, a->rd, t0);
         }
         break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
-            f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
+            f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
         if (fp_access_check(s)) {
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
-            f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
+            f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
             write_fp_sreg(s, a->rd, t0);
         }
         break;
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
     return true;
 }
 
+static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+{
+    return do_fp3_scalar_with_fpsttype(s, a, f,
+                                       a->esz == MO_16 ?
+                                       FPST_A64_F16 : FPST_A64);
+}
+
+static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+{
+    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
+}
+
 static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_addh,
     gen_helper_vfp_adds,
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar, a, &f_scalar_frecps)
+TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
+TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
@@ -XXX,XX +XXX,XX @@ TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
 TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
 TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
 
-static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
-                          gen_helper_gvec_3_ptr * const fns[3])
+static bool do_fp3_vector_with_fpsttype(DisasContext *s, arg_qrrr_e *a,
+                                        int data,
+                                        gen_helper_gvec_3_ptr * const fns[3],
+                                        ARMFPStatusFlavour fpsttype)
 {
     MemOp esz = a->esz;
     int check = fp_access_check_vector_hsd(s, a->q, esz);
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
         return check == 0;
     }
 
-    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
-                      esz == MO_16, data, fns[esz - 1]);
+    gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, fpsttype,
+                      data, fns[esz - 1]);
     return true;
 }
 
+static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
+                          gen_helper_gvec_3_ptr * const fns[3])
+{
+    return do_fp3_vector_with_fpsttype(s, a, data, fns,
+                                       a->esz == MO_16 ?
+                                       FPST_A64_F16 : FPST_A64);
+}
+
+static bool do_fp3_vector_ah(DisasContext *s, arg_qrrr_e *a, int data,
+                             gen_helper_gvec_3_ptr * const f[3])
+{
+    return do_fp3_vector_with_fpsttype(s, a, data, f,
+                                       select_ah_fpst(s, a->esz));
+}
+
 static gen_helper_gvec_3_ptr * const f_vector_fadd[3] = {
     gen_helper_gvec_fadd_h,
     gen_helper_gvec_fadd_s,
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
     gen_helper_gvec_recps_s,
     gen_helper_gvec_recps_d,
 };
-TRANS(FRECPS_v, do_fp3_vector, a, 0, f_vector_frecps)
+TRANS(FRECPS_v, do_fp3_vector_ah, a, 0, f_vector_frecps)
 
 static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
     gen_helper_gvec_rsqrts_h,
     gen_helper_gvec_rsqrts_s,
     gen_helper_gvec_rsqrts_d,
 };
-TRANS(FRSQRTS_v, do_fp3_vector, a, 0, f_vector_frsqrts)
+TRANS(FRSQRTS_v, do_fp3_vector_ah, a, 0, f_vector_frsqrts)
 
 static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
     gen_helper_gvec_faddp_h,
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
     }
 
     gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
-                      esz == MO_16, a->idx, fns[esz - 1]);
+                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                      a->idx, fns[esz - 1]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1 {
     void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_ptr);
 } FPScalar1;
 
-static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
-                          const FPScalar1 *f, int rmode)
+static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
+                                        const FPScalar1 *f, int rmode,
+                                        ARMFPStatusFlavour fpsttype)
 {
     TCGv_i32 tcg_rmode = NULL;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+    fpst = fpstatus_ptr(fpsttype);
     if (rmode >= 0) {
         tcg_rmode = gen_set_rmode(rmode, fpst);
     }
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
     return true;
 }
 
+static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a,
+                          const FPScalar1 *f, int rmode)
+{
+    return do_fp1_scalar_with_fpsttype(s, a, f, rmode,
+                                       a->esz == MO_16 ?
+                                       FPST_A64_F16 : FPST_A64);
+}
+
+static bool do_fp1_scalar_ah(DisasContext *s, arg_rr_e *a,
+                             const FPScalar1 *f, int rmode)
+{
+    return do_fp1_scalar_with_fpsttype(s, a, f, rmode, select_ah_fpst(s, a->esz));
+}
+
 static const FPScalar1 f_scalar_fsqrt = {
     gen_helper_vfp_sqrth,
     gen_helper_vfp_sqrts,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
     gen_helper_recpe_f32,
     gen_helper_recpe_f64,
 };
-TRANS(FRECPE_s, do_fp1_scalar, a, &f_scalar_frecpe, -1)
+TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
 
 static const FPScalar1 f_scalar_frecpx = {
     gen_helper_frecpx_f16,
     gen_helper_frecpx_f32,
     gen_helper_frecpx_f64,
 };
-TRANS(FRECPX_s, do_fp1_scalar, a, &f_scalar_frecpx, -1)
+TRANS(FRECPX_s, do_fp1_scalar_ah, a, &f_scalar_frecpx, -1)
 
 static const FPScalar1 f_scalar_frsqrte = {
     gen_helper_rsqrte_f16,
     gen_helper_rsqrte_f32,
     gen_helper_rsqrte_f64,
 };
-TRANS(FRSQRTE_s, do_fp1_scalar, a, &f_scalar_frsqrte, -1)
+TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
 
 static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
 {
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINT64Z_v, aa64_frint, do_fp1_vector, a,
            &f_scalar_frint64, FPROUNDING_ZERO)
 TRANS_FEAT(FRINT64X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, -1)
 
-static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
-                             int rd, int rn, int data,
-                             gen_helper_gvec_2_ptr * const fns[3])
+static bool do_gvec_op2_fpst_with_fpsttype(DisasContext *s, MemOp esz,
+                                           bool is_q, int rd, int rn, int data,
+                                           gen_helper_gvec_2_ptr * const fns[3],
+                                           ARMFPStatusFlavour fpsttype)
 {
     int check = fp_access_check_vector_hsd(s, is_q, esz);
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
         return check == 0;
     }
 
-    fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+    fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn), fpst,
                        is_q ? 16 : 8, vec_full_reg_size(s),
@@ -XXX,XX +XXX,XX @@ static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
     return true;
 }
 
+static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q,
+                             int rd, int rn, int data,
+                             gen_helper_gvec_2_ptr * const fns[3])
+{
+    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data, fns,
+                                          esz == MO_16 ? FPST_A64_F16 :
+                                          FPST_A64);
+}
+
+static bool do_gvec_op2_ah_fpst(DisasContext *s, MemOp esz, bool is_q,
+                                int rd, int rn, int data,
+                                gen_helper_gvec_2_ptr * const fns[3])
+{
+    return do_gvec_op2_fpst_with_fpsttype(s, esz, is_q, rd, rn, data,
+                                          fns, select_ah_fpst(s, esz));
+}
+
 static gen_helper_gvec_2_ptr * const f_scvtf_v[] = {
     gen_helper_gvec_vcvt_sh,
     gen_helper_gvec_vcvt_sf,
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
     gen_helper_gvec_frecpe_s,
     gen_helper_gvec_frecpe_d,
 };
-TRANS(FRECPE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
 
 static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
     gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s,
     gen_helper_gvec_frsqrte_d,
 };
-TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
 
 static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
     return true;
 }
 
-static bool gen_gvec_fpst_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
-                                 arg_rr_esz *a, int data)
+static bool gen_gvec_fpst_ah_arg_zz(DisasContext *s, gen_helper_gvec_2_ptr *fn,
+                                    arg_rr_esz *a, int data)
 {
     return gen_gvec_fpst_zz(s, fn, a->rd, a->rn, data,
-                            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+                            select_ah_fpst(s, a->esz));
 }
 
 /* Invoke an out-of-line helper on 3 Zregs. */
@@ -XXX,XX +XXX,XX @@ static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
 }
 
+static bool gen_gvec_fpst_ah_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
+                                     arg_rrr_esz *a, int data)
+{
+    return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
+                             select_ah_fpst(s, a->esz));
+}
+
 /* Invoke an out-of-line helper on 4 Zregs. */
 static bool gen_gvec_ool_zzzz(DisasContext *s, gen_helper_gvec_4 *fn,
                               int rd, int rn, int rm, int ra, int data)
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
     NULL,                     gen_helper_gvec_frecpe_h,
     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
 };
-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_arg_zz, frecpe_fns[a->esz], a, 0)
+TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
 
 static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
     NULL,                      gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
 };
-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_arg_zz, frsqrte_fns[a->esz], a, 0)
+TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
 
 /*
  *** SVE Floating Point Compare with Zero Group
@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
     };                                                              \
     TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_arg_zzz, name##_fns[a->esz], a, 0)
 
+#define DO_FP3_AH(NAME, name) \
+    static gen_helper_gvec_3_ptr * const name##_fns[4] = {          \
+        NULL, gen_helper_gvec_##name##_h,                           \
+        gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
+    };                                                              \
+    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_ah_arg_zzz, name##_fns[a->esz], a, 0)
+
 DO_FP3(FADD_zzz, fadd)
 DO_FP3(FSUB_zzz, fsub)
 DO_FP3(FMUL_zzz, fmul)
-DO_FP3(FRECPS, recps)
-DO_FP3(FRSQRTS, rsqrts)
+DO_FP3_AH(FRECPS, recps)
+DO_FP3_AH(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const frecpx_fns[] = {
     gen_helper_sve_frecpx_s, gen_helper_sve_frecpx_d,
 };
 TRANS_FEAT(FRECPX, aa64_sve, gen_gvec_fpst_arg_zpz, frecpx_fns[a->esz],
-           a, 0, a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+           a, 0, select_ah_fpst(s, a->esz))
 
 static gen_helper_gvec_3_ptr * const fsqrt_fns[] = {
     NULL,                   gen_helper_sve_fsqrt_h,
-- 
2.34.1

When FPCR.AH is 1, use FPST_AH for:
 * AdvSIMD BFCVT, BFCVTN, BFCVTN2
 * SVE BFCVT, BFCVTNT

so that they get the required behaviour changes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 27 +++++++++++++++++++++------
 target/arm/tcg/translate-sve.c |  6 ++++--
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
 static const FPScalar1 f_scalar_bfcvt = {
     .gen_s = gen_helper_bfcvt,
 };
-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar, a, &f_scalar_bfcvt, -1)
+TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
 
 static const FPScalar1 f_scalar_frint32 = {
     NULL,
@@ -XXX,XX +XXX,XX @@ static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n)
     tcg_gen_extu_i32_i64(d, tmp);
 }
 
-static ArithOneOp * const f_vector_bfcvtn[] = {
-    NULL,
-    gen_bfcvtn_hs,
-    NULL,
+static void gen_bfcvtn_ah_hs(TCGv_i64 d, TCGv_i64 n)
+{
+    TCGv_ptr fpst = fpstatus_ptr(FPST_AH);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    gen_helper_bfcvt_pair(tmp, n, fpst);
+    tcg_gen_extu_i32_i64(d, tmp);
+}
+
+static ArithOneOp * const f_vector_bfcvtn[2][3] = {
+    {
+        NULL,
+        gen_bfcvtn_hs,
+        NULL,
+    }, {
+        NULL,
+        gen_bfcvtn_ah_hs,
+        NULL,
+    }
 };
-TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a, f_vector_bfcvtn)
+TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a,
+           f_vector_bfcvtn[s->fpcr_ah])
 
 static bool trans_SHLL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_hs, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_hs, a, 0, FPST_A64_F16)
 
 TRANS_FEAT(BFCVT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvt, a, 0, FPST_A64)
+           gen_helper_sve_bfcvt, a, 0,
+           s->fpcr_ah ? FPST_AH : FPST_A64)
 
 TRANS_FEAT(FCVT_dh, aa64_sve, gen_gvec_fpst_arg_zpz,
            gen_helper_sve_fcvt_dh, a, 0, FPST_A64)
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVTNT_ds, aa64_sve2, gen_gvec_fpst_arg_zpz,
            gen_helper_sve2_fcvtnt_ds, a, 0, FPST_A64)
 
 TRANS_FEAT(BFCVTNT, aa64_sve_bf16, gen_gvec_fpst_arg_zpz,
-           gen_helper_sve_bfcvtnt, a, 0, FPST_A64)
+           gen_helper_sve_bfcvtnt, a, 0,
+           s->fpcr_ah ? FPST_AH : FPST_A64)
 
 TRANS_FEAT(FCVTLT_hs, aa64_sve2, gen_gvec_fpst_arg_zpz,
            gen_helper_sve2_fcvtlt_hs, a, 0, FPST_A64)
-- 
2.34.1

When FPCR.AH is 1, use FPST_AH for:
 * AdvSIMD BFMLALB, BFMLALT
 * SVE BFMLALB, BFMLALT, BFMLSLB, BFMLSLT

so that they get the required behaviour changes.

We do this by making gen_gvec_op4_fpst() take an ARMFPStatusFlavour
rather than a bool is_fp16; existing callsites now select
FPST_A64_F16 vs FPST_A64 themselves rather than passing in
the boolean.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
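Reviewer note (not part of the commit message): a minimal stand-alone
sketch of the two selection patterns the call sites use after this
change. The enum and function names below are hypothetical stand-ins,
not QEMU identifiers; only the selection logic mirrors the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fp status flavours named in the diff. */
typedef enum {
    SK_FPST_A64,
    SK_FPST_A64_F16,
    SK_FPST_AH,
} SketchFlavour;

/*
 * What an ordinary call site (FCMLA, FMLA by index) now computes itself,
 * instead of passing a bool is_fp16 down to the helper.
 */
static SketchFlavour flavour_for_esz(bool is_fp16)
{
    return is_fp16 ? SK_FPST_A64_F16 : SK_FPST_A64;
}

/*
 * What the BFMLAL* call sites pass: the choice depends on FPCR.AH,
 * and the F16 flavour is never selected.
 */
static SketchFlavour flavour_for_bfmlal(bool fpcr_ah)
{
    return fpcr_ah ? SK_FPST_AH : SK_FPST_A64;
}
```

Passing the flavour rather than a boolean is what lets the BFMLAL*
callers express a third option without touching the other call sites.
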
 target/arm/tcg/translate-a64.c | 20 +++++++++++++-------
 target/arm/tcg/translate-sve.c |  6 ++++--
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_env(DisasContext *s, bool is_q, int rd, int rn,
  * an out-of-line helper.
  */
 static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
-                              int rm, int ra, bool is_fp16, int data,
+                              int rm, int ra, ARMFPStatusFlavour fpsttype,
+                              int data,
                               gen_helper_gvec_4_ptr *fn)
 {
-    TCGv_ptr fpst = fpstatus_ptr(is_fp16 ? FPST_A64_F16 : FPST_A64);
+    TCGv_ptr fpst = fpstatus_ptr(fpsttype);
     tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
                        vec_full_reg_offset(s, rn),
                        vec_full_reg_offset(s, rm),
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
     }
     if (fp_access_check(s)) {
         /* Q bit selects BFMLALB vs BFMLALT. */
-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, false, a->q,
+        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
+                          s->fpcr_ah ? FPST_AH : FPST_A64, a->q,
                           gen_helper_gvec_bfmlal);
     }
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
     }
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                      a->esz == MO_16, a->rot, fn[a->esz]);
+                      a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                      a->rot, fn[a->esz]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
     }
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                      esz == MO_16, (a->idx << 1) | neg,
+                      esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                      (a->idx << 1) | neg,
                       fns[esz - 1]);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
     }
     if (fp_access_check(s)) {
         /* Q bit selects BFMLALB vs BFMLALT. */
-        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, 0,
+        gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd,
+                          s->fpcr_ah ? FPST_AH : FPST_A64,
                           (a->idx << 1) | a->q,
                           gen_helper_gvec_bfmlal_idx);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
     }
     if (fp_access_check(s)) {
         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
-                          a->esz == MO_16, (a->idx << 2) | a->rot, fn);
+                          a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
+                          (a->idx << 2) | a->rot, fn);
     }
     return true;
 }
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_env_arg_zzzz,
 static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal,
-                              a->rd, a->rn, a->rm, a->ra, sel, FPST_A64);
+                              a->rd, a->rn, a->rm, a->ra, sel,
+                              s->fpcr_ah ? FPST_AH : FPST_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzzw, aa64_sve_bf16, do_BFMLAL_zzzw, a, false)
@@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
 {
     return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlal_idx,
                               a->rd, a->rn, a->rm, a->ra,
-                              (a->index << 1) | sel, FPST_A64);
+                              (a->index << 1) | sel,
+                              s->fpcr_ah ? FPST_AH : FPST_A64);
 }
 
 TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
-- 
2.34.1

For FEAT_AFP, we want to emit different code when FPCR.NEP is set, so
that instead of zeroing the high elements of a vector register when
we write the output of a scalar operation to it, we instead merge in
those elements from one of the source registers.  Since this affects
the generated code, we need to put FPCR.NEP into the TBFLAGS.

FPCR.NEP is treated as 0 when in streaming SVE mode and FEAT_SME_FA64
is not implemented or not enabled; we can implement this logic in
rebuild_hflags_a64().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
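Reviewer note (not part of the commit message): the condition added to
rebuild_hflags_a64() can be read as a small truth function. The sketch
below is a stand-alone model; the function and parameter names are
illustrative assumptions, not QEMU identifiers.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the effective NEP value cached in the TB flags.
 *   fpcr_nep      - the raw FPCR.NEP bit
 *   streaming_sve - PSTATE.SM is set
 *   fa64_enabled  - FEAT_SME_FA64 is implemented and enabled
 */
static bool effective_nep(bool fpcr_nep, bool streaming_sve, bool fa64_enabled)
{
    if (!fpcr_nep) {
        return false;
    }
    /* In streaming SVE mode without FA64, NEP behaves as if it were 0. */
    return !(streaming_sve && !fa64_enabled);
}
```

Folding the streaming-mode check into the flag means the translator only
ever has to test one cached boolean.
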
 target/arm/cpu.h               | 1 +
 target/arm/tcg/translate.h     | 2 ++
 target/arm/tcg/hflags.c        | 9 +++++++++
 target/arm/tcg/translate-a64.c | 1 +
 4 files changed, 13 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1)
 /* Set if FEAT_NV2 RAM accesses are big-endian */
 FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1)
 FIELD(TBFLAG_A64, AH, 37, 1)   /* FPCR.AH */
+FIELD(TBFLAG_A64, NEP, 38, 1)   /* FPCR.NEP */
 
 /*
  * Helpers for using the above. Note that only the A64 accessors use
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool nv2_mem_be;
     /* True if FPCR.AH is 1 (alternate floating point handling) */
     bool fpcr_ah;
+    /* True if FPCR.NEP is 1 (FEAT_AFP scalar upper-element result handling) */
+    bool fpcr_nep;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
     if (env->vfp.fpcr & FPCR_AH) {
         DP_TBFLAG_A64(flags, AH, 1);
     }
+    if (env->vfp.fpcr & FPCR_NEP) {
+        /*
+         * In streaming-SVE without FA64, NEP behaves as if zero;
+         * compare pseudocode IsMerging()
+         */
+        if (!(EX_TBFLAG_A64(flags, PSTATE_SM) && !sme_fa64(env, el))) {
+            DP_TBFLAG_A64(flags, NEP, 1);
+        }
+    }
 
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->nv2_mem_e20 = EX_TBFLAG_A64(tb_flags, NV2_MEM_E20);
     dc->nv2_mem_be = EX_TBFLAG_A64(tb_flags, NV2_MEM_BE);
     dc->fpcr_ah = EX_TBFLAG_A64(tb_flags, AH);
+    dc->fpcr_nep = EX_TBFLAG_A64(tb_flags, NEP);
     dc->vec_len = 0;
     dc->vec_stride = 0;
     dc->cp_regs = arm_cpu->cp_regs;
-- 
2.34.1

For FEAT_AFP's FPCR.NEP bit, we need to programmatically change the
behaviour of the writeback of the result for most SIMD scalar
operations, so that instead of zeroing the upper part of the result
register it merges the upper elements from one of the input
registers.

Provide new functions write_fp_*reg_merging() which can be used
instead of the existing write_fp_*reg() functions when we want this
"merge the result with one of the input registers if FPCR.NEP is
enabled" handling, and use them in do_fp3_scalar_with_fpsttype().

Note that (as documented in the description of the FPCR.NEP bit)
which input register to use as the merge source varies by
instruction: for these 2-input scalar operations, the comparison
instructions take from Rm, not Rn.

We'll extend this to also provide the merging behaviour for
the remaining scalar insns in subsequent commits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
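Reviewer note (not part of the commit message): a minimal model of the
merging writeback for a double-precision result, assuming a 128-bit
register represented as two 64-bit halves. The type and function names
are hypothetical; SVE bits above 128 are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register. */
typedef struct {
    uint64_t lo;   /* element 0: the scalar result goes here */
    uint64_t hi;   /* element 1: zeroed or merged */
} VReg128;

/*
 * nep == false: the high half of the destination is zeroed.
 * nep == true:  the high half is taken from regs[mergereg].
 * The merge source is read before the destination is written, so the
 * model also works when rd == mergereg.
 */
static void model_write_dreg_merging(VReg128 *regs, int rd, int mergereg,
                                     uint64_t result, bool nep)
{
    uint64_t hi = nep ? regs[mergereg].hi : 0;

    regs[rd].lo = result;
    regs[rd].hi = hi;
}
```

In this model, the arithmetic ops in the diff would pass Rn as mergereg
and the comparison ops would pass Rm.
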
 target/arm/tcg/translate-a64.c | 117 +++++++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 26 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
     write_fp_dreg(s, reg, tmp);
 }
 
+/*
+ * Write a double result to 128 bit vector register reg, honouring FPCR.NEP:
+ * - if FPCR.NEP == 0, clear the high elements of reg
+ * - if FPCR.NEP == 1, set the high elements of reg from mergereg
+ *   (i.e. merge the result with those high elements)
+ * In either case, SVE register bits above 128 are zeroed (per R_WKYLB).
+ */
+static void write_fp_dreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i64 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_dreg(s, reg, v);
+        return;
+    }
+
+    /*
+     * Move from mergereg to reg; this sets the high elements and
+     * clears the bits above 128 as a side effect.
+     */
+    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
+                     vec_full_reg_offset(s, mergereg),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st_i64(v, tcg_env, vec_full_reg_offset(s, reg));
+}
+
+/*
+ * Write a single-prec result, clearing the higher elements of the
+ * destination register if FPCR.NEP is 0, else taking them from mergereg.
+ */
+static void write_fp_sreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i32 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_sreg(s, reg, v);
+        return;
+    }
+
+    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
+                     vec_full_reg_offset(s, mergereg),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st_i32(v, tcg_env, fp_reg_offset(s, reg, MO_32));
+}
+
+/*
+ * Write a half-prec result, clearing the higher elements of the
+ * destination register if FPCR.NEP is 0, else taking them from mergereg.
+ * The caller must ensure that the top 16 bits of v are zero.
+ */
+static void write_fp_hreg_merging(DisasContext *s, int reg, int mergereg,
+                                  TCGv_i32 v)
+{
+    if (!s->fpcr_nep) {
+        write_fp_sreg(s, reg, v);
+        return;
+    }
+
+    tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, reg),
+                     vec_full_reg_offset(s, mergereg),
+                     16, vec_full_reg_size(s));
+    tcg_gen_st16_i32(v, tcg_env, fp_reg_offset(s, reg, MO_16));
+}
+
 /* Expand a 2-operand AdvSIMD vector operation using an expander function.  */
 static void gen_gvec_fn2(DisasContext *s, bool is_q, int rd, int rn,
                          GVecGen2Fn *gvec_fn, int vece)
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar {
 } FPScalar;
 
 static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
-                                        const FPScalar *f,
+                                        const FPScalar *f, int mergereg,
                                         ARMFPStatusFlavour fpsttype)
 {
     switch (a->esz) {
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i64 t0 = read_fp_dreg(s, a->rn);
             TCGv_i64 t1 = read_fp_dreg(s, a->rm);
             f->gen_d(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i32 t0 = read_fp_sreg(s, a->rn);
             TCGv_i32 t1 = read_fp_sreg(s, a->rm);
             f->gen_s(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
             TCGv_i32 t0 = read_fp_hreg(s, a->rn);
             TCGv_i32 t1 = read_fp_hreg(s, a->rm);
             f->gen_h(t0, t0, t1, fpstatus_ptr(fpsttype));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, mergereg, t0);
         }
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_with_fpsttype(DisasContext *s, arg_rrr_e *a,
     return true;
 }
 
-static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                          int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f,
+    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
                                        a->esz == MO_16 ?
                                        FPST_A64_F16 : FPST_A64);
 }
 
-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f)
+static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
+                             int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f, select_ah_fpst(s, a->esz));
+    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
+                                       select_ah_fpst(s, a->esz));
 }
 
 static const FPScalar f_scalar_fadd = {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_adds,
     gen_helper_vfp_addd,
 };
-TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd)
+TRANS(FADD_s, do_fp3_scalar, a, &f_scalar_fadd, a->rn)
 
 static const FPScalar f_scalar_fsub = {
     gen_helper_vfp_subh,
     gen_helper_vfp_subs,
     gen_helper_vfp_subd,
 };
-TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub)
+TRANS(FSUB_s, do_fp3_scalar, a, &f_scalar_fsub, a->rn)
 
 static const FPScalar f_scalar_fdiv = {
     gen_helper_vfp_divh,
     gen_helper_vfp_divs,
     gen_helper_vfp_divd,
 };
-TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv)
+TRANS(FDIV_s, do_fp3_scalar, a, &f_scalar_fdiv, a->rn)
 
 static const FPScalar f_scalar_fmul = {
     gen_helper_vfp_mulh,
     gen_helper_vfp_muls,
     gen_helper_vfp_muld,
 };
-TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul)
+TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul, a->rn)
 
 static const FPScalar f_scalar_fmax = {
     gen_helper_vfp_maxh,
     gen_helper_vfp_maxs,
     gen_helper_vfp_maxd,
 };
-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax)
+TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
 
 static const FPScalar f_scalar_fmin = {
     gen_helper_vfp_minh,
     gen_helper_vfp_mins,
     gen_helper_vfp_mind,
 };
-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin)
+TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
 
 static const FPScalar f_scalar_fmaxnm = {
     gen_helper_vfp_maxnumh,
     gen_helper_vfp_maxnums,
     gen_helper_vfp_maxnumd,
 };
-TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm)
+TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm, a->rn)
 
 static const FPScalar f_scalar_fminnm = {
     gen_helper_vfp_minnumh,
     gen_helper_vfp_minnums,
     gen_helper_vfp_minnumd,
 };
-TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm)
+TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm, a->rn)
 
 static const FPScalar f_scalar_fmulx = {
     gen_helper_advsimd_mulxh,
     gen_helper_vfp_mulxs,
     gen_helper_vfp_mulxd,
 };
-TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx)
+TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx, a->rn)
 
 static void gen_fnmul_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fnmul = {
     gen_fnmul_s,
     gen_fnmul_d,
 };
-TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul)
+TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
 
 static const FPScalar f_scalar_fcmeq = {
     gen_helper_advsimd_ceq_f16,
     gen_helper_neon_ceq_f32,
     gen_helper_neon_ceq_f64,
 };
-TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq)
+TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq, a->rm)
 
 static const FPScalar f_scalar_fcmge = {
     gen_helper_advsimd_cge_f16,
     gen_helper_neon_cge_f32,
     gen_helper_neon_cge_f64,
 };
-TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge)
+TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge, a->rm)
 
 static const FPScalar f_scalar_fcmgt = {
     gen_helper_advsimd_cgt_f16,
     gen_helper_neon_cgt_f32,
     gen_helper_neon_cgt_f64,
 };
-TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt)
+TRANS(FCMGT_s, do_fp3_scalar, a, &f_scalar_fcmgt, a->rm)
 
 static const FPScalar f_scalar_facge = {
     gen_helper_advsimd_acge_f16,
     gen_helper_neon_acge_f32,
     gen_helper_neon_acge_f64,
 };
-TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge)
+TRANS(FACGE_s, do_fp3_scalar, a, &f_scalar_facge, a->rm)
 
 static const FPScalar f_scalar_facgt = {
     gen_helper_advsimd_acgt_f16,
     gen_helper_neon_acgt_f32,
     gen_helper_neon_acgt_f64,
 };
-TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt)
+TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt, a->rm)
 
 static void gen_fabd_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
 {
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fabd = {
     gen_fabd_s,
     gen_fabd_d,
 };
-TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd)
+TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
 
 static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f16,
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps)
+TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts)
+TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
-- 
2.34.1

Handle FPCR.NEP for the 3-input scalar operations which use
do_fmla_scalar_idx() and do_fmadd(), by making them call the
appropriate write_fp_*reg_merging() functions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
                 gen_vfp_negd(t1, t1);
             }
             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
                 gen_vfp_negs(t1, t1);
             }
             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
             }
             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                        fpstatus_ptr(FPST_A64_F16));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, a->rd, t0);
         }
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
-            write_fp_dreg(s, a->rd, ta);
+            write_fp_dreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
-            write_fp_sreg(s, a->rd, ta);
+            write_fp_sreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             }
             fpst = fpstatus_ptr(FPST_A64_F16);
             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
-            write_fp_sreg(s, a->rd, ta);
+            write_fp_hreg_merging(s, a->rd, a->ra, ta);
         }
         break;
 
-- 
2.34.1

Currently we implement BFCVT scalar via do_fp1_scalar().  This works
even though BFCVT is a narrowing operation from 32 to 16 bits,
because we can use write_fp_sreg() for float16. However, FPCR.NEP
support requires that we use write_fp_hreg_merging() for float16
outputs, so we can't continue to borrow the non-narrowing
do_fp1_scalar() function for this. Split out trans_BFCVT_s()
into its own implementation that honours FPCR.NEP.
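
As a rough illustration of why the narrower write matters, here is a
plain-C model of a 128-bit vector register being updated with a
16-bit scalar result. This is a sketch, not QEMU code: the VReg type
and the model_* names are invented here, and it assumes (as this
series describes) that with FPCR.NEP set the bits above the result
element come from the merge register rather than being zeroed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit vector register. */
typedef struct {
    uint16_t h[8];
} VReg;

/* Starting value for a scalar write: the merge source if NEP, else zero. */
static VReg model_start(const VReg *merge, bool nep)
{
    VReg out;

    if (nep) {
        out = *merge;
    } else {
        memset(&out, 0, sizeof(out));
    }
    return out;
}

/* 16-bit scalar write: only element 0 is replaced. */
static void model_write_hreg_merging(VReg *dest, const VReg *merge,
                                     uint16_t result, bool nep)
{
    VReg out = model_start(merge, nep);

    out.h[0] = result;
    *dest = out;
}

/* 32-bit scalar write: elements 0 and 1 are both replaced. */
static void model_write_sreg_merging(VReg *dest, const VReg *merge,
                                     uint32_t result, bool nep)
{
    VReg out = model_start(merge, nep);

    out.h[0] = (uint16_t)(result & 0xffff);
    out.h[1] = (uint16_t)(result >> 16);
    *dest = out;
}
```

In this model the two writes agree when nep is false, because the
upper half of a zero-extended 16-bit result is zero anyway. With nep
true the 32-bit write overwrites h[1] with zero, where the merging
behaviour wants the old h[1] preserved.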

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frintx = {
 };
 TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
 
-static const FPScalar1 f_scalar_bfcvt = {
-    .gen_s = gen_helper_bfcvt,
-};
-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
+static bool trans_BFCVT_s(DisasContext *s, arg_rr_e *a)
+{
+    ARMFPStatusFlavour fpsttype = s->fpcr_ah ? FPST_AH : FPST_A64;
+    TCGv_i32 t32;
+    int check;
+
+    if (!dc_isar_feature(aa64_bf16, s)) {
+        return false;
+    }
+
+    check = fp_access_check_scalar_hsd(s, a->esz);
+
+    if (check <= 0) {
+        return check == 0;
+    }
+
+    t32 = read_fp_sreg(s, a->rn);
+    gen_helper_bfcvt(t32, t32, fpstatus_ptr(fpsttype));
+    write_fp_hreg_merging(s, a->rd, a->rd, t32);
+    return true;
+}
 
 static const FPScalar1 f_scalar_frint32 = {
     NULL,
-- 
2.34.1

Handle FPCR.NEP for the 1-input scalar operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_with_fpsttype(DisasContext *s, arg_rr_e *a,
     case MO_64:
         t64 = read_fp_dreg(s, a->rn);
         f->gen_d(t64, t64, fpst);
-        write_fp_dreg(s, a->rd, t64);
+        write_fp_dreg_merging(s, a->rd, a->rd, t64);
         break;
     case MO_32:
         t32 = read_fp_sreg(s, a->rn);
         f->gen_s(t32, t32, fpst);
-        write_fp_sreg(s, a->rd, t32);
+        write_fp_sreg_merging(s, a->rd, a->rd, t32);
         break;
     case MO_16:
         t32 = read_fp_hreg(s, a->rn);
         f->gen_h(t32, t32, fpst);
-        write_fp_sreg(s, a->rd, t32);
+        write_fp_hreg_merging(s, a->rd, a->rd, t32);
         break;
     default:
         g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, fpst);
-        write_fp_dreg(s, a->rd, tcg_rd);
+        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hs(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-        /* write_fp_sreg is OK here because top half of result is zero */
-        write_fp_sreg(s, a->rd, tmp);
+        /* write_fp_hreg_merging is OK here because top half of result is zero */
+        write_fp_hreg_merging(s, a->rd, a->rd, tmp);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sd(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, fpst);
-        write_fp_sreg(s, a->rd, tcg_rd);
+        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_hd(DisasContext *s, arg_rr *a)
         TCGv_ptr fpst = fpstatus_ptr(FPST_A64);
 
         gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp);
-        /* write_fp_sreg is OK here because top half of tcg_rd is zero */
-        write_fp_sreg(s, a->rd, tcg_rd);
+        /* write_fp_hreg_merging is OK here because top half of tcg_rd is zero */
+        write_fp_hreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a)
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
-        write_fp_sreg(s, a->rd, tcg_rd);
+        write_fp_sreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a)
         TCGv_i32 tcg_ahp = get_ahp_flag();
 
         gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp);
-        write_fp_dreg(s, a->rd, tcg_rd);
+        write_fp_dreg_merging(s, a->rd, a->rd, tcg_rd);
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_fcvt_f(DisasContext *s, arg_fcvt *a,
     do_fcvt_scalar(s, a->esz | (is_signed ? MO_SIGN : 0),
                    a->esz, tcg_int, a->shift, a->rn, rmode);
 
-    clear_vec(s, a->rd);
+    if (!s->fpcr_nep) {
+        clear_vec(s, a->rd);
+    }
     write_vec_element(s, tcg_int, a->rd, 0, a->esz);
     return true;
 }
-- 
2.34.1

Handle FPCR.NEP in the operations handled by do_cvtf_scalar().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtod(tcg_double, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_dreg(s, rd, tcg_double);
+        write_fp_dreg_merging(s, rd, rd, tcg_double);
         break;
 
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtos(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_sreg(s, rd, tcg_single);
+        write_fp_sreg_merging(s, rd, rd, tcg_single);
         break;
 
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift,
         } else {
             gen_helper_vfp_uqtoh(tcg_single, tcg_int, tcg_shift, tcg_fpstatus);
         }
-        write_fp_sreg(s, rd, tcg_single);
+        write_fp_hreg_merging(s, rd, rd, tcg_single);
         break;
 
     default:
-- 
2.34.1

Handle FPCR.NEP merging for scalar FABS and FNEG; this requires
an extra parameter to do_fp1_scalar_int(), since FMOV scalar
does not have the merging behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ typedef struct FPScalar1Int {
 } FPScalar1Int;
 
 static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
-                              const FPScalar1Int *f)
+                              const FPScalar1Int *f,
+                              bool merging)
 {
     switch (a->esz) {
     case MO_64:
         if (fp_access_check(s)) {
             TCGv_i64 t = read_fp_dreg(s, a->rn);
             f->gen_d(t, t);
-            write_fp_dreg(s, a->rd, t);
+            if (merging) {
+                write_fp_dreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_dreg(s, a->rd, t);
+            }
         }
         break;
     case MO_32:
         if (fp_access_check(s)) {
             TCGv_i32 t = read_fp_sreg(s, a->rn);
             f->gen_s(t, t);
-            write_fp_sreg(s, a->rd, t);
+            if (merging) {
+                write_fp_sreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_sreg(s, a->rd, t);
+            }
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
         if (fp_access_check(s)) {
             TCGv_i32 t = read_fp_hreg(s, a->rn);
             f->gen_h(t, t);
-            write_fp_sreg(s, a->rd, t);
+            if (merging) {
+                write_fp_hreg_merging(s, a->rd, a->rd, t);
+            } else {
+                write_fp_sreg(s, a->rd, t);
+            }
         }
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fmov = {
     tcg_gen_mov_i32,
     tcg_gen_mov_i64,
 };
-TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov)
+TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov, false)
 
 static const FPScalar1Int f_scalar_fabs = {
     gen_vfp_absh,
     gen_vfp_abss,
     gen_vfp_absd,
 };
-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs)
+TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
 
 static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negh,
     gen_vfp_negs,
     gen_vfp_negd,
 };
-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg)
+TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
 
 typedef struct FPScalar1 {
     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
-- 
2.34.1

Unlike the other users of do_2misc_narrow_scalar(), FCVTXN (scalar)
is always double-to-single and must honour FPCR.NEP.  Implement this
directly in a trans function rather than using
do_2misc_narrow_scalar().

We still need gen_fcvtxn_sd() and the f_scalar_fcvtxn[] array for
the FCVTXN (vector) insn, so we move those down in the file to
where they are used.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 43 ++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static ArithOneOp * const f_scalar_uqxtn[] = {
 };
 TRANS(UQXTN_s, do_2misc_narrow_scalar, a, f_scalar_uqxtn)
 
-static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
+static bool trans_FCVTXN_s(DisasContext *s, arg_rr_e *a)
 {
-    /*
-     * 64 bit to 32 bit float conversion
-     * with von Neumann rounding (round to odd)
-     */
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
-    tcg_gen_extu_i32_i64(d, tmp);
+    if (fp_access_check(s)) {
+        /*
+         * 64 bit to 32 bit float conversion
+         * with von Neumann rounding (round to odd)
+         */
+        TCGv_i64 src = read_fp_dreg(s, a->rn);
+        TCGv_i32 dst = tcg_temp_new_i32();
+        gen_helper_fcvtx_f64_to_f32(dst, src, fpstatus_ptr(FPST_A64));
+        write_fp_sreg_merging(s, a->rd, a->rd, dst);
+    }
+    return true;
 }
 
-static ArithOneOp * const f_scalar_fcvtxn[] = {
-    NULL,
-    NULL,
-    gen_fcvtxn_sd,
-};
-TRANS(FCVTXN_s, do_2misc_narrow_scalar, a, f_scalar_fcvtxn)
-
 #undef WRAP_ENV
 
 static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn)
@@ -XXX,XX +XXX,XX @@ static void gen_fcvtn_sd(TCGv_i64 d, TCGv_i64 n)
     tcg_gen_extu_i32_i64(d, tmp);
 }
 
+static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n)
+{
+    /*
+     * 64 bit to 32 bit float conversion
+     * with von Neumann rounding (round to odd)
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    gen_helper_fcvtx_f64_to_f32(tmp, n, fpstatus_ptr(FPST_A64));
+    tcg_gen_extu_i32_i64(d, tmp);
+}
+
 static ArithOneOp * const f_vector_fcvtn[] = {
     NULL,
     gen_fcvtn_hs,
     gen_fcvtn_sd,
 };
+static ArithOneOp * const f_scalar_fcvtxn[] = {
+    NULL,
+    NULL,
+    gen_fcvtxn_sd,
+};
 TRANS(FCVTN_v, do_2misc_narrow_vector, a, f_vector_fcvtn)
 TRANS(FCVTXN_v, do_2misc_narrow_vector, a, f_scalar_fcvtxn)
 
-- 
2.34.1

do_fp3_scalar_idx() is used only for the FMUL and FMULX scalar by
element instructions; these both need to merge the result with the Rn
register when FPCR.NEP is set.

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element(s, t1, a->rm, a->idx, MO_64);
             f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_A64));
-            write_fp_dreg(s, a->rd, t0);
+            write_fp_dreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     case MO_32:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_32);
             f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_A64));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_sreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     case MO_16:
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e *a, const FPScalar *f)
 
             read_vec_element_i32(s, t1, a->rm, a->idx, MO_16);
             f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_A64_F16));
-            write_fp_sreg(s, a->rd, t0);
+            write_fp_hreg_merging(s, a->rd, a->rn, t0);
         }
         break;
     default:
-- 
2.34.1

When FPCR.AH == 1, floating point FMIN and FMAX have some odd special
cases:

 * comparing two zeroes (even of different sign) or comparing a NaN
   with anything always returns the second argument (possibly
   squashed to zero)
 * denormal outputs are not squashed to zero regardless of FZ or FZ16

Implement these semantics in new helper functions and select them at
translate time if FPCR.AH is 1 for the scalar FMAX and FMIN insns.
(We will convert the other FMAX and FMIN insns in subsequent
commits.)

Note that FMINNM and FMAXNM are not affected.
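
For readers who want to see the special cases in isolation, the
following is a simplified model on host floats. It is only a sketch
under stated assumptions: the model_ah_* names are invented, and
input denormal squashing, exception flags and the FZ handling done
by the real helpers are deliberately left out.

```c
#include <math.h>

/*
 * Simplified model of FMIN with FPCR.AH == 1.
 * Not modelled: input denormal squashing, exception flags, FZ/FZ16.
 */
static float model_ah_fminf(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;   /* two zeroes of either sign: second argument */
    }
    if (isnan(a) || isnan(b)) {
        return b;   /* a NaN on either side: second argument */
    }
    return a < b ? a : b;
}

/* Same special cases for FMAX. */
static float model_ah_fmaxf(float a, float b)
{
    if (a == 0.0f && b == 0.0f) {
        return b;
    }
    if (isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}
```

The visible consequence is that the result depends on operand order:
in this model min(-0.0, +0.0) is +0.0 and min(NaN, 1.0) is 1.0, while
min(1.0, NaN) is NaN.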

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-a64.h    |  7 +++++++
 target/arm/tcg/helper-a64.c    | 36 ++++++++++++++++++++++++++++++++++
 target/arm/tcg/translate-a64.c | 23 ++++++++++++++++++++--
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(advsimd_muladd2h, i32, i32, i32, i32, fpst)
 DEF_HELPER_2(advsimd_rinth_exact, f16, f16, fpst)
 DEF_HELPER_2(advsimd_rinth, f16, f16, fpst)
 
+DEF_HELPER_3(vfp_ah_minh, f16, f16, f16, fpst)
+DEF_HELPER_3(vfp_ah_mins, f32, f32, f32, fpst)
+DEF_HELPER_3(vfp_ah_mind, f64, f64, f64, fpst)
+DEF_HELPER_3(vfp_ah_maxh, f16, f16, f16, fpst)
+DEF_HELPER_3(vfp_ah_maxs, f32, f32, f32, fpst)
+DEF_HELPER_3(vfp_ah_maxd, f64, f64, f64, fpst)
+
 DEF_HELPER_2(exception_return, void, env, i64)
 DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64)
 
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -XXX,XX +XXX,XX @@ float32 HELPER(fcvtx_f64_to_f32)(float64 a, float_status *fpst)
     return r;
 }
 
+/*
+ * AH=1 min/max have some odd special cases:
+ * comparing two zeroes (regardless of sign), (NaN, anything),
+ * or (anything, NaN) should return the second argument (possibly
+ * squashed to zero).
+ * Also, denormal outputs are not squashed to zero regardless of FZ or FZ16.
+ */
+#define AH_MINMAX_HELPER(NAME, CTYPE, FLOATTYPE, MINMAX)                \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        bool save;                                                      \
+        CTYPE r;                                                        \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        if (FLOATTYPE ## _is_zero(a) && FLOATTYPE ## _is_zero(b)) {     \
+            return b;                                                   \
+        }                                                               \
+        if (FLOATTYPE ## _is_any_nan(a) ||                              \
+            FLOATTYPE ## _is_any_nan(b)) {                              \
+            float_raise(float_flag_invalid, fpst);                      \
+            return b;                                                   \
+        }                                                               \
+        save = get_flush_to_zero(fpst);                                 \
+        set_flush_to_zero(false, fpst);                                 \
+        r = FLOATTYPE ## _ ## MINMAX(a, b, fpst);                       \
+        set_flush_to_zero(save, fpst);                                  \
+        return r;                                                       \
+    }
+
+AH_MINMAX_HELPER(vfp_ah_minh, dh_ctype_f16, float16, min)
+AH_MINMAX_HELPER(vfp_ah_mins, float32, float32, min)
+AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min)
+AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max)
+AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max)
+AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max)
+
 /* 64-bit versions of the CRC helpers. Note that although the operation
  * (and the prototypes of crc32c() and crc32() mean that only the bottom
  * 32 bits of the accumulator and result are used, we pass and return
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                        select_ah_fpst(s, a->esz));
 }
 
+/* Some insns need to call different helpers when FPCR.AH == 1 */
+static bool do_fp3_scalar_2fn(DisasContext *s, arg_rrr_e *a,
+                              const FPScalar *fnormal,
+                              const FPScalar *fah,
+                              int mergereg)
+{
+    return do_fp3_scalar(s, a, s->fpcr_ah ? fah : fnormal, mergereg);
+}
+
 static const FPScalar f_scalar_fadd = {
     gen_helper_vfp_addh,
     gen_helper_vfp_adds,
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_fmax = {
     gen_helper_vfp_maxs,
     gen_helper_vfp_maxd,
 };
-TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax, a->rn)
+static const FPScalar f_scalar_fmax_ah = {
+    gen_helper_vfp_ah_maxh,
+    gen_helper_vfp_ah_maxs,
+    gen_helper_vfp_ah_maxd,
+};
+TRANS(FMAX_s, do_fp3_scalar_2fn, a, &f_scalar_fmax, &f_scalar_fmax_ah, a->rn)
 
 static const FPScalar f_scalar_fmin = {
     gen_helper_vfp_minh,
     gen_helper_vfp_mins,
     gen_helper_vfp_mind,
 };
-TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin, a->rn)
+static const FPScalar f_scalar_fmin_ah = {
+    gen_helper_vfp_ah_minh,
+    gen_helper_vfp_ah_mins,
+    gen_helper_vfp_ah_mind,
+};
+TRANS(FMIN_s, do_fp3_scalar_2fn, a, &f_scalar_fmin, &f_scalar_fmin_ah, a->rn)
 
 static const FPScalar f_scalar_fmaxnm = {
     gen_helper_vfp_maxnumh,
-- 
2.34.1

Implement the FPCR.AH == 1 semantics for vector FMIN/FMAX, by
creating new _ah_ versions of the gvec helpers which invoke the
scalar fmin_ah and fmax_ah helpers on each element.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 21 +++++++++++++++++++--
 target/arm/tcg/vec_helper.c    |  8 ++++++++
 3 files changed, 41 insertions(+), 2 deletions(-)

Implement the FPCR.AH semantics for FMAXV and FMINV.  These are the
"recursively reduce all lanes of a vector to a scalar result" insns;
we just need to use the _ah_ helper for the reduction step when
FPCR.AH == 1.
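
The shape of this is roughly as in the sketch below, which models
the reduction on host floats. The names are invented for the
example, the element count is assumed to be a power of two, and the
"normal" max here only models NaN propagation, not signed zeroes or
flags.

```c
#include <math.h>
#include <stdbool.h>

typedef float (*ReduceFn)(float, float);

/* Simplified AH == 0 max: any NaN input gives a NaN result. */
static float model_max_normal(float a, float b)
{
    if (isnan(a) || isnan(b)) {
        return NAN;
    }
    return a > b ? a : b;
}

/* Simplified AH == 1 max: zero/zero and NaN cases return 'b'. */
static float model_max_ah(float a, float b)
{
    if ((a == 0.0f && b == 0.0f) || isnan(a) || isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}

/* Reduce v[start .. start+count-1] by recursively halving the range. */
static float model_reduce(const float *v, int start, int count, ReduceFn fn)
{
    if (count == 1) {
        return v[start];
    }
    int half = count / 2;
    float lo = model_reduce(v, start, half, fn);
    float hi = model_reduce(v, start + half, half, fn);
    return fn(lo, hi);
}

/* Only the per-step function changes with AH; the tree is the same. */
static float model_fmaxv(const float *v, int count, bool ah)
{
    return model_reduce(v, 0, count, ah ? model_max_ah : model_max_normal);
}
```

Because the AH variant returns its second argument when a NaN is
involved, a NaN in the model can be dropped or kept depending on
which side of a reduction step it lands on.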

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz,
 }
 
 static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
-                              NeonGenTwoSingleOpFn *fn)
+                            NeonGenTwoSingleOpFn *fnormal,
+                            NeonGenTwoSingleOpFn *fah)
 {
     if (fp_access_check(s)) {
         MemOp esz = a->esz;
         int elts = (a->q ? 16 : 8) >> esz;
         TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64);
-        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
+        TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst,
+                                       s->fpcr_ah ? fah : fnormal);
         write_fp_sreg(s, a->rd, res);
     }
     return true;
 }
 
-TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxnumh)
-TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minnumh)
-TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_maxh)
-TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_vfp_minh)
+TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_maxnumh, gen_helper_vfp_maxnumh)
+TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_minnumh, gen_helper_vfp_minnumh)
+TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_maxh, gen_helper_vfp_ah_maxh)
+TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a,
+           gen_helper_vfp_minh, gen_helper_vfp_ah_minh)
 
-TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
-TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
-TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
-TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
+TRANS(FMAXNMV_s, do_fp_reduction, a,
+      gen_helper_vfp_maxnums, gen_helper_vfp_maxnums)
+TRANS(FMINNMV_s, do_fp_reduction, a,
+      gen_helper_vfp_minnums, gen_helper_vfp_minnums)
+TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs, gen_helper_vfp_ah_maxs)
+TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins, gen_helper_vfp_ah_mins)
 
 /*
  * Floating-point Immediate
-- 
2.34.1

Implement the FPCR.AH semantics for the pairwise floating
point minimum/maximum insns FMINP and FMAXP.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
 target/arm/tcg/vec_helper.c    | 10 ++++++++++
 3 files changed, 45 insertions(+), 4 deletions(-)

Implement the FPCR.AH semantics for the SVE FMAXV and FMINV
vector-reduction-to-scalar max/min operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 +++++++++++
 target/arm/tcg/sve_helper.c    | 43 +++++++++++++++++++++-------------
 target/arm/tcg/translate-sve.c | 16 +++++++++++--
 3 files changed, 55 insertions(+), 18 deletions(-)

Implement the FPCR.AH semantics for the SVE FMAX and FMIN operations
that take an immediate as the second operand.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/sve_helper.c    |  8 ++++++++
 target/arm/tcg/translate-sve.c | 25 +++++++++++++++++++++++--
 3 files changed, 45 insertions(+), 2 deletions(-)

Implement the FPCR.AH semantics for the SVE FMAX and FMIN
operations that take two vector operands.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/sve_helper.c    |  8 ++++++++
 target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
 3 files changed, 37 insertions(+), 2 deletions(-)

FPCR.AH == 1 mandates that negation of a NaN value should not flip
its sign bit.  This means we can no longer use gen_vfp_neg*()
everywhere but must instead generate slightly more complex code when
FPCR.AH is set.

Make this change for the scalar FNEG and for those places in
translate-a64.c which were previously directly calling
gen_vfp_neg*().

This change in semantics also affects any other instruction whose
pseudocode calls FPNeg(); in following commits we extend this
change to the other affected instructions.
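
The bit-level rule can be checked on the host with a small model
operating on raw encodings. This is only an illustration (the
model_* names are invented); it mirrors the "clear the sign bit,
compare against the infinity encoding" test that the generated code
is described as using.

```c
#include <stdint.h>

/* float16 encoding: NaN iff (s & 0x7fff) > 0x7c00 */
static uint16_t model_ah_neg_f16(uint16_t s)
{
    uint16_t abs_s = (uint16_t)(s & 0x7fff);
    uint16_t chs_s = (uint16_t)(s ^ 0x8000);

    return abs_s > 0x7c00 ? s : chs_s;
}

/* float32 encoding: NaN iff (s & 0x7fffffff) > 0x7f800000 */
static uint32_t model_ah_neg_f32(uint32_t s)
{
    uint32_t sign = UINT32_C(1) << 31;
    uint32_t abs_s = s & ~sign;

    return abs_s > UINT32_C(0x7f800000) ? s : (s ^ sign);
}

/* float64 encoding: NaN iff (s & ~sign) > 0x7ff0000000000000 */
static uint64_t model_ah_neg_f64(uint64_t s)
{
    uint64_t sign = UINT64_C(1) << 63;
    uint64_t abs_s = s & ~sign;

    return abs_s > UINT64_C(0x7ff0000000000000) ? s : (s ^ sign);
}
```

Infinities compare equal to the threshold rather than greater, so
they are still negated; only NaN encodings pass through unchanged.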

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 125 ++++++++++++++++++++++++++++++---
 1 file changed, 114 insertions(+), 11 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn,
                        is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
 }
 
+/*
+ * When FPCR.AH == 1, NEG and ABS do not flip the sign bit of a NaN.
+ * These functions implement
+ *   d = floatN_is_any_nan(s) ? s : floatN_chs(s)
+ * which for float32 is
+ *   d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s ^ (1 << 31))
+ * and similarly for the other float sizes.
+ */
+static void gen_vfp_ah_negh(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
+
+    gen_vfp_negh(chs_s, s);
+    gen_vfp_absh(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7c00),
+                        s, chs_s);
+}
+
+static void gen_vfp_ah_negs(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32(), chs_s = tcg_temp_new_i32();
+
+    gen_vfp_negs(chs_s, s);
+    gen_vfp_abss(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7f800000UL),
+                        s, chs_s);
+}
+
+static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
+{
+    TCGv_i64 abs_s = tcg_temp_new_i64(), chs_s = tcg_temp_new_i64();
+
+    gen_vfp_negd(chs_s, s);
+    gen_vfp_absd(abs_s, s);
+    tcg_gen_movcond_i64(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
+                        s, chs_s);
+}
+
+static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negh(d, s);
+    } else {
+        gen_vfp_negh(d, s);
+    }
+}
+
+static void gen_vfp_maybe_ah_negs(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negs(d, s);
+    } else {
+        gen_vfp_negs(d, s);
+    }
+}
+
+static void gen_vfp_maybe_ah_negd(DisasContext *dc, TCGv_i64 d, TCGv_i64 s)
+{
+    if (dc->fpcr_ah) {
+        gen_vfp_ah_negd(d, s);
+    } else {
+        gen_vfp_negd(d, s);
+    }
+}
+
 /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
  * than the 32 bit equivalent.
  */
@@ -XXX,XX +XXX,XX @@ static void gen_fnmul_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
     gen_vfp_negd(d, d);
 }
 
+static void gen_fnmul_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_mulh(d, n, m, s);
+    gen_vfp_ah_negh(d, d);
+}
+
+static void gen_fnmul_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_muls(d, n, m, s);
+    gen_vfp_ah_negs(d, d);
+}
+
+static void gen_fnmul_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+    gen_helper_vfp_muld(d, n, m, s);
+    gen_vfp_ah_negd(d, d);
+}
+
 static const FPScalar f_scalar_fnmul = {
     gen_fnmul_h,
     gen_fnmul_s,
     gen_fnmul_d,
 };
-TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul, a->rn)
+static const FPScalar f_scalar_ah_fnmul = {
+    gen_fnmul_ah_h,
+    gen_fnmul_ah_s,
+    gen_fnmul_ah_d,
+};
+TRANS(FNMUL_s, do_fp3_scalar_2fn, a, &f_scalar_fnmul, &f_scalar_ah_fnmul, a->rn)
 
 static const FPScalar f_scalar_fcmeq = {
     gen_helper_advsimd_ceq_f16,
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element(s, t2, a->rm, a->idx, MO_64);
             if (neg) {
-                gen_vfp_negd(t1, t1);
+                gen_vfp_maybe_ah_negd(s, t1, t1);
             }
             gen_helper_vfp_muladdd(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
             write_fp_dreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element_i32(s, t2, a->rm, a->idx, MO_32);
             if (neg) {
-                gen_vfp_negs(t1, t1);
+                gen_vfp_maybe_ah_negs(s, t1, t1);
             }
             gen_helper_vfp_muladds(t0, t1, t2, t0, fpstatus_ptr(FPST_A64));
             write_fp_sreg_merging(s, a->rd, a->rd, t0);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
 
             read_vec_element_i32(s, t2, a->rm, a->idx, MO_16);
             if (neg) {
-                gen_vfp_negh(t1, t1);
+                gen_vfp_maybe_ah_negh(s, t1, t1);
             }
             gen_helper_advsimd_muladdh(t0, t1, t2, t0,
                                        fpstatus_ptr(FPST_A64_F16));
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i64 ta = read_fp_dreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negd(ta, ta);
+                gen_vfp_maybe_ah_negd(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negd(tn, tn);
+                gen_vfp_maybe_ah_negd(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i32 ta = read_fp_sreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negs(ta, ta);
+                gen_vfp_maybe_ah_negs(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negs(tn, tn);
+                gen_vfp_maybe_ah_negs(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_A64);
             gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
             TCGv_i32 ta = read_fp_hreg(s, a->ra);
 
             if (neg_a) {
-                gen_vfp_negh(ta, ta);
+                gen_vfp_maybe_ah_negh(s, ta, ta);
             }
             if (neg_n) {
-                gen_vfp_negh(tn, tn);
+                gen_vfp_maybe_ah_negh(s, tn, tn);
             }
             fpst = fpstatus_ptr(FPST_A64_F16);
             gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
@@ -XXX,XX +XXX,XX @@ static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a,
     return true;
 }
 
+static bool do_fp1_scalar_int_2fn(DisasContext *s, arg_rr_e *a,
+                                  const FPScalar1Int *fnormal,
+                                  const FPScalar1Int *fah)
+{
+    return do_fp1_scalar_int(s, a, s->fpcr_ah ? fah : fnormal, true);
+}
+
 static const FPScalar1Int f_scalar_fmov = {
     tcg_gen_mov_i32,
     tcg_gen_mov_i32,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negs,
     gen_vfp_negd,
 };
-TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg, true)
+static const FPScalar1Int f_scalar_ah_fneg = {
+    gen_vfp_ah_negh,
+    gen_vfp_ah_negs,
+    gen_vfp_ah_negd,
+};
+TRANS(FNEG_s, do_fp1_scalar_int_2fn, a, &f_scalar_fneg, &f_scalar_ah_fneg)
 
 typedef struct FPScalar1 {
     void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr);
-- 
2.34.1

FPCR.AH == 1 mandates that taking the absolute value of a NaN should
not change its sign bit.  This means we can no longer use
gen_vfp_abs*() everywhere but must instead generate slightly more
complex code when FPCR.AH is set.

Implement these semantics for scalar FABS and FABD.  This change also
affects all other instructions whose pseudocode calls FPAbs(); we
will extend the change to those instructions in following commits.
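
A minimal sketch of the AH == 1 behaviour for the float32 case, written on raw bit patterns in plain C rather than as TCG ops. The function names are hypothetical and chosen only for this illustration. The point is that any NaN has a magnitude above the infinity encoding 0x7f800000, so a single unsigned comparison selects between the input and its absolute value:

```c
#include <stdint.h>

/* Hypothetical names; plain-integer model of the float32 case only. */
static uint32_t f32_abs_bits(uint32_t s)
{
    return s & ~(UINT32_C(1) << 31);   /* clear the sign bit */
}

/*
 * FPCR.AH == 1 flavour: a NaN (exponent all ones, fraction non-zero)
 * has a magnitude strictly above the infinity encoding 0x7f800000,
 * so NaNs are passed through with their sign bit untouched.
 */
static uint32_t f32_ah_abs_bits(uint32_t s)
{
    uint32_t abs_s = f32_abs_bits(s);

    return abs_s > UINT32_C(0x7f800000) ? s : abs_s;
}
```

For example, the negative quiet NaN 0xffc00000 is returned unchanged, while -1.0 (0xbf800000) becomes 0x3f800000.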

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 69 +++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_vfp_ah_negd(TCGv_i64 d, TCGv_i64 s)
                         s, chs_s);
 }
 
+/*
+ * These functions implement
+ *  d = floatN_is_any_nan(s) ? s : floatN_abs(s)
+ * which for float32 is
+ *  d = ((s & ~(1 << 31)) > 0x7f800000UL) ? s : (s & ~(1 << 31))
+ * and similarly for the other float sizes.
+ */
+static void gen_vfp_ah_absh(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32();
+
+    gen_vfp_absh(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7c00),
+                        s, abs_s);
+}
+
+static void gen_vfp_ah_abss(TCGv_i32 d, TCGv_i32 s)
+{
+    TCGv_i32 abs_s = tcg_temp_new_i32();
+
+    gen_vfp_abss(abs_s, s);
+    tcg_gen_movcond_i32(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i32(0x7f800000UL),
+                        s, abs_s);
+}
+
+static void gen_vfp_ah_absd(TCGv_i64 d, TCGv_i64 s)
+{
+    TCGv_i64 abs_s = tcg_temp_new_i64();
+
+    gen_vfp_absd(abs_s, s);
+    tcg_gen_movcond_i64(TCG_COND_GTU, d,
+                        abs_s, tcg_constant_i64(0x7ff0000000000000ULL),
+                        s, abs_s);
+}
+
 static void gen_vfp_maybe_ah_negh(DisasContext *dc, TCGv_i32 d, TCGv_i32 s)
 {
     if (dc->fpcr_ah) {
@@ -XXX,XX +XXX,XX @@ static void gen_fabd_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
     gen_vfp_absd(d, d);
 }
 
+static void gen_fabd_ah_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subh(d, n, m, s);
+    gen_vfp_ah_absh(d, d);
+}
+
+static void gen_fabd_ah_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subs(d, n, m, s);
+    gen_vfp_ah_abss(d, d);
+}
+
+static void gen_fabd_ah_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+    gen_helper_vfp_subd(d, n, m, s);
+    gen_vfp_ah_absd(d, d);
+}
+
 static const FPScalar f_scalar_fabd = {
     gen_fabd_h,
     gen_fabd_s,
     gen_fabd_d,
 };
-TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd, a->rn)
+static const FPScalar f_scalar_ah_fabd = {
+    gen_fabd_ah_h,
+    gen_fabd_ah_s,
+    gen_fabd_ah_d,
+};
+TRANS(FABD_s, do_fp3_scalar_2fn, a, &f_scalar_fabd, &f_scalar_ah_fabd, a->rn)
 
 static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fabs = {
     gen_vfp_abss,
     gen_vfp_absd,
 };
-TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs, true)
+static const FPScalar1Int f_scalar_ah_fabs = {
+    gen_vfp_ah_absh,
+    gen_vfp_ah_abss,
+    gen_vfp_ah_absd,
+};
+TRANS(FABS_s, do_fp1_scalar_int_2fn, a, &f_scalar_fabs, &f_scalar_ah_fabs)
 
 static const FPScalar1Int f_scalar_fneg = {
     gen_vfp_negh,
-- 
2.34.1

Split the handling of vector FABD so that it calls a different set
of helpers when FPCR.AH is 1, which implement the "no negation of
the sign of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    | 23 +++++++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

Make SVE FNEG honour the FPCR.AH "don't negate the sign of a NaN"
semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 4 ++++
 target/arm/tcg/sve_helper.c    | 8 ++++++++
 target/arm/tcg/translate-sve.c | 7 ++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

Make SVE FABS honour the FPCR.AH "don't negate the sign of a NaN"
semantics.

Make the SVE FABD insn honour the FPCR.AH "don't negate the sign
of a NaN" semantics.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    |  7 +++++++
 target/arm/tcg/sve_helper.c    | 22 ++++++++++++++++++++++
 target/arm/tcg/translate-sve.c |  2 +-
 3 files changed, 30 insertions(+), 1 deletion(-)

The negation steps in FCADD must honour FPCR.AH's "don't change the
sign of a NaN" semantics.  Implement this in the same way we did for
the base ASIMD FCADD, by encoding FPCR.AH into the SIMD data field
passed to the helper and using that to decide whether to negate the
values.

The construction of neg_imag and neg_real was done to make it easy
to apply both in parallel with two simple logical operations.  This
changed with FPCR.AH, which is more complex than that. Switch to
an approach that follows the pseudocode more closely, by extracting
the 'rot=1' parameter from the SIMD data field and changing the
sign of the appropriate input value.

Note that there was a naming issue with neg_imag and neg_real.
They were named backward, with neg_imag being non-zero for rot=1,
and vice versa.  This was combined with reversed usage within the
loop, so that the negation in the end turned out correct.
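
As a rough model of the encoding described above, the sketch below packs the rotation bit and FPCR.AH into a small data word and unpacks them again. The function names are hypothetical, and the word here stands in only for the data field itself, not for the full descriptor layout that the helpers read through SIMD_DATA_SHIFT:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the data field: bit 0 = rot, bit 1 = FPCR.AH. */
static uint32_t fcadd_pack_data(bool rot, bool fpcr_ah)
{
    return (uint32_t)rot | ((uint32_t)fpcr_ah << 1);
}

/* Helper side: recover the two flags from the data field. */
static bool fcadd_data_rot(uint32_t data)
{
    return data & 1;
}

static bool fcadd_data_ah(uint32_t data)
{
    return (data >> 1) & 1;
}
```

Keeping the two flags in separate bits lets the helper first pick which input to negate (from rot) and then decide how to negate it (from FPCR.AH).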

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/vec_internal.h  | 17 ++++++++++++++
 target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++----------
 target/arm/tcg/translate-sve.c |  2 +-
 3 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -XXX,XX +XXX,XX @@
 #ifndef TARGET_ARM_VEC_INTERNAL_H
 #define TARGET_ARM_VEC_INTERNAL_H
 
+#include "fpu/softfloat.h"
+
 /*
  * Note that vector data is stored in host-endian 64-bit chunks,
  * so addressing units smaller than that needs a host-endian fixup.
@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
  */
 bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
 
+static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
+{
+    return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
+}
+
+static inline float32 float32_maybe_ah_chs(float32 a, bool fpcr_ah)
+{
+    return fpcr_ah && float32_is_any_nan(a) ? a : float32_chs(a);
+}
+
+static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah)
+{
+    return fpcr_ah && float64_is_any_nan(a) ? a : float64_chs(a);
+}
+
 #endif /* TARGET_ARM_VEC_INTERNAL_H */
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float16 neg_imag = float16_set_sign(0, simd_data(desc));
-    float16 neg_real = float16_chs(neg_imag);
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float16);
 
             e0 = *(float16 *)(vn + H1_2(i));
-            e1 = *(float16 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float16 *)(vm + H1_2(j));
             e2 = *(float16 *)(vn + H1_2(j));
-            e3 = *(float16 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float16 *)(vm + H1_2(i));
+
+            if (rot) {
+                e3 = float16_maybe_ah_chs(e3, fpcr_ah);
+            } else {
+                e1 = float16_maybe_ah_chs(e1, fpcr_ah);
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float16 *)(vd + H1_2(i)) = float16_add(e0, e1, s);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float32 neg_imag = float32_set_sign(0, simd_data(desc));
-    float32 neg_real = float32_chs(neg_imag);
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float32);
 
             e0 = *(float32 *)(vn + H1_2(i));
-            e1 = *(float32 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float32 *)(vm + H1_2(j));
             e2 = *(float32 *)(vn + H1_2(j));
-            e3 = *(float32 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float32 *)(vm + H1_2(i));
+
+            if (rot) {
+                e3 = float32_maybe_ah_chs(e3, fpcr_ah);
+            } else {
+                e1 = float32_maybe_ah_chs(e1, fpcr_ah);
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float32 *)(vd + H1_2(i)) = float32_add(e0, e1, s);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
 {
     intptr_t j, i = simd_oprsz(desc);
     uint64_t *g = vg;
-    float64 neg_imag = float64_set_sign(0, simd_data(desc));
-    float64 neg_real = float64_chs(neg_imag);
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg,
             i -= 2 * sizeof(float64);
 
             e0 = *(float64 *)(vn + H1_2(i));
-            e1 = *(float64 *)(vm + H1_2(j)) ^ neg_real;
+            e1 = *(float64 *)(vm + H1_2(j));
             e2 = *(float64 *)(vn + H1_2(j));
-            e3 = *(float64 *)(vm + H1_2(i)) ^ neg_imag;
+            e3 = *(float64 *)(vm + H1_2(i));
+
+            if (rot) {
+                e3 = float64_maybe_ah_chs(e3, fpcr_ah);
+            } else {
+                e1 = float64_maybe_ah_chs(e1, fpcr_ah);
+            }
 
             if (likely((pg >> (i & 63)) & 1)) {
                 *(float64 *)(vd + H1_2(i)) = float64_add(e0, e1, s);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4_ptr * const fcadd_fns[] = {
     gen_helper_sve_fcadd_s, gen_helper_sve_fcadd_d,
 };
 TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
-           a->rd, a->rn, a->rm, a->pg, a->rot,
+           a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 #define DO_FMLA(NAME, name) \
-- 
2.34.1

The negation steps in FCADD must honour FPCR.AH's "don't change the
sign of a NaN" semantics.  Implement this by encoding FPCR.AH into
the SIMD data field passed to the helper and using that to decide
whether to negate the values.

The construction of neg_imag and neg_real was done to make it easy
to apply both in parallel with two simple logical operations.  This
changed with FPCR.AH, which is more complex than that. Switch to
an approach closer to the pseudocode, where we extract the rot
parameter from the SIMD data word and negate the appropriate
input value.
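
To illustrate the "negate the appropriate input value" step, here is a sketch of FCADD on a single complex element. It is a simplification under stated assumptions: host float arithmetic stands in for the softfloat calls, the names maybe_ah_neg and fcadd_pair are hypothetical, and each operand is a {real, imag} pair:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: negate, unless AH is set and the value is a NaN. */
static float maybe_ah_neg(float x, bool fpcr_ah)
{
    return (fpcr_ah && isnan(x)) ? x : -x;
}

/* One complex element; n, m and d are {real, imag} pairs. */
static void fcadd_pair(float d[2], const float n[2], const float m[2],
                       bool rot, bool fpcr_ah)
{
    float e1 = m[1];    /* added to the real part */
    float e3 = m[0];    /* added to the imaginary part */
    float re, im;

    if (rot) {
        e3 = maybe_ah_neg(e3, fpcr_ah);
    } else {
        e1 = maybe_ah_neg(e1, fpcr_ah);
    }
    re = n[0] + e1;
    im = n[1] + e3;
    d[0] = re;
    d[1] = im;
}
```

With n = {1, 2} and m = {3, 4}, rot = 0 gives {-3, 5} and rot = 1 gives {5, -1}. Because the decision depends on whether the element is a NaN, it cannot be folded into a fixed XOR mask computed once outside the loop.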

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.c | 10 +++++--
 target/arm/tcg/vec_helper.c    | 54 +++++++++++++++++++---------------
 2 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
     gen_helper_gvec_fcadds,
     gen_helper_gvec_fcaddd,
 };
-TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
-TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
+/*
+ * Encode FPCR.AH into the data so the helper knows whether the
+ * negations it does should avoid flipping the sign bit on a NaN
+ */
+TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0 | (s->fpcr_ah << 1),
+           f_vector_fcadd)
+TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1 | (s->fpcr_ah << 1),
+           f_vector_fcadd)
 
 static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
 {
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
     float16 *d = vd;
     float16 *n = vn;
     float16 *m = vm;
-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = neg_real ^ 1;
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 15;
-    neg_imag <<= 15;
-
     for (i = 0; i < opr_sz / 2; i += 2) {
         float16 e0 = n[H2(i)];
-        float16 e1 = m[H2(i + 1)] ^ neg_imag;
+        float16 e1 = m[H2(i + 1)];
         float16 e2 = n[H2(i + 1)];
-        float16 e3 = m[H2(i)] ^ neg_real;
+        float16 e3 = m[H2(i)];
+
+        if (rot) {
+            e3 = float16_maybe_ah_chs(e3, fpcr_ah);
+        } else {
+            e1 = float16_maybe_ah_chs(e1, fpcr_ah);
+        }
 
         d[H2(i)] = float16_add(e0, e1, fpst);
         d[H2(i + 1)] = float16_add(e2, e3, fpst);
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
     float32 *d = vd;
     float32 *n = vn;
     float32 *m = vm;
-    uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = neg_real ^ 1;
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 31;
-    neg_imag <<= 31;
-
     for (i = 0; i < opr_sz / 4; i += 2) {
         float32 e0 = n[H4(i)];
-        float32 e1 = m[H4(i + 1)] ^ neg_imag;
+        float32 e1 = m[H4(i + 1)];
         float32 e2 = n[H4(i + 1)];
-        float32 e3 = m[H4(i)] ^ neg_real;
+        float32 e3 = m[H4(i)];
+
+        if (rot) {
+            e3 = float32_maybe_ah_chs(e3, fpcr_ah);
+        } else {
+            e1 = float32_maybe_ah_chs(e1, fpcr_ah);
+        }
 
         d[H4(i)] = float32_add(e0, e1, fpst);
         d[H4(i + 1)] = float32_add(e2, e3, fpst);
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
     float64 *d = vd;
     float64 *n = vn;
     float64 *m = vm;
-    uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
-    uint64_t neg_imag = neg_real ^ 1;
+    bool rot = extract32(desc, SIMD_DATA_SHIFT, 1);
+    bool fpcr_ah = extract64(desc, SIMD_DATA_SHIFT + 1, 1);
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 63;
-    neg_imag <<= 63;
-
     for (i = 0; i < opr_sz / 8; i += 2) {
         float64 e0 = n[i];
-        float64 e1 = m[i + 1] ^ neg_imag;
+        float64 e1 = m[i + 1];
         float64 e2 = n[i + 1];
-        float64 e3 = m[i] ^ neg_real;
+        float64 e3 = m[i];
+
+        if (rot) {
+            e3 = float64_maybe_ah_chs(e3, fpcr_ah);
+        } else {
+            e1 = float64_maybe_ah_chs(e1, fpcr_ah);
+        }
 
         d[i] = float64_add(e0, e1, fpst);
         d[i + 1] = float64_add(e2, e3, fpst);
-- 
2.34.1

Handle the FPCR.AH semantics that we do not change the sign of an
input NaN in the FRECPS and FRSQRTS scalar insns, by providing
new helper functions that do the CHS part of the operation
differently.

Since the extra helper functions would be very repetitive if written
out longhand, we generate both them and the existing non-AH helpers
from macros.
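
The macro technique can be sketched in isolation as follows. This is a simplified illustration, not the helper code itself: host floats replace softfloat, the DEMO_RECPS, f32_chs and f32_ah_chs names are hypothetical, the multiply-add is not fused, and the denormal squashing and infinity-times-zero special case are left out:

```c
#include <math.h>

/* Two sign-change flavours; the macro picks one by token pasting. */
static float f32_chs(float a)
{
    return -a;
}

static float f32_ah_chs(float a)
{
    return isnan(a) ? a : -a;   /* FPCR.AH == 1: leave NaNs alone */
}

/*
 * Generate one reciprocal-step function per flavour: 2 + (-a * b).
 * Unlike the real helpers, this uses an unfused multiply and add.
 */
#define DEMO_RECPS(NAME, CHSFN)                 \
    static float NAME(float a, float b)         \
    {                                           \
        a = f32_ ## CHSFN(a);                   \
        return a * b + 2.0f;                    \
    }

DEMO_RECPS(demo_recps_f32, chs)
DEMO_RECPS(demo_recps_ah_f32, ah_chs)
```

Both generated functions agree on non-NaN inputs; they differ only in which sign-change function is applied to the first operand.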

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-a64.h    |   6 ++
 target/arm/tcg/vec_internal.h  |  18 ++++++
 target/arm/tcg/helper-a64.c    | 115 ++++++++++++---------------------
 target/arm/tcg/translate-a64.c |  25 +++++--
 4 files changed, 83 insertions(+), 81 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(neon_cgt_f64, TCG_CALL_NO_RWG, i64, i64, i64, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
 DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+DEF_HELPER_FLAGS_3(recpsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
 DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f16, TCG_CALL_NO_RWG, f16, f16, f16, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f32, TCG_CALL_NO_RWG, f32, f32, f32, fpst)
+DEF_HELPER_FLAGS_3(rsqrtsf_ah_f64, TCG_CALL_NO_RWG, f64, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -XXX,XX +XXX,XX @@ float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
  */
 bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp);
 
+/*
+ * Negate as for FPCR.AH=1 -- do not negate NaNs.
+ */
+static inline float16 float16_ah_chs(float16 a)
+{
+    return float16_is_any_nan(a) ? a : float16_chs(a);
+}
+
+static inline float32 float32_ah_chs(float32 a)
+{
+    return float32_is_any_nan(a) ? a : float32_chs(a);
+}
+
+static inline float64 float64_ah_chs(float64 a)
+{
+    return float64_is_any_nan(a) ? a : float64_chs(a);
+}
+
 static inline float16 float16_maybe_ah_chs(float16 a, bool fpcr_ah)
 {
     return fpcr_ah && float16_is_any_nan(a) ? a : float16_chs(a);
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -XXX,XX +XXX,XX @@
 #ifdef CONFIG_USER_ONLY
 #include "user/page-protection.h"
 #endif
+#include "vec_internal.h"
 
 /* C2.4.7 Multiply and divide */
 /* special cases for 0 and LLONG_MIN are mandated by the standard */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, float_status *fpst)
     return -float64_lt(b, a, fpst);
 }
 
-/* Reciprocal step and sqrt step. Note that unlike the A32/T32
+/*
+ * Reciprocal step and sqrt step. Note that unlike the A32/T32
  * versions, these do a fully fused multiply-add or
  * multiply-add-and-halve.
+ * The FPCR.AH == 1 versions need to avoid flipping the sign of NaN.
  */
-
-uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
-{
-    a = float16_squash_input_denormal(a, fpst);
-    b = float16_squash_input_denormal(b, fpst);
-
-    a = float16_chs(a);
-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
-        (float16_is_infinity(b) && float16_is_zero(a))) {
-        return float16_two;
+#define DO_RECPS(NAME, CTYPE, FLOATTYPE, CHSFN)                         \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
+        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
+            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
+            return FLOATTYPE ## _two;                                   \
+        }                                                               \
+        return FLOATTYPE ## _muladd(a, b, FLOATTYPE ## _two, 0, fpst);  \
     }
-    return float16_muladd(a, b, float16_two, 0, fpst);
-}
 
-float32 HELPER(recpsf_f32)(float32 a, float32 b, float_status *fpst)
-{
-    a = float32_squash_input_denormal(a, fpst);
-    b = float32_squash_input_denormal(b, fpst);
+DO_RECPS(recpsf_f16, uint32_t, float16, chs)
+DO_RECPS(recpsf_f32, float32, float32, chs)
+DO_RECPS(recpsf_f64, float64, float64, chs)
+DO_RECPS(recpsf_ah_f16, uint32_t, float16, ah_chs)
+DO_RECPS(recpsf_ah_f32, float32, float32, ah_chs)
+DO_RECPS(recpsf_ah_f64, float64, float64, ah_chs)
 
-    a = float32_chs(a);
-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
-        (float32_is_infinity(b) && float32_is_zero(a))) {
-        return float32_two;
-    }
-    return float32_muladd(a, b, float32_two, 0, fpst);
-}
+#define DO_RSQRTSF(NAME, CTYPE, FLOATTYPE, CHSFN)                       \
+    CTYPE HELPER(NAME)(CTYPE a, CTYPE b, float_status *fpst)            \
+    {                                                                   \
+        a = FLOATTYPE ## _squash_input_denormal(a, fpst);               \
+        b = FLOATTYPE ## _squash_input_denormal(b, fpst);               \
+        a = FLOATTYPE ## _ ## CHSFN(a);                                 \
+        if ((FLOATTYPE ## _is_infinity(a) && FLOATTYPE ## _is_zero(b)) || \
+            (FLOATTYPE ## _is_infinity(b) && FLOATTYPE ## _is_zero(a))) { \
+            return FLOATTYPE ## _one_point_five;                        \
+        }                                                               \
+        return FLOATTYPE ## _muladd_scalbn(a, b, FLOATTYPE ## _three,   \
+                                           -1, 0, fpst);                \
+    }                                                                   \
 
-float64 HELPER(recpsf_f64)(float64 a, float64 b, float_status *fpst)
-{
-    a = float64_squash_input_denormal(a, fpst);
-    b = float64_squash_input_denormal(b, fpst);
-
-    a = float64_chs(a);
-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
-        (float64_is_infinity(b) && float64_is_zero(a))) {
-        return float64_two;
-    }
-    return float64_muladd(a, b, float64_two, 0, fpst);
-}
-
-uint32_t HELPER(rsqrtsf_f16)(uint32_t a, uint32_t b, float_status *fpst)
-{
-    a = float16_squash_input_denormal(a, fpst);
-    b = float16_squash_input_denormal(b, fpst);
-
-    a = float16_chs(a);
-    if ((float16_is_infinity(a) && float16_is_zero(b)) ||
-        (float16_is_infinity(b) && float16_is_zero(a))) {
-        return float16_one_point_five;
-    }
-    return float16_muladd_scalbn(a, b, float16_three, -1, 0, fpst);
-}
-
-float32 HELPER(rsqrtsf_f32)(float32 a, float32 b, float_status *fpst)
-{
-    a = float32_squash_input_denormal(a, fpst);
-    b = float32_squash_input_denormal(b, fpst);
-
-    a = float32_chs(a);
-    if ((float32_is_infinity(a) && float32_is_zero(b)) ||
-        (float32_is_infinity(b) && float32_is_zero(a))) {
-        return float32_one_point_five;
-    }
-    return float32_muladd_scalbn(a, b, float32_three, -1, 0, fpst);
-}
-
-float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, float_status *fpst)
-{
-    a = float64_squash_input_denormal(a, fpst);
-    b = float64_squash_input_denormal(b, fpst);
-
-    a = float64_chs(a);
-    if ((float64_is_infinity(a) && float64_is_zero(b)) ||
-        (float64_is_infinity(b) && float64_is_zero(a))) {
-        return float64_one_point_five;
-    }
-    return float64_muladd_scalbn(a, b, float64_three, -1, 0, fpst);
-}
+DO_RSQRTSF(rsqrtsf_f16, uint32_t, float16, chs)
+DO_RSQRTSF(rsqrtsf_f32, float32, float32, chs)
+DO_RSQRTSF(rsqrtsf_f64, float64, float64, chs)
+DO_RSQRTSF(rsqrtsf_ah_f16, uint32_t, float16, ah_chs)
+DO_RSQRTSF(rsqrtsf_ah_f32, float32, float32, ah_chs)
+DO_RSQRTSF(rsqrtsf_ah_f64, float64, float64, ah_chs)
 
 /* Floating-point reciprocal exponent - see FPRecpX in ARM ARM */
 uint32_t HELPER(frecpx_f16)(uint32_t a, float_status *fpst)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
                                        FPST_A64_F16 : FPST_A64);
 }
 
-static bool do_fp3_scalar_ah(DisasContext *s, arg_rrr_e *a, const FPScalar *f,
-                             int mergereg)
+static bool do_fp3_scalar_ah_2fn(DisasContext *s, arg_rrr_e *a,
+                                 const FPScalar *fnormal, const FPScalar *fah,
+                                 int mergereg)
 {
-    return do_fp3_scalar_with_fpsttype(s, a, f, mergereg,
-                                       select_ah_fpst(s, a->esz));
+    return do_fp3_scalar_with_fpsttype(s, a, s->fpcr_ah ? fah : fnormal,
+                                       mergereg, select_ah_fpst(s, a->esz));
 }
 
 /* Some insns need to call different helpers when FPCR.AH == 1 */
@@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frecps = {
     gen_helper_recpsf_f32,
     gen_helper_recpsf_f64,
 };
-TRANS(FRECPS_s, do_fp3_scalar_ah, a, &f_scalar_frecps, a->rn)
+static const FPScalar f_scalar_ah_frecps = {
+    gen_helper_recpsf_ah_f16,
+    gen_helper_recpsf_ah_f32,
+    gen_helper_recpsf_ah_f64,
+};
+TRANS(FRECPS_s, do_fp3_scalar_ah_2fn, a,
+      &f_scalar_frecps, &f_scalar_ah_frecps, a->rn)
 
 static const FPScalar f_scalar_frsqrts = {
     gen_helper_rsqrtsf_f16,
     gen_helper_rsqrtsf_f32,
     gen_helper_rsqrtsf_f64,
 };
-TRANS(FRSQRTS_s, do_fp3_scalar_ah, a, &f_scalar_frsqrts, a->rn)
+static const FPScalar f_scalar_ah_frsqrts = {
+    gen_helper_rsqrtsf_ah_f16,
+    gen_helper_rsqrtsf_ah_f32,
+    gen_helper_rsqrtsf_ah_f64,
+};
+TRANS(FRSQRTS_s, do_fp3_scalar_ah_2fn, a,
+      &f_scalar_frsqrts, &f_scalar_ah_frsqrts, a->rn)
 
 static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a,
                        const FPScalar *f, bool swap)
-- 
2.34.1

Handle the FPCR.AH "don't negate the sign of a NaN" semantics
in the vector versions of FRECPS and FRSQRTS, by implementing
new vector wrappers that call the _ah_ scalar helpers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 21 ++++++++++++++++-----
 target/arm/tcg/translate-sve.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    |  8 ++++++++
 4 files changed, 44 insertions(+), 6 deletions(-)

Handle the FPCR.AH "don't negate the sign of a NaN" semantics in FMLS
(indexed). We do this by creating 6 new helpers, which allow us to
do the negation either by XOR (for AH=0) or by muladd flags
(for AH=1).
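
The helper selection can be pictured with a small table lookup. In this sketch the strings merely stand in for the helper function pointers, demo_pick_helper is a hypothetical name, and esz is assumed to take the values 1, 2 and 3 for half, single and double precision:

```c
#include <stdbool.h>

/*
 * Row 0: no negation (FMLA).
 * Row 1: negation with FPCR.AH == 0.
 * Row 2: negation with FPCR.AH == 1.
 * Columns are indexed by esz - 1 (assumed 1..3 for h/s/d).
 */
static const char *demo_pick_helper(bool neg, bool fpcr_ah, int esz)
{
    static const char *const names[3][3] = {
        { "fmla_idx_h",    "fmla_idx_s",    "fmla_idx_d" },
        { "fmls_idx_h",    "fmls_idx_s",    "fmls_idx_d" },
        { "ah_fmls_idx_h", "ah_fmls_idx_s", "ah_fmls_idx_d" },
    };

    return names[neg ? 1 + fpcr_ah : 0][esz - 1];
}
```

Note that FPCR.AH only affects the row when a negation is requested; without negation the first row is used regardless of AH.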

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: Mostly from RTH's patch; error in index order into fns[][]
 fixed]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            | 14 ++++++++++++++
 target/arm/tcg/translate-a64.c | 17 +++++++++++------
 target/arm/tcg/translate-sve.c | 31 +++++++++++++++++--------------
 target/arm/tcg/vec_helper.c    | 24 +++++++++++++++---------
 4 files changed, 57 insertions(+), 29 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(gvec_fmls_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_fmls_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_fmls_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_uqadd_h, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ TRANS(FMULX_vi, do_fp3_vector_idx, a, f_vector_idx_fmulx)
 
 static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
 {
-    static gen_helper_gvec_4_ptr * const fns[3] = {
-        gen_helper_gvec_fmla_idx_h,
-        gen_helper_gvec_fmla_idx_s,
-        gen_helper_gvec_fmla_idx_d,
+    static gen_helper_gvec_4_ptr * const fns[3][3] = {
+        { gen_helper_gvec_fmla_idx_h,
+          gen_helper_gvec_fmla_idx_s,
+          gen_helper_gvec_fmla_idx_d },
+        { gen_helper_gvec_fmls_idx_h,
+          gen_helper_gvec_fmls_idx_s,
+          gen_helper_gvec_fmls_idx_d },
+        { gen_helper_gvec_ah_fmls_idx_h,
+          gen_helper_gvec_ah_fmls_idx_s,
+          gen_helper_gvec_ah_fmls_idx_d },
     };
     MemOp esz = a->esz;
     int check = fp_access_check_vector_hsd(s, a->q, esz);
@@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg)
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                       esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-                      (a->idx << 1) | neg,
-                      fns[esz - 1]);
+                      a->idx, fns[neg ? 1 + s->fpcr_ah : 0][esz - 1]);
     return true;
 }
 
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ DO_SVE2_RRXR_ROT(CDOT_zzxw_d, gen_helper_sve2_cdot_idx_d)
  *** SVE Floating Point Multiply-Add Indexed Group
  */
 
-static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub)
-{
-    static gen_helper_gvec_4_ptr * const fns[4] = {
-        NULL,
-        gen_helper_gvec_fmla_idx_h,
-        gen_helper_gvec_fmla_idx_s,
-        gen_helper_gvec_fmla_idx_d,
-    };
-    return gen_gvec_fpst_zzzz(s, fns[a->esz], a->rd, a->rn, a->rm, a->ra,
-                              (a->index << 1) | sub,
-                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
-}
+static gen_helper_gvec_4_ptr * const fmla_idx_fns[4] = {
+    NULL,                       gen_helper_gvec_fmla_idx_h,
+    gen_helper_gvec_fmla_idx_s, gen_helper_gvec_fmla_idx_d
+};
+TRANS_FEAT(FMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
+           fmla_idx_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->index,
+           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
-TRANS_FEAT(FMLA_zzxz, aa64_sve, do_FMLA_zzxz, a, false)
-TRANS_FEAT(FMLS_zzxz, aa64_sve, do_FMLA_zzxz, a, true)
+static gen_helper_gvec_4_ptr * const fmls_idx_fns[4][2] = {
+    { NULL, NULL },
+    { gen_helper_gvec_fmls_idx_h, gen_helper_gvec_ah_fmls_idx_h },
+    { gen_helper_gvec_fmls_idx_s, gen_helper_gvec_ah_fmls_idx_s },
+    { gen_helper_gvec_fmls_idx_d, gen_helper_gvec_ah_fmls_idx_d },
+};
+TRANS_FEAT(FMLS_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
+           fmls_idx_fns[a->esz][s->fpcr_ah],
+           a->rd, a->rn, a->rm, a->ra, a->index,
+           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 /*
  *** SVE Floating Point Multiply Indexed Group
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_FMUL_IDX(gvec_fmls_nf_idx_s, float32_sub, float32_mul, float32, H4)
 
 #undef DO_FMUL_IDX
 
-#define DO_FMLA_IDX(NAME, TYPE, H)                                         \
+#define DO_FMLA_IDX(NAME, TYPE, H, NEGX, NEGF)                             \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
                   float_status *stat, uint32_t desc)                       \
 {                                                                          \
     intptr_t i, j, oprsz = simd_oprsz(desc);                               \
     intptr_t segment = MIN(16, oprsz) / sizeof(TYPE);                      \
-    TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
-    intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
+    intptr_t idx = simd_data(desc);                                        \
     TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
-    op1_neg <<= (8 * sizeof(TYPE) - 1);                                    \
     for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
         TYPE mm = m[H(i + idx)];                                           \
         for (j = 0; j < segment; j++) {                                    \
-            d[i + j] = TYPE##_muladd(n[i + j] ^ op1_neg,                   \
-                                     mm, a[i + j], 0, stat);               \
+            d[i + j] = TYPE##_muladd(n[i + j] ^ NEGX, mm,                  \
+                                     a[i + j], NEGF, stat);                \
         }                                                                  \
     }                                                                      \
     clear_tail(d, oprsz, simd_maxsz(desc));                                \
 }
 
-DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2)
-DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
-DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8)
+DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2, 0, 0)
+DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4, 0, 0)
+DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8, 0, 0)
+
+DO_FMLA_IDX(gvec_fmls_idx_h, float16, H2, INT16_MIN, 0)
+DO_FMLA_IDX(gvec_fmls_idx_s, float32, H4, INT32_MIN, 0)
+DO_FMLA_IDX(gvec_fmls_idx_d, float64, H8, INT64_MIN, 0)
+
+DO_FMLA_IDX(gvec_ah_fmls_idx_h, float16, H2, 0, float_muladd_negate_product)
+DO_FMLA_IDX(gvec_ah_fmls_idx_s, float32, H4, 0, float_muladd_negate_product)
+DO_FMLA_IDX(gvec_ah_fmls_idx_d, float64, H8, 0, float_muladd_negate_product)
 
 #undef DO_FMLA_IDX
 
-- 
2.34.1

Handle the FPCR.AH "don't negate the sign of a NaN" semantics
in FMLS (vector), by implementing a new set of helpers for
the AH=1 case.

The float_muladd_negate_product flag produces the same result
as negating either of the multiplication operands, assuming
neither of the operands is a NaN.  But since FEAT_AFP does not
negate NaNs, this behaviour is exactly what we need.
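
To illustrate the difference between the two negation styles, here is a
minimal standalone sketch, not QEMU code.  It assumes a toy float32
multiply-add that returns the first NaN input unchanged (real softfloat
NaN propagation is more involved); every toy_* name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_F32_SIGN 0x80000000u

/* NaN: exponent all ones and a nonzero fraction. */
static bool toy_f32_is_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

static float toy_f32_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

static uint32_t toy_f32_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/*
 * Toy muladd on raw bit patterns: (a * b) + c, or -(a * b) + c when
 * negate_product is set.  Assumption: a NaN input is returned as-is,
 * so the negate_product flag never touches a NaN.
 */
static uint32_t toy_muladd(uint32_t a, uint32_t b, uint32_t c,
                           bool negate_product)
{
    float prod;

    if (toy_f32_is_nan(a)) {
        return a;
    }
    if (toy_f32_is_nan(b)) {
        return b;
    }
    if (toy_f32_is_nan(c)) {
        return c;
    }
    prod = toy_f32_from_bits(a) * toy_f32_from_bits(b);
    if (negate_product) {
        prod = -prod;
    }
    return toy_f32_to_bits(prod + toy_f32_from_bits(c));
}

/* AH=0 style: flip the operand's sign bit before the muladd. */
static uint32_t toy_fmls_xor(uint32_t a, uint32_t b, uint32_t c)
{
    return toy_muladd(a ^ TOY_F32_SIGN, b, c, false);
}

/* AH=1 style: leave the operand alone and negate the product. */
static uint32_t toy_fmls_ah(uint32_t a, uint32_t b, uint32_t c)
{
    return toy_muladd(a, b, c, true);
}
```

For ordinary inputs both functions agree; for a NaN first operand the
XOR variant returns the NaN with its sign flipped while the flag variant
returns it untouched.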

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c |  7 ++++++-
 target/arm/tcg/vec_helper.c    | 22 ++++++++++++++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)

Handle the FPCR.AH "don't negate the sign of a NaN" semantics for the
SVE FMLS (vector) insns, by providing new helpers for the AH=1 case
which end up passing fpcr_ah = true to the do_fmla_zpzzz_* functions
that do the work.

The float*_muladd functions have a flags argument that can
perform optional negation of various operands.  We don't use
that for "normal" arm fmla, because the muladd flags are not
applied when an input is a NaN.  But since FEAT_AFP does not
negate NaNs, this behaviour is exactly what we need.

The non-AH helpers pass in a zero flags argument and control the
negation via the neg1 and neg3 arguments; the AH helpers always pass
in neg1 and neg3 as zero and control the negation via the flags
argument.  This allows us to avoid conditional branches within the
inner loop.
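
The split between XOR masks and muladd flags described above can be
summarised as a lookup table.  This is only a sketch for the 32-bit
case: the toy_* names and the TOY_NEGATE_* values are hypothetical
stand-ins, not the softfloat definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_F32_SIGN 0x80000000u

/* Hypothetical stand-ins for the muladd negation flags. */
enum {
    TOY_NEGATE_C       = 1 << 0,
    TOY_NEGATE_PRODUCT = 1 << 1,
};

enum toy_op { TOY_FMLA, TOY_FMLS, TOY_FNMLA, TOY_FNMLS };

struct toy_fmla_cfg {
    uint32_t neg1;  /* XORed into the first multiplicand */
    uint32_t neg3;  /* XORed into the addend */
    int flags;      /* handed to the muladd primitive */
};

/* Column 0 is the AH=0 helper, column 1 the AH=1 helper. */
static struct toy_fmla_cfg toy_fmla_lookup(enum toy_op op, bool fpcr_ah)
{
    static const struct toy_fmla_cfg table[4][2] = {
        [TOY_FMLA]  = { { 0, 0, 0 },
                        { 0, 0, 0 } },
        [TOY_FMLS]  = { { TOY_F32_SIGN, 0, 0 },
                        { 0, 0, TOY_NEGATE_PRODUCT } },
        [TOY_FNMLA] = { { TOY_F32_SIGN, TOY_F32_SIGN, 0 },
                        { 0, 0, TOY_NEGATE_PRODUCT | TOY_NEGATE_C } },
        [TOY_FNMLS] = { { 0, TOY_F32_SIGN, 0 },
                        { 0, 0, TOY_NEGATE_C } },
    };

    return table[op][fpcr_ah];
}
```

Because each helper fixes its row and column at compile time, the inner
loop only ever sees constants and needs no per-element test of FPCR.AH.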

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 21 ++++++++
 target/arm/tcg/sve_helper.c    | 99 +++++++++++++++++++++++++++-------
 target/arm/tcg/translate-sve.c | 18 ++++---
 3 files changed, 114 insertions(+), 24 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
+
 DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
 
 static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint16_t neg1, uint16_t neg3)
+                            uint16_t neg1, uint16_t neg3, int flags)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                 e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
                 e2 = *(uint16_t *)(vm + H1_2(i));
                 e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
-                r = float16_muladd(e1, e2, e3, 0, status);
+                r = float16_muladd(e1, e2, e3, flags, status);
                 *(uint16_t *)(vd + H1_2(i)) = r;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
 }
 
 void HELPER(sve_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0, 0);
 }
 
 void HELPER(sve_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000, 0);
 }
 
 void HELPER(sve_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000);
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0x8000, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_h(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_c);
 }
 
 static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint32_t neg1, uint32_t neg3)
+                            uint32_t neg1, uint32_t neg3, int flags)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
                 e1 = *(uint32_t *)(vn + H1_4(i)) ^ neg1;
                 e2 = *(uint32_t *)(vm + H1_4(i));
                 e3 = *(uint32_t *)(va + H1_4(i)) ^ neg3;
-                r = float32_muladd(e1, e2, e3, 0, status);
+                r = float32_muladd(e1, e2, e3, flags, status);
                 *(uint32_t *)(vd + H1_4(i)) = r;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_s(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
 }
 
 void HELPER(sve_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0, 0);
 }
 
 void HELPER(sve_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0x80000000, 0x80000000, 0);
 }
 
 void HELPER(sve_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000);
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0x80000000, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_s(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_c);
 }
 
 static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
-                            uint64_t neg1, uint64_t neg3)
+                            uint64_t neg1, uint64_t neg3, int flags)
 {
     intptr_t i = simd_oprsz(desc);
     uint64_t *g = vg;
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
                 e1 = *(uint64_t *)(vn + i) ^ neg1;
                 e2 = *(uint64_t *)(vm + i);
                 e3 = *(uint64_t *)(va + i) ^ neg3;
-                r = float64_muladd(e1, e2, e3, 0, status);
+                r = float64_muladd(e1, e2, e3, flags, status);
                 *(uint64_t *)(vd + i) = r;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ static void do_fmla_zpzzz_d(void *vd, void *vn, void *vm, void *va, void *vg,
 void HELPER(sve_fmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
 }
 
 void HELPER(sve_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                               void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, 0, 0);
 }
 
 void HELPER(sve_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, INT64_MIN, INT64_MIN, 0);
 }
 
 void HELPER(sve_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
-    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN);
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, INT64_MIN, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_d(vd, vn, vm, va, vg, status, desc, 0, 0,
+                    float_muladd_negate_c);
 }
 
 /* Two operand floating-point comparison controlled by a predicate.
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
            a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
-#define DO_FMLA(NAME, name) \
+#define DO_FMLA(NAME, name, ah_name)                                    \
     static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
         NULL, gen_helper_sve_##name##_h,                                \
         gen_helper_sve_##name##_s, gen_helper_sve_##name##_d            \
     };                                                                  \
-    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp, name##_fns[a->esz], \
+    static gen_helper_gvec_5_ptr * const name##_ah_fns[4] = {           \
+        NULL, gen_helper_sve_##ah_name##_h,                             \
+        gen_helper_sve_##ah_name##_s, gen_helper_sve_##ah_name##_d      \
+    };                                                                  \
+    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp,                     \
+               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], \
                a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
                a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
-DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
-DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
-DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz)
-DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz)
+/* We don't need an ah_fmla_zpzzz because fmla doesn't negate anything */
+DO_FMLA(FMLA_zpzzz, fmla_zpzzz, fmla_zpzzz)
+DO_FMLA(FMLS_zpzzz, fmls_zpzzz, ah_fmls_zpzzz)
+DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz, ah_fnmla_zpzzz)
+DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz, ah_fnmls_zpzzz)
 
 #undef DO_FMLA
 
-- 
2.34.1

The negation step in the SVE FTSSEL insn mustn't negate a NaN when
FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field
and use that to determine whether to do the negation.
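
As a rough illustration of the idea, not the actual helper, the sketch
below models a single-precision FTSSEL element.  It assumes the usual
behaviour that bit 0 of the selector substitutes 1.0 and bit 1 requests
a negation; toy_ftssel_s and toy_f32_is_nan are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_F32_SIGN 0x80000000u
#define TOY_F32_ONE  0x3f800000u

/* NaN: exponent all ones and a nonzero fraction. */
static bool toy_f32_is_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/*
 * nn is the data element, mm the selector element.  With fpcr_ah set,
 * the requested negation is skipped when the value is a NaN.
 */
static uint32_t toy_ftssel_s(uint32_t nn, uint32_t mm, bool fpcr_ah)
{
    if (mm & 1) {
        nn = TOY_F32_ONE;
    }
    if ((mm & 2) && !(fpcr_ah && toy_f32_is_nan(nn))) {
        nn ^= TOY_F32_SIGN;
    }
    return nn;
}
```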

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 18 +++++++++++++++---
 target/arm/tcg/translate-sve.c |  4 ++--
 2 files changed, 17 insertions(+), 5 deletions(-)

The negation step in the SVE FTMAD insn mustn't negate a NaN when
FPCR.AH is set.  Pass FPCR.AH to the helper via the SIMD data field,
so we can select the correct behaviour.

Because the operand is known to be negative, negating the operand
is the same as taking the absolute value.  Defer this to the muladd
operation via flags, so that it happens after NaN detection, which
is correct for FPCR.AH.
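
One way to picture the deferral is the following sketch of the
per-element preparation step.  It is an assumption-laden model, not the
helper itself: toy_ftmad_prepare, the struct, and TOY_NEGATE_PRODUCT are
hypothetical, and the "second half of the coefficient table" detail
reflects my understanding of FTMAD rather than anything in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_F32_SIGN 0x80000000u

/* Hypothetical stand-in for the muladd negate-product flag. */
enum { TOY_NEGATE_PRODUCT = 1 << 1 };

struct toy_ftmad_in {
    uint32_t mm;       /* multiplicand to hand to the muladd */
    int flags;         /* muladd flags */
    bool second_half;  /* assumed: negative input selects other coefficients */
};

static struct toy_ftmad_in toy_ftmad_prepare(uint32_t mm, bool fpcr_ah)
{
    struct toy_ftmad_in r = { mm, 0, false };

    if (mm & TOY_F32_SIGN) {
        r.second_half = true;
        if (fpcr_ah) {
            /* Defer: the muladd negates the product unless a NaN is involved. */
            r.flags = TOY_NEGATE_PRODUCT;
        } else {
            /* Immediate: clear the sign bit, NaN or not. */
            r.mm = mm & ~TOY_F32_SIGN;
        }
    }
    return r;
}
```

For a negative non-NaN input the two branches give the same arithmetic
result, since n * |m| equals -(n * m) when m is negative; they differ
only in whether a negative NaN has its sign cleared.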

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 42 ++++++++++++++++++++++++++--------
 target/arm/tcg/translate-sve.c |  3 ++-
 2 files changed, 35 insertions(+), 10 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

The negation step in FCMLA mustn't negate a NaN when FPCR.AH
is set. Handle this by passing FPCR.AH to the helper via the
SIMD data field, and use this to select whether to do the
negation via XOR or via the muladd negate_product flag.
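
The selection logic can be sketched in isolation for the 32-bit case.
The bit layout (rot in bits 0-1, FPCR.AH in bit 2) follows the diff
below; the toy_* names and the TOY_NEGATE_PRODUCT value are hypothetical
stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the muladd negate-product flag. */
enum { TOY_NEGATE_PRODUCT = 1 << 1 };

struct toy_fcmla_neg {
    uint32_t flip;
    uint32_t negx_real, negx_imag;  /* XOR masks, used when AH=0 */
    int negf_real, negf_imag;       /* muladd flags, used when AH=1 */
};

/* Pack the rotation and FPCR.AH into one data word. */
static uint32_t toy_fcmla_pack(unsigned rot, bool fpcr_ah)
{
    return (rot & 3) | ((uint32_t)fpcr_ah << 2);
}

static struct toy_fcmla_neg toy_fcmla_decode(uint32_t data)
{
    uint32_t flip = data & 1;
    uint32_t neg_imag = (data >> 1) & 1;
    uint32_t fpcr_ah = (data >> 2) & 1;
    uint32_t neg_real = flip ^ neg_imag;
    struct toy_fcmla_neg r;

    r.flip = flip;
    /* Exactly one of negx/negf is non-zero for each requested negation. */
    r.negx_real = (neg_real & ~fpcr_ah) << 31;
    r.negx_imag = (neg_imag & ~fpcr_ah) << 31;
    r.negf_real = (neg_real & fpcr_ah) ? TOY_NEGATE_PRODUCT : 0;
    r.negf_imag = (neg_imag & fpcr_ah) ? TOY_NEGATE_PRODUCT : 0;
    return r;
}
```

Computing both forms up front keeps the per-element loop branch-free:
it always XORs with negx and always passes negf, and one of the two is
simply zero.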

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-26-richard.henderson@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c |  2 +-
 target/arm/tcg/vec_helper.c    | 66 ++++++++++++++++++++--------------
 2 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
 
     gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                       a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-                      a->rot, fn[a->esz]);
+                      a->rot | (s->fpcr_ah << 2), fn[a->esz]);
     return true;
 }
 
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float16 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float16 negx_imag, negx_real;
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 15;
-    neg_imag <<= 15;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 15;
+    negx_imag = (negf_imag & ~fpcr_ah) << 15;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < opr_sz / 2; i += 2) {
         float16 e2 = n[H2(i + flip)];
-        float16 e1 = m[H2(i + flip)] ^ neg_real;
+        float16 e1 = m[H2(i + flip)] ^ negx_real;
         float16 e4 = e2;
-        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
+        float16 e3 = m[H2(i + 1 - flip)] ^ negx_imag;
 
-        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], 0, fpst);
-        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], 0, fpst);
+        d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], negf_real, fpst);
+        d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], negf_imag, fpst);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float32 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float32 negx_imag, negx_real;
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 31;
-    neg_imag <<= 31;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 31;
+    negx_imag = (negf_imag & ~fpcr_ah) << 31;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < opr_sz / 4; i += 2) {
         float32 e2 = n[H4(i + flip)];
-        float32 e1 = m[H4(i + flip)] ^ neg_real;
+        float32 e1 = m[H4(i + flip)] ^ negx_real;
         float32 e4 = e2;
-        float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
+        float32 e3 = m[H4(i + 1 - flip)] ^ negx_imag;
 
-        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], 0, fpst);
-        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], 0, fpst);
+        d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], negf_real, fpst);
+        d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], negf_imag, fpst);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float64 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
-    uint64_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float64 negx_real, negx_imag;
     uintptr_t i;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 63;
-    neg_imag <<= 63;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
+    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < opr_sz / 8; i += 2) {
         float64 e2 = n[i + flip];
-        float64 e1 = m[i + flip] ^ neg_real;
+        float64 e1 = m[i + flip] ^ negx_real;
         float64 e4 = e2;
-        float64 e3 = m[i + 1 - flip] ^ neg_imag;
+        float64 e3 = m[i + 1 - flip] ^ negx_imag;
 
-        d[i] = float64_muladd(e2, e1, a[i], 0, fpst);
-        d[i + 1] = float64_muladd(e4, e3, a[i + 1], 0, fpst);
+        d[i] = float64_muladd(e2, e1, a[i], negf_real, fpst);
+        d[i + 1] = float64_muladd(e4, e3, a[i + 1], negf_imag, fpst);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

The negation step in FCMLA by index mustn't negate a NaN when
FPCR.AH is set. Use the same approach as vector FCMLA of
passing in FPCR.AH and using it to select whether to negate
by XOR or by the muladd negate_product flag.
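
For the indexed form the data word carries one more field.  The sketch
below shows the packing used in the diff that follows (rot in bits 0-1,
index in bits 2-3, FPCR.AH in bit 4) together with a matching unpack;
the toy_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_fcmla_idx_data {
    unsigned rot;    /* bits 0-1: bit 0 is flip, bit 1 is neg_imag */
    unsigned index;  /* bits 2-3: element pair within the segment */
    bool fpcr_ah;    /* bit 4 */
};

static uint32_t toy_fcmla_idx_pack(struct toy_fcmla_idx_data d)
{
    return ((uint32_t)d.fpcr_ah << 4) | ((d.index & 3) << 2) | (d.rot & 3);
}

static struct toy_fcmla_idx_data toy_fcmla_idx_unpack(uint32_t data)
{
    struct toy_fcmla_idx_data d;

    d.rot = data & 3;
    d.index = (data >> 2) & 3;
    d.fpcr_ah = (data >> 4) & 1;
    return d;
}
```

Placing FPCR.AH above the existing fields leaves the rot and index
positions unchanged, so only the new bit has to be extracted in the
helper.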

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-27-richard.henderson@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c |  2 +-
 target/arm/tcg/vec_helper.c    | 44 ++++++++++++++++++++--------------
 2 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
     if (fp_access_check(s)) {
         gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
                           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
-                          (a->idx << 2) | a->rot, fn);
+                          (s->fpcr_ah << 4) | (a->idx << 2) | a->rot, fn);
     }
     return true;
 }
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float16 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
+    uint32_t negf_real = flip ^ negf_imag;
     intptr_t elements = opr_sz / sizeof(float16);
     intptr_t eltspersegment = MIN(16 / sizeof(float16), elements);
+    float16 negx_imag, negx_real;
     intptr_t i, j;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 15;
-    neg_imag <<= 15;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 15;
+    negx_imag = (negf_imag & ~fpcr_ah) << 15;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < elements; i += eltspersegment) {
         float16 mr = m[H2(i + 2 * index + 0)];
         float16 mi = m[H2(i + 2 * index + 1)];
-        float16 e1 = neg_real ^ (flip ? mi : mr);
-        float16 e3 = neg_imag ^ (flip ? mr : mi);
+        float16 e1 = negx_real ^ (flip ? mi : mr);
+        float16 e3 = negx_imag ^ (flip ? mr : mi);
 
         for (j = i; j < i + eltspersegment; j += 2) {
             float16 e2 = n[H2(j + flip)];
             float16 e4 = e2;
 
-            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], 0, fpst);
-            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], 0, fpst);
+            d[H2(j)] = float16_muladd(e2, e1, a[H2(j)], negf_real, fpst);
+            d[H2(j + 1)] = float16_muladd(e4, e3, a[H2(j + 1)], negf_imag, fpst);
         }
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, void *va,
     uintptr_t opr_sz = simd_oprsz(desc);
     float32 *d = vd, *n = vn, *m = vm, *a = va;
     intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
-    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
-    uint32_t neg_real = flip ^ neg_imag;
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 4, 1);
+    uint32_t negf_real = flip ^ negf_imag;
     intptr_t elements = opr_sz / sizeof(float32);
     intptr_t eltspersegment = MIN(16 / sizeof(float32), elements);
+    float32 negx_imag, negx_real;
     intptr_t i, j;
 
-    /* Shift boolean to the sign bit so we can xor to negate.  */
-    neg_real <<= 31;
-    neg_imag <<= 31;
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 31;
+    negx_imag = (negf_imag & ~fpcr_ah) << 31;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     for (i = 0; i < elements; i += eltspersegment) {
         float32 mr = m[H4(i + 2 * index + 0)];
         float32 mi = m[H4(i + 2 * index + 1)];
-        float32 e1 = neg_real ^ (flip ? mi : mr);
-        float32 e3 = neg_imag ^ (flip ? mr : mi);
+        float32 e1 = negx_real ^ (flip ? mi : mr);
+        float32 e3 = negx_imag ^ (flip ? mr : mi);
 
         for (j = i; j < i + eltspersegment; j += 2) {
             float32 e2 = n[H4(j + flip)];
             float32 e4 = e2;
 
-            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], 0, fpst);
-            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], 0, fpst);
+            d[H4(j)] = float32_muladd(e2, e1, a[H4(j)], negf_real, fpst);
+            d[H4(j + 1)] = float32_muladd(e4, e3, a[H4(j + 1)], negf_imag, fpst);
         }
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

The negation step in SVE FCMLA mustn't negate a NaN when FPCR.AH is
set.  Use the same approach as we did for A64 FCMLA of passing in
FPCR.AH and using it to select whether to negate by XOR or by the
muladd negate_product flag.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-28-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sve_helper.c    | 69 +++++++++++++++++++++-------------
 target/arm/tcg/translate-sve.c |  2 +-
 2 files changed, 43 insertions(+), 28 deletions(-)
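
Illustration only, not part of the patch: a minimal standalone sketch of
the two negation styles described above. It assumes a much simplified NaN
rule (the first NaN operand is returned unchanged) and an unfused
multiply-add. sketch_muladd(), sketch_fmls() and SKETCH_NEGATE_PRODUCT
are hypothetical stand-ins, not softfloat APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for softfloat's float_muladd_negate_product. */
#define SKETCH_NEGATE_PRODUCT 1

static uint32_t sketch_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static float sketch_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/*
 * Simplified model of muladd with a "negate product" flag: a NaN
 * operand is passed through with its sign untouched, otherwise the
 * product is negated arithmetically. Plain multiply then add, not fused.
 */
static float sketch_muladd(float a, float b, float c, int flags)
{
    assert((flags & ~SKETCH_NEGATE_PRODUCT) == 0);
    if (a != a) {
        return a;
    }
    if (b != b) {
        return b;
    }
    if (c != c) {
        return c;
    }
    return (flags & SKETCH_NEGATE_PRODUCT) ? -a * b + c : a * b + c;
}

/* c - a * b, negating by XOR when AH=0 and by muladd flag when AH=1. */
static float sketch_fmls(float a, float b, float c, bool ah)
{
    uint32_t negx = ah ? 0 : 0x80000000u;
    int negf = ah ? SKETCH_NEGATE_PRODUCT : 0;

    return sketch_muladd(sketch_from_bits(sketch_bits(a) ^ negx), b, c, negf);
}
```

Under these assumptions both paths agree on ordinary numbers, but for a
quiet NaN input the XOR path flips the sign bit while the flag path
leaves it alone, which is the behaviour FPCR.AH=1 asks for.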

diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
     intptr_t j, i = simd_oprsz(desc);
-    unsigned rot = simd_data(desc);
-    bool flip = rot & 1;
-    float16 neg_imag, neg_real;
+    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float16 negx_imag, negx_real;
     uint64_t *g = vg;
 
-    neg_imag = float16_set_sign(0, (rot & 2) != 0);
-    neg_real = float16_set_sign(0, rot == 1 || rot == 2);
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 15;
+    negx_imag = (negf_imag & ~fpcr_ah) << 15;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_h)(void *vd, void *vn, void *vm, void *va,
             mi = *(float16 *)(vm + H1_2(j));
 
             e2 = (flip ? ni : nr);
-            e1 = (flip ? mi : mr) ^ neg_real;
+            e1 = (flip ? mi : mr) ^ negx_real;
             e4 = e2;
-            e3 = (flip ? mr : mi) ^ neg_imag;
+            e3 = (flip ? mr : mi) ^ negx_imag;
 
             if (likely((pg >> (i & 63)) & 1)) {
                 d = *(float16 *)(va + H1_2(i));
-                d = float16_muladd(e2, e1, d, 0, status);
+                d = float16_muladd(e2, e1, d, negf_real, status);
                 *(float16 *)(vd + H1_2(i)) = d;
             }
             if (likely((pg >> (j & 63)) & 1)) {
                 d = *(float16 *)(va + H1_2(j));
-                d = float16_muladd(e4, e3, d, 0, status);
+                d = float16_muladd(e4, e3, d, negf_imag, status);
                 *(float16 *)(vd + H1_2(j)) = d;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
     intptr_t j, i = simd_oprsz(desc);
-    unsigned rot = simd_data(desc);
-    bool flip = rot & 1;
-    float32 neg_imag, neg_real;
+    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float32 negx_imag, negx_real;
     uint64_t *g = vg;
 
-    neg_imag = float32_set_sign(0, (rot & 2) != 0);
-    neg_real = float32_set_sign(0, rot == 1 || rot == 2);
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (negf_real & ~fpcr_ah) << 31;
+    negx_imag = (negf_imag & ~fpcr_ah) << 31;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_s)(void *vd, void *vn, void *vm, void *va,
             mi = *(float32 *)(vm + H1_2(j));
 
             e2 = (flip ? ni : nr);
-            e1 = (flip ? mi : mr) ^ neg_real;
+            e1 = (flip ? mi : mr) ^ negx_real;
             e4 = e2;
-            e3 = (flip ? mr : mi) ^ neg_imag;
+            e3 = (flip ? mr : mi) ^ negx_imag;
 
             if (likely((pg >> (i & 63)) & 1)) {
                 d = *(float32 *)(va + H1_2(i));
-                d = float32_muladd(e2, e1, d, 0, status);
+                d = float32_muladd(e2, e1, d, negf_real, status);
                 *(float32 *)(vd + H1_2(i)) = d;
             }
             if (likely((pg >> (j & 63)) & 1)) {
                 d = *(float32 *)(va + H1_2(j));
-                d = float32_muladd(e4, e3, d, 0, status);
+                d = float32_muladd(e4, e3, d, negf_imag, status);
                 *(float32 *)(vd + H1_2(j)) = d;
             }
         } while (i & 63);
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
                                void *vg, float_status *status, uint32_t desc)
 {
     intptr_t j, i = simd_oprsz(desc);
-    unsigned rot = simd_data(desc);
-    bool flip = rot & 1;
-    float64 neg_imag, neg_real;
+    bool flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+    uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    uint32_t negf_real = flip ^ negf_imag;
+    float64 negx_imag, negx_real;
     uint64_t *g = vg;
 
-    neg_imag = float64_set_sign(0, (rot & 2) != 0);
-    neg_real = float64_set_sign(0, rot == 1 || rot == 2);
+    /* With AH=0, use negx; with AH=1 use negf. */
+    negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
+    negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
+    negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+    negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
 
     do {
         uint64_t pg = g[(i - 1) >> 6];
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
             mi = *(float64 *)(vm + H1_2(j));
 
             e2 = (flip ? ni : nr);
-            e1 = (flip ? mi : mr) ^ neg_real;
+            e1 = (flip ? mi : mr) ^ negx_real;
             e4 = e2;
-            e3 = (flip ? mr : mi) ^ neg_imag;
+            e3 = (flip ? mr : mi) ^ negx_imag;
 
             if (likely((pg >> (i & 63)) & 1)) {
                 d = *(float64 *)(va + H1_2(i));
-                d = float64_muladd(e2, e1, d, 0, status);
+                d = float64_muladd(e2, e1, d, negf_real, status);
                 *(float64 *)(vd + H1_2(i)) = d;
             }
             if (likely((pg >> (j & 63)) & 1)) {
                 d = *(float64 *)(va + H1_2(j));
-                d = float64_muladd(e4, e3, d, 0, status);
+                d = float64_muladd(e4, e3, d, negf_imag, status);
                 *(float64 *)(vd + H1_2(j)) = d;
             }
         } while (i & 63);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_5_ptr * const fcmla_fns[4] = {
     gen_helper_sve_fcmla_zpzzz_s, gen_helper_sve_fcmla_zpzzz_d,
 };
 TRANS_FEAT(FCMLA_zpzzz, aa64_sve, gen_gvec_fpst_zzzzp, fcmla_fns[a->esz],
-           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot,
+           a->rd, a->rn, a->rm, a->ra, a->pg, a->rot | (s->fpcr_ah << 2),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 static gen_helper_gvec_4_ptr * const fcmla_idx_fns[4] = {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle FPCR.AH's requirement to not negate the sign of a NaN
in FMLSL by element and vector, using the usual trick of
negating by XOR when AH=0 and by muladd flags when AH=1.

Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need to pass in the
FPCR.AH value via the SIMD data word.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-31-richard.henderson@linaro.org
[PMM: commit message tweaked]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 71 ++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 25 deletions(-)
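
Illustration only, not part of the patch: a small sketch of how the
caller could pick between the packed XOR mask and the muladd flag for
four half-precision lanes held in one 64-bit word. The struct and
function names are hypothetical, and SKETCH_NEGATE_PRODUCT merely stands
in for float_muladd_negate_product.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign bits of four packed 16-bit lanes. */
#define SKETCH_F16X4_SIGNS 0x8000800080008000ull

/* Hypothetical stand-in for softfloat's float_muladd_negate_product. */
#define SKETCH_NEGATE_PRODUCT 1

struct sketch_neg {
    uint64_t negx;  /* XOR mask for the packed inputs (used when AH=0) */
    int negf;       /* muladd flags (used when AH=1) */
};

/* Only the subtracting form (FMLSL) negates; AH selects the method. */
static struct sketch_neg sketch_pick_negation(bool is_sub, bool ah)
{
    struct sketch_neg r = { 0, 0 };

    if (is_sub) {
        if (ah) {
            r.negf = SKETCH_NEGATE_PRODUCT;
        } else {
            r.negx = SKETCH_F16X4_SIGNS;
        }
    }
    return r;
}

/* Extract 16-bit lane i (0..3) from the packed word. */
static uint16_t sketch_lane(uint64_t packed, unsigned i)
{
    assert(i < 4);
    return (uint16_t)(packed >> (i * 16));
}
```

At most one of negx and negf is ever nonzero, so the inner loop can apply
both unconditionally without branching on AH per element.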

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
  */
 
 static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
-                     uint32_t desc, bool fz16)
+                     uint64_t negx, int negf, uint32_t desc, bool fz16)
 {
     intptr_t i, oprsz = simd_oprsz(desc);
-    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     int is_q = oprsz == 16;
     uint64_t n_4, m_4;
 
-    /* Pre-load all of the f16 data, avoiding overlap issues.  */
-    n_4 = load4_f16(vn, is_q, is_2);
+    /*
+     * Pre-load all of the f16 data, avoiding overlap issues.
+     * Negate all inputs for AH=0 FMLSL at once.
+     */
+    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
     m_4 = load4_f16(vm, is_q, is_2);
 
-    /* Negate all inputs for FMLSL at once.  */
-    if (is_s) {
-        n_4 ^= 0x8000800080008000ull;
-    }
-
     for (i = 0; i < oprsz / 4; i++) {
         float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
         float32 m_1 = float16_to_float32_by_bits(m_4 >> (i * 16), fz16);
-        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
+        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
     }
     clear_tail(d, oprsz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ static void do_fmlal(float32 *d, void *vn, void *vm, float_status *fpst,
 void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
                             CPUARMState *env, uint32_t desc)
 {
-    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
+
+    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
 void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
                             CPUARMState *env, uint32_t desc)
 {
-    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = 0;
+    int negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000800080008000ull;
+        }
+    }
+    do_fmlal(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
 }
 
 static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
-                         uint32_t desc, bool fz16)
+                         uint64_t negx, int negf, uint32_t desc, bool fz16)
 {
     intptr_t i, oprsz = simd_oprsz(desc);
-    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
     int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
     int is_q = oprsz == 16;
     uint64_t n_4;
     float32 m_1;
 
-    /* Pre-load all of the f16 data, avoiding overlap issues.  */
-    n_4 = load4_f16(vn, is_q, is_2);
-
-    /* Negate all inputs for FMLSL at once.  */
-    if (is_s) {
-        n_4 ^= 0x8000800080008000ull;
-    }
-
+    /*
+     * Pre-load all of the f16 data, avoiding overlap issues.
+     * Negate all inputs for AH=0 FMLSL at once.
+     */
+    n_4 = load4_f16(vn, is_q, is_2) ^ negx;
     m_1 = float16_to_float32_by_bits(((float16 *)vm)[H2(index)], fz16);
 
     for (i = 0; i < oprsz / 4; i++) {
         float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16), fz16);
-        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
+        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], negf, fpst);
     }
     clear_tail(d, oprsz, simd_maxsz(desc));
 }
@@ -XXX,XX +XXX,XX @@ static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
 void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
                                 CPUARMState *env, uint32_t desc)
 {
-    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = is_s ? 0x8000800080008000ull : 0;
+
+    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
 void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
                                 CPUARMState *env, uint32_t desc)
 {
-    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, desc,
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    uint64_t negx = 0;
+    int negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000800080008000ull;
+        }
+    }
+    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status_a64, negx, negf, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64));
 }
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
FMLSL (indexed), using the usual trick of negating by XOR when AH=0
and by muladd flags when AH=1.

Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need to pass in the
FPCR.AH value via the SIMD data word.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-32-richard.henderson@linaro.org
[PMM: commit message tweaked]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
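
Illustration only, not part of the patch: a sketch of deriving the
per-element choice directly from an FPCR value rather than from the SIMD
data word. It assumes FPCR.AH is bit 1 of the register;
SKETCH_FPCR_AH, SKETCH_NEGATE_PRODUCT and sketch_fmlsl_input() are
hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the AH bit within FPCR. */
#define SKETCH_FPCR_AH (1u << 1)

/* Hypothetical stand-in for softfloat's float_muladd_negate_product. */
#define SKETCH_NEGATE_PRODUCT 1

/*
 * Prepare one half-precision input for a subtracting multiply-add.
 * Returns the (possibly sign-flipped) element and stores the muladd
 * flags to use in *negf.
 */
static uint16_t sketch_fmlsl_input(uint16_t n, bool is_sub,
                                   uint32_t fpcr, int *negf)
{
    uint16_t negx = 0;

    assert(negf != NULL);
    *negf = 0;
    if (is_sub) {
        if (fpcr & SKETCH_FPCR_AH) {
            *negf = SKETCH_NEGATE_PRODUCT;
        } else {
            negx = 0x8000;
        }
    }
    return n ^ negx;
}
```

With AH clear a half-precision quiet NaN (0x7e00) comes back with its
sign flipped; with AH set it comes back unchanged and the negation is
left to the muladd flag.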

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
                                CPUARMState *env, uint32_t desc)
 {
     intptr_t i, j, oprsz = simd_oprsz(desc);
-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
     float_status *status = &env->vfp.fp_status_a64;
     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
+    int negx = 0, negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000;
+        }
+    }
 
     for (i = 0; i < oprsz; i += 16) {
         float16 mm_16 = *(float16 *)(vm + i + idx);
         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
 
         for (j = 0; j < 16; j += sizeof(float32)) {
-            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negn;
+            float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negx;
             float32 nn = float16_to_float32_by_bits(nn_16, fz16);
             float32 aa = *(float32 *)(va + H1_4(i + j));
 
             *(float32 *)(vd + H1_4(i + j)) =
-                float32_muladd(nn, mm, aa, 0, status);
+                float32_muladd(nn, mm, aa, negf, status);
         }
     }
 }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
FMLSL (vectors), using the usual trick of negating by XOR when AH=0
and by muladd flags when AH=1.

Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need to pass in the
FPCR.AH value via the SIMD data word.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-33-richard.henderson@linaro.org
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/vec_helper.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
                                CPUARMState *env, uint32_t desc)
 {
     intptr_t i, oprsz = simd_oprsz(desc);
-    uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
+    bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     float_status *status = &env->vfp.fp_status_a64;
     bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a64);
+    int negx = 0, negf = 0;
+
+    if (is_s) {
+        if (env->vfp.fpcr & FPCR_AH) {
+            negf = float_muladd_negate_product;
+        } else {
+            negx = 0x8000;
+        }
+    }
 
     for (i = 0; i < oprsz; i += sizeof(float32)) {
-        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negn;
+        float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negx;
         float16 mm_16 = *(float16 *)(vm + H1_2(i + sel));
         float32 nn = float16_to_float32_by_bits(nn_16, fz16);
         float32 mm = float16_to_float32_by_bits(mm_16, fz16);
         float32 aa = *(float32 *)(va + H1_4(i));
 
-        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, 0, status);
+        *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, negf, status);
     }
 }
 
-- 
2.34.1

Now that we have completed the handling for FPCR.{AH,FIZ,NEP}, we
can enable FEAT_AFP for '-cpu max', and document that we support it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/tcg/cpu64.c        | 1 +
 2 files changed, 2 insertions(+)

FEAT_RPRES implements an "increased precision" variant of the single
precision FRECPE and FRSQRTE instructions, which widens the estimate
from an 8 bit to a 12 bit mantissa. This applies only when
FPCR.AH == 1. Note that the halfprec and double versions of these
insns retain the 8 bit precision regardless.

In this commit we add all the plumbing to make these instructions
call a new helper function when the increased-precision is in
effect. In the following commit we will provide the actual change
in behaviour in the helpers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu-features.h      |  5 +++++
 target/arm/helper.h            |  4 ++++
 target/arm/tcg/translate-a64.c | 34 ++++++++++++++++++++++++++++++----
 target/arm/tcg/translate-sve.c | 16 ++++++++++++++--
 target/arm/tcg/vec_helper.c    |  2 ++
 target/arm/vfp_helper.c        | 32 ++++++++++++++++++++++++++++++--
 6 files changed, 85 insertions(+), 8 deletions(-)
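
Illustration only, not part of the patch: a sketch of the selection rule
the plumbing implements, with the helper tables reduced to the estimate
precision each entry would provide. The enum, table and function names
are hypothetical; only the "fpcr_ah && feature present" condition and the
"single precision only" rule come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Element sizes, in the half/single/double order of the helper tables. */
enum sketch_esz { SKETCH_F16, SKETCH_F32, SKETCH_F64, SKETCH_NUM_ESZ };

/* Estimate precision in bits that each table entry stands for. */
static const int sketch_default_bits[SKETCH_NUM_ESZ] = { 8, 8, 8 };
static const int sketch_rpres_bits[SKETCH_NUM_ESZ] = { 8, 12, 8 };

/* Pick the table at "translate time", then index it by element size. */
static int sketch_estimate_bits(enum sketch_esz esz, bool fpcr_ah,
                                bool have_rpres)
{
    const int *table;

    assert((int)esz >= 0 && (int)esz < SKETCH_NUM_ESZ);
    table = (fpcr_ah && have_rpres) ? sketch_rpres_bits
                                    : sketch_default_bits;
    return table[esz];
}
```

Keeping a second table that differs only in its single-precision slot
means the half and double entries need no special casing.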

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_mops(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, MOPS);
 }
 
+static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, RPRES);
+}
+
 static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
 {
     /* We always set the AdvSIMD and FP fields identically.  */
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, fpst)
 
 DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
 DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+DEF_HELPER_FLAGS_2(recpe_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f16, TCG_CALL_NO_RWG, f16, f16, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
+DEF_HELPER_FLAGS_2(rsqrte_rpres_f32, TCG_CALL_NO_RWG, f32, f32, fpst)
 DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, fpst)
 DEF_HELPER_FLAGS_1(recpe_u32, TCG_CALL_NO_RWG, i32, i32)
 DEF_HELPER_FLAGS_1(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vrintx_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(gvec_frecpe_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
+DEF_HELPER_FLAGS_4(gvec_frsqrte_rpres_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
 DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frecpe = {
     gen_helper_recpe_f32,
     gen_helper_recpe_f64,
 };
-TRANS(FRECPE_s, do_fp1_scalar_ah, a, &f_scalar_frecpe, -1)
+static const FPScalar1 f_scalar_frecpe_rpres = {
+    gen_helper_recpe_f16,
+    gen_helper_recpe_rpres_f32,
+    gen_helper_recpe_f64,
+};
+TRANS(FRECPE_s, do_fp1_scalar_ah, a,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+      &f_scalar_frecpe_rpres : &f_scalar_frecpe, -1)
 
 static const FPScalar1 f_scalar_frecpx = {
     gen_helper_frecpx_f16,
@@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frsqrte = {
     gen_helper_rsqrte_f32,
     gen_helper_rsqrte_f64,
 };
-TRANS(FRSQRTE_s, do_fp1_scalar_ah, a, &f_scalar_frsqrte, -1)
+static const FPScalar1 f_scalar_frsqrte_rpres = {
+    gen_helper_rsqrte_f16,
+    gen_helper_rsqrte_rpres_f32,
+    gen_helper_rsqrte_f64,
+};
+TRANS(FRSQRTE_s, do_fp1_scalar_ah, a,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+      &f_scalar_frsqrte_rpres : &f_scalar_frsqrte, -1)
 
 static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a)
 {
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frecpe[] = {
     gen_helper_gvec_frecpe_s,
     gen_helper_gvec_frecpe_d,
 };
-TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe)
+static gen_helper_gvec_2_ptr * const f_frecpe_rpres[] = {
+    gen_helper_gvec_frecpe_h,
+    gen_helper_gvec_frecpe_rpres_s,
+    gen_helper_gvec_frecpe_d,
+};
+TRANS(FRECPE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frecpe_rpres : f_frecpe)
 
 static gen_helper_gvec_2_ptr * const f_frsqrte[] = {
     gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s,
     gen_helper_gvec_frsqrte_d,
 };
-TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte)
+static gen_helper_gvec_2_ptr * const f_frsqrte_rpres[] = {
+    gen_helper_gvec_frsqrte_h,
+    gen_helper_gvec_frsqrte_rpres_s,
+    gen_helper_gvec_frsqrte_d,
+};
+TRANS(FRSQRTE_v, do_gvec_op2_ah_fpst, a->esz, a->q, a->rd, a->rn, 0,
+      s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ? f_frsqrte_rpres : f_frsqrte)
 
 static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a)
 {
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const frecpe_fns[] = {
     NULL,                     gen_helper_gvec_frecpe_h,
     gen_helper_gvec_frecpe_s, gen_helper_gvec_frecpe_d,
 };
-TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frecpe_fns[a->esz], a, 0)
+static gen_helper_gvec_2_ptr * const frecpe_rpres_fns[] = {
+    NULL,                           gen_helper_gvec_frecpe_h,
+    gen_helper_gvec_frecpe_rpres_s, gen_helper_gvec_frecpe_d,
+};
+TRANS_FEAT(FRECPE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
+           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+           frecpe_rpres_fns[a->esz] : frecpe_fns[a->esz], a, 0)
 
 static gen_helper_gvec_2_ptr * const frsqrte_fns[] = {
     NULL,                      gen_helper_gvec_frsqrte_h,
     gen_helper_gvec_frsqrte_s, gen_helper_gvec_frsqrte_d,
 };
-TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz, frsqrte_fns[a->esz], a, 0)
+static gen_helper_gvec_2_ptr * const frsqrte_rpres_fns[] = {
+    NULL,                            gen_helper_gvec_frsqrte_h,
+    gen_helper_gvec_frsqrte_rpres_s, gen_helper_gvec_frsqrte_d,
+};
+TRANS_FEAT(FRSQRTE, aa64_sve, gen_gvec_fpst_ah_arg_zz,
+           s->fpcr_ah && dc_isar_feature(aa64_rpres, s) ?
+           frsqrte_rpres_fns[a->esz] : frsqrte_fns[a->esz], a, 0)
 
 /*
  *** SVE Floating Point Compare with Zero Group
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, float_status *stat, uint32_t desc)  \
 
 DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
 DO_2OP(gvec_frecpe_s, helper_recpe_f32, float32)
+DO_2OP(gvec_frecpe_rpres_s, helper_recpe_rpres_f32, float32)
 DO_2OP(gvec_frecpe_d, helper_recpe_f64, float64)
 
 DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
 DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
+DO_2OP(gvec_frsqrte_rpres_s, helper_rsqrte_rpres_f32, float32)
 DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
 
 DO_2OP(gvec_vrintx_h, float16_round_to_int, float16)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
     return make_float16(f16_val);
 }
 
-float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
+/*
+ * FEAT_RPRES means the f32 FRECPE has an "increased precision" variant
+ * which is used when FPCR.AH == 1.
+ */
+static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
 {
     float32 f32 = float32_squash_input_denormal(input, fpst);
     uint32_t f32_val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
     return make_float32(f32_val);
 }
 
+float32 HELPER(recpe_f32)(float32 input, float_status *fpst)
+{
+    return do_recpe_f32(input, fpst, false);
+}
+
+float32 HELPER(recpe_rpres_f32)(float32 input, float_status *fpst)
+{
+    return do_recpe_f32(input, fpst, true);
+}
+
 float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
 {
     float64 f64 = float64_squash_input_denormal(input, fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
     return make_float16(val);
 }
 
-float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
+/*
+ * FEAT_RPRES means the f32 FRSQRTE has an "increased precision" variant
+ * which is used when FPCR.AH == 1.
+ */
+static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
 {
     float32 f32 = float32_squash_input_denormal(input, s);
     uint32_t val = float32_val(f32);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
     return make_float32(val);
 }
 
+float32 HELPER(rsqrte_f32)(float32 input, float_status *s)
+{
+    return do_rsqrte_f32(input, s, false);
+}
+
+float32 HELPER(rsqrte_rpres_f32)(float32 input, float_status *s)
+{
+    return do_rsqrte_f32(input, s, true);
+}
+
 float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
 {
     float64 f64 = float64_squash_input_denormal(input, s);
-- 
2.34.1

Implement the increased precision variation of FRECPE.  In the
pseudocode this corresponds to the handling of the
"increasedprecision" boolean in the FPRecipEstimate() and
RecipEstimate() functions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vfp_helper.c | 54 +++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 8 deletions(-)
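
Illustration only, not part of the patch: a standalone copy of the
integer arithmetic used for the increased precision estimate, handy for
checking the endpoints by hand. The sketch_ prefixed names are
hypothetical, and sketch_scale_incprec() spells out the
UInt('1':fraction<51:41>) step with plain shifts and masks instead of
the deposit32()/extract64() helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as recip_estimate_incprec() in the diff below. */
static int sketch_recip_estimate_incprec(int input)
{
    int a, b, r;

    assert(2048 <= input && input < 4096);
    a = input * 2 + 1;
    b = (1 << 26) / a;  /* 2 * (2^25 / a), truncated to an integer */
    r = (b + 1) >> 1;   /* add one and halve to round */
    assert(4096 <= r && r < 8192);
    return r;
}

/* scaled = UInt('1':fraction<51:41>) for a 52-bit fraction field. */
static uint32_t sketch_scale_incprec(uint64_t frac)
{
    return (1u << 11) | (uint32_t)((frac >> 41) & 0x7ff);
}
```

For example, an input of 2048 (0.5) gives 8190, 3072 (0.75) gives 5460,
and 4095 gives 4097, all inside the asserted 4096 .. 8191 result range.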

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
     return r;
 }
 
+/*
+ * Increased precision version:
+ * input is a 12 bit fixed point number
+ * input range 2048 .. 4095 for a number from 0.5 <= x < 1.0.
+ * result range 4096 .. 8191 for a number from 1.0 to 2.0
+ */
+static int recip_estimate_incprec(int input)
+{
+    int a, b, r;
+    assert(2048 <= input && input < 4096);
+    a = (input * 2) + 1;
+    /*
+     * The pseudocode expresses this as an operation on infinite
+     * precision reals where it calculates 2^25 / a and then looks
+     * at the error between that and the rounded-down-to-integer
+     * value to see if it should instead round up. We instead
+     * follow the same approach as the pseudocode for the 8-bit
+     * precision version, and calculate (2 * (2^25 / a)) as an
+     * integer so we can do the "add one and halve" to round it.
+     * So the 1 << 26 here is correct.
+     */
+    b = (1 << 26) / a;
+    r = (b + 1) >> 1;
+    assert(4096 <= r && r < 8192);
+    return r;
+}
+
 /*
  * Common wrapper to call recip_estimate
  *
@@ -XXX,XX +XXX,XX @@ static int recip_estimate(int input)
  * callee.
  */
 
-static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
+static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac,
+                                    bool increasedprecision)
 {
     uint32_t scaled, estimate;
     uint64_t result_frac;
@@ -XXX,XX +XXX,XX @@ static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
         }
     }
 
-    /* scaled = UInt('1':fraction<51:44>) */
-    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-    estimate = recip_estimate(scaled);
+    if (increasedprecision) {
+        /* scaled = UInt('1':fraction<51:41>) */
+        scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
+        estimate = recip_estimate_incprec(scaled);
+    } else {
+        /* scaled = UInt('1':fraction<51:44>) */
+        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        estimate = recip_estimate(scaled);
+    }
 
     result_exp = exp_off - *exp;
-    result_frac = deposit64(0, 44, 8, estimate);
+    if (increasedprecision) {
+        result_frac = deposit64(0, 40, 12, estimate);
+    } else {
+        result_frac = deposit64(0, 44, 8, estimate);
+    }
     if (result_exp == 0) {
         result_frac = deposit64(result_frac >> 1, 51, 1, 1);
     } else if (result_exp == -1) {
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, float_status *fpst)
     }
 
     f64_frac = call_recip_estimate(&f16_exp, 29,
-                                   ((uint64_t) f16_frac) << (52 - 10));
+                                   ((uint64_t) f16_frac) << (52 - 10), false);
 
     /* result = sign : result_exp<4:0> : fraction<51:42> */
     f16_val = deposit32(0, 15, 1, f16_sign);
@@ -XXX,XX +XXX,XX @@ static float32 do_recpe_f32(float32 input, float_status *fpst, bool rpres)
     }
 
     f64_frac = call_recip_estimate(&f32_exp, 253,
-                                   ((uint64_t) f32_frac) << (52 - 23));
+                                   ((uint64_t) f32_frac) << (52 - 23), rpres);
 
     /* result = sign : result_exp<7:0> : fraction<51:29> */
     f32_val = deposit32(0, 31, 1, f32_sign);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, float_status *fpst)
         return float64_set_sign(float64_zero, float64_is_neg(f64));
     }
 
-    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
+    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac, false);
 
     /* result = sign : result_exp<10:0> : fraction<51:0>; */
     f64_val = deposit64(0, 63, 1, f64_sign);
-- 
2.34.1
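
The "add one and halve" rounding described in the comment of
recip_estimate_incprec() above can be tried standalone. This is only a
sketch: the function name is hypothetical, and the step that forms 'a'
from the 11-bit-fraction input is not visible in this hunk, so the
sketch assumes a = 2 * input + 1 (the midpoint of the input interval),
which is what makes the final range assertion hold.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the 12-bit reciprocal estimate.
 * Assumption: input is UInt('1':fraction<51:41>), i.e. 2048..4095,
 * and a = 2 * input + 1 (not shown in the hunk above).
 */
static int recip_estimate_incprec_sketch(int input)
{
    int a, b, r;

    assert(2048 <= input && input < 4096);
    a = input * 2 + 1;

    /* b = floor(2 * (2^25 / a)), computed as one integer division */
    b = (1 << 26) / a;
    /* "add one and halve": rounds 2^25 / a to the nearest integer */
    r = (b + 1) >> 1;

    assert(4096 <= r && r < 8192);
    return r;
}
```

Under these assumptions the smallest input (2048) gives 8190 and the
largest (4095) gives 4097, so the result always fits in 13 bits with
the top bit set.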

Implement the increased precision variation of FRSQRTE.  In the
pseudocode this corresponds to the handling of the
"increasedprecision" boolean in the FPRSqrtEstimate() and
RecipSqrtEstimate() functions.
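
For reference, the integer search that the patch below uses in place of
an infinite-precision square root can be exercised outside QEMU. The
following is a minimal sketch that mirrors
do_recip_sqrt_estimate_incprec(); the function name is hypothetical and
unsigned 64-bit intermediates are used purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone copy of the 12-bit reciprocal sqrt estimate. */
static int recip_sqrt_estimate_incprec_sketch(int input)
{
    uint64_t a, b;
    int estimate;

    assert(1024 <= input && input < 4096);
    if (input < 2048) {
        /* odd exponent case: UInt('01':fraction<51:42>) */
        a = (uint64_t)input * 2 + 1;
    } else {
        /* even exponent case: drop the low bit, then take the midpoint */
        a = (uint64_t)(input >> 1) << 1;
        a = (a + 1) * 2;
    }

    /* Linear search for the smallest (b + 1) with a * (b + 1)^2 >= 2^39 */
    b = 8192;
    while (a * (b + 1) * (b + 1) < (1ULL << 39)) {
        b += 1;
    }
    estimate = (int)((b + 1) / 2);

    assert(4096 <= estimate && estimate < 8192);
    return estimate;
}
```

With this sketch, an input of 1024 yields 8190 and an input of 4095
yields 4097, both inside the asserted [4096, 8192) range.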

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vfp_helper.c | 77 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 64 insertions(+), 13 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static int do_recip_sqrt_estimate(int a)
     return estimate;
 }
 
+static int do_recip_sqrt_estimate_incprec(int a)
+{
+    /*
+     * The Arm ARM describes the 12-bit precision version of RecipSqrtEstimate
+     * in terms of an infinite-precision floating point calculation of a
+     * square root. We implement this using the same kind of pure integer
+     * algorithm as the 8-bit version, to get the same bit-for-bit result.
+     */
+    int64_t b, estimate;
 
-static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
+    assert(1024 <= a && a < 4096);
+    if (a < 2048) {
+        a = a * 2 + 1;
+    } else {
+        a = (a >> 1) << 1;
+        a = (a + 1) * 2;
+    }
+    b = 8192;
+    while (a * (b + 1) * (b + 1) < (1ULL << 39)) {
+        b += 1;
+    }
+    estimate = (b + 1) / 2;
+
+    assert(4096 <= estimate && estimate < 8192);
+
+    return estimate;
+}
+
+static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac,
+                                    bool increasedprecision)
 {
     int estimate;
     uint32_t scaled;
@@ -XXX,XX +XXX,XX @@ static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
         frac = extract64(frac, 0, 51) << 1;
     }
 
-    if (*exp & 1) {
-        /* scaled = UInt('01':fraction<51:45>) */
-        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
+    if (increasedprecision) {
+        if (*exp & 1) {
+            /* scaled = UInt('01':fraction<51:42>) */
+            scaled = deposit32(1 << 10, 0, 10, extract64(frac, 42, 10));
+        } else {
+            /* scaled = UInt('1':fraction<51:41>) */
+            scaled = deposit32(1 << 11, 0, 11, extract64(frac, 41, 11));
+        }
+        estimate = do_recip_sqrt_estimate_incprec(scaled);
     } else {
-        /* scaled = UInt('1':fraction<51:44>) */
-        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        if (*exp & 1) {
+            /* scaled = UInt('01':fraction<51:45>) */
+            scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
+        } else {
+            /* scaled = UInt('1':fraction<51:44>) */
+            scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+        }
+        estimate = do_recip_sqrt_estimate(scaled);
     }
-    estimate = do_recip_sqrt_estimate(scaled);
 
     *exp = (exp_off - *exp) / 2;
-    return extract64(estimate, 0, 8) << 44;
+    if (increasedprecision) {
+        return extract64(estimate, 0, 12) << 40;
+    } else {
+        return extract64(estimate, 0, 8) << 44;
+    }
 }
 
 uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, float_status *s)
 
     f64_frac = ((uint64_t) f16_frac) << (52 - 10);
 
-    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac, false);
 
     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
     val = deposit32(0, 15, 1, f16_sign);
@@ -XXX,XX +XXX,XX @@ static float32 do_rsqrte_f32(float32 input, float_status *s, bool rpres)
 
     f64_frac = ((uint64_t) f32_frac) << 29;
 
-    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac, rpres);
 
-    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(15) */
+    /*
+     * result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15)
+     * or for increased precision
+     * result = sign : result_exp<7:0> : estimate<11:0> : Zeros(11)
+     */
     val = deposit32(0, 31, 1, f32_sign);
     val = deposit32(val, 23, 8, f32_exp);
-    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
+    if (rpres) {
+        val = deposit32(val, 11, 12, extract64(f64_frac, 52 - 12, 12));
+    } else {
+        val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
+    }
     return make_float32(val);
 }
 
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, float_status *s)
         return float64_zero;
     }
 
-    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
+    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac, false);
 
     /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(44) */
     val = deposit64(0, 61, 1, f64_sign);
-- 
2.34.1
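
The result layout used by do_rsqrte_f32() in the patch above
(estimate<7:0> : Zeros(15) versus estimate<11:0> : Zeros(11)) can be
checked with a small standalone program. This is a sketch only: the
*_sketch helpers are local stand-ins written for this example and are
assumed to behave like QEMU's extract64() and deposit32() for field
lengths smaller than the word size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in: return 'length' bits of 'value' starting at 'start'. */
static uint64_t extract64_sketch(uint64_t value, int start, int length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Local stand-in: overwrite a bit field of 'value' with 'fieldval'. */
static uint32_t deposit32_sketch(uint32_t value, int start, int length,
                                 uint32_t fieldval)
{
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Pack sign, exponent and the estimate held in f64_frac<51:40>. */
static uint32_t pack_rsqrte_f32_sketch(uint32_t sign, uint32_t exp,
                                       uint64_t f64_frac, bool rpres)
{
    uint32_t val = deposit32_sketch(0, 31, 1, sign);
    uint32_t est;

    val = deposit32_sketch(val, 23, 8, exp);
    if (rpres) {
        /* 12 estimate bits land in fraction<22:11> */
        est = (uint32_t)extract64_sketch(f64_frac, 52 - 12, 12);
        val = deposit32_sketch(val, 11, 12, est);
    } else {
        /* 8 estimate bits land in fraction<22:15> */
        est = (uint32_t)extract64_sketch(f64_frac, 52 - 8, 8);
        val = deposit32_sketch(val, 15, 8, est);
    }
    return val;
}
```

For example, with exp = 127 and f64_frac = 0xABC << 40 the sketch packs
0x3FD5E000 in the increased-precision case and 0x3FD58000 otherwise;
the two differ only in the four extra estimate bits.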

Now that the emulation is complete, we can enable FEAT_RPRES for the
'max' CPU type.

From: Richard Henderson <richard.henderson@linaro.org>

Move ARMFPStatusFlavour to cpu.h so that it can be used to index
the new vfp.fp_status[] array.  For now, place the array in an
anonymous union with the existing structures.  Adjust the order
of the existing structures to match the enum.

Simplify fpstatus_ptr() using the new array.
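
The aliasing that this patch relies on can be illustrated in isolation.
The sketch below is not QEMU code: float_status_sketch and vfp_sketch
are hypothetical stand-ins, and only the enum order and field names are
taken from the patch. It shows why the named fields and the array slots
coincide once the struct order matches the enum, and how an offset can
then be computed from the flavour instead of a switch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for softfloat's float_status. */
typedef struct {
    int rounding_mode;
    int exception_flags;
} float_status_sketch;

typedef enum ARMFPStatusFlavour {
    FPST_A32,
    FPST_A64,
    FPST_A32_F16,
    FPST_A64_F16,
    FPST_AH,
    FPST_AH_F16,
    FPST_STD,
    FPST_STD_F16,
} ARMFPStatusFlavour;
#define FPST_COUNT 8

/* Array and named fields share storage via a C11 anonymous union. */
typedef struct {
    union {
        float_status_sketch fp_status[FPST_COUNT];
        struct {
            float_status_sketch fp_status_a32;
            float_status_sketch fp_status_a64;
            float_status_sketch fp_status_f16_a32;
            float_status_sketch fp_status_f16_a64;
            float_status_sketch ah_fp_status;
            float_status_sketch ah_fp_status_f16;
            float_status_sketch standard_fp_status;
            float_status_sketch standard_fp_status_f16;
        };
    };
} vfp_sketch;

/* Offset of one flavour's status, computed without a switch. */
static size_t fp_status_offset_sketch(ARMFPStatusFlavour flavour)
{
    return offsetof(vfp_sketch, fp_status)
           + (size_t)flavour * sizeof(float_status_sketch);
}

/* The struct order must match the enum order for the aliasing to hold. */
_Static_assert(offsetof(vfp_sketch, standard_fp_status_f16) ==
               FPST_STD_F16 * sizeof(float_status_sketch),
               "named field order must match ARMFPStatusFlavour");
```

Because every element has the same type, no padding can appear between
the named fields that is not also present between array elements, which
is what allows the later patches to drop the named fields one by one.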

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           | 119 +++++++++++++++++++++----------------
 target/arm/tcg/translate.h |  64 +-------------------
 2 files changed, 70 insertions(+), 113 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
 
 typedef struct NVICState NVICState;
 
+/*
+ * Enum for indexing vfp.fp_status[].
+ *
+ * FPST_A32: is the "normal" fp status for AArch32 insns
+ * FPST_A64: is the "normal" fp status for AArch64 insns
+ * FPST_A32_F16: used for AArch32 half-precision calculations
+ * FPST_A64_F16: used for AArch64 half-precision calculations
+ * FPST_STD: the ARM "Standard FPSCR Value"
+ * FPST_STD_F16: used for half-precision
+ *       calculations with the ARM "Standard FPSCR Value"
+ * FPST_AH: used for the A64 insns which change behaviour
+ *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+ *       and the reciprocal and square root estimate/step insns)
+ * FPST_AH_F16: used for the A64 insns which change behaviour
+ *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
+ *       and the reciprocal and square root estimate/step insns);
+ *       for half-precision
+ *
+ * Half-precision operations are governed by a separate
+ * flush-to-zero control bit in FPSCR.FZ16. We pass a separate
+ * status structure to control this.
+ *
+ * The "Standard FPSCR" is default-NaN, flush-to-zero,
+ * round-to-nearest, and is used by any operations (generally
+ * Neon) which the architecture defines as controlled by the
+ * standard FPSCR value rather than the FPSCR.
+ *
+ * The "standard FPSCR but for fp16 ops" is needed because
+ * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
+ * using a fixed value for it.
+ *
+ * The ah_fp_status is needed because some insns have different
+ * behaviour when FPCR.AH == 1: they don't update cumulative
+ * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
+ * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
+ * which means we need an ah_fp_status_f16 as well.
+ *
+ * To avoid having to transfer exception bits around, we simply
+ * say that the FPSCR cumulative exception flags are the logical
+ * OR of the flags in the four fp statuses. This relies on the
+ * only thing which needs to read the exception flags being
+ * an explicit FPSCR read.
+ */
+typedef enum ARMFPStatusFlavour {
+    FPST_A32,
+    FPST_A64,
+    FPST_A32_F16,
+    FPST_A64_F16,
+    FPST_AH,
+    FPST_AH_F16,
+    FPST_STD,
+    FPST_STD_F16,
+} ARMFPStatusFlavour;
+#define FPST_COUNT  8
+
 typedef struct CPUArchState {
     /* Regs for current mode.  */
     uint32_t regs[16];
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         /* Scratch space for aa32 neon expansion.  */
         uint32_t scratch[8];
 
-        /* There are a number of distinct float control structures:
-         *
-         *  fp_status_a32: is the "normal" fp status for AArch32 insns
-         *  fp_status_a64: is the "normal" fp status for AArch64 insns
-         *  fp_status_fp16_a32: used for AArch32 half-precision calculations
-         *  fp_status_fp16_a64: used for AArch64 half-precision calculations
-         *  standard_fp_status : the ARM "Standard FPSCR Value"
-         *  standard_fp_status_fp16 : used for half-precision
-         *       calculations with the ARM "Standard FPSCR Value"
-         *  ah_fp_status: used for the A64 insns which change behaviour
-         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
-         *       and the reciprocal and square root estimate/step insns)
-         *  ah_fp_status_f16: used for the A64 insns which change behaviour
-         *       when FPCR.AH == 1 (bfloat16 conversions and multiplies,
-         *       and the reciprocal and square root estimate/step insns);
-         *       for half-precision
-         *
-         * Half-precision operations are governed by a separate
-         * flush-to-zero control bit in FPSCR:FZ16. We pass a separate
-         * status structure to control this.
-         *
-         * The "Standard FPSCR", ie default-NaN, flush-to-zero,
-         * round-to-nearest and is used by any operations (generally
-         * Neon) which the architecture defines as controlled by the
-         * standard FPSCR value rather than the FPSCR.
-         *
-         * The "standard FPSCR but for fp16 ops" is needed because
-         * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
-         * using a fixed value for it.
-         *
-         * The ah_fp_status is needed because some insns have different
-         * behaviour when FPCR.AH == 1: they don't update cumulative
-         * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
-         * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
-         * which means we need an ah_fp_status_f16 as well.
-         *
-         * To avoid having to transfer exception bits around, we simply
-         * say that the FPSCR cumulative exception flags are the logical
-         * OR of the flags in the four fp statuses. This relies on the
-         * only thing which needs to read the exception flags being
-         * an explicit FPSCR read.
-         */
-        float_status fp_status_a32;
-        float_status fp_status_a64;
-        float_status fp_status_f16_a32;
-        float_status fp_status_f16_a64;
-        float_status standard_fp_status;
-        float_status standard_fp_status_f16;
-        float_status ah_fp_status;
-        float_status ah_fp_status_f16;
+        /* There are a number of distinct float control structures. */
+        union {
+            float_status fp_status[FPST_COUNT];
+            struct {
+                float_status fp_status_a32;
+                float_status fp_status_a64;
+                float_status fp_status_f16_a32;
+                float_status fp_status_f16_a64;
+                float_status ah_fp_status;
+                float_status ah_fp_status_f16;
+                float_status standard_fp_status;
+                float_status standard_fp_status_f16;
+            };
+        };
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ static inline CPUARMTBFlags arm_tbflags_from_tb(const TranslationBlock *tb)
     return (CPUARMTBFlags){ tb->flags, tb->cs_base };
 }
 
-/*
- * Enum for argument to fpstatus_ptr().
- */
-typedef enum ARMFPStatusFlavour {
-    FPST_A32,
-    FPST_A64,
-    FPST_A32_F16,
-    FPST_A64_F16,
-    FPST_AH,
-    FPST_AH_F16,
-    FPST_STD,
-    FPST_STD_F16,
-} ARMFPStatusFlavour;
-
 /**
  * fpstatus_ptr: return TCGv_ptr to the specified fp_status field
  *
  * We have multiple softfloat float_status fields in the Arm CPU state struct
  * (see the comment in cpu.h for details). Return a TCGv_ptr which has
  * been set up to point to the requested field in the CPU state struct.
- * The options are:
- *
- * FPST_A32
- *   for AArch32 non-FP16 operations controlled by the FPCR
- * FPST_A64
- *   for AArch64 non-FP16 operations controlled by the FPCR
- * FPST_A32_F16
- *   for AArch32 operations controlled by the FPCR where FPCR.FZ16 is to be used
- * FPST_A64_F16
- *   for AArch64 operations controlled by the FPCR where FPCR.FZ16 is to be used
- * FPST_AH:
- *   for AArch64 operations which change behaviour when AH=1 (specifically,
- *   bfloat16 conversions and multiplies, and the reciprocal and square root
- *   estimate/step insns)
- * FPST_AH_F16:
- *   ditto, but for half-precision operations
- * FPST_STD
- *   for A32/T32 Neon operations using the "standard FPSCR value"
- * FPST_STD_F16
- *   as FPST_STD, but where FPCR.FZ16 is to be used
  */
 static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
 {
     TCGv_ptr statusptr = tcg_temp_new_ptr();
-    int offset;
+    int offset = offsetof(CPUARMState, vfp.fp_status[flavour]);
 
-    switch (flavour) {
-    case FPST_A32:
-        offset = offsetof(CPUARMState, vfp.fp_status_a32);
-        break;
-    case FPST_A64:
-        offset = offsetof(CPUARMState, vfp.fp_status_a64);
-        break;
-    case FPST_A32_F16:
-        offset = offsetof(CPUARMState, vfp.fp_status_f16_a32);
-        break;
-    case FPST_A64_F16:
-        offset = offsetof(CPUARMState, vfp.fp_status_f16_a64);
-        break;
-    case FPST_AH:
-        offset = offsetof(CPUARMState, vfp.ah_fp_status);
-        break;
-    case FPST_AH_F16:
-        offset = offsetof(CPUARMState, vfp.ah_fp_status_f16);
-        break;
-    case FPST_STD:
-        offset = offsetof(CPUARMState, vfp.standard_fp_status);
-        break;
-    case FPST_STD_F16:
-        offset = offsetof(CPUARMState, vfp.standard_fp_status_f16);
-        break;
-    default:
-        g_assert_not_reached();
-    }
     tcg_gen_addi_ptr(statusptr, tcg_env, offset);
     return statusptr;
 }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace standard_fp_status_f16 with fp_status[FPST_STD_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  4 ++--
 target/arm/tcg/mve_helper.c | 24 ++++++++++++------------
 target/arm/vfp_helper.c     |  8 ++++----
 4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status ah_fp_status;
                 float_status ah_fp_status_f16;
                 float_status standard_fp_status;
-                float_status standard_fp_status_f16;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     set_flush_to_zero(1, &env->vfp.standard_fp_status);
     set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
     set_default_nan_mode(1, &env->vfp.standard_fp_status);
-    set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
+    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
     arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
     set_flush_to_zero(1, &env->vfp.ah_fp_status);
     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                 r[e] = 0;                                               \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(tm & 1)) {                                            \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
                 continue;                                               \
             }                                                           \
-            fpst0 = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :   \
+            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
                 &env->vfp.standard_fp_status;                           \
             fpst1 = fpst0;                                              \
             if (!(mask & 1)) {                                          \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
         TYPE *m = vm;                                           \
         TYPE ra = (TYPE)ra_in;                                  \
         float_status *fpst = (ESIZE == 2) ?                     \
-            &env->vfp.standard_fp_status_f16 :                  \
+            &env->vfp.fp_status[FPST_STD_F16] :                 \
             &env->vfp.standard_fp_status;                       \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
             if (mask & 1) {                                     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
         float_status *fpst;                                             \
         float_status scratch_fpst;                                      \
         float_status *base_fpst = (ESIZE == 2) ?                        \
-            &env->vfp.standard_fp_status_f16 :                          \
+            &env->vfp.fp_status[FPST_STD_F16] :                         \
             &env->vfp.standard_fp_status;                               \
         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
         set_float_rounding_mode(rmode, base_fpst);                      \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 :    \
+            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
                 &env->vfp.standard_fp_status;                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     /* FZ16 does not generate an input denormal exception.  */
     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
-    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
+    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_STD_F16])
           & ~float_flag_input_denormal_flushed);
 
     a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.standard_fp_status);
-    set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
     set_float_exception_flags(0, &env->vfp.ah_fp_status);
     set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
 }
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         bool ftz_enabled = val & FPCR_FZ16;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-        set_flush_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
         set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.standard_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
     }
     if (changed & FPCR_FZ) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace standard_fp_status with fp_status[FPST_STD].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  8 ++++----
 target/arm/tcg/mve_helper.c | 28 ++++++++++++++--------------
 target/arm/tcg/vec_helper.c |  4 ++--
 target/arm/vfp_helper.c     |  4 ++--
 5 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status fp_status_f16_a64;
                 float_status ah_fp_status;
                 float_status ah_fp_status_f16;
-                float_status standard_fp_status;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
         env->sau.ctrl = 0;
     }
 
-    set_flush_to_zero(1, &env->vfp.standard_fp_status);
-    set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
-    set_default_nan_mode(1, &env->vfp.standard_fp_status);
+    set_flush_to_zero(1, &env->vfp.fp_status[FPST_STD]);
+    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
+    set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_a64);
-    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                          \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                          \
             if (!(tm & 1)) {                                            \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
                 continue;                                               \
             }                                                           \
             fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             fpst1 = fpst0;                                              \
             if (!(mask & 1)) {                                          \
                 scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
         TYPE ra = (TYPE)ra_in;                                  \
         float_status *fpst = (ESIZE == 2) ?                     \
             &env->vfp.fp_status[FPST_STD_F16] :                 \
-            &env->vfp.standard_fp_status;                       \
+            &env->vfp.fp_status[FPST_STD];                       \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
             if (mask & 1) {                                     \
                 TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
         float_status scratch_fpst;                                      \
         float_status *base_fpst = (ESIZE == 2) ?                        \
             &env->vfp.fp_status[FPST_STD_F16] :                         \
-            &env->vfp.standard_fp_status;                               \
+            &env->vfp.fp_status[FPST_STD];                               \
         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
         set_float_rounding_mode(rmode, base_fpst);                      \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_sh(CPUARMState *env, void *vd, void *vm, int top)
     unsigned e;
     float_status *fpst;
     float_status scratch_fpst;
-    float_status *base_fpst = &env->vfp.standard_fp_status;
+    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
     bool old_fz = get_flush_to_zero(base_fpst);
     set_flush_to_zero(false, base_fpst);
     for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ static void do_vcvt_hs(CPUARMState *env, void *vd, void *vm, int top)
     unsigned e;
     float_status *fpst;
     float_status scratch_fpst;
-    float_status *base_fpst = &env->vfp.standard_fp_status;
+    float_status *base_fpst = &env->vfp.fp_status[FPST_STD];
     bool old_fiz = get_flush_inputs_to_zero(base_fpst);
     set_flush_inputs_to_zero(false, base_fpst);
     for (e = 0; e < 16 / 4; e++, mask >>= 4) {
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
                 continue;                                               \
             }                                                           \
             fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.standard_fp_status;                           \
+                &env->vfp.fp_status[FPST_STD];                           \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
-    do_fmlal(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
+    do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
              get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
-    do_fmlal_idx(vd, vn, vm, &env->vfp.standard_fp_status, negx, 0, desc,
+    do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16_a32));
 }
 
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     uint32_t a32_flags = 0, a64_flags = 0;
 
     a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
-    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
     /* FZ16 does not generate an input denormal exception.  */
     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_a64);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a32);
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
-    set_float_exception_flags(0, &env->vfp.standard_fp_status);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
     set_float_exception_flags(0, &env->vfp.ah_fp_status);
     set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace the ah_fp_status_f16 field with fp_status[FPST_AH_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
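Note: the comment touched in vfp_get_fpsr_from_host() says that flags
from the AH statuses are not merged into the cumulative exception bits,
while vfp_clear_float_status_exc_flags() still clears them. A minimal
sketch of that asymmetry follows. Every name in it (mock_float_status,
the MOCK_FPST_* slots and their order, the mock_* helpers) is a
hypothetical stand-in, not QEMU's definition, and the sketch leaves out
the A32/A64 split and the input-denormal masking that the real function
applies to the F16 statuses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for illustration; not QEMU's float_status. */
typedef struct {
    uint32_t exc_flags;
} mock_float_status;

/* The slot order here is an assumption made for this sketch. */
typedef enum {
    MOCK_FPST_A32,
    MOCK_FPST_A64,
    MOCK_FPST_A32_F16,
    MOCK_FPST_A64_F16,
    MOCK_FPST_AH,
    MOCK_FPST_AH_F16,
    MOCK_FPST_STD,
    MOCK_FPST_STD_F16,
    MOCK_FPST_COUNT
} mock_fpst;

/* The AH slots serve insns that must not set cumulative exception bits. */
static int mock_is_ah_slot(int slot)
{
    assert(slot >= 0 && slot < MOCK_FPST_COUNT);
    return slot == MOCK_FPST_AH || slot == MOCK_FPST_AH_F16;
}

/* OR together the flags of every slot except the two AH ones. */
static uint32_t mock_collect_flags(const mock_float_status *st)
{
    uint32_t flags = 0;

    for (int slot = 0; slot < MOCK_FPST_COUNT; slot++) {
        if (!mock_is_ah_slot(slot)) {
            flags |= st[slot].exc_flags;
        }
    }
    return flags;
}

/* Clearing, by contrast, resets every slot, the AH ones included. */
static void mock_clear_flags(mock_float_status *st)
{
    for (int slot = 0; slot < MOCK_FPST_COUNT; slot++) {
        st[slot].exc_flags = 0;
    }
}
```

With the statuses held in one array, both operations become loops over
the slot index, which is one thing the conversion away from separately
named fields makes possible.
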
 target/arm/cpu.h        |  3 +--
 target/arm/cpu.c        |  2 +-
 target/arm/vfp_helper.c | 10 +++++-----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
  * behaviour when FPCR.AH == 1: they don't update cumulative
  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
- * which means we need an ah_fp_status_f16 as well.
+ * which means we need an FPST_AH_F16 as well.
  *
  * To avoid having to transfer exception bits around, we simply
  * say that the FPSCR cumulative exception flags are the logical
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status fp_status_f16_a32;
                 float_status fp_status_f16_a64;
                 float_status ah_fp_status;
-                float_status ah_fp_status_f16;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
     set_flush_to_zero(1, &env->vfp.ah_fp_status);
     set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
-    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status_f16);
+    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
 
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
-     * We do not merge in flags from ah_fp_status or ah_fp_status_f16, because
+     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
      * they are used for insns that must not set the cumulative exception bits.
      */
 
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
     set_float_exception_flags(0, &env->vfp.ah_fp_status);
-    set_float_exception_flags(0, &env->vfp.ah_fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
 }
 
 static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-        set_flush_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a32);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16_a64);
         set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.ah_fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]);
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status_f16);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
     }
     if (changed & FPCR_AH) {
         bool ah_enabled = val & FPCR_AH;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace the ah_fp_status field with fp_status[FPST_AH].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
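Note: the cpu.h comment says the AH status behaves as if
FPCR.{FZ,FIZ} = {1,1}, while its F16 counterpart still follows
FPCR.FZ16. The sketch below models only that part of the reset and
FPCR-write paths, under loud assumptions: fz_status, the FZ_* slot
names and the fz_* helpers are hypothetical, and the handling of
FPCR.FIZ and FPCR.AH for the A64 status is left out because that code
is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced status: only the two flush controls. */
typedef struct {
    bool flush_to_zero;
    bool flush_inputs_to_zero;
} fz_status;

/* Slot names and order are assumptions made for this sketch. */
enum { FZ_A32, FZ_A64, FZ_A32_F16, FZ_A64_F16, FZ_AH, FZ_AH_F16,
       FZ_SLOT_COUNT };

/* FPCR bit positions used by the sketch. */
#define FZ_BIT_FZ16 (1u << 19)
#define FZ_BIT_FZ   (1u << 24)

/* Reset: everything off, except that the AH slot is pinned to flush. */
static void fz_reset(fz_status *st)
{
    assert(st);
    for (int i = 0; i < FZ_SLOT_COUNT; i++) {
        st[i].flush_to_zero = false;
        st[i].flush_inputs_to_zero = false;
    }
    st[FZ_AH].flush_to_zero = true;
    st[FZ_AH].flush_inputs_to_zero = true;
}

/* Propagate an FPCR change to the slots that track each bit. */
static void fz_write(fz_status *st, uint32_t old_fpcr, uint32_t new_fpcr)
{
    uint32_t changed = old_fpcr ^ new_fpcr;

    if (changed & FZ_BIT_FZ16) {
        bool on = new_fpcr & FZ_BIT_FZ16;
        static const int f16_slots[] = { FZ_A32_F16, FZ_A64_F16, FZ_AH_F16 };

        for (unsigned i = 0; i < sizeof(f16_slots) / sizeof(f16_slots[0]); i++) {
            st[f16_slots[i]].flush_to_zero = on;
            st[f16_slots[i]].flush_inputs_to_zero = on;
        }
    }
    if (changed & FZ_BIT_FZ) {
        bool on = new_fpcr & FZ_BIT_FZ;

        st[FZ_A32].flush_to_zero = on;
        st[FZ_A64].flush_to_zero = on;
        /* FIZ is A64 only, so FZ also controls A32 input flushing. */
        st[FZ_A32].flush_inputs_to_zero = on;
        /* The AH slot is deliberately left alone: it stays pinned. */
    }
}
```
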
 target/arm/cpu.h        | 3 +--
 target/arm/cpu.c        | 6 +++---
 target/arm/vfp_helper.c | 6 +++---
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState;
  * the "standard FPSCR" tracks the FPSCR.FZ16 bit rather than
  * using a fixed value for it.
  *
- * The ah_fp_status is needed because some insns have different
+ * FPST_AH is needed because some insns have different
  * behaviour when FPCR.AH == 1: they don't update cumulative
  * exception flags, they act like FPCR.{FZ,FIZ} = {1,1} and
  * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16,
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
                 float_status fp_status_a64;
                 float_status fp_status_f16_a32;
                 float_status fp_status_f16_a64;
-                float_status ah_fp_status;
             };
         };
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a32);
     arm_set_default_fp_behaviours(&env->vfp.fp_status_f16_a64);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]);
-    arm_set_ah_fp_behaviours(&env->vfp.ah_fp_status);
-    set_flush_to_zero(1, &env->vfp.ah_fp_status);
-    set_flush_inputs_to_zero(1, &env->vfp.ah_fp_status);
+    arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]);
+    set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]);
+    set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_AH]);
     arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH_F16]);
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
     a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used));
     /*
-     * We do not merge in flags from ah_fp_status or FPST_AH_F16, because
+     * We do not merge in flags from FPST_AH or FPST_AH_F16, because
      * they are used for insns that must not set the cumulative exception bits.
      */
 
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.fp_status_f16_a64);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_STD_F16]);
-    set_float_exception_flags(0, &env->vfp.ah_fp_status);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_AH_F16]);
 }
 
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a64);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
-        set_default_nan_mode(dnan_enabled, &env->vfp.ah_fp_status);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_AH_F16]);
     }
     if (changed & FPCR_AH) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace the fp_status_f16_a64 field with fp_status[FPST_A64_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/sme_helper.c |  2 +-
 target/arm/tcg/vec_helper.c |  9 ++++-----
 target/arm/vfp_helper.c     | 16 ++++++++--------
 5 files changed, 14 insertions(+), 16 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace the fp_status_f16_a32 field with fp_status[FPST_A32_F16].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/vec_helper.c |  4 ++--
 target/arm/vfp_helper.c     | 14 +++++++-------
 4 files changed, 10 insertions(+), 11 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace the fp_status_a64 field with fp_status[FPST_A64].

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 -
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/sme_helper.c |  2 +-
 target/arm/tcg/vec_helper.c | 10 +++++-----
 target/arm/vfp_helper.c     | 16 ++++++++--------
 5 files changed, 15 insertions(+), 16 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace the fp_status_a32 field with fp_status[FPST_A32].  As this was
the last of the old named fields, we can remove the anonymous union
and struct.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-15-richard.henderson@linaro.org
[PMM: tweak to account for change to is_ebf()]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
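Note: the anonymous union removed here is what let the earlier patches
convert one field at a time, since each named field overlaid one
element of the array. A minimal sketch of that overlay follows, using
hypothetical demo_* names and only two slots. It assumes a C11 (or GNU
C) compiler for anonymous members, and that the compiler puts no
padding between consecutive members of the same type.

```c
#include <assert.h>

/* Hypothetical stand-in for illustration; not QEMU's float_status. */
typedef struct {
    int rounding_mode;
} demo_float_status;

enum { DEMO_A32, DEMO_A64, DEMO_COUNT };

/*
 * Transitional layout: the array and the named fields share storage,
 * so old-style and new-style accesses reach the same object.
 */
typedef struct {
    union {
        demo_float_status fp_status[DEMO_COUNT];
        struct {
            demo_float_status fp_status_a32;   /* overlays [DEMO_A32] */
            demo_float_status fp_status_a64;   /* overlays [DEMO_A64] */
        };
    };
} demo_vfp;

static void demo_init(demo_vfp *v)
{
    for (int i = 0; i < DEMO_COUNT; i++) {
        v->fp_status[i].rounding_mode = 0;
    }
}

/* Returns nonzero if each named field really aliases its array slot. */
static int demo_aliases_ok(const demo_vfp *v)
{
    return &v->fp_status_a32 == &v->fp_status[DEMO_A32] &&
           &v->fp_status_a64 == &v->fp_status[DEMO_A64];
}
```

Once no caller uses a named field any more, the struct arm of the union
has no members left to keep, and the union collapses to the plain array,
as the cpu.h hunk in this patch does.
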
 target/arm/cpu.h            |  7 +------
 target/arm/cpu.c            |  2 +-
 target/arm/tcg/vec_helper.c |  2 +-
 target/arm/vfp_helper.c     | 18 +++++++++---------
 4 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         uint32_t scratch[8];
 
         /* There are a number of distinct float control structures. */
-        union {
-            float_status fp_status[FPST_COUNT];
-            struct {
-                float_status fp_status_a32;
-            };
-        };
+        float_status fp_status[FPST_COUNT];
 
         uint64_t zcr_el[4];   /* ZCR_EL[1-3] */
         uint64_t smcr_el[4];  /* SMCR_EL[1-3] */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]);
     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]);
     set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]);
-    arm_set_default_fp_behaviours(&env->vfp.fp_status_a32);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]);
     arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]);
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp)
      */
     bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF;
 
-    *statusp = is_a64(env) ? env->vfp.fp_status[FPST_A64] : env->vfp.fp_status_a32;
+    *statusp = env->vfp.fp_status[is_a64(env) ? FPST_A64 : FPST_A32];
     set_default_nan_mode(true, statusp);
 
     if (ebf) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 {
     uint32_t a32_flags = 0, a64_flags = 0;
 
-    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_A32]);
     a32_flags |= get_float_exception_flags(&env->vfp.fp_status[FPST_STD]);
     /* FZ16 does not generate an input denormal exception.  */
     a32_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A32_F16])
@@ -XXX,XX +XXX,XX @@ static void vfp_clear_float_status_exc_flags(CPUARMState *env)
      * values. The caller should have arranged for env->vfp.fpsr to
      * be the architecturally up-to-date exception flag information first.
      */
-    set_float_exception_flags(0, &env->vfp.fp_status_a32);
+    set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A32_F16]);
     set_float_exception_flags(0, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
             i = float_round_to_zero;
             break;
         }
-        set_float_rounding_mode(i, &env->vfp.fp_status_a32);
+        set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32]);
         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]);
         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]);
         set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     }
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]);
         /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]);
     }
     if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
         /*
@@ -XXX,XX +XXX,XX @@ static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
-        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_a32);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A32_F16]);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status[FPST_A64_F16]);
@@ -XXX,XX +XXX,XX @@ void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
         FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
 }
 DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
-DO_VFP_cmp(s, float32, float32, fp_status_a32)
-DO_VFP_cmp(d, float64, float64, fp_status_a32)
+DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
+DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
 #undef DO_VFP_cmp
 
 /* Integer to float and float to integer conversions */
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(fjcvtzs)(float64 value, float_status *status)
 
 uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
 {
-    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status_a32);
+    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status[FPST_A32]);
     uint32_t result = pair;
     uint32_t z = (pair >> 32) == 0;
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Select the float_status by array index rather than by choosing
between two pointers.
No functional change.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-16-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
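Note: a minimal sketch of why this is no functional change, using
hypothetical sel_* names rather than the MVE macros themselves. Both
forms yield the address of the same array element; the second just
moves the conditional inside the subscript.

```c
#include <assert.h>

/* Hypothetical stand-in for illustration; not QEMU's float_status. */
typedef struct {
    int flags;
} sel_status;

enum { SEL_STD, SEL_STD_F16, SEL_COUNT };

/* Old form: choose between two element addresses. */
static sel_status *sel_by_pointer(sel_status *tbl, int esize)
{
    assert(esize == 2 || esize == 4);
    return (esize == 2) ? &tbl[SEL_STD_F16] : &tbl[SEL_STD];
}

/* New form: choose the index, then take one address. */
static sel_status *sel_by_index(sel_status *tbl, int esize)
{
    assert(esize == 2 || esize == 4);
    return &tbl[esize == 2 ? SEL_STD_F16 : SEL_STD];
}
```
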
 target/arm/tcg/mve_helper.c | 40 +++++++++++++------------------------
 1 file changed, 14 insertions(+), 26 deletions(-)

diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ALL(vminnma, minnuma)
                 r[e] = 0;                                               \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(tm & 1)) {                                            \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VFMA(vfmss, 4, float32, true)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) {          \
                 continue;                                               \
             }                                                           \
-            fpst0 = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :  \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst0 = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             fpst1 = fpst0;                                              \
             if (!(mask & 1)) {                                          \
                 scratch_fpst = *fpst0;                                  \
@@ -XXX,XX +XXX,XX @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
         unsigned e;                                             \
         TYPE *m = vm;                                           \
         TYPE ra = (TYPE)ra_in;                                  \
-        float_status *fpst = (ESIZE == 2) ?                     \
-            &env->vfp.fp_status[FPST_STD_F16] :                 \
-            &env->vfp.fp_status[FPST_STD];                       \
+        float_status *fpst =                                    \
+            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
             if (mask & 1) {                                     \
                 TYPE v = m[H##ESIZE(e)];                        \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
             if ((mask & emask) == 0) {                                  \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & (1 << (e * ESIZE)))) {                         \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
         unsigned e;                                                     \
         float_status *fpst;                                             \
         float_status scratch_fpst;                                      \
-        float_status *base_fpst = (ESIZE == 2) ?                        \
-            &env->vfp.fp_status[FPST_STD_F16] :                         \
-            &env->vfp.fp_status[FPST_STD];                               \
+        float_status *base_fpst =                                       \
+            &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD];  \
         uint32_t prev_rmode = get_float_rounding_mode(base_fpst);       \
         set_float_rounding_mode(rmode, base_fpst);                      \
         for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
             if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) {              \
                 continue;                                               \
             }                                                           \
-            fpst = (ESIZE == 2) ? &env->vfp.fp_status[FPST_STD_F16] :   \
-                &env->vfp.fp_status[FPST_STD];                           \
+            fpst = &env->vfp.fp_status[ESIZE == 2 ? FPST_STD_F16 : FPST_STD]; \
             if (!(mask & 1)) {                                          \
                 /* We need the result but without updating flags */     \
                 scratch_fpst = *fpst;                                   \
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Pass the ARMFPStatusFlavour index to DO_VFP_cmp instead of the
fp_status[FOO] member expression, and index env->vfp.fp_status
inside the macro.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
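Note (not part of the commit message): a minimal, self-contained sketch
of the pattern this patch moves to, where a helper-generating macro
receives a plain flavour index and performs the array access itself.
All toy_* / TOY_* names are hypothetical stand-ins invented for this
illustration; they are not QEMU's real types, macros, or helpers.

```c
/* Hypothetical stand-in for float_status: it only counts compare calls. */
typedef struct {
    int compare_calls;
} toy_float_status;

/* Hypothetical stand-in for ARMFPStatusFlavour. */
typedef enum {
    TOY_FPST_A32,
    TOY_FPST_A32_F16,
    TOY_FPST_COUNT
} toy_flavour;

/* Hypothetical stand-in for the CPU state holding one status per flavour. */
typedef struct {
    toy_float_status fp_status[TOY_FPST_COUNT];
} toy_env;

/* Toy comparison: returns -1, 0 or 1 and records the call in the status. */
static int toy_compare(float a, float b, toy_float_status *st)
{
    st->compare_calls++;
    return (a > b) - (a < b);
}

/*
 * FPST is a bare flavour index. The macro body spells out the
 * fp_status[] access, so call sites no longer pass a member expression.
 */
#define TOY_DO_CMP(NAME, FPST)                                   \
    static int toy_cmp_##NAME(float a, float b, toy_env *env)    \
    {                                                            \
        return toy_compare(a, b, &env->fp_status[FPST]);         \
    }

TOY_DO_CMP(s, TOY_FPST_A32)
TOY_DO_CMP(h, TOY_FPST_A32_F16)
```

With this shape, each instantiation names only the flavour, and the
way the status is stored can change in one place (the macro body).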
 target/arm/vfp_helper.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
 void VFP_HELPER(cmp, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env)  \
 { \
     softfloat_to_vfp_compare(env, \
-        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.FPST)); \
+        FLOATTYPE ## _compare_quiet(a, b, &env->vfp.fp_status[FPST])); \
 } \
 void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
 { \
     softfloat_to_vfp_compare(env, \
-        FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
+        FLOATTYPE ## _compare(a, b, &env->vfp.fp_status[FPST])); \
 }
-DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status[FPST_A32_F16])
-DO_VFP_cmp(s, float32, float32, fp_status[FPST_A32])
-DO_VFP_cmp(d, float64, float64, fp_status[FPST_A32])
+DO_VFP_cmp(h, float16, dh_ctype_f16, FPST_A32_F16)
+DO_VFP_cmp(s, float32, float32, FPST_A32)
+DO_VFP_cmp(d, float64, float64, FPST_A32)
 #undef DO_VFP_cmp
 
 /* Integer to float and float to integer conversions */
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Read the FZ16 bit directly from its source, env->vfp.fpcr, rather
than from the proxy via get_flush_inputs_to_zero.  This makes it
clear that it does not matter which of the float_status structures
is used.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-34-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
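Note (not part of the commit message): a minimal sketch of why reading
the control-register bit and reading the derived per-status flag give
the same answer. It assumes a toy model in which writing the control
register copies the FZ16 bit into every half-precision status. The
toy_* / TOY_* names and the propagation function are hypothetical and
only illustrate the idea; they are not QEMU's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit for the toy control register (assumed position). */
#define TOY_FPCR_FZ16 (UINT32_C(1) << 19)

/* Hypothetical stand-in for float_status: one derived flag only. */
typedef struct {
    bool flush_inputs_to_zero;
} toy_float_status;

enum { TOY_A32_F16, TOY_A64_F16, TOY_NFLAVOURS };

typedef struct {
    uint32_t fpcr;                              /* the source of truth */
    toy_float_status fp_status[TOY_NFLAVOURS];  /* derived copies */
} toy_vfp;

/* Writing the register propagates FZ16 to every derived status. */
static void toy_set_fpcr(toy_vfp *vfp, uint32_t val)
{
    bool fz16 = (val & TOY_FPCR_FZ16) != 0;

    vfp->fpcr = val;
    for (int i = 0; i < TOY_NFLAVOURS; i++) {
        vfp->fp_status[i].flush_inputs_to_zero = fz16;
    }
}

/* Old style: read the flag through one particular derived status. */
static bool toy_fz16_via_proxy(const toy_vfp *vfp, int flavour)
{
    return vfp->fp_status[flavour].flush_inputs_to_zero;
}

/* New style: read the bit from the register itself. */
static bool toy_fz16_from_source(const toy_vfp *vfp)
{
    return (vfp->fpcr & TOY_FPCR_FZ16) != 0;
}
```

In this model the proxy read returns the same value whichever flavour
is chosen, so the choice of float_status at the call site carried no
information; reading the register makes that explicit.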
 target/arm/tcg/vec_helper.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a32)(void *vd, void *vn, void *vm,
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
+             env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
         }
     }
     do_fmlal(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-             get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
+             env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
     bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     float_status *status = &env->vfp.fp_status[FPST_A64];
-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
+    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
     int negx = 0, negf = 0;
 
     if (is_s) {
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a32)(void *vd, void *vn, void *vm,
     uint64_t negx = is_s ? 0x8000800080008000ull : 0;
 
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_STD], negx, 0, desc,
-                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A32_F16]));
+                 env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
         }
     }
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status[FPST_A64], negx, negf, desc,
-                 get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]));
+                 env->vfp.fpcr & FPCR_FZ16);
 }
 
 void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
     intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
     intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
     float_status *status = &env->vfp.fp_status[FPST_A64];
-    bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status[FPST_A64_F16]);
+    bool fz16 = env->vfp.fpcr & FPCR_FZ16;
     int negx = 0, negf = 0;
 
     if (is_s) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Sink the common fp_status and fpcr accesses from the callers into
do_fmlal and do_fmlal_idx.  Reorder the arguments to minimize the
re-sorting from the callers' arguments.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-35-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
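Note (not part of the commit message): a minimal sketch of what
"sinking" common code into a shared worker can look like. Instead of
every caller computing a status pointer and the FZ16 flag and passing
both, the worker takes the environment plus a flavour index and derives
them itself. Every name below (toy_do_op, toy_env, TOY_TINY, and so on)
is hypothetical, and the signature is not claimed to match the real
do_fmlal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative control bit and "tiny input" threshold (both assumed). */
#define TOY_FPCR_FZ16 (UINT32_C(1) << 19)
#define TOY_TINY      1e-5f

enum { TOY_FPST_STD, TOY_FPST_A64, TOY_FPST_COUNT };

/* Hypothetical stand-in for float_status: counts how often it is used. */
typedef struct {
    int uses;
} toy_float_status;

typedef struct {
    uint32_t fpcr;
    toy_float_status fp_status[TOY_FPST_COUNT];
} toy_env;

/*
 * Shared worker. The status lookup and the FZ16 read live here, once,
 * rather than being repeated at every call site.
 */
static float toy_do_op(toy_env *env, int fpst_idx, float x)
{
    toy_float_status *status = &env->fp_status[fpst_idx];
    bool fz16 = (env->fpcr & TOY_FPCR_FZ16) != 0;

    status->uses++;
    if (fz16 && x > -TOY_TINY && x < TOY_TINY) {
        return 0.0f;    /* toy model of flushing a tiny input to zero */
    }
    return x;
}

/* Callers shrink to choosing a flavour and forwarding their arguments. */
static float toy_op_a32(toy_env *env, float x)
{
    return toy_do_op(env, TOY_FPST_STD, x);
}

static float toy_op_a64(toy_env *env, float x)
{
    return toy_do_op(env, TOY_FPST_A64, x);
}
```

Keeping env first and the flavour index next in the worker mirrors the
order the callers already have their arguments in, which is the kind of
reordering the commit message describes.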
 target/arm/tcg/vec_helper.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)