This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES
extensions, which are floating-point related. (A summary of exactly
what these are is at the bottom of the cover letter.)
If you'd rather have these patches as a git branch:
https://git.linaro.org/people/pmaydell/qemu-arm.git feat-afp
with human readable web view at:
https://git.linaro.org/people/peter.maydell/qemu-arm.git/log/?h=feat-afp
Changes between v1 and v2:
* first part of the series has been upstreamed
* I've left the first two x86 patches in here, just to avoid having
to use a Based-on: tag. They've both been taken by Paolo already,
they just haven't landed upstream yet.
* the tail-end patches fixing x86 denormal support are not posted
here (indeed I didn't mean to send them in v1!); I'll send those
separately once the underlying softfloat patches are upstream
* the renaming of the FPST_ constants (already upstream) is
carried through into these patches
* name changes in the "allow flushing of output denormals to be
after rounding" patch:
now set_float_ftz_detection(), get_float_ftz_detection(),
float_ftz_after_rounding and float_ftz_before_rounding
* moved select_fpst to translate-a64.h and renamed to select_ah_fpst
* use vec_full_reg_offset() in the write_fp_*reg_merging fns
* drop no-longer-needed float*_input_flush2() calls in the
float*_hs_compare() fns in "implement float_flag_input_denormal_used"
* adopted RTH's patchset, by a mix of merging in fixes to my
patches and adding his (partly on the end, and partly sorted
into the series at appropriate places). I updated commit messages
in a few places (notably standardising them onto "Handle X for
<some insn>" rather than "for <some QEMU function>")
Patches that still need review:
04 fpu: Implement float_flag_input_denormal_used
05 fpu: allow flushing of output denormals to be after rounding
06 target/arm: Define FPCR AH, FIZ, NEP bits
RTH: I kept your r-by tags on the patches where I squashed
in your fixes from your followup series (mostly this is the
changes to use the muladd flags). If you want to re-review
to check that I did the squashing right, those are patches:
37 target/arm: Handle FPCR.AH in negation steps in SVE FCADD
38 target/arm: Handle FPCR.AH in negation steps in FCADD
41 target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
42 target/arm: Handle FPCR.AH in negation in FMLS (vector)
43 target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
44 target/arm: Handle FPCR.AH in SVE FTSSEL
45 target/arm: Handle FPCR.AH in SVE FTMAD
Summary of what FEAT_AFP/FEAT_RPRES are, from v1 cover letter:
FEAT_AFP defines three new control bits in the FPCR, whose
operations are basically independent of each other:
* FPCR.AH: "alternate floating point mode"; this changes floating
point behaviour in a variety of ways, including:
- the sign of a default NaN is 1, not 0
- if FPCR.FZ is also 1, denormals are detected (and flushed to zero)
after rounding with an unbounded exponent has been applied
- FPCR.FZ does not cause denormalized inputs to be flushed to zero
- miscellaneous other corner-case behaviour changes
* FPCR.FIZ: flush denormalized numbers to zero on input for
most instructions
* FPCR.NEP: makes scalar SIMD operations merge the result with
higher vector elements in one of the source registers, instead
of zeroing the higher elements of the destination
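
To make the three behaviours above concrete, here is a rough standalone
sketch in C. It is illustrative only and is not code from this series:
every type and function name in it (fpval, vec128, default_nan_f32(),
scalar_fadd() and so on) is hypothetical, the value model is a bare
"sig * 2^exp" pair rather than softfloat's representation, and the choice
of the first source register as the NEP merge source is an assumption
(the architecture picks the source per instruction).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: float32 default NaN bits; AH=1 sets the sign bit. */
static uint32_t default_nan_f32(bool ah)
{
    return ah ? 0xffc00000u : 0x7fc00000u;
}

/* Hypothetical model of a positive value sig * 2^exp (not a QEMU type). */
typedef struct { uint64_t sig; int exp; } fpval;

static int bit_length(uint64_t x)
{
    int n = 0;
    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* True if v is nonzero and below the float32 minimum normal, 2^-126. */
static bool is_tiny_f32(fpval v)
{
    return v.sig != 0 && bit_length(v.sig) + v.exp <= -126;
}

/* Round to 24 significant bits, ties-to-even, with no exponent limit. */
static fpval round24_unbounded(fpval v)
{
    int drop = bit_length(v.sig) - 24;
    if (drop > 0) {
        uint64_t half = 1ull << (drop - 1);
        uint64_t rem = v.sig & ((1ull << drop) - 1);
        v.sig >>= drop;
        v.exp += drop;
        if (rem > half || (rem == half && (v.sig & 1))) {
            v.sig++;
        }
    }
    return v;
}

/* Would the result be flushed if tininess is checked before rounding? */
static bool flushed_before_rounding(fpval v)
{
    return is_tiny_f32(v);
}

/* Would the result be flushed if tininess is checked after rounding? */
static bool flushed_after_rounding(fpval v)
{
    return is_tiny_f32(round24_unbounded(v));
}

/* Hypothetical 128-bit vector of four float32 elements. */
typedef struct { float e[4]; } vec128;

/*
 * Scalar add writing element 0. With nep=false the upper elements are
 * zeroed; with nep=true they are merged from a source register
 * (assumed here to be n).
 */
static vec128 scalar_fadd(vec128 n, vec128 m, bool nep)
{
    vec128 d = nep ? n : (vec128){ { 0 } };
    d.e[0] = n.e[0] + m.e[0];
    return d;
}
```

For example, the value 0x1ffffff * 2^-151 (just under 2^-126) counts as
tiny if checked before rounding, but rounds up to exactly 2^-126 and so
is not flushed if the check is made after rounding.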
FEAT_RPRES makes single-precision FRECPE and FRSQRTE use a 12-bit
mantissa precision instead of 8-bit when FPCR.AH is set.
thanks
-- PMM
Peter Maydell (50):
target/i386: Do not raise Invalid for 0 * Inf + QNaN
tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
fpu: Add float_class_denormal
fpu: Implement float_flag_input_denormal_used
fpu: allow flushing of output denormals to be after rounding
target/arm: Define FPCR AH, FIZ, NEP bits
target/arm: Implement FPCR.FIZ handling
target/arm: Adjust FP behaviour for FPCR.AH = 1
target/arm: Adjust exception flag handling for AH = 1
target/arm: Add FPCR.AH to tbflags
target/arm: Set up float_status to use for FPCR.AH=1 behaviour
target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE,
FRSQRTS
target/arm: Use FPST_FPCR_AH for BFCVT* insns
target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
target/arm: Add FPCR.NEP to TBFLAGS
target/arm: Define and use new write_fp_*reg_merging() functions
target/arm: Handle FPCR.NEP for 3-input scalar operations
target/arm: Handle FPCR.NEP for BFCVT scalar
target/arm: Handle FPCR.NEP for 1-input scalar operations
target/arm: Handle FPCR.NEP in do_cvtf_scalar()
target/arm: Handle FPCR.NEP for scalar FABS and FNEG
target/arm: Handle FPCR.NEP for FCVTXN (scalar)
target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
target/arm: Implement FPCR.AH handling of negation of NaN
target/arm: Implement FPCR.AH handling for scalar FABS and FABD
target/arm: Handle FPCR.AH in vector FABD
target/arm: Handle FPCR.AH in SVE FNEG
target/arm: Handle FPCR.AH in SVE FABS
target/arm: Handle FPCR.AH in SVE FABD
target/arm: Handle FPCR.AH in negation steps in SVE FCADD
target/arm: Handle FPCR.AH in negation steps in FCADD
target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
target/arm: Handle FPCR.AH in negation in FMLS (vector)
target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
target/arm: Handle FPCR.AH in SVE FTSSEL
target/arm: Handle FPCR.AH in SVE FTMAD
target/arm: Enable FEAT_AFP for '-cpu max'
target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
target/arm: Implement increased precision FRECPE
target/arm: Implement increased precision FRSQRTE
target/arm: Enable FEAT_RPRES for -cpu max
Richard Henderson (19):
target/arm: Handle FPCR.AH in vector FCMLA
target/arm: Handle FPCR.AH in FCMLA by index
target/arm: Handle FPCR.AH in SVE FCMLA
target/arm: Handle FPCR.AH in FMLSL (by element and vector)
target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
target/arm: Introduce CPUARMState.vfp.fp_status[]
target/arm: Remove standard_fp_status_f16
target/arm: Remove standard_fp_status
target/arm: Remove ah_fp_status_f16
target/arm: Remove ah_fp_status
target/arm: Remove fp_status_f16_a64
target/arm: Remove fp_status_f16_a32
target/arm: Remove fp_status_a64
target/arm: Remove fp_status_a32
target/arm: Simplify fp_status indexing in mve_helper.c
target/arm: Simplify DO_VFP_cmp in vfp_helper.c
target/arm: Read fz16 from env->vfp.fpcr
target/arm: Sink fp_status and fpcr access into do_fmlal*
docs/system/arm/emulation.rst | 2 +
include/fpu/softfloat-helpers.h | 11 +
include/fpu/softfloat-types.h | 41 +-
target/arm/cpu-features.h | 10 +
target/arm/cpu.h | 97 ++--
target/arm/helper.h | 26 +
target/arm/internals.h | 6 +
target/arm/tcg/helper-a64.h | 13 +
target/arm/tcg/helper-sve.h | 120 +++++
target/arm/tcg/translate-a64.h | 13 +
target/arm/tcg/translate.h | 54 +--
target/arm/tcg/vec_internal.h | 35 ++
target/mips/fpu_helper.h | 6 +
fpu/softfloat.c | 66 ++-
target/alpha/cpu.c | 7 +
target/arm/cpu.c | 46 +-
target/arm/helper.c | 2 +-
target/arm/tcg/cpu64.c | 2 +
target/arm/tcg/helper-a64.c | 151 +++---
target/arm/tcg/hflags.c | 13 +
target/arm/tcg/mve_helper.c | 44 +-
target/arm/tcg/sme_helper.c | 4 +-
target/arm/tcg/sve_helper.c | 367 +++++++++++----
target/arm/tcg/translate-a64.c | 782 +++++++++++++++++++++++++------
target/arm/tcg/translate-sve.c | 193 ++++++--
target/arm/tcg/vec_helper.c | 387 ++++++++++-----
target/arm/vfp_helper.c | 372 ++++++++++++---
target/hppa/fpu_helper.c | 11 +
target/i386/tcg/fpu_helper.c | 13 +-
target/mips/msa.c | 9 +
target/ppc/cpu_init.c | 3 +
target/rx/cpu.c | 8 +
target/sh4/cpu.c | 8 +
target/tricore/helper.c | 1 +
tests/fp/fp-bench.c | 1 +
tests/tcg/x86_64/fma.c | 109 +++++
fpu/softfloat-parts.c.inc | 132 +++++-
tests/tcg/x86_64/Makefile.target | 1 +
38 files changed, 2452 insertions(+), 714 deletions(-)
create mode 100644 tests/tcg/x86_64/fma.c
--
2.34.1