Expose enough softfloat infrastructure so that we can move the
s390_divide_to_integer code back into target/s390x/, and to
perform the Arm FPDotAdd operation without using either float64
intermediates or round-to-odd.

The motivation is the Arm FP8DotAdd operation, which has up to
8 multiplies, 9 additions, and 1 scaling operation before rounding.

Rather than export the code as-is, with functions that sometimes
(but not always) return a pointer to one of the inputs, and may
or may not modify one or all of them as scratch space, return a
new structure instead.

This turns out to improve the generated code as well, because we
compile with -ftrivial-auto-var-init=zero.  A pattern such as

    FloatParts t;
    foo(&t, ...);

will zero t before calling foo, whereas

    FloatParts t = foo(...);

will not.
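
A minimal sketch of the two calling patterns follows.  DemoParts and
the demo_mul* helpers are hypothetical stand-ins invented for this
illustration, not the real FloatParts64 layout or the partsN routines,
and the "multiply" is a toy that ignores normalization and rounding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a decomposed float; not the QEMU type. */
typedef struct {
    uint64_t frac;
    int32_t exp;
    bool sign;
} DemoParts;

/* Out-pointer style: the result is written through r. */
static void demo_mul_ptr(DemoParts *r, const DemoParts *a,
                         const DemoParts *b)
{
    r->sign = a->sign != b->sign;
    r->exp = a->exp + b->exp;
    r->frac = a->frac * b->frac;   /* toy product, no normalization */
}

/* Return-by-value style: inputs are untouched, result is a new struct. */
static DemoParts demo_mul(DemoParts a, DemoParts b)
{
    return (DemoParts){
        .frac = a.frac * b.frac,   /* toy product, no normalization */
        .exp = a.exp + b.exp,
        .sign = a.sign != b.sign,
    };
}

static DemoParts caller_ptr(DemoParts a, DemoParts b)
{
    /*
     * Declared without an initializer: under
     * -ftrivial-auto-var-init=zero the compiler zero-fills t here,
     * before demo_mul_ptr overwrites every field.
     */
    DemoParts t;
    demo_mul_ptr(&t, &a, &b);
    return t;
}

static DemoParts caller_val(DemoParts a, DemoParts b)
{
    /* t is initialized by the call itself, so no zero-fill is added. */
    DemoParts t = demo_mul(a, b);
    return t;
}
```

Both callers compute the same value; the difference is only in whether
t has an uninitialized declaration for the option to act on.  Whether
an optimizer later removes the redundant zeroing in the first form
depends on inlining and is not something this sketch demonstrates.
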
I have not converted everything, only enough to do the testing
of the target/ patches at the end. Comments?
r~
Richard Henderson (84):
fpu: Drop parts_canonicalize
fpu: Drop parts_uncanon
fpu: Drop parts_uncanon_normal
fpu: Drop parts_default_nan
fpu: Drop parts_silence_nan
fpu: Drop parts_return_nan
fpu: Drop parts_pick_nan
fpu: Drop parts_pick_nan_muladd
fpu: Reverse the order of softfloat-parts* inclusions
fpu: Drop parts_{add,sub}_normal
fpu: Drop parts_addsub
fpu: Drop parts_mul
fpu: Drop parts_muladd_scalbn
fpu: Drop parts_div
fpu: Drop parts_modrem
fpu: Drop parts_sqrt
fpu: Drop parts_round_to_int_normal
fpu: Drop parts_round_to_int
fpu: Drop parts_float_to_sint
fpu: Drop parts_float_to_uint
fpu: Drop parts_float_to_sint_modulo
fpu: Drop parts_sint_to_float
fpu: Drop parts_uint_to_float
fpu: Drop parts_minmax
fpu: Drop parts_compare
fpu: Drop parts_scalbn
fpu: Drop parts_log2
fpu: Drop parts_float_to_float
fpu: Drop PARTS_GENERIC_64_128{_256}
fpu: Drop FRAC_GENERIC_64_128{_256}
fpu: Constify frac{64,128,256}_* inputs
fpu: Return structure from unpack_raw64
fpu: Return struct from float4_e2m1_unpack_canonical
fpu: Return struct from float8_e4m3_unpack_canonical
fpu: Return struct from float8_e5m2_unpack_canonical
fpu: Inline float16_unpack_raw into callers
fpu: Return struct from float16a_unpack_canonical
fpu: Return struct from float16_unpack_canonical
fpu: Inline bfloat16_unpack_raw into callers
fpu: Return struct from bfloat16_unpack_canonical
fpu: Inline float32_unpack_raw into callers
fpu: Inline float64_unpack_raw into callers
fpu: Return struct from float{32,64}_unpack_canonical
fpu: Inline floatx80_unpack_raw into only caller
fpu: Return struct from float128_unpack_raw
fpu: Return struct from float128_unpack_canonical
fpu: Change parts_float_to_float_narrow to parts128_to_parts64
fpu: Change parts_float_to_float_widen to parts64_to_parts128
fpu: Inline float8_e4m3_pack_raw to single caller
fpu: Inline float8_e5m2_pack_raw into single caller
fpu: Inline float16_pack_raw into callers
fpu: Inline bfloat16_pack_raw into callers
fpu: Inline float32_pack_raw into callers
fpu: Inline float64_pack_raw into callers
fpu: Mark unpack_raw64 QEMU_ALWAYS_INLINE
fpu: Mark pack_raw64 QEMU_ALWAYS_INLINE
fpu: Split FloatParts{64,128} to softfloat-parts.h
fpu: Export FloatFmt structures
fpu: Export unpack_canonical and round_pack_canonical routines
fpu: Return struct from parts{64,128}_default_nan
fpu: Return struct from parts{64,128}_silence_nan
fpu: Return struct from parts{64,128}_return_nan
fpu: Sink exp_bias adjustment in float64r32_pack_raw
fpu: Return struct from parts{64,128}_pick_nan
fpu: Return struct from parts{64,128}_div
fpu: Return struct from parts{64,128}_round_to_int
fpu: Use parts64_round_to_int in parts_s390_divide_to_integer
fpu: Export default_nan and pick_nan routines
fpu: Introduce parts64_round_canonical
fpu: Export parts{64,128}_div
fpu: Export parts{64,128}_round_to_int
fpu: Return struct from parts{64,128}_pick_nan_muladd
fpu: Introduce record_denormals_used
fpu: Return struct from parts{64,128}_muladd_scalbn
fpu: Drop QEMU_FLATTEN from muladd routines
fpu: Export parts{64,128}_compare
fpu: Return struct from parts{64,128}_mul
fpu: Hoist nan check in partsN_addsub
fpu: Return struct from parts{64,128}_addsub
fpu: Simplify 0 +/- N case in parts_addsub
target/s390x: Move float{32,64}_s390_divide_to_integer
target/arm: Use FloatParts64 in bfdotadd_ebf
target/arm: Drop oddstatus from is_ebf and bfdotadd_ebf
target/arm: Use FloatParts64 in f16_dotadd
include/fpu/softfloat-parts.h | 223 ++++
include/fpu/softfloat.h | 11 -
target/arm/tcg/vec_internal.h | 12 +-
fpu/softfloat.c | 2124 +++++++++++-------------------
target/arm/tcg/sme_helper.c | 102 +-
target/arm/tcg/vec_helper.c | 127 +-
target/s390x/tcg/fpu_helper.c | 135 ++
fpu/softfloat-parts-addsub.c.inc | 22 +-
fpu/softfloat-parts.c.inc | 723 ++++------
fpu/softfloat-specialize.c.inc | 42 +-
10 files changed, 1529 insertions(+), 1992 deletions(-)
create mode 100644 include/fpu/softfloat-parts.h
--
2.43.0