[RFC PATCH 00/84] fpu: Export some internals for targets

Richard Henderson posted 84 patches 4 weeks ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260426134002.865628-1-richard.henderson@linaro.org
Maintainers: Aurelien Jarno <aurelien@aurel32.net>, Peter Maydell <peter.maydell@linaro.org>, "Alex Bennée" <alex.bennee@linaro.org>, Richard Henderson <richard.henderson@linaro.org>, Ilya Leoshkevich <iii@linux.ibm.com>, David Hildenbrand <david@kernel.org>, Cornelia Huck <cohuck@redhat.com>, Eric Farman <farman@linux.ibm.com>, Matthew Rosato <mjrosato@linux.ibm.com>
There is a newer version of this series
include/fpu/softfloat-parts.h    |  223 ++++
include/fpu/softfloat.h          |   11 -
target/arm/tcg/vec_internal.h    |   12 +-
fpu/softfloat.c                  | 2124 +++++++++++-------------------
target/arm/tcg/sme_helper.c      |  102 +-
target/arm/tcg/vec_helper.c      |  127 +-
target/s390x/tcg/fpu_helper.c    |  135 ++
fpu/softfloat-parts-addsub.c.inc |   22 +-
fpu/softfloat-parts.c.inc        |  723 ++++------
fpu/softfloat-specialize.c.inc   |   42 +-
10 files changed, 1529 insertions(+), 1992 deletions(-)
create mode 100644 include/fpu/softfloat-parts.h
Expose enough softfloat infrastructure so that we can move the
s390_divide_to_integer code back into target/s390x/, and
perform the Arm FPDotAdd operation without using either float64
intermediates or round-to-odd.

The motivation is the Arm FP8DotAdd operation, which has up to
8 multiplies, 9 additions, and 1 scaling operation before rounding.
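
As a rough illustration of why a long chain of operations wants a single
rounding at the end, here is a minimal sketch using plain float and double.
It is not FP8 and not softfloat code: the demo_* names are hypothetical, the
default round-to-nearest-even mode is assumed, and double merely stands in
for "an intermediate that has not been rounded yet", which is the role the
decomposed parts play in this series.

```c
#include <assert.h>

/* Round after every addition: each partial sum is stored back to float. */
static float demo_sum_round_each(float acc, const float *v, int n)
{
    assert(v || n == 0);
    for (int i = 0; i < n; i++) {
        /* volatile forces the partial sum to be rounded to float here. */
        volatile float t = acc + v[i];
        acc = t;
    }
    return acc;
}

/* Accumulate in a wider type and round to float once, at the end. */
static float demo_sum_round_once(float acc, const float *v, int n)
{
    assert(v || n == 0);
    double wide = acc;
    for (int i = 0; i < n; i++) {
        wide += v[i];
    }
    return (float)wide;
}
```

Starting from 2^24 and adding 1.0f twice, the first routine loses both
increments to rounding, while the second delivers 2^24 + 2.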

Rather than export the code as-is, with functions that sometimes
(but not always) return a pointer to one of the inputs, and that may
or may not modify one or all of the inputs as scratch space, change
them to return a new structure instead.
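
The difference between the two interface shapes can be sketched as follows.
This is only an illustration under stated assumptions: DemoParts and both
demo_max_mag* functions are hypothetical and are not the softfloat types or
routines touched by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a decomposed float; not QEMU's FloatParts64. */
typedef struct {
    bool sign;
    int32_t exp;
    uint64_t frac;
} DemoParts;

/* True if |a| >= |b| for this toy representation. */
static bool demo_mag_ge(const DemoParts *a, const DemoParts *b)
{
    return a->exp > b->exp || (a->exp == b->exp && a->frac >= b->frac);
}

/*
 * Old shape: the result aliases one of the inputs, and that input is
 * modified in place.  The caller cannot tell which one without
 * comparing pointers.
 */
static DemoParts *demo_max_mag_inplace(DemoParts *a, DemoParts *b)
{
    assert(a && b);
    DemoParts *r = demo_mag_ge(a, b) ? a : b;
    r->sign = false;            /* clobbers the caller's input */
    return r;
}

/* New shape: the inputs are const and the result is a fresh value. */
static DemoParts demo_max_mag(const DemoParts *a, const DemoParts *b)
{
    assert(a && b);
    DemoParts r = demo_mag_ge(a, b) ? *a : *b;
    r.sign = false;             /* only the copy is changed */
    return r;
}
```

With the second shape the caller can keep using both inputs after the call,
which is the property a target helper would rely on.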

This turns out to improve the generated code as well, because we
compile with -ftrivial-auto-var-init=zero.  A pattern such as

    FloatParts t;
    foo(&t, ...);

will zero t before calling foo, whereas

    FloatParts t = foo(...);

will not.
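
A slightly fuller sketch of the two patterns, for comparison.  DemoParts and
the demo_* functions are hypothetical names used only for illustration.
Whether the compiler can drop the extra zeroing in the first form depends on
inlining and optimization level, so treat the comments as describing the
typical case rather than a guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a decomposed float; not QEMU's FloatParts64. */
typedef struct {
    bool sign;
    int32_t exp;
    uint64_t frac;
} DemoParts;

/* Out-parameter style: the caller has to declare storage first. */
static void demo_unpack_out(DemoParts *p, uint32_t raw)
{
    assert(p);
    p->sign = raw >> 31;
    p->exp = (raw >> 23) & 0xff;
    p->frac = raw & 0x7fffff;
}

/* Return-by-value style: the result is the initializer. */
static DemoParts demo_unpack(uint32_t raw)
{
    return (DemoParts){
        .sign = raw >> 31,
        .exp = (raw >> 23) & 0xff,
        .frac = raw & 0x7fffff,
    };
}

static int32_t demo_exp_via_out(uint32_t raw)
{
    DemoParts t;                /* no initializer: zeroed under
                                   -ftrivial-auto-var-init=zero */
    demo_unpack_out(&t, raw);   /* ... then every field is overwritten */
    return t.exp;
}

static int32_t demo_exp_via_value(uint32_t raw)
{
    DemoParts t = demo_unpack(raw);   /* has an initializer: the option
                                         does not add a zeroing store */
    return t.exp;
}
```

Both helpers compute the same result; only the amount of work done to set up
t differs.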

I have not converted everything, only enough to do the testing
of the target/ patches at the end.  Comments?


r~


Richard Henderson (84):
  fpu: Drop parts_canonicalize
  fpu: Drop parts_uncanon
  fpu: Drop parts_uncanon_normal
  fpu: Drop parts_default_nan
  fpu: Drop parts_silence_nan
  fpu: Drop parts_return_nan
  fpu: Drop parts_pick_nan
  fpu: Drop parts_pick_nan_muladd
  fpu: Reverse the order of softfloat-parts* inclusions
  fpu: Drop parts_{add,sub}_normal
  fpu: Drop parts_addsub
  fpu: Drop parts_mul
  fpu: Drop parts_muladd_scalbn
  fpu: Drop parts_div
  fpu: Drop parts_modrem
  fpu: Drop parts_sqrt
  fpu: Drop parts_round_to_int_normal
  fpu: Drop parts_round_to_int
  fpu: Drop parts_float_to_sint
  fpu: Drop parts_float_to_uint
  fpu: Drop parts_float_to_sint_modulo
  fpu: Drop parts_sint_to_float
  fpu: Drop parts_uint_to_float
  fpu: Drop parts_minmax
  fpu: Drop parts_compare
  fpu: Drop parts_scalbn
  fpu: Drop parts_log2
  fpu: Drop parts_float_to_float
  fpu: Drop PARTS_GENERIC_64_128{_256}
  fpu: Drop FRAC_GENERIC_64_128{_256}
  fpu: Constify frac{64,128,256}_* inputs
  fpu: Return structure from unpack_raw64
  fpu: Return struct from float4_e2m1_unpack_canonical
  fpu: Return struct from float8_e4m3_unpack_canonical
  fpu: Return struct from float8_e5m2_unpack_canonical
  fpu: Inline float16_unpack_raw into callers
  fpu: Return struct from float16a_unpack_canonical
  fpu: Return struct from float16_unpack_canonical
  fpu: Inline bfloat16_unpack_raw into callers
  fpu: Return struct from bfloat16_unpack_canonical
  fpu: Inline float32_unpack_raw into callers
  fpu: Inline float64_unpack_raw into callers
  fpu: Return struct from float{32,64}_unpack_canonical
  fpu: Inline floatx80_unpack_raw into only caller
  fpu: Return struct from float128_unpack_raw
  fpu: Return struct from float128_unpack_canonical
  fpu: Change parts_float_to_float_narrow to parts128_to_parts64
  fpu: Change parts_float_to_float_widen to parts64_to_parts128
  fpu: Inline float8_e4m3_pack_raw to single caller
  fpu: Inline float8_e5m2_pack_raw into single caller
  fpu: Inline float16_pack_raw into callers
  fpu: Inline bfloat16_pack_raw into callers
  fpu: Inline float32_pack_raw into callers
  fpu: Inline float64_pack_raw into callers
  fpu: Mark unpack_raw64 QEMU_ALWAYS_INLINE
  fpu: Mark pack_raw64 QEMU_ALWAYS_INLINE
  fpu: Split FloatParts{64,128} to softfloat-parts.h
  fpu: Export FloatFmt structures
  fpu: Export unpack_canonical and round_pack_canonical routines
  fpu: Return struct from parts{64,128}_default_nan
  fpu: Return struct from parts{64,128}_silence_nan
  fpu: Return struct from parts{64,128}_return_nan
  fpu: Sink exp_bias adjustment in float64r32_pack_raw
  fpu: Return struct from parts{64,128}_pick_nan
  fpu: Return struct from parts{64,128}_div
  fpu: Return struct from parts{64,128}_round_to_int
  fpu: Use parts64_round_to_int in parts_s390_divide_to_integer
  fpu: Export default_nan and pick_nan routines
  fpu: Introduce parts64_round_canonical
  fpu: Export parts{64,128}_div
  fpu: Export parts{64,128}_round_to_int
  fpu: Return struct from parts{64,128}_pick_nan_muladd
  fpu: Introduce record_denormals_used
  fpu: Return struct from parts{64,128}_muladd_scalbn
  fpu: Drop QEMU_FLATTEN from muladd routines
  fpu: Export parts{64,128}_compare
  fpu: Return struct from parts{64,128}_mul
  fpu: Hoist nan check in partsN_addsub
  fpu: Return struct from parts{64,128}_addsub
  fpu: Simplify 0 +/- N case in parts_addsub
  target/s390x: Move float{32,64}_s390_divide_to_integer
  target/arm: Use FloatParts64 in bfdotadd_ebf
  target/arm: Drop oddstatus from is_ebf and bfdotadd_ebf
  target/arm: Use FloatParts64 in f16_dotadd

-- 
2.43.0