[v10] target/arm: Implement FEAT_FP8

[PATCH v10 00/45] target/arm: Implement FEAT_FP8

Richard Henderson posted 45 patches 2 hours ago

Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20260609192110.752384-1-richard.henderson@linaro.org

Maintainers: Peter Maydell <peter.maydell@linaro.org>, Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>, Laurent Vivier <laurent@vivier.eu>, Helge Deller <deller@gmx.de>

target/arm/cpu-features.h        |  83 +++
target/arm/helper-fp8.h          |  14 +
target/arm/tcg/helper-defs.h     |   6 +
target/arm/tcg/helper-fp8-defs.h |  40 ++
target/arm/tcg/helper-sme-defs.h |   2 +-
target/arm/tcg/translate-a64.h   |   1 +
target/arm/tcg/translate.h       |   7 +-
target/arm/tcg/vec_internal.h    |  12 +-
linux-user/aarch64/elfload.c     |  13 +
target/arm/cpu.c                 |  20 +-
target/arm/tcg/cpu64.c           |  16 +
target/arm/tcg/fp8_helper.c      | 859 +++++++++++++++++++++++++++++++
target/arm/tcg/sme_helper.c      | 105 ++--
target/arm/tcg/translate-a64.c   | 160 ++++++
target/arm/tcg/translate-sme.c   |  71 ++-
target/arm/tcg/translate-sve.c   | 245 ++++++++-
target/arm/tcg/vec_helper.c      | 194 ++++---
docs/system/arm/emulation.rst    |  11 +
target/arm/tcg/a64.decode        |  39 ++
target/arm/tcg/meson.build       |   1 +
target/arm/tcg/sme.decode        |  25 +-
target/arm/tcg/sve.decode        |  48 +-
22 files changed, 1815 insertions(+), 157 deletions(-)
create mode 100644 target/arm/helper-fp8.h
create mode 100644 target/arm/tcg/helper-fp8-defs.h
create mode 100644 target/arm/tcg/fp8_helper.c

Posted by Richard Henderson 2 hours ago

Changes for v10:
  - bfdotadd_ebf and f16_dotadd fixes
  - streaming fixes
  - comment fixes
  - All patches reviewed.

r~

Richard Henderson (45):
  target/arm: Use FloatParts64 in bfdotadd_ebf
  target/arm: Drop oddstatus from is_ebf and bfdotadd_ebf
  target/arm: Use FloatParts64 in f16_dotadd
  target/arm: Generalize TRANS_FEAT_STREAMING_SME2
  target/arm: Introduce arm_init_fp_status
  target/arm: Set e4m3_nan_is_snan
  target/arm: Implement BF1CVTL, BF1CVTL2, BF2CVTL, BF2CVTL2 for AdvSIMD
  target/arm: Implement BF1CVT, BF1CVTLT, BF2CVT, BF2CVTLT for SVE
  target/arm: Rename SME BFCVT patterns to BFCVT_hs
  target/arm: Implement BF1CVT, BF1CVTL, BF2CVT, BF2CVTL for SME
  target/arm: Implement F1CVTL, F1CVTL2, F2CVTL, F2CVTL2 for AdvSIMD
  target/arm: Implement F1CVT, F1CVTLT, F2CVT, F2CVTLT for SVE
  target/arm: Implement F1CVT, F1CVTL, F2CVT, F2CVTL for SME
  target/arm: Implement BFCVTN for SVE
  target/arm: Implement FCVTN (16- to 8-bit fp) for AdvSIMD
  target/arm: Implement FCVTN, FCVTN2 (32- to 8-bit fp) for AdvSIMD
  target/arm: Implement FCVTN (16- to 8-bit fp) for SVE
  target/arm: Implement FCVTNB, FCVTNT for SVE
  target/arm: Implement FCVT (FP16 to FP8) for SME
  target/arm: Implement FCVT, FCVTN (FP32 to FP8) for SME
  target/arm: Implement LUTI2, LUTI4 for AdvSIMD
  target/arm: Implement LUTI2, LUTI4 for SVE
  target/arm: Enable FEAT_LUT for -cpu max
  target/arm: Enable FEAT_FP8 for -cpu max
  target/arm: Update ID_AA64SMFR0_EL1 fields to ARM M.b
  target/arm: Implement MOVT (vector to table)
  target/arm: Implement LUTI4 (four registers, 8-bit)
  target/arm: Enable FEAT_SME_LUTv2 for -cpu max
  target/arm: Implement FMLALB, FMLALT for AdvSIMD
  target/arm: Implement FMLALB, FMLALT (FP8 to FP16) for SVE
  target/arm: Implement FMLALL{BB,BT,TB,TT} for AdvSIMD
  target/arm: Implement FMLALL{BB,BT,TB,TT} for SVE
  target/arm: Enable FEAT_FP8FMA, FEAT_SSVE_FP8FMA for -cpu max
  target/arm: Implement FDOT (FP8 to FP32) for AdvSIMD
  target/arm: Implement FDOT (FP8 to FP32) for SVE
  target/arm: Enable FEAT_FP8DOT4, FEAT_SSVE_FP8DOT4 for -cpu max
  target/arm: Implement FDOT (FP8 to FP16) for AdvSIMD
  target/arm: Implement FDOT (FP8 to FP16) for SVE
  target/arm: Enable FEAT_FP8DOT2, FEAT_SSVE_FP8DOT2 for -cpu max
  target/arm: Implement FMMLA (FP8 to FP32) for AdvSIMD
  target/arm: Implement FMMLA (FP8 to FP32) for SVE
  target/arm: Enable FEAT_F8F32MM for -cpu max
  target/arm: Implement FMMLA (FP8 to FP16) for AdvSIMD
  target/arm: Implement FMMLA (FP8 to FP16) for SVE
  target/arm: Enable FEAT_F8F16MM for -cpu max

-- 
2.43.0