[PATCH v4 00/60] target/arm: Implement FEAT_FP8

Richard Henderson posted 60 patches 2 weeks, 3 days ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20260507234413.643512-1-richard.henderson@linaro.org
Maintainers: Peter Maydell <peter.maydell@linaro.org>, Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>, Leif Lindholm <leif.lindholm@oss.qualcomm.com>
There is a newer version of this series
[PATCH v4 00/60] target/arm: Implement FEAT_FP8
Posted by Richard Henderson 2 weeks, 3 days ago
Based-on: 20260507160959.449170-1-richard.henderson@linaro.org
("[PATCH v3 00/11] fpu: Export some internals for targets")

Changes for v4:
  - TF-A firmware update for FEAT_FPMR
  - Rewrite decode/encode of FP8 data types
  - Rewrite FEAT_FP8FMA helpers
  - Enable FEAT_FP8DOT{4,2}
  - Enable FEAT_F8F{32,16}MM

This completes the set of 8-bit floating point features.
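
As background for the decode/encode rewrite listed above, here is a minimal Python sketch of how a single FP8 byte maps to a value. It assumes the two OFP8 encodings, E5M2 (bias 15, IEEE-style infinities and NaNs) and E4M3 (bias 7, no infinities, a single NaN mantissa pattern), and it ignores the FPMR scaling and saturation controls entirely. The name decode_fp8 and the format strings are hypothetical and are not taken from the series; the helpers in fp8_helper.c are C and may be structured quite differently.

```python
import math


def decode_fp8(byte: int, fmt: str) -> float:
    """Decode one FP8 byte to a Python float (illustrative sketch only)."""
    # Assumed field layouts: sign bit, then exponent, then mantissa.
    if fmt == "e5m2":
        exp_bits, man_bits, bias = 5, 2, 15
    elif fmt == "e4m3":
        exp_bits, man_bits, bias = 4, 3, 7
    else:
        raise ValueError(f"unknown format: {fmt}")

    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if fmt == "e5m2" and exp == exp_max:
        # IEEE-style: an all-ones exponent encodes infinity or NaN.
        return sign * math.inf if man == 0 else math.nan
    if fmt == "e4m3" and exp == exp_max and man == man_max:
        # E4M3 has no infinities; only S.1111.111 is NaN.
        return math.nan
    if exp == 0:
        # Zero or subnormal: no implicit leading one.
        return sign * math.ldexp(man, 1 - bias - man_bits)
    # Normal number: restore the implicit leading one.
    return sign * math.ldexp((1 << man_bits) | man, exp - bias - man_bits)
```

Under these assumptions, decode_fp8(0x38, "e4m3") and decode_fp8(0x3C, "e5m2") both yield 1.0, and 0x7E in E4M3 is the largest finite value, 448.0.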


r~


Pierrick Bouvier (1):
  tests/functional/aarch64/rme: update images to support FEAT_FP8

Richard Henderson (59):
  target/arm: Implement ID_AA64ISAR3
  target/arm: Implement FEAT_FAMINMAX for AdvSIMD
  target/arm: Implement FEAT_FAMINMAX for SME
  target/arm: Implement FEAT_FAMINMAX for SVE
  target/arm: Enable FEAT_FAMINMAX for -cpu max
  target/arm: Update SCR bits for Arm ARM M.a.a
  target/arm: Update HCRX bits for Arm ARM M.a.a
  target/arm: Introduce FPMR
  target/arm: Update SCTLR bits for FEAT_FPMR
  target/arm: Enable EnFPM bits for FEAT_FPMR
  target/arm: Clear FPMR on ResetSVEState
  target/arm: Add FPMR_EL to TBFLAGS
  target/arm: Trap direct accesses to FPMR
  target/arm: Enable FEAT_FPMR for -cpu max
  target/arm: Implement ID_AA64FPFR0
  target/arm: Add isar_feature_aa64_f8cvt
  target/arm: Implement FSCALE for AdvSIMD
  target/arm: Implement FSCALE for SME
  target/arm: Split vector-type.h from cpu.h
  target/arm: Move vectors_overlap to vec_internal.h
  target/arm: Implement BF1CVTL, BF1CVTL2, BF2CVTL, BF2CVTL2 for AdvSIMD
  target/arm: Implement BF1CVT, BF1CVTLT, BF2CVT, BF2CVTLT for SVE
  target/arm: Rename SME BFCVT patterns to BFCVT_hs
  target/arm: Implement BF1CVT, BF1CVTL, BF2CVT, BF2CVTL for SME
  target/arm: Implement F1CVTL, F1CVTL2, F2CVTL, F2CVTL2 for AdvSIMD
  target/arm: Implement F1CVT, F1CVTLT, F2CVT, F2CVTLT for SVE
  target/arm: Implement F1CVT, F1CVTL, F2CVT, F2CVTL for SME
  target/arm: Implement BFCVTN for SVE
  target/arm: Implement FCVTN (16- to 8-bit fp) for AdvSIMD
  target/arm: Implement FCVTN, FCVTN2 (32- to 8-bit fp) for AdvSIMD
  target/arm: Implement FCVTN (16- to 8-bit fp) for SVE
  target/arm: Implement FCVTNB, FCVTNT for SVE
  target/arm: Implement FCVT (FP16 to FP8) for SME
  target/arm: Implement FCVT, FCVTN (FP32 to FP8) for SME
  target/arm: Implement LUTI2, LUTI4 for AdvSIMD
  target/arm: Implement LUTI2, LUTI4 for SVE
  target/arm: Enable FEAT_LUT for -cpu max
  target/arm: Enable FEAT_FP8 for -cpu max
  target/arm: Update ID_AA64SMFR0_EL1 fields to ARM M.b
  target/arm: Implement MOVT (vector to table)
  target/arm: Implement LUTI4 (four registers, 8-bit)
  target/arm: Enable FEAT_SME_LUTv2 for -cpu max
  target/arm: Implement FMLALB, FMLALT for AdvSIMD
  target/arm: Implement FMLALB, FMLALT (FP8 to FP16) for SVE
  target/arm: Implement FMLALL{BB,BT,TB,TT} for AdvSIMD
  target/arm: Implement FMLALL{BB,BT,TB,TT} for SVE
  target/arm: Enable FEAT_FP8FMA, FEAT_SSVE_FP8FMA for -cpu max
  target/arm: Implement FDOT (FP8 to FP32) for AdvSIMD
  target/arm: Implement FDOT (FP8 to FP32) for SVE
  target/arm: Enable FEAT_FP8DOT4, FEAT_SSVE_FP8DOT4 for -cpu max
  target/arm: Implement FDOT (FP8 to FP16) for AdvSIMD
  target/arm: Implement FDOT (FP8 to FP16) for SVE
  target/arm: Enable FEAT_FP8DOT2, FEAT_SSVE_FP8DOT2 for -cpu max
  target/arm: Implement FMMLA (FP8 to FP32) for AdvSIMD
  target/arm: Implement FMMLA (FP8 to FP32) for SVE
  target/arm: Enable FEAT_F8F32MM for -cpu max
  target/arm: Implement FMMLA (FP8 to FP16) for AdvSIMD
  target/arm: Implement FMMLA (FP8 to FP16) for SVE
  target/arm: Enable FEAT_F8F16MM for -cpu max

 target/arm/cpregs.h                          |   5 +
 target/arm/cpu-features.h                    | 137 +++
 target/arm/cpu.h                             |  52 +-
 target/arm/helper-fp8.h                      |  14 +
 target/arm/internals.h                       |  13 +-
 target/arm/tcg/helper-a64-defs.h             |  11 +
 target/arm/tcg/helper-defs.h                 |   6 +
 target/arm/tcg/helper-fp8-defs.h             |  40 +
 target/arm/tcg/helper-sme-defs.h             |   2 +-
 target/arm/tcg/helper-sve-defs.h             |  14 +
 target/arm/tcg/translate-a64.h               |   1 +
 target/arm/tcg/translate.h                   |  10 +
 target/arm/tcg/vec_internal.h                |  19 +
 target/arm/vector-type.h                     |  44 +
 target/arm/helper.c                          |  43 +-
 target/arm/machine.c                         |  20 +
 target/arm/tcg/cpu64.c                       |  24 +
 target/arm/tcg/fp8_helper.c                  | 867 +++++++++++++++++++
 target/arm/tcg/hflags.c                      |  41 +
 target/arm/tcg/sme_helper.c                  |   8 +-
 target/arm/tcg/sve_helper.c                  |   8 +
 target/arm/tcg/translate-a64.c               | 186 ++++
 target/arm/tcg/translate-sme.c               | 109 ++-
 target/arm/tcg/translate-sve.c               | 235 +++++
 target/arm/tcg/vec_helper.c                  |  66 ++
 target/arm/tcg/vec_helper64.c                |  51 ++
 docs/system/arm/emulation.rst                |  13 +
 target/arm/cpu-sysregs.h.inc                 |   2 +
 target/arm/tcg/a64.decode                    |  47 +
 target/arm/tcg/meson.build                   |   1 +
 target/arm/tcg/sme.decode                    |  36 +-
 target/arm/tcg/sve.decode                    |  50 +-
 tests/functional/aarch64/test_rme_sbsaref.py |   7 +-
 tests/functional/aarch64/test_rme_virt.py    |   7 +-
 34 files changed, 2124 insertions(+), 65 deletions(-)
 create mode 100644 target/arm/helper-fp8.h
 create mode 100644 target/arm/tcg/helper-fp8-defs.h
 create mode 100644 target/arm/vector-type.h
 create mode 100644 target/arm/tcg/fp8_helper.c

-- 
2.43.0
Re: [PATCH v4 00/60] target/arm: Implement FEAT_FP8
Posted by Peter Maydell 1 week, 4 days ago
On Fri, 8 May 2026 at 00:45, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Based-on: 20260507160959.449170-1-richard.henderson@linaro.org
> ("[PATCH v3 00/11] fpu: Export some internals for targets")
>
> Changes for v4:
>   - TF-A firmware update for FEAT_FPMR
>   - Rewrite decode/encode of FP8 data types
>   - Rewrite FEAT_FP8FMA helpers
>   - Enable FEAT_FP8DOT{4,2}
>   - Enable FEAT_F8F{32,16}MM
>
> This completes the set of 8-bit floating point features.

Do you have a git branch with a rebased version of these patches? I
had a look at them, but besides some mostly trivial rebase issues
on current head-of-git, I found that the series doesn't compile: it
wants fpu/ to export a parts64_scalbn() that returns a value, but the
version of that function in fpu/ is static and returns void. Did some
patch for fpu/ get lost?

thanks
-- PMM
Re: [PATCH v4 00/60] target/arm: Implement FEAT_FP8
Posted by Peter Maydell 1 week, 4 days ago
On Thu, 14 May 2026 at 09:46, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Fri, 8 May 2026 at 00:45, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > Based-on: 20260507160959.449170-1-richard.henderson@linaro.org
> > ("[PATCH v3 00/11] fpu: Export some internals for targets")
> >
> > Changes for v4:
> >   - TF-A firmware update for FEAT_FPMR
> >   - Rewrite decode/encode of FP8 data types
> >   - Rewrite FEAT_FP8FMA helpers
> >   - Enable FEAT_FP8DOT{4,2}
> >   - Enable FEAT_F8F{32,16}MM
> >
> > This completes the set of 8-bit floating point features.
>
> Do you have a git branch with a rebased version of these patches? I
> had a look at them, but besides some mostly trivial rebase issues
> on current head-of-git, I found that the series doesn't compile: it
> wants fpu/ to export a parts64_scalbn() that returns a value, but the
> version of that function in fpu/ is static and returns void. Did some
> patch for fpu/ get lost?

Ah, I found the scalbn patch that got dropped from the fpu pull,
which looks like it was the missing one.

-- PMM