target/arm: SME BFCVT, BFCVTN have "Alternate BFloat16 behaviors"

[PATCH] target/arm: SME BFCVT, BFCVTN have "Alternate BFloat16 behaviors"

Posted by Peter Maydell 1 week, 1 day ago

The Arm ARM A1.5.10 notes that some instructions have "Alternate
Bfloat16 behaviors" when FPCR.AH == 1.  We implement these using the
FPST_AH and FPST_AH_F16 fp_status words.  The list includes the SME
BFCVT (single-precision to BFloat16) and BFCVTN, but we forgot to
make those use FPST_AH when we implemented them. (We get the
ASIMD and SVE insns on the list right.)

Add the missing logic to select the right FPST.

Cc: qemu-stable@nongnu.org
Fixes: 465d36db0e1 ("target/arm: Implement SME2 BFCVT, BFCVTN, FCVT, FCVTN")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
(There will be a minor conflict with the rename of BFCVT to
BFCVT_hs in RTH's FP8 series.)
---
 target/arm/tcg/translate-sme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 08254b088e..8e56dedaa3 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -1419,9 +1419,9 @@ static bool do_zz_fpst(DisasContext *s, arg_zz_n *a, int data,
 }
 
 TRANS_FEAT(BFCVT, aa64_sme2, do_zz_fpst, a, 0,
-           FPST_A64, gen_helper_sme2_bfcvt)
+           s->fpcr_ah ? FPST_AH : FPST_A64, gen_helper_sme2_bfcvt)
 TRANS_FEAT(BFCVTN, aa64_sme2, do_zz_fpst, a, 0,
-           FPST_A64, gen_helper_sme2_bfcvtn)
+           s->fpcr_ah ? FPST_AH : FPST_A64, gen_helper_sme2_bfcvtn)
 TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0,
            FPST_A64, gen_helper_sme2_fcvt_n)
 TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0,
-- 
2.43.0

Re: [PATCH] target/arm: SME BFCVT, BFCVTN have "Alternate BFloat16 behaviors"

Posted by Richard Henderson 2 days ago

On 5/21/26 11:08, Peter Maydell wrote:
> The Arm ARM A1.5.10 notes that some instructions have "Alternate
> Bfloat16 behaviors" when FPCR.AH == 1.  We implement these using the
> FPST_AH and FPST_AH_F16 fp_status words.  The list includes the SME
> BFCVT (single-precision to BFloat16) and BFCVTN, but we forgot to
> make those use FPST_AH when we implemented them. (We get the
> ASIMD and SVE insns on the list right.)
> 
> Add the missing logic to select the right FPST.
> 
> Cc: qemu-stable@nongnu.org
> Fixes: 465d36db0e1 ("target/arm: Implement SME2 BFCVT, BFCVTN, FCVT, FCVTN")
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> (There will be a minor conflict with the rename of BFCVT to
> BFCVT_hs in RTH's FP8 series.)
> ---
>   target/arm/tcg/translate-sme.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
> index 08254b088e..8e56dedaa3 100644
> --- a/target/arm/tcg/translate-sme.c
> +++ b/target/arm/tcg/translate-sme.c
> @@ -1419,9 +1419,9 @@ static bool do_zz_fpst(DisasContext *s, arg_zz_n *a, int data,
>   }
>   
>   TRANS_FEAT(BFCVT, aa64_sme2, do_zz_fpst, a, 0,
> -           FPST_A64, gen_helper_sme2_bfcvt)
> +           s->fpcr_ah ? FPST_AH : FPST_A64, gen_helper_sme2_bfcvt)
>   TRANS_FEAT(BFCVTN, aa64_sme2, do_zz_fpst, a, 0,
> -           FPST_A64, gen_helper_sme2_bfcvtn)
> +           s->fpcr_ah ? FPST_AH : FPST_A64, gen_helper_sme2_bfcvtn)
>   TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0,
>              FPST_A64, gen_helper_sme2_fcvt_n)
>   TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

Re: [PATCH] target/arm: SME BFCVT, BFCVTN have "Alternate BFloat16 behaviors"

Posted by Richard Henderson 1 week ago

On 5/21/26 11:08, Peter Maydell wrote:
> The Arm ARM A1.5.10 notes that some instructions have "Alternate
> Bfloat16 behaviors" when FPCR.AH == 1.  We implement these using the
> FPST_AH and FPST_AH_F16 fp_status words.  The list includes the SME
> BFCVT (single-precision to BFloat16) and BFCVTN, but we forgot to
> make those use FPST_AH when we implemented them. (We get the
> ASIMD and SVE insns on the list right.)
> 
> Add the missing logic to select the right FPST.
> 
> Cc: qemu-stable@nongnu.org
> Fixes: 465d36db0e1 ("target/arm: Implement SME2 BFCVT, BFCVTN, FCVT, FCVTN")
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> (There will be a minor conflict with the rename of BFCVT to
> BFCVT_hs in RTH's FP8 series.)
> ---
>   target/arm/tcg/translate-sme.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
> index 08254b088e..8e56dedaa3 100644
> --- a/target/arm/tcg/translate-sme.c
> +++ b/target/arm/tcg/translate-sme.c
> @@ -1419,9 +1419,9 @@ static bool do_zz_fpst(DisasContext *s, arg_zz_n *a, int data,
>   }
>   
>   TRANS_FEAT(BFCVT, aa64_sme2, do_zz_fpst, a, 0,
> -           FPST_A64, gen_helper_sme2_bfcvt)
> +           s->fpcr_ah ? FPST_AH : FPST_A64, gen_helper_sme2_bfcvt)
>   TRANS_FEAT(BFCVTN, aa64_sme2, do_zz_fpst, a, 0,
> -           FPST_A64, gen_helper_sme2_bfcvtn)
> +           s->fpcr_ah ? FPST_AH : FPST_A64, gen_helper_sme2_bfcvtn)
>   TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0,
>              FPST_A64, gen_helper_sme2_fcvt_n)
>   TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0,

What is the difference between FPST_AH and FPST_A64 when AH is set, given that 
vfp_set_fpcr_to_host does the arm_set_*_fp_behaviours dance?


r~

Re: [PATCH] target/arm: SME BFCVT, BFCVTN have "Alternate BFloat16 behaviors"

Posted by Peter Maydell 6 days, 19 hours ago

On Fri, 22 May 2026 at 22:57, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 5/21/26 11:08, Peter Maydell wrote:
> > The Arm ARM A1.5.10 notes that some instructions have "Alternate
> > Bfloat16 behaviors" when FPCR.AH == 1.  We implement these using the
> > FPST_AH and FPST_AH_F16 fp_status words.  The list includes the SME
> > BFCVT (single-precision to BFloat16) and BFCVTN, but we forgot to
> > make those use FPST_AH when we implemented them. (We get the
> > ASIMD and SVE insns on the list right.)
> >
> > Add the missing logic to select the right FPST.
> >
> > Cc: qemu-stable@nongnu.org
> > Fixes: 465d36db0e1 ("target/arm: Implement SME2 BFCVT, BFCVTN, FCVT, FCVTN")
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > ---
> > (There will be a minor conflict with the rename of BFCVT to
> > BFCVT_hs in RTH's FP8 series.)
> > ---
> >   target/arm/tcg/translate-sme.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
> > index 08254b088e..8e56dedaa3 100644
> > --- a/target/arm/tcg/translate-sme.c
> > +++ b/target/arm/tcg/translate-sme.c
> > @@ -1419,9 +1419,9 @@ static bool do_zz_fpst(DisasContext *s, arg_zz_n *a, int data,
> >   }
> >
> >   TRANS_FEAT(BFCVT, aa64_sme2, do_zz_fpst, a, 0,
> > -           FPST_A64, gen_helper_sme2_bfcvt)
> > +           s->fpcr_ah ? FPST_AH : FPST_A64, gen_helper_sme2_bfcvt)
> >   TRANS_FEAT(BFCVTN, aa64_sme2, do_zz_fpst, a, 0,
> > -           FPST_A64, gen_helper_sme2_bfcvtn)
> > +           s->fpcr_ah ? FPST_AH : FPST_A64, gen_helper_sme2_bfcvtn)
> >   TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0,
> >              FPST_A64, gen_helper_sme2_fcvt_n)
> >   TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0,
>
> What is the difference between FPST_AH and FPST_A64 when AH is set, given that
> vfp_set_fpcr_to_host does the arm_set_*_fp_behaviours dance?

(1) FPST_AH always has flush_to_zero and flush_inputs_to_zero set,
whereas FPST_A64 sets and clears those based on FPCR.FZ and FPCR.FIZ.
(2) FPST_A64 adjusts the rounding mode when the FPCR rounding mode
field changes, and FPST_AH does not.
(3) vfp_get_fpsr_from_host() looks at FPST_A64 but ignores FPST_AH.

These combined get us the A1.5.10 required behaviour:

- Produce the expected IEEE 754 default result but do not update the
 FPSR cumulative exception flag bits.
- Disable trapped floating-point exceptions, as if
 FPCR.{IDE, IXE, UFE, OFE, DZE, IOE} are all 0.
- Use Round to Nearest Even, ignoring FPCR.RMode.
- Flush denormalized inputs and outputs to zero, as if FPCR.{FZ, FIZ} is {1, 1}.

(The second item needs no extra work, since we don't implement
trapped fp exceptions.)
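
As a rough illustration of points (1) to (3), here is a minimal
standalone C model. It is a sketch under stated assumptions, not QEMU
code: ModelCPU, ModelFPStatus and every model_* function are names
invented for this example, the FPCR is reduced to four fields, and the
real mapping from FPCR.FZ/FIZ to the softfloat flags is more involved
than the direct copy shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified model; none of these names exist in QEMU. */
typedef enum { RM_NEAREST_EVEN, RM_PLUS_INF, RM_MINUS_INF, RM_ZERO } RoundMode;

typedef struct {
    RoundMode rounding;
    bool flush_to_zero;
    bool flush_inputs_to_zero;
    uint32_t exc_flags;            /* flags accumulated by FP operations */
} ModelFPStatus;

typedef enum { MODEL_FPST_A64, MODEL_FPST_AH, MODEL_FPST_COUNT } ModelFPST;

typedef struct {
    bool fpcr_ah;
    ModelFPStatus st[MODEL_FPST_COUNT];
} ModelCPU;

/* (1) The AH status word starts with both flush flags set and keeps them. */
static void model_reset(ModelCPU *cpu)
{
    cpu->fpcr_ah = false;
    cpu->st[MODEL_FPST_A64] = (ModelFPStatus){ RM_NEAREST_EVEN, false, false, 0 };
    cpu->st[MODEL_FPST_AH] = (ModelFPStatus){ RM_NEAREST_EVEN, true, true, 0 };
}

/* (1), (2) An FPCR write updates only the A64 status word. */
static void model_write_fpcr(ModelCPU *cpu, bool ah, bool fz, bool fiz,
                             RoundMode rmode)
{
    ModelFPStatus *a64 = &cpu->st[MODEL_FPST_A64];

    cpu->fpcr_ah = ah;
    a64->flush_to_zero = fz;
    a64->flush_inputs_to_zero = fiz;
    a64->rounding = rmode;
}

/* (3) An FPSR read collects flags from the A64 word and ignores the AH word. */
static uint32_t model_read_fpsr_flags(const ModelCPU *cpu)
{
    return cpu->st[MODEL_FPST_A64].exc_flags;
}

/* The selection the patch adds for BFCVT and BFCVTN. */
static ModelFPST model_select_bfcvt_fpst(const ModelCPU *cpu)
{
    return cpu->fpcr_ah ? MODEL_FPST_AH : MODEL_FPST_A64;
}

/* Usage: guest sets AH=1, FZ=0, FIZ=0, RMode=round-to-zero, then runs BFCVT. */
int model_demo(void)
{
    ModelCPU cpu;
    ModelFPST sel;

    model_reset(&cpu);
    model_write_fpcr(&cpu, true, false, false, RM_ZERO);
    sel = model_select_bfcvt_fpst(&cpu);

    assert(sel == MODEL_FPST_AH);
    assert(cpu.st[sel].rounding == RM_NEAREST_EVEN);   /* FPCR.RMode ignored */
    assert(cpu.st[sel].flush_to_zero);                 /* as if FZ == 1 */
    assert(cpu.st[sel].flush_inputs_to_zero);          /* as if FIZ == 1 */

    cpu.st[sel].exc_flags |= 1u;   /* pretend the conversion raised a flag */
    assert(model_read_fpsr_flags(&cpu) == 0);          /* FPSR not updated */
    return 0;
}
```

In this model, an insn that always uses the A64 status word would see
round-to-zero and no flushing after the same FPCR write, which is the
pre-patch behaviour of SME BFCVT and BFCVTN.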

We use FPST_AH for the A1.5.10 insns, and also for FRECPE, FRECPS,
FRECPX, FRSQRTE, and FRSQRTS, which also want these effects.
(This information is slightly less helpfully located, being
scattered across A1.5.6.1, A1.5.6.2, A1.5.8, A1.5.9.1.)

There might be a clearer name than FPST_AH, but I couldn't
think of one at the time. We do at least document all this
in the comment in cpu.h.

thanks
-- PMM