[PATCH for-5.1 00/31] target/arm: SVE2, part 1

Richard Henderson posted 31 patches 4 years ago
[PATCH for-5.1 00/31] target/arm: SVE2, part 1
Posted by Richard Henderson 4 years ago
Posting this for early review.  It's based on some other patch
sets that I have posted recently that also touch SVE, listed
below.  But it might just be easier to clone the devel tree [2].
While the branch itself will rebase frequently for development,
I've also created a tag, post-sve2-20200326, for this posting.

This is mostly untested, as the most recently released Foundation
Model does not support SVE2.  Some of the new instructions overlap
with old-fashioned NEON, and I can verify that those have not
broken, and show that SVE2 will use the same code path.  But the
predicated insns and bottom/top interleaved insns are not yet
RISU testable, as I have nothing to compare against.
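
For readers unfamiliar with the bottom/top naming: the "B" forms of the
SVE2 long instructions read the even-numbered source elements and the
"T" forms read the odd-numbered ones, writing double-width results. The
sketch below illustrates that semantic for a signed widening add
(SADDLB/SADDLT style) on 8-bit elements. It is an illustration only, not
the helper code from this series; the function name `saddl_bt_s8` and its
parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: widening signed add over the "bottom"
 * (even-numbered) or "top" (odd-numbered) 8-bit elements of two
 * source vectors.  src_elems is the number of int8_t elements in
 * n and m; d receives src_elems / 2 results of type int16_t.
 */
static void saddl_bt_s8(int16_t *d, const int8_t *n, const int8_t *m,
                        size_t src_elems, int top)
{
    for (size_t i = 0; i < src_elems / 2; i++) {
        /* Select element 2i for bottom, 2i + 1 for top. */
        size_t j = 2 * i + (top ? 1 : 0);
        /* Widen before adding so the sum cannot overflow 8 bits. */
        d[i] = (int16_t)n[j] + (int16_t)m[j];
    }
}
```

Because both variants leave the other half of the source elements
unused, a reference model has to check the element selection as well as
the arithmetic, which is what makes these insns awkward to test without
something to compare against.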

The patches are in general arranged so that one complete group
of insns is added at once.  The groups within the manual [1]
have so far been small-ish.


r~

---

[1] ISA manual: https://static.docs.arm.com/ddi0602/d/ISA_A64_xml_futureA-2019-12_OPT.pdf

[2] Devel tree: https://github.com/rth7680/qemu/tree/tgt-arm-sve-2

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=163610
("target/arm: sve load/store improvements")

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164500
("target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA")

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164048
("target/arm: Implement ARMv8.5-MemTag, system mode")

Richard Henderson (31):
  target/arm: Add ID_AA64ZFR0 fields and isar_feature_aa64_sve2
  target/arm: Implement SVE2 Integer Multiply - Unpredicated
  target/arm: Implement SVE2 integer pairwise add and accumulate long
  target/arm: Remove fp_status from helper_{recpe,rsqrte}_u32
  target/arm: Implement SVE2 integer unary operations (predicated)
  target/arm: Split out saturating/rounding shifts from neon
  target/arm: Implement SVE2 saturating/rounding bitwise shift left
    (predicated)
  target/arm: Implement SVE2 integer halving add/subtract (predicated)
  target/arm: Implement SVE2 integer pairwise arithmetic
  target/arm: Implement SVE2 saturating add/subtract (predicated)
  target/arm: Implement SVE2 integer add/subtract long
  target/arm: Implement SVE2 integer add/subtract interleaved long
  target/arm: Implement SVE2 integer add/subtract wide
  target/arm: Implement SVE2 integer multiply long
  target/arm: Implement PMULLB and PMULLT
  target/arm: Tidy SVE tszimm shift formats
  target/arm: Implement SVE2 bitwise shift left long
  target/arm: Implement SVE2 bitwise exclusive-or interleaved
  target/arm: Implement SVE2 bitwise permute
  target/arm: Implement SVE2 complex integer add
  target/arm: Implement SVE2 integer absolute difference and accumulate
    long
  target/arm: Implement SVE2 integer add/subtract long with carry
  target/arm: Create arm_gen_gvec_[us]sra
  target/arm: Create arm_gen_gvec_{u,s}{rshr,rsra}
  target/arm: Implement SVE2 bitwise shift right and accumulate
  target/arm: Create arm_gen_gvec_{sri,sli}
  target/arm: Tidy handle_vec_simd_shri
  target/arm: Implement SVE2 bitwise shift and insert
  target/arm: Vectorize SABD/UABD
  target/arm: Vectorize SABA/UABA
  target/arm: Implement SVE2 integer absolute difference and accumulate

 target/arm/cpu.h           |  31 ++
 target/arm/helper-sve.h    | 345 +++++++++++++++++
 target/arm/helper.h        |  81 +++-
 target/arm/translate-a64.h |   9 +
 target/arm/translate.h     |  24 +-
 target/arm/vec_internal.h  | 161 ++++++++
 target/arm/sve.decode      | 217 ++++++++++-
 target/arm/helper.c        |   3 +-
 target/arm/kvm64.c         |   2 +
 target/arm/neon_helper.c   | 515 ++++---------------------
 target/arm/sve_helper.c    | 757 ++++++++++++++++++++++++++++++++++---
 target/arm/translate-a64.c | 557 +++++++++++++++++++++++----
 target/arm/translate-sve.c | 557 +++++++++++++++++++++++++++
 target/arm/translate.c     | 626 ++++++++++++++++++++++--------
 target/arm/vec_helper.c    | 411 ++++++++++++++++++++
 target/arm/vfp_helper.c    |   4 +-
 16 files changed, 3532 insertions(+), 768 deletions(-)
 create mode 100644 target/arm/vec_internal.h

-- 
2.20.1


RE: [PATCH for-5.1 00/31] target/arm: SVE2, part 1
Posted by Ana Pazos 3 years, 12 months ago
Hello Richard,

I would like to introduce you to Stephen Long, our new hire, who started this week.

Would you be available for a sync-up meeting to discuss how we can cooperate on QEMU SVE2 support?

Thank you,
Ana.

-----Original Message-----
From: Richard Henderson <richard.henderson@linaro.org>
Sent: Thursday, March 26, 2020 4:08 PM
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org; Ana Pazos <apazos@quicinc.com>; Raja Venkateswaran <rajav@quicinc.com>
Subject: [PATCH for-5.1 00/31] target/arm: SVE2, part 1

Posting this for early review.  It's based on some other patch sets that I have posted recently that also touch SVE, listed below.  But it might just be easier to clone the devel tree [2].
While the branch itself will rebase frequently for development, I've also created a tag, post-sve2-20200326, for this posting.

This is mostly untested, as the most recently released Foundation Model does not support SVE2.  Some of the new instructions overlap with old fashioned NEON, and I can verify that those have not broken, and show that SVE2 will use the same code path.  But the predicated insns and bottom/top interleaved insns are not yet RISU testable, as I have nothing to compare against.

The patches are in general arranged so that one complete group of insns are added at once.  The groups within the manual [1] have so far been small-ish.


r~

---

[1] ISA manual: https://static.docs.arm.com/ddi0602/d/ISA_A64_xml_futureA-2019-12_OPT.pdf

[2] Devel tree: https://github.com/rth7680/qemu/tree/tgt-arm-sve-2

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=163610
("target/arm: sve load/store improvements")

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164500
("target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA")

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164048
("target/arm: Implement ARMv8.5-MemTag, system mode")

Richard Henderson (31):
  target/arm: Add ID_AA64ZFR0 fields and isar_feature_aa64_sve2
  target/arm: Implement SVE2 Integer Multiply - Unpredicated
  target/arm: Implement SVE2 integer pairwise add and accumulate long
  target/arm: Remove fp_status from helper_{recpe,rsqrte}_u32
  target/arm: Implement SVE2 integer unary operations (predicated)
  target/arm: Split out saturating/rounding shifts from neon
  target/arm: Implement SVE2 saturating/rounding bitwise shift left
    (predicated)
  target/arm: Implement SVE2 integer halving add/subtract (predicated)
  target/arm: Implement SVE2 integer pairwise arithmetic
  target/arm: Implement SVE2 saturating add/subtract (predicated)
  target/arm: Implement SVE2 integer add/subtract long
  target/arm: Implement SVE2 integer add/subtract interleaved long
  target/arm: Implement SVE2 integer add/subtract wide
  target/arm: Implement SVE2 integer multiply long
  target/arm: Implement PMULLB and PMULLT
  target/arm: Tidy SVE tszimm shift formats
  target/arm: Implement SVE2 bitwise shift left long
  target/arm: Implement SVE2 bitwise exclusive-or interleaved
  target/arm: Implement SVE2 bitwise permute
  target/arm: Implement SVE2 complex integer add
  target/arm: Implement SVE2 integer absolute difference and accumulate
    long
  target/arm: Implement SVE2 integer add/subtract long with carry
  target/arm: Create arm_gen_gvec_[us]sra
  target/arm: Create arm_gen_gvec_{u,s}{rshr,rsra}
  target/arm: Implement SVE2 bitwise shift right and accumulate
  target/arm: Create arm_gen_gvec_{sri,sli}
  target/arm: Tidy handle_vec_simd_shri
  target/arm: Implement SVE2 bitwise shift and insert
  target/arm: Vectorize SABD/UABD
  target/arm: Vectorize SABA/UABA
  target/arm: Implement SVE2 integer absolute difference and accumulate

 target/arm/cpu.h           |  31 ++
 target/arm/helper-sve.h    | 345 +++++++++++++++++
 target/arm/helper.h        |  81 +++-
 target/arm/translate-a64.h |   9 +
 target/arm/translate.h     |  24 +-
 target/arm/vec_internal.h  | 161 ++++++++
 target/arm/sve.decode      | 217 ++++++++++-
 target/arm/helper.c        |   3 +-
 target/arm/kvm64.c         |   2 +
 target/arm/neon_helper.c   | 515 ++++---------------------
 target/arm/sve_helper.c    | 757 ++++++++++++++++++++++++++++++++++---
 target/arm/translate-a64.c | 557 +++++++++++++++++++++++----
 target/arm/translate-sve.c | 557 +++++++++++++++++++++++++++
 target/arm/translate.c     | 626 ++++++++++++++++++++++--------
 target/arm/vec_helper.c    | 411 ++++++++++++++++++++
 target/arm/vfp_helper.c    |   4 +-
 16 files changed, 3532 insertions(+), 768 deletions(-)
 create mode 100644 target/arm/vec_internal.h

--
2.20.1



Re: [PATCH for-5.1 00/31] target/arm: SVE2, part 1
Posted by LIU Zhiwei 3 years, 11 months ago
Hi Richard,

I see that BF16 is included in the ISA.  Will you extend the softfpu in
this patch set?

Zhiwei

On 2020/3/27 7:08, Richard Henderson wrote:
> Posting this for early review.  It's based on some other patch
> sets that I have posted recently that also touch SVE, listed
> below.  But it might just be easier to clone the devel tree [2].
> While the branch itself will rebase frequently for development,
> I've also created a tag, post-sve2-20200326, for this posting.
>
> This is mostly untested, as the most recently released Foundation
> Model does not support SVE2.  Some of the new instructions overlap
> with old fashioned NEON, and I can verify that those have not
> broken, and show that SVE2 will use the same code path.  But the
> predicated insns and bottom/top interleaved insns are not yet
> RISU testable, as I have nothing to compare against.
>
> The patches are in general arranged so that one complete group
> of insns are added at once.  The groups within the manual [1]
> have so far been small-ish.
>
>
> r~
>
> ---
>
> [1] ISA manual: https://static.docs.arm.com/ddi0602/d/ISA_A64_xml_futureA-2019-12_OPT.pdf
>
> [2] Devel tree: https://github.com/rth7680/qemu/tree/tgt-arm-sve-2
>
> Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=163610
> ("target/arm: sve load/store improvements")
>
> Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164500
> ("target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA")
>
> Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164048
> ("target/arm: Implement ARMv8.5-MemTag, system mode")
>
> Richard Henderson (31):
>    target/arm: Add ID_AA64ZFR0 fields and isar_feature_aa64_sve2
>    target/arm: Implement SVE2 Integer Multiply - Unpredicated
>    target/arm: Implement SVE2 integer pairwise add and accumulate long
>    target/arm: Remove fp_status from helper_{recpe,rsqrte}_u32
>    target/arm: Implement SVE2 integer unary operations (predicated)
>    target/arm: Split out saturating/rounding shifts from neon
>    target/arm: Implement SVE2 saturating/rounding bitwise shift left
>      (predicated)
>    target/arm: Implement SVE2 integer halving add/subtract (predicated)
>    target/arm: Implement SVE2 integer pairwise arithmetic
>    target/arm: Implement SVE2 saturating add/subtract (predicated)
>    target/arm: Implement SVE2 integer add/subtract long
>    target/arm: Implement SVE2 integer add/subtract interleaved long
>    target/arm: Implement SVE2 integer add/subtract wide
>    target/arm: Implement SVE2 integer multiply long
>    target/arm: Implement PMULLB and PMULLT
>    target/arm: Tidy SVE tszimm shift formats
>    target/arm: Implement SVE2 bitwise shift left long
>    target/arm: Implement SVE2 bitwise exclusive-or interleaved
>    target/arm: Implement SVE2 bitwise permute
>    target/arm: Implement SVE2 complex integer add
>    target/arm: Implement SVE2 integer absolute difference and accumulate
>      long
>    target/arm: Implement SVE2 integer add/subtract long with carry
>    target/arm: Create arm_gen_gvec_[us]sra
>    target/arm: Create arm_gen_gvec_{u,s}{rshr,rsra}
>    target/arm: Implement SVE2 bitwise shift right and accumulate
>    target/arm: Create arm_gen_gvec_{sri,sli}
>    target/arm: Tidy handle_vec_simd_shri
>    target/arm: Implement SVE2 bitwise shift and insert
>    target/arm: Vectorize SABD/UABD
>    target/arm: Vectorize SABA/UABA
>    target/arm: Implement SVE2 integer absolute difference and accumulate
>
>   target/arm/cpu.h           |  31 ++
>   target/arm/helper-sve.h    | 345 +++++++++++++++++
>   target/arm/helper.h        |  81 +++-
>   target/arm/translate-a64.h |   9 +
>   target/arm/translate.h     |  24 +-
>   target/arm/vec_internal.h  | 161 ++++++++
>   target/arm/sve.decode      | 217 ++++++++++-
>   target/arm/helper.c        |   3 +-
>   target/arm/kvm64.c         |   2 +
>   target/arm/neon_helper.c   | 515 ++++---------------------
>   target/arm/sve_helper.c    | 757 ++++++++++++++++++++++++++++++++++---
>   target/arm/translate-a64.c | 557 +++++++++++++++++++++++----
>   target/arm/translate-sve.c | 557 +++++++++++++++++++++++++++
>   target/arm/translate.c     | 626 ++++++++++++++++++++++--------
>   target/arm/vec_helper.c    | 411 ++++++++++++++++++++
>   target/arm/vfp_helper.c    |   4 +-
>   16 files changed, 3532 insertions(+), 768 deletions(-)
>   create mode 100644 target/arm/vec_internal.h
>



Re: [PATCH for-5.1 00/31] target/arm: SVE2, part 1
Posted by Richard Henderson 3 years, 11 months ago
On 4/21/20 7:51 PM, LIU Zhiwei wrote:
> I find BF16 is included in the ISA.  Will you extend  the softfpu in this patch
> set?

I will do that eventually, but probably not as part of the first full SVE2
patch set.

There are several optional extensions to SVE2, of which BF16 is one.  But BF16
also requires changes to the normal FPU, and Arm requires that SVE and the FPU
be in sync.
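
As background on why BF16 touches the floating-point code at all: a
BF16 value has the same sign bit and 8-bit exponent as IEEE single
precision, with the fraction cut to 7 bits, so widening to float32 is
exact and amounts to a 16-bit shift.  The sketch below shows only that
widening step; it is not QEMU's softfloat interface, the name
`bf16_to_float32` is hypothetical, and it assumes the host `float` is
IEEE binary32.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: widen a raw BF16 bit pattern to float32.
 * The 16 BF16 bits become the high half of the binary32 encoding;
 * the low 16 fraction bits are zero.
 */
static float bf16_to_float32(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;

    /* Copy the bit pattern without violating aliasing rules. */
    memcpy(&f, &bits, sizeof(f));
    return f;
}
```

The narrowing direction is the harder part, since it needs rounding and
NaN handling, and is not attempted here.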


r~