[PATCH qemu v18 00/16] Add tail agnostic behavior for rvv instructions

~eopxd posted 16 patches 1 year, 11 months ago
Maintainers: Palmer Dabbelt <palmer@dabbelt.com>, Alistair Francis <alistair.francis@wdc.com>, Bin Meng <bin.meng@windriver.com>
There is a newer version of this series
Posted by ~eopxd 1 year, 11 months ago
According to the v-spec, tail agnostic elements can either be kept
undisturbed or have all their bits set to 1s. To distinguish between
tail policies, QEMU should be able to simulate the tail agnostic
behavior as "set tail elements' bits to all 1s". An option
'rvv_ta_all_1s' is added to enable this behavior; it is disabled by
default.

There are multiple possibilities for agnostic elements according to
the v-spec. The main intent of this patch-set is to add an option that
can distinguish between tail policies. Setting agnostic elements to
all 1s keeps things simple and allows QEMU to express this.
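
As a rough illustration of the "all 1s" choice described above, here is a
minimal sketch. It is not code from this patch-set: the function name
`set_tail_all_1s` and its byte-based parameters are assumptions made for
the example only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 *   base         start of the destination register group
 *   all_1s       non-zero to simulate "agnostic = all 1s",
 *                zero to leave the tail undisturbed
 *   body_bytes   bytes occupied by elements [0, vl)
 *   total_bytes  bytes occupied by every element of the group
 */
static void set_tail_all_1s(uint8_t *base, int all_1s,
                            size_t body_bytes, size_t total_bytes)
{
    assert(body_bytes <= total_bytes);
    if (!all_1s || body_bytes == total_bytes) {
        return; /* undisturbed policy, or no tail to fill */
    }
    /* Every tail byte becomes 0xff, i.e. every tail bit is 1. */
    memset(base + body_bytes, 0xff, total_bytes - body_bytes);
}
```

For example, with a 16-byte register holding 4-byte elements and vl = 3,
calling `set_tail_all_1s(vd, 1, 12, 16)` would overwrite only the last
four bytes of `vd`.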

We may explore other possibilities for agnostic behavior by adding
other options in the future. Please understand that this patch-set
is limited.

v2 updates:
- Addressed comments from Weiwei Li
- Added the commit for tail agnostic on load / store instructions
  (which I forgot to include in the patch-set)

v3 updates:
- Missed the very 1st commit, adding it back

v4 updates:
- Renamed vlmax to total_elems
- Deal with tail element when vl_eq_vlmax == true

v5 updates:
- Let `vext_get_total_elems` take `desc` and `esz`
- Utilize `simd_maxsz(desc)` to get `vlenb`
- Fix code alignment

v6 updates:
- Fix `vext_get_total_elems`

v7 updates:
- Reuse `max_elems` for vector load / store helper functions. The
  translation sets desc's `lmul` to `min(1, lmul)`, making
  `vext_max_elems` equivalent to `vext_get_total_elems`.

v8 updates:
- Simplify `vext_set_elems_1s`, don't need `vext_set_elems_1s_fns`
- Fix `vext_get_total_elems`; it should be derived from EMUL instead
  of LMUL
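
To make the EMUL point concrete, the following sketch shows one way the
total element count of a destination register group could be derived. It
is not the patch's `vext_get_total_elems`: the name `total_elems` and the
log2-style parameters are assumptions chosen to keep the example
self-contained.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch, for illustration only. The "log2" arguments are
 * signed base-2 logarithms, so log2_lmul = -1 means LMUL = 1/2.
 *   vlenb      vector register length in bytes
 *   log2_esz   log2 of the operand's element size in bytes (EEW / 8)
 *   log2_sew   log2 of the selected element width in bytes (SEW / 8)
 *   log2_lmul  log2 of LMUL
 */
static uint32_t total_elems(uint32_t vlenb, int log2_esz,
                            int log2_sew, int log2_lmul)
{
    /* EMUL = (EEW / SEW) * LMUL, computed in the log2 domain. */
    int log2_emul = log2_esz - log2_sew + log2_lmul;

    assert(log2_esz >= 0 && log2_esz <= 3);
    /* A fractional EMUL still occupies one whole register. */
    if (log2_emul < 0) {
        log2_emul = 0;
    }
    return (vlenb << log2_emul) >> log2_esz;
}
```

With vlenb = 16, a widening operation with 8-byte results, 4-byte SEW
and LMUL = 1 gives EMUL = 2, so the group holds 4 elements; using LMUL
alone would report only 2.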

v9 updates:
- Let instructions that are tail agnostic regardless of vta respect
  the option rather than the vta.

v10 updates:
- Correct the range of elements set to 1s for load instructions

v11 updates:
- Separate addition of option 'rvv_ta_all_1s' as a new (last) commit
- Add description to show intent of the option in first commit for the
  optional tail agnostic behavior
- Tag WeiWei as Reviewed-by for all commits
- Tag Alistair as Reviewed-by for commit 01, 02
- Tag Alistair as Acked-by for commit 03

v12 updates:
- Add missing space in WeiWei's "Reviewed-by" tag

v13 updates:
- Fix tail agnostic for vext_ldst_us. The function operates on input
  parameter 'evl' rather than 'env->vl'.
- Fix tail elements for vector segment load / store instructions.
  A vector segment load / store instruction may have a fractional
  lmul with nf * lmul > 1. The remaining elements in the last
  register should be treated as tail elements.
- Fix tail agnostic length for instructions with mask destination
  register. Instructions with mask destination register should have
  'vlen - vl' tail elements.
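
The mask-destination case above can be sketched as follows. This is only
an illustration under the assumption that a mask register holds one bit
per element and that `vlen` is given in bits; `mask_tail_all_1s` is a
hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch, for illustration only: set mask bits [vl, vlen)
 * of a mask destination register to 1, leaving bits [0, vl) untouched.
 *   vd    mask destination register, at least vlen / 8 bytes long
 *   vl    number of body elements (one mask bit each)
 *   vlen  register length in bits
 */
static void mask_tail_all_1s(uint8_t *vd, uint32_t vl, uint32_t vlen)
{
    assert(vl <= vlen);
    for (uint32_t i = vl; i < vlen; i++) {
        vd[i / 8] |= (uint8_t)(1u << (i % 8));
    }
}
```

The loop touches exactly 'vlen - vl' bits, independent of SEW and LMUL,
which is why mask destinations need a different tail length from
ordinary element-wise destinations.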

v14 updates:
- Pass lmul information into the vector helper functions.
  `vext_get_total_elems` needs it.

v15 updates:
- Rebase to latest `master`
- Tag Alistair as Acked-by for commit 04 ~ 14
- Tag Alistair as Acked-by for commit 15

v16 updates:
- Fix a bug: when lmul < 0 and vl_eq_vlmax, the original version
  writes the tail of `vd`, but the computation then overwrites it
  again, so the tail elements are not set correctly. Now we don't use
  TCG functions when simulating all 1s for agnostic elements; we use
  the vector helpers instead.

v17 updates:
- Add "Prune access_type parameter" commit to clean up vector load/
  store functions. Then add parameter `is_load` in vector helper
  functions to enable vta behavior in the commit that adds vta for
  vector load/store functions.

v18 updates:
- Don't use `is_load` parameter in vector helper. Don't let vta pass
  through in `trans_rvv.c.inc`

eopXD (16):
  target/riscv: rvv: Prune redundant ESZ, DSZ parameter passed
  target/riscv: rvv: Prune redundant access_type parameter passed
  target/riscv: rvv: Rename ambiguous esz
  target/riscv: rvv: Early exit when vstart >= vl
  target/riscv: rvv: Add tail agnostic for vv instructions
  target/riscv: rvv: Add tail agnostic for vector load / store
    instructions
  target/riscv: rvv: Add tail agnostic for vx, vvm, vxm instructions
  target/riscv: rvv: Add tail agnostic for vector integer shift
    instructions
  target/riscv: rvv: Add tail agnostic for vector integer comparison
    instructions
  target/riscv: rvv: Add tail agnostic for vector integer merge and move
    instructions
  target/riscv: rvv: Add tail agnostic for vector fix-point arithmetic
    instructions
  target/riscv: rvv: Add tail agnostic for vector floating-point
    instructions
  target/riscv: rvv: Add tail agnostic for vector reduction instructions
  target/riscv: rvv: Add tail agnostic for vector mask instructions
  target/riscv: rvv: Add tail agnostic for vector permutation
    instructions
  target/riscv: rvv: Add option 'rvv_ta_all_1s' to enable optional tail
    agnostic behavior

 target/riscv/cpu.c                      |    1 +
 target/riscv/cpu.h                      |    2 +
 target/riscv/cpu_helper.c               |    2 +
 target/riscv/insn_trans/trans_rvv.c.inc |   94 +-
 target/riscv/internals.h                |    6 +-
 target/riscv/translate.c                |    4 +
 target/riscv/vector_helper.c            | 1587 ++++++++++++++---------
 7 files changed, 1053 insertions(+), 643 deletions(-)

-- 
2.34.2
Re: [PATCH qemu v18 00/16] Add tail agnostic behavior for rvv instructions
Posted by Alistair Francis 1 year, 10 months ago
On Fri, May 13, 2022 at 9:55 PM ~eopxd <eopxd@git.sr.ht> wrote:

Do you mind rebasing this on:
https://github.com/alistair23/qemu/tree/riscv-to-apply.next

Alistair

Re: [PATCH qemu v18 00/16] Add tail agnostic behavior for rvv instructions
Posted by eop Chen 1 year, 10 months ago
Rebased to riscv-to-apply.next and submitted v19.
Thank you WeiWei, Frank and Alistair for the reviews along the way.

Regards,

eop Chen

> On June 6, 2022, at 9:37 AM, Alistair Francis <alistair23@gmail.com> wrote:
> Do you mind rebasing this on:
> https://github.com/alistair23/qemu/tree/riscv-to-apply.next
> 
> Alistair