[PATCH v2 00/12] target/arm: Use TCG vector ops for MVE

Peter Maydell posted 12 patches 2 years, 7 months ago
This patchset uses the TCG vector ops for some MVE
instructions. We can only do this when we know that none
of the MVE lanes are predicated, i.e. when none of tail
predication, VPT predication, or ECI partial insn
execution is happening.

Changes v1->v2:

The major change is that instead of just updating the local
s->mve_no_pred flag when we translate an insn that changes the
predication state, we end the TB with DISAS_UPDATE_NOCHAIN.
The exceptions are the code called from vfp_access_check()
(gen_preserve_fp_state() and gen_update_fp_context()). We
can definitely determine the new flag value in one of these cases,
but in the other we can't always.

So patch 1 is new, and adds support to gen_jmp_tb() for
looking at the existing value of is_jmp so it can honour
a preceding request for an UPDATE_NOCHAIN or UPDATE_EXIT.
(We were already assuming this, because gen_preserve_fp_state()
can set is_jmp to DISAS_UPDATE_EXIT if icount is in use.)
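
As a rough illustration of the idea in patch 1 (not the actual
gen_jmp_tb() code), the decision can be modelled as a function of the
is_jmp value that is already set when the jump is generated. The enum
names and the sketch_pick_jump() helper below are hypothetical, and the
mapping of each state to a kind of TB exit is an assumption based on
the description above:

```c
/* Hypothetical model of the is_jmp states mentioned in the cover letter. */
typedef enum {
    SK_NEXT,            /* nothing special requested so far */
    SK_UPDATE_NOCHAIN,  /* an earlier insn asked us not to chain TBs */
    SK_UPDATE_EXIT      /* an earlier insn asked for an exit to the main loop */
} SketchIsJmp;

/* Hypothetical model of the ways a TB-ending jump can be emitted. */
typedef enum {
    SK_EMIT_GOTO_TB,        /* direct, chainable jump to the next TB */
    SK_EMIT_LOOKUP_JUMP,    /* look up the next TB without chaining */
    SK_EMIT_EXIT_MAIN_LOOP  /* return to the main loop */
} SketchJumpKind;

/*
 * Pick how to end the TB, honouring a request that is already
 * recorded in is_jmp instead of unconditionally using goto_tb.
 */
static SketchJumpKind sketch_pick_jump(SketchIsJmp is_jmp)
{
    switch (is_jmp) {
    case SK_UPDATE_EXIT:
        return SK_EMIT_EXIT_MAIN_LOOP;
    case SK_UPDATE_NOCHAIN:
        return SK_EMIT_LOOKUP_JUMP;
    default:
        return SK_EMIT_GOTO_TB;
    }
}
```

The point of the sketch is only that the stronger, earlier request
wins over the default chained jump.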

Patch 2 (new) enforces that FPDSCR.LTPSIZE is 4 on inbound
migration, because we now rely on this architectural invariant.
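
A minimal sketch of what such an inbound-migration check could look
like, assuming LTPSIZE occupies bits [18:16] of FPDSCR and that a
non-zero return value rejects the incoming state. The macro and
function names are hypothetical and this is not the code from patch 2:

```c
#include <stdint.h>

/* Assumed field position: LTPSIZE in bits [18:16] of FPDSCR. */
#define SKETCH_LTPSIZE_SHIFT 16
#define SKETCH_LTPSIZE_MASK  (0x7u << SKETCH_LTPSIZE_SHIFT)

/*
 * Return 0 if the migrated FPDSCR value satisfies the invariant
 * LTPSIZE == 4, or -1 to signal that the incoming state is invalid.
 */
static int sketch_check_fpdscr_ltpsize(uint32_t fpdscr)
{
    uint32_t ltpsize = (fpdscr & SKETCH_LTPSIZE_MASK) >> SKETCH_LTPSIZE_SHIFT;

    return ltpsize == 4 ? 0 : -1;
}
```

Rejecting bad state at load time means the translator can treat the
invariant as always true afterwards.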

Patch 3 is the old patch 1, updated as noted above.

Patches 4-6 have been reviewed; they have been very slightly
tweaked to use a new mve_no_predication() function that checks
both s->eci and s->mve_no_pred, rather than v1's direct check
of mve_no_pred.
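
For readers who have not seen v1, the combined check can be pictured
roughly as follows. This is a sketch only: the struct is a hypothetical
stand-in that models just the two fields named above, and the field
types are assumptions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the translator state; only two fields modelled. */
typedef struct SketchDisasContext {
    int eci;           /* non-zero while ECI partial insn execution is live */
    bool mve_no_pred;  /* TB flag: no tail or VPT predication is active */
} SketchDisasContext;

/*
 * True only when no MVE lane can be predicated, i.e. there is no
 * ECI state and the TB flag says predication is off. Only then is
 * it safe to use the whole-vector TCG ops.
 */
static bool sketch_mve_no_predication(const SketchDisasContext *s)
{
    return s->eci == 0 && s->mve_no_pred;
}
```

A translate function would test this first and fall back to the
existing helper call when it returns false.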

Patches 7-12 are new, and add optimized variants of VDUP, VMVN,
various shifts, the shift-and-inserts, and the 1-operand-immediate
insns.

I think this should now be the complete set of optimizations
it's worth implementing at this point.

thanks
-- PMM

Peter Maydell (12):
  target/arm: Avoid goto_tb if we're trying to exit to the main loop
  target/arm: Enforce that FPDSCR.LTPSIZE is 4 on inbound migration
  target/arm: Add TB flag for "MVE insns not predicated"
  target/arm: Optimize MVE logic ops
  target/arm: Optimize MVE arithmetic ops
  target/arm: Optimize MVE VNEG, VABS
  target/arm: Optimize MVE VDUP
  target/arm: Optimize MVE VMVN
  target/arm: Optimize MVE VSHL, VSHR immediate forms
  target/arm: Optimize MVE VSHLL and VMOVL
  target/arm: Optimize MVE VSLI and VSRI
  target/arm: Optimize MVE 1op-immediate insns

 target/arm/cpu.h              |   4 +-
 target/arm/translate.h        |   2 +
 target/arm/helper.c           |  33 ++++
 target/arm/machine.c          |  13 ++
 target/arm/translate-m-nocp.c |   8 +-
 target/arm/translate-mve.c    | 310 ++++++++++++++++++++++++++--------
 target/arm/translate-vfp.c    |  33 +++-
 target/arm/translate.c        |  42 ++++-
 8 files changed, 361 insertions(+), 84 deletions(-)

-- 
2.20.1