[RFC PATCH v2 0/6] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions

Max Chou posted 6 patches 1 year, 8 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20240531174504.281461-1-max.chou@sifive.com
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Palmer Dabbelt <palmer@dabbelt.com>, Alistair Francis <alistair.francis@wdc.com>, Bin Meng <bmeng.cn@gmail.com>, Weiwei Li <liwei1518@gmail.com>, Daniel Henrique Barboza <dbarboza@ventanamicro.com>, Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
There is a newer version of this series
Hi,

This RFC patch set tries to address the performance issue reported in
https://gitlab.com/qemu-project/qemu/-/issues/2137.

This version adds patches that load/store more data at a time for some
of the contiguous vector load/store instructions (unit-stride and whole
register). The fast path relies on some assumptions (e.g. no masking, no
tail-agnostic handling, virtual address resolution performed once for
the entire vector), as suggested by Richard Henderson.
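
To illustrate the idea only, here is a minimal stand-alone sketch of
such a fast path for a byte-wise unit-stride load. It is not the code
from this series and does not use any QEMU API: every name in it
(sketch_vle8, sketch_guest_mem, SKETCH_PAGE_SIZE, ...) is hypothetical,
and a flat array stands in for guest memory whose address has already
been resolved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical page size and flat "guest memory" for this sketch only. */
#define SKETCH_PAGE_SIZE 4096u
static uint8_t sketch_guest_mem[2 * SKETCH_PAGE_SIZE];

/* Slow path: one access per element, skipping masked-off elements. */
static void sketch_vle8_slow(uint8_t *vd, size_t addr, size_t vl,
                             const uint8_t *mask)
{
    for (size_t i = 0; i < vl; i++) {
        if (mask == NULL || ((mask[i / 8] >> (i % 8)) & 1)) {
            vd[i] = sketch_guest_mem[addr + i];
        }
    }
}

/* True when [addr, addr + len) does not cross a page boundary. */
static bool sketch_same_page(size_t addr, size_t len)
{
    return len == 0 ||
           addr / SKETCH_PAGE_SIZE == (addr + len - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Byte-wise unit-stride load of vl elements into vd.
 * Returns true when the fast path was taken: no mask, and the whole
 * access sits in one page, so a single address check covers all of it.
 * Elements past vl are left untouched in both paths.
 */
static bool sketch_vle8(uint8_t *vd, size_t addr, size_t vl,
                        const uint8_t *mask)
{
    if (mask == NULL && sketch_same_page(addr, vl)) {
        memcpy(vd, &sketch_guest_mem[addr], vl);
        return true;
    }
    sketch_vle8_slow(vd, addr, vl, mask);
    return false;
}
```

In this sketch, an unmasked load that stays within one page is served
by a single memcpy, while a masked or page-crossing load falls back to
the per-element loop.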

This version improves the run time of the test case provided in
https://gitlab.com/qemu-project/qemu/-/issues/2137#note_1757501369 from
~13.5 sec to ~1.5 sec (roughly 9x faster) in QEMU user mode.

PS: This RFC patch set covers only the vle8.v/vse8.v/vl8re8.v/vs8r.v
instructions. The next version will try to cover the remaining ones.

Series based on riscv-to-apply.next branch (commit 1806da7).

Max Chou (6):
  target/riscv: Separate vector segment ld/st instructions
  accel/tcg: Avoid unnecessary call overhead from
    qemu_plugin_vcpu_mem_cb
  target/riscv: Inline vext_ldst_us and corresponding function for
    performance
  target/riscv: Add check_probe_[read|write] helper functions
  target/riscv: rvv: Optimize v[l|s]e8.v with limitations
  target/riscv: rvv: Optimize vl8re8.v/vs8r.v with limitations

 accel/tcg/ldst_common.c.inc             |   8 +-
 target/riscv/helper.h                   |   8 +
 target/riscv/insn32.decode              |  11 +-
 target/riscv/insn_trans/trans_rvv.c.inc | 454 +++++++++++++++++++++++-
 target/riscv/vector_helper.c            | 142 ++++++--
 5 files changed, 591 insertions(+), 32 deletions(-)

-- 
2.34.1