[PATCH v8 0/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

Craig Blackmore posted 2 patches 3 months, 2 weeks ago
target/riscv/vector_helper.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
Posted by Craig Blackmore 3 months, 2 weeks ago
Changes since v7:
- Fixed typo `bits` -> `bytes`
- Tuned threshold for applying the optimization
- Provided results for larger sizes requested by Max Chou

This patch provides up to 60% speedup on the `memcpy` benchmark from:

   https://github.com/embecosm/rise-rvv-tcg-qemu-tooling/tree/main/strmem-benchmarks

There is some variation in the measurements, so results are attached for six runs on a single thread of an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.

The three graphs are:

   memcpy-594c0cb1ab-128-speedup.pdf: VLEN 128

   memcpy-594c0cb1ab-1024-speedup.pdf: VLEN 1024

   memcpy-594c0cb1ab-stdlib-speedup.pdf: Scalar (included to further illustrate measurement variation, as the scalar version does not touch the function modified by this patch)

Previous versions:
- v1: https://lore.kernel.org/all/20240717153040.11073-1-paolo.savini@embecosm.com/
- v2: https://lore.kernel.org/all/20241002135708.99146-1-paolo.savini@embecosm.com/
- v3: https://lore.kernel.org/all/20241014220153.196183-1-paolo.savini@embecosm.com/
- v4: https://lore.kernel.org/all/20241029194348.59574-1-paolo.savini@embecosm.com/
- v5: https://lore.kernel.org/all/20241111130324.32487-1-paolo.savini@embecosm.com/
- v6: https://lore.kernel.org/all/20241204122952.53375-1-craig.blackmore@embecosm.com/
- v7: https://lore.kernel.org/all/20241211125113.583902-1-craig.blackmore@embecosm.com/

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: Bin Meng <bmeng.cn@gmail.com>
Cc: Weiwei Li <liwei1518@gmail.com>
Cc: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Cc: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Cc: Helene Chelin <helene.chelin@embecosm.com>
Cc: Nathan Egge <negge@google.com>
Cc: Max Chou <max.chou@sifive.com>
Cc: Paolo Savini <paolo.savini@embecosm.com>

Craig Blackmore (2):
   target/riscv: rvv: fix typo in vext continuous ldst function names
   target/riscv: rvv: speed up small unit-stride loads and stores

  target/riscv/vector_helper.c | 26 +++++++++++++++++++++-----
  1 file changed, 21 insertions(+), 5 deletions(-)
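
For readers unfamiliar with the general idea, the following is a minimal, standalone sketch of a size-thresholded fast path for a unit-stride copy. It is not taken from the patch and does not use any QEMU API: the names `unit_stride_copy`, `copy_simple`, `copy_bulk` and the cut-off `SMALL_ACCESS_THRESHOLD` are hypothetical, and the threshold value shown is a placeholder rather than the tuned value from this series. The point it illustrates is only that the access size is compared in bytes (element count times element size) and that small accesses take a path with no setup cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-off in bytes. The value used by the series was tuned
 * by benchmarking and is not reproduced here. */
#define SMALL_ACCESS_THRESHOLD 8

typedef enum { PATH_SIMPLE, PATH_BULK } copy_path;

/* Simple path: plain element-by-element loop with no setup work. */
static void copy_simple(uint8_t *dst, const uint8_t *src,
                        size_t elems, size_t esz)
{
    for (size_t i = 0; i < elems; i++) {
        for (size_t b = 0; b < esz; b++) {
            dst[i * esz + b] = src[i * esz + b];
        }
    }
}

/* Bulk path: stands in for a path whose fixed setup cost only pays off
 * on larger accesses. Here it is just a memcpy. */
static void copy_bulk(uint8_t *dst, const uint8_t *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Pick a path from the total size in bytes (not bits) and report which
 * one was taken so the choice can be observed by the caller. */
static copy_path unit_stride_copy(uint8_t *dst, const uint8_t *src,
                                  size_t elems, size_t esz)
{
    assert(esz == 1 || esz == 2 || esz == 4 || esz == 8);

    size_t bytes = elems * esz;

    if (bytes <= SMALL_ACCESS_THRESHOLD) {
        copy_simple(dst, src, elems, esz);
        return PATH_SIMPLE;
    }
    copy_bulk(dst, src, bytes);
    return PATH_BULK;
}
```

With the placeholder threshold above, two 4-byte elements (8 bytes) take the simple path and four 4-byte elements (16 bytes) take the bulk path. Where the crossover actually lies depends on the host and workload, which is why the series relies on benchmark results to choose it.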

-- 
2.43.0

Re: [PATCH v8 0/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores
Posted by Alistair Francis 3 months ago
On Thu, Dec 19, 2024 at 12:15 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
>
> Changes since v7:
> - Fixed typo `bits` -> `bytes`
> - Tuned threshold for applying the optimization
> - Provided results for larger sizes requested by Max Chou
>
> This patch provides up to 60% speedup on the `memcpy` benchmark from:
>
>   https://github.com/embecosm/rise-rvv-tcg-qemu-tooling/tree/main/strmem-benchmarks
>
> There is some variation in the measurements, so results are attached for six runs on a single thread of an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.
>
> The three graphs are:
>
>   memcpy-594c0cb1ab-128-speedup.pdf: VLEN 128
>
>   memcpy-594c0cb1ab-1024-speedup.pdf: VLEN 1024
>
>   memcpy-594c0cb1ab-stdlib-speedup.pdf: Scalar (included to further illustrate measurement variation, as the scalar version does not touch the function modified by this patch)
>
> Previous versions:
> - v1: https://lore.kernel.org/all/20240717153040.11073-1-paolo.savini@embecosm.com/
> - v2: https://lore.kernel.org/all/20241002135708.99146-1-paolo.savini@embecosm.com/
> - v3: https://lore.kernel.org/all/20241014220153.196183-1-paolo.savini@embecosm.com/
> - v4: https://lore.kernel.org/all/20241029194348.59574-1-paolo.savini@embecosm.com/
> - v5: https://lore.kernel.org/all/20241111130324.32487-1-paolo.savini@embecosm.com/
> - v6: https://lore.kernel.org/all/20241204122952.53375-1-craig.blackmore@embecosm.com/
> - v7: https://lore.kernel.org/all/20241211125113.583902-1-craig.blackmore@embecosm.com/
>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Alistair Francis <alistair.francis@wdc.com>
> Cc: Bin Meng <bmeng.cn@gmail.com>
> Cc: Weiwei Li <liwei1518@gmail.com>
> Cc: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> Cc: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> Cc: Helene Chelin <helene.chelin@embecosm.com>
> Cc: Nathan Egge <negge@google.com>
> Cc: Max Chou <max.chou@sifive.com>
> Cc: Paolo Savini <paolo.savini@embecosm.com>
>
> Craig Blackmore (2):
>   target/riscv: rvv: fix typo in vext continuous ldst function names
>   target/riscv: rvv: speed up small unit-stride loads and stores

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  target/riscv/vector_helper.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> --
> 2.43.0
>
>