Calling `vext_continuous_ldst_tlb` for load/stores up to 6 bytes
significantly improves performance.
Co-authored-by: Helene CHELIN <helene.chelin@embecosm.com>
Co-authored-by: Paolo Savini <paolo.savini@embecosm.com>
Co-authored-by: Craig Blackmore <craig.blackmore@embecosm.com>
Signed-off-by: Helene CHELIN <helene.chelin@embecosm.com>
Signed-off-by: Paolo Savini <paolo.savini@embecosm.com>
Signed-off-by: Craig Blackmore <craig.blackmore@embecosm.com>
---
target/riscv/vector_helper.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 0f57e48cc5..ead3ec5194 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -393,6 +393,22 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
         return;
     }
 
+#if defined(CONFIG_USER_ONLY)
+    /*
+     * For data sizes <= 6 bytes we get better performance by simply calling
+     * vext_continuous_ldst_tlb
+     */
+    if (nf == 1 && (evl << log2_esz) <= 6) {
+        addr = base + (env->vstart << log2_esz);
+        vext_continuous_ldst_tlb(env, ldst_tlb, vd, evl, addr, env->vstart, ra,
+                                 esz, is_load);
+
+        env->vstart = 0;
+        vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
+        return;
+    }
+#endif
+
     /* Calculate the page range of first page */
     addr = base + ((env->vstart * nf) << log2_esz);
     page_split = -(addr | TARGET_PAGE_MASK);
--
2.43.0
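
The size check the patch adds can be read in isolation. The following is a minimal sketch, not QEMU code: the helper names `small_access_fast_path` and `max_fast_path_evl` are hypothetical, and the sketch assumes `evl` is small enough that `evl << log2_esz` does not overflow 32 bits. It mirrors the condition `nf == 1 && (evl << log2_esz) <= 6` and shows the largest element count that still qualifies for each element size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): mirrors the condition the patch
 * adds to vext_ldst_us(). nf is the number of fields, evl the effective
 * vector length in elements, log2_esz the log2 of the element size in bytes.
 */
static bool small_access_fast_path(uint32_t nf, uint32_t evl,
                                   uint32_t log2_esz)
{
    assert(log2_esz <= 3);                   /* 1, 2, 4 or 8 byte elements */
    uint32_t total_bytes = evl << log2_esz;  /* evl * element size */
    return nf == 1 && total_bytes <= 6;
}

/*
 * Hypothetical helper: largest evl that still satisfies
 * (evl << log2_esz) <= 6 for a given element size.
 */
static uint32_t max_fast_path_evl(uint32_t log2_esz)
{
    assert(log2_esz <= 3);
    return 6u >> log2_esz;
}
```

Under these assumptions the limit works out to 6 elements for 1-byte elements, 3 for 2-byte elements, 1 for 4-byte elements, and 0 for 8-byte elements, so a single 8-byte element never takes the new path.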
On 12/18/24 08:23, Craig Blackmore wrote:
> Calling `vext_continuous_ldst_tlb` for load/stores up to 6 bytes
> significantly improves performance.
>
> Co-authored-by: Helene CHELIN <helene.chelin@embecosm.com>
> Co-authored-by: Paolo Savini <paolo.savini@embecosm.com>
> Co-authored-by: Craig Blackmore <craig.blackmore@embecosm.com>
>
> Signed-off-by: Helene CHELIN <helene.chelin@embecosm.com>
> Signed-off-by: Paolo Savini <paolo.savini@embecosm.com>
> Signed-off-by: Craig Blackmore <craig.blackmore@embecosm.com>
> ---
>  target/riscv/vector_helper.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)

Thanks for the graphs.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

On 12/18/24 11:23 AM, Craig Blackmore wrote:
> Calling `vext_continuous_ldst_tlb` for load/stores up to 6 bytes
> significantly improves performance.
>
> Co-authored-by: Helene CHELIN <helene.chelin@embecosm.com>
> Co-authored-by: Paolo Savini <paolo.savini@embecosm.com>
> Co-authored-by: Craig Blackmore <craig.blackmore@embecosm.com>
>
> Signed-off-by: Helene CHELIN <helene.chelin@embecosm.com>
> Signed-off-by: Paolo Savini <paolo.savini@embecosm.com>
> Signed-off-by: Craig Blackmore <craig.blackmore@embecosm.com>
> ---
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> target/riscv/vector_helper.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 0f57e48cc5..ead3ec5194 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -393,6 +393,22 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
>           return;
>       }
>   
> +#if defined(CONFIG_USER_ONLY)
> +    /*
> +     * For data sizes <= 6 bytes we get better performance by simply calling
> +     * vext_continuous_ldst_tlb
> +     */
> +    if (nf == 1 && (evl << log2_esz) <= 6) {
> +        addr = base + (env->vstart << log2_esz);
> +        vext_continuous_ldst_tlb(env, ldst_tlb, vd, evl, addr, env->vstart, ra,
> +                                 esz, is_load);
> +
> +        env->vstart = 0;
> +        vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
> +        return;
> +    }
> +#endif
> +
>       /* Calculate the page range of first page */
>       addr = base + ((env->vstart * nf) << log2_esz);
>       page_split = -(addr | TARGET_PAGE_MASK);
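
For readers unfamiliar with the context lines at the end of the hunk, the expression `-(addr | TARGET_PAGE_MASK)` yields the number of bytes from `addr` to the end of its page. The sketch below illustrates the arithmetic only. It assumes a 4 KiB page and uses the hypothetical names `SKETCH_PAGE_BITS`, `SKETCH_PAGE_MASK` and `bytes_to_page_end`; the real `TARGET_PAGE_MASK` depends on the QEMU target configuration.

```c
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK (~(((uint64_t)1 << SKETCH_PAGE_BITS) - 1))

/*
 * Hypothetical helper: OR-ing with the mask sets every bit above the page
 * offset, so the unsigned negation leaves page_size - offset, i.e. the
 * bytes remaining in the page that contains addr.
 */
static uint64_t bytes_to_page_end(uint64_t addr)
{
    return -(addr | SKETCH_PAGE_MASK);
}
```

With these assumptions, an address 6 bytes before a page boundary gives 6, and a page-aligned address gives the full page size of 4096.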