target/riscv: fix vector LD/ST instruction

[PATCH v2 1/2] target/riscv: fix vector register address calculation in strided LD/ST

Posted by Chao Liu 5 months, 3 weeks ago

This patch fixes a critical bug in the RISC-V vector instruction
translation that caused incorrect data handling in strided load
operations (e.g., vlsseg8e32).

Problem Description:

The `get_log2` function in `trans_rvv.c.inc` returned a value 1 higher
than the actual log2 value. For example, get_log2(4) incorrectly
returned 3 instead of 2.

This led to erroneous vector register offset calculations, resulting in
data overlap where bytes 32-47 were incorrectly copied to positions
16-31 in ChaCha20 encryption code.

rvv_test_func:
    vsetivli    zero, 1, e32, m1, ta, ma
    li          t0, 64

    vlsseg8e32.v v0, (a0), t0
    addi        a0, a0, 32
    vlsseg8e32.v v8, (a0), t0

    vssseg8e32.v v0, (a1), t0
    addi        a1, a1, 32
    vssseg8e32.v v8, (a1), t0
    ret

Analysis:

The original implementation counted the number of right shifts until
zero, including the final shift that reduced the value to zero:

static inline uint32_t get_log2(uint32_t a)
{
    uint32_t i = 0;
    for (; a > 0;) {
        a >>= 1;
        i++;
    }
    return i; // Returns 3 for a=4 (0b100 → 0b10 → 0b1 → 0b0)
}

Fix:

The corrected function stops shifting when only the highest bit remains
and handles the special case of a=0:

static inline uint32_t get_log2(uint32_t a)
{
    uint32_t i = 0;
    if (a == 0) {
        return i; // Handle edge case
    }
    for (; a > 1; a >>= 1) {
        i++;
    }
    return i; // Now returns 2 for a=4
}

Fixes: 28c12c1f2f ("Generate strided vector loads/stores with tcg nodes.")

Signed-off-by: Chao Liu <chao.liu@yeah.net>
---
 target/riscv/insn_trans/trans_rvv.c.inc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 2b6077ac06..f50b62b1d8 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -877,7 +877,10 @@ static inline uint32_t MAXSZ(DisasContext *s)
 static inline uint32_t get_log2(uint32_t a)
 {
     uint32_t i = 0;
-    for (; a > 0;) {
+    if (a == 0) {
+        return i;
+    }
+    for (; a > 1;) {
         a >>= 1;
         i++;
     }
-- 
2.50.1

Re: [PATCH v2 1/2] target/riscv: fix vector register address calculation in strided LD/ST

Posted by Richard Henderson 5 months, 3 weeks ago

On 8/16/25 10:29, Chao Liu wrote:
> This patch fixes a critical bug in the RISC-V vector instruction
> translation that caused incorrect data handling in strided load
> operations (e.g., vlsseg8e32).
> 
> Problem Description:
> 
> The `get_log2` function in `trans_rvv.c.inc` returned a value 1 higher
> than the actual log2 value. For example, get_log2(4) incorrectly
> returned 3 instead of 2.
> 
> This led to erroneous vector register offset calculations, resulting in
> data overlap where bytes 32-47 were incorrectly copied to positions
> 16-31 in ChaCha20 encryption code.
> 
> rvv_test_func:
>      vsetivli    zero, 1, e32, m1, ta, ma
>      li          t0, 64
> 
>      vlsseg8e32.v v0, (a0), t0
>      addi        a0, a0, 32
>      vlsseg8e32.v v8, (a0), t0
> 
>      vssseg8e32.v v0, (a1), t0
>      addi        a1, a1, 32
>      vssseg8e32.v v8, (a1), t0
>      ret
> 
> Analysis:
> 
> The original implementation counted the number of right shifts until
> zero, including the final shift that reduced the value to zero:
> 
> static inline uint32_t get_log2(uint32_t a)
> {
>      uint32_t i = 0;
>      for (; a > 0;) {
>          a >>= 1;
>          i++;
>      }
>      return i; // Returns 3 for a=4 (0b100 → 0b10 → 0b1 → 0b0)
> }
> 
> Fix:
> 
> The corrected function stops shifting when only the highest bit remains
> and handles the special case of a=0:
> 
> static inline uint32_t get_log2(uint32_t a)
> {
>      uint32_t i = 0;
>      if (a == 0) {
>          return i; // Handle edge case
>      }
>      for (; a > 1; a >>= 1) {
>          i++;
>      }
>      return i; // Now returns 2 for a=4
> }
> 
> Fixes: 28c12c1f2f ("Generate strided vector loads/stores with tcg nodes.")
> 
> Signed-off-by: Chao Liu <chao.liu@yeah.net>
> ---
>   target/riscv/insn_trans/trans_rvv.c.inc | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
> index 2b6077ac06..f50b62b1d8 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -877,7 +877,10 @@ static inline uint32_t MAXSZ(DisasContext *s)
>   static inline uint32_t get_log2(uint32_t a)
>   {
>       uint32_t i = 0;
> -    for (; a > 0;) {
> +    if (a == 0) {
> +        return i;
> +    }
> +    for (; a > 1;) {
>           a >>= 1;
>           i++;
>       }

I suggest

     assert(is_power_of_2(a));
     return ctz32(a);

I was surprised we don't have such a function in qemu/host-utils.h already.


r~

Re: [PATCH v2 1/2] target/riscv: fix vector register address calculation in strided LD/ST

Posted by Eric Biggers 5 months, 3 weeks ago

On Sat, Aug 16, 2025 at 03:55:40AM +0800, Chao Liu wrote:
> This patch fixes a critical bug in the RISC-V vector instruction
> translation that caused incorrect data handling in strided load
> operations (e.g., vlsseg8e32).
> 
> Problem Description:
> 
> The `get_log2` function in `trans_rvv.c.inc` returned a value 1 higher
> than the actual log2 value. For example, get_log2(4) incorrectly
> returned 3 instead of 2.
> 
> This led to erroneous vector register offset calculations, resulting in
> data overlap where bytes 32-47 were incorrectly copied to positions
> 16-31 in ChaCha20 encryption code.
> 
> rvv_test_func:
>     vsetivli    zero, 1, e32, m1, ta, ma
>     li          t0, 64
> 
>     vlsseg8e32.v v0, (a0), t0
>     addi        a0, a0, 32
>     vlsseg8e32.v v8, (a0), t0
> 
>     vssseg8e32.v v0, (a1), t0
>     addi        a1, a1, 32
>     vssseg8e32.v v8, (a1), t0
>     ret
> 
> Analysis:
> 
> The original implementation counted the number of right shifts until
> zero, including the final shift that reduced the value to zero:
> 
> static inline uint32_t get_log2(uint32_t a)
> {
>     uint32_t i = 0;
>     for (; a > 0;) {
>         a >>= 1;
>         i++;
>     }
>     return i; // Returns 3 for a=4 (0b100 → 0b10 → 0b1 → 0b0)
> }
> 
> Fix:
> 
> The corrected function stops shifting when only the highest bit remains
> and handles the special case of a=0:
> 
> static inline uint32_t get_log2(uint32_t a)
> {
>     uint32_t i = 0;
>     if (a == 0) {
>         return i; // Handle edge case
>     }
>     for (; a > 1; a >>= 1) {
>         i++;
>     }
>     return i; // Now returns 2 for a=4
> }
> 
> Fixes: 28c12c1f2f ("Generate strided vector loads/stores with tcg nodes.")
> 
> Signed-off-by: Chao Liu <chao.liu@yeah.net>
> ---
>  target/riscv/insn_trans/trans_rvv.c.inc | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 

Tested-by: Eric Biggers <ebiggers@kernel.org>

But, to get this to apply I had to re-apply the fixed commit (which was
reverted), then resolve a conflict.  You'll need to send out a new
series which applies to the latest master branch.

- Eric