[PATCH v2 3/3] target/riscv/translate.c: set vstart_eq_zero in mark_vs_dirty()

Daniel Henrique Barboza posted 3 patches 8 months, 2 weeks ago
Maintainers: Palmer Dabbelt <palmer@dabbelt.com>, Alistair Francis <alistair.francis@wdc.com>, Bin Meng <bin.meng@windriver.com>, Weiwei Li <liwei1518@gmail.com>, Daniel Henrique Barboza <dbarboza@ventanamicro.com>, Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
There is a newer version of this series
[PATCH v2 3/3] target/riscv/translate.c: set vstart_eq_zero in mark_vs_dirty()
Posted by Daniel Henrique Barboza 8 months, 2 weeks ago
The 'vstart_eq_zero' flag which is used to determine if some insns, like
vector reductor operations, should SIGILL. At this moment the flag is
being updated only during cpu_get_tb_cpu_state(), at the start of each
translation block.

This cadence isn't enough and we're facing situations where a vector
instruction successfully updated 'vstart' to zero, but the flag was
still marked as 'false', resulting in a SIGILL because instructions are
checking the flag.

mark_vs_dirty() is called after any instruction changes Vector CSR
state, making it a good place to update 'vstart_eq_zero'.

Fixes: 8e1ee1fb57 ("target/riscv: rvv-1.0: add translation-time vector context status")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1976
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 target/riscv/translate.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 177418b2b9..f9ff7b6173 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -652,6 +652,8 @@ static inline void mark_fs_dirty(DisasContext *ctx) { }
  */
 static void mark_vs_dirty(DisasContext *ctx)
 {
+    TCGLabel *vstart_zero = gen_new_label();
+    TCGLabel *done = gen_new_label();
     TCGv tmp;
 
     if (ctx->mstatus_vs != EXT_STATUS_DIRTY) {
@@ -669,6 +671,24 @@ static void mark_vs_dirty(DisasContext *ctx)
             tcg_gen_st_tl(tmp, tcg_env, offsetof(CPURISCVState, mstatus_hs));
         }
     }
+
+    /*
+     * We can safely make 'vl_eq_vlmax = false' if we marked
+     * VS as dirty with non-zero 'vstart', i.e. there's a fault
+     * to be handled. If 'vstart' is zero then we should retain
+     * the existing 'vl_eq_vlmax' - it'll be recalculated on the
+     * start of the next TB or during vset{i}vl{i} (that forces a
+     * TB end).
+     */
+    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vstart, 0, vstart_zero);
+    ctx->vstart_eq_zero = false;
+    ctx->vl_eq_vlmax = false;
+    tcg_gen_br(done);
+
+    gen_set_label(vstart_zero);
+    ctx->vstart_eq_zero = true;
+
+    gen_set_label(done);
 }
 #else
 static inline void mark_vs_dirty(DisasContext *ctx) { }
-- 
2.43.0
Re: [PATCH v2 3/3] target/riscv/translate.c: set vstart_eq_zero in mark_vs_dirty()
Posted by Richard Henderson 8 months, 2 weeks ago
On 2/16/24 03:57, Daniel Henrique Barboza wrote:
> The 'vstart_eq_zero' flag which is used to determine if some insns, like
> vector reductor operations, should SIGILL. At this moment the flag is
> being updated only during cpu_get_tb_cpu_state(), at the start of each
> translation block.
> 
> This cadence isn't enough and we're facing situations where a vector
> instruction successfully updated 'vstart' to zero, but the flag was
> still marked as 'false', resulting in a SIGILL because instructions are
> checking the flag.
> 
> mark_vs_dirty() is called after any instruction changes Vector CSR
> state, making it a good place to update 'vstart_eq_zero'.
> 
> Fixes: 8e1ee1fb57 ("target/riscv: rvv-1.0: add translation-time vector context status")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1976
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>   target/riscv/translate.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 177418b2b9..f9ff7b6173 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -652,6 +652,8 @@ static inline void mark_fs_dirty(DisasContext *ctx) { }
>    */
>   static void mark_vs_dirty(DisasContext *ctx)
>   {
> +    TCGLabel *vstart_zero = gen_new_label();
> +    TCGLabel *done = gen_new_label();
>       TCGv tmp;
>   
>       if (ctx->mstatus_vs != EXT_STATUS_DIRTY) {
> @@ -669,6 +671,24 @@ static void mark_vs_dirty(DisasContext *ctx)
>               tcg_gen_st_tl(tmp, tcg_env, offsetof(CPURISCVState, mstatus_hs));
>           }
>       }
> +
> +    /*
> +     * We can safely make 'vl_eq_vlmax = false' if we marked
> +     * VS as dirty with non-zero 'vstart', i.e. there's a fault
> +     * to be handled. If 'vstart' is zero then we should retain
> +     * the existing 'vl_eq_vlmax' - it'll be recalculated on the
> +     * start of the next TB or during vset{i}vl{i} (that forces a
> +     * TB end).
> +     */
> +    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vstart, 0, vstart_zero);
> +    ctx->vstart_eq_zero = false;
> +    ctx->vl_eq_vlmax = false;
> +    tcg_gen_br(done);
> +
> +    gen_set_label(vstart_zero);
> +    ctx->vstart_eq_zero = true;
> +
> +    gen_set_label(done);

This is very confused, apparently generating code to test vstart at runtime, and then set 
some translation time variables in the branches.

Afaik, the only way vstart != 0 is an explicit set to the CSR or exiting a load via 
exception.  Therefore you have no need to have any sort of brcond here -- just set
ctx->vstart_eq_zero = true.

Also, you need to move the ifdefs from around mark_vs_dirty, because it is now relevant to 
user-only.

It may be worth a rename, because it does more than mark vs dirty, and therefore is 
different than mark_fs_dirty.

You need to review all instances of

     TCGLabel *over = gen_new_label();
     tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
...
     gen_set_label(over);
     return true;

because this *should have* set vstart = 0.  With vstart < vl, this is done in the helper 
function, but every place using a conditional branch needs attention.


r~
Re: [PATCH v2 3/3] target/riscv/translate.c: set vstart_eq_zero in mark_vs_dirty()
Posted by Daniel Henrique Barboza 8 months, 2 weeks ago

On 2/16/24 15:56, Richard Henderson wrote:
> On 2/16/24 03:57, Daniel Henrique Barboza wrote:
>> The 'vstart_eq_zero' flag which is used to determine if some insns, like
>> vector reductor operations, should SIGILL. At this moment the flag is
>> being updated only during cpu_get_tb_cpu_state(), at the start of each
>> translation block.
>>
>> This cadence isn't enough and we're facing situations where a vector
>> instruction successfully updated 'vstart' to zero, but the flag was
>> still marked as 'false', resulting in a SIGILL because instructions are
>> checking the flag.
>>
>> mark_vs_dirty() is called after any instruction changes Vector CSR
>> state, making it a good place to update 'vstart_eq_zero'.
>>
>> Fixes: 8e1ee1fb57 ("target/riscv: rvv-1.0: add translation-time vector context status")
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1976
>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>> ---
>>   target/riscv/translate.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>> index 177418b2b9..f9ff7b6173 100644
>> --- a/target/riscv/translate.c
>> +++ b/target/riscv/translate.c
>> @@ -652,6 +652,8 @@ static inline void mark_fs_dirty(DisasContext *ctx) { }
>>    */
>>   static void mark_vs_dirty(DisasContext *ctx)
>>   {
>> +    TCGLabel *vstart_zero = gen_new_label();
>> +    TCGLabel *done = gen_new_label();
>>       TCGv tmp;
>>       if (ctx->mstatus_vs != EXT_STATUS_DIRTY) {
>> @@ -669,6 +671,24 @@ static void mark_vs_dirty(DisasContext *ctx)
>>               tcg_gen_st_tl(tmp, tcg_env, offsetof(CPURISCVState, mstatus_hs));
>>           }
>>       }
>> +
>> +    /*
>> +     * We can safely make 'vl_eq_vlmax = false' if we marked
>> +     * VS as dirty with non-zero 'vstart', i.e. there's a fault
>> +     * to be handled. If 'vstart' is zero then we should retain
>> +     * the existing 'vl_eq_vlmax' - it'll be recalculated on the
>> +     * start of the next TB or during vset{i}vl{i} (that forces a
>> +     * TB end).
>> +     */
>> +    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vstart, 0, vstart_zero);
>> +    ctx->vstart_eq_zero = false;
>> +    ctx->vl_eq_vlmax = false;
>> +    tcg_gen_br(done);
>> +
>> +    gen_set_label(vstart_zero);
>> +    ctx->vstart_eq_zero = true;
>> +
>> +    gen_set_label(done);
> 
> This is very confused, apparently generating code to test vstart at runtime, and then set some translation time variables in the branches.
> 
> Afaik, the only way vstart != 0 is an explicit set to the CSR or exiting a load via exception.  Therefore you have no need to have any sort of brcond here -- just set
> ctx->vstart_eq_zero = true.
> 
> Also, you need to move the ifdefs from around mark_vs_dirty, because it is now relevant to user-only.
> 
> It may be worth a rename, because it does more than mark vs dirty, and therefore is different than mark_fs_dirty.
> 
> You need to review all instances of
> 
>      TCGLabel *over = gen_new_label();
>      tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
> ...
>      gen_set_label(over);
>      return true;
> 
> because this *should have* set vstart = 0.  With vstart < vl, this is done in the helper function, but every place using a conditional branch needs attention.

After reading the reviews of patches 1 and 3 what I'm considering here is:

1 - drop patch 1;

2 - there's a patch from Ivan Klokov sent 2 months ago:

"[PATCH 1/1] target/riscv: Clear vstart_qe_zero flag"
https://lore.kernel.org/qemu-riscv/20231214111851.142532-1-ivan.klokov@syntacore.com/

His patch is closer to what you suggested than mine. He already renamed mark_vs_dirty()
to finalize_rvv_inst() and made it set start_eq_zero unconditionally. It needs a
little work (i.e. remove the ifds from the function) that I'll do myself.

3 - I'll keep patch 2 to reduce the redundant calls to the now finalize_rvv_inst();

4 - Add another patch to through all "gen_set_label(over)" cond branches and set
vstart = 0 && vstart_eq_zero manually when we're doing the jump.

In fact, shouldn't we return earlier if we're not taking the branch? Otherwise
we'll set vstart twice in case we didn't get the branch. E.g:

       TCGLabel *over = gen_new_label();
       tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
       (...)
       finalize_rvv_insn();
       return true;

       gen_set_label(over);
       /* some TCG ops to set cpu_vstart to zero. Perhaps a helper?  */
       s->vstart_eq_zero = true;
       return true;




Thanks,

Daniel







> 
> 
> r~

Re: [PATCH v2 3/3] target/riscv/translate.c: set vstart_eq_zero in mark_vs_dirty()
Posted by Richard Henderson 8 months, 2 weeks ago
On 2/16/24 12:40, Daniel Henrique Barboza wrote:
> After reading the reviews of patches 1 and 3 what I'm considering here is:
> 
> 1 - drop patch 1;

Ok.

> 2 - there's a patch from Ivan Klokov sent 2 months ago:
> 
> "[PATCH 1/1] target/riscv: Clear vstart_qe_zero flag"
> https://lore.kernel.org/qemu-riscv/20231214111851.142532-1-ivan.klokov@syntacore.com/
> 
> His patch is closer to what you suggested than mine. He already renamed mark_vs_dirty()
> to finalize_rvv_inst() and made it set start_eq_zero unconditionally. It needs a
> little work (i.e. remove the ifds from the function) that I'll do myself.
> 
> 3 - I'll keep patch 2 to reduce the redundant calls to the now finalize_rvv_inst();

Ok.

> 4 - Add another patch to through all "gen_set_label(over)" cond branches and set
> vstart = 0 && vstart_eq_zero manually when we're doing the jump.
> 
> In fact, shouldn't we return earlier if we're not taking the branch? Otherwise
> we'll set vstart twice in case we didn't get the branch. E.g:
> 
>        TCGLabel *over = gen_new_label();
>        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
>        (...)
>        finalize_rvv_insn();
>        return true;
> 
>        gen_set_label(over);
>        /* some TCG ops to set cpu_vstart to zero. Perhaps a helper?  */
>        s->vstart_eq_zero = true;
>        return true;

That will break, of course, because you wouldn't emit 'over'.
You really need to get translation-time and run-time separate in your head.

That said, I think these brcond(vstart >= vl) are a mistake.
The loops within the helpers are generally of the form

     for (i = env->vstart; i < evl; i++, env->vstart++) {

which will operate just fine with vstart >= vl, iterating zero times.
We will then fall through to the post-insn cleanup,

     env->vstart = 0;
     vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);

or whatever.

I would expect the condition vstart >= vl to never happen in practice.  I believe the only 
way to induce it is an explicit write to vstart.  Therefore I think you should not attempt 
to "optimize away" the call to the helper.

Of course you will want to double-check all of the loop iterations in the associated 
helpers when removing the branches.


r~


PS: I believe a better form for the ldst loops is

     for (i = env->vstart; i < evl; env->vstart = ++i)

to avoid re-reading from vstart each iteration.

Re: [PATCH v2 3/3] target/riscv/translate.c: set vstart_eq_zero in mark_vs_dirty()
Posted by Daniel Henrique Barboza 8 months, 2 weeks ago

On 2/16/24 20:41, Richard Henderson wrote:
> On 2/16/24 12:40, Daniel Henrique Barboza wrote:
>> After reading the reviews of patches 1 and 3 what I'm considering here is:
>>
>> 1 - drop patch 1;
> 
> Ok.
> 
>> 2 - there's a patch from Ivan Klokov sent 2 months ago:
>>
>> "[PATCH 1/1] target/riscv: Clear vstart_qe_zero flag"
>> https://lore.kernel.org/qemu-riscv/20231214111851.142532-1-ivan.klokov@syntacore.com/
>>
>> His patch is closer to what you suggested than mine. He already renamed mark_vs_dirty()
>> to finalize_rvv_inst() and made it set start_eq_zero unconditionally. It needs a
>> little work (i.e. remove the ifds from the function) that I'll do myself.
>>
>> 3 - I'll keep patch 2 to reduce the redundant calls to the now finalize_rvv_inst();
> 
> Ok.
> 
>> 4 - Add another patch to through all "gen_set_label(over)" cond branches and set
>> vstart = 0 && vstart_eq_zero manually when we're doing the jump.
>>
>> In fact, shouldn't we return earlier if we're not taking the branch? Otherwise
>> we'll set vstart twice in case we didn't get the branch. E.g:
>>
>>        TCGLabel *over = gen_new_label();
>>        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
>>        (...)
>>        finalize_rvv_insn();
>>        return true;
>>
>>        gen_set_label(over);
>>        /* some TCG ops to set cpu_vstart to zero. Perhaps a helper?  */
>>        s->vstart_eq_zero = true;
>>        return true;
> 
> That will break, of course, because you wouldn't emit 'over'.
> You really need to get translation-time and run-time separate in your head.
> 
> That said, I think these brcond(vstart >= vl) are a mistake.
> The loops within the helpers are generally of the form
> 
>      for (i = env->vstart; i < evl; i++, env->vstart++) {
> 
> which will operate just fine with vstart >= vl, iterating zero times.
> We will then fall through to the post-insn cleanup,
> 
>      env->vstart = 0;
>      vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
> 
> or whatever.
> 
> I would expect the condition vstart >= vl to never happen in practice.  I believe the only way to induce it is an explicit write to vstart.  Therefore I think you should not attempt to "optimize away" the call to the helper.
> 
> Of course you will want to double-check all of the loop iterations in the associated helpers when removing the branches.

Ok!

> 
> 
> r~
> 
> 
> PS: I believe a better form for the ldst loops is
> 
>      for (i = env->vstart; i < evl; env->vstart = ++i)
> 
> to avoid re-reading from vstart each iteration.


I'll add a patch to convert these loops. Thanks,


Daniel