target/i386: optimize string operations

[PATCH 08/13] target/i386: make cc_op handling more explicit for repeated string instructions.

Posted by Paolo Bonzini 3 months, 3 weeks ago

Since the cost of gen_update_cc_op() must be paid anyway, it's easier
to place them manually and not rely on spilling that is buried under
multiple levels of function calls.

And since cc_op will have been spilled at the point of a fault, just
make the whole insn CC_OP_DYNAMIC.  Once repz_opt is reintroduced,
a fault could happen either before or after the first execution of
CMPS/SCAS, and CC_OP_DYNAMIC sidesteps the complicated matter of what
x86_restore_state_to_opc would do.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/tcg/translate.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 4ff0ccf71cb..4b53a47000e 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -1345,14 +1345,29 @@ static void do_gen_rep(DisasContext *s, MemOp ot,
      */
     s->flags &= ~HF_RF_MASK;
 
+    /*
+     * For CMPS/SCAS, the CC_OP after a memory fault could come from either
+     * the previous instruction or the string instruction; but because we
+     * arrange to keep CC_OP up to date all the time, just mark the whole
+     * insn as CC_OP_DYNAMIC.
+     *
+     * It's not a problem to do this even for instructions that do not
+     * modify the flags, so do it unconditionally.
+     */
     gen_update_cc_op(s);
+    tcg_set_insn_start_param(s->base.insn_start, 1, CC_OP_DYNAMIC);
+
+    /* Any iteration at all?  */
     gen_op_jz_ecx(s, done);
 
     fn(s, ot);
     gen_op_add_reg_im(s, s->aflag, R_ECX, -1);
+    gen_update_cc_op(s);
+
+    /* Leave if REP condition fails.  */
     if (is_repz_nz) {
         int nz = (s->prefix & PREFIX_REPNZ) ? 1 : 0;
-        gen_jcc(s, (JCC_Z << 1) | (nz ^ 1), done);
+        gen_jcc_noeob(s, (JCC_Z << 1) | (nz ^ 1), done);
     }
 
     /*
-- 
2.47.1

Re: [PATCH 08/13] target/i386: make cc_op handling more explicit for repeated string instructions.

Posted by Richard Henderson 3 months, 3 weeks ago

On 12/15/24 03:06, Paolo Bonzini wrote:
>       fn(s, ot);
>       gen_op_add_reg_im(s, s->aflag, R_ECX, -1);
> +    gen_update_cc_op(s);
> +
> +    /* Leave if REP condition fails.  */
>       if (is_repz_nz) {
>           int nz = (s->prefix & PREFIX_REPNZ) ? 1 : 0;
> -        gen_jcc(s, (JCC_Z << 1) | (nz ^ 1), done);
> +        gen_jcc_noeob(s, (JCC_Z << 1) | (nz ^ 1), done);

The comment in gen_jcc would still seem to apply:

     CCPrepare cc = gen_prepare_cc(s, b, NULL);

     /*
      * Note that this must be _after_ gen_prepare_cc, because it
      * can change the cc_op from CC_OP_DYNAMIC to CC_OP_EFLAGS!
      */
     gen_update_cc_op(s);

via any path through gen_prepare_cc that reaches gen_compute_eflags.

However!  Because this is JCC_Z, we will never call gen_compute_eflags, we will always go 
through the gen_prepare_eflags_z, which doesn't have the same problem.

This subtlety deserves a comment and maybe an assert.  Perhaps

     gen_jcc_noeob(...);
     assert(!s->cc_op_dirty);

r~

Re: [PATCH 08/13] target/i386: make cc_op handling more explicit for repeated string instructions.

Posted by Paolo Bonzini 3 months, 3 weeks ago

Il dom 15 dic 2024, 15:45 Richard Henderson <richard.henderson@linaro.org>
ha scritto:

> On 12/15/24 03:06, Paolo Bonzini wrote:
> >       fn(s, ot);
> >       gen_op_add_reg_im(s, s->aflag, R_ECX, -1);
> > +    gen_update_cc_op(s);
> > +
> > +    /* Leave if REP condition fails.  */
> >       if (is_repz_nz) {
> >           int nz = (s->prefix & PREFIX_REPNZ) ? 1 : 0;
> > -        gen_jcc(s, (JCC_Z << 1) | (nz ^ 1), done);
> > +        gen_jcc_noeob(s, (JCC_Z << 1) | (nz ^ 1), done);
>
> The comment in gen_jcc would still seem to apply:
>
>      CCPrepare cc = gen_prepare_cc(s, b, NULL);
>
>      /*
>       * Note that this must be _after_ gen_prepare_cc, because it
>       * can change the cc_op from CC_OP_DYNAMIC to CC_OP_EFLAGS!
>       */
>      gen_update_cc_op(s);
>
> via any path through gen_prepare_cc that reaches gen_compute_eflags.
>
> However!  Because this is JCC_Z, we will never call gen_compute_eflags, we
> will always go
> through the gen_prepare_eflags_z, which doesn't have the same problem.
>

Or more simply, fn(s, ot) must have left CC_OP_SUBx in cc_op.

This subtlety deserves a comment and maybe an assert.  Perhaps
>
>      gen_jcc_noeob(...);
>      assert(!s->cc_op_dirty);
>

Either that or an assert(s->cc_op != CC_OP_DYNAMIC) before the call to
gen_jcc_noeob().

Paolo


>
> r~
>
>

Re: [PATCH 08/13] target/i386: make cc_op handling more explicit for repeated string instructions.

Posted by Richard Henderson 3 months, 3 weeks ago

On 12/15/24 09:13, Paolo Bonzini wrote:
> 
> 
> Il dom 15 dic 2024, 15:45 Richard Henderson <richard.henderson@linaro.org 
> <mailto:richard.henderson@linaro.org>> ha scritto:
> 
>     On 12/15/24 03:06, Paolo Bonzini wrote:
>      >       fn(s, ot);
>      >       gen_op_add_reg_im(s, s->aflag, R_ECX, -1);
>      > +    gen_update_cc_op(s);
>      > +
>      > +    /* Leave if REP condition fails.  */
>      >       if (is_repz_nz) {
>      >           int nz = (s->prefix & PREFIX_REPNZ) ? 1 : 0;
>      > -        gen_jcc(s, (JCC_Z << 1) | (nz ^ 1), done);
>      > +        gen_jcc_noeob(s, (JCC_Z << 1) | (nz ^ 1), done);
> 
>     The comment in gen_jcc would still seem to apply:
> 
>           CCPrepare cc = gen_prepare_cc(s, b, NULL);
> 
>           /*
>            * Note that this must be _after_ gen_prepare_cc, because it
>            * can change the cc_op from CC_OP_DYNAMIC to CC_OP_EFLAGS!
>            */
>           gen_update_cc_op(s);
> 
>     via any path through gen_prepare_cc that reaches gen_compute_eflags.
> 
>     However!  Because this is JCC_Z, we will never call gen_compute_eflags, we will always go
>     through the gen_prepare_eflags_z, which doesn't have the same problem.
> 
> 
> Or more simply, fn(s, ot) must have left CC_OP_SUBx in cc_op.

No, even CC_OP_SUBx can (and in this case, will) goto slow_jcc.
My correctness analysis is strictly based on JCC_Z.

>     This subtlety deserves a comment and maybe an assert.  Perhaps
> 
>           gen_jcc_noeob(...);
>           assert(!s->cc_op_dirty);
> 
> 
> Either that or an assert(s->cc_op != CC_OP_DYNAMIC) before the call to gen_jcc_noeob().

No, not before gen_jcc_noeob, since that's where any buggy change would occur.

r~