This started as an "easy" fix for RF handling in string instructions.
I then realized how broken repz_opt is (patch 5): it was optimizing
for the wrong case, and redoing the optimization would make the RF
handling basically free.
On a microbenchmark running x86-on-x86 user-mode emulation, stos and
movs execute about 40% fewer instructions and about 60% fewer branches.
Performance is very variable, because it is limited by memory bandwidth
and because the out-of-order processor does a great job of scheduling
all the useless instructions executed by the older code; but the
microbenchmark results seem to improve by 10-15%.
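
For readers who want the shape of the later patches in mind, here is a
toy C model of a repeated string store. It is only a sketch: it is not
the TCG ops the translator emits, and struct toy_cpu, toy_rep_stosb and
the max_iters bound are hypothetical names and assumptions made up for
illustration. It shows two ideas from the series: running several
iterations before returning to the caller, and computing the
per-iteration pointer update once, outside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest state for illustration; not QEMU's CPUX86State. */
struct toy_cpu {
    uint64_t cx;  /* remaining iteration count (CX/ECX/RCX) */
    uint64_t di;  /* destination offset (DI/EDI/RDI) */
    int df;       /* direction flag: 0 = ascending, 1 = descending */
    uint8_t al;   /* byte to store */
};

/*
 * Toy model of REP STOSB. Runs at most max_iters iterations per call
 * (an assumed bound, so that a caller could check for pending work in
 * between) and returns nonzero if iterations remain.
 */
static int toy_rep_stosb(struct toy_cpu *cpu, uint8_t *mem,
                         size_t mem_size, unsigned max_iters)
{
    /* Loop-invariant: the update value depends only on DF. */
    const uint64_t step = cpu->df ? (uint64_t)-1 : 1;
    unsigned i;

    for (i = 0; i < max_iters && cpu->cx != 0; i++) {
        assert(cpu->di < mem_size);  /* the toy model has no faults */
        mem[cpu->di] = cpu->al;
        cpu->di += step;
        cpu->cx--;
    }
    return cpu->cx != 0;
}
```

A caller would invoke toy_rep_stosb() repeatedly until it returns zero;
the state in struct toy_cpu is consistent after every call, so the
operation can be resumed at any iteration boundary.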
Paolo
Paolo Bonzini (13):
  target/i386: inline gen_jcc into sole caller
  target/i386: remove trailing 1 from gen_{j,cmov,set}cc1
  target/i386: unify REP and REPZ/REPNZ generation
  target/i386: unify choice between single and repeated string
    instructions
  target/i386: reorganize ops emitted by do_gen_rep, drop repz_opt
  target/i386: tcg: move gen_set/reset_* earlier in the file
  target/i386: fix RF handling for string instructions
  target/i386: make cc_op handling more explicit for repeated string
    instructions.
  target/i386: do not use gen_op_jz_ecx for repeated string operations
  target/i386: optimize CX handling in repeated string operations
  target/i386: execute multiple REP/REPZ iterations without leaving TB
  target/i386: pull computation of string update value out of loop
  target/i386: avoid using s->tmp0 for add to implicit registers

target/i386/tcg/translate.c | 342 +++++++++++++++++++++---------------
target/i386/tcg/emit.c.inc | 56 ++----
2 files changed, 219 insertions(+), 179 deletions(-)
--
2.47.1