[PATCH v2 00/25] target/i386: convert 1-byte opcodes to new decoder

Paolo Bonzini posted 25 patches 6 months, 3 weeks ago
This series includes changes to the x86 TCG decoder that switch the
1-byte opcodes to the table-driven decoder (except for x87).  A few
easy 2-byte opcodes are also converted (BSWAP, SETcc, CMOVcc,
MOVZX/MOVSX and those that are extensions of 1-byte opcodes like PUSH/POP
FS/GS, LFS/LGS/LSS).

After optimization, the generated code is generally similar to what
is produced by the old decoder, with some differences for 32-bit
multiplications and rotate operations (RCL/RCR, and ROL/ROR less so).

This reaches a point where prefix decoding is done entirely in the new
decoder; once the opcode is loaded, the new decoder defers to
translate.c, if needed, for the actual translation of the instruction.
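
As a rough illustration of that flow (prefixes consumed up front, then a
table lookup on the opcode, with a fallback for anything not yet
converted), here is a minimal self-contained sketch.  Every name in it
(ToyDecode, one_byte_table, legacy_translate, the PFX_* bits) is
hypothetical and much simpler than the real structures in
decode-new.c.inc; it only shows the shape of the dispatch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified state; not QEMU's DisasContext. */
enum { PFX_DATA = 1, PFX_REPZ = 2, PFX_LOCK = 4 };
enum ToyPath { PATH_NONE, PATH_NEW, PATH_LEGACY };

typedef struct {
    unsigned prefixes;   /* bitmask of PFX_* seen before the opcode */
    enum ToyPath path;   /* which translator handled the instruction */
    uint8_t opcode;      /* opcode byte that was dispatched */
} ToyDecode;

typedef void (*ToyEmit)(ToyDecode *d);

static void emit_nop(ToyDecode *d) { d->path = PATH_NEW; }
static void emit_hlt(ToyDecode *d) { d->path = PATH_NEW; }

/* One entry per converted opcode; unlisted slots stay NULL. */
static const ToyEmit one_byte_table[256] = {
    [0x90] = emit_nop,
    [0xF4] = emit_hlt,
};

/* Stand-in for the old switch-based translator. */
static void legacy_translate(ToyDecode *d) { d->path = PATH_LEGACY; }

/* Returns the number of bytes consumed, or 0 if no opcode was found. */
static size_t toy_disas(ToyDecode *d, const uint8_t *code, size_t len)
{
    size_t pos = 0;

    d->prefixes = 0;
    d->path = PATH_NONE;
    d->opcode = 0;

    /* All prefix decoding happens here, before any dispatch. */
    for (; pos < len; pos++) {
        switch (code[pos]) {
        case 0x66: d->prefixes |= PFX_DATA; continue;
        case 0xF3: d->prefixes |= PFX_REPZ; continue;
        case 0xF0: d->prefixes |= PFX_LOCK; continue;
        default: break;
        }
        break;   /* first non-prefix byte is the opcode */
    }
    if (pos == len) {
        return 0;
    }

    d->opcode = code[pos++];
    if (one_byte_table[d->opcode]) {
        one_byte_table[d->opcode](d);
    } else {
        legacy_translate(d);   /* not converted yet: defer */
    }
    return pos;
}
```

Feeding it the bytes 66 90 takes the table path with PFX_DATA recorded,
while an opcode with no table entry ends up in legacy_translate() with
the prefixes already decoded.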

Quite surprisingly, even without removing this duplicate code the
series removes more lines than it adds, even though the table-driven
translator is theoretically more verbose (1 line per entry in the tables
plus all the function declarations for group decoders and emitters).
This shows how much operand decoding is spread all over the place in
translate.c.

The main change from v1 is a set of new changes to flag computation that
avoid using s->tmp0 and s->tmp4.  Of the changes to code generation,
it's probably worth pointing out that 32-bit IMUL is now different for
32-bit and 64-bit hosts, and that gen_shift_dynamic_flags() computes
flags more efficiently via cc_op_live[].
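
To make the IMUL point concrete, the sketch below shows in plain C two
ways of getting the 32-bit result and the overflow condition (CF/OF are
set when the signed product does not fit in 32 bits).  The "wide"
variant mirrors the idea of a single 64-bit multiply; the "halves"
variant assumes the other path works on separate low and high words,
which is a guess about the 32-bit host case rather than something the
series states.  ToyImul32 and both function names are made up for the
example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t result;   /* low 32 bits of the product */
    bool overflow;     /* true if the signed product needs more than 32 bits */
} ToyImul32;

/* One widening multiply, as a 64-bit host can do cheaply. */
static ToyImul32 imul32_wide(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;
    ToyImul32 r;

    r.result = (uint32_t)prod;
    /*
     * Overflow iff sign-extending the low half does not give back the
     * full product.  The uint32_t -> int32_t conversion relies on the
     * usual two's complement behaviour of mainstream compilers.
     */
    r.overflow = prod != (int64_t)(int32_t)r.result;
    return r;
}

/* Same check expressed on two 32-bit halves (assumed 32-bit host shape). */
static ToyImul32 imul32_halves(int32_t a, int32_t b)
{
    /* Stand-in for a double-word signed multiply producing lo and hi. */
    int64_t prod = (int64_t)a * (int64_t)b;
    uint32_t lo = (uint32_t)prod;
    uint32_t hi = (uint32_t)((uint64_t)prod >> 32);
    uint32_t sign = (lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    ToyImul32 r;

    r.result = lo;
    /* No overflow iff the high half is just the sign of the low half. */
    r.overflow = hi != sign;
    return r;
}
```

Both variants agree on every input; the difference is only in which
host operations they map to.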

I have a few more cleanup patches that drop ~150 lines of now
redundant translate.c code, but I'll send them separately.

Paolo

v1->v2:
- replaced "target/i386: do not use s->tmp0 and s->tmp4 to compute flags"
  with patches 4..10
- fixed compilation by moving I_unsigned to patch 15 ("target/i386:
  move 60-BF opcodes to new decoder")
- changed 3-operand IMUL to use sextT0
- fixed OUTS decoding
- changed ARPL to use the umax TCG op
- use 64-bit multiply for imull on 64-bit hosts
- use sari instead of negsetcondi(LT)
- make INS code more similar to IN
- fix compilation after "target/i386: generalize gen_movl_seg_T0"
- note that 0xD0-0xD3/6 is the undocumented SAL opcode
- move opcodes_grp3 and opcodes_grp4 inside group decoding functions
- rename decode_group4 to decode_group4_5
- put together all TCG_TARGET_* definitions in emit.c.inc
- as a result of the new flag patches, ensure that gen_op_jz_ecx()
  and gen_jmp_rel() are always preceded by gen_update_cc_op()
- do not generate temporaries until after all early return cases
- compute carry of rotation operations with extract instead of
  setcond(TSTNE)
- gen_shift_dynamic_flags() uses cc_op_live[] to generate movcond
  operations
- a few alignment changes to decoding tables and removals of TABs
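
For readers who want to convince themselves of three of the
substitutions listed above (sari for negsetcondi(LT), extract for
setcond(TSTNE) on the rotate carry, and umax for ARPL), here they are
as plain C equivalences.  This is a sketch of the arithmetic only, not
the code in emit.c.inc; all function names are invented for the
example, and the sari variant assumes the compiler implements >> on
negative values as an arithmetic shift.

```c
#include <stdint.h>

/* -(x < 0) and an arithmetic shift by 31 both yield 0 or -1. */
static int32_t sign_mask_negsetcond(int32_t x) { return -(int32_t)(x < 0); }
static int32_t sign_mask_sari(int32_t x) { return x >> 31; }

/* Bit-field extract: len bits of v starting at bit start (1 <= len <= 32). */
static uint32_t toy_extract32(uint32_t v, unsigned start, unsigned len)
{
    return (v >> start) & (UINT32_MAX >> (32 - len));
}

static uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* After ROL with a nonzero count, CF is bit 0 of the result. */
static uint32_t rol_carry_setcond(uint32_t res) { return (res & 1u) != 0; }
static uint32_t rol_carry_extract(uint32_t res) { return toy_extract32(res, 0, 1); }

/* After ROR with a nonzero count, CF is the top bit of the result. */
static uint32_t ror_carry_extract(uint32_t res) { return toy_extract32(res, 31, 1); }

/*
 * ARPL: raise the RPL (low two bits) of dest to that of src if it is
 * lower, and set ZF when dest changed.  Replacing the RPL of dest with
 * the RPL of src gives a larger value exactly when the RPL increases,
 * so an unsigned maximum picks the right result.
 */
static uint16_t arpl_umax(uint16_t dest, uint16_t src, int *zf)
{
    uint16_t cand = (uint16_t)((dest & ~3u) | (src & 3u));
    uint16_t res = cand > dest ? cand : dest;

    *zf = res != dest;
    return res;
}
```

For example, arpl_umax(0x0010, 0x0003, &zf) raises the selector to
0x0013 and sets zf, whereas a destination that already has the higher
RPL comes back unchanged with zf clear.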
  

Paolo Bonzini (25):
  target/i386: use TSTEQ/TSTNE to test low bits
  target/i386: use TSTEQ/TSTNE to check flags
  target/i386: remove mask from CCPrepare
  target/i386: cc_op is not dynamic in gen_jcc1
  target/i386: cleanup cc_op changes for REP/REPZ/REPNZ
  target/i386: pull cc_op update to callers of gen_jmp_rel{,_csize}
  target/i386: extend cc_* when using them to compute flags
  target/i386: do not use s->T0 and s->T1 as scratch registers for CCPrepare
  target/i386: clarify the "reg" argument of functions returning CCPrepare
  target/i386: cleanup *gen_eob*
  target/i386: reintroduce debugging mechanism
  target/i386: move 00-5F opcodes to new decoder
  target/i386: extract gen_far_call/jmp, reordering temporaries
  target/i386: allow instructions with more than one immediate
  target/i386: move 60-BF opcodes to new decoder
  target/i386: generalize gen_movl_seg_T0
  target/i386: move C0-FF opcodes to new decoder (except for x87)
  target/i386: merge and enlarge a few ranges for call to disas_insn_new
  target/i386: move remaining conditional operations to new decoder
  target/i386: move BSWAP to new decoder
  target/i386: port extensions of one-byte opcodes to new decoder
  target/i386: remove now-converted opcodes from old decoder
  target/i386: decode x87 instructions in a separate function
  target/i386: split legacy decoder into a separate function
  target/i386: remove duplicate prefix decoding

 target/i386/helper.h                        |   11 -
 target/i386/tcg/decode-new.h                |   23 +-
 target/i386/tcg/shift_helper_template.h.inc |  108 -
 target/i386/tcg/int_helper.c                |   34 -
 target/i386/tcg/translate.c                 | 3780 ++++---------------
 target/i386/tcg/decode-new.c.inc            |  605 ++-
 target/i386/tcg/emit.c.inc                  | 1588 +++++++-
 7 files changed, 2961 insertions(+), 3188 deletions(-)
 delete mode 100644 target/i386/tcg/shift_helper_template.h.inc

-- 
2.45.0