[v2] target/i386: new decoder + AVX implementation

[PATCH v2 00/37] target/i386: new decoder + AVX implementation
Posted by Paolo Bonzini 3 years, 4 months ago
This is now mostly ready and has been tested quite heavily, but I expect
to repost a final version once the PC-relative code generation patches
are in.  I also plan to do more testing in the meantime, which might
well find other bugs.  I have not yet looked into XSAVE/XRSTOR support
for the usermode emulation sigcontext; this feature is already missing,
but it becomes more important with AVX.

Compared to the previous RFC there are a bunch of bugfixes, mostly for
big-endian systems but also for system emulation (XSAVE/XRSTOR, without
which OSes cannot enable AVX even though usermode emulation can cheat).
They are detailed below.  Code generation changes mostly cover what
was pointed out in the review, but also reuse the new functionality
introduced to fix bugs.  3DNow has been converted to the new decoder.
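
To illustrate why system emulation needs XSAVE/XRSTOR before a guest OS
can turn on AVX: architecturally, AVX instructions are usable only when
CR4.OSXSAVE is set and XCR0 has both the SSE and the AVX state bits set.
The following is a minimal sketch of that check.  The helper name
avx_state_enabled and the XCR0_* macro names are made up for this
example and are not taken from the series; only the bit positions are
architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural XCR0 bit positions; the macro names are illustrative. */
#define XCR0_X87 (UINT64_C(1) << 0)
#define XCR0_SSE (UINT64_C(1) << 1)
#define XCR0_AVX (UINT64_C(1) << 2)

/*
 * Hypothetical helper (not a QEMU function): returns true if the OS has
 * enabled AVX state, i.e. CR4.OSXSAVE is set and XCR0 has both the SSE
 * and AVX bits set.
 */
static bool avx_state_enabled(bool cr4_osxsave, uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE | XCR0_AVX;

    return cr4_osxsave && (xcr0 & need) == need;
}
```

With XCR0 = 0x7 and OSXSAVE set the helper returns true; with XCR0 = 0x3
(x87 and SSE only) it returns false.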


The series is at the i386 branch of https://gitlab.com/bonzini/qemu,
up to commit 94743924ea14103e348eb4ca533945213fa4018a.

The final patch, removing the old SSE decoder, seems to be too big
for the mailing list, so I removed the big hunk in the middle that
just deletes gen_sse and the tables above it.

Paolo


Bugfixes from v1:
* enter MMX for PSHUFW
* categorized MOVNTSS, MOVNTSD as SSE4A
* categorized CVTPI2PS, CVTPI2PD, CVTPS2PI, CVTPD2PI, CVTTPS2PI, CVTTPD2PI
  as non-VEX
* fixed length of argument of CVTPS2PI and CVTTPS2PI
* fixed X86_SPECIAL_AVXExtMov which reversed MO_128/MO_256
* tested SSE4A and AES
* finished implementation of 256-bit AES instructions
* removed some unnecessary/wrong X86_SPECIAL_MMX annotations
* fixed signedness of 0F3Ah immediates
* fixed big-endian support in patch 2 (old decoder)
* fixed big-endian support in MOVLPx, MOVHPx, MOVLHPS, MOVSD, MOVSS, PMOVMSKB,
  VEXTRACTx128, VGATHER (new decoder)
* tested system emulation, which actually covers XSAVE/XRSTOR
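
As an illustration of the immediate signedness item above: an 8-bit
immediate with its top bit set yields different values depending on
whether it is sign- or zero-extended, which matters when the immediate
is a bit mask or selector.  This is only a sketch; the helper names are
hypothetical, and the cover letter does not say which direction the
original bug went.

```c
#include <stdint.h>

/* Hypothetical helper: interpret an imm8 byte as a signed value. */
static int32_t imm8_signed(uint8_t raw)
{
    /* Explicit arithmetic avoids an implementation-defined narrowing cast. */
    return raw >= 0x80 ? (int32_t)raw - 256 : (int32_t)raw;
}

/* Hypothetical helper: interpret an imm8 byte as an unsigned value. */
static int32_t imm8_unsigned(uint8_t raw)
{
    return (int32_t)raw;
}
```

For the byte 0x80, imm8_signed gives -128 while imm8_unsigned gives 128;
for bytes below 0x80 the two agree.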

Other code generation changes from v1:
* more operations (addus, adds, subus, subs, minu, mins, mullw, mulld,
  broadcast, abs) moved to gvec
* pointer temps for helpers are generated lazily
* implemented alignment restrictions for SSE instructions
* PMOVMSKB now uses extract2 or deposit
* looked into using the maxsz > oprsz feature, but it does not work on
  big-endian hosts
* changed tcg_const to tcg_constant
* fixed register changes before loads; unaligned loads always go through
  a temporary for the same reason
* reimplemented VZEROALL using gen_helper_memset
* reimplemented VZEROUPPER using gvec moves
* introduced new function vector_elem_offset, mostly for big-endian but it has
  a few other uses
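
To show the kind of computation that an element-offset function has to
do on big-endian hosts, here is a minimal sketch.  It assumes (this is
an assumption, not something stated in this cover letter) that a vector
register is stored as an array of host-endian 64-bit words, with word 0
holding guest bits 0..63.  The name sketch_elem_offset and its signature
are made up for the example; the real vector_elem_offset in the series
may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: byte offset, within the register storage, of
 * guest element n whose size is esz bytes (1, 2, 4 or 8).
 *
 * On a little-endian host the offset is simply n * esz.  On a
 * big-endian host the bytes inside each 64-bit word are reversed, so
 * the position within the 8-byte word is flipped with an XOR.
 */
static size_t sketch_elem_offset(unsigned n, unsigned esz,
                                 bool host_big_endian)
{
    assert(esz == 1 || esz == 2 || esz == 4 || esz == 8);

    size_t ofs = (size_t)n * esz;

    if (host_big_endian) {
        ofs ^= 8 - esz;
    }
    return ofs;
}
```

Under these assumptions, byte element 0 lives at offset 0 on a
little-endian host and at offset 7 on a big-endian host, while 64-bit
elements are at the same offset on both.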

Testing changes from v1:
* added more AES and VAES testcases

Decoding changes from v1:
* removed #define of gen_V* to gen_P*
* split group 12/13/14 decoding
* converted 3DNow to new decoder
* used decode_by_prefix where applicable
* interpret prefixes at decode time for 0F5B, 0F77, 0F78, 0F79, 0F7E, 0FE6
* cleaned up 0F6F, splitting 0F7F out of it
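
As an illustration of interpreting prefixes at decode time, the sketch
below picks one of four table entries for an opcode based on the
mandatory prefix (none, 66, F3, F2), using 0F5B as the example.  The
function select_by_prefix, the PFX_* flags and the string table are
hypothetical and are not the series' decode_by_prefix; the precedence
of F2/F3 over 66 is also an assumption of this sketch.

```c
#include <stddef.h>

/* Illustrative prefix flags; not QEMU's PREFIX_* constants. */
enum {
    PFX_DATA  = 1 << 0,   /* 66 */
    PFX_REPZ  = 1 << 1,   /* F3 */
    PFX_REPNZ = 1 << 2,   /* F2 */
};

/* Entries for 0F 5B, indexed as: none, 66, F3, F2 (F2 is undefined). */
static const char *const op_0f5b[4] = {
    "CVTDQ2PS", "CVTPS2DQ", "CVTTPS2DQ", NULL,
};

/* Hypothetical selector: choose the entry matching the prefix seen. */
static const char *select_by_prefix(const char *const entries[4],
                                    unsigned prefixes)
{
    if (prefixes & PFX_REPNZ) {
        return entries[3];
    }
    if (prefixes & PFX_REPZ) {
        return entries[2];
    }
    if (prefixes & PFX_DATA) {
        return entries[1];
    }
    return entries[0];
}
```

With no prefix this selects CVTDQ2PS, with 66 it selects CVTPS2DQ, and
with F3 it selects CVTTPS2DQ; a NULL result stands for an undefined
encoding.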

Other cleanups from v1:
* added remark on VEX.256 being available for MOVLPx
* changed disas_insn_new to return void
* moved switch labels out of if statements
* changed abort() to g_assert_not_reached()
* left out "default: abort()" altogether when applicable
* fixed spacing around vgather helpers
* removed some (most) inline markers, compiled with clang
* added const markers to all X86OpEntry arrays
* squashed move of scalar VEX operations into a single patch
* fixed checkpatch complaints (outside the table)
* improved some commit messages

Paolo Bonzini (32):
  target/i386: make ldo/sto operations consistent with ldq
  target/i386: REPZ and REPNZ are mutually exclusive
  target/i386: introduce insn_get_addr
  target/i386: add core of new i386 decoder
  target/i386: add ALU load/writeback core
  target/i386: add CPUID[EAX=7,ECX=0].ECX to DisasContext
  target/i386: add CPUID feature checks to new decoder
  target/i386: validate VEX prefixes via the instructions' exception
    classes
  target/i386: validate SSE prefixes directly in the decoding table
  target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder
  target/i386: extend helpers to support VEX.V 3- and 4- operand
    encodings
  target/i386: support operand merging in binary scalar helpers
  target/i386: provide 3-operand versions of unary scalar helpers
  target/i386: implement additional AVX comparison operators
  target/i386: Introduce 256-bit vector helpers
  target/i386: reimplement 0x0f 0x60-0x6f, add AVX
  target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX
  target/i386: reimplement 0x0f 0x50-0x5f, add AVX
  target/i386: reimplement 0x0f 0x78-0x7f, add AVX
  target/i386: reimplement 0x0f 0x70-0x77, add AVX
  target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX
  target/i386: clarify (un)signedness of immediates from 0F3Ah opcodes
  target/i386: reimplement 0x0f 0x3a, add AVX
  target/i386: reimplement 0x0f 0x38, add AVX
  target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX
  target/i386: reimplement 0x0f 0x10-0x17, add AVX
  target/i386: reimplement 0x0f 0x28-0x2f, add AVX
  target/i386: implement XSAVE and XRSTOR of AVX registers
  target/i386: implement VLDMXCSR/VSTMXCSR
  tests/tcg: extend SSE tests to AVX
  target/i386: move 3DNow to the new decoder
  target/i386: remove old SSE decoder

Paul Brook (3):
  target/i386: add AVX_EN hflag
  target/i386: Prepare ops_sse_header.h for 256 bit AVX
  target/i386: Enable AVX cpuid bits when using TCG

Richard Henderson (2):
  target/i386: Define XMMReg and access macros, align ZMM registers
  target/i386: Use tcg gvec ops for pmovmskb

 target/i386/cpu.c                |   10 +-
 target/i386/cpu.h                |   59 +-
 target/i386/helper.c             |   12 +
 target/i386/helper.h             |    2 +
 target/i386/ops_sse.h            |  700 ++++++----
 target/i386/ops_sse_header.h     |  347 +++--
 target/i386/tcg/decode-new.c.inc | 1791 ++++++++++++++++++++++++
 target/i386/tcg/decode-new.h     |  249 ++++
 target/i386/tcg/emit.c.inc       | 2234 ++++++++++++++++++++++++++++++
 target/i386/tcg/fpu_helper.c     |   82 +-
 target/i386/tcg/translate.c      | 2117 ++--------------------------
 tests/tcg/i386/Makefile.target   |    2 +-
 tests/tcg/i386/test-avx.c        |  201 +--
 tests/tcg/i386/test-avx.py       |    5 +-
 14 files changed, 5298 insertions(+), 2513 deletions(-)
 create mode 100644 target/i386/tcg/decode-new.c.inc
 create mode 100644 target/i386/tcg/decode-new.h
 create mode 100644 target/i386/tcg/emit.c.inc

-- 
2.37.2