target/i386/tcg: implement APX

[RFC PATCH 00/18] target/i386/tcg: implement APX

Paolo Bonzini posted 18 patches 16 hours ago

Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20260301144218.458140-1-pbonzini@redhat.com

Maintainers: Warner Losh <imp@bsdimp.com>, Kyle Evans <kevans@freebsd.org>, Laurent Vivier <laurent@vivier.eu>, Pierrick Bouvier <pierrick.bouvier@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Zhao Liu <zhao1.liu@intel.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>

configs/targets/x86_64-bsd-user.mak      |   2 +-
configs/targets/x86_64-linux-user.mak    |   2 +-
target/i386/cpu.h                        |   8 +
target/i386/helper.h                     |   1 +
target/i386/tcg/decode-new.h             |  20 +
target/i386/tcg/tcg-cpu.h                |  16 +-
target/i386/tcg/cc_helper_template.h.inc |  11 +
target/i386/cpu.c                        |  15 +-
target/i386/helper.c                     |  11 +
target/i386/tcg/cc_helper.c              |  10 +
target/i386/tcg/excp_helper.c            |   5 +
target/i386/tcg/fpu_helper.c             |  59 +-
target/i386/tcg/tcg-cpu.c                |   5 +-
target/i386/tcg/translate.c              | 106 ++-
target/i386/tcg/decode-new.c.inc         | 937 ++++++++++++++++++-----
target/i386/tcg/emit.c.inc               | 243 +++++-
16 files changed, 1226 insertions(+), 225 deletions(-)

This series implements APX support.  This is little more than some
Christmas-time hacking, and so far I have tested it only with user-mode
emulation.  The main reason to do this is actually to have some initial
infrastructure for EVEX, without requiring all the complexity of AVX512.
It also forces some changes that (hopefully) make QEMU's decoder align
a bit more with what Intel processors actually do.  These (patches 1, 4,
8, 9 in particular) could be extracted and committed separately, though
at this point this would be something for 11.1 anyway.

There are relatively few *new* instructions (CCMP/CTEST, CFCMOV,
PUSH2/POP2), all of them trivial except for CCMP/CTEST. Therefore, most
of the new code is for decoding.  The new data destination feature comes
almost for free thanks to the existing support for BMI instructions, and
variants such as no flags update and zero-upper are quite easy as well.
For CCMP/CTEST, I tried to make them reasonably efficient, also thanks
to the changes to AF/CF/OF computation that came over the past year,
but that takes quite a few lines of code.
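
For readers who have not looked at CCMP yet, here is a minimal sketch of
the architectural behaviour as I understand it: if the source condition
holds, the instruction behaves like CMP; otherwise it loads OF/SF/ZF/CF
from an immediate "default flag value".  This is a toy model, not the
code in this series.  The names Flags, cmp64_flags and ccmp64 are made
up for illustration, the condition is passed in as an already-evaluated
boolean, and AF/PF are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the four flags that a default flag value (DFV) can carry. */
typedef struct {
    bool cf, zf, sf, of;
} Flags;

/* Flags produced by an ordinary 64-bit CMP a, b (which computes a - b). */
static Flags cmp64_flags(uint64_t a, uint64_t b)
{
    uint64_t res = a - b;
    Flags f = {
        .cf = a < b,                          /* unsigned borrow */
        .zf = res == 0,
        .sf = res >> 63,                      /* sign bit of the result */
        .of = ((a ^ b) & (a ^ res)) >> 63,    /* signed overflow */
    };
    return f;
}

/*
 * Hypothetical model of CCMP: scc_holds is the source condition already
 * evaluated against the current flags.  If it holds, compare; if not,
 * the flags come from the DFV and the operands are ignored.
 */
static Flags ccmp64(bool scc_holds, uint64_t a, uint64_t b, Flags dfv)
{
    return scc_holds ? cmp64_flags(a, b) : dfv;
}
```

With a DFV that has only ZF set, ccmp64(false, 1, 2, dfv) leaves ZF set
and CF clear even though 1 != 2, which is what lets a compiler chain
several comparisons without branches.  CTEST is the same idea with TEST
(a & b) in place of CMP.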

Don't expect any performance gains.  APX binaries do produce about 1%
fewer TCG ops, but they map to about 1% *more* assembly instructions,
at least for x86-on-x86: that's because while the optimizer could
already produce roughly the same ops as NDD or NF instructions, the new
PUSH2/POP2 instructions include a stack alignment check that isn't there
in non-APX code.  I don't think it's worth wasting a precious HF_ bit
on it, at least not for the next 10 years or so.
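
To make the extra cost concrete, here is a rough sketch of the
difference between two plain pushes and one PUSH2.  It is a toy model
over a small byte array, not the TCG code from the series: ToyStack,
push1 and push2 are hypothetical names, a false return stands in for
the fault, and the order in which the two values land on the stack is
an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t mem[256];   /* toy stack memory, addresses 0..255 */
    uint64_t rsp;       /* toy stack pointer, grows downwards */
} ToyStack;

/* Plain PUSH: no alignment requirement, just store 8 bytes. */
static void push1(ToyStack *s, uint64_t val)
{
    assert(s->rsp >= 8 && s->rsp <= sizeof(s->mem));
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &val, sizeof(val));
}

/*
 * PUSH2-like operation: refuses to run (returns false, standing in for
 * an exception) unless the stack pointer is 16-byte aligned.  This test
 * is the extra work that two separate pushes do not need.
 */
static bool push2(ToyStack *s, uint64_t first, uint64_t second)
{
    if (s->rsp & 15) {
        return false;
    }
    push1(s, first);
    push1(s, second);
    return true;
}
```

Starting from rsp = 64, push2() succeeds and leaves rsp at 48; after a
single push1() the pointer is 40, and a second push2() is rejected
without touching the stack.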

Other than testing system emulation, the main decision to make is
whether to treat VEX and EVEX maps as an extension of the regular maps,
or just bite the bullet, copy them over to a new array and define them
from scratch.  There are some annoying differences already in APX
(accepted prefixes and opcodes moved to a different spot) and there
would be even more with AVX512; for example, mask register instructions
use opcodes in VEX map 1 that overlap with two-byte 0F opcodes.

I don't think I will have much time to work on this for a few months,
since I did it just out of my own interest, but I thought I'd throw it
out there for review.

Paolo

Paolo Bonzini (18):
  target/i386/tcg: move check bits out of validate_vex
  target/i386/tcg: add APX support to XSAVE/XRSTOR
  target/i386/tcg: treat VEX as disabling high-byte registers
  target/i386/tcg: add definition for REX2 prefix
  target/i386/tcg: mark XSAVE* as not allowing REX2
  target/i386/tcg: decode REX2 prefix
  target/i386/tcg: implement JMPABS instruction
  target/i386/tcg: fetch modrm early
  target/i386/tcg: move VEX validation early
  target/i386/tcg: extend VEX.vvvv parsing for APX
  target/i386/tcg: decode EVEX prefix
  target/i386/tcg: add ZU writeback
  target/i386/tcg: add decode functionality for APX
  target/i386/tcg: implement CCMP/CTEST
  target/i386/tcg: undo IMUL memory load optimization
  target/i386/tcg: decode APX instructions
  target/i386/tcg: mark APX as supported
  target/i386/tcg: optimize CCMP

-- 
2.52.0