This series implements APX support. This is little more than some
Christmas time hacking, and so far I have tested it only in user-mode
emulation. The main reason to do this is actually to have some initial
infrastructure for EVEX, without requiring all the complexity of AVX512.
It also forces some changes that (hopefully) make QEMU's decoder align
a bit more with what Intel processors actually do. These (patches 1, 4,
8, 9 in particular) could be extracted and committed separately, though
at this point this would be something for 11.1 anyway.

There are relatively few *new* instructions (CCMP/CTEST, CFCMOV,
PUSH2/POP2), all of them trivial except for CCMP/CTEST. Therefore, most
of the new code is for decoding. The new data destination feature comes
almost for free thanks to the existing support for BMI instructions, and
variants such as no flags update and zero-upper are quite easy as well.
For CCMP/CTEST, I tried to make them reasonably efficient, also thanks
to the changes to AF/CF/OF computation that came over the past year,
but that takes quite a few lines of code.
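For readers unfamiliar with CCMP, here is a minimal sketch of the architectural behavior as I understand it from the APX specification; it is not the TCG implementation in this series. If the source condition holds, the instruction behaves like CMP; otherwise the flags are loaded from the default flags value (DFV) encoded in the instruction. The `Flags` type and the function names are hypothetical, and only the four flags carried by DFV (OF, SF, ZF, CF) are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag container: only the four flags carried by DFV. */
typedef struct {
    bool of, sf, zf, cf;
} Flags;

/* Flags that a 64-bit CMP a, b (that is, a - b) would produce. */
static Flags cmp64_flags(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    Flags f;

    f.cf = a < b;                            /* unsigned borrow */
    f.zf = r == 0;
    f.sf = (r >> 63) & 1;
    f.of = (((a ^ b) & (a ^ r)) >> 63) & 1;  /* signed overflow */
    return f;
}

/*
 * CCMP sketch: compare only if the source condition code evaluated
 * to true, otherwise return the default flags value unchanged.
 */
static Flags ccmp64(bool scc_true, uint64_t a, uint64_t b, Flags dfv)
{
    return scc_true ? cmp64_flags(a, b) : dfv;
}
```

CTEST would follow the same pattern with the flags of `a & b` in place of the subtraction.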
Don't expect any performance gains. APX binaries do produce about 1%
fewer TCG ops, but they map to about 1% *more* assembly instructions,
at least for x86-on-x86: that's because while the optimizer could
already produce roughly the same ops as NDD or NF instructions, the new
PUSH2/POP2 instructions include a stack alignment check that isn't there
in non-APX code. I don't think it's worth wasting a precious HF_ bit
for it, at least not for 10 years or so.
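To illustrate where the extra work comes from, here is a rough model of PUSH2 with its alignment check. This is a sketch under assumptions, not code from the series: the `Machine` type and the `push2` name are made up, a `false` return stands in for the fault, and the order in which the two values are stored is my reading of the specification.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a small flat stack indexed by the stack pointer. */
typedef struct {
    uint8_t mem[256];
    uint64_t rsp;
} Machine;

/*
 * PUSH2 sketch: push two 64-bit values as one 16-byte access.
 * Returns false (standing in for a fault) if the access would not
 * be 16-byte aligned or would fall outside the model's stack.
 */
static bool push2(Machine *m, uint64_t first, uint64_t second)
{
    if (m->rsp < 16 || m->rsp > sizeof(m->mem)) {
        return false;               /* outside the toy stack */
    }

    uint64_t new_rsp = m->rsp - 16;
    if (new_rsp & 15) {
        return false;               /* the check a PUSH; PUSH pair lacks */
    }

    memcpy(&m->mem[new_rsp + 8], &first, 8);   /* assumed: pushed first */
    memcpy(&m->mem[new_rsp], &second, 8);
    m->rsp = new_rsp;
    return true;
}
```

Two plain PUSH instructions perform the same stores without the `new_rsp & 15` test, which is where the extra host instructions would come from.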
Other than testing system emulation, the main decision to take is
whether to treat VEX and EVEX maps as an extension of the regular maps,
or just bite the bullet, copy them over to a new array and define them
from scratch. There are some annoying differences already in APX
(accepted prefixes and opcodes moved to a different spot) and there
would be even more with AVX512; for example, mask register instructions
use opcodes in VEX map 1 that overlap with two-byte 0F opcodes.

I don't think I will have much time to work on this for a few months,
since I did it just out of my own interest, but I thought I'd throw it
out there for review.

Paolo

Paolo Bonzini (18):
  target/i386/tcg: move check bits out of validate_vex
  target/i386/tcg: add APX support to XSAVE/XRSTOR
  target/i386/tcg: treat VEX as disabling high-byte registers
  target/i386/tcg: add definition for REX2 prefix
  target/i386/tcg: mark XSAVE* as not allowing REX2
  target/i386/tcg: decode REX2 prefix
  target/i386/tcg: implement JMPABS instruction
  target/i386/tcg: fetch modrm early
  target/i386/tcg: move VEX validation early
  target/i386/tcg: extend VEX.vvvv parsing for APX
  target/i386/tcg: decode EVEX prefix
  target/i386/tcg: add ZU writeback
  target/i386/tcg: add decode functionality for APX
  target/i386/tcg: implement CCMP/CTEST
  target/i386/tcg: undo IMUL memory load optimization
  target/i386/tcg: decode APX instructions
  target/i386/tcg: mark APX as supported
  target/i386/tcg: optimize CCMP

 configs/targets/x86_64-bsd-user.mak      |   2 +-
 configs/targets/x86_64-linux-user.mak    |   2 +-
 target/i386/cpu.h                        |   8 +
 target/i386/helper.h                     |   1 +
 target/i386/tcg/decode-new.h             |  20 +
 target/i386/tcg/tcg-cpu.h                |  16 +-
 target/i386/tcg/cc_helper_template.h.inc |  11 +
 target/i386/cpu.c                        |  15 +-
 target/i386/helper.c                     |  11 +
 target/i386/tcg/cc_helper.c              |  10 +
 target/i386/tcg/excp_helper.c            |   5 +
 target/i386/tcg/fpu_helper.c             |  59 +-
 target/i386/tcg/tcg-cpu.c                |   5 +-
 target/i386/tcg/translate.c              | 106 ++-
 target/i386/tcg/decode-new.c.inc         | 937 ++++++++++++++++++-----
 target/i386/tcg/emit.c.inc               | 243 +++++-
 16 files changed, 1226 insertions(+), 225 deletions(-)

--
2.52.0