(The beginning of) a new i386 decoder

[RFC PATCH 00/17] (The beginning of) a new i386 decoder

Posted by Paolo Bonzini 3 years, 5 months ago

While looking again at Paul's patches for AVX, I came to the conclusion
that the x86 decoder is unsalvageable.  The encoding of x86 is simply too
messy for it to be decoded in code; huge tables, derived as much as possible
from the architecture reference, are the real way to go.

So here is a new, albeit partial, decoder that is based on three principles:

- use mostly table-driven decoding, using tables derived as much as possible
  from the Intel manual, keeping the code as "non-branchy" as possible

- keep address generation and (for ALU operands) memory loads and write back
  as much in common code as possible, to avoid code duplication (this
  is less relevant to non-ALU instructions because read-modify-write
  operations are rare)

- make minimal changes to the old decoder while allowing incremental
  replacement of the old decoder with the new one
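
As a rough illustration of the first principle, here is a minimal sketch of
a table-driven lookup for a few one-byte opcodes.  It is not code from the
series: the demo_* and Demo* names are hypothetical, and the operand kinds
are only loosely modelled on the Intel manual's operand codes (Eb, Gv, Ib,
and so on).  An entry without a mnemonic stands for "not converted yet, use
the old decoder".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand kinds, named after the Intel manual's operand codes. */
typedef enum {
    OP_NONE, OP_Eb, OP_Gb, OP_Ev, OP_Gv, OP_AL, OP_Ib
} DemoOperand;

/* One table row per opcode byte; everything the common code needs is data. */
typedef struct {
    const char *mnemonic;   /* NULL: opcode not handled by this table */
    DemoOperand dst;
    DemoOperand src;
} DemoOpcodeEntry;

/* Entries not listed here are zero-initialized, so mnemonic is NULL. */
static const DemoOpcodeEntry demo_table[256] = {
    [0x00] = { "add", OP_Eb, OP_Gb },
    [0x01] = { "add", OP_Ev, OP_Gv },
    [0x02] = { "add", OP_Gb, OP_Eb },
    [0x03] = { "add", OP_Gv, OP_Ev },
    [0x04] = { "add", OP_AL, OP_Ib },
    [0x28] = { "sub", OP_Eb, OP_Gb },
    [0x38] = { "cmp", OP_Eb, OP_Gb },
};

/* Return the table entry for an opcode, or NULL to fall back elsewhere. */
static const DemoOpcodeEntry *demo_lookup(uint8_t opcode)
{
    const DemoOpcodeEntry *e = &demo_table[opcode];
    return e->mnemonic ? e : NULL;
}
```

With this shape, operand loading and writeback can be driven by the dst/src
fields in one place, and a NULL result is the natural hook for dropping back
to the old decoder during an incremental conversion.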

So this series introduces the main decoder flow, integrates it with the
old decoder (which takes care of parsing prefixes and then optionally
drops to the new one based on the first byte of the opcode), and
implements three quarters of the one byte opcodes.

It is only lightly tested but it can boot to iPXE and run some 64-bit
coreutils just fine; Linux seems to trigger a bug in outsw/l/q emulation
that I haven't checked yet, but still it's enough to show the result of
a couple days of hacking.

The generated code is mostly the same, though marginally worse in some
cases because I favored code simplicity.  For example, MOVSXD is not
able to use MO_SL and falls back to MO_UL + sign extension.  One notable
difference is that the new decoder always sign-extends 8-bit immediates,
so for example a "cmpb $0xe9, %dl" instruction will subtract $0xfff...fffe9
from the temporary value.  This is the way Intel intended "Ib" immediates
to work, and the 8-bit result is the same either way.
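
To make the point about "Ib" immediates concrete, here is a small sketch
(again not code from the series; all demo_* names are made up).  It models
the temporary as a 64-bit value and checks that an 8-bit subtraction gives
the same low byte whether the immediate was zero-extended or sign-extended.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sign-extend an 8-bit immediate to 64 bits.  The uint8_t -> int8_t
 * conversion is implementation-defined before C23; this assumes the usual
 * two's complement wraparound.
 */
static uint64_t demo_sext8(uint8_t ib)
{
    return (uint64_t)(int64_t)(int8_t)ib;
}

/* 8-bit subtraction on a wide temporary: only the low byte is kept. */
static uint8_t demo_sub8(uint64_t dest, uint64_t imm)
{
    return (uint8_t)(dest - imm);
}

/* True if zero- and sign-extended immediates give the same 8-bit result. */
static bool demo_same_result(uint8_t dl, uint8_t ib)
{
    uint64_t zext = ib;               /* e.g. 0x00000000000000e9 */
    uint64_t sext = demo_sext8(ib);   /* e.g. 0xffffffffffffffe9 */
    return demo_sub8(dl, zext) == demo_sub8(dl, sext);
}
```

The two immediates differ only in bits 8 and above, so any result that is
truncated to 8 bits cannot tell them apart.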

Anyway, porting these opcodes is really more of a validation for the
whole concept and a test for the common decoder code; it's probably more
efficient to focus on the SSE and VEX 2-byte and 3-byte opcodes as a path
towards enabling AVX in QEMU, and keep the existing decoder for non-VEX,
non-SSE opcodes.  Getting the conditions right for VEX.L, VEX.W etc. is
going to be, well, vexing because of the way Intel has decided to format
the exception tables in the manual, but it should be feasible to use a
more table-based decoding process for those operations as well.

The series is available at https://gitlab.com/bonzini/qemu.git, branch i386.

Paolo

Paolo Bonzini (17):
  target/i386: extract old decoder to a separate file
  target/i386: introduce insn_get_addr
  target/i386: add core of new i386 decoder
  target/i386: add ALU load/writeback core
  target/i386: add 00-07, 10-17 opcodes
  target/i386: add 08-0F, 18-1F opcodes
  target/i386: add 20-27, 30-37 opcodes
  target/i386: add 28-2f, 38-3f opcodes
  target/i386: add 40-47, 50-57 opcodes
  target/i386: add 48-4f, 58-5f opcodes
  target/i386: add 60-67, 70-77 opcodes
  target/i386: add 68-6f, 78-7f opcodes
  target/i386: add 80-87, 90-97 opcodes
  target/i386: add a0-a7, b0-b7 opcodes
  target/i386: do not clobber A0 in POP translation
  target/i386: add 88-8f, 98-9f opcodes
  target/i386: add a8-af, b8-bf opcodes

 target/i386/tcg/decode-new.c.inc | 1254 +++++++
 target/i386/tcg/decode-old.c.inc | 5707 +++++++++++++++++++++++++++++
 target/i386/tcg/emit.c.inc       |  684 ++++
 target/i386/tcg/translate.c      | 5822 +-----------------------------
 4 files changed, 7740 insertions(+), 5727 deletions(-)
 create mode 100644 target/i386/tcg/decode-new.c.inc
 create mode 100644 target/i386/tcg/decode-old.c.inc
 create mode 100644 target/i386/tcg/emit.c.inc

-- 
2.37.1

Re: [RFC PATCH 00/17] (The beginning of) a new i386 decoder

Posted by Richard Henderson 3 years, 5 months ago

On 8/24/22 10:31, Paolo Bonzini wrote:
> It is only lightly tested but it can boot to iPXE and run some 64-bit
> coreutils just fine; Linux seems to trigger a bug in outsw/l/q emulation
> that I haven't checked yet, but still it's enough to show the result of
> a couple days of hacking.

Excellent.

> The generated code is mostly the same, though marginally worse in some
> cases because I favored code simplicity.  For example, MOVSXD is not
> able to use MO_SL and falls back to MO_UL + sign extension.

I think this is ok.

We can improve things like this on a case-by-case basis.
For example, MOVSXD could gain a X86_SPECIAL_Signed flag,
to be passed on to gen_load().
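
A very rough model of that idea, for illustration only: the
DEMO_SPECIAL_* flags and demo_* functions below are hypothetical and are not
the real gen_load() interface.  The flag selects a sign-extending 32-bit
load (the MO_SL case) instead of a zero-extending load (the MO_UL case)
followed by a separate sign extension.

```c
#include <stdint.h>

/* Hypothetical per-instruction "special" flags carried in the table. */
enum { DEMO_SPECIAL_NONE = 0, DEMO_SPECIAL_SIGNED = 1 };

/*
 * Model of a 32-bit load into a 64-bit temporary.
 * The int32_t conversion assumes two's complement wraparound.
 */
static uint64_t demo_load32(uint32_t mem, int special)
{
    if (special == DEMO_SPECIAL_SIGNED) {
        return (uint64_t)(int64_t)(int32_t)mem;   /* sign-extending load */
    }
    return mem;                                   /* zero-extending load */
}

/* The separate sign-extension step needed after a zero-extending load. */
static uint64_t demo_sext32(uint64_t v)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)v;
}
```

Both paths produce the same value; the flag only saves the extra
sign-extension operation in the generated code.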


> One notable
> difference is that the new decoder always sign-extends 8-bit immediates,
> so for example a "cmpb $0xe9, %dl" instruction will subtract $0xfff...fffe9
> from the temporary value.  This is the way Intel intended "Ib" immediates
> to work, and the 8-bit result is the same either way.

That is in fact an improvement.

> Anyway, porting these opcodes is really more of a validation for the
> whole concept and a test for the common decoder code; it's probably more
> efficient to focus on the SSE and VEX 2-byte and 3-byte opcodes as a path
> towards enabling AVX in QEMU, and keep the existing decoder for non-VEX,
> non-SSE opcodes.

Eh... I disagree.  I would really hate to retain the existing decoder.
This is already so much better...

> The series is available at https://gitlab.com/bonzini/qemu.git, branch i386.

Thanks.


r~

Re: [RFC PATCH 00/17] (The beginning of) a new i386 decoder

Posted by Paolo Bonzini 3 years, 5 months ago

On 8/25/22 01:01, Richard Henderson wrote:
>> One notable
>> difference is that the new decoder always sign-extends 8-bit immediates,
>> so for example a "cmpb $0xe9, %dl" instruction will subtract $0xfff...fffe9
>> from the temporary value.  This is the way Intel intended "Ib" immediates
>> to work, and the 8-bit result is the same either way.
> 
> That is in fact an improvement.

Yes, it is, and it is a direct effect of encoding the operand types and
sizes in a table, instead of writing ad hoc code everywhere.

>> Anyway, porting these opcodes is really more of a validation for the
>> whole concept and a test for the common decoder code; it's probably more
>> efficient to focus on the SSE and VEX 2-byte and 3-byte opcodes as a path
>> towards enabling AVX in QEMU, and keep the existing decoder for non-VEX,
>> non-SSE opcodes.
> 
> Eh... I disagree.  I would really hate to retain the existing decoder.
> This is already so much better...

Absolutely, it's just a matter of programmer efficiency, and then 
SSE/AVX is where I would start.

I would hate to not get there just because I didn't have time to 
complete the last sixty-ish one-byte opcodes (which are also the ones 
that benefit the least from table-driven decoding; that's already 
visible in 90-9F).

This was just a heads up that if I complete this patchset I would 
probably ask to have it committed with "just" SSE/AVX (plus the BMI VEX 
instructions), in a similar spirit to how the Meson conversion only 
covered Makefiles.

Paolo