On 10/19/23 03:46, Paolo Bonzini wrote:
> This includes:
>
> - implementing SHA and CMPccXADD instruction extensions
>
> - introducing a new mechanism for flags writeback that avoids a
> tricky failure
>
> - converting the more orthogonal parts of the one-byte opcode
> map, as well as the CMOVcc and SETcc instructions.
>
> Tested by booting several 32-bit and 64-bit guests.
>
> The new decoder produces roughly 2% more ops, but after optimization there
> are just 0.5% more and almost all of them come from cmp instructions.
> For some reason that I have not investigated, these end up with an extra
> mov even after optimization:
>
>    sub_i64 tmp0,rax,$0x33        sub_i64 cc_dst,rax,$0x33
>    mov_i64 cc_src,$0x33          mov_i64 cc_src,$0x33
>    mov_i64 cc_dst,tmp0
>    discard cc_src2               discard cc_src2
>    discard cc_op                 discard cc_op
>
> It could be easily fixed by not reusing gen_SUB for cmp instructions,
> or by debugging what goes on in the optimizer. However, it does not
> result in larger assembly.
This is expected behaviour from the tcg optimizer: we don't forward-propagate
outputs at that point. But during register allocation of the "mov cc_dst,tmp0"
opcode, we will see that tmp0 is dead and re-assign the register from tmp0 to
cc_dst without emitting a host instruction.
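
A toy model of that coalescing step, for illustration only (this is a
simplified sketch, not QEMU's actual tcg/tcg.c allocator; the names
`allocate_mov`, `reg_of`, `src_dead` are all made up):

```python
# Toy register-allocator step illustrating mov coalescing:
# when the source temp of a mov dies at that mov, hand its host
# register over to the destination instead of emitting a host mov.
# Simplified sketch only -- not QEMU's real TCG register allocator.

def allocate_mov(reg_of, dst, src, src_dead, emitted):
    """Assign a host register to dst for 'mov dst, src'.

    reg_of:   dict mapping temp name -> host register
    src_dead: True if this mov is src's last use
    emitted:  list collecting host instructions actually emitted
    """
    if src_dead:
        # src is dead after this op: reassign its register to dst,
        # no host instruction needed.
        reg_of[dst] = reg_of.pop(src)
    else:
        # src stays live: dst gets its own register and a real mov.
        reg_of[dst] = "r1"  # pretend a free register was picked
        emitted.append(f"mov {reg_of[dst]}, {reg_of[src]}")
    return reg_of

# 'mov cc_dst, tmp0' where tmp0 is dead: nothing is emitted and
# cc_dst simply inherits tmp0's register.
emitted = []
regs = allocate_mov({"tmp0": "r0"}, "cc_dst", "tmp0", True, emitted)
```

So the extra mov in the op dump costs nothing in the generated host code,
which matches the observation that the assembly does not grow.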
r~