[Qemu-devel] [PATCH v4 00/11] TCG optimizations for 2.10

Emilio G. Cota posted 11 patches 6 years, 12 months ago
[Qemu-devel] [PATCH v4 00/11] TCG optimizations for 2.10
Posted by Emilio G. Cota 6 years, 12 months ago
v3 for context: https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04795.html

Changes from v3:

- Added reviewed-by tags.

- Added a couple of suggested-by tags that I forgot to add in v3
  regarding lookup_and_goto_ptr and i386's implementation of goto_ptr.

- lookup_tb_ptr
  + Dropped the unnecessary exit_request check, as suggested by Paolo and
    Richard.
  + Only get the CPU state if we get a tb from the jmp_cache, as suggested
    by Richard.
  + Added tb_htable_lookup if we miss in tb_jmp_cache, as suggested by
    Richard. This requires an extra patch to export tb_htable_lookup.

- goto_ptr: add IMPL(has_goto_ptr), as pointed out by Richard.

- target/arm: added a comment about gen_jr. See the v3 thread for why
  it is needed.

- target/i386: use TCGV_UNUSED instead of (ab)using NULL on a TCGv,
  as suggested by Richard. Also took his suggestion to simplify
  the addition of jr + cs_base.
  To minimize churn I renamed gen_eob_worker to do_gen_eob_worker,
  which takes the newly added argument.
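
For readers without the tree at hand, the lookup_tb_ptr flow described above
(fast per-CPU jump cache first, global table only on a miss) can be sketched
roughly as below. This is a self-contained toy model, not the code in the
series: ToyTB, ToyCPU, toy_slow_lookup and toy_lookup_tb_ptr are hypothetical
names, and the linear scan merely stands in for the real hash-table lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_CACHE_BITS 4
#define JMP_CACHE_SIZE (1u << JMP_CACHE_BITS)
#define MAX_TBS 64

/* Hypothetical translation block: keyed by pc plus extra CPU state. */
typedef struct ToyTB {
    uint64_t pc;
    uint64_t cs_base;
    uint32_t flags;
    void *code_ptr;              /* host code a goto_ptr would jump to */
} ToyTB;

/* Hypothetical per-CPU state with a direct-mapped jump cache. */
typedef struct ToyCPU {
    uint64_t pc;
    uint64_t cs_base;
    uint32_t flags;
    ToyTB *jmp_cache[JMP_CACHE_SIZE];
    unsigned cache_hits;
    unsigned slow_lookups;
} ToyCPU;

static ToyTB all_tbs[MAX_TBS];
static size_t n_tbs;

static unsigned jmp_cache_hash(uint64_t pc)
{
    return (unsigned)(pc >> 2) & (JMP_CACHE_SIZE - 1);
}

/* Register a block in the global table (assumption: fixed-size array). */
static ToyTB *toy_add_tb(uint64_t pc, uint64_t cs_base, uint32_t flags,
                         void *code_ptr)
{
    assert(n_tbs < MAX_TBS);
    ToyTB *tb = &all_tbs[n_tbs++];
    tb->pc = pc;
    tb->cs_base = cs_base;
    tb->flags = flags;
    tb->code_ptr = code_ptr;
    return tb;
}

/* Stand-in for the global lookup; a linear scan here, for brevity. */
static ToyTB *toy_slow_lookup(ToyCPU *cpu)
{
    cpu->slow_lookups++;
    for (size_t i = 0; i < n_tbs; i++) {
        ToyTB *tb = &all_tbs[i];
        if (tb->pc == cpu->pc && tb->cs_base == cpu->cs_base &&
            tb->flags == cpu->flags) {
            return tb;
        }
    }
    return NULL;
}

/* Returns host code to jump to, or NULL to fall back to the main loop. */
static void *toy_lookup_tb_ptr(ToyCPU *cpu)
{
    unsigned idx = jmp_cache_hash(cpu->pc);
    ToyTB *tb = cpu->jmp_cache[idx];

    /* The CPU state is only compared when the cache slot holds a block. */
    if (tb && tb->pc == cpu->pc && tb->cs_base == cpu->cs_base &&
        tb->flags == cpu->flags) {
        cpu->cache_hits++;
        return tb->code_ptr;
    }

    /* Miss in the jump cache: try the global table, then refill the slot. */
    tb = toy_slow_lookup(cpu);
    if (!tb) {
        return NULL;
    }
    cpu->jmp_cache[idx] = tb;
    return tb->code_ptr;
}
```

In this model the second lookup of the same pc is served from the jump cache
without touching the global table, which is the case the high hit rate refers
to.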

I have *not* re-run all experiments, because it takes several hours and
performance hasn't changed much from v3, as can be seen in these two charts:
* spec06int user-mode, test input, v2.9.0 baseline: http://imgur.com/ME2eMq1
* spec06int softmmu, test input, v3 baseline: http://imgur.com/Clolu9Z
The perf differences are mostly due to adding the htable check. Note that
its impact is small, since tb_jmp_cache has a hit rate in the high 90s (percent).

You can inspect/fetch the changes at:
  https://github.com/cota/qemu/tree/tcg-opt-v4

Thanks,

		Emilio


Re: [Qemu-devel] [PATCH v4 00/11] TCG optimizations for 2.10
Posted by Emilio G. Cota 6 years, 12 months ago
Just to avoid confusion,

On Wed, Apr 26, 2017 at 23:29:13 -0400, Emilio G. Cota wrote:
> I have *not* re-run all experiments, because it takes several hours and
> performance hasn't changed much from v3, as can be seen in these two charts:
> * spec06int user-mode, test input, v2.9.0 baseline: http://imgur.com/ME2eMq1

Here v4 doesn't have the htable lookup. (v4+htable has it)

> * spec06int softmmu, test input, v3 baseline: http://imgur.com/Clolu9Z

Here v4 does have it.

		E.

Re: [Qemu-devel] [PATCH v4 00/11] TCG optimizations for 2.10
Posted by Aurelien Jarno 6 years, 12 months ago
On 2017-04-26 23:29, Emilio G. Cota wrote:
> v3 for context: https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04795.html
> 
> Changes from v3:
> 
> - Added reviewed-by tags.
> 
> - Added a couple of suggested-by tags that I forgot to add in v3
>   regarding lookup_and_goto_ptr and i386's implementation of goto_ptr.
> 
> - lookup_tb_ptr
>   + Dropped the unnecessary exit_request check, as suggested by Paolo and
>     Richard.
>   + Only get the CPU state if we get a tb from the jmp_cache, as suggested
>     by Richard.
>   + Added tb_htable_lookup if we miss in tb_jmp_cache, as suggested by
>     Richard. This requires an extra patch to export tb_htable_lookup.
> 
> - goto_ptr: add IMPL(has_goto_ptr), as pointed out by Richard.
> 
> - target/arm: added a comment about gen_jr. See the v3 thread for why
>   it is needed.
> 
> - target/i386: use TCGV_UNUSED instead of (ab)using NULL on a TCGv,
>   as suggested by Richard. Also took his suggestion to simplify
>   the addition of jr + cs_base.
>   To minimize churn I renamed gen_eob_worker to do_gen_eob_worker,
>   which takes the newly added argument.
> 
> I have *not* re-run all experiments, because it takes several hours and
> performance hasn't changed much from v3, as can be seen in these two charts:
> * spec06int user-mode, test input, v2.9.0 baseline: http://imgur.com/ME2eMq1
> * spec06int softmmu, test input, v3 baseline: http://imgur.com/Clolu9Z
> The perf differences are mostly due to adding the htable check. Note that
> its impact is small, since tb_jmp_cache has a %hit rate in the high 90's.
> 
> You can inspect/fetch the changes at:
>   https://github.com/cota/qemu/tree/tcg-opt-v4

Thanks for this patchset. I have tested it with an arm target, and also
with a mips target using an additional patch. I haven't run any precise
benchmarks yet. The mips patch is trivial and only changes 3 lines, but I
am not 100% sure I have done things correctly (see my comment on patch 7).

Tested-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net