[Qemu-devel] [PATCH v6 00/25] tcg cross-tb optimizations

Richard Henderson posted 25 patches 8 years, 9 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170502192300.2124-1-rth@twiddle.net
Test checkpatch failed
Test docker passed
Test s390x passed
There is a newer version of this series
configure                    |  6 ++---
cpu-exec.c                   |  6 ++---
include/exec/exec-all.h      |  2 ++
include/exec/tb-hash.h       | 12 ++++++++++
include/qemu/atomic.h        | 34 +++++++++++++++++++++-------
target/alpha/translate.c     | 54 ++++++++++++++++++++++++++++++++++++++------
target/arm/translate-a64.c   |  5 ++--
target/arm/translate.c       | 21 +++++++++++++----
target/arm/translate.h       |  4 ++++
target/hppa/translate.c      |  8 +++----
target/i386/translate.c      | 43 +++++++++++++++++++++++++++--------
target/mips/translate.c      |  4 ++--
target/nios2/translate.c     |  2 +-
target/s390x/translate.c     | 17 ++++++++++----
tcg-runtime.c                | 32 ++++++++++++++++++++++++++
tcg/README                   |  8 +++++++
tcg/aarch64/tcg-target.h     |  1 +
tcg/aarch64/tcg-target.inc.c | 22 ++++++++++++++++--
tcg/arm/tcg-target.h         |  1 +
tcg/arm/tcg-target.inc.c     | 54 +++++++++++++++++++++++++++++---------------
tcg/i386/tcg-target.h        |  1 +
tcg/i386/tcg-target.inc.c    | 24 ++++++++++++++++++--
tcg/ia64/tcg-target.h        |  1 +
tcg/mips/tcg-target.h        |  1 +
tcg/mips/tcg-target.inc.c    | 13 +++++++++++
tcg/ppc/tcg-target.h         |  1 +
tcg/ppc/tcg-target.inc.c     |  7 ++++++
tcg/s390/tcg-target.h        |  1 +
tcg/s390/tcg-target.inc.c    | 24 +++++++++++++++++---
tcg/sparc/tcg-target.h       |  1 +
tcg/sparc/tcg-target.inc.c   | 11 ++++++++-
tcg/tcg-op.c                 | 13 +++++++++++
tcg/tcg-op.h                 | 11 +++++++++
tcg/tcg-opc.h                |  1 +
tcg/tcg-runtime.h            |  2 ++
tcg/tcg.c                    |  5 ++++
tcg/tcg.h                    |  1 +
tcg/tci/tcg-target.h         |  1 +
38 files changed, 378 insertions(+), 77 deletions(-)
[Qemu-devel] [PATCH v6 00/25] tcg cross-tb optimizations
Posted by Richard Henderson 8 years, 9 months ago
Changes since v5:
  * MIPS patches from Aurelien.
  * AArch64 patches from Emilio.
  * ARM32 backend support for goto_ptr
  * Alpha frontend patch rewritten; the former patch appears to
    drop clock interrupts, not exiting the kernel's idle loop.
    I never *really* figured out why, since both patches seem
    to annotate the same TBs in the same way.
  * Front end patchs for hppa and s390.


r~


Aurelien Jarno (3):
  tcg/mips: implement goto_ptr
  target/mips: optimize cross-page direct jumps in softmmu
  target/mips: optimize indirect branches

Emilio G. Cota (10):
  tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr
  tcg/i386: implement goto_ptr
  target/arm: optimize cross-page direct jumps in softmmu
  target/arm: optimize indirect branches
  target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr
  target/i386: optimize cross-page direct jumps in softmmu
  target/i386: optimize indirect branches
  tb-hash: improve tb_jmp_cache hash function in user mode
  target/aarch64: optimize cross-page direct jumps in softmmu
  target/aarch64: optimize indirect branches

Richard Henderson (12):
  target/nios2: Fix 64-bit ilp32 compilation
  tcg/sparc: Use the proper compilation flags for 32-bit
  qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
  tcg/ppc: Implement goto_ptr
  tcg/aarch64: Implement goto_ptr
  tcg/sparc: Implement goto_ptr
  tcg/s390: Implement goto_ptr
  tcg/arm: Clarify tcg_out_bx for arm4 host
  tcg/arm: Implement goto_ptr
  target/s390: Use tcg_gen_lookup_and_goto_ptr
  target/hppa: Use tcg_gen_lookup_and_goto_ptr
  target/alpha: Use tcg_gen_lookup_and_goto_ptr

 configure                    |  6 ++---
 cpu-exec.c                   |  6 ++---
 include/exec/exec-all.h      |  2 ++
 include/exec/tb-hash.h       | 12 ++++++++++
 include/qemu/atomic.h        | 34 +++++++++++++++++++++-------
 target/alpha/translate.c     | 54 ++++++++++++++++++++++++++++++++++++++------
 target/arm/translate-a64.c   |  5 ++--
 target/arm/translate.c       | 21 +++++++++++++----
 target/arm/translate.h       |  4 ++++
 target/hppa/translate.c      |  8 +++----
 target/i386/translate.c      | 43 +++++++++++++++++++++++++++--------
 target/mips/translate.c      |  4 ++--
 target/nios2/translate.c     |  2 +-
 target/s390x/translate.c     | 17 ++++++++++----
 tcg-runtime.c                | 32 ++++++++++++++++++++++++++
 tcg/README                   |  8 +++++++
 tcg/aarch64/tcg-target.h     |  1 +
 tcg/aarch64/tcg-target.inc.c | 22 ++++++++++++++++--
 tcg/arm/tcg-target.h         |  1 +
 tcg/arm/tcg-target.inc.c     | 54 +++++++++++++++++++++++++++++---------------
 tcg/i386/tcg-target.h        |  1 +
 tcg/i386/tcg-target.inc.c    | 24 ++++++++++++++++++--
 tcg/ia64/tcg-target.h        |  1 +
 tcg/mips/tcg-target.h        |  1 +
 tcg/mips/tcg-target.inc.c    | 13 +++++++++++
 tcg/ppc/tcg-target.h         |  1 +
 tcg/ppc/tcg-target.inc.c     |  7 ++++++
 tcg/s390/tcg-target.h        |  1 +
 tcg/s390/tcg-target.inc.c    | 24 +++++++++++++++++---
 tcg/sparc/tcg-target.h       |  1 +
 tcg/sparc/tcg-target.inc.c   | 11 ++++++++-
 tcg/tcg-op.c                 | 13 +++++++++++
 tcg/tcg-op.h                 | 11 +++++++++
 tcg/tcg-opc.h                |  1 +
 tcg/tcg-runtime.h            |  2 ++
 tcg/tcg.c                    |  5 ++++
 tcg/tcg.h                    |  1 +
 tcg/tci/tcg-target.h         |  1 +
 38 files changed, 378 insertions(+), 77 deletions(-)

-- 
2.9.3


Re: [Qemu-devel] [PATCH v6 00/25] tcg cross-tb optimizations
Posted by no-reply@patchew.org 8 years, 9 months ago
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v6 00/25] tcg cross-tb optimizations
Message-id: 20170502192300.2124-1-rth@twiddle.net

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/149373610338.5144.9635049015143453288.stgit@bahia.lan -> patchew/149373610338.5144.9635049015143453288.stgit@bahia.lan
Switched to a new branch 'test'
7aa8166 target/mips: optimize indirect branches
13b0c04 target/mips: optimize cross-page direct jumps in softmmu
c56c077 tcg/mips: implement goto_ptr
80167d9 target/aarch64: optimize indirect branches
0cbe645 target/aarch64: optimize cross-page direct jumps in softmmu
3854c1f target/alpha: Use tcg_gen_lookup_and_goto_ptr
29b76c6 target/hppa: Use tcg_gen_lookup_and_goto_ptr
f904940 target/s390: Use tcg_gen_lookup_and_goto_ptr
4312937 tcg/arm: Implement goto_ptr
20d8e07 tcg/arm: Clarify tcg_out_bx for arm4 host
fcae688 tcg/s390: Implement goto_ptr
feff310 tcg/sparc: Implement goto_ptr
e11e15b tcg/aarch64: Implement goto_ptr
aba9561 tcg/ppc: Implement goto_ptr
3d86def tb-hash: improve tb_jmp_cache hash function in user mode
85c8bca target/i386: optimize indirect branches
56cde2c target/i386: optimize cross-page direct jumps in softmmu
642d3a1 target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr
66cbfb4 target/arm: optimize indirect branches
ac54831 target/arm: optimize cross-page direct jumps in softmmu
4eda0de tcg/i386: implement goto_ptr
d977fcc tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr
da662a3 qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
4401168 tcg/sparc: Use the proper compilation flags for 32-bit
7c59ccb target/nios2: Fix 64-bit ilp32 compilation

=== OUTPUT BEGIN ===
Checking PATCH 1/25: target/nios2: Fix 64-bit ilp32 compilation...
Checking PATCH 2/25: tcg/sparc: Use the proper compilation flags for 32-bit...
Checking PATCH 3/25: qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts...
WARNING: architecture specific defines should be avoided
#37: FILE: include/qemu/atomic.h:104:
+#if defined(__x86_64__) || defined(__sparc__)

total: 0 errors, 1 warnings, 87 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 4/25: tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr...
Checking PATCH 5/25: tcg/i386: implement goto_ptr...
Checking PATCH 6/25: target/arm: optimize cross-page direct jumps in softmmu...
Checking PATCH 7/25: target/arm: optimize indirect branches...
Checking PATCH 8/25: target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr...
Checking PATCH 9/25: target/i386: optimize cross-page direct jumps in softmmu...
Checking PATCH 10/25: target/i386: optimize indirect branches...
Checking PATCH 11/25: tb-hash: improve tb_jmp_cache hash function in user mode...
Checking PATCH 12/25: tcg/ppc: Implement goto_ptr...
Checking PATCH 13/25: tcg/aarch64: Implement goto_ptr...
Checking PATCH 14/25: tcg/sparc: Implement goto_ptr...
Checking PATCH 15/25: tcg/s390: Implement goto_ptr...
Checking PATCH 16/25: tcg/arm: Clarify tcg_out_bx for arm4 host...
Checking PATCH 17/25: tcg/arm: Implement goto_ptr...
Checking PATCH 18/25: target/s390: Use tcg_gen_lookup_and_goto_ptr...
ERROR: return is not a function, parentheses are not required
#23: FILE: target/s390x/translate.c:613:
+    return (s->singlestep_enabled ||

total: 1 errors, 0 warnings, 31 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 19/25: target/hppa: Use tcg_gen_lookup_and_goto_ptr...
Checking PATCH 20/25: target/alpha: Use tcg_gen_lookup_and_goto_ptr...
ERROR: return is not a function, parentheses are not required
#29: FILE: target/alpha/translate.c:463:
+    return ((ctx->tb->cflags & CF_LAST_IO)

total: 1 errors, 0 warnings, 100 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 21/25: target/aarch64: optimize cross-page direct jumps in softmmu...
Checking PATCH 22/25: target/aarch64: optimize indirect branches...
Checking PATCH 23/25: tcg/mips: implement goto_ptr...
Checking PATCH 24/25: target/mips: optimize cross-page direct jumps in softmmu...
Checking PATCH 25/25: target/mips: optimize indirect branches...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org
Re: [Qemu-devel] [PATCH v6 00/25] tcg cross-tb optimizations
Posted by Richard Henderson 8 years, 9 months ago
On 05/02/2017 12:22 PM, Richard Henderson wrote:
> Changes since v5:
...
>    * Alpha frontend patch rewritten; the former patch appears to
>      drop clock interrupts, not exiting the kernel's idle loop.
>      I never *really* figured out why, since both patches seem
>      to annotate the same TBs in the same way.

There's definitely something odd going on.

With a rebuild from scratch, the same symptoms have re-appeared for Alpha.  So 
it really had nothing to do with the original patch.  I'm at a bit of a loss...


r~

Re: [Qemu-devel] [PATCH v6 00/25] tcg cross-tb optimizations
Posted by Emilio G. Cota 8 years, 9 months ago
On Tue, May 02, 2017 at 20:36:52 -0700, Richard Henderson wrote:
> On 05/02/2017 12:22 PM, Richard Henderson wrote:
> >Changes since v5:
> ...
> >   * Alpha frontend patch rewritten; the former patch appears to
> >     drop clock interrupts, not exiting the kernel's idle loop.
> >     I never *really* figured out why, since both patches seem
> >     to annotate the same TBs in the same way.
> 
> There's definitely something odd going on.
> 
> With a rebuild from scratch, the same symptoms have re-appeared for Alpha.
> So it really had nothing to do with the original patch.  I'm at a bit of a
> loss...

I can reliably reproduce a freeze upon booting.

Not sure this can help much (this is the first time I run an Alpha
guest), but here are some findings.

In my testing, if I disable the lookup for JMP/JSR/ret, I can boot OK.
This works:

+++ b/target/alpha/translate.c
@@ -2435,12 +2435,16 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
         if (ra != 31) {
             tcg_gen_movi_i64(ctx->ir[ra], ctx->pc);
         }
+#if 0
         if (use_exit_tb(ctx)) {
             ret = EXIT_PC_UPDATED;
         } else {
             tcg_gen_lookup_and_goto_ptr(cpu_pc);
             ret = EXIT_GOTO_TB;
         }
+#else
+        ret = EXIT_PC_UPDATED;
+#endif
         break;

However, this doesn't tell us much, since these jumps are pretty common.

Interestingly, if I leave the lookup_and_goto_ptr above (s/#if 0/#if 1/), but
change the lookup_ptr helper to bypass tb_jmp_cache and directly check the
htable, it boots OK.

Could it be that we're forgetting to clear (or set) tb_jmp_cache somewhere?

		Emilio

Re: [Qemu-devel] [PATCH v6 00/25] tcg cross-tb optimizations
Posted by Richard Henderson 8 years, 9 months ago
On 05/03/2017 08:51 AM, Emilio G. Cota wrote:
> On Tue, May 02, 2017 at 20:36:52 -0700, Richard Henderson wrote:
>> On 05/02/2017 12:22 PM, Richard Henderson wrote:
>>> Changes since v5:
>> ...
>>>    * Alpha frontend patch rewritten; the former patch appears to
>>>      drop clock interrupts, not exiting the kernel's idle loop.
>>>      I never *really* figured out why, since both patches seem
>>>      to annotate the same TBs in the same way.
>>
>> There's definitely something odd going on.
>>
>> With a rebuild from scratch, the same symptoms have re-appeared for Alpha.
>> So it really had nothing to do with the original patch.  I'm at a bit of a
>> loss...
> 
> I can reliably reproduce a freeze upon booting.

Oh good.  Sort of.  The oddly non-reproducible nature of this for me has been 
disconcerting.

> Not sure this can help much (this is the first time I run an Alpha
> guest), but here are some findings.
> 
> In my testing, if I disable the lookup for JMP/JSR/ret, I can boot OK.
> This works:
> 
> +++ b/target/alpha/translate.c
> @@ -2435,12 +2435,16 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
>           if (ra != 31) {
>               tcg_gen_movi_i64(ctx->ir[ra], ctx->pc);
>           }
> +#if 0
>           if (use_exit_tb(ctx)) {
>               ret = EXIT_PC_UPDATED;
>           } else {
>               tcg_gen_lookup_and_goto_ptr(cpu_pc);
>               ret = EXIT_GOTO_TB;
>           }
> +#else
> +        ret = EXIT_PC_UPDATED;
> +#endif
>           break;
> 
> However, this doesn't tell us much, since these jumps are pretty common.

Indeed.

> Interestingly, if I leave the lookup_and_goto_ptr above (s/#if 0/#if 1/), but
> change the lookup_ptr helper to bypass tb_jmp_cache and directly check the
> htable, it boots OK.

Now that *is* odd.  However ...

> Could it be that we're forgetting to clear (or set) tb_jmp_cache somewhere?

... even that should not affect the setting (or clearing) of 
cpu->icount_decr.u16.high.  Which should have been set by tcg_handle_interrupt. 
  We should have exited the chain of TBs at some point.

Which to me means there's some deeper issue.  I.e. the only reason it's been 
working to date so far is that previously we never put together chains of any 
great length.


r~

Re: [Qemu-devel] [PATCH v6 00/25] tcg cross-tb optimizations
Posted by Emilio G. Cota 8 years, 9 months ago
On Wed, May 03, 2017 at 09:27:54 -0700, Richard Henderson wrote:
> On 05/03/2017 08:51 AM, Emilio G. Cota wrote:
> >On Tue, May 02, 2017 at 20:36:52 -0700, Richard Henderson wrote:
> >>On 05/02/2017 12:22 PM, Richard Henderson wrote:
> >>>Changes since v5:
> >>...
> >>>   * Alpha frontend patch rewritten; the former patch appears to
> >>>     drop clock interrupts, not exiting the kernel's idle loop.
> >>>     I never *really* figured out why, since both patches seem
> >>>     to annotate the same TBs in the same way.
> >>
> >>There's definitely something odd going on.
> >>
> >>With a rebuild from scratch, the same symptoms have re-appeared for Alpha.
> >>So it really had nothing to do with the original patch.  I'm at a bit of a
> >>loss...
> >
> >I can reliably reproduce a freeze upon booting.
> 
> Oh good.  Sort of.  The oddly non-reproducible nature of this for me has
> been disconcerting.

I'm booting this image:
  https://gmplib.org/~tege/qemu/images/alpha/disk.img.xz
with this kernel:
  https://gmplib.org/~tege/qemu/images/alpha/vmlinux
invoking with:
  $ qemu-system-alpha -m 512 -drive file=disk.img,media=disk,format=raw,index=0 \
	-kernel vmlinux -append "root=/dev/sda2" [-accel accel=tcg,thread=multi]
I got the above from https://gmplib.org/~tege/qemu.html

I can reproduce reliably with either thread=single or =multi. When booting,
it stops for a few seconds  at "Key type dns_resolver registered"; then it
prints a few more lines to then stop for a while at
"sd 0:0:0:0: [sda] Attached SCSI disk". If I wait long enough, it
does boot. However, without the chaining patch it boots in a few seconds.

> >Interestingly, if I leave the lookup_and_goto_ptr above (s/#if 0/#if 1/), but
> >change the lookup_ptr helper to bypass tb_jmp_cache and directly check the
> >htable, it boots OK.
> 
> Now that *is* odd.  However ...
> 
> >Could it be that we're forgetting to clear (or set) tb_jmp_cache somewhere?
> 
> ... even that should not affect the setting (or clearing) of
> cpu->icount_decr.u16.high.  Which should have been set by
> tcg_handle_interrupt.  We should have exited the chain of TBs at some point.
> 
> Which to me means there's some deeper issue.  I.e. the only reason it's been
> working to date so far is that previously we never put together chains of
> any great length.

Yes, this is my hypothesis as well.

		E.