[Qemu-devel] [PATCH 00/22] tcg: tb_lock removal

Emilio G. Cota posted 22 patches 6 years, 8 months ago
Failed in applying to current master (apply log)
accel/tcg/cpu-exec.c            |  58 +-
accel/tcg/cputlb.c              |   8 -
accel/tcg/translate-all.c       | 958 +++++++++++++++--------
accel/tcg/translate-all.h       |   6 +-
disas/arm.c                     |   2 +-
docs/devel/multi-thread-tcg.txt |  21 +-
exec.c                          |  25 +-
include/exec/cpu-common.h       |   1 -
include/exec/exec-all.h         |  86 +-
include/exec/tb-context.h       |   4 -
include/qemu/qht.h              |  36 +-
linux-user/main.c               |   3 -
linux-user/syscall.c            |   2 +-
tcg/tcg.c                       | 205 +++++
tcg/tcg.h                       |  13 +-
tests/qht-bench.c               |  18 +-
tests/test-qht.c                |  20 +-
util/qht.c                      |  30 +-
18 files changed, 1022 insertions(+), 474 deletions(-)
[Qemu-devel] [PATCH 00/22] tcg: tb_lock removal
Posted by Emilio G. Cota 6 years, 8 months ago
This series applies on top of the "multiple TCG contexts" series, v4:
  https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06769.html

Highlights:

- First, fix a few typos I encountered while working on this (patches 1-3).
  I could send them separately to qemu-trivial if you prefer.
- QHT: use a proper cmp function, instead of just checking pointer values
  to determine equality of matches.
- Use a binary search tree for each TCG region.
- Make l1_map lockless by using cmpxchg
- Introduce page locks (for !user-mode), so that tb_lock is not
  needed when operating on a page
  - Introduce page_collection, to lock a range of pages
- Introduce tb->jmp_lock to protect TB jump lists.
- Remove tb_lock. User-mode uses just mmap_lock and tb->jmp_lock's;
  !user-mode uses the same jump locks as well as page locks.

Performance numbers are in patch 22. We get nice speedups, but I still
see a lot of idling when booting many cores. I suspect it comes from
cross-CPU events (e.g. TLB invalidations), but I need to profile it
better (perf is not good for this; mutrace doesn't quite work). But
anyway that's for another patchset.

You can fetch these changes from:
  https://github.com/cota/qemu/tree/multi-tcg-v4-parallel

Please review!

Thanks,

		Emilio

diffstat:

 accel/tcg/cpu-exec.c            |  58 +-
 accel/tcg/cputlb.c              |   8 -
 accel/tcg/translate-all.c       | 958 +++++++++++++++--------
 accel/tcg/translate-all.h       |   6 +-
 disas/arm.c                     |   2 +-
 docs/devel/multi-thread-tcg.txt |  21 +-
 exec.c                          |  25 +-
 include/exec/cpu-common.h       |   1 -
 include/exec/exec-all.h         |  86 +-
 include/exec/tb-context.h       |   4 -
 include/qemu/qht.h              |  36 +-
 linux-user/main.c               |   3 -
 linux-user/syscall.c            |   2 +-
 tcg/tcg.c                       | 205 +++++
 tcg/tcg.h                       |  13 +-
 tests/qht-bench.c               |  18 +-
 tests/test-qht.c                |  20 +-
 util/qht.c                      |  30 +-
 18 files changed, 1022 insertions(+), 474 deletions(-)


Re: [Qemu-devel] [PATCH 00/22] tcg: tb_lock removal
Posted by Emilio G. Cota 6 years, 8 months ago
On Mon, Aug 07, 2017 at 19:52:16 -0400, Emilio G. Cota wrote:
> This series applies on top of the "multiple TCG contexts" series, v4:
>   https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06769.html
> 
> Highlights:
> 
> - First, fix a few typos I encountered while working on this (patches 1-3).
>   I could send them separately to qemu-trivial if you prefer.
> - QHT: use a proper cmp function, instead of just checking pointer values
>   to determine equality of matches.
> - Use a binary search tree for each TCG region.
> - Make l1_map lockless by using cmpxchg
> - Introduce page locks (for !user-mode), so that tb_lock is not
>   needed when operating on a page
>   - Introduce page_collection, to lock a range of pages
> - Introduce tb->jmp_lock to protect TB jump lists.
> - Remove tb_lock. User-mode uses just mmap_lock and tb->jmp_lock's;
>   !user-mode uses the same jump locks as well as page locks.
> 
> Performance numbers are in patch 22. We get nice speedups, but I still
> see a lot of idling when booting many cores. I suspect it comes from
> cross-CPU events (e.g. TLB invalidations), but I need to profile it
> better (perf is not good for this; mutrace doesn't quite work). But
> anyway that's for another patchset.

The idling is due to BQL contention related to interrupt handling. In the
case of ARM, this boils down to the GICv3 code being single-threaded.
I don't have time right now to make it multi-threaded, but at least we
know where the scalability bottleneck is.

BTW if there's interest I can submit the lock profiler to the list. The code
is in this branch:
  https://github.com/cota/qemu/tree/lock-profiler
The first commit has sample output: https://github.com/cota/qemu/commit/c5bda634

Also, any feedback on the parent (tb_lock removal) patchset would be appreciated.

To make the 2.11 merge easier, I rebased this patchset (as well as the
multi-tcg-v4 set it is based on) on top of rth's tcg-generic-15, fixing a good
bunch of annoying conflicts. The resulting branch is available at:
  https://github.com/cota/qemu/tree/tcg-generic-15%2Bmulti-tcg-v4-parallel

Thanks,

		Emilio

Re: [Qemu-devel] [PATCH 00/22] tcg: tb_lock removal
Posted by Emilio G. Cota 6 years, 6 months ago
On Mon, Aug 07, 2017 at 19:52:16 -0400, Emilio G. Cota wrote:
> This series applies on top of the "multiple TCG contexts" series, v4:
>   https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06769.html
(snip)
> Please review!

Turns out this patchset breaks icount, even after fixing the patchset
it is based on (see [1]).

The last good patch in the series is patch 10:
"translate-all: work page-by-page in tb_invalidate_phys_range_1".
I have no time to look into fixing that, so if you end up reviewing
this set, please review only patches 1-10.

The good news is that patches > 10 get really hairy, so this should
reduce the review burden substantially.

Thanks,

		Emilio

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01198.html

Re: [Qemu-devel] [PATCH 00/22] tcg: tb_lock removal
Posted by Alex Bennée 6 years, 6 months ago
Emilio G. Cota <cota@braap.org> writes:

> On Mon, Aug 07, 2017 at 19:52:16 -0400, Emilio G. Cota wrote:
>> This series applies on top of the "multiple TCG contexts" series, v4:
>>   https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06769.html
> (snip)
>> Please review!
>
> Turns out this patchset breaks icount, even after fixing the patchset
> it is based on (see [1]).

Isn't icount only used in single-threaded mode though?

>
> The last good patch in the series is patch 10:
> "translate-all: work page-by-page in tb_invalidate_phys_range_1".
> I have no time to look into fixing that, so if you end up reviewing
> this set, please review only patches 1-10.
>
> The good news is that patches > 10 get really hairy, so this should
> reduce the review burden substantially.
>
> Thanks,
>
> 		Emilio
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01198.html


--
Alex Bennée

Re: [Qemu-devel] [PATCH 00/22] tcg: tb_lock removal
Posted by Emilio G. Cota 6 years, 6 months ago
On Fri, Oct 06, 2017 at 11:56:21 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota <cota@braap.org> writes:
> 
> > On Mon, Aug 07, 2017 at 19:52:16 -0400, Emilio G. Cota wrote:
> >> This series applies on top of the "multiple TCG contexts" series, v4:
> >>   https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06769.html
> > (snip)
> >> Please review!
> >
> > Turns out this patchset breaks icount, even after fixing the patchset
> > it is based on (see [1]).
> 
> Isn't icount only used in single-threaded mode though?

Yes, but it also hits some code paths that get executed very rarely
(or not at all) in !icount mode. And we still acquire locks in single-threaded
mode -- the breakage is most likely due to a missing page_unlock somewhere.

Cheers,

		Emilio