tcg: per-thread TCG

[Qemu-devel] [PATCH 00/22] tcg: per-thread TCG

Posted by Emilio G. Cota 8 years, 7 months ago

Original RFC here:
  https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg06874.html

I included Richard's feedback (Thanks!) from the original RFC, and
added quite a few things. This is now a proper PATCHset since it is
a lot more mature.

Highlights:
- It works! I tested single/multi-threaded arm, aarch64 and alpha softmmu
  with various -smp's (up to 120 on aarch64) and -tb-size's.
  Also tested x86_64-linux-user with multi-threaded code. valgrind's
  drd shows no obvious issues (it doesn't swallow C11 atomics, so it
  spits out a lot of false positives though). Have not tested on a
  non-x86 host, but given the audit I did of global non-const variables
  (see commit message in patch 21), it should be OK.

- Region-based allocation to maximize code_gen_buffer utilization.
  See patch 20.

- Patches 1-8 are unrelated fixes, but I'm keeping them as part of this
  series to avoid merge headaches later on.

- Performance-wise we get a 20% improvement when booting+shutting down
  debian-arm with MTTCG and -smp 8 (see patch 22). Not bad! This is due
  to not holding tb_lock during code translation, although the fact that
  we still have to take it after every translation remains a scalability
  issue. But before focusing on that, I'd like to get this reviewed.
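
A minimal sketch of the locking pattern described in the last highlight:
translate with no lock held, then take tb_lock only to publish the result.
This is an illustration under stated assumptions, not code from the series.
Apart from tb_lock, every name here (struct tb, tb_table, translate_block,
tb_gen_and_publish) is hypothetical, and the lookup table is a toy
direct-mapped array.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

enum { TABLE_SIZE = 64 };

/* Hypothetical stand-in for a translated block. */
struct tb {
    uint64_t pc;
    int valid;
};

static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;
static struct tb tb_table[TABLE_SIZE];   /* protected by tb_lock */

/* Stand-in for the expensive step; it touches only per-thread state. */
static struct tb translate_block(uint64_t pc)
{
    struct tb t = { pc, 1 };
    return t;
}

/* Returns 1 if this call published the block, 0 if it was already there. */
static int tb_gen_and_publish(uint64_t pc)
{
    struct tb fresh = translate_block(pc);   /* no lock held here */
    size_t slot = pc % TABLE_SIZE;
    int inserted = 0;

    int rc = pthread_mutex_lock(&tb_lock);   /* short critical section */
    assert(rc == 0);
    if (!(tb_table[slot].valid && tb_table[slot].pc == pc)) {
        tb_table[slot] = fresh;
        inserted = 1;
    }
    rc = pthread_mutex_unlock(&tb_lock);
    assert(rc == 0);
    (void)rc;   /* silences the unused warning when NDEBUG is defined */
    return inserted;
}
```

The lock is still taken once per translation, which is the remaining
scalability issue mentioned above. The sketch only shows that the slow part
runs outside the critical section.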

I broke down features as much as possible, so that we do not end up
with a "per-thread TCG" megapatch.
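
For readers unfamiliar with the idea behind the region-based allocation
highlight, here is a rough sketch of one way it can work. A shared buffer is
split into fixed-size regions, and each thread claims a fresh region only
when its current one cannot fit the next allocation. The sizes, the names
(code_buf, thread_ctx, region_claim, region_alloc) and the use of a C11
atomic counter are all assumptions made for illustration. Patch 20 is the
authoritative description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { BUF_SIZE = 1 << 16, N_REGIONS = 8, REGION_SIZE = BUF_SIZE / N_REGIONS };

static uint8_t code_buf[BUF_SIZE];     /* stand-in for code_gen_buffer */
static atomic_size_t next_region;      /* index of the next unclaimed region */

/* Hypothetical per-thread state: the region currently being filled. */
struct thread_ctx {
    uint8_t *ptr;   /* next free byte */
    uint8_t *end;   /* one past the end of the region */
};

/* Claim the next unused region; returns -1 once all regions are taken. */
static int region_claim(struct thread_ctx *t)
{
    size_t r = atomic_fetch_add(&next_region, 1);
    if (r >= N_REGIONS) {
        return -1;
    }
    t->ptr = code_buf + r * REGION_SIZE;
    t->end = t->ptr + REGION_SIZE;
    return 0;
}

/* Bump-allocate from the thread's region, moving to a new one when full. */
static void *region_alloc(struct thread_ctx *t, size_t size)
{
    if (size > REGION_SIZE) {
        return NULL;
    }
    if (t->ptr == NULL || (size_t)(t->end - t->ptr) < size) {
        /* The tail of the old region is abandoned in this sketch. */
        if (region_claim(t) < 0) {
            return NULL;   /* a real implementation would flush here */
        }
    }
    assert(t->ptr + size <= t->end);
    void *p = t->ptr;
    t->ptr += size;
    return p;
}
```

Because regions are handed out on demand, a thread that translates a lot of
code can end up using most of the buffer, whereas a static equal split would
leave idle threads' shares unused.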

The series applies on top of the current master (b11365867568).

Thanks,

		Emilio

Re: [Qemu-devel] [PATCH 00/22] tcg: per-thread TCG

Posted by Emilio G. Cota 8 years, 7 months ago

On Sun, Jul 09, 2017 at 03:49:52 -0400, Emilio G. Cota wrote:
> The series applies on top of the current master (b11365867568).

It's a lot of patches -- you can fetch them from:
  https://github.com/cota/qemu/commits/multi-tcg

Note that there's a patch in the branch there that is not part
of the patchset ("scripts: add "git.orderfile" for ordering ...")

		E.

Re: [Qemu-devel] [PATCH 00/22] tcg: per-thread TCG

Posted by Alex Bennée 8 years, 7 months ago

Emilio G. Cota <cota@braap.org> writes:

> Original RFC here:
>   https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg06874.html
>
> I included Richard's feedback (Thanks!) from the original RFC, and
> added quite a few things. This is now a proper PATCHset since it is
> a lot more mature.
>
> Highlights:
> - It works! I tested single/multi-threaded arm, aarch64 and alpha softmmu
>   with various -smp's (up to 120 on aarch64) and -tb-size's.
>   Also tested x86_64-linux-user with multi-threaded code. valgrind's
>   drd shows no obvious issues (it doesn't swallow C11 atomics, so it
>   spits out a lot of false positives though). Have not tested on a
>   non-x86 host, but given the audit I did of global non-const variables
>   (see commit message in patch 21), it should be OK.

It would be really nice if we could get ThreadSanitizer to support our
setcontext() co-routines. It was very useful during the original MTTCG
work and is a lot faster than Valgrind. There was some discussion on the
sanitizer lists and a basic plan of what is needed is known, but it's
unlikely to get done by the project itself.

>
> - Region-based allocation to maximize code_gen_buffer utilization.
>   See patch 20.
>
> - Patches 1-8 are unrelated fixes, but I'm keeping them as part of this
>   series to avoid merge headaches later on.
>
> - Performance-wise we get a 20% improvement when booting+shutting down
>   debian-arm with MTTCG and -smp 8 (see patch 22). Not bad! This is due
>   to not holding tb_lock during code translation, although the fact that
>   we still have to take it after every translation remains a scalability
>   issue. But before focusing on that, I'd like to get this reviewed.

Side issue. Have we considered the impact on codegen buffer utilisation
by doing an "off-code_gen_buffer" no cache translation the first time we
ever see a TB?

>
> I broke down features as much as possible, so that we do not end up
> with a "per-thread TCG" megapatch.
>
> The series applies on top of the current master (b11365867568).
>
> Thanks,
>
> 		Emilio


--
Alex Bennée

Re: [Qemu-devel] [PATCH 00/22] tcg: per-thread TCG

Posted by Richard Henderson 8 years, 7 months ago

On 07/09/2017 11:50 PM, Alex Bennée wrote:
> Side issue. Have we considered the impact on codegen buffer utilisation
> by doing an "off-code_gen_buffer" no cache translation the first time we
> ever see a TB?

No, we haven't.  Possibly because we'd need additional infrastructure to do
even that -- we'd want to record somehow that we've seen a given TB before,
so that we *could* generate permanent code for it in future.
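
One possible shape for that "seen before" record, purely as a sketch: a small
bitmap indexed by a hash of the guest PC. Nothing like this exists in the
series as far as this thread shows. The names (seen_bitmap, hash_pc,
seen_before, choose_translation) and the table size are hypothetical, and
the sketch ignores thread safety.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SEEN_BITS = 1 << 12 };

static uint8_t seen_bitmap[SEEN_BITS / 8];

/* Multiplicative hash; the top 12 bits index the bitmap. */
static uint32_t hash_pc(uint64_t pc)
{
    uint32_t h = (uint32_t)((pc * 0x9E3779B97F4A7C15ull) >> 52);
    assert(h < SEEN_BITS);
    return h;
}

/* Returns whether pc was (probably) seen before, and marks it as seen. */
static bool seen_before(uint64_t pc)
{
    uint32_t h = hash_pc(pc);
    uint8_t mask = (uint8_t)(1u << (h & 7));
    bool was_seen = (seen_bitmap[h >> 3] & mask) != 0;

    seen_bitmap[h >> 3] |= mask;
    return was_seen;
}

enum tb_kind { TB_NOCACHE, TB_PERMANENT };

/* First sighting: throwaway translation. Later sightings: permanent code. */
static enum tb_kind choose_translation(uint64_t pc)
{
    return seen_before(pc) ? TB_PERMANENT : TB_NOCACHE;
}
```

Hash collisions only cause some blocks to be treated as already seen, so
they would get permanent code one execution early. That costs buffer space
but does not affect correctness.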

r~