Original RFC here:
https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg06874.html
I incorporated Richard's feedback (thanks!) from the original RFC and
added quite a few things. This is now a proper PATCH series, since it
is a lot more mature.
Highlights:
- It works! I tested single- and multi-threaded arm, aarch64 and alpha
softmmu with various -smp values (up to 120 on aarch64) and -tb-size
values. I also tested x86_64-linux-user with multi-threaded code.
valgrind's drd shows no obvious issues (it does not understand C11
atomics, though, so it reports many false positives). I have not
tested on a non-x86 host, but given my audit of global non-const
variables (see the commit message in patch 21), it should be OK.
- Region-based allocation to maximize code_gen_buffer utilization.
See patch 20.
- Patches 1-8 are unrelated fixes, but I'm keeping them as part of this
series to avoid merge headaches later on.
- Performance-wise, we get a 20% improvement when booting and shutting
down debian-arm with MTTCG and -smp 8 (see patch 22). Not bad! The gain
comes from no longer holding tb_lock during code translation, although
having to take it after every translation remains a scalability issue.
Before focusing on that, though, I'd like to get this reviewed.
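The locking pattern in the last item can be modeled roughly as follows. This is a minimal, hypothetical C11/pthreads sketch, not QEMU code: TB, tb_list, translate_block(), tb_find_locked(), tb_gen_code() and the worker helpers are invented for illustration, and a plain mutex named tb_lock stands in for QEMU's lock. Translation runs with no lock held; the lock is taken only afterwards to publish the result. Re-checking for a block that another thread published in the meantime is one assumed way of handling that race.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a translation block (not QEMU's TranslationBlock). */
typedef struct TB {
    unsigned long pc;   /* guest PC the block was translated from */
    struct TB *next;    /* link in the shared list below */
} TB;

static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;
static TB *tb_list;          /* shared, protected by tb_lock */
static unsigned long n_tbs;  /* shared, protected by tb_lock */

/* Stand-in for the expensive translation step. It only touches
 * thread-private memory, so it runs without tb_lock. */
static TB *translate_block(unsigned long pc)
{
    TB *tb = malloc(sizeof(*tb));
    if (tb == NULL) {
        abort();
    }
    tb->pc = pc;
    tb->next = NULL;
    return tb;
}

/* Look up an already-published block. Caller must hold tb_lock. */
static TB *tb_find_locked(unsigned long pc)
{
    for (TB *tb = tb_list; tb != NULL; tb = tb->next) {
        if (tb->pc == pc) {
            return tb;
        }
    }
    return NULL;
}

static TB *tb_gen_code(unsigned long pc)
{
    TB *tb = translate_block(pc);          /* no lock held here */

    pthread_mutex_lock(&tb_lock);          /* still taken after every translation */
    TB *existing = tb_find_locked(pc);
    if (existing != NULL) {
        /* Another thread published the same pc first; drop our copy. */
        pthread_mutex_unlock(&tb_lock);
        free(tb);
        return existing;
    }
    tb->next = tb_list;
    tb_list = tb;
    n_tbs++;
    pthread_mutex_unlock(&tb_lock);
    return tb;
}

#define N_PCS 16

/* Every worker translates the same PCs, so threads race on each one. */
static void *worker(void *arg)
{
    (void)arg;
    for (unsigned long pc = 0; pc < N_PCS; pc++) {
        TB *tb = tb_gen_code(pc);
        assert(tb->pc == pc);
    }
    return NULL;
}

/* Run nthreads workers concurrently and wait for all of them. */
static void run_workers(int nthreads)
{
    pthread_t th[64];
    assert(nthreads > 0 && nthreads <= 64);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&th[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(th[i], NULL);
    }
}
```

With several threads translating the same PCs, each PC is published exactly once (for example, `run_workers(8)` leaves `n_tbs == N_PCS`). What remains serialized is the lock round-trip after every translation, which is the scalability issue noted above.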

I split the features into patches as finely as possible, so that we do
not end up with a single "per-thread TCG" megapatch.
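One way to picture how per-thread translation and region-based allocation fit together is the sketch below. It assumes that code_gen_buffer is split into equal, fixed-size regions and that each thread claims one region at a time through a shared atomic counter, then bump-allocates inside that region without taking any lock. The sizes, RegionCtx, region_claim() and code_alloc() are hypothetical, and code_gen_buffer here is just a static array standing in for QEMU's buffer of the same name. This is a simplified model, not the series' implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define BUF_SIZE    (64 * 1024)
#define N_REGIONS   16
#define REGION_SIZE (BUF_SIZE / N_REGIONS)

static uint8_t code_gen_buffer[BUF_SIZE];  /* shared by all threads */
static atomic_size_t next_region;          /* index of the next unclaimed region */

/* Per-thread allocation state: bumping ptr needs no lock. */
typedef struct {
    uint8_t *ptr;   /* next free byte in this thread's current region */
    uint8_t *end;   /* one past the end of that region */
} RegionCtx;

static _Thread_local RegionCtx ctx;

/* Claim a fresh region for the calling thread; returns 0 once the
 * buffer is exhausted. This is the only cross-thread synchronization. */
static int region_claim(void)
{
    size_t i = atomic_fetch_add(&next_region, 1);
    if (i >= N_REGIONS) {
        return 0;
    }
    ctx.ptr = code_gen_buffer + i * REGION_SIZE;
    ctx.end = ctx.ptr + REGION_SIZE;
    return 1;
}

/* Allocate len bytes of translated code from the calling thread's region,
 * moving to a new region when the current one cannot fit the request. */
static void *code_alloc(size_t len)
{
    if (len > REGION_SIZE) {
        return NULL;
    }
    if (ctx.ptr == NULL || (size_t)(ctx.end - ctx.ptr) < len) {
        if (!region_claim()) {
            return NULL;   /* all regions used: a real TCG would flush here */
        }
    }
    void *p = ctx.ptr;
    ctx.ptr += len;
    return p;
}
```

Because threads only synchronize when they claim a new region, most allocations touch nothing but thread-local state. The trade-off is that the unused tail of each region is wasted, which smaller regions reduce at the cost of claiming more often. Flushing and reusing the buffer once every region is taken is not modeled here.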
The series applies on top of the current master (b11365867568).
Thanks,
Emilio