[Qemu-devel] [PATCH v6 00/73] per-CPU locks

Emilio G. Cota posted 73 patches 6 years, 9 months ago
[Qemu-devel] [PATCH v6 00/73] per-CPU locks
Posted by Emilio G. Cota 6 years, 9 months ago
v5: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02979.html

For context, the goal of this series is to replace the BQL with
per-CPU locks in many places, notably the execution loop in cpus.c.
This leads to better scalability for MTTCG, since CPUs don't have
to acquire a contended global lock (the BQL) every time they
stop executing code.
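
To illustrate the idea, here is a minimal sketch using plain pthreads. It
is not code from this series: DemoCPU and the demo_* helpers are
hypothetical names, and the real CPUState carries far more state. The
point is only that each CPU's halted/interrupt_request fields are guarded
by that CPU's own mutex, so threads working on different CPUs do not
serialize on one global lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vCPU; this is NOT QEMU's CPUState. */
typedef struct DemoCPU {
    pthread_mutex_t lock;        /* protects the two fields below */
    bool halted;
    unsigned interrupt_request;
} DemoCPU;

static void demo_cpu_init(DemoCPU *cpu)
{
    int rc = pthread_mutex_init(&cpu->lock, NULL);

    assert(rc == 0);
    (void)rc;                    /* silence unused warning under NDEBUG */
    cpu->halted = false;
    cpu->interrupt_request = 0;
}

/* Mark the CPU as halted; only this CPU's lock is taken. */
static void demo_cpu_halt(DemoCPU *cpu)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->halted = true;
    pthread_mutex_unlock(&cpu->lock);
}

/*
 * Post an interrupt and wake the CPU. Callers targeting different
 * CPUs never contend with each other, unlike with a global lock.
 */
static void demo_cpu_raise_interrupt(DemoCPU *cpu, unsigned mask)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->interrupt_request |= mask;
    cpu->halted = false;
    pthread_mutex_unlock(&cpu->lock);
}

/* Atomically fetch and clear the pending interrupt mask. */
static unsigned demo_cpu_take_interrupts(DemoCPU *cpu)
{
    unsigned pending;

    pthread_mutex_lock(&cpu->lock);
    pending = cpu->interrupt_request;
    cpu->interrupt_request = 0;
    pthread_mutex_unlock(&cpu->lock);
    return pending;
}
```

In this sketch a vCPU thread would call demo_cpu_take_interrupts() on its
own DemoCPU each time it leaves the execution loop, while device or
other-CPU threads call demo_cpu_raise_interrupt() on the target.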

See the last commit for some performance numbers.

After this series, the remaining obstacles to achieving KVM-like
scalability in MTTCG are: (1) interrupt handling, which
in some targets requires the BQL, and (2) frequent execution of
"async safe" work.
That said, some targets scale great on MTTCG even before this
series -- for instance, when running a parallel compilation job
in an x86_64 guest, scalability is comparable to what we get with
KVM.

This series is very long. If you only have time to look at a few patches,
I suggest the following, which do most of the heavy lifting and
have not yet been reviewed:

- Patch 7: cpu: make per-CPU locks an alias of the BQL in TCG rr mode
- Patch 70: cpu: protect CPU state with cpu->lock instead of the BQL

I've tested all patches with `make check-qtest -j' for all targets.
The series is checkpatch-clean (just some warnings about __COVERITY__).

You can fetch the series from:
  https://github.com/cota/qemu/tree/cpu-lock-v6

---
Changes since v5:

- Rebase on current master
  + Fixed a few conflicts, and converted the references to cpu->halted and
    cpu->interrupt_request that had been added since v5.

- Add R-b's and Ack's -- thanks everyone!

Thanks,

		Emilio
---
 accel/tcg/cpu-exec.c            |  40 ++--
 accel/tcg/cputlb.c              |  10 +-
 accel/tcg/tcg-all.c             |  12 +-
 accel/tcg/tcg-runtime.c         |   7 +
 accel/tcg/tcg-runtime.h         |   2 +
 accel/tcg/translate-all.c       |   2 +-
 cpus-common.c                   | 129 ++++++++----
 cpus.c                          | 421 ++++++++++++++++++++++++++++++++--------
 exec.c                          |   2 +-
 gdbstub.c                       |   4 +-
 hw/arm/omap1.c                  |   4 +-
 hw/arm/pxa2xx_gpio.c            |   2 +-
 hw/arm/pxa2xx_pic.c             |   2 +-
 hw/intc/s390_flic.c             |   4 +-
 hw/mips/cps.c                   |   2 +-
 hw/misc/mips_itu.c              |   4 +-
 hw/openrisc/cputimer.c          |   2 +-
 hw/ppc/e500.c                   |   4 +-
 hw/ppc/ppc.c                    |  12 +-
 hw/ppc/ppce500_spin.c           |   6 +-
 hw/ppc/spapr_cpu_core.c         |   4 +-
 hw/ppc/spapr_hcall.c            |   4 +-
 hw/ppc/spapr_rtas.c             |   6 +-
 hw/sparc/leon3.c                |   2 +-
 hw/sparc/sun4m.c                |   8 +-
 hw/sparc64/sparc64.c            |   8 +-
 include/qom/cpu.h               | 189 +++++++++++++++---
 qom/cpu.c                       |  27 ++-
 stubs/Makefile.objs             |   1 +
 stubs/cpu-lock.c                |  28 +++
 target/alpha/cpu.c              |   8 +-
 target/alpha/translate.c        |   6 +-
 target/arm/arm-powerctl.c       |   4 +-
 target/arm/cpu.c                |   8 +-
 target/arm/helper.c             |  16 +-
 target/arm/machine.c            |   2 +-
 target/arm/op_helper.c          |   2 +-
 target/cris/cpu.c               |   2 +-
 target/cris/helper.c            |   6 +-
 target/cris/translate.c         |   5 +-
 target/hppa/cpu.c               |   2 +-
 target/hppa/translate.c         |   3 +-
 target/i386/cpu.c               |   4 +-
 target/i386/cpu.h               |   2 +-
 target/i386/hax-all.c           |  36 ++--
 target/i386/helper.c            |   8 +-
 target/i386/hvf/hvf.c           |  16 +-
 target/i386/hvf/x86hvf.c        |  38 ++--
 target/i386/kvm.c               |  78 ++++----
 target/i386/misc_helper.c       |   2 +-
 target/i386/seg_helper.c        |  13 +-
 target/i386/svm_helper.c        |   6 +-
 target/i386/whpx-all.c          |  57 +++---
 target/lm32/cpu.c               |   2 +-
 target/lm32/op_helper.c         |   4 +-
 target/m68k/cpu.c               |   2 +-
 target/m68k/op_helper.c         |   2 +-
 target/m68k/translate.c         |   9 +-
 target/microblaze/cpu.c         |   2 +-
 target/microblaze/translate.c   |   4 +-
 target/mips/cpu.c               |  11 +-
 target/mips/kvm.c               |   4 +-
 target/mips/op_helper.c         |   8 +-
 target/mips/translate.c         |   4 +-
 target/moxie/cpu.c              |   2 +-
 target/nios2/cpu.c              |   2 +-
 target/openrisc/cpu.c           |   4 +-
 target/openrisc/sys_helper.c    |   4 +-
 target/ppc/excp_helper.c        |   8 +-
 target/ppc/helper_regs.h        |   2 +-
 target/ppc/kvm.c                |   8 +-
 target/ppc/translate.c          |   6 +-
 target/ppc/translate_init.inc.c |  36 ++--
 target/riscv/cpu.c              |   5 +-
 target/riscv/op_helper.c        |   2 +-
 target/s390x/cpu.c              |  28 ++-
 target/s390x/excp_helper.c      |   4 +-
 target/s390x/kvm.c              |   2 +-
 target/s390x/sigp.c             |   8 +-
 target/sh4/cpu.c                |   2 +-
 target/sh4/helper.c             |   2 +-
 target/sh4/op_helper.c          |   2 +-
 target/sparc/cpu.c              |   6 +-
 target/sparc/helper.c           |   2 +-
 target/unicore32/cpu.c          |   2 +-
 target/unicore32/softmmu.c      |   2 +-
 target/xtensa/cpu.c             |   6 +-
 target/xtensa/exc_helper.c      |   2 +-
 target/xtensa/helper.c          |   2 +-
 89 files changed, 1018 insertions(+), 455 deletions(-)

Re: [Qemu-devel] [PATCH v6 00/73] per-CPU locks
Posted by Richard Henderson 6 years, 8 months ago
On 1/29/19 4:46 PM, Emilio G. Cota wrote:
> v5: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02979.html
> 
> For context, the goal of this series is to replace the BQL with
> per-CPU locks in many places, notably the execution loop in cpus.c.
> This leads to better scalability for MTTCG, since CPUs don't have
> to acquire a contended global lock (the BQL) every time they
> stop executing code.
> 
> See the last commit for some performance numbers.
> 
> After this series, the remaining obstacles to achieving KVM-like
> scalability in MTTCG are: (1) interrupt handling, which
> in some targets requires the BQL, and (2) frequent execution of
> "async safe" work.
> That said, some targets scale great on MTTCG even before this
> series -- for instance, when running a parallel compilation job
> in an x86_64 guest, scalability is comparable to what we get with
> KVM.
> 
> This series is very long. If you only have time to look at a few patches,
> I suggest the following, which do most of the heavy lifting and
> have not yet been reviewed:
> 
> - Patch 7: cpu: make per-CPU locks an alias of the BQL in TCG rr mode
> - Patch 70: cpu: protect CPU state with cpu->lock instead of the BQL
> 
> I've tested all patches with `make check-qtest -j' for all targets.
> The series is checkpatch-clean (just some warnings about __COVERITY__).
> 
> You can fetch the series from:
>   https://github.com/cota/qemu/tree/cpu-lock-v6

Thanks for the patience.  Both Alex and I have now completed review, and I
think this is ready for merge.

There are some patch conflicts with master, so if you can fix those and post a
v7, we'll get it merged right away.


r~

Re: [Qemu-devel] [PATCH v6 00/73] per-CPU locks
Posted by Emilio G. Cota 6 years, 8 months ago
On Wed, Feb 20, 2019 at 09:27:06 -0800, Richard Henderson wrote:
> Thanks for the patience.  Both Alex and I have now completed review, and I
> think this is ready for merge.
> 
> There are some patch conflicts with master, so if you can fix those and post a
> v7, we'll get it merged right away.

Thanks for reviewing! Will send a v7 in a few days.

		Emilio