[PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops

Daniel Henrique Barboza posted 8 patches 2 years, 4 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20211222134520.587877-1-danielhb413@gmail.com
Maintainers: David Gibson <david@gibson.dropbear.id.au>, Greg Kurz <groug@kaod.org>, "Cédric Le Goater" <clg@kaod.org>, Daniel Henrique Barboza <danielhb413@gmail.com>
There is a newer version of this series
Posted by Daniel Henrique Barboza 2 years, 4 months ago
Hi,

Two days ago Richard Henderson reported test failures with Avocado and
powernv8/9 due to timeouts [1]. The culprit turned out to be commit , a
commit where I introduced PMU instruction counting for TCG PPC64.

For a reason that is still unclear to me, these Avocado powernv tests are
suffering a huge performance impact after that patch, something that I
didn't observe in any other scenario I've tested. So one alternative to
fix the situation is to understand this difference and try to solve it,
which can take some time.

Another alternative is to optimize the code introduced by that commit.
Today the instruction count is done by a TCG helper that is called after
each TB exit. I was aware that calling a helper frequently isn't
optimal, but that got the job done and didn't hinder the use of
pSeries and powernv machines. Well, until [1] at least.

This series rewrites the PMU instruction counting using TCG Ops instead
of a TCG helper. To do that we needed to write in TCG Ops not only the
logic to increment the counters but also the logic to detect counter
overflows.
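
As a rough illustration of the two pieces of logic mentioned above
(incrementing the counters and detecting overflow), here is a small
Python model. It is not the code from this series and uses no QEMU API.
The names ToyPMU-style dict layout, inc_counters() and new_pmu() are
hypothetical, and the model assumes that a PMC is a 32-bit counter that
counts as overflowed once its most significant bit is set, and that only
PMC5 is counting instructions.

```python
# Assumption: a PMC is 32 bits wide and is considered overflowed
# ("counter negative") once its most significant bit is set.
PMC_COUNTER_NEGATIVE = 0x8000_0000
PMC_MASK = 0xFFFF_FFFF


def new_pmu():
    """Hypothetical PMU state: PMC1..PMC6 start at zero, only PMC5 counts insns."""
    return {
        "pmc": {n: 0 for n in range(1, 7)},
        "counting_insns": {5},
    }


def inc_counters(pmu, num_insns):
    """Add num_insns to every instruction-counting PMC.

    Returns True if any of those counters is at or above the overflow
    threshold after the increment.
    """
    overflow = False
    for n in sorted(pmu["counting_insns"]):
        # Wrap at 32 bits, as a hardware counter of that width would.
        pmu["pmc"][n] = (pmu["pmc"][n] + num_insns) & PMC_MASK
        if pmu["pmc"][n] >= PMC_COUNTER_NEGATIVE:
            overflow = True
    return overflow


if __name__ == "__main__":
    pmu = new_pmu()
    print(inc_counters(pmu, 10), hex(pmu["pmc"][5]))
```

With a helper, this logic runs out of line at every TB exit. Emitting the
equivalent operations inline avoids that call, at the cost of more
generated code per TB.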

A lot of code was added but the performance improvement is noticeable.
Using my local machine I did some test runs with the 2 Avocado powernv
tests that are timing out at this moment:

- failing Avocado powernv tests with current master:

 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.17 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.90 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.81 s)
 
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (75.62 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (69.79 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (72.33 s)

- after this series:

 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (39.90 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (38.25 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (37.99 s)

 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.17 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.64 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (44.21 s)
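
To put the runs above into a single figure per machine, the following
Python snippet averages the three timings of each test and computes the
ratio between them. The numbers are copied from the output above; the
dictionary layout and the speedup() name are only for illustration.

```python
from statistics import mean

# Wall-clock times in seconds, copied from the test runs listed above.
before = {
    "powernv8": [70.17, 70.90, 70.81],
    "powernv9": [75.62, 69.79, 72.33],
}
after = {
    "powernv8": [39.90, 38.25, 37.99],
    "powernv9": [43.17, 43.64, 44.21],
}


def speedup(machine):
    """Ratio of the mean time on master to the mean time with the series."""
    return mean(before[machine]) / mean(after[machine])


if __name__ == "__main__":
    for machine in before:
        print(
            f"{machine}: {mean(before[machine]):.2f} s -> "
            f"{mean(after[machine]):.2f} s ({speedup(machine):.2f}x)"
        )
```

On these samples the ratio comes out at roughly 1.8x for powernv8 and
roughly 1.7x for powernv9. Three runs per test is a small sample, so this
is an indication rather than a benchmark.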


I've also tested this code with the EBB exception patch that is pending
re-send [2]. The EBB kernel selftests are working as expected. This
means that we improved the performance and didn't lose any PMU
capability we already have.


[1] https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg03486.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg00082.html


Daniel Henrique Barboza (8):
  target/ppc: introduce power8-pmu-insn-cnt.c.inc
  target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc5()
  target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc1()
  target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc2()
  target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc3()
  target/ppc/power8-pmu-insn-cnt.c: add pmu_inc_pmc4()
  target/ppc/power8-pmu-insn-cnt: add pmu_check_overflow()
  target/ppc/power8-pmu.c: remove helper_insns_inc()

 target/ppc/helper.h                  |   2 +-
 target/ppc/power8-pmu-insn-cnt.c.inc | 365 +++++++++++++++++++++++++++
 target/ppc/power8-pmu.c              |  60 +----
 target/ppc/translate.c               |  44 +---
 4 files changed, 372 insertions(+), 99 deletions(-)
 create mode 100644 target/ppc/power8-pmu-insn-cnt.c.inc

-- 
2.33.1


Re: [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops
Posted by Richard Henderson 2 years, 4 months ago
On 12/22/21 5:45 AM, Daniel Henrique Barboza wrote:
> Hi,
> 
> Two days ago Richard Henderson reported test failures with Avocado and
> powernv8/9 due to timeouts [1]. The culprit turned out to be commit , a
> commit where I introduced PMU instruction counting for TCG PPC64.
> 
> For a reason that is still unclear to me these Avocado powernv tests are
> suffering a huge performance impact after that patch, something that I
> didn't verify in any other scenario I've tested. So one alternative to
> fix the situation is to understand this difference and try to solve it,
> which can take some time.
>   
> Another alternative is to optimize the code introduced by that commit.
> Today the instruction count is done by a TCG helper that is called after
> each TB exit. I was aware that calling a helper frequently isn't
> optimal, but that got the job done and didn't hinder the use of
> pSeries and powernv machines. Well, until [1] at least.
> 
> This series rewrites the PMU instruction counting using TCG Ops instead
> of a TCG helper. To do that we needed to write in TCG Ops not only the
> logic to increment the counters but also the logic to detect counter
> overflows.
> 
> A lot of code was added but the performance improvement is noticeable.
> Using my local machine I did some test runs with the 2 Avocado powernv
> tests that are timing out at this moment:

You generate a *lot* of inline code here.  Way too much, actually.

If you can get this performance improvement with this reorg, it merely means that your 
original C algorithm was poor.  The compiler should have been able to do better.

I've tested this theory here and...

> - failing Avocado powernv tests with current master:
> 
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.17 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.90 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.81 s)
>   
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (75.62 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (69.79 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (72.33 s)

boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (75.73 s)
boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (80.20 s)

> - after this series:
> 
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (39.90 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (38.25 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (37.99 s)
> 
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.17 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.64 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (44.21 s)

boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (39.66 s)
boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.02 s)

BTW, pre-power8-pmu, 29c4a3363b:

boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (36.62 s)
boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (39.69 s)

I'll post my series shortly.


r~