[PATCH 0/7] Embedded PPC misc clean up and optimisation

BALATON Zoltan posted 7 patches 11 months, 2 weeks ago
Maintainers: Daniel Henrique Barboza <danielhb413@gmail.com>, "Cédric Le Goater" <clg@kaod.org>, David Gibson <david@gibson.dropbear.id.au>, Greg Kurz <groug@kaod.org>
Posted by BALATON Zoltan 11 months, 2 weeks ago
Hello,

This series improves embedded PPC TLB emulation a bit and contains
some misc clean ups I found along the way. Before this series,
ppcemb_tlb_check() shows up in a memory-access-intensive profile
(running the RageMem speed test in AmigaOS on sam460ex) at 11.91%
children, 10.77% self. After this series it does not show up at all.
This is not the biggest bottleneck (that is calling tlb_flush() from
helper_440_tlbwe() excessively), but it was simpler to clean up and
still makes a small improvement.

RageMem results on master:
---> RAM <---
READ32:  593 MB/Sec
READ64:  616 MB/Sec
WRITE32: 589 MB/Sec
WRITE64: 621 MB/Sec
WRITE: 518 MB/Sec (Tricky)

---> VIDEO BUS <---
READ:  588 MB/Sec
WRITE: 571 MB/Sec

with this series:
---> RAM <---
READ32:  674 MB/Sec
READ64:  707 MB/Sec
WRITE32: 665 MB/Sec
WRITE64: 714 MB/Sec
WRITE: 580 MB/Sec (Tricky)

---> VIDEO BUS <---
READ:  691 MB/Sec
WRITE: 662 MB/Sec
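
To put the two result sets side by side, the relative gain per test can be computed as 100 * (after - before) / before. The sketch below only restates the figures quoted above; the type and function names (RageMemRow, gain_percent, print_gains) are made up for this illustration and are not part of RageMem or QEMU. With these numbers the gains work out to roughly 12% to 17.5%, which should be read with the jitter of the benchmark in mind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One benchmark line: throughput in MB/s on master and with the series. */
typedef struct {
    const char *name;
    int before;   /* MB/s on master */
    int after;    /* MB/s with this series applied */
} RageMemRow;

/* Figures copied from the two RageMem runs quoted in the cover letter. */
static const RageMemRow ragemem_rows[] = {
    { "RAM READ32",     593, 674 },
    { "RAM READ64",     616, 707 },
    { "RAM WRITE32",    589, 665 },
    { "RAM WRITE64",    621, 714 },
    { "RAM WRITE",      518, 580 },   /* the "Tricky" write test */
    { "VIDEO BUS READ", 588, 691 },
    { "VIDEO BUS WRITE", 571, 662 },
};

#define RAGEMEM_ROW_COUNT (sizeof(ragemem_rows) / sizeof(ragemem_rows[0]))

/* Relative improvement in percent; 'before' must be positive. */
static double gain_percent(int before, int after)
{
    assert(before > 0);
    return 100.0 * (after - before) / before;
}

/* Print one line per test with the computed gain. */
static void print_gains(void)
{
    for (size_t i = 0; i < RAGEMEM_ROW_COUNT; i++) {
        const RageMemRow *r = &ragemem_rows[i];
        printf("%-16s %4d -> %4d MB/s  (+%.1f%%)\n",
               r->name, r->before, r->after,
               gain_percent(r->before, r->after));
    }
}
```

Calling print_gains() from a main() prints one line per test; a single run on each side cannot separate the improvement from run-to-run noise.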

The results have some jitter, but both the higher values and the fact
that the function is gone from the profile show that the series has an
effect, if nothing else by simplifying the code a bit. For comparison,
this is faster than a real sam460ex but much slower than running the
same with -M pegasos2, so embedded PPC TLB emulation might still need
some improvement. I know these are different and that PPC440 has a
software-assisted TLB, but the problem seems to be too many
tlb_flush() calls, not that it needs more exceptions.
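
Why frequent full flushes hurt can be seen in a toy model. This is not QEMU code: ToyCache, toy_access() and the other names are invented here, and the direct-mapped cache is only a stand-in for a cache of translations in front of a slow lookup. The sketch counts how often the slow path runs when every guest TLB entry write discards the whole cache, compared with discarding only the one affected page. Whether a narrower invalidation would be correct for the 440 is not something this sketch addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS      64   /* slots in the toy translation cache */
#define TOY_PAGE_SHIFT 12   /* 4 KiB pages, chosen only for illustration */

typedef struct {
    bool     valid;
    uint32_t vpage;         /* virtual page number cached in this slot */
} ToySlot;

typedef struct {
    ToySlot       slot[TOY_SLOTS];
    unsigned long refills;  /* how many times the slow path had to run */
} ToyCache;

/* Discard every cached translation. */
static void toy_flush_all(ToyCache *c)
{
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        c->slot[i].valid = false;
    }
}

/* Discard only the cached translation for one virtual page, if present. */
static void toy_flush_page(ToyCache *c, uint32_t vpage)
{
    ToySlot *s = &c->slot[vpage % TOY_SLOTS];
    if (s->valid && s->vpage == vpage) {
        s->valid = false;
    }
}

/* One memory access: a miss in the cache counts as a slow-path refill. */
static void toy_access(ToyCache *c, uint32_t addr)
{
    uint32_t vpage = addr >> TOY_PAGE_SHIFT;
    ToySlot *s = &c->slot[vpage % TOY_SLOTS];
    if (!s->valid || s->vpage != vpage) {
        c->refills++;
        s->valid = true;
        s->vpage = vpage;
    }
}

/*
 * Touch 'pages' pages round-robin for 'accesses' accesses. After every
 * 'period' accesses the guest is assumed to rewrite the TLB entry for
 * page 0, which triggers either a full flush or a single-page flush.
 * Returns the number of slow-path refills.
 */
static unsigned long toy_run(bool full_flush, unsigned pages,
                             unsigned accesses, unsigned period)
{
    ToyCache c = { 0 };

    assert(pages > 0 && period > 0);
    for (unsigned i = 0; i < accesses; i++) {
        toy_access(&c, (uint32_t)(i % pages) << TOY_PAGE_SHIFT);
        if ((i + 1) % period == 0) {
            if (full_flush) {
                toy_flush_all(&c);
            } else {
                toy_flush_page(&c, 0);
            }
        }
    }
    return c.refills;
}
```

Comparing toy_run(true, 32, 3200, 32) with toy_run(false, 32, 3200, 32) shows the effect: with a full flush on every entry write, each access in this pattern goes through the slow path again, while the single-page variant refills only the page that was rewritten.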

(If somebody is interested in reproducing and experimenting with it, the
benchmarks and some results are available here:
https://www.amigans.net/modules/newbb/viewtopic.php?topic_id=9226
Some of the tests also have MorphOS versions, which are easier to get than
AmigaOS, or sources that could be compiled under Linux.)

Regards,
BALATON Zoltan

BALATON Zoltan (7):
  target/ppc: Remove single use function
  target/ppc: Remove "ext" parameter of ppcemb_tlb_check()
  target/ppc: Move ppcemb_tlb_search() to mmu_common.c
  target/ppc: Remove some unneded line breaks
  target/ppc: Simplify ppcemb_tlb_search()
  target/ppc: Change ppcemb_tlb_check() to return bool
  target/ppc: Eliminate goto in mmubooke_check_tlb()

 target/ppc/cpu.h        | 13 ++----
 target/ppc/mmu_common.c | 91 +++++++++++++++++++++++------------------
 target/ppc/mmu_helper.c | 32 +--------------
 3 files changed, 57 insertions(+), 79 deletions(-)
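
The patch bodies are not part of this cover letter, so the following is only a hypothetical, self-contained sketch of the shape the patch titles suggest: a per-entry check that reports a match as a bool and hands the translated address back through an out-parameter, plus a search loop built on it. The struct layout, the names (ToyTlbEntry, toy_tlb_check, toy_tlb_search) and the rule that a PID of 0 matches any process are assumptions of this sketch, not the code in target/ppc/mmu_common.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for an embedded PPC TLB entry (assumed layout). */
typedef struct {
    uint64_t rpn;    /* real (physical) page base address */
    uint32_t epn;    /* effective (virtual) page base address */
    uint32_t size;   /* page size in bytes, must be a power of two */
    uint32_t pid;    /* owning process ID; 0 is treated as "any" here */
    bool     valid;
} ToyTlbEntry;

/*
 * Return true if 'tlb' translates 'address' for 'pid', storing the
 * physical address in *raddrp. On a miss *raddrp is left untouched.
 */
static bool toy_tlb_check(const ToyTlbEntry *tlb, uint64_t *raddrp,
                          uint32_t address, uint32_t pid)
{
    uint32_t mask;

    if (!tlb->valid) {
        return false;
    }
    assert(tlb->size != 0 && (tlb->size & (tlb->size - 1)) == 0);
    mask = ~(tlb->size - 1);

    /* Check PID */
    if (tlb->pid != 0 && tlb->pid != pid) {
        return false;
    }
    /* Check effective address */
    if ((address & mask) != tlb->epn) {
        return false;
    }
    /* Page base from the entry, offset within the page from the address */
    *raddrp = (tlb->rpn & ~(uint64_t)(tlb->size - 1)) | (address & ~mask);
    return true;
}

/* Return the index of the first matching entry, or -1 if none matches. */
static int toy_tlb_search(const ToyTlbEntry *tlbs, int n,
                          uint32_t address, uint32_t pid)
{
    uint64_t raddr;

    for (int i = 0; i < n; i++) {
        if (toy_tlb_check(&tlbs[i], &raddr, address, pid)) {
            return i;
        }
    }
    return -1;
}
```

With a bool return the caller can write the lookup as a plain condition, for example if (toy_tlb_check(&entry, &raddr, addr, pid)) { ... }, without comparing against a sentinel value.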

-- 
2.30.9
Re: [PATCH 0/7] Embedded PPC misc clean up and optimisation
Posted by Daniel Henrique Barboza 11 months, 1 week ago
Queued in gitlab.com/danielhb/qemu/tree/ppc-next. Thanks,


Daniel

On 5/30/23 10:28, BALATON Zoltan wrote:
> [...]