[Qemu-devel] [PATCH v5 0/8] target/ppc: Optimize emulation of some Altivec instructions

Stefan Brankovic posted 8 patches 4 years, 8 months ago
Test docker-clang@ubuntu passed
Test s390x passed
Test asan passed
Test docker-mingw@fedora passed
Test FreeBSD passed
Test checkpatch passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/1563200574-11098-1-git-send-email-stefan.brankovic@rt-rk.com
Maintainers: David Gibson <david@gibson.dropbear.id.au>
There is a newer version of this series
target/ppc/helper.h                 |  10 -
target/ppc/int_helper.c             | 365 --------------------
target/ppc/translate/vmx-impl.inc.c | 656 ++++++++++++++++++++++++++++++++----
3 files changed, 587 insertions(+), 444 deletions(-)
[Qemu-devel] [PATCH v5 0/8] target/ppc: Optimize emulation of some Altivec instructions
Posted by Stefan Brankovic 4 years, 8 months ago
Optimize emulation of ten Altivec instructions: lvsl, lvsr, vsl, vsr, vpkpx,
vgbbd, vclzb, vclzh, vclzw and vclzd.

This series builds upon and complements recent work by Thomas Murta, Mark
Cave-Ayland and Richard Henderson in the same area. It implements the
selected instructions as TCG translations instead of using helpers. Most of
these instructions are idiosyncratic to the ppc platform and cannot be mapped
directly to a host instruction, so a relatively complex TCG translation seems
to be the best option; that is the approach taken in this series. The
performance improvements are significant in all cases.

V5:

Fixed the vpkpx bug and added its optimization back to the series.
Fixed graphical distortions on OSX 10.3 and 10.4.
Dropped the conversion of vmrgh and vmrgl instructions to vector operations
pending further investigation.

V4:

Addressed Richard Henderson's suggestions.
Removed the vpkpx optimization pending investigation of the graphical
distortions it caused on OSX 10.2-10.4 guests.
Added TCG opcodes for vector vmrgh(b|h|w) and vmrgl(b|h|w).
Implemented vector vmrgh and vmrgl instructions for i386.
Converted vmrgh and vmrgl instructions to vector operations.

V3:

Fixed a build problem.

V2:

Addressed Richard Henderson's suggestions.
Fixed a build problem in patch 2/8.
Rebased the series onto the latest QEMU code.

Stefan Brankovic (8):
  target/ppc: Optimize emulation of lvsl and lvsr instructions
  target/ppc: Optimize emulation of vsl and vsr instructions
  target/ppc: Optimize emulation of vpkpx instruction
  target/ppc: Optimize emulation of vgbbd instruction
  target/ppc: Optimize emulation of vclzd instruction
  target/ppc: Optimize emulation of vclzw instruction
  target/ppc: Optimize emulation of vclzh and vclzb instructions
  target/ppc: Refactor emulation of vmrgew and vmrgow instructions

 target/ppc/helper.h                 |  10 -
 target/ppc/int_helper.c             | 365 --------------------
 target/ppc/translate/vmx-impl.inc.c | 656 ++++++++++++++++++++++++++++++++----
 3 files changed, 587 insertions(+), 444 deletions(-)

-- 
2.7.4


Re: [Qemu-devel] [PATCH v5 0/8] target/ppc: Optimize emulation of some Altivec instructions
Posted by David Gibson 4 years, 7 months ago
On Mon, Jul 15, 2019 at 04:22:46PM +0200, Stefan Brankovic wrote:
> Optimize emulation of ten Altivec instructions: lvsl, lvsr, vsl, vsr, vpkpx,
> vgbbd, vclzb, vclzh, vclzw and vclzd.
> 
> This series builds upon and complements recent work by Thomas Murta, Mark
> Cave-Ayland and Richard Henderson in the same area. It implements the
> selected instructions as TCG translations instead of using helpers. Most of
> these instructions are idiosyncratic to the ppc platform and cannot be mapped
> directly to a host instruction, so a relatively complex TCG translation seems
> to be the best option; that is the approach taken in this series. The
> performance improvements are significant in all cases.

I was waiting for acks from rth on the remainder of this series, but
it seems to have been forgotten.  Can you rebase and resend the
remaining patches for inclusion in ppc-for-4.2?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH v5 0/8] target/ppc: Optimize emulation of some Altivec instructions
Posted by David Gibson 4 years, 8 months ago
On Mon, Jul 15, 2019 at 04:22:46PM +0200, Stefan Brankovic wrote:
> Optimize emulation of ten Altivec instructions: lvsl, lvsr, vsl, vsr, vpkpx,
> vgbbd, vclzb, vclzh, vclzw and vclzd.
> 
> This series builds upon and complements recent work by Thomas Murta, Mark
> Cave-Ayland and Richard Henderson in the same area. It implements the
> selected instructions as TCG translations instead of using helpers. Most of
> these instructions are idiosyncratic to the ppc platform and cannot be mapped
> directly to a host instruction, so a relatively complex TCG translation seems
> to be the best option; that is the approach taken in this series. The
> performance improvements are significant in all cases.

I've applied 1 & 2; I'm waiting on Richard's ack for the rest.

Re: [Qemu-devel] [PATCH v5 0/8] target/ppc: Optimize emulation of some Altivec instructions
Posted by David Gibson 4 years, 8 months ago
On Mon, Jul 15, 2019 at 04:22:46PM +0200, Stefan Brankovic wrote:
> Optimize emulation of ten Altivec instructions: lvsl, lvsr, vsl, vsr, vpkpx,
> vgbbd, vclzb, vclzh, vclzw and vclzd.
> 
> This series builds upon and complements recent work by Thomas Murta, Mark
> Cave-Ayland and Richard Henderson in the same area. It implements the
> selected instructions as TCG translations instead of using helpers. Most of
> these instructions are idiosyncratic to the ppc platform and cannot be mapped
> directly to a host instruction, so a relatively complex TCG translation seems
> to be the best option; that is the approach taken in this series. The
> performance improvements are significant in all cases.

I've now also applied patches 4-6 to ppc-for-4.2, leaving just the
ones which don't have acks from rth yet.
