[PATCH 0/3] tcg/i386: Improve 8-bit shifts with VGF2P8AFFINEQB

Richard Henderson posted 3 patches 3 months, 1 week ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20250809234208.12158-1-richard.henderson@linaro.org
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>
[PATCH 0/3] tcg/i386: Improve 8-bit shifts with VGF2P8AFFINEQB
Posted by Richard Henderson 3 months, 1 week ago
x86 doesn't directly support 8-bit vector shifts, so we have
some 2 to 5 insn expansions.  With VGF2P8AFFINEQB, we can do
it in 1 insn, plus a (possibly shared) constant load.
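
To illustrate why a single affine instruction is enough, here is a minimal
scalar sketch of one byte lane, assuming the bit ordering in the Intel SDM
pseudocode for GF2P8AFFINEQB (result bit i is the parity of matrix byte 7-i
ANDed with the source byte, XORed with bit i of the immediate). The names
gf2p8affine_byte, shift_matrix and count_shift_mismatches are hypothetical
and are not taken from the patches; the constants the series actually emits
may be built differently.

```c
#include <stdint.h>

enum shift_kind { SHIFT_SHL, SHIFT_SHR, SHIFT_SAR };

/* Parity (XOR of all bits) of one byte. */
static unsigned parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Scalar model of one byte lane of GF2P8AFFINEQB (assumed SDM bit order). */
static uint8_t gf2p8affine_byte(uint64_t matrix, uint8_t x, uint8_t imm)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        /* Row for result bit i lives in matrix byte 7 - i. */
        uint8_t row = (uint8_t)(matrix >> (8 * (7 - i)));
        unsigned bit = parity8(row & x) ^ ((imm >> i) & 1u);
        out |= (uint8_t)(bit << i);
    }
    return out;
}

/* Build the 8x8 bit matrix that performs a shift by n (0..7). */
static uint64_t shift_matrix(enum shift_kind kind, int n)
{
    uint64_t m = 0;
    for (int i = 0; i < 8; i++) {
        int src;                      /* source bit feeding result bit i */
        switch (kind) {
        case SHIFT_SHL: src = i - n; break;
        case SHIFT_SHR: src = i + n; break;
        default:        src = (i + n > 7) ? 7 : i + n; break; /* sign fill */
        }
        if (src >= 0 && src <= 7) {
            m |= ((uint64_t)1 << src) << (8 * (7 - i));
        }
    }
    return m;
}

/* Exhaustively compare the model against plain C shifts; 0 means all match. */
static int count_shift_mismatches(void)
{
    int bad = 0;
    for (int n = 0; n < 8; n++) {
        for (int v = 0; v < 256; v++) {
            uint8_t x = (uint8_t)v;
            uint8_t fill = (x & 0x80) ? (uint8_t)(0xff << (8 - n)) : 0;
            bad += gf2p8affine_byte(shift_matrix(SHIFT_SHL, n), x, 0)
                   != (uint8_t)(x << n);
            bad += gf2p8affine_byte(shift_matrix(SHIFT_SHR, n), x, 0)
                   != (uint8_t)(x >> n);
            bad += gf2p8affine_byte(shift_matrix(SHIFT_SAR, n), x, 0)
                   != (uint8_t)((x >> n) | fill);
        }
    }
    return bad;
}
```

Each shift count maps to one 64-bit matrix, so the same constant can be
broadcast to every lane and reused by any shift with the same count, which
is where the "possibly shared" constant load comes from.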


r~


Richard Henderson (3):
  cpuinfo/i386: Detect GFNI as an AVX extension
  tcg/i386: Add INDEX_op_x86_vgf2p8affineqb_vec
  tcg/i386: Use vgf2p8affineqb for MO_8 vector shifts

 host/include/i386/host/cpuinfo.h |  1 +
 include/qemu/cpuid.h             |  3 ++
 util/cpuinfo-i386.c              |  1 +
 tcg/i386/tcg-target-opc.h.inc    |  1 +
 tcg/i386/tcg-target.c.inc        | 81 ++++++++++++++++++++++++++++++--
 5 files changed, 83 insertions(+), 4 deletions(-)

-- 
2.43.0
Re: [PATCH 0/3] tcg/i386: Improve 8-bit shifts with VGF2P8AFFINEQB
Posted by Richard Henderson 2 months, 2 weeks ago
On 8/10/25 09:42, Richard Henderson wrote:
> x86 doesn't directly support 8-bit vector shifts, so we have
> some 2 to 5 insn expansions.  With VGF2P8AFFINEQB, we can do
> it in 1 insn, plus a (possibly shared) constant load.
> 
> 
> r~
> 
> 
> Richard Henderson (3):
>    cpuinfo/i386: Detect GFNI as an AVX extension
>    tcg/i386: Add INDEX_op_x86_vgf2p8affineqb_vec
>    tcg/i386: Use vgf2p8affineqb for MO_8 vector shifts
> 
>   host/include/i386/host/cpuinfo.h |  1 +
>   include/qemu/cpuid.h             |  3 ++
>   util/cpuinfo-i386.c              |  1 +
>   tcg/i386/tcg-target-opc.h.inc    |  1 +
>   tcg/i386/tcg-target.c.inc        | 81 ++++++++++++++++++++++++++++++--
>   5 files changed, 83 insertions(+), 4 deletions(-)
> 

Ping.

r~
Re: [PATCH 0/3] tcg/i386: Improve 8-bit shifts with VGF2P8AFFINEQB
Posted by Paolo Bonzini 2 months, 2 weeks ago
On 8/27/25 09:26, Richard Henderson wrote:
> On 8/10/25 09:42, Richard Henderson wrote:
>> x86 doesn't directly support 8-bit vector shifts, so we have
>> some 2 to 5 insn expansions.  With VGF2P8AFFINEQB, we can do
>> it in 1 insn, plus a (possibly shared) constant load.
>>
>>
>> r~
>>
>>
>> Richard Henderson (3):
>>    cpuinfo/i386: Detect GFNI as an AVX extension
>>    tcg/i386: Add INDEX_op_x86_vgf2p8affineqb_vec
>>    tcg/i386: Use vgf2p8affineqb for MO_8 vector shifts
>>
>>   host/include/i386/host/cpuinfo.h |  1 +
>>   include/qemu/cpuid.h             |  3 ++
>>   util/cpuinfo-i386.c              |  1 +
>>   tcg/i386/tcg-target-opc.h.inc    |  1 +
>>   tcg/i386/tcg-target.c.inc        | 81 ++++++++++++++++++++++++++++++--
>>   5 files changed, 83 insertions(+), 4 deletions(-)
>>
> 
> Ping.

I don't know the target-independent part of TCG, but arithmetic right 
shift by 7 probably should keep using pcmpgtb?

There's also a typo in patch 1 (s/NF/FN/):

+#ifndef bit_GNFI
+#define bit_GNFI        (1 << 8)
+#endif

Paolo


Re: [PATCH 0/3] tcg/i386: Improve 8-bit shifts with VGF2P8AFFINEQB
Posted by Richard Henderson 2 months, 2 weeks ago
On 8/27/25 18:12, Paolo Bonzini wrote:
> On 8/27/25 09:26, Richard Henderson wrote:
>> On 8/10/25 09:42, Richard Henderson wrote:
>>> x86 doesn't directly support 8-bit vector shifts, so we have
>>> some 2 to 5 insn expansions.  With VGF2P8AFFINEQB, we can do
>>> it in 1 insn, plus a (possibly shared) constant load.
>>>
>>>
>>> r~
>>>
>>>
>>> Richard Henderson (3):
>>>    cpuinfo/i386: Detect GFNI as an AVX extension
>>>    tcg/i386: Add INDEX_op_x86_vgf2p8affineqb_vec
>>>    tcg/i386: Use vgf2p8affineqb for MO_8 vector shifts
>>>
>>>   host/include/i386/host/cpuinfo.h |  1 +
>>>   include/qemu/cpuid.h             |  3 ++
>>>   util/cpuinfo-i386.c              |  1 +
>>>   tcg/i386/tcg-target-opc.h.inc    |  1 +
>>>   tcg/i386/tcg-target.c.inc        | 81 ++++++++++++++++++++++++++++++--
>>>   5 files changed, 83 insertions(+), 4 deletions(-)
>>>
>>
>> Ping.
> 
> I don't know the target-independent part of TCG, but arithmetic right shift by 7 probably 
> should keep using pcmpgtb?

There is no such target-independent expansion.  But it's a good idea to add to x86, at 
least for the two sizes that don't directly support such a shift.
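
For reference, a minimal scalar sketch of the equivalence behind the pcmpgtb
suggestion: an arithmetic right shift by 7 replicates the sign bit into every
bit of the byte, which is the same all-ones/all-zeros mask that a signed
"0 > x" compare produces. The helper names are hypothetical, and this models
a single byte lane rather than the actual TCG expansion.

```c
#include <stdint.h>

/* Interpret a byte as a signed 8-bit value without relying on casts. */
static int to_signed8(uint8_t x)
{
    return x >= 128 ? (int)x - 256 : (int)x;
}

/* Scalar model of one PCMPGTB lane: all-ones if a > b (signed), else 0. */
static uint8_t pcmpgtb_lane(int a, int b)
{
    return a > b ? 0xff : 0x00;
}

/* Arithmetic right shift by 7, expressed as the compare 0 > x. */
static uint8_t sar7_via_pcmpgtb(uint8_t x)
{
    return pcmpgtb_lane(0, to_signed8(x));
}

/* Reference: replicate bit 7 into all eight bits. */
static uint8_t sar7_reference(uint8_t x)
{
    return (uint8_t)((x >> 7) * 0xff);
}

/* Exhaustive check over all byte values; 0 means the two forms agree. */
static int count_sar7_mismatches(void)
{
    int bad = 0;
    for (int v = 0; v < 256; v++) {
        bad += sar7_via_pcmpgtb((uint8_t)v) != sar7_reference((uint8_t)v);
    }
    return bad;
}
```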


r~