[PATCH 0/2] tcg: improve instruction selection for extract and deposit_z

Paolo Bonzini posted 2 patches 3 weeks, 3 days ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260115135453.140870-1-pbonzini@redhat.com
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>
[PATCH 0/2] tcg: improve instruction selection for extract and deposit_z
Posted by Paolo Bonzini 3 weeks, 3 days ago
extract and deposit_z are similar operations, only differing in
that extract shifts the operand right and deposit_z shifts it left.
However, their code generation is currently different.

extract is implemented as either SHL+SHR or SHR+AND, with the latter
chosen for "simple" cases where we expect the immediate to be available
or a zero extension instruction to be usable.  deposit instead uses only
AND+SHL, though SHL+SHR would be just as usable.
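
As an illustration of why the two lowerings are interchangeable, here is a
minimal sketch in plain C on 32-bit values. It is not the TCG code: the
function names are made up for this example, and it assumes 1 <= len and
ofs + len <= 32 so that every shift count stays below 32.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low len bits set; len == 32 is handled separately
 * because shifting a 32-bit value by 32 is undefined in C. */
static uint32_t low_mask(unsigned len)
{
    return len == 32 ? UINT32_MAX : (UINT32_C(1) << len) - 1;
}

/* extract: return bits [ofs, ofs + len) of x, zero-extended. */
static uint32_t extract_shr_and(uint32_t x, unsigned ofs, unsigned len)
{
    return (x >> ofs) & low_mask(len);
}

static uint32_t extract_shl_shr(uint32_t x, unsigned ofs, unsigned len)
{
    /* Push the field to the top, then bring it down to bit 0. */
    return (x << (32 - len - ofs)) >> (32 - len);
}

/* deposit_z: place the low len bits of x at bit ofs, zero elsewhere. */
static uint32_t deposit_z_and_shl(uint32_t x, unsigned ofs, unsigned len)
{
    return (x & low_mask(len)) << ofs;
}

static uint32_t deposit_z_shl_shr(uint32_t x, unsigned ofs, unsigned len)
{
    /* Push the field to the top, then bring it down to bit ofs. */
    return (x << (32 - len)) >> (32 - len - ofs);
}
```

The mask form needs the constant (1 << len) - 1 as an operand, while the
two-shift form needs only shift counts; that difference is what makes the
choice host-dependent.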

To get the best of both worlds, introduce tcg_op_imm_match to check
whether the processor supports the immediate that is needed for the mask,
and if not fall back to two shifts.
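
The selection rule described above could be sketched roughly as follows.
This is only a toy model: the cover letter does not show the signature of
tcg_op_imm_match, so the predicate toy_and_imm_ok() and its 12-bit limit
are assumptions standing in for a real host constraint, and the special
case for zero-extension instructions is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum lowering {
    LOWER_MASK,        /* SHR+AND for extract, AND+SHL for deposit_z */
    LOWER_TWO_SHIFTS   /* SHL+SHR for both operations */
};

/* Hypothetical host rule: AND accepts only immediates that fit in
 * 12 bits.  A real backend would consult its operand constraints. */
static bool toy_and_imm_ok(uint32_t imm)
{
    return imm <= 0xFFF;
}

/* Use the mask form when the mask is encodable as an immediate,
 * otherwise fall back to two shifts.  Assumes 1 <= len <= 32. */
static enum lowering pick_lowering(unsigned len)
{
    uint32_t mask = len == 32 ? UINT32_MAX : (UINT32_C(1) << len) - 1;

    return toy_and_imm_ok(mask) ? LOWER_MASK : LOWER_TWO_SHIFTS;
}
```

With this toy limit, a 12-bit field still uses the mask form and a 13-bit
field falls back to two shifts.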

Paolo

Paolo Bonzini (2):
  tcg: target-dependent lowering of extract to shr/and
  tcg: possibly convert deposit_z to shl+shr

 include/tcg/tcg.h |  2 ++
 tcg/tcg-op.c      | 58 +++++++++++++++++++++++++++++++----------------
 tcg/tcg.c         | 23 +++++++++++++++----
 3 files changed, 59 insertions(+), 24 deletions(-)

-- 
2.52.0
Re: [PATCH 0/2] tcg: improve instruction selection for extract and deposit_z
Posted by Richard Henderson 3 weeks, 1 day ago
On 1/16/26 00:54, Paolo Bonzini wrote:
> extract and deposit_z are similar operations, only differing in
> that extract shifts the operand right and deposit_z shifts it left.
> However, their code generation is currently different.
> 
> extract is implemented as either SHL+SHR or SHR+AND, with the latter
> chosen for "simple" cases where we expect the immediate to be available
> or a zero extension instruction to be usable.  deposit instead uses only
> AND+SHL, though SHL+SHR would be just as usable.
> 
> To get the best of both worlds, introduce tcg_op_imm_match to check
> whether the processor supports the immediate that is needed for the mask,
> and if not fall back to two shifts.

Hmm.

I have a patch set that's been stagnant for a while, currently waiting on the removal
of 32-bit hosts, which delays expansion of extract and deposit until optimize, when we can
see the input constants.

There's probably some overlap here.


r~