[PATCH for-8.0 v3 00/45] tcg: Support for Int128 with helpers

Richard Henderson posted 45 patches 1 year, 6 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20221111074101.2069454-1-richard.henderson@linaro.org
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Riku Voipio <riku.voipio@iki.fi>, "Marc-André Lureau" <marcandre.lureau@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Thomas Huth <thuth@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>, Artyom Tarasenko <atar4qemu@gmail.com>, WANG Xuerui <git@xen0n.name>, Aurelien Jarno <aurelien@aurel32.net>, Huacai Chen <chenhuacai@kernel.org>, Jiaxun Yang <jiaxun.yang@flygoat.com>, Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>, Palmer Dabbelt <palmer@dabbelt.com>, Alistair Francis <Alistair.Francis@wdc.com>, Stefan Weil <sw@weilnetz.de>
There is a newer version of this series
accel/tcg/tcg-runtime.h          |   11 +
include/exec/cpu_ldst.h          |   10 +
include/exec/helper-head.h       |    9 +-
include/qemu/atomic128.h         |   29 +-
include/qemu/int128.h            |   25 +-
include/tcg/tcg-op.h             |   50 +-
include/tcg/tcg.h                |  145 ++-
tcg/aarch64/tcg-target.h         |    6 +-
tcg/arm/tcg-target-con-set.h     |    7 +-
tcg/arm/tcg-target-con-str.h     |    2 +
tcg/arm/tcg-target.h             |    6 +-
tcg/i386/tcg-target.h            |   12 +
tcg/loongarch64/tcg-target.h     |    5 +-
tcg/mips/tcg-target.h            |    6 +-
tcg/riscv/tcg-target.h           |   10 +-
tcg/s390x/tcg-target-con-set.h   |    4 +-
tcg/s390x/tcg-target-con-str.h   |    8 +-
tcg/s390x/tcg-target.h           |    5 +-
tcg/sparc64/tcg-target.h         |    5 +-
tcg/tcg-internal.h               |   75 +-
tcg/tci/tcg-target.h             |   10 +
accel/tcg/cputlb.c               |  112 ++
accel/tcg/plugin-gen.c           |   54 +-
accel/tcg/user-exec.c            |   66 ++
hw/core/cpu-common.c             |    1 +
target/sparc/translate.c         |   21 +-
tcg/optimize.c                   |   10 +-
tcg/tcg-op-vec.c                 |   10 +-
tcg/tcg-op.c                     |  442 ++++++--
tcg/tcg.c                        | 1679 +++++++++++++++++++++---------
tcg/tci.c                        |   66 +-
util/int128.c                    |   42 +
accel/tcg/atomic_common.c.inc    |   45 +
meson.build                      |    4 +-
tcg/aarch64/tcg-target.c.inc     |   36 +-
tcg/arm/tcg-target.c.inc         |   68 +-
tcg/i386/tcg-target.c.inc        |   57 +-
tcg/loongarch64/tcg-target.c.inc |   24 +-
tcg/mips/tcg-target.c.inc        |   20 +-
tcg/ppc/tcg-target.c.inc         |   56 +-
tcg/riscv/tcg-target.c.inc       |   24 +-
tcg/s390x/tcg-target.c.inc       |   71 +-
tcg/sparc64/tcg-target.c.inc     |   22 +-
tcg/tci/tcg-target.c.inc         |   36 +-
44 files changed, 2506 insertions(+), 900 deletions(-)
This is working toward improving atomicity within TCG, especially
with respect to Arm FEAT_LSE2, which guarantees that any operation
that does not cross a 16-byte boundary is treated atomically.

(Incidentally, I've also stumbled across language in the Intel SDM
that shows the feature is required there too, and the guarantee is
even bigger -- anything that does not cross a cache line boundary.
Given that we've only ever got 16-byte atomic operations on other
hosts, there's no chance of supporting full cache line generically.
But we could turn on the same "within 16" atomicity that will be
used for FEAT_LSE2.)
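
As a rough illustration of the "within 16" rule described above, the
check reduces to asking whether the first and last byte of an access
fall in the same aligned block.  This is only a sketch: the name
access_within_block and its parameters are hypothetical and are not
taken from this series or from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: return true if the access
 * [addr, addr + size) lies entirely within one naturally aligned
 * block of 'align' bytes.  'align' must be a power of two.
 * Address wrap-around at the top of the address space is ignored.
 */
static bool access_within_block(uint64_t addr, unsigned size, unsigned align)
{
    assert(size > 0);
    assert(align != 0 && (align & (align - 1)) == 0);

    /* The first and last byte must round down to the same block base. */
    uint64_t mask = ~(uint64_t)(align - 1);
    uint64_t first = addr & mask;
    uint64_t last = (addr + size - 1) & mask;

    return first == last;
}
```

With align set to 16 this models the FEAT_LSE2-style condition; with
align set to an assumed 64-byte cache line it models the wider
condition mentioned for Intel.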

That goal is somewhat down the road.  This patch set contains two
items: paired register allocation and TCGv_i128 usage with helpers.

The next step will be putting these two together to provide atomic
128-bit load/store operations within TCG, via e.g. AArch64 LDP,
Power7 LDQ, or S390x LPQ, all of which require allocating a pair
of registers.  (Intel will require that I go through AVX, which is
a bit of a complication, but I'll figure that out.)  And then of
course separately via the helpers used by the slow path.
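
To make the register-pair requirement concrete, here is a minimal
sketch of picking a pair from a set of free registers.  It assumes,
purely for illustration, a host that wants an even-numbered register
followed by its odd neighbour (as s390x does for LPQ); the function
find_free_even_pair and the bitmask representation are hypothetical
and are not the allocator implemented in this series.

```c
#include <stdint.h>

/*
 * Hypothetical sketch, not QEMU's allocator: find the lowest even
 * register R such that both R and R + 1 are free.  Bit N of
 * 'free_mask' set means register N is free.  Returns -1 if no
 * such pair exists.
 */
static int find_free_even_pair(uint32_t free_mask)
{
    /* Bit N set in 'both' means registers N and N + 1 are both free. */
    uint32_t both = free_mask & (free_mask >> 1);

    /* Keep only candidates whose first register is even-numbered. */
    both &= UINT32_C(0x55555555);

    for (int r = 0; r < 32; r += 2) {
        if (both & (UINT32_C(1) << r)) {
            return r;
        }
    }
    return -1;
}
```

Note that two adjacent free registers are not enough on such a host:
registers 1 and 2 being free does not yield a usable pair, which is
part of why pairs need dedicated handling in the allocator.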

Patches for target/ to use and test this to follow.

Changes for v3:
  * Testing showed that trying to make things "easier" for the
    register allocator on 32-bit hosts by keeping TCGv_i128 as
    a blob was a mistake.  Now split to 4 parts, similar to
    how we treat TCGv_i64 as 2 parts.
  * Fallout from the above is that we now have to support more
    than 14 call arguments, which meant expanding TCGOp.
    TCGOp is now allocated with a variable size, using only the
    number of operands required.  This could in fact result in less
    memory usage on average, but I haven't collected any numbers.
  * Implement (non-atomic) load/store on TCGv_i128, which gives
    some of the helpers required for...
  * Implement tcg_gen_atomic_cmpxchg_i128, which will eliminate
    the primary source of 128-bit hacks/ifdefs in target/.
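
The variable-sized allocation mentioned in the second item can be
sketched with a C99 flexible array member.  The DemoOp structure and
the demo_op_* functions below are hypothetical stand-ins chosen for
illustration; they do not reflect the actual TCGOp layout or the
allocation code in this series.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical op record; not QEMU's actual TCGOp layout. */
typedef struct DemoOp {
    unsigned opc;       /* opcode number */
    unsigned nargs;     /* number of valid entries in args[] */
    uintptr_t args[];   /* flexible array member, sized per op */
} DemoOp;

/* Bytes needed for an op carrying 'nargs' operands. */
static size_t demo_op_size(unsigned nargs)
{
    return sizeof(DemoOp) + (size_t)nargs * sizeof(uintptr_t);
}

/* Allocate a zeroed op with room for exactly 'nargs' operands. */
static DemoOp *demo_op_new(unsigned opc, unsigned nargs)
{
    DemoOp *op = calloc(1, demo_op_size(nargs));

    if (op != NULL) {
        op->opc = opc;
        op->nargs = nargs;
    }
    return op;
}
```

Under this scheme an op with two operands pays for two slots, while a
call with twenty arguments pays for twenty, rather than every op
reserving space for the worst case.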

Changes for v2:
  * Fixes and r-b (philmd).
  * Include i386 atomic16 patch, which avoids minor conflicts later.
  * Split a few larger patches.
  * Bug fixes for TCI.


r~


Richard Henderson (45):
  meson: Move CONFIG_TCG_INTERPRETER to config_host
  tcg: Tidy tcg_reg_alloc_op
  tcg: Introduce paired register allocation
  tcg/s390x: Use register pair allocation for div and mulu2
  tcg/arm: Use register pair allocation for qemu_{ld,st}_i64
  tcg: Remove TCG_TARGET_STACK_GROWSUP
  accel/tcg: Set cflags_next_tb in cpu_common_initfn
  target/sparc: Avoid TCGV_{LOW,HIGH}
  tcg: Move TCG_{LOW,HIGH} to tcg-internal.h
  tcg: Add temp_subindex to TCGTemp
  tcg: Simplify calls to temp_sync vs mem_coherent
  tcg: Allocate TCGTemp pairs in host memory order
  tcg: Move TCG_TYPE_COUNT outside enum
  tcg: Introduce tcg_type_size
  tcg: Introduce TCGCallReturnKind and TCGCallArgumentKind
  tcg: Replace TCG_TARGET_CALL_ALIGN_ARGS with TCG_TARGET_CALL_ARG_I64
  tcg: Replace TCG_TARGET_EXTEND_ARGS with TCG_TARGET_CALL_ARG_I32
  tcg: Use TCG_CALL_ARG_EVEN for TCI special case
  accel/tcg/plugin: Don't search for the function pointer index
  accel/tcg/plugin: Avoid duplicate copy in copy_call
  accel/tcg/plugin: Use copy_op in append_{udata,mem}_cb
  tci: MAX_OPC_PARAM_IARGS is no longer used
  tcg: Vary the allocation size for TCGOp
  tcg: Use output_pref wrapper function
  tcg: Reorg function calls
  tcg: Move ffi_cif pointer into TCGHelperInfo
  tcg/aarch64: Merge tcg_out_callr into tcg_out_call
  tcg: Add TCGHelperInfo argument to tcg_out_call
  tcg: Define TCG_TYPE_I128 and related helper macros
  tcg: Handle dh_typecode_i128 with TCG_CALL_{RET,ARG}_NORMAL
  tcg: Allocate objects contiguously in temp_allocate_frame
  tcg: Introduce tcg_out_addi_ptr
  tcg: Add TCG_CALL_{RET,ARG}_BY_REF
  tcg: Introduce tcg_target_call_oarg_reg
  tcg: Add TCG_CALL_RET_BY_VEC
  include/qemu/int128: Use Int128 structure for TCI
  tcg/i386: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg/tci: Fix big-endian return register ordering
  tcg/tci: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg: Add temp allocation for TCGv_i128
  tcg: Add basic data movement for TCGv_i128
  tcg: Add guest load/store primitives for TCGv_i128
  tcg: Add tcg_gen_{non}atomic_cmpxchg_i128
  tcg: Split out tcg_gen_nonatomic_cmpxchg_i{32,64}


-- 
2.34.1