The main objective here is to support Arm FEAT_LSE2, which says that any
single memory access that does not cross a 16-byte boundary is atomic.
This is the MO_ATOM_WITHIN16 control.
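To make the rule concrete, here is a minimal sketch of the boundary test. The helper name within16 is hypothetical and is not a function from this series; it only illustrates the arithmetic for accesses of 1 to 16 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not part of the series).
 * Returns true if the access [addr, addr + size) stays inside a single
 * aligned 16-byte block, i.e. it does not cross a 16-byte boundary.
 */
static bool within16(uint64_t addr, unsigned size)
{
    assert(size >= 1 && size <= 16);
    /* Offset within the 16-byte block plus the length must fit in 16. */
    return (addr & 15) + size <= 16;
}
```

Under this sketch, an 8-byte access at offset 8 within a block must be atomic, while the same access at offset 12 crosses into the next block and need not be.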
While I'm touching all of this, a secondary objective is to handle the
atomicity of the IBM machines. Both Power and s390x treat misaligned
accesses as atomic in units given by the lowest set bit of the pointer.
For instance, an 8-byte access at ptr % 8 == 4 will appear as two
atomic 4-byte accesses, and ptr % 4 == 2 will appear as four 2-byte
accesses.
This is the MO_ATOM_SUBALIGN control.
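The size of each atomic piece in the examples above can be computed from the lowest set bit of the address. The following is a sketch under the assumption that the access size is a power of two; subalign_atom_size is a hypothetical name, not a function from this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not part of the series).
 * For a power-of-two access of 'size' bytes at 'addr', return the size
 * of each atomic piece: the lowest set bit of the address, capped at
 * the access size.  A fully aligned access is a single piece.
 */
static unsigned subalign_atom_size(uint64_t addr, unsigned size)
{
    assert(size != 0 && (size & (size - 1)) == 0);

    /* Isolate the lowest set bit; the result is 0 when addr == 0. */
    uint64_t lsb = addr & (~addr + 1);

    if (lsb == 0 || lsb >= size) {
        return size;
    }
    return (unsigned)lsb;
}
```

The number of pieces is then size divided by the returned value: two for the ptr % 8 == 4 case and four for the ptr % 4 == 2 case.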
By default, accesses are atomic only if aligned, which is the current
behaviour of the tcg code generator (mostly, anyway; there were bugs).
This is the MO_ATOM_IFALIGN control.
Further, one can say that a large memory access is really a set of
contiguous smaller accesses, and we need not provide more atomicity
than that (modulo MO_ATOM_WITHIN16). This is the MO_ATMAX_* control.
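As a rough sketch of how a maximum atomicity size could combine with the if-aligned rule, consider the helper below. The function name and its 'atmax' parameter are hypothetical, both sizes are assumed to be powers of two, and the MO_ATOM_WITHIN16 interaction mentioned above is deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not part of the series).
 * Treat an access of 'size' bytes as contiguous units of at most
 * 'atmax' bytes.  Return the atomicity required for each unit under
 * the if-aligned rule: the unit size when the address is aligned to
 * it, otherwise 1 (no atomicity beyond single bytes).
 */
static unsigned atmax_ifalign_atom_size(uint64_t addr, unsigned size,
                                        unsigned atmax)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    assert(atmax != 0 && (atmax & (atmax - 1)) == 0);

    /* The atomic unit never exceeds the access itself. */
    unsigned unit = size < atmax ? size : atmax;

    return (addr & (unit - 1)) == 0 ? unit : 1;
}
```

With these assumptions, a 16-byte access with an 8-byte maximum needs only 8-byte alignment to be handled as two atomic 8-byte halves.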
While I've had a go at documenting all of this, I'm certain it could
be improved -- soliciting suggestions.
r~
Based-on: 20221118091858.242569-1-richard.henderson@linaro.org
("main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD")
which itself depends on "tcg: Support for Int128 with helpers".
Richard Henderson (29):
include/qemu/cpuid: Introduce xgetbv_low
include/exec/memop: Add bits describing atomicity
accel/tcg: Add cpu_in_serial_context
accel/tcg: Introduce tlb_read_idx
accel/tcg: Reorg system mode load helpers
accel/tcg: Reorg system mode store helpers
accel/tcg: Honor atomicity of loads
accel/tcg: Honor atomicity of stores
tcg/tci: Use cpu_{ld,st}_mmu
tcg: Unify helper_{be,le}_{ld,st}*
accel/tcg: Implement helper_{ld,st}*_mmu for user-only
tcg: Add 128-bit guest memory primitives
meson: Detect atomic128 support with optimization
tcg/i386: Add have_atomic16
include/qemu/int128: Add vector type to Int128Alias
accel/tcg: Use have_atomic16 in ldst_atomicity.c.inc
tcg/aarch64: Add have_lse, have_lse2
accel/tcg: Add aarch64 specific support in ldst_atomicity
tcg: Introduce TCG_OPF_TYPE_MASK
tcg: Add INDEX_op_qemu_{ld,st}_i128
tcg/i386: Introduce tcg_out_mov2
tcg/i386: Introduce tcg_out_testi
tcg/i386: Use full load/store helpers in user-only mode
tcg/i386: Replace is64 with type in qemu_ld/st routines
tcg/i386: Mark Win64 call-saved vector regs as reserved
tcg/i386: Examine MemOp for atomicity and alignment
tcg/i386: Support 128-bit load/store with have_atomic16
tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool
tcg/i386: Honor 64-bit atomicity in 32-bit mode
accel/tcg/internal.h | 5 +
accel/tcg/tcg-runtime.h | 3 +
include/exec/cpu-defs.h | 7 +-
include/exec/cpu_ldst.h | 26 +-
include/exec/memop.h | 36 +
include/qemu/cpuid.h | 25 +
include/qemu/int128.h | 10 +-
include/tcg/tcg-ldst.h | 70 +-
include/tcg/tcg-opc.h | 8 +
include/tcg/tcg.h | 22 +-
tcg/aarch64/tcg-target.h | 5 +
tcg/arm/tcg-target.h | 2 +
tcg/i386/tcg-target.h | 4 +
tcg/loongarch64/tcg-target.h | 2 +
tcg/mips/tcg-target.h | 2 +
tcg/ppc/tcg-target.h | 2 +
tcg/riscv/tcg-target.h | 2 +
tcg/s390x/tcg-target.h | 2 +
tcg/sparc64/tcg-target.h | 2 +
tcg/tci/tcg-target.h | 2 +
accel/tcg/cpu-exec-common.c | 3 +
accel/tcg/cputlb.c | 1884 +++++++++++++++++++-----------
accel/tcg/tb-maint.c | 2 +-
accel/tcg/user-exec.c | 478 +++++---
tcg/optimize.c | 15 +-
tcg/tcg-op.c | 246 ++--
tcg/tcg.c | 8 +-
tcg/tci.c | 127 +-
util/bufferiszero.c | 3 +-
accel/tcg/ldst_atomicity.c.inc | 1170 +++++++++++++++++++
docs/devel/loads-stores.rst | 36 +-
meson.build | 52 +-
tcg/README | 10 +-
tcg/aarch64/tcg-target.c.inc | 57 +-
tcg/arm/tcg-target.c.inc | 45 +-
tcg/i386/tcg-target.c.inc | 1228 +++++++++++++------
tcg/loongarch64/tcg-target.c.inc | 25 +-
tcg/mips/tcg-target.c.inc | 40 +-
tcg/ppc/tcg-target.c.inc | 30 +-
tcg/riscv/tcg-target.c.inc | 51 +-
tcg/s390x/tcg-target.c.inc | 38 +-
tcg/sparc64/tcg-target.c.inc | 37 +-
tcg/tci/tcg-target.c.inc | 3 +-
43 files changed, 4145 insertions(+), 1680 deletions(-)
create mode 100644 accel/tcg/ldst_atomicity.c.inc
--
2.34.1