[PATCH-for-9.1 v2 0/3] target/ppc: fix tlb flushing race (plus

Nicholas Piggin posted 3 patches 3 weeks, 6 days ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20240405125340.380828-1-npiggin@gmail.com
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Nicholas Piggin <npiggin@gmail.com>, Daniel Henrique Barboza <danielhb413@gmail.com>
docs/devel/multi-thread-tcg.rst              |  13 +-
include/exec/exec-all.h                      |  97 ++-----------
accel/tcg/cputlb.c                           | 145 ++-----------------
target/ppc/helper_regs.c                     |   2 +-
target/ppc/mmu_helper.c                      |   2 +-
target/ppc/translate.c                       |   7 +
target/ppc/translate/storage-ctrl-impl.c.inc |   7 +
7 files changed, 44 insertions(+), 229 deletions(-)
[PATCH-for-9.1 v2 0/3] target/ppc: fix tlb flushing race (plus
Posted by Nicholas Piggin 3 weeks, 6 days ago
ppc's broadcast tlb flushe must ensure all vCPUs have invalidated
their TLBs when the tlbie sequence completes. This is not true in
the current TCG implementation, due to async_run_on_cpu() returning
before the remote CPU runs the job.

Fixing ppc requires moving to async_safe_run_on_cpu(), however this
API does not guarantee that either, and actually changing to the
tlb_flush_*_all_cpus_synced() variants introduces another race
which is that the flush is not even guaraneed to complete on the
local CPU. To ensure that it is, the tlbie has to be made the last
instruction in the TB.

Fixing ppc removes the last caller of the non-synced TLB flush
variants, we can remove some dead code too.

For others - at least arm, riscv, and s390x all use the
tlb_flush_*_all_cpus_synced() calls AFAIKS without ending the TB. But
it's possible I've missed where they do, or they have other issues that
mean it's not required for the flush to take effect until some later
operation which does end the TB. Maybe there is no problem, but it might
be worth looking at.

To reproduce, I have a kvm-unit-tests case for ppc but should be quite
easy to port to other archs. You just need to be careful for the local
CPU test case that your tlbi instruction is in the same TB as the
subsequent load/store that is to incorrectly use a stale translation.

https://gitlab.com/npiggin/kvm-unit-tests/-/tree/powerpc

The test 'powerpc/mmu.elf tlbie:tlbi-other-cpu' breaks with upstream
QEMU. If we use tlb_flush_*_all_cpus_synced() like other arch, then
'powerpc/mmu.elf tlbie:tlbi-this-cpu' also breaks.

Since v1 I understood the full problem and fix, and fixed the fix.

Thanks,
Nick

Nicholas Piggin (3):
  target/ppc: Fix broadcast tlbie synchronisation
  tcg/cputlb: Remove non-synced variants of global TLB flushes
  tcg/cputlb: remove other-cpu capability from TLB flushing

 docs/devel/multi-thread-tcg.rst              |  13 +-
 include/exec/exec-all.h                      |  97 ++-----------
 accel/tcg/cputlb.c                           | 145 ++-----------------
 target/ppc/helper_regs.c                     |   2 +-
 target/ppc/mmu_helper.c                      |   2 +-
 target/ppc/translate.c                       |   7 +
 target/ppc/translate/storage-ctrl-impl.c.inc |   7 +
 7 files changed, 44 insertions(+), 229 deletions(-)

-- 
2.43.0