[PATCH 0/6] accel/tcg: Always require can_do_io (#1866)

Richard Henderson posted 6 patches 8 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20230914174436.1597356-1-richard.henderson@linaro.org
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Aurelien Jarno <aurelien@aurel32.net>, Jiaxun Yang <jiaxun.yang@flygoat.com>, Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
The problem exposed by the fix for #1826 (et al) is that the TB that
contains the i/o instruction that alters the address space continues
on to issue other i/o instructions.

Since #1826 deferred the update to the address space, these subsequent
i/o instructions do not reference the correct address space, and things
go awry from there.
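
To make the failure concrete, here is a toy model of the situation
described above. It is only a sketch: run_tbs, the "remap"/"write"
tuples and the dict-based address space are hypothetical names, not
QEMU code, and it assumes the deferred update becomes visible only once
the current TB has finished.

```python
# Toy model only: none of these names are QEMU APIs.

def run_tbs(tbs, address_space):
    """Execute a list of TBs; each TB is a list of i/o instructions.

    An instruction is ("remap", addr, device) or ("write", addr, value).
    A remap is deferred until the end of the TB that contains it.
    Returns (device, value) pairs: which device saw each write.
    """
    log = []
    for tb in tbs:
        pending = {}
        for op, addr, arg in tb:
            if op == "remap":
                pending[addr] = arg          # deferred, not yet visible
            else:                            # "write"
                log.append((address_space[addr], arg))
        address_space.update(pending)        # commit at the TB boundary
    return log


remap = ("remap", 0x10, "dev_b")
write = ("write", 0x10, 1)

# Both instructions in one TB: the write still reaches the old device.
stale = run_tbs([[remap, write]], {0x10: "dev_a"})

# The TB ends right after the remap: the write reaches the new device.
fresh = run_tbs([[remap], [write]], {0x10: "dev_a"})
```

In this model, stale records the write against dev_a, while fresh
records it against dev_b.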

Ideally we would treat changes to the address space specially, but
that knowledge is buried quite far down in the device models.  We don't
find out that such a change is coming until quite late, by which point
we cannot undo all of the other side effects and start over.

The only alternative would seem to be to treat all i/o pessimistically
and end the TB after any i/o, exactly as we do for icount.

I'm not pleased about this, because an eyeball of avocado times
suggests a slowdown, no doubt caused by most i/o having to go
through cpu_io_recompile.
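
As a rough illustration of where that cost comes from, the sketch below
models "i/o is only allowed on the last instruction of a TB". It is a
toy under stated assumptions, not the real translator: IoRecompile,
exec_tb, run and the tb_len parameter are hypothetical, and the
retranslation is reduced to running a one-instruction TB for the i/o.

```python
# Toy model only; the names echo the cover letter but this is not QEMU code.

class IoRecompile(Exception):
    """Raised when i/o happens on an instruction that is not last in its TB."""
    def __init__(self, index):
        super().__init__(index)
        self.index = index  # position of the offending instruction in the TB


def exec_tb(tb, is_io, trace):
    """Run one TB; i/o is permitted only on the final instruction."""
    for i, insn in enumerate(tb):
        can_do_io = (i == len(tb) - 1)
        if is_io(insn) and not can_do_io:
            raise IoRecompile(i)
        trace.append(insn)


def run(insns, is_io, tb_len=4):
    """Execute insns in TBs of up to tb_len; return (trace, recompiles)."""
    trace, recompiles, pc = [], 0, 0
    while pc < len(insns):
        tb = insns[pc:pc + tb_len]
        try:
            exec_tb(tb, is_io, trace)
            pc += len(tb)
        except IoRecompile as e:
            # The instructions before the i/o have already run.  Translate
            # a fresh TB that ends with the i/o instruction and run that.
            recompiles += 1
            pc += e.index
            exec_tb(insns[pc:pc + 1], is_io, trace)
            pc += 1
    return trace, recompiles


program = ["a", "io1", "b", "io2", "c"]
trace, recompiles = run(program, lambda insn: insn.startswith("io"))
```

Every i/o instruction that is not already last in its TB costs one
extra translation here, which is the overhead the paragraph above is
worried about.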

I begin to wonder if #1826 should be solved differently, like *not*
caching MemoryRegionSections within the cpu tlb, and looking up the
physical address within the address space during the i/o itself.
But even that seems like it would work only for the more common case
where the address space reorg only changes devices.  For the odd case
where an address space reorg changes RAM, we still have the result
of the phys addr lookup cached via the adjustment to host memory.

So this seems like the most reasonable solution.

Follow-up patches could optimize setting of can_do_io, and replace
previous TranslationBlocks so that we only go through cpu_io_recompile
once for a particular bit of guest code, rather than every single time.
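
The second follow-up idea can be sketched in the same toy style. This
is an assumption-laden illustration, not a proposal for the actual
implementation: count_recompiles, the pc-keyed cache and the replace_tb
flag are hypothetical. It counts how often a loop over the same guest
code has to recompile, with and without replacing the cached TB by one
that ends at the i/o instruction.

```python
# Toy model only: a "TB cache" maps a start pc to the length of its TB.

def count_recompiles(insns, is_io, iterations, replace_tb):
    """Run the same code `iterations` times and count i/o recompiles."""
    cache = {}
    recompiles = 0
    for _ in range(iterations):
        pc = 0
        while pc < len(insns):
            # Default translation: optimistically run to the end of the code.
            length = cache.get(pc, len(insns) - pc)
            # First i/o instruction that is not the last one in this TB.
            io_at = next(
                (i for i in range(length - 1) if is_io(insns[pc + i])),
                None,
            )
            if io_at is None:
                pc += length            # TB ran to completion
                continue
            recompiles += 1
            if replace_tb:
                cache[pc] = io_at + 1   # keep a TB that ends at the i/o
            pc += io_at + 1             # the i/o itself runs in a one-off TB
    return recompiles


program = ["a", "io1", "b", "io2", "c"]
is_io = lambda insn: insn.startswith("io")

every_time = count_recompiles(program, is_io, iterations=3, replace_tb=False)
only_once = count_recompiles(program, is_io, iterations=3, replace_tb=True)
```

With three passes over this program, the model recompiles six times
without replacement and twice with it, i.e. once per i/o instruction
rather than once per execution.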


r~


Richard Henderson (6):
  accel/tcg: Avoid load of icount_decr if unused
  accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop
  accel/tcg: Track current value of can_do_io in the TB
  accel/tcg: Improve setting of can_do_io at start of TB
  accel/tcg: Always set CF_LAST_IO with CF_NOIRQ
  accel/tcg: Always require can_do_io

 include/exec/translator.h   |  2 ++
 accel/tcg/cpu-exec.c        |  2 +-
 accel/tcg/tb-maint.c        |  6 ++--
 accel/tcg/translator.c      | 72 ++++++++++++++++++-------------------
 target/mips/tcg/translate.c |  1 -
 5 files changed, 41 insertions(+), 42 deletions(-)

-- 
2.34.1