[PATCH RFC 0/1] accel/tcg: Clear PAGE_WRITE before translation

Ilya Leoshkevich posted 1 patch 4 years, 6 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20210804224633.154083-1-iii@linux.ibm.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>
There is a newer version of this series
accel/tcg/translate-all.c    | 59 +++++++++++++++++++++---------------
accel/tcg/translator.c       | 26 ++++++++++++++--
include/exec/translate-all.h |  1 +
3 files changed, 59 insertions(+), 27 deletions(-)
Posted by Ilya Leoshkevich 4 years, 6 months ago
Hello,

As discussed on IRC, here is a tentative fix for concurrent code
patching. It helps with the x86_64 .NET app on s390x and passes
check-tcg.

Bug report: https://lists.nongnu.org/archive/html/qemu-devel/2021-08/msg00644.html

IRC log:
========
<stsquad> iii: my initial thoughts are there is a race in tb_page_add: while we will have flushed all the old pages, the new corrupted one gets in
<iii> stsquad: I think you are right that it can be considered a tb_page_add race. Would it be reasonable to solve it by marking the page read-only before translation and then making sure that it doesn't get its PAGE_WRITE back until translation is complete?
<rth> iii, stsquad: https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg07995.html
<rth> iii: yes, making the page read-only early is the fix, i think.  i believe we already hold the mmap_lock around translation, so that should make a writer fault and then wait on the mmap_lock.
<iii> rth: Thanks, let me give it a try. I'll post whatever I come up with as an RFC patch to qemu-devel.
<rth> thanks
<stsquad> rth: doesn't that serialise all translation again?
<stsquad> rth: we could page lock instead?
<rth> stsquad: i thought we were talking about user-only, where translation is always serial.
<rth> stsquad: the link from january is a system-mode bug of the same kind, where, yes, we need to hold the page lock or something.
<stsquad> rth: ahh yes because we don't have zoned translation caches...
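
The ordering discussed above can be sketched with a toy model. This is not
QEMU code: toy_page, toy_translate, toy_guest_write and TOY_PAGE_WRITE are
hypothetical names invented for illustration, and a pthread mutex stands in
for mmap_lock. The sketch only shows the idea that write permission is
dropped before the guest bytes are read, so a later write has to go through
the fault path and invalidate the translation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_WRITE 0x2   /* hypothetical flag, stands in for PAGE_WRITE */
#define TOY_CODE_LEN   8

struct toy_page {
    pthread_mutex_t lock;              /* stands in for mmap_lock */
    int flags;                         /* toy page protection bits */
    bool has_tb;                       /* true while a translation exists */
    uint8_t code[TOY_CODE_LEN];        /* "guest" code bytes */
    uint8_t translated[TOY_CODE_LEN];  /* snapshot taken by the translator */
};

static void toy_page_init(struct toy_page *p)
{
    memset(p, 0, sizeof(*p));
    pthread_mutex_init(&p->lock, NULL);
    p->flags = TOY_PAGE_WRITE;         /* pages start out writable */
}

/* Translate: drop write permission first, then read the guest bytes. */
static void toy_translate(struct toy_page *p)
{
    pthread_mutex_lock(&p->lock);
    p->flags &= ~TOY_PAGE_WRITE;
    /* From here on, no write can reach the page without "faulting". */
    assert(!(p->flags & TOY_PAGE_WRITE));
    memcpy(p->translated, p->code, TOY_CODE_LEN);
    p->has_tb = true;
    pthread_mutex_unlock(&p->lock);
}

/*
 * Guest store. A real writer would only take the lock on the fault path;
 * the model takes it unconditionally to stay free of data races.
 */
static void toy_guest_write(struct toy_page *p, size_t off, uint8_t val)
{
    assert(off < TOY_CODE_LEN);
    pthread_mutex_lock(&p->lock);      /* a writer waits here during translation */
    if (!(p->flags & TOY_PAGE_WRITE)) {
        /* "Fault" path: invalidate the translation, then restore PAGE_WRITE. */
        p->has_tb = false;
        p->flags |= TOY_PAGE_WRITE;
    }
    p->code[off] = val;
    pthread_mutex_unlock(&p->lock);
}
```

In this model a store that arrives while toy_translate holds the lock cannot
change the bytes being copied, and a store that arrives afterwards drops the
stale translation before it modifies the page.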

Ilya Leoshkevich (1):
  accel/tcg: Clear PAGE_WRITE before translation

-- 
2.31.1