[PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS

Richard Henderson posted 2 patches 2 months, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20240702234155.2106399-1-richard.henderson@linaro.org
Maintainers: Riku Voipio <riku.voipio@iki.fi>, Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, "Alex Bennée" <alex.bennee@linaro.org>
include/exec/cpu_ldst.h            | 40 ++++++++++++++++
accel/tcg/user-exec.c              | 22 +++++++++
target/arm/tcg/helper-a64.c        | 10 ++--
tests/tcg/multiarch/memset-fault.c | 77 ++++++++++++++++++++++++++++++
4 files changed, 144 insertions(+), 5 deletions(-)
create mode 100644 tests/tcg/multiarch/memset-fault.c
[PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS
Posted by Richard Henderson 2 months, 1 week ago
While looking into Zoltan's attempt to speed up ppc64 DCBZ
(data cache block set to zero), I wondered what AArch64 was
doing differently.  It turned out that Arm is the only user
of tlb_vaddr_to_host.

None of the code sequences in use between AArch64, Power64 and S390X
are 100% safe, with race conditions vs mmap et al, however, AArch64
is the only one that will fail this single threaded test case.  Use
of these new functions fixes the race condition as well, though I
have not yet touched the other guests.

I thought about exposing accel/tcg/user-retaddr.h for direct use
from the targets, but perhaps these wrappers are cleaner.  RFC?


r~


Richard Henderson (2):
  accel/tcg: Introduce memset_ra, memmove_ra
  target/arm: Use memset_ra, memmove_ra in helper-a64.c

 include/exec/cpu_ldst.h            | 40 ++++++++++++++++
 accel/tcg/user-exec.c              | 22 +++++++++
 target/arm/tcg/helper-a64.c        | 10 ++--
 tests/tcg/multiarch/memset-fault.c | 77 ++++++++++++++++++++++++++++++
 4 files changed, 144 insertions(+), 5 deletions(-)
 create mode 100644 tests/tcg/multiarch/memset-fault.c

-- 
2.34.1
Re: [PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS
Posted by Peter Maydell 2 months ago
On Wed, 3 Jul 2024 at 00:43, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> While looking into Zoltan's attempt to speed up ppc64 DCBZ
> (data cache block set to zero), I wondered what AArch64 was
> doing differently.  It turned out that Arm is the only user
> of tlb_vaddr_to_host.

riscv also seems to use it in vext_ldff(), fwiw.

-- PMM
Re: [PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS
Posted by Richard Henderson 2 months ago
On 7/8/24 07:25, Peter Maydell wrote:
> On Wed, 3 Jul 2024 at 00:43, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> While looking into Zoltan's attempt to speed up ppc64 DCBZ
>> (data cache block set to zero), I wondered what AArch64 was
>> doing differently.  It turned out that Arm is the only user
>> of tlb_vaddr_to_host.
> 
> riscv also seems to use it in vext_ldff(), fwiw.

So it does, followed by a second probe for read.
That's quite wrong...

But you're right that it has a similar race condition.


r~
Re: [PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS
Posted by Ilya Leoshkevich 2 months ago
On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote:
> While looking into Zoltan's attempt to speed up ppc64 DCBZ
> (data cache block set to zero), I wondered what AArch64 was
> doing differently.  It turned out that Arm is the only user
> of tlb_vaddr_to_host.
> 
> None of the code sequences in use between AArch64, Power64 and S390X
> are 100% safe, with race conditions vs mmap et al, however, AArch64
> is the only one that will fail this single threaded test case.  Use
> of these new functions fixes the race condition as well, though I
> have not yet touched the other guests.
> 
> I thought about exposing accel/tcg/user-retaddr.h for direct use
> from the targets, but perhaps these wrappers are cleaner.  RFC?
> 
> 
> r~
> 
> 
> Richard Henderson (2):
>   accel/tcg: Introduce memset_ra, memmove_ra
>   target/arm: Use memset_ra, memmove_ra in helper-a64.c
> 
>  include/exec/cpu_ldst.h            | 40 ++++++++++++++++
>  accel/tcg/user-exec.c              | 22 +++++++++
>  target/arm/tcg/helper-a64.c        | 10 ++--
>  tests/tcg/multiarch/memset-fault.c | 77
> ++++++++++++++++++++++++++++++
>  4 files changed, 144 insertions(+), 5 deletions(-)
>  create mode 100644 tests/tcg/multiarch/memset-fault.c

This sounds good to me.

I haven't debugged it, but I wonder why doesn't s390x fail here.
For XC with src == dst, it does access_memset() -> do_access_memset()
-> memset() without setting the RA. And I don't think that anything
around it sets the RA either.
Re: [PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS
Posted by Richard Henderson 2 months ago
On 7/4/24 07:50, Ilya Leoshkevich wrote:
> On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote:
>> While looking into Zoltan's attempt to speed up ppc64 DCBZ
>> (data cache block set to zero), I wondered what AArch64 was
>> doing differently.  It turned out that Arm is the only user
>> of tlb_vaddr_to_host.
>>
>> None of the code sequences in use between AArch64, Power64 and S390X
>> are 100% safe, with race conditions vs mmap et al, however, AArch64
>> is the only one that will fail this single threaded test case.  Use
>> of these new functions fixes the race condition as well, though I
>> have not yet touched the other guests.
>>
>> I thought about exposing accel/tcg/user-retaddr.h for direct use
>> from the targets, but perhaps these wrappers are cleaner.  RFC?
>>
>>
>> r~
>>
>>
>> Richard Henderson (2):
>>    accel/tcg: Introduce memset_ra, memmove_ra
>>    target/arm: Use memset_ra, memmove_ra in helper-a64.c
>>
>>   include/exec/cpu_ldst.h            | 40 ++++++++++++++++
>>   accel/tcg/user-exec.c              | 22 +++++++++
>>   target/arm/tcg/helper-a64.c        | 10 ++--
>>   tests/tcg/multiarch/memset-fault.c | 77
>> ++++++++++++++++++++++++++++++
>>   4 files changed, 144 insertions(+), 5 deletions(-)
>>   create mode 100644 tests/tcg/multiarch/memset-fault.c
> 
> This sounds good to me.
> 
> I haven't debugged it, but I wonder why doesn't s390x fail here.
> For XC with src == dst, it does access_memset() -> do_access_memset()
> -> memset() without setting the RA. And I don't think that anything
> around it sets the RA either.

s390x uses probe_access_flags, which verifies the page is mapped and writable, and raises 
the exception when it isn't.  In contrast, for user-only, tlb_vaddr_to_host *only* 
performs the guest -> host address mapping, i.e. (addr + guest_base).


r~

Re: [PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS
Posted by Richard Henderson 2 months ago
On 7/4/24 08:18, Richard Henderson wrote:
> On 7/4/24 07:50, Ilya Leoshkevich wrote:
>> On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote:
>>> While looking into Zoltan's attempt to speed up ppc64 DCBZ
>>> (data cache block set to zero), I wondered what AArch64 was
>>> doing differently.  It turned out that Arm is the only user
>>> of tlb_vaddr_to_host.
>>>
>>> None of the code sequences in use between AArch64, Power64 and S390X
>>> are 100% safe, with race conditions vs mmap et al, however, AArch64
>>> is the only one that will fail this single threaded test case.  Use
>>> of these new functions fixes the race condition as well, though I
>>> have not yet touched the other guests.
>>>
>>> I thought about exposing accel/tcg/user-retaddr.h for direct use
>>> from the targets, but perhaps these wrappers are cleaner.  RFC?
>>>
>>>
>>> r~
>>>
>>>
>>> Richard Henderson (2):
>>>    accel/tcg: Introduce memset_ra, memmove_ra
>>>    target/arm: Use memset_ra, memmove_ra in helper-a64.c
>>>
>>>   include/exec/cpu_ldst.h            | 40 ++++++++++++++++
>>>   accel/tcg/user-exec.c              | 22 +++++++++
>>>   target/arm/tcg/helper-a64.c        | 10 ++--
>>>   tests/tcg/multiarch/memset-fault.c | 77
>>> ++++++++++++++++++++++++++++++
>>>   4 files changed, 144 insertions(+), 5 deletions(-)
>>>   create mode 100644 tests/tcg/multiarch/memset-fault.c
>>
>> This sounds good to me.
>>
>> I haven't debugged it, but I wonder why doesn't s390x fail here.
>> For XC with src == dst, it does access_memset() -> do_access_memset()
>> -> memset() without setting the RA. And I don't think that anything
>> around it sets the RA either.
> 
> s390x uses probe_access_flags, which verifies the page is mapped and writable, and raises 
> the exception when it isn't.  In contrast, for user-only, tlb_vaddr_to_host *only* 
> performs the guest -> host address mapping, i.e. (addr + guest_base).

I should clarify: probe_access_flags verifies that the page is mapped *at that moment*, 
but does not take the mmap_lock.  So the race is that the page can be unmapped by another 
thread after probe_access_flags and before the memset completes.


r~


Re: [PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS
Posted by Ilya Leoshkevich 2 months ago
On Thu, 2024-07-04 at 14:48 -0700, Richard Henderson wrote:
> On 7/4/24 08:18, Richard Henderson wrote:
> > On 7/4/24 07:50, Ilya Leoshkevich wrote:
> > > On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote:
> > > > While looking into Zoltan's attempt to speed up ppc64 DCBZ
> > > > (data cache block set to zero), I wondered what AArch64 was
> > > > doing differently.  It turned out that Arm is the only user
> > > > of tlb_vaddr_to_host.
> > > > 
> > > > None of the code sequences in use between AArch64, Power64 and
> > > > S390X
> > > > are 100% safe, with race conditions vs mmap et al, however,
> > > > AArch64
> > > > is the only one that will fail this single threaded test case. 
> > > > Use
> > > > of these new functions fixes the race condition as well, though
> > > > I
> > > > have not yet touched the other guests.
> > > > 
> > > > I thought about exposing accel/tcg/user-retaddr.h for direct
> > > > use
> > > > from the targets, but perhaps these wrappers are cleaner.  RFC?
> > > > 
> > > > 
> > > > r~
> > > > 
> > > > 
> > > > Richard Henderson (2):
> > > >    accel/tcg: Introduce memset_ra, memmove_ra
> > > >    target/arm: Use memset_ra, memmove_ra in helper-a64.c
> > > > 
> > > >   include/exec/cpu_ldst.h            | 40 ++++++++++++++++
> > > >   accel/tcg/user-exec.c              | 22 +++++++++
> > > >   target/arm/tcg/helper-a64.c        | 10 ++--
> > > >   tests/tcg/multiarch/memset-fault.c | 77
> > > > ++++++++++++++++++++++++++++++
> > > >   4 files changed, 144 insertions(+), 5 deletions(-)
> > > >   create mode 100644 tests/tcg/multiarch/memset-fault.c
> > > 
> > > This sounds good to me.
> > > 
> > > I haven't debugged it, but I wonder why doesn't s390x fail here.
> > > For XC with src == dst, it does access_memset() ->
> > > do_access_memset()
> > > -> memset() without setting the RA. And I don't think that
> > > anything
> > > around it sets the RA either.
> > 
> > s390x uses probe_access_flags, which verifies the page is mapped
> > and writable, and raises 
> > the exception when it isn't.  In contrast, for user-only,
> > tlb_vaddr_to_host *only* 
> > performs the guest -> host address mapping, i.e. (addr +
> > guest_base).
> 
> I should clarify: probe_access_flags verifies that the page is mapped
> *at that moment*, 
> but does not take the mmap_lock.  So the race is that the page can be
> unmapped by another 
> thread after probe_access_flags and before the memset completes.

I see, thanks. I completely overlooked the access_prepare() calls.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>