[PATCH for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit

git fetch https://github.com/patchew-project/qemu tags/patchew/20230716170150.22398-1-richard.henderson@linaro.org
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>
Posted by Richard Henderson 9 months, 4 weeks ago
For user-only, the probe for page writability may race with another
thread's mprotect.  Take the mmap_lock around the operation.  This
is still faster than the start/end_exclusive fallback.
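As a rough illustration (hypothetical helper names; only mmap_lock/mmap_unlock
correspond to real functions, which guest mprotect/munmap also take), the
window being closed looks like this:

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for the illustration, not in-tree code. */
    extern bool     page_is_writable(void *host_addr);
    extern uint64_t emulate_load_with_cmpxchg(void *host_addr);
    extern uint64_t plain_load(void *host_addr);
    extern void mmap_lock(void);
    extern void mmap_unlock(void);

    /* Racy: another thread's mprotect(PROT_READ) can land between the
     * probe and the cmpxchg, so the cmpxchg-based load may fault even
     * though the probe said the page was writable. */
    uint64_t load_without_lock(void *addr)
    {
        if (page_is_writable(addr)) {
            return emulate_load_with_cmpxchg(addr);
        }
        return plain_load(addr);   /* read-only page: assume immutable */
    }

    /* With the lock held, the probe stays valid until the cmpxchg has
     * completed, because protection changes must wait for the lock. */
    uint64_t load_with_lock(void *addr)
    {
        uint64_t val;

        mmap_lock();
        if (page_is_writable(addr)) {
            val = emulate_load_with_cmpxchg(addr);
        } else {
            val = plain_load(addr);
        }
        mmap_unlock();
        return val;
    }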

Remove the write probe in load_atomic8_or_exit.  There we don't have
the same machinery for testing the existance of an 8-byte cmpxchg.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++-------------------
 1 file changed, 24 insertions(+), 30 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 4de0a80492..e7170f8ba2 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
         return load_atomic8(pv);
     }
 
-#ifdef CONFIG_USER_ONLY
-    /*
-     * If the page is not writable, then assume the value is immutable
-     * and requires no locking.  This ignores the case of MAP_SHARED with
-     * another process, because the fallback start_exclusive solution
-     * provides no protection across processes.
-     */
-    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
-        uint64_t *p = __builtin_assume_aligned(pv, 8);
-        return *p;
-    }
-#endif
-
     /* Ultimate fallback: re-execute in serial context. */
     cpu_loop_exit_atomic(env_cpu(env), ra);
 }
@@ -186,25 +173,32 @@ static Int128 load_atomic16_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
         return atomic16_read_ro(p);
     }
 
-#ifdef CONFIG_USER_ONLY
-    /*
-     * We can only use cmpxchg to emulate a load if the page is writable.
-     * If the page is not writable, then assume the value is immutable
-     * and requires no locking.  This ignores the case of MAP_SHARED with
-     * another process, because the fallback start_exclusive solution
-     * provides no protection across processes.
-     */
-    if (page_check_range(h2g(p), 16, PAGE_WRITE_ORG)) {
-        return *p;
-    }
-#endif
-
-    /*
-     * In system mode all guest pages are writable, and for user-only
-     * we have just checked writability.  Try cmpxchg.
-     */
+    /* We can only use cmpxchg to emulate a load if the page is writable. */
     if (HAVE_ATOMIC128_RW) {
+#ifdef CONFIG_USER_ONLY
+        /*
+         * If the page is not writable, then assume the value is immutable
+         * and requires no locking.  This ignores the case of MAP_SHARED with
+         * another process, because the fallback start_exclusive solution
+         * provides no protection across processes.
+         * We must take mmap_lock so that the query remains valid until
+         * the write is complete -- tests/tcg/multiarch/munmap-pthread.c
+         * is an example that can race.
+         */
+        Int128 r;
+
+        mmap_lock();
+        if (page_get_flags(h2g(p)) & PAGE_WRITE_ORG) {
+            r = atomic16_read_rw(p);
+        } else {
+            r = *p;
+        }
+        mmap_unlock();
+        return r;
+#else
+        /* In system mode all guest pages are host writable. */
         return atomic16_read_rw(p);
+#endif
     }
 
     /* Ultimate fallback: re-execute in serial context. */
-- 
2.34.1
Re: [PATCH for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit
Posted by Alex Bennée 9 months, 4 weeks ago
Richard Henderson <richard.henderson@linaro.org> writes:

> For user-only, the probe for page writability may race with another
> thread's mprotect.  Take the mmap_lock around the operation.  This
> is still faster than the start/end_exclusive fallback.

Did we have a bug report or replication for this? 

>
> Remove the write probe in load_atomic8_or_exit.  There we don't have
> the same machinery for testing the existance of an 8-byte cmpxchg.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Otherwise makes sense to me:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: [PATCH for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit
Posted by Richard Henderson 9 months, 4 weeks ago
On 7/17/23 11:40, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> For user-only, the probe for page writability may race with another
>> thread's mprotect.  Take the mmap_lock around the operation.  This
>> is still faster than the start/end_exclusive fallback.
> 
> Did we have a bug report or replication for this?

See the comment:

> +         * We must take mmap_lock so that the query remains valid until
> +         * the write is complete -- tests/tcg/multiarch/munmap-pthread.c
> +         * is an example that can race.


r~

Re: [PATCH for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit
Posted by Alex Bennée 9 months, 3 weeks ago
Richard Henderson <richard.henderson@linaro.org> writes:

> On 7/17/23 11:40, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> For user-only, the probe for page writability may race with another
>>> thread's mprotect.  Take the mmap_lock around the operation.  This
>>> is still faster than the start/end_exclusive fallback.
>> Did we have a bug report or replication for this?
>
> See the comment:
>
>> +         * We must take mmap_lock so that the query remains valid until
>> +         * the write is complete -- tests/tcg/multiarch/munmap-pthread.c
>> +         * is an example that can race.

doh...

>
>
> r~


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: [PATCH for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit
Posted by Peter Maydell 9 months, 4 weeks ago
On Sun, 16 Jul 2023 at 18:03, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> For user-only, the probe for page writability may race with another
> thread's mprotect.  Take the mmap_lock around the operation.  This
> is still faster than the start/end_exclusive fallback.
>
> Remove the write probe in load_atomic8_or_exit.  There we don't have
> the same machinery for testing the existance of an 8-byte cmpxchg.

"existence"

>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++-------------------
>  1 file changed, 24 insertions(+), 30 deletions(-)
>
> diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
> index 4de0a80492..e7170f8ba2 100644
> --- a/accel/tcg/ldst_atomicity.c.inc
> +++ b/accel/tcg/ldst_atomicity.c.inc
> @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
>          return load_atomic8(pv);
>      }
>
> -#ifdef CONFIG_USER_ONLY
> -    /*
> -     * If the page is not writable, then assume the value is immutable
> -     * and requires no locking.  This ignores the case of MAP_SHARED with
> -     * another process, because the fallback start_exclusive solution
> -     * provides no protection across processes.
> -     */
> -    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
> -        uint64_t *p = __builtin_assume_aligned(pv, 8);
> -        return *p;
> -    }
> -#endif

I don't really understand the comment in the commit message:
why would it be wrong to wrap this "test writeability and
do the operation" in the mmap-lock, the same way we do for the
16-byte case?

thanks
-- PMM
Re: [PATCH for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit
Posted by Richard Henderson 9 months, 4 weeks ago
On 7/17/23 11:12, Peter Maydell wrote:
> On Sun, 16 Jul 2023 at 18:03, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> For user-only, the probe for page writability may race with another
>> thread's mprotect.  Take the mmap_lock around the operation.  This
>> is still faster than the start/end_exclusive fallback.
>>
>> Remove the write probe in load_atomic8_or_exit.  There we don't have
>> the same machinery for testing the existance of an 8-byte cmpxchg.
> 
> "existence"
> 
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++-------------------
>>   1 file changed, 24 insertions(+), 30 deletions(-)
>>
>> diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
>> index 4de0a80492..e7170f8ba2 100644
>> --- a/accel/tcg/ldst_atomicity.c.inc
>> +++ b/accel/tcg/ldst_atomicity.c.inc
>> @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
>>           return load_atomic8(pv);
>>       }
>>
>> -#ifdef CONFIG_USER_ONLY
>> -    /*
>> -     * If the page is not writable, then assume the value is immutable
>> -     * and requires no locking.  This ignores the case of MAP_SHARED with
>> -     * another process, because the fallback start_exclusive solution
>> -     * provides no protection across processes.
>> -     */
>> -    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
>> -        uint64_t *p = __builtin_assume_aligned(pv, 8);
>> -        return *p;
>> -    }
>> -#endif
> 
> I don't really understand the comment in the commit message:
> why would it be wrong to wrap this "test writeability and
> do the operation" in the mmap-lock, the same way we do for the
> 16-byte case?

It would not be wrong.  I was just thinking of the cmpxchg8 part, for which we do not have 
a configure probe, and for which I *think* there's no call, because there are no 32-bit 
hosts that have cmpxchg8 but not the full CONFIG_ATOMIC64.
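As a sketch only (not in-tree code), such a cmpxchg-based 8-byte load would
look roughly like the following, using just the GCC/Clang __sync builtin
(which relies on cmpxchg8b on 32-bit x86):

    #include <stdint.h>

    /* Emulate an atomic 8-byte load via compare-and-swap on a host that
     * has a 64-bit cmpxchg but no plain 64-bit atomic load.  Compare
     * against 0 and "swap in" 0: if *p was 0 nothing changes; if not,
     * the swap fails but still returns the old value atomically.  The
     * access needs write permission either way, which is the same
     * constraint as the 16-byte path. */
    static inline uint64_t load8_via_cmpxchg(uint64_t *p)
    {
        return __sync_val_compare_and_swap(p, (uint64_t)0, (uint64_t)0);
    }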


r~
Re: [PATCH for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit
Posted by Peter Maydell 9 months, 3 weeks ago
On Mon, 17 Jul 2023 at 19:25, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 7/17/23 11:12, Peter Maydell wrote:
> > On Sun, 16 Jul 2023 at 18:03, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> For user-only, the probe for page writability may race with another
> >> thread's mprotect.  Take the mmap_lock around the operation.  This
> >> is still faster than the start/end_exclusive fallback.
> >>
> >> Remove the write probe in load_atomic8_or_exit.  There we don't have
> >> the same machinery for testing the existance of an 8-byte cmpxchg.
> >
> > "existence"
> >
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >> ---
> >>   accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++-------------------
> >>   1 file changed, 24 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
> >> index 4de0a80492..e7170f8ba2 100644
> >> --- a/accel/tcg/ldst_atomicity.c.inc
> >> +++ b/accel/tcg/ldst_atomicity.c.inc
> >> @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
> >>           return load_atomic8(pv);
> >>       }
> >>
> >> -#ifdef CONFIG_USER_ONLY
> >> -    /*
> >> -     * If the page is not writable, then assume the value is immutable
> >> -     * and requires no locking.  This ignores the case of MAP_SHARED with
> >> -     * another process, because the fallback start_exclusive solution
> >> -     * provides no protection across processes.
> >> -     */
> >> -    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
> >> -        uint64_t *p = __builtin_assume_aligned(pv, 8);
> >> -        return *p;
> >> -    }
> >> -#endif
> >
> > I don't really understand the comment in the commit message:
> > why would it be wrong to wrap this "test writeability and
> > do the operation" in the mmap-lock, the same way we do for the
> > 16-byte case?
>
> It would not be wrong.  I was just thinking of the cmpxchg8 part, for which we do not have
> a configure probe, and for which I *think* there's no call, because there are no 32-bit
> hosts that have cmpxchg8 but not the full CONFIG_ATOMIC64.

But this piece of code the patch deletes isn't doing a
cmpxchg8, it just does a plain load. If "take the lock
and do the operation" is faster than always using the
fallback code for the atomic16 case, why don't we make
the same tradeoff choice in atomic8?

thanks
-- PMM