[Qemu-devel] [PATCH for-2.11] rcu: init globals only once

Peter Xu posted 1 patch 6 years, 8 months ago
Failed in applying to current master (apply log)
util/rcu.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
[Qemu-devel] [PATCH for-2.11] rcu: init globals only once
Posted by Peter Xu 6 years, 8 months ago
We were calling rcu_init_complete() twice in the child processes when
fork happened. However the pthread library does not really suggest to do
it that way:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_init.html

"Attempting to initialise an already initialised mutex results in
 undefined behaviour."

Actually, IMHO we can do it in a more natural way: Firstly, we only init
the RCU globals once in rcu_init(). Then, in rcu_init_child(), we unlock
all the locks held in rcu_init_lock() just like what we do in the parent
process, then do the rest of RCU re-init (e.g., create the RCU thread).

CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
this is based on Paolo's series:
"[PATCH for-2.10 0/2] RCU: forking fix and cleanups"
---
 util/rcu.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/util/rcu.c b/util/rcu.c
index ca5a63e..6fbbe4c 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -299,15 +299,17 @@ void rcu_unregister_thread(void)
     qemu_mutex_unlock(&rcu_registry_lock);
 }
 
-static void rcu_init_complete(void)
+static void rcu_init_globals(void)
 {
-    QemuThread thread;
-
     qemu_mutex_init(&rcu_registry_lock);
     qemu_mutex_init(&rcu_sync_lock);
     qemu_event_init(&rcu_gp_event, true);
-
     qemu_event_init(&rcu_call_ready_event, false);
+}
+
+static void rcu_init_complete(void)
+{
+    QemuThread thread;
 
     /* The caller is assumed to have iothread lock, so the call_rcu thread
      * must have been quiescent even after forking, just recreate it.
@@ -357,6 +359,13 @@ static void rcu_init_child(void)
         return;
     }
 
+    rcu_init_unlock();
+
+    /*
+     * For the newly forked child, we need something extra: since
+     * after fork the threads are all gone, we need to re-init the RCU
+     * thread, along with the globals.
+     */
     memset(&registry, 0, sizeof(registry));
     rcu_init_complete();
 }
@@ -367,5 +376,6 @@ static void __attribute__((__constructor__)) rcu_init(void)
 #ifdef CONFIG_POSIX
     pthread_atfork(rcu_init_lock, rcu_init_unlock, rcu_init_child);
 #endif
+    rcu_init_globals();
     rcu_init_complete();
 }
-- 
2.7.4


Re: [Qemu-devel] [PATCH for-2.11] rcu: init globals only once
Posted by Paolo Bonzini 6 years, 8 months ago
On 08/08/2017 09:00, Peter Xu wrote:
> We were calling rcu_init_complete() twice in the child processes when
> fork happened. However the pthread library does not really suggest to do
> it that way:
> 
> http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_init.html
> 
> "Attempting to initialise an already initialised mutex results in
>  undefined behaviour."
> 
> Actually, IMHO we can do it in a more natural way: Firstly, we only init
> the RCU globals once in rcu_init(). Then, in rcu_init_child(), we unlock
> all the locks held in rcu_init_lock() just like what we do in the parent
> process, then do the rest of RCU re-init (e.g., create the RCU thread).

This doesn't work for error-checking mutexes: rcu_init_child has a
different PID than the parent, so the mutexes aren't unlocked.  It's
also true that right now we don't use error-checking mutexes (commit
24fa90499f, "qemu-thread: do not use PTHREAD_MUTEX_ERRORCHECK",
2015-03-10); however, that's also a bit sad.

The reason for the undefined behavior is probably that some operating
systems allocate memory in pthread_mutex_init, and initializing twice
causes a memory leak.  One such operating system is OpenBSD. :(

Eric, you chimed in on the patch that became commit 24fa90499f, what do
you suggest?

Paolo

> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> this is based on Paolo's series:
> "[PATCH for-2.10 0/2] RCU: forking fix and cleanups"
> ---
>  util/rcu.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/util/rcu.c b/util/rcu.c
> index ca5a63e..6fbbe4c 100644
> --- a/util/rcu.c
> +++ b/util/rcu.c
> @@ -299,15 +299,17 @@ void rcu_unregister_thread(void)
>      qemu_mutex_unlock(&rcu_registry_lock);
>  }
>  
> -static void rcu_init_complete(void)
> +static void rcu_init_globals(void)
>  {
> -    QemuThread thread;
> -
>      qemu_mutex_init(&rcu_registry_lock);
>      qemu_mutex_init(&rcu_sync_lock);
>      qemu_event_init(&rcu_gp_event, true);
> -
>      qemu_event_init(&rcu_call_ready_event, false);
> +}
> +
> +static void rcu_init_complete(void)
> +{
> +    QemuThread thread;
>  
>      /* The caller is assumed to have iothread lock, so the call_rcu thread
>       * must have been quiescent even after forking, just recreate it.
> @@ -357,6 +359,13 @@ static void rcu_init_child(void)
>          return;
>      }
>  
> +    rcu_init_unlock();
> +
> +    /*
> +     * For the newly forked child, we need something extra: since
> +     * after fork the threads are all gone, we need to re-init the RCU
> +     * thread, along with the globals.
> +     */
>      memset(&registry, 0, sizeof(registry));
>      rcu_init_complete();
>  }
> @@ -367,5 +376,6 @@ static void __attribute__((__constructor__)) rcu_init(void)
>  #ifdef CONFIG_POSIX
>      pthread_atfork(rcu_init_lock, rcu_init_unlock, rcu_init_child);
>  #endif
> +    rcu_init_globals();
>      rcu_init_complete();
>  }
> 


Re: [Qemu-devel] [PATCH for-2.11] rcu: init globals only once
Posted by Peter Xu 6 years, 8 months ago
On Tue, Aug 08, 2017 at 09:26:43AM +0200, Paolo Bonzini wrote:
> On 08/08/2017 09:00, Peter Xu wrote:
> > We were calling rcu_init_complete() twice in the child processes when
> > fork happened. However the pthread library does not really suggest to do
> > it that way:
> > 
> > http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_init.html
> > 
> > "Attempting to initialise an already initialised mutex results in
> >  undefined behaviour."
> > 
> > Actually, IMHO we can do it in a more natural way: Firstly, we only init
> > the RCU globals once in rcu_init(). Then, in rcu_init_child(), we unlock
> > all the locks held in rcu_init_lock() just like what we do in the parent
> > process, then do the rest of RCU re-init (e.g., create the RCU thread).
> 
> This doesn't work for error-checking mutexes: rcu_init_child has a
> different PID than the parent, so the mutexes aren't unlocked.  It's
> also true that right now we don't use error-checking mutexes (commit
> 24fa90499f, "qemu-thread: do not use PTHREAD_MUTEX_ERRORCHECK",
> 2015-03-10); however, that's also a bit sad.
> 
> The reason for the undefined behavior is probably that some operating
> systems allocate memory in pthread_mutex_init, and initializing twice
> causes a memory leak.  One such operating system is OpenBSD. :(

Good to know. :)

I thought pthread_atfork() was designed to solve such a locking
problem (in child hanlder, we unlock all the held locks). If
PTHREAD_MUTEX_ERRORCHECK cannot coop well with it, not sure whether
that means we should just avoid using PTHREAD_MUTEX_ERRORCHECK in such
a use case (but we should be able to use the error checks in other
mutexes that do not need extra fork handling)?

Another idea is: can we just destroy the mutex first then re-init it
in subprocess? A quick glance in libpthread code shows that at least
pthread_mutex_destroy() won't check PTHREAD_MUTEX_ERRORCHECK.

Thanks,

> 
> Eric, you chimed in on the patch that became commit 24fa90499f, what do
> you suggest?
> 
> Paolo

-- 
Peter Xu

Re: [Qemu-devel] [PATCH for-2.11] rcu: init globals only once
Posted by Paolo Bonzini 6 years, 8 months ago
On 08/08/2017 09:49, Peter Xu wrote:
> On Tue, Aug 08, 2017 at 09:26:43AM +0200, Paolo Bonzini wrote:
>> On 08/08/2017 09:00, Peter Xu wrote:
>>> We were calling rcu_init_complete() twice in the child processes when
>>> fork happened. However the pthread library does not really suggest to do
>>> it that way:
>>>
>>> http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_init.html
>>>
>>> "Attempting to initialise an already initialised mutex results in
>>>  undefined behaviour."
>>>
>>> Actually, IMHO we can do it in a more natural way: Firstly, we only init
>>> the RCU globals once in rcu_init(). Then, in rcu_init_child(), we unlock
>>> all the locks held in rcu_init_lock() just like what we do in the parent
>>> process, then do the rest of RCU re-init (e.g., create the RCU thread).
>>
>> This doesn't work for error-checking mutexes: rcu_init_child has a
>> different PID than the parent, so the mutexes aren't unlocked.  It's
>> also true that right now we don't use error-checking mutexes (commit
>> 24fa90499f, "qemu-thread: do not use PTHREAD_MUTEX_ERRORCHECK",
>> 2015-03-10); however, that's also a bit sad.
>>
>> The reason for the undefined behavior is probably that some operating
>> systems allocate memory in pthread_mutex_init, and initializing twice
>> causes a memory leak.  One such operating system is OpenBSD. :(
> 
> Good to know. :)
> 
> I thought pthread_atfork() was designed to solve such a locking
> problem (in child hanlder, we unlock all the held locks). If
> PTHREAD_MUTEX_ERRORCHECK cannot coop well with it, not sure whether
> that means we should just avoid using PTHREAD_MUTEX_ERRORCHECK in such
> a use case (but we should be able to use the error checks in other
> mutexes that do not need extra fork handling)?
> 
> Another idea is: can we just destroy the mutex first then re-init it
> in subprocess? A quick glance in libpthread code shows that at least
> pthread_mutex_destroy() won't check PTHREAD_MUTEX_ERRORCHECK.

Destroying a mutex that is locked is also undefined behavior, even if it
has no waiters.  So it's a nice catch-22: we cannot destroy before
unlocking, and we cannot unlock an error-checking mutex, so we cannot
destroy before init.

Actually, unlocking in atfork's child callback is interesting even for
non-error-checking mutexes.  If there are waiters, waking up the first
one in pthread_mutex_unlock fails.  This is okay if (as on Linux)
pthread_mutex_lock does none of the woken-up process's work.  However,
on OpenBSD pthread_mutex_lock sets mutex->owner... but in the child
there's no other thread to wake up and thus no one will set mutex->owner
to NULL.  This should lead to deadlock scenarios with pthread_atfork.

So pthread_atfork is impossible to use correctly with mutexes, and
reinitializing is the best we can do (the memory leak being a lesser evil).

The workaround would be to replace mutexes with semaphores, and to
implement QemuEvent natively for OpenBSD using its __thrsleep and
__thrwakeup system calls (no, I am not going to do that :)).

Let's wait and hear what Eric has to say.

Paolo

> Thanks,
> 
>>
>> Eric, you chimed in on the patch that became commit 24fa90499f, what do
>> you suggest?
>>
>> Paolo
> 


Re: [Qemu-devel] [PATCH for-2.11] rcu: init globals only once
Posted by Eric Blake 6 years, 8 months ago
On 08/08/2017 02:49 AM, Peter Xu wrote:
>> This doesn't work for error-checking mutexes: rcu_init_child has a
>> different PID than the parent, so the mutexes aren't unlocked.  It's
>> also true that right now we don't use error-checking mutexes (commit
>> 24fa90499f, "qemu-thread: do not use PTHREAD_MUTEX_ERRORCHECK",
>> 2015-03-10); however, that's also a bit sad.
>>
>> The reason for the undefined behavior is probably that some operating
>> systems allocate memory in pthread_mutex_init, and initializing twice
>> causes a memory leak.  One such operating system is OpenBSD. :(
> 
> Good to know. :)
> 
> I thought pthread_atfork() was designed to solve such a locking
> problem (in child hanlder, we unlock all the held locks).

What's also sad is that POSIX says that pthread_atfork() is rather
useless - there's no way it can be reliably used to do everything that
everyone wants (and I think this case of error-checking mutexes is just
ONE of those reasons).

> If
> PTHREAD_MUTEX_ERRORCHECK cannot coop well with it, not sure whether
> that means we should just avoid using PTHREAD_MUTEX_ERRORCHECK in such
> a use case (but we should be able to use the error checks in other
> mutexes that do not need extra fork handling)?
> 
> Another idea is: can we just destroy the mutex first then re-init it
> in subprocess? A quick glance in libpthread code shows that at least
> pthread_mutex_destroy() won't check PTHREAD_MUTEX_ERRORCHECK.
> 
> Thanks,
> 
>>
>> Eric, you chimed in on the patch that became commit 24fa90499f, what do
>> you suggest?

If, after forking, you can successfully destroy the mutex to then
reinitialize it (even though you can't unlock it), then that sounds as
good as anything I can come up with.

An alternative approach might be to add a new mutex that anyone obtains
just before forking; as long as you hold that mutex, you can then
release any other mutex, fork, and then reobtain in the parent - but it
still becomes tricky bookkeeping to know which locks need to be dropped
and reobtained, and I worry that gating fork performance with such a
heavy lock will have noticeable slowdowns.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH for-2.11] rcu: init globals only once
Posted by Peter Xu 6 years, 8 months ago
On Tue, Aug 08, 2017 at 09:09:23AM -0500, Eric Blake wrote:
> On 08/08/2017 02:49 AM, Peter Xu wrote:
> >> This doesn't work for error-checking mutexes: rcu_init_child has a
> >> different PID than the parent, so the mutexes aren't unlocked.  It's
> >> also true that right now we don't use error-checking mutexes (commit
> >> 24fa90499f, "qemu-thread: do not use PTHREAD_MUTEX_ERRORCHECK",
> >> 2015-03-10); however, that's also a bit sad.
> >>
> >> The reason for the undefined behavior is probably that some operating
> >> systems allocate memory in pthread_mutex_init, and initializing twice
> >> causes a memory leak.  One such operating system is OpenBSD. :(
> > 
> > Good to know. :)
> > 
> > I thought pthread_atfork() was designed to solve such a locking
> > problem (in child hanlder, we unlock all the held locks).
> 
> What's also sad is that POSIX says that pthread_atfork() is rather
> useless - there's no way it can be reliably used to do everything that
> everyone wants (and I think this case of error-checking mutexes is just
> ONE of those reasons).
> 
> > If
> > PTHREAD_MUTEX_ERRORCHECK cannot coop well with it, not sure whether
> > that means we should just avoid using PTHREAD_MUTEX_ERRORCHECK in such
> > a use case (but we should be able to use the error checks in other
> > mutexes that do not need extra fork handling)?
> > 
> > Another idea is: can we just destroy the mutex first then re-init it
> > in subprocess? A quick glance in libpthread code shows that at least
> > pthread_mutex_destroy() won't check PTHREAD_MUTEX_ERRORCHECK.
> > 
> > Thanks,
> > 
> >>
> >> Eric, you chimed in on the patch that became commit 24fa90499f, what do
> >> you suggest?
> 
> If, after forking, you can successfully destroy the mutex to then
> reinitialize it (even though you can't unlock it), then that sounds as
> good as anything I can come up with.
> 
> An alternative approach might be to add a new mutex that anyone obtains
> just before forking; as long as you hold that mutex, you can then
> release any other mutex, fork, and then reobtain in the parent - but it
> still becomes tricky bookkeeping to know which locks need to be dropped
> and reobtained, and I worry that gating fork performance with such a
> heavy lock will have noticeable slowdowns.

It sounds like a MBQL (Much Bigger Qemu Lock :-).

I am thinking whether we can simplify this. After all, we have
existing assumptions:

1. for "-daemonize", we need the pthread_atfork() thing to make sure
   RCU works well even in the daemonized process

2. for all the rest of the fork cases, we assume that it will always
   be a quick exec() afterward, so need to maintain memory consistency

Then, why not we just initialize RCU after os_daemonize()? IIUC, we
can throw pthread_atfork() away then (considering that even it is not
suggested to use "officially").

I guess it also depends on whether we'll need RCU before
os_daemonize(). I assume not, but I may be wrong.

Thanks,

-- 
Peter Xu