[PATCH 0/2] tests: Don't destroy locked mutexes

Michal Privoznik posted 2 patches 2 years, 2 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/cover.1644844874.git.mprivozn@redhat.com
tests/qemuhotplugtest.c      | 13 +++++--------
tests/qemumonitortestutils.c |  7 ++++++-
tests/qemusecuritytest.c     |  3 ++-
tests/qemuxml2argvtest.c     | 22 +++++++++++++---------
4 files changed, 26 insertions(+), 19 deletions(-)
[PATCH 0/2] tests: Don't destroy locked mutexes
Posted by Michal Privoznik 2 years, 2 months ago
I've rewritten our virMutexes to check for failures [1] and saw couple
of problems. In most cases we are destroying a locked mutex which these
patches aim to fix. Then we have virLogLock() which is locked in
virFork() and then unlocked from child (a different process) which is
problematic. Pthread doesn't like it when one thread locks a mutex only
so that another one unlocks it. Semaphores don't care.

Anyway, leave virLogMutex aside, patches here fix problems at hand.

BTW: commits v8.0.0-267-g39ac285c6b and v8.0.0-236-ga7201789ab were
spotted because of this branch of mine.

1: https://gitlab.com/MichalPrivoznik/libvirt/-/commits/tests_mutext/

Michal Prívozník (2):
  tests: Track if @vm was created by qemuMonitorCommonTestNew()
  tests: Destroy domain object properly

 tests/qemuhotplugtest.c      | 13 +++++--------
 tests/qemumonitortestutils.c |  7 ++++++-
 tests/qemusecuritytest.c     |  3 ++-
 tests/qemuxml2argvtest.c     | 22 +++++++++++++---------
 4 files changed, 26 insertions(+), 19 deletions(-)

-- 
2.34.1

Re: [PATCH 0/2] tests: Don't destroy locked mutexes
Posted by Daniel P. Berrangé 2 years, 2 months ago
On Mon, Feb 14, 2022 at 02:32:55PM +0100, Michal Privoznik wrote:
> I've rewritten our virMutexes to check for failures [1] and saw couple
> of problems. In most cases we are destroying a locked mutex which these
> patches aim to fix. Then we have virLogLock() which is locked in
> virFork() and then unlocked from child (a different process) which is
> problematic. Pthread doesn't like it when one thread locks a mutex only
> so that another one unlocks it. Semaphores don't care.

Despite behaviuor being undefined, in practice it works with any
impl we have in supported platforms.

The key problem we're addressing is that the child process needs
to inherit a known state for the mutex. Whether that state is
locked or unlocked doesn't matter, as long as it has a bulletproof
guarantee of what that status is.

We call virLogLock before fork() to guarantee a locked state,
by blocking on any other threads in the parent that might be
temporarily holding it.

The pthread_atfork() handler can be used todo the same trick
if you don't control the place where fork() is called. The
man page explicitly says

  https://linux.die.net/man/3/pthread_atfork

   "The expected usage is that the prepare handler acquires
    all mutex locks and the other two fork handlers release
    them."

I understand why error checking mutexes reject this but AFAIK
what we do is the only possible solution that exists, other
than never touching any mutexes at all in the child which is
not viable for us.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 0/2] tests: Don't destroy locked mutexes
Posted by Michal Prívozník 2 years, 2 months ago
On 2/14/22 15:01, Daniel P. Berrangé wrote:
> On Mon, Feb 14, 2022 at 02:32:55PM +0100, Michal Privoznik wrote:
>> I've rewritten our virMutexes to check for failures [1] and saw couple
>> of problems. In most cases we are destroying a locked mutex which these
>> patches aim to fix. Then we have virLogLock() which is locked in
>> virFork() and then unlocked from child (a different process) which is
>> problematic. Pthread doesn't like it when one thread locks a mutex only
>> so that another one unlocks it. Semaphores don't care.
> 
> Despite behaviuor being undefined, in practice it works with any
> impl we have in supported platforms.
> 

Indeed, that's why I haven't sent the patch that rewrites virLogMutex.

> The key problem we're addressing is that the child process needs
> to inherit a known state for the mutex. Whether that state is
> locked or unlocked doesn't matter, as long as it has a bulletproof
> guarantee of what that status is.
> 
> We call virLogLock before fork() to guarantee a locked state,
> by blocking on any other threads in the parent that might be
> temporarily holding it.

Again, I understand that. It's just that the following code returns an
error:

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    pthread_mutex_t l;
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_init(&l, &attr);

    pthread_mutex_lock(&l);
    if (fork() == 0) {
        /* child */
        int rc;

        if ((rc = pthread_mutex_unlock(&l)) != 0) {
            fprintf(stderr, "unlock error: %s\n", strerror(rc));
            abort();
        }
    }

    return 0;
}


Switching to a semaphore works again. I'm not advocating for that
though, it's needless because apparently pthread does the right thing on
all platforms we care about.

> 
> The pthread_atfork() handler can be used todo the same trick
> if you don't control the place where fork() is called. The
> man page explicitly says
> 
>   https://linux.die.net/man/3/pthread_atfork
> 
>    "The expected usage is that the prepare handler acquires
>     all mutex locks and the other two fork handlers release
>     them."
> 

Which is a terrible advice, let me say. This assumes that there is a
global list of mutexes (which there certainly is not in projects of our
size), but more importantly - this is so fragile to lock ordering than
anything else.

Michal

Re: [PATCH 0/2] tests: Don't destroy locked mutexes
Posted by Daniel P. Berrangé 2 years, 2 months ago
On Mon, Feb 14, 2022 at 03:27:51PM +0100, Michal Prívozník wrote:
> On 2/14/22 15:01, Daniel P. Berrangé wrote:
> > On Mon, Feb 14, 2022 at 02:32:55PM +0100, Michal Privoznik wrote:
> >> I've rewritten our virMutexes to check for failures [1] and saw couple
> >> of problems. In most cases we are destroying a locked mutex which these
> >> patches aim to fix. Then we have virLogLock() which is locked in
> >> virFork() and then unlocked from child (a different process) which is
> >> problematic. Pthread doesn't like it when one thread locks a mutex only
> >> so that another one unlocks it. Semaphores don't care.
> > 
> > Despite behaviuor being undefined, in practice it works with any
> > impl we have in supported platforms.
> > 
> 
> Indeed, that's why I haven't sent the patch that rewrites virLogMutex.
> 
> > The key problem we're addressing is that the child process needs
> > to inherit a known state for the mutex. Whether that state is
> > locked or unlocked doesn't matter, as long as it has a bulletproof
> > guarantee of what that status is.
> > 
> > We call virLogLock before fork() to guarantee a locked state,
> > by blocking on any other threads in the parent that might be
> > temporarily holding it.
> 
> Again, I understand that. It's just that the following code returns an
> error:
> 
> #include <pthread.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <stdio.h>
> #include <string.h>
> 
> int main(int argc, char *argv[])
> {
>     pthread_mutex_t l;
>     pthread_mutexattr_t attr;
> 
>     pthread_mutexattr_init(&attr);
>     pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
> 
>     pthread_mutex_init(&l, &attr);
> 
>     pthread_mutex_lock(&l);
>     if (fork() == 0) {
>         /* child */
>         int rc;
> 
>         if ((rc = pthread_mutex_unlock(&l)) != 0) {
>             fprintf(stderr, "unlock error: %s\n", strerror(rc));
>             abort();
>         }
>     }
> 
>     return 0;
> }
> 
> 
> Switching to a semaphore works again. I'm not advocating for that
> though, it's needless because apparently pthread does the right thing on
> all platforms we care about.

I was thinking an alternative was to call 'pthread_mutex_init' in
the child to blindly re-initialize the mutex, since we know the
parent had synchronize already. Downside though is that we could
leak resources if mutexes have some associated OS state we don't
know about.

The 'robust' mutex type looked quite attractive because it says

      PTHREAD_MUTEX_ROBUST
              If a mutex is initialized with the PTHREAD_MUTEX_ROBUST
              attribute and its owner dies without unlocking it,  any
              future  attempts  to call pthread_mutex_lock(3) on this
              mutex will succeed and return  EOWNERDEAD  to  indicate
              that  the original owner no longer exists and the mutex
              is in an inconsistent state.  Usually after  EOWNERDEAD
              is  returned,  the  next  owner should call pthread_mu‐
              tex_consistent(3) on the acquired mutex to make it con‐
              sistent again before using it any further.

IOW we could

    pthread_mutex_lock(&m)
    pid = fork()
    if (pid == 0) {
        pthread_mutex_lock(&m)
        pthread_mutex_consistent(&m)
    } else {
        pthread_mutex_unlock(&m)
    }

but in practice that doesn't work. The lock() call in the child
deadlocks instead of returning EOWNERDEAD. IOW, from POV of a
robust mutex the parent thread & child process are the same,
but from POV of an error chekcing mutex they are different.
Yay for consistency :-(

> > The pthread_atfork() handler can be used todo the same trick
> > if you don't control the place where fork() is called. The
> > man page explicitly says
> > 
> >   https://linux.die.net/man/3/pthread_atfork
> > 
> >    "The expected usage is that the prepare handler acquires
> >     all mutex locks and the other two fork handlers release
> >     them."
> > 
> 
> Which is a terrible advice, let me say. This assumes that there is a
> global list of mutexes (which there certainly is not in projects of our
> size), but more importantly - this is so fragile to lock ordering than
> anything else.

Yeah lock ordering would be a trainwreck.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|