[PATCH 1/2] virMutex: Warn on error

Tim Wiederhake posted 2 patches 3 months, 2 weeks ago
[PATCH 1/2] virMutex: Warn on error
Posted by Tim Wiederhake 3 months, 2 weeks ago
`pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock`
return an error code that is currently ignored.

Add debug information if one of these operations failed, e.g. when there
is an attempt to destroy a still locked mutex or unlock an already
unlocked mutex. Both scenarios are considered undefined behavior.

Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
---
 src/util/virthread.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/util/virthread.c b/src/util/virthread.c
index 5422bb74fd..14116a2221 100644
--- a/src/util/virthread.c
+++ b/src/util/virthread.c
@@ -35,7 +35,10 @@
 
 #include "viralloc.h"
 #include "virthreadjob.h"
+#include "virlog.h"
 
+#define VIR_FROM_THIS VIR_FROM_THREAD
+VIR_LOG_INIT("util.thread");
 
 int virOnce(virOnceControl *once, virOnceFunc init)
 {
@@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
 
 void virMutexDestroy(virMutex *m)
 {
-    pthread_mutex_destroy(&m->lock);
+    if (pthread_mutex_destroy(&m->lock)) {
+        VIR_WARN("Failed to destroy mutex=%p", m);
+    }
 }
 
 void virMutexLock(virMutex *m)
 {
-    pthread_mutex_lock(&m->lock);
+    if (pthread_mutex_lock(&m->lock)) {
+        VIR_WARN("Failed to lock mutex=%p", m);
+    }
 }
 
 void virMutexUnlock(virMutex *m)
 {
-    pthread_mutex_unlock(&m->lock);
+    if (pthread_mutex_unlock(&m->lock)) {
+        VIR_WARN("Failed to unlock mutex=%p", m);
+    }
 }
 
 virLockGuard virLockGuardLock(virMutex *m)
-- 
2.43.0
Re: [PATCH 1/2] virMutex: Warn on error
Posted by Daniel P. Berrangé 3 months, 2 weeks ago
On Wed, Jul 03, 2024 at 02:44:37PM +0200, Tim Wiederhake wrote:
> `pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock`
> return an error code that is currently ignored.
> 
> Add debug information if one of these operations failed, e.g. when there
> is an attempt to destroy a still locked mutex or unlock an already
> unlocked mutex. Both scenarios are considered undefined behavior.
> 
> Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
> ---
>  src/util/virthread.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/src/util/virthread.c b/src/util/virthread.c
> index 5422bb74fd..14116a2221 100644
> --- a/src/util/virthread.c
> +++ b/src/util/virthread.c
> @@ -35,7 +35,10 @@
>  
>  #include "viralloc.h"
>  #include "virthreadjob.h"
> +#include "virlog.h"
>  
> +#define VIR_FROM_THIS VIR_FROM_THREAD
> +VIR_LOG_INIT("util.thread");
>  
>  int virOnce(virOnceControl *once, virOnceFunc init)
>  {
> @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
>  
>  void virMutexDestroy(virMutex *m)
>  {
> -    pthread_mutex_destroy(&m->lock);
> +    if (pthread_mutex_destroy(&m->lock)) {
> +        VIR_WARN("Failed to destroy mutex=%p", m);
> +    }
>  }
>  
>  void virMutexLock(virMutex *m)
>  {
> -    pthread_mutex_lock(&m->lock);
> +    if (pthread_mutex_lock(&m->lock)) {
> +        VIR_WARN("Failed to lock mutex=%p", m);
> +    }
>  }
>  
>  void virMutexUnlock(virMutex *m)
>  {
> -    pthread_mutex_unlock(&m->lock);
> +    if (pthread_mutex_unlock(&m->lock)) {
> +        VIR_WARN("Failed to unlock mutex=%p", m);
> +    }
>  }

I'd be surprised if these lock/unlock warnings ever trigger, since IIUC
they would need us to be using an error checking mutex, not a regular
mutex. IOW, aren't these just adding condition test overhead + unreachable
code to the lock calls ?

The 2nd patch shows failures in the destroy calls IIUC.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 1/2] virMutex: Warn on error
Posted by Tim Wiederhake 3 months ago
On Thu, 2024-07-04 at 08:10 +0100, Daniel P. Berrangé wrote:
> On Wed, Jul 03, 2024 at 02:44:37PM +0200, Tim Wiederhake wrote:
> > `pthread_mutex_destroy`, `pthread_mutex_lock` and
> > `pthread_mutex_unlock`
> > return an error code that is currently ignored.
> > 
> > Add debug information if one of these operations failed, e.g. when
> > there
> > is an attempt to destroy a still locked mutex or unlock an already
> > unlocked mutex. Both scenarios are considered undefined behavior.
> > 
> > Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
> > ---
> >  src/util/virthread.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/util/virthread.c b/src/util/virthread.c
> > index 5422bb74fd..14116a2221 100644
> > --- a/src/util/virthread.c
> > +++ b/src/util/virthread.c
> > @@ -35,7 +35,10 @@
> >  
> >  #include "viralloc.h"
> >  #include "virthreadjob.h"
> > +#include "virlog.h"
> >  
> > +#define VIR_FROM_THIS VIR_FROM_THREAD
> > +VIR_LOG_INIT("util.thread");
> >  
> >  int virOnce(virOnceControl *once, virOnceFunc init)
> >  {
> > @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
> >  
> >  void virMutexDestroy(virMutex *m)
> >  {
> > -    pthread_mutex_destroy(&m->lock);
> > +    if (pthread_mutex_destroy(&m->lock)) {
> > +        VIR_WARN("Failed to destroy mutex=%p", m);
> > +    }
> >  }
> >  
> >  void virMutexLock(virMutex *m)
> >  {
> > -    pthread_mutex_lock(&m->lock);
> > +    if (pthread_mutex_lock(&m->lock)) {
> > +        VIR_WARN("Failed to lock mutex=%p", m);
> > +    }
> >  }
> >  
> >  void virMutexUnlock(virMutex *m)
> >  {
> > -    pthread_mutex_unlock(&m->lock);
> > +    if (pthread_mutex_unlock(&m->lock)) {
> > +        VIR_WARN("Failed to unlock mutex=%p", m);
> > +    }
> >  }
> 
> I'd be surprised if these lock/unlock warnings ever trigger, since
> IIUC
> they would need us to be using an error checking mutex, not a regular
> mutex. IOW, aren't these just adding condition test overhead +
> unreachable
> code to the lock calls ?
> 
> The 2nd patch shows failures in the destroy calls IIUC.
> 
> With regards,
> Daniel

I have looked more closely into the issue now. pthread_mutex_lock and
pthread_mutex_unlock do indeed not return a non-zero value over us not
using error checking mutexes.

During my last attempt at fixing the issues I had a patch that would
count lockings and unlockings of mutexes explicitly, and I believe I
recall seeing problems in that area as well. Sadly, I cannot reproduce
that now, at least not reliably: Ignoring the warnings for
pthread_mutex_destroy, virnetdaemontest does seem to trigger my "number
of locks == number of unlocks" check in about 3 out of 10.000 runs. And
sometimes with a frequency of 1 in 10. Sometimes not at all. In any
case: I do not consider the checks for locking / unlocking dead code.

So far I have been using the test suite to check for obvious issues,
but I cannot rule out that libvirt itself has race conditions too.

I would advocate for merging this patch as is, and add a patch to
enable error checking for the mutexes.

Regards,
Tim
Re: [PATCH 1/2] virMutex: Warn on error
Posted by Daniel P. Berrangé 3 months ago
On Thu, Jul 18, 2024 at 03:50:27PM +0200, Tim Wiederhake wrote:
> On Thu, 2024-07-04 at 08:10 +0100, Daniel P. Berrangé wrote:
> > On Wed, Jul 03, 2024 at 02:44:37PM +0200, Tim Wiederhake wrote:
> > > `pthread_mutex_destroy`, `pthread_mutex_lock` and
> > > `pthread_mutex_unlock`
> > > return an error code that is currently ignored.
> > > 
> > > Add debug information if one of these operations failed, e.g. when
> > > there
> > > is an attempt to destroy a still locked mutex or unlock an already
> > > unlocked mutex. Both scenarios are considered undefined behavior.
> > > 
> > > Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
> > > ---
> > >  src/util/virthread.c | 15 ++++++++++++---
> > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/src/util/virthread.c b/src/util/virthread.c
> > > index 5422bb74fd..14116a2221 100644
> > > --- a/src/util/virthread.c
> > > +++ b/src/util/virthread.c
> > > @@ -35,7 +35,10 @@
> > >  
> > >  #include "viralloc.h"
> > >  #include "virthreadjob.h"
> > > +#include "virlog.h"
> > >  
> > > +#define VIR_FROM_THIS VIR_FROM_THREAD
> > > +VIR_LOG_INIT("util.thread");
> > >  
> > >  int virOnce(virOnceControl *once, virOnceFunc init)
> > >  {
> > > @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
> > >  
> > >  void virMutexDestroy(virMutex *m)
> > >  {
> > > -    pthread_mutex_destroy(&m->lock);
> > > +    if (pthread_mutex_destroy(&m->lock)) {
> > > +        VIR_WARN("Failed to destroy mutex=%p", m);
> > > +    }
> > >  }
> > >  
> > >  void virMutexLock(virMutex *m)
> > >  {
> > > -    pthread_mutex_lock(&m->lock);
> > > +    if (pthread_mutex_lock(&m->lock)) {
> > > +        VIR_WARN("Failed to lock mutex=%p", m);
> > > +    }
> > >  }
> > >  
> > >  void virMutexUnlock(virMutex *m)
> > >  {
> > > -    pthread_mutex_unlock(&m->lock);
> > > +    if (pthread_mutex_unlock(&m->lock)) {
> > > +        VIR_WARN("Failed to unlock mutex=%p", m);
> > > +    }
> > >  }
> > 
> > I'd be surprised if these lock/unlock warnings ever trigger, since
> > IIUC
> > they would need us to be using an error checking mutex, not a regular
> > mutex. IOW, aren't these just adding condition test overhead +
> > unreachable
> > code to the lock calls ?
> > 
> > The 2nd patch shows failures in the destroy calls IIUC.
> 
> I have looked more closely into the issue now. pthread_mutex_lock and
> pthread_mutex_unlock do indeed not return a non-zero value over us not
> using error checking mutexes.
> 
> During my last attempt at fixing the issues I had a patch that would
> count lockings and unlockings of mutexes explicitly, and I believe I
> recall seeing problems in that area as well. Sadly, I cannot reproduce
> that now, at least not reliably: Ignoring the warnings for
> pthread_mutex_destroy, virnetdaemontest does seem to trigger my "number
> of locks == number of unlocks" check in about 3 out of 10.000 runs. And
> sometimes with a frequency of 1 in 10. Sometimes not at all. In any
> case: I do not consider the checks for locking / unlocking dead code.
> 
> So far I have been using the test suite to check for obvious issues,
> but I cannot rule out that libvirt itself has race conditions too.

But if the checks never fire, due to the mutex type, this isn't
helping us diagnose anything surely ?

> I would advocate for merging this patch as is, and add a patch to
> enable error checking for the mutexes.

Error checking mutexes aren't something we can easily enable, because
they break any code which needs to hold a mutex across fork() and
unlock in the child. IOW, we would need two different types of mutex
and pick which to use, where.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 1/2] virMutex: Warn on error
Posted by Michal Prívozník 3 months, 2 weeks ago
On 7/3/24 14:44, Tim Wiederhake wrote:
> `pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock`
> return an error code that is currently ignored.
> 
> Add debug information if one of these operations failed, e.g. when there
> is an attempt to destroy a still locked mutex or unlock an already
> unlocked mutex. Both scenarios are considered undefined behavior.
> 
> Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
> ---
>  src/util/virthread.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/src/util/virthread.c b/src/util/virthread.c
> index 5422bb74fd..14116a2221 100644
> --- a/src/util/virthread.c
> +++ b/src/util/virthread.c
> @@ -35,7 +35,10 @@
>  
>  #include "viralloc.h"
>  #include "virthreadjob.h"
> +#include "virlog.h"
>  
> +#define VIR_FROM_THIS VIR_FROM_THREAD
> +VIR_LOG_INIT("util.thread");
>  
>  int virOnce(virOnceControl *once, virOnceFunc init)
>  {
> @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
>  
>  void virMutexDestroy(virMutex *m)
>  {
> -    pthread_mutex_destroy(&m->lock);
> +    if (pthread_mutex_destroy(&m->lock)) {
> +        VIR_WARN("Failed to destroy mutex=%p", m);
> +    }
>  }
>  
>  void virMutexLock(virMutex *m)
>  {
> -    pthread_mutex_lock(&m->lock);
> +    if (pthread_mutex_lock(&m->lock)) {
> +        VIR_WARN("Failed to lock mutex=%p", m);
> +    }
>  }
>  
>  void virMutexUnlock(virMutex *m)
>  {
> -    pthread_mutex_unlock(&m->lock);
> +    if (pthread_mutex_unlock(&m->lock)) {
> +        VIR_WARN("Failed to unlock mutex=%p", m);
> +    }
>  }
>  
>  virLockGuard virLockGuardLock(virMutex *m)

I'm a bit worried that without additional info (e.g. stacktrace) this
info is going to be useless. Though, even stacktrace might be not enough
:( But this can at least let us know there's something suspicious going on.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Michal