[PATCH RFC 0/2] qemu-thread: Strict unlock check

Peter Xu posted 2 patches 3 years, 3 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20221011224154.644379-1-peterx@redhat.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>
include/qemu/thread-posix.h |  1 +
util/qemu-thread-common.h   | 10 ++++++++++
util/qemu-thread-posix.c    |  4 ++--
3 files changed, 13 insertions(+), 2 deletions(-)
[PATCH RFC 0/2] qemu-thread: Strict unlock check
Posted by Peter Xu 3 years, 3 months ago
NOTE: mark patchset RFC because "make check" will easily fail; but I didn't
yet dig into why as I'm not familiar with the code paths that triggers, it
can be bugs hidden or something I missed.  So RFC to just have some thoughts.

The first patch converts the new timedwait to use DEBUG_MUTEX paths too.
IMO this one is pretty much wanted.  The second patch add a strict version
of pthread_mutex_unlock() check by making sure the lock is locked first.

This comes from a debugging of migration code where we have had functions
like:

  /* func() must be with lockA held */
  func() {
    ...
    /* Temporarily release the lock */
    qemu_mutex_unlock(lockA);
    ...
    /* Retake the lock */
    qemu_mutex_lock(lockA);
    ...
  }

Since I found that pthread lib is very "friendly" to unpaired unlock and
just silently ignore it, returning 0 as succeed.  It means when func() is
called without lockA held the unlock() above will be ignored, but the
follow up lock() will be real.  Later it will easily cause deadlock
afterwards in func() above because they just don't pair anymore right after
the one ignored unlock().

Since it's harder to know where should we take the lock, it's still easily
to fail the unlock() upon a lock not being held at all, so it's at least
earlier than the deadlock later on lockA in some other thread.

Patch 2 can also be used to further replace [sg]et_iothread_locked(), I
think, then we need to move the "locked" to be outside DEBUG_MUTEX but only
keep the assert() inside.  But we can discuss that later.

Comments welcomed, thanks.

Peter Xu (2):
  qemu-thread: Enable the new timedwait to use DEBUG_MUTEX too
  qemu-thread: Fail hard for suspecious mutex unlocks

 include/qemu/thread-posix.h |  1 +
 util/qemu-thread-common.h   | 10 ++++++++++
 util/qemu-thread-posix.c    |  4 ++--
 3 files changed, 13 insertions(+), 2 deletions(-)

-- 
2.37.3
Re: [PATCH RFC 0/2] qemu-thread: Strict unlock check
Posted by Peter Xu 3 years, 3 months ago
On Tue, Oct 11, 2022 at 06:41:52PM -0400, Peter Xu wrote:
> NOTE: mark patchset RFC because "make check" will easily fail; but I didn't
> yet dig into why as I'm not familiar with the code paths that triggers, it
> can be bugs hidden or something I missed.  So RFC to just have some thoughts.

I just noticed (after reminded from Dave) that the reclock was actually the
recursive lock, which definitely won't work with patch 2 at all.

OTOH I also noticed PTHREAD_MUTEX_ERRORCHECK which does the same unlock
check that we can leverage (and it'll also check re-lock from the same
thread which causes deadlock).  I'll give that a shot instead.

Please ignore this version.  Patch 1 is still meaningful I think, but
anyway I'll repost.  Sorry for the noise.

-- 
Peter Xu
Re: [PATCH RFC 0/2] qemu-thread: Strict unlock check
Posted by Peter Maydell 3 years, 3 months ago
On Wed, 12 Oct 2022 at 19:16, Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Oct 11, 2022 at 06:41:52PM -0400, Peter Xu wrote:
> > NOTE: mark patchset RFC because "make check" will easily fail; but I didn't
> > yet dig into why as I'm not familiar with the code paths that triggers, it
> > can be bugs hidden or something I missed.  So RFC to just have some thoughts.
>
> I just noticed (after reminded from Dave) that the reclock was actually the
> recursive lock, which definitely won't work with patch 2 at all.
>
> OTOH I also noticed PTHREAD_MUTEX_ERRORCHECK which does the same unlock
> check that we can leverage (and it'll also check re-lock from the same
> thread which causes deadlock).  I'll give that a shot instead.

We used to use PTHREAD_MUTEX_ERRORCHECK, but stopped because it
does not work with the idiom we use for handling mutexes across
fork() where you take the lock in the parent, and then unlock it
in the child after the fork. With glibc's implementation of
PTHREAD_MUTEX_ERRORCHECK the unlock in the child fails. See
commit 24fa90499f8b24bcba29 from 2015.

thanks
-- PMM
Re: [PATCH RFC 0/2] qemu-thread: Strict unlock check
Posted by Peter Xu 3 years, 3 months ago
On Thu, Oct 13, 2022 at 11:31:20AM +0100, Peter Maydell wrote:
> We used to use PTHREAD_MUTEX_ERRORCHECK, but stopped because it
> does not work with the idiom we use for handling mutexes across
> fork() where you take the lock in the parent, and then unlock it
> in the child after the fork. With glibc's implementation of
> PTHREAD_MUTEX_ERRORCHECK the unlock in the child fails. See
> commit 24fa90499f8b24bcba29 from 2015.

Oops, thanks Peter.

-- 
Peter Xu