include/qemu/thread-posix.h | 1 + util/qemu-thread-common.h | 10 ++++++++++ util/qemu-thread-posix.c | 4 ++-- 3 files changed, 13 insertions(+), 2 deletions(-)
NOTE: mark patchset RFC because "make check" will easily fail; but I didn't
yet dig into why as I'm not familiar with the code paths that triggers, it
can be bugs hidden or something I missed. So RFC to just have some thoughts.
The first patch converts the new timedwait to use DEBUG_MUTEX paths too.
IMO this one is pretty much wanted. The second patch add a strict version
of pthread_mutex_unlock() check by making sure the lock is locked first.
This comes from a debugging of migration code where we have had functions
like:
/* func() must be with lockA held */
func() {
...
/* Temporarily release the lock */
qemu_mutex_unlock(lockA);
...
/* Retake the lock */
qemu_mutex_lock(lockA);
...
}
Since I found that pthread lib is very "friendly" to unpaired unlock and
just silently ignore it, returning 0 as succeed. It means when func() is
called without lockA held the unlock() above will be ignored, but the
follow up lock() will be real. Later it will easily cause deadlock
afterwards in func() above because they just don't pair anymore right after
the one ignored unlock().
Since it's harder to know where should we take the lock, it's still easily
to fail the unlock() upon a lock not being held at all, so it's at least
earlier than the deadlock later on lockA in some other thread.
Patch 2 can also be used to further replace [sg]et_iothread_locked(), I
think, then we need to move the "locked" to be outside DEBUG_MUTEX but only
keep the assert() inside. But we can discuss that later.
Comments welcomed, thanks.
Peter Xu (2):
qemu-thread: Enable the new timedwait to use DEBUG_MUTEX too
qemu-thread: Fail hard for suspecious mutex unlocks
include/qemu/thread-posix.h | 1 +
util/qemu-thread-common.h | 10 ++++++++++
util/qemu-thread-posix.c | 4 ++--
3 files changed, 13 insertions(+), 2 deletions(-)
--
2.37.3
On Tue, Oct 11, 2022 at 06:41:52PM -0400, Peter Xu wrote: > NOTE: mark patchset RFC because "make check" will easily fail; but I didn't > yet dig into why as I'm not familiar with the code paths that triggers, it > can be bugs hidden or something I missed. So RFC to just have some thoughts. I just noticed (after reminded from Dave) that the reclock was actually the recursive lock, which definitely won't work with patch 2 at all. OTOH I also noticed PTHREAD_MUTEX_ERRORCHECK which does the same unlock check that we can leverage (and it'll also check re-lock from the same thread which causes deadlock). I'll give that a shot instead. Please ignore this version. Patch 1 is still meaningful I think, but anyway I'll repost. Sorry for the noise. -- Peter Xu
On Wed, 12 Oct 2022 at 19:16, Peter Xu <peterx@redhat.com> wrote: > > On Tue, Oct 11, 2022 at 06:41:52PM -0400, Peter Xu wrote: > > NOTE: mark patchset RFC because "make check" will easily fail; but I didn't > > yet dig into why as I'm not familiar with the code paths that triggers, it > > can be bugs hidden or something I missed. So RFC to just have some thoughts. > > I just noticed (after reminded from Dave) that the reclock was actually the > recursive lock, which definitely won't work with patch 2 at all. > > OTOH I also noticed PTHREAD_MUTEX_ERRORCHECK which does the same unlock > check that we can leverage (and it'll also check re-lock from the same > thread which causes deadlock). I'll give that a shot instead. We used to use PTHREAD_MUTEX_ERRORCHECK, but stopped because it does not work with the idiom we use for handling mutexes across fork() where you take the lock in the parent, and then unlock it in the child after the fork. With glibc's implementation of PTHREAD_MUTEX_ERRORCHECK the unlock in the child fails. See commit 24fa90499f8b24bcba29 from 2015. thanks -- PMM
On Thu, Oct 13, 2022 at 11:31:20AM +0100, Peter Maydell wrote: > We used to use PTHREAD_MUTEX_ERRORCHECK, but stopped because it > does not work with the idiom we use for handling mutexes across > fork() where you take the lock in the parent, and then unlock it > in the child after the fork. With glibc's implementation of > PTHREAD_MUTEX_ERRORCHECK the unlock in the child fails. See > commit 24fa90499f8b24bcba29 from 2015. Oops, thanks Peter. -- Peter Xu
© 2016 - 2026 Red Hat, Inc.