The existing contention_begin/contention_end tracepoints fire on the
waiter side. The lock holder's identity and stack can be captured at
contention_begin time (e.g. perf lock contention --lock-owner), but
this reflects the holder's state when a waiter arrives, not when the
lock is actually released.

This series adds a contended_release tracepoint that fires on the
holder side when a lock with waiters is released. This provides:

- Hold time estimation: when the holder's own acquisition was
  contended, its contention_end (acquisition) and contended_release
  can be correlated to measure how long the lock was held under
  contention.

- The holder's stack at release time, which may differ from what perf
  lock contention --lock-owner captures if the holder does significant
  work between the waiter's arrival and the unlock.

Note: for reader/writer locks, the tracepoint fires for every reader
releasing while a writer is waiting, not only for the last reader.

v3 -> v4:
- Fix spurious events in __percpu_up_read(): guard with
  rcuwait_active(&sem->writer) to avoid tracing during the RCU grace
  period after a writer releases (Sashiko).
- Fix possible use-after-free in semaphore up(): move
  trace_contended_release() inside the sem->lock critical section
  (Sashiko).
- Fix build failure with CONFIG_PARAVIRT_SPINLOCKS=y: introduce
  queued_spin_release() as the arch-overridable unlock primitive, so
  queued_spin_unlock() can be a generic tracing wrapper. Convert x86
  (paravirt) and MIPS overrides (Sashiko).
- Add EXPORT_TRACEPOINT_SYMBOL_GPL(contended_release) for module
  support (Sashiko).
- Split spinning locks patch: factor out queued_spin_release() as a
  separate preparatory commit (Sashiko).
- Make read unlock tracepoint behavior consistent across all
  reader/writer lock types: fire for every reader releasing while a
  writer is waiting (rwsem, rwbase_rt were previously last-reader
  only).
v2 -> v3:
- Added new patch: extend contended_release tracepoint to queued
  spinlocks and queued rwlocks (marked as RFC, requesting feedback).
  This is prompted by Matthew Wilcox's suggestion to try to come up
  with generic instrumentation, instead of instrumenting each
  "special" lock manually. See [1] for the discussion.
- Reworked tracepoint placement to fire before the lock is released
  and before the waiter is woken where possible, for consistency with
  spinning locks where there is no explicit wake (inspired by Usama
  Arif's suggestion).
- Remove unnecessary linux/sched.h include from trace/events/lock.h.

RFC -> v2:
- Add trace_contended_release_enabled() guard before waiter checks
  that exist only for the tracepoint (Steven Rostedt).
- Rename __percpu_up_read_slowpath() to __percpu_up_read() (Peter
  Zijlstra).
- Add extern for __percpu_up_read() (Peter Zijlstra).
- Squashed tracepoint introduction and usage commits (Masami
  Hiramatsu).

v3: https://lore.kernel.org/all/cover.1773858853.git.d@ilvokhin.com/
v2: https://lore.kernel.org/all/cover.1773164180.git.d@ilvokhin.com/
RFC: https://lore.kernel.org/all/cover.1772642407.git.d@ilvokhin.com/

[1]: https://lore.kernel.org/all/aa7G1nD7Rd9F4eBH@casper.infradead.org/

Dmitry Ilvokhin (5):
  tracing/lock: Remove unnecessary linux/sched.h include
  locking/percpu-rwsem: Extract __percpu_up_read()
  locking: Add contended_release tracepoint to sleepable locks
  locking: Factor out queued_spin_release()
  locking: Add contended_release tracepoint to spinning locks

 arch/mips/include/asm/spinlock.h         |  6 +--
 arch/x86/include/asm/paravirt-spinlock.h |  6 +--
 include/asm-generic/qrwlock.h            | 48 ++++++++++++++++++++----
 include/asm-generic/qspinlock.h          | 33 ++++++++++++++--
 include/linux/percpu-rwsem.h             | 15 ++------
 include/trace/events/lock.h              | 18 ++++++++-
 kernel/locking/mutex.c                   |  4 ++
 kernel/locking/percpu-rwsem.c            | 29 ++++++++++++++
 kernel/locking/qrwlock.c                 | 16 ++++++++
 kernel/locking/qspinlock.c               |  8 ++++
 kernel/locking/rtmutex.c                 |  1 +
 kernel/locking/rwbase_rt.c               |  6 +++
 kernel/locking/rwsem.c                   | 10 ++++-
 kernel/locking/semaphore.c               |  4 ++
 14 files changed, 172 insertions(+), 32 deletions(-)

-- 
2.52.0
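The semaphore fix in the v3 -> v4 changelog (firing trace_contended_release() inside the sem->lock critical section) can be illustrated with a hedged, self-contained sketch. This is not the kernel code: the struct fields, the tracing stubs, and the wake step are simplified stand-ins for struct semaphore, the static-key guard, and __up().

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct semaphore. */
struct semaphore {
	unsigned int count;
	int waiters; /* stand-in for a non-empty wait_list */
};

/* Stubs modeling the static-key guard and the tracepoint itself. */
static bool trace_contended_release_enabled(void) { return true; }
static void trace_contended_release(struct semaphore *sem)
{
	printf("contended_release: sem=%p\n", (void *)sem);
}

void up(struct semaphore *sem)
{
	/* raw_spin_lock_irqsave(&sem->lock, flags) in the real code. */
	if (sem->waiters == 0) {
		sem->count++;
	} else {
		/*
		 * Fire while sem->lock is still held: once a waiter is
		 * woken it may free the semaphore, so dereferencing sem
		 * after the unlock would be a use-after-free.
		 */
		if (trace_contended_release_enabled())
			trace_contended_release(sem);
		/* __up(sem) would wake the first waiter here. */
		sem->waiters--;
	}
	/* raw_spin_unlock_irqrestore(&sem->lock, flags) in the real code. */
}
```

The guard mirrors the RFC -> v2 change: the waiter check and the trace call exist only for the tracepoint, so they are skipped entirely when the tracepoint is disabled.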
On Thu, 26 Mar 2026 15:09:59 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:

> The existing contention_begin/contention_end tracepoints fire on the
> waiter side. The lock holder's identity and stack can be captured at
> contention_begin time (e.g. perf lock contention --lock-owner), but
> this reflects the holder's state when a waiter arrives, not when the
> lock is actually released.
>
> This series adds a contended_release tracepoint that fires on the
> holder side when a lock with waiters is released. This provides:
>
> - Hold time estimation: when the holder's own acquisition was
>   contended, its contention_end (acquisition) and contended_release
>   can be correlated to measure how long the lock was held under
>   contention.
>
> - The holder's stack at release time, which may differ from what perf
>   lock contention --lock-owner captures if the holder does significant
>   work between the waiter's arrival and the unlock.
>
> Note: for reader/writer locks, the tracepoint fires for every reader
> releasing while a writer is waiting, not only for the last reader.

Would it be better to reorder the patches? It would help with git
bisectability as well. Move the refactoring work in patches 4 and 5
(excluding adding the tracepoints, of course) earlier, and then add all
the tracepoints in the same commit at the end? It would also help in the
future with git blame to see where all the tracepoints were added.
On Tue, Mar 31, 2026 at 03:27:03AM -0700, Usama Arif wrote:
> On Thu, 26 Mar 2026 15:09:59 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
>
> > The existing contention_begin/contention_end tracepoints fire on the
> > waiter side. The lock holder's identity and stack can be captured at
> > contention_begin time (e.g. perf lock contention --lock-owner), but
> > this reflects the holder's state when a waiter arrives, not when the
> > lock is actually released.
> >
> > This series adds a contended_release tracepoint that fires on the
> > holder side when a lock with waiters is released. This provides:
> >
> > - Hold time estimation: when the holder's own acquisition was
> >   contended, its contention_end (acquisition) and contended_release
> >   can be correlated to measure how long the lock was held under
> >   contention.
> >
> > - The holder's stack at release time, which may differ from what perf
> >   lock contention --lock-owner captures if the holder does significant
> >   work between the waiter's arrival and the unlock.
> >
> > Note: for reader/writer locks, the tracepoint fires for every reader
> > releasing while a writer is waiting, not only for the last reader.
>
> Would it be better to reorder the patches? It would help with git
> bisectability as well. Move the refactoring work in patches 4 and 5
> (excluding adding the tracepoints, of course) earlier, and then add
> all the tracepoints in the same commit at the end? It would also help
> in the future with git blame to see where all the tracepoints were
> added.

Thanks for the suggestion, Usama. I'd prefer to keep the current
ordering: each refactoring commit is immediately followed by the commit
that uses it. For example, queued_spin_release() is factored out right
before the commit that adds the tracepoint to spinning locks. This
makes the motivation for the refactoring clear and should also ease
review, since the context is still fresh.

Bisectability should be fine as-is: each commit compiles and works
independently, since the refactoring patches do not introduce
behavioral changes on their own.
On Thu, Mar 26, 2026 at 03:09:59PM +0000, Dmitry Ilvokhin wrote:
> The existing contention_begin/contention_end tracepoints fire on the
> waiter side. The lock holder's identity and stack can be captured at
> contention_begin time (e.g. perf lock contention --lock-owner), but
> this reflects the holder's state when a waiter arrives, not when the
> lock is actually released.
>
> This series adds a contended_release tracepoint that fires on the
> holder side when a lock with waiters is released. This provides:
>
> - Hold time estimation: when the holder's own acquisition was
>   contended, its contention_end (acquisition) and contended_release
>   can be correlated to measure how long the lock was held under
>   contention.
>
> - The holder's stack at release time, which may differ from what perf
>   lock contention --lock-owner captures if the holder does significant
>   work between the waiter's arrival and the unlock.

As someone who's not an expert in this area (so please use short words
to explain it to me), why do we want to know how long this holder took
to release the lock from when it became contended?

I understand why we want to know how long any given waiter had to wait
to gain the lock (but we already have tracepoints which show that).

I also don't understand why we want to know the holder's stack at
release time. The stack at contention-begin time will include the point
at which the lock was acquired, which should be correlated with where
the lock was released.

Perhaps examples might help me understand why we want this?
On Thu, Mar 26, 2026 at 03:55:21PM +0000, Matthew Wilcox wrote:
> On Thu, Mar 26, 2026 at 03:09:59PM +0000, Dmitry Ilvokhin wrote:
> > The existing contention_begin/contention_end tracepoints fire on the
> > waiter side. The lock holder's identity and stack can be captured at
> > contention_begin time (e.g. perf lock contention --lock-owner), but
> > this reflects the holder's state when a waiter arrives, not when the
> > lock is actually released.
> >
> > This series adds a contended_release tracepoint that fires on the
> > holder side when a lock with waiters is released. This provides:
> >
> > - Hold time estimation: when the holder's own acquisition was
> >   contended, its contention_end (acquisition) and contended_release
> >   can be correlated to measure how long the lock was held under
> >   contention.
> >
> > - The holder's stack at release time, which may differ from what
> >   perf lock contention --lock-owner captures if the holder does
> >   significant work between the waiter's arrival and the unlock.
>
> As someone who's not an expert in this area (so please use short words
> to explain it to me), why do we want to know how long this holder took
> to release the lock from when it became contended?
>
> I understand why we want to know how long any given waiter had to wait
> to gain the lock (but we already have tracepoints which show that).

I think the simplest way to think about it is the following: waiter
time is the symptom, while holder time is the cause.

The waiter-side contention_begin/contention_end tells us how long a
waiter waited, but that time can span multiple holders. If a waiter
waited 10 ms, we cannot tell whether one holder held the lock for 10 ms
or five holders held it for 2 ms each. These need different treatments:
the first means shrink the critical section, the second means reduce
lock frequency or split the lock. Today we cannot distinguish between
these cases from waiter-side data alone.

> I also don't understand why we want to know the holder's stack at
> release time. The stack at contention-begin time will include the
> point at which the lock was acquired, which should be correlated with
> where the lock was released.
>
> Perhaps examples might help me understand why we want this?

The holder's stack lets us understand exactly who waiters were waiting
for to release the lock. The stack at contention_begin time does not
always include the holder's stack: the --lock-owner feature works by
reading the owner field from the lock struct, but it only supports
mutex and rwsem. For spinlocks, queued rwlocks, semaphores, and several
others, the waiter has no visibility into the holder whatsoever.

contended_release fires in the holder's context, so we get the holder's
stack at release time. For spinlocks, this is the only way to get any
holder-side information.

The original motivation was zone lock contention (a spinlock) in Meta
production workloads. We could see waiters were blocked, but had no way
to identify the holders or what they were doing.
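The correlation described above reduces to simple arithmetic over trace timestamps. A hedged, self-contained illustration (not kernel or perf code; the record layout and helper name are made up for the example): a task's own contention_end marks the moment it acquired the lock after waiting, and its contended_release on the same lock marks the release, so the difference is the hold time under contention.

```c
#include <assert.h>

/* Hypothetical trace record: timestamp, contended lock address, task. */
struct lock_event {
	unsigned long long ts_ns;
	unsigned long lock_addr;
	int pid;
};

/*
 * Hold time under contention for one acquisition: the span from the
 * holder's own contention_end (it finished waiting and now owns the
 * lock) to its contended_release on the same lock. Returns 0 when the
 * two events cannot belong to the same acquisition.
 */
static unsigned long long hold_time_ns(const struct lock_event *acq,
				       const struct lock_event *rel)
{
	if (acq->lock_addr != rel->lock_addr || acq->pid != rel->pid ||
	    rel->ts_ns < acq->ts_ns)
		return 0;
	return rel->ts_ns - acq->ts_ns;
}
```

With such records, one 10 ms hold and five 2 ms holds produce the same total waiter delay but very different per-holder hold times, which is exactly the distinction waiter-side data alone cannot make.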
On Thu, 26 Mar 2026 15:55:21 +0000 Matthew Wilcox <willy@infradead.org> wrote:

> > - The holder's stack at release time, which may differ from what
> >   perf lock contention --lock-owner captures if the holder does
> >   significant work between the waiter's arrival and the unlock.
>
> As someone who's not an expert in this area (so please use short words
> to explain it to me), why do we want to know how long this holder took
> to release the lock from when it became contended?
>
> I understand why we want to know how long any given waiter had to wait
> to gain the lock (but we already have tracepoints which show that).
>
> I also don't understand why we want to know the holder's stack at
> release time. The stack at contention-begin time will include the
> point at which the lock was acquired, which should be correlated with
> where the lock was released.
>
> Perhaps examples might help me understand why we want this?

Dmitry could give his own rationale for this, but I have my own use
case: finding out how long the critical section is. If a lock is highly
contended by many tasks, you could get a high contention time simply
because other tasks are causing the delay for the waiter. Seeing the
release time and location would also let you know how long the critical
section was held, and whether the length of the critical section is
causing the contention.

Having a stack trace of the release would differentiate the path that
released the lock, as there can be many places that release them.
Although, I have to admit, I'm not sure there are many different places
locks are released, especially now that we have guard(), which will
make all the releases in a function happen at the same location.

-- Steve