From: Dmitry Ilvokhin <d@ilvokhin.com>
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Thomas Bogendoerfer, Juergen Gross, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Thomas Gleixner,
	Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin",
	Arnd Bergmann, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
Cc: linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	virtualization@lists.linux.dev, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	kernel-team@meta.com, Dmitry Ilvokhin <d@ilvokhin.com>
Subject: [PATCH v4 3/5] locking: Add contended_release tracepoint to sleepable locks
Date: Thu, 26 Mar 2026 15:10:02 +0000

Add the contended_release trace event. This tracepoint fires on the
holder side when a contended lock is released, complementing the
existing contention_begin/contention_end tracepoints, which fire on the
waiter side. This makes it possible to correlate lock hold time under
contention with waiter-side events by lock address.

Add trace_contended_release() calls to the slowpath unlock paths of the
sleepable locks: mutex, rtmutex, semaphore, rwsem, percpu-rwsem, and
the RT-specific rwbase locks.

Where possible, trace_contended_release() fires before the lock is
released and before the waiter is woken. For some lock types the
tracepoint fires after the release but before the wakeup; making the
placement consistent across all lock types is not worth the added
complexity.

For reader/writer locks, the tracepoint fires for every reader that
releases the lock while a writer is waiting, not only for the last
reader.

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
---
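Not part of the patch itself, but as an illustration of the intended
use: once applied, the event appears as lock:contended_release in
tracefs and can be consumed like any other exported tracepoint (BPF,
perf, or a module probe). Below is a minimal, hypothetical module
sketch (names invented, not posted with this series) that attaches a
probe and counts holder-side contended releases; registering from a
module works because the tracepoint is exported with
EXPORT_TRACEPOINT_SYMBOL_GPL() in mutex.c.

// SPDX-License-Identifier: GPL-2.0
/* Illustrative sketch only: count holder-side contended releases. */
#include <linux/module.h>
#include <linux/atomic.h>
#include <trace/events/lock.h>

static atomic64_t nr_contended = ATOMIC64_INIT(0);

/* Probe args: the data pointer passed at registration, then TP_PROTO args. */
static void probe_contended_release(void *data, void *lock)
{
	atomic64_inc(&nr_contended);
}

static int __init cr_sample_init(void)
{
	return register_trace_contended_release(probe_contended_release, NULL);
}

static void __exit cr_sample_exit(void)
{
	unregister_trace_contended_release(probe_contended_release, NULL);
	/* Wait for in-flight probes before the module text goes away. */
	tracepoint_synchronize_unregister();
	pr_info("observed %lld contended releases\n",
		(long long)atomic64_read(&nr_contended));
}

module_init(cr_sample_init);
module_exit(cr_sample_exit);
MODULE_LICENSE("GPL");

For ad hoc use, writing 1 to events/lock/contended_release/enable in
tracefs gives the same data without any module.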
 include/trace/events/lock.h   | 17 +++++++++++++++++
 kernel/locking/mutex.c        |  4 ++++
 kernel/locking/percpu-rwsem.c | 11 +++++++++++
 kernel/locking/rtmutex.c      |  1 +
 kernel/locking/rwbase_rt.c    |  6 ++++++
 kernel/locking/rwsem.c        | 10 ++++++++--
 kernel/locking/semaphore.c    |  4 ++++
 7 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
index da978f2afb45..1ded869cd619 100644
--- a/include/trace/events/lock.h
+++ b/include/trace/events/lock.h
@@ -137,6 +137,23 @@ TRACE_EVENT(contention_end,
 	TP_printk("%p (ret=%d)", __entry->lock_addr, __entry->ret)
 );
 
+TRACE_EVENT(contended_release,
+
+	TP_PROTO(void *lock),
+
+	TP_ARGS(lock),
+
+	TP_STRUCT__entry(
+		__field(void *, lock_addr)
+	),
+
+	TP_fast_assign(
+		__entry->lock_addr = lock;
+	),
+
+	TP_printk("%p", __entry->lock_addr)
+);
+
 #endif /* _TRACE_LOCK_H */
 
 /* This part must be outside protection */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 427187ff02db..6c2c9312eb8f 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -997,6 +997,9 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		wake_q_add(&wake_q, next);
 	}
 
+	if (trace_contended_release_enabled() && waiter)
+		trace_contended_release(lock);
+
 	if (owner & MUTEX_FLAG_HANDOFF)
 		__mutex_handoff(lock, next);
 
@@ -1194,6 +1197,7 @@ EXPORT_SYMBOL(ww_mutex_lock_interruptible);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(contention_begin);
 EXPORT_TRACEPOINT_SYMBOL_GPL(contention_end);
+EXPORT_TRACEPOINT_SYMBOL_GPL(contended_release);
 
 /**
  * atomic_dec_and_mutex_lock - return holding mutex if we dec to 0
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index f3ee7a0d6047..46b5903989b8 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -263,6 +263,9 @@ void percpu_up_write(struct percpu_rw_semaphore *sem)
 {
 	rwsem_release(&sem->dep_map, _RET_IP_);
 
+	if (trace_contended_release_enabled() && wq_has_sleeper(&sem->waiters))
+		trace_contended_release(sem);
+
 	/*
 	 * Signal the writer is done, no fast path yet.
 	 *
@@ -292,6 +295,14 @@ EXPORT_SYMBOL_GPL(percpu_up_write);
 void __percpu_up_read(struct percpu_rw_semaphore *sem)
 {
 	lockdep_assert_preemption_disabled();
+	/*
+	 * After percpu_up_write() completes, rcu_sync_is_idle() can still
+	 * return false during the grace period, forcing readers into this
+	 * slowpath. Only trace when a writer is actually waiting for
+	 * readers to drain.
+	 */
+	if (trace_contended_release_enabled() && rcuwait_active(&sem->writer))
+		trace_contended_release(sem);
 	/*
 	 * slowpath; reader will only ever wake a single blocked
 	 * writer.
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index ccaba6148b61..3db8a840b4e8 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1466,6 +1466,7 @@ static void __sched rt_mutex_slowunlock(struct rt_mutex_base *lock)
 		raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	}
 
+	trace_contended_release(lock);
 	/*
 	 * The wakeup next waiter path does not suffer from the above
 	 * race. See the comments there.
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 82e078c0665a..74da5601018f 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -174,6 +174,8 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 static __always_inline void rwbase_read_unlock(struct rwbase_rt *rwb,
 					       unsigned int state)
 {
+	if (trace_contended_release_enabled() && rt_mutex_owner(&rwb->rtmutex))
+		trace_contended_release(rwb);
 	/*
 	 * rwb->readers can only hit 0 when a writer is waiting for the
 	 * active readers to leave the critical section.
@@ -205,6 +207,8 @@ static inline void rwbase_write_unlock(struct rwbase_rt *rwb)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+	if (trace_contended_release_enabled() && rt_mutex_has_waiters(rtm))
+		trace_contended_release(rwb);
 	__rwbase_write_unlock(rwb, WRITER_BIAS, flags);
 }
 
@@ -214,6 +218,8 @@ static inline void rwbase_write_downgrade(struct rwbase_rt *rwb)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+	if (trace_contended_release_enabled() && rt_mutex_has_waiters(rtm))
+		trace_contended_release(rwb);
 	/* Release it and account current as reader */
 	__rwbase_write_unlock(rwb, WRITER_BIAS - 1, flags);
 }
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index bf647097369c..602d5fd3c91a 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1387,6 +1387,8 @@ static inline void __up_read(struct rw_semaphore *sem)
 	rwsem_clear_reader_owned(sem);
 	tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count);
 	DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);
+	if (trace_contended_release_enabled() && (tmp & RWSEM_FLAG_WAITERS))
+		trace_contended_release(sem);
 	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) ==
 		      RWSEM_FLAG_WAITERS)) {
 		clear_nonspinnable(sem);
@@ -1413,8 +1415,10 @@ static inline void __up_write(struct rw_semaphore *sem)
 	preempt_disable();
 	rwsem_clear_owner(sem);
 	tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
-	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
+	if (unlikely(tmp & RWSEM_FLAG_WAITERS)) {
+		trace_contended_release(sem);
 		rwsem_wake(sem);
+	}
 	preempt_enable();
 }
 
@@ -1437,8 +1441,10 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	tmp = atomic_long_fetch_add_release(
 		-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS, &sem->count);
 	rwsem_set_reader_owned(sem);
-	if (tmp & RWSEM_FLAG_WAITERS)
+	if (tmp & RWSEM_FLAG_WAITERS) {
+		trace_contended_release(sem);
 		rwsem_downgrade_wake(sem);
+	}
 	preempt_enable();
 }
 
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 74d41433ba13..35ac3498dca5 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -230,6 +230,10 @@ void __sched up(struct semaphore *sem)
 		sem->count++;
 	else
 		__up(sem, &wake_q);
+
+	if (trace_contended_release_enabled() && !wake_q_empty(&wake_q))
+		trace_contended_release(sem);
+
 	raw_spin_unlock_irqrestore(&sem->lock, flags);
 	if (!wake_q_empty(&wake_q))
 		wake_up_q(&wake_q);
-- 
2.52.0