From: Peter Zijlstra <peterz@infradead.org>
Track the blocked-on relation for mutexes, to allow following this
relation at schedule time.
    task
      | blocked-on
      v
    mutex
      | owner
      v
    task
This will all be used for tracking blocked-task/mutex chains
with the proxy-execution patches, in a similar fashion to how
priority inheritance is done with rt_mutexes.
For serialization, blocked-on is only set by the task itself
(current). Both setting and clearing (the latter potentially done
by other tasks) are done while holding the mutex::wait_lock.
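
[Illustrative sketch, not part of this patch: with ->blocked_on in
place, the blocked-task/mutex chain can be followed roughly as below.
proxy_chain_tail() is a made-up name; a real walk must hold each
mutex::wait_lock along the way, and __mutex_owner() is the private
helper from kernel/locking/mutex.h.]

	static struct task_struct *proxy_chain_tail(struct task_struct *p)
	{
		struct task_struct *owner = p;
		struct mutex *m;

		/* task -> mutex via ->blocked_on, mutex -> task via owner */
		while ((m = owner->blocked_on)) {
			struct task_struct *t = __mutex_owner(m);

			if (!t)
				break;	/* mutex is mid-release */
			owner = t;
		}
		return owner;
	}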
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[minor changes while rebasing]
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Connor O'Brien <connoro@google.com>
[jstultz: Fix blocked_on tracking in __mutex_lock_common in error paths]
Signed-off-by: John Stultz <jstultz@google.com>
---
v2:
* Fixed blocked_on tracking in error paths that was causing crashes
v4:
* Ensure we clear blocked_on when waking ww_mutexes to die or wound.
This is critical so we don't get circular blocked_on relationships
that can't be resolved.
v5:
* Fix potential bug where the skip_wait path might clear blocked_on
when that path never set it
* Slight tweaks to where we set blocked_on to make it consistent,
along with extra WARN_ON correctness checking
* Minor comment changes
v7:
* Minor commit message change suggested by Metin Kaya
* Fix WARN_ON conditionals in unlock path (as blocked_on might already
be cleared), found while looking at an issue Metin Kaya raised.
* Minor tweaks to be consistent in what we do under the
blocked_on lock, also tweaked variable name to avoid confusion
with label, and comment typos, as suggested by Metin Kaya
* Minor tweak for CONFIG_SCHED_PROXY_EXEC name change
* Moved unused block of code to later in the series, as suggested
by Metin Kaya
* Switch to a tri-state to be able to distinguish between waking and
runnable so we can later safely do return migration from ttwu
* Folded together with related blocked_on changes
v8:
* Fix issue leaving task BO_BLOCKED when calling into optimistic
spinning path.
* Include helper to better handle BO_BLOCKED->BO_WAKING transitions
v9:
* Typo fixup pointed out by Metin
* Cleanup BO_WAKING->BO_RUNNABLE transitions for the !proxy case
* Many cleanups and simplifications suggested by Metin
v11:
* Whitespace fixup pointed out by Metin
v13:
* Refactor set_blocked_on helpers to clean things up a bit
v14:
* Small build fixup with PREEMPT_RT
v15:
* Improve consistency of names for functions that assume blocked_lock
is held, as suggested by Peter
* Use guard instead of separate spinlock/unlock calls, also suggested
by Peter
* Drop blocked_on_state tri-state for now, as it's not needed until
later in the series, when we get to proxy-migration and
return-migration.
---
 include/linux/sched.h        |  5 +----
 kernel/fork.c                |  3 +--
 kernel/locking/mutex-debug.c |  9 +++++----
 kernel/locking/mutex.c       | 19 +++++++++++++++++++
 kernel/locking/ww_mutex.h    | 18 ++++++++++++++++--
 5 files changed, 42 insertions(+), 12 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1462f2c70aefc..03775c44b7073 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1212,10 +1212,7 @@ struct task_struct {
 	struct rt_mutex_waiter		*pi_blocked_on;
 #endif
 
-#ifdef CONFIG_DEBUG_MUTEXES
-	/* Mutex deadlock detection: */
-	struct mutex_waiter		*blocked_on;
-#endif
+	struct mutex			*blocked_on;	/* lock we're blocked on */
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 	int				non_block_count;
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f32..38f055082d07a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2357,9 +2357,8 @@ __latent_entropy struct task_struct *copy_process(
 	lockdep_init_task(p);
 #endif
 
-#ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
-#endif
+
 #ifdef CONFIG_BCACHE
 	p->sequential_io	= 0;
 	p->sequential_io_avg	= 0;
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 6e6f6071cfa27..758b7a6792b0c 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -53,17 +53,18 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 {
 	lockdep_assert_held(&lock->wait_lock);
 
-	/* Mark the current thread as blocked on the lock: */
-	task->blocked_on = waiter;
+	/* Current thread can't be already blocked (since it's executing!) */
+	DEBUG_LOCKS_WARN_ON(task->blocked_on);
 }
 
 void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 			 struct task_struct *task)
 {
+	struct mutex *blocked_on = READ_ONCE(task->blocked_on);
+
 	DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list));
 	DEBUG_LOCKS_WARN_ON(waiter->task != task);
-	DEBUG_LOCKS_WARN_ON(task->blocked_on != waiter);
-	task->blocked_on = NULL;
+	DEBUG_LOCKS_WARN_ON(blocked_on && blocked_on != lock);
 
 	INIT_LIST_HEAD(&waiter->list);
 	waiter->task = NULL;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index b36f23de48f1b..37d1966970617 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -627,6 +627,8 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		goto err_early_kill;
 	}
 
+	WARN_ON(current->blocked_on);
+	current->blocked_on = lock;
 	set_current_state(state);
 	trace_contention_begin(lock, LCB_F_MUTEX);
 	for (;;) {
@@ -663,6 +665,12 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 
 		first = __mutex_waiter_is_first(lock, &waiter);
 
+		/*
+		 * As we likely have been woken up by task
+		 * that has cleared our blocked_on state, re-set
+		 * it to the lock we are trying to acquire.
+		 */
+		current->blocked_on = lock;
 		set_current_state(state);
 		/*
 		 * Here we order against unlock; we must either see it change
@@ -683,6 +691,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	}
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 acquired:
+	current->blocked_on = NULL;
 	__set_current_state(TASK_RUNNING);
 
 	if (ww_ctx) {
@@ -712,9 +721,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	return 0;
 
 err:
+	current->blocked_on = NULL;
 	__set_current_state(TASK_RUNNING);
 	__mutex_remove_waiter(lock, &waiter);
 err_early_kill:
+	WARN_ON(current->blocked_on);
 	trace_contention_end(lock, ret);
 	raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
 	debug_mutex_free_waiter(&waiter);
@@ -924,6 +935,14 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		next = waiter->task;
 
 		debug_mutex_wake_waiter(lock, waiter);
+		/*
+		 * Unlock wakeups can be happening in parallel
+		 * (when optimistic spinners steal and release
+		 * the lock), so blocked_on may already be
+		 * cleared here.
+		 */
+		WARN_ON(next->blocked_on && next->blocked_on != lock);
+		next->blocked_on = NULL;
 		wake_q_add(&wake_q, next);
 	}
 
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 37f025a096c9d..00db40946328e 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -284,6 +284,14 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 #ifndef WW_RT
 		debug_mutex_wake_waiter(lock, waiter);
 #endif
+		/*
+		 * When waking up the task to die, be sure to clear the
+		 * blocked_on pointer. Otherwise we can see circular
+		 * blocked_on relationships that can't resolve.
+		 */
+		WARN_ON(waiter->task->blocked_on &&
+			waiter->task->blocked_on != lock);
+		waiter->task->blocked_on = NULL;
 		wake_q_add(wake_q, waiter->task);
 	}
 
@@ -331,9 +339,15 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
 		 * it's wounded in __ww_mutex_check_kill() or has a
 		 * wakeup pending to re-read the wounded state.
 		 */
-		if (owner != current)
+		if (owner != current) {
+			/*
+			 * When waking up the task to wound, be sure to clear the
+			 * blocked_on pointer. Otherwise we can see circular
+			 * blocked_on relationships that can't resolve.
+			 */
+			owner->blocked_on = NULL;
 			wake_q_add(wake_q, owner);
-
+		}
 		return true;
 	}
 
--
2.49.0.rc0.332.g42c0ae87b1-goog

On Wed, 12 Mar 2025 15:11:32 -0700
John Stultz <jstultz@google.com> wrote:

> [ full patch quoted above; snipped ]

FYI, this is useful for Masami's "hung task" work that will show what
tasks a hung task is blocked on in a crash report.

https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/

-- Steve

On Thu, Mar 13, 2025 at 06:13:51AM -0400, Steven Rostedt wrote:
>
> FYI, this is useful for Masami's "hung task" work that will show what
> tasks a hung task is blocked on in a crash report.

Yeah, but I would really rather not have that interfere.

On Thu, Mar 13, 2025 at 3:14 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> FYI, this is useful for Masami's "hung task" work that will show what
> tasks a hung task is blocked on in a crash report.
>
> https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/

Ah. Indeed, we have similar use cases. There's some slight difference
in when we consider the task blocked, especially in this early patch
(as waking tasks mark us as unblocked so we can be selected to run).
But later on in the series (in the portions I've not yet submitted
here) when the blocked_on_state has been introduced, the blocked_on
value approximates to about the same spot as used here.

So I should be able to unify these. It looks like Masami's patch is
close to being queued, so maybe I'll incorporate it into my series and
rework my set on top. Any objections to this?

thanks
-john
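
[For reference, the blocked_on_state tri-state mentioned above, as
named in the changelog's BO_* transitions; the exact definition is an
assumption, since that part of the series isn't posted here:]

	enum blocked_on_state {
		BO_RUNNABLE,	/* not blocked, or wakeup fully processed */
		BO_BLOCKED,	/* blocked on a mutex */
		BO_WAKING,	/* woken; may still need return migration */
	};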
On Thu, 13 Mar 2025 23:12:57 -0700
John Stultz <jstultz@google.com> wrote:
> On Thu, Mar 13, 2025 at 3:14 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > FYI, this is useful for Masami's "hung task" work that will show what
> > tasks a hung task is blocked on in a crash report.
> >
> > https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/
> >
>
> Ah. Indeed, we have similar use cases. There's some slight difference
> in when we consider the task blocked, especially in this early patch
> (as waking tasks mark us as unblocked so we can be selected to run).
> But later on in the series (in the portions I've not yet submitted
> here) when the blocked_on_state has been introduced, the blocked_on
> value approximates to about the same spot as used here.

Interesting. Can you also track tasks which take other locks like
rwsem/semaphore? Lance is also working on expanding this to
support semaphores.

https://lore.kernel.org/all/20250314144300.32542-1-ioworker0@gmail.com/

Please add them for the next version.
>
> So I should be able to unify these. It looks like Masami's patch is
> close to being queued, so maybe I'll incorporate it into my series and
> rework my set on top. Any objections to this?

No :) Please Cc me.

BTW, I had a chat with Suleiman and he suggested I expand
this idea to record which locks the task takes. Then we can search
for all tasks holding the lock. Something like,

struct task_struct {
	unsigned long blocking_on;
	unsigned long holding_locks[HOLDING_LOCK_MAX];
	unsigned int holding_idx;
};

lock(lock_addr) {
	if (succeeded_to_lock) {
		current->holding_locks[current->holding_idx++] = lock_addr;
	} else {
		record_blocking_on(current, lock_addr)
		wait_for_lock();
		clear_blocking_on(current, lock_addr)
	}
}

unlock(lock_addr) {
	current->holding_locks[--current->holding_idx] = 0UL;
}

And when we find a hung task, call dump_blocker() like this:

dump_blocker() {
	lock_addr = hung_task->blocking_on;
	for_each_task(task) {
		if (find_lock(task->holding_locks, lock_addr)) {
			dump_task(task);
			/* semaphore, rwsem will need to dump all holders. */
			if (lock is mutex)
				break;
		}
	}
}

This can be too much but an interesting idea to find semaphore type blocker.

Thank you,
>
> thanks
> -john
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On Tue, Mar 18, 2025 at 3:11 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> On Thu, 13 Mar 2025 23:12:57 -0700
> John Stultz <jstultz@google.com> wrote:
>
> > On Thu, Mar 13, 2025 at 3:14 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > > FYI, this is useful for Masami's "hung task" work that will show what
> > > tasks a hung task is blocked on in a crash report.
> > >
> > > https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/
> > >
> >
> > Ah. Indeed, we have similar use cases. There's some slight difference
> > in when we consider the task blocked, especially in this early patch
> > (as waking tasks mark us as unblocked so we can be selected to run).
> > But later on in the series (in the portions I've not yet submitted
> > here) when the blocked_on_state has been introduced, the blocked_on
> > value approximates to about the same spot as used here.
>
> Interesting. Can you also track tasks which take other locks like
> rwsem/semaphore? Lance is also working on expanding this to
> support semaphores.

Currently no, proxy-exec is initially just focused on kernel mutexes.
However, I do hope to expand it to be usable with other locking
primitives, so something like what Lance is proposing would be needed
for that, and I'm eager to make use of his work.

I've pulled both of your work into my tree and will try to rework my
logic on top.

> BTW, I had a chat with Suleiman and he suggested I expand
> this idea to record which locks the task takes. Then we can search
> for all tasks holding the lock. Something like,
>
> struct task_struct {
> 	unsigned long blocking_on;
> 	unsigned long holding_locks[HOLDING_LOCK_MAX];
> 	unsigned int holding_idx;
> };
>
> lock(lock_addr) {
> 	if (succeeded_to_lock) {
> 		current->holding_locks[current->holding_idx++] = lock_addr;
> 	} else {
> 		record_blocking_on(current, lock_addr)
> 		wait_for_lock();
> 		clear_blocking_on(current, lock_addr)
> 	}
> }
>
> unlock(lock_addr) {
> 	current->holding_locks[--current->holding_idx] = 0UL;
> }
>
> And when we find a hung task, call dump_blocker() like this:
>
> dump_blocker() {
> 	lock_addr = hung_task->blocking_on;
> 	for_each_task(task) {
> 		if (find_lock(task->holding_locks, lock_addr)) {
> 			dump_task(task);
> 			/* semaphore, rwsem will need to dump all holders. */
> 			if (lock is mutex)
> 				break;
> 		}
> 	}
> }
>
> This can be too much but an interesting idea to find semaphore type blocker.

Yeah. I suspect the rw/sem -> owners mapping is a missing piece that
will be needed for proxy-exec, but I've not looked closely yet.
thanks
-john
On Tue, Mar 18, 2025 at 10:11 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Thu, 13 Mar 2025 23:12:57 -0700
> John Stultz <jstultz@google.com> wrote:
>
> > On Thu, Mar 13, 2025 at 3:14 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > > FYI, this is useful for Masami's "hung task" work that will show what
> > > tasks a hung task is blocked on in a crash report.
> > >
> > > https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/
> > >
> >
> > Ah. Indeed, we have similar use cases. There's some slight difference
> > in when we consider the task blocked, especially in this early patch
> > (as waking tasks mark us as unblocked so we can be selected to run).
> > But later on in the series (in the portions I've not yet submitted
> > here) when the blocked_on_state has been introduced, the blocked_on
> > value approximates to about the same spot as used here.
>
> Interesting. Can you also track tasks which take other locks like
> rwsem/semaphore? Lance is also working on expanding this to
> support semaphores.
>
> https://lore.kernel.org/all/20250314144300.32542-1-ioworker0@gmail.com/
>
> Please add them for the next version.

Hi John,

When’s the next version expected? I intend to send a new version out
soon, and it’d be great if you could include it in the next version ;)

Also, since we have similar use cases, it might make sense to use
the same field to store the lock, encoding the lock type in the LSB as
Masami suggested.
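
[For illustration, a minimal sketch of that LSB-encoding idea; all
names here are hypothetical, and it assumes lock pointers are at least
4-byte aligned so the low two bits are free:]

	#define BLOCKER_TYPE_MUTEX	0UL
	#define BLOCKER_TYPE_SEM	1UL
	#define BLOCKER_TYPE_RWSEM	2UL
	#define BLOCKER_TYPE_MASK	3UL

	/* 'blocker' would be a single unsigned long field in task_struct */
	static inline void set_blocker(struct task_struct *p,
				       void *lock, unsigned long type)
	{
		/* The type tag rides in the otherwise-unused low bits. */
		WRITE_ONCE(p->blocker, (unsigned long)lock | type);
	}

	static inline void *blocker_lock(struct task_struct *p)
	{
		/* Mask the tag off to recover the lock pointer. */
		return (void *)(READ_ONCE(p->blocker) & ~BLOCKER_TYPE_MASK);
	}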
>
> >
> > So I should be able to unify these. It looks like Masami's patch is
> > close to being queued, so maybe I'll incorporate it into my series and
> > rework my set on top. Any objections to this?
>
> No :) Please Cc me.
>
>
> BTW, I had a chat with Suleiman and he suggested I expand
> this idea to record which locks the task takes. Then we can search
> for all tasks holding the lock. Something like,

Hi Masami,

Yeah, that’s a really cool idea - being able to search for all tasks
holding a lock. Maybe we just keep it simple for now and extend it
when we need to handle more complex stuff like rwsem ;)

Thanks,
Lance

On Tue, Mar 18, 2025 at 4:33 PM Lance Yang <ioworker0@gmail.com> wrote:
>
> When’s the next version expected? I intend to send a new version out
> soon, and it’d be great if you could include it in the next version ;)

Yeah, I've pulled in your old version already, but I'll update my tree
with your new one. Between conf travel and vacation, it might be
April before I get v16 out.

Most likely I'll be moving much of linux/hung_task.h into
linux/sched.h to get generic accessors for setting and clearing the
task blocked-on relationship (so it's not just tied to the hung task
debug logic). Then for proxy I just need to integrate it with the
additional blocked_on_state that is used to determine if the
(previously blocked, but left on the rq) task is runnable or not.

> Also, since we have similar use cases, it might make sense to use
> the same field to store the lock, encoding the lock type in the LSB as
> Masami suggested.

Yep. Agreed. As I mentioned earlier, proxy only works with mutexes at
the moment, but I do intend to expand that once we have the core proxy
logic well tested upstream, so the multi-type blocked-on relationship
is really useful to have.

thanks!
-john
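
[A rough sketch of such generic accessors; names are hypothetical, but
the semantics follow this patch's rule that blocked_on is only set by
current and is set/cleared under mutex::wait_lock:]

	static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
	{
		lockdep_assert_held(&m->wait_lock);
		/* Can't block on a second mutex while already blocked. */
		WARN_ON_ONCE(p->blocked_on);
		WRITE_ONCE(p->blocked_on, m);
	}

	static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
	{
		lockdep_assert_held(&m->wait_lock);
		/* Unlock wakeups can race, so blocked_on may already be clear. */
		WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
		WRITE_ONCE(p->blocked_on, NULL);
	}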

On Wed, Mar 19, 2025 at 5:49 PM John Stultz <jstultz@google.com> wrote:
>
> On Tue, Mar 18, 2025 at 4:33 PM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > When’s the next version expected? I intend to send a new version out
> > soon, and it’d be great if you could include it in the next version ;)
>
> Yeah, I've pulled in your old version already, but I'll update my tree
> with your new one. Between conf travel and vacation, it might be
> April before I get v16 out.

Appreciate it! I've sent a new one[1] out today and also Cc'd you.
Will keep you in the loop if anything changes ;)

> Most likely I'll be moving much of linux/hung_task.h into
> linux/sched.h to get generic accessors for setting and clearing the
> task blocked-on relationship (so it's not just tied to the hung task
> debug logic). Then for proxy I just need to integrate it with the
> additional blocked_on_state that is used to determine if the
> (previously blocked, but left on the rq) task is runnable or not.

Yes, that makes sense to me. Feel free to adjust as needed ;)

> > Also, since we have similar use cases, it might make sense to use
> > the same field to store the lock, encoding the lock type in the LSB as
> > Masami suggested.
>
> Yep. Agreed. As I mentioned earlier, proxy only works with mutexes at
> the moment, but I do intend to expand that once we have the core proxy
> logic well tested upstream, so the multi-type blocked-on relationship
> is really useful to have.

Yeah, let's keep it simple for now and expand it later once the core
proxy logic is good to go ;)

Thanks again for your time!
Lance

On Thu, 13 Mar 2025 23:12:57 -0700
John Stultz <jstultz@google.com> wrote:

> So I should be able to unify these. It looks like Masami's patch is
> close to being queued, so maybe I'll incorporate it into my series and
> rework my set on top. Any objections to this?

None by me.

-- Steve