From nobody Wed Jun 17 05:10:16 2026
Date: Mon, 27 Apr 2026 18:38:40 +0000
In-Reply-To: <20260427183848.698551-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References:
 <20260427183848.698551-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.545.g6539524ca2-goog
Message-ID: <20260427183848.698551-2-jstultz@google.com>
Subject: [PATCH 1/2] sched: proxy-exec: Close race causing workqueue work being delayed
From: John Stultz
To: LKML
Cc: John Stultz, Vineeth Pillai, Sonam Sanju, Sean Christopherson,
 Kunwu Chan, Tejun Heo, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Will Deacon, Waiman Long,
 Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
 K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal,
 kuyo chang, hupu, kernel-team@android.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Vineeth reported seeing a KVM-related deadlock connected to workqueue
lockups using the android17-6.18 tree, which has Proxy Execution
enabled (using the full patch stack). I've subsequently reproduced it
on v7.1-rc1 as well.

On further debugging he found:
- The kvm-irqfd-cleanup workqueue and rcu_gp land in a per-cpu
  pwq (workqueue pool).
- One kvm-irqfd-cleanup worker (say A) takes a mutex and then calls
  synchronize_srcu_expedited().
- Another kvm-irqfd-cleanup worker (say B) tries to acquire the lock
  and then gets blocked.
- On the way to blocking, this cpu gets an IPI and, on return from the
  IPI, calls __schedule() without having completed the workqueue
  accounting (worker->sleeping = 0 and decrementing pool->nr_running).
  This is done in sched_submit_work() -> wq_worker_sleeping(), called
  from schedule(), and we got preempted before that.
- Proxy execution doesn't immediately take it off the runqueue, as
  p->blocked_on is set during __mutex_lock.
- The next time B is picked to run, the scheduler notices that A (the
  mutex holder) is not on a runqueue and then blocks B:
  find_proxy_task() -> proxy_deactivate() -> block_task()
- And things are then stuck.
  A is waiting for the workqueue to be run, but B can't run the
  workqueue as it is blocked on A.

The trouble is that with Proxy Execution, in __mutex_lock_common() we
set the task state to TASK_UNINTERRUPTIBLE and set blocked_on before
calling into schedule(), where sched_submit_work() will be called.
But if an IPI comes in before we call schedule(), the return from the
interrupt will call __schedule(SM_PREEMPT) directly.

This causes the scheduler to see the current task as blocked_on and
deactivate it (because the owner is off the runqueue). Since it's
deactivated, it won't be run, and it won't get to call
sched_submit_work().

Without proxy-execution, the SM_PREEMPT case will prevent the task
from being dequeued, and it can be reselected and run again, which
will allow it to finish calling into schedule() and calling
sched_submit_work() before actually blocking.

So in the SM_PREEMPT case, if current is marked as blocked_on, we
need to clear the blocked_on state and set the task state to
TASK_RUNNING so the task can be selected to complete its call to
schedule() -> sched_submit_work().

Now, because we cleared blocked_on and set the task state to
TASK_RUNNING, the task can be selected and run again, loop back in
__mutex_lock_common(), where it can re-set the blocked_on state, and
call back into schedule() in order to properly be chosen as a donor.

Many thanks to Vineeth for figuring out this very obscure race and
for implementing a test tool to make it easily reproducible!

Reported-by: Vineeth Pillai
Tested-by: Vineeth Pillai
Signed-off-by: John Stultz
Reviewed-by: K Prateek Nayak
---
Cc: Vineeth Pillai
Cc: Sonam Sanju
Cc: Sean Christopherson
Cc: Kunwu Chan
Cc: Tejun Heo
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E.
 McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da20fb6ea25ae..5f684caefd8b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7097,6 +7097,17 @@ static void __sched notrace __schedule(int sched_mode)
 		try_to_block_task(rq, prev, &prev_state,
 				  !task_is_blocked(prev));
 		switch_count = &prev->nvcsw;
+	} else if (preempt && prev->blocked_on) {
+		/*
+		 * If we are SM_PREEMPT, we may have interrupted
+		 * after blocked_on was set, before schedule()
+		 * was run, preventing workqueues from running. So
+		 * clear blocked_on and mark task RUNNING so it
+		 * can be reselected to run and complete its
+		 * logic
+		 */
+		WRITE_ONCE(prev->__state, TASK_RUNNING);
+		clear_task_blocked_on(prev, NULL);
 	}
 
 pick_again:
-- 
2.54.0.545.g6539524ca2-goog

From nobody Wed Jun 17 05:10:16 2026
Date: Mon, 27 Apr 2026 18:38:41 +0000
In-Reply-To: <20260427183848.698551-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260427183848.698551-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.545.g6539524ca2-goog
Message-ID: <20260427183848.698551-3-jstultz@google.com>
Subject: [PATCH 2/2] locking: mutex: Fix proxy-exec potentially deactivating tasks marked TASK_RUNNING
From: John Stultz
To: LKML
Cc: John Stultz, Vineeth Pillai, Sonam Sanju, Sean Christopherson,
 Kunwu Chan, Tejun Heo, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Will Deacon, Waiman Long,
 Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
 K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal,
 kuyo chang, hupu, kernel-team@android.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Vineeth came up with a test driver that could trigger workqueue stalls.
After fixing one issue this test found, Vineeth reported the test was
still failing.

Greatly simplified, a task that tries to take a mutex already owned by
another task that is sleeping can hit an edge case in the
__mutex_lock_common() path.

If the task fails to get the lock and calls into schedule(), but then
gets a spurious wakeup, it will find that it is the first waiter and
go into the mutex_optimistic_spin() logic. Before calling
mutex_optimistic_spin(), however, we clear the task's blocked_on
state, since mutex_optimistic_spin() may call schedule() if
need_resched() is set.

After mutex_optimistic_spin() fails, we set blocked_on again, restart
the main mutex loop, try to take the lock and call into
schedule_preempt_disabled(). From there, with proxy-execution, we'll
see the task is blocked_on, follow the chain, see the owner is
sleeping and dequeue the waiting task from the runqueue.

This all sounds fine and reasonable. But what I had missed is that in
mutex_optimistic_spin(), not only do we call schedule(), we also set
the task state to TASK_RUNNING right before doing so. This is ok for
that invocation of schedule(). But when we come back, we re-set the
blocked_on state we had just cleared without re-setting the task
state to TASK_INTERRUPTIBLE/UNINTERRUPTIBLE.

This means we have a task that is both blocked_on and TASK_RUNNING, so
when the proxy execution code dequeues the task, we are in trouble,
since future wakeups will be short-circuited by the ttwu_state_match()
check.

Thus, to avoid this, after mutex_optimistic_spin(), set the task state
back when we set blocked_on.

Many, many thanks again to Vineeth for his very useful testing driver
that uncovered this long-hidden bug, which I hadn't tripped over in
all my testing! Very impressed with the problems he's uncovered!
Reported-by: Vineeth Pillai
Tested-by: Vineeth Pillai
Signed-off-by: John Stultz
Reviewed-by: K Prateek Nayak
---
Cc: Vineeth Pillai
Cc: Sonam Sanju
Cc: Sean Christopherson
Cc: Kunwu Chan
Cc: Tejun Heo
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/locking/mutex.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 09534628dc01a..a93d4c6bee1a3 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -763,6 +763,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 			raw_spin_lock_irqsave(&lock->wait_lock, flags);
 			raw_spin_lock(&current->blocked_lock);
 			__set_task_blocked_on(current, lock);
+			set_current_state(state);
 
 			if (opt_acquired)
 				break;
-- 
2.54.0.545.g6539524ca2-goog