From nobody Tue Apr 7 12:52:09 2026
Date: Fri, 13 Mar 2026 02:30:02 +0000
In-Reply-To: <20260313023022.2902479-1-jstultz@google.com>
References: <20260313023022.2902479-1-jstultz@google.com>
Message-ID: <20260313023022.2902479-2-jstultz@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: [PATCH v25 1/9] sched: Make class_schedulers avoid pushing current, and get rid of proxy_tag_curr()
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com

With proxy-execution, the scheduler selects the donor, but for blocked
donors, we end up running the lock owner. This caused some complexity,
because the class schedulers make sure to remove the task they pick
from their pushable task lists, which prevents the donor from being
migrated, but there wasn't then anything to prevent rq->curr from
being migrated if rq->curr != rq->donor.

This was sort of hacked around by calling proxy_tag_curr() on the
rq->curr task if we were running something other than the donor.
proxy_tag_curr() did a dequeue/enqueue pair on the rq->curr task,
allowing the class schedulers to remove it from their pushable list.

The dequeue/enqueue pair was wasteful, and additionally K Prateek
highlighted that we didn't properly undo things when we stopped
proxying, leaving the lock owner off the pushable list.

After some alternative approaches were considered, Peter suggested
having the RT/DL classes simply avoid migrating a task when it is
task_on_cpu(). So rework pick_next_pushable_dl_task() and the RT
pick_next_pushable_task() so that they skip over the first pushable
task if it is on_cpu, and then drop all of the proxy_tag_curr() logic.
Fixes: be39617e38e0 ("sched: Fix proxy/current (push,pull)ability")
Reported-by: K Prateek Nayak
Closes: https://lore.kernel.org/lkml/e735cae0-2cc9-4bae-b761-fcb082ed3e94@amd.com/
Suggested-by: Peter Zijlstra
Signed-off-by: John Stultz
---
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
---
 kernel/sched/core.c     | 24 ------------------------
 kernel/sched/deadline.c | 16 ++++++++++++++--
 kernel/sched/rt.c       | 15 ++++++++++++---
 3 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7f77c165a6e0..d86d648a75a4b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6702,23 +6702,6 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 }
 #endif /* SCHED_PROXY_EXEC */
 
-static inline void proxy_tag_curr(struct rq *rq, struct task_struct *owner)
-{
-	if (!sched_proxy_exec())
-		return;
-	/*
-	 * pick_next_task() calls set_next_task() on the chosen task
-	 * at some point, which ensures it is not push/pullable.
-	 * However, the chosen/donor task *and* the mutex owner form an
-	 * atomic pair wrt push/pull.
-	 *
-	 * Make sure owner we run is not pushable. Unfortunately we can
-	 * only deal with that by means of a dequeue/enqueue cycle. :-/
-	 */
-	dequeue_task(rq, owner, DEQUEUE_NOCLOCK | DEQUEUE_SAVE);
-	enqueue_task(rq, owner, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE);
-}
-
 /*
  * __schedule() is the main scheduler function.
@@ -6871,9 +6854,6 @@ static void __sched notrace __schedule(int sched_mode)
 	 */
 	RCU_INIT_POINTER(rq->curr, next);
 
-	if (!task_current_donor(rq, next))
-		proxy_tag_curr(rq, next);
-
 	/*
 	 * The membarrier system call requires each architecture
 	 * to have a full memory barrier after updating
@@ -6907,10 +6887,6 @@ static void __sched notrace __schedule(int sched_mode)
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
 	} else {
-		/* In case next was already curr but just got blocked_donor */
-		if (!task_current_donor(rq, next))
-			proxy_tag_curr(rq, next);
-
 		rq_unpin_lock(rq, &rf);
 		__balance_callbacks(rq, NULL);
 		raw_spin_rq_unlock_irq(rq);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d08b004293234..4e746f4de6529 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2801,12 +2801,24 @@ static int find_later_rq(struct task_struct *task)
 
 static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
 {
-	struct task_struct *p;
+	struct task_struct *p = NULL;
+	struct rb_node *next_node;
 
 	if (!has_pushable_dl_tasks(rq))
 		return NULL;
 
-	p = __node_2_pdl(rb_first_cached(&rq->dl.pushable_dl_tasks_root));
+	next_node = rb_first_cached(&rq->dl.pushable_dl_tasks_root);
+	while (next_node) {
+		p = __node_2_pdl(next_node);
+		/* make sure task isn't on_cpu (possible with proxy-exec) */
+		if (!task_on_cpu(rq, p))
+			break;
+
+		next_node = rb_next(next_node);
+	}
+
+	if (!p)
+		return NULL;
 
 	WARN_ON_ONCE(rq->cpu != task_cpu(p));
 	WARN_ON_ONCE(task_current(rq, p));
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f69e1f16d9238..61569b622d1a3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1853,13 +1853,22 @@ static int find_lowest_rq(struct task_struct *task)
 
 static struct task_struct *pick_next_pushable_task(struct rq *rq)
 {
-	struct task_struct *p;
+	struct plist_head *head = &rq->rt.pushable_tasks;
+	struct task_struct *i, *p = NULL;
 
 	if (!has_pushable_tasks(rq))
 		return NULL;
 
-	p = plist_first_entry(&rq->rt.pushable_tasks,
-			      struct task_struct, pushable_tasks);
+	plist_for_each_entry(i, head, pushable_tasks) {
+		/* make sure task isn't on_cpu (possible with proxy-exec) */
+		if (!task_on_cpu(rq, i)) {
+			p = i;
+			break;
+		}
+	}
+
+	if (!p)
+		return NULL;
 
 	BUG_ON(rq->cpu != task_cpu(p));
 	BUG_ON(task_current(rq, p));
-- 
2.53.0.880.g73c4285caa-goog

From nobody Tue Apr 7 12:52:09 2026
Date: Fri, 13 Mar 2026 02:30:03 +0000
In-Reply-To: <20260313023022.2902479-1-jstultz@google.com>
References: <20260313023022.2902479-1-jstultz@google.com>
Message-ID: <20260313023022.2902479-3-jstultz@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: [PATCH v25 2/9] sched: Minimise repeated sched_proxy_exec() checking
From: John Stultz
To: LKML
Cc: John Stultz, Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com

Peter noted:

  Compilers are really bad (as in they utterly refuse) at optimizing
  (even when marked with __pure) the static branch things, and will
  happily emit multiple identical checks in a row.

So pull out the one obvious sched_proxy_exec() branch in __schedule()
and remove some of the 'implicit' ones in that path.

Suggested-by: Peter Zijlstra
Signed-off-by: John Stultz
Reviewed-by: K Prateek Nayak
---
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
---
 kernel/sched/core.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d86d648a75a4b..84c61496fa263 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6597,11 +6597,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 	struct mutex *mutex;
 
 	/* Follow blocked_on chain. */
-	for (p = donor; task_is_blocked(p); p = owner) {
-		mutex = p->blocked_on;
-		/* Something changed in the chain, so pick again */
-		if (!mutex)
-			return NULL;
+	for (p = donor; (mutex = p->blocked_on); p = owner) {
 		/*
 		 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
 		 * and ensure @owner sticks around.
@@ -6832,12 +6828,14 @@ static void __sched notrace __schedule(int sched_mode)
 	next = pick_next_task(rq, rq->donor, &rf);
 	rq_set_donor(rq, next);
 	rq->next_class = next->sched_class;
-	if (unlikely(task_is_blocked(next))) {
-		next = find_proxy_task(rq, next, &rf);
-		if (!next)
-			goto pick_again;
-		if (next == rq->idle)
-			goto keep_resched;
+	if (sched_proxy_exec()) {
+		if (unlikely(next->blocked_on)) {
+			next = find_proxy_task(rq, next, &rf);
+			if (!next)
+				goto pick_again;
+			if (next == rq->idle)
+				goto keep_resched;
+		}
 	}
 picked:
 	clear_tsk_need_resched(prev);
-- 
2.53.0.880.g73c4285caa-goog

From nobody Tue Apr 7 12:52:09 2026
Date: Fri, 13 Mar 2026 02:30:04 +0000
In-Reply-To: <20260313023022.2902479-1-jstultz@google.com>
References: <20260313023022.2902479-1-jstultz@google.com>
Message-ID: <20260313023022.2902479-4-jstultz@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: [PATCH v25 3/9] locking: Add task::blocked_lock to serialize blocked_on state
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com

So far, we have been able to use the mutex::wait_lock for serializing
the blocked_on state, but when we move to proxying across runqueues,
we will need to add more state and a way to serialize changes to this
state in contexts where we don't hold the mutex::wait_lock.

So introduce the task::blocked_lock, which nests under the
mutex::wait_lock in the locking order, and rework the locking to
use it.

Signed-off-by: John Stultz
Reviewed-by: K Prateek Nayak
---
v15:
* Split back out into later in the series
v16:
* Fixups to mark tasks unblocked before sleeping in mutex_optimistic_spin()
* Rework to use guard() as suggested by Peter
v19:
* Rework logic for PREEMPT_RT issues reported by K Prateek Nayak
v21:
* After recently thinking more on the ww_mutex code, I reworked the
  blocked_lock usage in mutex lock to avoid having to take nested locks
  in the ww_mutex paths, as I was concerned the lock ordering constraints
  weren't as strong as I had previously thought.
v22:
* Added some extra spaces to avoid dense code blocks, as suggested by K Prateek
v23:
* Move get_task_blocked_on() to kernel/locking/mutex.h as requested by PeterZ

Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
---
 include/linux/sched.h        | 48 +++++++++++++-----------------
 init/init_task.c             |  1 +
 kernel/fork.c                |  1 +
 kernel/locking/mutex-debug.c |  4 +--
 kernel/locking/mutex.c       | 40 +++++++++++++++++++-----------
 kernel/locking/mutex.h       |  6 +++++
 kernel/locking/ww_mutex.h    |  4 +--
 kernel/sched/core.c          |  4 ++-
 8 files changed, 58 insertions(+), 50 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a7b4a980eb2f0..f9924ec02c4f2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1238,6 +1238,7 @@ struct task_struct {
 #endif
 
 	struct mutex *blocked_on;	/* lock we're blocked on */
+	raw_spinlock_t blocked_lock;
 
 #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
 	/*
@@ -2181,57 +2182,42 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock) __must_hold(lock);
 #ifndef CONFIG_PREEMPT_RT
 static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
 {
-	struct mutex *m = p->blocked_on;
-
-	if (m)
-		lockdep_assert_held_once(&m->wait_lock);
-	return m;
+	lockdep_assert_held_once(&p->blocked_lock);
+	return p->blocked_on;
 }
 
 static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
 {
-	struct mutex *blocked_on = READ_ONCE(p->blocked_on);
-
 	WARN_ON_ONCE(!m);
 	/* The task should only be setting itself as blocked */
 	WARN_ON_ONCE(p != current);
-	/* Currently we serialize blocked_on under the mutex::wait_lock */
-	lockdep_assert_held_once(&m->wait_lock);
+	/* Currently we serialize blocked_on under the task::blocked_lock */
+	lockdep_assert_held_once(&p->blocked_lock);
 	/*
 	 * Check ensure we don't overwrite existing mutex value
 	 * with a different mutex. Note, setting it to the same
 	 * lock repeatedly is ok.
*/ - WARN_ON_ONCE(blocked_on && blocked_on !=3D m); - WRITE_ONCE(p->blocked_on, m); -} - -static inline void set_task_blocked_on(struct task_struct *p, struct mutex= *m) -{ - guard(raw_spinlock_irqsave)(&m->wait_lock); - __set_task_blocked_on(p, m); + WARN_ON_ONCE(p->blocked_on && p->blocked_on !=3D m); + p->blocked_on =3D m; } =20 static inline void __clear_task_blocked_on(struct task_struct *p, struct m= utex *m) { - if (m) { - struct mutex *blocked_on =3D READ_ONCE(p->blocked_on); - - /* Currently we serialize blocked_on under the mutex::wait_lock */ - lockdep_assert_held_once(&m->wait_lock); - /* - * There may be cases where we re-clear already cleared - * blocked_on relationships, but make sure we are not - * clearing the relationship with a different lock. - */ - WARN_ON_ONCE(blocked_on && blocked_on !=3D m); - } - WRITE_ONCE(p->blocked_on, NULL); + /* Currently we serialize blocked_on under the task::blocked_lock */ + lockdep_assert_held_once(&p->blocked_lock); + /* + * There may be cases where we re-clear already cleared + * blocked_on relationships, but make sure we are not + * clearing the relationship with a different lock. 
+ */ + WARN_ON_ONCE(m && p->blocked_on && p->blocked_on !=3D m); + p->blocked_on =3D NULL; } =20 static inline void clear_task_blocked_on(struct task_struct *p, struct mut= ex *m) { - guard(raw_spinlock_irqsave)(&m->wait_lock); + guard(raw_spinlock_irqsave)(&p->blocked_lock); __clear_task_blocked_on(p, m); } #else diff --git a/init/init_task.c b/init/init_task.c index 5c838757fc10e..b5f48ebdc2b6e 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -169,6 +169,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = =3D { .journal_info =3D NULL, INIT_CPU_TIMERS(init_task) .pi_lock =3D __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock), + .blocked_lock =3D __RAW_SPIN_LOCK_UNLOCKED(init_task.blocked_lock), .timer_slack_ns =3D 50000, /* 50 usec default slack */ .thread_pid =3D &init_struct_pid, .thread_node =3D LIST_HEAD_INIT(init_signals.thread_head), diff --git a/kernel/fork.c b/kernel/fork.c index 65113a304518a..f233316ffad42 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2076,6 +2076,7 @@ __latent_entropy struct task_struct *copy_process( ftrace_graph_init_task(p); =20 rt_mutex_init_task(p); + raw_spin_lock_init(&p->blocked_lock); =20 lockdep_assert_irqs_enabled(); #ifdef CONFIG_PROVE_LOCKING diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c index 2c6b02d4699be..cc6aa9c6e9813 100644 --- a/kernel/locking/mutex-debug.c +++ b/kernel/locking/mutex-debug.c @@ -54,13 +54,13 @@ void debug_mutex_add_waiter(struct mutex *lock, struct = mutex_waiter *waiter, lockdep_assert_held(&lock->wait_lock); =20 /* Current thread can't be already blocked (since it's executing!) 
*/ - DEBUG_LOCKS_WARN_ON(__get_task_blocked_on(task)); + DEBUG_LOCKS_WARN_ON(get_task_blocked_on(task)); } =20 void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *wa= iter, struct task_struct *task) { - struct mutex *blocked_on =3D __get_task_blocked_on(task); + struct mutex *blocked_on =3D get_task_blocked_on(task); =20 DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list)); DEBUG_LOCKS_WARN_ON(waiter->task !=3D task); diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 2a1d165b3167e..4aa79bcab08c7 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -656,6 +656,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas goto err_early_kill; } =20 + raw_spin_lock(¤t->blocked_lock); __set_task_blocked_on(current, lock); set_current_state(state); trace_contention_begin(lock, LCB_F_MUTEX); @@ -669,8 +670,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas * the handoff. */ if (__mutex_trylock(lock)) - goto acquired; + break; =20 + raw_spin_unlock(¤t->blocked_lock); /* * Check for signals and kill conditions while holding * wait_lock. This ensures the lock cancellation is ordered @@ -693,12 +695,14 @@ __mutex_lock_common(struct mutex *lock, unsigned int = state, unsigned int subclas =20 first =3D __mutex_waiter_is_first(lock, &waiter); =20 + raw_spin_lock_irqsave(&lock->wait_lock, flags); + raw_spin_lock(¤t->blocked_lock); /* * As we likely have been woken up by task * that has cleared our blocked_on state, re-set * it to the lock we are trying to acquire. 
 	 */
-	set_task_blocked_on(current, lock);
+	__set_task_blocked_on(current, lock);
 	set_current_state(state);
 	/*
 	 * Here we order against unlock; we must either see it change
@@ -709,25 +713,33 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 			break;

 		if (first) {
-			trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
+			bool opt_acquired;
+
 			/*
 			 * mutex_optimistic_spin() can call schedule(), so
-			 * clear blocked on so we don't become unselectable
+			 * we need to release these locks before calling it,
+			 * and clear blocked on so we don't become unselectable
 			 * to run.
 			 */
-			clear_task_blocked_on(current, lock);
-			if (mutex_optimistic_spin(lock, ww_ctx, &waiter))
+			__clear_task_blocked_on(current, lock);
+			raw_spin_unlock(&current->blocked_lock);
+			raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+
+			trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
+			opt_acquired = mutex_optimistic_spin(lock, ww_ctx, &waiter);
+
+			raw_spin_lock_irqsave(&lock->wait_lock, flags);
+			raw_spin_lock(&current->blocked_lock);
+			__set_task_blocked_on(current, lock);
+
+			if (opt_acquired)
 				break;
-			set_task_blocked_on(current, lock);
 			trace_contention_begin(lock, LCB_F_MUTEX);
 		}
-
-		raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	}
-	raw_spin_lock_irqsave(&lock->wait_lock, flags);
-acquired:
 	__clear_task_blocked_on(current, lock);
 	__set_current_state(TASK_RUNNING);
+	raw_spin_unlock(&current->blocked_lock);

 	if (ww_ctx) {
 		/*
@@ -756,11 +768,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	return 0;

 err:
-	__clear_task_blocked_on(current, lock);
+	clear_task_blocked_on(current, lock);
 	__set_current_state(TASK_RUNNING);
 	__mutex_remove_waiter(lock, &waiter);
 err_early_kill:
-	WARN_ON(__get_task_blocked_on(current));
+	WARN_ON(get_task_blocked_on(current));
 	trace_contention_end(lock, ret);
 	raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
 	debug_mutex_free_waiter(&waiter);
@@ -971,7 +983,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		next = waiter->task;

 		debug_mutex_wake_waiter(lock, waiter);
-		__clear_task_blocked_on(next, lock);
+		clear_task_blocked_on(next, lock);
 		wake_q_add(&wake_q, next);
 	}

diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index 9ad4da8cea004..7a8ba13fee949 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -47,6 +47,12 @@ static inline struct task_struct *__mutex_owner(struct mutex *lock)
 	return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
 }

+static inline struct mutex *get_task_blocked_on(struct task_struct *p)
+{
+	guard(raw_spinlock_irqsave)(&p->blocked_lock);
+	return __get_task_blocked_on(p);
+}
+
 #ifdef CONFIG_DEBUG_MUTEXES
 extern void debug_mutex_lock_common(struct mutex *lock,
 				    struct mutex_waiter *waiter);
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 31a785afee6c0..e4a81790ea7dd 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -289,7 +289,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 	 * blocked_on pointer. Otherwise we can see circular
 	 * blocked_on relationships that can't resolve.
 	 */
-	__clear_task_blocked_on(waiter->task, lock);
+	clear_task_blocked_on(waiter->task, lock);
 	wake_q_add(wake_q, waiter->task);
 }

@@ -347,7 +347,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
 		 * are waking the mutex owner, who may be currently
 		 * blocked on a different mutex.
 		 */
-		__clear_task_blocked_on(owner, NULL);
+		clear_task_blocked_on(owner, NULL);
 		wake_q_add(wake_q, owner);
 	}
 	return true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 84c61496fa263..96e2784dbba49 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6584,6 +6584,7 @@ static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *d
  *    p->pi_lock
  *      rq->lock
  *        mutex->wait_lock
+ *          p->blocked_lock
  *
  * Returns the task that is going to be used as execution context (the one
  * that is actually going to be run on cpu_of(rq)).
@@ -6603,8 +6604,9 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 		 * and ensure @owner sticks around.
 		 */
 		guard(raw_spinlock)(&mutex->wait_lock);
+		guard(raw_spinlock)(&p->blocked_lock);

-		/* Check again that p is blocked with wait_lock held */
+		/* Check again that p is blocked with blocked_lock held */
 		if (mutex != __get_task_blocked_on(p)) {
 			/*
 			 * Something changed in the blocked_on chain and
-- 
2.53.0.880.g73c4285caa-goog
From nobody Tue Apr 7 12:52:09 2026
Date: Fri, 13 Mar 2026 02:30:05 +0000
In-Reply-To: <20260313023022.2902479-1-jstultz@google.com>
References: <20260313023022.2902479-1-jstultz@google.com>
Message-ID: <20260313023022.2902479-5-jstultz@google.com>
Subject: [PATCH v25 4/9] sched: Fix modifying donor->blocked on without
 proper locking
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman,
 Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya,
 Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
 hupu, kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

Introduce an action enum in find_proxy_task() which allows us to
handle work that needs to be done outside the mutex.wait_lock and
task.blocked_lock guard scopes.

This ensures proper locking when we clear the donor's blocked_on
pointer in proxy_deactivate(), and the switch statement will be
useful as we add more cases to handle later in this series.

Reviewed-by: K Prateek Nayak
Signed-off-by: John Stultz
---
v23:
* Split out from earlier patch.
v24:
* Minor re-ordering of local variables to keep with style, as
  suggested by K Prateek

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 96e2784dbba49..0bb7272106c9e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6568,7 +6568,7 @@ static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *d
 		 * as unblocked, as we aren't doing proxy-migrations
 		 * yet (more logic will be needed then).
 		 */
-		donor->blocked_on = NULL;
+		clear_task_blocked_on(donor, NULL);
 	}
 	return NULL;
 }
@@ -6592,6 +6592,7 @@ static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *d
 static struct task_struct *
 find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 {
+	enum { FOUND, DEACTIVATE_DONOR } action = FOUND;
 	struct task_struct *owner = NULL;
 	int this_cpu = cpu_of(rq);
 	struct task_struct *p;
@@ -6625,12 +6626,14 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)

 		if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
 			/* XXX Don't handle blocked owners/delayed dequeue yet */
-			return proxy_deactivate(rq, donor);
+			action = DEACTIVATE_DONOR;
+			break;
 		}

 		if (task_cpu(owner) != this_cpu) {
 			/* XXX Don't handle migrations yet */
-			return proxy_deactivate(rq, donor);
+			action = DEACTIVATE_DONOR;
+			break;
 		}

 		if (task_on_rq_migrating(owner)) {
@@ -6688,6 +6691,13 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 		 */
 	}

+	/* Handle actions we need to do outside of the guard() scope */
+	switch (action) {
+	case DEACTIVATE_DONOR:
+		return proxy_deactivate(rq, donor);
+	case FOUND:
+		/* fallthrough */;
+	}
 	WARN_ON_ONCE(owner && !owner->on_rq);
 	return owner;
 }
-- 
2.53.0.880.g73c4285caa-goog
From nobody Tue Apr 7 12:52:09 2026
Date: Fri, 13 Mar 2026 02:30:06 +0000
In-Reply-To: <20260313023022.2902479-1-jstultz@google.com>
References: <20260313023022.2902479-1-jstultz@google.com>
Message-ID: <20260313023022.2902479-6-jstultz@google.com>
Subject: [PATCH v25 5/9] sched/locking: Add special
 p->blocked_on==PROXY_WAKING value for proxy return-migration
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman,
 Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya,
 Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
 hupu, kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

As we add functionality to proxy execution, we may migrate a donor
task to a runqueue where it can't run due to cpu affinity. Thus, we
must be careful to ensure we return-migrate the task back to a cpu
in its cpumask when it becomes unblocked.

Peter helpfully provided the following example with pictures:

"Suppose we have a ww_mutex cycle:

         ,-+-* Mutex-1 <-.
  Task-A ---'            |
     |           ,-- Task-B
     `-> Mutex-2 *-+-'

Where Task-A holds Mutex-1 and tries to acquire Mutex-2, and where
Task-B holds Mutex-2 and tries to acquire Mutex-1. Then the
blocked_on->owner chain will go in circles.

  Task-A -> Mutex-2
    ^          |
    |          v
  Mutex-1 <- Task-B

We need two things:

 - find_proxy_task() to stop iterating the circle;

 - the woken task to 'unblock' and run, such that it can back-off
   and re-try the transaction.

Now, the current code [without this patch] does:

	__clear_task_blocked_on();
	wake_q_add();

And surely clearing ->blocked_on is sufficient to break the cycle.
Suppose it is Task-B that is made to back-off, then we have:

  Task-A -> Mutex-2 -> Task-B (no further blocked_on)

and it would attempt to run Task-B. Or worse, it could directly pick
Task-B and run it, without ever getting into find_proxy_task().

Now, here is a problem because Task-B might not be runnable on the
CPU it is currently on; and because !task_is_blocked() we don't get
into the proxy paths, so nobody is going to fix this up.

Ideally we would have dequeued Task-B alongside of clearing
->blocked_on, but alas, [the lock ordering prevents us from getting
the task_rq_lock() and] spoils things."

Thus we need more than just a binary concept of the task being
blocked on a mutex or not.
So allow setting blocked_on to PROXY_WAKING as a special value which
specifies the task is no longer blocked, but needs to be evaluated
for return migration *before* it can be run. This will then be used
in a later patch to handle proxy return-migration.

Reviewed-by: K Prateek Nayak
Signed-off-by: John Stultz
---
v15:
* Split blocked_on_state into its own patch later in the series, as
  the tri-state isn't necessary until we deal with proxy/return
  migrations
v16:
* Handle case where task in the chain is being set as BO_WAKING by
  another cpu (usually via ww_mutex die code). Make sure we release
  the rq lock so the wakeup can complete.
* Rework to use guard() in find_proxy_task() as suggested by Peter
v18:
* Add initialization of blocked_on_state for init_task
v19:
* PREEMPT_RT build fixups and rework suggested by K Prateek Nayak
v20:
* Simplify one of the blocked_on_state changes to avoid extra
  PREEMPT_RT conditionals
v21:
* Slight reworks due to avoiding nested blocked_lock locking
* Be consistent in use of blocked_on_state helper functions
* Rework calls to proxy_deactivate() to do proper locking around
  blocked_on_state changes that we were cheating in previous versions.
* Minor cleanups, some comment improvements
v22:
* Re-order blocked_on_state helpers to try to make it clearer that
  set_task_blocked_on() and clear_task_blocked_on() are the main
  enter/exit states and the blocked_on_state helpers help manage the
  transition states within. Per feedback from K Prateek Nayak.
* Rework blocked_on_state to be defined within CONFIG_SCHED_PROXY_EXEC
  as suggested by K Prateek Nayak.
* Reworked empty stub functions to just take one line as suggested
  by K Prateek
* Avoid using gotos out of a guard() scope, as highlighted by
  K Prateek, and instead rework logic to break and switch() on an
  action value.
v23:
* Big rework to using PROXY_WAKING instead of blocked_on_state as
  suggested by Peter.
* Reworked commit message to include Peter's nice diagrams and
  example for why this extra state is necessary.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 include/linux/sched.h     | 51 +++++++++++++++++++++++++++++++++++++--
 kernel/locking/mutex.c    |  2 +-
 kernel/locking/ww_mutex.h | 16 ++++++------
 kernel/sched/core.c       | 16 ++++++++++++
 4 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f9924ec02c4f2..24b7b4a48ce03 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2180,10 +2180,20 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock) __must_hold(lock);
 })

 #ifndef CONFIG_PREEMPT_RT
+
+/*
+ * With proxy exec, if a task has been proxy-migrated, it may be a donor
+ * on a cpu that it can't actually run on. Thus we need a special state
+ * to denote that the task is being woken, but that it needs to be
+ * evaluated for return-migration before it is run. So if the task is
+ * blocked_on PROXY_WAKING, return migrate it before running it.
+ */
+#define PROXY_WAKING ((struct mutex *)(-1L))
+
 static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
 {
 	lockdep_assert_held_once(&p->blocked_lock);
-	return p->blocked_on;
+	return p->blocked_on == PROXY_WAKING ? NULL : p->blocked_on;
 }

 static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
@@ -2211,7 +2221,7 @@ static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *
 	 * blocked_on relationships, but make sure we are not
 	 * clearing the relationship with a different lock.
 	 */
-	WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
+	WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m && p->blocked_on != PROXY_WAKING);
 	p->blocked_on = NULL;
 }

@@ -2220,6 +2230,35 @@ static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
 	guard(raw_spinlock_irqsave)(&p->blocked_lock);
 	__clear_task_blocked_on(p, m);
 }
+
+static inline void __set_task_blocked_on_waking(struct task_struct *p, struct mutex *m)
+{
+	/* Currently we serialize blocked_on under the task::blocked_lock */
+	lockdep_assert_held_once(&p->blocked_lock);
+
+	if (!sched_proxy_exec()) {
+		__clear_task_blocked_on(p, m);
+		return;
+	}
+
+	/* Don't set PROXY_WAKING if blocked_on was already cleared */
+	if (!p->blocked_on)
+		return;
+	/*
+	 * There may be cases where we set PROXY_WAKING on tasks that were
+	 * already set to waking, but make sure we are not changing
+	 * the relationship with a different lock.
+	 */
+	WARN_ON_ONCE(m && p->blocked_on != m && p->blocked_on != PROXY_WAKING);
+	p->blocked_on = PROXY_WAKING;
+}
+
+static inline void set_task_blocked_on_waking(struct task_struct *p, struct mutex *m)
+{
+	guard(raw_spinlock_irqsave)(&p->blocked_lock);
+	__set_task_blocked_on_waking(p, m);
+}
+
 #else
 static inline void __clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
 {
@@ -2228,6 +2267,14 @@ static inline void __clear_task_blocked_on(struct task_struct *p, struct rt_mute
 static inline void clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
 {
 }
+
+static inline void __set_task_blocked_on_waking(struct task_struct *p, struct rt_mutex *m)
+{
+}
+
+static inline void set_task_blocked_on_waking(struct task_struct *p, struct rt_mutex *m)
+{
+}
 #endif /* !CONFIG_PREEMPT_RT */

 static __always_inline bool need_resched(void)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4aa79bcab08c7..7d359647156df 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -983,7 +983,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		next = waiter->task;

 		debug_mutex_wake_waiter(lock, waiter);
-		clear_task_blocked_on(next, lock);
+		set_task_blocked_on_waking(next, lock);
 		wake_q_add(&wake_q, next);
 	}

diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index e4a81790ea7dd..5cd9dfa4b31e6 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -285,11 +285,11 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 	debug_mutex_wake_waiter(lock, waiter);
 #endif
 	/*
-	 * When waking up the task to die, be sure to clear the
-	 * blocked_on pointer. Otherwise we can see circular
-	 * blocked_on relationships that can't resolve.
+	 * When waking up the task to die, be sure to set the
+	 * blocked_on to PROXY_WAKING. Otherwise we can see
+	 * circular blocked_on relationships that can't resolve.
 	 */
-	clear_task_blocked_on(waiter->task, lock);
+	set_task_blocked_on_waking(waiter->task, lock);
 	wake_q_add(wake_q, waiter->task);
 }

@@ -339,15 +339,15 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
 	 */
 	if (owner != current) {
 		/*
-		 * When waking up the task to wound, be sure to clear the
-		 * blocked_on pointer. Otherwise we can see circular
-		 * blocked_on relationships that can't resolve.
+		 * When waking up the task to wound, be sure to set the
+		 * blocked_on to PROXY_WAKING. Otherwise we can see
+		 * circular blocked_on relationships that can't resolve.
 		 *
 		 * NOTE: We pass NULL here instead of lock, because we
 		 * are waking the mutex owner, who may be currently
 		 * blocked on a different mutex.
 		 */
-		clear_task_blocked_on(owner, NULL);
+		set_task_blocked_on_waking(owner, NULL);
 		wake_q_add(wake_q, owner);
 	}
 	return true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0bb7272106c9e..7212a439124a9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4242,6 +4242,13 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		ttwu_queue(p, cpu, wake_flags);
 	}
 out:
+	/*
+	 * For now, if we've been woken up, clear the task->blocked_on
+	 * regardless if it was set to a mutex or PROXY_WAKING so the
+	 * task can run. We will need to be more careful later when
+	 * properly handling proxy migration
+	 */
+	clear_task_blocked_on(p, NULL);
 	if (success)
 		ttwu_stat(p, task_cpu(p), wake_flags);

@@ -6600,6 +6607,10 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)

 	/* Follow blocked_on chain. */
 	for (p = donor; (mutex = p->blocked_on); p = owner) {
+		/* if its PROXY_WAKING, resched_idle so ttwu can complete */
+		if (mutex == PROXY_WAKING)
+			return proxy_resched_idle(rq);
+
 		/*
 		 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
 		 * and ensure @owner sticks around.
@@ -6620,6 +6631,11 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)

 		owner = __mutex_owner(mutex);
 		if (!owner) {
+			/*
+			 * If there is no owner, clear blocked_on
+			 * and return p so it can run and try to
+			 * acquire the lock
+			 */
 			__clear_task_blocked_on(p, mutex);
 			return p;
 		}
-- 
2.53.0.880.g73c4285caa-goog
From nobody Tue Apr 7 12:52:09 2026
Date: Fri, 13 Mar 2026 02:30:07 +0000
In-Reply-To: <20260313023022.2902479-1-jstultz@google.com>
References: <20260313023022.2902479-1-jstultz@google.com>
Message-ID: <20260313023022.2902479-7-jstultz@google.com>
Subject: [PATCH v25 6/9] sched: Add assert_balance_callbacks_empty helper
From: John Stultz
To: LKML
Cc: John Stultz, Peter Zijlstra, K Prateek Nayak, Joel Fernandes,
 Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman,
 Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya,
 Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
 hupu, kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

With proxy-exec utilizing pick-again logic, we can end up having
balance callbacks set by the previous pick_next_task() call left on
the list. So pull the warning out into a helper function, and make
sure we check it when we pick again.

Suggested-by: Peter Zijlstra
Reviewed-by: K Prateek Nayak
Signed-off-by: John Stultz
---
v24:
* Use IS_ENABLED() as suggested by K Prateek

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c  | 1 +
 kernel/sched/sched.h | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7212a439124a9..ec9e8fe39f9fc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6853,6 +6853,7 @@ static void __sched notrace __schedule(int sched_mode)
 	}

 pick_again:
+	assert_balance_callbacks_empty(rq);
 	next = pick_next_task(rq, rq->donor, &rf);
 	rq_set_donor(rq, next);
 	rq->next_class = next->sched_class;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 43bbf0693cca4..2a0236d745832 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1853,6 +1853,13 @@ static inline void scx_rq_clock_update(struct rq *rq, u64 clock) {}
 static inline void scx_rq_clock_invalidate(struct rq *rq) {}
 #endif /* !CONFIG_SCHED_CLASS_EXT */

+static inline void assert_balance_callbacks_empty(struct rq *rq)
+{
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_LOCKING) &&
+		     rq->balance_callback &&
+		     rq->balance_callback != &balance_push_callback);
+}
+
 /*
  * Lockdep annotation that avoids accidental unlocks; it's like a
  * sticky/continuous lockdep_assert_held().
@@ -1869,7 +1876,7 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)

 	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
 	rf->clock_update_flags = 0;
-	WARN_ON_ONCE(rq->balance_callback && rq->balance_callback != &balance_push_callback);
+	assert_balance_callbacks_empty(rq);
 }

 static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
-- 
2.53.0.880.g73c4285caa-goog
smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="BIC9yKl0" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-2ae59e057f1so22121825ad.1 for ; Thu, 12 Mar 2026 19:30:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1773369038; x=1773973838; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=eMzlxle0S33TV4ZlgehDuBu1TCqN9OLsn14Om1T/gko=; b=BIC9yKl0o46q10P/1CGDmz8InD8MOI7fwd+3KA+GT6tlclpS2kYAdu/QOQ+iMaQgtj DF/a8WydUzisTnNoUb7r0dUAkffXWpjPfZ4OqZRst79rc/G3gjbbMEz3oORt5ivGfKcf HVjkVetzmp1W4ggzJbEpeT9rwTYMx2TDvG2+A0rUUgxTcNVgkGUiGUAH8HMI26D+Mvgs yri6xnOEM/T/0gMiBzMLoP4EFQWVJCnSBDB8Wd8GHsB30xYopiWswKYCmSAgpnfUFFC7 0KpExiv3sLxZ+xHTQkH2phW6BfHrBmw/Tn8eBSq4CqwQMSZeh9zl+zHIxF2bxHWpZVAi 0Fbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1773369038; x=1773973838; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eMzlxle0S33TV4ZlgehDuBu1TCqN9OLsn14Om1T/gko=; b=k9IWu6vQZe7boxKI0nTlV/OqYOGtYTDriWmVnzj5ctxZVFEZqrzgIT1HHMC3UxPDK6 OO+fwDPHw503+KvU/g8olopXlE7zPNQBzCXHNwAP7ZinpMPsUkVL84COfM4DyYAyirpM K9ZMu7VzW5I9UNIT0WC1gTRuyYZjXgKAqpS0H+YF4UEsuOb73ZnNyeP3o91nir7yP5da SPEQtdxjSWxrBMIoUo5BOK+jspVoY5MZNCCLAaMVDulCokgUuzrr2Clug5zqzycJlkxi +wybla/c01Rf/D6brn85tPtbFCGX6n8CrGWVdAnB0F1WAO+GfXO2w2vwDLxH/crKyBS9 ICVQ== X-Gm-Message-State: AOJu0Yw+GTCtwSSdwy0lKoXlyPxjOCMpbAVa+cm7UtQUhi5a82AgLRBI GYYMcOcKxug9kcGyV8sxNAyUP5aeOHjLXtq2woEd39FFWzCfuKCRSdxAtOvrkIfix4Qx09zmP2H bft/0+YW69b4wfIiHyvrOgZ9nJjiq/5BswB7gws0ALE3dsRzsWPI0EdGX+oroGkgM45t3/s42UE /cB0WRw1EbkkiXcIFPks4RdhFnNN2hBTRmIrpBBoT/yWdVaGgs X-Received: from plmo20.prod.google.com ([2002:a17:903:18d4:b0:2ae:c8f6:b8ac]) (user=jstultz 
job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:2ec6:b0:2ae:c9be:5f2c with SMTP id d9443c01a7336-2aecaa15da9mr14642505ad.21.1773369038196; Thu, 12 Mar 2026 19:30:38 -0700 (PDT) Date: Fri, 13 Mar 2026 02:30:08 +0000 In-Reply-To: <20260313023022.2902479-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260313023022.2902479-1-jstultz@google.com> X-Mailer: git-send-email 2.53.0.880.g73c4285caa-goog Message-ID: <20260313023022.2902479-8-jstultz@google.com> Subject: [PATCH v25 7/9] sched: Add logic to zap balance callbacks if we pick again From: John Stultz To: LKML Cc: John Stultz , K Prateek Nayak , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" With proxy-exec, a task is selected to run via pick_next_task(), and then if it is a mutex blocked task, we call find_proxy_task() to find a runnable owner. If the runnable owner is on another cpu, we will need to migrate the selected donor task away, after which we will pick_again can call pick_next_task() to choose something else. However, in the first call to pick_next_task(), we may have had a balance_callback setup by the class scheduler. After we pick again, its possible pick_next_task_fair() will be called which calls sched_balance_newidle() and sched_balance_rq(). This will throw a warning: [ 8.796467] rq->balance_callback && rq->balance_callback !=3D &balance_p= ush_callback [ 8.796467] WARNING: CPU: 32 PID: 458 at kernel/sched/sched.h:1750 sched= _balance_rq+0xe92/0x1250 ... 
[    8.796467] Call Trace:
[    8.796467]  <TASK>
[    8.796467]  ? __warn.cold+0xb2/0x14e
[    8.796467]  ? sched_balance_rq+0xe92/0x1250
[    8.796467]  ? report_bug+0x107/0x1a0
[    8.796467]  ? handle_bug+0x54/0x90
[    8.796467]  ? exc_invalid_op+0x17/0x70
[    8.796467]  ? asm_exc_invalid_op+0x1a/0x20
[    8.796467]  ? sched_balance_rq+0xe92/0x1250
[    8.796467]  sched_balance_newidle+0x295/0x820
[    8.796467]  pick_next_task_fair+0x51/0x3f0
[    8.796467]  __schedule+0x23a/0x14b0
[    8.796467]  ? lock_release+0x16d/0x2e0
[    8.796467]  schedule+0x3d/0x150
[    8.796467]  worker_thread+0xb5/0x350
[    8.796467]  ? __pfx_worker_thread+0x10/0x10
[    8.796467]  kthread+0xee/0x120
[    8.796467]  ? __pfx_kthread+0x10/0x10
[    8.796467]  ret_from_fork+0x31/0x50
[    8.796467]  ? __pfx_kthread+0x10/0x10
[    8.796467]  ret_from_fork_asm+0x1a/0x30
[    8.796467]  </TASK>

This is because if an RT task was originally picked, it will have set
up rq->balance_callback with push_rt_tasks() via set_next_task_rt().
Once the task is migrated away and we pick again, we haven't processed
any balance callbacks, so rq->balance_callback is not in the same
state as it was the first time pick_next_task() was called.

To handle this, add a zap_balance_callbacks() helper function which
cleans up the balance callbacks without running them. This should be
OK, as we are effectively undoing the state set up in the first call
to pick_next_task(), and when we pick again, the new callback can be
configured for the donor task actually selected.
Reviewed-by: K Prateek Nayak
Signed-off-by: John Stultz
---
v20:
* Tweaked to avoid build issues with different configs
v22:
* Spelling fix suggested by K Prateek
* Collapsed the stub implementation to one line as suggested by K Prateek
* Zap callbacks when we resched idle, as suggested by K Prateek
v24:
* Don't conditionalize the function on CONFIG_SCHED_PROXY_EXEC, as the
  callers will be optimized out if that is unset, and the dead function
  will be removed, as suggested by K Prateek
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec9e8fe39f9fc..af497b8c72dce 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4917,6 +4917,34 @@ static inline void finish_task(struct task_struct *prev)
 	smp_store_release(&prev->on_cpu, 0);
 }

+/*
+ * Only called from __schedule context.
+ *
+ * There are some cases where we are going to re-do the action
+ * that added the balance callbacks. We may not be in a state
+ * where we can run them, so just zap them so they can be
+ * properly re-added on the next time around. This is similar
+ * handling to running the callbacks, except we just don't call
+ * them.
+ */
+static void zap_balance_callbacks(struct rq *rq)
+{
+	struct balance_callback *next, *head;
+	bool found = false;
+
+	lockdep_assert_rq_held(rq);
+
+	head = rq->balance_callback;
+	while (head) {
+		if (head == &balance_push_callback)
+			found = true;
+		next = head->next;
+		head->next = NULL;
+		head = next;
+	}
+	rq->balance_callback = found ? &balance_push_callback : NULL;
+}
+
 static void do_balance_callbacks(struct rq *rq, struct balance_callback *head)
 {
 	void (*func)(struct rq *rq);
@@ -6860,10 +6888,14 @@ static void __sched notrace __schedule(int sched_mode)
 	if (sched_proxy_exec()) {
 		if (unlikely(next->blocked_on)) {
 			next = find_proxy_task(rq, next, &rf);
-			if (!next)
+			if (!next) {
+				zap_balance_callbacks(rq);
 				goto pick_again;
-			if (next == rq->idle)
+			}
+			if (next == rq->idle) {
+				zap_balance_callbacks(rq);
 				goto keep_resched;
+			}
 		}
 	}
 picked:
--
2.53.0.880.g73c4285caa-goog

From nobody Tue Apr 7 12:52:09 2026
Date: Fri, 13 Mar 2026 02:30:09 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20260313023022.2902479-1-jstultz@google.com>
References: <20260313023022.2902479-1-jstultz@google.com>
Message-ID: <20260313023022.2902479-9-jstultz@google.com>
Subject: [PATCH v25 8/9] sched: Move attach_one_task and attach_task helpers to sched.h
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman,
 Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya,
 Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
 hupu, kernel-team@android.com

The fair scheduler locally introduced the attach_one_task() and
attach_task() helpers, but these could be generically useful, so move
them to sched.h so we can use them elsewhere.

Suggested-by: K Prateek Nayak
Signed-off-by: John Stultz
Reviewed-by: K Prateek Nayak
---
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/fair.c  | 26 --------------------------
 kernel/sched/sched.h | 26 ++++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed1..53da01a251487 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9784,32 +9784,6 @@ static int detach_tasks(struct lb_env *env)
 	return detached;
 }

-/*
- * attach_task() -- attach the task detached by detach_task() to its new rq.
- */
-static void attach_task(struct rq *rq, struct task_struct *p)
-{
-	lockdep_assert_rq_held(rq);
-
-	WARN_ON_ONCE(task_rq(p) != rq);
-	activate_task(rq, p, ENQUEUE_NOCLOCK);
-	wakeup_preempt(rq, p, 0);
-}
-
-/*
- * attach_one_task() -- attaches the task returned from detach_one_task() to
- * its new rq.
- */
-static void attach_one_task(struct rq *rq, struct task_struct *p)
-{
-	struct rq_flags rf;
-
-	rq_lock(rq, &rf);
-	update_rq_clock(rq);
-	attach_task(rq, p);
-	rq_unlock(rq, &rf);
-}
-
 /*
  * attach_tasks() -- attaches all tasks detached by detach_tasks() to their
  * new rq.
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2a0236d745832..d4a73c3db03d4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3008,6 +3008,32 @@ extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);

 extern void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags);

+/*
+ * attach_task() -- attach the task detached by detach_task() to its new rq.
+ */
+static inline void attach_task(struct rq *rq, struct task_struct *p)
+{
+	lockdep_assert_rq_held(rq);
+
+	WARN_ON_ONCE(task_rq(p) != rq);
+	activate_task(rq, p, ENQUEUE_NOCLOCK);
+	wakeup_preempt(rq, p, 0);
+}
+
+/*
+ * attach_one_task() -- attaches the task returned from detach_one_task() to
+ * its new rq.
+ */
+static inline void attach_one_task(struct rq *rq, struct task_struct *p)
+{
+	struct rq_flags rf;
+
+	rq_lock(rq, &rf);
+	update_rq_clock(rq);
+	attach_task(rq, p);
+	rq_unlock(rq, &rf);
+}
+
 #ifdef CONFIG_PREEMPT_RT
 # define SCHED_NR_MIGRATE_BREAK 8
 #else
--
2.53.0.880.g73c4285caa-goog

From nobody Tue Apr 7 12:52:09 2026
Date: Fri, 13 Mar 2026 02:30:10 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20260313023022.2902479-1-jstultz@google.com>
References: <20260313023022.2902479-1-jstultz@google.com>
Message-ID: <20260313023022.2902479-10-jstultz@google.com>
Subject: [PATCH v25 9/9] sched: Handle blocked-waiter migration (and return migration)
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
 Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon,
 Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
 K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal,
 kuyo chang, hupu, kernel-team@android.com

Add logic to handle migrating a blocked waiter to a remote CPU where
the lock owner is runnable. Additionally, as the blocked task may not
be able to run on the remote CPU, add logic to handle return migration
once the waiting task is given the mutex.

Because tasks may get migrated to CPUs where they cannot run, also
modify the scheduling classes to avoid sched-class migrations on
mutex-blocked tasks, leaving find_proxy_task() and related logic to do
the migrations and return migrations.

This was split out from the larger proxy patch, and significantly
reworked.
Credits for the original patch go to:
  Peter Zijlstra (Intel)
  Juri Lelli
  Valentin Schneider
  Connor O'Brien

Signed-off-by: John Stultz
---
v6:
* Integrated sched_proxy_exec() check in proxy_return_migration()
* Minor cleanups to diff
* Unpin the rq before calling __balance_callbacks()
* Tweak proxy migrate to migrate deeper task in chain, to avoid tasks
  ping-ponging between rqs
v7:
* Fixup for unused function arguments
* Switch from that_rq -> target_rq, other minor tweaks, and typo fixes
  suggested by Metin Kaya
* Switch back to doing return migration in the ttwu path, which avoids
  nasty lock juggling and performance issues
* Fixes for UP builds
v8:
* More simplifications from Metin Kaya
* Fixes for null owner case, including doing return migration
* Cleanup proxy_needs_return logic
v9:
* Narrow logic in ttwu that sets BO_RUNNABLE, to avoid missed return
  migrations
* Switch to using zap_balance_callbacks rather than running them when
  we are dropping rq locks for proxy migration
* Drop task_is_blocked check in sched_submit_work as suggested by
  Metin (may re-add later if this causes trouble)
* Do return migration when we're not on wake_cpu. This avoids bad task
  placement caused by proxy migrations, raised by Xuewen Yan
* Fix to call set_next_task(rq->curr) prior to dropping the rq lock to
  avoid rq->curr getting migrated before we have actually switched
  from it
* Cleanup to re-use proxy_resched_idle() instead of open coding it in
  proxy_migrate_task()
* Fix return migration not to use DEQUEUE_SLEEP, so that we properly
  see the task as task_on_rq_migrating() after it is dequeued but
  before set_task_cpu() has been called on it
* Fix to broaden find_proxy_task() checks to avoid a race where a task
  is dequeued off the rq due to return migration, but set_task_cpu()
  and the enqueue on another rq happened after we checked
  task_cpu(owner). This ensures we don't proxy using a task that is
  not actually on our runqueue.
* Cleanup to avoid the locked BO_WAKING->BO_RUNNABLE transition in
  try_to_wake_up() if proxy execution isn't enabled
* Cleanup to improve the comment in proxy_migrate_task() explaining
  the set_next_task(rq->curr) logic
* Cleanup deadline.c change to stylistically match the rt.c change
* Numerous cleanups suggested by Metin
v10:
* Drop WARN_ON(task_is_blocked(p)) in ttwu current case
v11:
* Include proxy_set_task_cpu from later in the series in this change
  so we can use it, rather than reworking the logic later in the
  series
* Fix problem with return migration, where affinity was changed and
  wake_cpu was left outside the affinity mask
* Avoid reading the owner's cpu twice (as it might change in between)
  to avoid occasional migration-to-same-cpu edge cases
* Add extra WARN_ON checks for wake_cpu and return migration edge
  cases
* Typo fix from Metin
v13:
* As we set ret, return it, not just NULL (pulling this change in from
  a later patch)
* Avoid deadlock between try_to_wake_up() and find_proxy_task() when a
  blocked_on cycle with ww_mutex is trying a mid-chain wakeup.
* Tweaks to use the new __set_blocked_on_runnable() helper
* Potential fix for incorrectly updated task->dl_server issues
* Minor comment improvements
* Add logic to handle missed wakeups, in that case doing return
  migration from the find_proxy_task() path
* Minor cleanups
v14:
* Improve edge cases where we wouldn't set the task as BO_RUNNABLE
v15:
* Added comment to better describe proxy_needs_return() as suggested
  by Qais
* Build fixes for !CONFIG_SMP reported by Maciej Żenczykowski
* Adds fix for re-evaluating proxy_needs_return when
  sched_proxy_exec() is disabled, reported and diagnosed by kuyo chang
v16:
* Larger rework of needs_return logic in find_proxy_task, in order to
  avoid problems with cpu hotplug
* Rework to use guard() as suggested by Peter
v18:
* Integrate optimization suggested by Suleiman to do the checks for
  sleeping owners before checking if the task_cpu is this_cpu, so that
  we can avoid needlessly proxy-migrating tasks only to then dequeue
  them. Also check if migrating last.
* Improve comments around guard locking
* Include tweak to ttwu_runnable() as suggested by hupu
* Rework the logic releasing the rq->donor reference before letting go
  of the rq lock. Just use rq->idle.
* Go back to doing return migration on BO_WAKING owners, as I was
  hitting some softlockups caused by running tasks not making it out
  of BO_WAKING.
v19:
* Fixed proxy_force_return() logic for !SMP cases
v21:
* Reworked donor deactivation for unhandled sleeping owners
* Commit message tweaks
v22:
* Add comments around zap_balance_callbacks in proxy migration logic
* Rework logic to avoid gotos out of guard() scopes, and instead use
  break and switch() on an action value, as suggested by K Prateek
* K Prateek suggested simplifications around putting the donor and
  setting idle as the next task in the migration paths, which I
  further simplified to using proxy_resched_idle()
* Comment improvements
* Dropped curr != donor check in pick_next_task_fair() suggested by
  K Prateek
v23:
* Rework to use the PROXY_WAKING approach suggested by Peter
* Drop unnecessarily setting wake_cpu when affinity changes, as
  noticed by Peter
* Split out the ttwu() logic changes into a later separate patch as
  suggested by Peter
v24:
* Numerous fixes for rq clock handling, pointed out by K Prateek
* Slight tweak to where put_task() is called, suggested by K Prateek
v25:
* Use WF_TTWU in proxy_force_return(), suggested by K Prateek
* Drop get/put_task_struct() in proxy_force_return(), suggested by
  K Prateek
* Use attach_one_task() to reduce repetitive logic, as suggested by
  K Prateek
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E.
 McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 221 ++++++++++++++++++++++++++++++++++------
 1 file changed, 191 insertions(+), 30 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index af497b8c72dce..fe20204cf51cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3643,6 +3643,23 @@ void update_rq_avg_idle(struct rq *rq)
 	rq->idle_stamp = 0;
 }

+#ifdef CONFIG_SCHED_PROXY_EXEC
+static inline void proxy_set_task_cpu(struct task_struct *p, int cpu)
+{
+	unsigned int wake_cpu;
+
+	/*
+	 * Since we are enqueuing a blocked task on a cpu it may
+	 * not be able to run on, preserve wake_cpu when we
+	 * __set_task_cpu so we can return the task to where it
+	 * was previously runnable.
+	 */
+	wake_cpu = p->wake_cpu;
+	__set_task_cpu(p, cpu);
+	p->wake_cpu = wake_cpu;
+}
+#endif /* CONFIG_SCHED_PROXY_EXEC */
+
 static void ttwu_do_activate(struct rq *rq, struct task_struct *p,
 			     int wake_flags, struct rq_flags *rf)
@@ -4242,13 +4259,6 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		ttwu_queue(p, cpu, wake_flags);
 	}
 out:
-	/*
-	 * For now, if we've been woken up, clear the task->blocked_on
-	 * regardless if it was set to a mutex or PROXY_WAKING so the
-	 * task can run. We will need to be more careful later when
-	 * properly handling proxy migration
-	 */
-	clear_task_blocked_on(p, NULL);
 	if (success)
 		ttwu_stat(p, task_cpu(p), wake_flags);

@@ -6575,7 +6585,7 @@ static inline struct task_struct *proxy_resched_idle(struct rq *rq)
 	return rq->idle;
 }

-static bool __proxy_deactivate(struct rq *rq, struct task_struct *donor)
+static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
 {
 	unsigned long state = READ_ONCE(donor->__state);

@@ -6595,17 +6605,135 @@ static bool __proxy_deactivate(struct rq *rq, struct task_struct *donor)
 	return try_to_block_task(rq, donor, &state, true);
 }

-static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *donor)
+/*
+ * If the blocked-on relationship crosses CPUs, migrate @p to the
+ * owner's CPU.
+ *
+ * This is because we must respect the CPU affinity of execution
+ * contexts (owner) but we can ignore affinity for scheduling
+ * contexts (@p). So we have to move scheduling contexts towards
+ * potential execution contexts.
+ *
+ * Note: The owner can disappear, but simply migrate to @target_cpu
+ * and leave that CPU to sort things out.
+ */
+static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
+			       struct task_struct *p, int target_cpu)
 {
-	if (!__proxy_deactivate(rq, donor)) {
-		/*
-		 * XXX: For now, if deactivation failed, set donor
-		 * as unblocked, as we aren't doing proxy-migrations
-		 * yet (more logic will be needed then).
-		 */
-		clear_task_blocked_on(donor, NULL);
+	struct rq *target_rq = cpu_rq(target_cpu);
+
+	lockdep_assert_rq_held(rq);
+
+	/*
+	 * Since we're going to drop @rq, we have to put(@rq->donor) first,
+	 * otherwise we have a reference that no longer belongs to us.
+	 *
+	 * Additionally, as we put_prev_task(prev) earlier, it's possible that
+	 * prev will migrate away as soon as we drop the rq lock, however we
+	 * still have it marked as rq->curr, as we've not yet switched tasks.
+	 *
+	 * So call proxy_resched_idle() to let go of the references before
+	 * we release the lock.
+	 */
+	proxy_resched_idle(rq);
+
+	WARN_ON(p == rq->curr);
+
+	deactivate_task(rq, p, DEQUEUE_NOCLOCK);
+	proxy_set_task_cpu(p, target_cpu);
+
+	/*
+	 * We have to zap callbacks before unlocking the rq
+	 * as another CPU may jump in and call sched_balance_rq
+	 * which can trip the warning in rq_pin_lock() if we
+	 * leave callbacks set.
+	 */
+	zap_balance_callbacks(rq);
+	rq_unpin_lock(rq, rf);
+	raw_spin_rq_unlock(rq);
+
+	attach_one_task(target_rq, p);
+
+	raw_spin_rq_lock(rq);
+	rq_repin_lock(rq, rf);
+	update_rq_clock(rq);
+}
+
+static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
+			       struct task_struct *p)
+{
+	struct rq *this_rq, *target_rq;
+	struct rq_flags this_rf;
+	int cpu, wake_flag = WF_TTWU;
+
+	lockdep_assert_rq_held(rq);
+	WARN_ON(p == rq->curr);
+
+	/*
+	 * We have to zap callbacks before unlocking the rq
+	 * as another CPU may jump in and call sched_balance_rq
+	 * which can trip the warning in rq_pin_lock() if we
+	 * leave callbacks set.
+	 */
+	zap_balance_callbacks(rq);
+	rq_unpin_lock(rq, rf);
+	raw_spin_rq_unlock(rq);
+
+	/*
+	 * We drop the rq lock, and re-grab task_rq_lock to get
+	 * the pi_lock (needed for select_task_rq) as well.
+	 */
+	this_rq = task_rq_lock(p, &this_rf);
+
+	/*
+	 * Since we let go of the rq lock, the task may have been
+	 * woken or migrated to another rq before we got the
+	 * task_rq_lock. So re-check we're on the same RQ. If
+	 * not, the task has already been migrated and that CPU
+	 * will handle any further migrations.
+	 */
+	if (this_rq != rq)
+		goto err_out;
+
+	/* Similarly, if we've been dequeued, someone else will wake us */
+	if (!task_on_rq_queued(p))
+		goto err_out;
+
+	/*
+	 * Since we should only be calling here from __schedule()
+	 * -> find_proxy_task(), no one else should have
+	 * assigned current out from under us. But check and warn
+	 * if we see this, then bail.
+ */ + if (task_current(this_rq, p) || task_on_cpu(this_rq, p)) { + WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d on_cpu: %i\n", + __func__, cpu_of(this_rq), + p->comm, p->pid, p->on_cpu); + goto err_out; } - return NULL; + + update_rq_clock(this_rq); + proxy_resched_idle(this_rq); + deactivate_task(this_rq, p, DEQUEUE_NOCLOCK); + cpu =3D select_task_rq(p, p->wake_cpu, &wake_flag); + set_task_cpu(p, cpu); + target_rq =3D cpu_rq(cpu); + clear_task_blocked_on(p, NULL); + task_rq_unlock(this_rq, p, &this_rf); + + attach_one_task(target_rq, p); + + /* Finally, re-grab the origianl rq lock and return to pick-again */ + raw_spin_rq_lock(rq); + rq_repin_lock(rq, rf); + update_rq_clock(rq); + return; + +err_out: + task_rq_unlock(this_rq, p, &this_rf); + raw_spin_rq_lock(rq); + rq_repin_lock(rq, rf); + update_rq_clock(rq); } =20 /* @@ -6627,17 +6755,25 @@ static struct task_struct *proxy_deactivate(struct = rq *rq, struct task_struct *d static struct task_struct * find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags = *rf) { - enum { FOUND, DEACTIVATE_DONOR } action =3D FOUND; + enum { FOUND, DEACTIVATE_DONOR, MIGRATE, NEEDS_RETURN } action =3D FOUND; struct task_struct *owner =3D NULL; + bool curr_in_chain =3D false; int this_cpu =3D cpu_of(rq); struct task_struct *p; struct mutex *mutex; + int owner_cpu; =20 /* Follow blocked_on chain. 
*/ for (p =3D donor; (mutex =3D p->blocked_on); p =3D owner) { - /* if its PROXY_WAKING, resched_idle so ttwu can complete */ - if (mutex =3D=3D PROXY_WAKING) - return proxy_resched_idle(rq); + /* if its PROXY_WAKING, do return migration or run if current */ + if (mutex =3D=3D PROXY_WAKING) { + if (task_current(rq, p)) { + clear_task_blocked_on(p, PROXY_WAKING); + return p; + } + action =3D NEEDS_RETURN; + break; + } =20 /* * By taking mutex->wait_lock we hold off concurrent mutex_unlock() @@ -6657,26 +6793,41 @@ find_proxy_task(struct rq *rq, struct task_struct *= donor, struct rq_flags *rf) return NULL; } =20 + if (task_current(rq, p)) + curr_in_chain =3D true; + owner =3D __mutex_owner(mutex); if (!owner) { /* - * If there is no owner, clear blocked_on - * and return p so it can run and try to - * acquire the lock + * If there is no owner, either clear blocked_on + * and return p (if it is current and safe to + * just run on this rq), or return-migrate the task. */ - __clear_task_blocked_on(p, mutex); - return p; + if (task_current(rq, p)) { + __clear_task_blocked_on(p, NULL); + return p; + } + action =3D NEEDS_RETURN; + break; } =20 if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) { /* XXX Don't handle blocked owners/delayed dequeue yet */ + if (curr_in_chain) + return proxy_resched_idle(rq); action =3D DEACTIVATE_DONOR; break; } =20 - if (task_cpu(owner) !=3D this_cpu) { - /* XXX Don't handle migrations yet */ - action =3D DEACTIVATE_DONOR; + owner_cpu =3D task_cpu(owner); + if (owner_cpu !=3D this_cpu) { + /* + * @owner can disappear, simply migrate to @owner_cpu + * and leave that CPU to sort things out. 
+ */ + if (curr_in_chain) + return proxy_resched_idle(rq); + action =3D MIGRATE; break; } =20 @@ -6738,7 +6889,17 @@ find_proxy_task(struct rq *rq, struct task_struct *d= onor, struct rq_flags *rf) /* Handle actions we need to do outside of the guard() scope */ switch (action) { case DEACTIVATE_DONOR: - return proxy_deactivate(rq, donor); + if (proxy_deactivate(rq, donor)) + return NULL; + /* If deactivate fails, force return */ + p =3D donor; + fallthrough; + case NEEDS_RETURN: + proxy_force_return(rq, rf, p); + return NULL; + case MIGRATE: + proxy_migrate_task(rq, rf, p, owner_cpu); + return NULL; case FOUND: /* fallthrough */; } --=20 2.53.0.880.g73c4285caa-goog