From nobody Sat Feb 7 13:56:41 2026
Date: Wed, 12 Mar 2025 15:11:31 -0700
In-Reply-To: <20250312221147.1865364-1-jstultz@google.com>
References: <20250312221147.1865364-1-jstultz@google.com>
Message-ID: <20250312221147.1865364-2-jstultz@google.com>
Subject: [RFC PATCH v15 1/7] sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
    Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
    Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon,
    Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
    K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal,
    kernel-team@android.com

Add a CONFIG_SCHED_PROXY_EXEC option, along with a sched_proxy_exec=
boot argument that can be used to disable the feature at boot time if
CONFIG_SCHED_PROXY_EXEC was enabled.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Signed-off-by: John Stultz
---
v7:
* Switch to CONFIG_SCHED_PROXY_EXEC/sched_proxy_exec= as suggested by
  Metin Kaya.
* Switch boot arg from =disable/enable to use kstrtobool(), which
  supports =yes|no|1|0|true|false|on|off, as also suggested by Metin
  Kaya, and print a message when a boot argument is used.
v8:
* Move CONFIG_SCHED_PROXY_EXEC under Scheduler Features as suggested
  by Metin
* Minor rework reordering with split sched contexts patch
v12:
* Rework for selected -> donor renaming
v14:
* Depend on !PREEMPT_RT to avoid build issues for now
v15:
* Depend on EXPERT while patch series upstreaming is in progress.
---
 .../admin-guide/kernel-parameters.txt |  5 ++++
 include/linux/sched.h                 | 13 +++++++++
 init/Kconfig                          | 10 +++++++
 kernel/sched/core.c                   | 29 +++++++++++++++++++
 kernel/sched/sched.h                  | 12 ++++++++
 5 files changed, 69 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb8752b42ec85..dcc2443078d00 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6262,6 +6262,11 @@
 	sa1100ir	[NET]
 			See drivers/net/irda/sa1100_ir.c.

+	sched_proxy_exec=	[KNL]
+			Enables or disables "proxy execution" style
+			solution to mutex-based priority inversion.
+			Format:
+
 	sched_verbose	[KNL,EARLY] Enables verbose scheduler debug messages.

 	schedstats=	[KNL,X86] Enable or disable scheduled statistics.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9c15365a30c08..1462f2c70aefc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1636,6 +1636,19 @@ struct task_struct {
 	 */
 };

+#ifdef CONFIG_SCHED_PROXY_EXEC
+DECLARE_STATIC_KEY_TRUE(__sched_proxy_exec);
+static inline bool sched_proxy_exec(void)
+{
+	return static_branch_likely(&__sched_proxy_exec);
+}
+#else
+static inline bool sched_proxy_exec(void)
+{
+	return false;
+}
+#endif
+
 #define TASK_REPORT_IDLE	(TASK_REPORT + 1)
 #define TASK_REPORT_MAX		(TASK_REPORT_IDLE << 1)

diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b3..b989ddc27444e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -875,6 +875,16 @@ config UCLAMP_BUCKETS_COUNT

 	  If in doubt, use the default value.

+config SCHED_PROXY_EXEC
+	bool "Proxy Execution"
+	default n
+	# Avoid some build failures w/ PREEMPT_RT until it can be fixed
+	depends on !PREEMPT_RT
+	depends on EXPERT
+	help
+	  This option enables proxy execution, a mechanism for mutex-owning
+	  tasks to inherit the scheduling context of higher priority waiters.
+
 endmenu

 #
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 67189907214d3..3968c3967ec38 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -119,6 +119,35 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);

 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);

+#ifdef CONFIG_SCHED_PROXY_EXEC
+DEFINE_STATIC_KEY_TRUE(__sched_proxy_exec);
+static int __init setup_proxy_exec(char *str)
+{
+	bool proxy_enable;
+
+	if (kstrtobool(str, &proxy_enable)) {
+		pr_warn("Unable to parse sched_proxy_exec=\n");
+		return 0;
+	}
+
+	if (proxy_enable) {
+		pr_info("sched_proxy_exec enabled via boot arg\n");
+		static_branch_enable(&__sched_proxy_exec);
+	} else {
+		pr_info("sched_proxy_exec disabled via boot arg\n");
+		static_branch_disable(&__sched_proxy_exec);
+	}
+	return 1;
+}
+#else
+static int __init setup_proxy_exec(char *str)
+{
+	pr_warn("CONFIG_SCHED_PROXY_EXEC=n, so it cannot be enabled or disabled at boot time\n");
+	return 0;
+}
+#endif
+__setup("sched_proxy_exec=", setup_proxy_exec);
+
 #ifdef CONFIG_SCHED_DEBUG
 /*
  * Debugging: various feature bits
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c8512a9fb0229..05d2122533619 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1155,10 +1155,15 @@ struct rq {
 	 */
 	unsigned int		nr_uninterruptible;

+#ifdef CONFIG_SCHED_PROXY_EXEC
+	struct task_struct __rcu *donor;	/* Scheduling context */
+	struct task_struct __rcu *curr;		/* Execution context */
+#else
 	union {
 		struct task_struct __rcu *donor;	/* Scheduler context */
 		struct task_struct __rcu *curr;		/* Execution context */
 	};
+#endif
 	struct sched_dl_entity	*dl_server;
 	struct task_struct	*idle;
 	struct task_struct	*stop;
@@ -1355,10 +1360,17 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		raw_cpu_ptr(&runqueues)

+#ifdef CONFIG_SCHED_PROXY_EXEC
+static inline void rq_set_donor(struct rq *rq, struct task_struct *t)
+{
+	rcu_assign_pointer(rq->donor, t);
+}
+#else
 static inline void rq_set_donor(struct rq *rq, struct task_struct *t)
 {
 	/* Do nothing */
 }
+#endif

 #ifdef CONFIG_SCHED_CORE
 static inline struct cpumask *sched_group_span(struct sched_group *sg);
--
2.49.0.rc0.332.g42c0ae87b1-goog
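[Editorial aside, not part of the posted series: a minimal sketch of how the
helpers added in patch 1/7 are meant to be consumed. Only sched_proxy_exec(),
rq->donor and rq->curr come from the patch; the caller below is hypothetical.
Per the v7 changelog, the boot argument takes kstrtobool values, e.g. booting
with sched_proxy_exec=off when CONFIG_SCHED_PROXY_EXEC=y.]

	/*
	 * Hypothetical caller: proxy-specific logic branches on the new
	 * sched_proxy_exec() helper.  With CONFIG_SCHED_PROXY_EXEC=n it is a
	 * constant false so the proxy-only branch can be compiled out; with
	 * =y it is a static branch flipped by the sched_proxy_exec= boot arg.
	 */
	static void example_use_proxy_exec(struct rq *rq)
	{
		if (sched_proxy_exec()) {
			/* proxy path: rq->donor (scheduling context) may
			 * differ from rq->curr (execution context) */
		} else {
			/* legacy path: scheduling and execution context are
			 * the same task */
		}
	}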
From nobody Sat Feb 7 13:56:41 2026
Date: Wed, 12 Mar 2025 15:11:32 -0700
In-Reply-To: <20250312221147.1865364-1-jstultz@google.com>
References: <20250312221147.1865364-1-jstultz@google.com>
Message-ID: <20250312221147.1865364-3-jstultz@google.com>
Subject: [RFC PATCH v15 2/7] locking/mutex: Rework task_struct::blocked_on
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
    Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
    Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long,
    Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
    K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal,
    kernel-team@android.com, "Connor O'Brien", John Stultz

From: Peter Zijlstra

Track the blocked-on relation for mutexes, to allow following this
relation at schedule time.

   task
     | blocked-on
     v
   mutex
     | owner
     v
   task

This will all be used for tracking blocked-task/mutex chains with the
proxy-execution patches, in a similar fashion to how priority
inheritance is done with rt_mutexes.

For serialization, blocked-on is only set by the task itself (current).
Both setting and clearing (the latter potentially done by other tasks)
are done while holding the mutex::wait_lock.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
[minor changes while rebasing]
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Connor O'Brien
[jstultz: Fix blocked_on tracking in __mutex_lock_common in error paths]
Signed-off-by: John Stultz
---
v2:
* Fixed blocked_on tracking in error paths that was causing crashes
v4:
* Ensure we clear blocked_on when waking ww_mutexes to die or wound.
  This is critical so we don't get circular blocked_on relationships
  that can't be resolved.
v5:
* Fix potential bug where the skip_wait path might clear blocked_on
  when that path never set it
* Slight tweaks to where we set blocked_on to make it consistent,
  along with extra WARN_ON correctness checking
* Minor comment changes
v7:
* Minor commit message change suggested by Metin Kaya
* Fix WARN_ON conditionals in unlock path (as blocked_on might
  already be cleared), found while looking at issue Metin Kaya raised.
* Minor tweaks to be consistent in what we do under the blocked_on
  lock, also tweaked variable name to avoid confusion with label, and
  comment typos, as suggested by Metin Kaya
* Minor tweak for CONFIG_SCHED_PROXY_EXEC name change
* Moved unused block of code to later in the series, as suggested by
  Metin Kaya
* Switch to a tri-state to be able to distinguish from waking and
  runnable so we can later safely do return migration from ttwu
* Folded together with related blocked_on changes
v8:
* Fix issue leaving task BO_BLOCKED when calling into optimistic
  spinning path.
* Include helper to better handle BO_BLOCKED->BO_WAKING transitions
v9:
* Typo fixup pointed out by Metin
* Cleanup BO_WAKING->BO_RUNNABLE transitions for the !proxy case
* Many cleanups and simplifications suggested by Metin
v11:
* Whitespace fixup pointed out by Metin
v13:
* Refactor set_blocked_on helpers to clean things up a bit
v14:
* Small build fixup with PREEMPT_RT
v15:
* Improve consistency of names for functions that assume blocked_lock
  is held, as suggested by Peter
* Use guard instead of separate spinlock/unlock calls, also suggested
  by Peter
* Drop blocked_on_state tri-state for now, as it's not needed until
  later in the series, when we get to proxy-migration and
  return-migration.
---
 include/linux/sched.h        |  5 +----
 kernel/fork.c                |  3 +--
 kernel/locking/mutex-debug.c |  9 +++++----
 kernel/locking/mutex.c       | 19 +++++++++++++++++++
 kernel/locking/ww_mutex.h    | 18 ++++++++++++++++--
 5 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1462f2c70aefc..03775c44b7073 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1212,10 +1212,7 @@ struct task_struct {
 	struct rt_mutex_waiter		*pi_blocked_on;
 #endif

-#ifdef CONFIG_DEBUG_MUTEXES
-	/* Mutex deadlock detection: */
-	struct mutex_waiter		*blocked_on;
-#endif
+	struct mutex			*blocked_on;	/* lock we're blocked on */

 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 	int				non_block_count;
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f32..38f055082d07a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2357,9 +2357,8 @@ __latent_entropy struct task_struct *copy_process(
 	lockdep_init_task(p);
 #endif

-#ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
-#endif
+
 #ifdef CONFIG_BCACHE
 	p->sequential_io	= 0;
 	p->sequential_io_avg	= 0;
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 6e6f6071cfa27..758b7a6792b0c 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -53,17 +53,18 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 {
 	lockdep_assert_held(&lock->wait_lock);

-	/* Mark the current thread as blocked on the lock: */
-	task->blocked_on = waiter;
+	/* Current thread can't be already blocked (since it's executing!) */
+	DEBUG_LOCKS_WARN_ON(task->blocked_on);
 }

 void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 			       struct task_struct *task)
 {
+	struct mutex *blocked_on = READ_ONCE(task->blocked_on);
+
 	DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list));
 	DEBUG_LOCKS_WARN_ON(waiter->task != task);
-	DEBUG_LOCKS_WARN_ON(task->blocked_on != waiter);
-	task->blocked_on = NULL;
+	DEBUG_LOCKS_WARN_ON(blocked_on && blocked_on != lock);

 	INIT_LIST_HEAD(&waiter->list);
 	waiter->task = NULL;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index b36f23de48f1b..37d1966970617 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -627,6 +627,8 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		goto err_early_kill;
 	}

+	WARN_ON(current->blocked_on);
+	current->blocked_on = lock;
 	set_current_state(state);
 	trace_contention_begin(lock, LCB_F_MUTEX);
 	for (;;) {
@@ -663,6 +665,12 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas

 		first = __mutex_waiter_is_first(lock, &waiter);

+		/*
+		 * As we likely have been woken up by the task
+		 * that has cleared our blocked_on state, re-set
+		 * it to the lock we are trying to acquire.
+		 */
+		current->blocked_on = lock;
 		set_current_state(state);
 		/*
 		 * Here we order against unlock; we must either see it change
@@ -683,6 +691,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	}
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 acquired:
+	current->blocked_on = NULL;
 	__set_current_state(TASK_RUNNING);

 	if (ww_ctx) {
@@ -712,9 +721,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	return 0;

 err:
+	current->blocked_on = NULL;
 	__set_current_state(TASK_RUNNING);
 	__mutex_remove_waiter(lock, &waiter);
 err_early_kill:
+	WARN_ON(current->blocked_on);
 	trace_contention_end(lock, ret);
 	raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
 	debug_mutex_free_waiter(&waiter);
@@ -924,6 +935,14 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		next = waiter->task;

 		debug_mutex_wake_waiter(lock, waiter);
+		/*
+		 * Unlock wakeups can be happening in parallel
+		 * (when optimistic spinners steal and release
+		 * the lock), so blocked_on may already be
+		 * cleared here.
+		 */
+		WARN_ON(next->blocked_on && next->blocked_on != lock);
+		next->blocked_on = NULL;
 		wake_q_add(&wake_q, next);
 	}

diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 37f025a096c9d..00db40946328e 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -284,6 +284,14 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 #ifndef WW_RT
 	debug_mutex_wake_waiter(lock, waiter);
 #endif
+	/*
+	 * When waking up the task to die, be sure to clear the
+	 * blocked_on pointer. Otherwise we can see circular
+	 * blocked_on relationships that can't resolve.
+	 */
+	WARN_ON(waiter->task->blocked_on &&
+		waiter->task->blocked_on != lock);
+	waiter->task->blocked_on = NULL;
 	wake_q_add(wake_q, waiter->task);
 }

@@ -331,9 +339,15 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
 	 * it's wounded in __ww_mutex_check_kill() or has a
 	 * wakeup pending to re-read the wounded state.
 	 */
-	if (owner != current)
+	if (owner != current) {
+		/*
+		 * When waking up the task to wound, be sure to clear the
+		 * blocked_on pointer. Otherwise we can see circular
+		 * blocked_on relationships that can't resolve.
+		 */
+		owner->blocked_on = NULL;
 		wake_q_add(wake_q, owner);
-
+	}
 	return true;
 }

--
2.49.0.rc0.332.g42c0ae87b1-goog
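[Editorial aside, not part of the posted series: a one-step sketch of the
task -> blocked-on mutex -> owner relation that patch 2/7 introduces. The
helper below is made up; __mutex_owner() is the existing locking-internal
owner accessor, and serialization against concurrent set/clear of
blocked_on relies on mutex::wait_lock as the changelog describes.]

	/* Illustrative only: follow the blocked-on relation by one hop. */
	static struct task_struct *example_next_in_chain(struct task_struct *p)
	{
		struct mutex *m = READ_ONCE(p->blocked_on);

		if (!m)
			return p;		/* not blocked: p itself can run */
		return __mutex_owner(m);	/* otherwise, the lock owner */
	}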
From nobody Sat Feb 7 13:56:41 2026
Date: Wed, 12 Mar 2025 15:11:33 -0700
In-Reply-To: <20250312221147.1865364-1-jstultz@google.com>
References: <20250312221147.1865364-1-jstultz@google.com>
Message-ID: <20250312221147.1865364-4-jstultz@google.com>
Subject: [RFC PATCH v15 3/7] locking/mutex: Add p->blocked_on wrappers for correctness checks
From: John Stultz
To: LKML
Cc: Valentin Schneider, Joel Fernandes, Qais Yousef, Ingo Molnar,
    Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
    Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney",
    Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
    Daniel Lezcano, Suleiman Souhlal, kernel-team@android.com,
    "Connor O'Brien", John Stultz

From: Valentin Schneider

This lets us assert mutex::wait_lock is held whenever we access
p->blocked_on, as well as warn us for unexpected state changes.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Signed-off-by: Valentin Schneider
[fix conflicts, call in more places]
Signed-off-by: Connor O'Brien
[jstultz: tweaked commit subject, reworked a good bit]
Signed-off-by: John Stultz
---
v2:
* Added get_task_blocked_on() accessor
v4:
* Address READ_ONCE usage that was dropped in v2
* Reordered to be a later add-on to the main patch series as Peter
  was unhappy with similar wrappers in other patches.
v5:
* Added some extra correctness checking in wrappers
v7:
* Tweaks to reorder this change in the patch series
* Minor cleanup to set_task_blocked_on() suggested by Metin Kaya
v15:
* Split out into its own patch again.
* Further improve assumption checks in helpers.
---
 include/linux/sched.h        | 44 ++++++++++++++++++++++++++++++++++--
 kernel/locking/mutex-debug.c |  4 ++--
 kernel/locking/mutex.c       | 20 +++++-----------
 kernel/locking/ww_mutex.h    |  6 ++---
 4 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 03775c44b7073..62870077379a6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2154,6 +2155,47 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock);
 	__cond_resched_rwlock_write(lock);					\
 })

+static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
+{
+	WARN_ON_ONCE(!m);
+	/* The task should only be setting itself as blocked */
+	WARN_ON_ONCE(p != current);
+	/* Currently we serialize blocked_on under the mutex::wait_lock */
+	lockdep_assert_held_once(&m->wait_lock);
+	/*
+	 * Check to ensure we don't overwrite an existing mutex value
+	 * with a different mutex. Note, setting it to the same
+	 * lock repeatedly is ok.
+	 */
+	WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
+	p->blocked_on = m;
+}
+
+static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
+{
+	guard(raw_spinlock_irqsave)(&m->wait_lock);
+	__set_task_blocked_on(p, m);
+}
+
+static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
+{
+	WARN_ON_ONCE(!m);
+	/* Currently we serialize blocked_on under the mutex::wait_lock */
+	lockdep_assert_held_once(&m->wait_lock);
+	/*
+	 * There may be cases where we re-clear already cleared
+	 * blocked_on relationships, but make sure we are not
+	 * clearing the relationship with a different lock.
+	 */
+	WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
+	p->blocked_on = NULL;
+}
+
+static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
+{
+	return READ_ONCE(p->blocked_on);
+}
+
 static __always_inline bool need_resched(void)
 {
 	return unlikely(tif_need_resched());
@@ -2193,8 +2235,6 @@ extern bool sched_task_on_rq(struct task_struct *p);
 extern unsigned long get_wchan(struct task_struct *p);
 extern struct task_struct *cpu_curr_snapshot(int cpu);

-#include
-
 /*
  * In order to reduce various lock holder preemption latencies provide an
  * interface to see if a vCPU is currently running or not.
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 758b7a6792b0c..949103fd8e9b5 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -54,13 +54,13 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 	lockdep_assert_held(&lock->wait_lock);

 	/* Current thread can't be already blocked (since it's executing!)
	 */
-	DEBUG_LOCKS_WARN_ON(task->blocked_on);
+	DEBUG_LOCKS_WARN_ON(__get_task_blocked_on(task));
 }

 void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 			       struct task_struct *task)
 {
-	struct mutex *blocked_on = READ_ONCE(task->blocked_on);
+	struct mutex *blocked_on = __get_task_blocked_on(task);

 	DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list));
 	DEBUG_LOCKS_WARN_ON(waiter->task != task);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 37d1966970617..351500cf50ece 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -627,8 +627,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		goto err_early_kill;
 	}

-	WARN_ON(current->blocked_on);
-	current->blocked_on = lock;
+	__set_task_blocked_on(current, lock);
 	set_current_state(state);
 	trace_contention_begin(lock, LCB_F_MUTEX);
 	for (;;) {
@@ -670,7 +669,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		 * that has cleared our blocked_on state, re-set
 		 * it to the lock we are trying to acquire.
 		 */
-		current->blocked_on = lock;
+		set_task_blocked_on(current, lock);
 		set_current_state(state);
 		/*
 		 * Here we order against unlock; we must either see it change
@@ -691,7 +690,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	}
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 acquired:
-	current->blocked_on = NULL;
+	__clear_task_blocked_on(current, lock);
 	__set_current_state(TASK_RUNNING);

 	if (ww_ctx) {
@@ -721,11 +720,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	return 0;

 err:
-	current->blocked_on = NULL;
+	__clear_task_blocked_on(current, lock);
 	__set_current_state(TASK_RUNNING);
 	__mutex_remove_waiter(lock, &waiter);
 err_early_kill:
-	WARN_ON(current->blocked_on);
+	WARN_ON(__get_task_blocked_on(current));
 	trace_contention_end(lock, ret);
 	raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
 	debug_mutex_free_waiter(&waiter);
@@ -935,14 +934,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		next = waiter->task;

 		debug_mutex_wake_waiter(lock, waiter);
-		/*
-		 * Unlock wakeups can be happening in parallel
-		 * (when optimistic spinners steal and release
-		 * the lock), so blocked_on may already be
-		 * cleared here.
-		 */
-		WARN_ON(next->blocked_on && next->blocked_on != lock);
-		next->blocked_on = NULL;
+		__clear_task_blocked_on(next, lock);
 		wake_q_add(&wake_q, next);
 	}

diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 00db40946328e..086fd5487ca77 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -289,9 +289,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 	 * blocked_on pointer. Otherwise we can see circular
 	 * blocked_on relationships that can't resolve.
 	 */
-	WARN_ON(waiter->task->blocked_on &&
-		waiter->task->blocked_on != lock);
-	waiter->task->blocked_on = NULL;
+	__clear_task_blocked_on(waiter->task, lock);
 	wake_q_add(wake_q, waiter->task);
 }

@@ -345,7 +343,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
 	 * blocked_on pointer. Otherwise we can see circular
 	 * blocked_on relationships that can't resolve.
	 */
-		owner->blocked_on = NULL;
+		__clear_task_blocked_on(owner, lock);
 		wake_q_add(wake_q, owner);
 	}
 	return true;
--
2.49.0.rc0.332.g42c0ae87b1-goog
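[Editorial aside, not part of the posted series: the intended calling
convention for the wrappers added in patch 3/7, shown via a hypothetical
caller. The double-underscore variants assert via lockdep that
mutex::wait_lock is already held, while set_task_blocked_on() takes the
lock itself with guard().]

	/* Illustrative only: pick the right wrapper for the locking context. */
	static void example_mark_blocked(struct mutex *m, bool wait_lock_held)
	{
		if (wait_lock_held)
			__set_task_blocked_on(current, m);	/* lockdep-checked */
		else
			set_task_blocked_on(current, m);	/* acquires m->wait_lock */
	}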
From nobody Sat Feb 7 13:56:41 2026
Date: Wed, 12 Mar 2025 15:11:34 -0700
In-Reply-To: <20250312221147.1865364-1-jstultz@google.com>
References: <20250312221147.1865364-1-jstultz@google.com>
Message-ID: <20250312221147.1865364-5-jstultz@google.com>
Subject: [RFC PATCH v15 4/7] sched: Fix runtime accounting w/ split exec & sched contexts
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
    Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
    Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon,
    Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
    K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal,
    kernel-team@android.com

The idea here is we want to charge the scheduler-context task's
vruntime but charge the execution-context task's sum_exec_runtime.

This way cputime accounting goes against the task actually running
but vruntime accounting goes against the rq->donor task so we get
proper fairness.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Signed-off-by: John Stultz
---
 kernel/sched/fair.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c798d27952431..f8ad3a44b3771 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1129,22 +1129,33 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_SMP */

-static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
+static s64 update_curr_se(struct rq *rq, struct sched_entity *se)
 {
 	u64 now = rq_clock_task(rq);
 	s64 delta_exec;

-	delta_exec = now - curr->exec_start;
+	delta_exec = now - se->exec_start;
 	if (unlikely(delta_exec <= 0))
 		return delta_exec;

-	curr->exec_start = now;
-	curr->sum_exec_runtime += delta_exec;
+	se->exec_start = now;
+	if (entity_is_task(se)) {
+		struct task_struct *running = rq->curr;
+		/*
+		 * If se is a task, we account the time against the running
+		 * task, as w/ proxy-exec they may not be the same.
+		 */
+		running->se.exec_start = now;
+		running->se.sum_exec_runtime += delta_exec;
+	} else {
+		/* If not task, account the time against se */
+		se->sum_exec_runtime += delta_exec;
+	}

 	if (schedstat_enabled()) {
 		struct sched_statistics *stats;

-		stats = __schedstats_from_se(curr);
+		stats = __schedstats_from_se(se);
 		__schedstat_set(stats->exec_max,
 				max(delta_exec, stats->exec_max));
 	}
--
2.49.0.rc0.332.g42c0ae87b1-goog
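[Editorial aside, not part of the posted series: a condensed restatement of
the accounting split in patch 4/7. The function below is hypothetical and
only illustrates which side each quantity is charged to; calc_delta_fair()
is the existing fair.c weighting helper.]

	/*
	 * Illustrative only: per tick, cputime (sum_exec_runtime) follows the
	 * execution context (rq->curr), while vruntime follows the scheduling
	 * context (rq->donor), so fairness is charged to the proxied task.
	 * With proxy execution disabled, donor == curr and both hit the same
	 * task, matching the old behaviour.
	 */
	static void example_account_tick(struct rq *rq, u64 delta_exec)
	{
		struct sched_entity *donor_se = &rq->donor->se;

		rq->curr->se.sum_exec_runtime += delta_exec;	/* execution ctx */
		donor_se->vruntime += calc_delta_fair(delta_exec, donor_se);	/* sched ctx */
	}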
From nobody Sat Feb 7 13:56:41 2026
Date: Wed, 12 Mar 2025 15:11:35 -0700
In-Reply-To: <20250312221147.1865364-1-jstultz@google.com>
References: <20250312221147.1865364-1-jstultz@google.com>
Message-ID: <20250312221147.1865364-6-jstultz@google.com>
Subject: [RFC PATCH v15 5/7] sched: Add an initial sketch of the find_proxy_task() function
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
    Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
    Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon,
    Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
    K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal,
    kernel-team@android.com

Add a find_proxy_task() function which doesn't do much.

When we select a blocked task to run, we will just deactivate it and
pick again. The exception is if it has become unblocked after
find_proxy_task() was called.

Greatly simplified from patch by:
  Peter Zijlstra (Intel)
  Juri Lelli
  Valentin Schneider
  Connor O'Brien

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
[jstultz: Split out from larger proxy patch and simplified for review
 and testing.]
Signed-off-by: John Stultz
---
v5:
* Split out from larger proxy patch
v7:
* Fixed unused function arguments, spelling nits, and tweaks for
  clarity, pointed out by Metin Kaya
* Fix build warning
  Reported-by: kernel test robot
  Closes: https://lore.kernel.org/oe-kbuild-all/202311081028.yDLmCWgr-lkp@intel.com/
v8:
* Fixed case where we might return a blocked task from
  find_proxy_task()
* Continued tweaks to handle avoiding returning blocked tasks
v9:
* Add zap_balance_callbacks helper to unwind balance_callbacks
  when we will re-call pick_next_task() again.
* Add extra comment suggested by Metin
* Typo fixes from Metin
* Moved adding proxy_resched_idle earlier in the series, as suggested
  by Metin
* Fix to call proxy_resched_idle() *prior* to deactivating next, to
  avoid crashes caused by stale references to next
* s/PROXY/SCHED_PROXY_EXEC/ as suggested by Metin
* Number of tweaks and cleanups suggested by Metin
* Simplify proxy_deactivate as suggested by Metin
v11:
* Tweaks for earlier simplification in try_to_deactivate_task
v13:
* Rename "next" to "donor" in find_proxy_task() for clarity
* Similarly use "donor" instead of next in proxy_deactivate
* Refactor/simplify proxy_resched_idle
* Moved up a needed fix from later in the series
v15:
* Tweaked some comments to better explain the initial sketch of
  find_proxy_task(), suggested by Qais
* Build fixes for !CONFIG_SMP
* Slight rework for blocked_on_state being added later in the series.
* Move the zap_balance_callbacks to later in the patch series
---
 kernel/sched/core.c  | 103 +++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/rt.c    |  15 ++++++-
 kernel/sched/sched.h |  10 ++++-
 3 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3968c3967ec38..b4f7b14f62a24 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6600,7 +6600,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
  * Otherwise marks the task's __state as RUNNING
  */
 static bool try_to_block_task(struct rq *rq, struct task_struct *p,
-			      unsigned long task_state)
+			      unsigned long task_state, bool deactivate_cond)
 {
 	int flags = DEQUEUE_NOCLOCK;

@@ -6609,6 +6609,9 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
 		return false;
 	}

+	if (!deactivate_cond)
+		return false;
+
 	p->sched_contributes_to_load =
 		(task_state & TASK_UNINTERRUPTIBLE) &&
 		!(task_state & TASK_NOLOAD) &&
@@ -6632,6 +6635,93 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
 	return true;
 }

+#ifdef CONFIG_SCHED_PROXY_EXEC
+
+static inline struct task_struct *
+proxy_resched_idle(struct rq *rq)
+{
+	put_prev_task(rq, rq->donor);
+	rq_set_donor(rq, rq->idle);
+	set_next_task(rq, rq->idle);
+	set_tsk_need_resched(rq->idle);
+	return rq->idle;
+}
+
+static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
+{
+	unsigned long state = READ_ONCE(donor->__state);
+
+	/* Don't deactivate if the state has been changed to TASK_RUNNING */
+	if (state == TASK_RUNNING)
+		return false;
+	/*
+	 * Because we got donor from pick_next_task, it is *crucial*
+	 * that we call proxy_resched_idle before we deactivate it.
+	 * As once we deactivate donor, donor->on_rq is set to zero,
+	 * which allows ttwu to immediately try to wake the task on
+	 * another rq. So we cannot use *any* references to donor
+	 * after that point. So things like cfs_rq->curr or rq->donor
+	 * need to be changed from next *before* we deactivate.
+	 */
+	proxy_resched_idle(rq);
+	return try_to_block_task(rq, donor, state, true);
+}
+
+/*
+ * Initial simple sketch that just deactivates the blocked task
+ * chosen by pick_next_task() so we can then pick something that
+ * isn't blocked.
+ */
+static struct task_struct *
+find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
+{
+	struct task_struct *p = donor;
+	struct mutex *mutex;
+
+	mutex = p->blocked_on;
+	/* Something changed in the chain, so pick again */
+	if (!mutex)
+		return NULL;
+	/*
+	 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
+	 * and ensure @owner sticks around.
+	 */
+	raw_spin_lock(&mutex->wait_lock);
+
+	/* Check again that p is blocked with blocked_lock held */
+	if (!task_is_blocked(p) || mutex != __get_task_blocked_on(p)) {
+		/*
+		 * Something changed in the blocked_on chain and
+		 * we don't know if only at this level. So, let's
+		 * just bail out completely and let __schedule
+		 * figure things out (pick_again loop).
+		 */
+		goto out;
+	}
+
+	if (!proxy_deactivate(rq, donor)) {
+		/*
+		 * XXX: For now, if deactivation failed, set donor
+		 * as not blocked, as we aren't doing proxy-migrations
+		 * yet (more logic will be needed then).
+		 */
+		__clear_task_blocked_on(donor, mutex);
+		raw_spin_unlock(&mutex->wait_lock);
+		return NULL;
+	}
+out:
+	raw_spin_unlock(&mutex->wait_lock);
+	return NULL; /* do pick_next_task again */
+}
+#else /* SCHED_PROXY_EXEC */
+static struct task_struct *
+find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
+{
+	WARN_ONCE(1, "This should never be called in the !SCHED_PROXY_EXEC case\n");
+	return donor;
+}
+#endif /* SCHED_PROXY_EXEC */
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -6739,12 +6829,19 @@ static void __sched notrace __schedule(int sched_mode)
 			goto picked;
 		}
 	} else if (!preempt && prev_state) {
-		try_to_block_task(rq, prev, prev_state);
+		try_to_block_task(rq, prev, prev_state,
+				  !task_is_blocked(prev));
 		switch_count = &prev->nvcsw;
 	}

-	next = pick_next_task(rq, prev, &rf);
+pick_again:
+	next = pick_next_task(rq, rq->donor, &rf);
 	rq_set_donor(rq, next);
+	if (unlikely(task_is_blocked(next))) {
+		next = find_proxy_task(rq, next, &rf);
+		if (!next)
+			goto pick_again;
+	}
 picked:
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4b8e33c615b12..2d418e0efecc5 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1479,8 +1479,19 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)

 	enqueue_rt_entity(rt_se, flags);

-	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
-		enqueue_pushable_task(rq, p);
+	/*
+	 * Current can't be pushed away. Selected is tied to current,
+	 * so don't push it either.
+	 */
+	if (task_current(rq, p) || task_current_donor(rq, p))
+		return;
+	/*
+	 * Pinned tasks can't be pushed.
+	 */
+	if (p->nr_cpus_allowed == 1)
+		return;
+
+	enqueue_pushable_task(rq, p);
 }

 static bool dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 05d2122533619..3e49d77ce2cdd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2311,6 +2311,14 @@ static inline int task_current_donor(struct rq *rq, struct task_struct *p)
 	return rq->donor == p;
 }

+static inline bool task_is_blocked(struct task_struct *p)
+{
+	if (!sched_proxy_exec())
+		return false;
+
+	return !!p->blocked_on;
+}
+
 static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
 {
 #ifdef CONFIG_SMP
@@ -2520,7 +2528,7 @@ static inline void
 put_prev_set_next_task(struct rq *rq, struct task_struct *prev,
 		       struct task_struct *next)
 {
-	WARN_ON_ONCE(rq->curr != prev);
+	WARN_ON_ONCE(rq->donor != prev);

 	__put_prev_set_next_dl_server(rq, prev, next);

--
2.49.0.rc0.332.g42c0ae87b1-goog
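[Editorial aside, not part of the posted series: an illustrative consumer of
the task_is_blocked() helper added in patch 5/7. The function below is
hypothetical; task_is_blocked(), task_current_donor() and nr_cpus_allowed
come from the series, and with proxy execution disabled the helper compiles
down to "false", preserving existing behaviour.]

	/* Illustrative only: don't treat a mutex-blocked donor as migratable. */
	static bool example_may_push(struct rq *rq, struct task_struct *p)
	{
		if (task_is_blocked(p))
			return false;	/* blocked donor rides along with its proxy */
		return !task_current_donor(rq, p) && p->nr_cpus_allowed > 1;
	}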
From nobody Sat Feb 7 13:56:41 2026
Date: Wed, 12 Mar 2025 15:11:36 -0700
In-Reply-To: <20250312221147.1865364-1-jstultz@google.com>
References: <20250312221147.1865364-1-jstultz@google.com>
Message-ID: <20250312221147.1865364-7-jstultz@google.com>
Subject: [RFC PATCH v15 6/7] sched: Fix proxy/current (push,pull)ability
From: John Stultz
To: LKML

From: Valentin Schneider

Proxy execution forms atomic pairs of tasks: the waiting donor task
(scheduling context) and a proxy (execution context). The donor task,
along with the rest of the blocked chain, follows the proxy wrt CPU
placement.

They can be the same task, in which case push/pull doesn't need any
modification. When they are different, however, consider FIFO1 & FIFO42:

              ,-> RT42
              |    | blocked-on
              |    v
 blocked_donor|  mutex
              |    | owner
              |    v
              `-- RT1

                  RT1
                  RT42

        CPU0            CPU1
         ^                ^
         |                |
     overloaded     !overloaded
    rq prio = 42    rq prio = 0

RT1 is eligible to be pushed to CPU1, but should that happen it will
"carry" RT42 along. Clearly here neither RT1 nor RT42 should be seen as
push/pullable.

Unfortunately, only the donor task is usually dequeued from the rq, and
the proxy'ed execution context (rq->curr) remains on the rq. This can
cause RT1 to be selected for migration from logic like the rt
pushable_list.

Thus, add a dequeue/enqueue cycle on the proxy task before __schedule
returns, which allows the sched class logic to avoid adding the now
current task to the pushable_list.
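To make the intended invariant concrete, here is a small standalone sketch (plain C with invented types, not the kernel code) of the pushability rule being enforced: a task is only a push candidate if it is neither the running execution context (rq->curr) nor the donor scheduling context, and it is not CPU-pinned. In the FIFO1/FIFO42 example above, both RT1 and RT42 come out non-pushable:

  /* Standalone sketch of the pushability rule (not the kernel implementation). */
  #include <stdbool.h>
  #include <stdio.h>

  struct task {
          const char *name;
          int nr_cpus_allowed;
  };

  struct rq {
          struct task *curr;      /* execution context (the proxy) */
          struct task *donor;     /* scheduling context (the waiter) */
  };

  static bool task_is_pushable(const struct rq *rq, const struct task *p)
  {
          if (p == rq->curr || p == rq->donor)    /* either half of the atomic pair */
                  return false;
          if (p->nr_cpus_allowed == 1)            /* pinned tasks can't be pushed */
                  return false;
          return true;
  }

  int main(void)
  {
          struct task rt1  = { "RT1",  8 };       /* mutex owner, runs as the proxy */
          struct task rt42 = { "RT42", 8 };       /* blocked waiter, lends its priority */
          struct rq cpu0   = { .curr = &rt1, .donor = &rt42 };

          printf("RT1  pushable: %d\n", task_is_pushable(&cpu0, &rt1));   /* 0 */
          printf("RT42 pushable: %d\n", task_is_pushable(&cpu0, &rt42));  /* 0 */
          return 0;
  }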
Furthermore, tasks becoming blocked on a mutex don't need an explicit
dequeue/enqueue cycle to be made (push/pull)able: they have to be running
to block on a mutex, thus they will eventually hit put_prev_task().

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Signed-off-by: Valentin Schneider
Signed-off-by: Connor O'Brien
Signed-off-by: John Stultz
---
v3:
* Tweaked comments & commit message
v5:
* Minor simplifications to utilize the fix earlier in the patch series.
* Rework the wording of the commit message to match selected/proxy
  terminology and expand a bit to make it more clear how it works.
v6:
* Dropped now-unused proxied value, to be re-added later in the series
  when it is used, as caught by Dietmar
v7:
* Unused function argument fixup
* Commit message nit pointed out by Metin Kaya
* Dropped unproven unlikely() and use sched_proxy_exec() in
  proxy_tag_curr, suggested by Metin Kaya
v8:
* More cleanups and typo fixes suggested by Metin Kaya
v11:
* Cleanup of commit message suggested by Metin
v12:
* Rework for rq_selected -> rq->donor renaming
---
 kernel/sched/core.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b4f7b14f62a24..3596244f613f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6722,6 +6722,23 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 }
 #endif /* SCHED_PROXY_EXEC */
 
+static inline void proxy_tag_curr(struct rq *rq, struct task_struct *owner)
+{
+        if (!sched_proxy_exec())
+                return;
+        /*
+         * pick_next_task() calls set_next_task() on the chosen task
+         * at some point, which ensures it is not push/pullable.
+         * However, the chosen/donor task *and* the mutex owner form an
+         * atomic pair wrt push/pull.
+         *
+         * Make sure the owner we run is not pushable. Unfortunately we can
+         * only deal with that by means of a dequeue/enqueue cycle. :-/
+         */
+        dequeue_task(rq, owner, DEQUEUE_NOCLOCK | DEQUEUE_SAVE);
+        enqueue_task(rq, owner, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE);
+}
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -6856,6 +6873,10 @@ static void __sched notrace __schedule(int sched_mode)
                  * changes to task_struct made by pick_next_task().
                  */
                 RCU_INIT_POINTER(rq->curr, next);
+
+                if (!task_current_donor(rq, next))
+                        proxy_tag_curr(rq, next);
+
                 /*
                  * The membarrier system call requires each architecture
                  * to have a full memory barrier after updating
@@ -6890,6 +6911,10 @@ static void __sched notrace __schedule(int sched_mode)
                 /* Also unlocks the rq: */
                 rq = context_switch(rq, prev, next, &rf);
         } else {
+                /* In case next was already curr but just got blocked_donor */
+                if (!task_current_donor(rq, next))
+                        proxy_tag_curr(rq, next);
+
                 rq_unpin_lock(rq, &rf);
                 __balance_callbacks(rq);
                 raw_spin_rq_unlock_irq(rq);
-- 
2.49.0.rc0.332.g42c0ae87b1-goog
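As an aside on the design choice in proxy_tag_curr() above: pushable-list membership is only re-evaluated on the enqueue path, which is why a dequeue/enqueue cycle is the mechanism used to refresh it once the mutex owner becomes the running execution context. The following standalone sketch (invented types, not the kernel code) models that: a stale "pushable" flag, set while the task was not current, is cleared by simply requeueing it after it has become rq->curr:

  /* Standalone model of the dequeue/enqueue fixup (not the kernel code). */
  #include <stdbool.h>
  #include <stdio.h>

  struct task {
          const char *name;
          bool on_pushable_list;
  };

  struct rq {
          struct task *curr;
  };

  /* Enqueue-side hook: pushable-list membership is recomputed here. */
  static void enqueue_task(struct rq *rq, struct task *p)
  {
          p->on_pushable_list = (p != rq->curr);
  }

  static void dequeue_task(struct rq *rq, struct task *p)
  {
          (void)rq;
          p->on_pushable_list = false;
  }

  int main(void)
  {
          struct rq rq = { NULL };
          struct task owner = { "owner", false };

          enqueue_task(&rq, &owner);              /* enqueued while not current */
          printf("before: pushable=%d\n", owner.on_pushable_list);   /* 1: now stale */

          rq.curr = &owner;                       /* proxying: owner becomes rq->curr */
          dequeue_task(&rq, &owner);              /* the DEQUEUE_SAVE/ENQUEUE_RESTORE */
          enqueue_task(&rq, &owner);              /* cycle re-runs the enqueue check  */
          printf("after:  pushable=%d\n", owner.on_pushable_list);   /* 0 */
          return 0;
  }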
From nobody Sat Feb 7 13:56:41 2026
Date: Wed, 12 Mar 2025 15:11:37 -0700
In-Reply-To: <20250312221147.1865364-1-jstultz@google.com>
References: <20250312221147.1865364-1-jstultz@google.com>
Message-ID: <20250312221147.1865364-8-jstultz@google.com>
Subject: [RFC PATCH v15 7/7] sched: Start blocked_on chain processing in find_proxy_task()
From: John Stultz
To: LKML

From: Peter Zijlstra

Start to flesh out the real find_proxy_task() implementation, but avoid
the migration cases for now; in those cases just deactivate the donor
task and pick again.

To ensure the donor task or other blocked tasks in the chain aren't
migrated away while we're running the proxy, also tweak the fair class
logic to avoid migrating donor or mutex-blocked tasks.
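The chain walk described above can be illustrated with a small standalone model (plain C, invented types; the real find_proxy_task() additionally deals with wait_lock locking, migrating owners, delayed dequeue and the owner == p race): starting from the donor, follow blocked_on to the mutex and then to its owner, until a runnable owner is found to use as the execution context:

  /* Standalone sketch of the blocked_on chain walk (not the kernel code). */
  #include <stdio.h>
  #include <stddef.h>

  struct task;

  struct mutex {
          struct task *owner;
  };

  struct task {
          const char *name;
          struct mutex *blocked_on;
          int on_this_cpu;        /* stand-in for task_cpu(owner) == this_cpu */
  };

  /* Follow donor -> mutex -> owner -> ... until a runnable owner is found. */
  static struct task *find_proxy_task(struct task *donor)
  {
          struct task *p = donor;

          while (p->blocked_on) {
                  struct task *owner = p->blocked_on->owner;

                  if (!owner)             /* lock was just released: run the waiter */
                          return p;
                  if (!owner->on_this_cpu)
                          return NULL;    /* migration case punted: deactivate donor
                                             and pick again */
                  p = owner;
          }
          return p;                       /* runnable end of the chain: the proxy */
  }

  int main(void)
  {
          struct task c   = { "C", NULL, 1 };     /* owner at the end of the chain */
          struct mutex m2 = { &c };
          struct task b   = { "B", &m2, 1 };      /* blocked on m2 */
          struct mutex m1 = { &b };
          struct task a   = { "A", &m1, 1 };      /* donor, blocked on m1 */

          struct task *proxy = find_proxy_task(&a);
          printf("proxy: %s\n", proxy ? proxy->name : "(none, pick again)");
          return 0;
  }

Here A blocks on a mutex held by B, which blocks on a mutex held by C; C is returned as the proxy, so A's scheduling context ends up boosting C.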
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Juri Lelli
Signed-off-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: This change was split out from the larger proxy patch]
Signed-off-by: John Stultz
---
v5:
* Split this out from larger proxy patch
v7:
* Minor refactoring of core find_proxy_task() function
* Minor spelling and corrections suggested by Metin Kaya
* Dropped an added BUG_ON that was frequently tripped
v8:
* Fix issue if proxy_deactivate fails, we don't leave task BO_BLOCKED
* Switch to WARN_ON from BUG_ON checks
v9:
* Improve comments suggested by Metin
* Minor cleanups
v11:
* Previously we checked next==rq->idle && prev==rq->idle, but I think
  we only really care if next==rq->idle from find_proxy_task, as we
  will still want to resched regardless of what prev was.
v12:
* Commit message rework for selected -> donor rewording
v13:
* Address new delayed dequeue condition (deactivate donor for now)
* Next to donor renaming in find_proxy_task
* Improved comments for find_proxy_task
* Rework for proxy_deactivate cleanup
v14:
* Fix build error from __mutex_owner() with CONFIG_PREEMPT_RT
v15:
* Reworks for moving blocked_on_state to later in the series
---
 kernel/locking/mutex.h |   3 +-
 kernel/sched/core.c    | 165 +++++++++++++++++++++++++++++++++--------
 kernel/sched/fair.c    |  10 ++-
 3 files changed, 145 insertions(+), 33 deletions(-)

diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index cbff35b9b7ae3..2e8080a9bee37 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -6,7 +6,7 @@
  *
  * Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar
  */
-
+#ifndef CONFIG_PREEMPT_RT
 /*
  * This is the control structure for tasks blocked on mutex, which resides
  * on the blocked task's kernel stack:
@@ -70,3 +70,4 @@ extern void debug_mutex_init(struct mutex *lock, const char *name,
 # define debug_mutex_unlock(lock)               do { } while (0)
 # define debug_mutex_init(lock, name, key)      do { } while (0)
 #endif /* !CONFIG_DEBUG_MUTEXES */
+#endif /* CONFIG_PREEMPT_RT */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3596244f613f8..28ac71dfc7e66 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -96,6 +96,7 @@
 #include "../workqueue_internal.h"
 #include "../../io_uring/io-wq.h"
 #include "../smpboot.h"
+#include "../locking/mutex.h"
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpu);
 EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpumask);
@@ -2950,8 +2951,15 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
         struct set_affinity_pending my_pending = { }, *pending = NULL;
         bool stop_pending, complete = false;
 
-        /* Can the task run on the task's current CPU? If so, we're done */
-        if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
+        /*
+         * Can the task run on the task's current CPU? If so, we're done
+         *
+         * We are also done if the task is the current donor, boosting a
+         * lock-holding proxy (and potentially has been migrated outside
+         * its current or previous affinity mask).
+         */
+        if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask) ||
+            (task_current_donor(rq, p) && !task_current(rq, p))) {
                 struct task_struct *push_task = NULL;
 
                 if ((flags & SCA_MIGRATE_ENABLE) &&
@@ -6668,47 +6676,138 @@ static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
 }
 
 /*
- * Initial simple sketch that just deactivates the blocked task
- * chosen by pick_next_task() so we can then pick something that
- * isn't blocked.
+ * Find runnable lock owner to proxy for mutex blocked donor
+ *
+ * Follow the blocked-on relation:
+ *   task->blocked_on -> mutex->owner -> task...
+ *
+ * Lock order:
+ *
+ *   p->pi_lock
+ *     rq->lock
+ *       mutex->wait_lock
+ *
+ * Returns the task that is going to be used as execution context (the one
+ * that is actually going to be run on cpu_of(rq)).
  */
 static struct task_struct *
 find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 {
-        struct task_struct *p = donor;
+        struct task_struct *owner = NULL;
+        struct task_struct *ret = NULL;
+        int this_cpu = cpu_of(rq);
+        struct task_struct *p;
         struct mutex *mutex;
 
-        mutex = p->blocked_on;
-        /* Something changed in the chain, so pick again */
-        if (!mutex)
-                return NULL;
-        /*
-         * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
-         * and ensure @owner sticks around.
-         */
-        raw_spin_lock(&mutex->wait_lock);
-
-        /* Check again that p is blocked with blocked_lock held */
-        if (!task_is_blocked(p) || mutex != __get_task_blocked_on(p)) {
+        /* Follow blocked_on chain. */
+        for (p = donor; task_is_blocked(p); p = owner) {
+                mutex = p->blocked_on;
+                /* Something changed in the chain, so pick again */
+                if (!mutex)
+                        return NULL;
                 /*
-                 * Something changed in the blocked_on chain and
-                 * we don't know if only at this level. So, let's
-                 * just bail out completely and let __schedule
-                 * figure things out (pick_again loop).
+                 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
+                 * and ensure @owner sticks around.
                  */
-                goto out;
-        }
+                raw_spin_lock(&mutex->wait_lock);
+
+                /* Check again that p is blocked with wait_lock held */
+                if (mutex != __get_task_blocked_on(p)) {
+                        /*
+                         * Something changed in the blocked_on chain and
+                         * we don't know if only at this level. So, let's
+                         * just bail out completely and let __schedule
+                         * figure things out (pick_again loop).
+                         */
+                        goto out;
+                }
+
+                owner = __mutex_owner(mutex);
+                if (!owner) {
+                        __clear_task_blocked_on(p, mutex);
+                        ret = p;
+                        goto out;
+                }
+
+                if (task_cpu(owner) != this_cpu) {
+                        /* XXX Don't handle migrations yet */
+                        if (!proxy_deactivate(rq, donor))
+                                goto deactivate_failed;
+                        goto out;
+                }
+
+                if (task_on_rq_migrating(owner)) {
+                        /*
+                         * One of the chain of mutex owners is currently migrating to this
+                         * CPU, but has not yet been enqueued because we are holding the
+                         * rq lock. As a simple solution, just schedule rq->idle to give
+                         * the migration a chance to complete. Much like the migrate_task
+                         * case we should end up back in find_proxy_task(), this time
+                         * hopefully with all relevant tasks already enqueued.
+                         */
+                        raw_spin_unlock(&mutex->wait_lock);
+                        return proxy_resched_idle(rq);
+                }
+
+                if (!owner->on_rq) {
+                        /* XXX Don't handle blocked owners yet */
+                        if (!proxy_deactivate(rq, donor))
+                                goto deactivate_failed;
+                        goto out;
+                }
+
+                if (owner->se.sched_delayed) {
+                        /* XXX Don't handle delayed dequeue yet */
+                        if (!proxy_deactivate(rq, donor))
+                                goto deactivate_failed;
+                        goto out;
+                }
+
+                if (owner == p) {
+                        /*
+                         * It's possible we interleave with mutex_unlock like:
+                         *
+                         *                              lock(&rq->lock);
+                         *                                find_proxy_task()
+                         * mutex_unlock()
+                         *   lock(&wait_lock);
+                         *   donor(owner) = current->blocked_donor;
+                         *   unlock(&wait_lock);
+                         *
+                         *   wake_up_q();
+                         *     ...
+                         *       ttwu_runnable()
+                         *         __task_rq_lock()
+                         *                                lock(&wait_lock);
+                         *                                owner == p
+                         *
+                         * Which leaves us to finish the ttwu_runnable() and make it go.
+                         *
+                         * So schedule rq->idle so that ttwu_runnable can get the rq lock
+                         * and mark owner as running.
+                         */
+                        raw_spin_unlock(&mutex->wait_lock);
+                        return proxy_resched_idle(rq);
+                }
 
-        if (!proxy_deactivate(rq, donor)) {
                 /*
-                 * XXX: For now, if deactivation failed, set donor
-                 * as not blocked, as we aren't doing proxy-migrations
-                 * yet (more logic will be needed then).
+                 * OK, now we're absolutely sure @owner is on this
+                 * rq, therefore holding @rq->lock is sufficient to
+                 * guarantee its existence, as per ttwu_remote().
                  */
-                __clear_task_blocked_on(donor, mutex);
                 raw_spin_unlock(&mutex->wait_lock);
-                return NULL;
         }
+
+        WARN_ON_ONCE(owner && !owner->on_rq);
+        return owner;
+
+deactivate_failed:
+        /*
+         * XXX: For now, if deactivation failed, set donor
+         * as unblocked, as we aren't doing proxy-migrations
+         * yet (more logic will be needed then).
+         */
+        donor->blocked_on = NULL;       /* XXX not following locking rules :( */
 out:
         raw_spin_unlock(&mutex->wait_lock);
         return NULL; /* do pick_next_task again */
@@ -6791,6 +6890,7 @@ static void __sched notrace __schedule(int sched_mode)
         struct rq_flags rf;
         struct rq *rq;
         int cpu;
+        bool preserve_need_resched = false;
 
         cpu = smp_processor_id();
         rq = cpu_rq(cpu);
@@ -6858,9 +6958,12 @@ static void __sched notrace __schedule(int sched_mode)
                 next = find_proxy_task(rq, next, &rf);
                 if (!next)
                         goto pick_again;
+                if (next == rq->idle)
+                        preserve_need_resched = true;
         }
 picked:
-        clear_tsk_need_resched(prev);
+        if (!preserve_need_resched)
+                clear_tsk_need_resched(prev);
         clear_preempt_need_resched();
 #ifdef CONFIG_SCHED_DEBUG
         rq->last_seen_need_resched_ns = 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f8ad3a44b3771..091f1a01b3327 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9385,6 +9385,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
          * 3) cannot be migrated to this CPU due to cpus_ptr, or
          * 4) running (obviously), or
          * 5) are cache-hot on their current CPU.
+         * 6) are blocked on mutexes (if SCHED_PROXY_EXEC is enabled)
          */
         if ((p->se.sched_delayed) && (env->migration_type != migrate_load))
                 return 0;
@@ -9406,6 +9407,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
         if (kthread_is_per_cpu(p))
                 return 0;
 
+        if (task_is_blocked(p))
+                return 0;
+
         if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
                 int cpu;
 
@@ -9442,7 +9446,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
         /* Record that we found at least one task that could run on dst_cpu */
         env->flags &= ~LBF_ALL_PINNED;
 
-        if (task_on_cpu(env->src_rq, p)) {
+        if (task_on_cpu(env->src_rq, p) ||
+            task_current_donor(env->src_rq, p)) {
                 schedstat_inc(p->stats.nr_failed_migrations_running);
                 return 0;
         }
@@ -9486,6 +9491,9 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
                 schedstat_inc(p->stats.nr_forced_migrations);
         }
 
+        WARN_ON(task_current(env->src_rq, p));
+        WARN_ON(task_current_donor(env->src_rq, p));
+
         deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
         set_task_cpu(p, env->dst_cpu);
 }
-- 
2.49.0.rc0.332.g42c0ae87b1-goog
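Finally, as a rough summary of the load-balancer side of this patch (a simplified standalone sketch with invented types, not the kernel code): the fair-class changes above make can_migrate_task() refuse both mutex-blocked chain members and the current donor, so neither half of the donor/proxy pair can be pulled away by load balancing:

  /* Standalone sketch of the migration filter (not the kernel code). */
  #include <stdbool.h>
  #include <stdio.h>

  struct task {
          const char *name;
          bool blocked_on_mutex;  /* stand-in for task_is_blocked() */
          bool running;           /* stand-in for task_on_cpu() */
          bool is_donor;          /* stand-in for task_current_donor() */
  };

  static bool can_migrate_task(const struct task *p)
  {
          if (p->blocked_on_mutex)        /* chain member: must stay with its proxy */
                  return false;
          if (p->running || p->is_donor)  /* execution or scheduling context */
                  return false;
          return true;
  }

  int main(void)
  {
          struct task proxy = { "proxy (mutex owner)", false, true,  false };
          struct task donor = { "donor (waiter)",      true,  false, true  };
          struct task other = { "unrelated task",      false, false, false };

          printf("%-20s migratable=%d\n", proxy.name, can_migrate_task(&proxy));
          printf("%-20s migratable=%d\n", donor.name, can_migrate_task(&donor));
          printf("%-20s migratable=%d\n", other.name, can_migrate_task(&other));
          return 0;
  }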