From nobody Sun Feb 8 07:56:36 2026
Date: Fri, 11 Apr 2025 23:02:35 -0700
In-Reply-To: <20250412060258.3844594-1-jstultz@google.com>
X-Mailing-List: 
linux-kernel@vger.kernel.org
Message-ID: <20250412060258.3844594-2-jstultz@google.com>
Subject: [PATCH v16 1/7] sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
From: John Stultz
To: LKML

Add a CONFIG_SCHED_PROXY_EXEC option, along with a sched_proxy_exec=
boot argument that can be used to disable the feature at boot time
when CONFIG_SCHED_PROXY_EXEC is enabled.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Signed-off-by: John Stultz
---
v7:
* Switch to CONFIG_SCHED_PROXY_EXEC/sched_proxy_exec= as suggested by
  Metin Kaya.
* Switch boot arg from =disable/enable to use kstrtobool(), which
  supports =yes|no|1|0|true|false|on|off, as also suggested by Metin
  Kaya, and print a message when a boot argument is used.
v8:
* Move CONFIG_SCHED_PROXY_EXEC under Scheduler Features, as suggested
  by Metin
* Minor rework reordering with split sched contexts patch
v12:
* Rework for selected -> donor renaming
v14:
* Depend on !PREEMPT_RT to avoid build issues for now
v15:
* Depend on EXPERT while patch series upstreaming is in progress.
v16:
* Allow "sched_proxy_exec" without "=true" to enable proxy execution
  at boot time, in addition to the "sched_proxy_exec=true" or
  "sched_proxy_exec=false" options, as suggested by Steven
* Drop the "default n" in Kconfig, as suggested by Steven
* Add !SCHED_CLASS_EXT dependency until I can investigate whether
  sched_ext can understand split contexts, as suggested by Peter
---
 .../admin-guide/kernel-parameters.txt |  5 ++++
 include/linux/sched.h                 | 13 +++++++++
 init/Kconfig                          | 12 ++++++++
 kernel/sched/core.c                   | 29 +++++++++++++++++++
 kernel/sched/sched.h                  | 12 ++++++++
 5 files changed, 71 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 76e538c77e316..b21cb89a09831 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6307,6 +6307,11 @@
 	sa1100ir	[NET]
 			See drivers/net/irda/sa1100_ir.c.
 
+	sched_proxy_exec=	[KNL]
+			Enables or disables "proxy execution" style
+			solution to mutex-based priority inversion.
+			Format:
+
 	sched_verbose	[KNL,EARLY] Enables verbose scheduler debug messages.
 
 	schedstats=	[KNL,X86] Enable or disable scheduled statistics.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac19828934..3cdd598aaa9aa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1663,6 +1663,19 @@ struct task_struct {
 	 */
 };
 
+#ifdef CONFIG_SCHED_PROXY_EXEC
+DECLARE_STATIC_KEY_TRUE(__sched_proxy_exec);
+static inline bool sched_proxy_exec(void)
+{
+	return static_branch_likely(&__sched_proxy_exec);
+}
+#else
+static inline bool sched_proxy_exec(void)
+{
+	return false;
+}
+#endif
+
 #define TASK_REPORT_IDLE	(TASK_REPORT + 1)
 #define TASK_REPORT_MAX		(TASK_REPORT_IDLE << 1)
 
diff --git a/init/Kconfig b/init/Kconfig
index dd2ea3b9a7992..23f6edff481b2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -883,6 +883,18 @@ config UCLAMP_BUCKETS_COUNT
 
 	  If in doubt, use the default value.
 
+config SCHED_PROXY_EXEC
+	bool "Proxy Execution"
+	# Avoid some build failures w/ PREEMPT_RT until it can be fixed
+	depends on !PREEMPT_RT
+	# Need to investigate how to inform sched_ext of split contexts
+	depends on !SCHED_CLASS_EXT
+	# Not particularly useful until we get to multi-rq proxying
+	depends on EXPERT
+	help
+	  This option enables proxy execution, a mechanism for mutex-owning
+	  tasks to inherit the scheduling context of higher priority waiters.
+
 endmenu
 
 #
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c81cf642dba05..82817650a635b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -118,6 +118,35 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 
+#ifdef CONFIG_SCHED_PROXY_EXEC
+DEFINE_STATIC_KEY_TRUE(__sched_proxy_exec);
+static int __init setup_proxy_exec(char *str)
+{
+	bool proxy_enable = true;
+
+	if (*str && kstrtobool(str + 1, &proxy_enable)) {
+		pr_warn("Unable to parse sched_proxy_exec=\n");
+		return 0;
+	}
+
+	if (proxy_enable) {
+		pr_info("sched_proxy_exec enabled via boot arg\n");
+		static_branch_enable(&__sched_proxy_exec);
+	} else {
+		pr_info("sched_proxy_exec disabled via boot arg\n");
+		static_branch_disable(&__sched_proxy_exec);
+	}
+	return 1;
+}
+#else
+static int __init setup_proxy_exec(char *str)
+{
+	pr_warn("CONFIG_SCHED_PROXY_EXEC=n, so it cannot be enabled or disabled at boot time\n");
+	return 0;
+}
+#endif
+__setup("sched_proxy_exec", setup_proxy_exec);
+
 /*
  * Debugging: various feature bits
  *
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 47972f34ea701..154f0aa0c6322 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1149,10 +1149,15 @@ struct rq {
 	 */
 	unsigned int		nr_uninterruptible;
 
+#ifdef CONFIG_SCHED_PROXY_EXEC
+	struct task_struct __rcu	*donor;	/* Scheduling context */
+	struct task_struct __rcu	*curr;	/* Execution context */
+#else
 	union {
 		struct task_struct __rcu	*donor;	/* Scheduler context */
 		struct task_struct __rcu	*curr;	/* Execution context */
 	};
+#endif
 	struct sched_dl_entity	*dl_server;
 	struct task_struct	*idle;
 	struct task_struct	*stop;
@@ -1347,10 +1352,17 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		raw_cpu_ptr(&runqueues)
 
+#ifdef CONFIG_SCHED_PROXY_EXEC
+static inline void rq_set_donor(struct rq *rq, struct task_struct *t)
+{
+	rcu_assign_pointer(rq->donor, t);
+}
+#else
 static inline void rq_set_donor(struct rq *rq, struct task_struct *t)
 {
 	/* Do nothing */
 }
+#endif
 
 #ifdef CONFIG_SCHED_CORE
 static inline struct cpumask *sched_group_span(struct sched_group *sg);
-- 
2.49.0.604.gff1f9ca942-goog

From nobody Sun Feb 8 07:56:36 2026
Date: Fri, 11 Apr 2025 23:02:36 -0700
In-Reply-To: <20250412060258.3844594-1-jstultz@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Message-ID: <20250412060258.3844594-3-jstultz@google.com>
Subject: [PATCH v16 2/7] locking/mutex: Rework task_struct::blocked_on
From: John Stultz
To: LKML

From: Peter Zijlstra

Track the blocked-on relation for mutexes, to allow following this
relation at schedule time.

	task
	  | blocked-on
	  v
	mutex
	  | owner
	  v
	task

All of this will be used for tracking blocked-task/mutex chains with
the proxy-execution patches, in a similar fashion to how priority
inheritance is done with rt_mutexes.

For serialization, blocked-on is only set by the task itself (current),
and both setting and clearing (potentially by others) are done while
holding the mutex::wait_lock.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
[minor changes while rebasing]
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Connor O'Brien
[jstultz: Fix blocked_on tracking in __mutex_lock_common in error paths]
Signed-off-by: John Stultz
---
v2:
* Fixed blocked_on tracking in error paths that was causing crashes
v4:
* Ensure we clear blocked_on when waking ww_mutexes to die or wound.
  This is critical so we don't get circular blocked_on relationships
  that can't be resolved.
v5:
* Fix potential bug where the skip_wait path might clear blocked_on
  when that path never set it
* Slight tweaks to where we set blocked_on to make it consistent,
  along with extra WARN_ON correctness checking
* Minor comment changes
v7:
* Minor commit message change suggested by Metin Kaya
* Fix WARN_ON conditionals in unlock path (as blocked_on might already
  be cleared), found while looking at an issue Metin Kaya raised.
* Minor tweaks to be consistent in what we do under the blocked_on
  lock, also tweaked the variable name to avoid confusion with a
  label, and fixed comment typos, as suggested by Metin Kaya
* Minor tweak for the CONFIG_SCHED_PROXY_EXEC name change
* Moved unused block of code to later in the series, as suggested by
  Metin Kaya
* Switch to a tri-state to be able to distinguish waking from runnable
  so we can later safely do return migration from ttwu
* Folded together with related blocked_on changes
v8:
* Fix issue leaving task BO_BLOCKED when calling into optimistic
  spinning path.
* Include helper to better handle BO_BLOCKED->BO_WAKING transitions
v9:
* Typo fixup pointed out by Metin
* Cleanup BO_WAKING->BO_RUNNABLE transitions for the !proxy case
* Many cleanups and simplifications suggested by Metin
v11:
* Whitespace fixup pointed out by Metin
v13:
* Refactor set_blocked_on helpers to clean things up a bit
v14:
* Small build fixup with PREEMPT_RT
v15:
* Improve consistency of names for functions that assume blocked_lock
  is held, as suggested by Peter
* Use guard instead of separate spinlock/unlock calls, also suggested
  by Peter
* Drop the blocked_on_state tri-state for now, as it's not needed
  until later in the series, when we get to proxy-migration and
  return-migration.
v16:
* Clear blocked_on before optimistic spinning
---
 include/linux/sched.h        |  5 +----
 kernel/fork.c                |  3 +--
 kernel/locking/mutex-debug.c |  9 +++++----
 kernel/locking/mutex.c       | 22 ++++++++++++++++++++++
 kernel/locking/ww_mutex.h    | 18 ++++++++++++++++--
 5 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3cdd598aaa9aa..10be203ddb7e1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1234,10 +1234,7 @@ struct task_struct {
 	struct rt_mutex_waiter		*pi_blocked_on;
 #endif
 
-#ifdef CONFIG_DEBUG_MUTEXES
-	/* Mutex deadlock detection: */
-	struct mutex_waiter		*blocked_on;
-#endif
+	struct mutex			*blocked_on;	/* lock we're blocked on */
 
 #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
 	struct mutex			*blocker_mutex;
diff --git a/kernel/fork.c b/kernel/fork.c
index c4b26cd8998b8..3455ab283482e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2383,9 +2383,8 @@ __latent_entropy struct task_struct *copy_process(
 	lockdep_init_task(p);
 #endif
 
-#ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
-#endif
+
 #ifdef CONFIG_BCACHE
 	p->sequential_io	= 0;
 	p->sequential_io_avg	= 0;
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 6e6f6071cfa27..758b7a6792b0c 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -53,17 +53,18 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 {
 	lockdep_assert_held(&lock->wait_lock);
 
-	/* Mark the current thread as blocked on the lock: */
-	task->blocked_on = waiter;
+	/* Current thread can't be already blocked (since it's executing!) */
+	DEBUG_LOCKS_WARN_ON(task->blocked_on);
 }
 
 void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 			 struct task_struct *task)
 {
+	struct mutex *blocked_on = READ_ONCE(task->blocked_on);
+
 	DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list));
 	DEBUG_LOCKS_WARN_ON(waiter->task != task);
-	DEBUG_LOCKS_WARN_ON(task->blocked_on != waiter);
-	task->blocked_on = NULL;
+	DEBUG_LOCKS_WARN_ON(blocked_on && blocked_on != lock);
 
 	INIT_LIST_HEAD(&waiter->list);
 	waiter->task = NULL;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 555e2b3a665a3..5243e59d75f40 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -643,6 +643,8 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		goto err_early_kill;
 	}
 
+	WARN_ON(current->blocked_on);
+	current->blocked_on = lock;
 	set_current_state(state);
 	trace_contention_begin(lock, LCB_F_MUTEX);
 	for (;;) {
@@ -679,6 +681,12 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 
 		first = __mutex_waiter_is_first(lock, &waiter);
 
+		/*
+		 * As we have likely been woken up by the task
+		 * that has cleared our blocked_on state, re-set
+		 * it to the lock we are trying to acquire.
+		 */
+		current->blocked_on = lock;
 		set_current_state(state);
 		/*
 		 * Here we order against unlock; we must either see it change
@@ -690,8 +698,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 
 		if (first) {
 			trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
+			/* clear blocked_on as mutex_optimistic_spin may schedule() */
+			current->blocked_on = NULL;
 			if (mutex_optimistic_spin(lock, ww_ctx, &waiter))
 				break;
+			current->blocked_on = lock;
 			trace_contention_begin(lock, LCB_F_MUTEX);
 		}
 
@@ -699,6 +710,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 	}
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 acquired:
+	current->blocked_on = NULL;
 	__set_current_state(TASK_RUNNING);
 
 	if (ww_ctx) {
@@ -728,9 +740,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 	return 0;
 
 err:
+	current->blocked_on = NULL;
 	__set_current_state(TASK_RUNNING);
 	__mutex_remove_waiter(lock, &waiter);
 err_early_kill:
+	WARN_ON(current->blocked_on);
 	trace_contention_end(lock, ret);
 	raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
 	debug_mutex_free_waiter(&waiter);
@@ -940,6 +954,14 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long ip)
 		next = waiter->task;
 
 		debug_mutex_wake_waiter(lock, waiter);
+		/*
+		 * Unlock wakeups can be happening in parallel
+		 * (when optimistic spinners steal and release
+		 * the lock), so blocked_on may already be
+		 * cleared here.
+		 */
+		WARN_ON(next->blocked_on && next->blocked_on != lock);
+		next->blocked_on = NULL;
 		wake_q_add(&wake_q, next);
 	}
 
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 37f025a096c9d..00db40946328e 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -284,6 +284,14 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 #ifndef WW_RT
 	debug_mutex_wake_waiter(lock, waiter);
 #endif
+	/*
+	 * When waking up the task to die, be sure to clear the
+	 * blocked_on pointer. Otherwise we can see circular
+	 * blocked_on relationships that can't resolve.
+	 */
+	WARN_ON(waiter->task->blocked_on &&
+		waiter->task->blocked_on != lock);
+	waiter->task->blocked_on = NULL;
 	wake_q_add(wake_q, waiter->task);
 }
 
@@ -331,9 +339,15 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
 	 * it's wounded in __ww_mutex_check_kill() or has a
	 * wakeup pending to re-read the wounded state.
	 */
-	if (owner != current)
+	if (owner != current) {
+		/*
+		 * When waking up the task to wound, be sure to clear the
+		 * blocked_on pointer. Otherwise we can see circular
+		 * blocked_on relationships that can't resolve.
+		 */
+		owner->blocked_on = NULL;
 		wake_q_add(wake_q, owner);
-
+	}
 	return true;
 }
 
-- 
2.49.0.604.gff1f9ca942-goog

From nobody Sun Feb 8 07:56:36 2026
Date: Fri, 11 Apr 2025 23:02:37 -0700
In-Reply-To: <20250412060258.3844594-1-jstultz@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Message-ID: <20250412060258.3844594-4-jstultz@google.com>
Subject: [PATCH v16 3/7] locking/mutex: Add p->blocked_on wrappers for correctness checks
From: John Stultz
To: LKML

From: Valentin Schneider

This lets us assert mutex::wait_lock is held whenever we access
p->blocked_on, as well as warn us of unexpected state changes.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Signed-off-by: Valentin Schneider
[fix conflicts, call in more places]
Signed-off-by: Connor O'Brien
[jstultz: tweaked commit subject, reworked a good bit]
Signed-off-by: John Stultz
---
v2:
* Added get_task_blocked_on() accessor
v4:
* Address READ_ONCE usage that was dropped in v2
* Reordered to be a later add-on to the main patch series, as Peter
  was unhappy with similar wrappers in other patches.
v5:
* Added some extra correctness checking in wrappers
v7:
* Tweaks to reorder this change in the patch series
* Minor cleanup to set_task_blocked_on() suggested by Metin Kaya
v15:
* Split out into its own patch again.
* Further improve assumption checks in helpers.
v16:
* Fix optimistic spin case that can call schedule()
---
 include/linux/sched.h        | 50 ++++++++++++++++++++++++++++++++++--
 kernel/locking/mutex-debug.c |  4 +--
 kernel/locking/mutex.c       | 30 ++++++++++------------
 kernel/locking/ww_mutex.h    |  6 ++---
 4 files changed, 65 insertions(+), 25 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 10be203ddb7e1..8a1f0703caba7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2181,6 +2182,53 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock);
 	__cond_resched_rwlock_write(lock);					\
 })
 
+static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
+{
+	WARN_ON_ONCE(!m);
+	/* The task should only be setting itself as blocked */
+	WARN_ON_ONCE(p != current);
+	/* Currently we serialize blocked_on under the mutex::wait_lock */
+	lockdep_assert_held_once(&m->wait_lock);
+	/*
+	 * Check to ensure we don't overwrite an existing mutex value
+	 * with a different mutex. Note, setting it to the same
+	 * lock repeatedly is ok.
+	 */
+	WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
+	p->blocked_on = m;
+}
+
+static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
+{
+	guard(raw_spinlock_irqsave)(&m->wait_lock);
+	__set_task_blocked_on(p, m);
+}
+
+static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
+{
+	WARN_ON_ONCE(!m);
+	/* Currently we serialize blocked_on under the mutex::wait_lock */
+	lockdep_assert_held_once(&m->wait_lock);
+	/*
+	 * There may be cases where we re-clear already cleared
+	 * blocked_on relationships, but make sure we are not
+	 * clearing the relationship with a different lock.
+	 */
+	WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
+	p->blocked_on = NULL;
+}
+
+static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
+{
+	guard(raw_spinlock_irqsave)(&m->wait_lock);
+	__clear_task_blocked_on(p, m);
+}
+
+static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
+{
+	return READ_ONCE(p->blocked_on);
+}
+
 static __always_inline bool need_resched(void)
 {
 	return unlikely(tif_need_resched());
@@ -2220,8 +2268,6 @@ extern bool sched_task_on_rq(struct task_struct *p);
 extern unsigned long get_wchan(struct task_struct *p);
 extern struct task_struct *cpu_curr_snapshot(int cpu);
 
-#include
-
 /*
  * In order to reduce various lock holder preemption latencies provide an
  * interface to see if a vCPU is currently running or not.
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 758b7a6792b0c..949103fd8e9b5 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -54,13 +54,13 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 	lockdep_assert_held(&lock->wait_lock);
 
 	/* Current thread can't be already blocked (since it's executing!)
*/ - DEBUG_LOCKS_WARN_ON(task->blocked_on); + DEBUG_LOCKS_WARN_ON(__get_task_blocked_on(task)); } void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter, struct task_struct *task) { - struct mutex *blocked_on = READ_ONCE(task->blocked_on); + struct mutex *blocked_on = __get_task_blocked_on(task); DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list)); DEBUG_LOCKS_WARN_ON(waiter->task != task); diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 5243e59d75f40..a34a7974b418e 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -643,8 +643,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas goto err_early_kill; } - WARN_ON(current->blocked_on); - current->blocked_on = lock; + __set_task_blocked_on(current, lock); set_current_state(state); trace_contention_begin(lock, LCB_F_MUTEX); for (;;) { @@ -686,7 +685,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas * that has cleared our blocked_on state, re-set * it to the lock we are trying to acquire. */ - current->blocked_on = lock; + set_task_blocked_on(current, lock); set_current_state(state); /* * Here we order against unlock; we must either see it change @@ -698,11 +697,15 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas if (first) { trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN); - /* clear blocked_on as mutex_optimistic_spin may schedule() */ - current->blocked_on = NULL; + /* + * mutex_optimistic_spin() can call schedule(), so + * clear blocked_on so we don't become unselectable + * to run.
+ */ + clear_task_blocked_on(current, lock); if (mutex_optimistic_spin(lock, ww_ctx, &waiter)) break; + set_task_blocked_on(current, lock); trace_contention_begin(lock, LCB_F_MUTEX); } @@ -710,7 +713,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas } raw_spin_lock_irqsave(&lock->wait_lock, flags); acquired: - current->blocked_on = NULL; + __clear_task_blocked_on(current, lock); __set_current_state(TASK_RUNNING); if (ww_ctx) { @@ -740,11 +743,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas return 0; err: - current->blocked_on = NULL; + __clear_task_blocked_on(current, lock); __set_current_state(TASK_RUNNING); __mutex_remove_waiter(lock, &waiter); err_early_kill: - WARN_ON(current->blocked_on); + WARN_ON(__get_task_blocked_on(current)); trace_contention_end(lock, ret); raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q); debug_mutex_free_waiter(&waiter); @@ -954,14 +957,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne next = waiter->task; debug_mutex_wake_waiter(lock, waiter); - /* - * Unlock wakeups can be happening in parallel - * (when optimistic spinners steal and release - * the lock), so blocked_on may already be - * cleared here. - */ - WARN_ON(next->blocked_on && next->blocked_on != lock); - next->blocked_on = NULL; + __clear_task_blocked_on(next, lock); wake_q_add(&wake_q, next); } diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 00db40946328e..086fd5487ca77 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -289,9 +289,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter, * blocked_on pointer. Otherwise we can see circular * blocked_on relationships that can't resolve.
*/ - WARN_ON(waiter->task->blocked_on && - waiter->task->blocked_on != lock); - waiter->task->blocked_on = NULL; + __clear_task_blocked_on(waiter->task, lock); wake_q_add(wake_q, waiter->task); } @@ -345,7 +343,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * blocked_on pointer. Otherwise we can see circular * blocked_on relationships that can't resolve. */ - owner->blocked_on = NULL; + __clear_task_blocked_on(owner, lock); wake_q_add(wake_q, owner); } return true; -- 2.49.0.604.gff1f9ca942-goog From nobody Sun Feb 8 07:56:36 2026
Date: Fri, 11 Apr 2025 23:02:38 -0700 In-Reply-To: <20250412060258.3844594-1-jstultz@google.com> References: <20250412060258.3844594-1-jstultz@google.com> Message-ID: <20250412060258.3844594-5-jstultz@google.com> Subject: [PATCH v16 4/7] sched: Fix runtime accounting w/ split exec & sched contexts From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kernel-team@android.com Without proxy-exec, we normally charge the "current" task for both its vruntime as well as its sum_exec_runtime. With proxy, however, we have two "current" contexts: the scheduler context and the execution context. We want to charge the execution context rq->curr (ie: proxy/lock holder) execution time to its sum_exec_runtime (so it's clear to userland the rq->curr task *is* running). Then instead of charging the execution context (rq->curr) for the vruntime, we charge the vruntime against the scheduler context (rq->donor) task, because that is the time it is donating when it is used as the scheduler-context.
If the donor and curr tasks are the same, then it's the same as without proxy. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kernel-team@android.com Signed-off-by: John Stultz --- v16: * Renamed update_curr_se to update_se_times, as suggested by Steven Rostedt. * Reworded the commit message as suggested by Steven Rostedt --- kernel/sched/fair.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e43993a4e5807..da8b0970c6655 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1143,22 +1143,33 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq) } #endif /* CONFIG_SMP */ -static s64 update_curr_se(struct rq *rq, struct sched_entity *curr) +static s64 update_se_times(struct rq *rq, struct sched_entity *se) { u64 now = rq_clock_task(rq); s64 delta_exec; - delta_exec = now - curr->exec_start; + delta_exec = now - se->exec_start; if (unlikely(delta_exec <= 0)) return delta_exec; - curr->exec_start = now; - curr->sum_exec_runtime += delta_exec; + se->exec_start = now; + if (entity_is_task(se)) { + struct task_struct *running = rq->curr; + /* + * If se is a task, we account the time against the running + * task, as w/ proxy-exec they may not be the same.
+ */ + running->se.exec_start = now; + running->se.sum_exec_runtime += delta_exec; + } else { + /* If not a task, account the time against se */ + se->sum_exec_runtime += delta_exec; + } if (schedstat_enabled()) { struct sched_statistics *stats; - stats = __schedstats_from_se(curr); + stats = __schedstats_from_se(se); __schedstat_set(stats->exec_max, max(delta_exec, stats->exec_max)); } @@ -1213,7 +1224,7 @@ s64 update_curr_common(struct rq *rq) struct task_struct *donor = rq->donor; s64 delta_exec; - delta_exec = update_curr_se(rq, &donor->se); + delta_exec = update_se_times(rq, &donor->se); if (likely(delta_exec > 0)) update_curr_task(donor, delta_exec); @@ -1233,7 +1244,7 @@ static void update_curr(struct cfs_rq *cfs_rq) if (unlikely(!curr)) return; - delta_exec = update_curr_se(rq, curr); + delta_exec = update_se_times(rq, curr); if (unlikely(delta_exec <= 0)) return; -- 2.49.0.604.gff1f9ca942-goog From nobody Sun Feb 8 07:56:36 2026
Date: Fri, 11 Apr 2025 23:02:39 -0700 In-Reply-To: <20250412060258.3844594-1-jstultz@google.com> References: <20250412060258.3844594-1-jstultz@google.com> Message-ID: <20250412060258.3844594-6-jstultz@google.com> Subject: [PATCH v16 5/7] sched: Add an initial sketch of the find_proxy_task() function From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kernel-team@android.com Add a find_proxy_task() function which doesn't do much.
When we select a blocked task to run, we will just deactivate it and pick again. The exception is if it has become unblocked after find_proxy_task() was called. Greatly simplified from patch by: Peter Zijlstra (Intel) Juri Lelli Valentin Schneider Connor O'Brien Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kernel-team@android.com [jstultz: Split out from larger proxy patch and simplified for review and testing.] Signed-off-by: John Stultz --- v5: * Split out from larger proxy patch v7: * Fixed unused function arguments, spelling nits, and tweaks for clarity, pointed out by Metin Kaya * Fix build warning Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202311081028.yDLmCWgr-lkp@intel.com/ v8: * Fixed case where we might return a blocked task from find_proxy_task() * Continued tweaks to handle avoiding returning blocked tasks v9: * Add zap_balance_callbacks helper to unwind balance_callbacks when we will re-call pick_next_task() again.
* Add extra comment suggested by Metin * Typo fixes from Metin * Moved adding proxy_resched_idle earlier in the series, as suggested by Metin * Fix to call proxy_resched_idle() *prior* to deactivating next, to avoid crashes caused by stale references to next * s/PROXY/SCHED_PROXY_EXEC/ as suggested by Metin * Number of tweaks and cleanups suggested by Metin * Simplify proxy_deactivate as suggested by Metin v11: * Tweaks for earlier simplification in try_to_deactivate_task v13: * Rename "next" to "donor" in find_proxy_task() for clarity * Similarly use "donor" instead of next in proxy_deactivate * Refactor/simplify proxy_resched_idle * Moved up a needed fix from later in the series v15: * Tweaked some comments to better explain the initial sketch of find_proxy_task(), suggested by Qais * Build fixes for !CONFIG_SMP * Slight rework for blocked_on_state being added later in the series. * Move the zap_balance_callbacks to later in the patch series v16: * Move the enqueue_task_rt() out to later in the series, as suggested by K Prateek Nayak * Fixup whitespace error pointed out by K Prateek Nayak * Use put_prev_set_next_task as suggested by K Prateek Nayak * Try to rework find_proxy_task() locking to use guard and proxy_deactivate_task() in the way Peter suggested.
--- kernel/sched/core.c | 100 +++++++++++++++++++++++++++++++++++++++++-- kernel/sched/sched.h | 10 ++++- 2 files changed, 106 insertions(+), 4 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 82817650a635b..88acb47f50d0f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6600,7 +6600,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) * Otherwise marks the task's __state as RUNNING */ static bool try_to_block_task(struct rq *rq, struct task_struct *p, - unsigned long task_state) + unsigned long task_state, bool deactivate_cond) { int flags = DEQUEUE_NOCLOCK; @@ -6609,6 +6609,9 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p, return false; } + if (!deactivate_cond) + return false; + p->sched_contributes_to_load = (task_state & TASK_UNINTERRUPTIBLE) && !(task_state & TASK_NOLOAD) && @@ -6632,6 +6635,90 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p, return true; } +#ifdef CONFIG_SCHED_PROXY_EXEC +static inline struct task_struct *proxy_resched_idle(struct rq *rq) +{ + put_prev_set_next_task(rq, rq->donor, rq->idle); + rq_set_donor(rq, rq->idle); + set_tsk_need_resched(rq->idle); + return rq->idle; +} + +static bool __proxy_deactivate(struct rq *rq, struct task_struct *donor) +{ + unsigned long state = READ_ONCE(donor->__state); + + /* Don't deactivate if the state has been changed to TASK_RUNNING */ + if (state == TASK_RUNNING) + return false; + /* + * Because we got donor from pick_next_task, it is *crucial* + * that we call proxy_resched_idle before we deactivate it. + * As once we deactivate donor, donor->on_rq is set to zero, + * which allows ttwu to immediately try to wake the task on + * another rq. So we cannot use *any* references to donor + * after that point. So things like cfs_rq->curr or rq->donor + * need to be changed from next *before* we deactivate.
+ */ + proxy_resched_idle(rq); + return try_to_block_task(rq, donor, state, true); +} + +static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *donor) +{ + if (!__proxy_deactivate(rq, donor)) { + /* + * XXX: For now, if deactivation failed, set donor + * as unblocked, as we aren't doing proxy-migrations + * yet (more logic will be needed then). + */ + donor->blocked_on = NULL; + } + return NULL; +} + +/* + * Initial simple sketch that just deactivates the blocked task + * chosen by pick_next_task() so we can then pick something that + * isn't blocked. + */ +static struct task_struct * +find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf) +{ + struct task_struct *p = donor; + struct mutex *mutex; + + mutex = p->blocked_on; + /* Something changed in the chain, so pick again */ + if (!mutex) + return NULL; + /* + * By taking mutex->wait_lock we hold off concurrent mutex_unlock() + * and ensure @owner sticks around. + */ + guard(raw_spinlock)(&mutex->wait_lock); + + /* Check again that p is blocked with blocked_lock held */ + if (!task_is_blocked(p) || mutex != __get_task_blocked_on(p)) { + /* + * Something changed in the blocked_on chain and + * we don't know if only at this level. So, let's + * just bail out completely and let __schedule + * figure things out (pick_again loop). + */ + return NULL; /* do pick_next_task again */ + } + return proxy_deactivate(rq, donor); +} +#else /* SCHED_PROXY_EXEC */ +static struct task_struct * +find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf) +{ + WARN_ONCE(1, "This should never be called in the !SCHED_PROXY_EXEC case\n"); + return donor; +} +#endif /* SCHED_PROXY_EXEC */ + /* * __schedule() is the main scheduler function.
* @@ -6742,12 +6829,19 @@ static void __sched notrace __schedule(int sched_mode) goto picked; } } else if (!preempt && prev_state) { - try_to_block_task(rq, prev, prev_state); + try_to_block_task(rq, prev, prev_state, + !task_is_blocked(prev)); switch_count = &prev->nvcsw; } - next = pick_next_task(rq, prev, &rf); +pick_again: + next = pick_next_task(rq, rq->donor, &rf); rq_set_donor(rq, next); + if (unlikely(task_is_blocked(next))) { + next = find_proxy_task(rq, next, &rf); + if (!next) + goto pick_again; + } picked: clear_tsk_need_resched(prev); clear_preempt_need_resched(); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 154f0aa0c6322..ea2c987005bc1 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2264,6 +2264,14 @@ static inline int task_current_donor(struct rq *rq, struct task_struct *p) return rq->donor == p; } +static inline bool task_is_blocked(struct task_struct *p) +{ + if (!sched_proxy_exec()) + return false; + + return !!p->blocked_on; +} + static inline int task_on_cpu(struct rq *rq, struct task_struct *p) { #ifdef CONFIG_SMP @@ -2473,7 +2481,7 @@ static inline void put_prev_set_next_task(struct rq *rq, struct task_struct *prev, struct task_struct *next) { - WARN_ON_ONCE(rq->curr != prev); + WARN_ON_ONCE(rq->donor != prev); __put_prev_set_next_dl_server(rq, prev, next); -- 2.49.0.604.gff1f9ca942-goog From nobody Sun Feb 8 07:56:36 2026
Date: Fri, 11 Apr 2025 23:02:40 -0700 In-Reply-To: <20250412060258.3844594-1-jstultz@google.com> References: <20250412060258.3844594-1-jstultz@google.com> Message-ID: <20250412060258.3844594-7-jstultz@google.com> Subject: [PATCH v16 6/7] sched: Fix proxy/current (push,pull)ability From: John Stultz To: LKML Cc: Valentin Schneider , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot ,
Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kernel-team@android.com, "Connor O'Brien" , John Stultz From: Valentin Schneider Proxy execution forms atomic pairs of tasks: The waiting donor task (scheduling context) and a proxy (execution context). The donor task, along with the rest of the blocked chain, follows the proxy wrt CPU placement. They can be the same task, in which case push/pull doesn't need any modification. When they are different, however, FIFO1 & FIFO42:

         ,->  RT42
         |     | blocked-on
         |     v
 blocked_donor |  mutex
         |     | owner
         |     v
         `--  RT1

   RT1
   RT42

     CPU0            CPU1
      ^                ^
      |                |
  overloaded      !overloaded
  rq prio = 42    rq prio = 0

RT1 is eligible to be pushed to CPU1, but should that happen it will "carry" RT42 along. Clearly here neither RT1 nor RT42 must be seen as push/pullable. Unfortunately, only the donor task is usually dequeued from the rq, and the proxy'ed execution context (rq->curr) remains on the rq. This can cause RT1 to be selected for migration from logic like the rt pushable_list. Thus, add a dequeue/enqueue cycle on the proxy task before __schedule returns, which allows the sched class logic to avoid adding the now current task to the pushable_list. Furthermore, tasks becoming blocked on a mutex don't need an explicit dequeue/enqueue cycle to be made (push/pull)able: they have to be running to block on a mutex, thus they will eventually hit put_prev_task(). Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kernel-team@android.com Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien Signed-off-by: John Stultz --- v3: * Tweaked comments & commit message v5: * Minor simplifications to utilize the fix earlier in the patch series. * Rework the wording of the commit message to match selected/ proxy terminology and expand a bit to make it more clear how it works. v6: * Dropped now-unused proxied value, to be re-added later in the series when it is used, as caught by Dietmar v7: * Unused function argument fixup * Commit message nit pointed out by Metin Kaya * Dropped unproven unlikely() and use sched_proxy_exec() in proxy_tag_curr, suggested by Metin Kaya v8: * More cleanups and typo fixes suggested by Metin Kaya v11: * Cleanup of commit message suggested by Metin v12: * Rework for rq_selected -> rq->donor renaming v16: * Pulled logic from later patch in to avoid sched_balance migrating blocked tasks. * Moved enqueue_task_rt() logic from earlier into this patch as suggested by K Prateek Nayak * Simplified changes to enqueue_task_rt to match deadline's logic, as pointed out by Peter --- kernel/sched/core.c | 25 +++++++++++++++++++++++++ kernel/sched/deadline.c | 3 +++ kernel/sched/rt.c | 5 +++++ 3 files changed, 33 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 88acb47f50d0f..33f0260c20609 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6719,6 +6719,23 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf) } #endif /* SCHED_PROXY_EXEC */ +static inline void proxy_tag_curr(struct rq *rq, struct task_struct *owner) +{ + if (!sched_proxy_exec()) + return; + /* + * pick_next_task() calls set_next_task() on the chosen task + * at some point, which ensures it is not push/pullable. + * However, the chosen/donor task *and* the mutex owner form an + * atomic pair wrt push/pull.
+	 *
+	 * Make sure owner we run is not pushable. Unfortunately we can
+	 * only deal with that by means of a dequeue/enqueue cycle. :-/
+	 */
+	dequeue_task(rq, owner, DEQUEUE_NOCLOCK | DEQUEUE_SAVE);
+	enqueue_task(rq, owner, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE);
+}
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -6855,6 +6872,10 @@ static void __sched notrace __schedule(int sched_mode)
 	 * changes to task_struct made by pick_next_task().
 	 */
 	RCU_INIT_POINTER(rq->curr, next);
+
+	if (!task_current_donor(rq, next))
+		proxy_tag_curr(rq, next);
+
 	/*
 	 * The membarrier system call requires each architecture
 	 * to have a full memory barrier after updating
@@ -6889,6 +6910,10 @@ static void __sched notrace __schedule(int sched_mode)
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
 	} else {
+		/* In case next was already curr but just got blocked_donor */
+		if (!task_current_donor(rq, next))
+			proxy_tag_curr(rq, next);
+
 		rq_unpin_lock(rq, &rf);
 		__balance_callbacks(rq);
 		raw_spin_rq_unlock_irq(rq);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ad45a8fea245e..eb07c3a1b8fa4 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2166,6 +2166,9 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 	if (dl_server(&p->dl))
 		return;
 
+	if (task_is_blocked(p))
+		return;
+
 	if (!task_current(rq, p) && !p->dl.dl_throttled && p->nr_cpus_allowed > 1)
 		enqueue_pushable_dl_task(rq, p);
 }
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index fa03ec3ed56a2..87ccd5d5375a3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1477,6 +1477,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 
 	enqueue_rt_entity(rt_se, flags);
 
+	if (task_is_blocked(p))
+		return;
+
 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_task(rq, p);
 }
@@ -1757,6 +1760,8 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p, struct task_s
 
 	update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1);
 
+	if (task_is_blocked(p))
+		return;
 	/*
 	 * The previous task needs to be made eligible for pushing
 	 * if it is still active
-- 
2.49.0.604.gff1f9ca942-goog

From nobody Sun Feb 8 07:56:36 2026
Date: Fri, 11 Apr 2025 23:02:41 -0700
In-Reply-To: <20250412060258.3844594-1-jstultz@google.com>
References: <20250412060258.3844594-1-jstultz@google.com>
Message-ID: <20250412060258.3844594-8-jstultz@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: [PATCH v16 7/7] sched: Start blocked_on chain processing in find_proxy_task()
From: John Stultz
To: LKML

From: Peter Zijlstra

Start to flesh out the real find_proxy_task() implementation, but
avoid the migration cases for now; in those cases just deactivate the
donor task and pick again.

To ensure the donor task or other blocked tasks in the chain aren't
migrated away while we're running the proxy, also tweak the fair
class logic to avoid migrating donor or mutex-blocked tasks.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E.
McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Juri Lelli
Signed-off-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: This change was split out from the larger proxy patch]
Signed-off-by: John Stultz
---
v5:
* Split this out from larger proxy patch
v7:
* Minor refactoring of core find_proxy_task() function
* Minor spelling and corrections suggested by Metin Kaya
* Dropped an added BUG_ON that was frequently tripped
v8:
* Fix issue if proxy_deactivate fails, we don't leave task BO_BLOCKED
* Switch to WARN_ON from BUG_ON checks
v9:
* Improve comments suggested by Metin
* Minor cleanups
v11:
* Previously we checked next==rq->idle && prev==rq->idle, but I think
  we only really care if next==rq->idle from find_proxy_task, as we
  will still want to resched regardless of what prev was.
v12:
* Commit message rework for selected -> donor rewording
v13:
* Address new delayed dequeue condition (deactivate donor for now)
* Next to donor renaming in find_proxy_task
* Improved comments for find_proxy_task
* Rework for proxy_deactivate cleanup
v14:
* Fix build error from __mutex_owner() with CONFIG_PREEMPT_RT
v15:
* Reworks for moving blocked_on_state to later in the series
v16:
* Pull down fix from later in the series where a deactivated task
  could pass the (task_cpu(owner) == this_cpu) check then have it be
  activated on a different cpu, so it passes the on_rq check. Thus
  double check the values in the opposite order to make sure nothing
  slips by.
* Add resched_idle label to simplify common exit path
* Get rid of preserve_need_resched flag and rework per Peter's
  suggestion
* Rework find_proxy_task() to use guard to cleanup the exit gotos as
  Peter suggested.
---
 kernel/locking/mutex.h |   3 +-
 kernel/sched/core.c    | 146 ++++++++++++++++++++++++++++++++++-------
 kernel/sched/fair.c    |  10 ++-
 3 files changed, 134 insertions(+), 25 deletions(-)

diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index cbff35b9b7ae3..2e8080a9bee37 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -6,7 +6,7 @@
  *
  * Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar
  */
-
+#ifndef CONFIG_PREEMPT_RT
 /*
  * This is the control structure for tasks blocked on mutex, which resides
  * on the blocked task's kernel stack:
@@ -70,3 +70,4 @@ extern void debug_mutex_init(struct mutex *lock, const char *name,
 # define debug_mutex_unlock(lock)		do { } while (0)
 # define debug_mutex_init(lock, name, key)	do { } while (0)
 #endif /* !CONFIG_DEBUG_MUTEXES */
+#endif /* CONFIG_PREEMPT_RT */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 33f0260c20609..c58980028fb5f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -95,6 +95,7 @@
 #include "../workqueue_internal.h"
 #include "../../io_uring/io-wq.h"
 #include "../smpboot.h"
+#include "../locking/mutex.h"
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpu);
 EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpumask);
@@ -2955,8 +2956,15 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
 	bool stop_pending, complete = false;
 
-	/* Can the task run on the task's current CPU? If so, we're done */
-	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
+	/*
+	 * Can the task run on the task's current CPU? If so, we're done
+	 *
+	 * We are also done if the task is the current donor, boosting a lock-
+	 * holding proxy (and potentially has been migrated outside its
+	 * current or previous affinity mask).
+	 */
+	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask) ||
+	    (task_current_donor(rq, p) && !task_current(rq, p))) {
 		struct task_struct *push_task = NULL;
 
 		if ((flags & SCA_MIGRATE_ENABLE) &&
@@ -6678,37 +6686,126 @@ static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *d
 }
 
 /*
- * Initial simple sketch that just deactivates the blocked task
- * chosen by pick_next_task() so we can then pick something that
- * isn't blocked.
+ * Find runnable lock owner to proxy for mutex blocked donor
+ *
+ * Follow the blocked-on relation:
+ *   task->blocked_on -> mutex->owner -> task...
+ *
+ * Lock order:
+ *
+ *   p->pi_lock
+ *     rq->lock
+ *       mutex->wait_lock
+ *
+ * Returns the task that is going to be used as execution context (the one
+ * that is actually going to be run on cpu_of(rq)).
 */
static struct task_struct *
find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
{
-	struct task_struct *p = donor;
+	struct task_struct *owner = NULL;
+	int this_cpu = cpu_of(rq);
+	struct task_struct *p;
 	struct mutex *mutex;
 
-	mutex = p->blocked_on;
-	/* Something changed in the chain, so pick again */
-	if (!mutex)
-		return NULL;
-	/*
-	 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
-	 * and ensure @owner sticks around.
-	 */
-	guard(raw_spinlock)(&mutex->wait_lock);
+	/* Follow blocked_on chain. */
+	for (p = donor; task_is_blocked(p); p = owner) {
+		mutex = p->blocked_on;
+		/* Something changed in the chain, so pick again */
+		if (!mutex)
+			return NULL;
+		/*
+		 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
+		 * and ensure @owner sticks around.
+		 */
+		guard(raw_spinlock)(&mutex->wait_lock);
+
+		/* Check again that p is blocked with wait_lock held */
+		if (mutex != __get_task_blocked_on(p)) {
+			/*
+			 * Something changed in the blocked_on chain and
+			 * we don't know if only at this level. So, let's
+			 * just bail out completely and let __schedule
+			 * figure things out (pick_again loop).
+			 */
+			return NULL;
+		}
+
+		owner = __mutex_owner(mutex);
+		if (!owner) {
+			__clear_task_blocked_on(p, mutex);
+			return p;
+		}
+
+		if (task_cpu(owner) != this_cpu) {
+			/* XXX Don't handle migrations yet */
+			return proxy_deactivate(rq, donor);
+		}
+
+		if (task_on_rq_migrating(owner)) {
+			/*
+			 * One of the chain of mutex owners is currently migrating to this
+			 * CPU, but has not yet been enqueued because we are holding the
+			 * rq lock. As a simple solution, just schedule rq->idle to give
+			 * the migration a chance to complete. Much like the migrate_task
+			 * case we should end up back in find_proxy_task(), this time
+			 * hopefully with all relevant tasks already enqueued.
+			 */
+			return proxy_resched_idle(rq);
+		}
+
+		if (!owner->on_rq) {
+			/* XXX Don't handle blocked owners yet */
+			return proxy_deactivate(rq, donor);
+		}
+
+		if (owner->se.sched_delayed) {
+			/* XXX Don't handle delayed dequeue yet */
+			return proxy_deactivate(rq, donor);
+		}
 
-	/* Check again that p is blocked with blocked_lock held */
-	if (!task_is_blocked(p) || mutex != __get_task_blocked_on(p)) {
 		/*
-		 * Something changed in the blocked_on chain and
-		 * we don't know if only at this level. So, let's
-		 * just bail out completely and let __schedule
-		 * figure things out (pick_again loop).
+		 * If owner was !on_rq, the task_cpu() check followed by on_rq check
+		 * could race with a wakeup onto another cpu right in between those
+		 * checks. So double check owner is both on_rq & on this cpu.
+		 */
+		if (!(task_on_rq_queued(owner) && task_cpu(owner) == this_cpu))
+			return NULL;
+
+		if (owner == p) {
+			/*
+			 * It's possible we interleave with mutex_unlock like:
+			 *
+			 *	lock(&rq->lock);
+			 *	find_proxy_task()
+			 *			mutex_unlock()
+			 *			lock(&wait_lock);
+			 *			donor(owner) = current->blocked_donor;
+			 *			unlock(&wait_lock);
+			 *
+			 *			wake_up_q();
+			 *			...
+			 *			ttwu_runnable()
+			 *			__task_rq_lock()
+			 *	lock(&wait_lock);
+			 *	owner == p
+			 *
+			 * Which leaves us to finish the ttwu_runnable() and make it go.
+			 *
+			 * So schedule rq->idle so that ttwu_runnable can get the rq lock
+			 * and mark owner as running.
+			 */
+			return proxy_resched_idle(rq);
+		}
+		/*
+		 * OK, now we're absolutely sure @owner is on this
+		 * rq, therefore holding @rq->lock is sufficient to
+		 * guarantee its existence, as per ttwu_remote().
 		 */
-		return NULL; /* do pick_next_task again */
 	}
-	return proxy_deactivate(rq, donor);
+
+	WARN_ON_ONCE(owner && !owner->on_rq);
+	return owner;
 }
 #else /* SCHED_PROXY_EXEC */
 static struct task_struct *
@@ -6858,10 +6955,13 @@ static void __sched notrace __schedule(int sched_mode)
 			next = find_proxy_task(rq, next, &rf);
 			if (!next)
 				goto pick_again;
+			if (next == rq->idle)
+				goto keep_resched;
 		}
 picked:
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
+keep_resched:
 	rq->last_seen_need_resched_ns = 0;
 
 	is_switch = prev != next;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da8b0970c6655..b67c3b44c7b4d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9407,6 +9407,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	 * 3) cannot be migrated to this CPU due to cpus_ptr, or
 	 * 4) running (obviously), or
 	 * 5) are cache-hot on their current CPU.
+	 * 6) are blocked on mutexes (if SCHED_PROXY_EXEC is enabled)
 	 */
 	if ((p->se.sched_delayed) && (env->migration_type != migrate_load))
 		return 0;
@@ -9428,6 +9429,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (kthread_is_per_cpu(p))
 		return 0;
 
+	if (task_is_blocked(p))
+		return 0;
+
 	if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
 		int cpu;
 
@@ -9463,7 +9467,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	/* Record that we found at least one task that could run on dst_cpu */
 	env->flags &= ~LBF_ALL_PINNED;
 
-	if (task_on_cpu(env->src_rq, p)) {
+	if (task_on_cpu(env->src_rq, p) ||
+	    task_current_donor(env->src_rq, p)) {
 		schedstat_inc(p->stats.nr_failed_migrations_running);
 		return 0;
 	}
@@ -9507,6 +9512,9 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
 		schedstat_inc(p->stats.nr_forced_migrations);
 	}
 
+	WARN_ON(task_current(env->src_rq, p));
+	WARN_ON(task_current_donor(env->src_rq, p));
+
 	deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
 	set_task_cpu(p, env->dst_cpu);
 }
-- 
2.49.0.604.gff1f9ca942-goog