From: "tip-bot2 for K Prateek Nayak"
Date: Wed, 20 May 2026 08:34:04 -0000
To: linux-tip-commits@vger.kernel.org
Cc: Andrea Righi, K Prateek Nayak, "Peter Zijlstra (Intel)",
    Shrikanth Hegde, Vincent Guittot, x86@kernel.org,
    linux-kernel@vger.kernel.org
Subject: [tip: sched/core] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
In-Reply-To: <20260516055850.1345932-1-arighi@nvidia.com>
References: <20260516055850.1345932-1-arighi@nvidia.com>
Message-ID: <177926604484.711.7467616484857242040.tip-bot2@tip-bot2>

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     fdfe5a8cd8731dd81840f26abfb6527edd27b0cb
Gitweb:        https://git.kernel.org/tip/fdfe5a8cd8731dd81840f26abfb6527edd27b0cb
Author:        K Prateek Nayak
AuthorDate:    Sat, 16 May 2026 07:58:50 +02:00
Committer:     Peter Zijlstra
CommitterDate: Tue, 19 May 2026 12:17:38 +02:00

sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity

On asymmetric CPU capacity systems, the wakeup path uses
select_idle_capacity(), which scans the span of sd_asym_cpucapacity
rather than sd_llc. The has_idle_cores hint however lives on
sd_llc->shared, so the wakeup-time read of has_idle_cores operates on an
LLC-scoped blob while the actual scan/decision spans the asym domain;
nr_busy_cpus also lives in the same shared sched_domain data, but it's
never used in the asym CPU capacity scenario.

Therefore, move the sched_domain_shared object to sd_asym_cpucapacity
whenever the CPU has a SD_ASYM_CPUCAPACITY_FULL ancestor and that
ancestor is non-overlapping (i.e., not built from SD_NUMA). In that case
the scope of has_idle_cores matches the scope of the wakeup scan.

Fall back to attaching the shared object to sd_llc in three cases:

  1) plain symmetric systems (no SD_ASYM_CPUCAPACITY_FULL anywhere);

  2) CPUs in an exclusive cpuset that carves out a symmetric capacity
     island: has_asym is system-wide but those CPUs have no
     SD_ASYM_CPUCAPACITY_FULL ancestor in their hierarchy and follow the
     symmetric LLC path in select_idle_sibling();

  3) exotic topologies where SD_ASYM_CPUCAPACITY_FULL lands on an
     SD_NUMA-built domain. init_sched_domain_shared() keys the shared
     blob off cpumask_first(span), which on overlapping NUMA domains
     would alias unrelated spans onto the same blob. Keep the shared
     object on the LLC there; select_idle_capacity() gracefully skips
     the has_idle_cores preference when sd->shared is NULL.
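
The ownership rules above can be checked with a small user-space model. This is a hypothetical sketch, not kernel code. `struct dom`, the `DOM_*` flag bits, `pick_shared_owner()` and `check_fallback_cases()` are all invented for illustration. The model only approximates the parent walk in claim_asym_sched_domain_shared() and the topmost-SD_SHARE_LLC loop in build_sched_domains() from the patch below.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical flag bits for this sketch only; they are NOT the values of
 * the kernel's SD_SHARE_LLC, SD_ASYM_CPUCAPACITY_FULL or SD_NUMA.
 */
#define DOM_SHARE_LLC	0x1u
#define DOM_ASYM_FULL	0x2u
#define DOM_NUMA	0x4u

/* Simplified stand-in for struct sched_domain: flags plus a parent link. */
struct dom {
	unsigned int flags;
	struct dom *parent;
};

/*
 * Return the domain that would own the shared object for a CPU whose
 * lowest domain is @base:
 *  - the innermost ASYM_FULL ancestor, unless it is NUMA-built;
 *  - otherwise the topmost domain of the SHARE_LLC run starting at @base;
 *  - NULL if neither exists.
 */
static struct dom *pick_shared_owner(struct dom *base)
{
	struct dom *sd = base;
	struct dom *llc = NULL;

	/* Walk upwards; the first ASYM_FULL domain hit is the innermost one. */
	while (sd && !(sd->flags & DOM_ASYM_FULL))
		sd = sd->parent;
	if (sd && !(sd->flags & DOM_NUMA))
		return sd;

	/* Fallback: climb while the domain still shares the LLC. */
	for (sd = base; sd && (sd->flags & DOM_SHARE_LLC); sd = sd->parent)
		llc = sd;
	return llc;
}

/* Check case 1, case 3 and the non-NUMA asym case from the list above. */
static void check_fallback_cases(void)
{
	/* 1) Symmetric: SMT -> MC (both share the LLC) -> PKG; owner is MC. */
	struct dom pkg1 = { 0, NULL };
	struct dom mc1 = { DOM_SHARE_LLC, &pkg1 };
	struct dom smt1 = { DOM_SHARE_LLC, &mc1 };
	assert(pick_shared_owner(&smt1) == &mc1);

	/* Asym, non-NUMA: MC (LLC) -> PKG (ASYM_FULL); owner is PKG. */
	struct dom pkg2 = { DOM_ASYM_FULL, NULL };
	struct dom mc2 = { DOM_SHARE_LLC, &pkg2 };
	assert(pick_shared_owner(&mc2) == &pkg2);

	/* 3) ASYM_FULL sits on a NUMA-built domain; owner stays on MC. */
	struct dom numa3 = { DOM_ASYM_FULL | DOM_NUMA, NULL };
	struct dom mc3 = { DOM_SHARE_LLC, &numa3 };
	assert(pick_shared_owner(&mc3) == &mc3);
}
```

Case 2 (an exclusive cpuset symmetric island) has the same shape as case 1 in this model, because no ASYM_FULL ancestor exists. The sketch leaves out the has_asym gate and the per-span keying through cpumask_first(), both of which the real code applies.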
While at it, also rename the per-CPU sd_llc_shared to sd_balance_shared,
as it is no longer strictly tied to the LLC.

Co-developed-by: Andrea Righi
Signed-off-by: Andrea Righi
Signed-off-by: K Prateek Nayak
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Shrikanth Hegde
Acked-by: Vincent Guittot
Link: https://patch.msgid.link/20260516055850.1345932-1-arighi@nvidia.com
---
 kernel/sched/fair.c     | 22 +++++----
 kernel/sched/sched.h    |  2 +-
 kernel/sched/topology.c | 95 ++++++++++++++++++++++++++++++++++------
 3 files changed, 97 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03f63b0..2637a6f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7773,7 +7773,7 @@ static inline void set_idle_cores(int cpu, int val)
 {
 	struct sched_domain_shared *sds;
 
-	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
+	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
 	if (sds)
 		WRITE_ONCE(sds->has_idle_cores, val);
 }
@@ -7782,7 +7782,7 @@ static inline bool test_idle_cores(int cpu)
 {
 	struct sched_domain_shared *sds;
 
-	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
+	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
 	if (sds)
 		return READ_ONCE(sds->has_idle_cores);
 
@@ -7791,7 +7791,7 @@ static inline bool test_idle_cores(int cpu)
 
 /*
  * Scans the local SMT mask to see if the entire core is idle, and records this
- * information in sd_llc_shared->has_idle_cores.
+ * information in sd_balance_shared->has_idle_cores.
  *
  * Since SMT siblings share all cache levels, inspecting this limited remote
  * state should be fairly cheap.
@@ -7821,7 +7821,8 @@ unlock:
 /*
  * Scan the entire LLC domain for idle cores; this dynamically switches off if
  * there are no idle cores left in the system; tracked through
- * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
+ * sd_balance_shared->has_idle_cores and enabled through update_idle_core()
+ * above.
  */
 static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
@@ -7885,7 +7886,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool 
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
 
-	if (sched_feat(SIS_UTIL)) {
+	if (sched_feat(SIS_UTIL) && sd->shared) {
 		/*
 		 * Increment because !--nr is the condition to stop scan.
 		 *
@@ -12764,7 +12765,7 @@ static void nohz_balancer_kick(struct rq *rq)
 		goto out;
 	}
 
-	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
+	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
 	if (sds) {
 		/*
 		 * If there is an imbalance between LLC domains (IOW we could
@@ -12792,7 +12793,11 @@ static void set_cpu_sd_state_busy(int cpu)
 	struct sched_domain *sd;
 	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
 
-	if (!sd || !sd->nohz_idle)
+	/*
+	 * sd->nohz_idle only pairs with nr_busy_cpus on sd->shared; if this
+	 * domain has no shared object there is nothing to clear or account.
+	 */
+	if (!sd || !sd->shared || !sd->nohz_idle)
 		return;
 	sd->nohz_idle = 0;
 
@@ -12817,7 +12822,8 @@ static void set_cpu_sd_state_idle(int cpu)
 	struct sched_domain *sd;
 	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
 
-	if (!sd || sd->nohz_idle)
+	/* See set_cpu_sd_state_busy(): nohz_idle is only used with sd->shared. */
+	if (!sd || !sd->shared || sd->nohz_idle)
 		return;
 	sd->nohz_idle = 1;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ffe77b2..bfb4b47 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2164,7 +2164,7 @@ DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
 DECLARE_PER_CPU(int, sd_share_id);
-DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_balance_shared);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index a1f46e3..f96d501 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -665,7 +665,7 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DEFINE_PER_CPU(int, sd_llc_size);
 DEFINE_PER_CPU(int, sd_llc_id);
 DEFINE_PER_CPU(int, sd_share_id);
-DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_balance_shared);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
@@ -680,20 +680,38 @@ static void update_top_cache_domain(int cpu)
 	int id = cpu;
 	int size = 1;
 
+	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
+	/*
+	 * The shared object is attached to sd_asym_cpucapacity only when the
+	 * asym domain is non-overlapping (i.e., not built from SD_NUMA).
+	 * On overlapping (NUMA) asym domains we fall back to letting the
+	 * SD_SHARE_LLC path own the shared object, so sd->shared may be NULL
+	 * here.
+	 */
+	if (sd && sd->shared)
+		sds = sd->shared;
+
+	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
+
 	sd = highest_flag_domain(cpu, SD_SHARE_LLC);
 	if (sd) {
 		id = cpumask_first(sched_domain_span(sd));
 		size = cpumask_weight(sched_domain_span(sd));
 
-		/* If sd_llc exists, sd_llc_shared should exist too. */
-		WARN_ON_ONCE(!sd->shared);
-		sds = sd->shared;
+		/*
+		 * If sd_asym_cpucapacity didn't claim the shared object,
+		 * sd_llc must have one linked.
+		 */
+		if (!sds) {
+			WARN_ON_ONCE(!sd->shared);
+			sds = sd->shared;
+		}
 	}
 
 	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
 	per_cpu(sd_llc_size, cpu) = size;
 	per_cpu(sd_llc_id, cpu) = id;
-	rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
+	rcu_assign_pointer(per_cpu(sd_balance_shared, cpu), sds);
 
 	sd = lowest_flag_domain(cpu, SD_CLUSTER);
 	if (sd)
@@ -711,9 +729,6 @@ static void update_top_cache_domain(int cpu)
 
 	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
 	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
-
-	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
-	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
 }
 
 /*
@@ -2648,6 +2663,54 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
 	}
 }
 
+static void init_sched_domain_shared(struct s_data *d, struct sched_domain *sd)
+{
+	int sd_id = cpumask_first(sched_domain_span(sd));
+
+	sd->shared = *per_cpu_ptr(d->sds, sd_id);
+	/*
+	 * nr_busy_cpus is consumed only by the NOHZ kick path via
+	 * sd_balance_shared; on the asym-capacity path it is initialized but
+	 * never read.
+	 */
+	atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
+	atomic_inc(&sd->shared->ref);
+}
+
+/*
+ * For asymmetric CPU capacity, attach sched_domain_shared on the innermost
+ * SD_ASYM_CPUCAPACITY_FULL ancestor of @cpu's base domain when that ancestor is
+ * not an overlapping NUMA-built domain (then LLC should claim shared).
+ *
+ * A CPU may lack any FULL ancestor (e.g., exclusive cpuset symmetric island),
+ * then LLC must claim shared instead.
+ *
+ * Note: SD_ASYM_CPUCAPACITY_FULL is only set when all CPU capacity values
+ * are present in the domain span, so the asym domain we attach to cannot
+ * degenerate into a single-capacity group. The relevant edge cases are instead
+ * covered by the caveats above.
+ *
+ * Return true if this CPU's asym path claimed sd->shared, false otherwise.
+ */
+static bool claim_asym_sched_domain_shared(struct s_data *d, int cpu)
+{
+	struct sched_domain *sd = *per_cpu_ptr(d->sd, cpu);
+	struct sched_domain *sd_asym;
+
+	if (!sd)
+		return false;
+
+	sd_asym = sd;
+	while (sd_asym && !(sd_asym->flags & SD_ASYM_CPUCAPACITY_FULL))
+		sd_asym = sd_asym->parent;
+
+	if (!sd_asym || (sd_asym->flags & SD_NUMA))
+		return false;
+
+	init_sched_domain_shared(d, sd_asym);
+	return true;
+}
+
 /*
  * Build sched domains for a given set of CPUs and attach the sched domains
  * to the individual CPUs
@@ -2706,20 +2769,26 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	}
 
 	for_each_cpu(i, cpu_map) {
+		bool asym_claimed = false;
+
 		sd = *per_cpu_ptr(d.sd, i);
 		if (!sd)
 			continue;
 
+		if (has_asym)
+			asym_claimed = claim_asym_sched_domain_shared(&d, i);
+
 		/* First, find the topmost SD_SHARE_LLC domain */
 		while (sd->parent && (sd->parent->flags & SD_SHARE_LLC))
 			sd = sd->parent;
 
 		if (sd->flags & SD_SHARE_LLC) {
-			int sd_id = cpumask_first(sched_domain_span(sd));
-
-			sd->shared = *per_cpu_ptr(d.sds, sd_id);
-			atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
-			atomic_inc(&sd->shared->ref);
+			/*
+			 * Initialize the sd->shared for SD_SHARE_LLC unless
+			 * the asym path above already claimed it.
+			 */
+			if (!asym_claimed)
+				init_sched_domain_shared(&d, sd);
 
 			/*
 			 * In presence of higher domains, adjust the