Date: Wed, 20 May 2026 08:33:57 -0000
From: "tip-bot2 for K Prateek Nayak"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] sched/topology: Allow multiple domains to claim sched_domain_shared
Cc: Peter Zijlstra, K Prateek Nayak, Andrea Righi, x86@kernel.org, linux-kernel@vger.kernel.org
Message-ID: <177926603704.711.1041501448427895324.tip-bot2@tip-bot2>

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9e005ed21152d4a4bb0ceea71045ff8a642a6feb
Gitweb:        https://git.kernel.org/tip/9e005ed21152d4a4bb0ceea71045ff8a642a6feb
Author:        K Prateek Nayak
AuthorDate:    Tue, 19 May 2026 05:14:23
Committer:     Peter Zijlstra
CommitterDate: Tue, 19 May 2026 13:35:36 +02:00

sched/topology: Allow multiple domains to claim sched_domain_shared

Recent optimizations of sd->shared assignment moved to allocating a
single instance of per-CPU sched_domain_shared objects per s_data.
Recent optimizations to select_idle_capacity() moved the sd->shared
assignments to "sd_asym" domain when ASYM_CPUCAPACITY is detected but
cache-aware scheduling mandates the presence of "sd_llc_shared" to
compute and cache per-LLC statistics.

Use an "alloc_flags" union in sched_domain_shared to claim a
sched_domain_shared object per sched_domain.

Allocation starts searching for an available / matching
sched_domain_shared instance from the first CPU of
sched_domain_span(sd) (sd can be sd_llc, or sd_asym). If the shared
object is claimed by another domain, the instance corresponding to next
CPU in the domain span is explored until a matching / available
instance is found.

In case of a single CPU in sched_domain_span(), the domain will be
degenerated and a temporary overlap of ->shared objects across
different domains is acceptable.

"alloc_flags" forms a union with "nr_idle_scan" and the stale flags are
left as is when the sd->shared is published. The expectation is for the
first load balancing instance to correct the value just like the
current behavior, except the initial value is no longer 0.

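The claiming scheme described above can be modeled outside the kernel.
The following is only a userspace sketch, not the kernel code: the
demo_* names, the DEMO_SD_* flag values, and the plain array standing in
for the per-CPU sched_domain_shared objects are all assumptions made for
illustration. The kernel warns and falls back to the last CPU's object
when nothing can be claimed; this sketch simply returns -1 instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's SD_* flag bits. */
enum { DEMO_SD_ASYM_CPUCAPACITY = 0x1, DEMO_SD_SHARE_LLC = 0x2 };

/* Simplified model of sched_domain_shared: only the union is kept. */
struct demo_sds {
	union {
		int nr_idle_scan;	/* meaning at run time */
		int alloc_flags;	/* meaning while domains are built */
	};
};

/*
 * Walk the CPUs listed in span[] and claim the first per-CPU object
 * that is unclaimed (alloc_flags == 0) or already claimed with the
 * same flag. A single-CPU span may overlap with an existing claim.
 *
 * Returns the CPU whose object was claimed, or -1 if none was free.
 */
static int demo_claim_shared(struct demo_sds *per_cpu, const int *span,
			     size_t span_len, int flags)
{
	assert(span_len > 0);

	for (size_t i = 0; i < span_len; i++) {
		struct demo_sds *sds = &per_cpu[span[i]];

		if (!sds->alloc_flags || span_len == 1 ||
		    sds->alloc_flags == flags) {
			sds->alloc_flags = flags;
			return span[i];
		}
	}

	return -1;
}
```

With a four-CPU span and all objects initially zeroed, a claim with
DEMO_SD_ASYM_CPUCAPACITY takes CPU 0's object and a later claim with
DEMO_SD_SHARE_LLC skips it and takes CPU 1's. Reading nr_idle_scan on a
claimed object returns the flag value, which is the stale initial value
the commit message expects the first load balancing pass to overwrite.
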
Originally-by: Peter Zijlstra
Signed-off-by: K Prateek Nayak
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Andrea Righi
---
 include/linux/sched/topology.h | 16 +++++++-
 kernel/sched/topology.c        | 63 ++++++++++++++++++++++++++++-----
 2 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index fe09d32..b5d9d7c 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -67,7 +67,21 @@ struct sched_domain_shared {
 	atomic_t	ref;
 	atomic_t	nr_busy_cpus;
 	int		has_idle_cores;
-	int		nr_idle_scan;
+	union {
+		int	nr_idle_scan;
+		/*
+		 * Used during allocation to claim the sched_domain_shared
+		 * object at multiple levels.
+		 *
+		 * Note: between build and the first periodic LB tick, which
+		 * rewrites the union via update_idle_cpu_scan(), readers of
+		 * nr_idle_scan may observe the transient SD_* flag value as
+		 * the scan bound. The flag bits are small positive integers,
+		 * so the effect is just a slightly relaxed scan bound for one
+		 * window and self-heals on the first tick.
+		 */
+		int	alloc_flags;
+	};
 #ifdef CONFIG_SCHED_CACHE
 	unsigned long	util_avg;
 	unsigned long	capacity;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index dbfd965..df2ceb5 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -623,6 +623,12 @@ static void free_sched_groups(struct sched_group *sg, int free_sgc)
 	} while (sg != first);
 }
 
+static void free_sched_domain_shared(struct sched_domain_shared *sds)
+{
+	if (sds && atomic_dec_and_test(&sds->ref))
+		kfree(sds);
+}
+
 static void destroy_sched_domain(struct sched_domain *sd)
 {
 	/*
@@ -631,9 +637,7 @@ static void destroy_sched_domain(struct sched_domain *sd)
 	 * dropping group/capacity references, freeing where none remain.
 	 */
 	free_sched_groups(sd->groups, 1);
-
-	if (sd->shared && atomic_dec_and_test(&sd->shared->ref))
-		kfree(sd->shared);
+	free_sched_domain_shared(sd->shared);
 
 #ifdef CONFIG_SCHED_CACHE
 	/* only the bottom sd has llc_counts array */
@@ -755,7 +759,14 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 
 			/* Pick reference to parent->shared. */
 			if (parent->shared) {
-				WARN_ON_ONCE(tmp->shared);
+				/*
+				 * It is safe to free a sd->shared that
+				 * has not been published yet. If a
+				 * sd->shared was published, the refcount
+				 * will end up being non-zero and it will
+				 * not be freed here.
+				 */
+				free_sched_domain_shared(tmp->shared);
 				tmp->shared = parent->shared;
 				parent->shared = NULL;
 			}
@@ -2916,11 +2927,45 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
 	}
 }
 
-static void init_sched_domain_shared(struct s_data *d, struct sched_domain *sd)
+static void
+init_sched_domain_shared(struct s_data *d, struct sched_domain *sd, int flags)
 {
-	int sd_id = cpumask_first(sched_domain_span(sd));
+	struct sched_domain_shared *sds = NULL;
+	int cpu;
+
+	/*
+	 * Multiple domains can try to claim a shared object like
+	 * SD_ASYM_CPUCAPACITY and SD_SHARE_LLC which can alias to
+	 * same cpumask_first(sched_domain_span(sd)) CPU and can
+	 * cause "nr_idle_scan" to be populated incorrectly during
+	 * load balancing.
+	 *
+	 * Find the first CPU in sched_domain_span(sd) with an
+	 * unclaimed domain (!alloc_flags) or where the alloc_flag
+	 * matches the requested flag (SD_* flag)
+	 *
+	 * If the domain only has single CPU, allow temporary overlap
+	 * in allocation since the domains will be degenerated later.
+	 */
+	for_each_cpu(cpu, sched_domain_span(sd)) {
+		sds = *per_cpu_ptr(d->sds, cpu);
+
+		if (!sds->alloc_flags ||
+		    sd->span_weight == 1 ||
+		    sds->alloc_flags == flags) {
+			sds->alloc_flags = flags;
+			sd->shared = sds;
+			break;
+		}
+	}
+
+	/*
+	 * Use the sd_shared corresponding to the last
+	 * CPU in the span if none are avaialable.
+	 */
+	if (WARN_ON_ONCE(!sd->shared))
+		sd->shared = sds;
 
-	sd->shared = *per_cpu_ptr(d->sds, sd_id);
 	/*
 	 * nr_busy_cpus is consumed only by the NOHZ kick path via
 	 * sd_balance_shared; on the asym-capacity path it is initialized but
@@ -2960,7 +3005,7 @@ static bool claim_asym_sched_domain_shared(struct s_data *d, int cpu)
 	if (!sd_asym || (sd_asym->flags & SD_NUMA))
 		return false;
 
-	init_sched_domain_shared(d, sd_asym);
+	init_sched_domain_shared(d, sd_asym, SD_ASYM_CPUCAPACITY);
 	return true;
 }
 
@@ -3115,7 +3160,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 			sd = sd->parent;
 
 		if (sd->flags & SD_SHARE_LLC) {
-			init_sched_domain_shared(&d, sd);
+			init_sched_domain_shared(&d, sd, SD_SHARE_LLC);
 
 			/*
 			 * In presence of higher domains, adjust the