From nobody Sun Feb  8 12:24:33 2026
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
        Ingo Molnar <mingo@redhat.com>,
        Valentin Schneider <vschneid@redhat.com>,
        Steven Rostedt <rostedt@goodmis.org>,
        Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
        Daniel Bristot de Oliveira <bristot@redhat.com>,
        Vincent Guittot <vincent.guittot@linaro.org>,
        Juri Lelli <juri.lelli@redhat.com>,
        Swapnil Sapkal <Swapnil.Sapkal@amd.com>,
        Aaron Lu <aaron.lu@intel.com>,
        Julien Desfossez <jdesfossez@digitalocean.com>, x86@kernel.org
Subject: [RFC PATCH v3 2/3] sched: Introduce cpus_share_l2c
Date: Tue, 22 Aug 2023 07:31:32 -0400
Message-Id: <20230822113133.643238-3-mathieu.desnoyers@efficios.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230822113133.643238-1-mathieu.desnoyers@efficios.com>
References: <20230822113133.643238-1-mathieu.desnoyers@efficios.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Introduce cpus_share_l2c to allow querying whether two logical CPUs
share a common L2 cache.

Considering a system like the AMD EPYC 9654 96-Core Processor, the L1
cache has a latency of 4-5 cycles, the L2 cache has a latency of at
least 14ns, whereas the L3 cache has a latency of 50ns [1]. By
comparison, I measured RAM access latency at around 120ns on my
system [2], so L3 is only about 2.4x faster than RAM.
Given how slow L3 is relative to L2, the scheduler will benefit from
only considering CPUs sharing an L2 cache for the purpose of using
remote runqueue locking rather than queued wakeups.

Link: https://en.wikichip.org/wiki/amd/microarchitectures/zen_4 [1]
Link: https://github.com/ChipsandCheese/MemoryLatencyTest [2]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Swapnil Sapkal <Swapnil.Sapkal@amd.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Julien Desfossez <jdesfossez@digitalocean.com>
Cc: x86@kernel.org
---
Changes since v1:
- Fix l2c id for configurations where L2 has a single logical CPU:
  use TOPOLOGY_CLUSTER_SYSFS to find out whether the topology cluster
  is implemented or if LLC should be used as fallback.

Changes since v2:
- Reverse order of cpu_get_l2c_info() l2c_id and l2c_size output
  arguments to match the caller.
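
Illustration only, not part of the patch: a minimal userspace sketch of
the id scheme used here, assuming a toy 8-CPU system where CPU masks
fit in a uint32_t. Each CPU records the first CPU number of its L2
(cluster) mask as its id, falling back to the LLC mask when no cluster
topology is available, and two CPUs share an L2 when their ids match.
All names below (NR_CPUS_TOY, update_l2c, share_l2c, use_queued_wakeup)
are hypothetical stand-ins, and the wakeup policy helper is only my
reading of the motivation above, not code from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_TOY 8	/* hypothetical toy system size */

/* Plain arrays standing in for the per-CPU sd_l2c_id / sd_l2c_size. */
static int l2c_id[NR_CPUS_TOY];
static int l2c_size[NR_CPUS_TOY];

/* Lowest CPU number set in the mask (same role as cpumask_first()). */
static int mask_first(uint32_t mask)
{
	for (int cpu = 0; cpu < NR_CPUS_TOY; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS_TOY;
}

/* Number of CPUs set in the mask (same role as cpumask_weight()). */
static int mask_weight(uint32_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Record the L2 domain of @cpu. @have_cluster models whether cluster
 * topology is implemented; when it is not, the LLC mask is used as
 * the fallback, as in the patch.
 */
static void update_l2c(int cpu, bool have_cluster,
		       uint32_t cluster_mask, uint32_t llc_mask)
{
	uint32_t mask = have_cluster ? cluster_mask : llc_mask;

	assert(cpu >= 0 && cpu < NR_CPUS_TOY);
	assert(mask & (1u << cpu));	/* a CPU belongs to its own domain */

	l2c_id[cpu] = mask_first(mask);
	l2c_size[cpu] = mask_weight(mask);
}

/* Two CPUs share an L2 when they recorded the same domain id. */
static bool share_l2c(int this_cpu, int that_cpu)
{
	if (this_cpu == that_cpu)
		return true;

	return l2c_id[this_cpu] == l2c_id[that_cpu];
}

/* Hypothetical policy: queue the wakeup only across L2 domains. */
static bool use_queued_wakeup(int waker, int wakee)
{
	return !share_l2c(waker, wakee);
}
```

With CPUs 0-1 and 2-3 in separate clusters under one LLC, share_l2c(0, 1)
holds while share_l2c(1, 2) does not. Under the LLC fallback all four
CPUs compare equal.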
---
 include/linux/sched/topology.h |  6 ++++++
 kernel/sched/core.c            |  8 ++++++++
 kernel/sched/sched.h           |  2 ++
 kernel/sched/topology.c        | 32 +++++++++++++++++++++++++++++---
 4 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 7f9331f71260..c5fdee188bea 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -178,6 +178,7 @@ extern void partition_sched_domains(int ndoms_new, cpum=
ask_var_t doms_new[],
 cpumask_var_t *alloc_sched_domains(unsigned int ndoms);
 void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
=20
+bool cpus_share_l2c(int this_cpu, int that_cpu);
 bool cpus_share_llc(int this_cpu, int that_cpu);
=20
 typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
@@ -227,6 +228,11 @@ partition_sched_domains(int ndoms_new, cpumask_var_t d=
oms_new[],
 {
 }
=20
+static inline bool cpus_share_l2c(int this_cpu, int that_cpu)
+{
+	return true;
+}
+
 static inline bool cpus_share_llc(int this_cpu, int that_cpu)
 {
 	return true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d096ce815099..11e60a69ae31 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3904,6 +3904,14 @@ void wake_up_if_idle(int cpu)
 	rcu_read_unlock();
 }
=20
+bool cpus_share_l2c(int this_cpu, int that_cpu)
+{
+	if (this_cpu =3D=3D that_cpu)
+		return true;
+
+	return per_cpu(sd_l2c_id, this_cpu) =3D=3D per_cpu(sd_l2c_id, that_cpu);
+}
+
 bool cpus_share_llc(int this_cpu, int that_cpu)
 {
 	if (this_cpu =3D=3D that_cpu)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 81ac605b9cd5..d93543db214c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1828,6 +1828,8 @@ static inline struct sched_domain *lowest_flag_domain=
(int cpu, int flag)
 	return sd;
 }
=20
+DECLARE_PER_CPU(int, sd_l2c_size);
+DECLARE_PER_CPU(int, sd_l2c_id);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 1ae2a0a1115a..fadb66edcf5e 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -661,8 +661,11 @@ static void destroy_sched_domains(struct sched_domain =
*sd)
  *
  * Also keep a unique ID per domain (we use the first CPU number in
  * the cpumask of the domain), this allows us to quickly tell if
- * two CPUs are in the same cache domain, see cpus_share_llc().
+ * two CPUs are in the same cache domain, see cpus_share_l2c() and
+ * cpus_share_llc().
  */
+DEFINE_PER_CPU(int, sd_l2c_size);
+DEFINE_PER_CPU(int, sd_l2c_id);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DEFINE_PER_CPU(int, sd_llc_size);
 DEFINE_PER_CPU(int, sd_llc_id);
@@ -672,12 +675,27 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_p=
acking);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
=20
+#ifdef TOPOLOGY_CLUSTER_SYSFS
+static int cpu_get_l2c_info(int cpu, int *l2c_size, int *l2c_id)
+{
+	const struct cpumask *cluster_mask =3D topology_cluster_cpumask(cpu);
+
+	*l2c_size =3D cpumask_weight(cluster_mask);
+	*l2c_id =3D cpumask_first(cluster_mask);
+	return 0;
+}
+#else
+static int cpu_get_l2c_info(int cpu, int *l2c_size, int *l2c_id)
+{
+	return -1;
+}
+#endif
+
 static void update_top_cache_domain(int cpu)
 {
 	struct sched_domain_shared *sds =3D NULL;
 	struct sched_domain *sd;
-	int id =3D cpu;
-	int size =3D 1;
+	int id =3D cpu, size =3D 1, l2c_id, l2c_size;
=20
 	sd =3D highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
 	if (sd) {
@@ -686,6 +704,14 @@ static void update_top_cache_domain(int cpu)
 		sds =3D sd->shared;
 	}
=20
+	if (cpu_get_l2c_info(cpu, &l2c_size, &l2c_id)) {
+		/* Fallback on using LLC. */
+		l2c_size =3D size;
+		l2c_id =3D id;
+	}
+	per_cpu(sd_l2c_size, cpu) =3D l2c_size;
+	per_cpu(sd_l2c_id, cpu) =3D l2c_id;
+
 	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
 	per_cpu(sd_llc_size, cpu) =3D size;
 	per_cpu(sd_llc_id, cpu) =3D id;
--=20
2.39.2