From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R. Shenoy, Vincent Guittot
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton, Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao, Len Brown, Aubrey Li, Zhao Liu, Chen Yu, Adam Li, Aaron Lu, linux-kernel@vger.kernel.org
Subject: [PATCH v2 07/23] sched/cache: Introduce per runqueue task LLC preference counter
Date: Wed, 3 Dec 2025 15:07:26 -0800
Message-Id: <63091f7ca7bb473fbc176af86a87d27a07a6e149.1764801860.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0

Each runqueue is assigned an array in which each element tracks the number
of tasks preferring a given LLC, indexed from 0 to max_llcs - 1. For
example, rq->nr_pref_llc[3] = 2 means that 2 tasks on this runqueue
prefer to run within LLC3. The load balancer can use this information to
identify busy runqueues and migrate tasks to their preferred LLC domains.

The array is reallocated at runtime if the number of LLCs increases due
to CPU hotplug. Only extending the buffer (rather than shrinking it) is
supported, to simplify the implementation.

This patch introduces the buffer allocation mechanism; the statistics
themselves are calculated in a subsequent patch.

Co-developed-by: Chen Yu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---

Notes:
    v1->v2: Remove static allocation of per runqueue LLC preference
    arrays. Allocate the array to the actual number of LLCs online.
    (Peter Zijlstra, Madadi Vineeth Reddy)

 kernel/sched/core.c     |   1 +
 kernel/sched/sched.h    |   1 +
 kernel/sched/topology.c | 117 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 48626c81ba8e..ce533dc485f5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8800,6 +8800,7 @@ void __init sched_init(void)
 #ifdef CONFIG_SCHED_CACHE
 		raw_spin_lock_init(&rq->cpu_epoch_lock);
 		rq->cpu_epoch_next = jiffies;
+		rq->nr_pref_llc = NULL;
 #endif
 
 		zalloc_cpumask_var_node(&rq->scratch_mask, GFP_KERNEL, cpu_to_node(i));
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ee8b70647835..8f2a779825e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1129,6 +1129,7 @@ struct rq {
 #ifdef CONFIG_SCHED_CACHE
 	unsigned int		nr_pref_llc_running;
 	unsigned int		nr_llc_running;
+	unsigned int		*nr_pref_llc;
 #endif
 #ifdef CONFIG_NO_HZ_COMMON
 	unsigned long		last_blocked_load_update_tick;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index f25d950ab015..d583399fc6a1 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -17,8 +17,121 @@ void sched_domains_mutex_unlock(void)
 	mutex_unlock(&sched_domains_mutex);
 }
 
+/* the number of max LLCs being detected */
+static int new_max_llcs;
+/* the current number of max LLCs */
 int max_llcs;
 
+#ifdef CONFIG_SCHED_CACHE
+
+static unsigned int *alloc_new_pref_llcs(unsigned int *old, unsigned int **gc)
+{
+	unsigned int *new = NULL;
+
+	new = kcalloc(new_max_llcs, sizeof(unsigned int),
+		      GFP_KERNEL | __GFP_NOWARN);
+
+	if (!new) {
+		*gc = NULL;
+	} else {
+		/*
+		 * Place old entry in garbage collector
+		 * for later disposal.
+		 */
+		*gc = old;
+	}
+	return new;
+}
+
+static void populate_new_pref_llcs(unsigned int *old, unsigned int *new)
+{
+	int i;
+
+	if (!old)
+		return;
+
+	for (i = 0; i < max_llcs; i++)
+		new[i] = old[i];
+}
+
+static int resize_llc_pref(void)
+{
+	unsigned int *__percpu *tmp_llc_pref;
+	int i, ret = 0;
+
+	if (new_max_llcs <= max_llcs)
+		return 0;
+
+	/*
+	 * Allocate temp percpu pointer for old llc_pref,
+	 * which will be released after switching to the
+	 * new buffer.
+	 */
+	tmp_llc_pref = alloc_percpu_noprof(unsigned int *);
+	if (!tmp_llc_pref)
+		return -ENOMEM;
+
+	for_each_present_cpu(i)
+		*per_cpu_ptr(tmp_llc_pref, i) = NULL;
+
+	/*
+	 * Resize the per rq nr_pref_llc buffer and
+	 * switch to this new buffer.
+	 */
+	for_each_present_cpu(i) {
+		struct rq_flags rf;
+		unsigned int *new;
+		struct rq *rq;
+
+		rq = cpu_rq(i);
+		new = alloc_new_pref_llcs(rq->nr_pref_llc, per_cpu_ptr(tmp_llc_pref, i));
+		if (!new) {
+			ret = -ENOMEM;
+
+			goto release_old;
+		}
+
+		/*
+		 * Locking rq ensures that rq->nr_pref_llc values
+		 * don't change with new task enqueue/dequeue
+		 * when we repopulate the newly enlarged array.
+		 */
+		rq_lock_irqsave(rq, &rf);
+		populate_new_pref_llcs(rq->nr_pref_llc, new);
+		rq->nr_pref_llc = new;
+		rq_unlock_irqrestore(rq, &rf);
+	}
+
+release_old:
+	/*
+	 * Load balance is done under rcu_lock.
+	 * Wait for load balance before and during resizing to
+	 * be done. They may refer to old nr_pref_llc[]
+	 * that hasn't been resized.
+	 */
+	synchronize_rcu();
+	for_each_present_cpu(i)
+		kfree(*per_cpu_ptr(tmp_llc_pref, i));
+
+	free_percpu(tmp_llc_pref);
+
+	/* succeed and update */
+	if (!ret)
+		max_llcs = new_max_llcs;
+
+	return ret;
+}
+
+#else
+
+static int resize_llc_pref(void)
+{
+	max_llcs = new_max_llcs;
+	return 0;
+}
+
+#endif
+
 /* Protected by sched_domains_mutex: */
 static cpumask_var_t sched_domains_tmpmask;
 static cpumask_var_t sched_domains_tmpmask2;
@@ -714,7 +827,7 @@ static int update_llc_id(struct sched_domain *sd,
 	 *
 	 * For both cases, we want to increase the number of LLCs.
 	 */
-	per_cpu(sd_llc_id, cpu) = max_llcs++;
+	per_cpu(sd_llc_id, cpu) = new_max_llcs++;
 
 	return per_cpu(sd_llc_id, cpu);
 }
@@ -2674,6 +2787,8 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	if (has_cluster)
 		static_branch_inc_cpuslocked(&sched_cluster_active);
 
+	resize_llc_pref();
+
 	if (rq && sched_debug_verbose)
 		pr_info("root domain span: %*pbl\n", cpumask_pr_args(cpu_map));
 
-- 
2.32.0