From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen, Abel Wu,
 Madadi Vineeth Reddy, Hillf Danton, Len Brown,
 linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 01/20] sched: Cache aware load-balancing
Date: Wed, 18 Jun 2025 11:27:49 -0700

From: Peter Zijlstra

Hi all,

One of the many things on the eternal todo list has been finishing the
below hackery.

It is an attempt at modelling cache affinity -- and while the patch
really only targets LLC, it could very well be extended to also apply
to clusters (L2); specifically, to any case of multiple cache domains
inside a node.

Anyway, I wrote this about a year ago, and I mentioned it at the recent
OSPM conference, where Gautham and Prateek expressed interest in
playing with this code. So here goes, very rough and largely unproven
code ahead :-)

It applies to current tip/master, but I know it will fail the __percpu
validation that sits in -next, although that shouldn't be terribly hard
to fix up.

As is, it only computes the CPU inside the LLC that has the highest
recent runtime; this CPU is then used in the wake-up path to steer
wakeups towards that LLC, and in task_hot() to limit migrations away
from it.

More elaborate things could be done, notably there is an XXX in there
somewhere about finding the best LLC inside a NODE (interaction with
NUMA_BALANCING).

Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Yangyu Chen
---
 include/linux/mm_types.h |  44 ++++++
 include/linux/sched.h    |   4 +
 init/Kconfig             |   4 +
 kernel/fork.c            |   5 +
 kernel/sched/core.c      |  13 +-
 kernel/sched/fair.c      | 330 +++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h     |   8 +
 7 files changed, 388 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 56d07edd01f9..013291c6aaa2 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -893,6 +893,12 @@ struct mm_cid {
 };
 #endif
 
+struct mm_sched {
+	u64 runtime;
+	unsigned long epoch;
+	unsigned long occ;
+};
+
 struct kioctx_table;
 struct iommu_mm_data;
 struct mm_struct {
@@ -983,6 +989,17 @@ struct mm_struct {
 		 */
 		raw_spinlock_t cpus_allowed_lock;
 #endif
+#ifdef CONFIG_SCHED_CACHE
+		/*
+		 * Track per-cpu-per-process occupancy as a proxy for cache residency.
+		 * See account_mm_sched() and ...
+		 */
+		struct mm_sched __percpu *pcpu_sched;
+		raw_spinlock_t mm_sched_lock;
+		unsigned long mm_sched_epoch;
+		int mm_sched_cpu;
+#endif
+
 #ifdef CONFIG_MMU
 		atomic_long_t pgtables_bytes;	/* size of all page tables */
 #endif
@@ -1393,6 +1410,33 @@ static inline unsigned int mm_cid_size(void)
 static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumask *cpumask) { }
 #endif /* CONFIG_SCHED_MM_CID */
 
+#ifdef CONFIG_SCHED_CACHE
+extern void mm_init_sched(struct mm_struct *mm, struct mm_sched *pcpu_sched);
+
+static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
+{
+	struct mm_sched *pcpu_sched = alloc_percpu_noprof(struct mm_sched);
+	if (!pcpu_sched)
+		return -ENOMEM;
+
+	mm_init_sched(mm, pcpu_sched);
+	return 0;
+}
+
+#define mm_alloc_sched(...)	alloc_hooks(mm_alloc_sched_noprof(__VA_ARGS__))
+
+static inline void mm_destroy_sched(struct mm_struct *mm)
+{
+	free_percpu(mm->pcpu_sched);
+	mm->pcpu_sched = NULL;
+}
+#else /* !CONFIG_SCHED_CACHE */
+
+static inline int mm_alloc_sched(struct mm_struct *mm) { return 0; }
+static inline void mm_destroy_sched(struct mm_struct *mm) { }
+
+#endif /* CONFIG_SCHED_CACHE */
+
 struct mmu_gather;
 extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
 extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac1982893..d0e4cda2b3cd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1399,6 +1399,10 @@ struct task_struct {
 	unsigned long numa_pages_migrated;
 #endif /* CONFIG_NUMA_BALANCING */
 
+#ifdef CONFIG_SCHED_CACHE
+	struct callback_head cache_work;
+#endif
+
 #ifdef CONFIG_RSEQ
 	struct rseq __user *rseq;
 	u32 rseq_len;
diff --git a/init/Kconfig b/init/Kconfig
index bf3a920064be..e2509127b6f9 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -953,6 +953,10 @@ config NUMA_BALANCING
 
 	  This system will be inactive on UMA systems.
 
+config SCHED_CACHE
+	bool "Cache aware scheduler"
+	default y
+
 config NUMA_BALANCING_DEFAULT_ENABLED
 	bool "Automatically enable NUMA aware memory/task placement"
 	default y
diff --git a/kernel/fork.c b/kernel/fork.c
index 168681fc4b25..da1387823b9e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1332,6 +1332,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	if (mm_alloc_cid(mm, p))
 		goto fail_cid;
 
+	if (mm_alloc_sched(mm))
+		goto fail_sched;
+
 	if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT,
 				     NR_MM_COUNTERS))
 		goto fail_pcpu;
@@ -1341,6 +1344,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	return mm;
 
 fail_pcpu:
+	mm_destroy_sched(mm);
+fail_sched:
 	mm_destroy_cid(mm);
 fail_cid:
 	destroy_context(mm);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c81cf642dba0..d9c3e75f79d1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4524,6 +4524,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->migration_pending = NULL;
 #endif
 	init_sched_mm_cid(p);
+	init_sched_mm(p);
 }
 
 DEFINE_STATIC_KEY_FALSE(sched_numa_balancing);
@@ -8526,6 +8527,7 @@ static struct kmem_cache *task_group_cache __ro_after_init;
 
 void __init sched_init(void)
 {
+	unsigned long now = jiffies;
 	unsigned long ptr = 0;
 	int i;
 
@@ -8600,7 +8602,7 @@ void __init sched_init(void)
 		raw_spin_lock_init(&rq->__lock);
 		rq->nr_running = 0;
 		rq->calc_load_active = 0;
-		rq->calc_load_update = jiffies + LOAD_FREQ;
+		rq->calc_load_update = now + LOAD_FREQ;
 		init_cfs_rq(&rq->cfs);
 		init_rt_rq(&rq->rt);
 		init_dl_rq(&rq->dl);
@@ -8644,7 +8646,7 @@ void __init sched_init(void)
 		rq->cpu_capacity = SCHED_CAPACITY_SCALE;
 		rq->balance_callback = &balance_push_callback;
 		rq->active_balance = 0;
-		rq->next_balance = jiffies;
+		rq->next_balance = now;
 		rq->push_cpu = 0;
 		rq->cpu = i;
 		rq->online = 0;
@@ -8656,7 +8658,7 @@ void __init sched_init(void)
 
 		rq_attach_root(rq, &def_root_domain);
 #ifdef CONFIG_NO_HZ_COMMON
-		rq->last_blocked_load_update_tick = jiffies;
+		rq->last_blocked_load_update_tick = now;
 		atomic_set(&rq->nohz_flags, 0);
 
 		INIT_CSD(&rq->nohz_csd, nohz_csd_func, rq);
@@ -8681,6 +8683,11 @@ void __init sched_init(void)
 
 		rq->core_cookie = 0UL;
 #endif
+#ifdef CONFIG_SCHED_CACHE
+		raw_spin_lock_init(&rq->cpu_epoch_lock);
+		rq->cpu_epoch_next = now;
+#endif
+
 		zalloc_cpumask_var_node(&rq->scratch_mask, GFP_KERNEL, cpu_to_node(i));
 	}
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fb9bf995a47..df7d4a324fbe 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1166,10 +1166,229 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 	return delta_exec;
 }
 
-static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
+#ifdef CONFIG_SCHED_CACHE
+
+/*
+ * XXX numbers come from a place the sun don't shine -- probably wants to be SD
+ * tunable or so.
+ */
+#define EPOCH_PERIOD	(HZ/100)	/* 10 ms */
+#define EPOCH_OLD	5		/* 50 ms */
+
+void mm_init_sched(struct mm_struct *mm, struct mm_sched *_pcpu_sched)
+{
+	unsigned long epoch;
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct mm_sched *pcpu_sched = per_cpu_ptr(_pcpu_sched, i);
+		struct rq *rq = cpu_rq(i);
+
+		pcpu_sched->runtime = 0;
+		pcpu_sched->epoch = epoch = rq->cpu_epoch;
+		pcpu_sched->occ = -1;
+	}
+
+	raw_spin_lock_init(&mm->mm_sched_lock);
+	mm->mm_sched_epoch = epoch;
+	mm->mm_sched_cpu = -1;
+
+	smp_store_release(&mm->pcpu_sched, _pcpu_sched);
+}
+
+/* because why would C be fully specified */
+static __always_inline void __shr_u64(u64 *val, unsigned int n)
+{
+	if (n >= 64) {
+		*val = 0;
+		return;
+	}
+	*val >>= n;
+}
+
+static inline void __update_mm_sched(struct rq *rq, struct mm_sched *pcpu_sched)
+{
+	lockdep_assert_held(&rq->cpu_epoch_lock);
+
+	unsigned long n, now = jiffies;
+	long delta = now - rq->cpu_epoch_next;
+
+	if (delta > 0) {
+		n = (delta + EPOCH_PERIOD - 1) / EPOCH_PERIOD;
+		rq->cpu_epoch += n;
+		rq->cpu_epoch_next += n * EPOCH_PERIOD;
+		__shr_u64(&rq->cpu_runtime, n);
+	}
+
+	n = rq->cpu_epoch - pcpu_sched->epoch;
+	if (n) {
+		pcpu_sched->epoch += n;
+		__shr_u64(&pcpu_sched->runtime, n);
+	}
+}
+
+static unsigned long fraction_mm_sched(struct rq *rq, struct mm_sched *pcpu_sched)
+{
+	guard(raw_spinlock_irqsave)(&rq->cpu_epoch_lock);
+
+	__update_mm_sched(rq, pcpu_sched);
+
+	/*
+	 * Runtime is a geometric series (r=0.5) and as such will sum to twice
+	 * the accumulation period; this means the multiplication here should
+	 * not overflow.
+	 */
+	return div64_u64(NICE_0_LOAD * pcpu_sched->runtime, rq->cpu_runtime + 1);
+}
+
+static inline
+void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
+{
+	struct mm_struct *mm = p->mm;
+	struct mm_sched *pcpu_sched;
+	unsigned long epoch;
+
+	/*
+	 * init_task and kthreads don't be having no mm
+	 */
+	if (!mm || !mm->pcpu_sched)
+		return;
+
+	pcpu_sched = this_cpu_ptr(p->mm->pcpu_sched);
+
+	scoped_guard (raw_spinlock, &rq->cpu_epoch_lock) {
+		__update_mm_sched(rq, pcpu_sched);
+		pcpu_sched->runtime += delta_exec;
+		rq->cpu_runtime += delta_exec;
+		epoch = rq->cpu_epoch;
+	}
+
+	/*
+	 * If this task hasn't hit task_cache_work() for a while, invalidate
+	 * its preferred state.
+	 */
+	if (epoch - READ_ONCE(mm->mm_sched_epoch) > EPOCH_OLD) {
+		mm->mm_sched_cpu = -1;
+		pcpu_sched->occ = -1;
+	}
+}
+
+static void task_tick_cache(struct rq *rq, struct task_struct *p)
+{
+	struct callback_head *work = &p->cache_work;
+	struct mm_struct *mm = p->mm;
+
+	if (!mm || !mm->pcpu_sched)
+		return;
+
+	if (mm->mm_sched_epoch == rq->cpu_epoch)
+		return;
+
+	guard(raw_spinlock)(&mm->mm_sched_lock);
+
+	if (mm->mm_sched_epoch == rq->cpu_epoch)
+		return;
+
+	if (work->next == work) {
+		task_work_add(p, work, TWA_RESUME);
+		WRITE_ONCE(mm->mm_sched_epoch, rq->cpu_epoch);
+	}
+}
+
+static void task_cache_work(struct callback_head *work)
+{
+	struct task_struct *p = current;
+	struct mm_struct *mm = p->mm;
+	unsigned long m_a_occ = 0;
+	int cpu, m_a_cpu = -1;
+	cpumask_var_t cpus;
+
+	WARN_ON_ONCE(work != &p->cache_work);
+
+	work->next = work;
+
+	if (p->flags & PF_EXITING)
+		return;
+
+	if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
+		return;
+
+	scoped_guard (cpus_read_lock) {
+		cpumask_copy(cpus, cpu_online_mask);
+
+		for_each_cpu(cpu, cpus) {
+			/* XXX sched_cluster_active */
+			struct sched_domain *sd = per_cpu(sd_llc, cpu);
+			unsigned long occ, m_occ = 0, a_occ = 0;
+			int m_cpu = -1, nr = 0, i;
+
+			for_each_cpu(i, sched_domain_span(sd)) {
+				occ = fraction_mm_sched(cpu_rq(i),
+							per_cpu_ptr(mm->pcpu_sched, i));
+				a_occ += occ;
+				if (occ > m_occ) {
+					m_occ = occ;
+					m_cpu = i;
+				}
+				nr++;
+				trace_printk("(%d) occ: %ld m_occ: %ld m_cpu: %d nr: %d\n",
+					     per_cpu(sd_llc_id, i), occ, m_occ, m_cpu, nr);
+			}
+
+			a_occ /= nr;
+			if (a_occ > m_a_occ) {
+				m_a_occ = a_occ;
+				m_a_cpu = m_cpu;
+			}
+
+			trace_printk("(%d) a_occ: %ld m_a_occ: %ld\n",
+				     per_cpu(sd_llc_id, cpu), a_occ, m_a_occ);
+
+			for_each_cpu(i, sched_domain_span(sd)) {
+				/* XXX threshold ? */
+				per_cpu_ptr(mm->pcpu_sched, i)->occ = a_occ;
+			}
+
+			cpumask_andnot(cpus, cpus, sched_domain_span(sd));
+		}
+	}
+
+	/*
+	 * If the max average cache occupancy is 'small' we don't care.
+	 */
+	if (m_a_occ < (NICE_0_LOAD >> EPOCH_OLD))
+		m_a_cpu = -1;
+
+	mm->mm_sched_cpu = m_a_cpu;
+
+	free_cpumask_var(cpus);
+}
+
+void init_sched_mm(struct task_struct *p)
+{
+	struct callback_head *work = &p->cache_work;
+	init_task_work(work, task_cache_work);
+	work->next = work;
+}
+
+#else
+
+static inline void account_mm_sched(struct rq *rq, struct task_struct *p,
+				    s64 delta_exec) { }
+
+
+void init_sched_mm(struct task_struct *p) { }
+
+static void task_tick_cache(struct rq *rq, struct task_struct *p) { }
+
+#endif
+
+static inline
+void update_curr_task(struct rq *rq, struct task_struct *p, s64 delta_exec)
 {
 	trace_sched_stat_runtime(p, delta_exec);
 	account_group_exec_runtime(p, delta_exec);
+	account_mm_sched(rq, p, delta_exec);
 	cgroup_account_cputime(p, delta_exec);
 }
 
@@ -1215,7 +1434,7 @@ s64 update_curr_common(struct rq *rq)
 
 	delta_exec = update_curr_se(rq, &donor->se);
 	if (likely(delta_exec > 0))
-		update_curr_task(donor, delta_exec);
+		update_curr_task(rq, donor, delta_exec);
 
 	return delta_exec;
 }
@@ -1244,7 +1463,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 	if (entity_is_task(curr)) {
 		struct task_struct *p = task_of(curr);
 
-		update_curr_task(p, delta_exec);
+		update_curr_task(rq, p, delta_exec);
 
 		/*
 		 * If the fair_server is active, we need to account for the
@@ -7848,7 +8067,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 * per-cpu select_rq_mask usage
 	 */
 	lockdep_assert_irqs_disabled();
-
+again:
 	if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
 	    asym_fits_cpu(task_util, util_min, util_max, target))
 		return target;
@@ -7886,7 +8105,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	/* Check a recently used CPU as a potential idle candidate: */
 	recent_used_cpu = p->recent_used_cpu;
 	p->recent_used_cpu = prev;
-	if (recent_used_cpu != prev &&
+	if (prev == p->wake_cpu &&
+	    recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target) &&
 	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
@@ -7939,6 +8159,18 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
+	if (prev != p->wake_cpu && !cpus_share_cache(prev, p->wake_cpu)) {
+		/*
+		 * Most likely select_cache_cpu() will have re-directed
+		 * the wakeup, but getting here means the preferred cache is
+		 * too busy, so re-try with the actual previous.
+		 *
+		 * XXX wake_affine is lost for this pass.
+		 */
+		prev = target = p->wake_cpu;
+		goto again;
+	}
+
 	/*
 	 * For cluster machines which have lower sharing cache like L2 or
 	 * LLC Tag, we tend to find an idle CPU in the target's cluster
@@ -8561,6 +8793,40 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	return target;
 }
 
+#ifdef CONFIG_SCHED_CACHE
+static long __migrate_degrades_locality(struct task_struct *p, int src_cpu, int dst_cpu, bool idle);
+
+static int select_cache_cpu(struct task_struct *p, int prev_cpu)
+{
+	struct mm_struct *mm = p->mm;
+	int cpu;
+
+	if (!mm || p->nr_cpus_allowed == 1)
+		return prev_cpu;
+
+	cpu = mm->mm_sched_cpu;
+	if (cpu < 0)
+		return prev_cpu;
+
+
+	if (static_branch_likely(&sched_numa_balancing) &&
+	    __migrate_degrades_locality(p, prev_cpu, cpu, false) > 0) {
+		/*
+		 * XXX look for max occupancy inside prev_cpu's node
+		 */
+		return prev_cpu;
+	}
+
+	return cpu;
+}
+#else
+static int select_cache_cpu(struct task_struct *p, int prev_cpu)
+{
+	return prev_cpu;
+}
+#endif
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the relevant SD flag set. In practice, this is SD_BALANCE_WAKE,
@@ -8586,6 +8852,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	 * required for stable ->cpus_allowed
 	 */
 	lockdep_assert_held(&p->pi_lock);
+	guard(rcu)();
+
 	if (wake_flags & WF_TTWU) {
 		record_wakee(p);
 
@@ -8593,6 +8861,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 		    cpumask_test_cpu(cpu, p->cpus_ptr))
 			return cpu;
 
+		new_cpu = prev_cpu = select_cache_cpu(p, prev_cpu);
+
 		if (!is_rd_overutilized(this_rq()->rd)) {
 			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
 			if (new_cpu >= 0)
@@ -8603,7 +8873,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
 	}
 
-	rcu_read_lock();
 	for_each_domain(cpu, tmp) {
 		/*
 		 * If both 'cpu' and 'prev_cpu' are part of this domain,
@@ -8636,7 +8905,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 		/* Fast path */
 		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
 	}
-	rcu_read_unlock();
 
 	return new_cpu;
 }
@@ -9286,6 +9554,17 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 	if (sysctl_sched_migration_cost == 0)
 		return 0;
 
+#ifdef CONFIG_SCHED_CACHE
+	if (p->mm && p->mm->pcpu_sched) {
+		/*
+		 * XXX things like Skylake have non-inclusive L3 and might not
+		 * like this L3 centric view. What to do about L2 stickyness ?
+		 */
+		return per_cpu_ptr(p->mm->pcpu_sched, env->src_cpu)->occ >
+		       per_cpu_ptr(p->mm->pcpu_sched, env->dst_cpu)->occ;
+	}
+#endif
+
 	delta = rq_clock_task(env->src_rq) - p->se.exec_start;
 
 	return delta < (s64)sysctl_sched_migration_cost;
@@ -9297,27 +9576,25 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 * Returns 0, if task migration is not affected by locality.
 * Returns a negative value, if task migration improves locality i.e migration preferred.
 */
-static long migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
+static long __migrate_degrades_locality(struct task_struct *p, int src_cpu, int dst_cpu, bool idle)
 {
 	struct numa_group *numa_group = rcu_dereference(p->numa_group);
 	unsigned long src_weight, dst_weight;
 	int src_nid, dst_nid, dist;
 
-	if (!static_branch_likely(&sched_numa_balancing))
-		return 0;
-
-	if (!p->numa_faults || !(env->sd->flags & SD_NUMA))
+	if (!p->numa_faults)
 		return 0;
 
-	src_nid = cpu_to_node(env->src_cpu);
-	dst_nid = cpu_to_node(env->dst_cpu);
+	src_nid = cpu_to_node(src_cpu);
+	dst_nid = cpu_to_node(dst_cpu);
 
 	if (src_nid == dst_nid)
 		return 0;
 
 	/* Migrating away from the preferred node is always bad. */
 	if (src_nid == p->numa_preferred_nid) {
-		if (env->src_rq->nr_running > env->src_rq->nr_preferred_running)
+		struct rq *src_rq = cpu_rq(src_cpu);
+		if (src_rq->nr_running > src_rq->nr_preferred_running)
 			return 1;
 		else
 			return 0;
@@ -9328,7 +9605,7 @@ static long migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
 		return -1;
 
 	/* Leaving a core idle is often worse than degrading locality. */
-	if (env->idle == CPU_IDLE)
+	if (idle)
 		return 0;
 
 	dist = node_distance(src_nid, dst_nid);
@@ -9343,7 +9620,24 @@ static long migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
 	return src_weight - dst_weight;
 }
 
+static long migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
+{
+	if (!static_branch_likely(&sched_numa_balancing))
+		return 0;
+
+	if (!(env->sd->flags & SD_NUMA))
+		return 0;
+
+	return __migrate_degrades_locality(p, env->src_cpu, env->dst_cpu,
+					   env->idle == CPU_IDLE);
+}
+
 #else
+static long __migrate_degrades_locality(struct task_struct *p, int src_cpu, int dst_cpu, bool idle)
+{
+	return 0;
+}
+
 static inline long migrate_degrades_locality(struct task_struct *p,
 					     struct lb_env *env)
 {
@@ -13102,8 +13396,8 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr) {}
 */
 static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 {
-	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &curr->se;
+	struct cfs_rq *cfs_rq;
 
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
@@ -13113,6 +13407,8 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);
 
+	task_tick_cache(rq, curr);
+
 	update_misfit_status(curr, rq);
 	check_update_overutilized_status(task_rq(curr));
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 47972f34ea70..d16ccd66ca07 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1171,6 +1171,12 @@ struct rq {
 	u64 clock_pelt_idle_copy;
 	u64 clock_idle_copy;
 #endif
+#ifdef CONFIG_SCHED_CACHE
+	raw_spinlock_t cpu_epoch_lock;
+	u64 cpu_runtime;
+	unsigned long cpu_epoch;
+	unsigned long cpu_epoch_next;
+#endif
 
 	atomic_t nr_iowait;
 
@@ -3861,6 +3867,8 @@ static inline void task_tick_mm_cid(struct rq *rq, struct task_struct *curr) { }
 static inline void init_sched_mm_cid(struct task_struct *t) { }
 #endif /* !CONFIG_SCHED_MM_CID */
 
+extern void init_sched_mm(struct task_struct *p);
+
 extern u64 avg_vruntime(struct cfs_rq *cfs_rq);
 extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se);
 #ifdef CONFIG_SMP
-- 
2.32.0

From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen,
 Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown,
 linux-kernel@vger.kernel.org
Subject: [RFC patch v3 02/20] sched: Several fixes for cache aware scheduling
Date: Wed, 18 Jun 2025 11:27:50 -0700

From: Chen Yu

1. Fix compile error on percpu allocation.
2. Enqueue to the target CPU rather than the current CPU.
3. Add a NULL LLC sched domain check (Libo Chen).
4. Introduce sched feature SCHED_CACHE to control cache aware scheduling.
5. Fix unsigned occupancy initialization to -1.
6. If there is only 1 thread in the process, there is no need to enable
   cache awareness.
7. Add __maybe_unused to __migrate_degrades_locality() to avoid compile
   warnings.

Signed-off-by: Chen Yu
Tested-by: Yangyu Chen
---
 include/linux/mm_types.h |  4 ++--
 kernel/sched/fair.c      | 27 ++++++++++++++++-----------
 kernel/sched/features.h  |  1 +
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 013291c6aaa2..9de4a0a13c4d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1411,11 +1411,11 @@ static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumas
 #endif /* CONFIG_SCHED_MM_CID */
 
 #ifdef CONFIG_SCHED_CACHE
-extern void mm_init_sched(struct mm_struct *mm, struct mm_sched *pcpu_sched);
+extern void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *pcpu_sched);
 
 static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
 {
-	struct mm_sched *pcpu_sched = alloc_percpu_noprof(struct mm_sched);
+	struct mm_sched __percpu *pcpu_sched = alloc_percpu_noprof(struct mm_sched);
 	if (!pcpu_sched)
 		return -ENOMEM;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df7d4a324fbe..89db97f8ef02 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1175,7 +1175,7 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 #define EPOCH_PERIOD	(HZ/100)	/* 10 ms */
 #define EPOCH_OLD	5		/* 50 ms */
 
-void mm_init_sched(struct mm_struct *mm, struct mm_sched *_pcpu_sched)
+void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
 {
 	unsigned long epoch;
 	int i;
@@ -1186,7 +1186,7 @@ void mm_init_sched(struct mm_struct *mm, struct mm_sched *_pcpu_sched)
 
 		pcpu_sched->runtime = 0;
 		pcpu_sched->epoch = epoch = rq->cpu_epoch;
-		pcpu_sched->occ = -1;
+		pcpu_sched->occ = 0;
 	}
 
 	raw_spin_lock_init(&mm->mm_sched_lock);
@@ -1254,7 +1254,7 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	if (!mm || !mm->pcpu_sched)
 		return;
 
-	pcpu_sched = this_cpu_ptr(p->mm->pcpu_sched);
+	pcpu_sched = per_cpu_ptr(p->mm->pcpu_sched, cpu_of(rq));
 
 	scoped_guard (raw_spinlock, &rq->cpu_epoch_lock) {
 		__update_mm_sched(rq, pcpu_sched);
@@ -1264,12 +1264,14 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	}
 
 	/*
-	 * If this task hasn't hit task_cache_work() for a while, invalidate
+	 * If this task hasn't hit task_cache_work() for a while, or it
+	 * has only 1 thread, invalidate
 	 * its preferred state.
 	 */
-	if (epoch - READ_ONCE(mm->mm_sched_epoch) > EPOCH_OLD) {
+	if (epoch - READ_ONCE(mm->mm_sched_epoch) > EPOCH_OLD ||
+	    get_nr_threads(p) <= 1) {
 		mm->mm_sched_cpu = -1;
-		pcpu_sched->occ = -1;
+		pcpu_sched->occ = 0;
 	}
 }
 
@@ -1286,9 +1288,6 @@ static void task_tick_cache(struct rq *rq, struct task_struct *p)
 
 	guard(raw_spinlock)(&mm->mm_sched_lock);
 
-	if (mm->mm_sched_epoch == rq->cpu_epoch)
-		return;
-
 	if (work->next == work) {
 		task_work_add(p, work, TWA_RESUME);
 		WRITE_ONCE(mm->mm_sched_epoch, rq->cpu_epoch);
@@ -1322,6 +1321,9 @@ static void task_cache_work(struct callback_head *work)
 			unsigned long occ, m_occ = 0, a_occ = 0;
 			int m_cpu = -1, nr = 0, i;
 
+			if (!sd)
+				continue;
+
 			for_each_cpu(i, sched_domain_span(sd)) {
 				occ = fraction_mm_sched(cpu_rq(i),
 							per_cpu_ptr(mm->pcpu_sched, i));
@@ -8801,6 +8803,9 @@ static int select_cache_cpu(struct task_struct *p, int prev_cpu)
 	struct mm_struct *mm = p->mm;
 	int cpu;
 
+	if (!sched_feat(SCHED_CACHE))
+		return prev_cpu;
+
 	if (!mm || p->nr_cpus_allowed == 1)
 		return prev_cpu;
 
@@ -9555,7 +9560,7 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 		return 0;
 
 #ifdef CONFIG_SCHED_CACHE
-	if (p->mm && p->mm->pcpu_sched) {
+	if (sched_feat(SCHED_CACHE) && p->mm && p->mm->pcpu_sched) {
 		/*
 		 * XXX things like Skylake have non-inclusive L3 and might not
 		 * like this L3 centric view. What to do about L2 stickyness ?
@@ -9633,7 +9638,7 @@ static long migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
 }
 
 #else
-static long __migrate_degrades_locality(struct task_struct *p, int src_cpu, int dst_cpu, bool idle)
+static __maybe_unused long __migrate_degrades_locality(struct task_struct *p, int src_cpu, int dst_cpu, bool idle)
 {
 	return 0;
 }
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 3c12d9f93331..d2af7bfd36bf 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -87,6 +87,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
 */
 SCHED_FEAT(SIS_UTIL, true)
 
+SCHED_FEAT(SCHED_CACHE, true)
 /*
 * Issue a WARN when we do multiple update_rq_clock() calls
 * in a single rq->lock section. Default disabled because the
Default disabled because the --=20 2.32.0 From nobody Thu Oct 9 08:43:05 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B98A42F4A13 for ; Wed, 18 Jun 2025 18:21:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.12 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750270911; cv=none; b=H37a6GaMCpKVbBfru0xhkv/YQMjuzakfh40XV8mJ06HkLTiVswK7M40TUc0iJ2+QdHbjvIsa3fkD0Ch9hrzqgWR417U/tS7He62fpoRnc/RWieBtEAO7KEIcS4LI+2bm+YmBVIN6m7jaZ7yUlmNHWqu6HcD8VDmZ1CHMeSMizgk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750270911; c=relaxed/simple; bh=56rh2PM2yAL35gap+jzrhdtnDXnsh7kQnStk2sSwL5M=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=G2idI3SEXApx4GgXZ/P+aOwo15Jk0qYkNXdo1GgZkBwTVB4/wZnkC8GaXlcdpiVIBcXRH+vvno/YvO528eUsbhW6TDkWnRait/B5YQRy1pg5uGy44IKpVEiwxTH7cssQJgE1Tsmt4x5g8AlJKz8IC1CaADUdr9RqjRgXQ9GaDLs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=nWZpvDE7; arc=none smtp.client-ip=198.175.65.12 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="nWZpvDE7" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750270909; x=1781806909; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=56rh2PM2yAL35gap+jzrhdtnDXnsh7kQnStk2sSwL5M=; b=nWZpvDE7SLXGGfo3jHHGEomAw3ClY8hUPhWrfFErNHzBVjyIHSQntZd2 1nZXjrvyQxuRk9ZxQYH3QHfm14LYHc70BTSraxKI+8chQcuVj6tsAgrD1 RWdBKfVBbXjt+LaiRwCbUYMll6u+jjqnmEMHSrUpZujg3klH237md9SXa 37yFQxyarddD1nF8E+ny40AEdtC3cTGt5Ar19Wsp+W6417mEmx6ktZkZd 2s4JDPZkFpV3gOOZumfVaiUV2iM+gjXUhTEiJqzIpaoYdgOiBcYYH7tlm I/na3T9fEDfmsq6JxtGp9O8CDC/E09i7K1m27+a7hskfivj2Uj+kXW6uY w==; X-CSE-ConnectionGUID: iTxb4W8TSr+JVDmjhHc1nA== X-CSE-MsgGUID: 3TLdAYAnTC6j+0rzbFr1IQ== X-IronPort-AV: E=McAfee;i="6800,10657,11468"; a="63931484" X-IronPort-AV: E=Sophos;i="6.16,246,1744095600"; d="scan'208";a="63931484" Received: from fmviesa001.fm.intel.com ([10.60.135.141]) by orvoesa104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Jun 2025 11:21:47 -0700 X-CSE-ConnectionGUID: VnM+trhjTJSnHQCXxzcG2w== X-CSE-MsgGUID: xdjg5U3US8yEOUARte8LqQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,246,1744095600"; d="scan'208";a="180959793" Received: from b04f130c83f2.jf.intel.com ([10.165.154.98]) by fmviesa001.fm.intel.com with ESMTP; 18 Jun 2025 11:21:47 -0700 From: Tim Chen To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . 
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen,
 Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown,
 linux-kernel@vger.kernel.org
Subject: [RFC patch v3 03/20] sched: Avoid task migration within its preferred LLC
Date: Wed, 18 Jun 2025 11:27:51 -0700
Message-Id: <284223928844c9ae64de1fa142f8db89775de27b.1750268218.git.tim.c.chen@linux.intel.com>

From: Chen Yu

It was found that when running schbench, there is a significant amount
of in-LLC task migration, even if the wakee is woken up on its
preferred LLC. This adds core-to-core latency and impairs performance.
Inhibit task migration if the wakee is already in its preferred LLC.

Signed-off-by: Chen Yu
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 89db97f8ef02..567ad2a0cfa2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8813,6 +8813,8 @@ static int select_cache_cpu(struct task_struct *p, int prev_cpu)
 	if (cpu < 0)
 		return prev_cpu;
 
+	if (cpus_share_cache(cpu, prev_cpu))
+		return prev_cpu;
 
 	if (static_branch_likely(&sched_numa_balancing) &&
 	    __migrate_degrades_locality(p, prev_cpu, cpu, false) > 0) {
-- 
2.32.0

From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen, Abel Wu,
 Madadi Vineeth Reddy, Hillf Danton, Len Brown,
 linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 04/20] sched: Avoid calculating the cpumask if the system is overloaded
Date: Wed, 18 Jun 2025 11:27:52 -0700
Message-Id: <2901db739c1792eb9ece08a2b8d3b8de32efd0eb.1750268218.git.tim.c.chen@linux.intel.com>

From: K Prateek Nayak

If the SIS_UTIL cutoff ends the idle CPU search, the result of the
cpumask_and() is of no use. Since select_idle_cpu() can now be called
twice per wakeup from select_idle_sibling() due to cache aware wakeup,
this overhead can be visible in benchmarks like hackbench.

To save some additional cycles, especially in cases where we target
the LLC frequently and the search bails out because the LLC is busy,
only calculate the cpumask if the system is not overloaded.
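For illustration only (not part of the patch): a minimal standalone
userspace sketch of the reordering. The toy 64-bit bitmask and the
helper names are stand-ins for the kernel's cpumask and SIS_UTIL
machinery; nr_idle_scan stands in for the value derived from
sd_share->nr_idle_scan.

  #include <stdio.h>

  #define NR_CPUS 64

  static int nr_idle_scan;	/* <= 0 means "LLC too busy, bail out" */

  static int select_idle_cpu_sketch(unsigned long long span,
				    unsigned long long allowed)
  {
	unsigned long long cpus;
	int i;

	if (nr_idle_scan <= 0)		/* cheap O(1) guard, checked first */
		return -1;		/* bail before any mask work */

	cpus = span & allowed;		/* the cpumask_and() analogue */
	for (i = 0; i < NR_CPUS; i++) {
		if (cpus & (1ULL << i))
			return i;	/* pretend this CPU is idle */
	}
	return -1;
  }

  int main(void)
  {
	nr_idle_scan = 0;
	printf("overloaded: %d\n", select_idle_cpu_sketch(0xff00, 0xffff));
	nr_idle_scan = 4;
	printf("normal:     %d\n", select_idle_cpu_sketch(0xff00, 0xffff));
	return 0;
  }

With the guard hoisted above the intersection, the overloaded case
returns before touching the mask at all, which is the cycle saving
this patch is after.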
Signed-off-by: K Prateek Nayak
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 567ad2a0cfa2..6a2678f9d44a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7918,8 +7918,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
 	struct sched_domain_shared *sd_share;
 
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
-
 	if (sched_feat(SIS_UTIL)) {
 		sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
 		if (sd_share) {
@@ -7931,6 +7929,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 		}
 	}
 
+	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+
 	if (static_branch_unlikely(&sched_cluster_active)) {
 		struct sched_group *sg = sd->groups;
 
-- 
2.32.0

From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen,
 Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown,
 linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 05/20] sched: Add hysteresis to switch a task's preferred LLC
Date: Wed, 18 Jun 2025 11:27:53 -0700
Message-Id: <7371f30196b317c0c5a0ae3fa463ec76a4dc69ef.1750268218.git.tim.c.chen@linux.intel.com>

Switching a process's preferred LLC generates lots of task migrations
across LLCs. To avoid frequent switches of the home LLC, implement the
following policy:

1. Require a 2x occupancy (occ) change threshold to switch the
   preferred LLC.
2. Don't discard the preferred LLC for a task.

Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a2678f9d44a..7fb2322c5d9e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1175,6 +1175,14 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 #define EPOCH_PERIOD	(HZ/100)	/* 10 ms */
 #define EPOCH_OLD	5		/* 50 ms */
 
+static int llc_id(int cpu)
+{
+	if (cpu < 0)
+		return -1;
+
+	return per_cpu(sd_llc_id, cpu);
+}
+
 void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
 {
 	unsigned long epoch;
@@ -1299,6 +1307,7 @@ static void task_cache_work(struct callback_head *work)
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
 	unsigned long m_a_occ = 0;
+	unsigned long last_m_a_occ = 0;
 	int cpu, m_a_cpu = -1;
 	cpumask_var_t cpus;
 
@@ -1337,11 +1346,13 @@ static void task_cache_work(struct callback_head *work)
 				per_cpu(sd_llc_id, i), occ, m_occ, m_cpu, nr);
 		}
 
-		a_occ /= nr;
+		// a_occ /= nr;
 		if (a_occ > m_a_occ) {
 			m_a_occ = a_occ;
 			m_a_cpu = m_cpu;
 		}
+		if (llc_id(cpu) == llc_id(mm->mm_sched_cpu))
+			last_m_a_occ = a_occ;
 
 		trace_printk("(%d) a_occ: %ld m_a_occ: %ld\n",
 			     per_cpu(sd_llc_id, cpu), a_occ, m_a_occ);
@@ -1355,13 +1366,10 @@ static void task_cache_work(struct callback_head *work)
 		}
 	}
 
-	/*
-	 * If the max average cache occupancy is 'small' we don't care.
-	 */
-	if (m_a_occ < (NICE_0_LOAD >> EPOCH_OLD))
-		m_a_cpu = -1;
-
-	mm->mm_sched_cpu = m_a_cpu;
+	if (m_a_occ > (2 * last_m_a_occ)) {
+		/* avoid the bouncing of mm_sched_cpu */
+		mm->mm_sched_cpu = m_a_cpu;
+	}
 
 	free_cpumask_var(cpus);
 }
-- 
2.32.0

From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen,
 Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown,
 linux-kernel@vger.kernel.org
Subject: [RFC patch v3 06/20] sched: Save the per LLC utilization for better cache aware scheduling
Date: Wed, 18 Jun 2025 11:27:54 -0700
Message-Id: <22f5c52b3e904bd782c43bc4bfc6fcd4b447ea54.1750268218.git.tim.c.chen@linux.intel.com>

From: Chen Yu

When a system gets busy and a process's preferred LLC is saturated by
too many threads within this process, there are significant in-LLC
task migrations within its preferred LLC. This leads to migration
latency and degrades performance. Ideally, task aggregation should be
inhibited if the task's preferred LLC is overloaded. This implies that
a metric is needed to indicate whether the LLC is busy.

Store the per-LLC utilization calculated via periodic load balancing.
These statistics will be used in subsequent patches to determine
whether tasks should be aggregated to their preferred LLC.

Signed-off-by: Chen Yu
Tested-by: Yangyu Chen
---
 include/linux/sched/topology.h |  3 ++
 kernel/sched/fair.c            | 53 ++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 7b4301b7235f..b3115bc1cbc0 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -78,6 +78,9 @@ struct sched_domain_shared {
 	atomic_t nr_busy_cpus;
 	int has_idle_cores;
 	int nr_idle_scan;
+#ifdef CONFIG_SCHED_CACHE
+	unsigned long util_avg;
+#endif
 };
 
 struct sched_domain {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7fb2322c5d9e..02f104414b9a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8806,6 +8806,22 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 #ifdef CONFIG_SCHED_CACHE
 static long __migrate_degrades_locality(struct task_struct *p, int src_cpu, int dst_cpu, bool idle);
 
+/* expected to be protected by rcu_read_lock() */
+static bool get_llc_stats(int cpu, unsigned long *util,
+			  unsigned long *cap)
+{
+	struct sched_domain_shared *sd_share;
+
+	sd_share = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+	if (!sd_share)
+		return false;
+
+	*util = READ_ONCE(sd_share->util_avg);
+	*cap = per_cpu(sd_llc_size, cpu) * SCHED_CAPACITY_SCALE;
+
+	return true;
+}
+
 static int select_cache_cpu(struct task_struct *p, int prev_cpu)
 {
 	struct mm_struct *mm = p->mm;
@@ -10646,6 +10662,42 @@ sched_reduced_capacity(struct rq *rq, struct sched_domain *sd)
 	return check_cpu_capacity(rq, sd);
 }
 
+#ifdef CONFIG_SCHED_CACHE
+/*
+ * Save this sched group's statistics for later use:
+ * task wakeup and load balancing can make better
+ * decisions based on these statistics.
+ */
+static void update_sg_if_llc(struct lb_env *env, struct sg_lb_stats *sgs,
+			     struct sched_group *group)
+{
+	/* Find the sched domain that spans this group.
+	 */
+	struct sched_domain *sd = env->sd->child;
+	struct sched_domain_shared *sd_share;
+
+	if (!sched_feat(SCHED_CACHE) || env->idle == CPU_NEWLY_IDLE)
+		return;
+
+	/* only care about the sched domain that spans 1 LLC */
+	if (!sd || !(sd->flags & SD_SHARE_LLC) ||
+	    !sd->parent || (sd->parent->flags & SD_SHARE_LLC))
+		return;
+
+	sd_share = rcu_dereference(per_cpu(sd_llc_shared,
+				   cpumask_first(sched_group_span(group))));
+	if (!sd_share)
+		return;
+
+	if (likely(READ_ONCE(sd_share->util_avg) != sgs->group_util))
+		WRITE_ONCE(sd_share->util_avg, sgs->group_util);
+}
+#else
+static inline void update_sg_if_llc(struct lb_env *env, struct sg_lb_stats *sgs,
+				    struct sched_group *group)
+{
+}
+#endif
+
 /**
 * update_sg_lb_stats - Update sched_group's statistics for load balancing.
 * @env: The load balancing environment.
@@ -10735,6 +10787,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
 
+	update_sg_if_llc(env, sgs, group);
 	/* Computing avg_load makes sense only when group is overloaded */
 	if (sgs->group_type == group_overloaded)
 		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
-- 
2.32.0

From nobody Thu Oct 9 08:43:05 2025
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen, Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown, linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 07/20] sched: Add helper function to decide whether to allow cache aware scheduling
Date: Wed, 18 Jun 2025 11:27:55 -0700

Cache-aware scheduling is designed to aggregate threads into their preferred LLC, either via the task wakeup path or the load balancing path. One side effect is that when the preferred LLC is saturated, more threads will continue to be stacked on it, degrading the workload's latency. A strategy is needed to prevent this aggregation from going so far that the preferred LLC becomes overloaded.

Introduce the helper function _get_migrate_hint() to implement the LLC migration policy:

1) A task is aggregated into its preferred LLC if both the source and destination LLCs are not too busy (<50% utilization, tunable), or if the move will not leave the preferred LLC too far out of balance with the non-preferred LLC (>20% utilization difference, tunable, close to the imbalance_pct of the LLC domain).

2) A task may be moved from its preferred LLC to a non-preferred one only if the move will neither leave the preferred LLC too idle nor bring the non-preferred LLC within the tunable imbalance margin of the preferred one, which would merely prompt an aggregation migration back to the preferred LLC later.

We are still experimenting with the aggregation and migration policy. Other possibilities are policies based on the LLC's load or on the average number of running tasks; those could be tried out by tweaking _get_migrate_hint(). The function _get_migrate_hint() returns migration suggestions to its upper-level callers.
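To make the two tunables concrete, here is a minimal userspace model of the fits_llc_capacity() and util_greater() checks introduced below; the 8-CPU LLC and the utilization values are illustrative:

#include <stdio.h>

static unsigned int llc_aggr_cap = 50;	/* models sysctl_llc_aggr_cap */
static unsigned int llc_aggr_imb = 20;	/* models sysctl_llc_aggr_imb */

/* LLC is "not too busy" if util is below llc_aggr_cap percent of capacity */
static int fits_llc_capacity(unsigned long util, unsigned long max)
{
	return util * 100 < max * llc_aggr_cap;
}

/* util1 exceeds util2 by more than llc_aggr_imb percent */
static int util_greater(unsigned long util1, unsigned long util2)
{
	return util1 * 100 > util2 * (100 + llc_aggr_imb);
}

int main(void)
{
	unsigned long cap = 8 * 1024;	/* 8-CPU LLC at SCHED_CAPACITY_SCALE */

	printf("%d\n", fits_llc_capacity(3500, cap));	/* 1: 3500 < 4096 (50%) */
	printf("%d\n", fits_llc_capacity(5000, cap));	/* 0: 5000 >= 4096 */
	printf("%d\n", util_greater(5000, 4000));	/* 1: 5000 > 4800 (+20%) */
	printf("%d\n", util_greater(4500, 4000));	/* 0: 4500 <= 4800 */
	return 0;
}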
Co-developed-by: Chen Yu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/debug.c |   4 ++
 kernel/sched/fair.c  | 110 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |   5 ++
 3 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 56ae54e0ce6a..7271ad1152af 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -532,6 +532,10 @@ static __init int sched_init_debug(void)
 	debugfs_create_u32("hot_threshold_ms", 0644, numa, &sysctl_numa_balancing_hot_threshold);
 #endif
 
+#ifdef CONFIG_SCHED_CACHE
+	debugfs_create_u32("llc_aggr_cap", 0644, debugfs_sched, &sysctl_llc_aggr_cap);
+	debugfs_create_u32("llc_aggr_imb", 0644, debugfs_sched, &sysctl_llc_aggr_imb);
+#endif
 	debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops);
 
 	debugfs_fair_server_init();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f104414b9a..10ea408d0e40 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8804,7 +8804,39 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 }
 
 #ifdef CONFIG_SCHED_CACHE
-static long __migrate_degrades_locality(struct task_struct *p, int src_cpu, int dst_cpu, bool idle);
+static long __migrate_degrades_locality(struct task_struct *p,
+					int src_cpu, int dst_cpu,
+					bool idle);
+__read_mostly unsigned int sysctl_llc_aggr_cap = 50;
+__read_mostly unsigned int sysctl_llc_aggr_imb = 20;
+
+/*
+ * The margin used when comparing LLC utilization with CPU capacity.
+ * Parameter sysctl_llc_aggr_cap determines the LLC load level below
+ * which active LLC aggregation is done.
+ * Derived from fits_capacity().
+ *
+ * (default: ~50%)
+ */
+#define fits_llc_capacity(util, max)	\
+	((util) * 100 < (max) * sysctl_llc_aggr_cap)
+
+/*
+ * The margin used when comparing utilization:
+ * is 'util1' noticeably greater than 'util2'?
+ * Derived from capacity_greater(). The bias is in percent:
+ * dst util may exceed src util by up to this percentage.
+ */
+#define util_greater(util1, util2)	\
+	((util1) * 100 > (util2) * (100 + sysctl_llc_aggr_imb))
+
+enum llc_mig_hint {
+	mig_allow = 0,
+	mig_ignore,
+	mig_forbid
+};
+
 
 /* expected to be protected by rcu_read_lock() */
 static bool get_llc_stats(int cpu, unsigned long *util,
@@ -8822,6 +8854,82 @@ static bool get_llc_stats(int cpu, unsigned long *util,
 	return true;
 }
 
+static enum llc_mig_hint _get_migrate_hint(int src_cpu, int dst_cpu,
+					   unsigned long tsk_util,
+					   bool to_pref)
+{
+	unsigned long src_util, dst_util, src_cap, dst_cap;
+
+	if (cpus_share_cache(src_cpu, dst_cpu))
+		return mig_allow;
+
+	if (!get_llc_stats(src_cpu, &src_util, &src_cap) ||
+	    !get_llc_stats(dst_cpu, &dst_util, &dst_cap))
+		return mig_allow;
+
+	if (!fits_llc_capacity(dst_util, dst_cap) &&
+	    !fits_llc_capacity(src_util, src_cap))
+		return mig_ignore;
+
+	src_util = src_util < tsk_util ? 0 : src_util - tsk_util;
+	dst_util = dst_util + tsk_util;
+	if (to_pref) {
+		/*
+		 * sysctl_llc_aggr_imb is the imbalance allowed between
+		 * the preferred LLC and the non-preferred LLC.
+		 * Don't migrate if this would leave the preferred LLC
+		 * too heavily loaded, with the dest much busier than
+		 * the src, in which case the migration would increase
+		 * the imbalance too much.
+		 */
+		if (!fits_llc_capacity(dst_util, dst_cap) &&
+		    util_greater(dst_util, src_util))
+			return mig_forbid;
+	} else {
+		/*
+		 * Don't migrate if this would leave the preferred LLC
+		 * too idle, or if it would bring the non-preferred LLC
+		 * within sysctl_llc_aggr_imb percent of the preferred
+		 * LLC, prompting a migration back to the preferred LLC
+		 * later.
+		 */
+		if (fits_llc_capacity(src_util, src_cap) ||
+		    !util_greater(src_util, dst_util))
+			return mig_forbid;
+	}
+	return mig_allow;
+}
+
+/*
+ * Give a suggestion when task p is migrated from src_cpu to dst_cpu.
+ */
+static __maybe_unused enum llc_mig_hint get_migrate_hint(int src_cpu, int dst_cpu,
+							 struct task_struct *p)
+{
+	struct mm_struct *mm;
+	int cpu;
+
+	if (cpus_share_cache(src_cpu, dst_cpu))
+		return mig_allow;
+
+	mm = p->mm;
+	if (!mm)
+		return mig_allow;
+
+	cpu = mm->mm_sched_cpu;
+	if (cpu < 0)
+		return mig_allow;
+
+	if (cpus_share_cache(dst_cpu, cpu))
+		return _get_migrate_hint(src_cpu, dst_cpu,
+					 task_util(p), true);
+	else if (cpus_share_cache(src_cpu, cpu))
+		return _get_migrate_hint(src_cpu, dst_cpu,
+					 task_util(p), false);
+	else
+		return mig_allow;
+}
+
 static int select_cache_cpu(struct task_struct *p, int prev_cpu)
 {
 	struct mm_struct *mm = p->mm;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d16ccd66ca07..1c6fd45c7f62 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2818,6 +2818,11 @@ extern unsigned int sysctl_numa_balancing_scan_period_max;
 extern unsigned int sysctl_numa_balancing_scan_size;
 extern unsigned int sysctl_numa_balancing_hot_threshold;
 
+#ifdef CONFIG_SCHED_CACHE
+extern unsigned int sysctl_llc_aggr_cap;
+extern unsigned int sysctl_llc_aggr_imb;
+#endif
+
 #ifdef CONFIG_SCHED_HRTICK
 
 /*
-- 
2.32.0
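A sketch (not part of the patch) of how a caller might act on the three hints; treating mig_ignore like an ordinary migration is an assumption drawn from the changelog, where both-LLCs-saturated cases fall back to normal load balancing:

#include <stdio.h>

enum llc_mig_hint { mig_allow = 0, mig_ignore, mig_forbid };

/* Hypothetical consumer of the hint: 1 = may migrate, 0 = keep the task put */
static int may_migrate(enum llc_mig_hint hint)
{
	switch (hint) {
	case mig_forbid:
		return 0;	/* would over- or under-load the preferred LLC */
	case mig_ignore:	/* both LLCs saturated: defer to normal balancing */
	case mig_allow:
	default:
		return 1;
	}
}

int main(void)
{
	printf("%d %d %d\n", may_migrate(mig_allow),
	       may_migrate(mig_ignore), may_migrate(mig_forbid));
	return 0;
}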
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen, Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown, linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 08/20] sched: Set up LLC indexing
Date: Wed, 18 Jun 2025 11:27:56 -0700
Message-Id: <71e251a086be786fb2a0480bbab69142d14bd22d.1750268218.git.tim.c.chen@linux.intel.com>

Prepare for indexing the arrays that track, in each run queue, the number of tasks preferring the current LLC and each of the other LLCs.

An LLC index is introduced because per-LLC-scope data is needed for cache-aware load balancing. However, the native llc_id is usually the first CPU of that LLC domain, so llc_id values are not contiguous and would waste space if used directly to index an array (as the current implementation does). In the future, this LLC index could be removed once the native llc_id is used as the key into an xarray-based structure.

Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 include/linux/sched.h   |  3 +++
 kernel/sched/fair.c     | 12 ++++++++++++
 kernel/sched/sched.h    |  2 ++
 kernel/sched/topology.c | 29 +++++++++++++++++++++++++++++
 4 files changed, 46 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d0e4cda2b3cd..7ce95a32e9ff 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -810,6 +810,9 @@ struct kmap_ctrl {
 #endif
 };
 
+/* XXX need fix to not use magic number */
+#define MAX_LLC 64
+
 struct task_struct {
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 	/*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10ea408d0e40..5549710d95cf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1183,6 +1183,18 @@ static int llc_id(int cpu)
 	return per_cpu(sd_llc_id, cpu);
 }
 
+/*
+ * continuous index.
+ * TBD: replace by xarray with key llc_id()
+ */
+static inline int llc_idx(int cpu)
+{
+	if (cpu < 0)
+		return -1;
+
+	return per_cpu(sd_llc_idx, cpu);
+}
+
 void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
 {
 	unsigned long epoch;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1c6fd45c7f62..74eb2f3615aa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2037,6 +2037,7 @@ static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
+DECLARE_PER_CPU(int, sd_llc_idx);
 DECLARE_PER_CPU(int, sd_share_id);
 DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
@@ -2045,6 +2046,7 @@ DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 
 extern struct static_key_false sched_asym_cpucapacity;
 extern struct static_key_false sched_cluster_active;
+extern int max_llcs;
 
 static __always_inline bool sched_asym_cpucap_active(void)
 {
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index f1ebc60d967f..b7bb13045dd8 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -672,6 +672,7 @@ static void destroy_sched_domains(struct sched_domain *sd)
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DEFINE_PER_CPU(int, sd_llc_size);
 DEFINE_PER_CPU(int, sd_llc_id);
+DEFINE_PER_CPU(int, sd_llc_idx);
 DEFINE_PER_CPU(int, sd_share_id);
 DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
@@ -681,6 +682,25 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
 DEFINE_STATIC_KEY_FALSE(sched_cluster_active);
 
+int max_llcs = -1;
+
+static void update_llc_idx(int cpu)
+{
+#ifdef CONFIG_SCHED_CACHE
+	int idx = -1, llc_id = -1;
+
+	llc_id = per_cpu(sd_llc_id, cpu);
+	idx = per_cpu(sd_llc_idx, llc_id);
+
+	if (idx < 0) {
+		idx = max_llcs++;
+		BUG_ON(idx >= MAX_LLC);
+		per_cpu(sd_llc_idx, llc_id) = idx;
+	}
+	per_cpu(sd_llc_idx, cpu) = idx;
+#endif
+}
+
 static void update_top_cache_domain(int cpu)
 {
 	struct sched_domain_shared *sds = NULL;
@@ -699,6 +719,7 @@ static void update_top_cache_domain(int cpu)
 	per_cpu(sd_llc_size, cpu) = size;
 	per_cpu(sd_llc_id, cpu) = id;
 	rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
+	update_llc_idx(cpu);
 
 	sd = lowest_flag_domain(cpu, SD_CLUSTER);
 	if (sd)
@@ -2394,6 +2415,14 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	bool has_asym = false;
 	bool has_cluster = false;
 
+#ifdef CONFIG_SCHED_CACHE
+	if (max_llcs < 0) {
+		for_each_possible_cpu(i)
+			per_cpu(sd_llc_idx, i) = -1;
+		max_llcs = 0;
+	}
+#endif
+
 	if (WARN_ON(cpumask_empty(cpu_map)))
 		goto error;
 
-- 
2.32.0
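The effect of update_llc_idx() can be modeled in userspace: sparse llc_id values (the first CPU of each LLC) are compacted into a dense index. The 16-CPU, 4-CPUs-per-LLC layout below is an assumption for illustration:

#include <stdio.h>

#define NR_CPUS 16

static int llc_id[NR_CPUS];	/* models per_cpu(sd_llc_id, cpu) */
static int llc_idx[NR_CPUS];	/* models per_cpu(sd_llc_idx, cpu) */
static int max_llcs;

static void assign_idx(int cpu)
{
	int id = llc_id[cpu];

	if (llc_idx[id] < 0)		/* first CPU seen in this LLC */
		llc_idx[id] = max_llcs++;
	llc_idx[cpu] = llc_idx[id];
}

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		llc_id[cpu] = cpu / 4 * 4;	/* sparse ids: 0, 4, 8, 12 */
		llc_idx[cpu] = -1;
	}
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		assign_idx(cpu);
	for (cpu = 0; cpu < NR_CPUS; cpu++)	/* dense indices: 0, 1, 2, 3 */
		printf("cpu%-2d llc_id=%-2d llc_idx=%d\n",
		       cpu, llc_id[cpu], llc_idx[cpu]);
	return 0;
}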
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen, Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown, linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 09/20] sched: Introduce task preferred LLC field
Date: Wed, 18 Jun 2025 11:27:57 -0700
Message-Id: <7b9df4433d73ce415a27925ce42cec53774debaf.1750268218.git.tim.c.chen@linux.intel.com>

With cache-aware scheduling enabled, each process is assigned a preferred LLC id, which is used to quickly identify the LLC domain that its threads prefer to run in. This is similar to numa_preferred_nid for NUMA balancing.
Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 include/linux/sched.h | 1 +
 init/init_task.c      | 3 +++
 kernel/sched/fair.c   | 7 +++++++
 3 files changed, 11 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7ce95a32e9ff..2f1cb7445733 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1404,6 +1404,7 @@ struct task_struct {
 
 #ifdef CONFIG_SCHED_CACHE
 	struct callback_head		cache_work;
+	int				preferred_llc;
 #endif
 
 #ifdef CONFIG_RSEQ
diff --git a/init/init_task.c b/init/init_task.c
index e557f622bd90..5fffbe766f57 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -188,6 +188,9 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	.numa_group	= NULL,
 	.numa_faults	= NULL,
 #endif
+#ifdef CONFIG_SCHED_CACHE
+	.preferred_llc	= -1,
+#endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	.kasan_depth	= 1,
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5549710d95cf..cc804a8c7061 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1267,6 +1267,7 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	struct mm_struct *mm = p->mm;
 	struct mm_sched *pcpu_sched;
 	unsigned long epoch;
+	int mm_sched_llc = -1;
 
 	/*
 	 * init_task and kthreads don't be having no mm
@@ -1293,6 +1294,12 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 		mm->mm_sched_cpu = -1;
 		pcpu_sched->occ = 0;
 	}
+
+	if (mm->mm_sched_cpu != -1)
+		mm_sched_llc = per_cpu(sd_llc_id, mm->mm_sched_cpu);
+
+	if (p->preferred_llc != mm_sched_llc)
+		p->preferred_llc = mm_sched_llc;
 }
 
 static void task_tick_cache(struct rq *rq, struct task_struct *p)
-- 
2.32.0
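A tiny model of the derivation above: the preferred LLC id is just sd_llc_id of the mm's hottest CPU, or -1 when there is none. The 8-CPU, 2-LLC layout is illustrative:

#include <stdio.h>

/* Models per_cpu(sd_llc_id, cpu) on a 2-LLC, 8-CPU system: {0,0,0,0,4,4,4,4} */
static int sd_llc_id(int cpu)
{
	return cpu < 4 ? 0 : 4;
}

int main(void)
{
	int mm_sched_cpu = 6;	/* hottest CPU for this mm, per the occupancy tracking */
	int preferred_llc = mm_sched_cpu >= 0 ? sd_llc_id(mm_sched_cpu) : -1;

	printf("preferred_llc = %d\n", preferred_llc);	/* 4 */
	return 0;
}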
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen, Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown, linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 10/20] sched: Calculate the number of tasks that have LLC preference on a runqueue
Date: Wed, 18 Jun 2025 11:27:58 -0700
Message-Id: <0664be8a3e805ed93eb930131951b1a84cebed66.1750268218.git.tim.c.chen@linux.intel.com>

Track, for each run queue, the number of tasks that have an LLC preference and how many of those tasks are running in their preferred LLC. This is similar to nr_numa_running and nr_preferred_running for NUMA balancing, and will be used by cache-aware load balancing in subsequent patches.
Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/core.c  | 12 ++++++++++++
 kernel/sched/fair.c  | 42 +++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |  7 +++++++
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d9c3e75f79d1..34056eb79ef2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -498,6 +498,18 @@ void __trace_set_current_state(int state_value)
 }
 EXPORT_SYMBOL(__trace_set_current_state);
 
+#ifdef CONFIG_SMP
+int task_llc(const struct task_struct *p)
+{
+	return per_cpu(sd_llc_id, task_cpu(p));
+}
+#else
+int task_llc(const struct task_struct *p)
+{
+	return 0;
+}
+#endif
+
 /*
  * Serialization rules:
  *
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cc804a8c7061..88ff47194faa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1195,6 +1195,18 @@ static inline int llc_idx(int cpu)
 	return per_cpu(sd_llc_idx, cpu);
 }
 
+static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
+{
+	rq->nr_llc_running += (p->preferred_llc != -1);
+	rq->nr_pref_llc_running += (p->preferred_llc == task_llc(p));
+}
+
+static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
+{
+	rq->nr_llc_running -= (p->preferred_llc != -1);
+	rq->nr_pref_llc_running -= (p->preferred_llc == task_llc(p));
+}
+
 void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
 {
 	unsigned long epoch;
@@ -1298,8 +1310,11 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	if (mm->mm_sched_cpu != -1)
 		mm_sched_llc = per_cpu(sd_llc_id, mm->mm_sched_cpu);
 
-	if (p->preferred_llc != mm_sched_llc)
+	if (p->preferred_llc != mm_sched_llc) {
+		account_llc_dequeue(rq, p);
 		p->preferred_llc = mm_sched_llc;
+		account_llc_enqueue(rq, p);
+	}
 }
 
 static void task_tick_cache(struct rq *rq, struct task_struct *p)
@@ -1400,6 +1415,14 @@ void init_sched_mm(struct task_struct *p)
 	work->next = work;
 }
 
+void reset_llc_stats(struct rq *rq)
+{
+	if (rq->nr_llc_running)
+		rq->nr_llc_running = 0;
+
+	rq->nr_pref_llc_running = 0;
+}
+
 #else
 
 static inline void account_mm_sched(struct rq *rq, struct task_struct *p,
@@ -1410,6 +1433,17 @@ void init_sched_mm(struct task_struct *p) { }
 
 static void task_tick_cache(struct rq *rq, struct task_struct *p) { }
 
+static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
+{
+}
+
+static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
+{
+}
+
+void reset_llc_stats(struct rq *rq)
+{
+}
 #endif
 
 static inline
@@ -3939,6 +3973,7 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		struct rq *rq = rq_of(cfs_rq);
 
 		account_numa_enqueue(rq, task_of(se));
+		account_llc_enqueue(rq, task_of(se));
 		list_add(&se->group_node, &rq->cfs_tasks);
 	}
 #endif
@@ -3952,10 +3987,15 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 #ifdef CONFIG_SMP
 	if (entity_is_task(se)) {
 		account_numa_dequeue(rq_of(cfs_rq), task_of(se));
+		account_llc_dequeue(rq_of(cfs_rq), task_of(se));
 		list_del_init(&se->group_node);
 	}
 #endif
 	cfs_rq->nr_queued--;
+
+	/* safeguard? */
+	if (!parent_entity(se) && !cfs_rq->nr_queued)
+		reset_llc_stats(rq_of(cfs_rq));
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 74eb2f3615aa..6c83a71ac8ca 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1104,6 +1104,10 @@ struct rq {
 	unsigned int		nr_preferred_running;
 	unsigned int		numa_migrate_on;
 #endif
+#ifdef CONFIG_SCHED_CACHE
+	unsigned int		nr_pref_llc_running;
+	unsigned int		nr_llc_running;
+#endif
 #ifdef CONFIG_NO_HZ_COMMON
 #ifdef CONFIG_SMP
 	unsigned long		last_blocked_load_update_tick;
@@ -1948,6 +1952,9 @@ init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 
 #endif /* !CONFIG_NUMA_BALANCING */
 
+extern void reset_llc_stats(struct rq *rq);
+extern int task_llc(const struct task_struct *p);
+
 #ifdef CONFIG_SMP
 
 static inline void
-- 
2.32.0
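The two counters obey simple bookkeeping invariants; a standalone model with three queued tasks (the LLC ids and preferences are illustrative):

#include <assert.h>
#include <stdio.h>

struct rq_stats {
	unsigned int nr_llc_running;		/* tasks with any LLC preference */
	unsigned int nr_pref_llc_running;	/* tasks already in their preferred LLC */
};

static void account_enqueue(struct rq_stats *rq, int pref_llc, int task_llc)
{
	rq->nr_llc_running += (pref_llc != -1);
	rq->nr_pref_llc_running += (pref_llc == task_llc);
}

static void account_dequeue(struct rq_stats *rq, int pref_llc, int task_llc)
{
	rq->nr_llc_running -= (pref_llc != -1);
	rq->nr_pref_llc_running -= (pref_llc == task_llc);
}

int main(void)
{
	struct rq_stats rq = { 0, 0 };

	account_enqueue(&rq, 4, 4);	/* prefers LLC4, runs on LLC4 */
	account_enqueue(&rq, 4, 0);	/* prefers LLC4, runs on LLC0 */
	account_enqueue(&rq, -1, 0);	/* no preference */
	printf("%u %u\n", rq.nr_llc_running, rq.nr_pref_llc_running);	/* 2 1 */
	account_dequeue(&rq, 4, 0);
	assert(rq.nr_llc_running == 1 && rq.nr_pref_llc_running == 1);
	return 0;
}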
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen, Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown, linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 11/20] sched: Introduce per runqueue task LLC preference counter
Date: Wed, 18 Jun 2025 11:27:59 -0700
Message-Id: <5334cbd97788ba58938444f6e6f07e6c433a9e1c.1750268218.git.tim.c.chen@linux.intel.com>

Each runqueue is assigned a static array, where each element indicates the number of tasks preferring a particular LLC mapped to the array index. For example, rq->nr_pref_llc[3] = 2 signifies that there are 2 tasks on this runqueue which prefer to run within LLC3 (LLC indices run from 0 to MAX_LLC - 1 across the entire system).

With this information, the load balancer can make better decisions to select the busiest runqueue and migrate tasks to their preferred LLC domains.

Note: the static array could be converted to an xarray in the future.

Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c  | 36 +++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 88ff47194faa..ba62b445bbbb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1195,16 +1195,45 @@ static inline int llc_idx(int cpu)
 	return per_cpu(sd_llc_idx, cpu);
 }
 
+static inline int pref_llc_idx(struct task_struct *p)
+{
+	return llc_idx(p->preferred_llc);
+}
+
 static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
 {
+	int pref_llc;
+
 	rq->nr_llc_running += (p->preferred_llc != -1);
 	rq->nr_pref_llc_running += (p->preferred_llc == task_llc(p));
+
+	if (p->preferred_llc < 0)
+		return;
+
+	pref_llc = pref_llc_idx(p);
+	if (pref_llc < 0)
+		return;
+
+	++rq->nr_pref_llc[pref_llc];
 }
 
 static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
 {
+	int pref_llc;
+
 	rq->nr_llc_running -= (p->preferred_llc != -1);
 	rq->nr_pref_llc_running -= (p->preferred_llc == task_llc(p));
+
+	if (p->preferred_llc < 0)
+		return;
+
+	pref_llc = pref_llc_idx(p);
+	if (pref_llc < 0)
+		return;
+
+	/* avoid negative counter */
+	if (rq->nr_pref_llc[pref_llc] > 0)
+		--rq->nr_pref_llc[pref_llc];
 }
 
 void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
@@ -1417,8 +1446,13 @@ void init_sched_mm(struct task_struct *p)
 
 void reset_llc_stats(struct rq *rq)
 {
-	if (rq->nr_llc_running)
+	int i;
+
+	if (rq->nr_llc_running) {
+		for (i = 0; i < MAX_LLC; ++i)
+			rq->nr_pref_llc[i] = 0;
 		rq->nr_llc_running = 0;
+	}
 
 	rq->nr_pref_llc_running = 0;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6c83a71ac8ca..391ddc0195f8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1107,6 +1107,7 @@ struct rq {
 #ifdef CONFIG_SCHED_CACHE
 	unsigned int		nr_pref_llc_running;
 	unsigned int		nr_llc_running;
+	unsigned int		nr_pref_llc[MAX_LLC];
 #endif
 #ifdef CONFIG_NO_HZ_COMMON
 #ifdef CONFIG_SMP
-- 
2.32.0
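A compact model of the per-runqueue array semantics, reproducing the rq->nr_pref_llc[3] = 2 example from the changelog (task preferences are illustrative):

#include <stdio.h>

#define MAX_LLC 64

/* Models rq->nr_pref_llc[]: count of queued tasks preferring each LLC index */
int main(void)
{
	unsigned int nr_pref_llc[MAX_LLC] = { 0 };
	int task_pref_idx[] = { 3, 3, 1 };	/* three tasks' preferred LLC indices */
	unsigned int i;

	for (i = 0; i < sizeof(task_pref_idx) / sizeof(task_pref_idx[0]); i++)
		nr_pref_llc[task_pref_idx[i]]++;

	printf("nr_pref_llc[3] = %u\n", nr_pref_llc[3]);	/* 2, as in the changelog */
	printf("nr_pref_llc[1] = %u\n", nr_pref_llc[1]);	/* 1 */
	return 0;
}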
Shenoy" Cc: Tim Chen , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Tim Chen , Vincent Guittot , Libo Chen , Abel Wu , Madadi Vineeth Reddy , Hillf Danton , Len Brown , linux-kernel@vger.kernel.org, Chen Yu Subject: [RFC patch v3 12/20] sched: Calculate the total number of preferred LLC tasks during load balance Date: Wed, 18 Jun 2025 11:28:00 -0700 Message-Id: <4a37811c12bbca8cb669904ad67dad3b7e99a552.1750268218.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" During load balancing between LLCs, gather the number of tasks on each runqueue of a source LLC. For example, consider a system with 4 sched groups LLC0, LLC1, ..., LLC3. We are balancing towards LLC3 and LLC0 has 3 tasks preferring LLC3, LLC1 has 2 tasks preferring LLC3 and LLC2 has 1 task preferring LLC3. LLC0 with most tasks preferring LLC3 will be chosen as the busiest LLC to pick the tasks from. The number of tasks preferring the destination LLC are gathered from each run queue for a source LLC. For example, consider the sched_group LLC0 with two CPUs, CPU0 and CPU1. On CPU0, 2 tasks prefer to run on LLC3, and on CPU1, one task prefers LLC3. The total number of tasks preferring LLC3 in LLC0 is 2 + 1 =3D 3. These statistics enable the load balancer to select tasks from a sched_group that best aligns tasks with their preferred LLCs. Signed-off-by: Tim Chen Tested-by: Yangyu Chen --- kernel/sched/fair.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ba62b445bbbb..99f3cee7b276 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10459,6 +10459,9 @@ struct sg_lb_stats { unsigned int nr_numa_running; unsigned int nr_preferred_running; #endif +#ifdef CONFIG_SCHED_CACHE + unsigned int nr_pref_llc[MAX_LLC]; +#endif }; =20 /* @@ -10937,6 +10940,14 @@ static inline void update_sg_lb_stats(struct lb_en= v *env, if (cpu_overutilized(i)) *sg_overutilized =3D 1; =20 +#ifdef CONFIG_SCHED_CACHE + if (sched_feat(SCHED_CACHE)) { + int j; + + for (j =3D 0; j < max_llcs; ++j) + sgs->nr_pref_llc[j] +=3D rq->nr_pref_llc[j]; + } +#endif /* * No need to call idle_cpu() if nr_running is not 0 */ --=20 2.32.0 From nobody Thu Oct 9 08:43:05 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 514DE2FC01C for ; Wed, 18 Jun 2025 18:21:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.12 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750270919; cv=none; b=nmqdaWDatMrhBkfjuY3zJis51UO9eAa5aRb1rJdWVySUjW3tfYRxyj1Xkvi+fNpajS95RQl1kNM/Uc2yZ/0qy4Yr0n5zWNCB62WmrDP+LPoiGxGjwroeiGueYQuwTtAOG6KXPOSjIfKn4GM4dEwjzo3+VttU3Mxq2/vSBP1gjkQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750270919; c=relaxed/simple; bh=qbjVDBu0+RR9cVBkMVV/EEaCntO2T94kbrD7rZkO8yA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=XnMghoHFlyLEgu+Wh96kFEDjFxOvIdG1kYivt+ooFQzL7JHqy7Y2tRCCgBmmjgXcODTYXXNN9TLYbc2t4TSsUmKPzAY7GwTWviiMJPDQpqTfLl+bgoY1YdlK7e1ynWuUJ9NxwRUCfO0asQkBgDzntM+cRVZ3lV7tz/MiGA6JuHE= 
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen, Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown, linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 13/20] sched: Tag the sched group as llc_balance if it has tasks prefer other LLC
Date: Wed, 18 Jun 2025 11:28:01 -0700
Message-Id: <936c261e6283b8fa8c2d7e60493721f6594ce176.1750268218.git.tim.c.chen@linux.intel.com>

During load balancing between LLCs, check whether there are tasks preferring the destination LLC. If so, balance those tasks to the destination LLC first. Tag a sched_group that has tasks preferring to run on other (non-local) LLCs with the group_llc_balance flag, so that the load balancer will later attempt to pull/push these tasks to their preferred LLCs.

Co-developed-by: Chen Yu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 99f3cee7b276..48a090c6e885 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10454,6 +10454,7 @@ struct sg_lb_stats {
 	enum group_type group_type;
 	unsigned int group_asym_packing;	/* Tasks should be moved to preferred CPU */
 	unsigned int group_smt_balance;		/* Task on busy SMT be moved */
+	unsigned int group_llc_balance;		/* Tasks should be moved to preferred LLC */
 	unsigned long group_misfit_task_load;	/* A CPU has a task too big for its capacity */
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
@@ -10818,6 +10819,43 @@ static inline bool smt_balance(struct lb_env *env, struct sg_lb_stats *sgs,
 	return false;
 }
 
+/*
+ * Do LLC balance on a sched group that spans LLCs and has tasks preferring
+ * to run on the idle dst_cpu's LLC.
+ */
+#ifdef CONFIG_SCHED_CACHE
+static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
+			       struct sched_group *group)
+{
+	struct sched_domain *child = env->sd->child;
+	int llc;
+
+	if (!sched_feat(SCHED_CACHE))
+		return false;
+
+	if (env->sd->flags & SD_SHARE_LLC)
+		return false;
+
+	/* only care about task migration among LLCs */
+	if (child && !(child->flags & SD_SHARE_LLC))
+		return false;
+
+	llc = llc_idx(env->dst_cpu);
+	if (sgs->nr_pref_llc[llc] > 0 &&
+	    _get_migrate_hint(env->src_cpu, env->dst_cpu,
+			      0, true) == mig_allow)
+		return true;
+
+	return false;
+}
+#else
+static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
+			       struct sched_group *group)
+{
+	return false;
+}
+#endif
+
 static inline long sibling_imbalance(struct lb_env *env,
 				     struct sd_lb_stats *sds,
 				     struct sg_lb_stats *busiest,
@@ -11000,6 +11038,11 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
 
 	update_sg_if_llc(env, sgs, group);
+
+	/* Check if tasks in this group can be moved to their preferred LLC */
+	if (!local_group && llc_balance(env, sgs, group))
+		sgs->group_llc_balance = 1;
+
 	/* Computing avg_load makes sense only when group is overloaded */
 	if (sgs->group_type == group_overloaded)
 		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
-- 
2.32.0
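A standalone model of the llc_balance() gate; the boolean inputs stand in for the sched_domain flag tests, a simplification made here for illustration rather than taken from the patch:

#include <stdio.h>

/*
 * Balance at a domain above the LLC level, when the child level spans one
 * LLC and at least one task in the candidate group prefers the dst LLC.
 */
static int llc_balance(int sd_share_llc, int child_share_llc,
		       unsigned int nr_pref_dst, int hint_allows)
{
	if (sd_share_llc)		/* still inside one LLC: nothing to do */
		return 0;
	if (!child_share_llc)		/* child must span exactly one LLC */
		return 0;
	return nr_pref_dst > 0 && hint_allows;
}

int main(void)
{
	printf("%d\n", llc_balance(0, 1, 3, 1));	/* 1: tag group_llc_balance */
	printf("%d\n", llc_balance(0, 1, 0, 1));	/* 0: nobody prefers dst LLC */
	return 0;
}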
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen, Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown, linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 14/20] sched: Introduce update_llc_busiest() to deal with groups having preferred LLC tasks
Date: Wed, 18 Jun 2025 11:28:02 -0700

The load balancer attempts to identify the busiest sched_group with the highest load and migrates some tasks to a less busy sched_group to distribute the load across different CPUs. When cache-aware scheduling is enabled, the busiest sched_group is instead defined as the one with the highest number of tasks preferring to run on the destination LLC. If the busiest group carries the group_llc_balance tag, cache-aware load balancing is performed.

Introduce the helper function update_llc_busiest() to identify the sched_group with the most tasks preferring the destination LLC.

Co-developed-by: Chen Yu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 48a090c6e885..ab3d1239d6e4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10848,12 +10848,36 @@ static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
 
 	return false;
 }
+
+static bool update_llc_busiest(struct lb_env *env,
+			       struct sg_lb_stats *busiest,
+			       struct sg_lb_stats *sgs)
+{
+	int idx;
+
+	/* Only candidates with llc_balance set need to be taken care of */
+	if (!sgs->group_llc_balance)
+		return false;
+
+	/*
+	 * There are more tasks that want to run on dst_cpu's LLC.
+	 */
+	idx = llc_idx(env->dst_cpu);
+	return sgs->nr_pref_llc[idx] > busiest->nr_pref_llc[idx];
+}
 #else
 static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
 			       struct sched_group *group)
 {
 	return false;
 }
+
+static bool update_llc_busiest(struct lb_env *env,
+			       struct sg_lb_stats *busiest,
+			       struct sg_lb_stats *sgs)
+{
+	return false;
+}
 #endif
 
 static inline long sibling_imbalance(struct lb_env *env,
@@ -11085,6 +11109,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	    sds->local_stat.group_type != group_has_spare))
 		return false;
 
+	/* Deal with prefer-LLC load balance first; if that fails, fall back to normal load balance */
+	if (update_llc_busiest(env, busiest, sgs))
+		return true;
+
+	/* if there is already a busy group, skip the normal load balance */
+	if (busiest->group_llc_balance)
+		return false;
+
 	if (sgs->group_type > busiest->group_type)
 		return true;
 
@@ -11991,9 +12023,11 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
 	/*
 	 * Try to move all excess tasks to a sibling domain of the busiest
 	 * group's child domain.
	 * Also do so if we can move some tasks that prefer the local LLC.
 	 */
 	if (sds.prefer_sibling && local->group_type == group_has_spare &&
-	    sibling_imbalance(env, &sds, busiest, local) > 1)
+	    (busiest->group_llc_balance ||
+	     sibling_imbalance(env, &sds, busiest, local) > 1))
 		goto force_balance;
 
 	if (busiest->group_type != group_overloaded) {
-- 
2.32.0
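update_llc_busiest() reduces to an argmax over candidate groups of the tasks-preferring-dst count; a standalone model using the counts from patch 12's example:

#include <stdio.h>

/* Pick the group with the most tasks preferring the destination LLC */
int main(void)
{
	unsigned int nr_pref_dst[3] = { 3, 2, 1 };	/* LLC0..LLC2 */
	unsigned int best = 0, i, busiest = 0;

	for (i = 0; i < 3; i++) {
		if (nr_pref_dst[i] > best) {	/* models update_llc_busiest() */
			best = nr_pref_dst[i];
			busiest = i;
		}
	}
	printf("busiest group: LLC%u (%u tasks prefer dst)\n", busiest, best);
	return 0;
}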
Shenoy" Cc: Tim Chen , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Tim Chen , Vincent Guittot , Libo Chen , Abel Wu , Madadi Vineeth Reddy , Hillf Danton , Len Brown , linux-kernel@vger.kernel.org, Chen Yu Subject: [RFC patch v3 15/20] sched: Introduce a new migration_type to track the preferred LLC load balance Date: Wed, 18 Jun 2025 11:28:03 -0700 Message-Id: <5b9c5a9ddb5b8b16ad20fbba9d41288de95741bc.1750268218.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce a new migration type named migrate_llc_task to facilitate cache-aware load balancing. After the busiest sched_group is identified as the one that needs migration due to having most tasks preferring destination LLC, tag the migration type as the newly introduced migrate_llc_task. During load balancing, each runqueue within the busiest preferred-LLC sched_group is checked, and the runqueue with the highest number of tasks preferring to run on the destination CPU is chosen as the busiest runqueue. Signed-off-by: Tim Chen Tested-by: Yangyu Chen --- kernel/sched/fair.c | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ab3d1239d6e4..42222364ad9c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9711,7 +9711,8 @@ enum migration_type { migrate_load =3D 0, migrate_util, migrate_task, - migrate_misfit + migrate_misfit, + migrate_llc_task }; =20 #define LBF_ALL_PINNED 0x01 @@ -10143,6 +10144,15 @@ static int detach_tasks(struct lb_env *env) env->imbalance -=3D util; break; =20 + case migrate_llc_task: + /* + * Since can_migrate_task() succeed, when we reach here, it means that p + * can be migrated even if dst_cpu is not p's preferred_llc, because th= ere + * are no idle cores for p to do in-llc load balance. 
+			 */
+			env->imbalance--;
+			break;
+
 		case migrate_task:
 			env->imbalance--;
 			break;
@@ -11779,6 +11789,15 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 		return;
 	}
 
+#ifdef CONFIG_SCHED_CACHE
+	if (busiest->group_llc_balance) {
+		/* Move a task that prefers the local LLC */
+		env->migration_type = migrate_llc_task;
+		env->imbalance = 1;
+		return;
+	}
+#endif
+
 	if (busiest->group_type == group_imbalanced) {
 		/*
 		 * In the group_imb case we cannot rely on group-wide averages
@@ -12087,6 +12106,10 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 	struct rq *busiest = NULL, *rq;
 	unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1;
 	unsigned int busiest_nr = 0;
+#ifdef CONFIG_SCHED_CACHE
+	unsigned int busiest_pref_llc = 0;
+	int dst_llc;
+#endif
 	int i;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
@@ -12195,6 +12218,16 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 			}
 			break;
 
+		case migrate_llc_task:
+#ifdef CONFIG_SCHED_CACHE
+			dst_llc = llc_idx(env->dst_cpu);
+			if (!cpus_share_cache(env->dst_cpu, rq->cpu) &&
+			    busiest_pref_llc < rq->nr_pref_llc[dst_llc]) {
+				busiest_pref_llc = rq->nr_pref_llc[dst_llc];
+				busiest = rq;
+			}
+#endif
+			break;
 		case migrate_task:
 			if (busiest_nr < nr_running) {
 				busiest_nr = nr_running;
@@ -12377,6 +12410,8 @@ static void update_lb_imbalance_stat(struct lb_env *env, struct sched_domain *sd,
 	case migrate_misfit:
 		__schedstat_add(sd->lb_imbalance_misfit[idle], env->imbalance);
 		break;
+	case migrate_llc_task:
+		break;
 	}
 }
-- 
2.32.0
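The source-runqueue choice above follows the same pattern one level
down. A stand-alone sketch under the same caveats (illustrative names,
not the kernel code): among CPUs in the busiest group that do not share
a cache with dst_cpu, pick the runqueue holding the most tasks that
prefer dst_cpu's LLC.

#include <stdio.h>

#define NR_CPUS 4

struct rq_info {
	int llc;		/* LLC id of this CPU */
	int nr_pref_dst;	/* tasks here preferring the destination LLC */
};

static int find_src_cpu(const struct rq_info *rq, int dst_llc)
{
	int best = -1, best_pref = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (rq[cpu].llc == dst_llc)	/* shares cache with dst */
			continue;
		if (rq[cpu].nr_pref_dst > best_pref) {
			best_pref = rq[cpu].nr_pref_dst;
			best = cpu;
		}
	}
	return best;
}

int main(void)
{
	struct rq_info rqs[NR_CPUS] = {
		{ .llc = 0, .nr_pref_dst = 3 },	/* same LLC as dst: skipped */
		{ .llc = 1, .nr_pref_dst = 1 },
		{ .llc = 1, .nr_pref_dst = 4 },	/* chosen */
		{ .llc = 1, .nr_pref_dst = 0 },
	};

	printf("source cpu: %d\n", find_src_cpu(rqs, 0));
	return 0;
}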
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen,
    Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown,
    linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 16/20] sched: Consider LLC locality for active balance
Date: Wed, 18 Jun 2025 11:28:04 -0700
Message-Id: <1ce821178bf178ce841ea94bb8139fd9a197b86b.1750268218.git.tim.c.chen@linux.intel.com>

If the busiest run queue has only one task, active balancing is
enlisted to actually move that task. Before moving it, however,
consider whether we would be pulling it away from its preferred LLC.

Don't move a run queue's single running task to another LLC if that
would take it off its desired LLC, or if the move would cause too much
imbalance between the LLCs.

Co-developed-by: Chen Yu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c | 51 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 42222364ad9c..3a8f6fc52055 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12294,10 +12294,43 @@ imbalanced_active_balance(struct lb_env *env)
 	return 0;
 }
 
+#ifdef CONFIG_SCHED_CACHE
+static inline bool
+break_llc_locality(struct lb_env *env)
+{
+	if (!sched_feat(SCHED_CACHE))
+		return 0;
+
+	if (cpus_share_cache(env->src_cpu, env->dst_cpu))
+		return 0;
+	/*
+	 * All tasks want to stay put. Don't active balance if the
+	 * source queue is down to its last running task, or if the
+	 * migration hint forbids pulling a task off its preferred LLC.
+	 */
+	if (env->src_rq->nr_pref_llc_running == env->src_rq->cfs.h_nr_runnable &&
+	    (env->src_rq->nr_running <= 1 ||
+	     _get_migrate_hint(env->src_cpu, env->dst_cpu,
+			       0, false) == mig_forbid))
+		return 1;
+
+	return 0;
+}
+#else
+static inline bool
+break_llc_locality(struct lb_env *env)
+{
+	return 0;
+}
+#endif
+
 static int need_active_balance(struct lb_env *env)
 {
 	struct sched_domain *sd = env->sd;
 
+	if (break_llc_locality(env))
+		return 0;
+
 	if (asym_active_balance(env))
 		return 1;
 
@@ -12317,7 +12350,8 @@ static int need_active_balance(struct lb_env *env)
 		return 1;
 	}
 
-	if (env->migration_type == migrate_misfit)
+	if (env->migration_type == migrate_misfit ||
+	    env->migration_type == migrate_llc_task)
 		return 1;
 
 	return 0;
@@ -12762,9 +12796,20 @@ static int active_load_balance_cpu_stop(void *data)
 		goto out_unlock;
 
 	/* Is there any task to move? */
-	if (busiest_rq->nr_running <= 1)
-		goto out_unlock;
+	if (busiest_rq->nr_running <= 1) {
+#ifdef CONFIG_SCHED_CACHE
+		int llc = llc_idx(target_cpu);
+
+		if (!sched_feat(SCHED_CACHE))
+			goto out_unlock;
+
+		if (llc < 0)
+			goto out_unlock;
+		/* Don't migrate if no task prefers the target LLC */
+		if (busiest_rq->nr_pref_llc[llc] < 1)
+#endif
+			goto out_unlock;
+	}
 	/*
 	 * This condition is "impossible", if it occurs
 	 * we need to fix it. Originally reported by
-- 
2.32.0
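The gate above reduces to a small predicate. A user-space sketch,
assuming simplified types (nothing here is the kernel's API): refuse to
active balance across LLCs when every runnable task on the source
runqueue prefers its current LLC and either the queue is down to one
task or the migration hint forbids the move.

#include <stdbool.h>
#include <stdio.h>

enum hint { MIG_ALLOW, MIG_FORBID };

struct src_rq {
	int nr_running;			/* tasks on the queue */
	int nr_runnable;		/* runnable cfs tasks */
	int nr_pref_llc_running;	/* runnable tasks preferring this LLC */
};

static bool breaks_llc_locality(const struct src_rq *rq, bool share_cache,
				enum hint h)
{
	if (share_cache)	/* same LLC: nothing to break */
		return false;

	return rq->nr_pref_llc_running == rq->nr_runnable &&
	       (rq->nr_running <= 1 || h == MIG_FORBID);
}

int main(void)
{
	struct src_rq rq = {
		.nr_running = 1, .nr_runnable = 1, .nr_pref_llc_running = 1,
	};

	/* Single task that prefers its LLC: skip active balance. */
	printf("skip active balance: %d\n",
	       breaks_llc_locality(&rq, false, MIG_ALLOW));
	return 0;
}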
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen,
    Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown,
    linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 17/20] sched: Consider LLC preference when picking
 tasks from busiest queue
Date: Wed, 18 Jun 2025 11:28:05 -0700
Message-Id: <9d28a5a892f0413a96498bbf711eaa9b354ca895.1750268218.git.tim.c.chen@linux.intel.com>

When picking tasks from the busiest queue for load balancing, we
currently do not consider LLC preference. Order the tasks in the
busiest queue so that they are picked in the following order:

1. tasks that prefer the dst CPU's LLC
2. tasks that have no LLC preference
3. tasks that prefer an LLC other than the one they are on
4. tasks that prefer the LLC they are currently on

This gives tasks a better chance to wind up in their preferred LLC.

Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c | 66 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3a8f6fc52055..c9db32c2df63 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10056,6 +10056,68 @@ static struct task_struct *detach_one_task(struct lb_env *env)
 	return NULL;
 }
 
+#ifdef CONFIG_SCHED_CACHE
+/*
+ * Prepare lists to detach tasks in the following order:
+ * 1. tasks that prefer the dst CPU's LLC
+ * 2. tasks that have no LLC preference
+ * 3. tasks that prefer an LLC other than the one they are on
+ * 4. tasks that prefer the LLC they are currently on.
+ */
+static struct list_head
+*order_tasks_by_llc(struct lb_env *env, struct list_head *tasks)
+{
+	struct task_struct *p;
+	LIST_HEAD(pref_old_llc);
+	LIST_HEAD(pref_new_llc);
+	LIST_HEAD(no_pref_llc);
+	LIST_HEAD(pref_other_llc);
+
+	if (!sched_feat(SCHED_CACHE))
+		return tasks;
+
+	if (cpus_share_cache(env->dst_cpu, env->src_cpu))
+		return tasks;
+
+	while (!list_empty(tasks)) {
+		p = list_last_entry(tasks, struct task_struct, se.group_node);
+
+		if (p->preferred_llc == llc_id(env->dst_cpu)) {
+			list_move(&p->se.group_node, &pref_new_llc);
+			continue;
+		}
+
+		if (p->preferred_llc == llc_id(env->src_cpu)) {
+			list_move(&p->se.group_node, &pref_old_llc);
+			continue;
+		}
+
+		if (p->preferred_llc == -1) {
+			list_move(&p->se.group_node, &no_pref_llc);
+			continue;
+		}
+
+		list_move(&p->se.group_node, &pref_other_llc);
+	}
+
+	/*
+	 * detach_tasks() pulls from the tail of the list, so put the
+	 * tasks to be chosen first at the end of the list.
+	 */
+	list_splice(&pref_new_llc, tasks);
+	list_splice(&no_pref_llc, tasks);
+	list_splice(&pref_other_llc, tasks);
+	list_splice(&pref_old_llc, tasks);
+	return tasks;
+}
+#else
+static inline struct list_head
+*order_tasks_by_llc(struct lb_env *env, struct list_head *tasks)
+{
+	return tasks;
+}
+#endif
+
 /*
  * detach_tasks() -- tries to detach up to imbalance load/util/tasks from
  * busiest_rq, as part of a balancing operation within domain "sd".
@@ -10064,7 +10126,7 @@ static struct task_struct *detach_one_task(struct lb_env *env)
  */
 static int detach_tasks(struct lb_env *env)
 {
-	struct list_head *tasks = &env->src_rq->cfs_tasks;
+	struct list_head *tasks;
 	unsigned long util, load;
 	struct task_struct *p;
 	int detached = 0;
@@ -10083,6 +10145,8 @@ static int detach_tasks(struct lb_env *env)
 	if (env->imbalance <= 0)
 		return 0;
 
+	tasks = order_tasks_by_llc(env, &env->src_rq->cfs_tasks);
+
 	while (!list_empty(tasks)) {
 		/*
 		 * We don't want to steal all, otherwise we may be treated likewise,
-- 
2.32.0
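The four buckets above amount to a stable partition of the task list.
A stand-alone sketch of the resulting pick order, with illustrative
structures only (the kernel version moves list nodes rather than
scanning an array):

#include <stdio.h>

struct task { int preferred_llc; };	/* -1: no preference */

/* Lower bucket number is detached first. */
static int bucket(const struct task *t, int dst_llc, int src_llc)
{
	if (t->preferred_llc == dst_llc)
		return 1;	/* prefers the destination LLC */
	if (t->preferred_llc == -1)
		return 2;	/* no preference */
	if (t->preferred_llc != src_llc)
		return 3;	/* prefers some other LLC */
	return 4;		/* prefers the LLC it is on */
}

int main(void)
{
	struct task tasks[] = { {2}, {-1}, {0}, {1}, {-1} };
	int dst_llc = 0, src_llc = 1;

	for (int b = 1; b <= 4; b++)
		for (unsigned int i = 0; i < 5; i++)
			if (bucket(&tasks[i], dst_llc, src_llc) == b)
				printf("pick task %u (pref %d, bucket %d)\n",
				       i, tasks[i].preferred_llc, b);
	return 0;
}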
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen,
    Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown,
    linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 18/20] sched: Do not migrate task if it is moving
 out of its preferred LLC
Date: Wed, 18 Jun 2025 11:28:06 -0700

In the final step of task migration during load balancing,
can_migrate_task() is used to determine whether a task can be moved to
the destination. If the task has an LLC preference, take that
preference into account before moving it out of its preferred LLC.

With this check in place, there is no need to retain the task's
cache-hot CPU check in task_hot(); remove it accordingly.

In addition, add a check in detach_tasks() to avoid choosing tasks that
prefer their current LLC.

Co-developed-by: Chen Yu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c9db32c2df63..e342524481ed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9787,17 +9787,6 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 	if (sysctl_sched_migration_cost == 0)
 		return 0;
 
-#ifdef CONFIG_SCHED_CACHE
-	if (sched_feat(SCHED_CACHE) && p->mm && p->mm->pcpu_sched) {
-		/*
-		 * XXX things like Skylake have non-inclusive L3 and might not
-		 * like this L3 centric view. What to do about L2 stickyness ?
-		 */
-		return per_cpu_ptr(p->mm->pcpu_sched, env->src_cpu)->occ >
-		       per_cpu_ptr(p->mm->pcpu_sched, env->dst_cpu)->occ;
-	}
-#endif
-
 	delta = rq_clock_task(env->src_rq) - p->se.exec_start;
 
 	return delta < (s64)sysctl_sched_migration_cost;
@@ -9992,6 +9981,12 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (env->flags & LBF_ACTIVE_LB)
 		return 1;
 
+#ifdef CONFIG_SCHED_CACHE
+	if (sched_feat(SCHED_CACHE) &&
+	    get_migrate_hint(env->src_cpu, env->dst_cpu, p) == mig_forbid)
+		return 0;
+#endif
+
 	degrades = migrate_degrades_locality(p, env);
 	if (!degrades)
 		hot = task_hot(p, env);
@@ -10252,6 +10247,17 @@ static int detach_tasks(struct lb_env *env)
 		if (env->imbalance <= 0)
 			break;
 
+#ifdef CONFIG_SCHED_CACHE
+		/*
+		 * Don't detach more tasks if the remaining tasks want to
+		 * stay: the tasks have already been sorted by
+		 * order_tasks_by_llc(), so they prefer the current LLC.
+		 */
+		if (sched_feat(SCHED_CACHE) && p->preferred_llc != -1 &&
+		    llc_id(env->src_cpu) == p->preferred_llc)
+			break;
+#endif
+
 		continue;
 next:
 		if (p->sched_task_hot)
-- 
2.32.0
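The ordering of checks matters here: the migration hint runs before the
locality and cache-hot logic, so a "forbidden" hint alone vetoes the
move. A sketch of that control flow with a toy hint policy (all names
illustrative, not the kernel's helpers):

#include <stdbool.h>
#include <stdio.h>

enum hint { MIG_ALLOW, MIG_FORBID };

struct task { int preferred_llc; };

/* Toy policy: forbid only moves away from the preferred LLC. */
static enum hint migrate_hint(int src_llc, int dst_llc, const struct task *t)
{
	if (t->preferred_llc == src_llc && t->preferred_llc != dst_llc)
		return MIG_FORBID;
	return MIG_ALLOW;
}

static bool can_migrate(const struct task *t, int src_llc, int dst_llc)
{
	if (migrate_hint(src_llc, dst_llc, t) == MIG_FORBID)
		return false;
	/* ... the usual locality / cache-hot checks would follow ... */
	return true;
}

int main(void)
{
	struct task t = { .preferred_llc = 1 };

	printf("migrate 1->0: %d\n", can_migrate(&t, 1, 0));	/* 0: vetoed */
	printf("migrate 0->1: %d\n", can_migrate(&t, 0, 1));	/* 1: allowed */
	return 0;
}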
From nobody Thu Oct 9 08:43:05 2025
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy"
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Valentin Schneider, Tim Chen, Vincent Guittot, Libo Chen,
    Abel Wu, Madadi Vineeth Reddy, Hillf Danton, Len Brown,
    linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC patch v3 19/20] sched: Introduce SCHED_CACHE_LB to control
 cache aware load balance
Date: Wed, 18 Jun 2025 11:28:07 -0700

Introduce the SCHED_CACHE_LB sched feature to enable or disable
cache-aware load balancing in the scheduler.

Co-developed-by: Chen Yu
Signed-off-by: Tim Chen
Tested-by: Yangyu Chen
---
 kernel/sched/fair.c     | 18 ++++++++++--------
 kernel/sched/features.h |  1 +
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e342524481ed..af742601f2d7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9982,7 +9982,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 		return 1;
 
 #ifdef CONFIG_SCHED_CACHE
-	if (sched_feat(SCHED_CACHE) &&
+	if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB) &&
 	    get_migrate_hint(env->src_cpu, env->dst_cpu, p) == mig_forbid)
 		return 0;
 #endif
@@ -10068,7 +10068,7 @@ static struct list_head
 	LIST_HEAD(no_pref_llc);
 	LIST_HEAD(pref_other_llc);
 
-	if (!sched_feat(SCHED_CACHE))
+	if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB))
 		return tasks;
 
 	if (cpus_share_cache(env->dst_cpu, env->src_cpu))
@@ -10253,7 +10253,8 @@ static int detach_tasks(struct lb_env *env)
 		 * stay: the tasks have already been sorted by
 		 * order_tasks_by_llc(), so they prefer the current LLC.
 		 */
-		if (sched_feat(SCHED_CACHE) && p->preferred_llc != -1 &&
+		if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB) &&
+		    p->preferred_llc != -1 &&
 		    llc_id(env->src_cpu) == p->preferred_llc)
 			break;
 #endif
@@ -10910,7 +10911,7 @@ static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
 	struct sched_domain *child = env->sd->child;
 	int llc;
 
-	if (!sched_feat(SCHED_CACHE))
+	if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB))
 		return false;
 
 	if (env->sd->flags & SD_SHARE_LLC)
@@ -11021,7 +11022,8 @@ static void update_sg_if_llc(struct lb_env *env, struct sg_lb_stats *sgs,
 	struct sched_domain *sd = env->sd->child;
 	struct sched_domain_shared *sd_share;
 
-	if (!sched_feat(SCHED_CACHE) || env->idle == CPU_NEWLY_IDLE)
+	if (!sched_feat(SCHED_CACHE) || env->idle == CPU_NEWLY_IDLE ||
+	    !sched_feat(SCHED_CACHE_LB))
 		return;
 
 	/* only care the sched domain that spans 1 LLC */
@@ -11083,7 +11085,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		*sg_overutilized = 1;
 
 #ifdef CONFIG_SCHED_CACHE
-	if (sched_feat(SCHED_CACHE)) {
+	if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB)) {
 		int j;
 
 		for (j = 0; j < max_llcs; ++j)
@@ -12368,7 +12370,7 @@ imbalanced_active_balance(struct lb_env *env)
 static inline bool
 break_llc_locality(struct lb_env *env)
 {
-	if (!sched_feat(SCHED_CACHE))
+	if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB))
 		return 0;
 
 	if (cpus_share_cache(env->src_cpu, env->dst_cpu))
@@ -12870,7 +12872,7 @@ static int active_load_balance_cpu_stop(void *data)
 #ifdef CONFIG_SCHED_CACHE
 	int llc = llc_idx(target_cpu);
 
-	if (!sched_feat(SCHED_CACHE))
+	if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB))
 		goto out_unlock;
 
 	if (llc < 0)
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index d2af7bfd36bf..11dbd74cd365 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -88,6 +88,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
 SCHED_FEAT(SIS_UTIL, true)
 
 SCHED_FEAT(SCHED_CACHE, true)
+SCHED_FEAT(SCHED_CACHE_LB, true)
 /*
  * Issue a WARN when we do multiple update_rq_clock() calls
  * in a single rq->lock section. Default disabled because the
-- 
2.32.0
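Like other scheduler features, the new knob can be flipped at runtime
through the sched features file, which accepts a feature name to enable
it and the NO_-prefixed name to disable it. A minimal user-space
toggle, assuming debugfs is mounted at /sys/kernel/debug:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/debug/sched/features", "w");

	if (!f) {
		perror("sched/features");
		return 1;
	}
	fputs("NO_SCHED_CACHE_LB", f);	/* disable cache-aware balancing */
	fclose(f);
	return 0;
}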
Shenoy" Cc: Tim Chen , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Tim Chen , Vincent Guittot , Libo Chen , Abel Wu , Madadi Vineeth Reddy , Hillf Danton , Len Brown , linux-kernel@vger.kernel.org, Chen Yu Subject: [RFC patch v3 20/20] sched: Introduce SCHED_CACHE_WAKE to control LLC aggregation on wake up Date: Wed, 18 Jun 2025 11:28:08 -0700 Message-Id: <1f8e7ec2d84a94ac0a31ca6182218ffaf7e166df.1750268218.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce SCHED_CACHE_WAKE feature to enable or disable cache-aware wake up. Disable this feature by default because cache-aware wakeup is overly aggressive in stacking wakees of the same process on the same LLC, if they are frequently woken up. The wake ups can be much more frequent than load balances, adding much overhead when load balance alone for LLC aggregation is sufficient. Co-developed-by: Chen Yu Signed-off-by: Tim Chen Tested-by: Yangyu Chen --- kernel/sched/fair.c | 6 +++++- kernel/sched/features.h | 1 + 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index af742601f2d7..32c90fab0d63 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9028,7 +9028,7 @@ static int select_cache_cpu(struct task_struct *p, in= t prev_cpu) struct mm_struct *mm =3D p->mm; int cpu; =20 - if (!sched_feat(SCHED_CACHE)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_WAKE)) return prev_cpu; =20 if (!mm || p->nr_cpus_allowed =3D=3D 1) @@ -9041,6 +9041,10 @@ static int select_cache_cpu(struct task_struct *p, i= nt prev_cpu) if (cpus_share_cache(cpu, prev_cpu)) return prev_cpu; =20 + if (_get_migrate_hint(prev_cpu, cpu, + task_util(p), true) =3D=3D mig_forbid) + return prev_cpu; + if (static_branch_likely(&sched_numa_balancing) && __migrate_degrades_locality(p, prev_cpu, cpu, false) > 0) { /* diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 11dbd74cd365..44b408cf0dd4 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -89,6 +89,7 @@ SCHED_FEAT(SIS_UTIL, true) =20 SCHED_FEAT(SCHED_CACHE, true) SCHED_FEAT(SCHED_CACHE_LB, true) +SCHED_FEAT(SCHED_CACHE_WAKE, false) /* * Issue a WARN when we do multiple update_rq_clock() calls * in a single rq->lock section. Default disabled because the --=20 2.32.0