From nobody Sun Oct 5 03:35:17 2025
From: Chen Yu
To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R .
Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 01/28] sched: Cache aware load-balancing Date: Sat, 9 Aug 2025 13:00:59 +0800 Message-Id: <9157186cf9e3fd541f62c637579ff736b3704c51.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra Hi all, One of the many things on the eternal todo list has been finishing the below hackery. It is an attempt at modelling cache affinity -- and while the patch really only targets LLC, it could very well be extended to also apply to clusters (L2). Specifically any case of multiple cache domains inside a node. Anyway, I wrote this about a year ago, and I mentioned this at the recent OSPM conf where Gautham and Prateek expressed interest in playing with this code. So here goes, very rough and largely unproven code ahead :-) It applies to current tip/master, but I know it will fail the __percpu validation that sits in -next, although that shouldn't be terribly hard to fix up. As is, it only computes a CPU inside the LLC that has the highest recent runtime, this CPU is then used in the wake-up path to steer towards this LLC and in task_hot() to limit migrations away from it. More elaborate things could be done, notably there is an XXX in there somewhere about finding the best LLC inside a NODE (interaction with NUMA_BALANCING). Signed-off-by: Peter Zijlstra (Intel) --- include/linux/mm_types.h | 44 ++++++ include/linux/sched.h | 4 + init/Kconfig | 4 + kernel/fork.c | 5 + kernel/sched/core.c | 13 +- kernel/sched/fair.c | 330 +++++++++++++++++++++++++++++++++++++-- kernel/sched/sched.h | 8 + 7 files changed, 388 insertions(+), 20 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index d6b91e8a66d6..cf26ad8b41ab 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -928,6 +928,12 @@ struct mm_cid { }; #endif =20 +struct mm_sched { + u64 runtime; + unsigned long epoch; + unsigned long occ; +}; + struct kioctx_table; struct iommu_mm_data; struct mm_struct { @@ -1018,6 +1024,17 @@ struct mm_struct { */ raw_spinlock_t cpus_allowed_lock; #endif +#ifdef CONFIG_SCHED_CACHE + /* + * Track per-cpu-per-process occupancy as a proxy for cache residency. + * See account_mm_sched() and ... 
+ */ + struct mm_sched __percpu *pcpu_sched; + raw_spinlock_t mm_sched_lock; + unsigned long mm_sched_epoch; + int mm_sched_cpu; +#endif + #ifdef CONFIG_MMU atomic_long_t pgtables_bytes; /* size of all page tables */ #endif @@ -1432,6 +1449,33 @@ static inline unsigned int mm_cid_size(void) static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct = cpumask *cpumask) { } #endif /* CONFIG_SCHED_MM_CID */ =20 +#ifdef CONFIG_SCHED_CACHE +extern void mm_init_sched(struct mm_struct *mm, struct mm_sched *pcpu_sche= d); + +static inline int mm_alloc_sched_noprof(struct mm_struct *mm) +{ + struct mm_sched *pcpu_sched =3D alloc_percpu_noprof(struct mm_sched); + if (!pcpu_sched) + return -ENOMEM; + + mm_init_sched(mm, pcpu_sched); + return 0; +} + +#define mm_alloc_sched(...) alloc_hooks(mm_alloc_sched_noprof(__VA_ARGS__)) + +static inline void mm_destroy_sched(struct mm_struct *mm) +{ + free_percpu(mm->pcpu_sched); + mm->pcpu_sched =3D NULL; +} +#else /* !CONFIG_SCHED_CACHE */ + +static inline int mm_alloc_sched(struct mm_struct *mm) { return 0; } +static inline void mm_destroy_sched(struct mm_struct *mm) { } + +#endif /* CONFIG_SCHED_CACHE */ + struct mmu_gather; extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct= *mm); diff --git a/include/linux/sched.h b/include/linux/sched.h index aa9c5be7a632..02ff8b8be25b 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1403,6 +1403,10 @@ struct task_struct { unsigned long numa_pages_migrated; #endif /* CONFIG_NUMA_BALANCING */ =20 +#ifdef CONFIG_SCHED_CACHE + struct callback_head cache_work; +#endif + #ifdef CONFIG_RSEQ struct rseq __user *rseq; u32 rseq_len; diff --git a/init/Kconfig b/init/Kconfig index 666783eb50ab..27f4012347f9 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -947,6 +947,10 @@ config NUMA_BALANCING =20 This system will be inactive on UMA systems. 
=20 +config SCHED_CACHE + bool "Cache aware scheduler" + default y + config NUMA_BALANCING_DEFAULT_ENABLED bool "Automatically enable NUMA aware memory/task placement" default y diff --git a/kernel/fork.c b/kernel/fork.c index 1ee8eb11f38b..546c49e46d48 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1073,6 +1073,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm= , struct task_struct *p, if (mm_alloc_cid(mm, p)) goto fail_cid; =20 + if (mm_alloc_sched(mm)) + goto fail_sched; + if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT, NR_MM_COUNTERS)) goto fail_pcpu; @@ -1082,6 +1085,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm= , struct task_struct *p, return mm; =20 fail_pcpu: + mm_destroy_sched(mm); +fail_sched: mm_destroy_cid(mm); fail_cid: destroy_context(mm); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 81c6df746df1..a5fb3057b1c4 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4539,6 +4539,7 @@ static void __sched_fork(unsigned long clone_flags, s= truct task_struct *p) p->migration_pending =3D NULL; #endif init_sched_mm_cid(p); + init_sched_mm(p); } =20 DEFINE_STATIC_KEY_FALSE(sched_numa_balancing); @@ -8508,6 +8509,7 @@ static struct kmem_cache *task_group_cache __ro_after= _init; =20 void __init sched_init(void) { + unsigned long now =3D jiffies; unsigned long ptr =3D 0; int i; =20 @@ -8582,7 +8584,7 @@ void __init sched_init(void) raw_spin_lock_init(&rq->__lock); rq->nr_running =3D 0; rq->calc_load_active =3D 0; - rq->calc_load_update =3D jiffies + LOAD_FREQ; + rq->calc_load_update =3D now + LOAD_FREQ; init_cfs_rq(&rq->cfs); init_rt_rq(&rq->rt); init_dl_rq(&rq->dl); @@ -8626,7 +8628,7 @@ void __init sched_init(void) rq->cpu_capacity =3D SCHED_CAPACITY_SCALE; rq->balance_callback =3D &balance_push_callback; rq->active_balance =3D 0; - rq->next_balance =3D jiffies; + rq->next_balance =3D now; rq->push_cpu =3D 0; rq->cpu =3D i; rq->online =3D 0; @@ -8638,7 +8640,7 @@ void __init sched_init(void) =20 rq_attach_root(rq, &def_root_domain); #ifdef CONFIG_NO_HZ_COMMON - rq->last_blocked_load_update_tick =3D jiffies; + rq->last_blocked_load_update_tick =3D now; atomic_set(&rq->nohz_flags, 0); =20 INIT_CSD(&rq->nohz_csd, nohz_csd_func, rq); @@ -8663,6 +8665,11 @@ void __init sched_init(void) =20 rq->core_cookie =3D 0UL; #endif +#ifdef CONFIG_SCHED_CACHE + raw_spin_lock_init(&rq->cpu_epoch_lock); + rq->cpu_epoch_next =3D now; +#endif + zalloc_cpumask_var_node(&rq->scratch_mask, GFP_KERNEL, cpu_to_node(i)); } =20 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7a14da5396fb..e3897cd7696d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1166,10 +1166,229 @@ static s64 update_curr_se(struct rq *rq, struct sc= hed_entity *curr) return delta_exec; } =20 -static inline void update_curr_task(struct task_struct *p, s64 delta_exec) +#ifdef CONFIG_SCHED_CACHE + +/* + * XXX numbers come from a place the sun don't shine -- probably wants to = be SD + * tunable or so. 
+ */ +#define EPOCH_PERIOD (HZ/100) /* 10 ms */ +#define EPOCH_OLD 5 /* 50 ms */ + +void mm_init_sched(struct mm_struct *mm, struct mm_sched *_pcpu_sched) +{ + unsigned long epoch; + int i; + + for_each_possible_cpu(i) { + struct mm_sched *pcpu_sched =3D per_cpu_ptr(_pcpu_sched, i); + struct rq *rq =3D cpu_rq(i); + + pcpu_sched->runtime =3D 0; + pcpu_sched->epoch =3D epoch =3D rq->cpu_epoch; + pcpu_sched->occ =3D -1; + } + + raw_spin_lock_init(&mm->mm_sched_lock); + mm->mm_sched_epoch =3D epoch; + mm->mm_sched_cpu =3D -1; + + smp_store_release(&mm->pcpu_sched, _pcpu_sched); +} + +/* because why would C be fully specified */ +static __always_inline void __shr_u64(u64 *val, unsigned int n) +{ + if (n >=3D 64) { + *val =3D 0; + return; + } + *val >>=3D n; +} + +static inline void __update_mm_sched(struct rq *rq, struct mm_sched *pcpu_= sched) +{ + lockdep_assert_held(&rq->cpu_epoch_lock); + + unsigned long n, now =3D jiffies; + long delta =3D now - rq->cpu_epoch_next; + + if (delta > 0) { + n =3D (delta + EPOCH_PERIOD - 1) / EPOCH_PERIOD; + rq->cpu_epoch +=3D n; + rq->cpu_epoch_next +=3D n * EPOCH_PERIOD; + __shr_u64(&rq->cpu_runtime, n); + } + + n =3D rq->cpu_epoch - pcpu_sched->epoch; + if (n) { + pcpu_sched->epoch +=3D n; + __shr_u64(&pcpu_sched->runtime, n); + } +} + +static unsigned long fraction_mm_sched(struct rq *rq, struct mm_sched *pcp= u_sched) +{ + guard(raw_spinlock_irqsave)(&rq->cpu_epoch_lock); + + __update_mm_sched(rq, pcpu_sched); + + /* + * Runtime is a geometric series (r=3D0.5) and as such will sum to twice + * the accumulation period, this means the multiplcation here should + * not overflow. + */ + return div64_u64(NICE_0_LOAD * pcpu_sched->runtime, rq->cpu_runtime + 1); +} + +static inline +void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec) +{ + struct mm_struct *mm =3D p->mm; + struct mm_sched *pcpu_sched; + unsigned long epoch; + + /* + * init_task and kthreads don't be having no mm + */ + if (!mm || !mm->pcpu_sched) + return; + + pcpu_sched =3D this_cpu_ptr(p->mm->pcpu_sched); + + scoped_guard (raw_spinlock, &rq->cpu_epoch_lock) { + __update_mm_sched(rq, pcpu_sched); + pcpu_sched->runtime +=3D delta_exec; + rq->cpu_runtime +=3D delta_exec; + epoch =3D rq->cpu_epoch; + } + + /* + * If this task hasn't hit task_cache_work() for a while, invalidate + * it's preferred state. 
+ */ + if (epoch - READ_ONCE(mm->mm_sched_epoch) > EPOCH_OLD) { + mm->mm_sched_cpu =3D -1; + pcpu_sched->occ =3D -1; + } +} + +static void task_tick_cache(struct rq *rq, struct task_struct *p) +{ + struct callback_head *work =3D &p->cache_work; + struct mm_struct *mm =3D p->mm; + + if (!mm || !mm->pcpu_sched) + return; + + if (mm->mm_sched_epoch =3D=3D rq->cpu_epoch) + return; + + guard(raw_spinlock)(&mm->mm_sched_lock); + + if (mm->mm_sched_epoch =3D=3D rq->cpu_epoch) + return; + + if (work->next =3D=3D work) { + task_work_add(p, work, TWA_RESUME); + WRITE_ONCE(mm->mm_sched_epoch, rq->cpu_epoch); + } +} + +static void task_cache_work(struct callback_head *work) +{ + struct task_struct *p =3D current; + struct mm_struct *mm =3D p->mm; + unsigned long m_a_occ =3D 0; + int cpu, m_a_cpu =3D -1; + cpumask_var_t cpus; + + WARN_ON_ONCE(work !=3D &p->cache_work); + + work->next =3D work; + + if (p->flags & PF_EXITING) + return; + + if (!alloc_cpumask_var(&cpus, GFP_KERNEL)) + return; + + scoped_guard (cpus_read_lock) { + cpumask_copy(cpus, cpu_online_mask); + + for_each_cpu(cpu, cpus) { + /* XXX sched_cluster_active */ + struct sched_domain *sd =3D per_cpu(sd_llc, cpu); + unsigned long occ, m_occ =3D 0, a_occ =3D 0; + int m_cpu =3D -1, nr =3D 0, i; + + for_each_cpu(i, sched_domain_span(sd)) { + occ =3D fraction_mm_sched(cpu_rq(i), + per_cpu_ptr(mm->pcpu_sched, i)); + a_occ +=3D occ; + if (occ > m_occ) { + m_occ =3D occ; + m_cpu =3D i; + } + nr++; + trace_printk("(%d) occ: %ld m_occ: %ld m_cpu: %d nr: %d\n", + per_cpu(sd_llc_id, i), occ, m_occ, m_cpu, nr); + } + + a_occ /=3D nr; + if (a_occ > m_a_occ) { + m_a_occ =3D a_occ; + m_a_cpu =3D m_cpu; + } + + trace_printk("(%d) a_occ: %ld m_a_occ: %ld\n", + per_cpu(sd_llc_id, cpu), a_occ, m_a_occ); + + for_each_cpu(i, sched_domain_span(sd)) { + /* XXX threshold ? */ + per_cpu_ptr(mm->pcpu_sched, i)->occ =3D a_occ; + } + + cpumask_andnot(cpus, cpus, sched_domain_span(sd)); + } + } + + /* + * If the max average cache occupancy is 'small' we don't care. 
+ */ + if (m_a_occ < (NICE_0_LOAD >> EPOCH_OLD)) + m_a_cpu =3D -1; + + mm->mm_sched_cpu =3D m_a_cpu; + + free_cpumask_var(cpus); +} + +void init_sched_mm(struct task_struct *p) +{ + struct callback_head *work =3D &p->cache_work; + init_task_work(work, task_cache_work); + work->next =3D work; +} + +#else + +static inline void account_mm_sched(struct rq *rq, struct task_struct *p, + s64 delta_exec) { } + + +void init_sched_mm(struct task_struct *p) { } + +static void task_tick_cache(struct rq *rq, struct task_struct *p) { } + +#endif + +static inline +void update_curr_task(struct rq *rq, struct task_struct *p, s64 delta_exec) { trace_sched_stat_runtime(p, delta_exec); account_group_exec_runtime(p, delta_exec); + account_mm_sched(rq, p, delta_exec); cgroup_account_cputime(p, delta_exec); } =20 @@ -1215,7 +1434,7 @@ s64 update_curr_common(struct rq *rq) =20 delta_exec =3D update_curr_se(rq, &donor->se); if (likely(delta_exec > 0)) - update_curr_task(donor, delta_exec); + update_curr_task(rq, donor, delta_exec); =20 return delta_exec; } @@ -1244,7 +1463,7 @@ static void update_curr(struct cfs_rq *cfs_rq) if (entity_is_task(curr)) { struct task_struct *p =3D task_of(curr); =20 - update_curr_task(p, delta_exec); + update_curr_task(rq, p, delta_exec); =20 /* * If the fair_server is active, we need to account for the @@ -7862,7 +8081,7 @@ static int select_idle_sibling(struct task_struct *p,= int prev, int target) * per-cpu select_rq_mask usage */ lockdep_assert_irqs_disabled(); - +again: if ((available_idle_cpu(target) || sched_idle_cpu(target)) && asym_fits_cpu(task_util, util_min, util_max, target)) return target; @@ -7900,7 +8119,8 @@ static int select_idle_sibling(struct task_struct *p,= int prev, int target) /* Check a recently used CPU as a potential idle candidate: */ recent_used_cpu =3D p->recent_used_cpu; p->recent_used_cpu =3D prev; - if (recent_used_cpu !=3D prev && + if (prev =3D=3D p->wake_cpu && + recent_used_cpu !=3D prev && recent_used_cpu !=3D target && cpus_share_cache(recent_used_cpu, target) && (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cp= u)) && @@ -7953,6 +8173,18 @@ static int select_idle_sibling(struct task_struct *p= , int prev, int target) if ((unsigned)i < nr_cpumask_bits) return i; =20 + if (prev !=3D p->wake_cpu && !cpus_share_cache(prev, p->wake_cpu)) { + /* + * Most likely select_cache_cpu() will have re-directed + * the wakeup, but getting here means the preferred cache is + * too busy, so re-try with the actual previous. + * + * XXX wake_affine is lost for this pass. 
+ */ + prev =3D target =3D p->wake_cpu; + goto again; + } + /* * For cluster machines which have lower sharing cache like L2 or * LLC Tag, we tend to find an idle CPU in the target's cluster @@ -8575,6 +8807,40 @@ static int find_energy_efficient_cpu(struct task_str= uct *p, int prev_cpu) return target; } =20 +#ifdef CONFIG_SCHED_CACHE +static long __migrate_degrades_locality(struct task_struct *p, int src_cpu= , int dst_cpu, bool idle); + +static int select_cache_cpu(struct task_struct *p, int prev_cpu) +{ + struct mm_struct *mm =3D p->mm; + int cpu; + + if (!mm || p->nr_cpus_allowed =3D=3D 1) + return prev_cpu; + + cpu =3D mm->mm_sched_cpu; + if (cpu < 0) + return prev_cpu; + + + if (static_branch_likely(&sched_numa_balancing) && + __migrate_degrades_locality(p, prev_cpu, cpu, false) > 0) { + /* + * XXX look for max occupancy inside prev_cpu's node + */ + return prev_cpu; + } + + return cpu; +} +#else +static int select_cache_cpu(struct task_struct *p, int prev_cpu) +{ + return prev_cpu; +} +#endif + + /* * select_task_rq_fair: Select target runqueue for the waking task in doma= ins * that have the relevant SD flag set. In practice, this is SD_BALANCE_WAK= E, @@ -8600,6 +8866,8 @@ select_task_rq_fair(struct task_struct *p, int prev_c= pu, int wake_flags) * required for stable ->cpus_allowed */ lockdep_assert_held(&p->pi_lock); + guard(rcu)(); + if (wake_flags & WF_TTWU) { record_wakee(p); =20 @@ -8607,6 +8875,8 @@ select_task_rq_fair(struct task_struct *p, int prev_c= pu, int wake_flags) cpumask_test_cpu(cpu, p->cpus_ptr)) return cpu; =20 + new_cpu =3D prev_cpu =3D select_cache_cpu(p, prev_cpu); + if (!is_rd_overutilized(this_rq()->rd)) { new_cpu =3D find_energy_efficient_cpu(p, prev_cpu); if (new_cpu >=3D 0) @@ -8617,7 +8887,6 @@ select_task_rq_fair(struct task_struct *p, int prev_c= pu, int wake_flags) want_affine =3D !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr); } =20 - rcu_read_lock(); for_each_domain(cpu, tmp) { /* * If both 'cpu' and 'prev_cpu' are part of this domain, @@ -8650,7 +8919,6 @@ select_task_rq_fair(struct task_struct *p, int prev_c= pu, int wake_flags) /* Fast path */ new_cpu =3D select_idle_sibling(p, prev_cpu, new_cpu); } - rcu_read_unlock(); =20 return new_cpu; } @@ -9300,6 +9568,17 @@ static int task_hot(struct task_struct *p, struct lb= _env *env) if (sysctl_sched_migration_cost =3D=3D 0) return 0; =20 +#ifdef CONFIG_SCHED_CACHE + if (p->mm && p->mm->pcpu_sched) { + /* + * XXX things like Skylake have non-inclusive L3 and might not + * like this L3 centric view. What to do about L2 stickyness ? + */ + return per_cpu_ptr(p->mm->pcpu_sched, env->src_cpu)->occ > + per_cpu_ptr(p->mm->pcpu_sched, env->dst_cpu)->occ; + } +#endif + delta =3D rq_clock_task(env->src_rq) - p->se.exec_start; =20 return delta < (s64)sysctl_sched_migration_cost; @@ -9311,27 +9590,25 @@ static int task_hot(struct task_struct *p, struct l= b_env *env) * Returns 0, if task migration is not affected by locality. * Returns a negative value, if task migration improves locality i.e migra= tion preferred. 
*/ -static long migrate_degrades_locality(struct task_struct *p, struct lb_env= *env) +static long __migrate_degrades_locality(struct task_struct *p, int src_cpu= , int dst_cpu, bool idle) { struct numa_group *numa_group =3D rcu_dereference(p->numa_group); unsigned long src_weight, dst_weight; int src_nid, dst_nid, dist; =20 - if (!static_branch_likely(&sched_numa_balancing)) - return 0; - - if (!p->numa_faults || !(env->sd->flags & SD_NUMA)) + if (!p->numa_faults) return 0; =20 - src_nid =3D cpu_to_node(env->src_cpu); - dst_nid =3D cpu_to_node(env->dst_cpu); + src_nid =3D cpu_to_node(src_cpu); + dst_nid =3D cpu_to_node(dst_cpu); =20 if (src_nid =3D=3D dst_nid) return 0; =20 /* Migrating away from the preferred node is always bad. */ if (src_nid =3D=3D p->numa_preferred_nid) { - if (env->src_rq->nr_running > env->src_rq->nr_preferred_running) + struct rq *src_rq =3D cpu_rq(src_cpu); + if (src_rq->nr_running > src_rq->nr_preferred_running) return 1; else return 0; @@ -9342,7 +9619,7 @@ static long migrate_degrades_locality(struct task_str= uct *p, struct lb_env *env) return -1; =20 /* Leaving a core idle is often worse than degrading locality. */ - if (env->idle =3D=3D CPU_IDLE) + if (idle) return 0; =20 dist =3D node_distance(src_nid, dst_nid); @@ -9357,7 +9634,24 @@ static long migrate_degrades_locality(struct task_st= ruct *p, struct lb_env *env) return src_weight - dst_weight; } =20 +static long migrate_degrades_locality(struct task_struct *p, struct lb_env= *env) +{ + if (!static_branch_likely(&sched_numa_balancing)) + return 0; + + if (!(env->sd->flags & SD_NUMA)) + return 0; + + return __migrate_degrades_locality(p, env->src_cpu, env->dst_cpu, + env->idle =3D=3D CPU_IDLE); +} + #else +static long __migrate_degrades_locality(struct task_struct *p, int src_cpu= , int dst_cpu, bool idle) +{ + return 0; +} + static inline long migrate_degrades_locality(struct task_struct *p, struct lb_env *env) { @@ -13117,8 +13411,8 @@ static inline void task_tick_core(struct rq *rq, st= ruct task_struct *curr) {} */ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int qu= eued) { - struct cfs_rq *cfs_rq; struct sched_entity *se =3D &curr->se; + struct cfs_rq *cfs_rq; =20 for_each_sched_entity(se) { cfs_rq =3D cfs_rq_of(se); @@ -13128,6 +13422,8 @@ static void task_tick_fair(struct rq *rq, struct ta= sk_struct *curr, int queued) if (static_branch_unlikely(&sched_numa_balancing)) task_tick_numa(rq, curr); =20 + task_tick_cache(rq, curr); + update_misfit_status(curr, rq); check_update_overutilized_status(task_rq(curr)); =20 diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 83e3aa917142..839463027ab0 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1173,6 +1173,12 @@ struct rq { u64 clock_pelt_idle_copy; u64 clock_idle_copy; #endif +#ifdef CONFIG_SCHED_CACHE + raw_spinlock_t cpu_epoch_lock; + u64 cpu_runtime; + unsigned long cpu_epoch; + unsigned long cpu_epoch_next; +#endif =20 atomic_t nr_iowait; =20 @@ -3885,6 +3891,8 @@ static inline void task_tick_mm_cid(struct rq *rq, st= ruct task_struct *curr) { } static inline void init_sched_mm_cid(struct task_struct *t) { } #endif /* !CONFIG_SCHED_MM_CID */ =20 +extern void init_sched_mm(struct task_struct *p); + extern u64 avg_vruntime(struct cfs_rq *cfs_rq); extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se); #ifdef CONFIG_SMP --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) by smtp.subspace.kernel.org (Postfix)
From: Chen Yu
To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R .
Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 02/28] sched: Several fixes for cache aware scheduling Date: Sat, 9 Aug 2025 13:01:15 +0800 Message-Id: <84ceaca0a1de853284b4fc9888af806b03cde8bb.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" 1. Fix compile error on percpu allocation. 2. Enqueue to the target CPU rather than the current CPU. 3. NULL LLC sched domain check(Libo Chen). 4. Introduce sched feature SCHED_CACHE to control cache aware scheduling 5. Fix unsigned occupancy initialization to -1. 6. If there is only 1 thread in the process, no need to enable cache awareness 7. Add __maybe_unused to __migrate_degrades_locality() to avoid compile warnings. 8. Do not enable gcov coverage for task_cache_work() and fraction_mm_sched() to avoid softlockup by gcov. 9. Make CONFIG_SCHED_CACHE depending on CONFIG_SMP to avoid compile error on non-SMP system like microblaze architecture. 10. Do not enable account cache aware statistics in account_mm_sched() for non-normal tasks, as it could be invoked by RT tasks.(Shrikanth Hegde) 11. Place cpu_epoch related fields in a dedicated cache line to avoid interfering with clock_idle* fields. (Shrikanth Hegde) Co-developed-by: Tim Chen Signed-off-by: Tim Chen Signed-off-by: Chen Yu --- include/linux/mm_types.h | 4 ++-- init/Kconfig | 4 ++++ kernel/sched/fair.c | 41 +++++++++++++++++++++++++++------------- kernel/sched/features.h | 1 + kernel/sched/sched.h | 2 +- 5 files changed, 36 insertions(+), 16 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index cf26ad8b41ab..41a598a44361 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1450,11 +1450,11 @@ static inline void mm_set_cpus_allowed(struct mm_st= ruct *mm, const struct cpumas #endif /* CONFIG_SCHED_MM_CID */ =20 #ifdef CONFIG_SCHED_CACHE -extern void mm_init_sched(struct mm_struct *mm, struct mm_sched *pcpu_sche= d); +void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *pcpu_sc= hed); =20 static inline int mm_alloc_sched_noprof(struct mm_struct *mm) { - struct mm_sched *pcpu_sched =3D alloc_percpu_noprof(struct mm_sched); + struct mm_sched __percpu *pcpu_sched =3D alloc_percpu_noprof(struct mm_sc= hed); if (!pcpu_sched) return -ENOMEM; =20 diff --git a/init/Kconfig b/init/Kconfig index 27f4012347f9..4bab39a5254c 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -950,6 +950,10 @@ config NUMA_BALANCING config SCHED_CACHE bool "Cache aware scheduler" default y + depends on SMP + help + If set, the scheduler will try to aggregate tasks in the same process to + a single LLC if possible. 
=20 config NUMA_BALANCING_DEFAULT_ENABLED bool "Automatically enable NUMA aware memory/task placement" diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e3897cd7696d..e97ab46509e3 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1175,7 +1175,7 @@ static s64 update_curr_se(struct rq *rq, struct sched= _entity *curr) #define EPOCH_PERIOD (HZ/100) /* 10 ms */ #define EPOCH_OLD 5 /* 50 ms */ =20 -void mm_init_sched(struct mm_struct *mm, struct mm_sched *_pcpu_sched) +void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_s= ched) { unsigned long epoch; int i; @@ -1186,7 +1186,7 @@ void mm_init_sched(struct mm_struct *mm, struct mm_sc= hed *_pcpu_sched) =20 pcpu_sched->runtime =3D 0; pcpu_sched->epoch =3D epoch =3D rq->cpu_epoch; - pcpu_sched->occ =3D -1; + pcpu_sched->occ =3D 0; } =20 raw_spin_lock_init(&mm->mm_sched_lock); @@ -1227,7 +1227,7 @@ static inline void __update_mm_sched(struct rq *rq, s= truct mm_sched *pcpu_sched) } } =20 -static unsigned long fraction_mm_sched(struct rq *rq, struct mm_sched *pcp= u_sched) +static unsigned long __no_profile fraction_mm_sched(struct rq *rq, struct = mm_sched *pcpu_sched) { guard(raw_spinlock_irqsave)(&rq->cpu_epoch_lock); =20 @@ -1248,13 +1248,18 @@ void account_mm_sched(struct rq *rq, struct task_st= ruct *p, s64 delta_exec) struct mm_sched *pcpu_sched; unsigned long epoch; =20 + if (!sched_feat(SCHED_CACHE)) + return; + + if (p->sched_class !=3D &fair_sched_class) + return; /* * init_task and kthreads don't be having no mm */ if (!mm || !mm->pcpu_sched) return; =20 - pcpu_sched =3D this_cpu_ptr(p->mm->pcpu_sched); + pcpu_sched =3D per_cpu_ptr(p->mm->pcpu_sched, cpu_of(rq)); =20 scoped_guard (raw_spinlock, &rq->cpu_epoch_lock) { __update_mm_sched(rq, pcpu_sched); @@ -1264,12 +1269,14 @@ void account_mm_sched(struct rq *rq, struct task_st= ruct *p, s64 delta_exec) } =20 /* - * If this task hasn't hit task_cache_work() for a while, invalidate + * If this task hasn't hit task_cache_work() for a while, or it + * has only 1 thread, invalidate * it's preferred state. 
*/ - if (epoch - READ_ONCE(mm->mm_sched_epoch) > EPOCH_OLD) { + if (epoch - READ_ONCE(mm->mm_sched_epoch) > EPOCH_OLD || + get_nr_threads(p) <=3D 1) { mm->mm_sched_cpu =3D -1; - pcpu_sched->occ =3D -1; + pcpu_sched->occ =3D 0; } } =20 @@ -1278,6 +1285,9 @@ static void task_tick_cache(struct rq *rq, struct tas= k_struct *p) struct callback_head *work =3D &p->cache_work; struct mm_struct *mm =3D p->mm; =20 + if (!sched_feat(SCHED_CACHE)) + return; + if (!mm || !mm->pcpu_sched) return; =20 @@ -1286,16 +1296,13 @@ static void task_tick_cache(struct rq *rq, struct t= ask_struct *p) =20 guard(raw_spinlock)(&mm->mm_sched_lock); =20 - if (mm->mm_sched_epoch =3D=3D rq->cpu_epoch) - return; - if (work->next =3D=3D work) { task_work_add(p, work, TWA_RESUME); WRITE_ONCE(mm->mm_sched_epoch, rq->cpu_epoch); } } =20 -static void task_cache_work(struct callback_head *work) +static void __no_profile task_cache_work(struct callback_head *work) { struct task_struct *p =3D current; struct mm_struct *mm =3D p->mm; @@ -1322,6 +1329,9 @@ static void task_cache_work(struct callback_head *wor= k) unsigned long occ, m_occ =3D 0, a_occ =3D 0; int m_cpu =3D -1, nr =3D 0, i; =20 + if (!sd) + continue; + for_each_cpu(i, sched_domain_span(sd)) { occ =3D fraction_mm_sched(cpu_rq(i), per_cpu_ptr(mm->pcpu_sched, i)); @@ -8815,6 +8825,9 @@ static int select_cache_cpu(struct task_struct *p, in= t prev_cpu) struct mm_struct *mm =3D p->mm; int cpu; =20 + if (!sched_feat(SCHED_CACHE)) + return prev_cpu; + if (!mm || p->nr_cpus_allowed =3D=3D 1) return prev_cpu; =20 @@ -9569,7 +9582,7 @@ static int task_hot(struct task_struct *p, struct lb_= env *env) return 0; =20 #ifdef CONFIG_SCHED_CACHE - if (p->mm && p->mm->pcpu_sched) { + if (sched_feat(SCHED_CACHE) && p->mm && p->mm->pcpu_sched) { /* * XXX things like Skylake have non-inclusive L3 and might not * like this L3 centric view. What to do about L2 stickyness ? @@ -9647,7 +9660,9 @@ static long migrate_degrades_locality(struct task_str= uct *p, struct lb_env *env) } =20 #else -static long __migrate_degrades_locality(struct task_struct *p, int src_cpu= , int dst_cpu, bool idle) +static __maybe_unused long __migrate_degrades_locality(struct task_struct = *p, + int src_cpu, int dst_cpu, + bool idle) { return 0; } diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 3c12d9f93331..d2af7bfd36bf 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -87,6 +87,7 @@ SCHED_FEAT(TTWU_QUEUE, true) */ SCHED_FEAT(SIS_UTIL, true) =20 +SCHED_FEAT(SCHED_CACHE, true) /* * Issue a WARN when we do multiple update_rq_clock() calls * in a single rq->lock section. 
Default disabled because the diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 839463027ab0..f4ab45ecca86 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1174,7 +1174,7 @@ struct rq { u64 clock_idle_copy; #endif #ifdef CONFIG_SCHED_CACHE - raw_spinlock_t cpu_epoch_lock; + raw_spinlock_t cpu_epoch_lock ____cacheline_aligned; u64 cpu_runtime; unsigned long cpu_epoch; unsigned long cpu_epoch_next; --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4AE9D226CFC for ; Sat, 9 Aug 2025 05:07:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716079; cv=none; b=mzMpUPIBhCIfqgfGYLirSyVew1DiGNGy8kHH9pByDFwjQLg/R08SklG4sqt+h9F0MjNW8uROdXW9EhU0eQGBZx9K4bKZLpb32NTZ568kuQTL5xijzNnbyKAfpI4nekWx9gHcKn2NZrcT76Sz4xJ2qgXu5qqYX/ksmmq4ZH9u+uk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716079; c=relaxed/simple; bh=57gcT4d4kI048m64jkmqBDrfcowgupEqZQwQ5AEee0w=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=f3RQRxzvnj66spzaJhbe9MgqAYCa98AUSZuwljPsRXInxq/Oxk06wgAT2vRlhS0ehsQOQHM82nnzblQnvrJvdVfOkRoSiG94h3cOAWLd5yBkPjkPpHdCL+rW9rkGbaTLW4RhQdXSHhYol4ZYkaUUjpFkZLW21Gb6+B8vqJUbOW8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=c7C5KF6q; arc=none smtp.client-ip=198.175.65.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="c7C5KF6q" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716078; x=1786252078; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=57gcT4d4kI048m64jkmqBDrfcowgupEqZQwQ5AEee0w=; b=c7C5KF6q/sRDGDVG+lM2wu4H0TzbLyHIoFXYrwK1PhMTWehBhmUrl96S T3NkwyIpBdauVKsc3hrNWBirHezNT+Ts0OE7838wAraS+qmqOaNyn/zFO uMSRssAaGwLukBsRJhTXc1N5I0Xy/egiTw1fhkKvS4U8SfrTfRWrmwRa3 RPU4tMB524z4Z6MxtH6azdWGiN57MoFd2/dFpTSaE7cXAWavDizO4/WkF yI3XD8KwS9r/rQo9E5DRI45b4Vgd1JhUvkVPHt9fZqza6Nai4EKnx1UNE 0Vq3A218YIyzDKDxbIwkrtpMqkq0EWb+pBp1au3p6UMfK3D2O6VNQOM+h A==; X-CSE-ConnectionGUID: Fqgj+0eySbSqlUZZn4AFOw== X-CSE-MsgGUID: Y/ZbdGdBSSKyPLJRd4lBYA== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="60682928" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="60682928" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by orvoesa107.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:07:58 -0700 X-CSE-ConnectionGUID: vDQ06Yc6TseUjHJDDorjIA== X-CSE-MsgGUID: sVVKpPMjSQmXlszsGV2bRg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="170841605" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa005.jf.intel.com with ESMTP; 08 Aug 2025 22:07:52 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . 
Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 03/28] sched: Avoid task migration within its preferred LLC Date: Sat, 9 Aug 2025 13:01:41 +0800 Message-Id: <37376d2072f6175d2fb909a29b66a3da0bcfcce3.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" It was found that when running schbench, there is a significant amount of in-LLC task migrations, even if the wakee is woken up on its preferred LLC. This leads to core-to-core latency and impairs performance. Inhibit task migration if the wakee is already in its preferred LLC. Signed-off-by: Chen Yu --- kernel/sched/fair.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e97ab46509e3..00bd0d25bc91 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8835,6 +8835,8 @@ static int select_cache_cpu(struct task_struct *p, in= t prev_cpu) if (cpu < 0) return prev_cpu; =20 + if (cpus_share_cache(cpu, prev_cpu)) + return prev_cpu; =20 if (static_branch_likely(&sched_numa_balancing) && __migrate_degrades_locality(p, prev_cpu, cpu, false) > 0) { --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 029841DC994 for ; Sat, 9 Aug 2025 05:08:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716096; cv=none; b=WeFRqzbJuIG6rs2oBAOGSTvUwR0GN2LxtVKltBBp1IAWJ1/M5927lhVOryPqkDV68MiNoDiPeaUuIGeJXy1yxTNPU+76g8h8o2kq0++bTNlmXdtCkgRKkjAyvo6JUXPfk9qDPu5fNyxlfwrmUYWgRrIiKd3DbVL5bDFDsKlvmIg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716096; c=relaxed/simple; bh=3Uph+Pq82/wD/SKt6Wb33FmEmMZN7GyPmlnOYYynb0Q=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=o3E1Bq0Hs0dTChFZYVm4GVOaPeXcpLPhZZEQJ43VOlsuNVair0TGmjdyd1fLlVjeODS5guLeDmjO76w+loIt+jPuKVqUMOnTWw1sIHx/QjCLlCjeJzEknh9dFn7KMZ1m1CPRGI1DknlEPNf/b1KDvycj+UhJPpZyr/+EIpZ7wE0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=KCQ9b2gg; arc=none smtp.client-ip=198.175.65.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="KCQ9b2gg" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716095; x=1786252095; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
From: Chen Yu
To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy"
Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 04/28] sched: Avoid calculating the cpumask if the system is overloaded
Date: Sat, 9 Aug 2025 13:02:04 +0800
Message-Id: <88d1c3bc1e817cc72346f566153a4618604b9ecd.1754712565.git.tim.c.chen@linux.intel.com>

From: K Prateek Nayak

If SIS_UTIL terminates the search for idle CPUs, the result of cpumask_and() becomes irrelevant. Given that select_idle_cpu() may now be invoked twice per wake-up within select_idle_sibling() due to cache-aware wake-ups, this overhead can be observed in benchmarks such as hackbench. To conserve additional cycles, particularly in scenarios where the LLC is frequently targeted and the search aborts because the LLC is busy, calculate the cpumask only when the system is not overloaded.
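A minimal sketch of the resulting control flow (not the kernel code itself; nr_scan_from_util() and the 85% cut-off below are made-up stand-ins for the SIS_UTIL scan-depth computation): the early exit runs first, and the cpumask intersection is only paid once the scan is actually going to happen.

/* Illustrative stand-in for the SIS_UTIL scan-depth computation. */
static int nr_scan_from_util(int llc_util_pct)
{
	return llc_util_pct > 85 ? 0 : 4;	/* hypothetical cut-off */
}

static int select_idle_cpu_sketch(int llc_util_pct)
{
	int nr = nr_scan_from_util(llc_util_pct);

	if (nr <= 0)
		return -1;	/* LLC overloaded: bail out before paying for cpumask_and() */

	/*
	 * Only now intersect the LLC span with the task's affinity mask:
	 * cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
	 * ...then scan up to 'nr' CPUs in 'cpus' for an idle one...
	 */
	return 0;
}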
Signed-off-by: K Prateek Nayak --- kernel/sched/fair.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 00bd0d25bc91..a7be5c5ecba3 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7940,8 +7940,6 @@ static int select_idle_cpu(struct task_struct *p, str= uct sched_domain *sd, bool int i, cpu, idle_cpu =3D -1, nr =3D INT_MAX; struct sched_domain_shared *sd_share; =20 - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); - if (sched_feat(SIS_UTIL)) { sd_share =3D rcu_dereference(per_cpu(sd_llc_shared, target)); if (sd_share) { @@ -7953,6 +7951,8 @@ static int select_idle_cpu(struct task_struct *p, str= uct sched_domain *sd, bool } } =20 + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + if (static_branch_unlikely(&sched_cluster_active)) { struct sched_group *sg =3D sd->groups; =20 --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E3651DE3BE for ; Sat, 9 Aug 2025 05:08:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716112; cv=none; b=PThukVAMb7sFuKxVcgD3wTM4phVJ8rI9r7+ebpAt9DmMUXFw/IDfhBgJzVDETnMraJUzTpjxdg0CH+MqsJFdZpG0+0YGabzIwZ03oS5dGUCpiuwcqMBi79EXkvVi691ZUTMUjUdIFwJpzWnQscUTpYi1EC8GJgP4BnZ2xG+wmR8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716112; c=relaxed/simple; bh=LDYuA5WJyjvULccdhS0DHdWu2p7tGoJ/eeAWhiaAlrg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=bj8SEXlQMrLRUboVVGnQOu9JBe5g36Sf1XMNZ4Fxig9ZZalYOFEmmdY0+oe5Ky5U018MFmBrwaWfJieFQcobideyIOiTbWBmhitES6gj23mv2S9buVmE5umygmQde5ClZGVf904vuEv77RefIPDld9g1BbkRQFkgRCwN1dgJYBw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Yz3P0As2; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Yz3P0As2" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716109; x=1786252109; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LDYuA5WJyjvULccdhS0DHdWu2p7tGoJ/eeAWhiaAlrg=; b=Yz3P0As2P7sAAeLgT7nQe+nKKbEUL/+9rKdSGXrRCerJoyPyZVJiAIca rTCc2mIWo0r7xKXHBwJCzl+lhPLNxfq7ThXFDqw/086ptM6gsSmcdBYFy 0XG6Bpx4G8F6WIlomNDg2uKFh3+Gf6iv4ohkTrkI1AR9d2HRIWlbSXqPg gjIc3qKxMHdoEmw84F/oRaqsVQVLHKyRLGcXSUZ869pJdp3tCl5EFYIHx RipzC73I4/a7J8WSfr1XW9s1QojcqMVZE0c1LndRlkFmT99Paa711cvo0 L3/AK8mOeiqf6B9FcyzKS+XWq8jtfhVABayP9NRmDhXta4Wem5Y5cg2O/ Q==; X-CSE-ConnectionGUID: 1ql6YZT/RpyNFYw0G8nvRg== X-CSE-MsgGUID: Py+W8Q8/QNGIYDmbtHMFoQ== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="57019866" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="57019866" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 
2025 22:08:29 -0700 X-CSE-ConnectionGUID: IcQ9RT+CQgWiji82ikx4LA== X-CSE-MsgGUID: ojuGhmoFTkKNKnNcd1Z5hA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="169704810" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa005.fm.intel.com with ESMTP; 08 Aug 2025 22:08:23 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 05/28] sched: Add hysteresis to switch a task's preferred LLC Date: Sat, 9 Aug 2025 13:02:18 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen Switching a process's preferred LLC generates lots of task migrations across LLCs. To avoid frequent switches of home LLC, implement the following policy: 1. Require a 2x occ change threshold to switch preferred LLC 2. Don't discard preferred LLC for a task Signed-off-by: Tim Chen --- kernel/sched/fair.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index a7be5c5ecba3..9e3c6f0eb934 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1175,6 +1175,14 @@ static s64 update_curr_se(struct rq *rq, struct sche= d_entity *curr) #define EPOCH_PERIOD (HZ/100) /* 10 ms */ #define EPOCH_OLD 5 /* 50 ms */ =20 +static int llc_id(int cpu) +{ + if (cpu < 0) + return -1; + + return per_cpu(sd_llc_id, cpu); +} + void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_s= ched) { unsigned long epoch; @@ -1307,6 +1315,7 @@ static void __no_profile task_cache_work(struct callb= ack_head *work) struct task_struct *p =3D current; struct mm_struct *mm =3D p->mm; unsigned long m_a_occ =3D 0; + unsigned long last_m_a_occ =3D 0; int cpu, m_a_cpu =3D -1; cpumask_var_t cpus; =20 @@ -1345,11 +1354,13 @@ static void __no_profile task_cache_work(struct cal= lback_head *work) per_cpu(sd_llc_id, i), occ, m_occ, m_cpu, nr); } =20 - a_occ /=3D nr; + // a_occ /=3D nr; if (a_occ > m_a_occ) { m_a_occ =3D a_occ; m_a_cpu =3D m_cpu; } + if (llc_id(cpu) =3D=3D llc_id(mm->mm_sched_cpu)) + last_m_a_occ =3D a_occ; =20 trace_printk("(%d) a_occ: %ld m_a_occ: %ld\n", per_cpu(sd_llc_id, cpu), a_occ, m_a_occ); @@ -1363,13 +1374,10 @@ static void __no_profile task_cache_work(struct cal= lback_head *work) } } =20 - /* - * If the max average cache occupancy is 'small' we don't care. 
- */ - if (m_a_occ < (NICE_0_LOAD >> EPOCH_OLD)) - m_a_cpu =3D -1; - - mm->mm_sched_cpu =3D m_a_cpu; + if (m_a_occ > (2 * last_m_a_occ)) { + /* avoid the bouncing of mm_sched_cpu */ + mm->mm_sched_cpu =3D m_a_cpu; + } =20 free_cpumask_var(cpus); } --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F24CE21D3E1 for ; Sat, 9 Aug 2025 05:09:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716147; cv=none; b=LI4h+6OJWZaBS0TR7Q3NBzoXkmy9JmDvvsP0v6h3Wr+GpKZR5W1whx1t+MpULY/tpCopCTQwtk+d4eHYbBbXGG8tw911CUm66GpfCtas8ctsmrrtOtpyFMSEQ1wSEmj4dWMkrZhPa8ugb3u+CyP9djkHSe8sZ2gx2yAfRIPx9CU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716147; c=relaxed/simple; bh=euRWyMdufoCin//rgMJ4T3fWMZpHw1jQB7L14khTmb4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=FhwU+egXgnivL5wAIp8WWClNxp4yT7p2+qq0OVtPsZHuRHUqSefqGa8Mw69mD2l/SvPpXJI146UF1gIL0OohXuZVBnBx9uuzjtJB0fwJPdjrZGa51C5jL55hj27fGIPyUJpmNwnKZK7cVsQFji5MJ9gcLGaigOLthcWnAQfM4Ww= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=XB8zwFIH; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="XB8zwFIH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716146; x=1786252146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=euRWyMdufoCin//rgMJ4T3fWMZpHw1jQB7L14khTmb4=; b=XB8zwFIHW5tROwSjY+TkhHve8zxklhKn+ovNLIF8SV5U0iIW3uisGZ+h hY/ESXsB+pgUaisyny/yy5pPLNKo8MGtQy5YXsFvNJCRR/qBzsPTciwpd DWKZJ1KhyMu77ycu6eYUAzqZuN/gRnxFRxIu0gB+CDzvdaP0yW6Alm3q2 6uSDm53TwoaOggR9d3iPh9Z+dpDEn90e2yYpi8OZHptMKcxOxMQuhOE8g XyVt9GJRY6uXVn+Xhk0ObrEJv4d8fU3+v015Xl9/d69ko5uk8uOcrTvoC 5KhPHr2patZHHRizOM78ma1nH5m9MLqfkUzr1tVCq2xOpXisttW9XZBQc Q==; X-CSE-ConnectionGUID: fZf0KiitSgGguAZwYKknTQ== X-CSE-MsgGUID: qtUqGlf/QgGtoI55sDwkdQ== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="57137770" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="57137770" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:09:05 -0700 X-CSE-ConnectionGUID: 7a50TPFuTq2l0110KDRwEA== X-CSE-MsgGUID: IsqZoesVQLiHgA5hLOCuhg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165374544" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa006.fm.intel.com with ESMTP; 08 Aug 2025 22:09:00 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . 
Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 06/28] sched: Save the per LLC utilization for better cache aware scheduling Date: Sat, 9 Aug 2025 13:02:54 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When a system gets busy and a process's preferred LLC is saturated by too many threads within this process, there are significant in-LLC task migrations within its preferred LLC. This leads to migration latency and degrades performance. Ideally, task aggregation should be inhibited if the task's preferred LLC is overloaded. This implies that a metric is needed to indicate whether the LLC is busy. Store the per-LLC utilization calculated via periodic load balancing. These statistics will be used in subsequent patches to determine whether tasks should be aggregated to their preferred LLC. Signed-off-by: Chen Yu --- include/linux/sched/topology.h | 3 ++ kernel/sched/fair.c | 53 ++++++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+) diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h index 198bb5cc1774..692f8a703b93 100644 --- a/include/linux/sched/topology.h +++ b/include/linux/sched/topology.h @@ -78,6 +78,9 @@ struct sched_domain_shared { atomic_t nr_busy_cpus; int has_idle_cores; int nr_idle_scan; +#ifdef CONFIG_SCHED_CACHE + unsigned long util_avg; +#endif }; =20 struct sched_domain { diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9e3c6f0eb934..4f79b7652642 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8828,6 +8828,22 @@ static int find_energy_efficient_cpu(struct task_str= uct *p, int prev_cpu) #ifdef CONFIG_SCHED_CACHE static long __migrate_degrades_locality(struct task_struct *p, int src_cpu= , int dst_cpu, bool idle); =20 +/* expected to be protected by rcu_read_lock() */ +static bool get_llc_stats(int cpu, unsigned long *util, + unsigned long *cap) +{ + struct sched_domain_shared *sd_share; + + sd_share =3D rcu_dereference(per_cpu(sd_llc_shared, cpu)); + if (!sd_share) + return false; + + *util =3D READ_ONCE(sd_share->util_avg); + *cap =3D per_cpu(sd_llc_size, cpu) * SCHED_CAPACITY_SCALE; + + return true; +} + static int select_cache_cpu(struct task_struct *p, int prev_cpu) { struct mm_struct *mm =3D p->mm; @@ -10670,6 +10686,42 @@ sched_reduced_capacity(struct rq *rq, struct sched= _domain *sd) return check_cpu_capacity(rq, sd); } =20 +#ifdef CONFIG_SCHED_CACHE +/* + * Save this sched group's statistic for later use: + * The task wakeup and load balance can make better + * decision based on these statistics. + */ +static void update_sg_if_llc(struct lb_env *env, struct sg_lb_stats *sgs, + struct sched_group *group) +{ + /* Find the sched domain that spans this group. 
*/ + struct sched_domain *sd =3D env->sd->child; + struct sched_domain_shared *sd_share; + + if (!sched_feat(SCHED_CACHE) || env->idle =3D=3D CPU_NEWLY_IDLE) + return; + + /* only care the sched domain that spans 1 LLC */ + if (!sd || !(sd->flags & SD_SHARE_LLC) || + !sd->parent || (sd->parent->flags & SD_SHARE_LLC)) + return; + + sd_share =3D rcu_dereference(per_cpu(sd_llc_shared, + cpumask_first(sched_group_span(group)))); + if (!sd_share) + return; + + if (likely(READ_ONCE(sd_share->util_avg) !=3D sgs->group_util)) + WRITE_ONCE(sd_share->util_avg, sgs->group_util); +} +#else +static inline void update_sg_if_llc(struct lb_env *env, struct sg_lb_stats= *sgs, + struct sched_group *group) +{ +} +#endif + /** * update_sg_lb_stats - Update sched_group's statistics for load balancing. * @env: The load balancing environment. @@ -10759,6 +10811,7 @@ static inline void update_sg_lb_stats(struct lb_env= *env, =20 sgs->group_type =3D group_classify(env->sd->imbalance_pct, group, sgs); =20 + update_sg_if_llc(env, sgs, group); /* Computing avg_load makes sense only when group is overloaded */ if (sgs->group_type =3D=3D group_overloaded) sgs->avg_load =3D (sgs->group_load * SCHED_CAPACITY_SCALE) / --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A097E125D6 for ; Sat, 9 Aug 2025 05:09:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716161; cv=none; b=Rm7TcpHSNUIIoFYEPwrZQSk2+mFBFaIi8Biv/YBu8NhjoOpqLYiSDc8n/N7a+PcWKj1D5lN8yxsfXFZpqwNZo9V27otdxT/bMNi/j+pCcQsy85gMx6mqoYUfLUdWB0a4zERaoznppBe6okhDs8L/kX4GGnSX8g32CRW4pXc/jS4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716161; c=relaxed/simple; bh=V4wQFWsJMut3Mv4WM/pNBpXZswfWqWCXgqNTuOQgQwc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=tR/Eg9pwE3kfKHQ1XrSvfns7tRo1/54dBTa3Mcw/Vf9PUP5J9yjcpYZjzIWQ96CkLtFqrg19Zl0Cj25CGXm9QEsmQiICsQrbY8sep3kg5LmP1PeugdTFvBMNkphNKcutc0NLmiPwdKekqx5gwUaOM5x4KAy1UaLJ56yS2wqthlE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=cZoFv+co; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="cZoFv+co" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716160; x=1786252160; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=V4wQFWsJMut3Mv4WM/pNBpXZswfWqWCXgqNTuOQgQwc=; b=cZoFv+cooYycPX6fhTDFz3M70RpTXLjWW54A4RKCHWeGADL7LpNgffxW gBfRlECeZkPmt7ZbNEgFqOrc7h37RbVfI0hpXkfJXAbSVhFqX1dyT3XdI KBuYaf5c3EOsWZOREhMvQUtHsoLmWta+xL56O2v1gsR3leEwTYp2Wagee zuCK7oxtqjbXilAu6g6eLj5fAL1la9xryvQW3Hx9lwncNu2ChThNoNOIL 8rqMMvopoFaWOd9vVKtpzIX0eyrh2S0jAjm/gycY3Z9ipQFIzNCz17K0J RJ7dwofq53rsYOMlEnDTxOs2VjSm+OkCdnAYfR59wl5PkmnVd8HuSY4K7 g==; X-CSE-ConnectionGUID: 
U6TaVEBsQ1urnPH9VT6FpQ== X-CSE-MsgGUID: xLUFPU/zRaayB1BNHTCWGw== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="57137798" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="57137798" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:09:19 -0700 X-CSE-ConnectionGUID: MAk8xVHNTeiRA2GQcBH6/g== X-CSE-MsgGUID: rE8RpT+sQLqRJLnxvLOvng== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165374559" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa006.fm.intel.com with ESMTP; 08 Aug 2025 22:09:13 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 07/28] sched: Add helper function to decide whether to allow cache aware scheduling Date: Sat, 9 Aug 2025 13:03:10 +0800 Message-Id: <701c7be7f0e69582d9ad0c25025ec2e133e73fbb.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen Cache-aware scheduling is designed to aggregate threads into their preferred LLC, either via the task wake up path or the load balancing path. One side effect is that when the preferred LLC is saturated, more threads will continue to be stacked on it, degrading the workload's latency. A strategy is needed to prevent this aggregation from going too far such that the preferred LLC is too overloaded. Introduce helper function _get_migrate_hint() to implement the LLC migration policy: 1) A task is aggregated to its preferred LLC if both source/dest LLC are not too busy (<50% utilization, tunable), or the preferred LLC will not be too out of balanced from the non preferred LLC (>20% utilization, tunable, close to imbalance_pct of the LLC domain). 2) Allow a task to be moved from the preferred LLC to the non-preferred one if the non-preferred LLC will not be too out of balanced from the preferred prompting an aggregation task migration later. We are still experimenting with the aggregation and migration policy. Some other possibilities are policy based on LLC's load or average number of tasks running. Those could be tried out by tweaking _get_migrate_hint(). The function _get_migrate_hint() returns migration suggestions for the upper-level functions. Aggregation will tend to make utilization on the preferred LLC to be more than the non-preferred one. Parameter "sysctl_llc_aggr_imb" is the imbalance allowed. If it is set to 0, as long as the preferred LLC is not utilized more than the source LLC, we could still aggregate towards the preferred LLC and a preference could still be there. 
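As a rough illustration of how the default thresholds interact, consider the standalone sketch below (not part of the patch). It mirrors the fits_llc_capacity()/util_greater() checks for the "migrate towards the preferred LLC" direction, assuming a hypothetical 16-CPU LLC with capacity expressed as SCHED_CAPACITY_SCALE (1024) per CPU; the utilization numbers are made up for the example.

/*
 * Standalone sketch of the default _get_migrate_hint() thresholds
 * (llc_aggr_cap = 50, llc_aggr_imb = 20) for a move towards the
 * preferred LLC.  The 16-CPU LLC and utilization values are invented
 * for illustration only.
 */
#include <stdio.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE	1024
#define LLC_AGGR_CAP		50	/* % of LLC capacity */
#define LLC_AGGR_IMB		20	/* allowed imbalance, % */

static bool fits_llc_capacity(unsigned long util, unsigned long cap)
{
	return util * 100 < cap * LLC_AGGR_CAP;
}

static bool util_greater(unsigned long u1, unsigned long u2)
{
	return u1 * 100 > u2 * (100 + LLC_AGGR_IMB);
}

int main(void)
{
	unsigned long cap = 16 * SCHED_CAPACITY_SCALE;	/* 16-CPU LLC */
	/* utilization of each LLC after the task has been moved */
	unsigned long src_util = 7000, dst_util = 9000;

	/*
	 * Aggregation is refused only when the preferred LLC would end up
	 * both above 50% of its capacity and more than 20% busier than
	 * the source LLC.
	 */
	if (!fits_llc_capacity(dst_util, cap) && util_greater(dst_util, src_util))
		printf("mig_forbid\n");	/* 9000 > 8192 and 9000 > 1.2 * 7000 */
	else
		printf("mig_allow\n");
	return 0;
}

With src_util = 8000 instead of 7000, util_greater() no longer holds (9000 < 9600) and the move would still be allowed, i.e. the preference is kept as long as the resulting imbalance stays within sysctl_llc_aggr_imb percent.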
Co-developed-by: Chen Yu Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- kernel/sched/debug.c | 4 ++ kernel/sched/fair.c | 110 ++++++++++++++++++++++++++++++++++++++++++- kernel/sched/sched.h | 5 ++ 3 files changed, 118 insertions(+), 1 deletion(-) diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 557246880a7e..682fd91a42a0 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -532,6 +532,10 @@ static __init int sched_init_debug(void) debugfs_create_u32("hot_threshold_ms", 0644, numa, &sysctl_numa_balancing= _hot_threshold); #endif =20 +#ifdef CONFIG_SCHED_CACHE + debugfs_create_u32("llc_aggr_cap", 0644, debugfs_sched, &sysctl_llc_aggr_= cap); + debugfs_create_u32("llc_aggr_imb", 0644, debugfs_sched, &sysctl_llc_aggr_= imb); +#endif debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops= ); =20 debugfs_fair_server_init(); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 4f79b7652642..3128dbcf0a36 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8826,7 +8826,39 @@ static int find_energy_efficient_cpu(struct task_str= uct *p, int prev_cpu) } =20 #ifdef CONFIG_SCHED_CACHE -static long __migrate_degrades_locality(struct task_struct *p, int src_cpu= , int dst_cpu, bool idle); +static long __migrate_degrades_locality(struct task_struct *p, + int src_cpu, int dst_cpu, + bool idle); +__read_mostly unsigned int sysctl_llc_aggr_cap =3D 50; +__read_mostly unsigned int sysctl_llc_aggr_imb =3D 20; + +/* + * The margin used when comparing LLC utilization with CPU capacity. + * Parameter sysctl_llc_aggr_cap determines the LLC load level where + * active LLC aggregation is done. + * Derived from fits_capacity(). + * + * (default: ~50%) + */ +#define fits_llc_capacity(util, max) \ + ((util) * 100 < (max) * sysctl_llc_aggr_cap) + +/* + * The margin used when comparing utilization. + * is 'util1' noticeably greater than 'util2' + * Derived from capacity_greater(). + * Bias is in perentage. + */ +/* Allows dst util to be bigger than src util by up to bias percent */ +#define util_greater(util1, util2) \ + ((util1) * 100 > (util2) * (100 + sysctl_llc_aggr_imb)) + +enum llc_mig_hint { + mig_allow =3D 0, + mig_ignore, + mig_forbid +}; + =20 /* expected to be protected by rcu_read_lock() */ static bool get_llc_stats(int cpu, unsigned long *util, @@ -8844,6 +8876,82 @@ static bool get_llc_stats(int cpu, unsigned long *ut= il, return true; } =20 +static enum llc_mig_hint _get_migrate_hint(int src_cpu, int dst_cpu, + unsigned long tsk_util, + bool to_pref) +{ + unsigned long src_util, dst_util, src_cap, dst_cap; + + if (cpus_share_cache(src_cpu, dst_cpu)) + return mig_allow; + + if (!get_llc_stats(src_cpu, &src_util, &src_cap) || + !get_llc_stats(dst_cpu, &dst_util, &dst_cap)) + return mig_ignore; + + if (!fits_llc_capacity(dst_util, dst_cap) && + !fits_llc_capacity(src_util, src_cap)) + return mig_ignore; + + src_util =3D src_util < tsk_util ? 0 : src_util - tsk_util; + dst_util =3D dst_util + tsk_util; + if (to_pref) { + /* + * sysctl_llc_aggr_imb is the imbalance allowed between + * preferred LLC and non-preferred LLC. + * Don't migrate if we will get preferred LLC too + * heavily loaded and if the dest is much busier + * than the src, in which case migration will + * increase the imbalance too much. 
+ */ + if (!fits_llc_capacity(dst_util, dst_cap) && + util_greater(dst_util, src_util)) + return mig_forbid; + } else { + /* + * Don't migrate if we will leave preferred LLC + * too idle, or if this migration leads to the + * non-preferred LLC falls within sysctl_aggr_imb percent + * of preferred LLC, leading to migration again + * back to preferred LLC. + */ + if (fits_llc_capacity(src_util, src_cap) || + !util_greater(src_util, dst_util)) + return mig_forbid; + } + return mig_allow; +} + +/* + * Give suggestion when task p is migrated from src_cpu to dst_cpu. + */ +static __maybe_unused enum llc_mig_hint get_migrate_hint(int src_cpu, int = dst_cpu, + struct task_struct *p) +{ + struct mm_struct *mm; + int cpu; + + if (cpus_share_cache(src_cpu, dst_cpu)) + return mig_allow; + + mm =3D p->mm; + if (!mm) + return mig_allow; + + cpu =3D mm->mm_sched_cpu; + if (cpu < 0) + return mig_allow; + + if (cpus_share_cache(dst_cpu, cpu)) + return _get_migrate_hint(src_cpu, dst_cpu, + task_util(p), true); + else if (cpus_share_cache(src_cpu, cpu)) + return _get_migrate_hint(src_cpu, dst_cpu, + task_util(p), false); + else + return mig_allow; +} + static int select_cache_cpu(struct task_struct *p, int prev_cpu) { struct mm_struct *mm =3D p->mm; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index f4ab45ecca86..83552aab74fb 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2844,6 +2844,11 @@ extern unsigned int sysctl_numa_balancing_scan_perio= d_max; extern unsigned int sysctl_numa_balancing_scan_size; extern unsigned int sysctl_numa_balancing_hot_threshold; =20 +#ifdef CONFIG_SCHED_CACHE +extern unsigned int sysctl_llc_aggr_cap; +extern unsigned int sysctl_llc_aggr_imb; +#endif + #ifdef CONFIG_SCHED_HRTICK =20 /* --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7A24F2749D5 for ; Sat, 9 Aug 2025 05:09:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716177; cv=none; b=NNJRjopqrRbhFk19x1BndJWZ90HqNxRnx0H7JE+07eSr/bdUJMU/c0NJ3LB2cV94Rsi1R1AdGM1d2xlML1jh2RnTHB1Dzdvr0qBwgdFuA8zjncQEUZO6kHF1Y2GSQmE70Toj/gzstTrxtr3JAqld0iuOXw9GF3i3gZmGNxoXo9Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716177; c=relaxed/simple; bh=Q9WyVWFAxaya8q+BQJRZR5gSlB4xoSQBTrL3eFp0H3k=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=OJCxUgjq3LhGvzfQnlmgDk0eZNcoMO3+ooDXAn+dS2XPaoAD00XBVRhoIDDmzlGUakQWDO9E3wgaByP2px0tcqYpCXMaz1PTKVbRf0IMfCL7wmj/Pl6WrMp5Uk7woFJsPxrVPRghCGLe/mewLfBt3ueL13B0csiHsH9qkS/YD8s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=OW5YzRQl; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="OW5YzRQl" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716176; 
x=1786252176; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Q9WyVWFAxaya8q+BQJRZR5gSlB4xoSQBTrL3eFp0H3k=; b=OW5YzRQlQh/nBHXwqrjN0l8Y7LfITnFjeexFpLxenGyiJvbsUX2LdH+f NmdygIjqrdclruZgedLbBxuvOrc8rS64ODYq+fjwfwXQVB0yteRfnTP/u KUJ6NLJp7E5qLizUurKDYQ/CQ8WhKvO+A1CCWJcny7Ywyk5pWHn0+ihL2 fyfKV1cKZRnLjLxVHkt7AZCj9E7OPIlGwDuDChPwUD61pbaKxh7wR9gpr q7g35VzJcDGPAJtv/VzN73wW2yx/6zcGH0VLxrR+XHBCqIvDlMHA2v4f0 DYgKaiB40pfFiu4dwUi7Ps3HxC6vxt6/7c8fQVryZXz/WrZQ5I8EfdNEJ A==; X-CSE-ConnectionGUID: Y02Bl/8pShaUv1kRhHj39A== X-CSE-MsgGUID: 3/idV79SS1m5g1kaTEBzDw== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="57137811" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="57137811" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:09:34 -0700 X-CSE-ConnectionGUID: XuuWOYxBTOOma0SA+bSkWg== X-CSE-MsgGUID: Ip2uRtmWRum9W2ItftRIPg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165374575" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa006.fm.intel.com with ESMTP; 08 Aug 2025 22:09:28 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 08/28] sched: Set up LLC indexing Date: Sat, 9 Aug 2025 13:03:24 +0800 Message-Id: <959d897daadc28b8115c97df04eec2af0fd79c5d.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen Prepare for indexing arrays that track in each run queue: the number of tasks preferring current LLC and each of the other LLC. The reason to introduce LLC index is because the per LLC-scope data is needed to do cache aware load balancing. However, the native lld_id is usually the first CPU of that LLC domain, which is not continuous, which might waste the space if the per LLC-scope data is stored in an array (in current implementation). In the future, this LLC index could be removed after the native llc_id is used as the key to search into xarray based array. Signed-off-by: Tim Chen --- include/linux/sched.h | 3 +++ kernel/sched/fair.c | 12 ++++++++++++ kernel/sched/sched.h | 2 ++ kernel/sched/topology.c | 29 +++++++++++++++++++++++++++++ 4 files changed, 46 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 02ff8b8be25b..81d92e8097f5 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -809,6 +809,9 @@ struct kmap_ctrl { #endif }; =20 +/* XXX need fix to not use magic number */ +#define MAX_LLC 64 + struct task_struct { #ifdef CONFIG_THREAD_INFO_IN_TASK /* diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3128dbcf0a36..f5075d287c51 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1183,6 +1183,18 @@ static int llc_id(int cpu) return per_cpu(sd_llc_id, cpu); } =20 +/* + * continuous index. 
+ * TBD: replace by xarray with key llc_id() + */ +static inline int llc_idx(int cpu) +{ + if (cpu < 0) + return -1; + + return per_cpu(sd_llc_idx, cpu); +} + void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_s= ched) { unsigned long epoch; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 83552aab74fb..c37c74dfce25 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2056,6 +2056,7 @@ static inline struct sched_domain *lowest_flag_domain= (int cpu, int flag) DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc); DECLARE_PER_CPU(int, sd_llc_size); DECLARE_PER_CPU(int, sd_llc_id); +DECLARE_PER_CPU(int, sd_llc_idx); DECLARE_PER_CPU(int, sd_share_id); DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared); DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa); @@ -2064,6 +2065,7 @@ DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_= cpucapacity); =20 extern struct static_key_false sched_asym_cpucapacity; extern struct static_key_false sched_cluster_active; +extern int max_llcs; =20 static __always_inline bool sched_asym_cpucap_active(void) { diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index b958fe48e020..91a2b7f65fee 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -657,6 +657,7 @@ static void destroy_sched_domains(struct sched_domain *= sd) DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc); DEFINE_PER_CPU(int, sd_llc_size); DEFINE_PER_CPU(int, sd_llc_id); +DEFINE_PER_CPU(int, sd_llc_idx); DEFINE_PER_CPU(int, sd_share_id); DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared); DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa); @@ -666,6 +667,25 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cp= ucapacity); DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity); DEFINE_STATIC_KEY_FALSE(sched_cluster_active); =20 +int max_llcs =3D -1; + +static void update_llc_idx(int cpu) +{ +#ifdef CONFIG_SCHED_CACHE + int idx =3D -1, llc_id =3D -1; + + llc_id =3D per_cpu(sd_llc_id, cpu); + idx =3D per_cpu(sd_llc_idx, llc_id); + + if (idx < 0) { + idx =3D max_llcs++; + BUG_ON(idx > MAX_LLC); + per_cpu(sd_llc_idx, llc_id) =3D idx; + } + per_cpu(sd_llc_idx, cpu) =3D idx; +#endif +} + static void update_top_cache_domain(int cpu) { struct sched_domain_shared *sds =3D NULL; @@ -684,6 +704,7 @@ static void update_top_cache_domain(int cpu) per_cpu(sd_llc_size, cpu) =3D size; per_cpu(sd_llc_id, cpu) =3D id; rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds); + update_llc_idx(cpu); =20 sd =3D lowest_flag_domain(cpu, SD_CLUSTER); if (sd) @@ -2456,6 +2477,14 @@ build_sched_domains(const struct cpumask *cpu_map, s= truct sched_domain_attr *att bool has_asym =3D false; bool has_cluster =3D false; =20 +#ifdef CONFIG_SCHED_CACHE + if (max_llcs < 0) { + for_each_possible_cpu(i) + per_cpu(sd_llc_idx, i) =3D -1; + max_llcs =3D 0; + } +#endif + if (WARN_ON(cpumask_empty(cpu_map))) goto error; =20 --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 72B6D15665C for ; Sat, 9 Aug 2025 05:10:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716202; cv=none; 
b=b2L1GhXR1bKfaGNtSt2TSFJgivwyGWl/zG0ke7CHSqEAcHuFzHVF+Vph/AG5ZfJphDMNbIxy4SUFIumjAOZK2TB0Z4jGWHlzOvGKs9kRxGy3WkdJTVEO3FLULtEJnBKj5AORTkYZlfIB4LE4Izx1MQm/ZkRn8Sz9XQb/WKOg49I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716202; c=relaxed/simple; bh=yDyElrdoJP1owudVvXOmuFFmGrAdqlZ/3LSJv9270PI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=j3HhsKTtfxpLA3rJndUgWlhiEvEp59nMppnJBuBt3n57eWQsLpmRNumx60yqEWTiU+2a0Akk/6QCT1AAxHoly+zNGqSyQJ4Og7AOKWr039BhdBdB9rB0XwOlLRo6MBk6oRA3xZMgTm/i1/Glk/eLvpJOrXxvUAqHVunt8/0Gy+Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=jA/q+1HC; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="jA/q+1HC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716201; x=1786252201; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yDyElrdoJP1owudVvXOmuFFmGrAdqlZ/3LSJv9270PI=; b=jA/q+1HCZoWL9/if9zUiP2RagKNIlzVG2/9XmXTq+Ai1+idEA3GniBpc 0U+c7IPLnvxvaiyKdJCCUu5unbGE6uc4OZCK3b3LFPBZpAbM8stdCZMnd Wj/PbIIK7iHErHRNJoSuTG2Hz3Kd1S2DZTWM7lcoF8Rml/dJplEh3gVCt vpngwG0Zm9NV0fxTmPcRsqshl1tnvy4tttj+WdiTSfQEPhYj49I+gD0bh 3UQewsPVTarSIp+hr1KFG3cogmN+Rd4lGhrxiPXp8zr5spR5put/n5Xyn 1MEtslmziwaMvG+ZfcPPao0HwFNVTDTVL4ngyCM61uwjqrVJWtD3T8BQ/ Q==; X-CSE-ConnectionGUID: W/xDy5btSjOwlHAt10VALQ== X-CSE-MsgGUID: hD1wAyqHT7WAJ49GLugw0A== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="68139885" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="68139885" Received: from fmviesa008.fm.intel.com ([10.60.135.148]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:10:00 -0700 X-CSE-ConnectionGUID: c2jIhCe2THSYJtrA2gygJg== X-CSE-MsgGUID: PCF36aw+QPy8wB+b//2Ymw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165891237" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa008.fm.intel.com with ESMTP; 08 Aug 2025 22:09:54 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 09/28] sched: Introduce task preferred LLC field Date: Sat, 9 Aug 2025 13:03:50 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen With cache aware scheduling enabled, each process is assigned a preferred LLC id, which will be used to quickly identify the LLC domain this thread prefers to run. This is similar to numa_preferred_nid for NUMA balance. 
Signed-off-by: Tim Chen --- include/linux/sched.h | 1 + init/init_task.c | 3 +++ kernel/sched/fair.c | 7 +++++++ 3 files changed, 11 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 81d92e8097f5..ac4973728c3e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1408,6 +1408,7 @@ struct task_struct { =20 #ifdef CONFIG_SCHED_CACHE struct callback_head cache_work; + int preferred_llc; #endif =20 #ifdef CONFIG_RSEQ diff --git a/init/init_task.c b/init/init_task.c index e557f622bd90..5fffbe766f57 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -188,6 +188,9 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = =3D { .numa_group =3D NULL, .numa_faults =3D NULL, #endif +#ifdef CONFIG_SCHED_CACHE + .preferred_llc =3D -1, +#endif #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) .kasan_depth =3D 1, #endif diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f5075d287c51..94ad84ba19e1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1267,6 +1267,7 @@ void account_mm_sched(struct rq *rq, struct task_stru= ct *p, s64 delta_exec) struct mm_struct *mm =3D p->mm; struct mm_sched *pcpu_sched; unsigned long epoch; + int mm_sched_llc =3D -1; =20 if (!sched_feat(SCHED_CACHE)) return; @@ -1298,6 +1299,12 @@ void account_mm_sched(struct rq *rq, struct task_str= uct *p, s64 delta_exec) mm->mm_sched_cpu =3D -1; pcpu_sched->occ =3D 0; } + + if (mm->mm_sched_cpu !=3D -1) + mm_sched_llc =3D per_cpu(sd_llc_id, mm->mm_sched_cpu); + + if (p->preferred_llc !=3D mm_sched_llc) + p->preferred_llc =3D mm_sched_llc; } =20 static void task_tick_cache(struct rq *rq, struct task_struct *p) --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 89D17157E6B for ; Sat, 9 Aug 2025 05:10:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716215; cv=none; b=YUFlGTDaJF85FWib/Q7a+Wsp38xQlzoRV7AlyUsBs2T1Nm/D3GTDbCOLCu/JYT4Bz1kY7FBLiNzXW6SeRLGEdj+kiCkLEbPCq1Dkw53ko18P2N2wbe+qOYsR7L33XPzRdv8x7pin6JN4QQ3K2vGxPtYxzPr3f13C84cowPsbc0I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716215; c=relaxed/simple; bh=z06d0DcakiSO/sbAcrP1nzo0jya94AyoNYzp6vdKu+E=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Ie5cAVqHtvb8rhtiw/uv2b7q9XOHp9GEaOPJqeh1V5G10tNLgO252LhnRQe82GxwRDsA9JqO8Z/Pn7dGb6/9pzL+eDl4+d0jt7D8uohXyByy5gEgRhnGJ+jOPjC5jrJlsyqSb7bw9iDQtCUKZPX6LrQnQOAMGwjypK1vT60zh4Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=A43oUD+X; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="A43oUD+X" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716214; x=1786252214; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=z06d0DcakiSO/sbAcrP1nzo0jya94AyoNYzp6vdKu+E=; b=A43oUD+XrpEd//+VfIwxy0/clww1jBOSaqukWPErBvX/YdTyCblU2SZc 2RzVMdCc8oIYfY7mAnN+bu4TlDneL5nuPMn8idWsUe/ibRoK5MwcfHrk1 4wf705GDHnZFwZzx7MaW2tVbkko9eMPuBBX9wEZV9YRSNvgsYWWcfkB9f X09DYxiaF6aoyy46GVmca0RePk7ZqdJVl5uzZAHcWSo20QuUXb6HVtKNk d843I5ITdrSq6lu1g0W1GAYdjZ+obzYIC4503sdpdA31Ura1IlPBWiexd +xto3bCIJV99nwMfneStXQQK9UCe9VrSRX+40SkrUq9jsfJUv0KlPZ3A8 Q==; X-CSE-ConnectionGUID: plGWHn/OT2iWZBXIShk1/w== X-CSE-MsgGUID: HnPTC8sbTiuVF1BIb62/Dg== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="68139903" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="68139903" Received: from fmviesa008.fm.intel.com ([10.60.135.148]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:10:13 -0700 X-CSE-ConnectionGUID: UNc6FU+9Rn6IWN1WcGQpPA== X-CSE-MsgGUID: H1SRXM8VR6yqBYvNRQaDcA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165891283" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa008.fm.intel.com with ESMTP; 08 Aug 2025 22:10:07 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 10/28] sched: Calculate the number of tasks that have LLC preference on a runqueue Date: Sat, 9 Aug 2025 13:04:04 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen Track for each run queue, the number of tasks that have a LLC preference and how many of those tasks are running in its preferred LLC. This is similar to nr_numa_running and nr_preferred_running for NUMA balance, and will be used by the cache-aware load balancing in subsequent patches. 
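To make the semantics of the two counters concrete, here is a small userspace illustration; the task list and LLC ids are invented for the example, and in the kernel the counters are maintained incrementally on enqueue/dequeue rather than recomputed like this.

/*
 * Illustration of the per-runqueue counters added by this patch.
 * Three tasks are queued on a runqueue whose CPU belongs to LLC 1.
 */
#include <stdio.h>

struct task {
	int preferred_llc;	/* -1 means no preference */
	int task_llc;		/* LLC of the CPU the task is queued on */
};

int main(void)
{
	struct task rq_tasks[] = {
		{ .preferred_llc =  1, .task_llc = 1 },	/* runs in its preferred LLC */
		{ .preferred_llc =  3, .task_llc = 1 },	/* prefers a remote LLC      */
		{ .preferred_llc = -1, .task_llc = 1 },	/* no preference             */
	};
	unsigned int nr_llc_running = 0, nr_pref_llc_running = 0;

	for (unsigned int i = 0; i < sizeof(rq_tasks) / sizeof(rq_tasks[0]); i++) {
		nr_llc_running += (rq_tasks[i].preferred_llc != -1);
		nr_pref_llc_running += (rq_tasks[i].preferred_llc == rq_tasks[i].task_llc);
	}

	/* prints: nr_llc_running=2 nr_pref_llc_running=1 */
	printf("nr_llc_running=%u nr_pref_llc_running=%u\n",
	       nr_llc_running, nr_pref_llc_running);
	return 0;
}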
Signed-off-by: Tim Chen --- kernel/sched/core.c | 12 +++++++++++ kernel/sched/fair.c | 51 +++++++++++++++++++++++++++++++++++++++++++- kernel/sched/sched.h | 7 ++++++ 3 files changed, 69 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a5fb3057b1c4..a97a8039ce91 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -499,6 +499,18 @@ void __trace_set_current_state(int state_value) } EXPORT_SYMBOL(__trace_set_current_state); =20 +#ifdef CONFIG_SMP +int task_llc(const struct task_struct *p) +{ + return per_cpu(sd_llc_id, task_cpu(p)); +} +#else +int task_llc(const struct task_struct *p) +{ + return 0; +} +#endif + /* * Serialization rules: * diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 94ad84ba19e1..f964d5a44fcc 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1195,6 +1195,24 @@ static inline int llc_idx(int cpu) return per_cpu(sd_llc_idx, cpu); } =20 +static void account_llc_enqueue(struct rq *rq, struct task_struct *p) +{ + if (!sched_feat(SCHED_CACHE)) + return; + + rq->nr_llc_running +=3D (p->preferred_llc !=3D -1); + rq->nr_pref_llc_running +=3D (p->preferred_llc =3D=3D task_llc(p)); +} + +static void account_llc_dequeue(struct rq *rq, struct task_struct *p) +{ + if (!sched_feat(SCHED_CACHE)) + return; + + rq->nr_llc_running -=3D (p->preferred_llc !=3D -1); + rq->nr_pref_llc_running -=3D (p->preferred_llc =3D=3D task_llc(p)); +} + void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_s= ched) { unsigned long epoch; @@ -1303,8 +1321,11 @@ void account_mm_sched(struct rq *rq, struct task_str= uct *p, s64 delta_exec) if (mm->mm_sched_cpu !=3D -1) mm_sched_llc =3D per_cpu(sd_llc_id, mm->mm_sched_cpu); =20 - if (p->preferred_llc !=3D mm_sched_llc) + if (p->preferred_llc !=3D mm_sched_llc) { + account_llc_dequeue(rq, p); p->preferred_llc =3D mm_sched_llc; + account_llc_enqueue(rq, p); + } } =20 static void task_tick_cache(struct rq *rq, struct task_struct *p) @@ -1408,6 +1429,17 @@ void init_sched_mm(struct task_struct *p) work->next =3D work; } =20 +void reset_llc_stats(struct rq *rq) +{ + if (!sched_feat(SCHED_CACHE)) + return; + + if (rq->nr_llc_running) + rq->nr_llc_running =3D 0; + + rq->nr_pref_llc_running =3D 0; +} + #else =20 static inline void account_mm_sched(struct rq *rq, struct task_struct *p, @@ -1418,6 +1450,17 @@ void init_sched_mm(struct task_struct *p) { } =20 static void task_tick_cache(struct rq *rq, struct task_struct *p) { } =20 +static void account_llc_enqueue(struct rq *rq, struct task_struct *p) +{ +} + +static void account_llc_dequeue(struct rq *rq, struct task_struct *p) +{ +} + +void reset_llc_stats(struct rq *rq) +{ +} #endif =20 static inline @@ -3957,6 +4000,7 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct = sched_entity *se) struct rq *rq =3D rq_of(cfs_rq); =20 account_numa_enqueue(rq, task_of(se)); + account_llc_enqueue(rq, task_of(se)); list_add(&se->group_node, &rq->cfs_tasks); } #endif @@ -3970,10 +4014,15 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struc= t sched_entity *se) #ifdef CONFIG_SMP if (entity_is_task(se)) { account_numa_dequeue(rq_of(cfs_rq), task_of(se)); + account_llc_dequeue(rq_of(cfs_rq), task_of(se)); list_del_init(&se->group_node); } #endif cfs_rq->nr_queued--; + + /* safeguard? 
*/ + if (!parent_entity(se) && !cfs_rq->nr_queued) + reset_llc_stats(rq_of(cfs_rq)); } =20 /* diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index c37c74dfce25..8026e2c66e9f 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1106,6 +1106,10 @@ struct rq { unsigned int nr_preferred_running; unsigned int numa_migrate_on; #endif +#ifdef CONFIG_SCHED_CACHE + unsigned int nr_pref_llc_running; + unsigned int nr_llc_running; +#endif #ifdef CONFIG_NO_HZ_COMMON #ifdef CONFIG_SMP unsigned long last_blocked_load_update_tick; @@ -1967,6 +1971,9 @@ init_numa_balancing(unsigned long clone_flags, struct= task_struct *p) =20 #endif /* !CONFIG_NUMA_BALANCING */ =20 +void reset_llc_stats(struct rq *rq); +int task_llc(const struct task_struct *p); + #ifdef CONFIG_SMP =20 static inline void --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ACB411DE3BE for ; Sat, 9 Aug 2025 05:10:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716235; cv=none; b=lYq4sXXnPZXpun4uCiODi1V3d/oXpyY0WTO8EB77SDdlY6hzvunzTarb/1BsD6HfY68VzKdZ/P3gkg6eJSWY/V4zZO3lamlWGMLfuQQ9r0QtHrs78GolCQXrw2NP8BYJ0Ju1m5iSnICs6lWRkPa7xXiHxKjTT1AzsfOnMwn7rTs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716235; c=relaxed/simple; bh=mMOSEbE/HppEmtkHE5TulvZHAhAbaSvHHXGxGBzFcgU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=j3HRMLmVG1MFAS1qP6GNt/s6/77Uh+WddEprQ5Z3l8TxNvB+P4AvujK8AWnshyXFcGmTbARtQ+BIJgmdYZlbH9m8Qs+2XeS6vuXVoCzJlMxbKBnl0JQB9Z0xEIolcdu+YlhDSc69qnES+cWMESyBPPowQbtqjplsmzkh92xzVx4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=geKQjfK/; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="geKQjfK/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716234; x=1786252234; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mMOSEbE/HppEmtkHE5TulvZHAhAbaSvHHXGxGBzFcgU=; b=geKQjfK/ENQzxJIVdaSMSI44R3jzrRtlLAcLLVwYHZqNhRHgxmIbm8PU XiWO/MC/iDc0e7d2LUmRe0CKRrydnkWRCT7V1d5Ru83rqB+7K/+RrZNhk +2sQx23IAdQbusICOeU5sYoOB7pa5uDZu1oWLgGvhnJwEFa2V+2w+qxj2 m6YwMTmZ4b38bn4agOoOn4ktTclSJFaj3Mp772dwNENS9tmK3L6FTfdSK gtxcuwqrQMw1U7n0bVzf8SFtoDVy9euo9ZcsqmS67rcCdjvv8Sewo0TTC 69ZCq755Kkj3SVTPSNh4ROx9trC8pZSNM1tAluNyLdBtgKLb9L4vwsHnZ g==; X-CSE-ConnectionGUID: gBIugcE8QvyHdtE7rR2INg== X-CSE-MsgGUID: omeDEy9ySEOEIT5GfzHbDg== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="57019934" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="57019934" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:10:33 -0700 X-CSE-ConnectionGUID: 3IoQ+9+0SaCZvxJgCfv5ug== 
X-CSE-MsgGUID: 3gf5XiXlTDy8oJdmBqQc4w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="169705062" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa005.fm.intel.com with ESMTP; 08 Aug 2025 22:10:27 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 11/28] sched: Introduce per runqueue task LLC preference counter Date: Sat, 9 Aug 2025 13:04:18 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen Each runqueue is assigned a static array, where each element indicates the number of tasks preferring a particular LLC mapped to the array index. For example, rq->nr_pref_llc[3] =3D 2 signifies that there are 2 tasks on this runqueue which prefer to run within LLC3 (indexed from 0 to MAX_LLC across the entire system). With this information, the load balancer can make better decisions to select the busiest runqueue and migrate tasks to their preferred LLC domains. Note: The static array could be converted to an xarray in the future. Signed-off-by: Tim Chen --- kernel/sched/fair.c | 42 ++++++++++++++++++++++++++++++++++++++++-- kernel/sched/sched.h | 1 + 2 files changed, 41 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f964d5a44fcc..cfae71ee870b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1195,22 +1195,51 @@ static inline int llc_idx(int cpu) return per_cpu(sd_llc_idx, cpu); } =20 +static inline int pref_llc_idx(struct task_struct *p) +{ + return llc_idx(p->preferred_llc); +} + static void account_llc_enqueue(struct rq *rq, struct task_struct *p) { + int pref_llc; + if (!sched_feat(SCHED_CACHE)) return; =20 rq->nr_llc_running +=3D (p->preferred_llc !=3D -1); rq->nr_pref_llc_running +=3D (p->preferred_llc =3D=3D task_llc(p)); + + if (p->preferred_llc < 0) + return; + + pref_llc =3D pref_llc_idx(p); + if (pref_llc < 0) + return; + + ++rq->nr_pref_llc[pref_llc]; } =20 static void account_llc_dequeue(struct rq *rq, struct task_struct *p) { + int pref_llc; + if (!sched_feat(SCHED_CACHE)) return; =20 rq->nr_llc_running -=3D (p->preferred_llc !=3D -1); rq->nr_pref_llc_running -=3D (p->preferred_llc =3D=3D task_llc(p)); + + if (p->preferred_llc < 0) + return; + + pref_llc =3D pref_llc_idx(p); + if (pref_llc < 0) + return; + + /* avoid negative counter */ + if (rq->nr_pref_llc[pref_llc] > 0) + --rq->nr_pref_llc[pref_llc]; } =20 void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_s= ched) @@ -1279,6 +1308,8 @@ static unsigned long __no_profile fraction_mm_sched(s= truct rq *rq, struct mm_sch return div64_u64(NICE_0_LOAD * pcpu_sched->runtime, rq->cpu_runtime + 1); } =20 +static unsigned int task_running_on_cpu(int cpu, struct task_struct *p); + static inline void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec) { @@ -1321,7 +1352,9 @@ void account_mm_sched(struct rq *rq, struct task_stru= ct *p, s64 delta_exec) if 
(mm->mm_sched_cpu !=3D -1) mm_sched_llc =3D per_cpu(sd_llc_id, mm->mm_sched_cpu); =20 - if (p->preferred_llc !=3D mm_sched_llc) { + /* task not on rq accounted later in account_entity_enqueue() */ + if (task_running_on_cpu(rq->cpu, p) && + p->preferred_llc !=3D mm_sched_llc) { account_llc_dequeue(rq, p); p->preferred_llc =3D mm_sched_llc; account_llc_enqueue(rq, p); @@ -1431,11 +1464,16 @@ void init_sched_mm(struct task_struct *p) =20 void reset_llc_stats(struct rq *rq) { + int i; + if (!sched_feat(SCHED_CACHE)) return; =20 - if (rq->nr_llc_running) + if (rq->nr_llc_running) { + for (i =3D 0; i < MAX_LLC; ++i) + rq->nr_pref_llc[i] =3D 0; rq->nr_llc_running =3D 0; + } =20 rq->nr_pref_llc_running =3D 0; } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 8026e2c66e9f..4464b92767ad 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1109,6 +1109,7 @@ struct rq { #ifdef CONFIG_SCHED_CACHE unsigned int nr_pref_llc_running; unsigned int nr_llc_running; + unsigned int nr_pref_llc[MAX_LLC]; #endif #ifdef CONFIG_NO_HZ_COMMON #ifdef CONFIG_SMP --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE60C2472BA for ; Sat, 9 Aug 2025 05:10:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716251; cv=none; b=ADP9uISEjTXXmrZUS4riveuGfD0WO35c5rz53HhmhWNMrD8Dv6NEgTvyjQYnfTc2u3LwbGoaDVNp6SSxFFbbtfw/FH7XlH762INSbgUPWTafzzs+ATW6FN2x9nPTJmp96ZH+mnf1JNlGrm30zuWW2dvWodZS8ErATrdpAPxPp5I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716251; c=relaxed/simple; bh=aqSPvY6s7QQr//GlOU5D+JTqKry164SNV/VZUR7Kspk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=t/5xtGUzGrKxeHEjmp10AAMihG56pqeTBrBMPkuq4PHQwrU2AfkoQsDB4Y9TtiCy7GzKfao5LjE7hgoUByuq8rYZKjatFGSKo04f1EvdsfMKnnI+kvm1KPBegBNQVntmqPVlcdaoa8X2w/mUXe+QOHkdUYdt1Sj1Q17LdJWTtvM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hnHR/vdK; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hnHR/vdK" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716250; x=1786252250; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aqSPvY6s7QQr//GlOU5D+JTqKry164SNV/VZUR7Kspk=; b=hnHR/vdKO7tdLOhgecBrJ/L6XtQ1BO8dVbYIB82zj+iOPcT1jV8xsoaQ qVvNm13ubKRb17a6vcXBW7sO0sqESiBXEDAA3LOy6nrman47fi+cJF4GK /APwE4CIXL2nHpsyP/5wUxfP2JBnuaAMw69BpWv3yxh4gHtAvKYTTGlqu yYAwTm4DfQnzeKlTLUohbS6ngMAvDbvMCBqHRIliyvHmG2k+p5eyDajr2 YczcwUmYJan/0n1K6JWf7awWcuI7A08+OPYkjN3MABdDHrz0f/nYtETcb muTOyFAlK1kHzNTlu61gC8CfjRqlA8omMF+AZVXL7YPft7aD6yFC9a32S Q==; X-CSE-ConnectionGUID: 5tGpMziNRzOAuvg9CBzp+w== X-CSE-MsgGUID: RdOEUAbzTDSASRaq4EwAIw== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="57019960" X-IronPort-AV: 
E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="57019960" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:10:49 -0700 X-CSE-ConnectionGUID: DegXMsyUTHGOiD9Y2cNnrw== X-CSE-MsgGUID: A+KDxoubRjeA2xjTkEMmBQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="169705082" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa005.fm.intel.com with ESMTP; 08 Aug 2025 22:10:44 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 12/28] sched: Calculate the total number of preferred LLC tasks during load balance Date: Sat, 9 Aug 2025 13:04:39 +0800 Message-Id: <4145385d4ce232e10cae713c8449d459c325db46.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen During load balancing between LLCs, gather the number of tasks on each runqueue of a source LLC. For example, consider a system with 4 sched groups LLC0, LLC1, ..., LLC3. We are balancing towards LLC3 and LLC0 has 3 tasks preferring LLC3, LLC1 has 2 tasks preferring LLC3 and LLC2 has 1 task preferring LLC3. LLC0 with most tasks preferring LLC3 will be chosen as the busiest LLC to pick the tasks from. The number of tasks preferring the destination LLC are gathered from each run queue for a source LLC. For example, consider the sched_group LLC0 with two CPUs, CPU0 and CPU1. On CPU0, 2 tasks prefer to run on LLC3, and on CPU1, one task prefers LLC3. The total number of tasks preferring LLC3 in LLC0 is 2 + 1 =3D 3. These statistics enable the load balancer to select tasks from a sched_group that best aligns tasks with their preferred LLCs. 
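The example above can be turned into a small toy model (userspace sketch, not kernel code; the per-runqueue counts simply reuse the numbers from the changelog). The busiest-group selection based on these sums is wired up later in this series; the sketch only shows the aggregation and the "most tasks preferring the destination LLC wins" idea.

/*
 * Toy model: for destination LLC3, sum each source group's per-runqueue
 * nr_pref_llc[3] counters and pick the group with the largest sum.
 */
#include <stdio.h>

#define NR_GROUPS		3
#define NR_CPUS_PER_GROUP	2
#define DST_LLC			3

int main(void)
{
	/* per-runqueue counts of tasks preferring LLC3, one row per group */
	unsigned int nr_pref_llc3[NR_GROUPS][NR_CPUS_PER_GROUP] = {
		{ 2, 1 },	/* LLC0: 3 tasks prefer LLC3 */
		{ 1, 1 },	/* LLC1: 2 tasks prefer LLC3 */
		{ 1, 0 },	/* LLC2: 1 task prefers LLC3 */
	};
	unsigned int best_group = 0, best_count = 0;

	for (unsigned int g = 0; g < NR_GROUPS; g++) {
		unsigned int sum = 0;

		for (unsigned int c = 0; c < NR_CPUS_PER_GROUP; c++)
			sum += nr_pref_llc3[g][c];
		if (sum > best_count) {
			best_count = sum;
			best_group = g;
		}
	}

	/* prints: busiest=LLC0 (3 tasks prefer LLC3) */
	printf("busiest=LLC%u (%u tasks prefer LLC%d)\n",
	       best_group, best_count, DST_LLC);
	return 0;
}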
Signed-off-by: Tim Chen --- kernel/sched/fair.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index cfae71ee870b..f1697658c3b8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10496,6 +10496,9 @@ struct sg_lb_stats { unsigned int nr_numa_running; unsigned int nr_preferred_running; #endif +#ifdef CONFIG_SCHED_CACHE + unsigned int nr_pref_llc[MAX_LLC]; +#endif }; =20 /* @@ -10974,6 +10977,14 @@ static inline void update_sg_lb_stats(struct lb_en= v *env, if (cpu_overutilized(i)) *sg_overutilized =3D 1; =20 +#ifdef CONFIG_SCHED_CACHE + if (sched_feat(SCHED_CACHE)) { + int j; + + for (j =3D 0; j < max_llcs; ++j) + sgs->nr_pref_llc[j] +=3D rq->nr_pref_llc[j]; + } +#endif /* * No need to call idle_cpu() if nr_running is not 0 */ --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4717515665C for ; Sat, 9 Aug 2025 05:11:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716315; cv=none; b=V21sVGbgLQDQ3LIjl4XUJrmJ9J0H7EnLAxMhWiHDhYvdTQf37ITQq1SLcLqEU8QZBleGBd5opQCrWn9ZA+ka3UsL0gAkAAWeYPjzH6uXv6zStuJq71dJWgYewm9hUHjq7qSX6lm/Lgw0QNQVBR235FBzMEr5TtJzqy9vbdAZgM8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716315; c=relaxed/simple; bh=AIuQrSu/5y/S9TIWqFOGCV+S6deOxyVrovvWy4SL2kw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=OloMkMGj39oyisSI/elKEAvpy20hxc65DCBP77XOsD6DM1R+a5A5Da/ppqs7Z7p3GBtilMBdNOBeSCw4CLzWnpi/GbZs0twrI/jHBj75g3QWh2U+l9MnhNx5slK4wgHsVE88KJtcO1SMNVHstEsw3B+CP/Ty6sI5OGUAf63XtLk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ZuSndoFC; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ZuSndoFC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716314; x=1786252314; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=AIuQrSu/5y/S9TIWqFOGCV+S6deOxyVrovvWy4SL2kw=; b=ZuSndoFC4541nSvbCP6cEW/FYV1l8zzkJP80vf6pWY1ZOHgg/RVomY/T Ti6yNCB58L0Sd4kxkaI5sPLuF7vCi3M5pS05/bm13pSUnPEFFjtsLUyj2 f9POeHNiXdDCLJm6AbH5YNMJHTMKKxVu+wmWQOJyUA2JQAjfjhd4Y2l7i hwHCcBSXZp9fOvMWTMVoUR8/ktX+69hF3c7sKUUgsZ3Ez6EVCyQ/ijbHd VIyS49HPzpJp2UXMWArIqJMLsn/1xm4WXUpNNwWGCUSD+Ru4vQomDx8lv I+sL/FNJq4W7oKMEvp16XyroY9GrTk18XR9yqRbCDZM1KnkZXCppQk2n8 g==; X-CSE-ConnectionGUID: UjPqhATKTouUNB6GDkUddg== X-CSE-MsgGUID: rAOe+a/mQeqiUy/XHrFejQ== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="56259917" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="56259917" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:11:53 -0700 X-CSE-ConnectionGUID: WVRUnOqXSoeMjZHQRQuF4Q== 
X-CSE-MsgGUID: AGpfHQaOSwWVkoL9IadjUw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165475971" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa007.jf.intel.com with ESMTP; 08 Aug 2025 22:11:48 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 13/28] sched: Tag the sched group as llc_balance if it has tasks prefer other LLC Date: Sat, 9 Aug 2025 13:05:44 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen During load balancing between LLCs, check whether there are tasks preferring the destination LLC. If so, balance those tasks to the destination LLC first. Tag the sched_group that has tasks preferring to run on other LLCs (non-local) with the group_llc_balance flag. This way, the load balancer will later attempt to pull/push these tasks to their preferred LLCs. Co-developed-by: Chen Yu Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f1697658c3b8..30ebc7d1b999 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10491,6 +10491,7 @@ struct sg_lb_stats { enum group_type group_type; unsigned int group_asym_packing; /* Tasks should be moved to preferred CP= U */ unsigned int group_smt_balance; /* Task on busy SMT be moved */ + unsigned int group_llc_balance; /* Tasks should be moved to preferred LL= C */ unsigned long group_misfit_task_load; /* A CPU has a task too big for its= capacity */ #ifdef CONFIG_NUMA_BALANCING unsigned int nr_numa_running; @@ -10855,6 +10856,43 @@ static inline bool smt_balance(struct lb_env *env,= struct sg_lb_stats *sgs, return false; } =20 +/* + * Do LLC balance on sched group that contains LLC, and have tasks preferr= ing + * to run on LLC in idle dst_cpu. 
+ */ +#ifdef CONFIG_SCHED_CACHE +static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs, + struct sched_group *group) +{ + struct sched_domain *child =3D env->sd->child; + int llc; + + if (!sched_feat(SCHED_CACHE)) + return false; + + if (env->sd->flags & SD_SHARE_LLC) + return false; + + /* only care about task migration among LLCs */ + if (child && !(child->flags & SD_SHARE_LLC)) + return false; + + llc =3D llc_idx(env->dst_cpu); + if (sgs->nr_pref_llc[llc] > 0 && + _get_migrate_hint(env->src_cpu, env->dst_cpu, + 0, true) =3D=3D mig_allow) + return true; + + return false; +} +#else +static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs, + struct sched_group *group) +{ + return false; +} +#endif + static inline long sibling_imbalance(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *busiest, @@ -11037,6 +11075,11 @@ static inline void update_sg_lb_stats(struct lb_en= v *env, sgs->group_type =3D group_classify(env->sd->imbalance_pct, group, sgs); =20 update_sg_if_llc(env, sgs, group); + + /* Check for tasks in this group can be moved to their preferred LLC */ + if (!local_group && llc_balance(env, sgs, group)) + sgs->group_llc_balance =3D 1; + /* Computing avg_load makes sense only when group is overloaded */ if (sgs->group_type =3D=3D group_overloaded) sgs->avg_load =3D (sgs->group_load * SCHED_CAPACITY_SCALE) / --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9DFDCBA36 for ; Sat, 9 Aug 2025 05:12:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716330; cv=none; b=aQ1DuKtufh+Odr/LorF9O/fM51njWURpIhUr0LEbL04DjrjFm+d09C28slCF+44U9FtSmiimp1pCffRh1R/fvm2ZZoxIwYLoRUwe6OyK5cj/+TCojsQvg2MTs+TBXvkqNEX58rHKytdwSZsbXZQ7b/69UrXdxM0ua0rXL6iOtQ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716330; c=relaxed/simple; bh=O81QutAo6orLIIDwAsw0ZMG10IN4T/AEDQCSSIJxqyQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=U5xozr9Mtp5uu2TvfPRuTYtKe4U1Wvg6adAEEQYPp3XhL43lFnMD5WFxuPpA676GFAHZvR3sqXgli2n0l/wcn29K4BggdO2CKIje4lN3tYfPeo9MARoA5x4puu9zfIoLrFm8QyrapPsLUWke2Sltghaenw5fxTvdXdEsap9QY/I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=MEuihH0+; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MEuihH0+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716328; x=1786252328; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=O81QutAo6orLIIDwAsw0ZMG10IN4T/AEDQCSSIJxqyQ=; b=MEuihH0+FVuXmARwKbtg3ZEInFXwZofL2oeZFz8hAFncHiYR3thGt37w r1T1tf3lEswVj9+r/pBIBCI/p6tNeK/mU9z44eSxhHK5hkAbm4U0pK2Yd kJSRq/e/BjJbTfq0wdmHL+xZeKBs3wZgVSLF/cjxbK8xkGolfMsNQtISm 
W3cnADlN4qbVKCTxsMFINRPYiR7F/yD1Oj5rfzQ0wt0MvGxsmxI+X3NFf 9X2WdZwnrcjBg1uRuKw5Ke2i5+i08CMFggHeD7mmAh+bcE6otZUr5bfyT 1/s8kOVFgN6mguEbs8JSr9oz3bIDVZO52aj/iKxcnT/mwWA+dnZZTC1UF g==; X-CSE-ConnectionGUID: 7CDD+sx7Si6bye2RdBQ3gA== X-CSE-MsgGUID: jib9w3+fQbenCvLznD37kQ== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="56259932" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="56259932" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:12:07 -0700 X-CSE-ConnectionGUID: DOXrAiV0Se6h6BcziixHkw== X-CSE-MsgGUID: PSBpAuryS0CRK6OjhNz0bw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165475987" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa007.jf.intel.com with ESMTP; 08 Aug 2025 22:12:02 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 14/28] sched: Introduce update_llc_busiest() to deal with groups having preferred LLC tasks Date: Sat, 9 Aug 2025 13:05:58 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen The load balancer attempts to identify the busiest sched_group with the highest load and migrates some tasks to a less busy sched_group to distribute the load across different CPUs. When cache-aware scheduling is enabled, the busiest sched_group is defined as the one with the highest number of tasks preferring to run on the destination LLC. If the busiest group has llc_balance tag, the cache aware load balance will be launched. Introduce the helper function update_llc_busiest() to identify such sched group with most tasks preferring the destination LLC. Co-developed-by: Chen Yu Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 30ebc7d1b999..b8cc85291351 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10885,12 +10885,36 @@ static inline bool llc_balance(struct lb_env *env= , struct sg_lb_stats *sgs, =20 return false; } + +static bool update_llc_busiest(struct lb_env *env, + struct sg_lb_stats *busiest, + struct sg_lb_stats *sgs) +{ + int idx; + + /* Only the candidate with llc_balance need to be taken care of */ + if (!sgs->group_llc_balance) + return false; + + /* + * There are more tasks that want to run on dst_cpu's LLC. 
+ */ + idx =3D llc_idx(env->dst_cpu); + return sgs->nr_pref_llc[idx] > busiest->nr_pref_llc[idx]; +} #else static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs, struct sched_group *group) { return false; } + +static bool update_llc_busiest(struct lb_env *env, + struct sg_lb_stats *busiest, + struct sg_lb_stats *sgs) +{ + return false; +} #endif =20 static inline long sibling_imbalance(struct lb_env *env, @@ -11122,6 +11146,14 @@ static bool update_sd_pick_busiest(struct lb_env *= env, sds->local_stat.group_type !=3D group_has_spare)) return false; =20 + /* deal with prefer LLC load balance, if failed, fall into normal load ba= lance */ + if (update_llc_busiest(env, busiest, sgs)) + return true; + + /* if there is already a busy group, skip the normal load balance */ + if (busiest->group_llc_balance) + return false; + if (sgs->group_type > busiest->group_type) return true; =20 @@ -12029,9 +12061,11 @@ static struct sched_group *sched_balance_find_src_= group(struct lb_env *env) /* * Try to move all excess tasks to a sibling domain of the busiest * group's child domain. + * Also do so if we can move some tasks that prefer the local LLC. */ if (sds.prefer_sibling && local->group_type =3D=3D group_has_spare && - sibling_imbalance(env, &sds, busiest, local) > 1) + (busiest->group_llc_balance || + sibling_imbalance(env, &sds, busiest, local) > 1)) goto force_balance; =20 if (busiest->group_type !=3D group_overloaded) { --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7EC501DE3BE for ; Sat, 9 Aug 2025 05:12:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716359; cv=none; b=QbiGTyG3g2KE4hVHW5ANIWTpqVt3p0S/zxdB0gitjC7ulgs6vf7jEjdLirjHI7kaK+ztqqN/rCqTa/6hfausklfNc4rz8dYqp0CG6Y6YgxkKUwvBQIrp/KqxhxJRz+O3v6tp0XTYU16LxAdHr+C5BFN93tNkPr76wtjETRyaMaw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716359; c=relaxed/simple; bh=yGnpZ6FlvoEQkStNXEnYs/+BqL2dRHGniBRSc7+bEyI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=m1yPm73VBI2YcbcZuIT/BdavdjC/l3BdlBwnghR8mggxNfB23vX6iBYsgIl3jTAM5i/J7m6oqk0xrxs69iqlLFQ90jQ8hbG1d9nYsXQ1qBKpwmo+MO9karhCFAgqXxtek2+Fw6jBNSgn4f3uTQlC2jV9TQHxB0EG1HVMEE5ZgXA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=TEnIOB1/; arc=none smtp.client-ip=198.175.65.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="TEnIOB1/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716358; x=1786252358; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yGnpZ6FlvoEQkStNXEnYs/+BqL2dRHGniBRSc7+bEyI=; b=TEnIOB1/ILlwqYFTI6kKipRVVwx6DRo7Bg6DA7Rv5STVaUTCs0KWpJvq Z6xP+XRjxHgLpyN1tALXTPQatPAmzpP8yoIO81oaWxQRxnjTqFILCkrJF 
kCtJQ/VZjCEhVc7wgV23PMjUStSCtMH0P9OW3KNu8Za5Pnw7tMZySzv3t NECmaJZ84sMsSF0CtDUsuTqG088mMNcuu4rS+3dzRuIJxgZ7St1Ds47Z9 1QQxPYZElaPgiDkZBfePvDHM5kCX7XSWYlmCxqKtAeHU7eYkQHevmw80h wEnmX47OYGWTIQssLuDSa8NOAiVjB27DwhzduhCcmV9MOJn0mr3hsf+ZP A==; X-CSE-ConnectionGUID: lvcsmsR8QjyFBqcujWQmiQ== X-CSE-MsgGUID: bzzKp3XeT8WmtT0FxfRmoA== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="56932532" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="56932532" Received: from orviesa006.jf.intel.com ([10.64.159.146]) by orvoesa111.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:12:37 -0700 X-CSE-ConnectionGUID: cebyh5CzSdq9bdRIx+9UHw== X-CSE-MsgGUID: 291hr3FsSYCIyZnbCahMKA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="164703738" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa006.jf.intel.com with ESMTP; 08 Aug 2025 22:12:31 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 15/28] sched: Introduce a new migration_type to track the preferred LLC load balance Date: Sat, 9 Aug 2025 13:06:27 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen Introduce a new migration type named migrate_llc_task to facilitate cache-aware load balancing. After the busiest sched_group is identified as the one that needs migration due to having most tasks preferring destination LLC, tag the migration type as the newly introduced migrate_llc_task. During load balancing, each runqueue within the busiest preferred-LLC sched_group is checked, and the runqueue with the highest number of tasks preferring to run on the destination CPU is chosen as the busiest runqueue. Signed-off-by: Tim Chen --- kernel/sched/fair.c | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index b8cc85291351..a301b56dd2b4 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9746,7 +9746,8 @@ enum migration_type { migrate_load =3D 0, migrate_util, migrate_task, - migrate_misfit + migrate_misfit, + migrate_llc_task }; =20 #define LBF_ALL_PINNED 0x01 @@ -10180,6 +10181,15 @@ static int detach_tasks(struct lb_env *env) env->imbalance -=3D util; break; =20 + case migrate_llc_task: + /* + * Since can_migrate_task() succeed, when we reach here, it means that p + * can be migrated even if dst_cpu is not p's preferred_llc, because th= ere + * are no idle cores for p to do in-llc load balance. 
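+ * As with migrate_task, each detached task lowers the
+ * remaining imbalance by one.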
+ */ + env->imbalance--; + break; + case migrate_task: env->imbalance--; break; @@ -11817,6 +11827,15 @@ static inline void calculate_imbalance(struct lb_e= nv *env, struct sd_lb_stats *s return; } =20 +#ifdef CONFIG_SCHED_CACHE + if (busiest->group_llc_balance) { + /* Move a task that prefer local LLC */ + env->migration_type =3D migrate_llc_task; + env->imbalance =3D 1; + return; + } +#endif + if (busiest->group_type =3D=3D group_imbalanced) { /* * In the group_imb case we cannot rely on group-wide averages @@ -12125,6 +12144,10 @@ static struct rq *sched_balance_find_src_rq(struct= lb_env *env, struct rq *busiest =3D NULL, *rq; unsigned long busiest_util =3D 0, busiest_load =3D 0, busiest_capacity = =3D 1; unsigned int busiest_nr =3D 0; +#ifdef CONFIG_SCHED_CACHE + unsigned int busiest_pref_llc =3D 0; + int dst_llc; +#endif int i; =20 for_each_cpu_and(i, sched_group_span(group), env->cpus) { @@ -12233,6 +12256,16 @@ static struct rq *sched_balance_find_src_rq(struct= lb_env *env, } break; =20 + case migrate_llc_task: +#ifdef CONFIG_SCHED_CACHE + dst_llc =3D llc_idx(env->dst_cpu); + if (!cpus_share_cache(env->dst_cpu, rq->cpu) && + busiest_pref_llc < rq->nr_pref_llc[dst_llc]) { + busiest_pref_llc =3D rq->nr_pref_llc[dst_llc]; + busiest =3D rq; + } +#endif + break; case migrate_task: if (busiest_nr < nr_running) { busiest_nr =3D nr_running; @@ -12415,6 +12448,8 @@ static void update_lb_imbalance_stat(struct lb_env = *env, struct sched_domain *sd case migrate_misfit: __schedstat_add(sd->lb_imbalance_misfit[idle], env->imbalance); break; + case migrate_llc_task: + break; } } =20 --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C82CABA36 for ; Sat, 9 Aug 2025 05:12:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716375; cv=none; b=aS7CLs0yVWKaF6OGVkXK8ZAgy8yUgUakdznwWDtZD4N9gQ9eA7tWRxrHj1IeAjaqTL8M+VmHHPvI8FEcOuDBcfH3oVpULXvb4/xFnoBCpg/mVg6MCRCvDJrLWdumxn7wi15V2NyagC2GII5gOWOj3odj3IWvyB3Ywa1aJoBDB/I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716375; c=relaxed/simple; bh=fRnw1t3Rh9UYvT00ArjMcNIiB37mwFZbA2eVDMCUX9M=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=RJUo8ambmNBfq+wvLRlOCmiE/W7wpbkuJF7yL9JEQZ65V62F8oCjVmz3qVgLvkV3PLO6TzjT6umm4UV9UMY1fHNephBb+kWs8mVMmZ6rPjQkthPxxV8sRM5GBZAKF/4w8+2Bp7vO3sUeIwn+6xlZ35XOq6ECCeBWwM5GJsRHcEU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=DO7eB20K; arc=none smtp.client-ip=198.175.65.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="DO7eB20K" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716374; x=1786252374; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=fRnw1t3Rh9UYvT00ArjMcNIiB37mwFZbA2eVDMCUX9M=; b=DO7eB20K/f/MVe9m5JNfhy4n46O8ccIoioHjrTsbuwZlLpqzzksjVlAv FSWzb4JVfyBHYiQaKCDVfK0CkNncYJ22CpRHB1RwD3zGRfwsq6x9aCRQH TKpWNRyQsj3e8nZiDG6U1hLeWNbIKj/X6uKv56QKiYVXkZeKHyR4Zqnxi U5rknviHlsICE9lsjONRBpod6oRa32YfBF17V6dJ4X6Vo8cglEVlG/FKh aqYrgjA98DE4rWoYD97vabGK2LMtYuZO47cKW4wuRsI+yu4gtqm55Wgcy 1wWRnlj7aVuWb9SYbQgGx83xtwUCYP9X9i36gO7Eb2NFdoWyrm8YyDqHa g==; X-CSE-ConnectionGUID: 0oJuosNIQM23ggPBmbF4UQ== X-CSE-MsgGUID: LCnyHPYNSYquGLfPEsgHmQ== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="56932555" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="56932555" Received: from orviesa006.jf.intel.com ([10.64.159.146]) by orvoesa111.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:12:53 -0700 X-CSE-ConnectionGUID: 0pVJsl26T9OrBTO7kMP8BQ== X-CSE-MsgGUID: o19YVeMBQCulANK7Q6DXLA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="164703753" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa006.jf.intel.com with ESMTP; 08 Aug 2025 22:12:47 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 16/28] sched: Consider LLC locality for active balance Date: Sat, 9 Aug 2025 13:06:42 +0800 Message-Id: <38d036cf946223b46a20ad60ccf13f9dcb316240.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen If busiest run queue has only one task, active balance is enlisted to actually move the task. However, before moving the task, we should consider whether we are moving the task from its preferred LLC. Don't move the single running task in a run queue to another LLC, if we are moving it from its desired LLC, or moving it will cause too much imbalance between the LLCs. Co-developed-by: Chen Yu Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- kernel/sched/fair.c | 51 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 48 insertions(+), 3 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index a301b56dd2b4..592a4034e760 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -12332,10 +12332,43 @@ imbalanced_active_balance(struct lb_env *env) return 0; } =20 +#ifdef CONFIG_SCHED_CACHE +static inline bool +break_llc_locality(struct lb_env *env) +{ + if (!sched_feat(SCHED_CACHE)) + return 0; + + if (cpus_share_cache(env->src_cpu, env->dst_cpu)) + return 0; + /* + * All tasks want to stay put. Move only if LLC is + * heavily loaded or don't pull a task from its + * preferred CPU if it is the only one running. 
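+ * Returning true here makes need_active_balance() give up,
+ * leaving the task in its preferred LLC.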
+ */ + if (env->src_rq->nr_pref_llc_running =3D=3D env->src_rq->cfs.h_nr_runnabl= e && + (env->src_rq->nr_running <=3D 1 || + _get_migrate_hint(env->src_cpu, env->dst_cpu, + 0, false) =3D=3D mig_forbid)) + return 1; + + return 0; +} +#else +static inline bool +break_llc_locality(struct lb_env *env) +{ + return 0; +} +#endif + static int need_active_balance(struct lb_env *env) { struct sched_domain *sd =3D env->sd; =20 + if (break_llc_locality(env)) + return 0; + if (asym_active_balance(env)) return 1; =20 @@ -12355,7 +12388,8 @@ static int need_active_balance(struct lb_env *env) return 1; } =20 - if (env->migration_type =3D=3D migrate_misfit) + if (env->migration_type =3D=3D migrate_misfit || + env->migration_type =3D=3D migrate_llc_task) return 1; =20 return 0; @@ -12800,9 +12834,20 @@ static int active_load_balance_cpu_stop(void *data) goto out_unlock; =20 /* Is there any task to move? */ - if (busiest_rq->nr_running <=3D 1) - goto out_unlock; + if (busiest_rq->nr_running <=3D 1) { +#ifdef CONFIG_SCHED_CACHE + int llc =3D llc_idx(target_cpu); =20 + if (!sched_feat(SCHED_CACHE)) + goto out_unlock; + + if (llc < 0) + goto out_unlock; + /* don't migrate if task does not prefer target */ + if (busiest_rq->nr_pref_llc[llc] < 1) +#endif + goto out_unlock; + } /* * This condition is "impossible", if it occurs * we need to fix it. Originally reported by --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58996226CFC for ; Sat, 9 Aug 2025 05:13:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716388; cv=none; b=ToNlzHkLLYiYeK8T8BsAhwwVTQMjfVDgwARwIDYa+wyZ8Eu38JHaFmFWhEHDz8Y4QXb3R7dulNjX2NJYnlkPmQ0FB+POvt2GUZ/4GvwbMKz42XqpgP66/Git+tq6B67e0BFqrfwmVgwWN0fqYa7Y2mT9Jw28QbyFm7zODtPY6sc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716388; c=relaxed/simple; bh=sTSG237di6kHrTi+M/LVG5ENiqilE30WO0gE5TPh2Qg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=k07/zqOO54KVW2SXRP1BcFCx9O6eoCI0J3Yg2JN4fa4l56WOxmJSEvLVg2Qq4TaVlBV1mD6qAvItCmcERw1UNU4TYSdIrJB+dIamh6hR7WzT6I/vQu1VEkz4aED2Kp/nidg5cbmW5fT2HqFSYZjL/i79XRYjNuUmf9W31GhMzEo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=G3ntFCoX; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="G3ntFCoX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716387; x=1786252387; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sTSG237di6kHrTi+M/LVG5ENiqilE30WO0gE5TPh2Qg=; b=G3ntFCoXyBZCYeCFd2XdSpKtKywszwsAL7iG155Ga7pmma4DufkDAk1j J0oIixy6CX3G2NDetf51jJCmgOaHTwM5/Zyy62tX553kTkCWxQYisiVUg 1tLtppV/kH9sI0k6oKldvrjqqgkVdJpDQWrsW6zAURpZZQre0+t9sB2DH 
giDN1ULvFcnaQhebg6L8k2Sk3KyDkVnyIgPtuntXxM6AYlGGbknUXkX/S gDEJ0HpmTVhkCqcyfcxY/dueuq+yt+7fItoY/olEIlSogszYxJCyn99+x O91JFgRzp1p1is0mJ7huD6m3c93Mm1gkIWBIs7CmYnQAnn6h0ZqAcvxVv Q==; X-CSE-ConnectionGUID: RA52ecx/Ti+wEdDxJMLnkg== X-CSE-MsgGUID: /5YdSh1jQj+Ywel/XofIvA== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="56768552" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="56768552" Received: from fmviesa002.fm.intel.com ([10.60.135.142]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:13:06 -0700 X-CSE-ConnectionGUID: +bCrWry6RQe4Sex8ybOxpA== X-CSE-MsgGUID: IbTHoJawTVyJKtMAWCm2+g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="189180343" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa002.fm.intel.com with ESMTP; 08 Aug 2025 22:13:01 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 17/28] sched: Consider LLC preference when picking tasks from busiest queue Date: Sat, 9 Aug 2025 13:06:58 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen When picking tasks from busiest queue for load balance, we currently do not consider LLC preference. Order the task in the busiest queue such that we picked the tasks in the following order: 1. tasks that prefer dst cpu's LLC 2. tasks that have no preference in LLC 3. tasks that prefer LLC other than the ones they are on 4. tasks that prefer the LLC that they are currently on This will allow tasks better chances to wind up in its preferred LLC. Signed-off-by: Tim Chen --- kernel/sched/fair.c | 66 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 65 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 592a4034e760..8d5792b9e658 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10093,6 +10093,68 @@ static struct task_struct *detach_one_task(struct = lb_env *env) return NULL; } =20 +#ifdef CONFIG_SCHED_CACHE +/* + * Prepare lists to detach tasks in the following order: + * 1. tasks that prefer dst cpu's LLC + * 2. tasks that have no preference in LLC + * 3. tasks that prefer LLC other than the ones they are on + * 4. tasks that prefer the LLC that they are currently on. 
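+ *
+ * detach_tasks() pops entries from the tail of the list, so the
+ * splices below leave class 1 at the tail (picked first) and
+ * class 4 at the head (picked last).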
+ */ +static struct list_head +*order_tasks_by_llc(struct lb_env *env, struct list_head *tasks) +{ + struct task_struct *p; + LIST_HEAD(pref_old_llc); + LIST_HEAD(pref_new_llc); + LIST_HEAD(no_pref_llc); + LIST_HEAD(pref_other_llc); + + if (!sched_feat(SCHED_CACHE)) + return tasks; + + if (cpus_share_cache(env->dst_cpu, env->src_cpu)) + return tasks; + + while (!list_empty(tasks)) { + p =3D list_last_entry(tasks, struct task_struct, se.group_node); + + if (p->preferred_llc =3D=3D llc_id(env->dst_cpu)) { + list_move(&p->se.group_node, &pref_new_llc); + continue; + } + + if (p->preferred_llc =3D=3D llc_id(env->src_cpu)) { + list_move(&p->se.group_node, &pref_old_llc); + continue; + } + + if (p->preferred_llc =3D=3D -1) { + list_move(&p->se.group_node, &no_pref_llc); + continue; + } + + list_move(&p->se.group_node, &pref_other_llc); + } + + /* + * We detach tasks from list tail in detach tasks. Put tasks + * to be chosen first at end of list. + */ + list_splice(&pref_new_llc, tasks); + list_splice(&no_pref_llc, tasks); + list_splice(&pref_other_llc, tasks); + list_splice(&pref_old_llc, tasks); + return tasks; +} +#else +static inline struct list_head +*order_tasks_by_llc(struct lb_env *env, struct list_head *tasks) +{ + return tasks; +} +#endif + /* * detach_tasks() -- tries to detach up to imbalance load/util/tasks from * busiest_rq, as part of a balancing operation within domain "sd". @@ -10101,7 +10163,7 @@ static struct task_struct *detach_one_task(struct l= b_env *env) */ static int detach_tasks(struct lb_env *env) { - struct list_head *tasks =3D &env->src_rq->cfs_tasks; + struct list_head *tasks; unsigned long util, load; struct task_struct *p; int detached =3D 0; @@ -10120,6 +10182,8 @@ static int detach_tasks(struct lb_env *env) if (env->imbalance <=3D 0) return 0; =20 + tasks =3D order_tasks_by_llc(env, &env->src_rq->cfs_tasks); + while (!list_empty(tasks)) { /* * We don't want to steal all, otherwise we may be treated likewise, --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B1D46274B39 for ; Sat, 9 Aug 2025 05:13:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716401; cv=none; b=gwUxPhraVHLm64bXQSb1oNwdX318HEEFGQP3NJIRjG0ej1HembqLwL/AMdMnKs2idXx3KEfcQggsIlJeGxPd86ymVhFs/rlGwCRgO+oHKZRTtPkeotIYE6Skr2Z90a1CPa/LNaWTM1XdHnwRmA1ybF/xRMbrR1KLLpNUD058xfI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716401; c=relaxed/simple; bh=s+rR7wC7UnAbjuxbPj5c6/L98TvboEiREJprAaugmo8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Ex3plSrAdkKiiWgy1wFo/qOyi1KRMrAmvUSGgR+Sl7c+uHNitZ3FkYOzLvNN2Kk7bcRKDLWQu/zjHkM0B/ktpD3735kBxpJu0PJ8IZ58b18B0w/r7VUcJthi/o2hdqN358rDx5jLpz4Z9VU9UsOwUlIB1nTwdFObyEGsDbo1DD0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ZNX9P8Zh; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ZNX9P8Zh" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716400; x=1786252400; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=s+rR7wC7UnAbjuxbPj5c6/L98TvboEiREJprAaugmo8=; b=ZNX9P8Zhh4or6QRvyIGlsqU0K0XXUdzdv+gCoKT4EcsJWyPrUrrpjeh6 Qmj8TTDc2Q3+gYj6uSTMIaEUdV5BvlkAcN9NnwrPfjzZslpxwRyFTPJTx KtvP4Sp2C8p5ushx8yObLd6nOXcFZSnue19p3r5NoF227rlrE4GYeUWKq dM+U0/Nq/0qZLmHe33WFqOXqLI4gmE0PevCwc5pjj8qUenPxHW1kXvWF+ 3fvMGqOlhGBBzPmI9Nt4so8fHdQ0chc/atY+kOpU5fgp8EHxTRIRoH0Lz 6bGZ08Lr6XkBPeiz62J27S7cC3PWHkwpEOh1gP4JN0l7CxroHUJGaT1Ve g==; X-CSE-ConnectionGUID: b1FzoNUSROqCeN8PP7hLeA== X-CSE-MsgGUID: PpjmRF1hSJmGHpUlPRMqJA== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="56768565" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="56768565" Received: from fmviesa002.fm.intel.com ([10.60.135.142]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:13:19 -0700 X-CSE-ConnectionGUID: TVpwBJFnSOm4LFGDl/UL7A== X-CSE-MsgGUID: vYNYzwHuTSmQtfe59f1VcQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="189180352" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa002.fm.intel.com with ESMTP; 08 Aug 2025 22:13:13 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 18/28] sched: Do not migrate task if it is moving out of its preferred LLC Date: Sat, 9 Aug 2025 13:07:10 +0800 Message-Id: <081010e2c9cd8f4b3c9aa6d1b98fbe9438cd3c06.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen In the final step of task migration during load balancing, can_migrate_task() is used to determine whether a task can be moved to the destination. If the task has an LLC preference, consider this preference when moving it out of its preferred LLC. With this check in place, there is no need to retain the task's cache-hot CPU check in task_hot(); remove it accordingly. Besides, add more checks in detach_tasks() to avoid choosing tasks that prefer their current LLC. Co-developed-by: Chen Yu Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- kernel/sched/fair.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 8d5792b9e658..22b7a7fe538e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9822,17 +9822,6 @@ static int task_hot(struct task_struct *p, struct lb= _env *env) if (sysctl_sched_migration_cost =3D=3D 0) return 0; =20 -#ifdef CONFIG_SCHED_CACHE - if (sched_feat(SCHED_CACHE) && p->mm && p->mm->pcpu_sched) { - /* - * XXX things like Skylake have non-inclusive L3 and might not - * like this L3 centric view. What to do about L2 stickyness ? 
- */ - return per_cpu_ptr(p->mm->pcpu_sched, env->src_cpu)->occ > - per_cpu_ptr(p->mm->pcpu_sched, env->dst_cpu)->occ; - } -#endif - delta =3D rq_clock_task(env->src_rq) - p->se.exec_start; =20 return delta < (s64)sysctl_sched_migration_cost; @@ -10029,6 +10018,12 @@ int can_migrate_task(struct task_struct *p, struct= lb_env *env) if (env->flags & LBF_ACTIVE_LB) return 1; =20 +#ifdef CONFIG_SCHED_CACHE + if (sched_feat(SCHED_CACHE) && + get_migrate_hint(env->src_cpu, env->dst_cpu, p) =3D=3D mig_forbid) + return 0; +#endif + degrades =3D migrate_degrades_locality(p, env); if (!degrades) hot =3D task_hot(p, env); @@ -10289,6 +10284,17 @@ static int detach_tasks(struct lb_env *env) if (env->imbalance <=3D 0) break; =20 +#ifdef CONFIG_SCHED_CACHE + /* + * Don't detach more tasks if remaining tasks want to stay: + * The tasks have already been sorted by order_tasks_by_llc(), + * they are tasks that prefer the current LLC. + */ + if (sched_feat(SCHED_CACHE) && p->preferred_llc !=3D -1 && + llc_id(env->src_cpu) =3D=3D p->preferred_llc) + break; +#endif + continue; next: if (p->sched_task_hot) --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E605BA36 for ; Sat, 9 Aug 2025 05:13:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716413; cv=none; b=Y5DY4uVXJj/NXx4xa+vIglUy/e0Mz3lC23M4GrgHNX8VTYhXYab2lfrFY6mo9TrkT8w/WQHHy0ath+3g82U7f7w+5f8oq86hgIXxPEP3isbuKS9ryw2kjNuMOw6y8wqF8EHfhI4CaEef8Gm0ym0TuNTWKHvhA39IckSeqYVQ55Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716413; c=relaxed/simple; bh=rJfyf/57CaUPFk3O70t1/xBAt4kiiSb/LUukMkIaaWI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=BbGezSEvMdgtuVQGY5ll4boKLDz3xf/HaQKe/BbQh2RvAxEla/bMNlbvy7fNRiyq6jqUio9sHSAT2xUSQfMGzvU0gbx6uABWuPs54UhyitX7QqsGpuhPSoWaqfjMJ+JchZxUZEza6Pv3LtvE1xF1YB8vYubCHFqLzoIDWZeEypI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Fv6T2jfF; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Fv6T2jfF" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716412; x=1786252412; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rJfyf/57CaUPFk3O70t1/xBAt4kiiSb/LUukMkIaaWI=; b=Fv6T2jfFSgH2501Wvsd9NAJpLZ95G3qQ2wpS+brwkLJI4Z9OwbGNI010 TXV31asmWoF9+Q+nOOHJAmbtTXBwCofZZc4StvBRsyudecftX7Wk1PRur u7QKz2FJNo0ci4Owq3KzhsOU/Zu+KxpkANT6PW233G4v7L1dfPrJmuTsr kc0L6AcVmncnIjhBRuMo0p6BD/uY9llqRtu1k1OnH8I9Jcei+J4SP8kPW Qrss/vUTJVcjLGz++sDQq5rXiSF8X5srU5tRisTDzgzNNua10vFLKF5a/ +MoDPmHcbzf2Z88/IRgSh6BUtg+yyb4sVeKEc69v7SY3AfaN4pHBsvMby w==; X-CSE-ConnectionGUID: yRiWx0TaSqSHcni4YwFu1Q== X-CSE-MsgGUID: KOLdWXj6RXq5qKElN0Zavw== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; 
a="79620355" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="79620355" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:13:31 -0700 X-CSE-ConnectionGUID: 66tPf5k4Q3+atsnA5jm4Jg== X-CSE-MsgGUID: 4d6H20jNRjW/omJZg3lPVw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165844013" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa009.fm.intel.com with ESMTP; 08 Aug 2025 22:13:26 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 19/28] sched: Introduce SCHED_CACHE_LB to control cache aware load balance Date: Sat, 9 Aug 2025 13:07:23 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen Introduce the SCHED_CACHE_LB sched feature to enable or disable cache aware load balance in the schduler. Co-developed-by: Chen Yu Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- kernel/sched/fair.c | 18 ++++++++++-------- kernel/sched/features.h | 1 + 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 22b7a7fe538e..9843d4e1d84f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -10019,7 +10019,7 @@ int can_migrate_task(struct task_struct *p, struct = lb_env *env) return 1; =20 #ifdef CONFIG_SCHED_CACHE - if (sched_feat(SCHED_CACHE) && + if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB) && get_migrate_hint(env->src_cpu, env->dst_cpu, p) =3D=3D mig_forbid) return 0; #endif @@ -10105,7 +10105,7 @@ static struct list_head LIST_HEAD(no_pref_llc); LIST_HEAD(pref_other_llc); =20 - if (!sched_feat(SCHED_CACHE)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB)) return tasks; =20 if (cpus_share_cache(env->dst_cpu, env->src_cpu)) @@ -10290,7 +10290,8 @@ static int detach_tasks(struct lb_env *env) * The tasks have already been sorted by order_tasks_by_llc(), * they are tasks that prefer the current LLC. 
*/ - if (sched_feat(SCHED_CACHE) && p->preferred_llc !=3D -1 && + if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB) && + p->preferred_llc !=3D -1 && llc_id(env->src_cpu) =3D=3D p->preferred_llc) break; #endif @@ -10947,7 +10948,7 @@ static inline bool llc_balance(struct lb_env *env, = struct sg_lb_stats *sgs, struct sched_domain *child =3D env->sd->child; int llc; =20 - if (!sched_feat(SCHED_CACHE)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB)) return false; =20 if (env->sd->flags & SD_SHARE_LLC) @@ -11058,7 +11059,8 @@ static void update_sg_if_llc(struct lb_env *env, st= ruct sg_lb_stats *sgs, struct sched_domain *sd =3D env->sd->child; struct sched_domain_shared *sd_share; =20 - if (!sched_feat(SCHED_CACHE) || env->idle =3D=3D CPU_NEWLY_IDLE) + if (!sched_feat(SCHED_CACHE) || env->idle =3D=3D CPU_NEWLY_IDLE || + !sched_feat(SCHED_CACHE_LB)) return; =20 /* only care the sched domain that spans 1 LLC */ @@ -11120,7 +11122,7 @@ static inline void update_sg_lb_stats(struct lb_env= *env, *sg_overutilized =3D 1; =20 #ifdef CONFIG_SCHED_CACHE - if (sched_feat(SCHED_CACHE)) { + if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB)) { int j; =20 for (j =3D 0; j < max_llcs; ++j) @@ -12406,7 +12408,7 @@ imbalanced_active_balance(struct lb_env *env) static inline bool break_llc_locality(struct lb_env *env) { - if (!sched_feat(SCHED_CACHE)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB)) return 0; =20 if (cpus_share_cache(env->src_cpu, env->dst_cpu)) @@ -12908,7 +12910,7 @@ static int active_load_balance_cpu_stop(void *data) #ifdef CONFIG_SCHED_CACHE int llc =3D llc_idx(target_cpu); =20 - if (!sched_feat(SCHED_CACHE)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB)) goto out_unlock; =20 if (llc < 0) diff --git a/kernel/sched/features.h b/kernel/sched/features.h index d2af7bfd36bf..11dbd74cd365 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -88,6 +88,7 @@ SCHED_FEAT(TTWU_QUEUE, true) SCHED_FEAT(SIS_UTIL, true) =20 SCHED_FEAT(SCHED_CACHE, true) +SCHED_FEAT(SCHED_CACHE_LB, true) /* * Issue a WARN when we do multiple update_rq_clock() calls * in a single rq->lock section. 
Default disabled because the --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7A4A02749D5 for ; Sat, 9 Aug 2025 05:13:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716425; cv=none; b=VIDdU39lmJxSYrHB9S4l1mh0boKZZki9BLORb9qgZo0xYyJtYcfX+m2EFuLna+wvMqdM3b9jRoxWjfYX98zrEOAuWNoH8zCG6FpNq8YiKHKq1NGdKxQHVgzOOiLG5uy1qO9t7Wa4goaOHrkQI+arcKezllgcvY4ibca99xDzFQA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716425; c=relaxed/simple; bh=YdhYlPfOzBX2JmU404dp2rIw4uGKUk0/reO/wjvEcdA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=AeIzKWJSFg2pTtYt5o5Vz8pc3/BIn9Yhzjs5gEx3/9nVXOR8/67BRRpJDX5hibfRtV6EY8e/fAEY/Zxa+8RbAe6m0nXD1z5eRZPQvlghGCpOjLX0XPm2maXH0OgysqVx3VFMCsFbO52VsoQD2p85NC594bUHYDFoDa6p7TKhhlA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=RBrobR1i; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="RBrobR1i" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716424; x=1786252424; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YdhYlPfOzBX2JmU404dp2rIw4uGKUk0/reO/wjvEcdA=; b=RBrobR1iwxBeJZoLKXMRg5MQx8WGXo8muZpXStdx9iyAK264sfhIG/qu joL4lEprzIBRw8X03Yy5P97hTPxpboN85pYrMB8bQ6FHQo9ybNckkrA6U Fm88MiZ42tIDdBFytUX2SY9R5LIWL4D6l7uxGHF/7t9G2tSrGXQZrLdmV kEXY08yQyIUpqXqtLN/Fts7veKj7eYDCqo12PTEZYQ6XGxrwnt4HlyW3a b9OPJyXpEcpsigfeiakBxz87spvYkl6NsSdiBGHP8WsVh/XlkJ8G7/XfP bZjJOX3ekGPt6NQIeusuWhHKU/YI3AgkFy7IeRU2nggpBNO+zFRf78cN1 w==; X-CSE-ConnectionGUID: 6JJ7KzVjTq2D3T0a9H1P9w== X-CSE-MsgGUID: pH0Ms/VYRfa3u1EQ9lfZ0A== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="79620375" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="79620375" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:13:44 -0700 X-CSE-ConnectionGUID: p2RcioaZTnKrvWCzoclFkA== X-CSE-MsgGUID: 7ArRWLtgSgqBcLoFA2nIvw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165844031" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa009.fm.intel.com with ESMTP; 08 Aug 2025 22:13:38 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . 
Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 20/28] sched: Introduce SCHED_CACHE_WAKE to control LLC aggregation on wake up Date: Sat, 9 Aug 2025 13:07:35 +0800 Message-Id: <144358df73cbb8c7d24f757fc40cb068be603bed.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Tim Chen Introduce SCHED_CACHE_WAKE feature to enable or disable cache-aware wake up. Disable this feature by default because cache-aware wakeup is overly aggressive in stacking wakees of the same process on the same LLC, if they are frequently woken up. The wake ups can be much more frequent than load balances, adding much overhead when load balance alone for LLC aggregation is sufficient. Co-developed-by: Chen Yu Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- kernel/sched/fair.c | 6 +++++- kernel/sched/features.h | 1 + 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9843d4e1d84f..6e61f9e1f628 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9063,7 +9063,7 @@ static int select_cache_cpu(struct task_struct *p, in= t prev_cpu) struct mm_struct *mm =3D p->mm; int cpu; =20 - if (!sched_feat(SCHED_CACHE)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_WAKE)) return prev_cpu; =20 if (!mm || p->nr_cpus_allowed =3D=3D 1) @@ -9076,6 +9076,10 @@ static int select_cache_cpu(struct task_struct *p, i= nt prev_cpu) if (cpus_share_cache(cpu, prev_cpu)) return prev_cpu; =20 + if (_get_migrate_hint(prev_cpu, cpu, + task_util(p), true) =3D=3D mig_forbid) + return prev_cpu; + if (static_branch_likely(&sched_numa_balancing) && __migrate_degrades_locality(p, prev_cpu, cpu, false) > 0) { /* diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 11dbd74cd365..44b408cf0dd4 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -89,6 +89,7 @@ SCHED_FEAT(SIS_UTIL, true) =20 SCHED_FEAT(SCHED_CACHE, true) SCHED_FEAT(SCHED_CACHE_LB, true) +SCHED_FEAT(SCHED_CACHE_WAKE, false) /* * Issue a WARN when we do multiple update_rq_clock() calls * in a single rq->lock section. 
Default disabled because the --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2D7E92472BA for ; Sat, 9 Aug 2025 05:13:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716438; cv=none; b=oP/KP8CT9PyG8F223CbqjTnw5fAnuBqC4qrJsRtJ4FhmBZN2qpqY1eTd8fjtB3IjDADG3eDf23ECJS0GGe21q95Lgbd8aoZ2d1dhrh9ekTQMUxybtv0qdhdWt0awWabB+mPE502GRGKAWZIXsyFrdxhb/zbs1b9+pJ2xhI3lh0U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716438; c=relaxed/simple; bh=2aEa4h1fI5J4/1AEUoGjg0eQXawj82V6LxqSqdQWKjc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=eLlyXwmAlVBfL8ugZpWuJv/cYDUhBSgXTmsJA1VoG4McRZM9A+e9EyJW9FEw5BbRwioynILfrxFhAe4zM1FRKim6rhs5NDIaRMWKq7+xJ+DXnEZ4q4gjxkB8JnCe7RI2fcforxvMACn4NmxTFkZ0407GDkQ+uKE5393PmwyageE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=jyvdr92q; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="jyvdr92q" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716436; x=1786252436; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2aEa4h1fI5J4/1AEUoGjg0eQXawj82V6LxqSqdQWKjc=; b=jyvdr92qHqcE9M3jPHv+fDdx/YNY3xnAId3qtw3C2QzAwffEVL4ZQTBS CePnqJIrCnZ2R6zfimwVCnLdiYu5OvVZ5ChHHhlNO+ZL+HID3ktCe7O2w 48m583KvcVHXVXNpkpIfS7DLrauwwN0nhjxTOtWhNNA6tX0C3umtnps9k I2871JFWkVEb0mXhuELAw1LEqE+pk38njQNVgLdHwoT5vvi9CMGrEAr/N RU5gBb080A9sLYEGTtpnWCaPKZUTFqtKi9ostEazBVphHyMbIcwyMjLxO F+hAW/C7GT5rTxqDXPHnZ6JnGmHGoIMYcwTWKxDujBKbcjIHs9lsLI5Ay w==; X-CSE-ConnectionGUID: frbwZBVeQsyYvImCyN450Q== X-CSE-MsgGUID: O9BLeD00TFuab4r38fYeEg== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="79620417" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="79620417" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:13:55 -0700 X-CSE-ConnectionGUID: 57mN5EeXQnOL1PlRN1msdQ== X-CSE-MsgGUID: ZdnU+8XKSQuiZvpeY/fMpw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165844039" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by fmviesa009.fm.intel.com with ESMTP; 08 Aug 2025 22:13:50 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . 
Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 21/28] sched: Introduce a static key to enable cache aware only for multi LLCs Date: Sat, 9 Aug 2025 13:07:47 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" If there are more than one LLCs in the node, the cache aware scheduling is enabled. Otherwise, the cache aware scheduling is disabled. The definition of multiple LLCs in a node is that every node in the system should have more than one LLC. For example, if node0, node1, and node2 each have 4 LLCs, while node3 has 1 LLC (possibly due to CPU hotplug), cache-aware scheduling should be disabled. Suggested-by: Libo Chen Co-developed-by: Tim Chen Signed-off-by: Tim Chen Signed-off-by: Chen Yu --- kernel/sched/fair.c | 31 ++++++++++++++++++++++--------- kernel/sched/sched.h | 1 + kernel/sched/topology.c | 22 ++++++++++++++++++++-- 3 files changed, 43 insertions(+), 11 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 6e61f9e1f628..194ec594561b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1175,6 +1175,8 @@ static s64 update_curr_se(struct rq *rq, struct sched= _entity *curr) #define EPOCH_PERIOD (HZ/100) /* 10 ms */ #define EPOCH_OLD 5 /* 50 ms */ =20 +DEFINE_STATIC_KEY_FALSE(sched_cache_present); + static int llc_id(int cpu) { if (cpu < 0) @@ -1318,7 +1320,8 @@ void account_mm_sched(struct rq *rq, struct task_stru= ct *p, s64 delta_exec) unsigned long epoch; int mm_sched_llc =3D -1; =20 - if (!sched_feat(SCHED_CACHE)) + if (!sched_feat(SCHED_CACHE) || + !static_branch_likely(&sched_cache_present)) return; =20 if (p->sched_class !=3D &fair_sched_class) @@ -1366,7 +1369,8 @@ static void task_tick_cache(struct rq *rq, struct tas= k_struct *p) struct callback_head *work =3D &p->cache_work; struct mm_struct *mm =3D p->mm; =20 - if (!sched_feat(SCHED_CACHE)) + if (!sched_feat(SCHED_CACHE) || + !static_branch_likely(&sched_cache_present)) return; =20 if (!mm || !mm->pcpu_sched) @@ -9063,7 +9067,8 @@ static int select_cache_cpu(struct task_struct *p, in= t prev_cpu) struct mm_struct *mm =3D p->mm; int cpu; =20 - if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_WAKE)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_WAKE) || + !static_branch_likely(&sched_cache_present)) return prev_cpu; =20 if (!mm || p->nr_cpus_allowed =3D=3D 1) @@ -10024,6 +10029,7 @@ int can_migrate_task(struct task_struct *p, struct = lb_env *env) =20 #ifdef CONFIG_SCHED_CACHE if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB) && + static_branch_likely(&sched_cache_present) && get_migrate_hint(env->src_cpu, env->dst_cpu, p) =3D=3D mig_forbid) return 0; #endif @@ -10109,7 +10115,8 @@ static struct list_head LIST_HEAD(no_pref_llc); LIST_HEAD(pref_other_llc); =20 - if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB) || + !static_branch_likely(&sched_cache_present)) return tasks; =20 if (cpus_share_cache(env->dst_cpu, env->src_cpu)) @@ -10295,6 +10302,7 @@ 
static int detach_tasks(struct lb_env *env) * they are tasks that prefer the current LLC. */ if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB) && + static_branch_likely(&sched_cache_present) && p->preferred_llc !=3D -1 && llc_id(env->src_cpu) =3D=3D p->preferred_llc) break; @@ -10952,7 +10960,8 @@ static inline bool llc_balance(struct lb_env *env, = struct sg_lb_stats *sgs, struct sched_domain *child =3D env->sd->child; int llc; =20 - if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB) || + !static_branch_likely(&sched_cache_present)) return false; =20 if (env->sd->flags & SD_SHARE_LLC) @@ -11064,7 +11073,8 @@ static void update_sg_if_llc(struct lb_env *env, st= ruct sg_lb_stats *sgs, struct sched_domain_shared *sd_share; =20 if (!sched_feat(SCHED_CACHE) || env->idle =3D=3D CPU_NEWLY_IDLE || - !sched_feat(SCHED_CACHE_LB)) + !sched_feat(SCHED_CACHE_LB) || + !static_branch_likely(&sched_cache_present)) return; =20 /* only care the sched domain that spans 1 LLC */ @@ -11126,7 +11136,8 @@ static inline void update_sg_lb_stats(struct lb_env= *env, *sg_overutilized =3D 1; =20 #ifdef CONFIG_SCHED_CACHE - if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB)) { + if (sched_feat(SCHED_CACHE) && sched_feat(SCHED_CACHE_LB) && + static_branch_likely(&sched_cache_present)) { int j; =20 for (j =3D 0; j < max_llcs; ++j) @@ -12412,7 +12423,8 @@ imbalanced_active_balance(struct lb_env *env) static inline bool break_llc_locality(struct lb_env *env) { - if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB) || + !static_branch_likely(&sched_cache_present)) return 0; =20 if (cpus_share_cache(env->src_cpu, env->dst_cpu)) @@ -12914,7 +12926,8 @@ static int active_load_balance_cpu_stop(void *data) #ifdef CONFIG_SCHED_CACHE int llc =3D llc_idx(target_cpu); =20 - if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB)) + if (!sched_feat(SCHED_CACHE) || !sched_feat(SCHED_CACHE_LB) || + !static_branch_likely(&sched_cache_present)) goto out_unlock; =20 if (llc < 0) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 4464b92767ad..3e60618a88e9 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2857,6 +2857,7 @@ extern unsigned int sysctl_numa_balancing_hot_thresho= ld; #ifdef CONFIG_SCHED_CACHE extern unsigned int sysctl_llc_aggr_cap; extern unsigned int sysctl_llc_aggr_imb; +extern struct static_key_false sched_cache_present; #endif =20 #ifdef CONFIG_SCHED_HRTICK diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index 91a2b7f65fee..8483c02b4d28 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -2476,6 +2476,8 @@ build_sched_domains(const struct cpumask *cpu_map, st= ruct sched_domain_attr *att int i, ret =3D -ENOMEM; bool has_asym =3D false; bool has_cluster =3D false; + bool llc_has_parent_sd =3D false; + unsigned int multi_llcs_node =3D 1; =20 #ifdef CONFIG_SCHED_CACHE if (max_llcs < 0) { @@ -2545,6 +2547,8 @@ build_sched_domains(const struct cpumask *cpu_map, st= ruct sched_domain_attr *att struct sched_domain __rcu *top_p; unsigned int nr_llcs; =20 + if (!llc_has_parent_sd) + llc_has_parent_sd =3D true; /* * For a single LLC per node, allow an * imbalance up to 12.5% of the node. This is @@ -2566,10 +2570,19 @@ build_sched_domains(const struct cpumask *cpu_map, = struct sched_domain_attr *att * between LLCs and memory channels. 
*/ nr_llcs =3D sd->span_weight / child->span_weight; - if (nr_llcs =3D=3D 1) + /* + * iff all nodes have multiple LLCs, the + * multi_llcs_node will be set to 1. If + * there is at least 1 node having 1 single + * LLC, the multi_llcs_node remains 0. + */ + if (nr_llcs =3D=3D 1) { imb =3D sd->span_weight >> 3; - else + multi_llcs_node =3D 0; + } else { imb =3D nr_llcs; + multi_llcs_node &=3D 1; + } imb =3D max(1U, imb); sd->imb_numa_nr =3D imb; =20 @@ -2617,6 +2630,11 @@ build_sched_domains(const struct cpumask *cpu_map, s= truct sched_domain_attr *att if (has_cluster) static_branch_inc_cpuslocked(&sched_cluster_active); =20 +#ifdef CONFIG_SCHED_CACHE + if (llc_has_parent_sd && multi_llcs_node && !sched_asym_cpucap_active()) + static_branch_inc_cpuslocked(&sched_cache_present); +#endif + if (rq && sched_debug_verbose) pr_info("root domain span: %*pbl\n", cpumask_pr_args(cpu_map)); =20 --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 72A7D2749D5 for ; Sat, 9 Aug 2025 05:14:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.12 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716450; cv=none; b=IeQpUvuEInOqzZCCDB3S2FFJC3zSp4/XkEtNAu2D98UQxwsXgo2BbwLtaxH6iJwbi4gN7aV60Aez3K8ydwiJFAzlJgHRf/3+aORKqKYd4JzlvthQfT5xmROL82OUpU66n+lqag40QZb3zB8TfPDePjvzD5oeSB7VRw6J3Cixx+g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716450; c=relaxed/simple; bh=aXXr09emE5m1BA8+F+9P+Qieum2q8rFMJHj2OrnZISQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=S5MR0P7NeiK39j0s1bhGmWRhNoCo1BGGAp0DbGAVUu/cKCksVWgSVTBQqiegxZ+gsioGdqPIL8dWWDic13tfdX1SnoZ6qXprsQhZj7vCqF2cTb/xO8jcfhLxZaWGG0ZCiaJ0gxuXgS+7OA9TZAf4d9OKuUhX4CnNCW42X+VS4FE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=GZsq9eJL; arc=none smtp.client-ip=192.198.163.12 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="GZsq9eJL" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716449; x=1786252449; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aXXr09emE5m1BA8+F+9P+Qieum2q8rFMJHj2OrnZISQ=; b=GZsq9eJL5DGjYunKJTsof+Ln2cKfOk2BGwjzrC+sKFMATJMkyxO99PMo 2Oyl5uVHYhRF8Tm0YmjjPIUI3d++yQ67YwJyf8GQHF8cYeMonziUJgHhH kWHsp/STcLEVX42oVsCvQJlHA6eoqh5JKSyBBe3w1N12e5vNle7MdQRuI 9sPdUfMBH0dbovuNFtw5OfBzc2eoiu4kiBY1XCFzj5eShFF03nf9Tv2B/ ClF5YQoCu+HTwDDVvM9QKGz82gKXl8kYElV4byqv5tvHmI7Psovf6yI1d zi0XGuLMMAQ/QyWVmk7U53AdlwCTHAyvtt6E7DmP/8gc3IF+ydRquzbUa g==; X-CSE-ConnectionGUID: Q9jwphPASBiXZUHMOabB2g== X-CSE-MsgGUID: 2ZVJmYQUSISFufuXvDLTyg== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="60860025" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="60860025" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by fmvoesa106.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:14:08 -0700 X-CSE-ConnectionGUID: ilrvcjmWQsiRa7CRb1DYVw== X-CSE-MsgGUID: Uh2zL5Q0RAa3BSoaBdHAzw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="169693092" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa003.jf.intel.com with ESMTP; 08 Aug 2025 22:14:02 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 22/28] sched: Turn EPOCH_PERIOD and EPOCH_OLD into tunnable debugfs Date: Sat, 9 Aug 2025 13:07:59 +0800 Message-Id: <79c8fdcf7e875617935cfaba2ea1f2c2ae5ce62c.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Convert EPOCH_PERIOD and EPOCH_OLD into tunable debugfs entries. Users can adjust the decay rate as needed. By default, occupancy decays by half every 10 ms. Suggested-by: Shrikanth Hegde Signed-off-by: Chen Yu --- kernel/sched/debug.c | 2 ++ kernel/sched/fair.c | 9 ++++++--- kernel/sched/sched.h | 2 ++ 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 682fd91a42a0..7a9ec03704b9 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -535,6 +535,8 @@ static __init int sched_init_debug(void) #ifdef CONFIG_SCHED_CACHE debugfs_create_u32("llc_aggr_cap", 0644, debugfs_sched, &sysctl_llc_aggr_= cap); debugfs_create_u32("llc_aggr_imb", 0644, debugfs_sched, &sysctl_llc_aggr_= imb); + debugfs_create_u32("llc_period", 0644, debugfs_sched, &sysctl_llc_period); + debugfs_create_u32("llc_old", 0644, debugfs_sched, &sysctl_llc_old); #endif debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops= ); =20 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 194ec594561b..64f757ad39fc 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1175,6 +1175,9 @@ static s64 update_curr_se(struct rq *rq, struct sched= _entity *curr) #define EPOCH_PERIOD (HZ/100) /* 10 ms */ #define EPOCH_OLD 5 /* 50 ms */ =20 +__read_mostly unsigned int sysctl_llc_period =3D EPOCH_PERIOD; +__read_mostly unsigned int sysctl_llc_old =3D EPOCH_OLD; + DEFINE_STATIC_KEY_FALSE(sched_cache_present); =20 static int llc_id(int cpu) @@ -1283,9 +1286,9 @@ static inline void __update_mm_sched(struct rq *rq, s= truct mm_sched *pcpu_sched) long delta =3D now - rq->cpu_epoch_next; =20 if (delta > 0) { - n =3D (delta + EPOCH_PERIOD - 1) / EPOCH_PERIOD; + n =3D (delta + sysctl_llc_period - 1) / sysctl_llc_period; rq->cpu_epoch +=3D n; - rq->cpu_epoch_next +=3D n * EPOCH_PERIOD; + rq->cpu_epoch_next +=3D n * sysctl_llc_period; __shr_u64(&rq->cpu_runtime, n); } =20 @@ -1346,7 +1349,7 @@ void account_mm_sched(struct rq *rq, struct task_stru= ct *p, s64 delta_exec) * has only 1 thread, invalidate * it's preferred state. 
*/ - if (epoch - READ_ONCE(mm->mm_sched_epoch) > EPOCH_OLD || + if (epoch - READ_ONCE(mm->mm_sched_epoch) > sysctl_llc_old || get_nr_threads(p) <=3D 1) { mm->mm_sched_cpu =3D -1; pcpu_sched->occ =3D 0; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 3e60618a88e9..d752d64d4acd 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2858,6 +2858,8 @@ extern unsigned int sysctl_numa_balancing_hot_thresho= ld; extern unsigned int sysctl_llc_aggr_cap; extern unsigned int sysctl_llc_aggr_imb; extern struct static_key_false sched_cache_present; +extern unsigned int sysctl_llc_period; +extern unsigned int sysctl_llc_old; #endif =20 #ifdef CONFIG_SCHED_HRTICK --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 50FEB275872 for ; Sat, 9 Aug 2025 05:14:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.12 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716461; cv=none; b=IojQ3o0319gniGMl43HTVAIglRucNSyn6f7mIZ2sA6nGcNZlUGEZWY6057tDsNZ4vk1O+nB32WSiImkG1cA4P3bSQwXMpAQf17p3nQR/jrpVdHP7V0+mJDJgG2Sf7l8Ti7krqUFqfX2pqdYZqFCdot+k/yWtmOk0EJDIuVmMV9c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716461; c=relaxed/simple; bh=vn8SQoDAOd07cpCKj326vCb8D/qCqZCs9BwVgi4KoVg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=PMXTmQv0twXoDtxt8UL0eicbBf9o7Tfv//e1wCHtE99sZhtB/nIU8wVaymmbbfxF6XMzVWor4WdG/qxhSD/wWJ4vwz9cuiTjtjsAycEcLvPhT9aUF3kUVc4kwE6SAX+4OTFtFFHuRXxnq1R/3zLjD9SIRWfAHwReiV3b07cgCmM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=nkaLairD; arc=none smtp.client-ip=192.198.163.12 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="nkaLairD" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716461; x=1786252461; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vn8SQoDAOd07cpCKj326vCb8D/qCqZCs9BwVgi4KoVg=; b=nkaLairDDjkysNs4UHTZ5xKHewoxOhkUNj9VyNX4tbdn4A+qnsAVwMle R5heWKuCY8Flip8hzeFiNi/CFABQw9zu7obpiMTotWoXuYrKrGBh3HoZh lmLG5GkRobIJvrI3ad/N+LP8GAWOX5LCCD9ciXh9NpYENpuy7gVq79Rno lOPq4XCXPVEuiMBh+0Se3GxDjUG9K2DZWlyzewIOPzwn2XZvGRXdUZ3Ot w3MIJHSIsVA80TETVQPqTJE71E/W3dHyU/Fc9CdibOzm0oeRAQl7UFWIV RWX9ArTsi+Tp+Wc1c8CtMey1/OwsGiNy5hIgFduA1bpZq8LLDwe4VQMBf g==; X-CSE-ConnectionGUID: Uof0EABPSOGTbrf//bNl/A== X-CSE-MsgGUID: G6Abi0pKRMKI/T1SAuVRiQ== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="60860044" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="60860044" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by fmvoesa106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:14:20 -0700 X-CSE-ConnectionGUID: Y7bYuOZeSdOfQBAqGM3CtQ== X-CSE-MsgGUID: edOp5vWRTYOveGm/1PiD4g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="169693142" 
Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa003.jf.intel.com with ESMTP; 08 Aug 2025 22:14:14 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 23/28] sched: Scan a task's preferred node for preferred LLC Date: Sat, 9 Aug 2025 13:08:11 +0800 Message-Id: <178bf43d7cbc9b2c9aea408dd56b87391067df37.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When sched_cache is enabled, fully scanning all online CPUs to find the hottest one is very costly. As a first step, limit the scan to only the CPUs within the task's preferred node. If the node containing the task's preferred LLC is not in the CPU scan mask, add it. Additionally, if the node where the current task is running is not in the scan mask, add it too. Suggested-by: Jianyong Wu Suggested-by: Shrikanth Hegde Co-developed-by: Tim Chen Signed-off-by: Tim Chen Signed-off-by: Chen Yu --- kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++--- 1 file changed, 33 insertions(+), 3 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 64f757ad39fc..420d3a080990 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1390,13 +1390,36 @@ static void task_tick_cache(struct rq *rq, struct t= ask_struct *p) } } =20 +static void get_scan_cpumasks(cpumask_var_t cpus, int cache_cpu, + int pref_nid, int curr_cpu) +{ +#ifdef CONFIG_NUMA_BALANCING + /* first honor the task's preferred node */ + if (pref_nid !=3D NUMA_NO_NODE) + cpumask_or(cpus, cpus, cpumask_of_node(pref_nid)); +#endif + + /* secondly honor the task's cache CPU if it is not included */ + if (cache_cpu !=3D -1 && !cpumask_test_cpu(cache_cpu, cpus)) + cpumask_or(cpus, cpus, + cpumask_of_node(cpu_to_node(cache_cpu))); + + /* + * Thirdly honor the task's current running node + * as the last resort. 
+ */ + if (!cpumask_test_cpu(curr_cpu, cpus)) + cpumask_or(cpus, cpus, cpumask_of_node(cpu_to_node(curr_cpu))); +} + static void __no_profile task_cache_work(struct callback_head *work) { struct task_struct *p =3D current; struct mm_struct *mm =3D p->mm; unsigned long m_a_occ =3D 0; unsigned long last_m_a_occ =3D 0; - int cpu, m_a_cpu =3D -1; + int cpu, m_a_cpu =3D -1, cache_cpu, + pref_nid =3D NUMA_NO_NODE, curr_cpu =3D smp_processor_id(); cpumask_var_t cpus; =20 WARN_ON_ONCE(work !=3D &p->cache_work); @@ -1406,11 +1429,18 @@ static void __no_profile task_cache_work(struct cal= lback_head *work) if (p->flags & PF_EXITING) return; =20 - if (!alloc_cpumask_var(&cpus, GFP_KERNEL)) + if (!zalloc_cpumask_var(&cpus, GFP_KERNEL)) return; =20 + cache_cpu =3D mm->mm_sched_cpu; +#ifdef CONFIG_NUMA_BALANCING + if (static_branch_likely(&sched_numa_balancing)) + pref_nid =3D p->numa_preferred_nid; +#endif + scoped_guard (cpus_read_lock) { - cpumask_copy(cpus, cpu_online_mask); + get_scan_cpumasks(cpus, cache_cpu, + pref_nid, curr_cpu); =20 for_each_cpu(cpu, cpus) { /* XXX sched_cluster_active */ --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 057372472BA for ; Sat, 9 Aug 2025 05:14:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.12 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716474; cv=none; b=VIfN6Nj+OmuTvjcAQC5ue6+EJDcsfkI76dqTW3x38qtgbZ4hqzLAXpZlXfM9DlKB8dKo57i5wMslwoWbzJnvNz1ykOFHxPfoi2S93m+jstGBBseKp1ztbQwQ2K61GTrnAqbJdZm7pnjXyKLjWHy5vyrtWh/xw7PspOvnpd6AKpw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716474; c=relaxed/simple; bh=FSWaU6wjxtUO8jXpCphgNEzz4lxAHBHSTH4IiVUyBUk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=XjsKAevWGzT+p1INgbZoPieajKhnft6CnQg6Bbk4u1Z+7t5XRC0bswqFYNyS3V/ZFpoWqhCw4RYtVBZI+9IiO7g1Q46JsT7+09UPlrjZPLkzbNiFJI/DRRGFHTenkxTcke+8xYuOm3RbKap9vjHJ6muwNQMS05rVf9nebWSot00= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=gmFy4oU1; arc=none smtp.client-ip=192.198.163.12 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="gmFy4oU1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716473; x=1786252473; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=FSWaU6wjxtUO8jXpCphgNEzz4lxAHBHSTH4IiVUyBUk=; b=gmFy4oU1sZtwQyMYwHB8SIoIYYE5hI8m7TFKqp6zqiVd8rnbxM1bMhLC 74RYM+I/O6xKXfeu/UVzPgl1+lq9og33Njeix9LSwjF6dc54BAfz6kZpm XSL2l8zGvHKS024WUDTZLsjKB3ozB4WcNBoQCDO/MuFQPfhhiy+fkzGjd GRjfmZ2nZEDzv9f+jC+e5CY8l12nBabKfqFG4La0LMDW1GFk7YsYd275+ ppat6y66psYB1mii4x1wz+0D3WFtxDecRb9O1Al2JUDYi696b9W2OMbl/ S6cwP0rZ5li5OyXKhxjLqGgxHrmb3hYgWbTMhafNcMH8p7GmxPK8BizEQ Q==; X-CSE-ConnectionGUID: hfmQ09kFSka1PTGb3z++tA== X-CSE-MsgGUID: jqX5qpmwT2awTicWyNB8Gw== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; 
a="60860082" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="60860082" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by fmvoesa106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:14:32 -0700 X-CSE-ConnectionGUID: UKKGkDsGRSSXIWvNvv1UIw== X-CSE-MsgGUID: AYQBgKAjRwGDzbPfQLudxQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="169693165" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa003.jf.intel.com with ESMTP; 08 Aug 2025 22:14:26 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 24/28] sched: Record average number of runninhg tasks per process Date: Sat, 9 Aug 2025 13:08:23 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Performance regression was found when running hackbench with many threads per process(the fd number is high). To avoid this regression, process having a large number of active threads should be excluded from cache aware scheduling. With sched_cache enabled, record the number of active threads within the process. This calculation occurs in the periodic task_cache_work(): when iterating over the CPUs, check the currently running task on that CPU; if the running task belongs to the same process as the task that launches task_cache_work(), increment the active thread count by 1. If the number exceeds the number of CPUs in the preferred LLC, sched_cache is prevented from aggregating too many threads in one LLC domain. 
Reported-by: K Prateek Nayak Signed-off-by: Chen Yu --- include/linux/mm_types.h | 1 + kernel/sched/fair.c | 14 ++++++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 41a598a44361..13b715357ccb 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1033,6 +1033,7 @@ struct mm_struct { raw_spinlock_t mm_sched_lock; unsigned long mm_sched_epoch; int mm_sched_cpu; + u64 nr_running_avg; #endif =20 #ifdef CONFIG_MMU diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 420d3a080990..2577b4225c3f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1414,12 +1414,13 @@ static void get_scan_cpumasks(cpumask_var_t cpus, i= nt cache_cpu, =20 static void __no_profile task_cache_work(struct callback_head *work) { - struct task_struct *p =3D current; + struct task_struct *p =3D current, *cur; struct mm_struct *mm =3D p->mm; unsigned long m_a_occ =3D 0; unsigned long last_m_a_occ =3D 0; int cpu, m_a_cpu =3D -1, cache_cpu, - pref_nid =3D NUMA_NO_NODE, curr_cpu =3D smp_processor_id(); + pref_nid =3D NUMA_NO_NODE, curr_cpu =3D smp_processor_id(), + nr_running =3D 0; cpumask_var_t cpus; =20 WARN_ON_ONCE(work !=3D &p->cache_work); @@ -1460,6 +1461,14 @@ static void __no_profile task_cache_work(struct call= back_head *work) m_cpu =3D i; } nr++; + + rcu_read_lock(); + cur =3D rcu_dereference(cpu_rq(i)->curr); + if (cur && !(cur->flags & (PF_EXITING | PF_KTHREAD)) && + cur->mm =3D=3D mm) + nr_running++; + rcu_read_unlock(); + trace_printk("(%d) occ: %ld m_occ: %ld m_cpu: %d nr: %d\n", per_cpu(sd_llc_id, i), occ, m_occ, m_cpu, nr); } @@ -1489,6 +1498,7 @@ static void __no_profile task_cache_work(struct callb= ack_head *work) mm->mm_sched_cpu =3D m_a_cpu; } =20 + update_avg(&mm->nr_running_avg, nr_running); free_cpumask_var(cpus); } =20 --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D7B93262FE7 for ; Sat, 9 Aug 2025 05:14:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716487; cv=none; b=SwLkLgdtnhX6+OqLa78yMZ0B4CsWLaf64/W+OI5unQJxwjCB9iC7AOkHjHIaNJ3elBvx03DMvZl1I9GsmU+4HfwjpJKe6RPlgB+vcyUgYbcDZyIBhWoXYSKefXUoVChQUkXZnD99yesLvH9Ng14G5w2CQagDetRVGrHCe4izAMY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716487; c=relaxed/simple; bh=NXUgKwlBD+LbS7BQoNhwYmUMMNvH06C+9x5ABL9KQvU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=hLGB1Xa+0uSZNB/dGK1ZYaVry91TulpBPgnDTPCsKPDTIPOxhNaZZ9Cuzpbo38IwKkY5uJgzEZ1uUGrD/s/RAk4WaWJnoklk4db0h0lc6DTb3DdY///Tx0rE+4HGY8/VgFkFtuHBrA6i2cPS9Gpq8iWYdaB6+MhIJDjLW2gw96I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=JtFt/v4t; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="JtFt/v4t" 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716485; x=1786252485; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=NXUgKwlBD+LbS7BQoNhwYmUMMNvH06C+9x5ABL9KQvU=; b=JtFt/v4t08KznPLSFaAfIMnOlZCZo8kJRJT+HfIrOEKXEd/dqH+lrk/a pIz5BUtjCYdr/a4c7pZ1o1bWZFwpHGC/M6S2vru19uOo+9h+a6+cDG4JL yM5/aSH0L5HRBrhBX4JQAp/3MD69CoaWhFe+GdocmtVOMYwP+erObftt/ 1RQRwVjl1GoMs0U6JSVWCb5Sk5EwLmq/bSRzlsfuwbMavl7fO/aSz0Urq XtvO5DV6xrhsk1Y77keeRc1mlMUGjry5fYahbKbwyaOxyKFtrNAFVv1mP qRCo9H1Kh+G72Foi9f3RFJzd/ky65xGF3aC6FH86kLP/zxm2pRC6gsTTi w==; X-CSE-ConnectionGUID: IHbXoWNLTlyKIkz7HGkE0w== X-CSE-MsgGUID: A4HbKv+ySb+99UGpcOUEYA== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="56259994" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="56259994" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:14:45 -0700 X-CSE-ConnectionGUID: tRxkh/5RR2W0QaPbtyXBXw== X-CSE-MsgGUID: kEi0G4T2TKO4l03CeelQGA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165476161" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa007.jf.intel.com with ESMTP; 08 Aug 2025 22:14:39 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 25/28] sched: Skip cache aware scheduling if the process has many active threads Date: Sat, 9 Aug 2025 13:08:36 +0800 Message-Id: <463bc54a283c1b908ea286ce67f301e2d1d39ea1.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" If the number of active threads within the process exceeds the number of Cores(divided by SMTs number) in the LLC, do not enable cache-aware scheduling. This is because there is a risk of cache contention within the preferred LLC when too many threads are present. Reported-by: K Prateek Nayak Signed-off-by: Chen Yu --- kernel/sched/fair.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 2577b4225c3f..4bf794f170cf 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1205,6 +1205,18 @@ static inline int pref_llc_idx(struct task_struct *p) return llc_idx(p->preferred_llc); } =20 +static bool exceed_llc_nr(struct mm_struct *mm, int cpu) +{ + int smt_nr =3D 1; + +#ifdef CONFIG_SCHED_SMT + if (sched_smt_active()) + smt_nr =3D cpumask_weight(cpu_smt_mask(cpu)); +#endif + + return ((mm->nr_running_avg * smt_nr) > per_cpu(sd_llc_size, cpu)); +} + static void account_llc_enqueue(struct rq *rq, struct task_struct *p) { int pref_llc; @@ -1350,7 +1362,8 @@ void account_mm_sched(struct rq *rq, struct task_stru= ct *p, s64 delta_exec) * it's preferred state. 
*/ if (epoch - READ_ONCE(mm->mm_sched_epoch) > sysctl_llc_old || - get_nr_threads(p) <=3D 1) { + get_nr_threads(p) <=3D 1 || + exceed_llc_nr(mm, cpu_of(rq))) { mm->mm_sched_cpu =3D -1; pcpu_sched->occ =3D 0; } @@ -1430,6 +1443,11 @@ static void __no_profile task_cache_work(struct call= back_head *work) if (p->flags & PF_EXITING) return; =20 + if (get_nr_threads(p) <=3D 1) { + mm->mm_sched_cpu =3D -1; + return; + } + if (!zalloc_cpumask_var(&cpus, GFP_KERNEL)) return; =20 @@ -9095,6 +9113,10 @@ static __maybe_unused enum llc_mig_hint get_migrate_= hint(int src_cpu, int dst_cp if (cpu < 0) return mig_allow; =20 + /* skip cache aware load balance for single/too many threads */ + if (get_nr_threads(p) <=3D 1 || exceed_llc_nr(mm, dst_cpu)) + return mig_allow; + if (cpus_share_cache(dst_cpu, cpu)) return _get_migrate_hint(src_cpu, dst_cpu, task_util(p), true); --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C0E67221540 for ; Sat, 9 Aug 2025 05:14:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716500; cv=none; b=MmUb4c5SUS1z0wOntqBT1lBWA98RaJXvOGOJgyKpL3css2V1PB3tSyqGl6uL4LpYMOyo+rKX3+Or66/w3kvy4IrnK/1zKzbYlsK4uG0lHZdaI/ylANl1HNqUIDGNZvHQU/rlxzJ01GSoxw4kvO56Gsq+Q3Dt9ImEFj8vhMltr7o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716500; c=relaxed/simple; bh=ihWK71wp/q5X7Akw0a3NfcmXIvV26+nk2ItbYopgMBI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=jLyq+PE48ZqCxRzNfCECXbgKXorBAOraTW1R5D8fjI5Zh6EGFB8ZxO9LxYUE8qGvLCGMlNFmK1+oj1nTLMrg0x4R6BkGXURfRaIdO4gl1uU8D++Mr+VA477bh1glz9u1Ll6+Hks8Jtf0M6xJo+lTeh2jLF0wW7on+hBzGyeTVsU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=eixROvUd; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="eixROvUd" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716498; x=1786252498; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ihWK71wp/q5X7Akw0a3NfcmXIvV26+nk2ItbYopgMBI=; b=eixROvUdtZ+jX6c9PUzsFkzZYy58X17+XSMG4TAKB4dMmEuzDFKNJS5/ y6mSE9FUJZqVr6Z/MJgBg/rnrSxKx4WNtLRYRaKlxeoov9FqlTgFQ0cm3 xrL/E9j5rOidep+PGoL8jF0Vi5sxq3zlPOp19TXKYmYCBYSnGBQNe2AtY mgT3vaD7Elxg79E7NgAUMSiS4MZSj26K4v5ujKu8dsw3shTBxA6CvPmPO rVrBtoWLK3XI3ZTP4tki0uiAJIjOH7fmVd/U18FdmdONY3ZWGGHpTVNYs SDtMc5zQFr8yS7suux1xYdFcGTMSbKUOlN8esjYEKijdSJZB/ZI2rOcmc Q==; X-CSE-ConnectionGUID: 2OIbjhYrRs6bZ6T4QRg5pg== X-CSE-MsgGUID: 1Tg83zzKRiy+BCTOQqeOJA== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="56260022" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="56260022" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:14:58 
-0700 X-CSE-ConnectionGUID: iiCU+no/RZqZ47l6ZSVsuQ== X-CSE-MsgGUID: gBZUGA4RTmG67FOC847xZA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165476169" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa007.jf.intel.com with ESMTP; 08 Aug 2025 22:14:52 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 26/28] sched: Do not enable cache aware scheduling for process with large RSS Date: Sat, 9 Aug 2025 13:08:49 +0800 Message-Id: <881a665a94858d4fb6f13491f4dffe58c8fc3870.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" It has been reported that when running memory-intensive workloads such as stream, sched_cache may saturate the memory bandwidth on the preferred LLC. To prevent this from happening, evaluate the process's memory footprint by checking the size of RSS (anonymous pages and shmem pages) and comparing it to the size of the LLC. If the former is larger, skip cache-aware scheduling. This is because if tasks do not actually share data, aggregating tasks with large RSS will likely result in cache contention and performance degradation. However, in theory, RSS is not the same as memory footprint. This is only an approximation intended to prevent over-aggregation. The default behavior is to strictly compare the size of RSS with the size of the LLC. The next patch will introduce a user-provided hint to customize this comparison. Reported-by: K Prateek Nayak Co-developed-by: Tim Chen Signed-off-by: Tim Chen Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- kernel/sched/fair.c | 47 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 44 insertions(+), 3 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 4bf794f170cf..cbda7dad1305 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1205,6 +1205,34 @@ static inline int pref_llc_idx(struct task_struct *p) return llc_idx(p->preferred_llc); } =20 +static bool exceed_llc_capacity(struct mm_struct *mm, int cpu) +{ + struct cpu_cacheinfo *this_cpu_ci; + struct cacheinfo *l3_leaf; + unsigned long rss; + unsigned int llc; + + /* + * get_cpu_cacheinfo_level() can not be used + * because it requires the cpu_hotplug_lock + * to be held. Use get_cpu_cacheinfo() + * directly because the 'cpu' can not be + * offlined at the moment.
+ */ + this_cpu_ci =3D get_cpu_cacheinfo(cpu); + if (!this_cpu_ci->info_list || + this_cpu_ci->num_leaves < 3) + return true; + + l3_leaf =3D this_cpu_ci->info_list + 3; + llc =3D l3_leaf->size; + + rss =3D get_mm_counter(mm, MM_ANONPAGES) + + get_mm_counter(mm, MM_SHMEMPAGES); + + return (llc <=3D (rss * PAGE_SIZE)); +} + static bool exceed_llc_nr(struct mm_struct *mm, int cpu) { int smt_nr =3D 1; @@ -1363,7 +1391,8 @@ void account_mm_sched(struct rq *rq, struct task_stru= ct *p, s64 delta_exec) */ if (epoch - READ_ONCE(mm->mm_sched_epoch) > sysctl_llc_old || get_nr_threads(p) <=3D 1 || - exceed_llc_nr(mm, cpu_of(rq))) { + exceed_llc_nr(mm, cpu_of(rq)) || + exceed_llc_capacity(mm, cpu_of(rq))) { mm->mm_sched_cpu =3D -1; pcpu_sched->occ =3D 0; } @@ -1448,6 +1477,14 @@ static void __no_profile task_cache_work(struct call= back_head *work) return; } =20 + /* + * Do not check exceed_llc_nr() because + * the active number of threads needs to + * been updated anyway. + */ + if (exceed_llc_capacity(mm, curr_cpu)) + return; + if (!zalloc_cpumask_var(&cpus, GFP_KERNEL)) return; =20 @@ -9113,8 +9150,12 @@ static __maybe_unused enum llc_mig_hint get_migrate_= hint(int src_cpu, int dst_cp if (cpu < 0) return mig_allow; =20 - /* skip cache aware load balance for single/too many threads */ - if (get_nr_threads(p) <=3D 1 || exceed_llc_nr(mm, dst_cpu)) + /* + * skip cache aware load balance for single/too many threads + * and large footprint. + */ + if (get_nr_threads(p) <=3D 1 || exceed_llc_nr(mm, dst_cpu) || + exceed_llc_capacity(mm, dst_cpu)) return mig_allow; =20 if (cpus_share_cache(dst_cpu, cpu)) --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8BD69278E77 for ; Sat, 9 Aug 2025 05:15:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716517; cv=none; b=f0wCf0A1e24ot1gDONm8873CUbeJO7p+XOYXyd1L81oXyHjWyDUgzMJcD0hJ3DF8nImLeld/DZRB4Rw1t1WEKTNhLr+PgIxdQt1pezZAV7PflwC9pScJMoIsibbNOHtKzaO++na+m07o/7UQdsk+sPTfO2f6+LpbJhepVHPoteU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716517; c=relaxed/simple; bh=/Ba0fCfCtChaUyr8nA+Reo2+vDWT1X8nPBpOr+cNYjk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=EDkKLloCKeEh8IprYHQ5cLYC7XD5yrfwdWMtkoq85n7q8KXHTNKJtaTomDYe5XtQCLNBxnEyUumHNx4/C6VKj/drV2J40y00jxpuHKC2otW2Agu5fvbDIBaIndWVIczwxgu3StyAleOP2GSxyPvAaYACU8MlGinZXBKpz5KxMhg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Y90Y97GH; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Y90Y97GH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716515; x=1786252515; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=/Ba0fCfCtChaUyr8nA+Reo2+vDWT1X8nPBpOr+cNYjk=; b=Y90Y97GHLKUMgwRE30WxTWa+nQqnGyCMmuH77cLq8i/vdaPuuN2Ktztv zBg5w7QsE4Tap803U0WzbsFCGqUe1e5QS4yNcC38D0ELqu1BwsHy6z+jR WhnmKOXM13ylYTNfpsosT8H/fTVU1o4HavW4jD2mb3Xd/w2lpl/NwsKv1 lUdWhkLxUWSHTQXwZSwGQLULWx/qg/CpYON81o3vjH2gAxshxAoSNBtaB P0r4Ex0Lc94pbcpxMXlN2Yvf/QybuyXM6p6DUPSf0Ju1um2BF6tKB2GEf jmV33bJNWsB01mTnWfFZ55QI4h+P2NvGIgzeEkg3JsyzHbN7Qffq9u85f Q==; X-CSE-ConnectionGUID: ktJ3gmikQC2MxSJlFd01xQ== X-CSE-MsgGUID: gbMwARUNRLGJ2ccZVMMRBw== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="56260030" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="56260030" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:15:13 -0700 X-CSE-ConnectionGUID: lbKBkeQYTzKHrxkrdXEGtA== X-CSE-MsgGUID: NxMJcQ1jQD24A6f12YxIFA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="165476192" Received: from chenyu-dev.sh.intel.com ([10.239.62.107]) by orviesa007.jf.intel.com with ESMTP; 08 Aug 2025 22:15:08 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 27/28] sched: Allow the user space to tune the scale factor for RSS comparison Date: Sat, 9 Aug 2025 13:09:02 +0800 Message-Id: <81c197882b7c9f4325a5cb32f8a9d1e1fc900297.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" sched_cache compares the process's resident pages with the size of the LLC to determine whether task aggregation on the preferred LLC might cause cache contention. If the former is larger than the latter, skip cache-aware task aggregation. However, some workloads with large resident pages have a small memory footprint; such workloads could benefit from cache-aware scheduling. The kernel lacks an efficient mechanism to track the task's memory footprint (yes, we have resctrl, but it is for user-space query and is not process scoped), so it is up to userspace to pass this hint to the kernel. Introduce /sys/kernel/debug/sched/sched_cache_ignore_rss to control the extent to which users ignore the RSS restriction. This value ranges from 0 to 100. A value of 0 means that the user disables cache-aware scheduling. 1 means if a process's RSS is larger than the LLC size, cache-aware scheduling will be skipped. 100 means cache aware scheduling is always enabled regardless of RSS size. N (between 1 and 100) means turn off cache aware scheduling when RSS is greater than (N-1) * 256 * LLC size. For example, suppose the L3 size is 32MB. If the sysctl_sched_cache_ignore_rss is 1: When the RSS is larger than 32MB, the process is regarded as exceeding the LLC capacity.
If the sysctl_sched_cache_ignore_rss is 99: When the RSS is larger than 784GB, the process is regarded as exceeding the LLC capacity(please refer to the code): 784GB =3D (1 + (99 - 1) * 256) * 32MB Additionally, the number of SMTs is also considered for sysctl_sched_cache_aggr_cap; if there are many SMTs in the core, sysctl_llc_aggr_cap will be reduced. This inhibits task aggregation from cache-aware scheduling on systems with a high number of SMTs, like Power 10 and Power 11. Reported-by: K Prateek Nayak Reported-by: Madadi Vineeth Reddy Reported-by: Shrikanth Hegde Co-developed-by: Tim Chen Signed-off-by: Tim Chen Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- kernel/sched/debug.c | 82 +++++++++++++++++++++++++++++++++++++++++++- kernel/sched/fair.c | 10 ++++-- kernel/sched/sched.h | 3 +- 3 files changed, 90 insertions(+), 5 deletions(-) diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 7a9ec03704b9..6676fc2a8c08 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -166,6 +166,83 @@ static const struct file_operations sched_feat_fops = =3D { .release =3D single_release, }; =20 +#ifdef CONFIG_SCHED_CACHE +#define SCHED_CACHE_CREATE_CONTROL(name, val) \ +static int sysctl_sched_cache_##name =3D val; \ +static ssize_t sched_cache_write_##name(struct file *filp, \ + const char __user *ubuf, \ + size_t cnt, loff_t *ppos) \ +{ \ + char buf[16]; \ + unsigned int percent; \ + if (cnt > 15) \ + cnt =3D 15; \ + if (copy_from_user(&buf, ubuf, cnt)) \ + return -EFAULT; \ + buf[cnt] =3D '\0'; \ + if (kstrtouint(buf, 10, &percent)) \ + return -EINVAL; \ + if (percent > 100) \ + return -EINVAL; \ + sysctl_sched_cache_##name =3D percent; \ + *ppos +=3D cnt; \ + return cnt; \ +} \ +static int sched_cache_show_##name(struct seq_file *m, void *v) \ +{ \ + seq_printf(m, "%d\n", sysctl_sched_cache_##name); \ + return 0; \ +} \ +static int sched_cache_open_##name(struct inode *inode, \ + struct file *filp) \ +{ \ + return single_open(filp, sched_cache_show_##name, NULL); \ +} \ +static const struct file_operations sched_cache_fops_##name =3D { \ + .open =3D sched_cache_open_##name, \ + .write =3D sched_cache_write_##name, \ + .read =3D seq_read, \ + .llseek =3D seq_lseek, \ + .release =3D single_release, \ +} + +SCHED_CACHE_CREATE_CONTROL(ignore_rss, 1); +int get_sched_cache_rss_scale(void) +{ + if (!sysctl_sched_cache_ignore_rss) + return 0; + + if (sysctl_sched_cache_ignore_rss >=3D 100) + return INT_MAX; + /* + * Suppose the L3 size is 32MB. If the + * sysctl_sched_cache_ignore_rss is 1: + * When the RSS is larger than 32MB, + * the process is regarded as exceeding + * the LLC capacity. 
If the + * sysctl_sched_cache_ignore_rss is 99: + * When the RSS is larger than 784GB, + * the process is regarded as exceeding + * the LLC capacity: + * 784GB =3D (1 + (99 - 1) * 256) * 32MB + */ + return (1 + (sysctl_sched_cache_ignore_rss - 1) * 256); +} + +SCHED_CACHE_CREATE_CONTROL(aggr_cap, 50); +int get_sched_cache_cap_scale(void) +{ + int smt_nr =3D 1; + +#ifdef CONFIG_SCHED_SMT + if (sched_smt_active()) + smt_nr =3D + cpumask_weight(cpu_smt_mask(raw_smp_processor_id())); +#endif + return (sysctl_sched_cache_aggr_cap / smt_nr); +} +#endif /* SCHED_CACHE */ + #ifdef CONFIG_SMP =20 static ssize_t sched_scaling_write(struct file *filp, const char __user *u= buf, @@ -533,10 +610,13 @@ static __init int sched_init_debug(void) #endif =20 #ifdef CONFIG_SCHED_CACHE - debugfs_create_u32("llc_aggr_cap", 0644, debugfs_sched, &sysctl_llc_aggr_= cap); debugfs_create_u32("llc_aggr_imb", 0644, debugfs_sched, &sysctl_llc_aggr_= imb); debugfs_create_u32("llc_period", 0644, debugfs_sched, &sysctl_llc_period); debugfs_create_u32("llc_old", 0644, debugfs_sched, &sysctl_llc_old); + debugfs_create_file("llc_aggr_cap", 0644, debugfs_sched, NULL, + &sched_cache_fops_aggr_cap); + debugfs_create_file("llc_ignore_rss", 0644, debugfs_sched, NULL, + &sched_cache_fops_ignore_rss); #endif debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops= ); =20 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index cbda7dad1305..018825f04063 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1211,6 +1211,7 @@ static bool exceed_llc_capacity(struct mm_struct *mm,= int cpu) struct cacheinfo *l3_leaf; unsigned long rss; unsigned int llc; + int scale; =20 /* * get_cpu_cacheinfo_level() can not be used @@ -1230,7 +1231,11 @@ static bool exceed_llc_capacity(struct mm_struct *mm= , int cpu) rss =3D get_mm_counter(mm, MM_ANONPAGES) + get_mm_counter(mm, MM_SHMEMPAGES); =20 - return (llc <=3D (rss * PAGE_SIZE)); + scale =3D get_sched_cache_rss_scale(); + if (scale =3D=3D INT_MAX) + return false; + + return ((llc * scale) <=3D (rss * PAGE_SIZE)); } =20 static bool exceed_llc_nr(struct mm_struct *mm, int cpu) @@ -9037,7 +9042,6 @@ static int find_energy_efficient_cpu(struct task_stru= ct *p, int prev_cpu) static long __migrate_degrades_locality(struct task_struct *p, int src_cpu, int dst_cpu, bool idle); -__read_mostly unsigned int sysctl_llc_aggr_cap =3D 50; __read_mostly unsigned int sysctl_llc_aggr_imb =3D 20; =20 /* @@ -9049,7 +9053,7 @@ __read_mostly unsigned int sysctl_llc_aggr_imb = =3D 20; * (default: ~50%) */ #define fits_llc_capacity(util, max) \ - ((util) * 100 < (max) * sysctl_llc_aggr_cap) + ((util) * 100 < (max) * get_sched_cache_cap_scale()) =20 /* * The margin used when comparing utilization. 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index d752d64d4acd..eaeca4e77ead 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2855,11 +2855,12 @@ extern unsigned int sysctl_numa_balancing_scan_size; extern unsigned int sysctl_numa_balancing_hot_threshold; =20 #ifdef CONFIG_SCHED_CACHE -extern unsigned int sysctl_llc_aggr_cap; extern unsigned int sysctl_llc_aggr_imb; extern struct static_key_false sched_cache_present; extern unsigned int sysctl_llc_period; extern unsigned int sysctl_llc_old; +int get_sched_cache_rss_scale(void); +int get_sched_cache_cap_scale(void); #endif =20 #ifdef CONFIG_SCHED_HRTICK --=20 2.25.1 From nobody Sun Oct 5 03:35:17 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 47CE12797BD for ; Sat, 9 Aug 2025 05:15:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.12 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716532; cv=none; b=Q3Lu9iJcgpkY3JofeQsI3NuQ1TQam6CIlO+tdvTSCRjGAjVblky3W53EIomiHy80dmktuPQdtHxgcRNWPE+j/bg5BQe6GDtHnoJUJTNFKCR/9DYjJgajvDVOAMxm+f5X8nLVN12/qTm5fIAB7ohPRpMT5XEfcEaB2rgn4WdJjlA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754716532; c=relaxed/simple; bh=Qnj+s91JA/iUJUPb0vlHFoRsDjRZdaL0BuPbFnDHlf4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Id7UAlt3jf4R+5/V26lYDKIMUvQ2sWz1U1L6PxHG7qpC1Y2DBCKgalLglN7phNtyW+llMFcUZ8TM/8hPX4zlNTja13GIfazsivILfGOPcNG17Rvk+pJ6zxYEFCqME/cFa0umvr+QT0QzQ/sIQRDaZpzSMHofa4VucqncRSPHNUo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=KZT0jTXM; arc=none smtp.client-ip=198.175.65.12 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="KZT0jTXM" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1754716528; x=1786252528; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Qnj+s91JA/iUJUPb0vlHFoRsDjRZdaL0BuPbFnDHlf4=; b=KZT0jTXMrwi6hPfQPUBt2UhPb6fUrWeb/O3CDNims/uo7a2fRVzB96ni iAcBIq7cRPl82Mb6WAlg3t8qyCFKa4+mARu2XOEG/1TlTOTPJpPcEFd23 uyE/VkcDEWP2Pk6IHBPTbmMmpwS9xu2Sc2pif7fu4IJ95Ou4he7GApWaO MXcgjRuqD2lXHrJW2ZCh04Xx6L3C8w5eBUkk3oAlp2wVkN4HgtgEp9ORv Z96q81Q0Wd5WaHbBqUBeZQ2vuQf9nOsBlZZ3rd4ahEG7C3LjJGEU6HHLi KLNOB6OypoKBQux1+HQWGMGe9cMObuKZXq8mFwxxNatSDn+GrcGALRB6K A==; X-CSE-ConnectionGUID: jbgqMP0qR/OvQDc48pzOPw== X-CSE-MsgGUID: hEeOV4ycSKW7r3RHicCLKw== X-IronPort-AV: E=McAfee;i="6800,10657,11515"; a="68514913" X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="68514913" Received: from fmviesa010.fm.intel.com ([10.60.135.150]) by orvoesa104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2025 22:15:27 -0700 X-CSE-ConnectionGUID: RhK2nh/kRuuYh7KksFDbdw== X-CSE-MsgGUID: nLJQ9FRIRdmfFlE8PcQEEw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,278,1747724400"; d="scan'208";a="166275369" Received: from chenyu-dev.sh.intel.com 
([10.239.62.107]) by fmviesa010.fm.intel.com with ESMTP; 08 Aug 2025 22:15:20 -0700 From: Chen Yu To: Peter Zijlstra , Ingo Molnar , K Prateek Nayak , "Gautham R . Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 28/28] sched: Add ftrace to track cache aware load balance and hottest CPU changes Date: Sat, 9 Aug 2025 13:09:17 +0800 Message-Id: <3e3622a5b2129b56741989f15a8debabec064de9.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce 3 trace events: 1. The average time spent scanning CPUs and calculating occupancy in each sample period. This event can be used to track the overhead of cache-aware scheduling. 2. The footprint when switching to a new mm_sched_cpu (a cache-hot CPU). This event can be used to track whether there is any abnormal bouncing of mm_sched_cpu. 3. The footprint of load balancing when migrating a task between CPUs. This event can be used to track whether cache-aware load balancing behaves as expected. All these events can be used with bpftrace to gain a basic understanding of whether cache-aware scheduling is effective. Suggested-by: Shrikanth Hegde Signed-off-by: Chen Yu --- include/trace/events/sched.h | 93 ++++++++++++++++++++++++++++++++++++ kernel/sched/fair.c | 25 ++++++++-- 2 files changed, 113 insertions(+), 5 deletions(-) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 4e6b2910cec3..398180c18946 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -10,6 +10,99 @@ #include #include =20 +TRACE_EVENT(sched_scan_cost, + + TP_PROTO(struct task_struct *t, u64 cost, int nr, + u64 old_running, u64 new_running), + + TP_ARGS(t, cost, nr, old_running, new_running), + + TP_STRUCT__entry( + __array( char, comm, TASK_COMM_LEN ) + __field( pid_t, pid ) + __field( u64, cost ) + __field( int, nr ) + __field( u64, old_running ) + __field( u64, new_running ) + ), + + TP_fast_assign( + memcpy(__entry->comm, t->comm, TASK_COMM_LEN); + __entry->pid =3D t->pid; + __entry->cost =3D cost; + __entry->nr =3D nr; + __entry->old_running =3D old_running; + __entry->new_running =3D new_running; + ), + + TP_printk("comm=3D%s pid=3D%d cost=3D%llu nr=3D%d old_r=3D%lld new_r=3D%l= ld", + __entry->comm, __entry->pid, + __entry->cost, __entry->nr, + __entry->old_running, __entry->new_running) +); + +TRACE_EVENT(sched_cache_work, + + TP_PROTO(struct task_struct *t, int pref_cpu, int pref_llc, + int new_cpu, int new_llc), + + TP_ARGS(t, pref_cpu, pref_llc, new_cpu, new_llc), + + TP_STRUCT__entry( + __array( char, comm, TASK_COMM_LEN ) + __field( pid_t, pid ) + __field( int, pref_cpu ) + __field( int, pref_llc ) + __field( int, new_cpu ) + __field( int, new_llc ) + ), + + TP_fast_assign( + memcpy(__entry->comm, t->comm, TASK_COMM_LEN); + __entry->pid =3D t->pid; + __entry->pref_cpu =3D pref_cpu; + __entry->pref_llc =3D pref_llc; + __entry->new_cpu =3D new_cpu; + __entry->new_llc =3D new_llc; + ), + + TP_printk("comm=3D%s pid=3D%d pref_cpu=3D%d pref_llc=3D%d 
attach_cpu=3D%d= attach_llc=3D%d", + __entry->comm, __entry->pid, + __entry->pref_cpu, __entry->pref_llc, + __entry->new_cpu, __entry->new_llc) +); + +TRACE_EVENT(sched_attach_task, + + TP_PROTO(struct task_struct *t, int pref_cpu, int pref_llc, + int attach_cpu, int attach_llc), + + TP_ARGS(t, pref_cpu, pref_llc, attach_cpu, attach_llc), + + TP_STRUCT__entry( + __array( char, comm, TASK_COMM_LEN ) + __field( pid_t, pid ) + __field( int, pref_cpu ) + __field( int, pref_llc ) + __field( int, attach_cpu ) + __field( int, attach_llc ) + ), + + TP_fast_assign( + memcpy(__entry->comm, t->comm, TASK_COMM_LEN); + __entry->pid =3D t->pid; + __entry->pref_cpu =3D pref_cpu; + __entry->pref_llc =3D pref_llc; + __entry->attach_cpu =3D attach_cpu; + __entry->attach_llc =3D attach_llc; + ), + + TP_printk("comm=3D%s pid=3D%d pref_cpu=3D%d pref_llc=3D%d attach_cpu=3D%d= attach_llc=3D%d", + __entry->comm, __entry->pid, + __entry->pref_cpu, __entry->pref_llc, + __entry->attach_cpu, __entry->attach_llc) +); + /* * Tracepoint for calling kthread_stop, performed to end a kthread: */ diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 018825f04063..cb2c33ee0d92 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1467,8 +1467,9 @@ static void __no_profile task_cache_work(struct callb= ack_head *work) unsigned long last_m_a_occ =3D 0; int cpu, m_a_cpu =3D -1, cache_cpu, pref_nid =3D NUMA_NO_NODE, curr_cpu =3D smp_processor_id(), - nr_running =3D 0; + nr_running =3D 0, nr_scan =3D 0; cpumask_var_t cpus; + u64 t0, scan_cost =3D 0; =20 WARN_ON_ONCE(work !=3D &p->cache_work); =20 @@ -1499,6 +1500,7 @@ static void __no_profile task_cache_work(struct callb= ack_head *work) pref_nid =3D p->numa_preferred_nid; #endif =20 + t0 =3D sched_clock_cpu(curr_cpu); scoped_guard (cpus_read_lock) { get_scan_cpumasks(cpus, cache_cpu, pref_nid, curr_cpu); @@ -1521,6 +1523,7 @@ static void __no_profile task_cache_work(struct callb= ack_head *work) m_cpu =3D i; } nr++; + nr_scan++; =20 rcu_read_lock(); cur =3D rcu_dereference(cpu_rq(i)->curr); @@ -1529,8 +1532,8 @@ static void __no_profile task_cache_work(struct callb= ack_head *work) nr_running++; rcu_read_unlock(); =20 - trace_printk("(%d) occ: %ld m_occ: %ld m_cpu: %d nr: %d\n", - per_cpu(sd_llc_id, i), occ, m_occ, m_cpu, nr); + //trace_printk("(%d) occ: %ld m_occ: %ld m_cpu: %d nr: %d\n", + // per_cpu(sd_llc_id, i), occ, m_occ, m_cpu, nr); } =20 // a_occ /=3D nr; @@ -1541,8 +1544,8 @@ static void __no_profile task_cache_work(struct callb= ack_head *work) if (llc_id(cpu) =3D=3D llc_id(mm->mm_sched_cpu)) last_m_a_occ =3D a_occ; =20 - trace_printk("(%d) a_occ: %ld m_a_occ: %ld\n", - per_cpu(sd_llc_id, cpu), a_occ, m_a_occ); + //trace_printk("(%d) a_occ: %ld m_a_occ: %ld\n", + // per_cpu(sd_llc_id, cpu), a_occ, m_a_occ); =20 for_each_cpu(i, sched_domain_span(sd)) { /* XXX threshold ? 
*/ @@ -1553,12 +1556,17 @@ static void __no_profile task_cache_work(struct cal= lback_head *work) } } =20 + scan_cost =3D sched_clock_cpu(curr_cpu) - t0; + if (m_a_occ > (2 * last_m_a_occ)) { /* avoid the bouncing of mm_sched_cpu */ + trace_sched_cache_work(p, mm->mm_sched_cpu, llc_id(mm->mm_sched_cpu), + m_a_cpu, llc_id(m_a_cpu)); mm->mm_sched_cpu =3D m_a_cpu; } =20 update_avg(&mm->nr_running_avg, nr_running); + trace_sched_scan_cost(p, scan_cost, nr_scan, mm->nr_running_avg, nr_runni= ng); free_cpumask_var(cpus); } =20 @@ -10443,6 +10451,13 @@ static void attach_task(struct rq *rq, struct task= _struct *p) { lockdep_assert_rq_held(rq); =20 +#ifdef CONFIG_SCHED_CACHE + if (p->mm) + trace_sched_attach_task(p, + p->mm->mm_sched_cpu, + p->mm->mm_sched_cpu !=3D -1 ? llc_id(p->mm->mm_sched_cpu) : -1, + cpu_of(rq), llc_id(cpu_of(rq))); +#endif WARN_ON_ONCE(task_rq(p) !=3D rq); activate_task(rq, p, ENQUEUE_NOCLOCK); wakeup_preempt(rq, p, 0); --=20 2.25.1
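As a closing usage note for the series: besides the bpftrace approach mentioned in the changelog above, the three new events can be consumed by enabling them through tracefs (e.g. writing 1 to /sys/kernel/tracing/events/sched/sched_scan_cost/enable and the sched_cache_work/sched_attach_task equivalents) and reading the raw pipe. The sketch below is illustrative only; it assumes a kernel with this series applied, the events already enabled, and tracefs mounted at /sys/kernel/tracing (older setups may use /sys/kernel/debug/tracing).

    #include <stdio.h>
    #include <string.h>

    /*
     * Illustration only: print just the new cache-aware scheduling events
     * from the raw ftrace pipe.  Run as root after enabling the events.
     */
    int main(void)
    {
            FILE *fp = fopen("/sys/kernel/tracing/trace_pipe", "r");
            char line[512];

            if (!fp) {
                    perror("trace_pipe");
                    return 1;
            }

            while (fgets(line, sizeof(line), fp)) {
                    if (strstr(line, "sched_attach_task:") ||
                        strstr(line, "sched_cache_work:") ||
                        strstr(line, "sched_scan_cost:"))
                            fputs(line, stdout);
            }

            fclose(fp);
            return 0;
    }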