From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy",
    Vincent Guittot
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton,
    Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao,
    Len Brown, Aubrey Li, Zhao Liu, Chen Yu, Adam Li, Aaron Lu,
    Josh Don, Gavin Guo, Qais Yousef, Libo Chen,
    linux-kernel@vger.kernel.org
Subject: [Patch v4 15/22] sched/cache: Respect LLC preference in task migration and detach
Date: Wed, 1 Apr 2026 14:52:27 -0700
Message-Id: <53da65f3d59de31e1a1dc59a4093d8dd9d4dc206.1775065312.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0

During load balancing, make can_migrate_task() consider a task's LLC
preference and prevent the task from being moved out of its preferred
LLC. If regular load balancing cannot migrate a task because of LLC
locality, nr_balance_failed should not be increased either.

Suggested-by: Peter Zijlstra (Intel)
Suggested-by: K Prateek Nayak
Co-developed-by: Chen Yu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---
Notes:
    v3->v4: Fix the bug in migrate_degrades_llc() that p->preferred_llc
    should be used for comparison rather than task_llc(p).
    (Madadi Vineeth Reddy)

    Let nr_balance_failed override cache-aware migration if the former
    is too high.
    (Peter Zijlstra, K Prateek Nayak)

 kernel/sched/fair.c  | 83 +++++++++++++++++++++++++++++++++++++++++---
 kernel/sched/sched.h | 13 +++++++
 2 files changed, 91 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fef916afa1d5..9541e94370e7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9777,6 +9777,7 @@ enum migration_type {
 #define LBF_DST_PINNED	0x04
 #define LBF_SOME_PINNED	0x08
 #define LBF_ACTIVE_LB	0x10
+#define LBF_LLC_PINNED	0x20
 
 struct lb_env {
 	struct sched_domain	*sd;
@@ -10089,8 +10090,8 @@ static enum llc_mig can_migrate_llc(int src_cpu, int dst_cpu,
  * Check if task p can migrate from source LLC to
  * destination LLC in terms of cache aware load balance.
  */
-static __maybe_unused enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
-							struct task_struct *p)
+static enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
+					 struct task_struct *p)
 {
 	struct mm_struct *mm;
 	bool to_pref;
@@ -10157,6 +10158,46 @@ alb_break_llc(struct lb_env *env)
 
 	return false;
 }
+
+/*
+ * Check if migrating task p from env->src_cpu to
+ * env->dst_cpu breaks LLC locality.
+ */
+static bool migrate_degrades_llc(struct task_struct *p, struct lb_env *env)
+{
+	if (!sched_cache_enabled())
+		return false;
+
+	if (task_has_sched_core(p))
+		return false;
+	/*
+	 * Skip over tasks that would degrade LLC locality;
+	 * only when nr_balance_failed is sufficiently high do we
+	 * ignore this constraint.
+	 *
+	 * The threshold is cache_nice_tries + 1, one higher than
+	 * the usual nr_balance_failed check, to avoid triggering
+	 * excessive task migration at the same time.
+	 */
+	if (env->sd->nr_balance_failed >= env->sd->cache_nice_tries + 1)
+		return false;
+
+	/*
+	 * We know env->src_cpu has some tasks that prefer to run
+	 * on env->dst_cpu; skip the tasks that do not prefer
+	 * env->dst_cpu and find one that does.
+	 */
+	if (env->migration_type == migrate_llc_task &&
+	    READ_ONCE(p->preferred_llc) != llc_id(env->dst_cpu))
+		return true;
+
+	if (can_migrate_llc_task(env->src_cpu,
+				 env->dst_cpu, p) != mig_forbid)
+		return false;
+
+	return true;
+}
+
 #else
 static inline bool get_llc_stats(int cpu, unsigned long *util,
 				 unsigned long *cap)
@@ -10169,6 +10210,12 @@ alb_break_llc(struct lb_env *env)
 {
 	return false;
 }
+
+static inline bool
+migrate_degrades_llc(struct task_struct *p, struct lb_env *env)
+{
+	return false;
+}
 #endif
 /*
  * can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
@@ -10266,10 +10313,29 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 		return 1;
 
 	degrades = migrate_degrades_locality(p, env);
-	if (!degrades)
+	if (!degrades) {
+		/*
+		 * If the NUMA locality is not broken,
+		 * further check if migration would hurt
+		 * LLC locality.
+		 */
+		if (migrate_degrades_llc(p, env)) {
+			/*
+			 * If regular load balancing fails to pull a task
+			 * due to LLC locality, this is expected behavior
+			 * and we set LBF_LLC_PINNED so we don't increase
+			 * nr_balance_failed unnecessarily.
+			 */
+			if (env->migration_type != migrate_llc_task)
+				env->flags |= LBF_LLC_PINNED;
+
+			return 0;
+		}
+
 		hot = task_hot(p, env);
-	else
+	} else {
 		hot = degrades > 0;
+	}
 
 	if (!hot || env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
 		if (hot)
@@ -12910,9 +12976,16 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 	 *
 	 * Similarly for migration_misfit which is not related to
 	 * load/util migration, don't pollute nr_balance_failed.
+	 *
+	 * The same holds for cache-aware scheduling's allowance for
+	 * load imbalance. If regular load balancing does not migrate
+	 * a task due to LLC locality, that is expected behavior;
+	 * don't pollute nr_balance_failed.
+	 * See can_migrate_task().
 	 */
 	if (idle != CPU_NEWLY_IDLE &&
-	    env.migration_type != migrate_misfit)
+	    env.migration_type != migrate_misfit &&
+	    !(env.flags & LBF_LLC_PINNED))
 		sd->nr_balance_failed++;
 
 	if (need_active_balance(&env)) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 081f23a48414..511c85572b96 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1546,6 +1546,14 @@ extern void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags);
 extern void sched_core_get(void);
 extern void sched_core_put(void);
 
+static inline bool task_has_sched_core(struct task_struct *p)
+{
+	if (sched_core_disabled())
+		return false;
+
+	return !!p->core_cookie;
+}
+
 #else /* !CONFIG_SCHED_CORE: */
 
 static inline bool sched_core_enabled(struct rq *rq)
@@ -1586,6 +1594,11 @@ static inline bool sched_group_cookie_match(struct rq *rq,
 	return true;
 }
 
+static inline bool task_has_sched_core(struct task_struct *p)
+{
+	return false;
+}
+
 #endif /* !CONFIG_SCHED_CORE */
 
 #ifdef CONFIG_RT_GROUP_SCHED
-- 
2.32.0