From nobody Thu Apr 2 17:02:04 2026
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R. Shenoy,
 Vincent Guittot
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton,
 Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao,
 Len Brown, Tim Chen, Aubrey Li, Zhao Liu, Adam Li, Aaron Lu, Josh Don,
 Gavin Guo, Qais Yousef, Libo Chen, linux-kernel@vger.kernel.org
Subject: [PATCH v3 21/21] -- DO NOT APPLY!!!
 -- sched/cache/debug: Add ftrace to track the load balance statistics
Date: Tue, 10 Feb 2026 14:19:01 -0800
Message-Id: <5d663caaed7ebe93ab9b272235675b2400b3ed8b.1770760558.git.tim.c.chen@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Chen Yu

Debug patch only. Users can leverage these trace events (via bpftrace,
etc.) to monitor cache-aware load balancing activity: specifically,
whether tasks are moved into their preferred LLC, moved out of their
preferred LLC, or whether cache-aware load balancing is skipped because
the process exceeds the memory footprint limit or has too many active
threads.

Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---
Notes:
    v2->v3: Add more trace events for when the process exceeds the LLC
    size limit or the number of active threads (moved from schedstat to
    trace events for better bpf tracking).

 include/trace/events/sched.h | 79 ++++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c          | 40 ++++++++++++++----
 2 files changed, 110 insertions(+), 9 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 7b2645b50e78..b73327653e4b 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -10,6 +10,85 @@
 #include
 #include
 
+#ifdef CONFIG_SCHED_CACHE
+TRACE_EVENT(sched_exceed_llc_cap,
+
+	TP_PROTO(struct task_struct *t, int exceeded),
+
+	TP_ARGS(t, exceeded),
+
+	TP_STRUCT__entry(
+		__array( char,  comm, TASK_COMM_LEN )
+		__field( pid_t, pid )
+		__field( int,   exceeded )
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
+		__entry->pid = t->pid;
+		__entry->exceeded = exceeded;
+	),
+
+	TP_printk("comm=%s pid=%d exceed_cap=%d",
+		__entry->comm, __entry->pid,
+		__entry->exceeded)
+);
+
+TRACE_EVENT(sched_exceed_llc_nr,
+
+	TP_PROTO(struct task_struct *t, int exceeded),
+
+	TP_ARGS(t, exceeded),
+
+	TP_STRUCT__entry(
+		__array( char,  comm, TASK_COMM_LEN )
+		__field( pid_t, pid )
+		__field( int,   exceeded )
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
+		__entry->pid = t->pid;
+		__entry->exceeded = exceeded;
+	),
+
+	TP_printk("comm=%s pid=%d exceed_nr=%d",
+		__entry->comm, __entry->pid,
+		__entry->exceeded)
+);
+
+TRACE_EVENT(sched_attach_task,
+
+	TP_PROTO(struct task_struct *t, int pref_cpu, int pref_llc,
+		 int attach_cpu, int attach_llc),
+
+	TP_ARGS(t, pref_cpu, pref_llc, attach_cpu, attach_llc),
+
+	TP_STRUCT__entry(
+		__array( char,  comm, TASK_COMM_LEN )
+		__field( pid_t, pid )
+		__field( int,   pref_cpu )
+		__field( int,   pref_llc )
+		__field( int,   attach_cpu )
+		__field( int,   attach_llc )
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
+		__entry->pid = t->pid;
+		__entry->pref_cpu = pref_cpu;
+		__entry->pref_llc = pref_llc;
+		__entry->attach_cpu = attach_cpu;
+		__entry->attach_llc = attach_llc;
+	),
+
+	TP_printk("comm=%s pid=%d pref_cpu=%d pref_llc=%d attach_cpu=%d attach_llc=%d",
+		__entry->comm, __entry->pid,
+		__entry->pref_cpu, __entry->pref_llc,
+		__entry->attach_cpu, __entry->attach_llc)
+);
+#endif
+
 /*
  * Tracepoint for calling kthread_stop, performed to end a kthread:
  */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 25cee3dd767c..977091fd0e49 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1245,9 +1245,11 @@ static inline int get_sched_cache_scale(int mul)
 	return (1 + (llc_aggr_tolerance - 1) * mul);
 }
 
-static bool exceed_llc_capacity(struct mm_struct *mm, int cpu)
+static bool exceed_llc_capacity(struct mm_struct *mm, int cpu,
+				struct task_struct *p)
 {
 	struct cacheinfo *ci;
+	bool exceeded;
 	u64 rss, llc;
 	int scale;
 
@@ -1293,12 +1295,18 @@ static bool exceed_llc_capacity(struct mm_struct *mm, int cpu)
 	if (scale == INT_MAX)
 		return false;
 
-	return ((llc * scale) <= (rss * PAGE_SIZE));
+	exceeded = ((llc * scale) <= (rss * PAGE_SIZE));
+
+	trace_sched_exceed_llc_cap(p, exceeded);
+
+	return exceeded;
 }
 
-static bool exceed_llc_nr(struct mm_struct *mm, int cpu)
+static bool exceed_llc_nr(struct mm_struct *mm, int cpu,
+			  struct task_struct *p)
 {
 	int smt_nr = 1, scale;
+	bool exceeded;
 
 #ifdef CONFIG_SCHED_SMT
 	if (sched_smt_active())
@@ -1313,8 +1321,12 @@ static bool exceed_llc_nr(struct mm_struct *mm, int cpu)
 	if (scale == INT_MAX)
 		return false;
 
-	return !fits_capacity((mm->sc_stat.nr_running_avg * smt_nr),
+	exceeded = !fits_capacity((mm->sc_stat.nr_running_avg * smt_nr),
 			      (scale * per_cpu(sd_llc_size, cpu)));
+
+	trace_sched_exceed_llc_nr(p, exceeded);
+
+	return exceeded;
 }
 
 static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
@@ -1522,8 +1534,8 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	if (time_after(epoch, READ_ONCE(mm->sc_stat.epoch) +
 		       llc_epoch_affinity_timeout) ||
 	    get_nr_threads(p) <= 1 ||
-	    exceed_llc_nr(mm, cpu_of(rq)) ||
-	    exceed_llc_capacity(mm, cpu_of(rq))) {
+	    exceed_llc_nr(mm, cpu_of(rq), p) ||
+	    exceed_llc_capacity(mm, cpu_of(rq), p)) {
 		if (mm->sc_stat.cpu != -1)
 			mm->sc_stat.cpu = -1;
 	}
@@ -1600,7 +1612,7 @@ static void task_cache_work(struct callback_head *work)
 
 	curr_cpu = task_cpu(p);
 	if (get_nr_threads(p) <= 1 ||
-	    exceed_llc_capacity(mm, curr_cpu)) {
+	    exceed_llc_capacity(mm, curr_cpu, p)) {
 		if (mm->sc_stat.cpu != -1)
 			mm->sc_stat.cpu = -1;
 
@@ -10159,8 +10171,8 @@ static enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
 	 * Skip cache aware load balance for single/too many threads
 	 * or large memory RSS.
 	 */
-	if (get_nr_threads(p) <= 1 || exceed_llc_nr(mm, dst_cpu) ||
-	    exceed_llc_capacity(mm, dst_cpu)) {
+	if (get_nr_threads(p) <= 1 || exceed_llc_nr(mm, dst_cpu, p) ||
+	    exceed_llc_capacity(mm, dst_cpu, p)) {
 		if (mm->sc_stat.cpu != -1)
 			mm->sc_stat.cpu = -1;
 		return mig_unrestricted;
@@ -10602,6 +10614,16 @@ static void attach_task(struct rq *rq, struct task_struct *p)
 {
 	lockdep_assert_rq_held(rq);
 
+#ifdef CONFIG_SCHED_CACHE
+	if (p->mm) {
+		int pref_cpu = p->mm->sc_stat.cpu;
+
+		trace_sched_attach_task(p,
+			pref_cpu,
+			pref_cpu != -1 ? llc_id(pref_cpu) : -1,
+			cpu_of(rq), llc_id(cpu_of(rq)));
+	}
+#endif
 	WARN_ON_ONCE(task_rq(p) != rq);
 	activate_task(rq, p, ENQUEUE_NOCLOCK);
 	wakeup_preempt(rq, p, 0);
-- 
2.32.0