From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R. Shenoy,
	Vincent Guittot
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton,
	Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao,
	Len Brown, Aubrey Li, Zhao Liu, Adam Li, Aaron Lu, Tim Chen,
	Josh Don, Gavin Guo, Qais Yousef, Libo Chen,
	linux-kernel@vger.kernel.org
Subject: [Patch v4 21/22] -- DO NOT APPLY!!! -- sched/cache/debug: Display
	the per-LLC occupancy for each process via proc fs
Date: Wed, 1 Apr 2026 14:52:33 -0700
Message-Id: <93fde0f38f6cb5a780b404ea59b76db47481a93a.1775065312.git.tim.c.chen@linux.intel.com>

From: Chen Yu

Debug patch only. Show the per-LLC occupancy in /proc/{PID}/schedstat,
with each column corresponding to one LLC. This can be used to verify
whether the cache-aware load balancer works as expected, i.e., whether
it aggregates the threads of a process onto dedicated LLCs.

Suppose there are 2 LLCs and the sampling duration is 10 seconds.

With cache-aware load balancing enabled, the LLC0 residency delta is 0
and the LLC1 delta is about 12 seconds:

  0 12281

With cache-aware load balancing disabled, the tasks are spread across
both LLCs:

  9299 5435

Note: the race condition is not properly dealt with in this patch;
an out-of-bounds access may be triggered during CPU hotplug because
max_llcs changes. Do not query the per-LLC occupancy during CPU
hotplug.

Co-developed-by: Aaron Lu
Signed-off-by: Aaron Lu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---
Notes:
    v3->v4: Minor fix.
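The sampling procedure described in the changelog (read the per-LLC columns twice, ten seconds apart, and diff them) can be sketched with a small awk script. This is only an illustration: the two samples are hand-written values below; on a kernel with this debug patch applied they would instead come from the per-LLC columns appended to /proc/$PID/schedstat.

```shell
# Two samples of the per-LLC occupancy columns (ms), notionally taken
# 10 s apart.  The values here are hand-written for illustration.
before="0 0"
after="0 12281"

# Print the per-LLC residency delta over the sampling window.
deltas=$(awk -v before="$before" -v after="$after" 'BEGIN {
	n = split(before, b)
	split(after, a)
	for (i = 1; i <= n; i++)
		printf "LLC%d: %d ms\n", i - 1, a[i] - b[i]
}')
printf '%s\n' "$deltas"
```

With the enabled-balancer samples from the changelog, this prints a zero delta for LLC0 and roughly 12 seconds of residency for LLC1, which is the aggregation behavior the patch is meant to verify.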
 fs/proc/base.c           | 31 +++++++++++++++++++++++++
 include/linux/mm_types.h | 17 +++++++++++---
 include/linux/sched.h    |  6 +++++
 kernel/sched/fair.c      | 50 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4c863d17dfb4..42629f33e0fa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -518,6 +518,37 @@ static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns,
 		   (unsigned long long)task->se.sum_exec_runtime,
 		   (unsigned long long)task->sched_info.run_delay,
 		   task->sched_info.pcount);
+#ifdef CONFIG_SCHED_CACHE
+	if (sched_cache_inuse()) {
+		struct mm_struct *mm = task->mm;
+		u64 *llc_runtime;
+		int mm_sched_llc;
+
+		if (!mm)
+			return 0;
+
+		llc_runtime = kcalloc(max_lid + 1, sizeof(u64), GFP_KERNEL);
+		if (!llc_runtime)
+			return 0;
+
+		if (get_mm_per_llc_runtime(task, llc_runtime))
+			goto out;
+
+		if (mm->sc_stat.cpu == -1)
+			mm_sched_llc = -1;
+		else
+			mm_sched_llc = llc_id(mm->sc_stat.cpu);
+
+		for (int i = 0; i <= max_lid; i++)
+			seq_printf(m, "%s%s%llu ",
+				   i == task->preferred_llc ? "*" : "",
+				   i == mm_sched_llc ? "?" : "",
+				   llc_runtime[i]);
+		seq_puts(m, "\n");
+out:
+		kfree(llc_runtime);
+	}
+#endif
 
 	return 0;
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 67b2dfcc71ea..0eda55f29dd2 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1579,17 +1579,26 @@ static inline unsigned int mm_cid_size(void)
 
 #ifdef CONFIG_SCHED_CACHE
 void mm_init_sched(struct mm_struct *mm,
-		   struct sched_cache_time __percpu *pcpu_sched);
+		   struct sched_cache_time __percpu *pcpu_sched,
+		   struct sched_cache_time __percpu *pcpu_time);
 
 static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
 {
 	struct sched_cache_time __percpu *pcpu_sched =
-		alloc_percpu_noprof(struct sched_cache_time);
+		alloc_percpu_noprof(struct sched_cache_time),
+		*pcpu_time;
 
 	if (!pcpu_sched)
 		return -ENOMEM;
 
-	mm_init_sched(mm, pcpu_sched);
+	pcpu_time = alloc_percpu_noprof(struct sched_cache_time);
+	if (!pcpu_time) {
+		free_percpu(pcpu_sched);
+		return -ENOMEM;
+	}
+
+	mm_init_sched(mm, pcpu_sched, pcpu_time);
+
 	return 0;
 }
 
@@ -1598,7 +1607,9 @@ static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
 static inline void mm_destroy_sched(struct mm_struct *mm)
 {
 	free_percpu(mm->sc_stat.pcpu_sched);
+	free_percpu(mm->sc_stat.pcpu_time);
 	mm->sc_stat.pcpu_sched = NULL;
+	mm->sc_stat.pcpu_time = NULL;
 }
 #else /* !CONFIG_SCHED_CACHE */
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dfa4bfd099c6..e24b2b86aba4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2390,12 +2390,18 @@ struct sched_cache_time {
 
 struct sched_cache_stat {
 	struct sched_cache_time __percpu *pcpu_sched;
+	struct sched_cache_time __percpu *pcpu_time;
 	raw_spinlock_t lock;
 	unsigned long epoch;
 	u64 nr_running_avg;
 	int cpu;
 } ____cacheline_aligned_in_smp;
 
+int get_mm_per_llc_runtime(struct task_struct *p, u64 *buf);
+bool sched_cache_inuse(void);
+extern int max_lid;
+int llc_id(int cpu);
+
 #else
 
 struct sched_cache_stat { };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4e22696a0b1..2b12918b00fd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1288,7 +1288,12 @@ __read_mostly unsigned int llc_epoch_affinity_timeout = EPOCH_LLC_AFFINITY_TIMEO
 __read_mostly unsigned int llc_imb_pct = 20;
 __read_mostly unsigned int llc_overaggr_pct = 50;
 
-static int llc_id(int cpu)
+bool sched_cache_inuse(void)
+{
+	return sched_cache_enabled();
+}
+
+int llc_id(int cpu)
 {
 	if (cpu < 0)
 		return -1;
@@ -1448,18 +1453,21 @@ static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
 }
 
 void mm_init_sched(struct mm_struct *mm,
-		   struct sched_cache_time __percpu *_pcpu_sched)
+		   struct sched_cache_time __percpu *_pcpu_sched,
+		   struct sched_cache_time __percpu *_pcpu_time)
 {
 	unsigned long epoch = 0;
 	int i;
 
 	for_each_possible_cpu(i) {
 		struct sched_cache_time *pcpu_sched = per_cpu_ptr(_pcpu_sched, i);
+		struct sched_cache_time *pcpu_time = per_cpu_ptr(_pcpu_time, i);
 		struct rq *rq = cpu_rq(i);
 
 		pcpu_sched->runtime = 0;
 		/* a slightly stale cpu epoch is acceptible */
 		pcpu_sched->epoch = rq->cpu_epoch;
+		pcpu_time->runtime = 0;
 		epoch = rq->cpu_epoch;
 	}
 
@@ -1473,6 +1481,8 @@ void mm_init_sched(struct mm_struct *mm,
 	 * the readers may get invalid mm_sched_epoch, etc.
 	 */
 	smp_store_release(&mm->sc_stat.pcpu_sched, _pcpu_sched);
+	/* barrier */
+	smp_store_release(&mm->sc_stat.pcpu_time, _pcpu_time);
 }
 
 /* because why would C be fully specified */
@@ -1558,7 +1568,8 @@ static unsigned int task_running_on_cpu(int cpu, struct task_struct *p);
 static inline
 void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 {
-	struct sched_cache_time *pcpu_sched;
+	struct sched_cache_time *pcpu_sched,
+				*pcpu_time;
 	struct mm_struct *mm = p->mm;
 	int mm_sched_llc = -1;
 	unsigned long epoch;
@@ -1572,14 +1583,18 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	 * init_task, kthreads and user thread created
 	 * by user_mode_thread() don't have mm.
 	 */
-	if (!mm || !mm->sc_stat.pcpu_sched)
+	if (!mm || !mm->sc_stat.pcpu_sched ||
+	    !mm->sc_stat.pcpu_time)
 		return;
 
 	pcpu_sched = per_cpu_ptr(p->mm->sc_stat.pcpu_sched, cpu_of(rq));
+	pcpu_time = per_cpu_ptr(p->mm->sc_stat.pcpu_time, cpu_of(rq));
 
 	scoped_guard (raw_spinlock, &rq->cpu_epoch_lock) {
 		__update_mm_sched(rq, pcpu_sched);
 		pcpu_sched->runtime += delta_exec;
+		/* pure runtime without decay */
+		pcpu_time->runtime += delta_exec;
 		rq->cpu_runtime += delta_exec;
 		epoch = rq->cpu_epoch;
 	}
@@ -10328,6 +10343,33 @@ static bool migrate_degrades_llc(struct task_struct *p, struct lb_env *env)
 	return true;
 }
 
+/* p->pi_lock is held */
+int get_mm_per_llc_runtime(struct task_struct *p, u64 *buf)
+{
+	struct sched_cache_time *pcpu_time;
+	struct mm_struct *mm = p->mm;
+	int cpu;
+
+	if (!mm)
+		return -EINVAL;
+
+	rcu_read_lock();
+	for_each_online_cpu(cpu) {
+		int llc = llc_id(cpu);
+		u64 runtime_ms;
+
+		if (llc < 0 || llc > max_lid)
+			continue;
+
+		pcpu_time = per_cpu_ptr(mm->sc_stat.pcpu_time, cpu);
+		runtime_ms = div_u64(pcpu_time->runtime, NSEC_PER_MSEC);
+		buf[llc] += runtime_ms;
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+
 #else
 static inline bool get_llc_stats(int cpu, unsigned long *util,
 				 unsigned long *cap)
-- 
2.32.0
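As a reading aid for the output produced by the seq_printf() loop in the fs/proc/base.c hunk: each column may carry a '*' prefix (the task's preferred LLC) and/or a '?' prefix (the LLC derived from mm->sc_stat.cpu). A small awk sketch that decodes one such line; the sample line is invented for illustration.

```shell
# Decode one per-LLC schedstat line in this patch's format: each column
# is "<*><?><ms>", where '*' marks the task's preferred LLC and '?'
# marks the LLC of mm->sc_stat.cpu.  The sample line is made up.
line="*?12281 0"
decoded=$(printf '%s\n' "$line" | awk '{
	for (i = 1; i <= NF; i++) {
		tags = ""
		if ($i ~ /\*/) tags = tags " (preferred)"
		if ($i ~ /\?/) tags = tags " (mm cpu)"
		v = $i
		gsub(/[*?]/, "", v)
		printf "LLC%d: %s ms%s\n", i - 1, v, tags
	}
}')
printf '%s\n' "$decoded"
```

For the sample line this reports LLC0 as both the preferred LLC and the LLC of mm->sc_stat.cpu, with about 12 seconds of accumulated occupancy, and LLC1 with none.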