From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R. Shenoy,
	Vincent Guittot
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton,
	Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao,
	Len Brown, Aubrey Li, Zhao Liu, Adam Li, Aaron Lu, Tim Chen,
	Josh Don, Gavin Guo, Qais Yousef, Libo Chen,
	linux-kernel@vger.kernel.org
Subject: [Patch v4 21/22] -- DO NOT APPLY!!! -- sched/cache/debug: Display
	the per-LLC occupancy for each process via proc fs
Date: Wed, 1 Apr 2026 14:52:33 -0700
Message-Id: <93fde0f38f6cb5a780b404ea59b76db47481a93a.1775065312.git.tim.c.chen@linux.intel.com>

From: Chen Yu

Debug patch only. Show the per-LLC occupancy in /proc/{PID}/schedstat,
with each column corresponding to one LLC. This can be used to verify
whether the cache-aware load balancer works as expected, i.e., whether
it aggregates the threads of a process onto dedicated LLCs.

Suppose there are 2 LLCs and the sampling duration is 10 seconds.

With cache-aware load balancing enabled, the LLC0 residency delta is 0
and the LLC1 delta is about 12 seconds:

  0 12281

With cache-aware load balancing disabled, the tasks are spread across
both LLCs:

  9299 5435

Note: the race condition is not properly dealt with in this patch;
an out-of-bounds access may be triggered during CPU hotplug because
max_llcs changes. Do not query the per-LLC occupancy during CPU
hotplug.

Co-developed-by: Aaron Lu
Signed-off-by: Aaron Lu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---
Notes:
    v3->v4: Minor fix.
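The sampling procedure described in the changelog (read the per-LLC columns twice, ten seconds apart, and diff them) can be sketched with a small awk script. This is only an illustration: the two samples are hand-written values below; on a kernel with this debug patch applied they would instead come from the per-LLC columns appended to /proc/$PID/schedstat.

```shell
# Two samples of the per-LLC occupancy columns (ms), notionally taken
# 10 s apart.  The values here are hand-written for illustration.
before="0 0"
after="0 12281"

# Print the per-LLC residency delta over the sampling window.
deltas=$(awk -v before="$before" -v after="$after" 'BEGIN {
	n = split(before, b)
	split(after, a)
	for (i = 1; i <= n; i++)
		printf "LLC%d: %d ms\n", i - 1, a[i] - b[i]
}')
printf '%s\n' "$deltas"
```

With the enabled-balancer samples from the changelog, this prints a zero delta for LLC0 and roughly 12 seconds of residency for LLC1, which is the aggregation behavior the patch is meant to verify.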
 fs/proc/base.c           | 31 +++++++++++++++++++++++++
 include/linux/mm_types.h | 17 +++++++++++---
 include/linux/sched.h    |  6 +++++
 kernel/sched/fair.c      | 50 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4c863d17dfb4..42629f33e0fa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -518,6 +518,37 @@ static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns,
 		   (unsigned long long)task->se.sum_exec_runtime,
 		   (unsigned long long)task->sched_info.run_delay,
 		   task->sched_info.pcount);
+#ifdef CONFIG_SCHED_CACHE
+	if (sched_cache_inuse()) {
+		struct mm_struct *mm = task->mm;
+		u64 *llc_runtime;
+		int mm_sched_llc;
+
+		if (!mm)
+			return 0;
+
+		llc_runtime = kcalloc(max_lid + 1, sizeof(u64), GFP_KERNEL);
+		if (!llc_runtime)
+			return 0;
+
+		if (get_mm_per_llc_runtime(task, llc_runtime))
+			goto out;
+
+		if (mm->sc_stat.cpu == -1)
+			mm_sched_llc = -1;
+		else
+			mm_sched_llc = llc_id(mm->sc_stat.cpu);
+
+		for (int i = 0; i <= max_lid; i++)
+			seq_printf(m, "%s%s%llu ",
+				   i == task->preferred_llc ? "*" : "",
+				   i == mm_sched_llc ? "?" : "",
+				   llc_runtime[i]);
+		seq_puts(m, "\n");
+out:
+		kfree(llc_runtime);
+	}
+#endif
 
 	return 0;
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 67b2dfcc71ea..0eda55f29dd2 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1579,17 +1579,26 @@ static inline unsigned int mm_cid_size(void)
 
 #ifdef CONFIG_SCHED_CACHE
 void mm_init_sched(struct mm_struct *mm,
-		   struct sched_cache_time __percpu *pcpu_sched);
+		   struct sched_cache_time __percpu *pcpu_sched,
+		   struct sched_cache_time __percpu *pcpu_time);
 
 static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
 {
 	struct sched_cache_time __percpu *pcpu_sched =
-		alloc_percpu_noprof(struct sched_cache_time);
+		alloc_percpu_noprof(struct sched_cache_time),
+		*pcpu_time;
 
 	if (!pcpu_sched)
 		return -ENOMEM;
 
-	mm_init_sched(mm, pcpu_sched);
+	pcpu_time = alloc_percpu_noprof(struct sched_cache_time);
+	if (!pcpu_time) {
+		free_percpu(pcpu_sched);
+		return -ENOMEM;
+	}
+
+	mm_init_sched(mm, pcpu_sched, pcpu_time);
+
 	return 0;
 }
 
@@ -1598,7 +1607,9 @@ static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
 static inline void mm_destroy_sched(struct mm_struct *mm)
 {
 	free_percpu(mm->sc_stat.pcpu_sched);
+	free_percpu(mm->sc_stat.pcpu_time);
 	mm->sc_stat.pcpu_sched = NULL;
+	mm->sc_stat.pcpu_time = NULL;
 }
 #else /* !CONFIG_SCHED_CACHE */
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dfa4bfd099c6..e24b2b86aba4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2390,12 +2390,18 @@ struct sched_cache_time {
 
 struct sched_cache_stat {
 	struct sched_cache_time __percpu *pcpu_sched;
+	struct sched_cache_time __percpu *pcpu_time;
 	raw_spinlock_t lock;
 	unsigned long epoch;
 	u64 nr_running_avg;
 	int cpu;
 } ____cacheline_aligned_in_smp;
 
+int get_mm_per_llc_runtime(struct task_struct *p, u64 *buf);
+bool sched_cache_inuse(void);
+extern int max_lid;
+int llc_id(int cpu);
+
 #else
 
 struct sched_cache_stat { };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4e22696a0b1..2b12918b00fd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1288,7 +1288,12 @@ __read_mostly unsigned int llc_epoch_affinity_timeout = EPOCH_LLC_AFFINITY_TIMEO
 __read_mostly unsigned int llc_imb_pct = 20;
 __read_mostly unsigned int llc_overaggr_pct = 50;
 
-static int llc_id(int cpu)
+bool sched_cache_inuse(void)
+{
+	return sched_cache_enabled();
+}
+
+int llc_id(int cpu)
 {
 	if (cpu < 0)
 		return -1;
@@ -1448,18 +1453,21 @@ static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
 }
 
 void mm_init_sched(struct mm_struct *mm,
-		   struct sched_cache_time __percpu *_pcpu_sched)
+		   struct sched_cache_time __percpu *_pcpu_sched,
+		   struct sched_cache_time __percpu *_pcpu_time)
 {
 	unsigned long epoch = 0;
 	int i;
 
 	for_each_possible_cpu(i) {
 		struct sched_cache_time *pcpu_sched = per_cpu_ptr(_pcpu_sched, i);
+		struct sched_cache_time *pcpu_time = per_cpu_ptr(_pcpu_time, i);
 		struct rq *rq = cpu_rq(i);
 
 		pcpu_sched->runtime = 0;
 		/* a slightly stale cpu epoch is acceptible */
 		pcpu_sched->epoch = rq->cpu_epoch;
+		pcpu_time->runtime = 0;
 		epoch = rq->cpu_epoch;
 	}
 
@@ -1473,6 +1481,8 @@ void mm_init_sched(struct mm_struct *mm,
 	 * the readers may get invalid mm_sched_epoch, etc.
 	 */
 	smp_store_release(&mm->sc_stat.pcpu_sched, _pcpu_sched);
+	/* barrier */
+	smp_store_release(&mm->sc_stat.pcpu_time, _pcpu_time);
 }
 
 /* because why would C be fully specified */
@@ -1558,7 +1568,8 @@ static unsigned int task_running_on_cpu(int cpu, struct task_struct *p);
 static inline
 void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 {
-	struct sched_cache_time *pcpu_sched;
+	struct sched_cache_time *pcpu_sched,
+				*pcpu_time;
 	struct mm_struct *mm = p->mm;
 	int mm_sched_llc = -1;
 	unsigned long epoch;
@@ -1572,14 +1583,18 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	 * init_task, kthreads and user thread created
 	 * by user_mode_thread() don't have mm.
 	 */
-	if (!mm || !mm->sc_stat.pcpu_sched)
+	if (!mm || !mm->sc_stat.pcpu_sched ||
+	    !mm->sc_stat.pcpu_time)
 		return;
 
 	pcpu_sched = per_cpu_ptr(p->mm->sc_stat.pcpu_sched, cpu_of(rq));
+	pcpu_time = per_cpu_ptr(p->mm->sc_stat.pcpu_time, cpu_of(rq));
 
 	scoped_guard (raw_spinlock, &rq->cpu_epoch_lock) {
 		__update_mm_sched(rq, pcpu_sched);
 		pcpu_sched->runtime += delta_exec;
+		/* pure runtime without decay */
+		pcpu_time->runtime += delta_exec;
 		rq->cpu_runtime += delta_exec;
 		epoch = rq->cpu_epoch;
 	}
@@ -10328,6 +10343,33 @@ static bool migrate_degrades_llc(struct task_struct *p, struct lb_env *env)
 	return true;
 }
 
+/* p->pi_lock is held */
+int get_mm_per_llc_runtime(struct task_struct *p, u64 *buf)
+{
+	struct sched_cache_time *pcpu_time;
+	struct mm_struct *mm = p->mm;
+	int cpu;
+
+	if (!mm)
+		return -EINVAL;
+
+	rcu_read_lock();
+	for_each_online_cpu(cpu) {
+		int llc = llc_id(cpu);
+		u64 runtime_ms;
+
+		if (llc < 0 || llc > max_lid)
+			continue;
+
+		pcpu_time = per_cpu_ptr(mm->sc_stat.pcpu_time, cpu);
+		runtime_ms = div_u64(pcpu_time->runtime, NSEC_PER_MSEC);
+		buf[llc] += runtime_ms;
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+
 #else
 static inline bool get_llc_stats(int cpu, unsigned long *util,
 				 unsigned long *cap)
-- 
2.32.0
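As a reading aid for the output produced by the seq_printf() loop in the fs/proc/base.c hunk: each column may carry a '*' prefix (the task's preferred LLC) and/or a '?' prefix (the LLC derived from mm->sc_stat.cpu). A small awk sketch that decodes one such line; the sample line is invented for illustration.

```shell
# Decode one per-LLC schedstat line in this patch's format: each column
# is "<*><?><ms>", where '*' marks the task's preferred LLC and '?'
# marks the LLC of mm->sc_stat.cpu.  The sample line is made up.
line="*?12281 0"
decoded=$(printf '%s\n' "$line" | awk '{
	for (i = 1; i <= NF; i++) {
		tags = ""
		if ($i ~ /\*/) tags = tags " (preferred)"
		if ($i ~ /\?/) tags = tags " (mm cpu)"
		v = $i
		gsub(/[*?]/, "", v)
		printf "LLC%d: %s ms%s\n", i - 1, v, tags
	}
}')
printf '%s\n' "$decoded"
```

For the sample line this reports LLC0 as both the preferred LLC and the LLC of mm->sc_stat.cpu, with about 12 seconds of accumulated occupancy, and LLC1 with none.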