From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R . Shenoy",
 Vincent Guittot
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton,
 Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao,
 Len Brown, Tim Chen, Aubrey Li, Zhao Liu, Adam Li, Aaron Lu, Josh Don,
 Gavin Guo, Qais Yousef, Libo Chen, linux-kernel@vger.kernel.org
Subject: [PATCH v3 20/21] -- DO NOT APPLY!!!
 -- sched/cache/debug: Display the per LLC occupancy for each process via proc fs
Date: Tue, 10 Feb 2026 14:19:00 -0800
Message-Id: <09c48847deeb9d2c1c7de1f2799cc128cd2e866e.1770760558.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Chen Yu

Debug patch only.

Show the per-LLC occupancy in /proc/{PID}/schedstat, with each column
corresponding to one LLC. Each value is the undecayed runtime, in
milliseconds, that the process's mm has accumulated on the CPUs of that
LLC. A column is prefixed with "*" if it is the task's preferred LLC and
with "?" if it is the LLC of mm->sc_stat.cpu. This can be used to verify
whether the cache-aware load balancer works as expected by aggregating
threads onto dedicated LLCs.

Suppose there are 2 LLCs and the sampling duration is 10 seconds.

With cache-aware load balancing enabled:
0 12281 <--- LLC0 residency delta is 0 ms, LLC1 is 12281 ms (about 12 seconds)
0 18881
0 16217

With cache-aware load balancing disabled:
6497 15802
9299 5435
17811 8278

Co-developed-by: Aaron Lu
Signed-off-by: Aaron Lu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---

Notes:
    v2->v3:
    Enhance the informational output by printing the task's
    preferred LLC.
    (Aaron Lu)

 fs/proc/base.c           | 31 +++++++++++++++++++++++++
 include/linux/mm_types.h | 17 +++++++++++---
 include/linux/sched.h    |  6 +++++
 kernel/sched/fair.c      | 50 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4eec684baca9..76b49e80af1a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -518,6 +518,37 @@ static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns,
 		   (unsigned long long)task->se.sum_exec_runtime,
 		   (unsigned long long)task->sched_info.run_delay,
 		   task->sched_info.pcount);
+#ifdef CONFIG_SCHED_CACHE
+	if (sched_cache_inuse()) {
+		struct mm_struct *mm = task->mm;
+		u64 *llc_runtime;
+		int mm_sched_llc;
+
+		if (!mm)
+			return 0;
+
+		llc_runtime = kcalloc(max_llcs, sizeof(u64), GFP_KERNEL);
+		if (!llc_runtime)
+			return 0;
+
+		if (get_mm_per_llc_runtime(task, llc_runtime))
+			goto out;
+
+		if (mm->sc_stat.cpu == -1)
+			mm_sched_llc = -1;
+		else
+			mm_sched_llc = llc_id(mm->sc_stat.cpu);
+
+		for (int i = 0; i < max_llcs; i++)
+			seq_printf(m, "%s%s%llu ",
+				   i == task->preferred_llc ? "*" : "",
+				   i == mm_sched_llc ? "?" : "",
+				   llc_runtime[i]);
+		seq_puts(m, "\n");
+out:
+		kfree(llc_runtime);
+	}
+#endif
 
 	return 0;
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 777a48523aa6..2b8d0ec032e8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1523,17 +1523,26 @@ static inline unsigned int mm_cid_size(void)
 
 #ifdef CONFIG_SCHED_CACHE
 void mm_init_sched(struct mm_struct *mm,
-		   struct sched_cache_time __percpu *pcpu_sched);
+		   struct sched_cache_time __percpu *pcpu_sched,
+		   struct sched_cache_time __percpu *pcpu_time);
 
 static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
 {
 	struct sched_cache_time __percpu *pcpu_sched =
-		alloc_percpu_noprof(struct sched_cache_time);
+		alloc_percpu_noprof(struct sched_cache_time),
+		*pcpu_time;
 
 	if (!pcpu_sched)
 		return -ENOMEM;
 
-	mm_init_sched(mm, pcpu_sched);
+	pcpu_time = alloc_percpu_noprof(struct sched_cache_time);
+	if (!pcpu_time) {
+		free_percpu(pcpu_sched);
+		return -ENOMEM;
+	}
+
+	mm_init_sched(mm, pcpu_sched, pcpu_time);
+
 	return 0;
 }
 
@@ -1542,7 +1551,9 @@ static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
 static inline void mm_destroy_sched(struct mm_struct *mm)
 {
 	free_percpu(mm->sc_stat.pcpu_sched);
+	free_percpu(mm->sc_stat.pcpu_time);
 	mm->sc_stat.pcpu_sched = NULL;
+	mm->sc_stat.pcpu_time = NULL;
 }
 #else /* !CONFIG_SCHED_CACHE */
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 511c9b263386..4236cacbb409 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2344,12 +2344,18 @@ struct sched_cache_time {
 
 struct sched_cache_stat {
 	struct sched_cache_time __percpu *pcpu_sched;
+	struct sched_cache_time __percpu *pcpu_time;
 	raw_spinlock_t lock;
 	unsigned long epoch;
 	u64 nr_running_avg;
 	int cpu;
 } ____cacheline_aligned_in_smp;
 
+int get_mm_per_llc_runtime(struct task_struct *p, u64 *buf);
+bool sched_cache_inuse(void);
+extern int max_llcs;
+int llc_id(int cpu);
+
 #else
 
 struct sched_cache_stat { };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da4291ace24c..25cee3dd767c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1197,7 +1197,12 @@ __read_mostly unsigned int llc_epoch_affinity_timeout = EPOCH_LLC_AFFINITY_TIMEO
 __read_mostly unsigned int llc_imb_pct = 20;
 __read_mostly unsigned int llc_overaggr_pct = 50;
 
-static int llc_id(int cpu)
+bool sched_cache_inuse(void)
+{
+	return sched_cache_enabled();
+}
+
+int llc_id(int cpu)
 {
 	if (cpu < 0)
 		return -1;
@@ -1365,17 +1370,20 @@ static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
 }
 
 void mm_init_sched(struct mm_struct *mm,
-		   struct sched_cache_time __percpu *_pcpu_sched)
+		   struct sched_cache_time __percpu *_pcpu_sched,
+		   struct sched_cache_time __percpu *_pcpu_time)
 {
 	unsigned long epoch;
 	int i;
 
 	for_each_possible_cpu(i) {
 		struct sched_cache_time *pcpu_sched = per_cpu_ptr(_pcpu_sched, i);
+		struct sched_cache_time *pcpu_time = per_cpu_ptr(_pcpu_time, i);
 		struct rq *rq = cpu_rq(i);
 
 		pcpu_sched->runtime = 0;
 		pcpu_sched->epoch = rq->cpu_epoch;
+		pcpu_time->runtime = 0;
 		epoch = rq->cpu_epoch;
 	}
 
@@ -1389,6 +1397,8 @@ void mm_init_sched(struct mm_struct *mm,
 	 * the readers may get invalid mm_sched_epoch, etc.
 	 */
 	smp_store_release(&mm->sc_stat.pcpu_sched, _pcpu_sched);
+	/* barrier */
+	smp_store_release(&mm->sc_stat.pcpu_time, _pcpu_time);
 }
 
 /* because why would C be fully specified */
@@ -1474,7 +1484,8 @@ static unsigned int task_running_on_cpu(int cpu, struct task_struct *p);
 static inline
 void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 {
-	struct sched_cache_time *pcpu_sched;
+	struct sched_cache_time *pcpu_sched,
+				*pcpu_time;
 	struct mm_struct *mm = p->mm;
 	int mm_sched_llc = -1;
 	unsigned long epoch;
@@ -1488,14 +1499,18 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	 * init_task, kthreads and user thread created
 	 * by user_mode_thread() don't have mm.
 	 */
-	if (!mm || !mm->sc_stat.pcpu_sched)
+	if (!mm || !mm->sc_stat.pcpu_sched ||
+	    !mm->sc_stat.pcpu_time)
 		return;
 
 	pcpu_sched = per_cpu_ptr(p->mm->sc_stat.pcpu_sched, cpu_of(rq));
+	pcpu_time = per_cpu_ptr(p->mm->sc_stat.pcpu_time, cpu_of(rq));
 
 	scoped_guard (raw_spinlock, &rq->cpu_epoch_lock) {
 		__update_mm_sched(rq, pcpu_sched);
 		pcpu_sched->runtime += delta_exec;
+		/* pure runtime without decay */
+		pcpu_time->runtime += delta_exec;
 		rq->cpu_runtime += delta_exec;
 		epoch = rq->cpu_epoch;
 	}
@@ -1676,6 +1691,33 @@ void init_sched_mm(struct task_struct *p)
 	work->next = work;
 }
 
+/* p->pi_lock is held */
+int get_mm_per_llc_runtime(struct task_struct *p, u64 *buf)
+{
+	struct sched_cache_time *pcpu_time;
+	struct mm_struct *mm = p->mm;
+	int cpu;
+
+	if (!mm)
+		return -EINVAL;
+
+	rcu_read_lock();
+	for_each_online_cpu(cpu) {
+		int llc = llc_id(cpu);
+		u64 runtime_ms;
+
+		if (!valid_llc_id(llc))
+			continue;
+
+		pcpu_time = per_cpu_ptr(mm->sc_stat.pcpu_time, cpu);
+		runtime_ms = div_u64(pcpu_time->runtime, NSEC_PER_MSEC);
+		buf[llc] += runtime_ms;
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+
 #else
 
 static inline void account_mm_sched(struct rq *rq, struct task_struct *p,
-- 
2.32.0
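
For anyone who wants to consume the extra schedstat line from user space, here is a minimal sketch of a parser, not part of the patch. It assumes the per-LLC fields arrive on one line in the format produced by the seq_printf() loop above: an optional "*", an optional "?", then a decimal millisecond count, separated by spaces. The names struct llc_field, parse_llc_line() and llc_delta() are hypothetical helpers invented for this illustration.

```c
#include <ctype.h>
#include <stdlib.h>

/* One per-LLC column of the debug line (hypothetical helper type). */
struct llc_field {
	unsigned long long runtime_ms;	/* undecayed runtime, in ms */
	int task_preferred;		/* column carried the "*" prefix */
	int mm_llc;			/* column carried the "?" prefix */
};

/*
 * Parse a line such as "0 *?12281 \n" into at most max fields.
 * Returns the number of fields parsed, or -1 on malformed input
 * or when the line holds more than max fields.
 */
static int parse_llc_line(const char *line, struct llc_field *out, int max)
{
	const char *p = line;
	int n = 0;

	for (;;) {
		char *end;

		/* Skip the separators emitted between columns. */
		while (*p == ' ' || *p == '\n')
			p++;
		if (*p == '\0')
			return n;
		if (n == max)
			return -1;

		/* Markers are printed in the order "*" then "?". */
		out[n].task_preferred = (*p == '*');
		if (out[n].task_preferred)
			p++;
		out[n].mm_llc = (*p == '?');
		if (out[n].mm_llc)
			p++;

		if (!isdigit((unsigned char)*p))
			return -1;
		out[n].runtime_ms = strtoull(p, &end, 10);
		p = end;
		n++;
	}
}

/* Per-LLC residency accumulated between two samples of the same task. */
static void llc_delta(const struct llc_field *before,
		      const struct llc_field *after, int n,
		      unsigned long long *delta)
{
	for (int i = 0; i < n; i++)
		delta[i] = after[i].runtime_ms - before[i].runtime_ms;
}
```

A caller would read /proc/{PID}/schedstat twice, some seconds apart, pass the per-LLC line of each read to parse_llc_line(), and feed both results to llc_delta() to obtain residency deltas like those quoted in the changelog.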