From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R. Shenoy, Vincent Guittot
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton, Shrikanth Hegde,
    Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao, Len Brown, Aubrey Li,
    Zhao Liu, Adam Li, Aaron Lu, Tim Chen, linux-kernel@vger.kernel.org
Subject: [PATCH v2 23/23] -- DO NOT APPLY!!! -- sched/cache/debug: Display the per LLC occupancy for each process via proc fs
Date: Wed, 3 Dec 2025 15:07:42 -0800
Message-Id: <0eaf9b9f89f0d97dbf46b760421f65aee3ffe063.1764801860.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0

From: Chen Yu

Debug patch only. Show the per-LLC occupancy of a process in
/proc/{PID}/schedstat, with each column corresponding to one LLC. This can
be used to verify whether the cache-aware load balancer works as expected
by aggregating the threads of a process onto dedicated LLCs.

Suppose there are 2 LLCs and the sampling duration is 10 seconds. Note
that the per-LLC sum can exceed the 10 second wall-clock window, because a
multi-threaded process can run on several CPUs of the same LLC
concurrently.

With cache-aware load balancing enabled:

  0 12281    <--- LLC0 residency delta is 0, LLC1 is ~12 seconds
  0 18881
  0 16217

With cache-aware load balancing disabled:

  6497 15802
  9299 5435
  17811 8278

Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---
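Not part of the patch: below is a rough user-space sampling sketch for the
verification flow described in the changelog. It assumes the per-LLC values
are emitted as a trailing line of space-separated millisecond counters in
/proc/<pid>/schedstat (as done by the proc_pid_schedstat() hunk in this
patch); the file name llc_sample.c and the MAX_LLCS bound are purely
illustrative.

/* llc_sample.c -- illustrative only, not part of this patch.
 * Print the per-LLC runtime (ms) a process accumulated over a window.
 *
 *   gcc -O2 -o llc_sample llc_sample.c
 *   ./llc_sample <pid> <seconds>
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_LLCS 64	/* arbitrary upper bound for this sketch */

/* Read the trailing per-LLC line of /proc/<pid>/schedstat into vals[]. */
static int read_llc_runtime(pid_t pid, unsigned long long *vals, int max)
{
	char path[64], line[1024], last[1024] = "";
	FILE *f;
	int n = 0;

	snprintf(path, sizeof(path), "/proc/%d/schedstat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		strcpy(last, line);		/* keep only the last line */
	fclose(f);

	for (char *tok = strtok(last, " \n"); tok && n < max;
	     tok = strtok(NULL, " \n"))
		vals[n++] = strtoull(tok, NULL, 10);

	return n;
}

int main(int argc, char **argv)
{
	unsigned long long before[MAX_LLCS] = {0}, after[MAX_LLCS] = {0};
	int n1, n2, i;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <seconds>\n", argv[0]);
		return 1;
	}

	n1 = read_llc_runtime(atoi(argv[1]), before, MAX_LLCS);
	sleep(atoi(argv[2]));
	n2 = read_llc_runtime(atoi(argv[1]), after, MAX_LLCS);

	if (n1 <= 0 || n2 != n1) {
		fprintf(stderr, "could not read per-LLC schedstat line\n");
		return 1;
	}

	/* One column per LLC: runtime accumulated during the window, in ms. */
	for (i = 0; i < n1; i++)
		printf("%llu ", after[i] - before[i]);
	printf("\n");

	return 0;
}

Sampling the same workload twice, once with cache-aware load balancing
enabled and once disabled, should reproduce the kind of per-LLC deltas
shown in the changelog above.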
 fs/proc/base.c           | 22 ++++++++++++++++++++++
 include/linux/mm_types.h | 19 +++++++++++++++++--
 include/linux/sched.h    |  3 +++
 kernel/sched/fair.c      | 40 ++++++++++++++++++++++++++++++++++++++--
 4 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6299878e3d97..f4be96f4bd01 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -518,6 +518,28 @@ static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns,
 		   (unsigned long long)task->se.sum_exec_runtime,
 		   (unsigned long long)task->sched_info.run_delay,
 		   task->sched_info.pcount);
+#ifdef CONFIG_SCHED_CACHE
+	if (sched_cache_enabled()) {
+		struct mm_struct *mm = task->mm;
+		u64 *llc_runtime;
+
+		if (!mm)
+			return 0;
+
+		llc_runtime = kcalloc(max_llcs, sizeof(u64), GFP_KERNEL);
+		if (!llc_runtime)
+			return 0;
+
+		if (get_mm_per_llc_runtime(task, llc_runtime))
+			goto out;
+
+		for (int i = 0; i < max_llcs; i++)
+			seq_printf(m, "%llu ", llc_runtime[i]);
+		seq_puts(m, "\n");
+out:
+		kfree(llc_runtime);
+	}
+#endif
 
 	return 0;
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 04743983de4d..255c22be7312 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -944,6 +944,10 @@ struct mm_sched {
 	unsigned long epoch;
 };
 
+struct mm_time {
+	u64 runtime_ns;
+};
+
 struct kioctx_table;
 struct iommu_mm_data;
 struct mm_struct {
@@ -1040,6 +1044,7 @@ struct mm_struct {
 	 * See account_mm_sched() and ...
 	 */
 	struct mm_sched __percpu *pcpu_sched;
+	struct mm_time __percpu *pcpu_time;
 	raw_spinlock_t mm_sched_lock;
 	unsigned long mm_sched_epoch;
 	int mm_sched_cpu;
@@ -1505,16 +1510,24 @@ static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumas
 #endif /* CONFIG_SCHED_MM_CID */
 
 #ifdef CONFIG_SCHED_CACHE
-void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *pcpu_sched);
+void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *pcpu_sched,
+		   struct mm_time __percpu *pcpu_time);
 
 static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
 {
 	struct mm_sched __percpu *pcpu_sched = alloc_percpu_noprof(struct mm_sched);
+	struct mm_time __percpu *pcpu_time;
 
 	if (!pcpu_sched)
 		return -ENOMEM;
 
-	mm_init_sched(mm, pcpu_sched);
+	pcpu_time = alloc_percpu_noprof(struct mm_time);
+	if (!pcpu_time) {
+		free_percpu(mm->pcpu_sched);
+		return -ENOMEM;
+	}
+
+	mm_init_sched(mm, pcpu_sched, pcpu_time);
 	return 0;
 }
 
@@ -1523,7 +1536,9 @@ static inline int mm_alloc_sched_noprof(struct mm_struct *mm)
 static inline void mm_destroy_sched(struct mm_struct *mm)
 {
 	free_percpu(mm->pcpu_sched);
+	free_percpu(mm->pcpu_time);
 	mm->pcpu_sched = NULL;
+	mm->pcpu_time = NULL;
 }
 #else /* !CONFIG_SCHED_CACHE */
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 95bf080bbbf0..875ac3f4208b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2442,6 +2442,9 @@ static inline bool sched_cache_enabled(void)
 {
 	return static_branch_unlikely(&sched_cache_on);
 }
+
+int get_mm_per_llc_runtime(struct task_struct *p, u64 *buf);
+extern int max_llcs;
 #endif
 
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e47b4096f0a6..205208f061bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1355,16 +1355,19 @@ static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
 	p->sched_llc_active = false;
 }
 
-void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
+void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched,
+		   struct mm_time __percpu *_pcpu_time)
 {
 	unsigned long epoch;
 	int i;
 
 	for_each_possible_cpu(i) {
 		struct mm_sched *pcpu_sched = per_cpu_ptr(_pcpu_sched, i);
+		struct mm_time *pcpu_time = per_cpu_ptr(_pcpu_time, i);
 		struct rq *rq = cpu_rq(i);
 
 		pcpu_sched->runtime = 0;
+		pcpu_time->runtime_ns = 0;
 		pcpu_sched->epoch = rq->cpu_epoch;
 		epoch = rq->cpu_epoch;
 	}
@@ -1379,6 +1382,8 @@ void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
 	 * the readers may get invalid mm_sched_epoch, etc.
 	 */
 	smp_store_release(&mm->pcpu_sched, _pcpu_sched);
+	/* publish pcpu_time with the same release ordering as pcpu_sched */
+	smp_store_release(&mm->pcpu_time, _pcpu_time);
 }
 
 /* because why would C be fully specified */
@@ -1428,11 +1433,39 @@ static unsigned long __no_profile fraction_mm_sched(struct rq *rq, struct mm_sch
 
 static unsigned int task_running_on_cpu(int cpu, struct task_struct *p);
 
+/* p->pi_lock is held */
+int get_mm_per_llc_runtime(struct task_struct *p, u64 *buf)
+{
+	struct mm_struct *mm = p->mm;
+	struct mm_time *pcpu_time;
+	int cpu;
+
+	if (!mm)
+		return -EINVAL;
+
+	rcu_read_lock();
+	for_each_online_cpu(cpu) {
+		int llc = llc_id(cpu);
+		u64 runtime_ms;
+
+		if (llc < 0)
+			continue;
+
+		pcpu_time = per_cpu_ptr(mm->pcpu_time, cpu);
+		runtime_ms = div_u64(pcpu_time->runtime_ns, NSEC_PER_MSEC);
+		buf[llc] += runtime_ms;
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+
 static inline
 void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 {
 	struct mm_struct *mm = p->mm;
 	struct mm_sched *pcpu_sched;
+	struct mm_time *pcpu_time;
 	unsigned long epoch;
 	int mm_sched_llc = -1;
 
@@ -1444,14 +1477,17 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	/*
 	 * init_task and kthreads don't having mm
 	 */
-	if (!mm || !mm->pcpu_sched)
+	if (!mm || !mm->pcpu_sched || !mm->pcpu_time)
 		return;
 
 	pcpu_sched = per_cpu_ptr(p->mm->pcpu_sched, cpu_of(rq));
+	pcpu_time = per_cpu_ptr(p->mm->pcpu_time, cpu_of(rq));
 
 	scoped_guard (raw_spinlock, &rq->cpu_epoch_lock) {
 		__update_mm_sched(rq, pcpu_sched);
 		pcpu_sched->runtime += delta_exec;
+		/* pure runtime without decay */
+		pcpu_time->runtime_ns += delta_exec;
 		rq->cpu_runtime += delta_exec;
 		epoch = rq->cpu_epoch;
 	}
-- 
2.32.0