> On 4 Dec 2025, at 07:07, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> From: Chen Yu <yu.c.chen@intel.com>
>
> Debug patch only.
>
> With cache-aware load balancing enabled, statistics related to its activity
> are exposed via /proc/schedstat and debugfs. For instance, if users want to
> verify metrics such as the number of times the RSS and nr_running limits were
> exceeded, they can filter the output of /sys/kernel/debug/sched/debug and
> compute the required statistics manually:
>
> llc_exceed_cap SUM: 6
> llc_exceed_nr SUM: 4531
>
> Furthermore, the statistics exposed in /proc/schedstat can be queried manually
> or via perf sched stats [1] with minor modifications.
>
Hi Tim,
This patch looks great, especially for multithreaded Verilator workloads
on clustered LLCs (like AMD EPYC). I'm discussing with Verilator
upstream whether Verilator's automatic userspace affinity assignment
should be disabled when such a feature exists [1]. That discussion made
it clear that userspace software needs a way to detect whether the
feature is present. Could we expose it via `/proc/schedstat` so that
userspace can detect it, for example with a version check like the
sketch below? We could just use this patch and remove the "DO NOT APPLY"
tag.
[1] https://github.com/verilator/verilator/issues/6826#issuecomment-3671287551
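For illustration, here is a rough sketch of such a detection (not part
of this patch; it assumes the SCHEDSTAT_VERSION bump to 18 below is
what signals the new lb_imbalance_llc field):

#include <stdio.h>

/*
 * Rough sketch: treat a schedstat version of 18 or later as "the
 * cache-aware load-balancing statistics are present", on the
 * assumption that the version bump in this patch is the signal.
 */
static int has_llc_lb_stats(void)
{
	FILE *f = fopen("/proc/schedstat", "r");
	unsigned int ver = 0;

	if (!f)
		return 0;
	if (fscanf(f, "version %u", &ver) != 1)
		ver = 0;
	fclose(f);
	return ver >= 18;
}

int main(void)
{
	printf("cache-aware LB stats: %s\n",
	       has_llc_lb_stats() ? "present" : "absent");
	return 0;
}

A plain version check is coarse, of course, but the version line is
already what tools are told to adapt to when the format changes, so it
seems like a natural detection point.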
Thanks,
Yangyu Chen
> Link: https://lore.kernel.org/all/20250909114227.58802-1-swapnil.sapkal@amd.com #1
>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
> include/linux/sched/topology.h | 1 +
> kernel/sched/fair.c | 1 +
> kernel/sched/stats.c | 5 +++--
> 3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 0ba4697d74ba..8702c1e731a0 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -108,6 +108,7 @@ struct sched_domain {
> unsigned int lb_imbalance_util[CPU_MAX_IDLE_TYPES];
> unsigned int lb_imbalance_task[CPU_MAX_IDLE_TYPES];
> unsigned int lb_imbalance_misfit[CPU_MAX_IDLE_TYPES];
> + unsigned int lb_imbalance_llc[CPU_MAX_IDLE_TYPES];
> unsigned int lb_gained[CPU_MAX_IDLE_TYPES];
> unsigned int lb_hot_gained[CPU_MAX_IDLE_TYPES];
> unsigned int lb_nobusyg[CPU_MAX_IDLE_TYPES];
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a2e2d6742481..742e455b093e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12684,6 +12684,7 @@ static void update_lb_imbalance_stat(struct lb_env *env, struct sched_domain *sd
> __schedstat_add(sd->lb_imbalance_misfit[idle], env->imbalance);
> break;
> case migrate_llc_task:
> + __schedstat_add(sd->lb_imbalance_llc[idle], env->imbalance);
> break;
> }
> }
> diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
> index d1c9429a4ac5..3736f6102261 100644
> --- a/kernel/sched/stats.c
> +++ b/kernel/sched/stats.c
> @@ -104,7 +104,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
> * Bump this up when changing the output format or the meaning of an existing
> * format, so that tools can adapt (or abort)
> */
> -#define SCHEDSTAT_VERSION 17
> +#define SCHEDSTAT_VERSION 18
>
> static int show_schedstat(struct seq_file *seq, void *v)
> {
> @@ -139,7 +139,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
> seq_printf(seq, "domain%d %s %*pb", dcount++, sd->name,
> cpumask_pr_args(sched_domain_span(sd)));
> for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
> - seq_printf(seq, " %u %u %u %u %u %u %u %u %u %u %u",
> + seq_printf(seq, " %u %u %u %u %u %u %u %u %u %u %u %u",
> sd->lb_count[itype],
> sd->lb_balanced[itype],
> sd->lb_failed[itype],
> @@ -147,6 +147,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
> sd->lb_imbalance_util[itype],
> sd->lb_imbalance_task[itype],
> sd->lb_imbalance_misfit[itype],
> + sd->lb_imbalance_llc[itype],
> sd->lb_gained[itype],
> sd->lb_hot_gained[itype],
> sd->lb_nobusyq[itype],
> --
> 2.32.0