From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R.
Shenoy" , Vincent Guittot Cc: Chen Yu , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Adam Li , Aaron Lu , Tim Chen , linux-kernel@vger.kernel.org Subject: [PATCH v2 02/23] sched/cache: Record per-LLC utilization to guide cache-aware scheduling decisions Date: Wed, 3 Dec 2025 15:07:21 -0800 Message-Id: X-Mailer: git-send-email 2.32.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Chen Yu When a system becomes busy and a process=E2=80=99s preferred LLC is saturated with too many threads, tasks within that LLC migrate frequently. These in LLC migrations introduce latency and degrade performance. To avoid this, task aggregation should be suppressed when the preferred LLC is overloaded, which requires a metric to indicate LLC utilization. Record per LLC utilization/cpu capacity during periodic load balancing. These statistics will be used in later patches to decide whether tasks should be aggregated into their preferred LLC. Signed-off-by: Chen Yu Signed-off-by: Tim Chen --- Notes: v1->v2: Refine the comments in record_sg_llc_stats().(Peter Zijlstra). include/linux/sched/topology.h | 4 ++ kernel/sched/fair.c | 69 ++++++++++++++++++++++++++++++++++ 2 files changed, 73 insertions(+) diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h index bbcfdf12aa6e..0ba4697d74ba 100644 --- a/include/linux/sched/topology.h +++ b/include/linux/sched/topology.h @@ -68,6 +68,10 @@ struct sched_domain_shared { atomic_t nr_busy_cpus; int has_idle_cores; int nr_idle_scan; +#ifdef CONFIG_SCHED_CACHE + unsigned long util_avg; + unsigned long capacity ____cacheline_aligned_in_smp; +#endif }; =20 struct sched_domain { diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index cb82f558dc5b..b9f336300f14 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9622,6 +9622,29 @@ static inline int task_is_ineligible_on_dst_cpu(stru= ct task_struct *p, int dest_ return 0; } =20 +#ifdef CONFIG_SCHED_CACHE +/* Called from load balancing paths with rcu_read_lock held */ +static __maybe_unused bool get_llc_stats(int cpu, unsigned long *util, + unsigned long *cap) +{ + struct sched_domain_shared *sd_share; + + sd_share =3D rcu_dereference(per_cpu(sd_llc_shared, cpu)); + if (!sd_share) + return false; + + *util =3D READ_ONCE(sd_share->util_avg); + *cap =3D READ_ONCE(sd_share->capacity); + + return true; +} +#else +static inline bool get_llc_stats(int cpu, unsigned long *util, + unsigned long *cap) +{ + return false; +} +#endif /* * can_migrate_task - may task p from runqueue rq be migrated to this_cpu? */ @@ -10592,6 +10615,51 @@ sched_reduced_capacity(struct rq *rq, struct sched= _domain *sd) return check_cpu_capacity(rq, sd); } =20 +#ifdef CONFIG_SCHED_CACHE +/* + * Record the statistics for this scheduler group for later + * use. These values guide load balancing on aggregating tasks + * to a LLC. 
+ */
+static void record_sg_llc_stats(struct lb_env *env,
+				struct sg_lb_stats *sgs,
+				struct sched_group *group)
+{
+	struct sched_domain_shared *sd_share;
+
+	if (!sched_cache_enabled() || env->idle == CPU_NEWLY_IDLE)
+		return;
+
+	/* Only care about sched domains spanning multiple LLCs */
+	if (env->sd->child != rcu_dereference(per_cpu(sd_llc, env->dst_cpu)))
+		return;
+
+	/*
+	 * At this point we know this group spans an LLC domain.
+	 * Record the statistics of this group in its corresponding
+	 * shared LLC domain.
+	 * Note: sd_share cannot be obtained via sd->child->shared, because
+	 * that refers to the domain covering the local group, while
+	 * sd_share could represent any of the LLC groups.
+	 */
+	sd_share = rcu_dereference(per_cpu(sd_llc_shared,
+				   cpumask_first(sched_group_span(group))));
+	if (!sd_share)
+		return;
+
+	if (READ_ONCE(sd_share->util_avg) != sgs->group_util)
+		WRITE_ONCE(sd_share->util_avg, sgs->group_util);
+
+	if (unlikely(READ_ONCE(sd_share->capacity) != sgs->group_capacity))
+		WRITE_ONCE(sd_share->capacity, sgs->group_capacity);
+}
+#else
+static inline void record_sg_llc_stats(struct lb_env *env, struct sg_lb_stats *sgs,
+				       struct sched_group *group)
+{
+}
+#endif
+
 /**
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @env: The load balancing environment.
@@ -10681,6 +10749,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
 
+	record_sg_llc_stats(env, sgs, group);
 	/* Computing avg_load makes sense only when group is overloaded */
 	if (sgs->group_type == group_overloaded)
 		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
-- 
2.32.0
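
For context, the recorded values are in SCHED_CAPACITY_SCALE units
(1024 roughly corresponds to one fully busy CPU), so comparing util_avg
against capacity gives a direct measure of how loaded an LLC is. Below
is a minimal sketch of how a later patch in the series might consume
these statistics when deciding whether a preferred LLC still has
headroom for aggregation; the helper name llc_has_spare_capacity() and
the ~80% utilization threshold are illustrative assumptions, not code
from this series:

/*
 * Sketch only: does the LLC containing @cpu have room for more
 * aggregated tasks? The 80% cutoff is an assumed tuning point.
 */
static bool llc_has_spare_capacity(int cpu)
{
	unsigned long util, cap;
	bool has_room = false;

	rcu_read_lock();
	/* get_llc_stats() expects rcu_read_lock to be held */
	if (get_llc_stats(cpu, &util, &cap))
		/* spare room while utilization stays under ~80% of capacity */
		has_room = util * 10 < cap * 8;
	rcu_read_unlock();

	return has_room;
}

A gate of this shape would let the aggregation logic fall back to the
default placement policy once the preferred LLC saturates, matching the
intent described in the commit message.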