From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R. Shenoy, Vincent Guittot
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton, Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao, Len Brown, Aubrey Li, Zhao Liu, Chen Yu, Adam Li, Aaron Lu, linux-kernel@vger.kernel.org
Subject: [PATCH v2 07/23] sched/cache: Introduce per runqueue task LLC preference counter
Date: Wed, 3 Dec 2025 15:07:26 -0800
Message-Id: <63091f7ca7bb473fbc176af86a87d27a07a6e149.1764801860.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0

Each runqueue is assigned an array in which each element tracks the number
of tasks preferring a given LLC, indexed from 0 to max_llcs - 1. For
example, rq->nr_pref_llc[3] = 2 means that 2 tasks on this runqueue
prefer to run within LLC3. The load balancer can use this information to
identify busy runqueues and migrate tasks to their preferred LLC domains.

The array is reallocated at runtime if the number of LLCs increases due
to CPU hotplug. Only extending the buffer (rather than shrinking it) is
supported, to simplify the implementation.

This patch introduces the buffer allocation mechanism; the statistics
themselves are calculated in a subsequent patch.

Co-developed-by: Chen Yu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---

Notes:
    v1->v2: Remove static allocation of per runqueue LLC preference
    arrays. Allocate the array to the actual number of LLCs online.
    (Peter Zijlstra, Madadi Vineeth Reddy)

 kernel/sched/core.c     |   1 +
 kernel/sched/sched.h    |   1 +
 kernel/sched/topology.c | 117 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 48626c81ba8e..ce533dc485f5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8800,6 +8800,7 @@ void __init sched_init(void)
 #ifdef CONFIG_SCHED_CACHE
 		raw_spin_lock_init(&rq->cpu_epoch_lock);
 		rq->cpu_epoch_next = jiffies;
+		rq->nr_pref_llc = NULL;
 #endif
 
 		zalloc_cpumask_var_node(&rq->scratch_mask, GFP_KERNEL, cpu_to_node(i));
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ee8b70647835..8f2a779825e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1129,6 +1129,7 @@ struct rq {
 #ifdef CONFIG_SCHED_CACHE
 	unsigned int		nr_pref_llc_running;
 	unsigned int		nr_llc_running;
+	unsigned int		*nr_pref_llc;
 #endif
 #ifdef CONFIG_NO_HZ_COMMON
 	unsigned long		last_blocked_load_update_tick;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index f25d950ab015..d583399fc6a1 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -17,8 +17,121 @@ void sched_domains_mutex_unlock(void)
 	mutex_unlock(&sched_domains_mutex);
 }
 
+/* the number of max LLCs being detected */
+static int new_max_llcs;
+/* the current number of max LLCs */
 int max_llcs;
 
+#ifdef CONFIG_SCHED_CACHE
+
+static unsigned int *alloc_new_pref_llcs(unsigned int *old, unsigned int **gc)
+{
+	unsigned int *new = NULL;
+
+	new = kcalloc(new_max_llcs, sizeof(unsigned int),
+		      GFP_KERNEL | __GFP_NOWARN);
+
+	if (!new) {
+		*gc = NULL;
+	} else {
+		/*
+		 * Place old entry in garbage collector
+		 * for later disposal.
+		 */
+		*gc = old;
+	}
+	return new;
+}
+
+static void populate_new_pref_llcs(unsigned int *old, unsigned int *new)
+{
+	int i;
+
+	if (!old)
+		return;
+
+	for (i = 0; i < max_llcs; i++)
+		new[i] = old[i];
+}
+
+static int resize_llc_pref(void)
+{
+	unsigned int *__percpu *tmp_llc_pref;
+	int i, ret = 0;
+
+	if (new_max_llcs <= max_llcs)
+		return 0;
+
+	/*
+	 * Allocate temp percpu pointer for old llc_pref,
+	 * which will be released after switching to the
+	 * new buffer.
+	 */
+	tmp_llc_pref = alloc_percpu_noprof(unsigned int *);
+	if (!tmp_llc_pref)
+		return -ENOMEM;
+
+	for_each_present_cpu(i)
+		*per_cpu_ptr(tmp_llc_pref, i) = NULL;
+
+	/*
+	 * Resize the per rq nr_pref_llc buffer and
+	 * switch to this new buffer.
+	 */
+	for_each_present_cpu(i) {
+		struct rq_flags rf;
+		unsigned int *new;
+		struct rq *rq;
+
+		rq = cpu_rq(i);
+		new = alloc_new_pref_llcs(rq->nr_pref_llc, per_cpu_ptr(tmp_llc_pref, i));
+		if (!new) {
+			ret = -ENOMEM;
+
+			goto release_old;
+		}
+
+		/*
+		 * Locking rq ensures that rq->nr_pref_llc values
+		 * don't change with new task enqueue/dequeue
+		 * when we repopulate the newly enlarged array.
+		 */
+		rq_lock_irqsave(rq, &rf);
+		populate_new_pref_llcs(rq->nr_pref_llc, new);
+		rq->nr_pref_llc = new;
+		rq_unlock_irqrestore(rq, &rf);
+	}
+
+release_old:
+	/*
+	 * Load balance is done under rcu_lock.
+	 * Wait for load balance before and during resizing to
+	 * be done. They may refer to old nr_pref_llc[]
+	 * that hasn't been resized.
+	 */
+	synchronize_rcu();
+	for_each_present_cpu(i)
+		kfree(*per_cpu_ptr(tmp_llc_pref, i));
+
+	free_percpu(tmp_llc_pref);
+
+	/* succeed and update */
+	if (!ret)
+		max_llcs = new_max_llcs;
+
+	return ret;
+}
+
+#else
+
+static int resize_llc_pref(void)
+{
+	max_llcs = new_max_llcs;
+	return 0;
+}
+
+#endif
+
 /* Protected by sched_domains_mutex: */
 static cpumask_var_t sched_domains_tmpmask;
 static cpumask_var_t sched_domains_tmpmask2;
@@ -714,7 +827,7 @@ static int update_llc_id(struct sched_domain *sd,
 	 *
 	 * For both cases, we want to increase the number of LLCs.
 	 */
-	per_cpu(sd_llc_id, cpu) = max_llcs++;
+	per_cpu(sd_llc_id, cpu) = new_max_llcs++;
 
 	return per_cpu(sd_llc_id, cpu);
 }
@@ -2674,6 +2787,8 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	if (has_cluster)
 		static_branch_inc_cpuslocked(&sched_cluster_active);
 
+	resize_llc_pref();
+
 	if (rq && sched_debug_verbose)
 		pr_info("root domain span: %*pbl\n", cpumask_pr_args(cpu_map));
 
-- 
2.32.0