From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R.
Shenoy" , Vincent Guittot Cc: Tim Chen , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Vern Hao , Len Brown , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , Adam Li , Aaron Lu , Tim Chen , linux-kernel@vger.kernel.org Subject: [PATCH v2 06/23] sched/cache: Track LLC-preferred tasks per runqueue Date: Wed, 3 Dec 2025 15:07:25 -0800 Message-Id: X-Mailer: git-send-email 2.32.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" For each runqueue, track the number of tasks with an LLC preference and how many of them are running on their preferred LLC. This mirrors nr_numa_running and nr_preferred_running for NUMA balancing, and will be used by cache-aware load balancing in later patches. Signed-off-by: Tim Chen --- Notes: v1->v2: Invoke task_of() once and reuse its result afterwards. (Peter Zijlstra) Remove hacky reset_llc_stats() and introduce sched_llc_active f= lag to properly pair enqueue/dequeue statistics update (Peter Zijls= tra, K Prateek Nayak) include/linux/sched.h | 2 ++ init/init_task.c | 1 + kernel/sched/core.c | 5 ++++ kernel/sched/fair.c | 60 ++++++++++++++++++++++++++++++++++++++++--- kernel/sched/sched.h | 6 +++++ 5 files changed, 71 insertions(+), 3 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 1ad46220cd04..466ba8b7398c 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1408,6 +1408,8 @@ struct task_struct { =20 #ifdef CONFIG_SCHED_CACHE struct callback_head cache_work; + /*the p is currently refcounted in a rq's preferred llc stats*/ + bool sched_llc_active; int preferred_llc; #endif =20 diff --git a/init/init_task.c b/init/init_task.c index 44bae72b5b7d..ee78837b0aa2 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -192,6 +192,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = =3D { .numa_faults =3D NULL, #endif #ifdef CONFIG_SCHED_CACHE + .sched_llc_active =3D false, .preferred_llc =3D -1, #endif #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e8bdf03a4b7f..48626c81ba8e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -531,6 +531,11 @@ void __trace_set_current_state(int state_value) } EXPORT_SYMBOL(__trace_set_current_state); =20 +int task_llc(const struct task_struct *p) +{ + return per_cpu(sd_llc_id, task_cpu(p)); +} + /* * Serialization rules: * diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 10cec83f65d5..d46a70a9d9fb 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1223,6 +1223,43 @@ static int llc_id(int cpu) return llc; } =20 +static void account_llc_enqueue(struct rq *rq, struct task_struct *p) +{ + int pref_llc; + + if (!sched_cache_enabled()) + return; + + pref_llc =3D p->preferred_llc; + if (pref_llc < 0) + return; + + rq->nr_llc_running++; + rq->nr_pref_llc_running +=3D (pref_llc =3D=3D task_llc(p)); + p->sched_llc_active =3D true; +} + +static void account_llc_dequeue(struct rq *rq, struct task_struct *p) +{ + int pref_llc; + + /* + * Borrow the uc_se->active from uclamp_rq_inc_id(), + * uclamp_rq_dec_id() to avoid the unbalanced calculation + * of rq statistics. 
 include/linux/sched.h |  2 ++
 init/init_task.c      |  1 +
 kernel/sched/core.c   |  5 ++++
 kernel/sched/fair.c   | 60 ++++++++++++++++++++++++++++++++++++++++---
 kernel/sched/sched.h  |  6 +++++
 5 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1ad46220cd04..466ba8b7398c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1408,6 +1408,8 @@ struct task_struct {
 
 #ifdef CONFIG_SCHED_CACHE
 	struct callback_head		cache_work;
+	/* p is currently refcounted in a rq's preferred-LLC stats */
+	bool				sched_llc_active;
 	int				preferred_llc;
 #endif
 
diff --git a/init/init_task.c b/init/init_task.c
index 44bae72b5b7d..ee78837b0aa2 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -192,6 +192,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	.numa_faults	= NULL,
 #endif
 #ifdef CONFIG_SCHED_CACHE
+	.sched_llc_active = false,
 	.preferred_llc	= -1,
 #endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e8bdf03a4b7f..48626c81ba8e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -531,6 +531,11 @@ void __trace_set_current_state(int state_value)
 }
 EXPORT_SYMBOL(__trace_set_current_state);
 
+int task_llc(const struct task_struct *p)
+{
+	return per_cpu(sd_llc_id, task_cpu(p));
+}
+
 /*
  * Serialization rules:
  *
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10cec83f65d5..d46a70a9d9fb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1223,6 +1223,43 @@ static int llc_id(int cpu)
 	return llc;
 }
 
+static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
+{
+	int pref_llc;
+
+	if (!sched_cache_enabled())
+		return;
+
+	pref_llc = p->preferred_llc;
+	if (pref_llc < 0)
+		return;
+
+	rq->nr_llc_running++;
+	rq->nr_pref_llc_running += (pref_llc == task_llc(p));
+	p->sched_llc_active = true;
+}
+
+static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
+{
+	int pref_llc;
+
+	/*
+	 * Borrow the uc_se->active scheme from uclamp_rq_inc_id() and
+	 * uclamp_rq_dec_id() to avoid unbalanced updates of the rq
+	 * statistics.
+	 */
+	if (unlikely(!p->sched_llc_active))
+		return;
+
+	pref_llc = p->preferred_llc;
+	if (pref_llc < 0)
+		return;
+
+	rq->nr_llc_running--;
+	rq->nr_pref_llc_running -= (pref_llc == task_llc(p));
+	p->sched_llc_active = false;
+}
+
 void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
 {
 	unsigned long epoch;
@@ -1294,6 +1331,8 @@ static unsigned long __no_profile fraction_mm_sched(struct rq *rq, struct mm_sch
 	return div64_u64(NICE_0_LOAD * pcpu_sched->runtime, rq->cpu_runtime + 1);
 }
 
+static unsigned int task_running_on_cpu(int cpu, struct task_struct *p);
+
 static inline
 void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 {
@@ -1346,8 +1385,13 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 #endif
 	}
 
-	if (p->preferred_llc != mm_sched_llc)
+	/* a task not on the rq is accounted later, in account_entity_enqueue() */
+	if (task_running_on_cpu(rq->cpu, p) &&
+	    p->preferred_llc != mm_sched_llc) {
+		account_llc_dequeue(rq, p);
 		p->preferred_llc = mm_sched_llc;
+		account_llc_enqueue(rq, p);
+	}
 }
 
 static void task_tick_cache(struct rq *rq, struct task_struct *p)
@@ -1475,6 +1519,10 @@ void init_sched_mm(struct task_struct *p) { }
 
 static void task_tick_cache(struct rq *rq, struct task_struct *p) { }
 
+static void account_llc_enqueue(struct rq *rq, struct task_struct *p) {}
+
+static void account_llc_dequeue(struct rq *rq, struct task_struct *p) {}
+
 #endif
 
 /*
@@ -3965,9 +4013,11 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	update_load_add(&cfs_rq->load, se->load.weight);
 	if (entity_is_task(se)) {
+		struct task_struct *p = task_of(se);
 		struct rq *rq = rq_of(cfs_rq);
 
-		account_numa_enqueue(rq, task_of(se));
+		account_numa_enqueue(rq, p);
+		account_llc_enqueue(rq, p);
 		list_add(&se->group_node, &rq->cfs_tasks);
 	}
 	cfs_rq->nr_queued++;
@@ -3978,7 +4028,11 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	update_load_sub(&cfs_rq->load, se->load.weight);
 	if (entity_is_task(se)) {
-		account_numa_dequeue(rq_of(cfs_rq), task_of(se));
+		struct task_struct *p = task_of(se);
+		struct rq *rq = rq_of(cfs_rq);
+
+		account_numa_dequeue(rq, p);
+		account_llc_dequeue(rq, p);
 		list_del_init(&se->group_node);
 	}
 	cfs_rq->nr_queued--;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 728737641847..ee8b70647835 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1126,6 +1126,10 @@ struct rq {
 	unsigned int		nr_preferred_running;
 	unsigned int		numa_migrate_on;
 #endif
+#ifdef CONFIG_SCHED_CACHE
+	unsigned int		nr_pref_llc_running;
+	unsigned int		nr_llc_running;
+#endif
 #ifdef CONFIG_NO_HZ_COMMON
 	unsigned long		last_blocked_load_update_tick;
 	unsigned int		has_blocked_load;
@@ -1980,6 +1984,8 @@ init_numa_balancing(u64 clone_flags, struct task_struct *p)
 
 #endif /* !CONFIG_NUMA_BALANCING */
 
+int task_llc(const struct task_struct *p);
+
 static inline
 void queue_balance_callback(struct rq *rq,
 			    struct balance_callback *head,
-- 
2.32.0
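A forward-looking aside, not part of this patch: the series only introduces
the counters here, and later patches consume them for cache-aware load
balancing. One plausible shape of that consumption, as a self-contained
sketch; struct llc_stats and pct_on_pref_llc() are names invented for
illustration, not from the series:

    /*
     * Sketch only: how a balancing step might interpret the two counters,
     * modeled in plain userspace C rather than against struct rq.
     */
    #include <stdio.h>

    struct llc_stats { unsigned int nr_llc_running, nr_pref_llc_running; };

    /* percentage of LLC-preferring tasks already on their preferred LLC */
    static unsigned int pct_on_pref_llc(const struct llc_stats *s)
    {
            if (!s->nr_llc_running)
                    return 100;     /* no task has a preference: nothing to fix */
            return s->nr_pref_llc_running * 100 / s->nr_llc_running;
    }

    int main(void)
    {
            struct llc_stats s = { .nr_llc_running = 4,
                                   .nr_pref_llc_running = 3 };

            /* a balancer might prefer pulling from rqs with a low percentage */
            printf("%u%% of preferring tasks on their LLC\n",
                   pct_on_pref_llc(&s));
            return 0;
    }

Keeping both a total and an on-preferred count lets such a policy distinguish
"no queued task cares about LLC placement" from "tasks care but are badly
placed" without scanning the runqueue.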