From nobody Thu Apr 2 17:02:04 2026
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R. Shenoy,
 Vincent Guittot
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton,
 Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao,
 Len Brown, Tim Chen, Aubrey Li, Zhao Liu, Adam Li, Aaron Lu, Josh Don,
 Gavin Guo, Qais Yousef, Libo Chen, linux-kernel@vger.kernel.org
Subject: [PATCH v3 21/21] -- DO NOT APPLY!!!
 -- sched/cache/debug: Add ftrace to track the load balance statistics
Date: Tue, 10 Feb 2026 14:19:01 -0800
Message-Id: <5d663caaed7ebe93ab9b272235675b2400b3ed8b.1770760558.git.tim.c.chen@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Chen Yu

Debug patch only. Users can leverage these trace events (via bpftrace,
etc.) to monitor cache-aware load balancing activity: specifically,
whether tasks are moved into their preferred LLC, moved out of their
preferred LLC, or whether cache-aware load balancing is skipped because
the process exceeds the memory footprint limit or has too many active
threads.

Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---
Notes:
    v2->v3: Add more trace events for when the process exceeds the LLC
    size limit or the number of active threads (moved from schedstat to
    trace events for better bpf tracking).

 include/trace/events/sched.h | 79 ++++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c          | 40 ++++++++++++++----
 2 files changed, 110 insertions(+), 9 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 7b2645b50e78..b73327653e4b 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -10,6 +10,85 @@
 #include
 #include
 
+#ifdef CONFIG_SCHED_CACHE
+TRACE_EVENT(sched_exceed_llc_cap,
+
+	TP_PROTO(struct task_struct *t, int exceeded),
+
+	TP_ARGS(t, exceeded),
+
+	TP_STRUCT__entry(
+		__array( char,  comm, TASK_COMM_LEN )
+		__field( pid_t, pid )
+		__field( int,   exceeded )
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
+		__entry->pid = t->pid;
+		__entry->exceeded = exceeded;
+	),
+
+	TP_printk("comm=%s pid=%d exceed_cap=%d",
+		__entry->comm, __entry->pid,
+		__entry->exceeded)
+);
+
+TRACE_EVENT(sched_exceed_llc_nr,
+
+	TP_PROTO(struct task_struct *t, int exceeded),
+
+	TP_ARGS(t, exceeded),
+
+	TP_STRUCT__entry(
+		__array( char,  comm, TASK_COMM_LEN )
+		__field( pid_t, pid )
+		__field( int,   exceeded )
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
+		__entry->pid = t->pid;
+		__entry->exceeded = exceeded;
+	),
+
+	TP_printk("comm=%s pid=%d exceed_nr=%d",
+		__entry->comm, __entry->pid,
+		__entry->exceeded)
+);
+
+TRACE_EVENT(sched_attach_task,
+
+	TP_PROTO(struct task_struct *t, int pref_cpu, int pref_llc,
+		 int attach_cpu, int attach_llc),
+
+	TP_ARGS(t, pref_cpu, pref_llc, attach_cpu, attach_llc),
+
+	TP_STRUCT__entry(
+		__array( char,  comm, TASK_COMM_LEN )
+		__field( pid_t, pid )
+		__field( int,   pref_cpu )
+		__field( int,   pref_llc )
+		__field( int,   attach_cpu )
+		__field( int,   attach_llc )
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
+		__entry->pid = t->pid;
+		__entry->pref_cpu = pref_cpu;
+		__entry->pref_llc = pref_llc;
+		__entry->attach_cpu = attach_cpu;
+		__entry->attach_llc = attach_llc;
+	),
+
+	TP_printk("comm=%s pid=%d pref_cpu=%d pref_llc=%d attach_cpu=%d attach_llc=%d",
+		__entry->comm, __entry->pid,
+		__entry->pref_cpu, __entry->pref_llc,
+		__entry->attach_cpu, __entry->attach_llc)
+);
+#endif
+
 /*
  * Tracepoint for calling kthread_stop, performed to end a kthread:
  */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 25cee3dd767c..977091fd0e49 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1245,9 +1245,11 @@ static inline int get_sched_cache_scale(int mul)
 	return (1 + (llc_aggr_tolerance - 1) * mul);
 }
 
-static bool exceed_llc_capacity(struct mm_struct *mm, int cpu)
+static bool exceed_llc_capacity(struct mm_struct *mm, int cpu,
+				struct task_struct *p)
 {
 	struct cacheinfo *ci;
+	bool exceeded;
 	u64 rss, llc;
 	int scale;
 
@@ -1293,12 +1295,18 @@ static bool exceed_llc_capacity(struct mm_struct *mm, int cpu)
 	if (scale == INT_MAX)
 		return false;
 
-	return ((llc * scale) <= (rss * PAGE_SIZE));
+	exceeded = ((llc * scale) <= (rss * PAGE_SIZE));
+
+	trace_sched_exceed_llc_cap(p, exceeded);
+
+	return exceeded;
 }
 
-static bool exceed_llc_nr(struct mm_struct *mm, int cpu)
+static bool exceed_llc_nr(struct mm_struct *mm, int cpu,
+			  struct task_struct *p)
 {
 	int smt_nr = 1, scale;
+	bool exceeded;
 
 #ifdef CONFIG_SCHED_SMT
 	if (sched_smt_active())
@@ -1313,8 +1321,12 @@ static bool exceed_llc_nr(struct mm_struct *mm, int cpu)
 	if (scale == INT_MAX)
 		return false;
 
-	return !fits_capacity((mm->sc_stat.nr_running_avg * smt_nr),
+	exceeded = !fits_capacity((mm->sc_stat.nr_running_avg * smt_nr),
 			      (scale * per_cpu(sd_llc_size, cpu)));
+
+	trace_sched_exceed_llc_nr(p, exceeded);
+
+	return exceeded;
 }
 
 static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
@@ -1522,8 +1534,8 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
 	if (time_after(epoch, READ_ONCE(mm->sc_stat.epoch) +
 		       llc_epoch_affinity_timeout) ||
 	    get_nr_threads(p) <= 1 ||
-	    exceed_llc_nr(mm, cpu_of(rq)) ||
-	    exceed_llc_capacity(mm, cpu_of(rq))) {
+	    exceed_llc_nr(mm, cpu_of(rq), p) ||
+	    exceed_llc_capacity(mm, cpu_of(rq), p)) {
 		if (mm->sc_stat.cpu != -1)
 			mm->sc_stat.cpu = -1;
 	}
@@ -1600,7 +1612,7 @@ static void task_cache_work(struct callback_head *work)
 
 	curr_cpu = task_cpu(p);
 	if (get_nr_threads(p) <= 1 ||
-	    exceed_llc_capacity(mm, curr_cpu)) {
+	    exceed_llc_capacity(mm, curr_cpu, p)) {
 		if (mm->sc_stat.cpu != -1)
 			mm->sc_stat.cpu = -1;
 
@@ -10159,8 +10171,8 @@ static enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
 	 * Skip cache aware load balance for single/too many threads
 	 * or large memory RSS.
 	 */
-	if (get_nr_threads(p) <= 1 || exceed_llc_nr(mm, dst_cpu) ||
-	    exceed_llc_capacity(mm, dst_cpu)) {
+	if (get_nr_threads(p) <= 1 || exceed_llc_nr(mm, dst_cpu, p) ||
+	    exceed_llc_capacity(mm, dst_cpu, p)) {
 		if (mm->sc_stat.cpu != -1)
 			mm->sc_stat.cpu = -1;
 		return mig_unrestricted;
@@ -10602,6 +10614,16 @@ static void attach_task(struct rq *rq, struct task_struct *p)
 {
 	lockdep_assert_rq_held(rq);
 
+#ifdef CONFIG_SCHED_CACHE
+	if (p->mm) {
+		int pref_cpu = p->mm->sc_stat.cpu;
+
+		trace_sched_attach_task(p,
+			pref_cpu,
+			pref_cpu != -1 ? llc_id(pref_cpu) : -1,
+			cpu_of(rq), llc_id(cpu_of(rq)));
+	}
+#endif
 	WARN_ON_ONCE(task_rq(p) != rq);
 	activate_task(rq, p, ENQUEUE_NOCLOCK);
 	wakeup_preempt(rq, p, 0);
-- 
2.32.0