From nobody Mon May 25 00:09:08 2026
Date: Wed, 20 May 2026 08:35:01 -0000
From: "tip-bot2 for Chen Yu"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] sched/cache: Limit the scan number of CPUs when calculating task occupancy
Cc: Madadi Vineeth Reddy, Chen Yu, Tim Chen, "Peter Zijlstra (Intel)", x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <57ed5fcec9b242803fe4ea2ce6e7f3de6a6efc6b.1775065312.git.tim.c.chen@linux.intel.com>
References: <57ed5fcec9b242803fe4ea2ce6e7f3de6a6efc6b.1775065312.git.tim.c.chen@linux.intel.com>
Message-ID: <177926610113.711.7480700058796646491.tip-bot2@tip-bot2>

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     b4606faab3188beeacc2287b8a369cca943cc8eb
Gitweb:        https://git.kernel.org/tip/b4606faab3188beeacc2287b8a369cca943cc8eb
Author:        Chen Yu
AuthorDate:    Wed, 01 Apr 2026 14:52:14 -07:00
Committer:     Peter Zijlstra
CommitterDate: Thu, 09 Apr 2026 15:49:47 +02:00

sched/cache: Limit the scan number of CPUs when calculating task occupancy

When NUMA balancing is enabled, the kernel currently iterates over all
online CPUs to aggregate process-wide occupancy data. On large systems,
this global scan introduces significant overhead.

To reduce scan latency, limit the search to a subset of relevant CPUs:
1. The task's preferred NUMA node.
2. The node where the task is currently running.
3. The node that contains the task's current preferred LLC.

While focusing solely on the preferred NUMA node is ideal, a
process-wide scan must remain flexible because the "preferred node"
is a per-task attribute. Different threads within the same process
may have different preferred nodes, causing the process-wide
preference to migrate. Maintaining a mask that covers both the
preferred and active running nodes ensures accuracy while
significantly reducing the number of CPUs inspected.

Future work may integrate numa_group to further refine task aggregation.

Suggested-by: Madadi Vineeth Reddy
Signed-off-by: Chen Yu
Co-developed-by: Tim Chen
Signed-off-by: Tim Chen
Signed-off-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/57ed5fcec9b242803fe4ea2ce6e7f3de6a6efc6b.1775065312.git.tim.c.chen@linux.intel.com
---
 kernel/sched/fair.c | 47 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c9cd064..a55ada2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1522,6 +1522,51 @@ static void task_tick_cache(struct rq *rq, struct task_struct *p)
 	}
 }
 
+static void get_scan_cpumasks(cpumask_var_t cpus, struct task_struct *p)
+{
+#ifdef CONFIG_NUMA_BALANCING
+	int cpu, curr_cpu, nid, pref_nid;
+
+	if (!static_branch_likely(&sched_numa_balancing))
+		goto out;
+
+	cpu = p->mm->sc_stat.cpu;
+	if (cpu != -1)
+		nid = cpu_to_node(cpu);
+	curr_cpu = task_cpu(p);
+
+	/*
+	 * Scanning in the preferred NUMA node is ideal. However, the NUMA
+	 * preferred node is per-task rather than per-process. It is possible
+	 * for different threads of the process to have distinct preferred
+	 * nodes; consequently, the process-wide preferred LLC may bounce
+	 * between different nodes. As a workaround, maintain the scan
+	 * CPU mask to also cover the process's current preferred LLC and the
+	 * current running node to mitigate the bouncing risk.
+	 * TBD: numa_group should be considered during task aggregation.
+	 */
+	pref_nid = p->numa_preferred_nid;
+	/* honor the task's preferred node */
+	if (pref_nid == NUMA_NO_NODE)
+		goto out;
+
+	cpumask_or(cpus, cpus, cpumask_of_node(pref_nid));
+
+	/* honor the task's preferred LLC CPU */
+	if (cpu != -1 && !cpumask_test_cpu(cpu, cpus) && nid != NUMA_NO_NODE)
+		cpumask_or(cpus, cpus, cpumask_of_node(nid));
+
+	/* make sure the task's current running node is included */
+	if (!cpumask_test_cpu(curr_cpu, cpus))
+		cpumask_or(cpus, cpus, cpumask_of_node(cpu_to_node(curr_cpu)));
+
+	return;
+
+out:
+#endif
+	cpumask_copy(cpus, cpu_online_mask);
+}
+
 static void task_cache_work(struct callback_head *work)
 {
 	struct task_struct *p = current;
@@ -1544,7 +1589,7 @@ static void task_cache_work(struct callback_head *work)
 	scoped_guard (cpus_read_lock) {
 		guard(rcu)();
 
-		cpumask_copy(cpus, cpu_online_mask);
+		get_scan_cpumasks(cpus, p);
 
 		for_each_cpu(cpu, cpus) {
 			/* XXX sched_cluster_active */
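
For readers who want to experiment with the node selection outside the kernel, the following is a minimal userspace sketch of the same idea, not the kernel code itself. Everything in it is an assumption made for illustration: the fixed 4-node, 8-CPUs-per-node topology, the 64-bit integer standing in for a cpumask, and the hypothetical names scan_mask(), mask_of_node(), node_of_cpu(), NO_NODE and NO_CPU. It also assumes the mask starts out empty, which the patch itself does not show.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical topology for illustration: 4 nodes of 8 CPUs each. */
#define NR_NODES      4
#define CPUS_PER_NODE 8
#define NR_CPUS       (NR_NODES * CPUS_PER_NODE)
#define NO_NODE       (-1)	/* stands in for NUMA_NO_NODE */
#define NO_CPU        (-1)	/* stands in for "no preferred LLC CPU" */

typedef uint64_t mask_t;	/* bit n set <=> CPU n is in the mask */

static int node_of_cpu(int cpu)
{
	return cpu / CPUS_PER_NODE;
}

static mask_t mask_of_node(int nid)
{
	return (((mask_t)1 << CPUS_PER_NODE) - 1) << (nid * CPUS_PER_NODE);
}

static mask_t online_mask(void)
{
	return ((mask_t)1 << NR_CPUS) - 1;
}

static int mask_has_cpu(mask_t m, int cpu)
{
	return (int)((m >> cpu) & 1);
}

/*
 * Model of the selection: fall back to all online CPUs when NUMA
 * balancing is off or the task has no preferred node; otherwise cover
 * the preferred node, the node of the preferred LLC CPU, and the node
 * the task is currently running on.
 */
static mask_t scan_mask(int numa_balancing, int pref_nid, int llc_cpu,
			int curr_cpu)
{
	mask_t cpus = 0;	/* assumption: the mask starts out empty */

	assert(curr_cpu >= 0 && curr_cpu < NR_CPUS);
	assert(llc_cpu == NO_CPU || (llc_cpu >= 0 && llc_cpu < NR_CPUS));
	assert(pref_nid == NO_NODE || (pref_nid >= 0 && pref_nid < NR_NODES));

	if (!numa_balancing || pref_nid == NO_NODE)
		return online_mask();

	cpus |= mask_of_node(pref_nid);

	if (llc_cpu != NO_CPU && !mask_has_cpu(cpus, llc_cpu))
		cpus |= mask_of_node(node_of_cpu(llc_cpu));

	if (!mask_has_cpu(cpus, curr_cpu))
		cpus |= mask_of_node(node_of_cpu(curr_cpu));

	return cpus;
}

#ifdef DEMO_MAIN
int main(void)
{
	/* Preferred node 1, preferred LLC CPU 3 (node 0), running on CPU 20 (node 2). */
	mask_t m = scan_mask(1, 1, 3, 20);

	printf("scan mask: 0x%llx\n", (unsigned long long)m);
	return 0;
}
#endif
```

Built with -DDEMO_MAIN, the demo covers three of the four modeled nodes instead of every online CPU. With NUMA balancing disabled, or with no preferred node, the sketch returns the full online mask, mirroring the fallback path in the patch.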