From: Chen Yu
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Gautham R. Shenoy
Cc: Vincent Guittot, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Libo Chen,
	Madadi Vineeth Reddy, Hillf Danton, Shrikanth Hegde, Jianyong Wu,
	Yangyu Chen, Tingyin Duan, Vern Hao, Len Brown, Tim Chen,
	Aubrey Li, Zhao Liu, Chen Yu, Chen Yu, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 23/28] sched: Scan a task's preferred node for preferred LLC
Date: Sat, 9 Aug 2025 13:08:11 +0800
Message-Id: <178bf43d7cbc9b2c9aea408dd56b87391067df37.1754712565.git.tim.c.chen@linux.intel.com>

When sched_cache is enabled, fully scanning all online CPUs to find the
hottest one is very costly. As a first step, limit the scan to only the
CPUs within the task's preferred node.

If the node containing the task's preferred LLC is not in the CPU scan
mask, add it. Additionally, if the node where the current task is
running is not in the scan mask, add it too.

Suggested-by: Jianyong Wu
Suggested-by: Shrikanth Hegde
Co-developed-by: Tim Chen
Signed-off-by: Tim Chen
Signed-off-by: Chen Yu
---
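Illustration (not part of the patch itself): the short user-space sketch
below mirrors the three-step union performed by get_scan_cpumasks() on a
made-up 2-node, 8-CPU layout. cpu_to_node_id(), node_mask() and
build_scan_mask() are hypothetical stand-ins for the kernel helpers, not
existing APIs.

/*
 * Illustration only -- not kernel code. Assumes a hypothetical machine
 * with 2 nodes x 4 CPUs: CPUs 0-3 on node 0, CPUs 4-7 on node 1.
 */
#include <stdio.h>
#include <stdint.h>

#define CPUS_PER_NODE	4
#define NO_NODE		(-1)	/* stands in for NUMA_NO_NODE */

static int cpu_to_node_id(int cpu)
{
	return cpu / CPUS_PER_NODE;
}

static uint64_t node_mask(int nid)
{
	return ((1ull << CPUS_PER_NODE) - 1) << (nid * CPUS_PER_NODE);
}

/*
 * Same union order as get_scan_cpumasks(): preferred node first, then
 * the cache CPU's node, then the current node as the last resort.
 */
static uint64_t build_scan_mask(int pref_nid, int cache_cpu, int curr_cpu)
{
	uint64_t cpus = 0;

	/* 1) always honor the preferred node, if there is one */
	if (pref_nid != NO_NODE)
		cpus |= node_mask(pref_nid);

	/* 2) add the cache CPU's node only if that CPU is not covered yet */
	if (cache_cpu != -1 && !(cpus & (1ull << cache_cpu)))
		cpus |= node_mask(cpu_to_node_id(cache_cpu));

	/* 3) add the current node only if the current CPU is not covered */
	if (!(cpus & (1ull << curr_cpu)))
		cpus |= node_mask(cpu_to_node_id(curr_cpu));

	return cpus;
}

int main(void)
{
	/* preferred node 1, cache CPU 2 (node 0), running on CPU 5 (node 1) */
	uint64_t cpus = build_scan_mask(1, 2, 5);

	printf("scan mask: 0x%02llx\n", (unsigned long long)cpus); /* 0xff */
	return 0;
}

The ordering keeps the mask as narrow as possible: the preferred node is
always included, while the cache CPU's node and the current node are
unioned in only when those CPUs are not already covered.
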
Shenoy" Cc: Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Libo Chen , Madadi Vineeth Reddy , Hillf Danton , Shrikanth Hegde , Jianyong Wu , Yangyu Chen , Tingyin Duan , Vern Hao , Len Brown , Tim Chen , Aubrey Li , Zhao Liu , Chen Yu , Chen Yu , linux-kernel@vger.kernel.org Subject: [RFC PATCH v4 23/28] sched: Scan a task's preferred node for preferred LLC Date: Sat, 9 Aug 2025 13:08:11 +0800 Message-Id: <178bf43d7cbc9b2c9aea408dd56b87391067df37.1754712565.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When sched_cache is enabled, fully scanning all online CPUs to find the hottest one is very costly. As a first step, limit the scan to only the CPUs within the task's preferred node. If the node containing the task's preferred LLC is not in the CPU scan mask, add it. Additionally, if the node where the current task is running is not in the scan mask, add it too. Suggested-by: Jianyong Wu Suggested-by: Shrikanth Hegde Co-developed-by: Tim Chen Signed-off-by: Tim Chen Signed-off-by: Chen Yu --- kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++--- 1 file changed, 33 insertions(+), 3 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 64f757ad39fc..420d3a080990 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1390,13 +1390,36 @@ static void task_tick_cache(struct rq *rq, struct t= ask_struct *p) } } =20 +static void get_scan_cpumasks(cpumask_var_t cpus, int cache_cpu, + int pref_nid, int curr_cpu) +{ +#ifdef CONFIG_NUMA_BALANCING + /* first honor the task's preferred node */ + if (pref_nid !=3D NUMA_NO_NODE) + cpumask_or(cpus, cpus, cpumask_of_node(pref_nid)); +#endif + + /* secondly honor the task's cache CPU if it is not included */ + if (cache_cpu !=3D -1 && !cpumask_test_cpu(cache_cpu, cpus)) + cpumask_or(cpus, cpus, + cpumask_of_node(cpu_to_node(cache_cpu))); + + /* + * Thirdly honor the task's current running node + * as the last resort. + */ + if (!cpumask_test_cpu(curr_cpu, cpus)) + cpumask_or(cpus, cpus, cpumask_of_node(cpu_to_node(curr_cpu))); +} + static void __no_profile task_cache_work(struct callback_head *work) { struct task_struct *p =3D current; struct mm_struct *mm =3D p->mm; unsigned long m_a_occ =3D 0; unsigned long last_m_a_occ =3D 0; - int cpu, m_a_cpu =3D -1; + int cpu, m_a_cpu =3D -1, cache_cpu, + pref_nid =3D NUMA_NO_NODE, curr_cpu =3D smp_processor_id(); cpumask_var_t cpus; =20 WARN_ON_ONCE(work !=3D &p->cache_work); @@ -1406,11 +1429,18 @@ static void __no_profile task_cache_work(struct cal= lback_head *work) if (p->flags & PF_EXITING) return; =20 - if (!alloc_cpumask_var(&cpus, GFP_KERNEL)) + if (!zalloc_cpumask_var(&cpus, GFP_KERNEL)) return; =20 + cache_cpu =3D mm->mm_sched_cpu; +#ifdef CONFIG_NUMA_BALANCING + if (static_branch_likely(&sched_numa_balancing)) + pref_nid =3D p->numa_preferred_nid; +#endif + scoped_guard (cpus_read_lock) { - cpumask_copy(cpus, cpu_online_mask); + get_scan_cpumasks(cpus, cache_cpu, + pref_nid, curr_cpu); =20 for_each_cpu(cpu, cpus) { /* XXX sched_cluster_active */ --=20 2.25.1