From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy",
    Vincent Guittot
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton,
    Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao,
    Len Brown, Aubrey Li, Zhao Liu, Chen Yu, Adam Li, Aaron Lu,
    Josh Don, Gavin Guo, Qais Yousef, Libo Chen,
    linux-kernel@vger.kernel.org
Subject: [Patch v4 15/22] sched/cache: Respect LLC preference in task migration and detach
Date: Wed, 1 Apr 2026 14:52:27 -0700
Message-Id: <53da65f3d59de31e1a1dc59a4093d8dd9d4dc206.1775065312.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0

During load balancing, make can_migrate_task() consider a task's LLC
preference and prevent the task from being moved out of its preferred
LLC. If regular load balancing cannot migrate a task because of LLC
locality, nr_balance_failed should not be increased either.

Suggested-by: Peter Zijlstra (Intel)
Suggested-by: K Prateek Nayak
Co-developed-by: Chen Yu
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---
Notes:
    v3->v4: Fix the bug in migrate_degrades_llc() that p->preferred_llc
    should be used for comparison rather than task_llc(p).
    (Madadi Vineeth Reddy)

    Let nr_balance_failed override cache-aware migration if the former
    is too high.
    (Peter Zijlstra, K Prateek Nayak)

 kernel/sched/fair.c  | 83 +++++++++++++++++++++++++++++++++++++++++---
 kernel/sched/sched.h | 13 +++++++
 2 files changed, 91 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fef916afa1d5..9541e94370e7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9777,6 +9777,7 @@ enum migration_type {
 #define LBF_DST_PINNED	0x04
 #define LBF_SOME_PINNED	0x08
 #define LBF_ACTIVE_LB	0x10
+#define LBF_LLC_PINNED	0x20
 
 struct lb_env {
 	struct sched_domain	*sd;
@@ -10089,8 +10090,8 @@ static enum llc_mig can_migrate_llc(int src_cpu, int dst_cpu,
  * Check if task p can migrate from source LLC to
  * destination LLC in terms of cache aware load balance.
  */
-static __maybe_unused enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
-							struct task_struct *p)
+static enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
+					 struct task_struct *p)
 {
 	struct mm_struct *mm;
 	bool to_pref;
@@ -10157,6 +10158,46 @@ alb_break_llc(struct lb_env *env)
 
 	return false;
 }
+
+/*
+ * Check if migrating task p from env->src_cpu to
+ * env->dst_cpu breaks LLC locality.
+ */
+static bool migrate_degrades_llc(struct task_struct *p, struct lb_env *env)
+{
+	if (!sched_cache_enabled())
+		return false;
+
+	if (task_has_sched_core(p))
+		return false;
+	/*
+	 * Skip over tasks that would degrade LLC locality;
+	 * only when nr_balance_failed is sufficiently high do we
+	 * ignore this constraint.
+	 *
+	 * The threshold is cache_nice_tries + 1, one higher than
+	 * the usual nr_balance_failed check, to avoid triggering
+	 * excessive task migration at the same time.
+	 */
+	if (env->sd->nr_balance_failed >= env->sd->cache_nice_tries + 1)
+		return false;
+
+	/*
+	 * We know env->src_cpu has some tasks that prefer to run
+	 * on env->dst_cpu; skip the tasks that do not prefer
+	 * env->dst_cpu and find one that does.
+	 */
+	if (env->migration_type == migrate_llc_task &&
+	    READ_ONCE(p->preferred_llc) != llc_id(env->dst_cpu))
+		return true;
+
+	if (can_migrate_llc_task(env->src_cpu,
+				 env->dst_cpu, p) != mig_forbid)
+		return false;
+
+	return true;
+}
+
 #else
 static inline bool get_llc_stats(int cpu, unsigned long *util,
 				 unsigned long *cap)
@@ -10169,6 +10210,12 @@ alb_break_llc(struct lb_env *env)
 {
 	return false;
 }
+
+static inline bool
+migrate_degrades_llc(struct task_struct *p, struct lb_env *env)
+{
+	return false;
+}
 #endif
 /*
  * can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
@@ -10266,10 +10313,29 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 		return 1;
 
 	degrades = migrate_degrades_locality(p, env);
-	if (!degrades)
+	if (!degrades) {
+		/*
+		 * If the NUMA locality is not broken,
+		 * further check if migration would hurt
+		 * LLC locality.
+		 */
+		if (migrate_degrades_llc(p, env)) {
+			/*
+			 * If regular load balancing fails to pull a task
+			 * due to LLC locality, this is expected behavior
+			 * and we set LBF_LLC_PINNED so we don't increase
+			 * nr_balance_failed unnecessarily.
+			 */
+			if (env->migration_type != migrate_llc_task)
+				env->flags |= LBF_LLC_PINNED;
+
+			return 0;
+		}
+
 		hot = task_hot(p, env);
-	else
+	} else {
 		hot = degrades > 0;
+	}
 
 	if (!hot || env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
 		if (hot)
@@ -12910,9 +12976,16 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 	 *
 	 * Similarly for migration_misfit which is not related to
 	 * load/util migration, don't pollute nr_balance_failed.
+	 *
+	 * The same holds for cache-aware scheduling's allowance for
+	 * load imbalance. If regular load balancing does not migrate
+	 * a task due to LLC locality, that is expected behavior;
+	 * don't pollute nr_balance_failed.
+	 * See can_migrate_task().
 	 */
 	if (idle != CPU_NEWLY_IDLE &&
-	    env.migration_type != migrate_misfit)
+	    env.migration_type != migrate_misfit &&
+	    !(env.flags & LBF_LLC_PINNED))
 		sd->nr_balance_failed++;
 
 	if (need_active_balance(&env)) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 081f23a48414..511c85572b96 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1546,6 +1546,14 @@ extern void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags);
 extern void sched_core_get(void);
 extern void sched_core_put(void);
 
+static inline bool task_has_sched_core(struct task_struct *p)
+{
+	if (sched_core_disabled())
+		return false;
+
+	return !!p->core_cookie;
+}
+
 #else /* !CONFIG_SCHED_CORE: */
 
 static inline bool sched_core_enabled(struct rq *rq)
@@ -1586,6 +1594,11 @@ static inline bool sched_group_cookie_match(struct rq *rq,
 	return true;
 }
 
+static inline bool task_has_sched_core(struct task_struct *p)
+{
+	return false;
+}
+
 #endif /* !CONFIG_SCHED_CORE */
 
 #ifdef CONFIG_RT_GROUP_SCHED
-- 
2.32.0