From nobody Fri Jun 12 12:45:32 2026
From: Ricardo Neri
Date: Mon, 08 Jun 2026 05:57:11 -0700
Subject: [PATCH v4 1/6] sched/fair: Do not skip CPUs of similar capacity
 with busy SMT siblings
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260608-rneri-fix-cas-clusters-v4-1-1526711c944c@linux.intel.com>
References: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
In-Reply-To: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Andrea Righi, K Prateek Nayak, Len Brown,
 ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Ricardo Neri
X-Mailer: b4 0.13.0

When picking a busiest CPU with only one running task, the function
sched_balance_find_src_rq() skips candidate CPUs if the destination CPU
has less than ~5% extra capacity. This condition only holds if all the
SMT siblings of a CPU are idle. SMT siblings share the computing
resources of a physical core and this results in reduced capacity if
more than one sibling is busy.

Skipping a CPU as described would prevent the load balancer from pulling
tasks from a scheduling group previously and correctly identified as
group_smt_balance (i.e., one with more than one task running).

Do not skip a candidate CPU of similar capacity if it has busy SMT
siblings.

Signed-off-by: Ricardo Neri
Reviewed-by: Christian Loehle
Reviewed-by: K Prateek Nayak
Tested-by: Christian Loehle
---
Changes in v4:
 * Introduced this patch.

Changes in v3:
 * N/A

Changes in v2:
 * N/A
---
 kernel/sched/fair.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f4ed841f766f..229f32cebf1f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12924,6 +12924,7 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 	int i;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
+		bool smt_degraded_cap = sched_smt_active() && !is_core_idle(i);
 		unsigned long capacity, load, util;
 		unsigned int nr_running;
 		enum fbq_type rt;
@@ -12964,8 +12965,12 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 		 * eventually lead to active_balancing high->low capacity.
 		 * Higher per-CPU capacity is considered better than balancing
 		 * average load.
+		 *
+		 * Busy SMT siblings reduce the capacity of CPU i. Do not skip
+		 * it in this case.
 		 */
 		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
+		    !smt_degraded_cap &&
 		    !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
 		    nr_running == 1)
 			continue;
-- 
2.43.0

From nobody Fri Jun 12 12:45:32 2026
From: Ricardo Neri
Date: Mon, 08 Jun 2026 05:57:12 -0700
Subject: [PATCH v4 2/6] sched/fair: Also gate overloaded status update for
 SD_ASYM_CPUCAPACITY
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260608-rneri-fix-cas-clusters-v4-2-1526711c944c@linux.intel.com>
References: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
In-Reply-To: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Andrea Righi, K Prateek Nayak, Len Brown,
 ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Ricardo Neri
X-Mailer: b4 0.13.0

The argument sg_overloaded of update_sg_lb_stats() is only consumed when
balancing at the root domain. It only makes sense to update it in such a
case.

Commit 3229adbe7875 ("sched/fair: Do not compute overloaded status
unnecessarily during lb") updated the logic accordingly but missed the
case in which the root domain has the SD_ASYM_CPUCAPACITY flag. Fix this.

Fixes: 3229adbe7875 ("sched/fair: Do not compute overloaded status unnecessarily during lb")
Reported-by: Chen Yu
Signed-off-by: Ricardo Neri
Tested-by: Christian Loehle
---
Changes in v4:
 * Introduced this patch.

Changes in v3:
 * N/A

Changes in v2:
 * N/A
---
 kernel/sched/fair.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 229f32cebf1f..86987c69bddd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11853,7 +11853,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			/* Check for a misfit task on the cpu */
 			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
 				sgs->group_misfit_task_load = rq->misfit_task_load;
-				*sg_overloaded = 1;
+
+				if (balancing_at_rd)
+					*sg_overloaded = 1;
 			}
 		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
 			/* Check for a task running on a CPU with reduced capacity */
-- 
2.43.0

From nobody Fri Jun 12 12:45:32 2026
From: Ricardo Neri
Date: Mon, 08 Jun 2026 05:57:13 -0700
Subject: [PATCH v4 3/6] sched/fair: Check CPU capacity before comparing group
 types during load balance
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260608-rneri-fix-cas-clusters-v4-3-1526711c944c@linux.intel.com>
References: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
In-Reply-To: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Andrea Righi, K Prateek Nayak, Len Brown,
 ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Vincent Guittot,
 Ricardo Neri
X-Mailer: b4 0.13.0

update_sd_pick_busiest() may incorrectly select a fully_busy group as the
busiest group when its per-CPU capacity exceeds that of the destination
CPU. This happens because the type of busiest group is initialized to
group_has_spare and allows the fully_busy group to win the type
comparison.

update_sd_pick_busiest() should not choose a candidate scheduling group
with at most one runnable task if its per-CPU capacity is greater than
that of the destination CPU. Such a check already exists, but it is done
too late: after the type comparison, preventing a subsequent fully_busy
group of equal per-CPU capacity from being correctly selected.

Move this check to occur before comparing group types.

Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
Reviewed-by: Christian Loehle
Reviewed-by: Chen Yu
Reviewed-by: Tim Chen
Reviewed-by: Vincent Guittot
Signed-off-by: Ricardo Neri
Tested-by: Christian Loehle
---
Changes in v4:
 * Dropped note on SMT not being affected since SMT + asym capacity is
   now supported.
 * Added Reviewed-by tags from Vincent, Tim, and Chen Yu. Thanks!

Changes in v3:
 * Added a Fixes tag. (Christian)
 * Added Reviewed-by tag from Christian. Thanks!

Changes in v2:
 * Added a note clarifying that SMT and SD_ASYM_CPUCAPACITY are mutually
   exclusive. (Tim)
 * Kept parentheses around bitwise operators for clarity.
 * Rewrote patch description for clarity.
---
 kernel/sched/fair.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 86987c69bddd..a30ba02df688 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11934,6 +11934,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	     sds->local_stat.group_type != group_has_spare))
 		return false;
 
+	/*
+	 * Candidate sg has no more than one task per CPU and has higher
+	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
+	 * throughput. Maximize throughput, power/energy consequences are not
+	 * considered.
+	 */
+	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
+	    (sgs->group_type <= group_fully_busy) &&
+	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
+		return false;
+
 	if (sgs->group_type > busiest->group_type)
 		return true;
 
@@ -12040,17 +12051,6 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 	}
 
-	/*
-	 * Candidate sg has no more than one task per CPU and has higher
-	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
-	 * throughput. Maximize throughput, power/energy consequences are not
-	 * considered.
-	 */
-	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
-	    (sgs->group_type <= group_fully_busy) &&
-	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
-		return false;
-
 	return true;
 }
 
-- 
2.43.0

From nobody Fri Jun 12 12:45:32 2026
From: Ricardo Neri
Date: Mon, 08 Jun 2026 05:57:14 -0700
Subject: [PATCH v4 4/6] sched/fair: Skip misfit load accounting when the
 destination CPU cannot help
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260608-rneri-fix-cas-clusters-v4-4-1526711c944c@linux.intel.com>
References: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
In-Reply-To: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Andrea Righi, K Prateek Nayak, Len Brown,
 ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Ricardo Neri
X-Mailer: b4 0.13.0

In domains with asymmetric capacity, identifying misfit load in a
scheduling group is not useful when the destination CPU cannot help
(i.e., its capacity exceeds the group's maximum CPU capacity by less than
~5%). In such cases, it also prevents load balance among clusters of
equal capacity when CONFIG_SCHED_CLUSTER is enabled. This happens because
update_sd_pick_busiest() skips candidate groups of type misfit_task if
the destination CPU has similar capacity.

Skipping misfit load accounting in this situation allows the group to be
classified as has_spare or fully_busy and lets load balancing proceed.

Keep marking scheduling groups as overloaded when misfit tasks are present.
The sg_overloaded flag propagates to the root domain and allows bigger
CPUs in it to help via newly idle balance.

Reviewed-by: Christian Loehle
Reviewed-by: Chen Yu
Reviewed-by: Vincent Guittot
Signed-off-by: Ricardo Neri
Tested-by: Christian Loehle
---
Changes in v4:
 * Added Reviewed-by tags from Vincent and Chen Yu. Thanks!

Changes in v3:
 * Added Reviewed-by tag from Christian. Thanks!

Changes in v2:
 * Moved the check of the destination CPU capacity inside the code block
   used for SD_ASYM_CPUCAPACITY. v1 inadvertently broke the mutual
   exclusion of the sched_reduced_capacity() path.
 * Keep marking the root domain as overloaded to allow bigger CPUs to
   help. (sashiko)
 * Fixed patch description to clarify that the capacity_greater() looks
   for differences of 5% or more. (Christian)
 * Reworded the patch description for clarity.
 * I did not include the Reviewed-by tag from Christian since the patch
   changed functionally.
---
 kernel/sched/fair.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a30ba02df688..77554d7410ff 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11850,12 +11850,25 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			continue;
 
 		if (sd_flags & SD_ASYM_CPUCAPACITY) {
-			/* Check for a misfit task on the cpu */
-			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
-				sgs->group_misfit_task_load = rq->misfit_task_load;
-
+			if (rq->misfit_task_load) {
+				/*
+				 * Always mark the root domain overloaded so big
+				 * CPUs can pick up misfit tasks via newly idle
+				 * balance.
+				 */
 				if (balancing_at_rd)
 					*sg_overloaded = 1;
+
+				/*
+				 * Only account misfit load if @dst_cpu can
+				 * help; otherwise, the group may be classified
+				 * as misfit_task and update_sd_pick_busiest()
+				 * will skip it.
+				 */
+				if (capacity_greater(capacity_of(env->dst_cpu),
+						     group->sgc->max_capacity) &&
+				    (sgs->group_misfit_task_load < rq->misfit_task_load))
+					sgs->group_misfit_task_load = rq->misfit_task_load;
 			}
 		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
 			/* Check for a task running on a CPU with reduced capacity */
-- 
2.43.0

From nobody Fri Jun 12 12:45:32 2026
From: Ricardo Neri
Date: Mon, 08 Jun 2026 05:57:15 -0700
Subject: [PATCH v4 5/6] sched/fair: Allow load balancing between CPUs of
 identical capacity
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260608-rneri-fix-cas-clusters-v4-5-1526711c944c@linux.intel.com>
References: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
In-Reply-To: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Andrea Righi, K Prateek Nayak, Len Brown,
 ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Ricardo Neri
X-Mailer: b4 0.13.0

sched_balance_find_src_rq() avoids selecting a runqueue with a single
running task as busiest if doing so results in migrating the task to a
CPU with less than ~5% of extra capacity. It also unintentionally
prevents migrations between CPUs of identical capacity.

When CONFIG_SCHED_CLUSTER is enabled, load should be balanced across
clusters of CPUs with the same capacity. Allowing migration between CPUs
of identical capacity is necessary to meet this goal.

Use arch_scale_cpu_capacity() to reflect architectural capacity,
excluding runtime reductions due to side activity or thermal pressure.
Guard this check with the sched_cluster_active static key so that systems
without cluster topology are unaffected.

Signed-off-by: Ricardo Neri
Tested-by: Christian Loehle
---
Changes in v4:
 * Implemented the check for cluster with a local variable for improved
   readability.

Changes in v3:
 * Reverted the inverted capacity check; the inverted form incorrectly
   allows migrations to CPUs of slightly less capacity.
 * Guarded the check for architectural capacity with the
   sched_cluster_active static key.
Changes in v2:
 * Used arch_scale_cpu_capacity() instead of capacity_of() to ignore
   runtime variability.
 * Inverted the check for runtime capacity. (Christian)
 * Reworded patch description for clarity.
---
 kernel/sched/fair.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 77554d7410ff..74b9669d149b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12939,6 +12939,9 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 	int i;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
+		bool same_arch_cluster = static_branch_unlikely(&sched_cluster_active) &&
+					 (arch_scale_cpu_capacity(env->dst_cpu) ==
+					  arch_scale_cpu_capacity(i));
 		bool smt_degraded_cap = sched_smt_active() && !is_core_idle(i);
 		unsigned long capacity, load, util;
 		unsigned int nr_running;
@@ -12983,9 +12986,13 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 		 *
 		 * Busy SMT siblings reduce the capacity of CPU i. Do not skip
 		 * it in this case.
+		 *
+		 * CONFIG_SCHED_CLUSTER requires balancing load across clusters
+		 * of identical capacity. Use architectural capacity to ignore
+		 * runtime variability.
 		 */
 		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
-		    !smt_degraded_cap &&
+		    !smt_degraded_cap && !same_arch_cluster &&
 		    !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
 		    nr_running == 1)
 			continue;
-- 
2.43.0

From nobody Fri Jun 12 12:45:32 2026
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 717952580F2 for ; Mon, 8 Jun 2026 12:47:22 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.15
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780922844; cv=none; b=fA2/jfIH0AjD1DqwuKnzJdCGNTgiczjCH/kQUX1+jyY/cA8TYERw9qU/nrwev9UuWW51yaE8UAtH1C1weCWrVTKLqzp7su6oDDfs0Ld0Y2F6DCrrVB3kJEdXxzlXLUoaK+vtztwCrLWT8yicsuu/Sg5Mv1qY5zHV70VPgFoGiu0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780922844; c=relaxed/simple; bh=r1Nk88pdY5S+Pk2JO2QN3zMUcMybntvPl69IwxLJ1sk=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Y0RYFDCy29r2Zto8hNrs9mqJSuAgtBHIEITkIomkyLwswPWvVHeSPaYADrnW1FEpsUCzatFalwcRU5hvF+td7c+eIp0gNd45dqvFkRMiFy0ETlNI5HQN7qdaP+3e3kIn6d9B8hpYu0Rnymn80Cw9AgeHVoZpUZdVKDM14u0iaTs=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=TMCZYBZg; arc=none smtp.client-ip=198.175.65.15
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="TMCZYBZg"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com;
i=@intel.com; q=dns/txt; s=Intel; t=1780922843; x=1812458843; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=r1Nk88pdY5S+Pk2JO2QN3zMUcMybntvPl69IwxLJ1sk=; b=TMCZYBZgKCZhlJPLw0emw0Wd2kR6ZIQggPYe128rF+2CwhVRyqOcPKmX ka5KOCJl0xU6RCLT0WPz4BHU4115/wnELs4hzN2McjEJAs9ILMRvA2Swz qka1yWHwP8rAVXc12Vw875oqntvEQTkjzhG4FELG6ge5SaTRaZRrlfM5s 6ZLsyE99GrMzj9xtwvLhDkhhe5HlpgfMv3D3q3bRxdtgE9py8xgQfCfaE v91S3PBe2l1vbQ4XVlsEw7HnJV1u2zOil6WmY9c/PDFd2J6WudbvN7WK3 7U+XzJTU30yWkLpJnHLw1WJnvibnRZZcVb1r57F3okwzAYmL0K8I7NS/o g==; X-CSE-ConnectionGUID: n1dyBckfTyKk/+52J1/1Gw== X-CSE-MsgGUID: qG7VEeJwSQ+S5xDgSXSi5w== X-IronPort-AV: E=McAfee;i="6800,10657,11810"; a="85281946" X-IronPort-AV: E=Sophos;i="6.24,194,1774335600"; d="scan'208";a="85281946" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by orvoesa107.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Jun 2026 05:47:15 -0700 X-CSE-ConnectionGUID: PwWqZgFuRe2Z387dXt+Rgw== X-CSE-MsgGUID: 4ZFUw5gDSjqtECcB2WRtNg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,194,1774335600"; d="scan'208";a="247409326" Received: from unknown (HELO [172.25.112.21]) ([172.25.112.21]) by fmviesa004.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Jun 2026 05:47:15 -0700 From: Ricardo Neri Date: Mon, 08 Jun 2026 05:57:16 -0700 Subject: [PATCH v4 6/6] sched/topology: Do not clear SD_PREFER_SIBLING in domains with clusters Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260608-rneri-fix-cas-clusters-v4-6-1526711c944c@linux.intel.com> References: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com> In-Reply-To: <20260608-rneri-fix-cas-clusters-v4-0-1526711c944c@linux.intel.com> To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , 
Mel Gorman, Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Andrea Righi, K Prateek Nayak, Len Brown, ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Ricardo Neri
X-Mailer: b4 0.13.0
X-Developer-Signature: v=1; a=ed25519-sha256; t=1780923493; l=3389; i=ricardo.neri-calderon@linux.intel.com; s=20250602; h=from:subject:message-id; bh=r1Nk88pdY5S+Pk2JO2QN3zMUcMybntvPl69IwxLJ1sk=; b=8SljkKRSGN/2KgAWz9qAfrsJ6pDAeK0zKSXf1NcYsCwe5zxQtKyuig/Af2HWPTLlmxuMAIpN9 7ZohaLlGGtrAoWjnqBPzip14NMIW6+PTH28En2DhBLI2aW1QZizG+Cw
X-Developer-Key: i=ricardo.neri-calderon@linux.intel.com; a=ed25519; pk=NfZw5SyQ2lxVfmNMaMR6KUj3+0OhcwDPyRzFDH9gY2w=

Some topologies have scheduling domains that contain CPUs of asymmetric
capacity, grouped into two or more clusters of equal-capacity CPUs
sharing an L2 cache.

When CONFIG_SCHED_CLUSTER is enabled, load must be balanced across these
resource-sharing clusters. Do not clear SD_PREFER_SIBLING in the child
domains to indicate to the load balancer that it should spread load
among cluster siblings.

Checks for capacity in update_sd_pick_busiest() prevent migrations from
high- to low-capacity CPUs if a candidate group is not overloaded.

An effect of keeping SD_PREFER_SIBLING in domains with asymmetric
capacity is that low-capacity clusters with spare capacity can now help
overloaded higher-capacity groups. This was already the case for
single-CPU groups (see calculate_imbalance() for domains with
SD_SHARE_LLC). Once the overloading condition disappears, misfit load
will still be used to move high-utilization tasks to bigger CPUs if they
have spare capacity.

Reviewed-by: Tim Chen
Signed-off-by: Ricardo Neri
Tested-by: Christian Loehle
---
Changes in v4:
 * Added Reviewed-by tag from Tim. Thanks!
Changes in v3:
 * Updated documentation of SD_PREFER_SIBLING.
 * Expanded the patch description to explain the behavior when
   overloaded groups are involved.
Changes in v2:
 * Reworded the patch description for clarity.
 * Kept parentheses around bitwise operators for clarity.
---
 include/linux/sched/sd_flags.h |  3 ++-
 kernel/sched/topology.c        | 14 ++++++++++++--
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 42839cfa2778..42f74af83b8c 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -147,7 +147,8 @@ SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
  * Prefer to place tasks in a sibling domain
  *
  * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
- * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ * flag, but cleared below domains with SD_ASYM_CPUCAPACITY if the domain does
+ * not have clusters of CPUs sharing cache.
  *
  * NEEDS_GROUPS: Load balancing flag.
  */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 622e2e01974c..f35203ed52c0 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1995,8 +1995,18 @@ sd_init(struct sched_domain_topology_level *tl,
 	/*
 	 * Convert topological properties into behaviour.
 	 */
-	/* Don't attempt to spread across CPUs of different capacities. */
-	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
+	/*
+	 * Don't attempt to spread across CPUs of different capacities.
+	 *
+	 * If the domain has clusters of CPUs sharing L2 cache, keep the flag to
+	 * spread tasks across clusters of identical capacity. Checks in
+	 * update_sd_pick_busiest() prevent task migrations from high- to low-
+	 * capacity CPUs for non-overloaded groups. Migrations to a lower-
+	 * capacity CPU can happen if a higher-capacity group is overloaded and
+	 * a low-capacity cluster has spare capacity.
+	 */
+	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
+	    !(sd->child->flags & SD_CLUSTER))
 		sd->child->flags &= ~SD_PREFER_SIBLING;
 
 	if (sd->flags & SD_SHARE_CPUCAPACITY) {
-- 
2.43.0