From nobody Tue Jun 16 18:01:11 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C59E23AE6FB for ; Wed, 29 Apr 2026 21:21:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777497709; cv=none; b=A9n0dio3aKcx4A16VUltCF5HGPT4NSNB5BAta7QAtTy8e0gMKyiT/RJfy1MX4lwrUpUMfl6ZcyLe45TrmGeBum7QWmYc9XbhhbKhQW5siUiIm/eZ7wPGaleROsHNYYzezNkrAl/VKpAz/7g4I9teaa/frd7L075yCy1pHNGC9/w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777497709; c=relaxed/simple; bh=KOLfANa1cv8HB7c3O7D+HVteEa95bW0wXp59YXDxhBI=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=IHjAGA3Abk5N5ThmvSMPHL+F5Mw+MIYNQNZ+KT4I8dB27AbDApkOgqVSF3AmObhGsOMVgnTSDbX4sxVM5IP1zQxGV4YSiqpsAiYM0WdJH4TKEppWJF1XiFsAhVmwDimTlVUPwY+la40gjfk204ycGlXygh2n3ckNoHMZtL+jIZs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=UVwus8xr; arc=none smtp.client-ip=198.175.65.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="UVwus8xr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1777497708; x=1809033708; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=KOLfANa1cv8HB7c3O7D+HVteEa95bW0wXp59YXDxhBI=; 
b=UVwus8xrZEQa9YKAQqC4y3JAIYKuUgAsBTr8bmtp2Yn4or8bedlFFsoF C6U8zS4lm/tA3VFsPDujSfFHDBD5r/rxa5OuMGOGBECGr0heqkt4CQLX3 FDfcS0cbauIgSKvUXrs+nee6pWiq1WWBAdPVd6LXJ4xuUtcwdcUw/x2ay m0KpzpOw60ET2A7+oqaaZ9NFl/PlJM2M2EMCN5aBiY1e5Ja13hVpYTCsd p/vq+b6+8jAuLI/MAacZnaqqCmxIR83DzAUhoMw/eAYFZOQXFK3jb4tjn RSsM8RGringnZlt8ebVJy7OQc260lE+4EDk4DBczF5xXFZkKAwWzBm/jP g==; X-CSE-ConnectionGUID: tbqij4K/ToG/y0deNuZDmg== X-CSE-MsgGUID: NsevcDZ6TYyUuaHd6FwoDA== X-IronPort-AV: E=McAfee;i="6800,10657,11771"; a="88748734" X-IronPort-AV: E=Sophos;i="6.23,206,1770624000"; d="scan'208";a="88748734" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by orvoesa103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2026 14:21:44 -0700 X-CSE-ConnectionGUID: p9N1QvrLS/+N25WG6bsXzg== X-CSE-MsgGUID: 5SlK89CpQ7iStyDDrIxMIA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,206,1770624000"; d="scan'208";a="234260025" Received: from unknown (HELO [172.25.112.21]) ([172.25.112.21]) by orviesa008.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2026 14:21:44 -0700 From: Ricardo Neri Date: Wed, 29 Apr 2026 14:19:44 -0700 Subject: [PATCH v2 1/4] sched/fair: Check CPU capacity before comparing group types during load balance Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260429-rneri-fix-cas-clusters-v2-1-cd787de35cc6@linux.intel.com> References: <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com> In-Reply-To: <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com> To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Tim C Chen , Chen Yu , Christian Loehle , Barry Song Cc: "Rafael J. 
Wysocki" , Len Brown , ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Ricardo Neri X-Mailer: b4 0.13.0 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777497633; l=2602; i=ricardo.neri-calderon@linux.intel.com; s=20250602; h=from:subject:message-id; bh=KOLfANa1cv8HB7c3O7D+HVteEa95bW0wXp59YXDxhBI=; b=oOci730FgZeICJvOGAwT/2o23TImg5FhO0WoTAVM6NQmioiM0RBABJ4sj3CQV91XsqzlbS1lt jRc94ajq5T9Dv19POsjCC5+mLeVsbk8TGa6cRdAE0iFIXgDCfgeNVYP X-Developer-Key: i=ricardo.neri-calderon@linux.intel.com; a=ed25519; pk=NfZw5SyQ2lxVfmNMaMR6KUj3+0OhcwDPyRzFDH9gY2w=

update_sd_pick_busiest() may incorrectly select a fully_busy group as the
busiest group when its per-CPU capacity exceeds that of the destination
CPU. This happens because the type of busiest group is initialized to
group_has_spare and allows the fully_busy group to win the type comparison.

update_sd_pick_busiest() should not choose a candidate scheduling group
with at most one runnable task if its per-CPU capacity is greater than that
of the destination CPU. Such a check already exists, but it is done too
late: after the type comparison, preventing a subsequent fully_busy group
of equal per-CPU capacity from being correctly selected.

Move this check to occur before comparing group types.

Signed-off-by: Ricardo Neri
Reviewed-by: Christian Loehle
---
Changes since v1:
 * Added a note clarifying that SMT and SD_ASYM_CPUCAPACITY are mutually
   exclusive. (Tim)
 * Kept parentheses around bitwise operators for clarity.
 * Rewrote patch description for clarity.
---
 kernel/sched/fair.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 728965851842..0dbed82aa63f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10788,6 +10788,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	     sds->local_stat.group_type != group_has_spare))
 		return false;
 
+	/*
+	 * Candidate sg has no more than one task per CPU and has higher
+	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
+	 * throughput. Maximize throughput, power/energy consequences are not
+	 * considered.
+	 *
+	 * Systems with SMT are unaffected, as asymmetric capacity is not set
+	 * in such case.
+	 */
+	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
+	    (sgs->group_type <= group_fully_busy) &&
+	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
+		return false;
+
 	if (sgs->group_type > busiest->group_type)
 		return true;
 
@@ -10890,17 +10904,6 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 	}
 
-	/*
-	 * Candidate sg has no more than one task per CPU and has higher
-	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
-	 * throughput. Maximize throughput, power/energy consequences are not
-	 * considered.
-	 */
-	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
-	    (sgs->group_type <= group_fully_busy) &&
-	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
-		return false;
-
 	return true;
 }
 
-- 
2.43.0

From nobody Tue Jun 16 18:01:11 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0D2633AE70F for ; Wed, 29 Apr 2026 21:21:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777497709; cv=none; b=IUpfEmbOzmUzkfrXL5pc03OCeSId5HmhALX0HT/NM8K0wNAQWHvvStEydsHz8vF7XhKcergmA/+RF0H2zvEdcg9YDoJfbHs/HzSAHlVfQYrxT8ddixG2MfEKcJgRrULkT3gsSQHzjQygAzyorN+BnrGZNHfnUYW7J6s/dsat8GU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777497709; c=relaxed/simple; bh=n59Y7KiXKIGznfBdAZRxsuP0rvxNrHdaHG5xC2zO6K8=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=NxM6Gc/rJbMGreHGBdJDHcYR+8V3+uUan+c9Ot7yP+MQ1wDcE0331SW9Oz13HPHCeUjgpb0G3n+TeQn/BuozSZF8OJK47HckgCvHRavb4pAVpRDQHbMiIuGlvBN92KDVUYqmgzmbbB7sds++1/b73/DwTQjNt7GbdF6cci9sQ6U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=mzwzl5FN; arc=none smtp.client-ip=198.175.65.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="mzwzl5FN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; 
i=@intel.com; q=dns/txt; s=Intel; t=1777497708; x=1809033708; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=n59Y7KiXKIGznfBdAZRxsuP0rvxNrHdaHG5xC2zO6K8=; b=mzwzl5FNSHtC1qWYescQAynebINA7bXXtQLLLbQ4M9IIY0inqWALDBJy G8j791sGM3fRux7n46HqUw7QVOCDzuw2Tfnf/mq722tZzp5+TEW6SmM4s J87SEJHHwpWXeE6CAO4IT34g07HvxHooaMLwPFOwKXho8m22P7qWuXmWc Mc44gqEFtmszqFpUko/8DDxsHMuyy4EpgBz2ruoTLbLpl6gwjq2scpXQr qQwc0Lb8/JrAfJXIUtyJyBjbh7q6tHIV3UUwExbSVWeNlDfd9lsb7Vzzo vFfeTqjyetzdLq46Eaj0vdtTVoQQu0bnZGMTER9wvg0A+LDZTmGfR4z4x A==; X-CSE-ConnectionGUID: iZZDEh6HSr+iuipop0oiig== X-CSE-MsgGUID: oGO49mtPTWGBhSpP6Wo80A== X-IronPort-AV: E=McAfee;i="6800,10657,11771"; a="88748743" X-IronPort-AV: E=Sophos;i="6.23,206,1770624000"; d="scan'208";a="88748743" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by orvoesa103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2026 14:21:44 -0700 X-CSE-ConnectionGUID: yStbbOrqQzqFBeRHhO4YZA== X-CSE-MsgGUID: TT/OEYH2RkqzQzo0JngkEg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,206,1770624000"; d="scan'208";a="234260030" Received: from unknown (HELO [172.25.112.21]) ([172.25.112.21]) by orviesa008.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2026 14:21:44 -0700 From: Ricardo Neri Date: Wed, 29 Apr 2026 14:19:45 -0700 Subject: [PATCH v2 2/4] sched/fair: Skip misfit load accounting when the destination CPU cannot help Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260429-rneri-fix-cas-clusters-v2-2-cd787de35cc6@linux.intel.com> References: <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com> In-Reply-To: <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com> To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben 
Segall , Mel Gorman , Valentin Schneider , Tim C Chen , Chen Yu , Christian Loehle , Barry Song Cc: "Rafael J. Wysocki" , Len Brown , ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Ricardo Neri X-Mailer: b4 0.13.0 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777497633; l=2786; i=ricardo.neri-calderon@linux.intel.com; s=20250602; h=from:subject:message-id; bh=n59Y7KiXKIGznfBdAZRxsuP0rvxNrHdaHG5xC2zO6K8=; b=dmzpUn7SXtqQlOLlJIVIcTViBQVi1fzj4lfXnDchKbx/kLS9yyI0fjMX1mF/98erUiPanTHRD YwwjjTJr7rKDQNZb4sZugtyIFLiEqZPW5rfH8rikNFTY0Zctx08CR4l X-Developer-Key: i=ricardo.neri-calderon@linux.intel.com; a=ed25519; pk=NfZw5SyQ2lxVfmNMaMR6KUj3+0OhcwDPyRzFDH9gY2w=

In domains with asymmetric capacity, identifying misfit load in a
scheduling group is not useful when the destination CPU cannot help (i.e.,
its capacity does not exceed the group's maximum CPU capacity by more than
~5%). In such cases, it also prevents load balance among clusters of equal
capacity when CONFIG_SCHED_CLUSTER is enabled. This happens because
update_sd_pick_busiest() skips candidate groups of type misfit_task if the
destination CPU has similar capacity.

Skipping misfit load accounting in this situation allows the group to be
classified as has_spare or fully_busy and lets load balancing proceed.

Keep marking scheduling groups as overloaded when misfit tasks are present.
This flag propagates to the root domain and allows bigger CPUs in it to
help via newly idle balance.

Signed-off-by: Ricardo Neri
Reviewed-by: Christian Loehle
---
Changes since v1:
 * Moved the check of the destination CPU capacity inside the code block
   used for SD_ASYM_CPUCAPACITY. v1 inadvertently broke the mutual
   exclusion of the sched_reduced_capacity() path.
 * Keep marking the root domain as overloaded to allow bigger CPUs to
   help. (sashiko)
 * Fixed patch description to clarify that capacity_greater() looks for
   differences of 5% or more. (Christian)
 * Reworded the patch description for clarity.
 * I did not include the Reviewed-by tag from Christian since the patch
   changed functionally.
---
 kernel/sched/fair.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0dbed82aa63f..166a5b109e0e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10719,10 +10719,24 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			continue;
 
 		if (sd_flags & SD_ASYM_CPUCAPACITY) {
-			/* Check for a misfit task on the cpu */
-			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
-				sgs->group_misfit_task_load = rq->misfit_task_load;
+			if (rq->misfit_task_load) {
+				/*
+				 * Always mark the domain overloaded so big CPUs
+				 * can pick up misfit tasks via newly idle
+				 * balance.
+				 */
 				*sg_overloaded = 1;
+
+				/*
+				 * Only account misfit load if @dst_cpu can
+				 * help, otherwise the group may be classified
+				 * as misfit_task and update_sd_pick_busiest()
+				 * will skip it.
+				 */
+				if (capacity_greater(capacity_of(env->dst_cpu),
+						     group->sgc->max_capacity) &&
+				    (sgs->group_misfit_task_load < rq->misfit_task_load))
+					sgs->group_misfit_task_load = rq->misfit_task_load;
 			}
 		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
 			/* Check for a task running on a CPU with reduced capacity */
-- 
2.43.0

From nobody Tue Jun 16 18:01:11 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6658E3B0AC4 for ; Wed, 29 Apr 2026 21:21:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777497710; cv=none; b=sbs3dmlsaTu1ud0GFiVqOY2p1oo+z683r++IShZCPqAmgGn/T4u+z1ObaLu9bYAVeDERSKnbSbXVz/0Dk22mKGbsK9td1b4wd6c+enEoZydscNI2A7J9OrRAvx5vsuorQz+S8BM73+si3ysUUMQs9ZAH9VMlBZUZhLHRTKUDl+8= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777497710; c=relaxed/simple; bh=3N8DdPsZ0XnP6ME8/dycbVhxXqjJ98aGEwzqEAPebNs=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=FN5ztCMk4QG6vBXJzDwZa1AOAw1upCWYP/dBvgYWTbh5OIhDyuxG0+YYqhs75RMD4IAtwSnnuYJzQ5vlp6zCTc842FWJCcBU9iTKDb/orXq4PrI3mDfiTjmahmWu0eTDac9xc/nxlvalsZ11NX+nUf9ebZ3EQSD/PfLdv2ph5DM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=KywlqMkJ; arc=none smtp.client-ip=198.175.65.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="KywlqMkJ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1777497709; x=1809033709; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=3N8DdPsZ0XnP6ME8/dycbVhxXqjJ98aGEwzqEAPebNs=; b=KywlqMkJok2EcO2y5Buhg1cpmCWhN3uaS4DoNvzeUDExch0Px73ZPsuD A69G7agoN3gdRIxwqGKd5sZDs7ZW4t9GkbsUY8MjlQRdm3oX1Lh2tGxw5 gAzZw1LJRD/Ghj45bCbatSoGmQF5RjoaQAFtIIPbncXHVmhSdzAK6hEb2 KHkGhlzWrJNFLCMp5YXMXgOWoYXv9b2JKJWJ7zs1BlTHfwePM7QtLmhS7 fJUQ7K3y8vtSSrYGjK64t3c8ZCnC9ACaVlB42smxVgClhhYwSc+Hq8sTh KCCgmLGRk1oOTnWpR7RTqEq2TiUA2ZXP2Qap8tPTAuCIMoTvhQUajVGTh g==; X-CSE-ConnectionGUID: PiTjJQ54Scy9+RIZPRnkQQ== X-CSE-MsgGUID: pMwiyj+9R1WtWdA8JGqT1A== X-IronPort-AV: E=McAfee;i="6800,10657,11771"; a="88748752" X-IronPort-AV: E=Sophos;i="6.23,206,1770624000"; d="scan'208";a="88748752" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by orvoesa103.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2026 14:21:45 -0700 X-CSE-ConnectionGUID: U9d91G/EQryT8PoiFlYiCA== X-CSE-MsgGUID: bW4+Ws/cT/es3/xrzMCJmQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,206,1770624000"; d="scan'208";a="234260035" Received: from unknown (HELO [172.25.112.21]) ([172.25.112.21]) by orviesa008.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2026 14:21:45 -0700 From: Ricardo Neri Date: Wed, 29 Apr 2026 14:19:46 -0700 Subject: [PATCH v2 3/4] sched/fair: Allow load balancing between CPUs of identical capacity Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260429-rneri-fix-cas-clusters-v2-3-cd787de35cc6@linux.intel.com> References: <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com> In-Reply-To: <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com> To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Tim C Chen , Chen Yu , Christian Loehle , Barry Song Cc: "Rafael J. Wysocki" , Len Brown , ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Ricardo Neri X-Mailer: b4 0.13.0 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777497633; l=1972; i=ricardo.neri-calderon@linux.intel.com; s=20250602; h=from:subject:message-id; bh=3N8DdPsZ0XnP6ME8/dycbVhxXqjJ98aGEwzqEAPebNs=; b=C5+xVatnVOe2mB6h09hIWx9VD/qnY/kGyLRr1u9HOdkcd7jLQCIx8oLt0w5hCFhlE8FUd9neB Bk7cdmnCsjlCKNQG5O1nhoVig+B1Q3gPktnzOqANtZziq+icz+Lvf/w X-Developer-Key: i=ricardo.neri-calderon@linux.intel.com; a=ed25519; pk=NfZw5SyQ2lxVfmNMaMR6KUj3+0OhcwDPyRzFDH9gY2w=

sched_balance_find_src_rq() avoids selecting a runqueue with a single
running task as busiest if doing so results in migrating the task to a CPU
with less than ~5% of extra capacity. It also unintentionally prevents
migrations between CPUs of identical capacity.

When CONFIG_SCHED_CLUSTER is enabled, load should be balanced across
clusters of CPUs with the same capacity. Allowing migration between CPUs of
identical capacity is necessary to meet this goal.

We are interested in the architectural capacity of the involved CPUs,
excluding any reductions due to side activity or thermal pressure. Use
arch_scale_cpu_capacity().

While here, invert the check for runtime capacity for clarity.

Signed-off-by: Ricardo Neri
---
Changes since v1:
 * Used arch_scale_cpu_capacity() instead of capacity_of() to ignore
   runtime variability.
 * Inverted the check for runtime capacity. (Christian)
 * Reworded patch description for clarity.
---
 kernel/sched/fair.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 166a5b109e0e..4105717e64fe 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11816,9 +11816,14 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 		 * eventually lead to active_balancing high->low capacity.
 		 * Higher per-CPU capacity is considered better than balancing
 		 * average load.
+		 *
+		 * Cluster scheduling requires balancing load across clusters
+		 * of identical capacity. Use architectural capacity to ignore
+		 * runtime variability.
 		 */
 		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
-		    !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
+		    arch_scale_cpu_capacity(env->dst_cpu) != arch_scale_cpu_capacity(i) &&
+		    capacity_greater(capacity, capacity_of(env->dst_cpu)) &&
 		    nr_running == 1)
 			continue;
 
-- 
2.43.0

From nobody Tue Jun 16 18:01:11 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 932D13B19D5 for ; Wed, 29 Apr 2026 21:21:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777497712; cv=none; b=Mes1/mM8m5hg3fhbbKHPmeyl67uGHHy7vF334rjuwe1QjB0J9WwgxlEq7rHQPgQrLtwVtMPVAcyT0MrKW3CKEzsV7mKDym4NP05QT1ozuDaAQI44e9O5C3tEcWtBJmW/RB/TmNfSfbH40oAGq+qnfw6hANbXa6SbI6PwQ8IVYJA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777497712; c=relaxed/simple; bh=cmMXLhbK3Dzeq/KxSr+aCDYgrcwnpbbpcXbN98tD2HI=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=LmdWiC2SwYl51KPlfMd6kZCemid49pudH8kksFTsxPSdoHwL22HgPuTwJ+slTBXsfGIlthEbTkDdn6OiEkYqKUXD3CfHGHvIA9ErdK8varURq9u9nZ+PssY3INCYyytJhV2n50yZEnssfp89TYTFWORynEGH9zgb87W6ZHfGLII= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=eq/OsbqN; arc=none smtp.client-ip=198.175.65.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="eq/OsbqN" 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1777497709; x=1809033709; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=cmMXLhbK3Dzeq/KxSr+aCDYgrcwnpbbpcXbN98tD2HI=; b=eq/OsbqNOdJM2cv3dGcz7e5fRG3KSkYGIfgDpENSvarZcbD2YX2gL5am CwpzYEhiMrAKcqk88nSh+NuwzGRV1r+/6ScNncqoD5bEhcugcFSNrqIi7 POyRUan8a+5TzkAzD4JIn2cDU5pq6qAXbRU9sNPNiK+s/AhhmJrKPYW6n QdkWTOM4QFz5NdRyg2Y2J7Bs3bAw2n57kbxmbx70UGYvl4ZZcfauzHarx D/l12rGDC13ykx0LfO2eJWpNnEchbzbrbWe6ksSbZeI9AUs5z3qs2wmXH Riv6udZ5JcM1tk4+kQFyxawWWBGbfWvurgpAElZXiXqpKfTq9IcTSqFH3 Q==; X-CSE-ConnectionGUID: LsUnhGEdTXaoGmqF+KSi9g== X-CSE-MsgGUID: y75fTB8zTsmucrbuKDxuPA== X-IronPort-AV: E=McAfee;i="6800,10657,11771"; a="88748758" X-IronPort-AV: E=Sophos;i="6.23,206,1770624000"; d="scan'208";a="88748758" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by orvoesa103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2026 14:21:45 -0700 X-CSE-ConnectionGUID: hX3EjMbPSwqLwx8xxigVqA== X-CSE-MsgGUID: 8UZsK07HTxeBnTjLGHoUAA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,206,1770624000"; d="scan'208";a="234260041" Received: from unknown (HELO [172.25.112.21]) ([172.25.112.21]) by orviesa008.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2026 14:21:45 -0700 From: Ricardo Neri Date: Wed, 29 Apr 2026 14:19:47 -0700 Subject: [PATCH v2 4/4] sched/topology: Do not clear SD_PREFER_SIBLING in domains with clusters Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260429-rneri-fix-cas-clusters-v2-4-cd787de35cc6@linux.intel.com> References: <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com> In-Reply-To: <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com> To: Ingo Molnar , Peter Zijlstra , Juri Lelli , 
Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Tim C Chen , Chen Yu , Christian Loehle , Barry Song Cc: "Rafael J. Wysocki" , Len Brown , ricardo.neri@intel.com, linux-kernel@vger.kernel.org, Ricardo Neri X-Mailer: b4 0.13.0 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777497633; l=1867; i=ricardo.neri-calderon@linux.intel.com; s=20250602; h=from:subject:message-id; bh=cmMXLhbK3Dzeq/KxSr+aCDYgrcwnpbbpcXbN98tD2HI=; b=2yHRndo5DHczQBJJOBU1d6vwHYGGEeouGSn7NIRDZbL6+1cCzjv+al0C8nKN8shjrs/SJ12bn EhvWOWCyUCSDdu3cbqnIkE6aM/c2JvVDjrtgfAoppYOtp+NEzUcKZM3 X-Developer-Key: i=ricardo.neri-calderon@linux.intel.com; a=ed25519; pk=NfZw5SyQ2lxVfmNMaMR6KUj3+0OhcwDPyRzFDH9gY2w=

Some topologies have scheduling domains that contain CPUs of asymmetric
capacity, grouped into two or more clusters of equal-capacity CPUs sharing
an L2 cache. When CONFIG_SCHED_CLUSTER is enabled, load must be balanced
across these resource-sharing clusters.

Do not clear the SD_PREFER_SIBLING flag in the child domains, to indicate
to the load balancer that it should spread load among cluster siblings.
Checks for capacity in the load balancer will prevent migrations from high-
to low-capacity CPUs. Likewise, misfit load will still be used to move
high-utilization tasks to bigger CPUs.

Signed-off-by: Ricardo Neri
---
Changes since v1:
 * Reworded the patch description for clarity.
 * Kept parentheses around bitwise operators for clarity.
---
 kernel/sched/topology.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5847b83d9d55..78ffc1b8eaff 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1723,8 +1723,15 @@ sd_init(struct sched_domain_topology_level *tl,
 	/*
 	 * Convert topological properties into behaviour.
 	 */
-	/* Don't attempt to spread across CPUs of different capacities. */
-	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
+	/*
+	 * Don't attempt to spread across CPUs of different capacities. An
+	 * exception to this rule are domains in which there are clusters of
+	 * CPUs sharing a resource. Keep the flag in such case to balance load
+	 * among them. The load balancer will prevent task migrations from
+	 * high- to low-capacity CPUs.
+	 */
+	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
+	    !(sd->child->flags & SD_CLUSTER))
 		sd->child->flags &= ~SD_PREFER_SIBLING;
 
 	if (sd->flags & SD_SHARE_CPUCAPACITY) {
-- 
2.43.0
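
For readers following the series outside a kernel tree, the checks touched by patches 3/4 and 4/4 can be sketched in standalone C. This is only an illustration under stated assumptions, not kernel code: capacity_greater() is written the way I believe kernel/sched/fair.c defines it (worth re-checking against your tree), while the SD_* bit values and the helpers skip_single_task_rq() and child_flags_after_init() are hypothetical names made up for this example.

```c
#include <stdbool.h>

/*
 * Assumed to match the kernel macro: true only when cap1 exceeds cap2 by
 * more than roughly 5% (1078/1024 ~= 1.053).
 */
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/* Illustrative bit values only; the kernel generates the real SD_* flags. */
#define SD_ASYM_CPUCAPACITY 0x1u
#define SD_PREFER_SIBLING   0x2u
#define SD_CLUSTER          0x4u

/*
 * Hypothetical helper modelling the condition from patch 3/4: skip a source
 * runqueue holding a single task when the CPUs differ architecturally and
 * the source CPU is noticeably more capable than the destination.
 */
static bool skip_single_task_rq(unsigned int sd_flags,
				unsigned long src_arch_cap,
				unsigned long dst_arch_cap,
				unsigned long src_cap,
				unsigned long dst_cap,
				unsigned int nr_running)
{
	return (sd_flags & SD_ASYM_CPUCAPACITY) &&
	       dst_arch_cap != src_arch_cap &&
	       capacity_greater(src_cap, dst_cap) &&
	       nr_running == 1;
}

/*
 * Hypothetical helper modelling the rule from patch 4/4: an asymmetric
 * parent domain clears SD_PREFER_SIBLING in its child unless the child is
 * a cluster domain. Returns the resulting child flags.
 */
static unsigned int child_flags_after_init(unsigned int parent_flags,
					   unsigned int child_flags)
{
	if ((parent_flags & SD_ASYM_CPUCAPACITY) &&
	    !(child_flags & SD_CLUSTER))
		child_flags &= ~SD_PREFER_SIBLING;

	return child_flags;
}
```

With these definitions, two CPUs of identical architectural capacity are never skipped by skip_single_task_rq(), even if runtime pressure makes their current capacities differ, and a child domain flagged SD_CLUSTER keeps SD_PREFER_SIBLING under an asymmetric parent.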