From nobody Fri Jun 12 12:45:08 2026
From: Ricardo Neri
Date: Thu, 14 May 2026 11:34:37 -0700
Subject: [PATCH v3 1/4] sched/fair: Check CPU capacity before comparing group types during load balance
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260514-rneri-fix-cas-clusters-v3-1-0037869554bd@linux.intel.com>
References: <20260514-rneri-fix-cas-clusters-v3-0-0037869554bd@linux.intel.com>
In-Reply-To: <20260514-rneri-fix-cas-clusters-v3-0-0037869554bd@linux.intel.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Len Brown, ricardo.neri@intel.com,
 linux-kernel@vger.kernel.org, Ricardo Neri

update_sd_pick_busiest() may incorrectly select a fully_busy group as the
busiest group when its per-CPU capacity exceeds that of the destination
CPU. This happens because the type of the busiest group is initialized to
group_has_spare, which allows the fully_busy group to win the type
comparison.

update_sd_pick_busiest() should not choose a candidate scheduling group
with at most one runnable task per CPU if its per-CPU capacity is greater
than that of the destination CPU. Such a check already exists, but it is
done too late: after the type comparison, preventing a subsequent
fully_busy group of equal per-CPU capacity from being correctly selected.

Move this check to occur before comparing group types.

Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
Reviewed-by: Christian Loehle
Signed-off-by: Ricardo Neri
Reviewed-by: Tim Chen
Reviewed-by: Vincent Guittot
---
Changes in v3:
 * Added a Fixes tag. (Christian)
 * Added Reviewed-by tag from Christian. Thanks!

Changes in v2:
 * Added a note clarifying that SMT and SD_ASYM_CPUCAPACITY are mutually
   exclusive. (Tim)
 * Kept parentheses around bitwise operators for clarity.
 * Rewrote patch description for clarity.
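
Illustration (not part of the patch): the effect of moving the capacity
check can be reproduced with a small userspace model. This is a minimal
sketch under stated assumptions, not the kernel code: struct group_stats,
capacity_veto(), pick() and find_busiest() are hypothetical names, only
three group types are modeled, SD_ASYM_CPUCAPACITY is assumed to be set,
and the same-type tie-break is reduced to an avg_load comparison.
capacity_greater() uses the same ~5% margin as kernel/sched/fair.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Subset of the kernel's group types, kept in the same relative order. */
enum group_type {
	group_has_spare = 0,
	group_fully_busy,
	group_overloaded,
};

/* Same ~5% margin as capacity_greater() in kernel/sched/fair.c. */
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/* Hypothetical, reduced stand-in for the per-group statistics. */
struct group_stats {
	enum group_type type;
	unsigned long min_capacity;
	unsigned long avg_load;
};

/* The check this patch moves; SD_ASYM_CPUCAPACITY is assumed to be set. */
static bool capacity_veto(const struct group_stats *sg, unsigned long dst_cap)
{
	return sg->type <= group_fully_busy &&
	       capacity_greater(sg->min_capacity, dst_cap);
}

/*
 * Model of the candidate-vs-busiest decision. check_first selects the
 * ordering from this patch; otherwise the check runs last, as before.
 */
static bool pick(const struct group_stats *sg,
		 const struct group_stats *busiest,
		 unsigned long dst_cap, bool check_first)
{
	if (check_first && capacity_veto(sg, dst_cap))
		return false;

	if (sg->type > busiest->type)
		return true;
	if (sg->type < busiest->type)
		return false;

	/* Simplified tie-break (assumption): higher avg_load wins. */
	if (sg->avg_load <= busiest->avg_load)
		return false;

	if (!check_first && capacity_veto(sg, dst_cap))
		return false;

	return true;
}

/* Returns the index of the selected busiest group, or -1 if none. */
static int find_busiest(const struct group_stats *groups, size_t n,
			unsigned long dst_cap, bool check_first)
{
	/* The busiest group starts out as group_has_spare. */
	struct group_stats busiest = { .type = group_has_spare };
	int idx = -1;

	for (size_t i = 0; i < n; i++) {
		if (pick(&groups[i], &busiest, dst_cap, check_first)) {
			busiest = groups[i];
			idx = (int)i;
		}
	}
	return idx;
}
```

With a destination CPU of capacity 512 and two fully_busy candidates,
{min_capacity 1024, avg_load 900} followed by {min_capacity 512,
avg_load 600}, find_busiest() returns index 0 when check_first is false
(the old ordering) and index 1 when check_first is true (the ordering in
this patch).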
---
 kernel/sched/fair.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f982..e06e74d9ce0e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10818,6 +10818,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	     sds->local_stat.group_type != group_has_spare))
 		return false;
 
+	/*
+	 * Candidate sg has no more than one task per CPU and has higher
+	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
+	 * throughput. Maximize throughput, power/energy consequences are not
+	 * considered.
+	 *
+	 * Systems with SMT are unaffected, as asymmetric capacity is not set
+	 * in such cases.
+	 */
+	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
+	    (sgs->group_type <= group_fully_busy) &&
+	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
+		return false;
+
 	if (sgs->group_type > busiest->group_type)
 		return true;
 
@@ -10920,17 +10934,6 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 	}
 
-	/*
-	 * Candidate sg has no more than one task per CPU and has higher
-	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
-	 * throughput. Maximize throughput, power/energy consequences are not
-	 * considered.
-	 */
-	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
-	    (sgs->group_type <= group_fully_busy) &&
-	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
-		return false;
-
 	return true;
 }
 
-- 
2.43.0

From nobody Fri Jun 12 12:45:08 2026
From: Ricardo Neri
Date: Thu, 14 May 2026 11:34:38 -0700
Subject: [PATCH v3 2/4] sched/fair: Skip misfit load accounting when the destination CPU cannot help
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260514-rneri-fix-cas-clusters-v3-2-0037869554bd@linux.intel.com>
References: <20260514-rneri-fix-cas-clusters-v3-0-0037869554bd@linux.intel.com>
In-Reply-To: <20260514-rneri-fix-cas-clusters-v3-0-0037869554bd@linux.intel.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Len Brown, ricardo.neri@intel.com,
 linux-kernel@vger.kernel.org, Ricardo Neri

In domains with asymmetric capacity, identifying misfit load in a
scheduling group is not useful when the destination CPU cannot help (i.e.,
its capacity does not exceed the group's maximum CPU capacity by at least
~5%). In such cases, it also prevents load balancing among clusters of
equal capacity when CONFIG_SCHED_CLUSTER is enabled. This happens because
update_sd_pick_busiest() skips candidate groups of type misfit_task if the
destination CPU has similar capacity.

Skipping misfit load accounting in this situation allows the group to be
classified as has_spare or fully_busy and lets load balancing proceed.

Keep marking scheduling groups as overloaded when misfit tasks are present.
The sg_overloaded flag propagates to the root domain and allows bigger CPUs
in it to help via newly idle balance.

Reviewed-by: Christian Loehle
Signed-off-by: Ricardo Neri
Reviewed-by: Chen Yu
Reviewed-by: Vincent Guittot
---
Changes in v3:
 * Added Reviewed-by tag from Christian. Thanks!

Changes in v2:
 * Moved the check of the destination CPU capacity inside the code block
   used for SD_ASYM_CPUCAPACITY. v1 inadvertently broke the mutual
   exclusion of the sched_reduced_capacity() path.
 * Keep marking the root domain as overloaded to allow bigger CPUs to help.
   (sashiko)
 * Fixed patch description to clarify that capacity_greater() looks for
   differences of 5% or more. (Christian)
 * Reworded the patch description for clarity.
 * I did not include the Reviewed-by tag from Christian since the patch
   changed functionally.
---
 kernel/sched/fair.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e06e74d9ce0e..dcc02ceb44b5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10749,10 +10749,24 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			continue;
 
 		if (sd_flags & SD_ASYM_CPUCAPACITY) {
-			/* Check for a misfit task on the cpu */
-			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
-				sgs->group_misfit_task_load = rq->misfit_task_load;
+			if (rq->misfit_task_load) {
+				/*
+				 * Always mark the domain overloaded so big CPUs
+				 * can pick up misfit tasks via newly idle
+				 * balance.
+				 */
 				*sg_overloaded = 1;
+
+				/*
+				 * Only account misfit load if @dst_cpu can
+				 * help; otherwise, the group may be classified
+				 * as misfit_task and update_sd_pick_busiest()
+				 * will skip it.
+				 */
+				if (capacity_greater(capacity_of(env->dst_cpu),
+						     group->sgc->max_capacity) &&
+				    (sgs->group_misfit_task_load < rq->misfit_task_load))
+					sgs->group_misfit_task_load = rq->misfit_task_load;
 			}
 		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
 			/* Check for a task running on a CPU with reduced capacity */
-- 
2.43.0

From nobody Fri Jun 12 12:45:08 2026
From: Ricardo Neri
Date: Thu, 14 May 2026 11:34:39 -0700
Subject: [PATCH v3 3/4] sched/fair: Allow load balancing between CPUs of identical capacity
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260514-rneri-fix-cas-clusters-v3-3-0037869554bd@linux.intel.com>
References: <20260514-rneri-fix-cas-clusters-v3-0-0037869554bd@linux.intel.com>
In-Reply-To: <20260514-rneri-fix-cas-clusters-v3-0-0037869554bd@linux.intel.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Len Brown, ricardo.neri@intel.com,
 linux-kernel@vger.kernel.org, Ricardo Neri

sched_balance_find_src_rq() avoids selecting a runqueue with a single
running task as busiest if doing so results in migrating the task to a CPU
with less than ~5% of extra capacity. It also unintentionally prevents
migrations between CPUs of identical capacity.

When CONFIG_SCHED_CLUSTER is enabled, load should be balanced across
clusters of CPUs with the same capacity. Allowing migration between CPUs of
identical capacity is necessary to meet this goal.

Use arch_scale_cpu_capacity() to reflect architectural capacity, excluding
runtime reductions due to side activity or thermal pressure. Guard this
check with the sched_cluster_active static key so that systems without
cluster topology are unaffected.

Signed-off-by: Ricardo Neri
---
Changes in v3:
 * Reverted the inverted capacity check; the inverted form incorrectly
   allows migrations to CPUs of slightly less capacity.
 * Guarded the check for architectural capacity with the
   sched_cluster_active static key.

Changes in v2:
 * Used arch_scale_cpu_capacity() instead of capacity_of() to ignore
   runtime variability.
 * Inverted the check for runtime capacity.
   (Christian)
 * Reworded patch description for clarity.
---
 kernel/sched/fair.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dcc02ceb44b5..d2a4c529f67f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11846,8 +11846,14 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 		 * eventually lead to active_balancing high->low capacity.
 		 * Higher per-CPU capacity is considered better than balancing
 		 * average load.
+		 *
+		 * CONFIG_SCHED_CLUSTER requires balancing load across clusters
+		 * of identical capacity. Use architectural capacity to ignore
+		 * runtime variability.
 		 */
 		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
+		    (!static_branch_unlikely(&sched_cluster_active) ||
+		     arch_scale_cpu_capacity(env->dst_cpu) != arch_scale_cpu_capacity(i)) &&
 		    !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
 		    nr_running == 1)
 			continue;
-- 
2.43.0

From nobody Fri Jun 12 12:45:08 2026
From: Ricardo Neri
Date: Thu, 14 May 2026 11:34:40 -0700
Subject: [PATCH v3 4/4] sched/topology: Do not clear SD_PREFER_SIBLING in domains with clusters
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260514-rneri-fix-cas-clusters-v3-4-0037869554bd@linux.intel.com>
References: <20260514-rneri-fix-cas-clusters-v3-0-0037869554bd@linux.intel.com>
In-Reply-To: <20260514-rneri-fix-cas-clusters-v3-0-0037869554bd@linux.intel.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J. Wysocki", Len Brown, ricardo.neri@intel.com,
 linux-kernel@vger.kernel.org, Ricardo Neri

Some topologies have scheduling domains that contain CPUs of asymmetric
capacity, grouped into two or more clusters of equal-capacity CPUs sharing
an L2 cache. When CONFIG_SCHED_CLUSTER is enabled, load must be balanced
across these resource-sharing clusters. Do not clear SD_PREFER_SIBLING in
the child domains to indicate to the load balancer that it should spread
load among cluster siblings.

Checks for capacity in update_sd_pick_busiest() prevent migrations from
high- to low-capacity CPUs if a candidate group is not overloaded. An
effect of keeping SD_PREFER_SIBLING in domains with asymmetric capacity is
that low-capacity clusters with spare capacity can now help overloaded
higher-capacity groups. This was already the case for single-CPU groups
(see calculate_imbalance() for domains with SD_SHARE_LLC). Once the
overloading condition disappears, misfit load will still be used to move
high-utilization tasks to bigger CPUs if they have spare capacity.

Signed-off-by: Ricardo Neri
Reviewed-by: Tim Chen
---
Changes in v3:
 * Updated documentation of SD_PREFER_SIBLING.
 * Expanded the patch description to explain the behavior when overloaded
   groups are involved.

Changes in v2:
 * Reworded the patch description for clarity.
 * Kept parentheses around bitwise operators for clarity.
---
 include/linux/sched/sd_flags.h |  3 ++-
 kernel/sched/topology.c        | 14 ++++++++++++--
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 42839cfa2778..42f74af83b8c 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -147,7 +147,8 @@ SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
  * Prefer to place tasks in a sibling domain
  *
  * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
- * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ * flag, but cleared below domains with SD_ASYM_CPUCAPACITY if the domain does
+ * not have clusters of CPUs sharing cache.
  *
  * NEEDS_GROUPS: Load balancing flag.
  */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5847b83d9d55..a1d048344ea1 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1723,8 +1723,18 @@ sd_init(struct sched_domain_topology_level *tl,
 	/*
 	 * Convert topological properties into behaviour.
 	 */
-	/* Don't attempt to spread across CPUs of different capacities. */
-	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
+	/*
+	 * Don't attempt to spread across CPUs of different capacities.
+	 *
+	 * If the domain has clusters of CPUs sharing L2 cache, keep the flag to
+	 * spread tasks across clusters of identical capacity. Checks in
+	 * update_sd_pick_busiest() prevent task migrations from high- to low-
+	 * capacity CPUs for non-overloaded groups. Migrations to a lower-
+	 * capacity CPU can happen if a higher-capacity group is overloaded and
+	 * a low-capacity cluster has spare capacity.
+	 */
+	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
+	    !(sd->child->flags & SD_CLUSTER))
 		sd->child->flags &= ~SD_PREFER_SIBLING;
 
 	if (sd->flags & SD_SHARE_CPUCAPACITY) {
-- 
2.43.0
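
Illustration (not part of the patch): the flag handling changed in
sd_init() can be exercised in isolation. The sketch below is a userspace
model under stated assumptions: struct model_domain and
update_prefer_sibling() are hypothetical names, and the SD_* bit values
are made up for the example (the kernel generates the real values from
include/linux/sched/sd_flags.h). Only the condition itself is taken from
the hunk above.

```c
#include <stddef.h>

/* Illustrative bit values only; these are not the kernel's values. */
#define SD_ASYM_CPUCAPACITY	0x1u
#define SD_PREFER_SIBLING	0x2u
#define SD_CLUSTER		0x4u

/* Hypothetical, reduced stand-in for struct sched_domain. */
struct model_domain {
	unsigned int flags;
	struct model_domain *child;
};

/*
 * Clear SD_PREFER_SIBLING in the child of an asymmetric-capacity domain,
 * unless the child is a cluster domain. This mirrors the condition added
 * to sd_init() by this patch.
 */
static void update_prefer_sibling(struct model_domain *sd)
{
	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child &&
	    !(sd->child->flags & SD_CLUSTER))
		sd->child->flags &= ~SD_PREFER_SIBLING;
}
```

In this model, a child domain that has SD_CLUSTER set keeps
SD_PREFER_SIBLING under an asymmetric-capacity parent, while a child
without SD_CLUSTER loses it, as it did before this patch.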