This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
introducing SMT awareness.
= Problem =
Nominal per-logical-CPU capacity can overstate usable compute when an SMT
sibling is busy, because the physical core doesn't deliver its full nominal
capacity. As a result, several SD_ASYM_CPUCAPACITY paths may pick
high-capacity CPUs that are not actually good destinations.
= Proposed Solution =
This patch set aligns those paths with a simple rule already used
elsewhere: when SMT is active, prefer fully idle cores and avoid treating
partially idle SMT siblings as full-capacity targets where that would
mislead load balance.
Patch set summary:
- [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
Prefer fully-idle SMT cores in asym-capacity idle selection. In the
wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
idle selection can prefer CPUs on fully idle cores, with a safe fallback.
- [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
Provided for consistency with PATCH 1/4.
- [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
consistency with PATCH 1/4. I've also tested with
/proc/sys/kernel/sched_energy_aware both enabled and disabled (same
platform) and haven't noticed any regression.
- [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
When choosing the housekeeping CPU that runs the idle load balancer,
prefer an idle CPU on a fully idle core so migrated work lands where
effective capacity is available.
The change is still consistent with the same "avoid CPUs with busy
sibling" logic and it shows some benefits on Vera, but it could have a
negative impact on other systems; I'm including it for completeness
(feedback is appreciated).
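To make the policy in PATCH 1/4 concrete, here is a minimal userspace sketch of the selection rule, not the kernel code: `pick_idle_cpu()`, `core_fully_idle()` and the 2-threads-per-core sibling model are illustrative assumptions. Among idle CPUs whose capacity fits the task, it prefers one whose SMT sibling is also idle, and falls back to any fitting idle CPU otherwise.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8
/* Hypothetical model: CPU i and CPU i^1 are SMT siblings (2 threads/core). */
static bool cpu_idle[NR_CPUS];
static int  cpu_capacity[NR_CPUS];

static bool core_fully_idle(int cpu)
{
    return cpu_idle[cpu] && cpu_idle[cpu ^ 1];
}

/*
 * Sketch of the PATCH 1/4 policy (names are illustrative, not the
 * kernel's): among idle CPUs whose capacity fits the task, prefer an
 * idle CPU on a fully idle core; fall back to any idle CPU that fits.
 */
static int pick_idle_cpu(int task_util)
{
    int fallback = -1;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!cpu_idle[cpu] || cpu_capacity[cpu] < task_util)
            continue;
        if (core_fully_idle(cpu))
            return cpu;      /* best case: the whole core is idle */
        if (fallback < 0)
            fallback = cpu;  /* safe fallback: fits, but sibling is busy */
    }
    return fallback;
}
```

The fallback mirrors the "safe fallback" mentioned in the patch summary: when no fully idle core fits, a partially idle core is still better than nothing.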
This patch set has been tested on the new NVIDIA Vera Rubin platform, where
SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.
Without these patches, performance can drop by up to ~2x with CPU-intensive
workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not
account for busy SMT siblings.
Alternative approaches have been evaluated, such as equalizing CPU
capacities, either by exposing uniform values via firmware (ACPI/CPPC) or
normalizing them in the kernel by grouping CPUs within a small capacity
window (+/-5%) [1][2], or enabling asym packing [3].
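The capacity-window normalization evaluated in [1][2] can be sketched as follows. This is a simplified illustration, not the actual patches; `normalize_capacities()` and the integer-percentage check are assumptions. If every CPU capacity is within 5% of the maximum, equalize them all, so that small binning differences don't produce SD_ASYM_CPUCAPACITY in the first place.

```c
#include <assert.h>

#define NR_CPUS 4

/*
 * Illustrative sketch of the "capacity window" idea (assumption, not the
 * real patch): if all CPU capacities fall within 5% of the maximum,
 * normalize them to one uniform value so the topology code would not set
 * SD_ASYM_CPUCAPACITY for tiny binning differences.
 */
static void normalize_capacities(int *cap, int n)
{
    int max = 0;

    for (int i = 0; i < n; i++)
        if (cap[i] > max)
            max = cap[i];

    /* Any CPU more than 5% below the max means real asymmetry: bail out. */
    for (int i = 0; i < n; i++)
        if (cap[i] * 100 < max * 95)
            return;

    for (int i = 0; i < n; i++)
        cap[i] = max;   /* equalize: the asymmetry was just binning noise */
}
```

A big.LITTLE-style topology, where capacities legitimately differ by far more than 5%, passes through unchanged.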
However, adding SMT awareness to SD_ASYM_CPUCAPACITY has shown better
results so far. Improving this policy also seems worthwhile in general, as
other platforms in the future may enable SMT with asymmetric CPU
topologies.
[1] https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com
[2] https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com
[3] https://lore.kernel.org/all/20260325181314.3875909-1-christian.loehle@arm.com/
Andrea Righi (4):
sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
kernel/sched/fair.c | 163 +++++++++++++++++++++++++++++++++++++++++++-----
kernel/sched/topology.c | 9 ---
2 files changed, 147 insertions(+), 25 deletions(-)
Hi Andrea,

On 26.03.26 16:02, Andrea Righi wrote: [...] > This patch set has been tested on the new NVIDIA Vera Rubin platform, where > SMT is enabled and the firmware exposes small frequency variations (+/-~5%) > as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set. > > Without these patches, performance can drop up to ~2x with CPU-intensive > workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not > account for busy SMT siblings. > > Alternative approaches have been evaluated, such as equalizing CPU > capacities, either by exposing uniform values via firmware (ACPI/CPPC) or > normalizing them in the kernel by grouping CPUs within a small capacity > window (+-5%) [1][2], or enabling asympacking [3]. > > However, adding SMT awareness to SD_ASYM_CPUCAPACITY has shown better > results so far. Improving this policy also seems worthwhile in general, as > other platforms in the future may enable SMT with asymmetric CPU > topologies.

I still wonder whether we really need select_idle_capacity() (plus the smt part) for asymmetric CPU capacity systems where the CPU capacity differences are < 5% of SCHED_CAPACITY_SCALE.

The known example would be the NVIDIA Grace (!smt) server with its slightly different perf_caps.highest_perf values.

We did run DCPerf Mediawiki on this thing with:

(1) ASYM_CPUCAPACITY (default)
(2) NO ASYM_CPUCAPACITY

We also ran on a comparable ARM64 server (!smt) for comparison:

(1) ASYM_CPUCAPACITY
(2) NO ASYM_CPUCAPACITY (default)

Both systems have 72 CPUs, run v6.8 and have a single MC sched domain with LLC spanning over all 72 CPUs.
During the tests there were ~750 tasks, among them the workload related:

#hhvmworker 147
#mariadbd 204
#memcached 11
#nginx 8
#wrk 144
#ProxygenWorker 1

load_balance:

not_idle 3x more on (2)
idle 2x more on (2)
newly_idle 2-10x more on (2)

wakeup:

move_affine 2-3x more on (1)
ttwu_local 1.5-2x more on (2)

We also instrumented all the bailout conditions in select_idle_sibling() (sis())->select_idle_cpu() and select_idle_capacity() (sic()).

In (1) almost all wakeups end up in select_idle_cpu() returning -1 due to the fact that 'sd->shared->nr_idle_scan' under SIS_UTIL is 0. So sis() in (1) almost always returns target (this_cpu or prev_cpu). sic() doesn't do this.

What I haven't done is to try (1) with SIS_UTIL or (2) with NO_SIS_UTIL.

I wonder whether this is the underlying reason for the benefit of (1) over (2) we see here with smt now?

So IMHO, before adding smt support to (1) for these small CPPC-based CPU capacity differences, we should make sure that the same can't be achieved by disabling SIS_UTIL or by softening it a bit.

So does (2) with NO_SIS_UTIL perform worse than (1) with your smt-related add-ons in sic()?
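For readers unfamiliar with SIS_UTIL, the bailout described above can be approximated with a toy model. This is an assumption-laden simplification, not the kernel's exact formula (which derives nr_idle_scan from a quadratic function of LLC utilization); `nr_idle_scan()` and `select_idle_cpu_model()` here are illustrative names. The point is the shape of the behavior: as LLC utilization nears saturation the scan budget drops to 0, select_idle_cpu() bails out with -1, and the wakeup falls back to target.

```c
#include <assert.h>

/*
 * Toy model of SIS_UTIL scan limiting (illustrative, not the kernel's
 * real formula): the number of CPUs select_idle_cpu() may scan shrinks
 * as LLC utilization grows, reaching 0 near saturation.
 */
static int nr_idle_scan(int llc_util_pct, int llc_cpus)
{
    if (llc_util_pct >= 85)     /* near-saturated LLC: skip the scan entirely */
        return 0;
    /* scale the scan depth down with utilization */
    return llc_cpus * (100 - llc_util_pct) / 100;
}

/* A scan that bails out once the SIS_UTIL budget is exhausted. */
static int select_idle_cpu_model(const int *is_idle, int llc_cpus, int util_pct)
{
    int budget = nr_idle_scan(util_pct, llc_cpus);

    for (int cpu = 0; cpu < llc_cpus && budget > 0; cpu++, budget--)
        if (is_idle[cpu])
            return cpu;
    return -1;  /* caller falls back to target (this_cpu or prev_cpu) */
}
```

This is why, in configuration (1) above, sis() "almost always returns target" under load while sic() keeps scanning.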
Hi Dietmar, On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote: > Hi Andrea, > > On 26.03.26 16:02, Andrea Righi wrote: > > [...] > > > This patch set has been tested on the new NVIDIA Vera Rubin platform, where > > SMT is enabled and the firmware exposes small frequency variations (+/-~5%) > > as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set. > > > > Without these patches, performance can drop up to ~2x with CPU-intensive > > workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not > > account for busy SMT siblings. > > > > Alternative approaches have been evaluated, such as equalizing CPU > > capacities, either by exposing uniform values via firmware (ACPI/CPPC) or > > normalizing them in the kernel by grouping CPUs within a small capacity > > window (+-5%) [1][2], or enabling asympacking [3]. > > > > However, adding SMT awareness to SD_ASYM_CPUCAPACITY has shown better > > results so far. Improving this policy also seems worthwhile in general, as > > other platforms in the future may enable SMT with asymmetric CPU > > topologies. > I still wonder whether we really need select_idle_capacity() (plus the > smt part) for asymmetric CPU capacity systems where the CPU capacity > differences are < 5% of SCHED_CAPACITY_SCALE. > > The known example would be the NVIDIA Grace (!smt) server with its > slightly different perf_caps.highest_perf values. > > We did run DCPerf Mediawiki on this thing with: > > (1) ASYM_CPUCAPACITY (default) > > (2) NO ASYM_CPUCAPACITY > > We also ran on a comparable ARM64 server (!smt) for comparison: > > (1) ASYM_CPUCAPACITY > > (2) NO ASYM_CPUCAPACITY (default) > > Both systems have 72 CPUs, run v6.8 and have a single MC sched domain > with LLC spanning over all 72 CPUs. 
During the tests there were ~750 > tasks among them the workload related: > > #hhvmworker 147 > #mariadbd 204 > #memcached 11 > #nginx 8 > #wrk 144 > #ProxygenWorker 1 > > load_balance: > > not_idle 3x more on (2) > > idle 2x more on (2) > > newly_idle 2-10x more on (2) > > wakeup: > > move_affine 2-3x more on (1) > > ttwu_local 1.5-2 more on (2) > > We also instrumented all the bailout conditions in select_task_sibling() > (sis())->select_idle_cpu() and select_idle_capacity() (sic()). > > In (1) almost all wakeups end up in select_idle_cpu() returning -1 due > to the fact that 'sd->shared->nr_idle_scan' under SIS_UTIL is 0. So > sis() in (1) almost always returns target (this_cpu or prev_cpu). sic() > doesn't do this. > > What I haven't done is to try (1) with SIS_UTIL or (2) with NO_SIS_UTIL. > > I wonder whether this is the underlying reason for the benefit of (1) > over (2) we see here with smt now? > > So IMHO before adding smt support to (1) for these small CPPC based CPU > capacity differences we should make sure that the same can't be achieved > by disabling SIS_UTIL or to soften it a bit. > > So does (2) with NO_SIS_UTIL performs worse than (1) with your smt > related add-ons in sic()? Thanks for running these experiments and sharing the data, this is very useful! I did a quick test on Vera using the NVBLAS benchmark, comparing NO ASYM_CPUCAPACITY with and without SIS_UTIL, but the difference seems to be within error range. I'll also run DCPerf MediaWiki with all the different configurations to see if I get similar results. More in general, I agree that for small capacity differences (e.g., within ~5%) the benefits of using ASYM_CPUCAPACITY is questionable. 
And I'm also fine to go back to the idea of grouping together CPUs within the 5% capacity window, if we think it's a safer approach (results in your case are quite evident; BTW, that also means we shouldn't have ASYM_CPUCAPACITY on Grace, so in theory the 5% threshold should also improve performance on Grace, which doesn't have SMT).

That said, I still think there's value in adding SMT awareness to select_idle_capacity(). Even if we decide to avoid ASYM_CPUCAPACITY for small capacity deltas, we should ensure that the behavior remains reasonable if both features are enabled, for any reason. Right now, there are cases where the current behavior leads to significant performance degradation (~2x), so having a mechanism to prevent clearly suboptimal task placement still seems worthwhile. Essentially, what I'm saying is that one thing doesn't exclude the other.

Thanks,
-Andrea
On 31.03.26 11:04, Andrea Righi wrote: > Hi Dietmar, > > On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote: >> Hi Andrea, >> >> On 26.03.26 16:02, Andrea Righi wrote: [...] >> So does (2) with NO_SIS_UTIL performs worse than (1) with your smt >> related add-ons in sic()? > > Thanks for running these experiments and sharing the data, this is very > useful! > > I did a quick test on Vera using the NVBLAS benchmark, comparing NO > ASYM_CPUCAPACITY with and without SIS_UTIL, but the difference seems to be > within error range. I'll also run DCPerf MediaWiki with all the different I'm not familiar with the NVBLAS benchmark. Does it drive your system into 'sd->shared->nr_idle_scan = 0' state? We just have to understand where this benefit of using sic() instead of sis() is coming from. I'm doubtful that this is the best_cpu thing after if (!choose_idle_cpu(cpu, p)) in sic()'s for_each_cpu_wrap(cpu, cpus, target) loop given that the CPU capacity diffs are so small. > configurations to see if I get similar results. > > More in general, I agree that for small capacity differences (e.g., within > ~5%) the benefits of using ASYM_CPUCAPACITY is questionable. And I'm also > fine to go back to the idea of grouping together CPUS within the 5% > capacity window, if we think it's a safer approach (results in your case > are quite evident, and BTW, that means we also shouldn't have > ASYM_CPU_CAPACITY on Grace, so in theory the 5% threshold should also > improve performance on Grace, that doesn't have SMT). There shouldn't be so many machines with these binning-introduced small CPU capacity diffs out there? In fact, I only know about your Grace (!smt) and Vera (smt) machines. > That said, I still think there's value in adding SMT awareness to > select_idle_capacity(). Even if we decide to avoid ASYM_CPUCAPACITY for > small capacity deltas, we should ensure that the behavior remains > reasonable if both features are enabled, for any reason. 
> Right now, there > are cases where the current behavior leads to significant performance > degradation (~2x), so having a mechanism to prevent clearly suboptimal task > placement still seems worthwhile. Essentially, what I'm saying is that one > thing doesn't exclude the other.

IMHO, in case we knew where this improvement is coming from using sic() instead of the default sis() (which already has smt support), then maybe; it's a lot of extra code in the end ... And mobile big.LITTLE (with larger CPU capacity diffs) doesn't have smt.
On Wed, 1 Apr 2026 at 13:57, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > > On 31.03.26 11:04, Andrea Righi wrote: > > Hi Dietmar, > > > > On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote: > >> Hi Andrea, > >> > >> On 26.03.26 16:02, Andrea Righi wrote: > > [...] > > >> So does (2) with NO_SIS_UTIL performs worse than (1) with your smt > >> related add-ons in sic()? > > > > Thanks for running these experiments and sharing the data, this is very > > useful! > > > > I did a quick test on Vera using the NVBLAS benchmark, comparing NO > > ASYM_CPUCAPACITY with and without SIS_UTIL, but the difference seems to be > > within error range. I'll also run DCPerf MediaWiki with all the different > > I'm not familiar with the NVBLAS benchmark. Does it drive your system > into 'sd->shared->nr_idle_scan = 0' state? > > We just have to understand where this benefit of using sic() instead of > sis() is coming from. I'm doubtful that this is the best_cpu thing after > if (!choose_idle_cpu(cpu, p)) in sic()'s for_each_cpu_wrap(cpu, cpus, > target) loop given that the CPU capacity diffs are so small. > > > configurations to see if I get similar results. > > > > More in general, I agree that for small capacity differences (e.g., within > > ~5%) the benefits of using ASYM_CPUCAPACITY is questionable. And I'm also > > fine to go back to the idea of grouping together CPUS within the 5% > > capacity window, if we think it's a safer approach (results in your case > > are quite evident, and BTW, that means we also shouldn't have > > ASYM_CPU_CAPACITY on Grace, so in theory the 5% threshold should also > > improve performance on Grace, that doesn't have SMT). > > There shouldn't be so many machines with these binning-introduced small > CPU capacity diffs out there? In fact, I only know about your Grace > (!smt) and Vera (smt) machines. 
In any case, it's always better to add the support than to enable asym_packing.

> > > That said, I still think there's value in adding SMT awareness to > > select_idle_capacity(). Even if we decide to avoid ASYM_CPUCAPACITY for > > small capacity deltas, we should ensure that the behavior remains > > reasonable if both features are enabled, for any reason. Right now, there > > are cases where the current behavior leads to significant performance > > degradation (~2x), so having a mechanism to prevent clearly suboptimal task > > placement still seems worthwhile. Essentially, what I'm saying is that one > > thing doesn't exclude the other. > > IMHO, in case we would know where this improvement is coming from using > sic() instead of default sis() (which already as smt support) then > maybe, it's a lot of extra code at the end ... And mobile big.LITTLE > (with larger CPU capacity diffs) doesn't have smt.

The last proposal, based on Prateek's proposal in sic(), doesn't seem that large.
On Wed, Apr 01, 2026 at 02:08:27PM +0200, Vincent Guittot wrote: > On Wed, 1 Apr 2026 at 13:57, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > > > > On 31.03.26 11:04, Andrea Righi wrote: > > > Hi Dietmar, > > > > > > On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote: > > >> Hi Andrea, > > >> > > >> On 26.03.26 16:02, Andrea Righi wrote: > > > > [...] > > > > >> So does (2) with NO_SIS_UTIL performs worse than (1) with your smt > > >> related add-ons in sic()? > > > > > > Thanks for running these experiments and sharing the data, this is very > > > useful! > > > > > > I did a quick test on Vera using the NVBLAS benchmark, comparing NO > > > ASYM_CPUCAPACITY with and without SIS_UTIL, but the difference seems to be > > > within error range. I'll also run DCPerf MediaWiki with all the different > > > > I'm not familiar with the NVBLAS benchmark. Does it drive your system > > into 'sd->shared->nr_idle_scan = 0' state? It's something internally unfortunately... it's just running a single CPU-intensive task for each SMT core (in practice half of the CPUs tasks). I don't think we're hitting sd->shared->nr_idle_scan == 0 in this case. > > > > We just have to understand where this benefit of using sic() instead of > > sis() is coming from. I'm doubtful that this is the best_cpu thing after > > if (!choose_idle_cpu(cpu, p)) in sic()'s for_each_cpu_wrap(cpu, cpus, > > target) loop given that the CPU capacity diffs are so small. > > > > > configurations to see if I get similar results. > > > > > > More in general, I agree that for small capacity differences (e.g., within > > > ~5%) the benefits of using ASYM_CPUCAPACITY is questionable. 
And I'm also > > > fine to go back to the idea of grouping together CPUS within the 5% > > > capacity window, if we think it's a safer approach (results in your case > > > are quite evident, and BTW, that means we also shouldn't have > > > ASYM_CPU_CAPACITY on Grace, so in theory the 5% threshold should also > > > improve performance on Grace, that doesn't have SMT). > > > > There shouldn't be so many machines with these binning-introduced small > > CPU capacity diffs out there? In fact, I only know about your Grace > > (!smt) and Vera (smt) machines. > > In any case it's always better to add the support than enabling asym_packing > > > > > > That said, I still think there's value in adding SMT awareness to > > > select_idle_capacity(). Even if we decide to avoid ASYM_CPUCAPACITY for > > > small capacity deltas, we should ensure that the behavior remains > > > reasonable if both features are enabled, for any reason. Right now, there > > > are cases where the current behavior leads to significant performance > > > degradation (~2x), so having a mechanism to prevent clearly suboptimal task > > > placement still seems worthwhile. Essentially, what I'm saying is that one > > > thing doesn't exclude the other. > > > > IMHO, in case we would know where this improvement is coming from using > > sic() instead of default sis() (which already as smt support) then > > maybe, it's a lot of extra code at the end ... And mobile big.LITTLE > > (with larger CPU capacity diffs) doesn't have smt. > > The last proposal based on prateek proposal in sic() doesn't seems that large Exactly, I was referring just to that patch, which would solve the big part of the performance issue. We can ignore the ILB part for now. Thanks, -Andrea
On Wed, Apr 01, 2026 at 02:42:34PM +0200, Andrea Righi wrote: > On Wed, Apr 01, 2026 at 02:08:27PM +0200, Vincent Guittot wrote: > > On Wed, 1 Apr 2026 at 13:57, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > > > > > > On 31.03.26 11:04, Andrea Righi wrote: > > > > Hi Dietmar, > > > > > > > > On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote: > > > >> Hi Andrea, > > > >> > > > >> On 26.03.26 16:02, Andrea Righi wrote: > > > > > > [...] > > > > > > >> So does (2) with NO_SIS_UTIL performs worse than (1) with your smt > > > >> related add-ons in sic()? > > > > > > > > Thanks for running these experiments and sharing the data, this is very > > > > useful! > > > > > > > > I did a quick test on Vera using the NVBLAS benchmark, comparing NO > > > > ASYM_CPUCAPACITY with and without SIS_UTIL, but the difference seems to be > > > > within error range. I'll also run DCPerf MediaWiki with all the different > > > > > > I'm not familiar with the NVBLAS benchmark. Does it drive your system > > > into 'sd->shared->nr_idle_scan = 0' state? > > It's something internally unfortunately... it's just running a single > CPU-intensive task for each SMT core (in practice half of the CPUs tasks). > I don't think we're hitting sd->shared->nr_idle_scan == 0 in this case. 
Just finished running some tests with DCPerf MediaWiki on Vera as well (sorry, it took a while, I did multiple runs to rule out potential flukes):

+---------------------------------+--------+--------+--------+--------+
| Configuration                   | rps    | p50    | p95    | p99    |
+---------------------------------+--------+--------+--------+--------+
| NO ASYM + SIS_UTIL              | 8113   | 0.067  | 0.184  | 0.225  |
| NO ASYM + NO_SIS_UTIL           | 8093   | 0.068  | 0.184  | 0.223  |
|                                 |        |        |        |        |
| ASYM + SMT + SIS_UTIL           | 8129   | 0.076  | 0.149  | 0.188  |
| ASYM + SMT + NO_SIS_UTIL        | 8138   | 0.076  | 0.148  | 0.186  |
|                                 |        |        |        |        |
| ASYM + ILB SMT + SIS_UTIL       | 8189   | 0.075  | 0.150  | 0.189  |
| ASYM + SMT + ILB SMT + SIS_UTIL | 8185   | 0.076  | 0.151  | 0.190  |
+---------------------------------+--------+--------+--------+--------+

Looking at the data:

- SIS_UTIL doesn't seem relevant in this case (differences are within error range),
- ASYM_CPUCAPACITY seems to provide a small throughput gain, but it seems more beneficial for tail latency reduction,
- the ILB SMT patch seems to slightly improve throughput, but the biggest benefit is still coming from ASYM_CPUCAPACITY.

Overall, also in this case it seems beneficial to use ASYM_CPUCAPACITY rather than equalizing the capacities.

That said, I'm still not sure why ASYM is helping. The frequency asymmetry is really small (~2%), so the latency improvements are unlikely to come from prioritizing the faster cores, as that should mainly affect throughput rather than tail latency, and likely to a smaller extent.

Thanks,
-Andrea
On 3/27/26 02:02, Andrea Righi wrote: > This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by > introducing SMT awareness. > > = Problem = > > Nominal per-logical-CPU capacity can overstate usable compute when an SMT > sibling is busy, because the physical core doesn't deliver its full nominal > capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs > that are not actually good destinations. > > = Proposed Solution = > > This patch set aligns those paths with a simple rule already used > elsewhere: when SMT is active, prefer fully idle cores and avoid treating > partially idle SMT siblings as full-capacity targets where that would > mislead load balance.

In kernel/sched/topology.c:

	/* Don't attempt to spread across CPUs of different capacities. */
	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
		sd->child->flags &= ~SD_PREFER_SIBLING;

Should handle the selection, but I guess this does not work for SMT level sd's?

> > Patch set summary: > > - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection > > Prefer fully-idle SMT cores in asym-capacity idle selection. In the > wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so > idle selection can prefer CPUs on fully idle cores, with a safe fallback. > > - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity > > Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY. > Provided for consistency with PATCH 1/4. > > - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems > > Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for > consistency with PATCH 1/4. I've also tested with/without > /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't > noticed any regression.
> > - [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer > > When choosing the housekeeping CPU that runs the idle load balancer, > prefer an idle CPU on a fully idle core so migrated work lands where > effective capacity is available. > > The change is still consistent with the same "avoid CPUs with busy > sibling" logic and it shows some benefits on Vera, but could have > negative impact on other systems, I'm including it for completeness > (feedback is appreciated). > > This patch set has been tested on the new NVIDIA Vera Rubin platform, where > SMT is enabled and the firmware exposes small frequency variations (+/-~5%) > as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set. > Are you referring to nominal_freq? > Without these patches, performance can drop up to ~2x with CPU-intensive > workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not > account for busy SMT siblings. > > Alternative approaches have been evaluated, such as equalizing CPU > capacities, either by exposing uniform values via firmware (ACPI/CPPC) or > normalizing them in the kernel by grouping CPUs within a small capacity > window (+-5%) [1][2], or enabling asympacking [3]. > > However, adding SMT awareness to SD_ASYM_CPUCAPACITY has shown better > results so far. Improving this policy also seems worthwhile in general, as > other platforms in the future may enable SMT with asymmetric CPU > topologies. 
> > [1] https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com > [2] https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com > [3] https://lore.kernel.org/all/20260325181314.3875909-1-christian.loehle@arm.com/ > > Andrea Righi (4): > sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection > sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity > sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems > sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer > > kernel/sched/fair.c | 163 +++++++++++++++++++++++++++++++++++++++++++----- > kernel/sched/topology.c | 9 --- > 2 files changed, 147 insertions(+), 25 deletions(-) Thanks, Balbir
Hi Balbir,

On Sun, Mar 29, 2026 at 12:03:19AM +1100, Balbir Singh wrote: > On 3/27/26 02:02, Andrea Righi wrote: > > This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by > > introducing SMT awareness. > > > > = Problem = > > > > Nominal per-logical-CPU capacity can overstate usable compute when an SMT > > sibling is busy, because the physical core doesn't deliver its full nominal > > capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs > > that are not actually good destinations. > > > > = Proposed Solution = > > > > This patch set aligns those paths with a simple rule already used > > elsewhere: when SMT is active, prefer fully idle cores and avoid treating > > partially idle SMT siblings as full-capacity targets where that would > > mislead load balance. > > In kernel/sched/topology.c > > /* Don't attempt to spread across CPUs of different capacities. */ > if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child) > sd->child->flags &= ~SD_PREFER_SIBLING; > > Should handle the selection, but I guess this does not work for SMT level sd's?

IIUC, SD_PREFER_SIBLING steers load balance toward sibling_imbalance() (spread runnables across child/sibling domains); it doesn't encode the fully-idle-core-first logic. In practice it doesn't give us SMT-aware destination choice when a sibling is busy, and this series is trying to cover that gap in the placement path.

BTW, on Vera the hierarchy is SMT -> MC -> NUMA:

root@localhost:~# grep . /sys/kernel/debug/sched/domains/cpu0/domain*/flags
/sys/kernel/debug/sched/domains/cpu0/domain0/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING
/sys/kernel/debug/sched/domains/cpu0/domain1/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_ASYM_CPUCAPACITY SD_SHARE_LLC
/sys/kernel/debug/sched/domains/cpu0/domain2/flags:SD_BALANCE_NEWIDLE SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL SD_SERIALIZE SD_NUMA

And domain1/groups_flags (child / SMT flags on the sched groups used at the MC level) still has SD_PREFER_SIBLING together with SD_SHARE_CPUCAPACITY:

root@localhost:~# cat /sys/kernel/debug/sched/domains/cpu0/domain1/groups_flags
SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING

So, prefer-sibling is still in play for SMT (including via MC groups_flags). On machines where asymmetry attaches immediately above SMT, topology may strip that flag and reduce this branch of behavior, but explicit SMT-aware placement still matters.

> > > > Patch set summary: > > > > - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection > > > > Prefer fully-idle SMT cores in asym-capacity idle selection. In the > > wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so > > idle selection can prefer CPUs on fully idle cores, with a safe fallback. > > > > - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity > > > > Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY. > > Provided for consistency with PATCH 1/4. > > > > - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems > > > > Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for > > consistency with PATCH 1/4. I've also tested with/without > > /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't > > noticed any regression.
> > > > - [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer > > > > When choosing the housekeeping CPU that runs the idle load balancer, > > prefer an idle CPU on a fully idle core so migrated work lands where > > effective capacity is available. > > > > The change is still consistent with the same "avoid CPUs with busy > > sibling" logic and it shows some benefits on Vera, but could have > > negative impact on other systems, I'm including it for completeness > > (feedback is appreciated). > > > > This patch set has been tested on the new NVIDIA Vera Rubin platform, where > > SMT is enabled and the firmware exposes small frequency variations (+/-~5%) > > as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set. > > > > Are you referring to nominal_freq? > Correct. Thanks, -Andrea
On 3/29/26 09:50, Andrea Righi wrote: > Hi Balbir, > > On Sun, Mar 29, 2026 at 12:03:19AM +1100, Balbir Singh wrote: >> On 3/27/26 02:02, Andrea Righi wrote: >>> This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by >>> introducing SMT awareness. >>> >>> = Problem = >>> >>> Nominal per-logical-CPU capacity can overstate usable compute when an SMT >>> sibling is busy, because the physical core doesn't deliver its full nominal >>> capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs >>> that are not actually good destinations. >>> >>> = Proposed Solution = >>> >>> This patch set aligns those paths with a simple rule already used >>> elsewhere: when SMT is active, prefer fully idle cores and avoid treating >>> partially idle SMT siblings as full-capacity targets where that would >>> mislead load balance. >> >> In kernel/sched/topology.c >> >> /* Don't attempt to spread across CPUs of different capacities. */ >> if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child) >> sd->child->flags &= ~SD_PREFER_SIBLING; >> >> Should handle the selection, but I guess this does not work for SMT level sd's? > > IIUC, SD_PREFER_SIBLING steers load balance toward sibling_imbalance() > (spread runnables across child/sibling domains), it doesn't encode the > fully-idle core first logic. In practice it doesn't give us SMT-aware > destination choice when a sibling is busy and this series is trying to > cover that gap in the palcement path. >

Thanks, so we care about idle selection, not necessarily balancing; and yes, I did see that sd->child needs to be set for SD_PREFER_SIBLING to be cleared.

> BTW, on Vera the hierarchy is SMT -> MC -> NUMA: > > root@localhost:~# grep . 
/sys/kernel/debug/sched/domains/cpu0/domain*/flags > /sys/kernel/debug/sched/domains/cpu0/domain0/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING > /sys/kernel/debug/sched/domains/cpu0/domain1/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_ASYM_CPUCAPACITY SD_SHARE_LLC > /sys/kernel/debug/sched/domains/cpu0/domain2/flags:SD_BALANCE_NEWIDLE SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL SD_SERIALIZE SD_NUMA > > And domain1/groups_flags (child / SMT flags on the sched groups used at the > MC level) still has SD_PREFER_SIBLING together with SD_SHARE_CPUCAPACITY. > > root@localhost:~# cat /sys/kernel/debug/sched/domains/cpu0/domain1/groups_flags > SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING > > So, prefer-sibling is still in play for SMT (including via MC > groups_flags). On machines where asymmetry attaches immediately above SMT, > topology may strip that flag and reduce this branch of behavior, but > explicit SMT-aware placement still matters. > >>> >>> Patch set summary: >>> >>> - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection >>> >>> Prefer fully-idle SMT cores in asym-capacity idle selection. In the >>> wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so >>> idle selection can prefer CPUs on fully idle cores, with a safe fallback. >>> >>> - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity >>> >>> Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY. >>> Provided for consistency with PATCH 1/4. >>> >>> - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems >>> >>> Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for >>> consistency with PATCH 1/4. 
I've also tested with/without >>> /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't >>> noticed any regression. >>> >>> - [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer >>> >>> When choosing the housekeeping CPU that runs the idle load balancer, >>> prefer an idle CPU on a fully idle core so migrated work lands where >>> effective capacity is available. >>> >>> The change is still consistent with the same "avoid CPUs with busy >>> sibling" logic and it shows some benefits on Vera, but could have >>> negative impact on other systems, I'm including it for completeness >>> (feedback is appreciated). >>> >>> This patch set has been tested on the new NVIDIA Vera Rubin platform, where >>> SMT is enabled and the firmware exposes small frequency variations (+/-~5%) >>> as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set. >>> >> >> Are you referring to nominal_freq? >> > > Correct. > Thanks, Balbir
Hi Andrea.

On 3/26/26 8:32 PM, Andrea Righi wrote:
> This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
> introducing SMT awareness.
>
> = Problem =
>
> Nominal per-logical-CPU capacity can overstate usable compute when an SMT
> sibling is busy, because the physical core doesn't deliver its full nominal
> capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs
> that are not actually good destinations.
>

How does the energy model define the OPP for SMT?

SMT systems have multiple different functional blocks: ALU (arithmetic),
LSU (load/store unit), etc. If the same/similar workload runs on the
sibling, it affects performance, but if the sibling is using different
functional blocks, it does not.

So the underlying actual CPU capacity of each thread depends on what each
sibling is running. I don't understand how the firmware/energy models
define this.

> = Proposed Solution =
>
> This patch set aligns those paths with a simple rule already used
> elsewhere: when SMT is active, prefer fully idle cores and avoid treating
> partially idle SMT siblings as full-capacity targets where that would
> mislead load balance.
>
> Patch set summary:
>
> - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
>
>   Prefer fully-idle SMT cores in asym-capacity idle selection. In the
>   wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
>   idle selection can prefer CPUs on fully idle cores, with a safe fallback.
>
> - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
>
>   Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
>   Provided for consistency with PATCH 1/4.
>
> - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
>
>   Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
>   consistency with PATCH 1/4. I've also tested with/without
>   /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't
>   noticed any regression.
>
> - [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
>
>   When choosing the housekeeping CPU that runs the idle load balancer,
>   prefer an idle CPU on a fully idle core so migrated work lands where
>   effective capacity is available.
>
>   The change is still consistent with the same "avoid CPUs with busy
>   sibling" logic and it shows some benefits on Vera, but could have
>   negative impact on other systems, I'm including it for completeness
>   (feedback is appreciated).
>
> This patch set has been tested on the new NVIDIA Vera Rubin platform, where
> SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
> as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.
>

I assume the CPU_CAPACITY values are fixed?
First sibling has max, while the other has less?

> Without these patches, performance can drop up to ~2x with CPU-intensive
> workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not
> account for busy SMT siblings.
>

How is the performance measured here? Which benchmark?

By any chance are you running number_running_task <= (nr_cpus / smt_threads_per_core),
so it all fits nicely? If you increase those numbers, how do the
performance numbers compare?

Also, what's the system like? SMT level?

> Alternative approaches have been evaluated, such as equalizing CPU
> capacities, either by exposing uniform values via firmware (ACPI/CPPC) or
> normalizing them in the kernel by grouping CPUs within a small capacity
> window (+-5%) [1][2], or enabling asym packing [3].
>
> However, adding SMT awareness to SD_ASYM_CPUCAPACITY has shown better
> results so far. Improving this policy also seems worthwhile in general, as
> other platforms in the future may enable SMT with asymmetric CPU
> topologies.
>
> [1] https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com
> [2] https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com
> [3] https://lore.kernel.org/all/20260325181314.3875909-1-christian.loehle@arm.com/
>
> Andrea Righi (4):
>   sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
>   sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
>   sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
>   sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
>
>  kernel/sched/fair.c     | 163 +++++++++++++++++++++++++++++++++++++++++++-----
>  kernel/sched/topology.c |   9 ---
>  2 files changed, 147 insertions(+), 25 deletions(-)
On Fri, Mar 27, 2026 at 10:01:03PM +0530, Shrikanth Hegde wrote:
> Hi Andrea.
>
> On 3/26/26 8:32 PM, Andrea Righi wrote:
> > This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
> > introducing SMT awareness.
> >
> > = Problem =
> >
> > Nominal per-logical-CPU capacity can overstate usable compute when an SMT
> > sibling is busy, because the physical core doesn't deliver its full nominal
> > capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs
> > that are not actually good destinations.
> >
>
> How does the energy model define the OPP for SMT?

For now, as suggested by Vincent, we should probably ignore EAS / the
energy model and keep it as it is (not compatible with SMT). I'll drop
PATCH 3/4 and focus only on SD_ASYM_CPUCAPACITY + SMT.

> SMT systems have multiple different functional blocks: ALU (arithmetic),
> LSU (load/store unit), etc. If the same/similar workload runs on the
> sibling, it affects performance, but if the sibling is using different
> functional blocks, it does not.
>
> So the underlying actual CPU capacity of each thread depends on what each
> sibling is running. I don't understand how the firmware/energy models
> define this.

They don't, and they probably shouldn't. I don't think it's possible to
model CPU capacity with a static nominal value when SMT is enabled, since
the effective capacity changes depending on whether the corresponding
sibling is busy or not.

It should be up to the scheduler to figure out a reasonable way to estimate
the actual capacity, considering the status of the other sibling (e.g.,
prioritizing the fully-idle SMT cores over the partially-idle SMT cores,
like we do in other parts of the scheduler code).

> > = Proposed Solution =
> >
> > This patch set aligns those paths with a simple rule already used
> > elsewhere: when SMT is active, prefer fully idle cores and avoid treating
> > partially idle SMT siblings as full-capacity targets where that would
> > mislead load balance.
> >
> > Patch set summary:
> >
> > - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
> >
> >   Prefer fully-idle SMT cores in asym-capacity idle selection. In the
> >   wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
> >   idle selection can prefer CPUs on fully idle cores, with a safe fallback.
> >
> > - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
> >
> >   Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
> >   Provided for consistency with PATCH 1/4.
> >
> > - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
> >
> >   Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
> >   consistency with PATCH 1/4. I've also tested with/without
> >   /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't
> >   noticed any regression.
> >
> > - [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
> >
> >   When choosing the housekeeping CPU that runs the idle load balancer,
> >   prefer an idle CPU on a fully idle core so migrated work lands where
> >   effective capacity is available.
> >
> >   The change is still consistent with the same "avoid CPUs with busy
> >   sibling" logic and it shows some benefits on Vera, but could have
> >   negative impact on other systems, I'm including it for completeness
> >   (feedback is appreciated).
> >
> > This patch set has been tested on the new NVIDIA Vera Rubin platform, where
> > SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
> > as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.
> >
>
> I assume the CPU_CAPACITY values are fixed?
> First sibling has max, while the other has less?

The firmware is exposing the same capacity for both siblings. SMT cores may
have different capacity, but siblings within the same SMT core have the
same capacity.

There was an idea to expose a higher capacity for all the 1st siblings and
a lower capacity for all the 2nd siblings, but I don't think it's a good
idea, since that would just confuse the scheduler (and the 2nd sibling
doesn't really have a lower nominal capacity if it's running alone).

> > Without these patches, performance can drop up to ~2x with CPU-intensive
> > workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not
> > account for busy SMT siblings.
> >
>
> How is the performance measured here? Which benchmark?

I've used an internal NVIDIA suite (based on NVBLAS), I also tried Linpack
and got similar results. I'm planning to repeat the tests using public
benchmarks and share the results as soon as I can.

> By any chance are you running number_running_task <= (nr_cpus / smt_threads_per_core),
> so it all fits nicely?

That's the case that gives me the optimal results.

> If you increase those numbers, how do the performance numbers compare?

I tried different numbers of tasks. The more I approach system saturation,
the smaller the benefits are. When I completely saturate the system I don't
see any benefit with these changes, nor regressions, but I guess that's
expected.

> Also, what's the system like? SMT level?

2 siblings for each SMT core.

Thanks,
-Andrea
>> How is the performance measured here? Which benchmark?
>
> I've used an internal NVIDIA suite (based on NVBLAS), I also tried Linpack
> and got similar results. I'm planning to repeat the tests using public
> benchmarks and share the results as soon as I can.
>
>> By any chance are you running number_running_task <= (nr_cpus / smt_threads_per_core),
>> so it all fits nicely?
>
> That's the case that gives me the optimal results.
>
>> If you increase those numbers, how do the performance numbers compare?
>
> I tried different numbers of tasks. The more I approach system saturation,
> the smaller the benefits are. When I completely saturate the system I don't
> see any benefit with these changes, nor regressions, but I guess that's
> expected.
>

Ok. That's good.

I ran hackbench on powerpc with SMT=4; I didn't observe any regressions or
improvements. Only PATCH 4/4 applies in this case, as there is no
asym_cpu_capacity.
On 3/26/26 15:02, Andrea Righi wrote:
> This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
> introducing SMT awareness.
>
> = Problem =
>
> Nominal per-logical-CPU capacity can overstate usable compute when an SMT
> sibling is busy, because the physical core doesn't deliver its full nominal
> capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs
> that are not actually good destinations.
>
> = Proposed Solution =
>
> This patch set aligns those paths with a simple rule already used
> elsewhere: when SMT is active, prefer fully idle cores and avoid treating
> partially idle SMT siblings as full-capacity targets where that would
> mislead load balance.
>
> Patch set summary:
>
> - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
>
> Prefer fully-idle SMT cores in asym-capacity idle selection. In the
> wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
> idle selection can prefer CPUs on fully idle cores, with a safe fallback.
>
> - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
>
> Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
> Provided for consistency with PATCH 1/4.
>
> - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
>
> Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
> consistency with PATCH 1/4. I've also tested with/without
> /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't
> noticed any regression.
There's a lot more to unpack, but just to confirm, Vera doesn't have an EM, right?
There's no EAS with it?
(To be more precise, CPPC should bail out of building an artificial EM if there's no
or only one efficiency class:

drivers/cpufreq/cppc_cpufreq.c:

	if (bitmap_weight(used_classes, 256) <= 1) {
		pr_debug("Efficiency classes are all equal (=%d). "
			 "No EM registered", class);
		return;
	}
This is the case, right?
> [snip]
On Thu, Mar 26, 2026 at 04:33:08PM +0000, Christian Loehle wrote:
> On 3/26/26 15:02, Andrea Righi wrote:
> > This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
> > introducing SMT awareness.
> >
> > = Problem =
> >
> > Nominal per-logical-CPU capacity can overstate usable compute when an SMT
> > sibling is busy, because the physical core doesn't deliver its full nominal
> > capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs
> > that are not actually good destinations.
> >
> > = Proposed Solution =
> >
> > This patch set aligns those paths with a simple rule already used
> > elsewhere: when SMT is active, prefer fully idle cores and avoid treating
> > partially idle SMT siblings as full-capacity targets where that would
> > mislead load balance.
> >
> > Patch set summary:
> >
> > - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
> >
> > Prefer fully-idle SMT cores in asym-capacity idle selection. In the
> > wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
> > idle selection can prefer CPUs on fully idle cores, with a safe fallback.
> >
> > - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
> >
> > Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
> > Provided for consistency with PATCH 1/4.
> >
> > - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
> >
> > Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
> > consistency with PATCH 1/4. I've also tested with/without
> > /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't
> > noticed any regression.
>
>
> There's a lot more to unpack, but just to confirm, Vera doesn't have an EM, right?
> There's no EAS with it?
> (To be more precise, CPPC should bail out of building an artificial EM if there's no
> or only one efficiency class:
> drivers/cpufreq/cppc_cpufreq.c:
>
> 	if (bitmap_weight(used_classes, 256) <= 1) {
> 		pr_debug("Efficiency classes are all equal (=%d). "
> 			 "No EM registered", class);
> 		return;
> 	}
>
> This is the case, right?
Yes, that's correct, so my testing on Vera with EAS isn't that meaningful.
Thanks,
-Andrea