[PATCH 0/2] Fix NUMA sched domain build errors for GNR-X and CWF-X

Tim Chen posted 2 patches 1 month, 1 week ago
[PATCH 0/2] Fix NUMA sched domain build errors for GNR-X and CWF-X
Posted by Tim Chen 1 month, 1 week ago
While testing Granite Rapids X (GNR-X) and Clearwater Forest X (CWF-X) in
SNC-3 mode, we encountered sched domain build errors reported in dmesg.
Asymmetric node distances from the local node to nodes in the remote package
were not expected by the scheduler domain code and also led to an excessive
number of sched domain hierarchy levels.

Fix the missing NUMA domain level set in the topology_span_sane() check and
also simplify the distances to nodes in the remote package to retain distance
symmetry and make the NUMA topology sane for GNR-X and CWF-X.
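
To illustrate the problem described above, here is a minimal user-space sketch.
It is not the code from these patches. The distance matrix, the PACKAGE_OF
mapping, and the averaging rule in simplify_remote() are all hypothetical,
chosen only to show how asymmetric remote distances inflate the number of
distinct distance values (each of which can become a NUMA sched domain level)
and how collapsing remote-package distances to a single value restores
symmetry.

```python
# Illustrative sketch only: the distance values and the averaging rule are
# assumptions, not taken from GNR-X/CWF-X firmware tables or from the patches.

# Hypothetical system: 2 packages, 3 SNC nodes per package.
PACKAGE_OF = [0, 0, 0, 1, 1, 1]

# Hypothetical node distance matrix. Distances inside a package are
# symmetric; distances to the remote package are not (e.g. 0->4 != 4->0).
DIST = [
    [10, 15, 17, 21, 28, 26],
    [15, 10, 15, 23, 26, 23],
    [17, 15, 10, 26, 23, 21],
    [21, 28, 26, 10, 15, 17],
    [23, 26, 23, 15, 10, 15],
    [26, 23, 21, 17, 15, 10],
]


def is_symmetric(dist):
    """Return True if dist[i][j] == dist[j][i] for every node pair."""
    n = len(dist)
    return all(dist[i][j] == dist[j][i] for i in range(n) for j in range(n))


def distance_levels(dist):
    """Return the sorted list of distinct distance values in the matrix."""
    return sorted({d for row in dist for d in row})


def simplify_remote(dist, package_of):
    """Replace every cross-package distance with one shared value.

    The shared value here is the integer average of all cross-package
    entries. This is an assumed rule for illustration.
    """
    n = len(dist)
    remote = [dist[i][j] for i in range(n) for j in range(n)
              if package_of[i] != package_of[j]]
    if not remote:  # single package: nothing to simplify
        return [row[:] for row in dist]
    avg = sum(remote) // len(remote)
    return [[avg if package_of[i] != package_of[j] else dist[i][j]
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    simplified = simplify_remote(DIST, PACKAGE_OF)
    print("before:", is_symmetric(DIST), distance_levels(DIST))
    print("after: ", is_symmetric(simplified), distance_levels(simplified))
```

With these made-up numbers the matrix starts out asymmetric with seven
distinct distances and ends up symmetric with four.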

Tim Chen (1):
  sched: Fix sched domain build error for GNR-X, CWF-X in SNC-3 mode

Vinicius Costa Gomes (1):
  sched: topology: Fix topology validation error

 arch/x86/kernel/smpboot.c      | 28 ++++++++++++++++++++++++++++
 include/linux/sched/topology.h |  1 +
 kernel/sched/topology.c        | 33 +++++++++++++++++++++++++++------
 3 files changed, 56 insertions(+), 6 deletions(-)

-- 
2.32.0
Re: [PATCH 0/2] Fix NUMA sched domain build errors for GNR-X and CWF-X
Posted by K Prateek Nayak 1 month, 1 week ago
Hello Tim,

On 8/23/2025 1:44 AM, Tim Chen wrote:
> While testing Granite Rapids X (GNR-X) and Clearwater Forest X (CWF-X) in
> SNC-3 mode, we encountered sched domain build errors reported in dmesg.
> Asymmetric node distances from the local node to nodes in the remote package
> were not expected by the scheduler domain code and also led to an excessive
> number of sched domain hierarchy levels.
> 
> Fix the missing NUMA domain level set in the topology_span_sane() check and
> also simplify the distances to nodes in the remote package to retain distance
> symmetry and make the NUMA topology sane for GNR-X and CWF-X.

I did some sanity testing on an EPYC platform in NPS2 and NPS4 modes and
didn't see any changes to the sched domain layout, or to the
sched_node_distance() values used when constructing the domains, with the
series applied.
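
A layout comparison of this kind could be scripted roughly as below. This is a
sketch, not the procedure used for the test reported here. It assumes the
kernel exposes per-CPU domain names under
/sys/kernel/debug/sched/domains/cpuN/domainM/name (debugfs mounted, scheduler
debug support enabled, root access); the helper names are hypothetical.

```python
from pathlib import Path


def read_domain_names(root="/sys/kernel/debug/sched/domains"):
    """Return {cpu_dir_name: [domain names, lowest level first]}.

    Assumes a cpuN/domainM/name directory layout under root.
    """
    layout = {}
    cpu_dirs = sorted(Path(root).glob("cpu*"), key=lambda p: int(p.name[3:]))
    for cpu_dir in cpu_dirs:
        dom_dirs = sorted(cpu_dir.glob("domain*"),
                          key=lambda p: int(p.name[6:]))
        layout[cpu_dir.name] = [(d / "name").read_text().strip()
                                for d in dom_dirs]
    return layout


def changed_cpus(before, after):
    """Return the set of CPUs whose domain name list differs."""
    return {cpu for cpu in before.keys() | after.keys()
            if before.get(cpu) != after.get(cpu)}
```

Capturing read_domain_names() once on the baseline kernel and once with the
series applied, then passing both results to changed_cpus(), yields an empty
set when the layout is unchanged.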

Feel free to include:

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

-- 
Thanks and Regards,
Prateek
Re: [PATCH 0/2] Fix NUMA sched domain build errors for GNR-X and CWF-X
Posted by Tim Chen 1 month, 1 week ago
On Mon, 2025-08-25 at 09:48 +0530, K Prateek Nayak wrote:
> Hello Tim,
> 
> On 8/23/2025 1:44 AM, Tim Chen wrote:
> > While testing Granite Rapids X (GNR-X) and Clearwater Forest X (CWF-X) in
> > SNC-3 mode, we encountered sched domain build errors reported in dmesg.
> > Asymmetric node distances from the local node to nodes in the remote package
> > were not expected by the scheduler domain code and also led to an excessive
> > number of sched domain hierarchy levels.
> > 
> > Fix the missing NUMA domain level set in the topology_span_sane() check and
> > also simplify the distances to nodes in the remote package to retain distance
> > symmetry and make the NUMA topology sane for GNR-X and CWF-X.
> 
> I did some sanity testing on an EPYC platform in NPS2 and NPS4 modes and
> didn't see any changes to the sched domain layout, or to the
> sched_node_distance() values used when constructing the domains, with the
> series applied.
> 
> Feel free to include:
> 
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> 

Thanks for testing it.

Tim