So, per commit 4d6dd05d07d0 ("sched/topology: Fix sched domain build
error for GNR, CWF in SNC-3 mode"), the original crazy SNC-3 SLIT
table was:
node distances:
node   0   1   2   3   4   5
  0:  10  15  17  21  28  26
  1:  15  10  15  23  26  23
  2:  17  15  10  26  23  21
  3:  21  28  26  10  15  17
  4:  23  26  23  15  10  15
  5:  26  23  21  17  15  10
And per:
https://lore.kernel.org/lkml/20250825075642.GQ3245006@noisy.programming.kicks-ass.net/
My suggestion was to average the off-trace clusters to restore sanity.
However, 4d6dd05d07d0 implements this under various assumptions:
 - there will never be more than 2 packages;
 - the off-trace cluster will have distance > 20.
And then HPE shows up with a machine that matches the
Vendor-Family-Model checks but looks nothing like that.
Here's an 8-socket (2-chassis) HPE system with SNC enabled:
node    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  0:   10  12  16  16  16  16  18  18  40  40  40  40  40  40  40  40
  1:   12  10  16  16  16  16  18  18  40  40  40  40  40  40  40  40
  2:   16  16  10  12  18  18  16  16  40  40  40  40  40  40  40  40
  3:   16  16  12  10  18  18  16  16  40  40  40  40  40  40  40  40
  4:   16  16  18  18  10  12  16  16  40  40  40  40  40  40  40  40
  5:   16  16  18  18  12  10  16  16  40  40  40  40  40  40  40  40
  6:   18  18  16  16  16  16  10  12  40  40  40  40  40  40  40  40
  7:   18  18  16  16  16  16  12  10  40  40  40  40  40  40  40  40
  8:   40  40  40  40  40  40  40  40  10  12  16  16  16  16  18  18
  9:   40  40  40  40  40  40  40  40  12  10  16  16  16  16  18  18
 10:   40  40  40  40  40  40  40  40  16  16  10  12  18  18  16  16
 11:   40  40  40  40  40  40  40  40  16  16  12  10  18  18  16  16
 12:   40  40  40  40  40  40  40  40  16  16  18  18  10  12  16  16
 13:   40  40  40  40  40  40  40  40  16  16  18  18  12  10  16  16
 14:   40  40  40  40  40  40  40  40  18  18  16  16  16  16  10  12
 15:   40  40  40  40  40  40  40  40  18  18  16  16  16  16  12  10
10 = Same chassis and socket
12 = Same chassis and socket (SNC)
16 = Same chassis and adjacent socket
18 = Same chassis and non-adjacent socket
40 = Different chassis
*However* this is SNC-2.
This completely invalidates all the earlier assumptions and trips
WARNs.
Now that the topology code has a sensible measure of
nodes-per-package, we can use that to divine the SNC mode at hand,
and only fix up SNC-3 topologies.
The only remaining assumption is that there are no CPU-less nodes --
is this a valid assumption?
Fixes: 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/smpboot.c | 64 +++++++++++++++++-----------------------------
1 file changed, 25 insertions(+), 39 deletions(-)
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -506,33 +506,32 @@ static void __init build_sched_topology(
}
#ifdef CONFIG_NUMA
-static int sched_avg_remote_distance;
-static int avg_remote_numa_distance(void)
+static int slit_cluster_distance(int i, int j)
{
- int i, j;
- int distance, nr_remote, total_distance;
-
- if (sched_avg_remote_distance > 0)
- return sched_avg_remote_distance;
-
- nr_remote = 0;
- total_distance = 0;
- for_each_node_state(i, N_CPU) {
- for_each_node_state(j, N_CPU) {
- distance = node_distance(i, j);
-
- if (distance >= REMOTE_DISTANCE) {
- nr_remote++;
- total_distance += distance;
- }
+ int u = __num_nodes_per_package;
+ long d = 0;
+ int x, y;
+
+ /*
+ * Is this a unit cluster on the trace?
+ */
+ if ((i / u) == (j / u))
+ return node_distance(i, j);
+
+ /*
+ * Off-trace cluster, return average of the cluster to force symmetry.
+ */
+ x = i - (i % u);
+ y = j - (j % u);
+
+ for (i = x; i < x + u; i++) {
+ for (j = y; j < y + u; j++) {
+ d += node_distance(i, j);
+ d += node_distance(j, i);
}
}
- if (nr_remote)
- sched_avg_remote_distance = total_distance / nr_remote;
- else
- sched_avg_remote_distance = REMOTE_DISTANCE;
- return sched_avg_remote_distance;
+ return d / (2*u*u);
}
int arch_sched_node_distance(int from, int to)
@@ -542,13 +541,11 @@ int arch_sched_node_distance(int from, i
switch (boot_cpu_data.x86_vfm) {
case INTEL_GRANITERAPIDS_X:
case INTEL_ATOM_DARKMONT_X:
-
- if (topology_max_packages() == 1 || __num_nodes_per_package == 1 ||
- d < REMOTE_DISTANCE)
+ if (topology_max_packages() == 1 || __num_nodes_per_package < 3)
return d;
/*
- * With SNC enabled, there could be too many levels of remote
+ * With SNC-3 enabled, there could be too many levels of remote
* NUMA node distances, creating NUMA domain levels
* including local nodes and partial remote nodes.
*
@@ -557,19 +554,8 @@ int arch_sched_node_distance(int from, i
* in the remote package in the same sched group.
* Simplify NUMA domains and avoid extra NUMA levels including
* different remote NUMA nodes and local nodes.
- *
- * GNR and CWF don't expect systems with more than 2 packages
- * and more than 2 hops between packages. Single average remote
- * distance won't be appropriate if there are more than 2
- * packages as average distance to different remote packages
- * could be different.
*/
- WARN_ONCE(topology_max_packages() > 2,
- "sched: Expect only up to 2 packages for GNR or CWF, "
- "but saw %d packages when building sched domains.",
- topology_max_packages());
-
- d = avg_remote_numa_distance();
+ return slit_cluster_distance(from, to);
}
return d;
}
Hi Peter,

On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> +	int u = __num_nodes_per_package;

Yes, this is much simpler, thanks for the patch!

> +	long d = 0;
> +	int x, y;
> +
> +	/*
> +	 * Is this a unit cluster on the trace?
> +	 */
> +	if ((i / u) == (j / u))
> +		return node_distance(i, j);

If the number of nodes per package is 3, we assume that
every 3 consecutive nodes are SNC siblings (on the same
trace): node0, node1, and node2 are SNC siblings, while
node3, node4, and node5 form another group of SNC siblings.

I have a curious thought: could it be possible that
node0, node2, and node4 are SNC siblings, and node1,
node3, and node5 are another set of SNC siblings instead?

Then I studied the code a little more: node ids are dynamically
allocated via acpi_map_pxm_to_node(), so the assignment of node
ids depends on the order in which each processor affinity structure
is listed in the SRAT table. For example, suppose CPU0 belongs to
package0 and CPU1 belongs to package1, but their entries are placed
consecutively in the SRAT. In this case, the Proximity Domain of
CPU0 would be mapped to node0 via acpi_map_pxm_to_node(), and CPU1's
Proximity Domain would be assigned node1. The logic above would
then treat them as belonging to the same package, even though they
are physically in different packages. However, I believe such a
scenario is unlikely to occur in practice in the BIOS, and if it
happens it should be a BIOS bug, if I understand correctly.

thanks,
Chenyu
On Fri, Feb 27, 2026 at 01:07:40AM +0800, Chen, Yu C wrote:
> Hi Peter,
>
> On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > +	int u = __num_nodes_per_package;
>
> Yes, this is much simpler, thanks for the patch!
>
> > +	long d = 0;
> > +	int x, y;
> > +
> > +	/*
> > +	 * Is this a unit cluster on the trace?
> > +	 */
> > +	if ((i / u) == (j / u))
> > +		return node_distance(i, j);
>
> If the number of nodes per package is 3, we assume that
> every 3 consecutive nodes are SNC siblings (on the same
> trace): node0, node1, and node2 are SNC siblings, while
> node3, node4, and node5 form another group of SNC siblings.
>
> I have a curious thought: could it be possible that
> node0, node2, and node4 are SNC siblings, and node1,
> node3, and node5 are another set of SNC siblings instead?

Yes, give a BIOS guy enough bong-hits and this can be. That said (and
knock on wood), I've so far never seen this (and please people, don't
take this as a challenge).

> Then I studied the code a little more: node ids are dynamically
> allocated via acpi_map_pxm_to_node(), so the assignment of node
> ids depends on the order in which each processor affinity structure
> is listed in the SRAT table. For example, suppose CPU0 belongs to
> package0 and CPU1 belongs to package1, but their entries are placed
> consecutively in the SRAT. In this case, the Proximity Domain of
> CPU0 would be mapped to node0 via acpi_map_pxm_to_node(), and CPU1's
> Proximity Domain would be assigned node1. The logic above would
> then treat them as belonging to the same package, even though they
> are physically in different packages. However, I believe such a
> scenario is unlikely to occur in practice in the BIOS, and if it
> happens it should be a BIOS bug, if I understand correctly.

Just so. The thing I worried about is getting memory only nodes
iterated in between or something.
But as long as the CPU enumeration happens before the 'other' crud, then the CPU node mappings should be the consecutive low numbers and it all just works.
On Fri, 2026-02-27 at 01:07 +0800, Chen, Yu C wrote:
> Hi Peter,
>
> On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > +	int u = __num_nodes_per_package;
>
> Yes, this is much simpler, thanks for the patch!
>
> > +	long d = 0;
> > +	int x, y;
> > +
> > +	/*
> > +	 * Is this a unit cluster on the trace?
> > +	 */
> > +	if ((i / u) == (j / u))
> > +		return node_distance(i, j);
>
> If the number of nodes per package is 3, we assume that
> every 3 consecutive nodes are SNC siblings (on the same
> trace): node0, node1, and node2 are SNC siblings, while
> node3, node4, and node5 form another group of SNC siblings.
>
> I have a curious thought: could it be possible that
> node0, node2, and node4 are SNC siblings, and node1,
> node3, and node5 are another set of SNC siblings instead?
>
> Then I studied the code a little more: node ids are dynamically
> allocated via acpi_map_pxm_to_node(), so the assignment of node
> ids depends on the order in which each processor affinity structure
> is listed in the SRAT table. For example, suppose CPU0 belongs to
> package0 and CPU1 belongs to package1, but their entries are placed
> consecutively in the SRAT. In this case, the Proximity Domain of
> CPU0 would be mapped to node0 via acpi_map_pxm_to_node(), and CPU1's
> Proximity Domain would be assigned node1. The logic above would
> then treat them as belonging to the same package, even though they
> are physically in different packages. However, I believe such a
> scenario is unlikely to occur in practice in the BIOS, and if it
> happens it should be a BIOS bug, if I understand correctly.

May be a good idea to sanity check that the nodes in the first unit
cluster have the same package id and give a WARNING if that's not the
case.

Tim
On Thu, Feb 26, 2026 at 11:00:38AM -0800, Tim Chen wrote:
> May be a good idea to sanity check that the nodes in the first unit cluster
> has the same package id and give a WARNING if that's not the case.
But then we'd also have check the second cluster is another package. And
if we're checking that, we might as well check they're symmetric.
Is this sufficiently paranoid for you? :-)
---
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -61,6 +61,7 @@
#include <linux/cpuhotplug.h>
#include <linux/mc146818rtc.h>
#include <linux/acpi.h>
+#include <linux/once_lite.h>
#include <asm/acpi.h>
#include <asm/cacheinfo.h>
@@ -506,12 +507,58 @@ static void __init build_sched_topology(
}
#ifdef CONFIG_NUMA
+static bool slit_cluster_symmetric(int N)
+{
+ for (int k = 0; k < __num_nodes_per_package; k++) {
+ for (int l = k; l < __num_nodes_per_package; l++) {
+ if (node_distance(N + k, N + l) !=
+ node_distance(N + l, N + k))
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static u32 slit_cluster_package(int N)
+{
+ u32 pkg_id = ~0;
+
+ for (int n = 0; n < __num_nodes_per_package; n++) {
+ const struct cpumask *cpus = cpumask_of_node(N + n);
+ int cpu;
+
+ for_each_cpu(cpu, cpus) {
+ u32 id = topology_logical_package_id(cpu);
+ if (pkg_id == ~0)
+ pkg_id = id;
+ if (pkg_id != id)
+ return ~0;
+ }
+ }
+
+ return pkg_id;
+}
+
+/* If you NUMA_EMU on top of SNC, you get to keep the pieces */
+static void slit_validate(void)
+{
+ u32 pkg1 = slit_cluster_package(0);
+ u32 pkg2 = slit_cluster_package(__num_nodes_per_package);
+ WARN_ON(pkg1 == ~0 || pkg2 == ~0 || pkg1 == pkg2);
+
+ WARN_ON(!slit_cluster_symmetric(0));
+ WARN_ON(!slit_cluster_symmetric(__num_nodes_per_package));
+}
+
static int slit_cluster_distance(int i, int j)
{
int u = __num_nodes_per_package;
long d = 0;
int x, y;
+ DO_ONCE_LITE(slit_validate);
+
/*
* Is this a unit cluster on the trace?
*/
On Fri, 2026-02-27 at 14:01 +0100, Peter Zijlstra wrote:
> On Thu, Feb 26, 2026 at 11:00:38AM -0800, Tim Chen wrote:
> > May be a good idea to sanity check that the nodes in the first unit cluster
> > has the same package id and give a WARNING if that's not the case.
>
> But then we'd also have check the second cluster is another package. And
> if we're checking that, we might as well check they're symmetric.
>
> Is this sufficiently paranoid for you? :-)
>
Thanks. Looks pretty good and hopefully those warnings will
never be triggered.
Tim
>
> ---
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -61,6 +61,7 @@
> #include <linux/cpuhotplug.h>
> #include <linux/mc146818rtc.h>
> #include <linux/acpi.h>
> +#include <linux/once_lite.h>
>
> #include <asm/acpi.h>
> #include <asm/cacheinfo.h>
> @@ -506,12 +507,58 @@ static void __init build_sched_topology(
> }
>
> #ifdef CONFIG_NUMA
> +static bool slit_cluster_symmetric(int N)
> +{
> + for (int k = 0; k < __num_nodes_per_package; k++) {
> + for (int l = k; l < __num_nodes_per_package; l++) {
> + if (node_distance(N + k, N + l) !=
> + node_distance(N + l, N + k))
> + return false;
> + }
> + }
> +
> + return true;
> +}
> +
> +static u32 slit_cluster_package(int N)
> +{
> + u32 pkg_id = ~0;
> +
> + for (int n = 0; n < __num_nodes_per_package; n++) {
> + const struct cpumask *cpus = cpumask_of_node(N + n);
> + int cpu;
> +
> + for_each_cpu(cpu, cpus) {
> + u32 id = topology_logical_package_id(cpu);
> + if (pkg_id == ~0)
> + pkg_id = id;
> + if (pkg_id != id)
> + return ~0;
> + }
> + }
> +
> + return pkg_id;
> +}
> +
> +/* If you NUMA_EMU on top of SNC, you get to keep the pieces */
> +static void slit_validate(void)
> +{
> + u32 pkg1 = slit_cluster_package(0);
> + u32 pkg2 = slit_cluster_package(__num_nodes_per_package);
> + WARN_ON(pkg1 == ~0 || pkg2 == ~0 || pkg1 == pkg2);
> +
> + WARN_ON(!slit_cluster_symmetric(0));
> + WARN_ON(!slit_cluster_symmetric(__num_nodes_per_package));
> +}
> +
> static int slit_cluster_distance(int i, int j)
> {
> int u = __num_nodes_per_package;
> long d = 0;
> int x, y;
>
> + DO_ONCE_LITE(slit_validate);
> +
> /*
> * Is this a unit cluster on the trace?
> */
>
On 2/28/2026 3:23 AM, Tim Chen wrote:
> On Fri, 2026-02-27 at 14:01 +0100, Peter Zijlstra wrote:
>> On Thu, Feb 26, 2026 at 11:00:38AM -0800, Tim Chen wrote:
>>> May be a good idea to sanity check that the nodes in the first unit cluster
>>> has the same package id and give a WARNING if that's not the case.
>>
>> But then we'd also have check the second cluster is another package. And
>> if we're checking that, we might as well check they're symmetric.
>>
>> Is this sufficiently paranoid for you? :-)
>>
>
> Thanks. Looks pretty good and hopefully those warnings will
> never be triggered.
>
>> +
>> +/* If you NUMA_EMU on top of SNC, you get to keep the pieces */
>> +static void slit_validate(void)
>> +{
>> + u32 pkg1 = slit_cluster_package(0);
>> + u32 pkg2 = slit_cluster_package(__num_nodes_per_package);
>> + WARN_ON(pkg1 == ~0 || pkg2 == ~0 || pkg1 == pkg2);
>> +
>> + WARN_ON(!slit_cluster_symmetric(0));
>> + WARN_ON(!slit_cluster_symmetric(__num_nodes_per_package));
>> +}
>> +
Here we check packages0 and 1, should we check all the packages?
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8735f1968b00..91bac3e2e7fd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -543,12 +543,13 @@ static u32 slit_cluster_package(int N)
/* If you NUMA_EMU on top of SNC, you get to keep the pieces */
static void slit_validate(void)
{
- u32 pkg1 = slit_cluster_package(0);
- u32 pkg2 = slit_cluster_package(__num_nodes_per_package);
- WARN_ON(pkg1 == ~0 || pkg2 == ~0 || pkg1 == pkg2);
+ for (int pkg = 0; pkg < topology_max_packages(); pkg++) {
+ int node = pkg * __num_nodes_per_package;
+ u32 pkg_id = slit_cluster_package(node);
- WARN_ON(!slit_cluster_symmetric(0));
- WARN_ON(!slit_cluster_symmetric(__num_nodes_per_package));
+ WARN_ON(pkg_id == ~0);
+ WARN_ON(!slit_cluster_symmetric(node));
+ }
}
nt On Sat, Feb 28, 2026 at 03:35:26PM +0800, Chen, Yu C wrote:

> Here we check packages0 and 1, should we check all the packages?

Might as well I suppose.

Could you boot queue/x86/topo on an snc-3 machine to verify it doesn't
explode with all the paranoia on?
On Mon, 2026-03-02 at 17:43 +0100, Peter Zijlstra wrote:
> nt On Sat, Feb 28, 2026 at 03:35:26PM +0800, Chen, Yu C wrote:
>
> > Here we check packages0 and 1, should we check all the packages?
>
> Might as well I suppose.
>
> Could you boot queue/x86/topo on an snc-3 machine to verify it
> doesn't explode with all the paranoia on?

Hi, Peter,

regarding slit_validate() in
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=x86/topo&id=24ca94ac4b72803a7164b7ad84f06f0e9f0c49df

I suppose we want to use
	WARN_ON_ONCE(!slit_cluster_symmetric(n))
rather than
	WARN_ON_ONCE(slit_cluster_symmetric(n));
right?

I tested the queue/x86/topo plus above change:

1. on GNR 4 sockets with SNC2, no difference about sched_domains in
   /proc/schedstat compared with upstream 7.0-rc1, plus the below
   warning is also gone

   [   11.439633] ------------[ cut here ]------------
   [   11.440491] sched: Expect only up to 2 packages for GNR or CWF, but saw 4 packages when building sched domains.
   [   11.440493] WARNING: arch/x86/kernel/smpboot.c:574 at arch_sched_node_distance+0x133/0x140, CPU#0: swapper/0/1

2. on CWF 2 sockets with SNC3 and CWF 1 socket with SNC3, no
   difference about sched_domains in /proc/schedstat compared with
   upstream 7.0-rc1

So

Tested-by: Zhang Rui <rui.zhang@intel.com>

-rui
On Tue, Mar 03, 2026 at 02:31:37PM +0800, Zhang Rui wrote:
> On Mon, 2026-03-02 at 17:43 +0100, Peter Zijlstra wrote:
> > nt On Sat, Feb 28, 2026 at 03:35:26PM +0800, Chen, Yu C wrote:
> >
> > > Here we check packages0 and 1, should we check all the packages?
> >
> > Might as well I suppose.
> >
> > Could you boot queue/x86/topo on an snc-3 machine to verify it
> > doesn't explode with all the paranoia on?
>
> Hi, Peter,
>
> regarding slit_validate() in
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=x86/topo&id=24ca94ac4b72803a7164b7ad84f06f0e9f0c49df
>
> I suppose we want to use
> 	WARN_ON_ONCE(!slit_cluster_symmetric(n))
> rather than
> 	WARN_ON_ONCE(slit_cluster_symmetric(n));
> right?

D'0h yes, very much so! Thanks!

Let me go expand the Changelogs some post and then probably stick it
in tip somewhere.
On 3/3/2026 2:31 PM, Zhang Rui wrote:
> On Mon, 2026-03-02 at 17:43 +0100, Peter Zijlstra wrote:
>> nt On Sat, Feb 28, 2026 at 03:35:26PM +0800, Chen, Yu C wrote:
>>
>>> Here we check packages0 and 1, should we check all the packages?
>>
>> Might as well I suppose.
>>
>> Could you boot queue/x86/topo on an snc-3 machine to verify it
>> doesn't
>> explode with all the paranoia on?
>
> Hi, Peter,
>
> regarding slit_validate() in
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=x86/topo&id=24ca94ac4b72803a7164b7ad84f06f0e9f0c49df
>
> I suppose we want to use
> WARN_ON_ONCE(!slit_cluster_symmetric(n))
> rather than
> WARN_ON_ONCE(slit_cluster_symmetric(n));
With Rui's fix, tested on GNR ,2 sockets, SNC3, NUMA distance symmetric
platform,
it works as expected,
smp: Brought up 6 nodes, 384 CPUs
domain-0: span=0,192 level=SMT
domain-1: span=0-31,192-223 level=MC
domain-2: span=0-95,192-287 level=NUMA
groups: 0:{ span=0-31,192-223 cap=65536 }, 32:{ span=32-63,224-255
cap=65536 }, 64:{ span=64-95,256-287 cap=65536 }
domain-3: span=0-383 level=NUMA
groups: 0:{ span=0-95,192-287 cap=196608 }, 96:{
span=96-191,288-383 cap=196608 }
resctrl also detects SNC3
root@a4bf018d7604:/sys/fs/resctrl/mon_data# tree mon_L3_00/
mon_L3_00/
├── llc_occupancy
├── mbm_local_bytes
├── mbm_total_bytes
├── mon_sub_L3_00
│ ├── llc_occupancy
│ ├── mbm_local_bytes
│ └── mbm_total_bytes
├── mon_sub_L3_01
│ ├── llc_occupancy
│ ├── mbm_local_bytes
│ └── mbm_total_bytes
└── mon_sub_L3_02
├── llc_occupancy
├── mbm_local_bytes
└── mbm_total_bytes
> So
>
> Tested-by: Zhang Rui <rui.zhang@intel.com>
>
For this series,
Tested-by: Chen Yu <yu.c.chen@intel.com>
thanks,
Chenyu
On Thu, 2026-02-26 at 11:00 -0800, Tim Chen wrote:
> On Fri, 2026-02-27 at 01:07 +0800, Chen, Yu C wrote:
> > Hi Peter,
> >
> > On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > > + int u = __num_nodes_per_package;
> >
> > Yes, this is much simpler, thanks for the patch!
> >
> > > + long d = 0;
> > > + int x, y;
> > > +
> > > + /*
> > > + * Is this a unit cluster on the trace?
> > > + */
> > > + if ((i / u) == (j / u))
> > > + return node_distance(i, j);
> >
> > If the number of nodes per package is 3, we assume that
> > every 3 consecutive nodes are SNC siblings (on the same
> > trace):node0, node1, and node2 are SNC siblings, while
> > node3, node4, and node5 form another group of SNC siblings.
> >
> > I have a curious thought: could it be possible that
> > node0, node2, and node4 are SNC siblings, and node1,
> > node3, and node5 are another set of SNC siblings instead?
> >
> > Then I studied the code a little more, node ids are dynamically
> > allocated via the acpi_map_pxm_to_node, so the assignment of node
> > ids depends on the order in which each processor affinity structure
> > is listed in the SRAT table. For example, suppose CPU0 belongs to
> > package0 and CPU1 belongs to package1, but their entries are placed
> > consecutively in the SRAT. In this case, the Proximity Domain of
> > CPU0 would be mapped to node0 via acpi_map_pxm_to_node, and CPU1’s
> > Proximity Domain would be assigned node1. The logic above would
> > then treat them as belonging to the same package, even though they
> > are physically in different packages. However, I believe such a
> > scenario is unlikely to occur in practice in the BIOS and if it
> > happens it should be a BIOS bug if I understand correctly.
> >
> >
>
> May be a good idea to sanity check that the nodes in the first unit cluster
> has the same package id and give a WARNING if that's not the case.
>
Perhaps something like below
Tim
---
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d97f8f4e014c..38384ea5253a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -511,6 +511,24 @@ static int slit_cluster_distance(int i, int j)
int u = __num_nodes_per_package;
long d = 0;
int x, y;
+ static int valid_slit = 0;
+
+ if (valid_slit == -1)
+ return node_distance(i, j);
+
+ if (valid_slit == 0) {
+ /* Check first nodes in package are grouped together consecutively */
+ for (x = 0; x < u-1 ; x++) {
+ if (topology_physical_package_id(x) !=
+ topology_physical_package_id(x+1)) {
+ pr_warn("Expect nodes %d and %d to be in the same package",
+ x, x+1);
+ valid_slit = -1;
+ return node_distance(i, j);
+ }
+ }
+ valid_slit = 1;
+ }
/*
* Is this a unit cluster on the trace?
On Thu, 2026-02-26 at 14:11 -0800, Tim Chen wrote:
> On Thu, 2026-02-26 at 11:00 -0800, Tim Chen wrote:
> > On Fri, 2026-02-27 at 01:07 +0800, Chen, Yu C wrote:
> > > Hi Peter,
> > >
> > > On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > > > + int u = __num_nodes_per_package;
> > >
> > > Yes, this is much simpler, thanks for the patch!
> > >
> > > > + long d = 0;
> > > > + int x, y;
> > > > +
> > > > + /*
> > > > + * Is this a unit cluster on the trace?
> > > > + */
> > > > + if ((i / u) == (j / u))
> > > > + return node_distance(i, j);
> > >
> > > If the number of nodes per package is 3, we assume that
> > > every 3 consecutive nodes are SNC siblings (on the same
> > > trace):node0, node1, and node2 are SNC siblings, while
> > > node3, node4, and node5 form another group of SNC siblings.
> > >
> > > I have a curious thought: could it be possible that
> > > node0, node2, and node4 are SNC siblings, and node1,
> > > node3, and node5 are another set of SNC siblings instead?
> > >
> > > Then I studied the code a little more, node ids are dynamically
> > > allocated via the acpi_map_pxm_to_node, so the assignment of node
> > > ids depends on the order in which each processor affinity structure
> > > is listed in the SRAT table. For example, suppose CPU0 belongs to
> > > package0 and CPU1 belongs to package1, but their entries are placed
> > > consecutively in the SRAT. In this case, the Proximity Domain of
> > > CPU0 would be mapped to node0 via acpi_map_pxm_to_node, and CPU1’s
> > > Proximity Domain would be assigned node1. The logic above would
> > > then treat them as belonging to the same package, even though they
> > > are physically in different packages. However, I believe such a
> > > scenario is unlikely to occur in practice in the BIOS and if it
> > > happens it should be a BIOS bug if I understand correctly.
> > >
> > >
> >
> > May be a good idea to sanity check that the nodes in the first unit cluster
> > has the same package id and give a WARNING if that's not the case.
> >
> Perhaps something like below
>
> Tim
>
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index d97f8f4e014c..38384ea5253a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -511,6 +511,24 @@ static int slit_cluster_distance(int i, int j)
> int u = __num_nodes_per_package;
> long d = 0;
> int x, y;
> + static int valid_slit = 0;
> +
> + if (valid_slit == -1)
> + return node_distance(i, j);
> +
> + if (valid_slit == 0) {
> + /* Check first nodes in package are grouped together consecutively */
> + for (x = 0; x < u-1 ; x++) {
> + if (topology_physical_package_id(x) !=
> + topology_physical_package_id(x+1)) {
This won't work because topology_physical_package_id() takes cpu
as argument. Will need to find the first cpu of x and x+1 and pass to it.
> + pr_warn("Expect nodes %d and %d to be in the same package",
> + x, x+1);
> + valid_slit = -1;
> + return node_distance(i, j);
> + }
> + }
> + valid_slit = 1;
> + }
>
> /*
> * Is this a unit cluster on the trace?
>