[PATCH 1/2] sched: topology: Fix topology validation error

Tim Chen posted 2 patches 1 month, 1 week ago
[PATCH 1/2] sched: topology: Fix topology validation error
Posted by Tim Chen 1 month, 1 week ago
From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

As sd_numa_mask() (the function behind tl->mask() for the NUMA levels
of the topology) depends on the value of sched_domains_curr_level,
it's possible to be iterating over a level while, sd_numa_mask()
thinks we are in another, causing the topology validation to fail (for
valid cases).

Set sched_domains_curr_level to the current topology level while
iterating.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/topology.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 977e133bb8a4..9a7ac67e3d63 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2394,6 +2394,14 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
 	for_each_sd_topology(tl) {
 		int tl_common_flags = 0;
 
+#ifdef CONFIG_NUMA
+		/*
+		 * sd_numa_mask() (one of the possible values of
+		 * tl->mask()) depends on the current level to work
+		 * correctly.
+		 */
+		sched_domains_curr_level = tl->numa_level;
+#endif
 		if (tl->sd_flags)
 			tl_common_flags = (*tl->sd_flags)();
 
-- 
2.32.0
Re: [PATCH 1/2] sched: topology: Fix topology validation error
Posted by Peter Zijlstra 1 month, 1 week ago
On Fri, Aug 22, 2025 at 01:14:14PM -0700, Tim Chen wrote:
> From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> 
> As sd_numa_mask() (the function behind tl->mask() for the NUMA levels
> of the topology) depends on the value of sched_domains_curr_level,
> it's possible to be iterating over a level while, sd_numa_mask()
> thinks we are in another, causing the topology validation to fail (for
> valid cases).
> 
> Set sched_domains_curr_level to the current topology level while
> iterating.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/topology.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 977e133bb8a4..9a7ac67e3d63 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2394,6 +2394,14 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
>  	for_each_sd_topology(tl) {
>  		int tl_common_flags = 0;
>  
> +#ifdef CONFIG_NUMA
> +		/*
> +		 * sd_numa_mask() (one of the possible values of
> +		 * tl->mask()) depends on the current level to work
> +		 * correctly.
> +		 */

This is propagating that ugly hack from sd_init(), isn't it. Except its
pretending like its sane code... And for what?

> +		sched_domains_curr_level = tl->numa_level;
> +#endif
>  		if (tl->sd_flags)
>  			tl_common_flags = (*tl->sd_flags)();
>  
		if (tl_common_flags & SD_NUMA)
			continue;

So how does this make any difference ?

We should never get to calling tl->mask() for NUMA.
Re: [PATCH 1/2] sched: topology: Fix topology validation error
Posted by Tim Chen 1 month, 1 week ago
On Mon, 2025-08-25 at 09:25 +0200, Peter Zijlstra wrote:
> On Fri, Aug 22, 2025 at 01:14:14PM -0700, Tim Chen wrote:
> > From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > 
> > As sd_numa_mask() (the function behind tl->mask() for the NUMA levels
> > of the topology) depends on the value of sched_domains_curr_level,
> > it's possible to be iterating over a level while, sd_numa_mask()
> > thinks we are in another, causing the topology validation to fail (for
> > valid cases).
> > 
> > Set sched_domains_curr_level to the current topology level while
> > iterating.
> > 
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
> >  kernel/sched/topology.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 977e133bb8a4..9a7ac67e3d63 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -2394,6 +2394,14 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
> >  	for_each_sd_topology(tl) {
> >  		int tl_common_flags = 0;
> >  
> > +#ifdef CONFIG_NUMA
> > +		/*
> > +		 * sd_numa_mask() (one of the possible values of
> > +		 * tl->mask()) depends on the current level to work
> > +		 * correctly.
> > +		 */
> 
> This is propagating that ugly hack from sd_init(), isn't it. Except its
> pretending like its sane code... And for what?

How about the following fix for the CONFIG_NUMA case?  Will this be more sane? 

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b958fe48e020..a92457fed135 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1758,7 +1758,7 @@ static struct sched_domain_topology_level *sched_domain_topology =
 static struct sched_domain_topology_level *sched_domain_topology_saved;
 
 #define for_each_sd_topology(tl)                       \
-       for (tl = sched_domain_topology; tl->mask; tl++)
+       for (tl = sched_domain_topology; tl->mask; ++tl, sched_domains_curr_level = tl->numa_level)
 
 void __init set_sched_topology(struct sched_domain_topology_level *tl)
 {


> 
> > +		sched_domains_curr_level = tl->numa_level;
> > +#endif
> >  		if (tl->sd_flags)
> >  			tl_common_flags = (*tl->sd_flags)();
> >  
> 		if (tl_common_flags & SD_NUMA)
> 			continue;
> 
> So how does this make any difference ?
> 
> We should never get to calling tl->mask() for NUMA.
> 

True.  I think we originally was fixing the v6.16 case which
wasn't checking for the SD_NUMA flag.  Overlooked that when we ported
the fix.

That said, I think that the for_each_sd_topology() macro needs to have
sched_domains_curr_level updated to prevent future problems.

Tim
Re: [PATCH 1/2] sched: topology: Fix topology validation error
Posted by K Prateek Nayak 1 month, 1 week ago
Hello Tim, Vinicius,

On 8/23/2025 1:44 AM, Tim Chen wrote:
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2394,6 +2394,14 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
>  	for_each_sd_topology(tl) {
>  		int tl_common_flags = 0;
>  
> +#ifdef CONFIG_NUMA
> +		/*
> +		 * sd_numa_mask() (one of the possible values of
> +		 * tl->mask()) depends on the current level to work
> +		 * correctly.
> +		 */
> +		sched_domains_curr_level = tl->numa_level;
> +#endif

A similar solution was proposed in [1] and after a few iterations, we
arrived at [2] as a potential solution to this issue. Now that the merge
window is behind us, Peter would it be possible to pick one of these up?

P.S. Leon has confirmed this solved the splat of their deployments too
on an earlier version [3].

[1] https://lore.kernel.org/lkml/20250624041235.1589-1-kprateek.nayak@amd.com/
[2] https://lore.kernel.org/lkml/20250715040824.893-1-kprateek.nayak@amd.com/
[3] https://lore.kernel.org/lkml/20250720104136.GI402218@unreal/

>  		if (tl->sd_flags)
>  			tl_common_flags = (*tl->sd_flags)();
>  

-- 
Thanks and Regards,
Prateek
Re: [PATCH 1/2] sched: topology: Fix topology validation error
Posted by Peter Zijlstra 1 month, 1 week ago
On Mon, Aug 25, 2025 at 08:48:29AM +0530, K Prateek Nayak wrote:
> Hello Tim, Vinicius,
> 
> On 8/23/2025 1:44 AM, Tim Chen wrote:
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -2394,6 +2394,14 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
> >  	for_each_sd_topology(tl) {
> >  		int tl_common_flags = 0;
> >  
> > +#ifdef CONFIG_NUMA
> > +		/*
> > +		 * sd_numa_mask() (one of the possible values of
> > +		 * tl->mask()) depends on the current level to work
> > +		 * correctly.
> > +		 */
> > +		sched_domains_curr_level = tl->numa_level;
> > +#endif
> 
> A similar solution was proposed in [1] and after a few iterations, we
> arrived at [2] as a potential solution to this issue. Now that the merge
> window is behind us, Peter would it be possible to pick one of these up?
> 
> P.S. Leon has confirmed this solved the splat of their deployments too
> on an earlier version [3].
> 
> [1] https://lore.kernel.org/lkml/20250624041235.1589-1-kprateek.nayak@amd.com/
> [2] https://lore.kernel.org/lkml/20250715040824.893-1-kprateek.nayak@amd.com/
> [3] https://lore.kernel.org/lkml/20250720104136.GI402218@unreal/

I'm sure that's stuck somewhere in my holiday backlog ... Let me go try
and find it.
Re: [PATCH 1/2] sched: topology: Fix topology validation error
Posted by Peter Zijlstra 1 month, 1 week ago
On Mon, Aug 25, 2025 at 09:58:07AM +0200, Peter Zijlstra wrote:
> On Mon, Aug 25, 2025 at 08:48:29AM +0530, K Prateek Nayak wrote:
> > Hello Tim, Vinicius,
> > 
> > On 8/23/2025 1:44 AM, Tim Chen wrote:
> > > --- a/kernel/sched/topology.c
> > > +++ b/kernel/sched/topology.c
> > > @@ -2394,6 +2394,14 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
> > >  	for_each_sd_topology(tl) {
> > >  		int tl_common_flags = 0;
> > >  
> > > +#ifdef CONFIG_NUMA
> > > +		/*
> > > +		 * sd_numa_mask() (one of the possible values of
> > > +		 * tl->mask()) depends on the current level to work
> > > +		 * correctly.
> > > +		 */
> > > +		sched_domains_curr_level = tl->numa_level;
> > > +#endif
> > 
> > A similar solution was proposed in [1] and after a few iterations, we
> > arrived at [2] as a potential solution to this issue. Now that the merge
> > window is behind us, Peter would it be possible to pick one of these up?
> > 
> > P.S. Leon has confirmed this solved the splat of their deployments too
> > on an earlier version [3].
> > 
> > [1] https://lore.kernel.org/lkml/20250624041235.1589-1-kprateek.nayak@amd.com/
> > [2] https://lore.kernel.org/lkml/20250715040824.893-1-kprateek.nayak@amd.com/
> > [3] https://lore.kernel.org/lkml/20250720104136.GI402218@unreal/
> 
> I'm sure that's stuck somewhere in my holiday backlog ... Let me go try
> and find it.

Replied there.

  https://lkml.kernel.org/r/20250825091910.GT3245006@noisy.programming.kicks-ass.net