[PATCH 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled

Li Chen posted 2 patches 3 months, 2 weeks ago
There is a newer version of this series
[PATCH 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
Posted by Li Chen 3 months, 2 weeks ago
From: Li Chen <chenl311@chinatelecom.cn>

Currently, the SMT domain is added into sched_domain_topology
by default if CONFIG_SCHED_SMT is enabled.

If cpu_attach_domain finds that the CPU SMT domain’s cpumask_weight
is just 1, it will destroy_sched_domain it.

On a large machine, such as one with 512 cores, this results in
512 redundant domain attach/destroy operations.

We can avoid these unnecessary operations by simply checking
cpu_smt_num_threads and not inserting SMT domain into x86_topology if SMT
is not enabled.

Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
---
 arch/x86/kernel/smpboot.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7d202f9785362..9ff8b10715cc1 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -492,8 +492,24 @@ static struct sched_domain_topology_level x86_topology[] = {
 	{ NULL },
 };
 
+static void __init maybe_remove_smt_level(void)
+{
+	if (cpu_smt_num_threads <= 1) {
+		/*
+		 * SMT level is x86_topology[0].  Shift the array left by one,
+		 * keep the sentinel { NULL } at the end.
+		 */
+		memmove(&x86_topology[0], &x86_topology[1],
+			sizeof(x86_topology) - sizeof(x86_topology[0]));
+		memset(&x86_topology[ARRAY_SIZE(x86_topology) - 1], 0,
+		       sizeof(x86_topology[0]));
+	}
+}
+
 static void __init build_sched_topology(void)
 {
+	maybe_remove_smt_level();
+
 	/*
 	 * When there is NUMA topology inside the package invalidate the
 	 * PKG domain since the NUMA domains will auto-magically create the
-- 
2.49.0

Re: [PATCH 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
Posted by Thomas Gleixner 3 months, 2 weeks ago
On Tue, Jun 24 2025 at 16:08, Li Chen wrote:
> From: Li Chen <chenl311@chinatelecom.cn>
>
> Currently, the SMT domain is added into sched_domain_topology
> by default if CONFIG_SCHED_SMT is enabled.
>
> If cpu_attach_domain finds that the CPU SMT domain’s cpumask_weight

If cpu_attach_domain()

IIRC, I told you that before.

> is just 1, it will destroy_sched_domain it.
>
> On a large machine, such as one with 512 cores, this results in
> 512 redundant domain attach/destroy operations.
>
> We can avoid these unnecessary operations by simply checking

s/We can avoid/Avoid/

Care to read my reviews? If you disagree, then discuss it with me, but
silently ignoring it them is not an option.

> cpu_smt_num_threads and not inserting SMT domain into x86_topology if SMT

not inserting? That's not what this new version does.

> +static void __init maybe_remove_smt_level(void)
> +{
> +	if (cpu_smt_num_threads <= 1) {
> +		/*
> +		 * SMT level is x86_topology[0].  Shift the array left by one,
> +		 * keep the sentinel { NULL } at the end.
> +		 */
> +		memmove(&x86_topology[0], &x86_topology[1],
> +			sizeof(x86_topology) - sizeof(x86_topology[0]));
> +		memset(&x86_topology[ARRAY_SIZE(x86_topology) - 1], 0,
> +		       sizeof(x86_topology[0]));

So this sets the last entry in the array, aka the original sentinel in
the last array entry, to zero...

This is completely pointless. The above memmove() copies

	topo[1 .. (N - 1)] 
to
	topo[0 .. (N - 2)]

Where N = ARRAY_SIZE(topo).

Therefore
        topo[N - 1] == NULL
and
        topo[N - 2] == NULL

No?

But then what's worse is that you fail to take that removal into account
for the x86_has_numa_in_package case, which still unconditionally sets
topo[N - 2] to zero even if the SMT level had been removed...

Please take your time and do not rush out half baked stuff.

Thanks,

        tglx
Re: [PATCH 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
Posted by Li Chen 3 months, 2 weeks ago
Hi Thomas. 

 ---- On Tue, 24 Jun 2025 21:36:10 +0800  Thomas Gleixner <tglx@linutronix.de> wrote --- 
 > On Tue, Jun 24 2025 at 16:08, Li Chen wrote:
 > > From: Li Chen <chenl311@chinatelecom.cn>
 > >
 > > Currently, the SMT domain is added into sched_domain_topology
 > > by default if CONFIG_SCHED_SMT is enabled.
 > >
 > > If cpu_attach_domain finds that the CPU SMT domain’s cpumask_weight
 > 
 > If cpu_attach_domain()
 > 
 > IIRC, I told you that before.
 > 
 > > is just 1, it will destroy_sched_domain it.
 > >
 > > On a large machine, such as one with 512 cores, this results in
 > > 512 redundant domain attach/destroy operations.
 > >
 > > We can avoid these unnecessary operations by simply checking
 > 
 > s/We can avoid/Avoid/
 > 
 > Care to read my reviews? If you disagree, then discuss it with me, but

I'm sorry that I forget to say that your previous wording review have already been fixed in v2 https://lore.kernel.org/all/20250624085559.69436-3-me@linux.beauty/

And I would replace cpu_attach_domain with cpu_attach_domain().

Sorry for wasting your time. 

 > silently ignoring it them is not an option.
 > 
 > > cpu_smt_num_threads and not inserting SMT domain into x86_topology if SMT
 > 
 > not inserting? That's not what this new version does.
 > 
 > > +static void __init maybe_remove_smt_level(void)
 > > +{
 > > +    if (cpu_smt_num_threads <= 1) {
 > > +        /*
 > > +         * SMT level is x86_topology[0].  Shift the array left by one,
 > > +         * keep the sentinel { NULL } at the end.
 > > +         */
 > > +        memmove(&x86_topology[0], &x86_topology[1],
 > > +            sizeof(x86_topology) - sizeof(x86_topology[0]));
 > > +        memset(&x86_topology[ARRAY_SIZE(x86_topology) - 1], 0,
 > > +               sizeof(x86_topology[0]));
 > 
 > So this sets the last entry in the array, aka the original sentinel in
 > the last array entry, to zero...
 > 
 > This is completely pointless. The above memmove() copies
 > 
 >     topo[1 .. (N - 1)] 
 > to
 >     topo[0 .. (N - 2)]
 > 
 > Where N = ARRAY_SIZE(topo).
 > 
 > Therefore
 >         topo[N - 1] == NULL
 > and
 >         topo[N - 2] == NULL
 > 
 > No?
 > 
 > But then what's worse is that you fail to take that removal into account
 > for the x86_has_numa_in_package case, which still unconditionally sets
 > topo[N - 2] to zero even if the SMT level had been removed...
 > 
 > Please take your time and do not rush out half baked stuff.
 
Sorry again for my mistake. I will fix it in v3.

Regards,
Li