[PATCH v1] cpu: Add missing check to cpuhp_smt_enable()

Rafael J. Wysocki posted 1 patch 1 month ago
kernel/cpu.c |    6 ++++++
1 file changed, 6 insertions(+)
[PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Rafael J. Wysocki 1 month ago
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
"dead" SMT siblings during initialization") broke the use case in
which both nosmt and maxcpus were added to the kernel command line
because it caused CPUs that were not SMT siblings to be brought
online during the intel_idle driver initialization in violation of the
maxcpus limit.

The underlying reason for this is a missing topology_is_primary_thread()
check in cpuhp_smt_enable() which causes that function to put online
more CPUs than just the SMT siblings that it is supposed to handle.

Add the missing check to address the issue.

Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
Reported-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/cpu.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2710,6 +2710,12 @@
 	cpu_maps_update_begin();
 	cpu_smt_control = CPU_SMT_ENABLED;
 	for_each_present_cpu(cpu) {
+		/*
+		 * Avoid accidentally onlining primary thread CPUs that have
+		 * been taken offline.
+		 */
+		if (topology_is_primary_thread(cpu))
+			continue;
 		/* Skip online CPUs and CPUs on offline nodes */
 		if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
 			continue;
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Thomas Gleixner 4 weeks ago
On Fri, Aug 29 2025 at 22:01, Rafael J. Wysocki wrote:
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2710,6 +2710,12 @@
>  	cpu_maps_update_begin();
>  	cpu_smt_control = CPU_SMT_ENABLED;
>  	for_each_present_cpu(cpu) {
> +		/*
> +		 * Avoid accidentally onlining primary thread CPUs that have
> +		 * been taken offline.
> +		 */
> +		if (topology_is_primary_thread(cpu))
> +			continue;

Hmm. That's kinda solving the problem, but I think the proper solution
would be to implement topology_is_core_online() for x86.

The above is preventing the primary thread to be onlined, but then
onlines the corresponding hyperthread, which does not really make sense.

Something like the completely untested below.

Thanks,

        tglx
---
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 6c79ee7c0957..21041898157a 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -231,6 +231,16 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
 }
 #define topology_is_primary_thread topology_is_primary_thread
 
+int topology_get_primary_thread(unsigned int cpu);
+
+static inline bool topology_is_core_online(unsigned int cpu)
+{
+	int pcpu = topology_get_primary_thread(cpu);
+
+	return pcpu >= 0 ? cpu_online(pcpu) : false;
+}
+#define topology_is_core_online topology_is_core_online
+
 #else /* CONFIG_SMP */
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index e35ccdc84910..6073a16628f9 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_uni
 	return topo_unit_count(lvlid, at_level, apic_maps[which_units].map);
 }
 
+#ifdef CONFIG_SMP
+int topology_get_primary_thread(unsigned int cpu)
+{
+	u32 apic_id = cpuid_to_apicid[cpu];
+
+	/*
+	 * Get the core domain level APIC id, which is the primary thread
+	 * and return the CPU number assigned to it.
+	 */
+	return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN));
+}
+#endif
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /**
  * topology_hotplug_apic - Handle a physical hotplugged APIC after boot
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Rafael J. Wysocki 3 weeks, 6 days ago
On Fri, Sep 5, 2025 at 9:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, Aug 29 2025 at 22:01, Rafael J. Wysocki wrote:
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2710,6 +2710,12 @@
> >       cpu_maps_update_begin();
> >       cpu_smt_control = CPU_SMT_ENABLED;
> >       for_each_present_cpu(cpu) {
> > +             /*
> > +              * Avoid accidentally onlining primary thread CPUs that have
> > +              * been taken offline.
> > +              */
> > +             if (topology_is_primary_thread(cpu))
> > +                     continue;
>
> Hmm. That's kinda solving the problem, but I think the proper solution
> would be to implement topology_is_core_online() for x86.
>
> The above is preventing the primary thread to be onlined, but then
> onlines the corresponding hyperthread, which does not really make sense.

Well, manual online can be used for onlining the secondary thread of a
core where the primary thread is offline, so this is technically
possible already.

> Something like the completely untested below.

So given the above, shouldn't topology_is_core_online() check if any
thread in the given core is online?

> ---
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index 6c79ee7c0957..21041898157a 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -231,6 +231,16 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
>  }
>  #define topology_is_primary_thread topology_is_primary_thread
>
> +int topology_get_primary_thread(unsigned int cpu);
> +
> +static inline bool topology_is_core_online(unsigned int cpu)
> +{
> +       int pcpu = topology_get_primary_thread(cpu);
> +
> +       return pcpu >= 0 ? cpu_online(pcpu) : false;
> +}
> +#define topology_is_core_online topology_is_core_online
> +
>  #else /* CONFIG_SMP */
>  static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
>  static inline int topology_max_smt_threads(void) { return 1; }
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index e35ccdc84910..6073a16628f9 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_uni
>         return topo_unit_count(lvlid, at_level, apic_maps[which_units].map);
>  }
>
> +#ifdef CONFIG_SMP
> +int topology_get_primary_thread(unsigned int cpu)
> +{
> +       u32 apic_id = cpuid_to_apicid[cpu];
> +
> +       /*
> +        * Get the core domain level APIC id, which is the primary thread
> +        * and return the CPU number assigned to it.
> +        */
> +       return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN));
> +}
> +#endif
> +
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>  /**
>   * topology_hotplug_apic - Handle a physical hotplugged APIC after boot
>
>
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Rafael J. Wysocki 3 weeks, 6 days ago
On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Sep 5, 2025 at 9:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Fri, Aug 29 2025 at 22:01, Rafael J. Wysocki wrote:
> > > --- a/kernel/cpu.c
> > > +++ b/kernel/cpu.c
> > > @@ -2710,6 +2710,12 @@
> > >       cpu_maps_update_begin();
> > >       cpu_smt_control = CPU_SMT_ENABLED;
> > >       for_each_present_cpu(cpu) {
> > > +             /*
> > > +              * Avoid accidentally onlining primary thread CPUs that have
> > > +              * been taken offline.
> > > +              */
> > > +             if (topology_is_primary_thread(cpu))
> > > +                     continue;
> >
> > Hmm. That's kinda solving the problem, but I think the proper solution
> > would be to implement topology_is_core_online() for x86.
> >
> > The above is preventing the primary thread to be onlined, but then
> > onlines the corresponding hyperthread, which does not really make sense.
>
> Well, manual online can be used for onlining the secondary thread of a
> core where the primary thread is offline, so this is technically
> possible already.
>
> > Something like the completely untested below.
>
> So given the above, shouldn't topology_is_core_online() check if any
> thread in the given core is online?

Besides, this would cause the siblings of offline SMT threads to be
skipped while enabling SMT via sysfs (using
/sys/devices/system/cpu/smt/control), but I'm not sure if this is the
expectation in the field today.  The current behavior is to online all
secondary SMT threads (and more, but that part is quite arguably
broken).

That's why I decided to use a simpler patch which doesn't go that far.
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Thomas Gleixner 3 weeks, 6 days ago
On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote:
> On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> Well, manual online can be used for onlining the secondary thread of a
>> core where the primary thread is offline, so this is technically
>> possible already.
>>
>> > Something like the completely untested below.
>>
>> So given the above, shouldn't topology_is_core_online() check if any
>> thread in the given core is online?
>
> Besides, this would cause the siblings of offline SMT threads to be
> skipped while enabling SMT via sysfs (using
> /sys/devices/system/cpu/smt/control), but I'm not sure if this is the
> expectation in the field today.  The current behavior is to online all
> secondary SMT threads (and more, but that part is quite arguably
> broken).

It is broken, because the initial logic is to bring up primary threads
unconditionally and then refuse to bring up sibling threads.

With "maxcpus=xxx" this obviously limits the amount of primary threads,
so there is arguably no point to online any of the related secondary
threads of them.

The initial implementation was naively making that assumption, but the
core check which was added due to PPC made this actually correct.

It just did not snap with me back then, but it's actually the correct
thing to do, no?

Thanks,

        tglx
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Rafael J. Wysocki 3 weeks, 6 days ago
On Fri, Sep 5, 2025 at 10:47 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote:
> > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> Well, manual online can be used for onlining the secondary thread of a
> >> core where the primary thread is offline, so this is technically
> >> possible already.
> >>
> >> > Something like the completely untested below.
> >>
> >> So given the above, shouldn't topology_is_core_online() check if any
> >> thread in the given core is online?
> >
> > Besides, this would cause the siblings of offline SMT threads to be
> > skipped while enabling SMT via sysfs (using
> > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the
> > expectation in the field today.  The current behavior is to online all
> > secondary SMT threads (and more, but that part is quite arguably
> > broken).
>
> It is broken, because the initial logic is to bring up primary threads
> unconditionally and then refuse to bring up sibling threads.
>
> With "maxcpus=xxx" this obviously limits the amount of primary threads,
> so there is arguably no point to online any of the related secondary
> threads of them.
>
> The initial implementation was naively making that assumption, but the
> core check which was added due to PPC made this actually correct.
>
> It just did not snap with me back then, but it's actually the correct
> thing to do, no?

It would at least be consistent with the existing PPC behavior. :-)
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Thomas Gleixner 3 weeks, 4 days ago
On Fri, Sep 05 2025 at 22:56, Rafael J. Wysocki wrote:
> On Fri, Sep 5, 2025 at 10:47 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote:
>> > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> >> Well, manual online can be used for onlining the secondary thread of a
>> >> core where the primary thread is offline, so this is technically
>> >> possible already.
>> >>
>> >> > Something like the completely untested below.
>> >>
>> >> So given the above, shouldn't topology_is_core_online() check if any
>> >> thread in the given core is online?
>> >
>> > Besides, this would cause the siblings of offline SMT threads to be
>> > skipped while enabling SMT via sysfs (using
>> > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the
>> > expectation in the field today.  The current behavior is to online all
>> > secondary SMT threads (and more, but that part is quite arguably
>> > broken).
>>
>> It is broken, because the initial logic is to bring up primary threads
>> unconditionally and then refuse to bring up sibling threads.
>>
>> With "maxcpus=xxx" this obviously limits the amount of primary threads,
>> so there is arguably no point to online any of the related secondary
>> threads of them.
>>
>> The initial implementation was naively making that assumption, but the
>> core check which was added due to PPC made this actually correct.
>>
>> It just did not snap with me back then, but it's actually the correct
>> thing to do, no?
>
> It would at least be consistent with the existing PPC behavior. :-)

Correct.
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Rafael J. Wysocki 3 weeks, 1 day ago
On Sun, Sep 7, 2025 at 3:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, Sep 05 2025 at 22:56, Rafael J. Wysocki wrote:
> > On Fri, Sep 5, 2025 at 10:47 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>
> >> On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote:
> >> > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> >> Well, manual online can be used for onlining the secondary thread of a
> >> >> core where the primary thread is offline, so this is technically
> >> >> possible already.
> >> >>
> >> >> > Something like the completely untested below.
> >> >>
> >> >> So given the above, shouldn't topology_is_core_online() check if any
> >> >> thread in the given core is online?
> >> >
> >> > Besides, this would cause the siblings of offline SMT threads to be
> >> > skipped while enabling SMT via sysfs (using
> >> > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the
> >> > expectation in the field today.  The current behavior is to online all
> >> > secondary SMT threads (and more, but that part is quite arguably
> >> > broken).
> >>
> >> It is broken, because the initial logic is to bring up primary threads
> >> unconditionally and then refuse to bring up sibling threads.
> >>
> >> With "maxcpus=xxx" this obviously limits the amount of primary threads,
> >> so there is arguably no point to online any of the related secondary
> >> threads of them.
> >>
> >> The initial implementation was naively making that assumption, but the
> >> core check which was added due to PPC made this actually correct.
> >>
> >> It just did not snap with me back then, but it's actually the correct
> >> thing to do, no?
> >
> > It would at least be consistent with the existing PPC behavior. :-)
>
> Correct.

So are you going to send a patch or do you want me to do something?

From a user standpoint, this issue is a regression in 6.16, so it
would be good to address it before final 6.17.
[PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression
Posted by Thomas Gleixner 1 week, 5 days ago
Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT
siblings during initialization") broke the use case in which both 'nosmt'
and 'maxcpus' are on the kernel command line because it onlines primary
threads, which were offline due to the maxcpus limit.

The initially proposed fix to skip primary threads in the loop is
inconsistent. While it prevents the primary thread to be onlined, it then
onlines the corresponding hyperthread(s), which does not really make sense.

The CPU iterator in cpuhp_smt_enable() contains a check which excludes all
threads of a core, when the primary thread is offline. The default
implementation is a NOOP and therefore not effective on x86.

Implement topology_is_core_online() on x86 to address this issue. This
makes the behaviour consistent between x86 and PowerPC.

Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
Reported-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki
Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
---
 arch/x86/include/asm/topology.h |   10 ++++++++++
 arch/x86/kernel/cpu/topology.c  |   13 +++++++++++++
 2 files changed, 23 insertions(+)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -231,6 +231,16 @@ static inline bool topology_is_primary_t
 }
 #define topology_is_primary_thread topology_is_primary_thread
 
+int topology_get_primary_thread(unsigned int cpu);
+
+static inline bool topology_is_core_online(unsigned int cpu)
+{
+	int pcpu = topology_get_primary_thread(cpu);
+
+	return pcpu >= 0 ? cpu_online(pcpu) : false;
+}
+#define topology_is_core_online topology_is_core_online
+
 #else /* CONFIG_SMP */
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 api
 	return topo_unit_count(lvlid, at_level, apic_maps[which_units].map);
 }
 
+#ifdef CONFIG_SMP
+int topology_get_primary_thread(unsigned int cpu)
+{
+	u32 apic_id = cpuid_to_apicid[cpu];
+
+	/*
+	 * Get the core domain level APIC id, which is the primary thread
+	 * and return the CPU number assigned to it.
+	 */
+	return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN));
+}
+#endif
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /**
  * topology_hotplug_apic - Handle a physical hotplugged APIC after boot
Re: [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression
Posted by Rafael J. Wysocki 1 week, 3 days ago
On Sun, Sep 21, 2025 at 10:56 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT
> siblings during initialization") broke the use case in which both 'nosmt'
> and 'maxcpus' are on the kernel command line because it onlines primary
> threads, which were offline due to the maxcpus limit.
>
> The initially proposed fix to skip primary threads in the loop is
> inconsistent. While it prevents the primary thread to be onlined, it then
> onlines the corresponding hyperthread(s), which does not really make sense.
>
> The CPU iterator in cpuhp_smt_enable() contains a check which excludes all
> threads of a core, when the primary thread is offline. The default
> implementation is a NOOP and therefore not effective on x86.
>
> Implement topology_is_core_online() on x86 to address this issue. This
> makes the behaviour consistent between x86 and PowerPC.
>
> Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
> Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
> Reported-by: Christian Loehle <christian.loehle@arm.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki
> Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/

Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>

> ---
>  arch/x86/include/asm/topology.h |   10 ++++++++++
>  arch/x86/kernel/cpu/topology.c  |   13 +++++++++++++
>  2 files changed, 23 insertions(+)
>
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -231,6 +231,16 @@ static inline bool topology_is_primary_t
>  }
>  #define topology_is_primary_thread topology_is_primary_thread
>
> +int topology_get_primary_thread(unsigned int cpu);
> +
> +static inline bool topology_is_core_online(unsigned int cpu)
> +{
> +       int pcpu = topology_get_primary_thread(cpu);
> +
> +       return pcpu >= 0 ? cpu_online(pcpu) : false;
> +}
> +#define topology_is_core_online topology_is_core_online
> +
>  #else /* CONFIG_SMP */
>  static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
>  static inline int topology_max_smt_threads(void) { return 1; }
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 api
>         return topo_unit_count(lvlid, at_level, apic_maps[which_units].map);
>  }
>
> +#ifdef CONFIG_SMP
> +int topology_get_primary_thread(unsigned int cpu)
> +{
> +       u32 apic_id = cpuid_to_apicid[cpu];
> +
> +       /*
> +        * Get the core domain level APIC id, which is the primary thread
> +        * and return the CPU number assigned to it.
> +        */
> +       return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN));
> +}
> +#endif
> +
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>  /**
>   * topology_hotplug_apic - Handle a physical hotplugged APIC after boot
Re: [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression
Posted by Christian Loehle 1 week, 5 days ago
On 9/21/25 09:56, Thomas Gleixner wrote:
> Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT
> siblings during initialization") broke the use case in which both 'nosmt'
> and 'maxcpus' are on the kernel command line because it onlines primary
> threads, which were offline due to the maxcpus limit.
> 
> The initially proposed fix to skip primary threads in the loop is
> inconsistent. While it prevents the primary thread to be onlined, it then
> onlines the corresponding hyperthread(s), which does not really make sense.
> 
> The CPU iterator in cpuhp_smt_enable() contains a check which excludes all
> threads of a core, when the primary thread is offline. The default
> implementation is a NOOP and therefore not effective on x86.
> 
> Implement topology_is_core_online() on x86 to address this issue. This
> makes the behaviour consistent between x86 and PowerPC.
> 
> Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
> Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
> Reported-by: Christian Loehle <christian.loehle@arm.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Tested-by: Christian Loehle <christian.loehle@arm.com>
[tip: x86/urgent] x86/topology: Implement topology_is_core_online() to address SMT regression
Posted by tip-bot2 for Thomas Gleixner 1 week, 3 days ago
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     2066f00e5b2dc061fb6d8c88fadaebc97f11feaa
Gitweb:        https://git.kernel.org/tip/2066f00e5b2dc061fb6d8c88fadaebc97f11feaa
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sun, 21 Sep 2025 10:56:40 +02:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Mon, 22 Sep 2025 21:25:36 +02:00

x86/topology: Implement topology_is_core_online() to address SMT regression

Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT
siblings during initialization") broke the use case in which both 'nosmt'
and 'maxcpus' are on the kernel command line because it onlines primary
threads, which were offline due to the maxcpus limit.

The initially proposed fix to skip primary threads in the loop is
inconsistent. While it prevents the primary thread to be onlined, it then
onlines the corresponding hyperthread(s), which does not really make sense.

The CPU iterator in cpuhp_smt_enable() contains a check which excludes all
threads of a core, when the primary thread is offline. The default
implementation is a NOOP and therefore not effective on x86.

Implement topology_is_core_online() on x86 to address this issue. This
makes the behaviour consistent between x86 and PowerPC.

Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
Reported-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki
---
 arch/x86/include/asm/topology.h | 10 ++++++++++
 arch/x86/kernel/cpu/topology.c  | 13 +++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 6c79ee7..2104189 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -231,6 +231,16 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
 }
 #define topology_is_primary_thread topology_is_primary_thread
 
+int topology_get_primary_thread(unsigned int cpu);
+
+static inline bool topology_is_core_online(unsigned int cpu)
+{
+	int pcpu = topology_get_primary_thread(cpu);
+
+	return pcpu >= 0 ? cpu_online(pcpu) : false;
+}
+#define topology_is_core_online topology_is_core_online
+
 #else /* CONFIG_SMP */
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index e35ccdc..6073a16 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_uni
 	return topo_unit_count(lvlid, at_level, apic_maps[which_units].map);
 }
 
+#ifdef CONFIG_SMP
+int topology_get_primary_thread(unsigned int cpu)
+{
+	u32 apic_id = cpuid_to_apicid[cpu];
+
+	/*
+	 * Get the core domain level APIC id, which is the primary thread
+	 * and return the CPU number assigned to it.
+	 */
+	return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN));
+}
+#endif
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /**
  * topology_hotplug_apic - Handle a physical hotplugged APIC after boot
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Rafael J. Wysocki 1 month ago
On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
> "dead" SMT siblings during initialization") broke the use case in
> which both nosmt and maxcpus were added to the kernel command line
> because it caused CPUs that were not SMT siblings to be brought
> online during the intel_idle driver initialization in violation of the
> maxcpus limit.
>
> The underlying reason for this is a missing topology_is_primary_thread()
> check in cpuhp_smt_enable() which causes that function to put online
> more CPUs than just the SMT siblings that it is supposed to handle.
>
> Add the missing check to address the issue.
>
> Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
> Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
> Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
> Reported-by: Christian Loehle <christian.loehle@arm.com>
> Tested-by: Christian Loehle <christian.loehle@arm.com>
> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

I'm wondering if there are any objections or concerns.

Since this is a regression in 6.16, it would be good to get it fixed in 6.17-rc.

> ---
>  kernel/cpu.c |    6 ++++++
>  1 file changed, 6 insertions(+)
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2710,6 +2710,12 @@
>         cpu_maps_update_begin();
>         cpu_smt_control = CPU_SMT_ENABLED;
>         for_each_present_cpu(cpu) {
> +               /*
> +                * Avoid accidentally onlining primary thread CPUs that have
> +                * been taken offline.
> +                */
> +               if (topology_is_primary_thread(cpu))
> +                       continue;
>                 /* Skip online CPUs and CPUs on offline nodes */
>                 if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
>                         continue;
>
>
>
>
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Dave Hansen 1 month ago
On 9/2/25 08:05, Rafael J. Wysocki wrote:
> On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
>> "dead" SMT siblings during initialization") broke the use case in

Does "dead" here mean present but offline?

>> which both nosmt and maxcpus were added to the kernel command line
>> because it caused CPUs that were not SMT siblings to be brought
>> online during the intel_idle driver initialization in violation of the
>> maxcpus limit.

How does intel_idle fit in here? I don't immediately see it calling
cpuhp_smt_enable().

>> The underlying reason for this is a missing topology_is_primary_thread()
>> check in cpuhp_smt_enable() which causes that function to put online
>> more CPUs than just the SMT siblings that it is supposed to handle.
>>
>> Add the missing check to address the issue.

We should probably add a bit more checking in cpuhp_smt_enable() to make
sure that it's being called under expected conditions like a:

	WARN_ON_ONCE(cpu_smt_control != CPU_SMT_DISABLED);

and probably also some comments about how it is expected to work.

cpuhp_smt_enable() doesn't _really_ enable SMT. It specifically takes it
from DISABLED=>ENABLED. Thinking about it in that context, enabling
_just_ the secondary (disabled) threads makes a lot of sense.

But that can come later, after the bug fix.

>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -2710,6 +2710,12 @@

No 'diff -p', eh?

>>         cpu_maps_update_begin();
>>         cpu_smt_control = CPU_SMT_ENABLED;
>>         for_each_present_cpu(cpu) {
>> +               /*
>> +                * Avoid accidentally onlining primary thread CPUs that have
>> +                * been taken offline.
>> +                */
>> +               if (topology_is_primary_thread(cpu))
>> +                       continue;
>>                 /* Skip online CPUs and CPUs on offline nodes */
>>                 if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
>>                         continue;
Is there a more generic problem with this not respecting 'maxcpus'? If
maxcpus had forced a primary thread offline, this would still online the
secondary thread, even with the fix. Taking _that_ online might still
bring you over the maxcpus limit.


Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Rafael J. Wysocki 1 month ago
On Tue, Sep 2, 2025 at 6:56 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 9/2/25 08:05, Rafael J. Wysocki wrote:
> > On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
> >> "dead" SMT siblings during initialization") broke the use case in
>
> Does "dead" here mean present but offline?

Yes.

> >> which both nosmt and maxcpus were added to the kernel command line
> >> because it caused CPUs that were not SMT siblings to be brought
> >> online during the intel_idle driver initialization in violation of the
> >> maxcpus limit.
>
> How does intel_idle fit in here? I don't immediately see it calling
> cpuhp_smt_enable().

It calls arch_cpu_rescan_dead_smt_siblings() which calls cpuhp_smt_enable().

> >> The underlying reason for this is a missing topology_is_primary_thread()
> >> check in cpuhp_smt_enable() which causes that function to put online
> >> more CPUs than just the SMT siblings that it is supposed to handle.
> >>
> >> Add the missing check to address the issue.
>
> We should probably add a bit more checking in cpuhp_smt_enable() to make
> sure that it's being called under expected conditions like a:
>
>         WARN_ON_ONCE(cpu_smt_control != CPU_SMT_DISABLED);
>
> and probably also some comments about how it is expected to work.

Well, see the callers.  Two out of three take care of this already and
the third one doesn't care.

> cpuhp_smt_enable() doesn't _really_ enable SMT. It specifically takes it
> from DISABLED=>ENABLED. Thinking about it in that context, enabling
> _just_ the secondary (disabled) threads makes a lot of sense.
>
> But that can come later, after the bug fix.
>
> >> --- a/kernel/cpu.c
> >> +++ b/kernel/cpu.c
> >> @@ -2710,6 +2710,12 @@
>
> No 'diff -p', eh?

Ah, sorry about this.

> >>         cpu_maps_update_begin();
> >>         cpu_smt_control = CPU_SMT_ENABLED;
> >>         for_each_present_cpu(cpu) {
> >> +               /*
> >> +                * Avoid accidentally onlining primary thread CPUs that have
> >> +                * been taken offline.
> >> +                */
> >> +               if (topology_is_primary_thread(cpu))
> >> +                       continue;
> >>                 /* Skip online CPUs and CPUs on offline nodes */
> >>                 if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
> >>                         continue;
>
> Is there a more generic problem with this not respecting 'maxcpus'? If
> maxcpus had forced a primary thread offline, this would still online the
> secondary thread, even with the fix. Taking _that_ online might still
> bring you over the maxcpus limit.

Yes, there is AFAICS, but it has been there for some time and it
doesn't affect the rescan during boot.

For the rescan, it is actually useful to not respect maxcpus because
it allows all of the SMT siblings to be kicked, but for run-time
enabling of SMT it may be sort of a problem.

This change simply addresses the regression, which is that cores
without SMT become online quite surprisingly after intel_idle (and
ACPI idle too for that matter) initialization, due to
arch_cpu_rescan_dead_smt_siblings().
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Dave Hansen 4 weeks, 1 day ago
I munged the changelog up a bit:

> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=13f107f882bb5

I hope that looks better to everyone.

I also went looking at this assuming that it was x86-specific. I'd
appreciate a quick head nod from Thomas or Peter on this before I push
it anywhere, though.
Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
Posted by Rafael J. Wysocki 4 weeks, 1 day ago
On Wed, Sep 3, 2025 at 8:00 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> I munged the changelog up a bit:
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=13f107f882bb5
>
> I hope that looks better to everyone.

It does to me, thanks!

> I also went looking at this assuming that it was x86-specific. I'd
> appreciate a quick head nod from Thomas or Peter on this before I push
> it anywhere, though.

Sure.

Thank you!