[PATCH v4] x86/tsc: Use topology_max_packages() to get package number

Feng Tang posted 1 patch 1 year, 6 months ago
arch/x86/kernel/cpu/topology.c | 5 ++++-
arch/x86/kernel/tsc.c          | 7 ++-----
2 files changed, 6 insertions(+), 6 deletions(-)
[PATCH v4] x86/tsc: Use topology_max_packages() to get package number
Posted by Feng Tang 1 year, 6 months ago
Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to solve problem that
sometimes TSC clocksource is wrongly judged as unstable by watchdog
like 'jiffies', HPET, etc.

In it, the hardware package number is a key factor for judging whether
to disable the watchdog for TSC, and 'nr_online_nodes' was chosen due
to, at that time (kernel v5.1x), it is available in early boot phase
before registering 'tsc-early' clocksource, where all non-boot CPUs are
not brought up yet.

Dave and Rui pointed out there are many cases in which 'nr_online_nodes'
is cheated and not accurate, like:

* SNC (sub-numa cluster) mode enabled
* numa emulation (numa=fake=8 etc.)
* numa=off
* platforms with CPU-less HBM nodes, CPU-less Optane memory nodes.
* 'maxcpus=' cmdline setup, where chopped CPUs could be onlined later
* 'nr_cpus=', 'possible_cpus=' cmdline setup, where chopped CPUs can
  not be onlined after boot

The SNC case is the most user-visible case, as many CSP (Cloud Service
Provider) enable this feature in their server fleets. When SNC3 enabled,
a 2 socket machine will appear to have 6 NUMA nodes, and get impacted
by the issue in reality.

Thomas' recent patchset of refactoring x86 topology code improves
topology_max_packages() greatly, by making it more accurate and
available in early boot phase, which works well in most of the above
cases.

The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
may under-estimate the package number. As during topology setup, the
boot CPU iterates through all enumerated APIC IDs and either accepts
or rejects the APIC ID. For accepted IDs, it figures out which bits of
the ID map to the package number.  It tracks which package numbers have
been seen in a bitmap.  topology_max_packages() just returns the number
of bits set in that bitmap.

'nr_cpus=' and 'possible_cpus=' can cause more APIC IDs to be rejected
and can artificially lower the number of bits in the package bitmap
and thus topology_max_packages().  This means that, for example, a
system with 8 physical packages might reject all the CPUs on 6 of those
packages and be left with only 2 packages and 2 bits set in the package
bitmap. It needs the TSC watchdog, but would disable it anyway.  This
isn't ideal, but it only happens for debug-oriented options. This is
fixable by tracking the package numbers for rejected CPUs.  But it's
not worth the trouble for debugging.

So use topology_max_packages() to replace nr_online_nodes().

Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Closes: https://lore.kernel.org/lkml/a4860054-0f16-6513-f121-501048431086@intel.com/
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Waiman Long <longman@redhat.com>
---
Hi all,

For warning about possible compromise due to 'nr_cpus=' and 'possible_cpus=',
one alternative is to check whether these has been setup in cmdline inside
tsc.c and warn there. 

Changelog:
 
  Since v3:
  * Rebase against v6.11-rc1
  * Add reviewed tag from Waiman
  * Refine the commmit log with user reported SNC3 bug info

  Since v2:
  * Use 'pr_info' to replace 'pr_warn' which could panic system
    if 'panic_on_warn=1' kcmdline parameter is on (Waiman)

  Since v1:
  * Use Dave's detailed elaboration about 'nr_cpus=', 'possible_cpus='
    possibly compromising '__max_logical_packages' in commit log
  * Fix typos and inaccuracy (Rui and Longman)


 arch/x86/kernel/cpu/topology.c | 5 ++++-
 arch/x86/kernel/tsc.c          | 7 ++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 621a151ccf7d..5603aef16bf9 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -521,8 +521,11 @@ void __init topology_init_possible_cpus(void)
 	pr_info("Num. threads per package: %3u\n", __num_threads_per_package);
 
 	pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
-	if (topo_info.nr_rejected_cpus)
+	if (topo_info.nr_rejected_cpus) {
 		pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
+		if (__max_logical_packages <= 4)
+			pr_info("TSC might be buggered due to the rejected CPUs\n");
+	}
 
 	init_cpu_present(cpumask_of(0));
 	init_cpu_possible(cpumask_of(0));
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index d4462fb26299..e6efe3a8dd5e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1253,15 +1253,12 @@ static void __init check_system_tsc_reliable(void)
 	 *  - TSC which does not stop in C-States
 	 *  - the TSC_ADJUST register which allows to detect even minimal
 	 *    modifications
-	 *  - not more than two sockets. As the number of sockets cannot be
-	 *    evaluated at the early boot stage where this has to be
-	 *    invoked, check the number of online memory nodes as a
-	 *    fallback solution which is an reasonable estimate.
+	 *  - not more than four packages
 	 */
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    nr_online_nodes <= 4)
+	    topology_max_packages() <= 4)
 		tsc_disable_clocksource_watchdog();
 }
 
-- 
2.34.1
Re: [PATCH v4] x86/tsc: Use topology_max_packages() to get package number
Posted by Thomas Gleixner 1 year, 6 months ago
On Mon, Jul 29 2024 at 10:12, Feng Tang wrote:
>  	pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
> -	if (topo_info.nr_rejected_cpus)
> +	if (topo_info.nr_rejected_cpus) {
>  		pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
> +		if (__max_logical_packages <= 4)
> +			pr_info("TSC might be buggered due to the rejected CPUs\n");

I'm not really convinced of the value of this message.

People who limit their CPUs on the command line or at compile time
really should know what they are doing. The kernel already tells that
there are rejected CPUs and that extra TSC info is just annoying and
confusing noise for people who run that and have a perfectly working TSC
on a single/dual/quad socket machine.

I just drop that noise.

Thanks,

        tglx
Re: [PATCH v4] x86/tsc: Use topology_max_packages() to get package number
Posted by Feng Tang 1 year, 6 months ago
On Wed, Jul 31, 2024 at 08:53:11PM +0200, Thomas Gleixner wrote:
> On Mon, Jul 29 2024 at 10:12, Feng Tang wrote:
> >  	pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
> > -	if (topo_info.nr_rejected_cpus)
> > +	if (topo_info.nr_rejected_cpus) {
> >  		pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
> > +		if (__max_logical_packages <= 4)
> > +			pr_info("TSC might be buggered due to the rejected CPUs\n");
> 
> I'm not really convinced of the value of this message.
> 
> People who limit their CPUs on the command line or at compile time
> really should know what they are doing. The kernel already tells that
> there are rejected CPUs and that extra TSC info is just annoying and
> confusing noise for people who run that and have a perfectly working TSC
> on a single/dual/quad socket machine.
> 
> I just drop that noise.
 
That makes sense to me, thanks!

Before posting the patch, I discussed this with Rui and we thought the
message was not that necessary and it was kept only in case some user
may want it. 

- Feng

> Thanks,
> 
>         tglx
> 
>
Re: [PATCH v4] x86/tsc: Use topology_max_packages() to get package number
Posted by Thomas Gleixner 1 year, 6 months ago
On Mon, Jul 29 2024 at 10:12, Feng Tang wrote:
> So use topology_max_packages() to replace nr_online_nodes().
>
> Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
> Closes: https://lore.kernel.org/lkml/a4860054-0f16-6513-f121-501048431086@intel.com/
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> Reviewed-by: Waiman Long <longman@redhat.com>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
[tip: x86/timers] x86/tsc: Use topology_max_packages() to get package number
Posted by tip-bot2 for Feng Tang 1 year, 6 months ago
The following commit has been merged into the x86/timers branch of tip:

Commit-ID:     b4bac279319d3082eb42f074799c7b18ba528c71
Gitweb:        https://git.kernel.org/tip/b4bac279319d3082eb42f074799c7b18ba528c71
Author:        Feng Tang <feng.tang@intel.com>
AuthorDate:    Mon, 29 Jul 2024 10:12:02 +08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 31 Jul 2024 21:12:09 +02:00

x86/tsc: Use topology_max_packages() to get package number

Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC on
qualified platorms") was introduced to solve problem that sometimes TSC
clocksource is wrongly judged as unstable by watchdog like 'jiffies', HPET,
etc.

In it, the hardware package number is a key factor for judging whether to
disable the watchdog for TSC, and 'nr_online_nodes' was chosen due to, at
that time (kernel v5.1x), it is available in early boot phase before
registering 'tsc-early' clocksource, where all non-boot CPUs are not
brought up yet.

Dave and Rui pointed out there are many cases in which 'nr_online_nodes'
is cheated and not accurate, like:

 * SNC (sub-numa cluster) mode enabled
 * numa emulation (numa=fake=8 etc.)
 * numa=off
 * platforms with CPU-less HBM nodes, CPU-less Optane memory nodes.
 * 'maxcpus=' cmdline setup, where chopped CPUs could be onlined later
 * 'nr_cpus=', 'possible_cpus=' cmdline setup, where chopped CPUs can
   not be onlined after boot

The SNC case is the most user-visible case, as many CSP (Cloud Service
Provider) enable this feature in their server fleets. When SNC3 enabled, a
2 socket machine will appear to have 6 NUMA nodes, and get impacted by the
issue in reality.

Thomas' recent patchset of refactoring x86 topology code improves
topology_max_packages() greatly, by making it more accurate and available
in early boot phase, which works well in most of the above cases.

The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which may
under-estimate the package number. As during topology setup, the boot CPU
iterates through all enumerated APIC IDs and either accepts or rejects the
APIC ID. For accepted IDs, it figures out which bits of the ID map to the
package number.  It tracks which package numbers have been seen in a
bitmap.  topology_max_packages() just returns the number of bits set in
that bitmap.

'nr_cpus=' and 'possible_cpus=' can cause more APIC IDs to be rejected and
can artificially lower the number of bits in the package bitmap and thus
topology_max_packages().  This means that, for example, a system with 8
physical packages might reject all the CPUs on 6 of those packages and be
left with only 2 packages and 2 bits set in the package bitmap. It needs
the TSC watchdog, but would disable it anyway.  This isn't ideal, but it
only happens for debug-oriented options. This is fixable by tracking the
package numbers for rejected CPUs.  But it's not worth the trouble for
debugging.

So use topology_max_packages() to replace nr_online_nodes().

Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/all/20240729021202.180955-1-feng.tang@intel.com
Closes: https://lore.kernel.org/lkml/a4860054-0f16-6513-f121-501048431086@intel.com/
---
 arch/x86/kernel/tsc.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index d4462fb..0ced187 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -28,6 +28,7 @@
 #include <asm/apic.h>
 #include <asm/cpu_device_id.h>
 #include <asm/i8259.h>
+#include <asm/topology.h>
 #include <asm/uv/uv.h>
 
 unsigned int __read_mostly cpu_khz;	/* TSC clocks / usec, not used here */
@@ -1253,15 +1254,12 @@ static void __init check_system_tsc_reliable(void)
 	 *  - TSC which does not stop in C-States
 	 *  - the TSC_ADJUST register which allows to detect even minimal
 	 *    modifications
-	 *  - not more than two sockets. As the number of sockets cannot be
-	 *    evaluated at the early boot stage where this has to be
-	 *    invoked, check the number of online memory nodes as a
-	 *    fallback solution which is an reasonable estimate.
+	 *  - not more than four packages
 	 */
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    nr_online_nodes <= 4)
+	    topology_max_packages() <= 4)
 		tsc_disable_clocksource_watchdog();
 }