There was a warning message about the PPTT table:

"ACPI PPTT: PPTT table found, but unable to locate core 1 (1)",

which in turn caused scheduler warnings when bringing up the system.
It took a while to root-cause the problem to a broken PPTT table
carrying wrong cache information.

To speed up debugging of similar issues, dump the PPTT table, which
makes the warning more noticeable and helps bug hunting.

The dumped info format on an ARM server looks like:
ACPI PPTT: Processors:
P[ 0][0x0024]: parent=0x0000 acpi_proc_id= 0 num_res=1 flags=0x11(package)
P[ 1][0x005a]: parent=0x0024 acpi_proc_id= 0 num_res=1 flags=0x12()
P[ 2][0x008a]: parent=0x005a acpi_proc_id= 0 num_res=3 flags=0x1a(leaf)
P[ 3][0x00f2]: parent=0x005a acpi_proc_id= 1 num_res=3 flags=0x1a(leaf)
P[ 4][0x015a]: parent=0x005a acpi_proc_id= 2 num_res=3 flags=0x1a(leaf)
...
ACPI PPTT: Caches:
C[ 0][0x0072]: flags=0x7f next_level=0x0000 size=0x4000000 sets=65536 way=16 attribute=0xa line_size=64
C[ 1][0x00aa]: flags=0x7f next_level=0x00da size=0x10000 sets=256 way=4 attribute=0x4 line_size=64
C[ 2][0x00c2]: flags=0x7f next_level=0x00da size=0x10000 sets=256 way=4 attribute=0x2 line_size=64
C[ 3][0x00da]: flags=0x7f next_level=0x0000 size=0x100000 sets=2048 way=8 attribute=0xa line_size=64
...
It provides a global and straightforward view of the processor and cache
hierarchy of the platform, and from the offset info (the value in the second
pair of brackets) the parent-child relation can be checked; for example,
P[ 1] at offset 0x005a has parent=0x0024, which is P[ 0].

With this, the root cause of the original issue was obvious: some cache
entries were missing, which broke building up the scheduler domains.
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
Changelog:
v2
* rebase against 6.19 and refine the commit log
drivers/acpi/pptt.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index de5f8c018333..e00abedcd786 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -529,6 +529,79 @@ static void acpi_pptt_warn_missing(void)
pr_warn_once("No PPTT table found, CPU and cache topology may be inaccurate\n");
}
+static void acpi_dump_pptt_table(struct acpi_table_header *table_hdr)
+{
+ struct acpi_subtable_header *entry, *entry_start;
+ unsigned long end;
+ struct acpi_pptt_processor *cpu;
+ struct acpi_pptt_cache *cache;
+ u32 entry_sz, i;
+ u8 len;
+ static bool dumped;
+
+ /* PPTT table could be pretty big, no need to dump it twice */
+ if (dumped)
+ return;
+ dumped = true;
+
+ end = (unsigned long)table_hdr + table_hdr->length;
+ entry_start = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+ sizeof(struct acpi_table_pptt));
+
+ pr_info("Processors:\n");
+ entry_sz = sizeof(struct acpi_pptt_processor);
+ entry = entry_start;
+ i = 0;
+ while ((unsigned long)entry + entry_sz <= end) {
+ len = entry->length;
+ if (!len) {
+ pr_warn("Invalid zero length subtable\n");
+ return;
+ }
+
+ cpu = (struct acpi_pptt_processor *)entry;
+ entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry, len);
+
+ if (cpu->header.type != ACPI_PPTT_TYPE_PROCESSOR)
+ continue;
+
+ printk(KERN_INFO "P[%3d][0x%04lx]: parent=0x%04x acpi_proc_id=%3d num_res=%d flags=0x%02x(%s%s%s)\n",
+ i++, (unsigned long)cpu - (unsigned long)table_hdr,
+ cpu->parent, cpu->acpi_processor_id,
+ cpu->number_of_priv_resources, cpu->flags,
+ cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE ? "package" : "",
+ cpu->flags & ACPI_PPTT_ACPI_LEAF_NODE ? "leaf" : "",
+ cpu->flags & ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD ? ", thread" : ""
+ );
+
+ }
+
+ pr_info("Caches:\n");
+ entry_sz = sizeof(struct acpi_pptt_cache);
+ entry = entry_start;
+ i = 0;
+ while ((unsigned long)entry + entry_sz <= end) {
+ len = entry->length;
+ if (!len) {
+ pr_warn("Invalid zero length subtable\n");
+ return;
+ }
+
+ cache = (struct acpi_pptt_cache *)entry;
+ entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry, len);
+
+ if (cache->header.type != ACPI_PPTT_TYPE_CACHE)
+ continue;
+
+ printk(KERN_INFO "C[%4d][0x%04lx]: flags=0x%02x next_level=0x%04x size=0x%-8x sets=%-6d way=%-2d attribute=0x%-2x line_size=%d\n",
+ i++, (unsigned long)cache - (unsigned long)table_hdr,
+ cache->flags, cache->next_level_of_cache, cache->size,
+ cache->number_of_sets, cache->associativity,
+ cache->attributes, cache->line_size
+ );
+ }
+}
+
/**
* topology_get_acpi_cpu_tag() - Find a unique topology value for a feature
* @table: Pointer to the head of the PPTT table
@@ -565,6 +638,8 @@ static int topology_get_acpi_cpu_tag(struct acpi_table_header *table,
}
pr_warn_once("PPTT table found, but unable to locate core %d (%d)\n",
cpu, acpi_cpu_id);
+
+ acpi_dump_pptt_table(table);
return -ENOENT;
}
base-commit: c8ebd433459bcbf068682b09544e830acd7ed222
--
2.39.5 (Apple Git-154)
On Wed, Dec 31, 2025 at 06:49:09PM +0800, Feng Tang wrote:
> [...]

While this may sound like a good idea, it deviates from how errors in other
table-parsing code are handled. Instead of dumping the entire table, it would
be preferable to report the specific issue encountered during parsing.

I do not have a strong objection if Rafael is comfortable with this approach;
however, it does differ from the established pattern used by similar code.
Dumping the entire table in a custom manner is not the standard way of
handling parsing errors. Just my opinion.

--
Regards,
Sudeep
On Mon, Jan 12, 2026 at 6:03 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Wed, Dec 31, 2025 at 06:49:09PM +0800, Feng Tang wrote:
> > [...]
>
> While this may sound like a good idea, it deviates from how errors in other
> table-parsing code are handled. Instead of dumping the entire table, it would
> be preferable to report the specific issue encountered during parsing.
>
> I do not have a strong objection if Rafael is comfortable with this approach;

I'm not a big fan of it TBH.

> however, it does differ from the established pattern used by similar code.
> Dumping the entire table in a custom manner is not the standard way of
> handling parsing errors. Just my opinion.

I agree.
Hi Rafael,

Thanks for the review!

On Tue, Jan 13, 2026 at 05:21:07PM +0100, Rafael J. Wysocki wrote:
> On Mon, Jan 12, 2026 at 6:03 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > [...]
> >
> > While this may sound like a good idea, it deviates from how errors in other
> > table-parsing code are handled. Instead of dumping the entire table, it would
> > be preferable to report the specific issue encountered during parsing.
> >
> > I do not have a strong objection if Rafael is comfortable with this approach;
>
> I'm not a big fan of it TBH.
>
> > however, it does differ from the established pattern used by similar code.
> > Dumping the entire table in a custom manner is not the standard way of
> > handling parsing errors. Just my opinion.
>
> I agree.

I understand the concern that this could be kind of special; Hanjun and Sudeep
have the same feeling.

The reason for the patch is:
* The acpidump tool follows the standard generic format and dumps each item
  without grouping them by type, so its output has about 20X more lines than
  this, making it harder to parse.
* In rare cases like silicon enabling, sometimes the kernel can fail early,
  where user space checking is not available. If a HW debugger is not
  available either, the kernel dump is the only way to debug.

Does the proposal of putting it under a kernel config look doable to you?
If not, I will keep the code local for now.

Thanks,
Feng
On Wed, Jan 14, 2026 at 8:48 AM Feng Tang <feng.tang@linux.alibaba.com> wrote:
>
> [...]
>
> The reason for the patch is:
> * The acpidump tool follows the standard generic format and dumps each item
>   without grouping them by type, so its output has about 20X more lines than
>   this, making it harder to parse.

But you can develop a PPTT parser.

All ACPI tables are exposed verbatim via /sys/firmware/acpi/tables/.

> * In rare cases like silicon enabling, sometimes the kernel can fail early,
>   where user space checking is not available. If a HW debugger is not
>   available either, the kernel dump is the only way to debug.

But I don't think you need to dump the entire table in those cases.

> Does the proposal of putting it under a kernel config look doable to you?

That would mean extra code that's almost never used and needs to be taken
into account when making changes that may affect it.

Thanks, but no thanks.
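As a rough illustration of that userspace route, the sketch below reads the
raw table from /sys/firmware/acpi/tables/PPTT and walks the type-0 (processor)
nodes. It is a minimal sketch only: the struct layouts are written out by hand
following the ACPI PPTT definition rather than taken from any header, and the
sysfs file needs root to read, so treat it as a starting point for such a
parser, not a reference implementation.

/*
 * Minimal userspace PPTT walker (sketch). Reads the raw table from
 * /sys/firmware/acpi/tables/PPTT and lists the processor hierarchy nodes.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#pragma pack(push, 1)
struct sdt_header {                /* standard 36-byte ACPI table header */
	char     signature[4];
	uint32_t length;
	uint8_t  revision;
	uint8_t  checksum;
	char     oem_id[6];
	char     oem_table_id[8];
	uint32_t oem_revision;
	uint32_t creator_id;
	uint32_t creator_revision;
};

struct pptt_subtable {             /* common subtable header */
	uint8_t type;                  /* 0 = processor, 1 = cache */
	uint8_t length;
};

struct pptt_processor {            /* type-0 processor hierarchy node */
	struct pptt_subtable hdr;
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;               /* offset of the parent node */
	uint32_t acpi_processor_id;
	uint32_t num_priv_resources;   /* followed by that many u32 offsets */
};
#pragma pack(pop)

int main(void)
{
	FILE *f = fopen("/sys/firmware/acpi/tables/PPTT", "rb");
	struct sdt_header hdr;
	uint8_t *buf;
	uint32_t off;

	if (!f) { perror("open PPTT"); return 1; }
	if (fread(&hdr, sizeof(hdr), 1, f) != 1) { perror("read header"); return 1; }

	buf = malloc(hdr.length);
	if (!buf) return 1;
	rewind(f);
	if (fread(buf, 1, hdr.length, f) != hdr.length) { perror("read table"); return 1; }
	fclose(f);

	/* Walk the subtables; each starts with a (type, length) byte pair. */
	for (off = sizeof(hdr); off + sizeof(struct pptt_subtable) <= hdr.length; ) {
		struct pptt_subtable *sub = (struct pptt_subtable *)(buf + off);

		if (sub->length < sizeof(*sub) || off + sub->length > hdr.length)
			break;         /* malformed table, stop */

		if (sub->type == 0 && sub->length >= sizeof(struct pptt_processor)) {
			struct pptt_processor *p = (struct pptt_processor *)sub;

			printf("P@0x%04x parent=0x%04x acpi_proc_id=%u flags=0x%x res=%u\n",
			       off, p->parent, p->acpi_processor_id,
			       p->flags, p->num_priv_resources);
		}
		off += sub->length;
	}
	free(buf);
	return 0;
}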
Hi Sudeep,

Thanks for the reviews!

On Mon, Jan 12, 2026 at 05:02:59PM +0000, Sudeep Holla wrote:
> On Wed, Dec 31, 2025 at 06:49:09PM +0800, Feng Tang wrote:
> > [...]
>
> While this may sound like a good idea, it deviates from how errors in other
> table-parsing code are handled. Instead of dumping the entire table, it would
> be preferable to report the specific issue encountered during parsing.
>
> I do not have a strong objection if Rafael is comfortable with this approach;
> however, it does differ from the established pattern used by similar code.
> Dumping the entire table in a custom manner is not the standard way of
> handling parsing errors. Just my opinion.

Yes, it's a fair point about the error handling. Actually, for the issue
we met, the PPTT table complies with the ACPI spec and the PPTT table
definition nicely: it has no checksum or format issue, the only problem is
that some items are missing.

So I would say the dump itself doesn't break any existing ACPI table error
handling, or change anything. As Hanjun suggested, it could be put under a
CONFIG_ACPI_PPTT_ERR_DUMP option as a PPTT-specific debug method, not
related to general ACPI table error handling.

We have had this in our tree for a while, and the good part is that it gives
a direct overview of all the processors and caches in the system: you get to
know the rough number of them from the index, and items are listed side by
side, so some minor error can be very obvious in this comparison mode.

Thanks,
Feng

> --
> Regards,
> Sudeep
On Tue, Jan 13, 2026 at 04:25:29PM +0800, Feng Tang wrote:
> [...]
>
> Yes, it's a fair point about the error handling. Actually, for the issue
> we met, the PPTT table complies with the ACPI spec and the PPTT table
> definition nicely: it has no checksum or format issue, the only problem is
> that some items are missing.
>

Agreed, but how is this any different from other tables that contain optional
entries the ASL compiler cannot detect?

> So I would say the dump itself doesn't break any existing ACPI table error
> handling, or change anything. As Hanjun suggested, it could be put under a
> CONFIG_ACPI_PPTT_ERR_DUMP option as a PPTT-specific debug method, not
> related to general ACPI table error handling.
>

Sure, that could be an option, as long as CONFIG_ACPI_PPTT_ERR_DUMP is default
off and enabled only when debugging, not always as in distro images. Does that
work for you?

> We have had this in our tree for a while, and the good part is that it gives
> a direct overview of all the processors and caches in the system: you get to
> know the rough number of them from the index, and items are listed side by
> side, so some minor error can be very obvious in this comparison mode.
>

Agreed, but all this info is available to userspace in some form already.
What does this dump give other than debugging a broken PPTT?

--
Regards,
Sudeep
Hi Sudeep,

On Tue, Jan 13, 2026 at 02:40:56PM +0000, Sudeep Holla wrote:
[...]
> > So I would say the dump itself doesn't break any existing ACPI table error
> > handling, or change anything. As Hanjun suggested, it could be put under a
> > CONFIG_ACPI_PPTT_ERR_DUMP option as a PPTT-specific debug method, not
> > related to general ACPI table error handling.
>
> Sure, that could be an option, as long as CONFIG_ACPI_PPTT_ERR_DUMP is default
> off and enabled only when debugging, not always as in distro images. Does that
> work for you?

Yes. It sounds great to me.

> > We have had this in our tree for a while, and the good part is that it gives
> > a direct overview of all the processors and caches in the system: you get to
> > know the rough number of them from the index, and items are listed side by
> > side, so some minor error can be very obvious in this comparison mode.
>
> Agreed, but all this info is available to userspace in some form already.
> What does this dump give other than debugging a broken PPTT?

It is mainly for debugging issues. Though we locally have an option to dump it
on boot unconditionally to help kernel/BIOS developers get a quick overview of
the PPTT table, as the table gets updated from time to time, or sometimes the
kernel can fail before booting to user space.

Thanks,
Feng

> --
> Regards,
> Sudeep
On Wed, Jan 14, 2026 at 8:07 AM Feng Tang <feng.tang@linux.alibaba.com> wrote:
>
> [...]
>
> > Agreed, but all this info is available to userspace in some form already.
> > What does this dump give other than debugging a broken PPTT?
>
> It is mainly for debugging issues. Though we locally have an option to dump it
> on boot unconditionally to help kernel/BIOS developers get a quick overview of
> the PPTT table, as the table gets updated from time to time, or sometimes the
> kernel can fail before booting to user space.

The kernel message buffer is not a great place for dumping ACPI tables though.

If an invalid PPTT prevents the system from booting, print out enough
information to identify the cause of the failure.

For everything else, use the tools in user space.
On Wed, Jan 14, 2026 at 12:36:58PM +0100, Rafael J. Wysocki wrote:
> [...]
>
> The kernel message buffer is not a great place for dumping ACPI tables though.

Yes.

> If an invalid PPTT prevents the system from booting, print out enough
> information to identify the cause of the failure.

Good suggestion! We do have some cases where wrong or missing info in some
ACPI table entries causes boot failures, like the IORT table.

As for the original issue, where the kernel printed the error message
"ACPI PPTT: PPTT table found, but unable to locate core 1 (1)",
can we just print out all the CPU entries of the PPTT table? That is much
cleaner and smaller, and has enough information for quickly identifying the
root cause, as the number of cache items is usually 3X the number of CPUs.

> For everything else, use the tools in user space.

OK.

Thanks,
Feng
On Wed, Jan 14, 2026 at 10:28:19PM +0800, Feng Tang wrote:
>
> As for the original issue, where the kernel printed the error message
> "ACPI PPTT: PPTT table found, but unable to locate core 1 (1)",
> can we just print out all the CPU entries of the PPTT table? That is much
> cleaner and smaller, and has enough information for quickly identifying the
> root cause, as the number of cache items is usually 3X the number of CPUs.

I am still not sure what additional value is gained by listing all those CPU
entries. On a 512-CPU system, for example, if an issue is identified with the
entry for CPU 256, what extra information is obtained by listing all the other
CPUs, such as those sharing the same L3 cache, or the entire list of CPUs on
this system?

The message above already indicates that something is wrong with core
(n = 1 in the above case). If that is not sufficiently clear, it should be
improved to be more specific about the issue. Simply listing all CPUs in the
PPTT provides no additional insight and only results in an unnecessarily long
and distracting CPU list in the kernel log.

--
Regards,
Sudeep
Hi Sudeep,

On Wed, Jan 14, 2026 at 03:06:09PM +0000, Sudeep Holla wrote:
> On Wed, Jan 14, 2026 at 10:28:19PM +0800, Feng Tang wrote:
> > [...]
>
> I am still not sure what additional value is gained by listing all those CPU
> entries. On a 512-CPU system, for example, if an issue is identified with the
> entry for CPU 256, what extra information is obtained by listing all the other
> CPUs, such as those sharing the same L3 cache, or the entire list of CPUs on
> this system?

My bad that I didn't make it clear. As for the original issue, the platform
has 8 CPUs, but the PPTT table only has 4 CPUs, while the MADT and other
tables are correct about the CPU numbers, and the kernel does successfully
bring up all 8 CPUs. The PPTT message
"ACPI PPTT: PPTT table found, but unable to locate core 1 (1)" is kind of
modest and didn't catch much of our attention, as all 8 CPUs were onlined
fine. So with the "print only necessary info" suggestion from Rafael, it
will print out only 4 CPUs, which should immediately show that the PPTT
table itself is wrong and worth a deeper check.

> The message above already indicates that something is wrong with core
> (n = 1 in the above case). If that is not sufficiently clear, it should be
> improved to be more specific about the issue. Simply listing all CPUs in the
> PPTT provides no additional insight and only results in an unnecessarily long
> and distracting CPU list in the kernel log.

As the print will be guarded by a default-off kernel config, as we discussed,
and only printed when an error is detected, it may still be acceptable
regarding the kernel log buffer? Or, any suggestion on how to check the PPTT
table to help future debugging? thanks!

- Feng
On Thu, Jan 15, 2026 at 05:05:45PM +0800, Feng Tang wrote:
> Hi Sudeep,
>
> [...]
>
> My bad that I didn't make it clear. As for the original issue, the platform
> has 8 CPUs, but the PPTT table only has 4 CPUs, while the MADT and other
> tables are correct about the CPU numbers, and the kernel does successfully
> bring up all 8 CPUs. The PPTT message
> "ACPI PPTT: PPTT table found, but unable to locate core 1 (1)" is kind of
> modest and didn't catch much of our attention, as all 8 CPUs were onlined
> fine. So with the "print only necessary info" suggestion from Rafael, it
> will print out only 4 CPUs, which should immediately show that the PPTT
> table itself is wrong and worth a deeper check.
>

To be clear, listing CPUs is annoying on large systems. In your case, it may
be only 4 CPUs and that seems fine, but imagine if one CPU entry is missing on
a 512 CPU system - dumping a list of 511 CPUs is not only irritating, but also
largely useless for diagnosing the issue.

In my view, for the scenario above, the error should say something along the
lines of: the PPTT CPU entry count does not match the system CPU count.

--
Regards,
Sudeep
On Thu, Jan 15, 2026 at 10:02:06AM +0000, Sudeep Holla wrote:
> [...]
>
> In my view, for the scenario above, the error should say something along the
> lines of: the PPTT CPU entry count does not match the system CPU count.

This makes sense to me, thanks for the suggestion! Will check how to implement
it and test. The error happens in the early boot phase, and I guess only
'__cpu_possible_mask' can be used for the 'system CPU count'.

Thanks,
Feng
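A possible shape for that check, sketched against the subtable walk already
used in pptt.c. The ACPI_ADD_PTR macro and the PPTT type/flag constants are
existing kernel definitions; the function name and the warning text are
placeholders, and the leaf-flag shortcut ignores older PPTT revisions that a
real implementation would still need to handle:

/*
 * Sketch only: count PPTT processor leaf nodes and warn once if the count
 * does not match the number of possible CPUs. Meant to slot into
 * drivers/acpi/pptt.c; not the eventual patch.
 */
static void acpi_pptt_check_cpu_count(struct acpi_table_header *table_hdr)
{
	struct acpi_subtable_header *entry;
	struct acpi_pptt_processor *cpu_node;
	unsigned long end = (unsigned long)table_hdr + table_hdr->length;
	unsigned int leaf_cnt = 0;

	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
			     sizeof(struct acpi_table_pptt));

	while ((unsigned long)entry + sizeof(*entry) <= end) {
		if (entry->length < sizeof(*entry) ||
		    (unsigned long)entry + entry->length > end)
			return;		/* malformed subtable, give up */

		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
		    entry->length >= sizeof(struct acpi_pptt_processor)) {
			cpu_node = (struct acpi_pptt_processor *)entry;
			/* Leaf flag is only valid on newer PPTT revisions;
			 * older tables need the parent-scan fallback. */
			if (cpu_node->flags & ACPI_PPTT_ACPI_LEAF_NODE)
				leaf_cnt++;
		}
		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
				     entry->length);
	}

	if (leaf_cnt != num_possible_cpus())
		pr_warn_once("PPTT has %u leaf processor nodes, but %u possible CPUs\n",
			     leaf_cnt, num_possible_cpus());
}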
On Wed, Jan 14, 2026 at 4:06 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> [...]
>
> The message above already indicates that something is wrong with core
> (n = 1 in the above case). If that is not sufficiently clear, it should be
> improved to be more specific about the issue. Simply listing all CPUs in the
> PPTT provides no additional insight and only results in an unnecessarily long
> and distracting CPU list in the kernel log.

Fair enough.
On Wed, Jan 14, 2026 at 3:28 PM Feng Tang <feng.tang@linux.alibaba.com> wrote:
>
> [...]
>
> As for the original issue, where the kernel printed the error message
> "ACPI PPTT: PPTT table found, but unable to locate core 1 (1)",
> can we just print out all the CPU entries of the PPTT table?

As I said, print enough information to allow the problem to be identified.

Please avoid excessive verbosity though.
Hi Feng Tang,
On 2025/12/31 18:49, Feng Tang wrote:
> There was warning message about PPTT table:
>
> "ACPI PPTT: PPTT table found, but unable to locate core 1 (1)",
>
> and it in turn caused scheduler warnings when building up the system.
> It took a while to root cause the problem be related a broken PPTT
> table which has wrong cache information.
>
> To speedup debugging similar issues, dump the PPTT table, which makes
> the warning more noticeable and helps bug hunting.
Agreed, I think it was useful for debugging.
>
> The dumped info format on a ARM server is like:
>
> ACPI PPTT: Processors:
> P[ 0][0x0024]: parent=0x0000 acpi_proc_id= 0 num_res=1 flags=0x11(package)
> P[ 1][0x005a]: parent=0x0024 acpi_proc_id= 0 num_res=1 flags=0x12()
> P[ 2][0x008a]: parent=0x005a acpi_proc_id= 0 num_res=3 flags=0x1a(leaf)
> P[ 3][0x00f2]: parent=0x005a acpi_proc_id= 1 num_res=3 flags=0x1a(leaf)
> P[ 4][0x015a]: parent=0x005a acpi_proc_id= 2 num_res=3 flags=0x1a(leaf)
> ...
> ACPI PPTT: Caches:
> C[ 0][0x0072]: flags=0x7f next_level=0x0000 size=0x4000000 sets=65536 way=16 attribute=0xa line_size=64
> C[ 1][0x00aa]: flags=0x7f next_level=0x00da size=0x10000 sets=256 way=4 attribute=0x4 line_size=64
> C[ 2][0x00c2]: flags=0x7f next_level=0x00da size=0x10000 sets=256 way=4 attribute=0x2 line_size=64
> C[ 3][0x00da]: flags=0x7f next_level=0x0000 size=0x100000 sets=2048 way=8 attribute=0xa line_size=64
> ...
>
> It provides a global and straightforward view of the hierarchy of the
> processor and caches info of the platform, and from the offset info
> (the 3rd column), the child-parent relation could be checked.
>
> With this, the root cause of the original issue was pretty obvious,
> that there were some caches items missing which caused the issue when
> building up scheduler domain.
Just a discussion, can we just dump the raw PPTT table via acpidump
in user space when we meet the problem? With the raw PPTT table, we
can go through the content to see if we have problems.
>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
> Changelog:
>
> v2
> * rebase againt 6.19 and refine the commit log
>
> drivers/acpi/pptt.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 75 insertions(+)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index de5f8c018333..e00abedcd786 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -529,6 +529,79 @@ static void acpi_pptt_warn_missing(void)
> pr_warn_once("No PPTT table found, CPU and cache topology may be inaccurate\n");
> }
>
> +static void acpi_dump_pptt_table(struct acpi_table_header *table_hdr)
> +{
> + struct acpi_subtable_header *entry, *entry_start;
> + unsigned long end;
> + struct acpi_pptt_processor *cpu;
> + struct acpi_pptt_cache *cache;
> + u32 entry_sz, i;
> + u8 len;
> + static bool dumped;
> +
> + /* PPTT table could be pretty big, no need to dump it twice */
> + if (dumped)
> + return;
> + dumped = true;
> +
> + end = (unsigned long)table_hdr + table_hdr->length;
> + entry_start = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> + sizeof(struct acpi_table_pptt));
> +
> + pr_info("Processors:\n");
> + entry_sz = sizeof(struct acpi_pptt_processor);
> + entry = entry_start;
> + i = 0;
> + while ((unsigned long)entry + entry_sz <= end) {
> + len = entry->length;
> + if (!len) {
> + pr_warn("Invalid zero length subtable\n");
> + return;
> + }
> +
> + cpu = (struct acpi_pptt_processor *)entry;
> + entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry, len);
> +
> + if (cpu->header.type != ACPI_PPTT_TYPE_PROCESSOR)
> + continue;
> +
> + printk(KERN_INFO "P[%3d][0x%04lx]: parent=0x%04x acpi_proc_id=%3d num_res=%d flags=0x%02x(%s%s%s)\n",
pr_info() please.
> + i++, (unsigned long)cpu - (unsigned long)table_hdr,
> + cpu->parent, cpu->acpi_processor_id,
> + cpu->number_of_priv_resources, cpu->flags,
> + cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE ? "package" : "",
> + cpu->flags & ACPI_PPTT_ACPI_LEAF_NODE ? "leaf" : "",
> + cpu->flags & ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD ? ", thread" : ""
> + );
> +
> + }
> +
> + pr_info("Caches:\n");
> + entry_sz = sizeof(struct acpi_pptt_cache);
> + entry = entry_start;
> + i = 0;
> + while ((unsigned long)entry + entry_sz <= end) {
> + len = entry->length;
> + if (!len) {
> + pr_warn("Invalid zero length subtable\n");
> + return;
> + }
> +
> + cache = (struct acpi_pptt_cache *)entry;
> + entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry, len);
> +
> + if (cache->header.type != ACPI_PPTT_TYPE_CACHE)
> + continue;
> +
> + printk(KERN_INFO "C[%4d][0x%04lx]: flags=0x%02x next_level=0x%04x size=0x%-8x sets=%-6d way=%-2d attribute=0x%-2x line_size=%d\n",
Same here.
> + i++, (unsigned long)cache - (unsigned long)table_hdr,
> + cache->flags, cache->next_level_of_cache, cache->size,
> + cache->number_of_sets, cache->associativity,
> + cache->attributes, cache->line_size
> + );
> + }
> +}
> +
> /**
> * topology_get_acpi_cpu_tag() - Find a unique topology value for a feature
> * @table: Pointer to the head of the PPTT table
> @@ -565,6 +638,8 @@ static int topology_get_acpi_cpu_tag(struct acpi_table_header *table,
> }
> pr_warn_once("PPTT table found, but unable to locate core %d (%d)\n",
> cpu, acpi_cpu_id);
> +
> + acpi_dump_pptt_table(table);
I think it would be good to dump it as needed, as a debug feature.
Thanks
Hanjun
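For context on the pr_info() request above: the "ACPI PPTT: " prefix visible
in the sample output comes from a pr_fmt() define, which pr_info() picks up
automatically while a raw printk(KERN_INFO ...) bypasses it - which is also
why the P[...]/C[...] lines in the sample have no prefix. A minimal sketch,
assuming the usual pr_fmt pattern in pptt.c:

/* The "ACPI PPTT: " prefix in the log comes from a define like this near the
 * top of the file (assumed here; it must come before the printk include). */
#define pr_fmt(fmt) "ACPI PPTT: " fmt

#include <linux/printk.h>

static void pptt_log_example(void)
{
	/* Expands to printk(KERN_INFO pr_fmt("Processors:\n")), so the line
	 * shows up as "ACPI PPTT: Processors:" in dmesg. */
	pr_info("Processors:\n");

	/* Bypasses pr_fmt(): no prefix, which is why the P[...]/C[...] lines
	 * in the sample output above have none. */
	printk(KERN_INFO "P[%3d]...\n", 0);
}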
Hi Hanjun,
Thanks for the review!
On Sat, Jan 10, 2026 at 12:29:43PM +0800, Hanjun Guo wrote:
> Hi Feng Tang,
>
> On 2025/12/31 18:49, Feng Tang wrote:
> > There was warning message about PPTT table:
> >
> > "ACPI PPTT: PPTT table found, but unable to locate core 1 (1)",
> >
> > and it in turn caused scheduler warnings when building up the system.
> > It took a while to root cause the problem be related a broken PPTT
> > table which has wrong cache information.
> >
> > To speedup debugging similar issues, dump the PPTT table, which makes
> > the warning more noticeable and helps bug hunting.
>
> Agreed, I think it was useful for debugging.
>
> >
> > The dumped info format on a ARM server is like:
> >
> > ACPI PPTT: Processors:
> > P[ 0][0x0024]: parent=0x0000 acpi_proc_id= 0 num_res=1 flags=0x11(package)
> > P[ 1][0x005a]: parent=0x0024 acpi_proc_id= 0 num_res=1 flags=0x12()
> > P[ 2][0x008a]: parent=0x005a acpi_proc_id= 0 num_res=3 flags=0x1a(leaf)
> > P[ 3][0x00f2]: parent=0x005a acpi_proc_id= 1 num_res=3 flags=0x1a(leaf)
> > P[ 4][0x015a]: parent=0x005a acpi_proc_id= 2 num_res=3 flags=0x1a(leaf)
> > ...
> > ACPI PPTT: Caches:
> > C[ 0][0x0072]: flags=0x7f next_level=0x0000 size=0x4000000 sets=65536 way=16 attribute=0xa line_size=64
> > C[ 1][0x00aa]: flags=0x7f next_level=0x00da size=0x10000 sets=256 way=4 attribute=0x4 line_size=64
> > C[ 2][0x00c2]: flags=0x7f next_level=0x00da size=0x10000 sets=256 way=4 attribute=0x2 line_size=64
> > C[ 3][0x00da]: flags=0x7f next_level=0x0000 size=0x100000 sets=2048 way=8 attribute=0xa line_size=64
> > ...
> >
> > It provides a global and straightforward view of the hierarchy of the
> > processor and caches info of the platform, and from the offset info
> > (the 3rd column), the child-parent relation could be checked.
> >
> > With this, the root cause of the original issue was pretty obvious,
> > that there were some caches items missing which caused the issue when
> > building up scheduler domain.
>
> Just a discussion, can we just dump the raw PPTT table via acpidump
> in user space when we meet the problem? With the raw PPTT table, we
> can go though the content to see if we have problems.
Good point! We can use iasl to decode the PPTT table. And this dump
is still useful because:
* when enabling new silicon or new firmware (ACPI tables), sometimes the
system can't make it to user space when the issue happens.
* this dump shows the processor and cache items separately and cleanly,
while the P[]/C[] indexes imply their counts. On a 128-core production ARM
server, the print with this patch is about 500 lines, while the acpidump
output is about 10,000 lines and harder to parse.
>
> >
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> > ---
> > Changelog:
> >
> > v2
> > * rebase againt 6.19 and refine the commit log
> >
> > drivers/acpi/pptt.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 75 insertions(+)
> >
> > diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> > index de5f8c018333..e00abedcd786 100644
> > --- a/drivers/acpi/pptt.c
> > +++ b/drivers/acpi/pptt.c
> > @@ -529,6 +529,79 @@ static void acpi_pptt_warn_missing(void)
> > pr_warn_once("No PPTT table found, CPU and cache topology may be inaccurate\n");
> > }
> > +static void acpi_dump_pptt_table(struct acpi_table_header *table_hdr)
> > +{
> > + struct acpi_subtable_header *entry, *entry_start;
> > + unsigned long end;
> > + struct acpi_pptt_processor *cpu;
> > + struct acpi_pptt_cache *cache;
> > + u32 entry_sz, i;
> > + u8 len;
> > + static bool dumped;
> > +
> > + /* PPTT table could be pretty big, no need to dump it twice */
> > + if (dumped)
> > + return;
> > + dumped = true;
> > +
> > + end = (unsigned long)table_hdr + table_hdr->length;
> > + entry_start = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> > + sizeof(struct acpi_table_pptt));
> > +
> > + pr_info("Processors:\n");
> > + entry_sz = sizeof(struct acpi_pptt_processor);
> > + entry = entry_start;
> > + i = 0;
> > + while ((unsigned long)entry + entry_sz <= end) {
> > + len = entry->length;
> > + if (!len) {
> > + pr_warn("Invalid zero length subtable\n");
> > + return;
> > + }
> > +
> > + cpu = (struct acpi_pptt_processor *)entry;
> > + entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry, len);
> > +
> > + if (cpu->header.type != ACPI_PPTT_TYPE_PROCESSOR)
> > + continue;
> > +
> > + printk(KERN_INFO "P[%3d][0x%04lx]: parent=0x%04x acpi_proc_id=%3d num_res=%d flags=0x%02x(%s%s%s)\n",
>
> pr_info() please.
Will change.
> > + i++, (unsigned long)cpu - (unsigned long)table_hdr,
> > + cpu->parent, cpu->acpi_processor_id,
> > + cpu->number_of_priv_resources, cpu->flags,
> > + cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE ? "package" : "",
> > + cpu->flags & ACPI_PPTT_ACPI_LEAF_NODE ? "leaf" : "",
> > + cpu->flags & ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD ? ", thread" : ""
> > + );
> > +
> > + }
> > +
> > + pr_info("Caches:\n");
> > + entry_sz = sizeof(struct acpi_pptt_cache);
> > + entry = entry_start;
> > + i = 0;
> > + while ((unsigned long)entry + entry_sz <= end) {
> > + len = entry->length;
> > + if (!len) {
> > + pr_warn("Invalid zero length subtable\n");
> > + return;
> > + }
> > +
> > + cache = (struct acpi_pptt_cache *)entry;
> > + entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry, len);
> > +
> > + if (cache->header.type != ACPI_PPTT_TYPE_CACHE)
> > + continue;
> > +
> > + printk(KERN_INFO "C[%4d][0x%04lx]: flags=0x%02x next_level=0x%04x size=0x%-8x sets=%-6d way=%-2d attribute=0x%-2x line_size=%d\n",
>
> Same here.
Yes.
> > + i++, (unsigned long)cache - (unsigned long)table_hdr,
> > + cache->flags, cache->next_level_of_cache, cache->size,
> > + cache->number_of_sets, cache->associativity,
> > + cache->attributes, cache->line_size
> > + );
> > + }
> > +}
> > +
> > /**
> > * topology_get_acpi_cpu_tag() - Find a unique topology value for a feature
> > * @table: Pointer to the head of the PPTT table
> > @@ -565,6 +638,8 @@ static int topology_get_acpi_cpu_tag(struct acpi_table_header *table,
> > }
> > pr_warn_once("PPTT table found, but unable to locate core %d (%d)\n",
> > cpu, acpi_cpu_id);
> > +
> > + acpi_dump_pptt_table(table);
>
> I think it would be good to dump it as needed, as a debug feature.
Makes sense to me. Should I add a kernel config option or a module
parameter for it, or just change the pr_info to pr_debug (it's in
an unlikely error path)?
Thanks,
Feng
> Thanks
> Hanjun
On 2026/1/10 23:04, Feng Tang wrote:
> Hi Hanjun,
>
[...]
>>>
>>> It provides a global and straightforward view of the hierarchy of the
>>> processor and caches info of the platform, and from the offset info
>>> (the 3rd column), the child-parent relation could be checked.
>>>
>>> With this, the root cause of the original issue was pretty obvious,
>>> that there were some caches items missing which caused the issue when
>>> building up scheduler domain.
>>
>> Just a discussion, can we just dump the raw PPTT table via acpidump
>> in user space when we meet the problem? With the raw PPTT table, we
>> can go though the content to see if we have problems.
>
> Good point! We can use iasl to decode the PPTT table. And this dump
> is still useful as:
> * when enabling new silicon or new firmware (APCI tables), sometimes it
> can't make to boot to user space when the issue happens.
> * This dump shows the processor and cache items separately and cleanly,
> while the P[]/C[] index imply the numbers. In an 128 core product ARM
> sever, the print with this patch is about 500 line, while the acpidump
> is about 10,000 lines and harder to parse.
Thanks for the use case, it makes sense to me.
>
[...]
>>> /**
>>> * topology_get_acpi_cpu_tag() - Find a unique topology value for a feature
>>> * @table: Pointer to the head of the PPTT table
>>> @@ -565,6 +638,8 @@ static int topology_get_acpi_cpu_tag(struct acpi_table_header *table,
>>> }
>>> pr_warn_once("PPTT table found, but unable to locate core %d (%d)\n",
>>> cpu, acpi_cpu_id);
>>> +
>>> + acpi_dump_pptt_table(table);
>>
>> I think it would be good to dump it as needed, as a debug feature.
>
> Makes sense to me. Should I add a kernel config option or a module
> parameter for it, or just change the pr_info to pr_debug (it's in
> a unlikely error path)?
The PPTT driver cannot be compiled as a module, so I would like to add a
kernel config option for it.
Thanks
Hanjun
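One way the option discussed above could be wired up in pptt.c, assuming the
CONFIG_ACPI_PPTT_ERR_DUMP name proposed earlier in the thread (default off);
this is a sketch of the shape only, not the eventual patch:

/* Sketch: build the dump only when the proposed, default-off option is set,
 * so production kernels carry no extra code and the call site stays simple. */
#ifdef CONFIG_ACPI_PPTT_ERR_DUMP
static void acpi_dump_pptt_table(struct acpi_table_header *table_hdr)
{
	/* body as in the patch at the top of the thread */
}
#else
static inline void acpi_dump_pptt_table(struct acpi_table_header *table_hdr)
{
	/* compiled out on production configs */
}
#endif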