[PATCH 2/2] x86/cpu/topology: Use initial APICID from XTOPOEXT on AMD/HYGON

K Prateek Nayak posted 2 patches 4 months ago
There is a newer version of this series
Posted by K Prateek Nayak 4 months ago
Prior to the topology parsing rewrite and the switchover to the new
parsing logic for AMD processors in commit c749ce393b8f ("x86/cpu: Use
common topology code for AMD"), the "initial_apicid" on these platforms
was:

- First initialized to the LocalApicId from CPUID leaf 0x1 EBX[31:24].

- Then overwritten by the ExtendedLocalApicId in CPUID leaf 0xb
  EDX[31:0] on processors that supported topoext.

With the new parsing flow introduced in commit f7fb3b2dd92c ("x86/cpu:
Provide an AMD/HYGON specific topology parser"), parse_8000_001e() now
unconditionally overwrites the "initial_apicid" already parsed during
cpu_parse_topology_ext().

Although this has not been a problem on bare-metal platforms, on
virtualized AMD guests that feature more than 255 cores, QEMU zeroes out
the CPUID leaf 0x8000001e on CPUs with "CoreID" > 255 to prevent
collision of these IDs in EBX[7:0], which can only represent CoreIDs up
to 255 [1].

This results in the following FW_BUG being logged when booting a guest
with more than 255 cores:

    [Firmware Bug]: CPU 512: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0200

Rely on the APICID parsed during cpu_parse_topology_ext() from CPUID
leaf 0x80000026 or 0xb and only use the APICID from leaf 0x8000001e if
cpu_parse_topology_ext() failed (has_topoext is false).

On platforms that support the 0xb leaf (Zen2 or later, AMD guests on
QEMU) or the extended leaf 0x80000026 (Zen4 or later), the
"initial_apicid" is now set to the value parsed from EDX[31:0].

On older AMD/Hygon platforms that do not support the 0xb leaf but
support the TOPOEXT extension (Fam 0x15, 0x16, 0x17[Zen1], and Hygon),
the current behavior is retained where "initial_apicid" is set using
the 0x8000001e leaf.

Link: https://github.com/qemu/qemu/commit/35ac5dfbcaa4b [1]
Debugged-by: Naveen N Rao (AMD) <naveen@kernel.org>
Debugged-by: Sairaj Kodilkar <sarunkod@amd.com>
Fixes: c749ce393b8f ("x86/cpu: Use common topology code for AMD")
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/x86/kernel/cpu/topology_amd.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/topology_amd.c b/arch/x86/kernel/cpu/topology_amd.c
index fec849fff82f..2ff6988e020a 100644
--- a/arch/x86/kernel/cpu/topology_amd.c
+++ b/arch/x86/kernel/cpu/topology_amd.c
@@ -80,7 +80,13 @@ static bool parse_8000_001e(struct topo_scan *tscan, bool has_topoext)
 
 	cpuid_leaf(0x8000001e, &leaf);
 
-	tscan->c->topo.initial_apicid = leaf.ext_apic_id;
+	/*
+	 * Prefer initial_apicid parsed from CPUID leaf 0x80000026 or 0xb
+	 * if available. Otherwise prefer the one from leaf 0x8000001e
+	 * over 0x1.
+	 */
+	if (!has_topoext)
+		tscan->c->topo.initial_apicid = leaf.ext_apic_id;
 
 	/*
 	 * If leaf 0xb is available, then the domain shifts are set
-- 
2.34.1
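
The selection order described in the commit message can be sketched in plain C as follows. This is only an illustration, not kernel code: struct apicid_sources, pick_initial_apicid() and pick_initial_apicid_old() are hypothetical names, and the sample values in the usage note are assumptions modelled on the FW_BUG line quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPUID-derived inputs; not a kernel type. */
struct apicid_sources {
	bool     has_topoext;     /* cpu_parse_topology_ext() succeeded (0x80000026 or 0xb) */
	uint32_t ext_leaf_apicid; /* EDX[31:0] of leaf 0x80000026 or 0xb */
	bool     has_leaf_1e;     /* leaf 0x8000001e is parsed */
	uint32_t leaf_1e_apicid;  /* ext_apic_id from leaf 0x8000001e */
	uint32_t leaf_1_apicid;   /* EBX[31:24] of leaf 0x1 */
};

/* Behavior before the patch: leaf 0x8000001e always wins when parsed. */
static inline uint32_t pick_initial_apicid_old(const struct apicid_sources *s)
{
	if (s->has_leaf_1e)
		return s->leaf_1e_apicid;
	if (s->has_topoext)
		return s->ext_leaf_apicid;
	return s->leaf_1_apicid;
}

/* Behavior with the patch: 0x80000026/0xb first, then 0x8000001e, then 0x1. */
static inline uint32_t pick_initial_apicid(const struct apicid_sources *s)
{
	if (s->has_topoext)
		return s->ext_leaf_apicid;
	if (s->has_leaf_1e)
		return s->leaf_1e_apicid;
	return s->leaf_1_apicid;
}
```

For a vCPU whose leaf 0xb reports 0x200 while its leaf 0x8000001e has been zeroed, pick_initial_apicid_old() yields 0x0 (the mismatch reported in the log), whereas pick_initial_apicid() yields 0x200.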
Re: [PATCH 2/2] x86/cpu/topology: Use initial APICID from XTOPOEXT on AMD/HYGON
Posted by Naveen N Rao 3 months, 3 weeks ago
On Thu, Jun 12, 2025 at 07:29:21AM +0000, K Prateek Nayak wrote:
> Prior to the topology parsing rewrite and the switchover to the new
> parsing logic for AMD processors in commit c749ce393b8f ("x86/cpu: Use
> common topology code for AMD"), the "initial_apicid" on these platforms
> was:
> 
> - First initialized to the LocalApicId from CPUID leaf 0x1 EBX[31:24].
> 
> - Then overwritten by the ExtendedLocalApicId in CPUID leaf 0xb
>   EDX[31:0] on processors that supported topoext.
> 
> With the new parsing flow introduced in commit f7fb3b2dd92c ("x86/cpu:
> Provide an AMD/HYGON specific topology parser"), parse_8000_001e() now
> unconditionally overwrites the "initial_apicid" already parsed during
> cpu_parse_topology_ext().
> 
> Although this has not been a problem on bare-metal platforms, on
> virtualized AMD guests that feature more than 255 cores, QEMU zeroes out
> the CPUID leaf 0x8000001e on CPUs with "CoreID" > 255 to prevent
> collision of these IDs in EBX[7:0], which can only represent CoreIDs up
> to 255 [1].
> 
> This results in the following FW_BUG being logged when booting a guest
> with more than 255 cores:
> 
>     [Firmware Bug]: CPU 512: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0200
> 
> Rely on the APICID parsed during cpu_parse_topology_ext() from CPUID
> leaf 0x80000026 or 0xb and only use the APICID from leaf 0x8000001e if
> cpu_parse_topology_ext() failed (has_topoext is false).
> 
> On platforms that support the 0xb leaf (Zen2 or later, AMD guests on
> QEMU) or the extended leaf 0x80000026 (Zen4 or later), the
> "initial_apicid" is now set to the value parsed from EDX[31:0].
> 
> On older AMD/Hygon platforms that do not support the 0xb leaf but
> support the TOPOEXT extension (Fam 0x15, 0x16, 0x17[Zen1], and Hygon),
> the current behavior is retained where "initial_apicid" is set using
> the 0x8000001e leaf.
> 
> Link: https://github.com/qemu/qemu/commit/35ac5dfbcaa4b [1]
> Debugged-by: Naveen N Rao (AMD) <naveen@kernel.org>
> Debugged-by: Sairaj Kodilkar <sarunkod@amd.com>
> Fixes: c749ce393b8f ("x86/cpu: Use common topology code for AMD")
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>  arch/x86/kernel/cpu/topology_amd.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/topology_amd.c b/arch/x86/kernel/cpu/topology_amd.c
> index fec849fff82f..2ff6988e020a 100644
> --- a/arch/x86/kernel/cpu/topology_amd.c
> +++ b/arch/x86/kernel/cpu/topology_amd.c
> @@ -80,7 +80,13 @@ static bool parse_8000_001e(struct topo_scan *tscan, bool has_topoext)
>  
>  	cpuid_leaf(0x8000001e, &leaf);
>  
> -	tscan->c->topo.initial_apicid = leaf.ext_apic_id;
> +	/*
> +	 * Prefer initial_apicid parsed from CPUID leaf 0x80000026 or 0xb
> +	 * if available. Otherwise prefer the one from leaf 0x8000001e
> +	 * over 0x1.
> +	 */
> +	if (!has_topoext)
> +		tscan->c->topo.initial_apicid = leaf.ext_apic_id;

My understanding is that the rest of this function continues to work 
properly with an all-zero return value from CPUID leaf 0x8000001e - data 
from topoext is preferred where available, and other sources are used 
for llc_id.

And, this fixes the firmware bug warning seen during a guest boot in 
qemu for me, so for this patch:
Tested-by: Naveen N Rao (AMD) <naveen@kernel.org>


Thanks,
Naveen
Re: [PATCH 2/2] x86/cpu/topology: Use initial APICID from XTOPOEXT on AMD/HYGON
Posted by K Prateek Nayak 3 months, 3 weeks ago
Hello Naveen,

On 6/18/2025 9:30 AM, Naveen N Rao wrote:
>> diff --git a/arch/x86/kernel/cpu/topology_amd.c b/arch/x86/kernel/cpu/topology_amd.c
>> index fec849fff82f..2ff6988e020a 100644
>> --- a/arch/x86/kernel/cpu/topology_amd.c
>> +++ b/arch/x86/kernel/cpu/topology_amd.c
>> @@ -80,7 +80,13 @@ static bool parse_8000_001e(struct topo_scan *tscan, bool has_topoext)
>>   
>>   	cpuid_leaf(0x8000001e, &leaf);
>>   
>> -	tscan->c->topo.initial_apicid = leaf.ext_apic_id;
>> +	/*
>> +	 * Prefer initial_apicid parsed from CPUID leaf 0x80000026 or 0xb
>> +	 * if available. Otherwise prefer the one from leaf 0x8000001e
>> +	 * over 0x1.
>> +	 */
>> +	if (!has_topoext)
>> +		tscan->c->topo.initial_apicid = leaf.ext_apic_id;
> 
> My understanding is that the rest of this function continues to work
> properly with an all-zero return value from CPUID leaf 0x8000001e - data
> from topoext is preferred where available, and other sources are used
> for llc_id.

Yes. QEMU just errs on the side of caution to avoid CoreID collisions in
0x8000001e. "NumSharingCache" in 0x8000001d EAX[25:14] can still
represent 2^12 (4096) CPUs in each cache domain, which I believe is
larger than the total number of vCPUs currently possible.
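
The field-width arithmetic above can be checked with a small sketch. The helpers bits() and field_capacity() are hypothetical names used only for illustration and are not kernel APIs; the bit ranges are the ones quoted in this thread.

```c
#include <stdint.h>

/* Extract bits [hi:lo] of a 32-bit register value (hypothetical helper). */
static inline uint32_t bits(uint32_t reg, unsigned int hi, unsigned int lo)
{
	unsigned int width = hi - lo + 1;
	/* Avoid an undefined 32-bit shift when the field spans the whole register. */
	uint32_t mask = (width >= 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);

	return (reg >> lo) & mask;
}

/* Number of distinct values a field [hi:lo] can encode. */
static inline uint64_t field_capacity(unsigned int hi, unsigned int lo)
{
	return UINT64_C(1) << (hi - lo + 1);
}
```

With these, field_capacity(7, 0) is 256 and field_capacity(25, 14) is 4096, and bits(256, 7, 0) is 0: a CoreID of 256 would alias CoreID 0 in the 8-bit field of 0x8000001e EBX[7:0], while the 12-bit field in 0x8000001d EAX[25:14] still has headroom.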

> And, this fixes the firmware bug warning seen during a guest boot in
> qemu for me, so for this patch:
> Tested-by: Naveen N Rao (AMD) <naveen@kernel.org>

Thank you for debugging the issue and testing :)

-- 
Thanks and Regards,
Prateek