[PATCH 0/2] x86/cpu/topology: Work around the nuances of virtualization on AMD/Hygon

K Prateek Nayak posted 2 patches 4 months ago
There is a newer version of this series
arch/x86/kernel/cpu/topology_amd.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
Posted by K Prateek Nayak 4 months ago
When running an AMD guest on QEMU with > 255 cores, the following FW_BUG
was noticed with recent kernels:

    [Firmware Bug]: CPU 512: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0200

Naveen and Sairaj traced the cause to commit c749ce393b8f ("x86/cpu: Use
common topology code for AMD"). After that rework, the initial APICID
is taken from CPUID leaf 0x8000001e EAX[31:0] instead of, as before,
from CPUID leaf 0xb EDX[31:0].

This led us down a rabbit hole of XTOPOEXT vs. TOPOEXT support, the
preferred order in which to parse them, and QEMU nuances like [1], where
QEMU zeroes out CPUID leaf 0x8000001e on CPUs whose Core ID exceeds 255,
fearing a Core ID collision in the 8-bit field. This zeroed leaf is what
leads to the reported FW_BUG.

The following were the major observations during debugging, which the
two patches address respectively:

1. Support for CPUID leaf 0xb is independent of the TOPOEXT feature
   and is instead tied to x2APIC enablement. On baremetal, this has not
   been a problem since TOPOEXT support (Fam 0x15 and above) predates
   support for CPUID leaf 0xb (Fam 0x17 [Zen2] and above). In a
   virtualized environment, however, x2APIC can be enabled independently
   of TOPOEXT, in which case QEMU expects the guest to parse the topology
   and the APICID from CPUID leaf 0xb.

2. CPUID leaf 0x8000001e cannot represent the Core ID without collision
   for guests with more than 255 cores, and QEMU zeroes out the entire
   leaf once the Core ID exceeds 255. Hence, prefer reading the initial
   APICID from the XTOPOEXT leaf before falling back to the APICID from
   0x8000001e, which is still better than the 8-bit APICID from leaf 0x1
   EBX[31:24].

More details are enclosed in the commit logs.

Ideally, these changes should not affect baremetal AMD/Hygon platforms,
as they supported TOPOEXT long before they gained support for CPUID leaf
0xb and the extended CPUID leaf 0x80000026 (famous last words).

This series has been tested on baremetal Zen1 (has TOPOEXT but not leaf
0xb), Zen3 (has both TOPOEXT and leaf 0xb), and Zen4 (has TOPOEXT, leaf
0xb, and leaf 0x80000026) servers with no changes observed in the
"/sys/kernel/debug/x86/topo/" directory.

The series was also tested on EPYC-Genoa guests with 255 and 512 vCPUs
(each vCPU passed as an individual core in the QEMU topology), with and
without x2apic and topoext enabled, and it resolves the FW_BUG seen on
guests with more than 255 vCPUs. For all other cases, which did not
trigger the warning, no changes were observed in
"/sys/kernel/debug/x86/topo/". Leaf 0xb is provided unconditionally on
these guests (with or without topoext, and even with x2apic disabled on
guests with <= 255 vCPUs).

The relevant bits of the QEMU command line used during testing are:

    qemu-system-x86_64 \
    -enable-kvm -m 32G -smp cpus=255,cores=255 \
    -cpu EPYC-Genoa,x2apic=on,kvm-msi-ext-dest-id=on,+kvm-pv-unhalt,kvm-pv-tlb-flush,kvm-pv-ipi,kvm-pv-sched-yield,[-topoext]  \
    -machine q35,kernel_irqchip=split \
    -global kvm-pit.lost_tick_policy=discard
    ...

References:

[1] https://github.com/qemu/qemu/commit/35ac5dfbcaa4b

Series is based on tip:x86/cpu at tag v6.15-rc6.

---
K Prateek Nayak (2):
  x86/cpu/topology: Always try cpu_parse_topology_ext() on AMD/Hygon
  x86/cpu/topology: Use initial APICID from XTOPOEXT on AMD/HYGON

 arch/x86/kernel/cpu/topology_amd.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)


base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
-- 
2.34.1
Re: [PATCH 0/2] x86/cpu/topology: Work around the nuances of virtualization on AMD/Hygon
Posted by K Prateek Nayak 3 months, 2 weeks ago
On 6/12/2025 12:59 PM, K Prateek Nayak wrote:

Gentle ping!

-- 
Thanks and Regards,
Prateek
Re: [PATCH 0/2] x86/cpu/topology: Work around the nuances of virtualization on AMD/Hygon
Posted by K Prateek Nayak 3 months, 1 week ago
On 6/25/2025 11:58 AM, K Prateek Nayak wrote:
> Gentle ping!

Gentle ping! But a wee bit less gentle compared to last time :)

-- 
Thanks and Regards,
Prateek