[PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology

Penny Zheng posted 2 patches 1 week ago
There is a newer version of this series
Posted by Penny Zheng 1 week ago
While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
oopses very early during fs_initcall:

  Oops: divide error: 0000 [#1] SMP NOPTI
  RIP: 0010:amd_smn_init+0x188/0x2e0

If the kernel is made to survive the divide, it then fails with a NULL pointer
dereference in the first SMN consumer (amd_pmc_probe -> amd_smn_read).

Root cause
==========

To prevent each dom0 vCPU from looking like an SMT sibling of another
vCPU, Xen synthesizes guest x2APIC IDs as vcpu_index * 2. Spacing the
APIC IDs by 2 in this way can push the synthesized IDs past the
package-field boundary. Linux then infers more "packages", and therefore
more AMD nodes via amd_num_nodes(), than the platform actually has, while
the PCI-side host-bridge scan correctly reports the number of root complexes.
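
As a rough illustration of the arithmetic above, the following C sketch
models how doubled APIC IDs can spill into an extra package. It is not
kernel code: xen_synth_apic_id(), count_packages() and the pkg_shift
parameter are hypothetical names, and the real width of the package field
depends on the CPUID topology data the guest sees.

```c
#include <stdint.h>

/* Synthesized x2APIC ID for a dom0 vCPU, as described above. */
static uint32_t xen_synth_apic_id(uint32_t vcpu_index)
{
	return vcpu_index * 2;
}

/*
 * Number of distinct package IDs seen for nr_vcpus vCPUs, assuming the
 * package ID is the APIC ID shifted right by pkg_shift bits.
 * pkg_shift is a hypothetical parameter chosen for illustration.
 */
static unsigned int count_packages(uint32_t nr_vcpus, unsigned int pkg_shift)
{
	if (nr_vcpus == 0)
		return 0;

	/* IDs grow with vcpu_index, so the last vCPU has the highest package ID. */
	return (xen_synth_apic_id(nr_vcpus - 1) >> pkg_shift) + 1;
}
```

With 16 vCPUs and an assumed pkg_shift of 4, the highest synthesized ID is
30, so count_packages(16, 4) yields 2 packages. With a shift of 5 the same
IDs all stay within one package.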

The fixes are tested on Xen 4.20 PVH dom0 on AMD Zen (16 vCPUs) on top of
linux-next/master (next-20260505).

Penny Zheng (2):
  x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0
  x86/amd_node: reject SMN access when amd_smn_init() did not complete

 arch/x86/kernel/amd_node.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
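
For readers without the diffs at hand, the two kinds of guard named in the
patch titles above can be sketched in user-space C. This is a minimal
model, not the code from the series. It assumes that the zero divisor
comes from an integer quotient of root count by node count, and that
-ENODEV is the error returned. struct smn_state and both functions are
hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state; not the kernel's layout. */
struct smn_state {
	uint16_t num_nodes;
	uint16_t roots_per_node;
	bool ready;		/* set only when init completed */
};

/* Returns 0 on success, -ENODEV when the two counts are inconsistent. */
static int smn_state_init(struct smn_state *st, uint16_t num_roots,
			  uint16_t num_nodes)
{
	st->num_nodes = 0;
	st->roots_per_node = 0;
	st->ready = false;

	/* More nodes than roots would make the quotient 0 (assumed divisor). */
	if (num_nodes == 0 || num_roots < num_nodes)
		return -ENODEV;

	st->num_nodes = num_nodes;
	st->roots_per_node = num_roots / num_nodes;
	st->ready = true;
	return 0;
}

/*
 * Index of the first root assigned to a node, or -ENODEV if init did not
 * complete or the node is out of range. Models rejecting access instead
 * of touching uninitialized state.
 */
static int smn_pick_root(const struct smn_state *st, uint16_t node)
{
	if (!st->ready || node >= st->num_nodes)
		return -ENODEV;

	return (int)node * st->roots_per_node;
}
```

In this model, an inflated node count (for example 1 root and 2 inferred
nodes) makes smn_state_init() fail cleanly, and any later smn_pick_root()
call returns -ENODEV rather than faulting.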

-- 
2.43.0
Re: [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
Posted by Jiaqing Zhao 6 days, 10 hours ago
This amd_smn_init div0 oops is also observed on 6.19.14+deb14 (Debian testing)
and 6.18.27. Since 6.18 is an LTS kernel, I suggest adding
`Cc: stable@vger.kernel.org` so that the fix gets backported.

Thanks,
Jiaqing

On 2026-05-06 13:55, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
> 
>   Oops: divide error: 0000 [#1] SMP NOPTI
>   RIP: 0010:amd_smn_init+0x188/0x2e0
Re: [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
Posted by Mario Limonciello 1 week ago

On 5/6/26 00:55, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
> 
>    Oops: divide error: 0000 [#1] SMP NOPTI
>    RIP: 0010:amd_smn_init+0x188/0x2e0
> 
> If the kernel is made to survive the divide, it then fails with a NULL pointer
> dereference in the first SMN consumer (amd_pmc_probe -> amd_smn_read).

So to confirm - does amd_pmc_probe work properly with this series now?
