[PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology

Penny Zheng posted 2 patches 1 month, 1 week ago
arch/x86/kernel/amd_node.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
[PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
Posted by Penny Zheng 1 month, 1 week ago
While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
oopses very early during fs_initcall:

  Oops: divide error: 0000 [#1] SMP NOPTI
  RIP: 0010:amd_smn_init+0x188/0x2e0

Furthermore, a kernel that survives the divide later fails with a NULL pointer
dereference in the first SMN consumer (amd_pmc_probe -> amd_smn_read).

Root cause
==========

To prevent each dom0 vCPU from looking like an SMT sibling of another
vCPU, Xen synthesizes guest x2APIC IDs as vcpu_index * 2. Spacing every
vCPU's APIC ID by 2 can push the synthesized IDs past the package-field
boundary. Linux then infers more "packages", and therefore more AMD
nodes via amd_num_nodes(), than the platform actually has, while the
PCI-side host-bridge scan correctly reports the number of root complexes.
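
The arithmetic behind the mismatch can be modelled in user space. The
sketch below is only an illustration under stated assumptions: the helper
names (model_apic_id, model_num_packages, model_roots_per_node) are
hypothetical, it assumes one AMD node per package, and the 4-bit package
shift and single host bridge used in the note after it are example values,
not numbers taken from the oops report. It also assumes that
amd_smn_init() derives a roots-per-node quotient from the two counts.

```c
/* Synthesized x2APIC ID as described above: vcpu_index * 2. */
static unsigned int model_apic_id(unsigned int vcpu_index)
{
	return vcpu_index * 2;
}

/*
 * Number of packages implied by the highest package field seen, where
 * the package field is taken to be apic_id >> pkg_shift (assumption).
 */
static unsigned int model_num_packages(unsigned int num_vcpus,
				       unsigned int pkg_shift)
{
	unsigned int max_pkg = 0;

	for (unsigned int v = 0; v < num_vcpus; v++) {
		unsigned int pkg = model_apic_id(v) >> pkg_shift;

		if (pkg > max_pkg)
			max_pkg = pkg;
	}

	return max_pkg + 1;
}

/* Integer quotient; it becomes zero once nodes outnumber roots. */
static unsigned int model_roots_per_node(unsigned int num_roots,
					 unsigned int num_nodes)
{
	return num_nodes ? num_roots / num_nodes : 0;
}
```

With 16 vCPUs and a 4-bit shift, the highest synthesized ID is 30 and
30 >> 4 is 1, so model_num_packages(16, 4) returns 2. With a single root,
model_roots_per_node(1, 2) is 0, and any later division or modulo by that
value traps.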

The fixes are tested on Xen 4.20 PVH dom0 on AMD Zen (16 vCPUs) on top of
linux-next/master (next-20260505).

Penny Zheng (2):
  x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0
  x86/amd_node: reject SMN access when amd_smn_init() did not complete

 arch/x86/kernel/amd_node.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
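
The idea named in the second patch title (reject SMN access when init did
not complete) can be sketched as a small user-space model. This is not the
patch itself: every name below (model_root, model_smn_init, model_smn_read,
model_smn_ready) is hypothetical, and the model simply assumes one root per
node stored in an array.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_REGS_PER_ROOT 4u

/* Stand-in for a root device: a few readable registers. */
struct model_root {
	uint32_t regs[MODEL_REGS_PER_ROOT];
};

static struct model_root *model_roots;	/* NULL until init completes */
static unsigned int model_num_nodes;
static bool model_smn_ready;		/* set only on successful init */

/* Refuse to initialize when there are fewer roots than nodes. */
static int model_smn_init(struct model_root *roots, unsigned int num_roots,
			  unsigned int num_nodes)
{
	if (!roots || !num_nodes || num_roots < num_nodes)
		return -ENODEV;

	model_roots = roots;
	model_num_nodes = num_nodes;
	model_smn_ready = true;
	return 0;
}

/* Readers check the ready flag before touching the root array. */
static int model_smn_read(unsigned int node, unsigned int reg, uint32_t *value)
{
	if (!model_smn_ready)
		return -ENODEV;
	if (node >= model_num_nodes || reg >= MODEL_REGS_PER_ROOT || !value)
		return -EINVAL;

	*value = model_roots[node].regs[reg];
	return 0;
}
```

In this model a consumer that calls model_smn_read() before a successful
model_smn_init() gets -ENODEV back instead of dereferencing a NULL array.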

-- 
2.43.0
Re: [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
Posted by Borislav Petkov 1 week, 3 days ago
On Wed, May 06, 2026 at 01:55:17PM +0800, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
> 
>   Oops: divide error: 0000 [#1] SMP NOPTI
>   RIP: 0010:amd_smn_init+0x188/0x2e0
> 
> Furthermore, a kernel that survives the divide later fails with a NULL pointer
> dereference in the first SMN consumer (amd_pmc_probe -> amd_smn_read).
> 
> Root cause
> ==========
> 
> To prevent each dom0 vCPU from looking like an SMT sibling of another
> vCPU, Xen synthesizes guest x2APIC IDs as vcpu_index * 2. Spacing every
> vCPU's APIC ID by 2 can push the synthesized IDs past the package-field
> boundary. Linux then infers more "packages", and therefore more AMD
> nodes via amd_num_nodes(), than the platform actually has, while the
> PCI-side host-bridge scan correctly reports the number of root complexes.

Does that work?

diff --git a/arch/x86/kernel/amd_node.c b/arch/x86/kernel/amd_node.c
index 0be01725a2a4..0f1ad1883883 100644
--- a/arch/x86/kernel/amd_node.c
+++ b/arch/x86/kernel/amd_node.c
@@ -249,6 +249,9 @@ static int __init amd_smn_init(void)
 	u16 count, num_roots, roots_per_node, node, num_nodes;
 	struct pci_dev *root;
 
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return 0;
+
 	if (!cpu_feature_enabled(X86_FEATURE_ZEN))
 		return 0;
 

Totally untested ofc.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
Posted by Jiaqing Zhao 1 month, 1 week ago
This amd_smn_init div0 oops is also observed on 6.19.14+deb14 (Debian testing)
and 6.18.27. Given that 6.18 is LTS, I suggest adding `Cc: stable@vger.kernel.org`
so that the fix gets backported.

Thanks,
Jiaqing

On 2026-05-06 13:55, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
> 
>   Oops: divide error: 0000 [#1] SMP NOPTI
>   RIP: 0010:amd_smn_init+0x188/0x2e0
> 
> Furthermore, a kernel that survives the divide later fails with a NULL pointer
> dereference in the first SMN consumer (amd_pmc_probe -> amd_smn_read).
> 
> Root cause
> ==========
> 
> To prevent each dom0 vCPU from looking like an SMT sibling of another
> vCPU, Xen synthesizes guest x2APIC IDs as vcpu_index * 2. Spacing every
> vCPU's APIC ID by 2 can push the synthesized IDs past the package-field
> boundary. Linux then infers more "packages", and therefore more AMD
> nodes via amd_num_nodes(), than the platform actually has, while the
> PCI-side host-bridge scan correctly reports the number of root complexes.
> 
> The fixes are tested on Xen 4.20 PVH dom0 on AMD Zen (16 vCPUs) on top of
> linux-next/master (next-20260505).
> 
> Penny Zheng (2):
>   x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0
>   x86/amd_node: reject SMN access when amd_smn_init() did not complete
> 
>  arch/x86/kernel/amd_node.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
Re: [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
Posted by Yazen Ghannam 3 weeks, 4 days ago
On Thu, May 07, 2026 at 04:37:43PM +0800, Jiaqing Zhao wrote:
> This amd_smn_init div0 oops is also observed on 6.19.14+deb14 (Debian testing)
> and 6.18.27. Given that 6.18 is LTS, I suggest adding `Cc: stable@vger.kernel.org`
> so that the fix gets backported.
> 

Hi Jiaqing,

So is the issue observed with the latest mainline kernel?

Or only with v6.19 and before?

Thanks,
Yazen
Re: [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
Posted by Jiaqing Zhao 1 week, 4 days ago
Hi, Yazen

On 2026-05-19 22:10, Yazen Ghannam wrote:
> On Thu, May 07, 2026 at 04:37:43PM +0800, Jiaqing Zhao wrote:
>> This amd_smn_init div0 oops is also observed on 6.19.14+deb14 (Debian testing)
>> and 6.18.27. Given that 6.18 is LTS, I suggest adding `Cc: stable@vger.kernel.org`
>> so that the fix gets backported.
>>
> 
> Hi Jiaqing,
> 
> So is the issue observed with the latest mainline kernel?
> 
> Or only with v6.19 and before?

It is observed with both the latest mainline and v6.18/v6.19. The issue first
appears with commit 0a4b61d9c2e49 ("x86/amd_node: Fix AMD root device caching").

Thanks,
Jiaqing

> Thanks,
> Yazen
Re: [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
Posted by Mario Limonciello 1 month, 1 week ago

On 5/6/26 00:55, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
> 
>    Oops: divide error: 0000 [#1] SMP NOPTI
>    RIP: 0010:amd_smn_init+0x188/0x2e0
> 
> Furthermore, a kernel that survives the divide later fails with a NULL pointer
> dereference in the first SMN consumer (amd_pmc_probe -> amd_smn_read).

So to confirm - does amd_pmc_probe work properly with this series now?

> 
> Root cause
> ==========
> 
> To prevent each dom0 vCPU from looking like an SMT sibling of another
> vCPU, Xen synthesizes guest x2APIC IDs as vcpu_index * 2. Spacing every
> vCPU's APIC ID by 2 can push the synthesized IDs past the package-field
> boundary. Linux then infers more "packages", and therefore more AMD
> nodes via amd_num_nodes(), than the platform actually has, while the
> PCI-side host-bridge scan correctly reports the number of root complexes.
> 
> The fixes are tested on Xen 4.20 PVH dom0 on AMD Zen (16 vCPUs) on top of
> linux-next/master (next-20260505).
> 
> Penny Zheng (2):
>    x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0
>    x86/amd_node: reject SMN access when amd_smn_init() did not complete
> 
>   arch/x86/kernel/amd_node.c | 23 ++++++++++++++++++++---
>   1 file changed, 20 insertions(+), 3 deletions(-)
>