While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
oopses very early during fs_initcall:

Oops: divide error: 0000 [#1] SMP NOPTI
RIP: 0010:amd_smn_init+0x188/0x2e0

A kernel that survives the divide then fails with a NULL pointer dereference
in the first SMN consumer (amd_pmc_probe -> amd_smn_read).

Root cause
==========

To prevent each dom0 vCPU from looking like an SMT sibling of another vCPU,
Xen synthesizes guest x2APIC IDs as vcpu_index * 2. Spacing every vCPU's APIC
ID by 2 can push the synthesized IDs past the package-field boundary. Linux
then infers more "packages", and therefore more AMD nodes via amd_num_nodes(),
than the platform actually has, while the PCI-side host-bridge scan correctly
reports the number of root complexes.

The fixes were tested on Xen 4.20 PVH dom0 on AMD Zen (16 vCPUs) on top of
linux-next/master (next-20260505).

Penny Zheng (2):
  x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0
  x86/amd_node: reject SMN access when amd_smn_init() did not complete

 arch/x86/kernel/amd_node.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

--
2.43.0
On Wed, May 06, 2026 at 01:55:17PM +0800, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
>
> Oops: divide error: 0000 [#1] SMP NOPTI
> RIP: 0010:amd_smn_init+0x188/0x2e0
>
> A kernel that survives the divide then fails with a NULL pointer
> dereference in the first SMN consumer (amd_pmc_probe -> amd_smn_read).
>
> Root cause
> ==========
>
> To prevent each dom0 vCPU from looking like an SMT sibling of another
> vCPU, Xen synthesizes guest x2APIC IDs as vcpu_index * 2. Spacing every
> vCPU's APIC ID by 2 can push the synthesized IDs past the package-field
> boundary. Linux then infers more "packages", and therefore more AMD
> nodes via amd_num_nodes(), than the platform actually has, while the
> PCI-side host-bridge scan correctly reports the number of root complexes.
Does that work?
diff --git a/arch/x86/kernel/amd_node.c b/arch/x86/kernel/amd_node.c
index 0be01725a2a4..0f1ad1883883 100644
--- a/arch/x86/kernel/amd_node.c
+++ b/arch/x86/kernel/amd_node.c
@@ -249,6 +249,9 @@ static int __init amd_smn_init(void)
 	u16 count, num_roots, roots_per_node, node, num_nodes;
 	struct pci_dev *root;
 
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return 0;
+
 	if (!cpu_feature_enabled(X86_FEATURE_ZEN))
 		return 0;
 
Totally untested ofc.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
This amd_smn_init div0 oops is also observed on 6.19.14+deb14 (Debian testing)
and 6.18.27. Given that 6.18 is LTS, suggesting `Cc: stable@vger.kernel.org`
to get it backported.

Thanks,
Jiaqing

On 2026-05-06 13:55, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
>
> Oops: divide error: 0000 [#1] SMP NOPTI
> RIP: 0010:amd_smn_init+0x188/0x2e0
> [...]
On Thu, May 07, 2026 at 04:37:43PM +0800, Jiaqing Zhao wrote:
> This amd_smn_init div0 oops is also observed on 6.19.14+deb14 (Debian testing)
> and 6.18.27. Given that 6.18 is LTS, suggesting `Cc: stable@vger.kernel.org`
> to get it backported.
>

Hi Jiaqing,

So is the issue observed with the latest mainline kernel?

Or only with v6.19 and before?

Thanks,
Yazen
Hi, Yazen
On 2026-05-19 22:10, Yazen Ghannam wrote:
> On Thu, May 07, 2026 at 04:37:43PM +0800, Jiaqing Zhao wrote:
>> This amd_smn_init div0 oops is also observed on 6.19.14+deb14 (Debian testing)
>> and 6.18.27. Given that 6.18 is LTS, suggesting `Cc: stable@vger.kernel.org`
>> to get it backported.
>>
>
> Hi Jiaqing,
>
> So is the issue observed with the latest mainline kernel?
>
> Or only with v6.19 and before?
It is observed with both the latest mainline and v6.18/v6.19. The issue was
introduced by commit 0a4b61d9c2e49 ("x86/amd_node: Fix AMD root device caching").
Thanks,
Jiaqing
> Thanks,
> Yazen
On 5/6/26 00:55, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
>
> Oops: divide error: 0000 [#1] SMP NOPTI
> RIP: 0010:amd_smn_init+0x188/0x2e0
>
> A kernel that survives the divide then fails with a NULL pointer
> dereference in the first SMN consumer (amd_pmc_probe -> amd_smn_read).

So, to confirm: does amd_pmc_probe work properly with this series now?

> [...]