drivers/pci/msi/msi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
From: Himanshu Madhani <himanshu.madhani@oracle.com>
Fix system hang due to incorrect mutex lock placement.
following stack trace will be seen on system boot
[ 525.664681] task:systemd-shutdow state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00004002
[ 525.796878] Call Trace:
[ 525.826116] <TASK>
[ 525.851195] __schedule+0x2d1/0x730
[ 525.892917] schedule+0x27/0x80
[ 525.930478] schedule_preempt_disabled+0x15/0x30
[ 525.985718] __mutex_lock.constprop.0+0x4be/0x8a0
[ 526.041993] msi_domain_get_virq+0xcc/0x110
[ 526.092031] pci_msix_write_tph_tag+0x3c/0x100
[ 526.145186] pcie_tph_set_st_entry+0x125/0x1d0
[ 526.198346] bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
[ 526.264015] irq_set_affinity_notifier+0xe0/0x130
[ 526.320291] bnxt_free_irq+0x6e/0x110 [bnxt_en]
[ 526.374507] __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
[ 526.440175] bnxt_close+0x3a/0x100 [bnxt_en]
[ 526.491264] __dev_close_many+0xae/0x220
[ 526.538179] dev_close_many+0xc2/0x1b0
[ 526.583014] netif_close+0x9d/0xd0
[ 526.623693] bnxt_shutdown+0xb1/0xe0 [bnxt_en]
[ 526.676874] pci_device_shutdown+0x35/0x70
[ 526.725871] device_shutdown+0x118/0x1a0
[ 526.772788] kernel_restart+0x3a/0x70
[ 526.816588] __do_sys_reboot+0x150/0x250
[ 526.863504] do_syscall_64+0x84/0x940
[ 526.907300] ? __put_user_8+0xd/0x20
[ 526.950059] ? rseq_ip_fixup+0x90/0x1e0
[ 526.995937] ? task_mm_cid_work+0x1ad/0x220
[ 527.045971] ? __rseq_handle_notify_resume+0x35/0x90
[ 527.105367] ? arch_exit_to_user_mode_prepare.isra.0+0x98/0xb0
[ 527.175166] ? do_syscall_64+0xba/0x940
[ 527.221040] ? do_filp_open+0xd7/0x1a0
[ 527.265882] ? alloc_fd+0xba/0x110
[ 527.306556] ? do_sys_openat2+0xa4/0xf0
[ 527.352434] ? __x64_sys_openat+0x54/0xb0
[ 527.400389] ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
[ 527.469150] ? do_syscall_64+0xba/0x940
[ 527.515023] ? do_user_addr_fault+0x221/0x690
[ 527.567141] ? clear_bhb_loop+0x30/0x80
[ 527.613017] ? clear_bhb_loop+0x30/0x80
[ 527.658895] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 527.719332] RIP: 0033:0x7fc3ec504777
[ 527.762091] RSP: 002b:00007ffecd62c4f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[ 527.852685] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3ec504777
[ 527.938085] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[ 528.023485] RBP: 00007ffecd62c700 R08: 0000000000000000 R09: 00007ffecd62b8e0
[ 528.108878] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffecd62c568
[ 528.194273] R13: 00007ffecd62c548 R14: 00007ffecd62c568 R15: 0000000000000000
[ 528.279672] </TASK>
Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH")
Fixes: 71296eae5887 ("PCI/TPH: Replace the broken MSI-X control word update")
Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
---
drivers/pci/msi/msi.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 6ede55a7c5e6..d686488f4111 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -934,10 +934,12 @@ int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int index, u16 tag)
if (!pdev->msix_enabled)
return -ENXIO;
- guard(msi_descs_lock)(&pdev->dev);
virq = msi_get_virq(&pdev->dev, index);
if (!virq)
return -ENXIO;
+
+ guard(msi_descs_lock)(&pdev->dev);
+
/*
* This is a horrible hack, but short of implementing a PCI
* specific interrupt chip callback and a huge pile of
--
2.41.0.rc2
On Tue, Jul 08 2025 at 22:25, himanshu madhani wrote: > From: Himanshu Madhani <himanshu.madhani@oracle.com> The subject line is misleading because the problem is not in the THP code. It's in the PCI/MSI code implementation of the function which is used by THP. > Fix system hang due to incorrect mutex lock placement. > > following stack trace will be seen on system boot Please trim down stack traces to the bare minimum which is required to illustrate your point. > [ 525.664681] task:systemd-shutdow state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00004002 > [ 525.796878] Call Trace: > [ 525.826116] <TASK> > [ 525.851195] __schedule+0x2d1/0x730 > [ 525.892917] schedule+0x27/0x80 > [ 525.930478] schedule_preempt_disabled+0x15/0x30 > [ 525.985718] __mutex_lock.constprop.0+0x4be/0x8a0 > [ 526.041993] msi_domain_get_virq+0xcc/0x110 > [ 526.092031] pci_msix_write_tph_tag+0x3c/0x100 > [ 526.145186] pcie_tph_set_st_entry+0x125/0x1d0 > [ 526.198346] bnxt_irq_affinity_release+0x35/0x50 [bnxt_en] > [ 526.264015] irq_set_affinity_notifier+0xe0/0x130 > [ 526.320291] bnxt_free_irq+0x6e/0x110 [bnxt_en] > [ 526.374507] __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en] > [ 526.440175] bnxt_close+0x3a/0x100 [bnxt_en] > [ 526.491264] __dev_close_many+0xae/0x220 > [ 526.538179] dev_close_many+0xc2/0x1b0 > [ 526.583014] netif_close+0x9d/0xd0 > [ 526.623693] bnxt_shutdown+0xb1/0xe0 [bnxt_en] > [ 526.676874] pci_device_shutdown+0x35/0x70 > [ 526.725871] device_shutdown+0x118/0x1a0 You trimmed the interesting information that this is a softlockup and kept all the gunk below whihc is completely useless. > [ 526.772788] kernel_restart+0x3a/0x70 > [ 526.816588] __do_sys_reboot+0x150/0x250 > [ 526.863504] do_syscall_64+0x84/0x940 > [ 526.907300] ? __put_user_8+0xd/0x20 > [ 526.950059] ? rseq_ip_fixup+0x90/0x1e0 > [ 526.995937] ? task_mm_cid_work+0x1ad/0x220 > [ 527.045971] ? __rseq_handle_notify_resume+0x35/0x90 > [ 527.105367] ? arch_exit_to_user_mode_prepare.isra.0+0x98/0xb0 > [ 527.175166] ? do_syscall_64+0xba/0x940 > [ 527.221040] ? do_filp_open+0xd7/0x1a0 > [ 527.265882] ? alloc_fd+0xba/0x110 > [ 527.306556] ? do_sys_openat2+0xa4/0xf0 > [ 527.352434] ? __x64_sys_openat+0x54/0xb0 > [ 527.400389] ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0 > [ 527.469150] ? do_syscall_64+0xba/0x940 > [ 527.515023] ? do_user_addr_fault+0x221/0x690 > [ 527.567141] ? clear_bhb_loop+0x30/0x80 > [ 527.613017] ? clear_bhb_loop+0x30/0x80 > [ 527.658895] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [ 527.719332] RIP: 0033:0x7fc3ec504777 > [ 527.762091] RSP: 002b:00007ffecd62c4f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9 > [ 527.852685] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3ec504777 > [ 527.938085] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead > [ 528.023485] RBP: 00007ffecd62c700 R08: 0000000000000000 R09: 00007ffecd62b8e0 > [ 528.108878] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffecd62c568 > [ 528.194273] R13: 00007ffecd62c548 R14: 00007ffecd62c568 R15: 0000000000000000 > [ 528.279672] </TASK> See https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces > Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH") This fixes tag is correct > Fixes: 71296eae5887 ("PCI/TPH: Replace the broken MSI-X control word update") This one not because it's a subsequent problem caused by the above. > Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com> > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com> > Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com> Other than that this looks good. I pick it up tomorrow through the tip irq/urgent branch and fixup the changelog, so no need to resend. Thanks, tglx
Just realized this patch was missing maintainer email address and mailing list for reviews. Bjorn, Apologies. Can you please queue this up in the PCI tree for inclusion in 6.16-rc6. > On Jul 8, 2025, at 15:25, himanshu.madhani@oracle.com wrote: > > From: Himanshu Madhani <himanshu.madhani@oracle.com> > > Fix system hang due to incorrect mutex lock placement. > > following stack trace will be seen on system boot > > [ 525.664681] task:systemd-shutdow state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00004002 > [ 525.796878] Call Trace: > [ 525.826116] <TASK> > [ 525.851195] __schedule+0x2d1/0x730 > [ 525.892917] schedule+0x27/0x80 > [ 525.930478] schedule_preempt_disabled+0x15/0x30 > [ 525.985718] __mutex_lock.constprop.0+0x4be/0x8a0 > [ 526.041993] msi_domain_get_virq+0xcc/0x110 > [ 526.092031] pci_msix_write_tph_tag+0x3c/0x100 > [ 526.145186] pcie_tph_set_st_entry+0x125/0x1d0 > [ 526.198346] bnxt_irq_affinity_release+0x35/0x50 [bnxt_en] > [ 526.264015] irq_set_affinity_notifier+0xe0/0x130 > [ 526.320291] bnxt_free_irq+0x6e/0x110 [bnxt_en] > [ 526.374507] __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en] > [ 526.440175] bnxt_close+0x3a/0x100 [bnxt_en] > [ 526.491264] __dev_close_many+0xae/0x220 > [ 526.538179] dev_close_many+0xc2/0x1b0 > [ 526.583014] netif_close+0x9d/0xd0 > [ 526.623693] bnxt_shutdown+0xb1/0xe0 [bnxt_en] > [ 526.676874] pci_device_shutdown+0x35/0x70 > [ 526.725871] device_shutdown+0x118/0x1a0 > [ 526.772788] kernel_restart+0x3a/0x70 > [ 526.816588] __do_sys_reboot+0x150/0x250 > [ 526.863504] do_syscall_64+0x84/0x940 > [ 526.907300] ? __put_user_8+0xd/0x20 > [ 526.950059] ? rseq_ip_fixup+0x90/0x1e0 > [ 526.995937] ? task_mm_cid_work+0x1ad/0x220 > [ 527.045971] ? __rseq_handle_notify_resume+0x35/0x90 > [ 527.105367] ? arch_exit_to_user_mode_prepare.isra.0+0x98/0xb0 > [ 527.175166] ? do_syscall_64+0xba/0x940 > [ 527.221040] ? do_filp_open+0xd7/0x1a0 > [ 527.265882] ? alloc_fd+0xba/0x110 > [ 527.306556] ? do_sys_openat2+0xa4/0xf0 > [ 527.352434] ? __x64_sys_openat+0x54/0xb0 > [ 527.400389] ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0 > [ 527.469150] ? do_syscall_64+0xba/0x940 > [ 527.515023] ? do_user_addr_fault+0x221/0x690 > [ 527.567141] ? clear_bhb_loop+0x30/0x80 > [ 527.613017] ? clear_bhb_loop+0x30/0x80 > [ 527.658895] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [ 527.719332] RIP: 0033:0x7fc3ec504777 > [ 527.762091] RSP: 002b:00007ffecd62c4f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9 > [ 527.852685] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3ec504777 > [ 527.938085] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead > [ 528.023485] RBP: 00007ffecd62c700 R08: 0000000000000000 R09: 00007ffecd62b8e0 > [ 528.108878] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffecd62c568 > [ 528.194273] R13: 00007ffecd62c548 R14: 00007ffecd62c568 R15: 0000000000000000 > [ 528.279672] </TASK> > > Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH") > Fixes: 71296eae5887 ("PCI/TPH: Replace the broken MSI-X control word update") > Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com> > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com> > Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com> > --- > drivers/pci/msi/msi.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c > index 6ede55a7c5e6..d686488f4111 100644 > --- a/drivers/pci/msi/msi.c > +++ b/drivers/pci/msi/msi.c > @@ -934,10 +934,12 @@ int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int index, u16 tag) > if (!pdev->msix_enabled) > return -ENXIO; > > - guard(msi_descs_lock)(&pdev->dev); > virq = msi_get_virq(&pdev->dev, index); > if (!virq) > return -ENXIO; > + > + guard(msi_descs_lock)(&pdev->dev); > + > /* > * This is a horrible hack, but short of implementing a PCI > * specific interrupt chip callback and a huge pile of > -- > 2.41.0.rc2 > -- Himanshu Madhani Oracle Linux Engineering
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: 68ea85df15d111d82fc474cbe104174791169355
Gitweb: https://git.kernel.org/tip/68ea85df15d111d82fc474cbe104174791169355
Author: Himanshu Madhani <himanshu.madhani@oracle.com>
AuthorDate: Tue, 08 Jul 2025 22:25:30
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 10 Jul 2025 23:41:08 +02:00
PCI/MSI: Prevent recursive locking in pci_msix_write_tph_tag()
pci_msix_write_tph_tag() takes the per device MSI descriptor mutex and then
invokes msi_domain_get_virq(), which takes the same mutex again. That
obviously results in a system hang which is exposed by a softlockup or
lockdep warning.
Move the lock guard after the invocation of msi_domain_get_virq() to fix
this.
[ tglx: Massage changelog by adding a proper explanation and removing the
not really useful stacktrace ]
Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH")
Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Link: https://lore.kernel.org/all/20250708222530.1041477-1-himanshu.madhani@oracle.com
---
drivers/pci/msi/msi.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 6ede55a..d686488 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -934,10 +934,12 @@ int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int index, u16 tag)
if (!pdev->msix_enabled)
return -ENXIO;
- guard(msi_descs_lock)(&pdev->dev);
virq = msi_get_virq(&pdev->dev, index);
if (!virq)
return -ENXIO;
+
+ guard(msi_descs_lock)(&pdev->dev);
+
/*
* This is a horrible hack, but short of implementing a PCI
* specific interrupt chip callback and a huge pile of
© 2016 - 2025 Red Hat, Inc.