[PATCH] PCI/THP: Fix hang due to incorrect guard lock

himanshu.madhani@oracle.com posted 1 patch 3 months ago
drivers/pci/msi/msi.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
[PATCH] PCI/THP: Fix hang due to incorrect guard lock
Posted by himanshu.madhani@oracle.com 3 months ago
From: Himanshu Madhani <himanshu.madhani@oracle.com>

Fix system hang due to incorrect mutex lock placement.

following stack trace will be seen on system boot

[  525.664681] task:systemd-shutdow state:D stack:0     pid:1     tgid:1     ppid:0      task_flags:0x400100 flags:0x00004002
[  525.796878] Call Trace:
[  525.826116]  <TASK>
[  525.851195]  __schedule+0x2d1/0x730
[  525.892917]  schedule+0x27/0x80
[  525.930478]  schedule_preempt_disabled+0x15/0x30
[  525.985718]  __mutex_lock.constprop.0+0x4be/0x8a0
[  526.041993]  msi_domain_get_virq+0xcc/0x110
[  526.092031]  pci_msix_write_tph_tag+0x3c/0x100
[  526.145186]  pcie_tph_set_st_entry+0x125/0x1d0
[  526.198346]  bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
[  526.264015]  irq_set_affinity_notifier+0xe0/0x130
[  526.320291]  bnxt_free_irq+0x6e/0x110 [bnxt_en]
[  526.374507]  __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
[  526.440175]  bnxt_close+0x3a/0x100 [bnxt_en]
[  526.491264]  __dev_close_many+0xae/0x220
[  526.538179]  dev_close_many+0xc2/0x1b0
[  526.583014]  netif_close+0x9d/0xd0
[  526.623693]  bnxt_shutdown+0xb1/0xe0 [bnxt_en]
[  526.676874]  pci_device_shutdown+0x35/0x70
[  526.725871]  device_shutdown+0x118/0x1a0
[  526.772788]  kernel_restart+0x3a/0x70
[  526.816588]  __do_sys_reboot+0x150/0x250
[  526.863504]  do_syscall_64+0x84/0x940
[  526.907300]  ? __put_user_8+0xd/0x20
[  526.950059]  ? rseq_ip_fixup+0x90/0x1e0
[  526.995937]  ? task_mm_cid_work+0x1ad/0x220
[  527.045971]  ? __rseq_handle_notify_resume+0x35/0x90
[  527.105367]  ? arch_exit_to_user_mode_prepare.isra.0+0x98/0xb0
[  527.175166]  ? do_syscall_64+0xba/0x940
[  527.221040]  ? do_filp_open+0xd7/0x1a0
[  527.265882]  ? alloc_fd+0xba/0x110
[  527.306556]  ? do_sys_openat2+0xa4/0xf0
[  527.352434]  ? __x64_sys_openat+0x54/0xb0
[  527.400389]  ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
[  527.469150]  ? do_syscall_64+0xba/0x940
[  527.515023]  ? do_user_addr_fault+0x221/0x690
[  527.567141]  ? clear_bhb_loop+0x30/0x80
[  527.613017]  ? clear_bhb_loop+0x30/0x80
[  527.658895]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  527.719332] RIP: 0033:0x7fc3ec504777
[  527.762091] RSP: 002b:00007ffecd62c4f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[  527.852685] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3ec504777
[  527.938085] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[  528.023485] RBP: 00007ffecd62c700 R08: 0000000000000000 R09: 00007ffecd62b8e0
[  528.108878] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffecd62c568
[  528.194273] R13: 00007ffecd62c548 R14: 00007ffecd62c568 R15: 0000000000000000
[  528.279672]  </TASK>

Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH")
Fixes: 71296eae5887 ("PCI/TPH: Replace the broken MSI-X control word update")
Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
---
 drivers/pci/msi/msi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 6ede55a7c5e6..d686488f4111 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -934,10 +934,12 @@ int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int index, u16 tag)
 	if (!pdev->msix_enabled)
 		return -ENXIO;
 
-	guard(msi_descs_lock)(&pdev->dev);
 	virq = msi_get_virq(&pdev->dev, index);
 	if (!virq)
 		return -ENXIO;
+
+	guard(msi_descs_lock)(&pdev->dev);
+
 	/*
 	 * This is a horrible hack, but short of implementing a PCI
 	 * specific interrupt chip callback and a huge pile of
-- 
2.41.0.rc2
Re: [PATCH] PCI/THP: Fix hang due to incorrect guard lock
Posted by Thomas Gleixner 2 months, 4 weeks ago
On Tue, Jul 08 2025 at 22:25, himanshu madhani wrote:

> From: Himanshu Madhani <himanshu.madhani@oracle.com>

The subject line is misleading because the problem is not in the THP
code. It's in the PCI/MSI code implementation of the function which is
used by THP.

> Fix system hang due to incorrect mutex lock placement.
>
> following stack trace will be seen on system boot

Please trim down stack traces to the bare minimum which is required to
illustrate your point.

> [  525.664681] task:systemd-shutdow state:D stack:0     pid:1     tgid:1     ppid:0      task_flags:0x400100 flags:0x00004002
> [  525.796878] Call Trace:
> [  525.826116]  <TASK>
> [  525.851195]  __schedule+0x2d1/0x730
> [  525.892917]  schedule+0x27/0x80
> [  525.930478]  schedule_preempt_disabled+0x15/0x30
> [  525.985718]  __mutex_lock.constprop.0+0x4be/0x8a0
> [  526.041993]  msi_domain_get_virq+0xcc/0x110
> [  526.092031]  pci_msix_write_tph_tag+0x3c/0x100
> [  526.145186]  pcie_tph_set_st_entry+0x125/0x1d0
> [  526.198346]  bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
> [  526.264015]  irq_set_affinity_notifier+0xe0/0x130
> [  526.320291]  bnxt_free_irq+0x6e/0x110 [bnxt_en]
> [  526.374507]  __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
> [  526.440175]  bnxt_close+0x3a/0x100 [bnxt_en]
> [  526.491264]  __dev_close_many+0xae/0x220
> [  526.538179]  dev_close_many+0xc2/0x1b0
> [  526.583014]  netif_close+0x9d/0xd0
> [  526.623693]  bnxt_shutdown+0xb1/0xe0 [bnxt_en]
> [  526.676874]  pci_device_shutdown+0x35/0x70
> [  526.725871]  device_shutdown+0x118/0x1a0

You trimmed the interesting information that this is a softlockup and
kept all the gunk below whihc is completely useless.

> [  526.772788]  kernel_restart+0x3a/0x70
> [  526.816588]  __do_sys_reboot+0x150/0x250
> [  526.863504]  do_syscall_64+0x84/0x940
> [  526.907300]  ? __put_user_8+0xd/0x20
> [  526.950059]  ? rseq_ip_fixup+0x90/0x1e0
> [  526.995937]  ? task_mm_cid_work+0x1ad/0x220
> [  527.045971]  ? __rseq_handle_notify_resume+0x35/0x90
> [  527.105367]  ? arch_exit_to_user_mode_prepare.isra.0+0x98/0xb0
> [  527.175166]  ? do_syscall_64+0xba/0x940
> [  527.221040]  ? do_filp_open+0xd7/0x1a0
> [  527.265882]  ? alloc_fd+0xba/0x110
> [  527.306556]  ? do_sys_openat2+0xa4/0xf0
> [  527.352434]  ? __x64_sys_openat+0x54/0xb0
> [  527.400389]  ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
> [  527.469150]  ? do_syscall_64+0xba/0x940
> [  527.515023]  ? do_user_addr_fault+0x221/0x690
> [  527.567141]  ? clear_bhb_loop+0x30/0x80
> [  527.613017]  ? clear_bhb_loop+0x30/0x80
> [  527.658895]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  527.719332] RIP: 0033:0x7fc3ec504777
> [  527.762091] RSP: 002b:00007ffecd62c4f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
> [  527.852685] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3ec504777
> [  527.938085] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
> [  528.023485] RBP: 00007ffecd62c700 R08: 0000000000000000 R09: 00007ffecd62b8e0
> [  528.108878] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffecd62c568
> [  528.194273] R13: 00007ffecd62c548 R14: 00007ffecd62c568 R15: 0000000000000000
> [  528.279672]  </TASK>

See https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces

> Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH")

This fixes tag is correct

> Fixes: 71296eae5887 ("PCI/TPH: Replace the broken MSI-X control word update")

This one not because it's a subsequent problem caused by the above.

> Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>

Other than that this looks good.

I pick it up tomorrow through the tip irq/urgent branch and fixup the
changelog, so no need to resend.

Thanks,

        tglx
Re: [PATCH] PCI/THP: Fix hang due to incorrect guard lock
Posted by Himanshu Madhani 2 months, 4 weeks ago
Just realized this patch was missing maintainer email address and mailing list for reviews.  

Bjorn, 

Apologies. Can you please queue this up in the PCI tree for inclusion in 6.16-rc6. 


> On Jul 8, 2025, at 15:25, himanshu.madhani@oracle.com wrote:
> 
> From: Himanshu Madhani <himanshu.madhani@oracle.com>
> 
> Fix system hang due to incorrect mutex lock placement.
> 
> following stack trace will be seen on system boot
> 
> [  525.664681] task:systemd-shutdow state:D stack:0     pid:1     tgid:1     ppid:0      task_flags:0x400100 flags:0x00004002
> [  525.796878] Call Trace:
> [  525.826116]  <TASK>
> [  525.851195]  __schedule+0x2d1/0x730
> [  525.892917]  schedule+0x27/0x80
> [  525.930478]  schedule_preempt_disabled+0x15/0x30
> [  525.985718]  __mutex_lock.constprop.0+0x4be/0x8a0
> [  526.041993]  msi_domain_get_virq+0xcc/0x110
> [  526.092031]  pci_msix_write_tph_tag+0x3c/0x100
> [  526.145186]  pcie_tph_set_st_entry+0x125/0x1d0
> [  526.198346]  bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
> [  526.264015]  irq_set_affinity_notifier+0xe0/0x130
> [  526.320291]  bnxt_free_irq+0x6e/0x110 [bnxt_en]
> [  526.374507]  __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
> [  526.440175]  bnxt_close+0x3a/0x100 [bnxt_en]
> [  526.491264]  __dev_close_many+0xae/0x220
> [  526.538179]  dev_close_many+0xc2/0x1b0
> [  526.583014]  netif_close+0x9d/0xd0
> [  526.623693]  bnxt_shutdown+0xb1/0xe0 [bnxt_en]
> [  526.676874]  pci_device_shutdown+0x35/0x70
> [  526.725871]  device_shutdown+0x118/0x1a0
> [  526.772788]  kernel_restart+0x3a/0x70
> [  526.816588]  __do_sys_reboot+0x150/0x250
> [  526.863504]  do_syscall_64+0x84/0x940
> [  526.907300]  ? __put_user_8+0xd/0x20
> [  526.950059]  ? rseq_ip_fixup+0x90/0x1e0
> [  526.995937]  ? task_mm_cid_work+0x1ad/0x220
> [  527.045971]  ? __rseq_handle_notify_resume+0x35/0x90
> [  527.105367]  ? arch_exit_to_user_mode_prepare.isra.0+0x98/0xb0
> [  527.175166]  ? do_syscall_64+0xba/0x940
> [  527.221040]  ? do_filp_open+0xd7/0x1a0
> [  527.265882]  ? alloc_fd+0xba/0x110
> [  527.306556]  ? do_sys_openat2+0xa4/0xf0
> [  527.352434]  ? __x64_sys_openat+0x54/0xb0
> [  527.400389]  ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
> [  527.469150]  ? do_syscall_64+0xba/0x940
> [  527.515023]  ? do_user_addr_fault+0x221/0x690
> [  527.567141]  ? clear_bhb_loop+0x30/0x80
> [  527.613017]  ? clear_bhb_loop+0x30/0x80
> [  527.658895]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  527.719332] RIP: 0033:0x7fc3ec504777
> [  527.762091] RSP: 002b:00007ffecd62c4f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
> [  527.852685] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3ec504777
> [  527.938085] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
> [  528.023485] RBP: 00007ffecd62c700 R08: 0000000000000000 R09: 00007ffecd62b8e0
> [  528.108878] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffecd62c568
> [  528.194273] R13: 00007ffecd62c548 R14: 00007ffecd62c568 R15: 0000000000000000
> [  528.279672]  </TASK>
> 
> Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH")
> Fixes: 71296eae5887 ("PCI/TPH: Replace the broken MSI-X control word update")
> Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> ---
> drivers/pci/msi/msi.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 6ede55a7c5e6..d686488f4111 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -934,10 +934,12 @@ int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int index, u16 tag)
> if (!pdev->msix_enabled)
> return -ENXIO;
> 
> - guard(msi_descs_lock)(&pdev->dev);
> virq = msi_get_virq(&pdev->dev, index);
> if (!virq)
> return -ENXIO;
> +
> + guard(msi_descs_lock)(&pdev->dev);
> +
> /*
> * This is a horrible hack, but short of implementing a PCI
> * specific interrupt chip callback and a huge pile of
> -- 
> 2.41.0.rc2
> 



-- 
Himanshu Madhani	Oracle Linux Engineering
[tip: irq/urgent] PCI/MSI: Prevent recursive locking in pci_msix_write_tph_tag()
Posted by tip-bot2 for Himanshu Madhani 2 months, 4 weeks ago
The following commit has been merged into the irq/urgent branch of tip:

Commit-ID:     68ea85df15d111d82fc474cbe104174791169355
Gitweb:        https://git.kernel.org/tip/68ea85df15d111d82fc474cbe104174791169355
Author:        Himanshu Madhani <himanshu.madhani@oracle.com>
AuthorDate:    Tue, 08 Jul 2025 22:25:30 
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 10 Jul 2025 23:41:08 +02:00

PCI/MSI: Prevent recursive locking in pci_msix_write_tph_tag()

pci_msix_write_tph_tag() takes the per device MSI descriptor mutex and then
invokes msi_domain_get_virq(), which takes the same mutex again. That
obviously results in a system hang which is exposed by a softlockup or
lockdep warning.

Move the lock guard after the invocation of msi_domain_get_virq() to fix
this.

[ tglx: Massage changelog by adding a proper explanation and removing the
  	not really useful stacktrace ]

Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH")
Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Link: https://lore.kernel.org/all/20250708222530.1041477-1-himanshu.madhani@oracle.com
---
 drivers/pci/msi/msi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 6ede55a..d686488 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -934,10 +934,12 @@ int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int index, u16 tag)
 	if (!pdev->msix_enabled)
 		return -ENXIO;
 
-	guard(msi_descs_lock)(&pdev->dev);
 	virq = msi_get_virq(&pdev->dev, index);
 	if (!virq)
 		return -ENXIO;
+
+	guard(msi_descs_lock)(&pdev->dev);
+
 	/*
 	 * This is a horrible hack, but short of implementing a PCI
 	 * specific interrupt chip callback and a huge pile of