Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
amended PCIe hotplug to not bring down the slot upon Data Link Layer State
Changed events caused by Downstream Port Containment.
However, PCIe hotplug (pciehp) waits up to 4 seconds before assuming that
DPC recovery has failed and disabling the slot. This timeout period is
insufficient for some PCIe devices.
For example, the E810 dual-port network card driver needs to take over
10 seconds to execute its err_detected() callback.
Since this exceeds the maximum wait time allowed for DPC recovery by the
hotplug IRQ threads, a race condition occurs between the hotplug thread and
the dpc_handler() thread.
Signed-off-by: LeoLiu-oc <LeoLiu-oc@zhaoxin.com>
---
drivers/pci/pcie/dpc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fc18349614d7..08b5f275699a 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -121,7 +121,7 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
* but reports indicate that DPC completes within 4 seconds.
*/
wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
- msecs_to_jiffies(4000));
+ msecs_to_jiffies(16000));
return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
}
--
2.43.0
[+cc Lukas, pciehp expert and author of a97396c6eb13]
On Fri, Jan 23, 2026 at 06:40:34PM +0800, LeoLiu-oc wrote:
> Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
> amended PCIe hotplug to not bring down the slot upon Data Link Layer State
> Changed events caused by Downstream Port Containment.
>
> However, PCIe hotplug (pciehp) waits up to 4 seconds before assuming that
> DPC recovery has failed and disabling the slot. This timeout period is
> insufficient for some PCIe devices.
> For example, the E810 dual-port network card driver needs to take over
> 10 seconds to execute its err_detected() callback.
> Since this exceeds the maximum wait time allowed for DPC recovery by the
> hotplug IRQ threads, a race condition occurs between the hotplug thread and
> the dpc_handler() thread.
Add blank lines between paragraphs.
Include the name of the E810 driver so we can easily find the
.err_detected() callback in question. Actually, including the *name*
of that callback would be a very direct way of doing this :)
I guess the problem this fixes is that there was a PCIe error that
triggered DPC, and the E810 .err_detected() works but takes longer
than expected, which results in pciehp disabling the slot when it
doesn't need to? So the user basically sees a dead E810 device?
It seems unfortunate that we have this dependency on the time allowed
for .err_detected() to execute. It's nice if adding arbitrary delay
doesn't break things, but maybe we can't always achieve that.
I see that pci_dpc_recovered() is called from pciehp_ist(). Are we
prepared for long delays there?
> Signed-off-by: LeoLiu-oc <LeoLiu-oc@zhaoxin.com>
> ---
> drivers/pci/pcie/dpc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index fc18349614d7..08b5f275699a 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -121,7 +121,7 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
> * but reports indicate that DPC completes within 4 seconds.
> */
> wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
> - msecs_to_jiffies(4000));
> + msecs_to_jiffies(16000));
It looks like this breaks the connection between the "completes within
4 seconds" comment and the 4000ms wait_event timeout.
> return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
> }
> --
> 2.43.0
>
On 2026/1/24 4:21, Bjorn Helgaas wrote:
> [+cc Lukas, pciehp expert and author of a97396c6eb13]
>
> On Fri, Jan 23, 2026 at 06:40:34PM +0800, LeoLiu-oc wrote:
>> Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
>> amended PCIe hotplug to not bring down the slot upon Data Link Layer State
>> Changed events caused by Downstream Port Containment.
>>
>> However, PCIe hotplug (pciehp) waits up to 4 seconds before assuming that
>> DPC recovery has failed and disabling the slot. This timeout period is
>> insufficient for some PCIe devices.
>> For example, the E810 dual-port network card driver needs to take over
>> 10 seconds to execute its err_detected() callback.
>> Since this exceeds the maximum wait time allowed for DPC recovery by the
>> hotplug IRQ threads, a race condition occurs between the hotplug thread and
>> the dpc_handler() thread.
>
> Add blank lines between paragraphs.
>
> Include the name of the E810 driver so we can easily find the
> .err_detected() callback in question. Actually, including the *name*
> of that callback would be a very direct way of doing this :)
>
> I guess the problem this fixes is that there was a PCIe error that
> triggered DPC, and the E810 .err_detected() works but takes longer
> than expected, which results in pciehp disabling the slot when it
> doesn't need to? So the user basically sees a dead E810 device?
>
Yes, this patch is intended to solve that problem.
> It seems unfortunate that we have this dependency on the time allowed
> for .err_detected() to execute. It's nice if adding arbitrary delay
> doesn't break things, but maybe we can't always achieve that.
>
I think this is a feasible solution. For some PCIe devices, whose
.error_detected() completes within 4 seconds, this change has no impact;
for a few PCIe devices, it might increase the execution time of
pciehp_ist(). Without this patch, such PCIe devices may become unusable
and could even cause more serious errors, such as a kernel panic. For
example, the following log is encountered in hardware testing:
list_del corruption, ffff8881418b79e8->next is LIST_POISON1
(dead000000000100)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:56!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
...
Kernel panic - not syncing: Fatal exception
> I see that pci_dpc_recovered() is called from pciehp_ist(). Are we
> prepared for long delays there?
>
This patch may affect the hotplug IRQ thread's execution time when the
events are triggered by DPC, but it has no effect on normal hotplug
operation, e.g. Attention Button Pressed or Power Fault Detected. If you
have better modification suggestions, I will apply them in the next version.
>> Signed-off-by: LeoLiu-oc <LeoLiu-oc@zhaoxin.com>
>> ---
>> drivers/pci/pcie/dpc.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index fc18349614d7..08b5f275699a 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -121,7 +121,7 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
>> * but reports indicate that DPC completes within 4 seconds.
>> */
>> wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
>> - msecs_to_jiffies(4000));
>> + msecs_to_jiffies(16000));
>
> It looks like this breaks the connection between the "completes within
> 4 seconds" comment and the 4000ms wait_event timeout.
>
Thanks for your suggestion, I will change it in the next version.

Yours sincerely,
LeoLiu-oc
>> return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
>> }
>> --
>> 2.43.0
>>
On Wed, Jan 28, 2026 at 06:07:51PM +0800, LeoLiu-oc wrote:
> Without this patch, PCIe devices may not be usable and could even cause
> more serious errors, such as a kernel panic. For example, the following
> log is encountered in hardware testing:
>
> list_del corruption, ffff8881418b79e8->next is LIST_POISON1
> (dead000000000100)
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:56!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> ...
> Kernel panic - not syncing: Fatal exception

This should not happen.  Which kernel version are you using?

There used to be a use-after-free on concurrent DPC and hot-removal.
It was fixed by 11a1f4bc4736, which went into v6.11 and was subsequently
ported all the way back to v5.10-stable.

I suspect you may be using a kernel which lacks that fix.

Thanks,

Lukas
On 2026/1/30 19:59, Lukas Wunner wrote:
>
> On Wed, Jan 28, 2026 at 06:07:51PM +0800, LeoLiu-oc wrote:
>> Without this patch, PCIE devices may not be usable and could even cause
>> more serious errors, such as a kernel panic. For example, the following
>> log is encountered in hardware testing:
>>
>> list_del corruption, ffff8881418b79e8->next is LIST_POISON1
>> (dead000000000100)
>> ------------[ cut here ]------------
>> kernel BUG at lib/list_debug.c:56!
>> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>> ...
>> Kernel panic - not syncing: Fatal exception
>
> This should not happen. Which kernel version are you using?
The kernel version I am using is 6.18.6, and the fix you mention
(11a1f4bc4736) is already included there.
The complete log of the kernel panic is as follows:
[ 100.304077][ T843] list_del corruption, ffff8881418b79e8->next is
LIST_POISON1 (dead000000000100)
[ 100.312989][ T843] ------------[ cut here ]------------
[ 100.318268][ T843] kernel BUG at lib/list_debug.c:56!
[ 100.323380][ T843] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 100.329250][ T843] CPU: 7 PID: 843 Comm: irq/27-pciehp Tainted: P
W OE ------- ---- 6.6.0-32.7.v2505.ky11.x86_64 #1
[ 100.340793][ T843] Source Version:
71d5b964051132b7772acd935972fca11462bbfe
[ 100.359228][ T843] RIP: 0010:__list_del_entry_valid_or_report+0x7f/0xc0
[ 100.365877][ T843] Code: 66 4b a6 e8 c3 43 a9 ff 0f 0b 48 89 fe 48
c7 c7 10 67 4b a6 e8 b2 43 a9 ff 0f 0b 48 89 fe 48 c7 c7 40 67 4b a6 e8
a1 43 a9 ff <0f> 0b 48 89 fe 48 89 ca 48 c7 c7 78 67 4b a6 e8 8d 43 a9
ff 0f 0b
[ 100.385158][ T843] RSP: 0018:ffffc9000f70fc08 EFLAGS: 00010246
[ 100.391024][ T843] RAX: 000000000000004e RBX: ffff8881418b79e8 RCX:
0000000000000000
[ 100.398781][ T843] RDX: 0000000000000000 RSI: ffff8897df5a32c0 RDI:
ffff8897df5a32c0
[ 100.406538][ T843] RBP: ffff8881257f9608 R08: 0000000000000000 R09:
0000000000000003
[ 100.414294][ T843] R10: ffffc9000f70fa90 R11: ffffffffa6fee508 R12:
0000000000000000
[ 100.422050][ T843] R13: ffff8881257f9608 R14: ffff888116507c28 R15:
ffff888116507c28
[ 100.429807][ T843] FS: 0000000000000000(0000)
GS:ffff8897df580000(0000) knlGS:0000000000000000
[ 100.438511][ T843] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 100.444891][ T843] CR2: 00007f9563bac1c0 CR3: 0000000c4be26004 CR4:
0000000000570ee0
[ 100.452647][ T843] PKRU: 55555554
[ 100.456017][ T843] Call Trace:
[ 100.459129][ T843] <TASK>
[ 100.461898][ T843] ice_flow_rem_entry_sync.constprop.0+0x1c/0x90 [ice]
[ 100.468663][ T843] ice_flow_rem_entry+0x3d/0x60 [ice]
[ 100.473925][ T843]
ice_fdir_erase_flow_from_hw.constprop.0+0x9b/0x100 [ice]
[ 100.481078][ T843] ice_fdir_rem_flow.constprop.0+0x32/0xb0 [ice]
[ 100.487284][ T843] ice_vsi_manage_fdir+0x7b/0xb0 [ice]
[ 100.492629][ T843] ice_deinit_features.part.0+0x46/0xc0 [ice]
[ 100.498571][ T843] ice_remove+0xcf/0x220 [ice]
[ 100.503222][ T843] pci_device_remove+0x3f/0xb0
[ 100.507798][ T843] device_release_driver_internal+0x19d/0x220
[ 100.513667][ T843] pci_stop_bus_device+0x6c/0x90
[ 100.518417][ T843] pci_stop_and_remove_bus_device+0x12/0x20
[ 100.524110][ T843] pciehp_unconfigure_device+0x9f/0x160
[ 100.529463][ T843] pciehp_disable_slot+0x69/0x130
[ 100.534296][ T843] pciehp_handle_presence_or_link_change+0xfc/0x210
[ 100.540678][ T843] pciehp_ist+0x204/0x230
[ 100.544824][ T843] ? __pfx_irq_thread_fn+0x10/0x10
[ 100.549747][ T843] irq_thread_fn+0x20/0x60
[ 100.553978][ T843] irq_thread+0xfb/0x1c0
[ 100.558038][ T843] ? __pfx_irq_thread_dtor+0x10/0x10
[ 100.563130][ T843] ? __pfx_irq_thread+0x10/0x10
[ 100.567791][ T843] kthread+0xe5/0x120
[ 100.571594][ T843] ? __pfx_kthread+0x10/0x10
[ 100.575997][ T843] ret_from_fork+0x17a/0x1a0
[ 100.580403][ T843] ? __pfx_kthread+0x10/0x10
[ 100.584805][ T843] ret_from_fork_asm+0x1a/0x30
[ 100.589384][ T843] </TASK>
[ 100.592237][ T843] Modules linked in: zxmem(OE) einj amdgpu amdxcp
gpu_sched drm_exec drm_buddy nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 zhaoxin_cputemp
nf_defrag_ipv4 zhaoxin_rng snd_hda_codec_hdmi radeon rfkill
snd_hda_intel snd_intel_dspcfg irdma i2c_algo_bit snd_intel_sdw_acpi
ip_set i40e drm_suballoc_helper nf_tables drm_ttm_helper pcicfg(POE)
snd_hda_codec ib_uverbs sunrpc ttm ib_core snd_hda_core
drm_display_helper snd_hwdep kvm_intel snd_pcm cec vfat fat
drm_kms_helper snd_timer kvm video ice snd psmouse soundcore wmi
acpi_cpufreq pcspkr i2c_zhaoxin sg sch_fq_codel drm fuse backlight
nfnetlink xfs sd_mod t10_pi sm2_zhaoxin_gmi crct10dif_pclmul
crc32_pclmul ahci crc32c_intel libahci r8169 ghash_clmulni_intel libata
sha512_ssse3 serio_raw realtek dm_mirror dm_region_hash dm_log
dm_multipath dm_mod i2c_dev autofs4
[ 100.674508][ T843] ---[ end trace 0000000000000000 ]---
[ 100.709547][ T843] RIP: 0010:__list_del_entry_valid_or_report+0x7f/0xc0
[ 100.716197][ T843] Code: 66 4b a6 e8 c3 43 a9 ff 0f 0b 48 89 fe 48
c7 c7 10 67 4b a6 e8 b2 43 a9 ff 0f 0b 48 89 fe 48 c7 c7 40 67 4b a6 e8
a1 43 a9 ff <0f> 0b 48 89 fe 48 89 ca 48 c7 c7 78 67 4b a6 e8 8d 43 a9
ff 0f 0b
[ 100.735491][ T843] RSP: 0018:ffffc9000f70fc08 EFLAGS: 00010246
[ 100.741367][ T843] RAX: 000000000000004e RBX: ffff8881418b79e8 RCX:
0000000000000000
[ 100.749137][ T843] RDX: 0000000000000000 RSI: ffff8897df5a32c0 RDI:
ffff8897df5a32c0
[ 100.756909][ T843] RBP: ffff8881257f9608 R08: 0000000000000000 R09:
0000000000000003
[ 100.764678][ T843] R10: ffffc9000f70fa90 R11: ffffffffa6fee508 R12:
0000000000000000
[ 100.772448][ T843] R13: ffff8881257f9608 R14: ffff888116507c28 R15:
ffff888116507c28
[ 100.780218][ T843] FS: 0000000000000000(0000)
GS:ffff8897df580000(0000) knlGS:0000000000000000
[ 100.788934][ T843] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 100.795329][ T843] CR2: 00007f9563bac1c0 CR3: 0000000c4be26004 CR4:
0000000000570ee0
[ 100.803099][ T843] PKRU: 55555554
[ 100.806483][ T843] Kernel panic - not syncing: Fatal exception
[ 100.812794][ T843] Kernel Offset: disabled
[ 100.821613][ T843] pstore: backend (erst) writing error (-28)
[ 100.827481][ T843] ---[ end Kernel panic - not syncing: Fatal
exception ]---
The reason for this kernel panic is that the ice network card driver
executed ice_pci_err_detected() for longer than the maximum waiting time
allowed by pciehp. After that, pciehp_ist() goes on to execute the ice
driver's ice_remove() path. As a result, ice_pci_err_detected() has
already deleted the list entries, while ice_remove() is still attempting
to delete entries that no longer exist.
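To illustrate the failure mode in isolation (a minimal sketch only, not
actual ice driver code; the structure and function names below are made
up), the splat corresponds to the same list entry being deleted twice by
two different code paths:

  #include <linux/list.h>

  struct demo_entry {
  	struct list_head node;
  };

  static LIST_HEAD(demo_list);

  static void demo_setup(struct demo_entry *e)
  {
  	list_add(&e->node, &demo_list);		/* entry is on the list */
  }

  /* First deletion, e.g. from the error-handling path. */
  static void demo_error_path_teardown(struct demo_entry *e)
  {
  	list_del(&e->node);	/* node now holds LIST_POISON1/LIST_POISON2 */
  }

  /*
   * Second deletion, e.g. from the .remove() path racing with the error
   * path.  With CONFIG_DEBUG_LIST this trips the list_del sanity check in
   * lib/list_debug.c and produces exactly the
   * "list_del corruption ... LIST_POISON1" BUG shown above.
   */
  static void demo_remove_path_teardown(struct demo_entry *e)
  {
  	list_del(&e->node);
  }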
> There used to be a use-after-free on concurrent DPC and hot-removal.
> It was fixed by 11a1f4bc4736, which went into v6.11 and was subsequently
> ported all the way back to v5.10-stable.
>
> I suspect you may be using a kernel which lacks that fix.
>
From the above analysis, it is clear that this is not the same issue.
Yours sincerely,
LeoLiu-oc
> Thanks,
>
> Lukas
[cc += Tony, Przemek (ice driver maintainers), start of thread is here:
https://lore.kernel.org/all/20260123104034.429060-1-LeoLiu-oc@zhaoxin.com/
]

On Mon, Feb 02, 2026 at 02:00:55PM +0800, LeoLiu-oc wrote:
> The kernel version I am using is 6.18.6.
[...]
> The complete log of the kernel panic is as follows:
>
> [  100.304077][ T843] list_del corruption, ffff8881418b79e8->next is LIST_POISON1 (dead000000000100)
> [...]
> [  100.827481][ T843] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> The reason for this kernel panic is that the ice network card driver
> executed the ice_pci_err_detected() for a longer time than the maximum
> waiting time allowed by pciehp. After that, the pciehp_ist() will
> execute the ice network card driver's ice_remove() process. This results
> in the ice_pci_err_detected() having already deleted the list, while the
> ice_remove() is still attempting to delete a list that no longer exists.

This is a bug in the ice driver, not in the pciehp or dpc driver.
As such, it is not a good argument to support the extension of the
timeout.  I'm not against extending the timeout, but the argument
that it's necessary to avoid occurrence of a bug is not a good one.

You should first try to unbind the ice driver at runtime to see if
there is a general problem in the unbind code path:

  echo abcd:ef:gh.i > /sys/bus/pci/drivers/shpchp/unbind

Replace abcd:ef:gh.i with the domain/bus/device/function of the Ethernet
card.  The dmesg excerpt you've provided unfortunately does not betray
the card's address.

Then try to rebind the driver via the "bind" sysfs attribute.

If this works, the next thing to debug is whether the driver has a
problem with surprise removal.  I'm not fully convinced that the
crash you're seeing is caused by concurrent execution of
ice_pci_err_detected() and ice_remove().  When pciehp unbinds the
driver during DPC recovery, the device is likely inaccessible.
It's possible that ice_remove() behaves differently for an
inaccessible device and that may cause the crash instead of the
concurrent execution of ice_pci_err_detected().

It would also be good to understand why DPC recovery of the Ethernet
card takes this long.  Does it take a long time to come out of reset?
Could the ice driver be changed to allow for faster recovery?

Thanks,

Lukas
On 2026/2/2 17:02, Lukas Wunner wrote:
> On Mon, Feb 02, 2026 at 02:00:55PM +0800, LeoLiu-oc wrote:
>> The complete log of the kernel panic is as follows:
>> [...]
>> The reason for this kernel panic is that the ice network card driver
>> executed the ice_pci_err_detected() for a longer time than the maximum
>> waiting time allowed by pciehp. After that, the pciehp_ist() will
>> execute the ice network card driver's ice_remove() process. This results
>> in the ice_pci_err_detected() having already deleted the list, while the
>> ice_remove() is still attempting to delete a list that no longer exists.
>
> This is a bug in the ice driver, not in the pciehp or dpc driver.
> As such, it is not a good argument to support the extension of the
> timeout. I'm not against extending the timeout, but the argument
> that it's necessary to avoid occurrence of a bug is not a good one.
> [...]
> If this works, the next thing to debug is whether the driver has a
> problem with surprise removal. I'm not fully convinced that the
> crash you're seeing is caused by concurrent execution of
> ice_pci_err_detected() and ice_remove(). When pciehp unbinds the
> driver during DPC recovery, the device is likely inaccessible.
> It's possible that ice_remove() behaves differently for an
> inaccessible device and that may cause the crash instead of the
> concurrent execution of ice_pci_err_detected().
>
The fundamental cause of this problem lies in the fact that the network
driver took longer to execute ice_pci_err_detected() than the maximum
time (4 seconds) that pciehp_ist() allows for DPC recovery. This forced
the execution of pciehp_disable_slot(), which should not have run, while
pcie_do_recovery() continued to execute. The resulting race between
pciehp_disable_slot() and pcie_do_recovery() leaves the device
unavailable and can lead to kernel crashes.

> It would also be good to understand why DPC recovery of the Ethernet
> card takes this long. Does it take a long time to come out of reset?
> Could the ice driver be changed to allow for faster recovery?
>
Based on the current situation, the execution of ice_pci_err_detected()
in the ice network card driver takes a very long time, which is
intolerable for the synchronization protocol between the PCIe hotplug
driver and the DPC recovery.

Yours sincerely,
LeoLiu-oc

> Thanks,
>
> Lukas
On 2026/2/4 10:10, LeoLiu-oc wrote:
> On 2026/2/2 17:02, Lukas Wunner wrote:
>> [...]
>> You should first try to unbind the ice driver at runtime to see if
>> there is a general problem in the unbind code path:
>>
>>   echo abcd:ef:gh.i > /sys/bus/pci/drivers/shpchp/unbind
>>
>> Replace abcd:ef:gh.i with the domain/bus/device/function of the Ethernet
>> card. The dmesg excerpt you've provided unfortunately does not betray
>> the card's address.
>>
>> Then try to rebind the driver via the "bind" sysfs attribute.
>>
Sorry, I didn't mean to ignore your question; these issues are not the
cause of the kernel panic. I have previously run a test where I first
unbound the ice network card driver and then bound it again. There was
no problem with that.

>> If this works, the next thing to debug is whether the driver has a
>> problem with surprise removal. I'm not fully convinced that the
>> crash you're seeing is caused by concurrent execution of
>> ice_pci_err_detected() and ice_remove(). When pciehp unbinds the
>> driver during DPC recovery, the device is likely inaccessible.
>> It's possible that ice_remove() behaves differently for an
>> inaccessible device and that may cause the crash instead of the
>> concurrent execution of ice_pci_err_detected().
>>
I was able to turn off the power supply of the slot where the ice
network card is located and then enable it again through the sysfs
interface, without any issues. For example:

  echo 0 > /sys/bus/pci/slots/[slot number]/power
  echo 1 > /sys/bus/pci/slots/[slot number]/power

It is also fine to perform DPC recovery separately for the slot where
the ice network card is located.

When the slot where the ice network card is located has both DPC and
hotplug enabled simultaneously, the DPC recovery test results in issues
such as unavailability of the device and kernel panic.

I had previously confirmed, by inspecting the list-deletion code with
the core dump method, that the cause of the kernel panic is exactly as
I described before: the ice network card driver executed
ice_pci_err_detected() for longer than the maximum waiting time allowed
by pciehp. After that, pciehp_ist() executes the ice driver's
ice_remove() path, so ice_pci_err_detected() has already deleted the
list entries while ice_remove() is still attempting to delete entries
that no longer exist.

> The fundamental cause of this problem lies in the fact that the network
> driver took longer to execute ice_pci_err_detected() than the maximum
> time (4 seconds) that pciehp_ist() allows for DPC recovery.
> [...]
>
To add some information on this question: this might be a problem with
a whole class of PCIe devices, rather than just an issue with the ice
network card driver. Therefore, we should address the issue at the PCIe
driver architecture level to ensure that other PCIe devices do not
encounter such problems.

>> It would also be good to understand why DPC recovery of the Ethernet
>> card takes this long. Does it take a long time to come out of reset?
>> Could the ice driver be changed to allow for faster recovery?
>>
> Based on the current situation, the execution of ice_pci_err_detected()
> in the ice network card driver takes a very long time, which is
> intolerable for the synchronization protocol between the PCIe hotplug
> driver and the DPC recovery.
>
Based on the previous debugging results, the long execution time of
ice_pci_err_detected() in the ice network card driver is mainly
influenced by the irdma driver associated with the ice network card.

> Yours sincerely,
> LeoLiu-oc
On 2/2/26 10:02, Lukas Wunner wrote:
> [cc += Tony, Przemek (ice driver maintainers), start of thread is here:
> https://lore.kernel.org/all/20260123104034.429060-1-LeoLiu-oc@zhaoxin.com/
> ]
>
Thank you for the report, I've asked people working on a similar issue
to take a look here as well.

> [...]
On Wed, Jan 28, 2026 at 06:07:51PM +0800, LeoLiu-oc wrote:
> On 2026/1/24 4:21, Bjorn Helgaas wrote:
> > On Fri, Jan 23, 2026 at 06:40:34PM +0800, LeoLiu-oc wrote:
> >> Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
> >> amended PCIe hotplug to not bring down the slot upon Data Link Layer State
> >> Changed events caused by Downstream Port Containment.
> >>
> >> However, PCIe hotplug (pciehp) waits up to 4 seconds before assuming that
> >> DPC recovery has failed and disabling the slot. This timeout period is
> >> insufficient for some PCIe devices.
> >> For example, the E810 dual-port network card driver needs to take over
> >> 10 seconds to execute its err_detected() callback.
> >> Since this exceeds the maximum wait time allowed for DPC recovery by the
> >> hotplug IRQ threads, a race condition occurs between the hotplug thread and
> >> the dpc_handler() thread.
> > Include the name of the E810 driver so we can easily find the
> > .err_detected() callback in question. Actually, including the *name*
> > of that callback would be a very direct way of doing this :)
AFAICS there is no ".err_detected()" callback. I assume you mean the
".error_detected()" callback in struct pci_error_handlers. Sorry to
be pedantic, but it makes things a lot harder to review if we don't
refer to the actual names in the code.
And my guess is that E810 means the Intel E810 NIC, probably claimed
by the "ice" driver, which would mean ice_pci_err_detected() is the
callback in question?
> > I guess the problem this fixes is that there was a PCIe error that
> > triggered DPC, and the E810 .err_detected() works but takes longer
> > than expected, which results in pciehp disabling the slot when it
> > doesn't need to? So the user basically sees a dead E810 device?
>
> Yes, this patch is to solve this problem.
>
> > It seems unfortunate that we have this dependency on the time allowed
> > for .err_detected() to execute. It's nice if adding arbitrary delay
> > doesn't break things, but maybe we can't always achieve that.
> >
> I think this is a feasible solution. For some PCIE devices, executing
> the .err_detect() within 4 seconds will not have any impact, for a few
> PCIE devices, it might increase the execution time of pciehp_ist().
> Without this patch, PCIE devices may not be usable and could even cause
> more serious errors, such as a kernel panic. For example, the following
> log is encountered in hardware testing:
>
> list_del corruption, ffff8881418b79e8->next is LIST_POISON1
> (dead000000000100)
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:56!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> ...
> Kernel panic - not syncing: Fatal exception
This is an interesting panic and looks like it might have been hard to
debug.
Do you have any idea what exactly caused this and how it's related to
the timeout in pci_dpc_recovered()? Is there a race where pciehp
disables the slot and removes the driver, but eventually
ice_pci_err_detected() completes and we're running some ice driver
code while it's being removed or something?
Simply increasing the timeout doesn't feel like a very robust way of
solving the problem. What happens when some other device needs 17
seconds?
But if increasing the timeout is the best we can do, maybe a warning
message in pci_dpc_recovered() when we time out would at least be a
hint that we might be heading for trouble?
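I.e., something along these lines (completely untested, the message text
is made up, just to illustrate the idea):

  	if (!wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
  				msecs_to_jiffies(4000)))
  		pci_warn(pdev, "DPC recovery did not complete within 4s\n");

  	return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);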
> > I see that pci_dpc_recovered() is called from pciehp_ist(). Are we
> > prepared for long delays there?
>
> This patch may affect the hotplug IRQ threads execution time triggered
> by DPC, but it has no effect for normal HotPlug operation, e.g.
> Attention Button Pressed or Power Fault Detected. If you have better
> modification suggestions, I will update to the next version.
> >> +++ b/drivers/pci/pcie/dpc.c
> >> @@ -121,7 +121,7 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
> >> * but reports indicate that DPC completes within 4 seconds.
> >> */
> >> wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
> >> - msecs_to_jiffies(4000));
> >> + msecs_to_jiffies(16000));
On 2026/1/29 3:48, Bjorn Helgaas wrote:
> On Wed, Jan 28, 2026 at 06:07:51PM +0800, LeoLiu-oc wrote:
>> On 2026/1/24 4:21, Bjorn Helgaas wrote:
>>> On Fri, Jan 23, 2026 at 06:40:34PM +0800, LeoLiu-oc wrote:
>>>> Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
>>>> amended PCIe hotplug to not bring down the slot upon Data Link Layer State
>>>> Changed events caused by Downstream Port Containment.
>>>>
>>>> However, PCIe hotplug (pciehp) waits up to 4 seconds before assuming that
>>>> DPC recovery has failed and disabling the slot. This timeout period is
>>>> insufficient for some PCIe devices.
>>>> For example, the E810 dual-port network card driver needs to take over
>>>> 10 seconds to execute its err_detected() callback.
>>>> Since this exceeds the maximum wait time allowed for DPC recovery by the
>>>> hotplug IRQ threads, a race condition occurs between the hotplug thread and
>>>> the dpc_handler() thread.
>
>>> Include the name of the E810 driver so we can easily find the
>>> .err_detected() callback in question. Actually, including the *name*
>>> of that callback would be a very direct way of doing this :)
>
OK, that is good advice; I'll follow your suggestion.
> AFAICS there is no ".err_detected()" callback. I assume you mean the
> ".error_detected()" callback in struct pci_error_handlers. Sorry to
> be pedantic, but it makes things a lot harder to review if we don't
> refer to the actual names in the code.
>
Yes, the .err_detected() described above refers to the
ice_pci_err_detected() function of the ice network driver. I will
correct this in the commit message of the next version.
> And my guess is that E810 means the Intel E810 NIC, probably claimed
> by the "ice" driver, which would mean ice_pci_err_detected() is the
> callback in question?
>
Yes, the "E810" device driver described above refers to the "ice" driver.
Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
amended PCIe hotplug to not bring down the slot upon Data Link Layer State
Changed events caused by Downstream Port Containment.

Commit c3be50f7547c ("PCI: pciehp: Ignore Presence Detect Changed caused
by DPC") sought to ignore Presence Detect Changed events occurring as a
side effect of Downstream Port Containment.

These commits wait for recovery from DPC and then clear events which
occurred in the meantime.

However, pciehp_ist() waits up to 4 seconds before assuming that DPC
recovery has failed and disabling the slot. This timeout period is
insufficient for some PCIe devices. For example, the ice network card
driver's ice_pci_err_detected() callback exceeds the maximum waiting
time for DPC recovery, causing pciehp_disable_slot() to be executed even
though it is not needed. From the user's point of view, the ice network
card may become unusable, and more serious errors such as a kernel panic
can follow. The kernel panic is caused by a race between
pciehp_disable_slot() and pcie_do_recovery().

For example, the following log is encountered in hardware testing:
[ 100.304077][ T843] list_del corruption, ffff8881418b79e8->next is
LIST_POISON1 (dead000000000100)
[ 100.312989][ T843] ------------[ cut here ]------------
[ 100.318268][ T843] kernel BUG at lib/list_debug.c:56!
[ 100.323380][ T843] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 100.329250][ T843] CPU: 7 PID: 843 Comm: irq/27-pciehp Tainted: P
W OE ------- ---- 6.6.0-32.7.v2505.ky11.x86_64 #1
[ 100.340793][ T843] Source Version:
71d5b964051132b7772acd935972fca11462bbfe
[ 100.359228][ T843] RIP: 0010:__list_del_entry_valid_or_report+0x7f/0xc0
[ 100.365877][ T843] Code: 66 4b a6 e8 c3 43 a9 ff 0f 0b 48 89 fe 48
c7 c7 10 67 4b a6 e8 b2 43 a9 ff 0f 0b 48 89 fe 48 c7 c7 40 67 4b a6 e8
a1 43 a9 ff <0f> 0b 48 89 fe 48 89 ca 48 c7 c7 78 67 4b a6 e8 8d 43 a9
ff 0f 0b
[ 100.385158][ T843] RSP: 0018:ffffc9000f70fc08 EFLAGS: 00010246
[ 100.391024][ T843] RAX: 000000000000004e RBX: ffff8881418b79e8 RCX:
0000000000000000
[ 100.398781][ T843] RDX: 0000000000000000 RSI: ffff8897df5a32c0 RDI:
ffff8897df5a32c0
[ 100.406538][ T843] RBP: ffff8881257f9608 R08: 0000000000000000 R09:
0000000000000003
[ 100.414294][ T843] R10: ffffc9000f70fa90 R11: ffffffffa6fee508 R12:
0000000000000000
[ 100.422050][ T843] R13: ffff8881257f9608 R14: ffff888116507c28 R15:
ffff888116507c28
[ 100.429807][ T843] FS: 0000000000000000(0000)
GS:ffff8897df580000(0000) knlGS:0000000000000000
[ 100.438511][ T843] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 100.444891][ T843] CR2: 00007f9563bac1c0 CR3: 0000000c4be26004 CR4:
0000000000570ee0
[ 100.452647][ T843] PKRU: 55555554
[ 100.456017][ T843] Call Trace:
[ 100.459129][ T843] <TASK>
[ 100.461898][ T843] ice_flow_rem_entry_sync.constprop.0+0x1c/0x90 [ice]
[ 100.468663][ T843] ice_flow_rem_entry+0x3d/0x60 [ice]
[ 100.473925][ T843]
ice_fdir_erase_flow_from_hw.constprop.0+0x9b/0x100 [ice]
[ 100.481078][ T843] ice_fdir_rem_flow.constprop.0+0x32/0xb0 [ice]
[ 100.487284][ T843] ice_vsi_manage_fdir+0x7b/0xb0 [ice]
[ 100.492629][ T843] ice_deinit_features.part.0+0x46/0xc0 [ice]
[ 100.498571][ T843] ice_remove+0xcf/0x220 [ice]
[ 100.503222][ T843] pci_device_remove+0x3f/0xb0
[ 100.507798][ T843] device_release_driver_internal+0x19d/0x220
[ 100.513667][ T843] pci_stop_bus_device+0x6c/0x90
[ 100.518417][ T843] pci_stop_and_remove_bus_device+0x12/0x20
[ 100.524110][ T843] pciehp_unconfigure_device+0x9f/0x160
[ 100.529463][ T843] pciehp_disable_slot+0x69/0x130
[ 100.534296][ T843] pciehp_handle_presence_or_link_change+0xfc/0x210
[ 100.540678][ T843] pciehp_ist+0x204/0x230
[ 100.544824][ T843] ? __pfx_irq_thread_fn+0x10/0x10
[ 100.549747][ T843] irq_thread_fn+0x20/0x60
[ 100.553978][ T843] irq_thread+0xfb/0x1c0
[ 100.558038][ T843] ? __pfx_irq_thread_dtor+0x10/0x10
[ 100.563130][ T843] ? __pfx_irq_thread+0x10/0x10
[ 100.567791][ T843] kthread+0xe5/0x120
[ 100.571594][ T843] ? __pfx_kthread+0x10/0x10
[ 100.575997][ T843] ret_from_fork+0x17a/0x1a0
[ 100.580403][ T843] ? __pfx_kthread+0x10/0x10
[ 100.584805][ T843] ret_from_fork_asm+0x1a/0x30
[ 100.589384][ T843] </TASK>
[ 100.592237][ T843] Modules linked in: zxmem(OE) einj amdgpu amdxcp
gpu_sched drm_exec drm_buddy nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 zhaoxin_cputemp
nf_defrag_ipv4 zhaoxin_rng snd_hda_codec_hdmi radeon rfkill
snd_hda_intel snd_intel_dspcfg irdma i2c_algo_bit snd_intel_sdw_acpi
ip_set i40e drm_suballoc_helper nf_tables drm_ttm_helper pcicfg(POE)
snd_hda_codec ib_uverbs sunrpc ttm ib_core snd_hda_core
drm_display_helper snd_hwdep kvm_intel snd_pcm cec vfat fat
drm_kms_helper snd_timer kvm video ice snd psmouse soundcore wmi
acpi_cpufreq pcspkr i2c_zhaoxin sg sch_fq_codel drm fuse backlight
nfnetlink xfs sd_mod t10_pi sm2_zhaoxin_gmi crct10dif_pclmul
crc32_pclmul ahci crc32c_intel libahci r8169 ghash_clmulni_intel libata
sha512_ssse3 serio_raw realtek dm_mirror dm_region_hash dm_log
dm_multipath dm_mod i2c_dev autofs4
[ 100.674508][ T843] ---[ end trace 0000000000000000 ]---
[ 100.709547][ T843] RIP: 0010:__list_del_entry_valid_or_report+0x7f/0xc0
[ 100.716197][ T843] Code: 66 4b a6 e8 c3 43 a9 ff 0f 0b 48 89 fe 48
c7 c7 10 67 4b a6 e8 b2 43 a9 ff 0f 0b 48 89 fe 48 c7 c7 40 67 4b a6 e8
a1 43 a9 ff <0f> 0b 48 89 fe 48 89 ca 48 c7 c7 78 67 4b a6 e8 8d 43 a9
ff 0f 0b
[ 100.735491][ T843] RSP: 0018:ffffc9000f70fc08 EFLAGS: 00010246
[ 100.741367][ T843] RAX: 000000000000004e RBX: ffff8881418b79e8 RCX:
0000000000000000
[ 100.749137][ T843] RDX: 0000000000000000 RSI: ffff8897df5a32c0 RDI:
ffff8897df5a32c0
[ 100.756909][ T843] RBP: ffff8881257f9608 R08: 0000000000000000 R09:
0000000000000003
[ 100.764678][ T843] R10: ffffc9000f70fa90 R11: ffffffffa6fee508 R12:
0000000000000000
[ 100.772448][ T843] R13: ffff8881257f9608 R14: ffff888116507c28 R15:
ffff888116507c28
[ 100.780218][ T843] FS: 0000000000000000(0000)
GS:ffff8897df580000(0000) knlGS:0000000000000000
[ 100.788934][ T843] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 100.795329][ T843] CR2: 00007f9563bac1c0 CR3: 0000000c4be26004 CR4:
0000000000570ee0
[ 100.803099][ T843] PKRU: 55555554
[ 100.806483][ T843] Kernel panic - not syncing: Fatal exception
[ 100.812794][ T843] Kernel Offset: disabled
[ 100.821613][ T843] pstore: backend (erst) writing error (-28)
[ 100.827481][ T843] ---[ end Kernel panic - not syncing: Fatal
exception ]---
>>> I guess the problem this fixes is that there was a PCIe error that
>>> triggered DPC, and the E810 .err_detected() works but takes longer
>>> than expected, which results in pciehp disabling the slot when it
>>> doesn't need to? So the user basically sees a dead E810 device?
>>
>> Yes, this patch is to solve this problem.
>>
Without this patch, we observe that the ice network card ends up in an
unavailable state, followed by a kernel panic.
>>> It seems unfortunate that we have this dependency on the time allowed
>>> for .err_detected() to execute. It's nice if adding arbitrary delay
>>> doesn't break things, but maybe we can't always achieve that.
>>>
>> I think this is a feasible solution. For some PCIE devices, executing
>> the .err_detect() within 4 seconds will not have any impact, for a few
>> PCIE devices, it might increase the execution time of pciehp_ist().
>> Without this patch, PCIE devices may not be usable and could even cause
>> more serious errors, such as a kernel panic. For example, the following
>> log is encountered in hardware testing:
>>
>> list_del corruption, ffff8881418b79e8->next is LIST_POISON1
>> (dead000000000100)
>> ------------[ cut here ]------------
>> kernel BUG at lib/list_debug.c:56!
>> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>> ...
>> Kernel panic - not syncing: Fatal exception
>
> This is an interesting panic and looks like it might have been hard to
> debug.
>
> Do you have any idea what exactly caused this and how it's related to
> the timeout in pci_dpc_recovered()? Is there a race where pciehp
> disables the slot and removes the driver, but eventually
> ice_pci_err_detected() completes and we're running some ice driver
> code while it's being removed or something?
>
The reason for this kernel panic is that the ice driver's
ice_pci_err_detected() ran longer than the maximum wait time allowed by
pciehp. pciehp_ist() therefore disabled the slot, which invoked the ice
driver's ice_remove(). At that point ice_pci_err_detected() had already
deleted the list entries, so ice_remove() attempted to delete list entries
that no longer exist.
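To illustrate why the second removal blows up: after the first list_del()
the entry's pointers are set to the poison values, so a later deletion of
the same entry finds next == LIST_POISON1 and the kernel's list debug code
BUGs. A rough userspace approximation (not the ice driver's or the kernel's
actual code; the poison values and helpers are simplified):

	#include <stdio.h>

	#define LIST_POISON1 ((void *)0x100)
	#define LIST_POISON2 ((void *)0x122)

	struct list_head { struct list_head *next, *prev; };

	static void list_del(struct list_head *entry)
	{
		if (entry->next == LIST_POISON1) {
			/* the kernel's list debug check would BUG() here */
			printf("list_del corruption, next is LIST_POISON1\n");
			return;
		}
		entry->next->prev = entry->prev;
		entry->prev->next = entry->next;
		entry->next = LIST_POISON1;
		entry->prev = LIST_POISON2;
	}

	int main(void)
	{
		struct list_head head = { &head, &head };
		struct list_head node;

		/* insert node into the list */
		node.next = &head;
		node.prev = &head;
		head.next = &node;
		head.prev = &node;

		list_del(&node);	/* teardown via the ice_pci_err_detected() path */
		list_del(&node);	/* done again via ice_remove() -> corruption */
		return 0;
	}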
> Simply increasing the timeout doesn't feel like a very robust way of
> solving the problem. What happens when some other device needs 17
> seconds?
>
The situation you described is indeed possible. Yes, we cannot guarantee
that all PCIe devices will complete the DPC recovery process within 16
seconds.
> But if increasing the timeout is the best we can do, maybe a warning
> message in pci_dpc_recovered() when we time out would at least be a
> hint that we might be heading for trouble?
>
This is indeed a very good suggestion. Here is an alternative that builds on it:
@@ -100,6 +100,7 @@ static bool dpc_completed(struct pci_dev *pdev)
 bool pci_dpc_recovered(struct pci_dev *pdev)
 {
 	struct pci_host_bridge *host;
+	u16 status;

 	if (!pdev->dpc_cap)
 		return false;
@@ -120,6 +121,12 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
 	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
 			   msecs_to_jiffies(4000));

+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, &status);
+	if (!PCI_POSSIBLE_ERROR(status) && (status & PCI_EXP_DPC_STATUS_TRIGGER)) {
+		pci_warn(pdev, "execution of the device driver's error_detected() callback took too long\n");
+		return true;
+	}
+
 	return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
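The intent of that check: if, after the wait times out, the DPC Status
register is still readable (not all ones) and its Trigger Status bit is
still set, the port is still in containment, i.e. recovery is still in
progress rather than failed, so pciehp is told to leave the slot alone. A
standalone sketch of just that decision (simplified; DPC_TRIGGER stands in
for PCI_EXP_DPC_STATUS_TRIGGER and the 0xffff test for PCI_POSSIBLE_ERROR(),
this is not the kernel code itself):

	#include <stdbool.h>
	#include <stdint.h>

	#define DPC_TRIGGER 0x0001	/* stands in for PCI_EXP_DPC_STATUS_TRIGGER */

	/* Decide, from a DPC Status value read after the wait timed out,
	 * whether containment is still in progress. */
	static bool dpc_still_in_progress(uint16_t status)
	{
		if (status == 0xffff)	/* config read failed, device is gone */
			return false;

		return status & DPC_TRIGGER;
	}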
>>> I see that pci_dpc_recovered() is called from pciehp_ist(). Are we
>>> prepared for long delays there?
>>
>> This patch may affect the hotplug IRQ threads execution time triggered
>> by DPC, but it has no effect for normal HotPlug operation, e.g.
>> Attention Button Pressed or Power Fault Detected. If you have better
>> modification suggestions, I will update to the next version.
>
>>>> +++ b/drivers/pci/pcie/dpc.c
>>>> @@ -121,7 +121,7 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
>>>> * but reports indicate that DPC completes within 4 seconds.
>>>> */
>>>> wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
>>>> - msecs_to_jiffies(4000));
>>>> + msecs_to_jiffies(16000));