During PCIe hot-plug events, uncorrectable errors can be reported and
AER recovery for the tg3 device is initiated by the AER kernel driver.
The tg3_io_error_detected function is the AER error recovery handler.
From tg3_io_error_detected, we call tg3_netif_stop->tg3_napi_disable->
napi_disable and return PCI_ERS_RESULT_NEED_RESET on non-fatal error.
We expect that during AER recovery tg3_io_slot_reset and tg3_io_resume
will be called. But AER error recovery can fail, for example when one
of the PCIe devices on the same bus reports PCI_ERS_RESULT_NO_AER_DRIVER.
As a result, tg3_io_slot_reset and tg3_io_resume are not called, and the
PCIe device and NAPI stay disabled (pci_disable_device and napi_disable
are called from tg3_io_error_detected). If we then disable the PCIe
link, napi_disable is called again:
napi_disable+0x1b/0x1b0
tg3_napi_disable+0x89/0xa0 [tg3]
tg3_netif_stop+0x37/0xe3 [tg3]
tg3_stop+0x30/0x160 [tg3]
tg3_close+0x2a/0x60 [tg3]
__dev_close_many+0xad/0x130
dev_close_many+0xb2/0x190
unregister_netdevice_many_notify+0x19d/0xa00
unregister_netdevice_queue+0xf8/0x140
unregister_netdev+0x1c/0x30
tg3_remove_one+0xaa/0x150 [tg3]
pci_device_remove+0x42/0xb0
device_release_driver_internal+0x19c/0x200
pci_stop_bus_device+0x85/0xb0
pci_stop_bus_device+0x2c/0xb0
pci_stop_bus_device+0x2c/0xb0
pci_stop_and_remove_bus_device+0x12/0x20
pciehp_unconfigure_device+0x9f/0x160
pciehp_disable_slot+0x67/0x100
pciehp_handle_presence_or_link_change+0x77/0x350
napi_disable does not expect this, and the calling thread can block in
napi_disable forever. We have pcierr_recovery to cover a similar issue,
but for fatal errors. We cannot reuse this flag because it is reset in
tg3_io_resume, and tg3_io_resume is not called when AER recovery fails.
Similarly, if an AER error is reported and tg3_io_error_detected calls
pci_disable_device, a subsequent device removal via tg3_remove_one or
tg3_shutdown will call pci_disable_device again for the already-disabled
device.
Add a napi_enabled flag to struct tg3 to track whether napi_enable has
been called. Guard tg3_napi_disable() against being called before
tg3_napi_enable(), logging an error if that happens. Also guard
pci_disable_device() calls in tg3_remove_one() and tg3_shutdown() with
pci_is_enabled() to avoid disabling an already-disabled device.
Signed-off-by: Yury Murashka <yurypm@arista.com>
---
drivers/net/ethernet/broadcom/tg3.c | 19 +++++++++++++++++--
drivers/net/ethernet/broadcom/tg3.h | 1 +
2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 73a4b569b..500b6f7fa 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7396,8 +7396,18 @@ static void tg3_napi_disable(struct tg3 *tp)
 	int txq_idx = tp->txq_cnt - 1;
 	int rxq_idx = tp->rxq_cnt - 1;
 	struct tg3_napi *tnapi;
+	struct net_device *netdev = tp->dev;
 	int i;

+	if (!tp->napi_enabled) {
+		netdev_err(netdev, "%s() called when napi_enable wasn't called before, netif_running=%d, pci_enabled=%d\n",
+			   __func__, netif_running(netdev),
+			   pci_is_enabled(tp->pdev));
+		return;
+	}
+
+	tp->napi_enabled = false;
+
 	for (i = tp->irq_cnt - 1; i >= 0; i--) {
 		tnapi = &tp->napi[i];
 		if (tnapi->tx_buffers) {
@@ -7420,6 +7430,8 @@ static void tg3_napi_enable(struct tg3 *tp)
 	struct tg3_napi *tnapi;
 	int i;

+	tp->napi_enabled = true;
+
 	for (i = 0; i < tp->irq_cnt; i++) {
 		tnapi = &tp->napi[i];
 		napi_enable_locked(&tnapi->napi);
@@ -17718,6 +17730,7 @@ static int tg3_init_one(struct pci_dev *pdev,
 	tp->tx_mode = TG3_DEF_TX_MODE;
 	tp->irq_sync = 1;
 	tp->pcierr_recovery = false;
+	tp->napi_enabled = false;

 	if (tg3_debug > 0)
 		tp->msg_enable = tg3_debug;
@@ -18099,7 +18112,8 @@ static void tg3_remove_one(struct pci_dev *pdev)
 		}
 		free_netdev(dev);
 		pci_release_regions(pdev);
-		pci_disable_device(pdev);
+		if (pci_is_enabled(pdev))
+			pci_disable_device(pdev);
 	}
 }

@@ -18257,7 +18271,8 @@ static void tg3_shutdown(struct pci_dev *pdev)

 	rtnl_unlock();

-	pci_disable_device(pdev);
+	if (pci_is_enabled(pdev))
+		pci_disable_device(pdev);
 }

 /**
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index a9e7f88fa..34fb771e8 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3429,6 +3429,7 @@ struct tg3 {
 	struct device *hwmon_dev;
 	bool link_up;
 	bool pcierr_recovery;
+	bool napi_enabled;

 	u32 ape_hb;
 	unsigned long ape_hb_interval;
--
2.51.0
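
As a rough illustration of the guard this patch adds to tg3_napi_disable(),
here is a small userspace model in C. It is only a sketch, not driver code:
struct fake_napi, struct model_tg3 and all model_*/fake_* functions are
hypothetical names made up for this example, and the assert() in the stub
stands in for the real napi_disable() blocking forever on a second call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one NAPI context; not a kernel type. */
struct fake_napi {
	bool disabled;      /* a freshly added NAPI starts out disabled */
	int disable_calls;  /* how many times the disable stub really ran */
};

/* Hypothetical model of the driver state touched by the patch. */
struct model_tg3 {
	bool napi_enabled;  /* mirrors the flag the patch adds to struct tg3 */
	struct fake_napi napi;
};

void model_init(struct model_tg3 *tp)
{
	tp->napi_enabled = false;
	tp->napi.disabled = true;
	tp->napi.disable_calls = 0;
}

/* Stub: the real napi_disable() would block here instead of asserting. */
void fake_napi_disable(struct fake_napi *n)
{
	assert(!n->disabled);
	n->disabled = true;
	n->disable_calls++;
}

void model_napi_enable(struct model_tg3 *tp)
{
	tp->napi_enabled = true;
	tp->napi.disabled = false;
}

/* Returns true if NAPI was really disabled, false if the call was skipped. */
bool model_napi_disable(struct model_tg3 *tp)
{
	if (!tp->napi_enabled)
		return false;   /* the patch logs an error at this point */
	tp->napi_enabled = false;
	fake_napi_disable(&tp->napi);
	return true;
}
```

In this model, calling model_napi_disable() once from the error path and
once more from the remove path reaches the stub only the first time; the
second call returns false instead of tripping the assertion.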
On Fri, May 15, 2026 at 4:28 PM Yury Murashka <yurypm@arista.com> wrote:
>
> During PCIe hot-plug events, uncorrectable errors can be reported and
> AER recovery for the tg3 device is initiated by the AER kernel driver.
> The tg3_io_error_detected function is the AER error recovery handler.
>
> From tg3_io_error_detected, we call tg3_netif_stop->tg3_napi_disable->
> napi_disable and return PCI_ERS_RESULT_NEED_RESET on non-fatal error.
> We expect that during AER recovery tg3_io_slot_reset and tg3_io_resume
> will be called. But AER error recovery can fail. For example, when one
> of PCIe devices on the same bus reports PCI_ERS_RESULT_NO_AER_DRIVER.
> As a result, tg3_io_slot_reset and tg3_io_resume are not called, PCIe
> device is disabled and NAPI is disabled (pci_disable_device and
> napi_disable are called from tg3_io_error_detected). Then we can try to
> disable PCIe link and napi_disable will be called again:

Calling napi_disable() in case of teardown due to error and in
ndo_stop is very common.
So I imagine many drivers will encounter this same situation. I am not
sure how real the NO_AER_DRIVER occurring situation is.
If yes, then we need to fix more drivers?

>
> napi_disable+0x1b/0x1b0
> tg3_napi_disable+0x89/0xa0 [tg3]
> tg3_netif_stop+0x37/0xe3 [tg3]
> tg3_stop+0x30/0x160 [tg3]
> tg3_close+0x2a/0x60 [tg3]
> __dev_close_many+0xad/0x130
> dev_close_many+0xb2/0x190
> unregister_netdevice_many_notify+0x19d/0xa00
> unregister_netdevice_queue+0xf8/0x140
> unregister_netdev+0x1c/0x30
> tg3_remove_one+0xaa/0x150 [tg3]
> pci_device_remove+0x42/0xb0
> device_release_driver_internal+0x19c/0x200
> pci_stop_bus_device+0x85/0xb0
> pci_stop_bus_device+0x2c/0xb0
> pci_stop_bus_device+0x2c/0xb0
> pci_stop_and_remove_bus_device+0x12/0x20
> pciehp_unconfigure_device+0x9f/0x160
> pciehp_disable_slot+0x67/0x100
> pciehp_handle_presence_or_link_change+0x77/0x350
>
> This is not expected by napi_disable and a thread can be locked in
> napi_disable forever. We have pcierr_recovery to cover a similar issue,
> but for fatal errors. We cannot reuse this flag because it is reset in
> tg3_io_resume, but it is not called when AER recovery fails.
>
> Similarly, if an AER error is reported and tg3_io_error_detected calls
> pci_disable_device, a subsequent device removal via tg3_remove_one or
> tg3_shutdown will call pci_disable_device again for the already-disabled
> device.

I believe the same argument is true here also..
P.S: patches containing fixes should mention 'net' and should contain fixes tag
This is what I saw in our real environment:

[ 475.568144] pcieport 0000:00:03.0: AER: Uncorrectable (Non-Fatal) error message received from 0000:00:00.0
[ 475.568255] pcieport 0000:06:07.0: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 475.703860] pcieport 0000:06:07.0: device [11f8:8533] error status/mask=00100000/04400000
[ 475.727967] pcieport 0000:52:02.0: Unable to change power state from D3hot to D0, device inaccessible
[ 475.804002] pcieport 0000:06:07.0: [20] UnsupReq (First)
[ 475.804008] pcieport 0000:06:07.0: AER: TLP Header: 60000001 0000010f 0000380e 00000068
[ 475.916817] tg3 0000:49:00.0 lc3: PCI I/O error detected
[ 476.094461] eth0: port 3(lc4) entered disabled state
[ 476.096010] br1: port 14(lc4.42) entered disabled state
[ 476.097188] eth0: port 3(lc4) entered disabled state
[ 476.097485] lc4.42 (unregistering): left allmulticast mode
[ 476.097491] tg3 0000:54:00.0 lc4 (unregistering): left allmulticast mode
[ 476.097494] lc4.42 (unregistering): left promiscuous mode
[ 476.097508] tg3 0000:54:00.0 lc4 (unregistering): left promiscuous mode
[ 476.097513] br1: port 14(lc4.42) entered disabled state
[ 476.224325] pci 0000:46:00.1: AER: can't recover (no error_detected callback)
[ 476.224333] pci 0000:46:00.2: AER: can't recover (no error_detected callback)
[ 476.224335] pci 0000:46:00.3: AER: can't recover (no error_detected callback)
[ 476.224338] pci 0000:46:00.4: AER: can't recover (no error_detected callback)
[ 476.224371] pcieport 0000:06:07.0: AER: device recovery failed

This is PCIe tree:

#lspci -vvvt
-+-[0000:00]-+-00.0 Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2
 |           +-01.0-[01]--
 |           +-01.1-[02]--
 |           +-02.0-[03]--+-00.0 Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 0
 |           |            +-00.1 Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 1
 |           |            +-00.2 Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 2
 |           |            \-00.3 Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 3
 |           +-02.2-[04]--+-00.0 Intel Corporation Ethernet Connection X552 10 GbE Backplane
 |           |            \-00.1 Intel Corporation Ethernet Connection X552 10 GbE Backplane
 |           +-03.0-[05-9e]--+-00.0-[06-9d]--+-00.0-[07-10]--+-00.0-[08-10]--+-01.0-[09-0c]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               |               +-02.0-[0d-0f]--
 |           |               |               |               |               \-03.0-[10]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               +-01.0-[11-1a]--+-00.0-[12-1a]--+-01.0-[13-16]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               |               +-02.0-[17-19]--
 |           |               |               |               |               \-03.0-[1a]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               +-02.0-[1b-24]--+-00.0-[1c-24]--+-01.0-[1d-20]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               |               +-02.0-[21-23]--
 |           |               |               |               |               \-03.0-[24]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               +-03.0-[25-2e]--+-00.0-[26-2e]--+-01.0-[27-2a]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               |               +-02.0-[2b-2d]--
 |           |               |               |               |               \-03.0-[2e]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               +-04.0-[2f-38]--+-00.0-[30-38]--+-01.0-[31-34]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               |               +-02.0-[35-37]--
 |           |               |               |               |               \-03.0-[38]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               +-05.0-[39-42]--+-00.0-[3a-42]--+-01.0-[3b-3e]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               |               +-02.0-[3f-41]--
 |           |               |               |               |               \-03.0-[42]----00.0 Broadcom Inc. and subsidiaries Device 8797
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               +-06.0-[43-45]--
 |           |               |               +-07.0-[46-50]--+-00.0-[47-49]--+-02.0-[48]--
 |           |               |               |               |               \-0d.0-[49]----00.0 Broadcom Inc. and subsidiaries NetXtreme BCM57762 Gigabit Ethernet PCIe
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               +-08.0-[51-5b]--+-00.0-[52-54]--+-02.0-[53]--
 |           |               |               |               |               \-0d.0-[54]----00.0 Broadcom Inc. and subsidiaries NetXtreme BCM57762 Gigabit Ethernet PCIe
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               +-09.0-[5c-66]--+-00.0-[5d-5f]--+-02.0-[5e]--
 |           |               |               |               |               \-0d.0-[5f]----00.0 Broadcom Inc. and subsidiaries NetXtreme BCM57762 Gigabit Ethernet PCIe
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               +-0a.0-[67-71]--+-00.0-[68-71]--+-02.0-[69]--
 |           |               |               |               |               \-0d.0-[6a-71]----00.0 Broadcom Inc. and subsidiaries NetXtreme BCM57762 Gigabit Ethernet PCIe
 |           |               |               |               +-00.1 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.2 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               +-00.3 PLX Technology, Inc. PEX PCI Express Switch DMA interface
 |           |               |               |               \-00.4 PLX Technology, Inc. PEX PCI Express Switch DMA interface

Doing cleanup on device stop or in error handling is a common approach,
and it is not a problem. The problem is that tg3 doesn't track the
device status. If a cleanup was performed (napi_disable and
pci_disable_device called), tg3 should know that the device is not in an
initialized state. When we subsequently try to disable the device, tg3
should not try to clean up again. That is the problem I'm trying to fix.

Frankly speaking, I'm adding a flag which signals that device cleanup
was completed during the handling of an AER error, so when tg3
stops/removes the device, it should not perform cleanup again.

Maybe PCI_ERS_RESULT_NO_AER_DRIVER is a rare case in the PCIe world, but
I think that in any case, the tg3 driver should correctly handle AER
recovery failure.
Recovery can fail for reasons other than the
PCI_ERS_RESULT_NO_AER_DRIVER return code. The problem is that a double
napi_disable call causes a soft lockup, so it is not just one driver or
device that stops functioning: the whole system is affected.

On 5/15/26 16:42, Pavan Chebbi wrote:
> On Fri, May 15, 2026 at 4:28 PM Yury Murashka <yurypm@arista.com> wrote:
>> During PCIe hot-plug events, uncorrectable errors can be reported and
>> AER recovery for the tg3 device is initiated by the AER kernel driver.
>> The tg3_io_error_detected function is the AER error recovery handler.
>>
>> From tg3_io_error_detected, we call tg3_netif_stop->tg3_napi_disable->
>> napi_disable and return PCI_ERS_RESULT_NEED_RESET on non-fatal error.
>> We expect that during AER recovery tg3_io_slot_reset and tg3_io_resume
>> will be called. But AER error recovery can fail. For example, when one
>> of PCIe devices on the same bus reports PCI_ERS_RESULT_NO_AER_DRIVER.
>> As a result, tg3_io_slot_reset and tg3_io_resume are not called, PCIe
>> device is disabled and NAPI is disabled (pci_disable_device and
>> napi_disable are called from tg3_io_error_detected). Then we can try to
>> disable PCIe link and napi_disable will be called again:
> Calling napi_disable() in case of teardown due to error and in
> ndo_stop is very common.
> So I imagine many drivers will encounter this same situation. I am not
> sure how real the NO_AER_DRIVER occurring situation is.
> If yes, then we need to fix more drivers?
>
>> napi_disable+0x1b/0x1b0
>> tg3_napi_disable+0x89/0xa0 [tg3]
>> tg3_netif_stop+0x37/0xe3 [tg3]
>> tg3_stop+0x30/0x160 [tg3]
>> tg3_close+0x2a/0x60 [tg3]
>> __dev_close_many+0xad/0x130
>> dev_close_many+0xb2/0x190
>> unregister_netdevice_many_notify+0x19d/0xa00
>> unregister_netdevice_queue+0xf8/0x140
>> unregister_netdev+0x1c/0x30
>> tg3_remove_one+0xaa/0x150 [tg3]
>> pci_device_remove+0x42/0xb0
>> device_release_driver_internal+0x19c/0x200
>> pci_stop_bus_device+0x85/0xb0
>> pci_stop_bus_device+0x2c/0xb0
>> pci_stop_bus_device+0x2c/0xb0
>> pci_stop_and_remove_bus_device+0x12/0x20
>> pciehp_unconfigure_device+0x9f/0x160
>> pciehp_disable_slot+0x67/0x100
>> pciehp_handle_presence_or_link_change+0x77/0x350
>>
>> This is not expected by napi_disable and a thread can be locked in
>> napi_disable forever. We have pcierr_recovery to cover a similar issue,
>> but for fatal errors. We cannot reuse this flag because it is reset in
>> tg3_io_resume, but it is not called when AER recovery fails.
>>
>> Similarly, if an AER error is reported and tg3_io_error_detected calls
>> pci_disable_device, a subsequent device removal via tg3_remove_one or
>> tg3_shutdown will call pci_disable_device again for the already-disabled
>> device.
> I believe the same argument is true here also..
> P.S: patches containing fixes should mention 'net' and should contain fixes tag
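
The second half of the patch, guarding pci_disable_device() with
pci_is_enabled(), can be sketched the same way. The model below assumes
the PCI core tracks enablement with a per-device enable count and that
pci_is_enabled() reports whether that count is above zero; that is my
understanding of the core, not something stated in this thread. All
model_* names and the underflow_warnings counter are hypothetical and
exist only for this example.

```c
#include <stdbool.h>

/* Hypothetical model of the per-device enable count in the PCI core. */
struct model_pci_dev {
	int enable_cnt;          /* assumed analogue of the core's counter */
	int underflow_warnings;  /* counts "already disabled" disables */
};

void model_pci_enable(struct model_pci_dev *d)
{
	d->enable_cnt++;
}

bool model_pci_is_enabled(const struct model_pci_dev *d)
{
	return d->enable_cnt > 0;
}

/* Unguarded disable: records a warning when the device is already off. */
void model_pci_disable(struct model_pci_dev *d)
{
	if (d->enable_cnt <= 0)
		d->underflow_warnings++;
	d->enable_cnt--;
}

/* Guarded teardown, shaped like the patched remove/shutdown paths. */
void model_remove_guarded(struct model_pci_dev *d)
{
	if (model_pci_is_enabled(d))
		model_pci_disable(d);
}
```

With one enable at probe time and one disable from the error handler,
model_remove_guarded() leaves the count at zero, whereas a second plain
model_pci_disable() drives it negative and bumps the warning counter.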