From: Yury Murashka
To: pavan.chebbi@broadcom.com, mchan@broadcom.com
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
 kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, Yury Murashka
Subject: [PATCH net v4] net: tg3: guard napi_disable and pci_disable_device calls
Date: Wed, 27 May 2026 11:55:35 +0000
Message-ID: <20260527115535.1686932-1-yurypm@arista.com>

During PCIe hot-plug events, uncorrectable errors can be reported, and
the AER kernel driver then initiates AER recovery for the tg3 device.
tg3_io_error_detected() is the AER error recovery handler. On a
non-fatal error it calls tg3_netif_stop() -> tg3_napi_disable() ->
napi_disable() and returns PCI_ERS_RESULT_NEED_RESET. We expect
tg3_io_slot_reset() and tg3_io_resume() to be called later during AER
recovery.

However, AER error recovery can fail, for example when one of the PCIe
devices on the same bus reports PCI_ERS_RESULT_NO_AER_DRIVER. In that
case tg3_io_slot_reset() and tg3_io_resume() are never called, so the
PCIe device and NAPI both remain disabled (pci_disable_device() and
napi_disable() were called from tg3_io_error_detected()).
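The failure path described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not tg3 or kernel code:

- struct model_napi, model_napi_disable(), model_napi_enable() and model_aer_then_close() are hypothetical names.
- The sched field stands in for NAPI_STATE_SCHED.
- The model ignores the other conditions the real AER handlers check.

As we understand it, napi_disable() waits until it can claim that bit, and the bit stays claimed until napi_enable(). So a second disable with no enable in between waits indefinitely. The model reports that case instead of waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one NAPI instance: 'sched' stands in for
 * NAPI_STATE_SCHED, which a disable claims and an enable releases. */
struct model_napi {
	bool sched;
};

/* Returns false where the real napi_disable() would wait forever
 * because the bit is still held by an earlier disable. */
static bool model_napi_disable(struct model_napi *n)
{
	if (n->sched)
		return false; /* would keep waiting for the bit to clear */
	n->sched = true;
	return true;
}

static void model_napi_enable(struct model_napi *n)
{
	assert(n->sched); /* model invariant: only enable a disabled instance */
	n->sched = false;
}

/* Hypothetical AER flow: the error handler disables NAPI; the resume
 * step (which re-enables it) runs only if recovery succeeds. Returns
 * whether the later close path would complete. */
static bool model_aer_then_close(bool recovery_succeeds)
{
	struct model_napi napi = { .sched = false }; /* enabled after open */

	(void)model_napi_disable(&napi);  /* tg3_io_error_detected() */
	if (recovery_succeeds)
		model_napi_enable(&napi); /* tg3_io_resume() */

	return model_napi_disable(&napi); /* tg3_close() on removal */
}
```

In this model, model_aer_then_close(true) returns true because the close completes. model_aer_then_close(false) returns false, which is the hang this patch addresses.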
If the PCIe link is subsequently disabled, the device is removed and
napi_disable() is called a second time:

  napi_disable+0x1b/0x1b0
  tg3_napi_disable+0x89/0xa0 [tg3]
  tg3_netif_stop+0x37/0xe3 [tg3]
  tg3_stop+0x30/0x160 [tg3]
  tg3_close+0x2a/0x60 [tg3]
  __dev_close_many+0xad/0x130
  dev_close_many+0xb2/0x190
  unregister_netdevice_many_notify+0x19d/0xa00
  unregister_netdevice_queue+0xf8/0x140
  unregister_netdev+0x1c/0x30
  tg3_remove_one+0xaa/0x150 [tg3]
  pci_device_remove+0x42/0xb0
  device_release_driver_internal+0x19c/0x200
  pci_stop_bus_device+0x85/0xb0
  pci_stop_bus_device+0x2c/0xb0
  pci_stop_bus_device+0x2c/0xb0
  pci_stop_and_remove_bus_device+0x12/0x20
  pciehp_unconfigure_device+0x9f/0x160
  pciehp_disable_slot+0x67/0x100
  pciehp_handle_presence_or_link_change+0x77/0x350

napi_disable() does not expect to be called on an already-disabled NAPI
instance, and the calling thread can block in napi_disable() forever.

The existing pcierr_recovery flag covers a similar issue, but only for
fatal errors. It cannot be reused here because it is cleared in
tg3_io_resume(), which is not called when AER recovery fails.

Similarly, if an AER error is reported and tg3_io_error_detected()
calls pci_disable_device(), a subsequent device removal via
tg3_remove_one() or tg3_shutdown() calls pci_disable_device() again on
the already-disabled device.

Add a napi_enabled flag to struct tg3 that tracks whether NAPI is
currently enabled: tg3_napi_enable() sets it and tg3_napi_disable()
clears it. Guard tg3_napi_disable() so that it returns early if NAPI
was not previously enabled. Also guard the pci_disable_device() calls
in tg3_remove_one() and tg3_shutdown() with pci_is_enabled() to avoid
disabling an already-disabled device.
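The two guards can be sketched with a userspace model of a single NAPI instance. This is illustrative only:

- struct model_tg3 and all model_* functions are hypothetical.
- napi_sched stands in for NAPI_STATE_SCHED.
- pci_enabled stands in for the state that pci_is_enabled() reports.
- The counters exist only so the sequence can be checked.

The wrappers follow the order used in the patch. The enable wrapper sets the flag. The disable wrapper returns early unless the flag is set, and clears it before disabling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are tg3 or kernel APIs. */
struct model_tg3 {
	bool napi_sched;     /* stand-in for NAPI_STATE_SCHED */
	bool napi_enabled;   /* mirrors the flag added to struct tg3 */
	bool pci_enabled;    /* stand-in for what pci_is_enabled() reports */
	int  stuck_disables; /* disables that would have blocked forever */
	int  pci_disables;   /* calls to model_pci_disable_device() */
};

/* Unguarded primitive: a second call without an enable would block. */
static void model_napi_disable(struct model_tg3 *tp)
{
	if (tp->napi_sched) {
		tp->stuck_disables++; /* real napi_disable() would wait here */
		return;
	}
	tp->napi_sched = true;
}

static void model_napi_enable(struct model_tg3 *tp)
{
	assert(tp->napi_sched); /* model invariant: enable a disabled instance */
	tp->napi_sched = false;
}

/* Guarded wrappers in the style of the patched tg3_napi_enable/disable. */
static void model_tg3_napi_enable(struct model_tg3 *tp)
{
	tp->napi_enabled = true;
	model_napi_enable(tp);
}

static void model_tg3_napi_disable(struct model_tg3 *tp)
{
	if (!tp->napi_enabled)
		return;
	tp->napi_enabled = false;
	model_napi_disable(tp);
}

static void model_pci_disable_device(struct model_tg3 *tp)
{
	tp->pci_disables++;
	tp->pci_enabled = false;
}

/* Failed AER recovery followed by removal, with both guards applied. */
static void model_failed_recovery_then_remove(struct model_tg3 *tp)
{
	model_tg3_napi_disable(tp);   /* tg3_io_error_detected() */
	model_pci_disable_device(tp); /* tg3_io_error_detected() */
	/* recovery fails: slot_reset and resume never run */
	model_tg3_napi_disable(tp);   /* tg3_close() -> tg3_stop() */
	if (tp->pci_enabled)          /* tg3_remove_one() */
		model_pci_disable_device(tp);
}
```

Start from an opened device: napi_sched is true, then model_tg3_napi_enable() runs. From there, model_failed_recovery_then_remove() ends with stuck_disables == 0 and pci_disables == 1. Both the second NAPI disable and the second device disable are skipped.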
Fixes: b45aa2f6192e ("tg3: Add EEH support")
Signed-off-by: Yury Murashka
Reviewed-by: Jacob Keller
---
v4:
  - Rebased on the latest net tree and fixed indentation
v3:
  - Removed netdev_err() log from tg3_napi_disable() guard; silently return instead
v2:
  - Rewrote commit message with full problem description and call trace
  - Added Fixes tag
  - Added "net" tree prefix to subject

 drivers/net/ethernet/broadcom/tg3.c | 14 ++++++++++++--
 drivers/net/ethernet/broadcom/tg3.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 73a4b569b..86995e689 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7398,6 +7398,11 @@ static void tg3_napi_disable(struct tg3 *tp)
 	struct tg3_napi *tnapi;
 	int i;
 
+	if (!tp->napi_enabled)
+		return;
+
+	tp->napi_enabled = false;
+
 	for (i = tp->irq_cnt - 1; i >= 0; i--) {
 		tnapi = &tp->napi[i];
 		if (tnapi->tx_buffers) {
@@ -7420,6 +7425,8 @@ static void tg3_napi_enable(struct tg3 *tp)
 	struct tg3_napi *tnapi;
 	int i;
 
+	tp->napi_enabled = true;
+
 	for (i = 0; i < tp->irq_cnt; i++) {
 		tnapi = &tp->napi[i];
 		napi_enable_locked(&tnapi->napi);
@@ -17718,6 +17725,7 @@ static int tg3_init_one(struct pci_dev *pdev,
 	tp->tx_mode = TG3_DEF_TX_MODE;
 	tp->irq_sync = 1;
 	tp->pcierr_recovery = false;
+	tp->napi_enabled = false;
 
 	if (tg3_debug > 0)
 		tp->msg_enable = tg3_debug;
@@ -18099,7 +18107,8 @@ static void tg3_remove_one(struct pci_dev *pdev)
 		}
 		free_netdev(dev);
 		pci_release_regions(pdev);
-		pci_disable_device(pdev);
+		if (pci_is_enabled(pdev))
+			pci_disable_device(pdev);
 	}
 }
 
@@ -18257,7 +18266,8 @@ static void tg3_shutdown(struct pci_dev *pdev)
 
 	rtnl_unlock();
 
-	pci_disable_device(pdev);
+	if (pci_is_enabled(pdev))
+		pci_disable_device(pdev);
 }
 
 /**
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index a9e7f88fa..34fb771e8 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3429,6 +3429,7 @@ struct tg3 {
 	struct device *hwmon_dev;
 	bool link_up;
 	bool pcierr_recovery;
+	bool napi_enabled;
 
 	u32 ape_hb;
 	unsigned long ape_hb_interval;
-- 
2.51.0