Message-ID: <591b66a1-4b20-40d5-b454-9eecbabfc832@arista.com>
Date: Wed, 20 May 2026 17:35:28 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
From: "Yury M."
Subject: [PATCH net v2] net: tg3: guard napi_disable and pci_disable_device calls
To: pavan.chebbi@broadcom.com, mchan@broadcom.com
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
 kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, Yury Murashka

During PCIe hot-plug events, uncorrectable errors can be reported and
AER recovery for the tg3 device is initiated by the AER kernel driver.
tg3_io_error_detected() is the AER error recovery handler. From
tg3_io_error_detected(), we call tg3_netif_stop() ->
tg3_napi_disable() -> napi_disable() and return
PCI_ERS_RESULT_NEED_RESET on a non-fatal error. We expect
tg3_io_slot_reset() and tg3_io_resume() to be called later during AER
recovery.

But AER error recovery can fail, for example when one of the PCIe
devices on the same bus reports PCI_ERS_RESULT_NO_AER_DRIVER. As a
result, tg3_io_slot_reset() and tg3_io_resume() are not called, and
both the PCIe device and NAPI stay disabled (pci_disable_device() and
napi_disable() were already called from tg3_io_error_detected()).

Then we can try to disable the PCIe link, and napi_disable() is called
again:

  napi_disable+0x1b/0x1b0
  tg3_napi_disable+0x89/0xa0 [tg3]
  tg3_netif_stop+0x37/0xe3 [tg3]
  tg3_stop+0x30/0x160 [tg3]
  tg3_close+0x2a/0x60 [tg3]
  __dev_close_many+0xad/0x130
  dev_close_many+0xb2/0x190
  unregister_netdevice_many_notify+0x19d/0xa00
  unregister_netdevice_queue+0xf8/0x140
  unregister_netdev+0x1c/0x30
  tg3_remove_one+0xaa/0x150 [tg3]
  pci_device_remove+0x42/0xb0
  device_release_driver_internal+0x19c/0x200
  pci_stop_bus_device+0x85/0xb0
  pci_stop_bus_device+0x2c/0xb0
  pci_stop_bus_device+0x2c/0xb0
  pci_stop_and_remove_bus_device+0x12/0x20
  pciehp_unconfigure_device+0x9f/0x160
  pciehp_disable_slot+0x67/0x100
  pciehp_handle_presence_or_link_change+0x77/0x350

napi_disable() does not expect to be called a second time, and the
calling thread can block in napi_disable() forever.

We have pcierr_recovery to cover a similar issue, but only for fatal
errors. We cannot reuse this flag because it is reset in
tg3_io_resume(), which is not called when AER recovery fails.

Similarly, if an AER error is reported and tg3_io_error_detected()
calls pci_disable_device(), a subsequent device removal via
tg3_remove_one() or tg3_shutdown() calls pci_disable_device() again
for the already-disabled device.

Add a napi_enabled flag to struct tg3 to track whether napi_enable has
been called. Guard tg3_napi_disable() against being called before
tg3_napi_enable(), logging an error if that happens. Also guard the
pci_disable_device() calls in tg3_remove_one() and tg3_shutdown() with
pci_is_enabled() to avoid disabling an already-disabled device.

Fixes: b45aa2f6192e ("tg3: Add EEH support")
Signed-off-by: Yury Murashka
---
 drivers/net/ethernet/broadcom/tg3.c | 19 +++++++++++++++++--
 drivers/net/ethernet/broadcom/tg3.h |  1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 73a4b569b..500b6f7fa 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7396,8 +7396,18 @@ static void tg3_napi_disable(struct tg3 *tp)
 	int txq_idx = tp->txq_cnt - 1;
 	int rxq_idx = tp->rxq_cnt - 1;
 	struct tg3_napi *tnapi;
+	struct net_device *netdev = tp->dev;
 	int i;
 
+	if (!tp->napi_enabled) {
+		netdev_err(netdev, "%s() called when napi_enable wasn't called before, netif_running=%d, pci_enabled=%d\n",
+			   __func__, netif_running(netdev),
+			   pci_is_enabled(tp->pdev));
+		return;
+	}
+
+	tp->napi_enabled = false;
+
 	for (i = tp->irq_cnt - 1; i >= 0; i--) {
 		tnapi = &tp->napi[i];
 		if (tnapi->tx_buffers) {
@@ -7420,6 +7430,8 @@ static void tg3_napi_enable(struct tg3 *tp)
 	struct tg3_napi *tnapi;
 	int i;
 
+	tp->napi_enabled = true;
+
 	for (i = 0; i < tp->irq_cnt; i++) {
 		tnapi = &tp->napi[i];
 		napi_enable_locked(&tnapi->napi);
@@ -17718,6 +17730,7 @@ static int tg3_init_one(struct pci_dev *pdev,
 	tp->tx_mode = TG3_DEF_TX_MODE;
 	tp->irq_sync = 1;
 	tp->pcierr_recovery = false;
+	tp->napi_enabled = false;
 
 	if (tg3_debug > 0)
 		tp->msg_enable = tg3_debug;
@@ -18099,7 +18112,8 @@ static void tg3_remove_one(struct pci_dev *pdev)
 		}
 		free_netdev(dev);
 		pci_release_regions(pdev);
-		pci_disable_device(pdev);
+		if (pci_is_enabled(pdev))
+			pci_disable_device(pdev);
 	}
 }
 
@@ -18257,7 +18271,8 @@ static void tg3_shutdown(struct pci_dev *pdev)
 
 	rtnl_unlock();
 
-	pci_disable_device(pdev);
+	if (pci_is_enabled(pdev))
+		pci_disable_device(pdev);
 }
 
 /**
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index a9e7f88fa..34fb771e8 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3429,6 +3429,7 @@ struct tg3 {
 	struct device *hwmon_dev;
 	bool link_up;
 	bool pcierr_recovery;
+	bool napi_enabled;
 
 	u32 ape_hb;
 	unsigned long ape_hb_interval;
-- 
2.51.0
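
For readers who want to see the guard pattern in isolation, here is a
minimal user-space sketch of the failure sequence described in the
commit message (open, failed AER recovery, hot-unplug removal). It is
not driver code: struct fake_dev and every fake_* and replay_* name
are hypothetical stand-ins, the pci_enabled field only imitates what
pci_is_enabled() reports, and fake_napi_disable() returns a bool
purely so the skipped call is observable, whereas tg3_napi_disable()
in the patch returns void.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct tg3; only what the sketch needs. */
struct fake_dev {
	bool napi_enabled;	/* mirrors the flag added by the patch */
	bool pci_enabled;	/* imitates the state behind pci_is_enabled() */
	int napi_disable_calls;	/* disables that actually ran */
	int pci_disable_calls;	/* PCI disables that actually ran */
};

static void fake_napi_enable(struct fake_dev *d)
{
	d->napi_enabled = true;
}

/* Returns true if the disable ran, false if the guard skipped it. */
static bool fake_napi_disable(struct fake_dev *d)
{
	if (!d->napi_enabled) {
		fprintf(stderr, "%s() called while NAPI is not enabled\n",
			__func__);
		return false;
	}

	d->napi_enabled = false;
	d->napi_disable_calls++;
	return true;
}

/* Disable the device only if it is still enabled. */
static void fake_pci_disable(struct fake_dev *d)
{
	if (d->pci_enabled) {
		d->pci_enabled = false;
		d->pci_disable_calls++;
	}
}

/* Replays: open, error_detected, no slot_reset/resume, then removal. */
static struct fake_dev replay_failed_recovery(void)
{
	struct fake_dev d = { .pci_enabled = true };

	fake_napi_enable(&d);	/* interface opened */

	fake_napi_disable(&d);	/* error handler stops NAPI ... */
	fake_pci_disable(&d);	/* ... and disables the device */

	/* Recovery fails here, so nothing re-enables NAPI or the device. */

	fake_napi_disable(&d);	/* removal path: skipped by the guard */
	fake_pci_disable(&d);	/* removal path: skipped by the guard */

	return d;
}
```

In this sketch the second pair of calls is a no-op, so each disable
runs exactly once; without the two checks, both would run twice.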