From: Jijie Shao
Subject: [PATCH V3 net 1/3] net: hibmcge: fix rtnl deadlock issue
Date: Wed, 6 Aug 2025 18:27:56 +0800
Message-ID: <20250806102758.3632674-2-shaojijie@huawei.com>
In-Reply-To: <20250806102758.3632674-1-shaojijie@huawei.com>
References: <20250806102758.3632674-1-shaojijie@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Currently, the hibmcge netdev acquires the rtnl_lock in
pci_error_handlers.reset_prepare() and releases it in
pci_error_handlers.reset_done().

However, in the PCI framework:
pci_reset_bus
 - __pci_reset_slot
   - pci_slot_save_and_disable_locked
     - pci_dev_save_and_disable
       - err_handler->reset_prepare(dev);

In pci_slot_save_and_disable_locked():
	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
		if (!dev->slot || dev->slot != slot)
			continue;
		pci_dev_save_and_disable(dev);
		if (dev->subordinate)
			pci_bus_save_and_disable_locked(dev->subordinate);
	}

This will iterate through all devices under the current bus and execute
err_handler->reset_prepare(), causing two devices of the hibmcge driver
to sequentially request the rtnl_lock, leading to a deadlock.

Since the driver now executes netif_device_detach() before the reset
process, it will not run concurrently with other netdev APIs, so there
is no need to hold the rtnl_lock now.

Therefore, this patch removes the rtnl_lock during the reset process and
adjusts the position of HBG_NIC_STATE_RESETTING to ensure that multiple
resets are not executed concurrently.

Fixes: 3f5a61f6d504f ("net: hibmcge: Add reset supported in this module")
Signed-off-by: Jijie Shao
Reviewed-by: Simon Horman
---
ChangeLog:
v1 -> v2:
  - Fix a concurrency issue, suggested by Simon Horman
  v1: https://lore.kernel.org/all/20250731134749.4090041-1-shaojijie@huawei.com/
---
 drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c b/drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c
index 503cfbfb4a8a..83cf75bf7a17 100644
--- a/drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c
+++ b/drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c
@@ -53,9 +53,11 @@ static int hbg_reset_prepare(struct hbg_priv *priv, enum hbg_reset_type type)
 {
 	int ret;
 
-	ASSERT_RTNL();
+	if (test_and_set_bit(HBG_NIC_STATE_RESETTING, &priv->state))
+		return -EBUSY;
 
 	if (netif_running(priv->netdev)) {
+		clear_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 		dev_warn(&priv->pdev->dev,
 			 "failed to reset because port is up\n");
 		return -EBUSY;
@@ -64,7 +66,6 @@ static int hbg_reset_prepare(struct hbg_priv *priv, enum hbg_reset_type type)
 	netif_device_detach(priv->netdev);
 
 	priv->reset_type = type;
-	set_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 	clear_bit(HBG_NIC_STATE_RESET_FAIL, &priv->state);
 	ret = hbg_hw_event_notify(priv, HBG_HW_EVENT_RESET);
 	if (ret) {
@@ -84,29 +85,26 @@ static int hbg_reset_done(struct hbg_priv *priv, enum hbg_reset_type type)
 	    type != priv->reset_type)
 		return 0;
 
-	ASSERT_RTNL();
-
-	clear_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 	ret = hbg_rebuild(priv);
 	if (ret) {
 		priv->stats.reset_fail_cnt++;
 		set_bit(HBG_NIC_STATE_RESET_FAIL, &priv->state);
+		clear_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 		dev_err(&priv->pdev->dev, "failed to rebuild after reset\n");
 		return ret;
 	}
 
 	netif_device_attach(priv->netdev);
+	clear_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 
 	dev_info(&priv->pdev->dev, "reset done\n");
 	return ret;
 }
 
-/* must be protected by rtnl lock */
 int hbg_reset(struct hbg_priv *priv)
 {
 	int ret;
 
-	ASSERT_RTNL();
 	ret = hbg_reset_prepare(priv, HBG_RESET_TYPE_FUNCTION);
 	if (ret)
 		return ret;
@@ -171,7 +169,6 @@ static void hbg_pci_err_reset_prepare(struct pci_dev *pdev)
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct hbg_priv *priv = netdev_priv(netdev);
 
-	rtnl_lock();
 	hbg_reset_prepare(priv, HBG_RESET_TYPE_FLR);
 }
 
@@ -181,7 +178,6 @@ static void hbg_pci_err_reset_done(struct pci_dev *pdev)
 	struct hbg_priv *priv = netdev_priv(netdev);
 
 	hbg_reset_done(priv, HBG_RESET_TYPE_FLR);
-	rtnl_unlock();
 }
 
 static const struct pci_error_handlers hbg_pci_err_handler = {
-- 
2.33.0
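
Note for readers: the patch replaces a cross-callback lock with a per-device
"resetting" bit that is claimed atomically. Below is a minimal user-space
sketch of that guard pattern, not the driver code. It assumes C11 atomics as a
stand-in for the kernel's test_and_set_bit()/clear_bit(); struct nic,
NIC_STATE_RESETTING, nic_reset_prepare() and nic_reset_done() are hypothetical
names invented for this illustration.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bit, standing in for HBG_NIC_STATE_RESETTING. */
#define NIC_STATE_RESETTING (1UL << 0)

/* Hypothetical device structure, standing in for struct hbg_priv. */
struct nic {
	atomic_ulong state;
};

/* Atomically set the bit and report whether it was already set. */
static bool nic_test_and_set(struct nic *nic, unsigned long bit)
{
	return atomic_fetch_or(&nic->state, bit) & bit;
}

/* Atomically clear the bit. */
static void nic_clear(struct nic *nic, unsigned long bit)
{
	atomic_fetch_and(&nic->state, ~bit);
}

/*
 * Claim the reset. Only one caller per device can win the bit, so a
 * second reset on the same device fails fast with -EBUSY instead of
 * blocking. Devices do not share any lock, so preparing two devices
 * back to back cannot deadlock.
 */
int nic_reset_prepare(struct nic *nic, bool port_up)
{
	if (nic_test_and_set(nic, NIC_STATE_RESETTING))
		return -EBUSY;

	if (port_up) {
		/* Give the claim back on every early-exit path. */
		nic_clear(nic, NIC_STATE_RESETTING);
		return -EBUSY;
	}

	return 0;
}

/* Release the claim only after the reset has fully finished. */
void nic_reset_done(struct nic *nic)
{
	nic_clear(nic, NIC_STATE_RESETTING);
}
```

With this shape, a second nic_reset_prepare() on the same device returns
-EBUSY until nic_reset_done() runs, while a prepare call on a different
device is unaffected because no state is shared between devices.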