From nobody Sat Feb 7 19:50:52 2026
From: Tariq Toukan
To: "David S. Miller" , Jakub Kicinski , Paolo Abeni , Eric Dumazet , "Andrew Lunn"
CC: Jason Gunthorpe , Leon Romanovsky , "Saeed Mahameed" , Tariq Toukan , "Richard Cochran" , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , , , , , Moshe Shemesh , Mark Bloch , Gal Pressman , Cosmin Ratiu
Subject: [PATCH net-next 1/5] IB/IPoIB: Enqueue separate work_structs for each flushed interface
Date: Wed, 21 May 2025 15:08:58 +0300
Message-ID: <1747829342-1018757-2-git-send-email-tariqt@nvidia.com>
X-Mailer: git-send-email 2.8.0
In-Reply-To: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com>
References: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Cosmin Ratiu

Previously, flushing a netdevice involved first flushing all child
devices from the flush task itself. That requires holding the lock that
protects the list for the entire duration of the flush.

This poses a problem when converting from vlan_rwsem to the netdev
instance lock (next patch), because holding the parent lock while trying
to acquire a child lock makes lockdep unhappy, rightfully.

Fix this by splitting a big flush task into individual flush tasks
(all are already created in their respective ipoib_dev_priv structs)
and defining a helper function to enqueue all of them while holding the
list lock.

In ipoib_set_mac, the function is not used and the task is enqueued
directly, because in the subsequent patches locking is changed and this
function may be called with the netdev instance lock held.

This is effectively a noop, the wq is single-threaded and ordered and
will execute the same flush operations in the same order as before.

Furthermore, there should be no new races because
ipoib_parent_unregister_pre() calls flush_workqueue() after stopping new
work generation to wait for pending work to complete. flush_workqueue()
waits for all currently enqueued work to finish before returning.
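As an informal illustration of the ordering argument above, here is a
small userspace model (not driver code). It assumes that a plain FIFO is
a fair stand-in for the ordered, single-threaded ipoib_workqueue. Every
name in it (model_intf, model_wq, flush_recursive, enqueue_all, drain,
flush_orders_match) is hypothetical and exists only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4
#define MAX_ITEMS 8

/* Hypothetical stand-in for ipoib_dev_priv: an id plus a child list. */
struct model_intf {
	int id;
	const struct model_intf *children[MAX_CHILDREN];
	size_t nchildren;
};

/* Records the order in which interfaces get flushed. */
struct flush_log {
	int ids[MAX_ITEMS];
	size_t len;
};

/* Assumed model of the ordered, single-threaded workqueue: a FIFO. */
struct model_wq {
	const struct model_intf *items[MAX_ITEMS];
	size_t head, tail;
};

static void log_flush(struct flush_log *log, const struct model_intf *intf)
{
	assert(log->len < MAX_ITEMS);
	log->ids[log->len++] = intf->id;
}

/* Old scheme: one work item flushes every child, then the parent. */
static void flush_recursive(const struct model_intf *intf,
			    struct flush_log *log)
{
	for (size_t i = 0; i < intf->nchildren; i++)
		flush_recursive(intf->children[i], log);
	log_flush(log, intf);
}

/* New scheme: queue one item per interface, children first. */
static void enqueue_all(struct model_wq *wq, const struct model_intf *intf)
{
	for (size_t i = 0; i < intf->nchildren; i++)
		enqueue_all(wq, intf->children[i]);
	assert(wq->tail < MAX_ITEMS);
	wq->items[wq->tail++] = intf;
}

/* The single worker drains the FIFO strictly in submission order. */
static void drain(struct model_wq *wq, struct flush_log *log)
{
	while (wq->head < wq->tail)
		log_flush(log, wq->items[wq->head++]);
}

/* Returns 1 when both schemes flush the interfaces in the same order. */
int flush_orders_match(void)
{
	struct model_intf c1 = { .id = 1 }, c2 = { .id = 2 };
	struct model_intf parent = {
		.id = 0, .children = { &c1, &c2 }, .nchildren = 2,
	};
	struct flush_log old_log = { .len = 0 }, new_log = { .len = 0 };
	struct model_wq wq = { .head = 0, .tail = 0 };

	flush_recursive(&parent, &old_log);
	enqueue_all(&wq, &parent);
	drain(&wq, &new_log);

	return old_log.len == new_log.len &&
	       memcmp(old_log.ids, new_log.ids,
		      old_log.len * sizeof(int)) == 0;
}
```

In this model the list lock would only need to be held around
enqueue_all(), not around the flushes themselves, which is the property
the next patch relies on.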

Signed-off-by: Cosmin Ratiu
Reviewed-by: Carolina Jubran
Reviewed-by: Leon Romanovsky
Signed-off-by: Tariq Toukan
---
 drivers/infiniband/ulp/ipoib/ipoib.h       |  2 +
 drivers/infiniband/ulp/ipoib/ipoib_ib.c    | 46 ++++++++++++++--------
 drivers/infiniband/ulp/ipoib/ipoib_main.c  | 10 ++++-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |  8 ++--
 4 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index abe0522b7df4..2e05e9c9317d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -512,6 +512,8 @@ int ipoib_intf_init(struct ib_device *hca, u32 port, const char *format,
 void ipoib_ib_dev_flush_light(struct work_struct *work);
 void ipoib_ib_dev_flush_normal(struct work_struct *work);
 void ipoib_ib_dev_flush_heavy(struct work_struct *work);
+void ipoib_queue_work(struct ipoib_dev_priv *priv,
+		      enum ipoib_flush_level level);
 void ipoib_ib_tx_timeout_work(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 5cde275daa94..e0e7f600097d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -1172,24 +1172,11 @@ static bool ipoib_dev_addr_changed_valid(struct ipoib_dev_priv *priv)
 }
 
 static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
-				enum ipoib_flush_level level,
-				int nesting)
+				 enum ipoib_flush_level level)
 {
-	struct ipoib_dev_priv *cpriv;
 	struct net_device *dev = priv->dev;
 	int result;
 
-	down_read_nested(&priv->vlan_rwsem, nesting);
-
-	/*
-	 * Flush any child interfaces too -- they might be up even if
-	 * the parent is down.
-	 */
-	list_for_each_entry(cpriv, &priv->child_intfs, list)
-		__ipoib_ib_dev_flush(cpriv, level, nesting + 1);
-
-	up_read(&priv->vlan_rwsem);
-
 	if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) &&
 	    level != IPOIB_FLUSH_HEAVY) {
 		/* Make sure the dev_addr is set even if not flushing */
@@ -1280,7 +1267,7 @@ void ipoib_ib_dev_flush_light(struct work_struct *work)
 	struct ipoib_dev_priv *priv =
 		container_of(work, struct ipoib_dev_priv, flush_light);
 
-	__ipoib_ib_dev_flush(priv, IPOIB_FLUSH_LIGHT, 0);
+	__ipoib_ib_dev_flush(priv, IPOIB_FLUSH_LIGHT);
 }
 
 void ipoib_ib_dev_flush_normal(struct work_struct *work)
@@ -1288,7 +1275,7 @@ void ipoib_ib_dev_flush_normal(struct work_struct *work)
 	struct ipoib_dev_priv *priv =
 		container_of(work, struct ipoib_dev_priv, flush_normal);
 
-	__ipoib_ib_dev_flush(priv, IPOIB_FLUSH_NORMAL, 0);
+	__ipoib_ib_dev_flush(priv, IPOIB_FLUSH_NORMAL);
 }
 
 void ipoib_ib_dev_flush_heavy(struct work_struct *work)
@@ -1297,10 +1284,35 @@ void ipoib_ib_dev_flush_heavy(struct work_struct *work)
 		container_of(work, struct ipoib_dev_priv, flush_heavy);
 
 	rtnl_lock();
-	__ipoib_ib_dev_flush(priv, IPOIB_FLUSH_HEAVY, 0);
+	__ipoib_ib_dev_flush(priv, IPOIB_FLUSH_HEAVY);
 	rtnl_unlock();
 }
 
+void ipoib_queue_work(struct ipoib_dev_priv *priv,
+		      enum ipoib_flush_level level)
+{
+	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+		struct ipoib_dev_priv *cpriv;
+
+		down_read(&priv->vlan_rwsem);
+		list_for_each_entry(cpriv, &priv->child_intfs, list)
+			ipoib_queue_work(cpriv, level);
+		up_read(&priv->vlan_rwsem);
+	}
+
+	switch (level) {
+	case IPOIB_FLUSH_LIGHT:
+		queue_work(ipoib_workqueue, &priv->flush_light);
+		break;
+	case IPOIB_FLUSH_NORMAL:
+		queue_work(ipoib_workqueue, &priv->flush_normal);
+		break;
+	case IPOIB_FLUSH_HEAVY:
+		queue_work(ipoib_workqueue, &priv->flush_heavy);
+		break;
+	}
+}
+
 void ipoib_ib_dev_cleanup(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 3b463db8ce39..55b1f3cbee17 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2415,6 +2415,14 @@ static int ipoib_set_mac(struct net_device *dev, void *addr)
 
 	set_base_guid(priv, (union ib_gid *)(ss->__data + 4));
 
+	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+		struct ipoib_dev_priv *cpriv;
+
+		down_read(&priv->vlan_rwsem);
+		list_for_each_entry(cpriv, &priv->child_intfs, list)
+			queue_work(ipoib_workqueue, &cpriv->flush_light);
+		up_read(&priv->vlan_rwsem);
+	}
 	queue_work(ipoib_workqueue, &priv->flush_light);
 
 	return 0;
@@ -2526,7 +2534,7 @@ static struct net_device *ipoib_add_port(const char *format,
 	ib_register_event_handler(&priv->event_handler);
 
 	/* call event handler to ensure pkey in sync */
-	queue_work(ipoib_workqueue, &priv->flush_heavy);
+	ipoib_queue_work(priv, IPOIB_FLUSH_HEAVY);
 
 	ndev->rtnl_link_ops = ipoib_get_link_ops();
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 368e5d77416d..86983080d28b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -280,15 +280,15 @@ void ipoib_event(struct ib_event_handler *handler,
 		  dev_name(&record->device->dev), record->element.port_num);
 
 	if (record->event == IB_EVENT_CLIENT_REREGISTER) {
-		queue_work(ipoib_workqueue, &priv->flush_light);
+		ipoib_queue_work(priv, IPOIB_FLUSH_LIGHT);
 	} else if (record->event == IB_EVENT_PORT_ERR ||
 		   record->event == IB_EVENT_PORT_ACTIVE ||
 		   record->event == IB_EVENT_LID_CHANGE) {
-		queue_work(ipoib_workqueue, &priv->flush_normal);
+		ipoib_queue_work(priv, IPOIB_FLUSH_NORMAL);
 	} else if (record->event == IB_EVENT_PKEY_CHANGE) {
-		queue_work(ipoib_workqueue, &priv->flush_heavy);
+		ipoib_queue_work(priv, IPOIB_FLUSH_HEAVY);
 	} else if (record->event == IB_EVENT_GID_CHANGE &&
 		   !test_bit(IPOIB_FLAG_DEV_ADDR_SET, &priv->flags)) {
-		queue_work(ipoib_workqueue, &priv->flush_light);
+		ipoib_queue_work(priv, IPOIB_FLUSH_LIGHT);
 	}
 }
-- 
2.31.1

From nobody Sat Feb 7 19:50:52 2026
From: Tariq Toukan
To: "David S. Miller" , Jakub Kicinski , Paolo Abeni , Eric Dumazet , "Andrew Lunn"
CC: Jason Gunthorpe , Leon Romanovsky , "Saeed Mahameed" , Tariq Toukan , "Richard Cochran" , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , , , , , Moshe Shemesh , Mark Bloch , Gal Pressman , Cosmin Ratiu
Subject: [PATCH net-next 2/5] IB/IPoIB: Replace vlan_rwsem with the netdev instance lock
Date: Wed, 21 May 2025 15:08:59 +0300
Message-ID: <1747829342-1018757-3-git-send-email-tariqt@nvidia.com>
X-Mailer: git-send-email 2.8.0
In-Reply-To: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com>
References: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Cosmin Ratiu

vlan_rwsem was added more than a decade ago to work around a deadlock
involving the original mutex being acquired twice, once from the wq.
Subsequent changes then tweaked it to partially protect access to
ipoib_dev_priv->child_intfs together with the RTNL. Flushing the wq
synchronously was also since then refactored to happen separately.

This semaphore unfortunately prevents updating ipoib to work with
devices that require the netdev lock, because of lock ordering issues
between RTNL, vlan_rwsem and the netdev instance locks of parent and
child devices.

To uncomplicate things, this commit replaces vlan_rwsem with the netdev
instance lock of the parent device. Both parent child_intfs list and the
children's list membership in it require holding the parent netdev
instance lock. All call paths were carefully reviewed and
no-longer-needed ASSERT_RTNL calls were dropped.

Some non-trivial changes:
- ipoib_match_gid_pkey_addr() now only acquires the instance lock and
  iterates through child_intfs for the first level of recursion (the
  parent), as it's not possible to have multiple levels of nested
  subinterfaces.
- ipoib_open() and ipoib_stop() schedule tasks on the global workqueue
  to open/stop child interfaces to avoid potentially acquiring nested
  netdev instance locks. To avoid the device going away between the task
  scheduling and execution, netdev_hold/netdev_put are used.
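The hold/put pairing around the deferred open/stop task can be pictured
with a small userspace model (again, not driver code). It assumes a bare
integer counter is an adequate stand-in for the reference taken by
netdev_hold() and released by netdev_put(), and it runs the "work"
directly instead of on a workqueue. The names model_dev,
model_updown_work, model_schedule_updown and model_run_updown are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device. */
struct model_dev {
	int refcnt;	/* models the references held on the device */
	bool up;	/* models the IFF_UP flag */
};

/* Mirrors the shape of the deferred request, minus the work_struct. */
struct model_updown_work {
	struct model_dev *dev;
	bool up;
};

/* Scheduling side: skip no-op requests, otherwise pin the device. */
struct model_updown_work *model_schedule_updown(struct model_dev *dev, bool up)
{
	struct model_updown_work *work;

	if (dev->up == up)
		return NULL;	/* already in the requested state */

	work = malloc(sizeof(*work));
	if (!work)
		return NULL;	/* best effort, as in the patch */

	work->dev = dev;
	work->up = up;
	dev->refcnt++;		/* plays the role of netdev_hold() */
	return work;
}

/* Execution side: apply the change, then drop the reference. */
void model_run_updown(struct model_updown_work *work)
{
	struct model_dev *dev = work->dev;

	if (dev->up != work->up)
		dev->up = work->up;	/* plays the role of dev_change_flags() */

	assert(dev->refcnt > 0);
	dev->refcnt--;		/* plays the role of netdev_put() */
	free(work);
}
```

The point of the model is only that the reference is taken before the
request leaves the caller and dropped after the request has run, so the
device cannot reach a zero count in between.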

Signed-off-by: Cosmin Ratiu
Reviewed-by: Carolina Jubran
Reviewed-by: Leon Romanovsky
Signed-off-by: Tariq Toukan
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  11 +--
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   |   4 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 110 ++++++++++++++--------
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c |  19 ++--
 4 files changed, 87 insertions(+), 57 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2e05e9c9317d..91f866e3fb8b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -329,14 +329,6 @@ struct ipoib_dev_priv {
 
 	unsigned long flags;
 
-	/*
-	 * This protects access to the child_intfs list.
-	 * To READ from child_intfs the RTNL or vlan_rwsem read side must be
-	 * held. To WRITE RTNL and the vlan_rwsem write side must be held (in
-	 * that order) This lock exists because we have a few contexts where
-	 * we need the child_intfs, but do not want to grab the RTNL.
-	 */
-	struct rw_semaphore vlan_rwsem;
 	struct mutex mcast_mutex;
 
 	struct rb_root path_tree;
@@ -399,6 +391,9 @@ struct ipoib_dev_priv {
 	struct ib_event_handler event_handler;
 
 	struct net_device *parent;
+	/* 'child_intfs' and 'list' membership of all child devices are
+	 * protected by the netdev instance lock of 'dev'.
+	 */
 	struct list_head child_intfs;
 	struct list_head list;
 	int child_type;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index e0e7f600097d..dc670b4a191b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -1294,10 +1294,10 @@ void ipoib_queue_work(struct ipoib_dev_priv *priv,
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
 		struct ipoib_dev_priv *cpriv;
 
-		down_read(&priv->vlan_rwsem);
+		netdev_lock(priv->dev);
 		list_for_each_entry(cpriv, &priv->child_intfs, list)
 			ipoib_queue_work(cpriv, level);
-		up_read(&priv->vlan_rwsem);
+		netdev_unlock(priv->dev);
 	}
 
 	switch (level) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 55b1f3cbee17..4879fd17e868 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -132,6 +132,52 @@ static int ipoib_netdev_event(struct notifier_block *this,
 }
 #endif
 
+struct ipoib_ifupdown_work {
+	struct work_struct work;
+	struct net_device *dev;
+	netdevice_tracker dev_tracker;
+	bool up;
+};
+
+static void ipoib_ifupdown_task(struct work_struct *work)
+{
+	struct ipoib_ifupdown_work *pwork =
+		container_of(work, struct ipoib_ifupdown_work, work);
+	struct net_device *dev = pwork->dev;
+	unsigned int flags;
+
+	rtnl_lock();
+	flags = dev->flags;
+	if (pwork->up)
+		flags |= IFF_UP;
+	else
+		flags &= ~IFF_UP;
+
+	if (dev->flags != flags)
+		dev_change_flags(dev, flags, NULL);
+	rtnl_unlock();
+	netdev_put(dev, &pwork->dev_tracker);
+	kfree(pwork);
+}
+
+static void ipoib_schedule_ifupdown_task(struct net_device *dev, bool up)
+{
+	struct ipoib_ifupdown_work *work;
+
+	if ((up && (dev->flags & IFF_UP)) ||
+	    (!up && !(dev->flags & IFF_UP)))
+		return;
+
+	work = kmalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return;
+	work->dev = dev;
+	netdev_hold(dev, &work->dev_tracker, GFP_KERNEL);
+	work->up = up;
+	INIT_WORK(&work->work, ipoib_ifupdown_task);
+	queue_work(ipoib_workqueue, &work->work);
+}
+
 int ipoib_open(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
@@ -154,17 +200,10 @@ int ipoib_open(struct net_device *dev)
 		struct ipoib_dev_priv *cpriv;
 
 		/* Bring up any child interfaces too */
-		down_read(&priv->vlan_rwsem);
-		list_for_each_entry(cpriv, &priv->child_intfs, list) {
-			int flags;
-
-			flags = cpriv->dev->flags;
-			if (flags & IFF_UP)
-				continue;
-
-			dev_change_flags(cpriv->dev, flags | IFF_UP, NULL);
-		}
-		up_read(&priv->vlan_rwsem);
+		netdev_lock(dev);
+		list_for_each_entry(cpriv, &priv->child_intfs, list)
+			ipoib_schedule_ifupdown_task(cpriv->dev, true);
+		netdev_unlock(dev);
 	} else if (priv->parent) {
 		struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
 
@@ -199,17 +238,10 @@ static int ipoib_stop(struct net_device *dev)
 		struct ipoib_dev_priv *cpriv;
 
 		/* Bring down any child interfaces too */
-		down_read(&priv->vlan_rwsem);
-		list_for_each_entry(cpriv, &priv->child_intfs, list) {
-			int flags;
-
-			flags = cpriv->dev->flags;
-			if (!(flags & IFF_UP))
-				continue;
-
-			dev_change_flags(cpriv->dev, flags & ~IFF_UP, NULL);
-		}
-		up_read(&priv->vlan_rwsem);
+		netdev_lock(dev);
+		list_for_each_entry(cpriv, &priv->child_intfs, list)
+			ipoib_schedule_ifupdown_task(cpriv->dev, false);
+		netdev_unlock(dev);
 	}
 
 	return 0;
@@ -426,17 +458,20 @@ static int ipoib_match_gid_pkey_addr(struct ipoib_dev_priv *priv,
 		}
 	}
 
+	if (test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
+		return matches;
+
 	/* Check child interfaces */
-	down_read_nested(&priv->vlan_rwsem, nesting);
+	netdev_lock(priv->dev);
 	list_for_each_entry(child_priv, &priv->child_intfs, list) {
 		matches += ipoib_match_gid_pkey_addr(child_priv, gid,
-						    pkey_index, addr,
-						    nesting + 1,
-						    found_net_dev);
+						     pkey_index, addr,
+						     nesting + 1,
+						     found_net_dev);
 		if (matches > 1)
 			break;
 	}
-	up_read(&priv->vlan_rwsem);
+	netdev_unlock(priv->dev);
 
 	return matches;
 }
@@ -1992,9 +2027,9 @@ static int ipoib_ndo_init(struct net_device *ndev)
 
 		dev_hold(priv->parent);
 
-		down_write(&ppriv->vlan_rwsem);
+		netdev_lock(priv->parent);
 		list_add_tail(&priv->list, &ppriv->child_intfs);
-		up_write(&ppriv->vlan_rwsem);
+		netdev_unlock(priv->parent);
 	}
 
 	return 0;
@@ -2004,8 +2039,6 @@ static void ipoib_ndo_uninit(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
-	ASSERT_RTNL();
-
 	/*
 	 * ipoib_remove_one guarantees the children are removed before the
 	 * parent, and that is the only place where a parent can be removed.
@@ -2015,9 +2048,9 @@ static void ipoib_ndo_uninit(struct net_device *dev)
 	if (priv->parent) {
 		struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
 
-		down_write(&ppriv->vlan_rwsem);
+		netdev_lock(ppriv->dev);
 		list_del(&priv->list);
-		up_write(&ppriv->vlan_rwsem);
+		netdev_unlock(ppriv->dev);
 	}
 
 	ipoib_neigh_hash_uninit(dev);
@@ -2167,7 +2200,6 @@ static void ipoib_build_priv(struct net_device *dev)
 
 	priv->dev = dev;
 	spin_lock_init(&priv->lock);
-	init_rwsem(&priv->vlan_rwsem);
 	mutex_init(&priv->mcast_mutex);
 
 	INIT_LIST_HEAD(&priv->path_list);
@@ -2372,10 +2404,10 @@ static void set_base_guid(struct ipoib_dev_priv *priv, union ib_gid *gid)
 	netif_addr_unlock_bh(netdev);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
-		down_read(&priv->vlan_rwsem);
+		netdev_lock(priv->dev);
 		list_for_each_entry(child_priv, &priv->child_intfs, list)
 			set_base_guid(child_priv, gid);
-		up_read(&priv->vlan_rwsem);
+		netdev_unlock(priv->dev);
 	}
 }
 
@@ -2418,10 +2450,10 @@ static int ipoib_set_mac(struct net_device *dev, void *addr)
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
 		struct ipoib_dev_priv *cpriv;
 
-		down_read(&priv->vlan_rwsem);
+		netdev_lock(dev);
 		list_for_each_entry(cpriv, &priv->child_intfs, list)
 			queue_work(ipoib_workqueue, &cpriv->flush_light);
-		up_read(&priv->vlan_rwsem);
+		netdev_unlock(dev);
 	}
 	queue_work(ipoib_workqueue, &priv->flush_light);
 
@@ -2632,9 +2664,11 @@ static void ipoib_remove_one(struct ib_device *device, void *client_data)
 
 		rtnl_lock();
 
+		netdev_lock(priv->dev);
 		list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs,
 					 list)
 			unregister_netdevice_queue(cpriv->dev, &head);
+		netdev_unlock(priv->dev);
 		unregister_netdevice_queue(priv->dev, &head);
 		unregister_netdevice_many(&head);
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 562df2b3ef18..243e8f555eca 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -53,8 +53,7 @@ static bool is_child_unique(struct ipoib_dev_priv *ppriv,
 			    struct ipoib_dev_priv *priv)
 {
 	struct ipoib_dev_priv *tpriv;
-
-	ASSERT_RTNL();
+	bool result = true;
 
 	/*
 	 * Since the legacy sysfs interface uses pkey for deletion it cannot
@@ -73,13 +72,17 @@ static bool is_child_unique(struct ipoib_dev_priv *ppriv,
 	if (ppriv->pkey == priv->pkey)
 		return false;
 
+	netdev_lock(ppriv->dev);
 	list_for_each_entry(tpriv, &ppriv->child_intfs, list) {
 		if (tpriv->pkey == priv->pkey &&
-		    tpriv->child_type == IPOIB_LEGACY_CHILD)
-			return false;
+		    tpriv->child_type == IPOIB_LEGACY_CHILD) {
+			result = false;
+			break;
+		}
 	}
+	netdev_unlock(ppriv->dev);
 
-	return true;
+	return result;
 }
 
 /*
@@ -98,8 +101,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 	int result;
 	struct rdma_netdev *rn = netdev_priv(ndev);
 
-	ASSERT_RTNL();
-
 	/*
 	 * We do not need to touch priv if register_netdevice fails, so just
 	 * always use this flow.
@@ -267,6 +268,7 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 	ppriv = ipoib_priv(pdev);
 
 	rc = -ENODEV;
+	netdev_lock(ppriv->dev);
 	list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
 		if (priv->pkey == pkey &&
 		    priv->child_type == IPOIB_LEGACY_CHILD) {
@@ -278,9 +280,7 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 				goto out;
 			}
 
-			down_write(&ppriv->vlan_rwsem);
 			list_del_init(&priv->list);
-			up_write(&ppriv->vlan_rwsem);
 			work->dev = priv->dev;
 			INIT_WORK(&work->work, ipoib_vlan_delete_task);
 			queue_work(ipoib_workqueue, &work->work);
@@ -291,6 +291,7 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 	}
 
 out:
+	netdev_unlock(ppriv->dev);
 	rtnl_unlock();
 
 	return rc;
-- 
2.31.1

From nobody Sat Feb 7 19:50:52 2026 Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on2052.outbound.protection.outlook.com [40.107.94.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 71C4A268690; Wed, 21 May 2025 12:10:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.94.52 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747829428; cv=fail; b=mYux6/50mbpN/5rBsJRkU+rCFYgXv6HcOwyuGMMlrNRQTTCSi21SjGK1x170FdScRp2XZe3G9at/YJEBZmbMk+HBzxLbyXwW+VfbwjR/CZUvuIROJTrDMBLja5cNJIhztJuQlSLuNPewqDEWf50arHKaO+isZroKY8JtvODeI4g= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747829428; c=relaxed/simple; bh=3ncon6ta0gWsqdvYlqPU6POpYbAdmRb8DIQzRfdfGKU=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ijbyxzZs5LKhyauhH+PIB27Vjkj/4VYqKQthYGlzY4gOHPuJP74zQamdetGQfy/xSYuHzm4/O3DtNY3zIRu+L3oVlLhq6MUX8mlVcd8H2VDG8WFtdWVF7p15MHfT7p7ptsnrKM37q/H7P851ri+MoLseou1a8AcEUQuw4YUYULc= ARC-Authentication-Results: i=2;
smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=grNf7JRq; arc=fail smtp.client-ip=40.107.94.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="grNf7JRq" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=S2/PZsF/4N4w5Huom+sTYDCXArlcF4kBfwAF/IU3Kr856696DdYorlumpobbo6X/5fzuLRvt2gvV4jsoa271c2v4ghooo3G/rrS6cU4hf6Ojl5asGFuU7nbVrsWYLKf1GhHniyMMt9fcF5LfA94ClIti8UjI3YSvCFuQSuaCd6hvcRZ3tGPqcpTgFm7L/JHCjPel5uBIWySpjVGjuU0wPXgWnzoffErCZ8M2/cILgvGN0BLYiQ9Ty+AHES663fas5I6yqRZ0OU6Drf2DnA7jMS70Z2mwwIBZ+wMSz38iqrOpObxGaQYQmCYBw78CVs5b9hDFWXQgoW9L7vxvVcfoug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=590wC9aOMtp9de48HYbocZtv6/4hQ4j3SJMHxdKCDbY=; b=ZewxkIVi+C0put1IFEOaUtd6X1Tya248xcLYc4ntOH0CgQRCLsTUlOpYCC/qQSM8LlatAD12/T2vezziHo7CDpcUa2JVWlkTsN1DP3OoJCqrxfhMnL33kDIcr+jgWWMLTJy/ZiBhyb1hEUQ4/V4gMRO8PUI9EQlUqbUfQl4jU3nUawreMHN4VUe9uVUah9ZWqyIHttIzpxRRzu3t9vMY161fR1disVrctz8BcdNUaILj/C3NmP2FbYNsVS4uqjoloaJ2lIwnEfc5IsguqmWxmHlUxGGl6q00Nbz12B+kyVNIlDdPcly77+01gR2ItTpGQk2FuZztD8IyIWQ5/I3o9A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.160) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; 
s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=590wC9aOMtp9de48HYbocZtv6/4hQ4j3SJMHxdKCDbY=; b=grNf7JRqsH/hBd8vgziUwqtQfMXYtWn9GX3BxLMr8aZZaPxksXnVkeZkGaQSWi/MMBsd8sJYrxuHO+eXFXkXW3+xg2foukOmXtSl2sI61+uYTcNsB3oi0u58GjpOFv/jgBtSI9zs1VwsgM9bx23ro9+Qcq9Ynz07DvU/X9ZVJMTFiYnLn0/R4BIh+VkP/nxpiUuqp6sqYScE4fsrfjMo2sXO9aFqUb/amypDAgbJ7UiVfkXiXWwP2O00AmdIo2x7zTVLHY4M3QS8GqBEnm+z9lLee1Nfk7YcJedXuPDUBVnrK2BXdUPmsRDQPPNfdyNVIJsxgmIzvpm040t5ualuQg== Received: from PH7PR17CA0044.namprd17.prod.outlook.com (2603:10b6:510:323::15) by CY3PR12MB9630.namprd12.prod.outlook.com (2603:10b6:930:101::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8746.30; Wed, 21 May 2025 12:10:20 +0000 Received: from SN1PEPF00036F43.namprd05.prod.outlook.com (2603:10b6:510:323:cafe::c2) by PH7PR17CA0044.outlook.office365.com (2603:10b6:510:323::15) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.8746.34 via Frontend Transport; Wed, 21 May 2025 12:10:19 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.160) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.160 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.160) by SN1PEPF00036F43.mail.protection.outlook.com (10.167.248.27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8769.18 via Frontend Transport; Wed, 21 May 2025 12:10:19 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Wed, 21 May 2025 05:10:01 -0700 Received: from 
rnnvmail202.nvidia.com (10.129.68.7) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.14; Wed, 21 May 2025 05:10:00 -0700 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.129.68.7) with Microsoft SMTP Server id 15.2.1544.14 via Frontend Transport; Wed, 21 May 2025 05:09:55 -0700 From: Tariq Toukan To: "David S. Miller" , Jakub Kicinski , Paolo Abeni , Eric Dumazet , "Andrew Lunn" CC: Jason Gunthorpe , Leon Romanovsky , "Saeed Mahameed" , Tariq Toukan , "Richard Cochran" , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , , , , , Moshe Shemesh , Mark Bloch , Gal Pressman , Cosmin Ratiu Subject: [PATCH net-next 3/5] IB/IPoIB: Allow using netdevs that require the instance lock Date: Wed, 21 May 2025 15:09:00 +0300 Message-ID: <1747829342-1018757-4-git-send-email-tariqt@nvidia.com> X-Mailer: git-send-email 2.8.0 In-Reply-To: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com> References: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-NV-OnPremToCloud: AnonymousSubmission X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: SN1PEPF00036F43:EE_|CY3PR12MB9630:EE_ X-MS-Office365-Filtering-Correlation-Id: f989acce-6e0b-4cd6-2fb8-08dd98607483 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|376014|36860700013|1800799024|7416014|82310400026; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?hYo7lwr/YEnTSUaNY8tuOJIfetF6OCinOeciaJQAw662oYQFVO8m693/jiK5?= =?us-ascii?Q?Xup5yM4Uc2iYdf+cHSBdfyMN/fJ5dFONfcPnBVsC6Io3oJ7rtr3ZLOMvvk63?= =?us-ascii?Q?s4kUSctlqWNvfU2KW5TQ0envSvB1s2LpAnv0HW2BKZVvYuGjNkGCknPezVBf?= =?us-ascii?Q?kW7mvQHXtm7DRIoxApsnWg1eYfi6RSGlYQo16sgntC2vM3RyQk7n04OQIqQ+?= 
=?us-ascii?Q?DQMaOJuISJRIZD352mxnZkXbQpCvp58iUbeAm/UG5Y4KKVgk0KLFoUlbbyfY?= =?us-ascii?Q?ZmmnYMZQ8ZvrGBlcPoDmyTObd1Y2zdjTBZOexgH1wjqHUfbH1HZmV2+W4Wcy?= =?us-ascii?Q?xbjL0EE6i17/Wn2WVnPCFBTsEnVYknjSyZfRKw5mKONuiQ9aA242ZBpQygKv?= =?us-ascii?Q?wjV+RXVTIvl0ANQK9E9eys2EK+ZaQtul6B6CZw3LpPIPb4uwG7IbRKVQV5H2?= =?us-ascii?Q?dHl6NgKQq0UfNw7cU5hf908gsRbNoE48bpIUPb7UYPMWk7mGsHHjUVTwC5WM?= =?us-ascii?Q?Jmh6LnHQWgmNIerGnZAkavesXaE3qi2nIRNcpTaQM/EFsmWNZEV4lS+CzJSm?= =?us-ascii?Q?aw6E/3VsqJaiEGYfpkH96VDumD/dHFReVZrT8K68fLINCy4YKuq4amvD5/Tv?= =?us-ascii?Q?jScvFC0FhujNoSBrn/NCcWaUo/esPe5l4X73EE0z1kLjYNR9NvXJO3ZAWt93?= =?us-ascii?Q?Nicrh4Unk+6AbToelAZIQEdn1pmMnu1XWntPNbiDq6m9SKzM/KY76tmGwzHe?= =?us-ascii?Q?YoQT6ul5DrBOdb2jxgTQyji+zB189vjv7j8QBZIj0SD4ATRnTkjpWGsbcizq?= =?us-ascii?Q?6bFo1CqVZBMkZNYJAG8bplT0psZSXJsKXYrygr6dYFsN4YZYXvwgL2aXX4lj?= =?us-ascii?Q?0F+6jelaTNp63j7OXtqC8Fn+k179EWMCiGKCCkYCcKUIBh/7pFAaDUXauh3K?= =?us-ascii?Q?cWx6zXcMoGVbCN/y1tXnjVbS45tDeCowi8m/CW9z5R7tOhyxWlVeGtSmcinz?= =?us-ascii?Q?0B425XldlhUhfn187OkMXeaAJcd6y3zZ7ZjZfRFTTlE1FNPAw7cUrHoMtc+2?= =?us-ascii?Q?rxiZtKTK27aVORdvo4uRwu+i6aNr16M5y9qXlhPpQBxhUHYKP6Pl8o6sI3UW?= =?us-ascii?Q?dlIStu2NethrzxxZ6bbhREiWIExC3TOr0LfbEHMJivcdpeFGzwRJ4g4XiuVD?= =?us-ascii?Q?2yIwqulfHl/bx08bO801x+51RGVK3Hh0bdqIOO5YjJDSAbdMF4JrfK62Us9h?= =?us-ascii?Q?Xe3ML3TGr8sTvOc+/aTD07xkXD/5APp3DwDfijib/S3UjgZVxfQYGtkI1U3M?= =?us-ascii?Q?7Lq6Evls9hqIRSTxbfH+MT1mbLuQQeE10It6/Q9jCs50Ra6m/qZSTJ7jJwvO?= =?us-ascii?Q?j1k2AewwhJGAln2SBuow0WeyQapCloENrgBfIhMtaTszLAlv+6jMU9s4VJCs?= =?us-ascii?Q?UsWCwjRBXqELURc1AUeyBuMnQIMpPg6DlVegfnx8ha7WmHOD1YSLvGAx7yqk?= =?us-ascii?Q?WRkNeGnNLtCLSmX7hI9Lyd9mTwLb6fbew565?= X-Forefront-Antispam-Report: CIP:216.228.117.160;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge1.nvidia.com;CAT:NONE;SFS:(13230040)(376014)(36860700013)(1800799024)(7416014)(82310400026);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 21 May 2025 12:10:19.4125 (UTC) 
X-MS-Exchange-CrossTenant-Network-Message-Id: f989acce-6e0b-4cd6-2fb8-08dd98607483 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.160];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: SN1PEPF00036F43.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY3PR12MB9630 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

From: Cosmin Ratiu

With vlan_rwsem removed by the previous patch, allowing ipoib to work
with netdevs that require the instance lock is an incremental step.

In several places, netdev_lock() is changed to
netdev_lock_ops_to_full(), which avoids acquiring the lock a second
time when the netdev is already locked.

In ipoib_ib_tx_timeout_work() and, for HEAVY flushes, in
__ipoib_ib_dev_flush(), the netdev lock is now acquired and released
via netdev_lock_ops()/netdev_unlock_ops(). This is needed because these
functions end up calling .ndo_stop()/.ndo_open() on subinterfaces, and
the device may expect the netdev instance lock to be held.

ipoib_set_mode() now explicitly acquires the ops lock while manipulating
the features, MTU and TX queues.

Finally, ipoib_napi_enable()/ipoib_napi_disable() now use the *_locked
variants of napi_enable()/napi_disable() and acquire the netdev lock
themselves when needed, depending on the dev they operate on.
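
For readers unfamiliar with the "ops to full" helpers, the following is
a minimal userspace sketch of the conditional-locking idea described
above. It is not the kernel implementation: struct toy_netdev, the
toy_* functions and the lock_depth counter are hypothetical stand-ins,
and the sketch assumes that a device which requires the ops lock is
already locked by its caller when an ndo callback runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct net_device; not the kernel definition. */
struct toy_netdev {
	bool needs_ops_lock;	/* assumption: core locks such devices for ndo calls */
	int lock_depth;		/* 1 while the modelled instance lock is held */
};

static void toy_lock(struct toy_netdev *dev)
{
	assert(dev->lock_depth == 0);	/* the modelled lock is not recursive */
	dev->lock_depth = 1;
}

static void toy_unlock(struct toy_netdev *dev)
{
	assert(dev->lock_depth == 1);
	dev->lock_depth = 0;
}

/* Take the lock only if the caller is not expected to hold it already. */
static void toy_lock_ops_to_full(struct toy_netdev *dev)
{
	if (dev->needs_ops_lock)
		assert(dev->lock_depth == 1);	/* already held, do not relock */
	else
		toy_lock(dev);
}

/* Mirror image: release only what toy_lock_ops_to_full() acquired. */
static void toy_unlock_full_to_ops(struct toy_netdev *dev)
{
	if (dev->needs_ops_lock)
		assert(dev->lock_depth == 1);	/* leave it held for the caller */
	else
		toy_unlock(dev);
}

/* Models a callback such as ipoib_open() walking the child list. */
static int toy_walk_children(struct toy_netdev *dev)
{
	int depth_inside;

	toy_lock_ops_to_full(dev);
	depth_inside = dev->lock_depth;	/* lock is held here on both paths */
	toy_unlock_full_to_ops(dev);
	return depth_inside;
}
```

In this model the list walk always runs with the lock held exactly once,
whether the device was locked by its caller or by the helper, which is
the property the conversion from netdev_lock() relies on.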

Signed-off-by: Cosmin Ratiu
Reviewed-by: Carolina Jubran
Reviewed-by: Leon Romanovsky
Signed-off-by: Tariq Toukan
---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   | 19 +++++++++++-----
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 27 ++++++++++++++---------
 2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index dc670b4a191b..10b0dbda6cd5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -40,6 +40,7 @@
 
 #include
 #include
+#include
 #include
 
 #include "ipoib.h"
@@ -781,16 +782,20 @@ static void ipoib_napi_enable(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
-	napi_enable(&priv->recv_napi);
-	napi_enable(&priv->send_napi);
+	netdev_lock_ops_to_full(dev);
+	napi_enable_locked(&priv->recv_napi);
+	napi_enable_locked(&priv->send_napi);
+	netdev_unlock_full_to_ops(dev);
 }
 
 static void ipoib_napi_disable(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
-	napi_disable(&priv->recv_napi);
-	napi_disable(&priv->send_napi);
+	netdev_lock_ops_to_full(dev);
+	napi_disable_locked(&priv->recv_napi);
+	napi_disable_locked(&priv->send_napi);
+	netdev_unlock_full_to_ops(dev);
 }
 
 int ipoib_ib_dev_stop_default(struct net_device *dev)
@@ -1240,10 +1245,14 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 		ipoib_ib_dev_down(dev);
 
 	if (level == IPOIB_FLUSH_HEAVY) {
+		netdev_lock_ops(dev);
 		if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
 			ipoib_ib_dev_stop(dev);
 
-		if (ipoib_ib_dev_open(dev))
+		result = ipoib_ib_dev_open(dev);
+		netdev_unlock_ops(dev);
+
+		if (result)
 			return;
 
 		if (netif_queue_stopped(dev))
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 4879fd17e868..f2f5465f2a90 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -200,10 +201,10 @@ int ipoib_open(struct net_device *dev)
 		struct ipoib_dev_priv *cpriv;
 
 		/* Bring up any child interfaces too */
-		netdev_lock(dev);
+		netdev_lock_ops_to_full(dev);
 		list_for_each_entry(cpriv, &priv->child_intfs, list)
 			ipoib_schedule_ifupdown_task(cpriv->dev, true);
-		netdev_unlock(dev);
+		netdev_unlock_full_to_ops(dev);
 	} else if (priv->parent) {
 		struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
 
@@ -238,10 +239,10 @@ static int ipoib_stop(struct net_device *dev)
 		struct ipoib_dev_priv *cpriv;
 
 		/* Bring down any child interfaces too */
-		netdev_lock(dev);
+		netdev_lock_ops_to_full(dev);
 		list_for_each_entry(cpriv, &priv->child_intfs, list)
 			ipoib_schedule_ifupdown_task(cpriv->dev, false);
-		netdev_unlock(dev);
+		netdev_unlock_full_to_ops(dev);
 	}
 
 	return 0;
@@ -566,9 +567,11 @@ int ipoib_set_mode(struct net_device *dev, const char *buf)
 		set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
 		ipoib_warn(priv, "enabling connected mode "
 			   "will cause multicast packet drops\n");
+		netdev_lock_ops(dev);
 		netdev_update_features(dev);
-		dev_set_mtu(dev, ipoib_cm_max_mtu(dev));
+		netif_set_mtu(dev, ipoib_cm_max_mtu(dev));
 		netif_set_real_num_tx_queues(dev, 1);
+		netdev_unlock_ops(dev);
 		rtnl_unlock();
 		priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM;
 
@@ -578,9 +581,11 @@ int ipoib_set_mode(struct net_device *dev, const char *buf)
 
 	if (!strcmp(buf, "datagram\n")) {
 		clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+		netdev_lock_ops(dev);
 		netdev_update_features(dev);
-		dev_set_mtu(dev, min(priv->mcast_mtu, dev->mtu));
+		netif_set_mtu(dev, min(priv->mcast_mtu, dev->mtu));
 		netif_set_real_num_tx_queues(dev, dev->num_tx_queues);
+		netdev_unlock_ops(dev);
 		rtnl_unlock();
 		ipoib_flush_paths(dev);
 		return (!rtnl_trylock()) ? -EBUSY : 0;
@@ -1247,6 +1252,7 @@ void ipoib_ib_tx_timeout_work(struct work_struct *work)
 	int err;
 
 	rtnl_lock();
+	netdev_lock_ops(priv->dev);
 
 	if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 		goto unlock;
@@ -1261,6 +1267,7 @@ void ipoib_ib_tx_timeout_work(struct work_struct *work)
 
 	netif_tx_wake_all_queues(priv->dev);
 unlock:
+	netdev_unlock_ops(priv->dev);
 	rtnl_unlock();
 
 }
@@ -2404,10 +2411,10 @@ static void set_base_guid(struct ipoib_dev_priv *priv, union ib_gid *gid)
 	netif_addr_unlock_bh(netdev);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
-		netdev_lock(priv->dev);
+		netdev_lock_ops_to_full(priv->dev);
 		list_for_each_entry(child_priv, &priv->child_intfs, list)
 			set_base_guid(child_priv, gid);
-		netdev_unlock(priv->dev);
+		netdev_unlock_full_to_ops(priv->dev);
 	}
 }
 
@@ -2450,10 +2457,10 @@ static int ipoib_set_mac(struct net_device *dev, void *addr)
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
 		struct ipoib_dev_priv *cpriv;
 
-		netdev_lock(dev);
+		netdev_lock_ops_to_full(dev);
 		list_for_each_entry(cpriv, &priv->child_intfs, list)
 			queue_work(ipoib_workqueue, &cpriv->flush_light);
-		netdev_unlock(dev);
+		netdev_unlock_full_to_ops(dev);
 	}
 	queue_work(ipoib_workqueue, &priv->flush_light);
 
-- 
2.31.1

From nobody Sat Feb 7 19:50:52 2026 Received: from NAM04-DM6-obe.outbound.protection.outlook.com (mail-dm6nam04on2084.outbound.protection.outlook.com [40.107.102.84]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6AE55267F44; Wed, 21 May 2025 12:10:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.102.84 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747829426; cv=fail;
b=YElhNY+vyFXb5zYlJNm/+jdiNW+RIrDou/0Ng56mTpt4SYRwr/BFdui9UF55QaEsSU832FZJ/4pA5A4eTFvWmI++W2uOPao8KBYP4kxrvwvsIo5iAmjdwuh4cPe4obqq3nVyur7ULKpY1RBwK9ZoFK1aXUVopQHIdkoXVPripVc= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747829426; c=relaxed/simple; bh=kSxuhH6CcBB1pGIlCR9bagjHmIGUG2t2VaCaszXtcjk=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=MXxB3xlt2IOazn3R8haKmbNhYtuJ9qKASvhp7wvoAcvK1YPmdR12RKHbvD4lEBGzrnGrB9TEnkDxIf+/d6I+7lkorGIV5T/jLssGdRPJPDUYqtoAQsjFuIn0BiNcpX7FzcOCGfiW0eUlEnwUh6/uJvk4LRRhsdpZUm8Vcr7Qe68= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=hKETg3Pu; arc=fail smtp.client-ip=40.107.102.84 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="hKETg3Pu" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=WcDMXhXk+uQLy3ShuEUXWtYEMjCYxuXzFZo8uyuAGaeU9qruPm9xzlYqpRPJiM+51yCSausLpq3vCk2baYViZ5d+ZyCwIfKEqujtC2EJajwm450ZWE+Jcfe/r+BUw61t7MLL2cPZELbzN2w+E02HMUItT0fCt1GvvBtbBN7nN1Z6SJ3Q73HwtSUTCbpLuIHbO6o5bC/qav9zdQ3dNu+jUEW0sptyHyNcYbH8LGrFOHK4z4NOoa8OzImjxx2swIZIaRa9XlodA43W5BIwqxVEzFv+B1lu4hKTLgiaKZ51Us5dUGjoxS9tUbcO1pe7rjaIVLhlMVrGEOeUs15OVTKb+g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=G+7Vej9OqVluxU9s5Dxz8U9W9HlMyYxsBYhdPtNATiY=; 
b=yOZ7Plq33Xf1+zUscqiKf5vHxnYU8ucET+/yoyrcPdT/NDAfNrXcKXYSKeFIblZyJknK+BMxiz7M3PuVeDjG2ufg2usxDfV/8DMkNvEnnGUEvRHZKSOHW+QnkAAO1eizYO6QeMadxJ5Eaxrng2nQjaPTEq+IQUMBMrp4c4g3xli7dlAceP92QCNgmeLqiqnVx3MyM4c/Ijm7GcIVA2MYIiH55Vypq9BvvMnqMqjcQ9tWNulPDworTCovjttwvHIgXYV4m+6BiZ6IgzeGVasbB6BGRysXqNnH29/FHANnQlkt5ztqNXmJ8eXmxbrd6j+WkUGmY6/82LSH2hl7VbmgOg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=G+7Vej9OqVluxU9s5Dxz8U9W9HlMyYxsBYhdPtNATiY=; b=hKETg3PuhaqH0tIsDJHMc5Y3WsQfMJSkvwgo+Lwwsa//TvBKBJacUYm/eX53m381tz7usbNmlLYV6fryXuKPHtnyvNv039hM34UsL12qMvrc8NNTrVkvyVApeOIIxltBUmzHM5QabyP8sQ7caCO6nVYCSfeJgyb95SzQYZVMy3e/O1RAqCVmnTjqdu+Ty1HWyuDJfB4P39XNrWqgbcSEIiw1Pw+1BK9CBDUUrT8rtxxtn01XPileRzB3vXFfm9P+wMVbzvdoxonV+QGniQ+sn0XJzVbilbJbFIXXGaPg7elXaWVYszkl1dex8u7ZyvN+yPUHCEhWBY64g7SQ6I7j1w== Received: from BYAPR07CA0022.namprd07.prod.outlook.com (2603:10b6:a02:bc::35) by SN7PR12MB7276.namprd12.prod.outlook.com (2603:10b6:806:2af::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8722.31; Wed, 21 May 2025 12:10:21 +0000 Received: from SJ5PEPF00000208.namprd05.prod.outlook.com (2603:10b6:a02:bc:cafe::fc) by BYAPR07CA0022.outlook.office365.com (2603:10b6:a02:bc::35) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.8769.19 via Frontend Transport; Wed, 21 May 2025 12:10:21 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass 
(protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by SJ5PEPF00000208.mail.protection.outlook.com (10.167.244.41) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8769.18 via Frontend Transport; Wed, 21 May 2025 12:10:20 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Wed, 21 May 2025 05:10:06 -0700 Received: from rnnvmail202.nvidia.com (10.129.68.7) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.14; Wed, 21 May 2025 05:10:06 -0700 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.129.68.7) with Microsoft SMTP Server id 15.2.1544.14 via Frontend Transport; Wed, 21 May 2025 05:10:01 -0700 From: Tariq Toukan To: "David S. 
Miller" , Jakub Kicinski , Paolo Abeni , Eric Dumazet , "Andrew Lunn" CC: Jason Gunthorpe , Leon Romanovsky , "Saeed Mahameed" , Tariq Toukan , "Richard Cochran" , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , , , , , Moshe Shemesh , Mark Bloch , Gal Pressman , Cosmin Ratiu Subject: [PATCH net-next 4/5] net/mlx5e: Don't drop RTNL during firmware flash Date: Wed, 21 May 2025 15:09:01 +0300 Message-ID: <1747829342-1018757-5-git-send-email-tariqt@nvidia.com> X-Mailer: git-send-email 2.8.0 In-Reply-To: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com> References: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-NV-OnPremToCloud: AnonymousSubmission X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: SJ5PEPF00000208:EE_|SN7PR12MB7276:EE_ X-MS-Office365-Filtering-Correlation-Id: 09b187d7-e54d-431b-6570-08dd98607563 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|82310400026|36860700013|7416014|376014|1800799024; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?tHYUKogkKoZPgjf0bZq7H8h4TxgComi16AXskmDdoRhPyhZM+HhQATOj1BXX?= =?us-ascii?Q?nUyUeISwuSOvtbGKlDVyhfI3FlKJDc/l6v5bg0V/gzzYU4uu+Jcyf/6TX188?= =?us-ascii?Q?Wjs00iOE+rqgAVwi45ZNPZQF/8V51UOp6unWxRLO0KHwt7YmS8Sk0ckvDgAd?= =?us-ascii?Q?D3utSRvfeyo4iAkOJ7Gn5YxdclxpqGTUKW8eIHJKMCcAnIl3NG9b3ToVsAj6?= =?us-ascii?Q?G2Vbx4+cAV0/jXdItre4/7/r8csZ22PbVMyAtxY1mLYNTKmeRwhj0IHIBvAW?= =?us-ascii?Q?jT1h9sUYXic6lCGkRoHDkAX4UvTQc3V6EuyuAyVN0WZLbAeweJWiPRQLGSHV?= =?us-ascii?Q?SjzdUz3QLzZpETysEGko/fS5H/YsTL1Q9uaAH10UmPdUFB7qJ0jmhK9oVYsY?= =?us-ascii?Q?sa6iJP1rBXHZQWunxDLGN1sjcnG/NLd9w653zmbB8mqUgEKHQBjPDUkhZObZ?= =?us-ascii?Q?4iE6bciTgruUXJjfu8b4KGaTgI9tTNGgbgFIgiHa5ZHqAfetEBp43jb9D4R0?= =?us-ascii?Q?YigxpLNRviECERtw4DrHmP278IT64Db3hbIfcyJockfWw6X9/EXAukykW34T?= 
=?us-ascii?Q?MdteVDh7hSvbILOzYFFh/s5+w4X/Ww0Zg1VKAxq/kZR6PUFL0BlhXn7ASugN?= =?us-ascii?Q?vn6wKvqf+yikHqvUvJs9VzKDMZMCEZ0Kqu3fhQoEG2X3T3ryoPRrsTqGS64Q?= =?us-ascii?Q?klL88SiXHuOoexNZYsTVSJ31bAtDfqylUNwWFbdvx2s9DCX1QqBhC6Ck2JFG?= =?us-ascii?Q?lGM06yKJZiSSd0UZJdEsLsa/4nSXj9IGRok0f977kvvsOB/aaxC6g8EuHqN9?= =?us-ascii?Q?/1uc1WtfQRtst37yiw1zbkyqFLIC/kPa/hzklVLPWtanwv57m6k6C6jeq9Dy?= =?us-ascii?Q?QSmn/1i442elAb83dSUz1TgDtcB7zTpr2BCrczJ+/p5iWNhKwA+emCHOk4MQ?= =?us-ascii?Q?cH5Mozim1ZTHx/Ewpr6l1FTnPahwKY/N1bFSGmibI6Cxr9gLZdnh0+voVJ15?= =?us-ascii?Q?XgdcCRTHi+1Z87U6EjYgFgVcJk6R2a+yKXLdasQPNG/Ab5XsE6oSNT2vlqYi?= =?us-ascii?Q?yXp/dk7yQtvc3fgQ7YrhvPeoxPS6OOMIaespY/ixf1iQXu0bS7ZdF4vt08bb?= =?us-ascii?Q?Pxz8vaSfsrQ85+oCHJMA1RV/Gqrh0PSXZcQHq59NWUOCy/gVEDd8e0G5UBMx?= =?us-ascii?Q?8voiHDhnhIjjsVfZGrD0klqfELyHuDc5W+G61lT/T5YPOpIoJcGq5mSJIQXA?= =?us-ascii?Q?x584wMLTzaTM+4jOABTZrd/mG46mFuNqE+Q+/7unqVZiPyp617EUbDSmYUvL?= =?us-ascii?Q?DM4MbzZV9ip5R5YW9DLrZOTcuTzplQVOw2h/+8a+mFkUhXCt6uRVmFHLEu2a?= =?us-ascii?Q?hlVeGILqx7a6mEh1bCh8aMC/Kny/kfxVL6ympY8n98e1VbWghuOpbs/dQwsY?= =?us-ascii?Q?jGaLukySIY6TUs3thW8QMZk0KHoq0rkjDXubWoCF9btcZOHRz9CiQYdmR25b?= =?us-ascii?Q?JYNZZjakFhTI7u+ebJfR6Q/UAMisL4qAH3N7?= X-Forefront-Antispam-Report: CIP:216.228.117.161;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge2.nvidia.com;CAT:NONE;SFS:(13230040)(82310400026)(36860700013)(7416014)(376014)(1800799024);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 21 May 2025 12:10:20.9932 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 09b187d7-e54d-431b-6570-08dd98607563 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.161];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: SJ5PEPF00000208.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous 
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN7PR12MB7276 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

From: Cosmin Ratiu

The original commit does not explain why RTNL was dropped around the
firmware flash, but presumably flashing takes a long time and holding
RTNL for that long blocks other interactions with the netdev layer.

However, the stack is moving towards netdev instance locking, and
dropping and reacquiring RTNL in the context of flashing introduces
lock ordering issues: RTNL must be acquired before the netdev instance
lock and released after it.

This patch therefore takes the simpler approach of no longer dropping
and reacquiring RTNL. RTNL will soon be removed for ethtool anyway,
leaving only the instance lock to protect against races.

Signed-off-by: Cosmin Ratiu
Reviewed-by: Carolina Jubran
Reviewed-by: Dragos Tatulea
Signed-off-by: Tariq Toukan
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index e399d7a3d6cb..ea078c9f5d15 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -2060,14 +2060,9 @@ int mlx5e_ethtool_flash_device(struct mlx5e_priv *priv,
 	if (err)
 		return err;
 
-	dev_hold(dev);
-	rtnl_unlock();
-
 	err = mlx5_firmware_flash(mdev, fw, NULL);
 	release_firmware(fw);
 
-	rtnl_lock();
-	dev_put(dev);
 	return err;
 }
 
-- 
2.31.1

From nobody Sat Feb 7 19:50:52 2026 Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2046.outbound.protection.outlook.com [40.107.236.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 276002676C9; Wed, 21 May 2025 12:10:35 +0000 (UTC)
From: Tariq Toukan
To: "David S. Miller", Jakub Kicinski, Paolo Abeni, Eric Dumazet, "Andrew Lunn"
CC: Jason Gunthorpe, Leon Romanovsky, "Saeed Mahameed", Tariq Toukan, "Richard Cochran", Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Moshe Shemesh, Mark Bloch, Gal Pressman, Cosmin Ratiu
Subject: [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking
Date: Wed, 21 May 2025 15:09:02 +0300
Message-ID: <1747829342-1018757-6-git-send-email-tariqt@nvidia.com>
X-Mailer: git-send-email 2.8.0
In-Reply-To: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com>
References: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Cosmin Ratiu

This patch converts mlx5 to use the new netdev instance lock in addition
to the pre-existing state_lock (and the RTNL).

mlx5e_priv.state_lock was already used throughout mlx5 to protect
against concurrent state modifications on the same netdev, usually in
addition to the RTNL. The new netdev instance lock will eventually
replace it, but for now, it is acquired in addition to the existing
locks in the order RTNL -> instance lock -> state_lock.

All three netdev types handled by mlx5 are converted to the new style of
locking, because they share a lot of code related to initializing
channels and dealing with NAPI, so it's better to convert all three
rather than introduce different assumptions deep in the call stack
depending on the type of device.

Because of the nature of the call graphs in mlx5, it wasn't possible to
incrementally convert parts of the driver to use the new lock, since
either all call paths into NAPI have to possess the new lock if the
*_locked variants are used, or none of them can have the lock.

One area which required extra care is the interaction between closing
channels and devlink health reporter tasks.
Previously, the recovery tasks were unconditionally acquiring the
RTNL, which could lead to deadlocks in these scenarios:

T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked
-> mlx5e_close_channels -> mlx5e_ptp_close -> mlx5e_ptp_close_queues
-> mlx5e_ptp_close_txqsqs -> mlx5e_ptp_close_txqsq
-> cancel_work_sync(&ptpsq->report_unhealthy_work) waits for

T2: mlx5e_ptpsq_unhealthy_work -> mlx5e_reporter_tx_ptpsq_unhealthy
-> mlx5e_health_report -> devlink_health_report
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_ptpsq_unhealthy_recover which does:
rtnl_lock();
=> Deadlock.

Another similar instance of this is:

T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked
-> mlx5e_close_channels -> mlx5e_ptp_close -> mlx5e_ptp_close_queues
-> mlx5e_ptp_close_txqsqs -> mlx5e_ptp_close_txqsq
-> cancel_work_sync(&sq->recover_work) waits for

T2: mlx5e_tx_err_cqe_work -> mlx5e_reporter_tx_err_cqe
-> mlx5e_health_report -> devlink_health_report
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_err_cqe_recover which does:
rtnl_lock();
=> Another deadlock.

Fix that by using the same pattern previously used in
mlx5e_tx_timeout_work, which kept retrying to acquire the RTNL until
either:
a) it is successfully acquired or
b) there's no need for the work to be done any more (channel is being
   closed).

Now, for all three recovery tasks, acquiring the instance lock is
retried until it succeeds or the channel/SQ is closed.

As a side effect, drop the !test_bit(MLX5E_STATE_OPENED, &priv->state)
check from mlx5e_tx_timeout_work; it is weaker than the
!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state) check and
therefore unnecessary.

Future patches will introduce new call paths (from netdev queue
management ops) which can close channels (and call cancel_work_sync on
the recovery tasks) while holding only the netdev instance lock,
without the RTNL.
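The retry loop described above can be sketched outside the kernel. What
follows is a minimal userspace analogue in C11, not the driver code:
struct demo_sq, demo_trylock(), demo_unlock() and demo_recover_work()
are hypothetical names invented for illustration, an atomic_flag stands
in for the netdev instance lock, and an atomic_bool stands in for the
MLX5E_SQ_STATE_ENABLED bit. The sketch assumes the closing side clears
the enabled flag before it waits for the work to finish, which is what
lets the loop exit instead of deadlocking.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in mlx5 or the kernel. */
struct demo_sq {
	atomic_flag instance_lock;	/* plays the role of the netdev instance lock */
	atomic_bool enabled;		/* plays the role of MLX5E_SQ_STATE_ENABLED */
	int recoveries;			/* number of recoveries that actually ran */
};

/* Returns true if the lock was free and is now held by the caller. */
bool demo_trylock(struct demo_sq *sq)
{
	/* test_and_set returns the previous value: false means we took it. */
	return !atomic_flag_test_and_set(&sq->instance_lock);
}

void demo_unlock(struct demo_sq *sq)
{
	atomic_flag_clear(&sq->instance_lock);
}

/*
 * Recovery work: returns true if recovery ran, false if it gave up
 * because the queue was disabled while someone else held the lock.
 */
bool demo_recover_work(struct demo_sq *sq)
{
	while (!demo_trylock(sq)) {
		if (!atomic_load(&sq->enabled))
			return false;	/* queue is closing; nothing to recover */
		/* The driver sleeps here (msleep(20)); this sketch just retries. */
	}

	sq->recoveries++;	/* recovery runs with the lock held */
	demo_unlock(sq);
	return true;
}
```

With the lock free, demo_recover_work() performs one recovery and
returns true. If the caller already holds the lock and has cleared
enabled, it returns false without blocking, so a closer that then
waits for the work cannot deadlock against it.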
Signed-off-by: Cosmin Ratiu
Reviewed-by: Carolina Jubran
Reviewed-by: Dragos Tatulea
Signed-off-by: Tariq Toukan
---
 .../ethernet/mellanox/mlx5/core/en/health.c   |  2 +
 .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 25 ++++--
 .../mellanox/mlx5/core/en/reporter_tx.c       |  4 -
 .../net/ethernet/mellanox/mlx5/core/en/trap.c | 12 +--
 .../ethernet/mellanox/mlx5/core/en_dcbnl.c    |  2 +
 .../net/ethernet/mellanox/mlx5/core/en_fs.c   |  4 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 82 ++++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  7 ++
 .../ethernet/mellanox/mlx5/core/ipoib/ipoib.c |  3 +
 9 files changed, 96 insertions(+), 45 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
index 81523825faa2..cb972b2d46e2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
@@ -114,6 +114,7 @@ int mlx5e_health_recover_channels(struct mlx5e_priv *priv)
 	int err = 0;
 
 	rtnl_lock();
+	netdev_lock(priv->netdev);
 	mutex_lock(&priv->state_lock);
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
@@ -123,6 +124,7 @@ int mlx5e_health_recover_channels(struct mlx5e_priv *priv)
 
 out:
 	mutex_unlock(&priv->state_lock);
+	netdev_unlock(priv->netdev);
 	rtnl_unlock();
 
 	return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index 131ed97ca997..5d0014129a7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -8,6 +8,7 @@
 #include "en/fs_tt_redirect.h"
 #include
 #include
+#include
 
 struct mlx5e_ptp_fs {
 	struct mlx5_flow_handle *l2_rule;
@@ -449,8 +450,22 @@ static void mlx5e_ptpsq_unhealthy_work(struct work_struct *work)
 {
 	struct mlx5e_ptpsq *ptpsq =
 		container_of(work, struct mlx5e_ptpsq, report_unhealthy_work);
+	struct mlx5e_txqsq *sq = &ptpsq->txqsq;
+
+	/* Recovering the PTP SQ means re-enabling NAPI, which requires the
+	 * netdev instance lock. However, SQ closing has to wait for this work
+	 * task to finish while also holding the same lock. So either get the
+	 * lock or find that the SQ is no longer enabled and thus this work is
+	 * not relevant anymore.
+	 */
+	while (!netdev_trylock(sq->netdev)) {
+		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
+			return;
+		msleep(20);
+	}
 
 	mlx5e_reporter_tx_ptpsq_unhealthy(ptpsq);
+	netdev_unlock(sq->netdev);
 }
 
 static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
@@ -892,7 +907,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
 	if (err)
 		goto err_free;
 
-	netif_napi_add(netdev, &c->napi, mlx5e_ptp_napi_poll);
+	netif_napi_add_locked(netdev, &c->napi, mlx5e_ptp_napi_poll);
 
 	mlx5e_ptp_build_params(c, cparams, params);
 
@@ -910,7 +925,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
 	return 0;
 
 err_napi_del:
-	netif_napi_del(&c->napi);
+	netif_napi_del_locked(&c->napi);
 err_free:
 	kvfree(cparams);
 	kvfree(c);
@@ -920,7 +935,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
 void mlx5e_ptp_close(struct mlx5e_ptp *c)
 {
 	mlx5e_ptp_close_queues(c);
-	netif_napi_del(&c->napi);
+	netif_napi_del_locked(&c->napi);
 
 	kvfree(c);
 }
@@ -929,7 +944,7 @@ void mlx5e_ptp_activate_channel(struct mlx5e_ptp *c)
 {
 	int tc;
 
-	napi_enable(&c->napi);
+	napi_enable_locked(&c->napi);
 
 	if (test_bit(MLX5E_PTP_STATE_TX, c->state)) {
 		for (tc = 0; tc < c->num_tc; tc++)
@@ -957,7 +972,7 @@ void mlx5e_ptp_deactivate_channel(struct mlx5e_ptp *c)
 			mlx5e_deactivate_txqsq(&c->ptpsq[tc].txqsq);
 	}
 
-	napi_disable(&c->napi);
+	napi_disable_locked(&c->napi);
 }
 
 int mlx5e_ptp_get_rqn(struct mlx5e_ptp *c, u32 *rqn)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index dbd9482359e1..c3bda4612fa9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -107,9 +107,7 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
 	mlx5e_reset_txqsq_cc_pc(sq);
 	sq->stats->recover++;
 	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
-	rtnl_lock();
 	mlx5e_activate_txqsq(sq);
-	rtnl_unlock();
 
 	if (sq->channel)
 		mlx5e_trigger_napi_icosq(sq->channel);
@@ -176,7 +174,6 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
 
 	priv = ptpsq->txqsq.priv;
 
-	rtnl_lock();
 	mutex_lock(&priv->state_lock);
 	chs = &priv->channels;
 	netdev = priv->netdev;
@@ -196,7 +193,6 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
 		netif_carrier_on(netdev);
 
 	mutex_unlock(&priv->state_lock);
-	rtnl_unlock();
 
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
index 140606fcd23b..b5c19396e096 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
@@ -149,7 +149,7 @@ static struct mlx5e_trap *mlx5e_open_trap(struct mlx5e_priv *priv)
 	t->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.hw_objs.mkey);
 	t->stats = &priv->trap_stats.ch;
 
-	netif_napi_add(netdev, &t->napi, mlx5e_trap_napi_poll);
+	netif_napi_add_locked(netdev, &t->napi, mlx5e_trap_napi_poll);
 
 	err = mlx5e_open_trap_rq(priv, t);
 	if (unlikely(err))
@@ -164,7 +164,7 @@ static struct mlx5e_trap *mlx5e_open_trap(struct mlx5e_priv *priv)
 err_close_trap_rq:
 	mlx5e_close_trap_rq(&t->rq);
 err_napi_del:
-	netif_napi_del(&t->napi);
+	netif_napi_del_locked(&t->napi);
 	kvfree(t);
 	return ERR_PTR(err);
 }
@@ -173,13 +173,13 @@ void mlx5e_close_trap(struct mlx5e_trap *trap)
 {
 	mlx5e_tir_destroy(&trap->tir);
 	mlx5e_close_trap_rq(&trap->rq);
-	netif_napi_del(&trap->napi);
+	netif_napi_del_locked(&trap->napi);
 	kvfree(trap);
 }
 
 static void mlx5e_activate_trap(struct mlx5e_trap *trap)
 {
-	napi_enable(&trap->napi);
+	napi_enable_locked(&trap->napi);
 	mlx5e_activate_rq(&trap->rq);
 	mlx5e_trigger_napi_sched(&trap->napi);
 }
@@ -189,7 +189,7 @@ void mlx5e_deactivate_trap(struct mlx5e_priv *priv)
 	struct mlx5e_trap *trap = priv->en_trap;
 
 	mlx5e_deactivate_rq(&trap->rq);
-	napi_disable(&trap->napi);
+	napi_disable_locked(&trap->napi);
 }
 
 static struct mlx5e_trap *mlx5e_add_trap_queue(struct mlx5e_priv *priv)
@@ -285,6 +285,7 @@ int mlx5e_handle_trap_event(struct mlx5e_priv *priv, struct mlx5_trap_ctx *trap_
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
 		return 0;
 
+	netdev_lock(priv->netdev);
 	switch (trap_ctx->action) {
 	case DEVLINK_TRAP_ACTION_TRAP:
 		err = mlx5e_handle_action_trap(priv, trap_ctx->id);
@@ -297,6 +298,7 @@ int mlx5e_handle_trap_event(struct mlx5e_priv *priv, struct mlx5_trap_ctx *trap_
 			    trap_ctx->action);
 		err = -EINVAL;
 	}
+	netdev_unlock(priv->netdev);
 	return err;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 8705cffc747f..5fe016e477b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -1147,6 +1147,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state)
 	bool reset = true;
 	int err;
 
+	netdev_lock(priv->netdev);
 	mutex_lock(&priv->state_lock);
 
 	new_params = priv->channels.params;
@@ -1162,6 +1163,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state)
 				       &trust_state, reset);
 
 	mutex_unlock(&priv->state_lock);
+	netdev_unlock(priv->netdev);
 
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 05058710d2c7..04a969128161 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -484,7 +484,9 @@ static int mlx5e_vlan_rx_add_svid(struct mlx5e_flow_steering *fs,
 	}
 
 	/* Need to fix some features.. */
+	netdev_lock(netdev);
 	netdev_update_features(netdev);
+	netdev_unlock(netdev);
 	return err;
 }
 
@@ -521,7 +523,9 @@ int mlx5e_fs_vlan_rx_kill_vid(struct mlx5e_flow_steering *fs,
 	} else if (be16_to_cpu(proto) == ETH_P_8021AD) {
 		clear_bit(vid, fs->vlan->active_svlans);
 		mlx5e_fs_del_vlan_rule(fs, MLX5E_VLAN_RULE_TYPE_MATCH_STAG_VID, vid);
+		netdev_lock(netdev);
 		netdev_update_features(netdev);
+		netdev_unlock(netdev);
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 9bd166f489e7..ea822c69d137 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1903,7 +1904,20 @@ void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
 	struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
 					      recover_work);
 
+	/* Recovering queues means re-enabling NAPI, which requires the netdev
+	 * instance lock. However, SQ closing flows have to wait for work tasks
+	 * to finish while also holding the netdev instance lock. So either get
+	 * the lock or find that the SQ is no longer enabled and thus this work
+	 * is not relevant anymore.
+	 */
+	while (!netdev_trylock(sq->netdev)) {
+		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
+			return;
+		msleep(20);
+	}
+
 	mlx5e_reporter_tx_err_cqe(sq);
+	netdev_unlock(sq->netdev);
 }
 
 static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
@@ -2705,8 +2719,8 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 	c->aff_mask = irq_get_effective_affinity_mask(irq);
 	c->lag_port = mlx5e_enumerate_lag_port(mdev, ix);
 
-	netif_napi_add_config(netdev, &c->napi, mlx5e_napi_poll, ix);
-	netif_napi_set_irq(&c->napi, irq);
+	netif_napi_add_config_locked(netdev, &c->napi, mlx5e_napi_poll, ix);
+	netif_napi_set_irq_locked(&c->napi, irq);
 
 	err = mlx5e_open_queues(c, params, cparam);
 	if (unlikely(err))
@@ -2728,7 +2742,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 	mlx5e_close_queues(c);
 
 err_napi_del:
-	netif_napi_del(&c->napi);
+	netif_napi_del_locked(&c->napi);
 
 err_free:
 	kvfree(cparam);
@@ -2741,7 +2755,7 @@ static void mlx5e_activate_channel(struct mlx5e_channel *c)
 {
 	int tc;
 
-	napi_enable(&c->napi);
+	napi_enable_locked(&c->napi);
 
 	for (tc = 0; tc < c->num_tc; tc++)
 		mlx5e_activate_txqsq(&c->sq[tc]);
@@ -2773,7 +2787,7 @@ static void mlx5e_deactivate_channel(struct mlx5e_channel *c)
 		mlx5e_deactivate_txqsq(&c->sq[tc]);
 	mlx5e_qos_deactivate_queues(c);
 
-	napi_disable(&c->napi);
+	napi_disable_locked(&c->napi);
 }
 
 static void mlx5e_close_channel(struct mlx5e_channel *c)
@@ -2782,7 +2796,7 @@ static void mlx5e_close_channel(struct mlx5e_channel *c)
 		mlx5e_close_xsk(c);
 	mlx5e_close_queues(c);
 	mlx5e_qos_close_queues(c);
-	netif_napi_del(&c->napi);
+	netif_napi_del_locked(&c->napi);
 
 	kvfree(c);
 }
@@ -4276,7 +4290,7 @@ void mlx5e_set_xdp_feature(struct net_device *netdev)
 
 	if (!netdev->netdev_ops->ndo_bpf ||
 	    params->packet_merge.type != MLX5E_PACKET_MERGE_NONE) {
-		xdp_clear_features_flag(netdev);
+		xdp_set_features_flag_locked(netdev, 0);
 		return;
 	}
 
@@ -4285,7 +4299,7 @@ void mlx5e_set_xdp_feature(struct net_device *netdev)
 	      NETDEV_XDP_ACT_RX_SG |
 	      NETDEV_XDP_ACT_NDO_XMIT |
 	      NETDEV_XDP_ACT_NDO_XMIT_SG;
-	xdp_set_features_flag(netdev, val);
+	xdp_set_features_flag_locked(netdev, val);
 }
 
 int mlx5e_set_features(struct net_device *netdev, netdev_features_t features)
@@ -4968,21 +4982,19 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 	struct net_device *netdev = priv->netdev;
 	int i;
 
-	/* Take rtnl_lock to ensure no change in netdev->real_num_tx_queues
-	 * through this flow. However, channel closing flows have to wait for
-	 * this work to finish while holding rtnl lock too. So either get the
-	 * lock or find that channels are being closed for other reason and
-	 * this work is not relevant anymore.
+	/* Recovering the TX queues implies re-enabling NAPI, which requires
+	 * the netdev instance lock.
+	 * However, channel closing flows have to wait for this work to finish
+	 * while holding the same lock. So either get the lock or find that
+	 * channels are being closed for other reason and this work is not
+	 * relevant anymore.
 	 */
-	while (!rtnl_trylock()) {
+	while (!netdev_trylock(netdev)) {
 		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
 			return;
 		msleep(20);
 	}
 
-	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
-		goto unlock;
-
 	for (i = 0; i < netdev->real_num_tx_queues; i++) {
 		struct netdev_queue *dev_queue =
 			netdev_get_tx_queue(netdev, i);
@@ -4996,8 +5008,7 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 			break;
 	}
 
-unlock:
-	rtnl_unlock();
+	netdev_unlock(netdev);
 }
 
 static void mlx5e_tx_timeout(struct net_device *dev, unsigned int txqueue)
@@ -5321,7 +5332,6 @@ static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
 	struct mlx5e_rq_stats *xskrq_stats;
 	struct mlx5e_rq_stats *rq_stats;
 
-	ASSERT_RTNL();
 	if (mlx5e_is_uplink_rep(priv) || !priv->stats_nch)
 		return;
 
@@ -5341,7 +5351,6 @@ static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	struct mlx5e_sq_stats *sq_stats;
 
-	ASSERT_RTNL();
 	if (!priv->stats_nch)
 		return;
 
@@ -5362,7 +5371,6 @@ static void mlx5e_get_base_stats(struct net_device *dev,
 	struct mlx5e_ptp *ptp_channel;
 	int i, tc;
 
-	ASSERT_RTNL();
 	if (!mlx5e_is_uplink_rep(priv)) {
 		rx->packets = 0;
 		rx->bytes = 0;
@@ -5458,6 +5466,8 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	netdev->netdev_ops = &mlx5e_netdev_ops;
 	netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops;
 	netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops;
+	netdev->request_ops_lock = true;
+	netdev_lockdep_set_classes(netdev);
 
 	mlx5e_dcbnl_build_netdev(netdev);
 
@@ -5839,9 +5849,11 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 	mlx5e_nic_set_rx_mode(priv);
 
 	rtnl_lock();
+	netdev_lock(netdev);
 	if (netif_running(netdev))
 		mlx5e_open(netdev);
 	udp_tunnel_nic_reset_ntf(priv->netdev);
+	netdev_unlock(netdev);
 	netif_device_attach(netdev);
 	rtnl_unlock();
 }
@@ -5854,9 +5866,16 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 		mlx5e_dcbnl_delete_app(priv);
 
 	rtnl_lock();
+	netdev_lock(priv->netdev);
 	if (netif_running(priv->netdev))
 		mlx5e_close(priv->netdev);
 	netif_device_detach(priv->netdev);
+	if (priv->en_trap) {
+		mlx5e_deactivate_trap(priv);
+		mlx5e_close_trap(priv->en_trap);
+		priv->en_trap = NULL;
+	}
+	netdev_unlock(priv->netdev);
 	rtnl_unlock();
 
 	mlx5e_nic_set_rx_mode(priv);
@@ -5866,11 +5885,6 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 		mlx5e_monitor_counter_cleanup(priv);
 
 	mlx5e_disable_blocking_events(priv);
-	if (priv->en_trap) {
-		mlx5e_deactivate_trap(priv);
-		mlx5e_close_trap(priv->en_trap);
-		priv->en_trap = NULL;
-	}
 	mlx5e_disable_async_events(priv);
 	mlx5_lag_remove_netdev(mdev, priv->netdev);
 	mlx5_vxlan_reset_to_default(mdev->vxlan);
@@ -6125,7 +6139,9 @@ static void mlx5e_update_features(struct net_device *netdev)
 		return; /* features will be updated on netdev registration */
 
 	rtnl_lock();
+	netdev_lock(netdev);
 	netdev_update_features(netdev);
+	netdev_unlock(netdev);
 	rtnl_unlock();
 }
 
@@ -6136,7 +6152,7 @@ static void mlx5e_reset_channels(struct net_device *netdev)
 
 int mlx5e_attach_netdev(struct mlx5e_priv *priv)
 {
-	const bool take_rtnl = priv->netdev->reg_state == NETREG_REGISTERED;
+	const bool need_lock = priv->netdev->reg_state == NETREG_REGISTERED;
 	const struct mlx5e_profile *profile = priv->profile;
 	int max_nch;
 	int err;
@@ -6178,15 +6194,19 @@ int mlx5e_attach_netdev(struct mlx5e_priv *priv)
 	 * 2. Set our default XPS cpumask.
 	 * 3. Build the RQT.
 	 *
-	 * rtnl_lock is required by netif_set_real_num_*_queues in case the
+	 * Locking is required by netif_set_real_num_*_queues in case the
 	 * netdev has been registered by this point (if this function was called
 	 * in the reload or resume flow).
 	 */
-	if (take_rtnl)
+	if (need_lock) {
 		rtnl_lock();
+		netdev_lock(priv->netdev);
+	}
 	err = mlx5e_num_channels_changed(priv);
-	if (take_rtnl)
+	if (need_lock) {
+		netdev_unlock(priv->netdev);
 		rtnl_unlock();
+	}
 	if (err)
 		goto out;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 2abab241f03b..719aa16bd404 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -885,6 +886,8 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev,
 {
 	SET_NETDEV_DEV(netdev, mdev->device);
 	netdev->netdev_ops = &mlx5e_netdev_ops_rep;
+	netdev->request_ops_lock = true;
+	netdev_lockdep_set_classes(netdev);
 	eth_hw_addr_random(netdev);
 	netdev->ethtool_ops = &mlx5e_rep_ethtool_ops;
 
@@ -1344,9 +1347,11 @@ static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv)
 	netdev->wanted_features |= NETIF_F_HW_TC;
 
 	rtnl_lock();
+	netdev_lock(netdev);
 	if (netif_running(netdev))
 		mlx5e_open(netdev);
 	udp_tunnel_nic_reset_ntf(priv->netdev);
+	netdev_unlock(netdev);
 	netif_device_attach(netdev);
 	rtnl_unlock();
 }
@@ -1356,9 +1361,11 @@ static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv)
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	rtnl_lock();
+	netdev_lock(priv->netdev);
 	if (netif_running(priv->netdev))
 		mlx5e_close(priv->netdev);
 	netif_device_detach(priv->netdev);
+	netdev_unlock(priv->netdev);
 	rtnl_unlock();
 
 	mlx5e_rep_bridge_cleanup(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 0979d672d47f..79ae3a51a4b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -32,6 +32,7 @@
 
 #include
 #include
+#include
 #include "en.h"
 #include "en/params.h"
 #include "ipoib.h"
@@ -102,6 +103,8 @@ int mlx5i_init(struct mlx5_core_dev *mdev, struct net_device *netdev)
 
 	netdev->netdev_ops = &mlx5i_netdev_ops;
 	netdev->ethtool_ops = &mlx5i_ethtool_ops;
+	netdev->request_ops_lock = true;
+	netdev_lockdep_set_classes(netdev);
 
 	return 0;
 }
-- 
2.31.1