[PATCH net 2/4] net/mlx5: Fix deadlock between devlink lock and esw->wq

Tariq Toukan posted 4 patches 2 weeks, 5 days ago
[PATCH net 2/4] net/mlx5: Fix deadlock between devlink lock and esw->wq
Posted by Tariq Toukan 2 weeks, 5 days ago
From: Cosmin Ratiu <cratiu@nvidia.com>

esw_functions_changed_event_handler -> esw_vfs_changed_event_handler is
called from the esw->work_queue and acquires the devlink lock.

Changing the esw mode is done via .eswitch_mode_set (acquires devlink
lock in the devlink_nl_pre_doit call) -> mlx5_devlink_eswitch_mode_set
-> mlx5_eswitch_disable_locked -> mlx5_eswitch_event_handler_unregister
-> flush_workqueue.

This creates a circular lock dependency which could lead to a real
deadlock, as the code flushing the workqueue is holding the devlink
lock, and the work handler being flushed could try to acquire it.

Fix that by adding a new bool field 'notifier_enabled' next to the event
handler scheduling the work, keeping it true while the notifier is
active, and using it to repeatedly try to acquire the devlink lock from
the work handler while true, with a slight delay to avoid busy looping.

This avoids the deadlock because the event handler will be removed
first (turning 'notifier_enabled' false), and the work handler will
eventually give up in acquiring the lock because the work is no longer
necessary.

Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c    |  6 +++++-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h    |  1 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c   | 12 +++++++++++-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 4b7a1ce7f406..fddc3b33222d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1066,14 +1066,18 @@ static void mlx5_eswitch_event_handler_register(struct mlx5_eswitch *esw)
 	if (esw->mode == MLX5_ESWITCH_OFFLOADS && mlx5_eswitch_is_funcs_handler(esw->dev)) {
 		MLX5_NB_INIT(&esw->esw_funcs.nb, mlx5_esw_funcs_changed_handler,
 			     ESW_FUNCTIONS_CHANGED);
+		esw->esw_funcs.notifier_enabled = true;
 		mlx5_eq_notifier_register(esw->dev, &esw->esw_funcs.nb);
 	}
 }
 
 static void mlx5_eswitch_event_handler_unregister(struct mlx5_eswitch *esw)
 {
-	if (esw->mode == MLX5_ESWITCH_OFFLOADS && mlx5_eswitch_is_funcs_handler(esw->dev))
+	if (esw->mode == MLX5_ESWITCH_OFFLOADS &&
+	    mlx5_eswitch_is_funcs_handler(esw->dev)) {
+		esw->esw_funcs.notifier_enabled = false;
 		mlx5_eq_notifier_unregister(esw->dev, &esw->esw_funcs.nb);
+	}
 
 	flush_workqueue(esw->work_queue);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index ad1073f7b79f..e20574a197e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -338,6 +338,7 @@ struct mlx5_host_work {
 
 struct mlx5_esw_functions {
 	struct mlx5_nb		nb;
+	bool                    notifier_enabled;
 	bool			host_funcs_disabled;
 	u16			num_vfs;
 	u16			num_ec_vfs;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index ea94a727633f..0199bea2cb31 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3597,7 +3597,17 @@ esw_vfs_changed_event_handler(struct mlx5_eswitch *esw, const u32 *out)
 		return;
 
 	devlink = priv_to_devlink(esw->dev);
-	devl_lock(devlink);
+	/* Repeatedly try to grab the lock with a delay while this work is
+	 * still relevant.
+	 * This allows a concurrent mlx5_eswitch_event_handler_unregister
+	 * (holding the devlink lock) to flush the wq without deadlocking.
+	 */
+	while (!devl_trylock(devlink)) {
+		if (!esw->esw_funcs.notifier_enabled)
+			return;
+		schedule_timeout_interruptible(msecs_to_jiffies(10));
+	}
+
 	/* Number of VFs can only change from "0 to x" or "x to 0". */
 	if (esw->esw_funcs.num_vfs > 0) {
 		mlx5_eswitch_unload_vf_vports(esw, esw->esw_funcs.num_vfs);
-- 
2.44.0