[PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns

Tariq Toukan posted 3 patches 3 weeks, 3 days ago
Posted by Tariq Toukan 3 weeks, 3 days ago
From: Jianbo Liu <jianbol@nvidia.com>

When a PF enters switchdev mode, its netdevice becomes the uplink
representor but remains in its current network namespace. All other
representors (VFs, SFs) are created in the netns of the devlink
instance.

If the PF's netns has been moved and differs from the devlink's netns,
enabling switchdev mode would create an invalid state where
representors and PF exist in different namespaces.

To prevent this inconsistent configuration, block the request to enter
switchdev mode if the PF netdevice's netns does not match the netns of
its devlink instance.

As part of this change, the PF's netns is first marked as immutable.
This prevents race conditions where the netns could be changed after
the check is performed but before the mode transition is complete, and
it aligns the PF's behavior with that of the final uplink representor.

Fixes: 71c6eaebf06a ("net/mlx5e: Set netdev name space on creation")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../mellanox/mlx5/core/eswitch_offloads.c     | 33 +++++++++++++++++++
 1 file changed, 33 insertions(+)
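
The check-then-roll-back flow described in the commit message can be
modeled in userspace. The following is a minimal sketch, not kernel code:
the toy_* types and functions are hypothetical stand-ins invented for
illustration, namespaces are reduced to integer IDs, and the rtnl_lock()
that protects the flag update and comparison in the real helper is only
noted in a comment. It also folds the rollback into the caller directly,
whereas the patch performs it at the skip: label when err is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for net_device and devlink; not kernel types. */
struct toy_netdev {
	int netns_id;          /* namespace the netdev currently lives in */
	bool netns_immutable;  /* when true, the netdev may not change netns */
};

struct toy_devlink {
	int netns_id;              /* namespace of the devlink instance */
	struct toy_netdev *uplink; /* PF netdev, or NULL if none exists */
};

/*
 * Same shape as the patch's helper: set the flag first, then compare.
 * Returns false only when an uplink netdev exists and its namespace
 * differs from the devlink's. In the kernel both steps run under
 * rtnl_lock(), so the netns cannot change between them.
 */
static bool toy_netns_immutable_set(struct toy_devlink *dl, bool immutable)
{
	struct toy_netdev *nd = dl->uplink;

	if (!nd)
		return true;

	nd->netns_immutable = immutable;
	return nd->netns_id == dl->netns_id;
}

/* Returns 0 on success, -1 (standing in for -EINVAL) on a netns mismatch. */
static int toy_enter_switchdev(struct toy_devlink *dl)
{
	if (!toy_netns_immutable_set(dl, true)) {
		/* Undo the flag so the netdev can still be moved later. */
		toy_netns_immutable_set(dl, false);
		return -1;
	}
	return 0;
}
```

Setting the flag before comparing is what closes the window in which the
netdev could be moved after the check; on failure the flag is cleared so
the rejected request leaves no side effect.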

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index bee906661282..b204ed459760 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3739,6 +3739,29 @@ void mlx5_eswitch_unblock_mode(struct mlx5_core_dev *dev)
 	up_write(&esw->mode_lock);
 }
 
+/* Returns false only when uplink netdev exists and its netns is different from
+ * devlink's netns. True for all others so entering switchdev mode is allowed.
+ */
+static bool mlx5_devlink_netdev_netns_immutable_set(struct devlink *devlink,
+						    bool immutable)
+{
+	struct mlx5_core_dev *mdev = devlink_priv(devlink);
+	struct net_device *netdev;
+	bool ret;
+
+	netdev = mlx5_uplink_netdev_get(mdev);
+	if (!netdev)
+		return true;
+
+	rtnl_lock();
+	netdev->netns_immutable = immutable;
+	ret = net_eq(dev_net(netdev), devlink_net(devlink));
+	rtnl_unlock();
+
+	mlx5_uplink_netdev_put(mdev, netdev);
+	return ret;
+}
+
 int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 				  struct netlink_ext_ack *extack)
 {
@@ -3781,6 +3804,14 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	esw->eswitch_operation_in_progress = true;
 	up_write(&esw->mode_lock);
 
+	if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV &&
+	    !mlx5_devlink_netdev_netns_immutable_set(devlink, true)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Can't change E-Switch mode to switchdev when netdev net namespace has diverged from the devlink's.");
+		err = -EINVAL;
+		goto skip;
+	}
+
 	if (mode == DEVLINK_ESWITCH_MODE_LEGACY)
 		esw->dev->priv.flags |= MLX5_PRIV_FLAGS_SWITCH_LEGACY;
 	mlx5_eswitch_disable_locked(esw);
@@ -3799,6 +3830,8 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	}
 
 skip:
+	if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && err)
+		mlx5_devlink_netdev_netns_immutable_set(devlink, false);
 	down_write(&esw->mode_lock);
 	esw->eswitch_operation_in_progress = false;
 unlock:
-- 
2.31.1
Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
Posted by Jakub Kicinski 3 weeks, 2 days ago
On Mon, 8 Sep 2025 13:07:05 +0300 Tariq Toukan wrote:
> If the PF's netns has been moved and differs from the devlink's netns,
> enabling switchdev mode would create an invalid state where
> representors and PF exist in different namespaces.
> 
> To prevent this inconsistent configuration,

Could you explain clearly what is the problem with having different
netdevs in different namespaces? From networking perspective it really
doesn't matter.
-- 
pw-bot: cr
Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
Posted by Jianbo Liu 3 weeks, 1 day ago

On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
> On Mon, 8 Sep 2025 13:07:05 +0300 Tariq Toukan wrote:
>> If the PF's netns has been moved and differs from the devlink's netns,
>> enabling switchdev mode would create an invalid state where
>> representors and PF exist in different namespaces.
>>
>> To prevent this inconsistent configuration,
> 
> Could you explain clearly what is the problem with having different
> netdevs in different namespaces? From networking perspective it really
> doesn't matter.

There is a requirement from customer who wants to manage openvswitch in 
a container. But he can't complete the steps (changing eswitch and 
configuring OVS) in the container if the netns are different.
Besides, ibdev is dependent on netdev, there is refcnt issue if netdev 
is moved to other netns but devlink netns is not changed by "devlink dev 
reload netns" command.

Thanks!
Jianbo
Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
Posted by Jakub Kicinski 3 weeks, 1 day ago
On Wed, 10 Sep 2025 11:01:18 +0800 Jianbo Liu wrote:
> On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
> > On Mon, 8 Sep 2025 13:07:05 +0300 Tariq Toukan wrote:  
> >> If the PF's netns has been moved and differs from the devlink's netns,
> >> enabling switchdev mode would create an invalid state where
> >> representors and PF exist in different namespaces.
> >>
> >> To prevent this inconsistent configuration,  
> > 
> > Could you explain clearly what is the problem with having different
> > netdevs in different namespaces? From networking perspective it really
> > doesn't matter.  
> 
> There is a requirement from customer who wants to manage openvswitch in 
> a container. But he can't complete the steps (changing eswitch and 
> configuring OVS) in the container if the netns are different.

You're preventing a configuration which you think is "bad" (for a
reason unknown). How is _rejecting_ a config enabling you to fulfill
some "customer requirement" which sounds like having all interfaces 
in a separate ns?

> Besides, ibdev is dependent on netdev, there is refcnt issue if netdev 
> is moved to other netns but devlink netns is not changed by "devlink dev 
> reload netns" command.

shrug
Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
Posted by Jianbo Liu 3 weeks ago

On 9/11/2025 8:48 AM, Jakub Kicinski wrote:
> On Wed, 10 Sep 2025 11:01:18 +0800 Jianbo Liu wrote:
>> On 9/10/2025 9:23 AM, Jakub Kicinski wrote:
>>> On Mon, 8 Sep 2025 13:07:05 +0300 Tariq Toukan wrote:
>>>> If the PF's netns has been moved and differs from the devlink's netns,
>>>> enabling switchdev mode would create an invalid state where
>>>> representors and PF exist in different namespaces.
>>>>
>>>> To prevent this inconsistent configuration,
>>>
>>> Could you explain clearly what is the problem with having different
>>> netdevs in different namespaces? From networking perspective it really
>>> doesn't matter.
>>
>> There is a requirement from customer who wants to manage openvswitch in
>> a container. But he can't complete the steps (changing eswitch and
>> configuring OVS) in the container if the netns are different.
> 
> You're preventing a configuration which you think is "bad" (for a
> reason unknown). How is _rejecting_ a config enabling you to fulfill
> some "customer requirement" which sounds like having all interfaces
> in a separate ns?
> 

My apologies, I wasn't clear. The problem is specific to the OVS control 
plane. ovs-vsctl cannot manage the switch if the PF uplink and VF 
representors are in different namespaces. When the PF is in a container 
while the devlink instance is bound to the host, enabling switchdev 
creates this exact split: the PF uplink stays in the container, while 
the VF representors are created on the host.

Our patch prevents this broken state by requiring the devlink namespace 
to be set to the container's namespace before switchdev can be enabled.

>> Besides, ibdev is dependent on netdev, there is refcnt issue if netdev
>> is moved to other netns but devlink netns is not changed by "devlink dev
>> reload netns" command.
> 
> shrug
Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
Posted by Jakub Kicinski 3 weeks ago
On Thu, 11 Sep 2025 15:48:24 +0800 Jianbo Liu wrote:
> >> There is a requirement from customer who wants to manage openvswitch in
> >> a container. But he can't complete the steps (changing eswitch and
> >> configuring OVS) in the container if the netns are different.  
> > 
> > You're preventing a configuration which you think is "bad" (for a
> > reason unknown). How is _rejecting_ a config enabling you to fulfill
> > some "customer requirement" which sounds like having all interfaces
> > in a separate ns?
> 
> My apologies, I wasn't clear. The problem is specific to the OVS control 
> plane. ovs-vsctl cannot manage the switch if the PF uplink and VF 
> representors are in different namespaces. When the PF is in a container 
> while the devlink instance is bound to the host, enabling switchdev 
> creates this exact split: the PF uplink stays in the container, while 
> the VF representors are created on the host.

So you're saying the user can mess up the configuration in a way that'd
prevent them from using OVS. No strong objection to the patch (assuming
commit message is improved), but I don't see how this is a fix.
Re: [PATCH net 2/3] net/mlx5e: Prevent entering switchdev mode with inconsistent netns
Posted by Jianbo Liu 3 weeks ago

On 9/12/2025 8:11 AM, Jakub Kicinski wrote:
> On Thu, 11 Sep 2025 15:48:24 +0800 Jianbo Liu wrote:
>>>> There is a requirement from customer who wants to manage openvswitch in
>>>> a container. But he can't complete the steps (changing eswitch and
>>>> configuring OVS) in the container if the netns are different.
>>>
>>> You're preventing a configuration which you think is "bad" (for a
>>> reason unknown). How is _rejecting_ a config enabling you to fulfill
>>> some "customer requirement" which sounds like having all interfaces
>>> in a separate ns?
>>
>> My apologies, I wasn't clear. The problem is specific to the OVS control
>> plane. ovs-vsctl cannot manage the switch if the PF uplink and VF
>> representors are in different namespaces. When the PF is in a container
>> while the devlink instance is bound to the host, enabling switchdev
>> creates this exact split: the PF uplink stays in the container, while
>> the VF representors are created on the host.
> 
> So you're saying the user can mess up the configuration in a way that'd
> prevent them from using OVS. No strong objection to the patch (assuming
> commit message is improved), but I don't see how this is a fix.

Yes. We are preventing a configuration that breaks the OVS control plane 
for this specific use case. Thank you for the review. I will update the 
commit message.